Thursday, March 23, 2017

CloudWatch DataAlreadyAcceptedException issue

The Problem:

We recently started using the AWS CloudWatch agent to upload logs for a few of our Node.js processes. Soon after the initial move we noticed that logs were not being uploaded for the majority of the processes.

Debugging Process:
To debug the issue further, we began by looking at the CloudWatch agent logs located in the /var/log directory.

We noticed the following entry in /var/log/awslogs.log:

2017-03-24 05:32:48,758 - cwlogs.push.publisher - WARNING - 14818 - Thread-5 - Caught exception: An error occurred (DataAlreadyAcceptedException) when calling the PutLogEvents operation: The given batch of log events has already been accepted. The next batch can be sent with sequenceToken: 49567766878003081255417777553615842777473923345314516562
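When scanning a large agent log for this warning, it can help to pull out the exception name and the suggested sequence token programmatically. The following is a minimal sketch; the regular expression is derived only from the single sample line above, and the function name extract_sequence_token is hypothetical.

```python
import re

# Pattern based solely on the sample warning line shown above (an assumption,
# not a documented log format).
TOKEN_RE = re.compile(
    r"\((?P<error>\w+Exception)\).*?sequenceToken:\s*(?P<token>\d+)"
)


def extract_sequence_token(line):
    """Return (error_name, sequence_token), or None if the line does not match."""
    match = TOKEN_RE.search(line)
    if match is None:
        return None
    return match.group("error"), match.group("token")


sample = (
    "2017-03-24 05:32:48,758 - cwlogs.push.publisher - WARNING - 14818 - Thread-5 - "
    "Caught exception: An error occurred (DataAlreadyAcceptedException) when calling "
    "the PutLogEvents operation: The given batch of log events has already been accepted. "
    "The next batch can be sent with sequenceToken: "
    "49567766878003081255417777553615842777473923345314516562"
)
print(extract_sequence_token(sample))
```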


The documented cause of DataAlreadyAcceptedException is a bit vague:

The event was already logged.


However, reading about the component that throws this exception gives an inkling of the cause [2]. The exception is thrown by the PutLogEvents API. The official documentation [1] lists the following constraints; a batch that violates any of them may fail to upload:

 - The maximum batch size is 1,048,576 bytes, and this size is calculated as the sum of all event messages in UTF-8, plus 26 bytes for each log event.
 - None of the log events in the batch can be more than 2 hours in the future.
 - None of the log events in the batch can be older than 14 days or the retention period of the log group.
 - The log events in the batch must be in chronological order by their timestamp (the time the event occurred, expressed as the number of milliseconds since Jan 1, 1970 00:00:00 UTC).
 - The maximum number of log events in a batch is 10,000.
 - A batch of log events in a single request cannot span more than 24 hours. Otherwise, the operation fails.
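The constraints above can be sketched as a local pre-flight check. This is a rough illustration, not the agent's or the API's actual validation: the function name check_batch, the event dictionary layout, and the max_age_days parameter (which a caller would lower if the log group's retention period is shorter than 14 days) are all assumptions.

```python
import time

MAX_BATCH_BYTES = 1_048_576   # maximum batch size in bytes
EVENT_OVERHEAD_BYTES = 26     # counted per event when sizing the batch
MAX_EVENTS = 10_000
HOUR_MS = 60 * 60 * 1000
DAY_MS = 24 * HOUR_MS


def check_batch(events, now_ms=None, max_age_days=14):
    """Return a list of violated constraints for a batch.

    events: list of {"timestamp": <ms since epoch>, "message": <str>}
    (a hypothetical layout chosen for this sketch).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if not events:
        return []
    problems = []
    stamps = [event["timestamp"] for event in events]
    size = sum(
        len(event["message"].encode("utf-8")) + EVENT_OVERHEAD_BYTES
        for event in events
    )
    if size > MAX_BATCH_BYTES:
        problems.append("batch exceeds 1,048,576 bytes")
    if len(events) > MAX_EVENTS:
        problems.append("batch has more than 10,000 events")
    if max(stamps) > now_ms + 2 * HOUR_MS:
        problems.append("event more than 2 hours in the future")
    if min(stamps) < now_ms - max_age_days * DAY_MS:
        problems.append("event older than the allowed age")
    if stamps != sorted(stamps):
        problems.append("events not in chronological order")
    if max(stamps) - min(stamps) > DAY_MS:
        problems.append("batch spans more than 24 hours")
    return problems


# A stale line sitting in front of a current one, 25 hours apart.
now = 1_490_333_568_000
batch = [
    {"timestamp": now - 25 * HOUR_MS, "message": "old line"},
    {"timestamp": now, "message": "current line"},
]
print(check_batch(batch, now_ms=now))
```

In this example only the 24-hour rule is reported, which mirrors the situation described in this post.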

In our case, the logs that failed to upload met every criterion except the following:


 A batch of log events in a single request cannot span more than 24 hours. Otherwise, the operation fails.


Oddly enough, this was preventing even current logs from being uploaded to CloudWatch.

Solution:

The state of the AWS CloudWatch agent is maintained in a file defined in the CloudWatch configuration file.

In our case the state file, as defined in our cwlogs.cfg file, was:

/var/awslogs/state/agent-state
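To confirm which state file an agent configuration points at, the setting can be read with Python's configparser. This sketch assumes the configuration is INI-style with a [general] section containing a state_file key, as described in the agent reference; the inline SAMPLE_CONFIG and the function name read_state_file_path are illustrative and not taken from our actual cwlogs.cfg.

```python
import configparser

# Illustrative configuration text; a real file would be read from disk.
SAMPLE_CONFIG = """
[general]
state_file = /var/awslogs/state/agent-state
"""


def read_state_file_path(config_text):
    """Return the state_file value from the [general] section, or None."""
    # Interpolation is disabled so that '%' characters elsewhere in a config
    # (for example in date formats) are not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get("general", "state_file", fallback=None)


print(read_state_file_path(SAMPLE_CONFIG))
```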


To force the CloudWatch agent to upload the logs, we deleted the state file.
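A slightly more cautious variant of this step is to move the state file aside instead of deleting it outright, so it can be restored if needed. The sketch below is a substitute for the plain delete we performed, not what we actually ran; the function name reset_agent_state and the ".bak" suffix are hypothetical, and it assumes the agent is stopped beforehand and started again afterwards (see the stop and start command references below).

```python
import os
import shutil
import tempfile


def reset_agent_state(state_path):
    """Move the state file aside; return the backup path, or None if absent."""
    if not os.path.exists(state_path):
        return None
    backup_path = state_path + ".bak"  # hypothetical backup naming
    shutil.move(state_path, backup_path)
    return backup_path


# Demonstration against a throwaway file rather than the real state file.
with tempfile.TemporaryDirectory() as tmp:
    demo_state = os.path.join(tmp, "agent-state")
    with open(demo_state, "w") as handle:
        handle.write("placeholder state")
    backup = reset_agent_state(demo_state)
    print(os.path.exists(demo_state), os.path.basename(backup))
```

On a real host the function would be called with the path from the configuration file, for example /var/awslogs/state/agent-state, and would typically need root privileges.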



References

[1] PutLogEvents API reference, http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html
[2] PutLogEvents API errors, http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html#API_PutLogEvents_Errors
[3] CloudWatch Agent Reference, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
[4] CloudWatch Agent start command, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/StartTheCWLAgent.html
[5] CloudWatch Agent stop command, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/StopTheCWLAgent.html
[6] CloudWatch Agent status command, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/ReportCWLAgentStatus.html