We recently started using the AWS CloudWatch Logs agent to upload logs for a few of our Node.js processes. Soon after the move, we noticed that logs for most of those processes were not being uploaded.
Debugging Process:
To debug the issue, we started with the CloudWatch Logs agent's own log in the /var/log directory.
We noticed the following entries in /var/log/awslogs.log:
2017-03-24 05:32:48,758 - cwlogs.push.publisher - WARNING - 14818 - Thread-5 - Caught exception: An error occurred (DataAlreadyAcceptedException) when calling the PutLogEvents operation: The given batch of log events has already been accepted. The next batch can be sent with sequenceToken: 49567766878003081255417777553615842777473923345314516562
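When the agent log grows large, it helps to pull these failures out mechanically instead of reading the file by eye. The sketch below is our own illustration, not part of the agent: the line layout (`<date> <time> - <logger> - <level> - <pid> - <thread> - <message>`) is inferred from the single WARNING line above, so treat the regular expressions as assumptions to adjust against your own log.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout of one agent log line, inferred from the WARNING line quoted above:
# "<date> <time> - <logger> - <level> - <pid> - <thread> - <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) - (?P<logger>\S+) - (?P<level>\w+) - "
    r"(?P<pid>\d+) - (?P<thread>\S+) - (?P<message>.*)$"
)
# AWS SDK error text embedded in the message
ERROR_RE = re.compile(
    r"An error occurred \((?P<code>\w+)\) when calling the (?P<operation>\w+) operation"
)
TOKEN_RE = re.compile(r"sequenceToken: (?P<token>\w+)")


class AgentError(NamedTuple):
    timestamp: str
    code: str
    operation: str
    next_token: Optional[str]


def parse_agent_error(line: str) -> Optional[AgentError]:
    """Return the AWS error carried by one agent log line, or None if there is none."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    message = match.group("message")
    error = ERROR_RE.search(message)
    if error is None:
        return None
    token = TOKEN_RE.search(message)
    return AgentError(
        timestamp=match.group("timestamp"),
        code=error.group("code"),
        operation=error.group("operation"),
        next_token=token.group("token") if token else None,
    )


def find_errors(lines: Iterable[str],
                code: str = "DataAlreadyAcceptedException") -> List[AgentError]:
    """Collect every parsed error that carries the given error code."""
    errors = []
    for line in lines:
        parsed = parse_agent_error(line)
        if parsed is not None and parsed.code == code:
            errors.append(parsed)
    return errors
```

Passing an open handle of the agent log to `find_errors` returns one entry per rejected PutLogEvents call, which makes it easy to see when the failures started and whether they are still happening.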
The documented description of DataAlreadyAcceptedException [2] is rather vague:
"The event was already logged."
However, the operation that returns the exception gives a hint about the cause. The exception is returned by the PutLogEvents API, whose documentation [1] lists the following constraints; a batch that violates any of them fails to upload:
- The maximum batch size is 1,048,576 bytes, and this size is calculated as the sum of all event messages in UTF-8, plus 26 bytes for each log event.
- None of the log events in the batch can be more than 2 hours in the future.
- None of the log events in the batch can be older than 14 days or the retention period of the log group.
- The log events in the batch must be in chronological order by their timestamp (the time the event occurred, expressed as the number of milliseconds since Jan 1, 1970 00:00:00 UTC).
- The maximum number of log events in a batch is 10,000.
- A batch of log events in a single request cannot span more than 24 hours. Otherwise, the operation fails.
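To make these constraints concrete, here is a small checker that reports which of them a batch breaks. It is a sketch of our own, not how the agent or the service validates batches; the limits are copied from the list above and may have changed since it was written, and `LogEvent` and `batch_violations` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

# Limits as listed in the PutLogEvents documentation (they may have changed since).
MAX_BATCH_BYTES = 1_048_576
EVENT_OVERHEAD_BYTES = 26
MAX_EVENTS = 10_000
MAX_FUTURE_MS = 2 * 60 * 60 * 1000        # 2 hours
MAX_AGE_MS = 14 * 24 * 60 * 60 * 1000     # 14 days
MAX_SPAN_MS = 24 * 60 * 60 * 1000         # 24 hours


@dataclass
class LogEvent:
    timestamp: int  # milliseconds since 1970-01-01 00:00:00 UTC
    message: str


def batch_violations(events: List[LogEvent], now_ms: int,
                     retention_ms: Optional[int] = None) -> List[str]:
    """Describe every listed constraint that the batch breaks (empty list if none)."""
    problems: List[str] = []
    if len(events) > MAX_EVENTS:
        problems.append("more than 10,000 events")
    # Size = UTF-8 bytes of every message plus 26 bytes per event.
    size = sum(len(e.message.encode("utf-8")) + EVENT_OVERHEAD_BYTES for e in events)
    if size > MAX_BATCH_BYTES:
        problems.append("batch exceeds 1,048,576 bytes")
    if not events:
        return problems
    stamps = [e.timestamp for e in events]
    if any(later < earlier for earlier, later in zip(stamps, stamps[1:])):
        problems.append("events are not in chronological order")
    if max(stamps) > now_ms + MAX_FUTURE_MS:
        problems.append("an event is more than 2 hours in the future")
    # An event may be no older than 14 days or the group's retention, whichever is shorter.
    max_age = MAX_AGE_MS if retention_ms is None else min(MAX_AGE_MS, retention_ms)
    if min(stamps) < now_ms - max_age:
        problems.append("an event is older than 14 days or the retention period")
    if max(stamps) - min(stamps) > MAX_SPAN_MS:
        problems.append("batch spans more than 24 hours")
    return problems
```

For example, a two-event batch whose timestamps are 25 hours apart passes every check except the last one and yields `["batch spans more than 24 hours"]`, which is exactly the situation we hit.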
In our case, the batches that failed to upload met every criterion except the last one: a batch of log events in a single request cannot span more than 24 hours.
Oddly enough, this also prevented current logs from being uploaded to CloudWatch.
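For anyone sending events to PutLogEvents themselves, one way to respect the 24-hour rule is to cut a time-sorted stream into batches whose first and last events are at most 24 hours apart. This is our own illustration of the constraint, not what the CloudWatch Logs agent does internally; `split_by_span` is a hypothetical helper, and it ignores the size and count limits for brevity.

```python
from typing import List, Tuple

DAY_MS = 24 * 60 * 60 * 1000

Event = Tuple[int, str]  # (timestamp in ms since the epoch, message)


def split_by_span(events: List[Event], max_span_ms: int = DAY_MS) -> List[List[Event]]:
    """Group events, in timestamp order, so each group spans at most max_span_ms."""
    batches: List[List[Event]] = []
    current: List[Event] = []
    for event in sorted(events, key=lambda e: e[0]):
        # Start a new batch once this event is too far from the batch's first event.
        if current and event[0] - current[0][0] > max_span_ms:
            batches.append(current)
            current = []
        current.append(event)
    if current:
        batches.append(current)
    return batches
```

Each resulting batch still has to pass the other checks (size, count, age) before it is sent.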
Solution:
The agent keeps its state, including its upload progress, in a state file whose path is set by the state_file option in the [general] section of the agent configuration file.
In our case, the state file defined in our cwlogs.cfg file was:
/var/awslogs/state/agent-state
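If you are unsure where your agent keeps its state, the path can be read from the configuration file itself. A minimal sketch, assuming the file is INI-style with a `[general]` section holding `state_file`, as described in the agent reference; interpolation is disabled because per-log sections commonly contain `datetime_format` values with `%` characters.

```python
import configparser


def state_file_path(config_text: str) -> str:
    """Return the state_file value from the [general] section of an agent config."""
    # interpolation=None keeps values such as "%Y-%m-%d %H:%M:%S" from being parsed.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.get("general", "state_file")
```

Usage: `with open("cwlogs.cfg") as f: print(state_file_path(f.read()))`, substituting the full path to your own configuration file.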
To force the agent to upload the logs again, we deleted the state file.
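The reset can be scripted. The sketch below stops the agent, deletes the state file, and starts the agent again even if the deletion fails. It assumes the agent runs as a service named `awslogs` controlled through `service` (as in the agent's start and stop documentation; systemd-based installs may use a different unit name) and must run as root. `reset_agent_state` and the injectable `runner` are hypothetical names, with `runner` there so the steps can be dry-run. Note that without a state file the agent no longer knows what it has already sent, so depending on the `initial_position` setting it may re-upload or skip existing log lines.

```python
import os
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run(cmd: Sequence[str]) -> None:
    """Execute a command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


def reset_agent_state(state_file: str, service: str = "awslogs",
                      runner: Runner = run) -> bool:
    """Stop the agent, delete its state file, restart it; True if a file was deleted."""
    runner(["service", service, "stop"])
    try:
        os.remove(state_file)
        removed = True
    except FileNotFoundError:
        removed = False
    finally:
        # Always bring the agent back up, even if the deletion raised.
        runner(["service", service, "start"])
    return removed
```

Running `reset_agent_state("/var/awslogs/state/agent-state")` as root performs the same steps we took by hand; afterwards the agent's status command and its log are the places to confirm uploads have resumed.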
References
[1] PutLogEvents API reference, http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html
[2] PutLogEvents API errors, http://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html#API_PutLogEvents_Errors
[3] CloudWatch Agent Reference, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
[4] CloudWatch Agent start command, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/StartTheCWLAgent.html
[5] CloudWatch Agent stop command, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/StopTheCWLAgent.html
[6] CloudWatch Agent status command, http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/ReportCWLAgentStatus.html