Avoid infinite log resending if exceptions happen during the log-sending process#359
Avoid infinite log resending if exceptions happen during the log-sending process#359
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates AWSLoggerCore’s batching/sending behavior to prevent a log batch that triggers send-time exceptions (notably CloudWatch Logs’ 24-hour timestamp-range constraint) from being resent indefinitely by the monitor loop.
Changes:
- Resets (drops) the current in-flight batch when an exception occurs in the monitor loop, to avoid infinite resend.
- Adds
PrepareLogEventBatchForSending()to sort events and trim older events until the batch fits within a 24-hour timestamp range. - Adds
RemoveMessageAt(int index)toLogEventBatchto support removing individual events while keeping message-size accounting consistent.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/AWS.Logger.Core/Core/AWSLoggerCore.cs:489
SendMessagesstill only catchesResourceNotFoundException. Any otherPutLogEventsAsyncfailure will leave_repointact, and the monitor loop will keep retrying the same failing batch indefinitely. To match the PR intent, catch/log broader exceptions (e.g.,AmazonCloudWatchLogsException/AmazonServiceException) and drop/reset the current batch (optionally with backoff) to prevent infinite resend loops.
catch (ResourceNotFoundException ex)
{
// The specified log stream does not exist. Refresh or create new stream.
LogLibraryServiceError(ex);
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This reverts commit 06e75bc.
| private static readonly TimeSpan MaxLogEventBatchAllowedTimeRange = TimeSpan.FromHours(24); | ||
| private void PrepareLogEventBatchForSending() | ||
| { | ||
| //Make sure the log events are in order from the oldest to the newest. | ||
| _repo._request.LogEvents.Sort((ev1, ev2) => | ||
| ev1.Timestamp.GetValueOrDefault().CompareTo(ev2.Timestamp.GetValueOrDefault())); | ||
| if (_repo._request.LogEvents.Count > 0) | ||
| { | ||
| DateTime latestLogDateTime = _repo._request.LogEvents.Last().Timestamp ?? DateTime.UtcNow; | ||
| //Avoid the error that log events must be within a 24-hour window. | ||
| //https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_PutLogEvents.html | ||
| int lastInvalidEventIndexToRemove = -1; | ||
| for (int i = 0; i < _repo._request.LogEvents.Count; i++) | ||
| { | ||
| var logEvent = _repo._request.LogEvents[i]; | ||
| if (!logEvent.Timestamp.HasValue || (latestLogDateTime - logEvent.Timestamp.Value) > MaxLogEventBatchAllowedTimeRange) | ||
| { | ||
| lastInvalidEventIndexToRemove = i; | ||
| } | ||
| else | ||
| { | ||
| break; // Events are in order, so we can stop checking once we find a valid event | ||
| } | ||
| } | ||
| if (lastInvalidEventIndexToRemove >= 0) | ||
| { | ||
| _repo.RemoveMessages(0, lastInvalidEventIndexToRemove + 1); | ||
| } |
|
@GarrettBeatty
thank you |
Issue #, if available:
This is a PR which replaces #219 with the latest dev branch
Description of changes:
We drop the logs in the log batch if the current log batch causes exceptions in the log-sending process
We did this because we observed that our log batch causes the over 24 hours spanning problem
and since the monitor is not stopped, it just resends the log batch forever.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.