fix: correct OutboxGCWorkflow timeout and tighten schedule config #2947
disintegrator wants to merge 4 commits
The workflow was timing out on every run. After GCOutboxProcessedRows returned 0 eligible rows it called workflow.Sleep(1h), but the schedule set WorkflowRunTimeout to 15 minutes, guaranteeing the workflow was always killed before the sleep fired.

Changes:

- Remove the in-workflow sleep entirely. The workflow now returns nil once a partial batch confirms no further rows remain; the Temporal schedule handles re-triggering.
- Tighten timeouts to match actual workload: schedule interval 6h→5min, activity StartToCloseTimeout 10min→1min, WorkflowRunTimeout 15min→2min. At current volume (~40K rows steady-state, ~4 rows/min arriving) each run deletes ~20 rows in a single activity call that completes in milliseconds.
- Make AddOutboxGCSchedule upsert: on ErrScheduleAlreadyRunning it now calls handle.Update to push the new spec and action to the existing schedule, so config changes take effect on deploy without manual intervention via the Temporal UI.

DB impact: the more frequent schedule means terminal rows are cleaned up within ~5 minutes of crossing the 7-day retention threshold instead of up to 6 hours late. Each run issues at most one batched DELETE of ≤100 rows, well within autovacuum's ability to reclaim dead tuples at this volume.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🚀 Preview Environment (PR #2947)
Preview URL: https://pr-2947.dev.getgram.ai
Gram Preview Bot
Found 1 test failure on Blacksmith runners: Failure