fix(tasks): keep final last on the event tape under cancel; de-flake cancel tests (#256)#258
Conversation
…e cancel tests (#256) `test_task_span_marks_cancel_not_error` (+ sibling `test_pre_cancel_via_token`) intermittently timed out at ~30s waiting for the cancelled task's terminal state. Root cause is two distinct races, both empirically reproduced: 1. **Late event after `final` (the real product bug).** A cancelled runner's in-flight `reporter.progress(...)` → `store.append_event(...)` runs via `asyncio.to_thread`. Cancelling the await does NOT stop the worker thread (`run_in_executor` cancellation detaches), so the progress DB write commits AFTER the manager's `final` event — the tape ends with `progress`, not `final`, and any consumer waiting for `events[-1] == "final"` hangs. This violates events.py's documented "final is always the last event" invariant, which `append_event` claimed to enforce but did not. Fix: `append_event` (sqlite + postgres) now reads the task's status alongside max(seq) — folded into one query under the existing append lock — and DROPS a NON-`final` event when the row is already terminal (the manager commits the terminal status immediately before emitting `final`, so the `final` itself is exempt). A new both-adapter contract test pins this; it goes red without the guard (tape ends in a stray `progress`). The drop logs at DEBUG. store.py adds `TERMINAL_STATUS_VALUES` and its Protocol docstring now documents the drop (the return is the current max seq, the event dict is left unstamped). 2. **Cancel before the runner starts.** A cancel landing before `_run`'s first step throws CancelledError before the try block opens, so no `final` is ever emitted (row stays PENDING). The two tests guarded against this by polling for RUNNING / `sleep(0.05)` but did not actually assert it, so under CI scheduling load they could cancel too early. Replaced with the deterministic `started = asyncio.Event()` pattern already used by `test_cancel_after_terminal_emits_no_contradictory_final`: the runner sets it as its first line, the test awaits it before cancelling, so the cancel always lands inside `_run`'s try. The product gap behind (2) — a cancel before `_run` starts stranding a PENDING task — is left as-is: it is unreachable via the HTTP API (the submit→cancel round-trip guarantees `_run` has stepped) and recovered by `restart_cleanup`. Verified: progress-runner stress 0 strands / 2560 scenarios (was reproducing before the guard); cancel + drop tests 60/60 in a loop; task-store contract green on sqlite AND real Postgres (pgvector pg18); full `tools/check.py`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
Next review available in: 1 minute Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (7)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
What & why
The sibling of #256 —
test_task_span_marks_cancel_not_error(andtest_pre_cancel_via_token) intermittently timed out at ~30s waiting for a cancelled task to reach its terminal state. Unlike #256 (a test-side status-row-vs-tape race), this turned out to be two distinct races, one a real product bug, both empirically reproduced.1. Late event after
final— the real product bugA cancelled runner's in-flight
reporter.progress(...)→store.append_event(...)runs viaasyncio.to_thread. Cancelling the await does NOT stop the worker thread (run_in_executorcancellation detaches), so the progress DB write commits after the manager'sfinalevent:→ tape ends with
progress, notfinal→ every consumer waiting forevents[-1] == "final"hangs. This violates events.py's documented "final is always the last event" invariant, whichappend_eventclaimed to enforce but did not.Fix:
append_event(sqlite + postgres) reads the task's status alongsidemax(seq)— folded into one query under the existing append lock — and drops a non-finalevent when the row is already terminal. The manager commits the terminal status immediately before emittingfinal, so thefinalitself is exempt. A new both-adapter contract test pins this and goes red without the guard. The drop logs at DEBUG;store.pygainsTERMINAL_STATUS_VALUESand its Protocol docstring now documents the drop semantics.2. Cancel before the runner starts
A cancel landing before
_run's first event-loop step throwsCancelledErrorbefore thetryopens → nofinalemitted, row stays PENDING. The two tests guarded against this by polling for RUNNING /sleep(0.05)but never asserted it, so under CI scheduling load they could cancel too early. Replaced with the deterministicstarted = asyncio.Event()pattern already used bytest_cancel_after_terminal_emits_no_contradictory_final— the runner sets it as its first line, the test awaits it before cancelling, so the cancel always lands inside_run's try.The underlying product gap behind (2) — a cancel before
_runstarts stranding a PENDING task — is left as-is: unreachable via the HTTP API (the submit→cancel network round-trip guarantees_runhas stepped) and recovered byrestart_cleanup.Verification
2 strands / 1280before the guard →0 / 2560after.tools/check.py: ruff + mypy + 2413 passed.test_append_after_terminal_dropped_keeping_final_last.Delivery receipt
Step 3 — verify: stress 0/2560 (reproduced 2/1280 before guard);
tools/check.py2413 passed; PG task-store contract 42 passed (both adapters); cancel+drop loop 60/60.Step 4 — codex: round 1, no findings ("consistently enforce the final-event-last invariant across both stores").
Step 5 — fresh-review (clean subagent): PASS. Verified all 6 correctness questions, confirmed the manager commits terminal status before
finalin all 4 arms (the guard'sfinal-exemption rests on this), and proved the contract test goes red without the guard.Step 5 — /code-review (xhigh, 24 agents): 5 findings (all PLAUSIBLE, none confirmed-harmful) → all 5 addressed:
append_eventdocstringtson drop while sqlite kept it (cross-adapter divergence)ts/bodyafter the guard so both leave the dropped dict untouched; documentedSELECT status) per non-final appendSELECT (max_seq),(status)in both adapterslogger.debugon the drop in both adapterstask_started-emit can drop the detached task_started →[final]-only tape7 further panel-refuted nitpicks (TERMINAL_STATUS_VALUES "redundant", comment "copy-paste", Event pattern "replicated") rejected with reason.
Step 6 — doc-sync: N/A — no stale refs; events.py's existing "enforced by append_event" claim is now actually true.
🤖 Generated with Claude Code