Skip to content

fix(tasks): keep final last on the event tape under cancel; de-flake cancel tests (#256)#258

Merged
helebest merged 1 commit into
mainfrom
fix-cancel-span-final-race
Jun 29, 2026
Merged

fix(tasks): keep final last on the event tape under cancel; de-flake cancel tests (#256)#258
helebest merged 1 commit into
mainfrom
fix-cancel-span-final-race

Conversation

@helebest

Copy link
Copy Markdown
Collaborator

What & why

The sibling of #256test_task_span_marks_cancel_not_error (and test_pre_cancel_via_token) intermittently timed out at ~30s waiting for a cancelled task to reach its terminal state. Unlike #256 (a test-side status-row-vs-tape race), this turned out to be two distinct races, one a real product bug, both empirically reproduced.

1. Late event after final — the real product bug

A cancelled runner's in-flight reporter.progress(...)store.append_event(...) runs via asyncio.to_thread. Cancelling the await does NOT stop the worker thread (run_in_executor cancellation detaches), so the progress DB write commits after the manager's final event:

append trace of a stranded task:
  task_started  commit seq=1
  progress      submit            ← in-flight when the runner is cancelled
  final         submit → commit seq=2   ← except-arm emits final while progress is still in flight
  progress      commit seq=3            ← detached thread lands progress AFTER final

→ tape ends with progress, not final → every consumer waiting for events[-1] == "final" hangs. This violates events.py's documented "final is always the last event" invariant, which append_event claimed to enforce but did not.

Fix: append_event (sqlite + postgres) reads the task's status alongside max(seq) — folded into one query under the existing append lock — and drops a non-final event when the row is already terminal. The manager commits the terminal status immediately before emitting final, so the final itself is exempt. A new both-adapter contract test pins this and goes red without the guard. The drop logs at DEBUG; store.py gains TERMINAL_STATUS_VALUES and its Protocol docstring now documents the drop semantics.

2. Cancel before the runner starts

A cancel landing before _run's first event-loop step throws CancelledError before the try opens → no final emitted, row stays PENDING. The two tests guarded against this by polling for RUNNING / sleep(0.05) but never asserted it, so under CI scheduling load they could cancel too early. Replaced with the deterministic started = asyncio.Event() pattern already used by test_cancel_after_terminal_emits_no_contradictory_final — the runner sets it as its first line, the test awaits it before cancelling, so the cancel always lands inside _run's try.

The underlying product gap behind (2) — a cancel before _run starts stranding a PENDING task — is left as-is: unreachable via the HTTP API (the submit→cancel network round-trip guarantees _run has stepped) and recovered by restart_cleanup.

Verification

  • Reproduced then fixed: progress-runner stress — 2 strands / 1280 before the guard → 0 / 2560 after.
  • tools/check.py: ruff + mypy + 2413 passed.
  • Task-store contract green on sqlite AND real Postgres (pgvector pg18), both adapters — incl. the new test_append_after_terminal_dropped_keeping_final_last.
  • cancel + drop tests 60/60 in a loop.

Delivery receipt

Step 3 — verify: stress 0/2560 (reproduced 2/1280 before guard); tools/check.py 2413 passed; PG task-store contract 42 passed (both adapters); cancel+drop loop 60/60.

Step 4 — codex: round 1, no findings ("consistently enforce the final-event-last invariant across both stores").

Step 5 — fresh-review (clean subagent): PASS. Verified all 6 correctness questions, confirmed the manager commits terminal status before final in all 4 arms (the guard's final-exemption rests on this), and proved the contract test goes red without the guard.

Step 5 — /code-review (xhigh, 24 agents): 5 findings (all PLAUSIBLE, none confirmed-harmful) → all 5 addressed:

finding action
TaskStore Protocol docstring still promised strictly-increasing seq, omitted "final is last" + the drop fixed — rewrote MUST-list + append_event docstring
drop path returned before the seq/ts stamp; postgres lost ts on drop while sqlite kept it (cross-adapter divergence) fixed — moved postgres ts/body after the guard so both leave the dropped dict untouched; documented
postgres added a 2nd round-trip (SELECT status) per non-final append fixed — folded into one scalar-subquery SELECT (max_seq),(status) in both adapters
silent drop, no log fixed — logger.debug on the drop in both adapters
cancel-during-task_started-emit can drop the detached task_started → [final]-only tape fixed — documented the benign edge in events.py

7 further panel-refuted nitpicks (TERMINAL_STATUS_VALUES "redundant", comment "copy-paste", Event pattern "replicated") rejected with reason.

Step 6 — doc-sync: N/A — no stale refs; events.py's existing "enforced by append_event" claim is now actually true.

🤖 Generated with Claude Code

…e cancel tests (#256)

`test_task_span_marks_cancel_not_error` (+ sibling `test_pre_cancel_via_token`)
intermittently timed out at ~30s waiting for the cancelled task's terminal state.
Root cause is two distinct races, both empirically reproduced:

1. **Late event after `final` (the real product bug).** A cancelled runner's
   in-flight `reporter.progress(...)` → `store.append_event(...)` runs via
   `asyncio.to_thread`. Cancelling the await does NOT stop the worker thread
   (`run_in_executor` cancellation detaches), so the progress DB write commits
   AFTER the manager's `final` event — the tape ends with `progress`, not
   `final`, and any consumer waiting for `events[-1] == "final"` hangs. This
   violates events.py's documented "final is always the last event" invariant,
   which `append_event` claimed to enforce but did not.

   Fix: `append_event` (sqlite + postgres) now reads the task's status alongside
   max(seq) — folded into one query under the existing append lock — and DROPS a
   NON-`final` event when the row is already terminal (the manager commits the
   terminal status immediately before emitting `final`, so the `final` itself is
   exempt). A new both-adapter contract test pins this; it goes red without the
   guard (tape ends in a stray `progress`). The drop logs at DEBUG. store.py
   adds `TERMINAL_STATUS_VALUES` and its Protocol docstring now documents the
   drop (the return is the current max seq, the event dict is left unstamped).

2. **Cancel before the runner starts.** A cancel landing before `_run`'s first
   step throws CancelledError before the try block opens, so no `final` is ever
   emitted (row stays PENDING). The two tests guarded against this by polling for
   RUNNING / `sleep(0.05)` but did not actually assert it, so under CI scheduling
   load they could cancel too early. Replaced with the deterministic
   `started = asyncio.Event()` pattern already used by
   `test_cancel_after_terminal_emits_no_contradictory_final`: the runner sets it
   as its first line, the test awaits it before cancelling, so the cancel always
   lands inside `_run`'s try.

The product gap behind (2) — a cancel before `_run` starts stranding a PENDING
task — is left as-is: it is unreachable via the HTTP API (the submit→cancel
round-trip guarantees `_run` has stepped) and recovered by `restart_cleanup`.

Verified: progress-runner stress 0 strands / 2560 scenarios (was reproducing
before the guard); cancel + drop tests 60/60 in a loop; task-store contract green
on sqlite AND real Postgres (pgvector pg18); full `tools/check.py`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@helebest, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 1 minute

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 06f95d90-422f-4658-8301-4b64955c7977

📥 Commits

Reviewing files that changed from the base of the PR and between afb71ab and cf98ba4.

📒 Files selected for processing (7)
  • src/dikw_core/server/tasks/events.py
  • src/dikw_core/server/tasks/store.py
  • src/dikw_core/server/tasks/store_postgres.py
  • src/dikw_core/server/tasks/store_sqlite.py
  • tests/server/test_task_manager.py
  • tests/server/test_task_store_contract.py
  • tests/server/test_telemetry_tracing.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-cancel-span-final-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@helebest helebest merged commit fd341d1 into main Jun 29, 2026
10 checks passed
@helebest helebest deleted the fix-cancel-span-final-race branch June 29, 2026 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant