fix(tasks): keep `final` last on the event tape under cancel; de-flake cancel tests (#256) by helebest · Pull Request #258 · OpenDIKW/dikw-core

helebest · 2026-06-29T15:21:40Z

What & why

The sibling of #256 — test_task_span_marks_cancel_not_error (and test_pre_cancel_via_token) intermittently timed out at ~30s waiting for a cancelled task to reach its terminal state. Unlike #256 (a test-side status-row-vs-tape race), this turned out to be two distinct races, one a real product bug, both empirically reproduced.

1. Late event after `final` — the real product bug

A cancelled runner's in-flight reporter.progress(...) → store.append_event(...) runs via asyncio.to_thread. Cancelling the await does NOT stop the worker thread (run_in_executor cancellation detaches), so the progress DB write commits after the manager's final event:

append trace of a stranded task:
  task_started  commit seq=1
  progress      submit            ← in-flight when the runner is cancelled
  final         submit → commit seq=2   ← except-arm emits final while progress is still in flight
  progress      commit seq=3            ← detached thread lands progress AFTER final

→ tape ends with progress, not final → every consumer waiting for events[-1] == "final" hangs. This violates events.py's documented "final is always the last event" invariant, which append_event claimed to enforce but did not.

Fix: append_event (sqlite + postgres) reads the task's status alongside max(seq) — folded into one query under the existing append lock — and drops a non-final event when the row is already terminal. The manager commits the terminal status immediately before emitting final, so the final itself is exempt. A new both-adapter contract test pins this and goes red without the guard. The drop logs at DEBUG; store.py gains TERMINAL_STATUS_VALUES and its Protocol docstring now documents the drop semantics.

2. Cancel before the runner starts

A cancel landing before _run's first event-loop step throws CancelledError before the try opens → no final emitted, row stays PENDING. The two tests guarded against this by polling for RUNNING / sleep(0.05) but never asserted it, so under CI scheduling load they could cancel too early. Replaced with the deterministic started = asyncio.Event() pattern already used by test_cancel_after_terminal_emits_no_contradictory_final — the runner sets it as its first line, the test awaits it before cancelling, so the cancel always lands inside _run's try.

The underlying product gap behind (2) — a cancel before _run starts stranding a PENDING task — is left as-is: unreachable via the HTTP API (the submit→cancel network round-trip guarantees _run has stepped) and recovered by restart_cleanup.

Verification

Reproduced then fixed: progress-runner stress — 2 strands / 1280 before the guard → 0 / 2560 after.
tools/check.py: ruff + mypy + 2413 passed.
Task-store contract green on sqlite AND real Postgres (pgvector pg18), both adapters — incl. the new test_append_after_terminal_dropped_keeping_final_last.
cancel + drop tests 60/60 in a loop.

Delivery receipt

Step 3 — verify: stress 0/2560 (reproduced 2/1280 before guard); tools/check.py 2413 passed; PG task-store contract 42 passed (both adapters); cancel+drop loop 60/60.

Step 4 — codex: round 1, no findings ("consistently enforce the final-event-last invariant across both stores").

Step 5 — fresh-review (clean subagent): PASS. Verified all 6 correctness questions, confirmed the manager commits terminal status before final in all 4 arms (the guard's final-exemption rests on this), and proved the contract test goes red without the guard.

Step 5 — /code-review (xhigh, 24 agents): 5 findings (all PLAUSIBLE, none confirmed-harmful) → all 5 addressed:

finding	action
TaskStore Protocol docstring still promised strictly-increasing seq, omitted "final is last" + the drop	fixed — rewrote MUST-list + `append_event` docstring
drop path returned before the seq/ts stamp; postgres lost `ts` on drop while sqlite kept it (cross-adapter divergence)	fixed — moved postgres `ts`/`body` after the guard so both leave the dropped dict untouched; documented
postgres added a 2nd round-trip (`SELECT status`) per non-final append	fixed — folded into one scalar-subquery `SELECT (max_seq),(status)` in both adapters
silent drop, no log	fixed — `logger.debug` on the drop in both adapters
cancel-during-`task_started`-emit can drop the detached task_started → `[final]`-only tape	fixed — documented the benign edge in events.py

7 further panel-refuted nitpicks (TERMINAL_STATUS_VALUES "redundant", comment "copy-paste", Event pattern "replicated") rejected with reason.

Step 6 — doc-sync: N/A — no stale refs; events.py's existing "enforced by append_event" claim is now actually true.

🤖 Generated with Claude Code

…e cancel tests (#256) `test_task_span_marks_cancel_not_error` (+ sibling `test_pre_cancel_via_token`) intermittently timed out at ~30s waiting for the cancelled task's terminal state. Root cause is two distinct races, both empirically reproduced: 1. **Late event after `final` (the real product bug).** A cancelled runner's in-flight `reporter.progress(...)` → `store.append_event(...)` runs via `asyncio.to_thread`. Cancelling the await does NOT stop the worker thread (`run_in_executor` cancellation detaches), so the progress DB write commits AFTER the manager's `final` event — the tape ends with `progress`, not `final`, and any consumer waiting for `events[-1] == "final"` hangs. This violates events.py's documented "final is always the last event" invariant, which `append_event` claimed to enforce but did not. Fix: `append_event` (sqlite + postgres) now reads the task's status alongside max(seq) — folded into one query under the existing append lock — and DROPS a NON-`final` event when the row is already terminal (the manager commits the terminal status immediately before emitting `final`, so the `final` itself is exempt). A new both-adapter contract test pins this; it goes red without the guard (tape ends in a stray `progress`). The drop logs at DEBUG. store.py adds `TERMINAL_STATUS_VALUES` and its Protocol docstring now documents the drop (the return is the current max seq, the event dict is left unstamped). 2. **Cancel before the runner starts.** A cancel landing before `_run`'s first step throws CancelledError before the try block opens, so no `final` is ever emitted (row stays PENDING). The two tests guarded against this by polling for RUNNING / `sleep(0.05)` but did not actually assert it, so under CI scheduling load they could cancel too early. Replaced with the deterministic `started = asyncio.Event()` pattern already used by `test_cancel_after_terminal_emits_no_contradictory_final`: the runner sets it as its first line, the test awaits it before cancelling, so the cancel always lands inside `_run`'s try. The product gap behind (2) — a cancel before `_run` starts stranding a PENDING task — is left as-is: it is unreachable via the HTTP API (the submit→cancel round-trip guarantees `_run` has stepped) and recovered by `restart_cleanup`. Verified: progress-runner stress 0 strands / 2560 scenarios (was reproducing before the guard); cancel + drop tests 60/60 in a loop; task-store contract green on sqlite AND real Postgres (pgvector pg18); full `tools/check.py`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-29T15:21:53Z

Warning

Review limit reached

@helebest, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 1 minute

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 06f95d90-422f-4658-8301-4b64955c7977

📥 Commits

Reviewing files that changed from the base of the PR and between afb71ab and cf98ba4.

📒 Files selected for processing (7)

src/dikw_core/server/tasks/events.py
src/dikw_core/server/tasks/store.py
src/dikw_core/server/tasks/store_postgres.py
src/dikw_core/server/tasks/store_sqlite.py
tests/server/test_task_manager.py
tests/server/test_task_store_contract.py
tests/server/test_telemetry_tracing.py

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-cancel-span-final-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-06-29T15:23:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

helebest merged commit fd341d1 into main Jun 29, 2026
10 checks passed

helebest deleted the fix-cancel-span-final-race branch June 29, 2026 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(tasks): keep `final` last on the event tape under cancel; de-flake cancel tests (#256)#258

fix(tasks): keep `final` last on the event tape under cancel; de-flake cancel tests (#256)#258
helebest merged 1 commit into
mainfrom
fix-cancel-span-final-race

helebest commented Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026

Review limit reached

Uh oh!

codecov Bot commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

helebest commented Jun 29, 2026

What & why

1. Late event after final — the real product bug

2. Cancel before the runner starts

Verification

Delivery receipt

Uh oh!

coderabbitai Bot commented Jun 29, 2026

Review limit reached

Uh oh!

codecov Bot commented Jun 29, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. Late event after `final` — the real product bug