fix(tests): wait for the final event on the tape, not the status row (#256) by helebest · Pull Request #257 · OpenDIKW/dikw-core

helebest · 2026-06-29T13:35:33Z

What & why

Fixes #256 — the recurring CI flake tests/server/test_ingest_task.py::test_event_tape_replay_after_terminal (assert events[-1]["type"] == "final" intermittently seeing 'progress').

Root cause (test-side race). TaskManager._run flips the task status row to terminal before it appends the final event to the event tape — deliberately, so a follower that sees final and immediately calls /result always finds the row terminal (the reverse order races task_not_terminal). The affected tests waited on the status row (wait_task_terminal) and then read the tape with wait=0, so they raced the trailing progress event onto the last slot. A wait>0 long-poll can't substitute: the events endpoint only enters its wait loop when the from_seq slice is empty, but these tests read from_seq=0 where the tape already holds events, so it returns immediately.

Fix (test-infra only, no product code). Add a shared wait_event_tape_final conftest helper that polls the events endpoint until the tape's last event is final — the signal the asserts actually depend on — and route the three affected HTTP-level tests through it (test_event_tape_replay_after_terminal, test_resume_from_seq_returns_tail_only, test_synth_task_emits_per_source_progress_and_final_report). This is the HTTP-layer analogue of the store-level waiters already in test_task_manager.py / test_telemetry_tracing.py.

The helper also fails fast on a non-200 from /events (so a real 500 isn't masked as a misleading "never reached final" timeout) and pages through has_more/next_from_seq so the trailing final is found past the 1000-event page cap. New tests/server/test_event_tape_waiter.py pins these on known-bad input.

Scope note — the sibling cancel flake is intentionally NOT in this PR

test_task_span_marks_cancel_not_error (~30s terminal-state timeout) already waits for the final event, so its flake is a manager-side cancel race (a different mechanism: manager.cancel fires token.cancel() + asyncio_task.cancel() and the cancelled final may not land), not the status-row-proxy race fixed here. It needs a separately root-caused product change — bundling a speculative fix would violate surgical/simplicity discipline. Tracked as follow-up.

Verification

tools/check.py: ruff + mypy clean, 2412 passed, 138 skipped.
Flake repro loop after the fix: ingest event-tape + waiter tests ×50 → 50/50, synth tape test ×10 → 10/10, 0 fail.

Delivery receipt

Step 3 — in-loop verify: tools/check.py PASS (ruff + mypy + 2412 passed). Flake loop 50/50 + 10/10.

Step 4 — codex review: round 1, no findings — "did not identify any regressions in the modified tests or helper for their current usage." Quiet at round 1.

Step 5 — fresh-review (clean subagent): PASS, no blocking findings. Independently confirmed the manager ordering, that wait>0 can't substitute, the match to the proven store-level waiters, the sound failure modes (returns on a FAILED task too → caller's succeeded assert fails clearly; hung task times out with a descriptive error), and that all 3 call-sites preserve their assertions. Rubric: Scope & simplicity = yes; Verification discipline = yes; rest N/A (tests-only).

Step 5 — /code-review (xhigh, 26 agents, 14 raw → 3 distinct), triaged:

finding	verdict	action
`status_code == 200` guard silently skips non-200 → masks a real `/events` 500 as a 10s timeout	TP	Fixed — per-fetch non-200 raises immediately with status + body
`limit=1000` page cap → on a >1000-event tape `final` pages off the end	TP (latent)	Fixed — paginate via `has_more`/`next_from_seq`
helper duplicates the "wait until final" predicate of 2 store-level waiters	rejected	store-level dup is pre-existing + across the HTTP-vs-store I/O boundary; a unified waiter is a single-use abstraction → more indirection than it removes. Mentioned, not swept.

Step 6 — doc-sync: N/A — test-infra only; no doc/CLI/route/env-var/frontmatter references to update; no CHANGELOG entry (not a published-package behavior change, per the changelog's product-facing convention).

🤖 Generated with Claude Code

…256) `test_event_tape_replay_after_terminal` (and its same-family siblings `test_resume_from_seq_returns_tail_only` and the synth tape assert) trusted the task status row as a proxy for "the `final` event is on the event tape". But `TaskManager._run` flips the status row terminal *before* appending the `final` event (so a `final`-watching follower that calls `/result` always finds the row terminal). A `wait_task_terminal` wait followed by a `wait=0` tape read therefore races the trailing `progress` event onto the last slot, intermittently failing `events[-1]["type"] == "final"` with `'progress'`. Add a shared `wait_event_tape_final` conftest helper that polls the events endpoint until the tape's last event is `final` — the signal the asserts actually depend on — and route the three affected HTTP-level tests through it. This is the HTTP-layer analogue of the store-level waiters already in `test_task_manager.py` / `test_telemetry_tracing.py`. No product code changes. The helper fails fast on a non-200 from `/events` (so a real 500 isn't masked as a "never reached final" timeout) and pages through `has_more`/`next_from_seq` so the trailing `final` is found past the 1000-event page cap. New `test_event_tape_waiter.py` pins these behaviours on known-bad input. The sibling `test_task_span_marks_cancel_not_error` flake is NOT addressed here: it already waits for the `final` event, so its ~30s timeout is a manager-side cancel race (a different mechanism), left to a separate root-caused change rather than bundling a speculative product fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-29T13:36:44Z

Warning

Review limit reached

@helebest, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 43 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a07884b-ef5d-4550-a459-43f2876e12a9

📥 Commits

Reviewing files that changed from the base of the PR and between ef219da and 6c5328b.

📒 Files selected for processing (4)

tests/server/conftest.py
tests/server/test_event_tape_waiter.py
tests/server/test_ingest_task.py
tests/server/test_synth_tasks.py

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch worktree-fix+event-tape-final-race-256

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-06-29T13:37:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

helebest merged commit afb71ab into main Jun 29, 2026
10 checks passed

helebest deleted the worktree-fix+event-tape-final-race-256 branch June 29, 2026 13:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(tests): wait for the final event on the tape, not the status row (#256)#257

fix(tests): wait for the final event on the tape, not the status row (#256)#257
helebest merged 1 commit into
mainfrom
worktree-fix+event-tape-final-race-256

helebest commented Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026

Review limit reached

Uh oh!

codecov Bot commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

helebest commented Jun 29, 2026

What & why

Scope note — the sibling cancel flake is intentionally NOT in this PR

Verification

Delivery receipt

Uh oh!

coderabbitai Bot commented Jun 29, 2026

Review limit reached

Uh oh!

codecov Bot commented Jun 29, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant