Skip to content

fix(tests): wait for the final event on the tape, not the status row (#256)#257

Merged
helebest merged 1 commit into
mainfrom
worktree-fix+event-tape-final-race-256
Jun 29, 2026
Merged

fix(tests): wait for the final event on the tape, not the status row (#256)#257
helebest merged 1 commit into
mainfrom
worktree-fix+event-tape-final-race-256

Conversation

@helebest

Copy link
Copy Markdown
Collaborator

What & why

Fixes #256 — the recurring CI flake tests/server/test_ingest_task.py::test_event_tape_replay_after_terminal (assert events[-1]["type"] == "final" intermittently seeing 'progress').

Root cause (test-side race). TaskManager._run flips the task status row to terminal before it appends the final event to the event tape — deliberately, so a follower that sees final and immediately calls /result always finds the row terminal (the reverse order races task_not_terminal). The affected tests waited on the status row (wait_task_terminal) and then read the tape with wait=0, so they raced the trailing progress event onto the last slot. A wait>0 long-poll can't substitute: the events endpoint only enters its wait loop when the from_seq slice is empty, but these tests read from_seq=0 where the tape already holds events, so it returns immediately.

Fix (test-infra only, no product code). Add a shared wait_event_tape_final conftest helper that polls the events endpoint until the tape's last event is final — the signal the asserts actually depend on — and route the three affected HTTP-level tests through it (test_event_tape_replay_after_terminal, test_resume_from_seq_returns_tail_only, test_synth_task_emits_per_source_progress_and_final_report). This is the HTTP-layer analogue of the store-level waiters already in test_task_manager.py / test_telemetry_tracing.py.

The helper also fails fast on a non-200 from /events (so a real 500 isn't masked as a misleading "never reached final" timeout) and pages through has_more/next_from_seq so the trailing final is found past the 1000-event page cap. New tests/server/test_event_tape_waiter.py pins these on known-bad input.

Scope note — the sibling cancel flake is intentionally NOT in this PR

test_task_span_marks_cancel_not_error (~30s terminal-state timeout) already waits for the final event, so its flake is a manager-side cancel race (a different mechanism: manager.cancel fires token.cancel() + asyncio_task.cancel() and the cancelled final may not land), not the status-row-proxy race fixed here. It needs a separately root-caused product change — bundling a speculative fix would violate surgical/simplicity discipline. Tracked as follow-up.

Verification

  • tools/check.py: ruff + mypy clean, 2412 passed, 138 skipped.
  • Flake repro loop after the fix: ingest event-tape + waiter tests ×50 → 50/50, synth tape test ×10 → 10/10, 0 fail.

Delivery receipt

Step 3 — in-loop verify: tools/check.py PASS (ruff + mypy + 2412 passed). Flake loop 50/50 + 10/10.

Step 4 — codex review: round 1, no findings — "did not identify any regressions in the modified tests or helper for their current usage." Quiet at round 1.

Step 5 — fresh-review (clean subagent): PASS, no blocking findings. Independently confirmed the manager ordering, that wait>0 can't substitute, the match to the proven store-level waiters, the sound failure modes (returns on a FAILED task too → caller's succeeded assert fails clearly; hung task times out with a descriptive error), and that all 3 call-sites preserve their assertions. Rubric: Scope & simplicity = yes; Verification discipline = yes; rest N/A (tests-only).

Step 5 — /code-review (xhigh, 26 agents, 14 raw → 3 distinct), triaged:

finding verdict action
status_code == 200 guard silently skips non-200 → masks a real /events 500 as a 10s timeout TP Fixed — per-fetch non-200 raises immediately with status + body
limit=1000 page cap → on a >1000-event tape final pages off the end TP (latent) Fixed — paginate via has_more/next_from_seq
helper duplicates the "wait until final" predicate of 2 store-level waiters rejected store-level dup is pre-existing + across the HTTP-vs-store I/O boundary; a unified waiter is a single-use abstraction → more indirection than it removes. Mentioned, not swept.

Step 6 — doc-sync: N/A — test-infra only; no doc/CLI/route/env-var/frontmatter references to update; no CHANGELOG entry (not a published-package behavior change, per the changelog's product-facing convention).

🤖 Generated with Claude Code

…256)

`test_event_tape_replay_after_terminal` (and its same-family siblings
`test_resume_from_seq_returns_tail_only` and the synth tape assert) trusted the
task status row as a proxy for "the `final` event is on the event tape". But
`TaskManager._run` flips the status row terminal *before* appending the `final`
event (so a `final`-watching follower that calls `/result` always finds the row
terminal). A `wait_task_terminal` wait followed by a `wait=0` tape read therefore
races the trailing `progress` event onto the last slot, intermittently failing
`events[-1]["type"] == "final"` with `'progress'`.

Add a shared `wait_event_tape_final` conftest helper that polls the events
endpoint until the tape's last event is `final` — the signal the asserts actually
depend on — and route the three affected HTTP-level tests through it. This is the
HTTP-layer analogue of the store-level waiters already in `test_task_manager.py` /
`test_telemetry_tracing.py`. No product code changes.

The helper fails fast on a non-200 from `/events` (so a real 500 isn't masked as a
"never reached final" timeout) and pages through `has_more`/`next_from_seq` so the
trailing `final` is found past the 1000-event page cap. New
`test_event_tape_waiter.py` pins these behaviours on known-bad input.

The sibling `test_task_span_marks_cancel_not_error` flake is NOT addressed here: it
already waits for the `final` event, so its ~30s timeout is a manager-side cancel
race (a different mechanism), left to a separate root-caused change rather than
bundling a speculative product fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@helebest, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 43 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a07884b-ef5d-4550-a459-43f2876e12a9

📥 Commits

Reviewing files that changed from the base of the PR and between ef219da and 6c5328b.

📒 Files selected for processing (4)
  • tests/server/conftest.py
  • tests/server/test_event_tape_waiter.py
  • tests/server/test_ingest_task.py
  • tests/server/test_synth_tasks.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-fix+event-tape-final-race-256

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@helebest helebest merged commit afb71ab into main Jun 29, 2026
10 checks passed
@helebest helebest deleted the worktree-fix+event-tape-final-race-256 branch June 29, 2026 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky test: test_event_tape_replay_after_terminal — final event races the terminal status row

1 participant