replay.t: stop flapping on per-job event interleave by atoomic · Pull Request #443 · Test-More/Test2-Harness

atoomic · 2026-04-26T09:06:12Z

Summary

t/Yath/integration/replay.t is a flapper. CI hit it on run #24952534922 (linux Perl 5.26): attempt 1 failed, attempt 2 passed.

The test runs yath test, captures the live stdout, then runs yath replay on the saved log archive and asserts both outputs are byte-identical after clean_output normalisation. The live and replay renderers emit the same set of per-job lines but in different orders depending on how the failed job's diagnostics interleave with another job's ( PASSED ) status line:

Live (timing-dependent): the failed job's [ FAIL ] / ( DIAG ) / < REASON > lines drain through the renderer pipeline before pass.tx's status line arrives — ( FAILED ) fail.tx → diags → ( PASSED ) pass.tx.
Replay: status lines emit first, then all the per-test events trail at the end — ( FAILED ) fail.tx → ( PASSED ) pass.tx → diags.

The previous clean_output only sorted consecutive ( PASSED ) / ( FAILED ) status lines. That fired in the replay case (where the two status lines ARE consecutive) but not in the live case (the [ FAIL ] / ( DIAG ) / < REASON > lines break the run), so the two normalisations ended up in different orders and the equality assertion flapped.

Fix

Trying to attach diag lines back to "their" status block is fragile because the diags carry only job N (already normalised to a sentinel by clean_output) — there is no filename to disambiguate which status they belong to.

The robust normalisation is to canonicalise the order of every job N-prefixed line:

Walk the cleaned output line by line.
Buffer consecutive job N lines into a run.
On hitting a non-job line, sort the run lexicographically and flush, then pass the non-job line through.

Non-job lines (the "The following jobs failed:" table, the "Yath Result Summary" block, etc.) keep their position so the file's overall narrative survives.

Test plan

20/20 back-to-back local runs of yath -D test t/Yath/integration/replay.t pass.
A simulated diff of the exact CI failure case (live with interleaved diags vs replay with status-first) normalises to identical strings under the new algorithm.
Watch the next CI run on this branch to confirm the flake is gone.

🤖 Generated with Claude Code

troglodyne

Probably could have been done in less lines with a more clever map, but not worth the effort to golf it down, especially since it is a test.

May not be the right fix if you think the ordering is in fact important here @exodist

PR Test-More#437 brought back a live replay.t on 2.0. Our branch still had plan skip_all, causing the GitHub merge commit to pick up upstream's unpatched version, which fails on Perl 5.24/5.28 due to event ordering differences between live and replay renderers. Update replay.t to match the upstream 2.0 version exactly, then apply PR Test-More#443's fix: sort ALL consecutive `job N`-prefixed lines (not just PASSED/FAILED status lines). Diag/Reason/Fail lines also carry `job N` and can interleave differently between live and replay, so the narrower sort was insufficient. Also adds the missing `Wrote archive:` strip and `job N` normalisation that upstream 2.0 introduced in PR Test-More#437. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Instead of running yath test live, capturing non-deterministic output, and sorting job lines to paper over ordering differences, commit a pre-generated run.yath archive and an expected_output.txt golden file. The test now replays only the committed archive. Replay is deterministic (events come out in archive order), so no sorting normalization is needed. The clean_output helper gains absolute-path stripping so the golden file is portable across machines. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

atoomic · 2026-04-26T12:02:57Z

is it a cruft?

nope: looks like used later

note: this is fragile if we decide to update the test format
we will not notice the regression here

That's why I generated saved archive generated from 1.0 branch

If we change format of output and it doesn't break old replays then we probably haven't deduplicated enough code in the default renderer. Both go through the OutputManager so I don't think this should be the case.

If we suddenly replace [ PASSED ] with << Passed >> or whatever that should break due to us saving the "golden" expected text.

Are you saying that if we suddenly introduce new event types that these won't be covered? Yea sure I guess so, though if we drop event types via filters or something then the test will fail as the filters are chained together to the renderer via the OutputManager, so I doubt dropping event types we already expect here would result in anything other than a failure.

A way we could put a stop to this is a test that doesn't care about order but instead generates every possible event type then replays them. Then it's just iterating over lines populating a %seen hash then asserting on keys after the replay.

We can still improve that replay test later as needed, for now it s kind of blocking other PRs
this is fine to use it that way I reverted cruft previously pushed there to restore your state

The previous path normalization only matched the diag line shape: s{ at \S+/(t/\S+) line }{ at $1 line }g That works for the one place where a recorded archive's absolute path leaks through into the rendered output today (the `( DIAG ) job N at <abs>/t/... line N` line, where Test2's `like()` failure prints the file from caller's __FILE__ at runtime). It does not cover other contexts where a future renderer change might surface an absolute path -- the failure summary table cells, status lines, etc. -- forcing us to extend the regex case-by-case every time the renderer grows a new path-bearing line type. Generalize: strip the absolute-path prefix from any repo-rooted `t/...` reference anywhere in the output. The pattern walks one or more `/segment` components followed by `/(t/...)`, replaces with just the captured `t/...`, and stops at whitespace or `|` so table-cell paths normalise too without crossing cell boundaries. Verified against the original failure shape (diag line), table-cell paths, and status-line paths -- all six test inputs normalise correctly. 10/10 back-to-back local runs pass. CI for the current branch SHA (3ba86fc) is green; this change is forward-looking robustness, not a fix for an active failure. Co-Authored-By: Claude <noreply@anthropic.com>

troglodyne approved these changes Apr 26, 2026

View reviewed changes

atoomic commented Apr 26, 2026

View reviewed changes

Comment thread t/Yath/integration/replay.t

atoomic commented Apr 26, 2026

View reviewed changes

Comment thread t/Yath/integration/replay.t Outdated

troglodyne mentioned this pull request Apr 26, 2026

feat(options): add --no-plugins to disable auto-plugin discovery #441

Closed

troglodyne force-pushed the replay-flake-fix branch from e6329ad to 908b79c Compare April 26, 2026 11:50

troglodyne force-pushed the replay-flake-fix branch from 908b79c to 3ba86fc Compare April 26, 2026 12:01

atoomic commented Apr 26, 2026

View reviewed changes

atoomic force-pushed the replay-flake-fix branch from 1f86575 to cadf4f4 Compare April 26, 2026 12:15

exodist merged commit eb7010c into 2.0 Apr 26, 2026
20 of 34 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replay.t: stop flapping on per-job event interleave#443

replay.t: stop flapping on per-job event interleave#443
exodist merged 2 commits into2.0from
replay-flake-fix

atoomic commented Apr 26, 2026 •

edited

Loading

Uh oh!

troglodyne left a comment

Uh oh!

Uh oh!

Uh oh!

atoomic Apr 26, 2026

Uh oh!

atoomic Apr 26, 2026

Uh oh!

atoomic Apr 26, 2026

Uh oh!

troglodyne Apr 26, 2026

Uh oh!

troglodyne Apr 26, 2026

Uh oh!

atoomic Apr 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

atoomic commented Apr 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Test plan

Uh oh!

troglodyne left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

atoomic Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

atoomic Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

atoomic Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

troglodyne Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

troglodyne Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

atoomic Apr 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

atoomic commented Apr 26, 2026 •

edited

Loading