replay.t: stop flapping on per-job event interleave#443
Conversation
troglodyne left a comment
Probably could have been done in less lines with a more clever map, but not worth the effort to golf it down, especially since it is a test.
May not be the right fix if you think the ordering is in fact important here @exodist
PR Test-More#437 brought back a live replay.t on 2.0. Our branch still had plan skip_all, causing the GitHub merge commit to pick up upstream's unpatched version, which fails on Perl 5.24/5.28 due to event ordering differences between live and replay renderers.

Update replay.t to match the upstream 2.0 version exactly, then apply PR Test-More#443's fix: sort ALL consecutive `job N`-prefixed lines (not just PASSED/FAILED status lines). Diag/Reason/Fail lines also carry `job N` and can interleave differently between live and replay, so the narrower sort was insufficient.

Also adds the missing `Wrote archive:` strip and `job N` normalisation that upstream 2.0 introduced in PR Test-More#437.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
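The broadened normalisation can be sketched roughly like this. This is a hypothetical stand-alone helper, not the actual `clean_output` from replay.t, and it assumes the job number has already been rewritten to the literal `job N` sentinel by an earlier normalisation step:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sort every run of consecutive lines that carry the "job N" sentinel,
# leaving all other lines (summary tables, headers, ...) in place.
sub sort_job_runs {
    my (@out, @run);
    my $flush = sub { push @out, sort @run; @run = (); };
    for my $line (@_) {
        if   ($line =~ /\bjob N\b/) { push @run, $line }
        else                        { $flush->(); push @out, $line }
    }
    $flush->();
    return @out;
}

# Live and replay interleavings of the same events...
my @live   = ('( FAILED ) job N fail.tx', '( DIAG ) job N oops', '( PASSED ) job N pass.tx');
my @replay = ('( FAILED ) job N fail.tx', '( PASSED ) job N pass.tx', '( DIAG ) job N oops');

# ...canonicalise to the same order, so a byte-for-byte compare passes.
print join("\n", sort_job_runs(@live)), "\n";
```

Because both interleavings contain the same set of `job N` lines in one consecutive run, sorting the run makes the two outputs identical, while non-job lines stay put.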
Force-pushed from e6329ad to 908b79c.
Instead of running yath test live, capturing non-deterministic output, and sorting job lines to paper over ordering differences, commit a pre-generated run.yath archive and an expected_output.txt golden file. The test now replays only the committed archive. Replay is deterministic (events come out in archive order), so no sorting normalization is needed.

The clean_output helper gains absolute-path stripping so the golden file is portable across machines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed from 908b79c to 3ba86fc.
nope: looks like it's used later
note: this is fragile if we decide to update the test format; we will not notice the regression here
That's why I generated the saved archive from the 1.0 branch.
If we change the output format and it doesn't break old replays, then we probably haven't deduplicated enough code in the default renderer. Both go through the OutputManager, so I don't think this should be the case.
If we suddenly replace [ PASSED ] with << Passed >> or whatever that should break due to us saving the "golden" expected text.
Are you saying that if we suddenly introduce new event types, those won't be covered? Yea sure, I guess so. Though if we drop event types via filters or something, the test will fail, since the filters are chained to the renderer via the OutputManager, so I doubt dropping event types we already expect here would result in anything other than a failure.
A way we could put a stop to this is a test that doesn't care about order but instead generates every possible event type then replays them. Then it's just iterating over lines populating a %seen hash then asserting on keys after the replay.
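That order-insensitive approach could look roughly like the following. This is a hypothetical sketch, not existing test code; the tag spellings are the ones quoted in this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally which event tags appear in replay output, ignoring order entirely.
# A bracketed tag is anything like "( PASSED )", "[ FAIL ]", or "< REASON >".
sub seen_tags {
    my %seen;
    for my $line (@_) {
        $seen{$1}++ if $line =~ /^\s*[\[(<]\s*([A-Z ]+?)\s*[\])>]/;
    }
    return \%seen;
}

my $seen = seen_tags(
    '( PASSED ) job N pass.tx',
    '[ FAIL ] job N oops',
    '( DIAG ) job N at t/fail.tx line 4',
    '< REASON > job N planned failure',
);

# Assert every expected event type surfaced, regardless of line order.
for my $tag ('PASSED', 'FAIL', 'DIAG', 'REASON') {
    die "missing event type: $tag" unless $seen->{$tag};
}
print "all event types seen\n";
```

Such a test would catch an event type silently disappearing from the pipeline without being sensitive to how the renderer interleaves per-job lines.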
We can still improve that replay test later as needed; for now it's kind of blocking other PRs.
This is fine to use that way. I reverted the cruft previously pushed there to restore your state.
The previous path normalization only matched the diag line
shape:

    s{ at \S+/(t/\S+) line }{ at $1 line }g
That works for the one place where a recorded archive's
absolute path leaks through into the rendered output today
(the `( DIAG ) job N at <abs>/t/... line N` line, where
Test2's `like()` failure prints the file from caller's
__FILE__ at runtime). It does not cover other contexts where
a future renderer change might surface an absolute path -- the
failure summary table cells, status lines, etc. -- forcing us
to extend the regex case-by-case every time the renderer
grows a new path-bearing line type.
Generalize: strip the absolute-path prefix from any
repo-rooted `t/...` reference anywhere in the output. The
pattern walks one or more `/segment` components followed by
`/(t/...)`, replaces with just the captured `t/...`, and stops
at whitespace or `|` so table-cell paths normalise too without
crossing cell boundaries.
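A hedged sketch of what such a generalised substitution might look like (the exact pattern in the commit may differ, and the sample lines are invented for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip any absolute prefix in front of a repo-rooted t/... path, stopping
# at whitespace or '|' so table-cell paths normalise without bleeding
# across cell boundaries. Relative t/... paths are left untouched.
sub strip_abs_paths {
    my ($text) = @_;
    $text =~ s{(?:/[^/|\s]+)+/(t/[^|\s]*)}{$1}g;
    return $text;
}

print strip_abs_paths('( DIAG ) job N at /home/ci/build/t/fail.tx line 4'), "\n";
# ( DIAG ) job N at t/fail.tx line 4
print strip_abs_paths('| /home/ci/build/t/fail.tx | FAILED |'), "\n";
# | t/fail.tx | FAILED |
```

The `(?:/[^/|\s]+)+` walks the absolute `/segment` components and the capture keeps only the repo-rooted `t/...` tail, so the same rule covers diag lines, status lines, and summary-table cells.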
Verified against the original failure shape (diag line),
table-cell paths, and status-line paths -- all six test
inputs normalise correctly. 10/10 back-to-back local runs
pass. CI for the current branch SHA (3ba86fc) is green;
this change is forward-looking robustness, not a fix for an
active failure.
Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from 1f86575 to cadf4f4.
Summary
`t/Yath/integration/replay.t` is a flapper. CI hit it on run #24952534922 (linux Perl 5.26): attempt 1 failed, attempt 2 passed.

The test runs `yath test`, captures the live stdout, then runs `yath replay` on the saved log archive and asserts both outputs are byte-identical after `clean_output` normalisation. The live and replay renderers emit the same set of per-job lines but in different orders, depending on how the failed job's diagnostics interleave with another job's `( PASSED )` status line:

- Live: the `[ FAIL ]`/`( DIAG )`/`< REASON >` lines drain through the renderer pipeline before `pass.tx`'s status line arrives — `( FAILED ) fail.tx` → diags → `( PASSED ) pass.tx`.
- Replay: `( FAILED ) fail.tx` → `( PASSED ) pass.tx` → diags.

The previous `clean_output` only sorted consecutive `( PASSED )`/`( FAILED )` status lines. That fired in the replay case (where the two status lines ARE consecutive) but not in the live case (the `[ FAIL ]`/`( DIAG )`/`< REASON >` lines break the run), so the two normalisations ended up in different orders and the equality assertion flapped.

Fix

Trying to attach diag lines back to "their" status block is fragile because the diags carry only `job N` (already normalised to a sentinel by `clean_output`) — there is no filename to disambiguate which status they belong to.

The robust normalisation is to canonicalise the order of every `job N`-prefixed line: sort each run of consecutive `job N` lines. Non-job lines (the "The following jobs failed:" table, the "Yath Result Summary" block, etc.) keep their position so the file's overall narrative survives.
Test plan
`yath -D test t/Yath/integration/replay.t` passes.

🤖 Generated with Claude Code