bridge: fix checkpoint tx height (off-by-one) post tx_index removal#593
Merged
Conversation
PR #587 replaced the cometbft tx_index lookup in the bridge checkpoint flow with helper.QueryTxBytesFromBlock(hash, height), passing the block height attached to the EventTypeCheckpoint event. That height is wrong by one: the event is emitted in PreBlocker against the *previous* block's txs, so node.Block(height) returns a block that does not contain the MsgCheckpointBlock tx, and the bridge fails to submit checkpoints to L1. Pass height-1 and document the invariant inline so this can't silently regress again.
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #593 +/- ##
=======================================
Coverage 45.1% 45.1%
=======================================
Files 172 172
Lines 18660 18663 +3
=======================================
+ Hits 8428 8435 +7
+ Misses 9025 9020 -5
- Partials 1207 1208 +1
... and 1 file with indirect coverage changes
🚀 New features to boost your workflow:
|
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. |
marcello33
previously approved these changes
May 19, 2026
cffls
reviewed
May 19, 2026
Routes the QueryTxBytesFromBlock call through a small named helper that owns the "event at H, tx in H-1" invariant. The helper is unit-testable via a queryTxBytesFromBlock function field on CheckpointProcessor, so a regression in the height calculation now fails a fast unit test instead of only surfacing in the kurtosis E2E. The unit test is not a substitute for the E2E (the surrounding flow is too integration-shaped to mock fully). It is a fast tripwire on the one line that broke #587.
Adds a unit test that drives createAndSendCheckpointToRootChain with a queryTxBytesFromBlock stub that fails, asserting the original error is propagated verbatim (no swallow, no fall-through into decode/sign with nil bytes). Pushes coverage on the diff from 66.6% to 100% — all newly added lines are now exercised.
|
cffls
approved these changes
May 19, 2026
marcello33
approved these changes
May 19, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Summary
Hot-fix for the checkpoints-to-L1 regression introduced by #587. The bridge stopped submitting checkpoints because the new tx-bytes lookup reads from the wrong block. Single-line fix (
height - 1), with a comment so the invariant is impossible to miss next time.Background
#587 made
indexer = "null"the default for heimdall by removing the bridge's last dependency on cometbft'stx_index. The previous flow looked up the checkpoint tx by hash throughnode.Tx(hash, prove=true)— that path traversed the tx indexer and didn't care which block the tx lived in.The replacement,
helper.QueryTxBytesFromBlock(cliCtx, hash, height), reads the block atheightdirectly and scansblock.Txsfor the matching hash. That requires the caller to pass the exact height of the block that contains the checkpoint tx. The new code passed the wrong height.What was wrong
bridge/processor/checkpoint.gowas passing theblockHeightattached to theEventTypeCheckpointevent — but that's the height where the event was emitted, not the height of the checkpoint tx. They differ by one. Walkthrough:MsgCheckpointBlock. The tx is included in heimdall block H. Vote extensions over H carry the side-tx signatures.app.PreBlocker(app/abci.go) tallies the vote extensions from H and runs the side-tx post-handler against the txs that were in H — those txs are loaded viaStakeKeeper.GetLastBlockTxs(ctx), which is populated at the end of every block bySetLastBlockTxs(ctx, req.Txs[1:])(app/abci.go:536).x/checkpoint/keeper/side_msg_server.goemitsEventTypeCheckpoint. Because PreBlocker runs at H+1, the event lands in H+1'sFinalizeBlockEvents.bridge/listener/heimdall.go:104) polls heimdall blocks and forwards events to the processor with the block height where it found them — i.e. H+1.QueryTxBytesFromBlock(cliCtx, txHash, H+1).node.Block(H+1)does not contain theMsgCheckpointBlocktx (that tx is in H), sofindTxInBlockreturns "tx not found", the bridge errors out, and no checkpoint is submitted to L1.Symptom in CI: the smoke test for checkpoints polls
/checkpoints/latest80+ times and never sees one (gh run view 26037788347 --repo 0xPolygon/heimdall-v2).The old tx_index path masked the issue completely —
node.Tx(hash, true)was looking the tx up by hash, not by block, so the wrong height was harmless then. The new direct-block path made the off-by-one observable for the first time.What's right now
Pass
height - 1toQueryTxBytesFromBlock. The invariant is exact, not approximate: the post-handler runs at H+1 against exactly the txs saved from H. There is no scenario where the tx would be in H-2 or H — thelastBlockTxsstate is rewritten every block. So a precise-1is correct (and a windowed scan would just be noise).Also added a multi-line comment at the callsite explaining the H-1 invariant and pointing at
app/abci.goso the next reader doesn't have to retrace this. The previous comment only said "this works with indexer=null" — true but it hid the precondition onheight.Why this won't repeat
bridge/processor/checkpoint.gonow names the exact files (StakeKeeper.SetLastBlockTxs/GetLastBlockTxs,app/abci.go) that establish the H-1 invariant. Anyone touching this callsite has to read past it.height→txBlockHeightat the callsite makes it obvious that the value passed toQueryTxBytesFromBlockis not the event's height — they are semantically distinct.QueryTxBytesFromBlockis now only ever called from this one site. If a second caller is ever added, they will encounter the same naming distinction and the same comment via grep.Verification
Reproduced the regression locally and confirmed the fix end-to-end:
Built
heimdall-v2:localfrom this branch andbor:localfrombor/develop.Ran
kurtosis runwith a 4-validator + 1-rpc config (same participant shape aspos-workflows/configs/kurtosis-e2e.yml).Polled
/checkpoints/lateston the first validator:Two checkpoints landed in the same window where CI saw zero with the broken
developimage. Bridge logs showsendCheckpointToHeimdallfollowed by a successful root-chain submission.Confirming the broader claim from #587
Re-grepped
origin/developfor any remaining cometbft tx-index reads in heimdall code:Only hit is the definition of
helper.QueryTxWithProofitself (helper/query.go:35) — and it has no callers. So with this fix in,indexer = "null"is genuinely safe on every node role (validator, sentry, RPC, bridge). The deadQueryTxWithProofcan be removed in a separate cleanup PR.Test plan
heimdall-v2:localwith the fix.