Attempt to fix progression halting via tainted SyncStatus by Ayiga · Pull Request #435 · EspressoSystems/optimism-espresso-integration

Ayiga · 2026-05-13T20:29:06Z

This PR:

Attempts to resolve a potential stalling issue due to a tainted SyncStatus, and a mismatched ChannelManager head.

To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1214604487015085

When a `SyncStatus` from the `L2` comes in and moves the `SafeL2` reference backwards (at the very least in the case of `0` specifically) we can end up in a state where progression of the L2 chain through Espresso halts entirely. One such case where this can occur is when the `Streamer` gets `Reset` back to a prior point, or even potentially to a `Future` point, which causes a desync in the progress of a `ChannelManager` versus the `Streamer` itself causing the `Streamer` not to have the next expected `Batch` to be served. To address this specific issue, when loading `batches` for the `L1` submission, we add additional consideration to the `ChannelManager` to ensure that we are looking at the correct block in addition to the correct parent hash.

claude · 2026-05-13T20:29:32Z

Claude finished @Ayiga's task in 2m 35s —— View job

Review of PR #435: Attempt to fix progression halting via tainted SyncStatus

Gather context and read changed files
Review op-batcher/batcher/espresso.go changes
Review espresso/environment/8_reorg_test.go changes
Post review feedback

Summary

This PR addresses a stalling issue where a tainted SyncStatus (e.g., SafeL2.Number regressing to 0) causes the batcher's peekNextBatch to lose track of the chain tip, halting progression.

The approach is sound. The key insight is replacing the reliance on channelMgr.tip (which is only set after AddL2Block and reset to zero on Clear) with a more robust strategy:

Use the last block in channelMgr.blocks as the authoritative tip — this is the actual pending state
Fall back to SafeL2 when blocks is empty, immediately deriving the tip hash
Skip the streamer forward via skipStreamerToTargetBlockHeight to consume already-processed batches

The old code had a fragile recovery path: after Clear(), tip would be common.Hash{} and could only recover if the next peeked batch happened to be exactly SafeL2.Number + 1. If the sync status was tainted, this condition might never be met, causing permanent stalling.

What looks good

The skipStreamerToTargetBlockHeight helper is clean and correctly handles the "best effort" semantics
The test (TestE2eEspressoStreamerSyncStatusTainting) directly exercises the failure path by calling Refresh with safeBatchNumber = 0 and verifying the chain continues
Good addition of defer env.Stop(t, ...) calls to the existing TestE2eDevnetWithL1Reorg test — proper cleanup prevents resource leaks in CI
Consistent dereferencing pattern change from (*batch).Method() to batch.Method() where the pointer-to-interface is unnecessary

Issues found

I left detailed inline comments. The main items:

Doc comment typo — isHashNoptSet in the comment vs isHashEmpty as the actual function name (see comment)
Concurrency consideration — The blocks slice is read under the mutex, but the derived targetBlock is used outside it for skipping. Worth confirming the single-caller invariant and documenting it (see comment)
Edge case question — When blocks is empty and the peeked batch doesn't match targetBlock, we still skip to SafeL2.Number + 1. This seems safe because Refresh(safeBatchNumber=0) triggers a streamer reset, but the recovery sequence after a tainted-then-recovered SyncStatus deserves attention (see comment)
Test cleanup — Commented-out client declarations in the new test (see comment)

Overall

The fix correctly addresses the root cause and the test validates the exact failure scenario. The issues above are minor — the doc typo is trivially fixable, the concurrency note is a documentation ask, and the edge case question is about verifying behavior rather than a bug. This looks good to merge after addressing the minor items.

claude · 2026-05-13T20:31:28Z

+
+// isHashNoptSet is a helper function to check if a hash is the zero


Nit: the doc comment says isHashNoptSet but the function is named isHashEmpty. The comment should be updated to match.

Suggested change

// isHashNoptSet is a helper function to check if a hash is the zero

// isHashEmpty is a helper function to check if a hash is the zero

// value (not set).

claude · 2026-05-13T20:31:34Z

+	if tipBlock != (SizedBlock{}) && tipBlock.Block != nil {
+		targetBlock = tipBlock.NumberU64() + 1
+		tipHash = tipBlock.Hash()
+	} else if batch := l.EspressoStreamer().Peek(ctx); batch != nil && batch.Number() == targetBlock {


This Peek call is done outside the mutex, then skipStreamerToTargetBlockHeight (line 915) and another Peek (line 916) follow. But since espressoBatchLoadingLoop calls peekNextBatch in a tight for loop (line 969-971) and then immediately AddL2Block (which updates blocks), there's a TOCTOU window: between reading blocks (line 893) and using the derived targetBlock for skipping (line 915), the loading loop could concurrently add blocks that change what targetBlock should be.

In practice this may be safe because peekNextBatch is only called from espressoBatchLoadingLoop (single goroutine), so the blocks can't change between reading them and using them here. But it's worth confirming — is peekNextBatch guaranteed to only be called from the single loading loop goroutine? If so, a brief comment noting the single-caller invariant would help future readers.

claude · 2026-05-13T20:31:48Z

+	targetBlock := syncStatus.SafeL2.Number + 1
+	tipHash := syncStatus.SafeL2.Hash
+	if tipBlock != (SizedBlock{}) && tipBlock.Block != nil {


This is the core logic change and the heart of the fix. In the old code, tip was read from channelMgr.tip — a hash that's only set after AddL2Block succeeds. After a Clear(), tip would be common.Hash{} and the only way to recover was if the peeked batch happened to be exactly SafeL2.Number + 1.

The new approach improves on this significantly:

Uses the last block in channelMgr.blocks as the authoritative tip (since blocks is what actually tracks pending state)

Falls back to SafeL2 when blocks is empty, immediately setting tipHash instead of requiring a second Peek check

skipStreamerToTargetBlockHeight advances the streamer past already-known blocks

This should handle the tainted SyncStatus case well — if SafeL2.Number regresses to 0, targetBlock becomes 1, and skipStreamerToTargetBlockHeight won't skip anything meaningful if the streamer is already ahead, so the streamer just continues from where it is.

One question: when channelMgr.blocks is empty and the peeked batch number does not equal targetBlock (neither branch of the if/else if matches), we fall through with tipHash = syncStatus.SafeL2.Hash and targetBlock = syncStatus.SafeL2.Number + 1. The skip call will then consume batches up to targetBlock. But if the streamer is already well past SafeL2.Number + 1 (e.g., after a tainted SyncStatus resets SafeL2.Number to 0 then recovers), won't this skip consume batches we actually need?

I think this is fine because Refresh with safeBatchNumber=0 would trigger a streamer reset (per the interface: "This will only automatically reset the Streamer if the safeBatchNumber moves backwards"), so the streamer would be repositioned. But it'd be worth verifying that the skip + peek sequence is correct in the scenario where SyncStatus recovers from a tainted value between ticks.

claude · 2026-05-13T20:31:50Z

+	l2Seq := system.NodeClient(e2esys.RoleSeq)
+	// l1Client := system.NodeClient(e2esys.RoleL1)
+	// caffClient := system.NodeClient(env.RoleCaffNode)


Nit: these commented-out client declarations suggest the test may be incomplete or these were used during development. If they're planned for future use, that's fine; otherwise consider removing them to keep the test clean.

claude · 2026-05-13T20:31:54Z

+	status, err := l2Node.SyncStatus(ctx)
+	require.NoError(t, err)
+
+	err = system.BatchSubmitter.EspressoStreamer().Refresh(ctx, status.FinalizedL1, 0, status.FinalizedL2.L1Origin)


Good test — this directly exercises the tainted SyncStatus path by calling Refresh with safeBatchNumber = 0. The subsequent WaitForBlockToBeFinalized with status.UnsafeL2.Number + 2 validates that the chain continues to progress despite the tainted refresh.

One consideration: with a 5-minute timeout and only needing 2 more blocks finalized, this should be robust. But if the devnet is slow to start, the initial WaitForBlockToBeFinalized(5) with a 2-minute timeout on line 279 might be tight. Worth monitoring in CI.

gemini-code-assist

Code Review

This pull request introduces a new end-to-end test for EspressoStreamer sync status tainting and improves resource cleanup in existing tests. In the batcher, it refactors the batch peeking logic to more accurately track the chain tip using blocks from the channel manager and adds a utility to skip streamer batches to a target height. Feedback includes correcting a documentation typo, ensuring consistent use of the .Len() method for block queues, simplifying redundant conditional checks, and standardizing hexadecimal formatting in log entries.

gemini-code-assist · 2026-05-13T20:31:57Z

+	}
+}
+
+// isHashNoptSet is a helper function to check if a hash is the zero


There is a typo in the function documentation: isHashNoptSet should be isHashEmpty to match the function name.

Suggested change

// isHashNoptSet is a helper function to check if a hash is the zero

// isHashEmpty is a helper function to check if a hash is the zero

gemini-code-assist · 2026-05-13T20:31:57Z

+	if len(l.channelMgr.blocks) > 0 {
+		tipBlock = l.channelMgr.blocks[len(l.channelMgr.blocks)-1]


There is an inconsistency in how the length of l.channelMgr.blocks is accessed. In op-batcher/batcher/channel_manager.go (e.g., line 98 and 632), the code uses the .Len() method, whereas here it uses the built-in len() function. For consistency and to avoid potential compilation issues if queue.Queue is a struct type, it is recommended to use the .Len() method throughout the package.

Suggested change

if len(l.channelMgr.blocks) > 0 {

tipBlock = l.channelMgr.blocks[len(l.channelMgr.blocks)-1]

if l.channelMgr.blocks.Len() > 0 {

tipBlock = l.channelMgr.blocks[l.channelMgr.blocks.Len()-1]

gemini-code-assist · 2026-05-13T20:31:57Z


+	targetBlock := syncStatus.SafeL2.Number + 1
+	tipHash := syncStatus.SafeL2.Hash
+	if tipBlock != (SizedBlock{}) && tipBlock.Block != nil {


The check tipBlock != (SizedBlock{}) is redundant if tipBlock.Block != nil is also checked. Simplifying this check improves readability and ensures that we only proceed if we have a valid block pointer to access.

Suggested change

if tipBlock != (SizedBlock{}) && tipBlock.Block != nil {

if tipBlock.Block != nil {

gemini-code-assist · 2026-05-13T20:31:57Z

+			"batchNr", batch.Number(),
+			"batchParent", batch.Header().ParentHash.Hex(),
+			"tip", tipHash,


For consistency in logging, it is recommended to use .Hex() for all common.Hash types. This ensures that the output is always formatted as a hexadecimal string, matching the style used for batchParent on line 910.

Suggested change

"batchNr", batch.Number(),

"batchParent", batch.Header().ParentHash.Hex(),

"tip", tipHash,

"batchNr", batch.Number(),

"batchParent", batch.Header().ParentHash.Hex(),

"tip", tipHash.Hex(),

gemini-code-assist · 2026-05-13T20:31:58Z

 			"batchNr", (*batch).Number(),
 			"batchParent", (*batch).Header().ParentHash,
-			"tip", tip,
+			"tip", tipHash,


The logging here is inconsistent with previous log entries in the same function (e.g., line 910). It is recommended to use .Hex() for common.Hash types. Additionally, the dereference (*batch) is unnecessary in Go when calling methods on a pointer to a struct.

Suggested change

"batchNr", (*batch).Number(),

"batchParent", (*batch).Header().ParentHash,

"tip", tip,

"tip", tipHash,

"batchNr", batch.Number(),

"batchParent", batch.Header().ParentHash.Hex(),

"tip", tipHash.Hex(),

shenkeyao

Looks good overall! Some of the Claude comments seem reasonable, though, so worth addressing.

claude Bot reviewed May 13, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 13, 2026

View reviewed changes

shenkeyao reviewed May 15, 2026

View reviewed changes


		// isHashNoptSet is a helper function to check if a hash is the zero

-// isHashNoptSet is a helper function to check if a hash is the zero
+// isHashEmpty is a helper function to check if a hash is the zero
+// value (not set).

	// isHashNoptSet is a helper function to check if a hash is the zero
	// isHashEmpty is a helper function to check if a hash is the zero

		if len(l.channelMgr.blocks) > 0 {
		tipBlock = l.channelMgr.blocks[len(l.channelMgr.blocks)-1]

	if tipBlock != (SizedBlock{}) && tipBlock.Block != nil {
	if tipBlock.Block != nil {

Conversation

Ayiga commented May 13, 2026 • edited by philippecamacho Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

This PR:

Uh oh!

claude Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review of PR #435: Attempt to fix progression halting via tainted SyncStatus

Summary

What looks good

Issues found

Overall

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

shenkeyao left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Ayiga commented May 13, 2026 •

edited by philippecamacho

Loading

claude Bot commented May 13, 2026 •

edited

Loading