Skip to content
This repository was archived by the owner on May 29, 2026. It is now read-only.

debug: log allowEmptyBlocks wireup to triage MaruMultiValidatorTest CI failure#513

Closed
gauravahuja wants to merge 1 commit into
mainfrom
debug/log-allow-empty-blocks
Closed

debug: log allowEmptyBlocks wireup to triage MaruMultiValidatorTest CI failure#513
gauravahuja wants to merge 1 commit into
mainfrom
debug/log-allow-empty-blocks

Conversation

@gauravahuja

@gauravahuja gauravahuja commented May 8, 2026

Copy link
Copy Markdown
Contributor

Summary

CI run 25536072042 failed on MaruMultiValidatorTest > validators converge to stable block production without round skips with the test timing out after 240s. The CI report shows the chain in a periodic state where every 3rd block is at round=1 — the maximum consecutive round=0 streak is 2, but the test demands 5.

The reason for the round-skip is a Besu QBFT log line:

ProposalPayloadValidator - Invalid Proposal Payload: block did not pass validation.
  Reason BlockValidationError(message=Block is empty number=2 executionPayloadBlockNumber=2 hash=…)

That error string is unique to maru's EmptyBlockValidator (consensus/src/main/kotlin/maru/consensus/validation/BlockValidator.kt:257-259), but the test sets allowEmptyBlocks = true and a source audit shows every chain-construction site correctly omits EmptyBlockValidator when the flag is true:

  • BeaconBlockValidatorFactory.kt:32 — QBFT validator path
  • SyncSealedBlockImporterFactory.kt:90 — CL sync importer path
  • QbftFollowerFactory.kt:90 — follower path (not exercised in this test)

So either the runtime allowEmptyBlocks value diverges from config.allowEmptyBlocks=true, or EmptyBlockValidator is being invoked from a code path not in the audit.

What this PR adds

Three INFO-level diagnostic logs gated by [debug/allowEmptyBlocks] so they're easy to grep:

  1. BeaconBlockValidatorFactoryImpl.<init> — logs the runtime allowEmptyBlocks value when the QBFT validator factory is constructed.
  2. SyncSealedBlockImporterFactory.create — same, for the CL sync importer factory.
  3. EmptyBlockValidator.validateBlock — once-per-VM (via AtomicBoolean.compareAndSet) log on first invocation, with a stack trace, so we can see what's calling it when allowEmptyBlocks=true should have removed it from the chain.

Expected outcomes

Scenario Log signature Diagnosis
allowEmptyBlocks=true logged at both factories AND EmptyBlockValidator never logs rejections come from elsewhere (re-investigate) n/a
allowEmptyBlocks=false at one factory wireup leak — fix the leak bug in maru
allowEmptyBlocks=true everywhere AND EmptyBlockValidator first-invocation logs with stack chain composition diverges from source audit bug in factory ctor / DI

Test plan

  • Wait for CI on this branch and inspect the artifact test reports from the testing / maru tests job.
  • Grep [debug/allowEmptyBlocks] in the integration-test logs.
  • Read the EmptyBlockValidator first-invocation stack trace if present.
  • Once root cause is identified, revert this PR and fix the underlying issue.

🤖 Generated with Claude Code


Note

Low Risk
Low risk: adds INFO-level diagnostic logging (including a one-time stack trace) around allowEmptyBlocks and EmptyBlockValidator usage without changing validation behavior or control flow beyond logging.

Overview
Adds targeted INFO diagnostics tagged [debug/allowEmptyBlocks] to help triage unexpected empty-block rejections.

Logs the runtime allowEmptyBlocks value (and whether EmptyBlockValidator is included) when BeaconBlockValidatorFactoryImpl is constructed and when SyncSealedBlockImporterFactory.create is called. Also updates EmptyBlockValidator.validateBlock to emit a once-per-process first-invocation log with block metadata and a stack trace to identify the caller path.

Reviewed by Cursor Bugbot for commit e97e251. Bugbot is set up for automated code reviews on this repo. Configure here.

…I failure

CI run 25536072042 failed on `validators converge to stable block production
without round skips` with a periodic "Block is empty" rejection pattern (every
3rd block has round=1 because a validator's proposal is rejected with
"Block is empty number=N executionPayloadBlockNumber=N hash=…"). That error
string is unique to maru's `EmptyBlockValidator`, but the test sets
`allowEmptyBlocks = true` and a source audit shows every chain-construction
site correctly omits `EmptyBlockValidator` when the flag is true.

This change adds three diagnostic logs (gated by [debug/allowEmptyBlocks])
so the next CI run pinpoints whether:
  (a) the runtime `allowEmptyBlocks` value at the validator-factory ctor
      differs from what the test passed (config drift), or
  (b) `EmptyBlockValidator.validateBlock` is being invoked from a code path
      I haven't traced (the stack trace on first invocation locates it).

After we have the answer, this PR should be reverted and the underlying
issue fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter

Copy link
Copy Markdown

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
757 1 756 1
View the top 1 failed test(s) by shortest run time
maru.app.MaruPeerScoringTest::node disconnects validator when requests time out()
Stack Traces | 15.3s run time
org.opentest4j.AssertionFailedError: 
expected: 1
 but was: 0
	at maru.app.MaruPeerScoringTest.node disconnects validator when requests time out(MaruPeerScoringTest.kt:135)
	at java.base/java.lang.reflect.Method.invoke(Method.java:565)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1490)
	at java.base/java.util.concurrent.ForkJoinPool.helpJoin(ForkJoinPool.java:2248)
	at java.base/java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:499)
	at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:669)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.tryRemoveAndExec(ForkJoinPool.java:1490)
	at java.base/java.util.concurrent.ForkJoinPool.helpJoin(ForkJoinPool.java:2248)
	at java.base/java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:499)
	at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:669)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:511)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1450)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:2019)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:187)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@gauravahuja

Copy link
Copy Markdown
Contributor Author

Debug logs confirmed allowEmptyBlocks wireup is correct end-to-end. The 'Block is empty' rejections seen earlier were log bleed-through from sibling tests, not the failing test's own validators. Closing — replacing with separate fixes for runner sizing, log isolation, and Thread.sleep flakes.

@gauravahuja gauravahuja closed this May 8, 2026
@gauravahuja gauravahuja deleted the debug/log-allow-empty-blocks branch May 8, 2026 17:19
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants