Skip to content

x/bor: fix off-by-one in producer downtime span#573

Merged
marcello33 merged 6 commits into
developfrom
kamui/fix-downtime-span-length
May 19, 2026
Merged

x/bor: fix off-by-one in producer downtime span#573
marcello33 merged 6 commits into
developfrom
kamui/fix-downtime-span-length

Conversation

@kamuikatsurgi
Copy link
Copy Markdown
Member

Description

The veBlop span created by PostHandleSetProducerDowntime used endBlock = startBlock + spanDuration, but spans are inclusive, so the correct formula is startBlock + spanDuration - 1. This made every downtime span one block longer than intended (6401 instead of 6400), causing overlaps with adjacent spans.

Confirmed on-chain:

Gated behind producerDowntimeSpanFixHeight to preserve pre-fork behaviour.

Copilot AI review requested due to automatic review settings April 1, 2026 05:44
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@kamuikatsurgi
Copy link
Copy Markdown
Member Author

@claude review once

@claude
Copy link
Copy Markdown

claude Bot commented Apr 1, 2026

Claude finished @kamuikatsurgi's task in 1m 54s —— View job


Code Review: x/bor: fix off-by-one in producer downtime span

  • Read changed files and gather context
  • Review helper/config.go changes
  • Review x/bor/keeper/side_msg_server.go changes
  • Review x/bor/keeper/side_msg_server_test.go changes
  • Post review feedback

Correctness

The fix is correct. The codebase consistently uses inclusive spans where duration = EndBlock - StartBlock + 1 (confirmed at side_msg_server.go:94). The old formula endBlock = startBlock + SpanDuration produced spans of SpanDuration + 1 blocks. The new formula startBlock + SpanDuration - 1 gives exactly SpanDuration blocks.

I also verified the other two AddNewVeBlopSpan call sites in app/abci.go (lines 797, 905) — they already compute the correct end block since they use startBlock = lastSpan.EndBlock + 1 and endBlock = lastSpan.EndBlock + SpanDuration, which yields duration = SpanDuration. Only PostHandleSetProducerDowntime had the bug.

Consensus Safety

The fork-height gating is the right approach for a post-handler state change, per the consensus-critical rules — changing how past blocks are processed without a gate would cause app hash mismatch. The implementation correctly preserves pre-fork behavior.

Issues

1. TBD fork heights — must be set before merge

All three chain-specific heights are 0 // TBD:

  • helper/config.go:490 (Mainnet)
  • helper/config.go:500 (Mumbai)
  • helper/config.go:510 (Amoy)

With producerDowntimeSpanFixHeight = 0, the fix is effectively always active (all block heights ≥ 0). This means the pre-fork behavior is currently unreachable on any chain. If this is intentional (i.e., the fix should apply retroactively because no node has reached these code paths yet), the // TBD comments are misleading and should be removed. If not, the heights need to be set to actual fork blocks before merging.

2. Pre-fork test uses block height -1

side_msg_server_test.go:796: The pre-fork test case uses fixHeight - 1 = -1 as the block height. While technically valid for the branch condition, a block height of -1 is not a realistic Cosmos SDK state. Consider using a non-zero producerDowntimeSpanFixHeight in the test setup (e.g., setting it to some test value like 100) to make both test cases use realistic positive block heights. This would also serve as a better regression test for the fork-height boundary.

Fix this →

3. Missing boundary test case

There's no test for blockHeight == fixHeight (the exact fork height). The condition is >=, so this should produce the fixed span. Adding a third test case for the exact boundary would strengthen coverage.

Summary

The bug fix is sound and well-motivated by on-chain evidence. The fork-height gating is appropriate for consensus safety. Main action items before merge: (1) resolve the TBD fork heights, and (2) consider improving test boundary coverage.


@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 45.1%. Comparing base (df13498) to head (3ada1d5).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
x/bor/keeper/side_msg_server.go 83.3% 0 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (83.3%) is below the target coverage (90.0%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           develop    #573   +/-   ##
=======================================
  Coverage     45.1%   45.1%           
=======================================
  Files          172     172           
  Lines        18655   18660    +5     
=======================================
+ Hits          8421    8428    +7     
+ Misses        9026    9025    -1     
+ Partials      1208    1207    -1     
Files with missing lines Coverage Δ
x/bor/keeper/side_msg_server.go 60.1% <83.3%> (+0.4%) ⬆️

... and 1 file with indirect coverage changes

Files with missing lines Coverage Δ
x/bor/keeper/side_msg_server.go 60.1% <83.3%> (+0.4%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an off-by-one error when creating veBlop spans in PostHandleSetProducerDowntime by computing an inclusive endBlock (start + spanDuration - 1) and gating the behavior behind a new fork height (producerDowntimeSpanFixHeight) to preserve historical behavior.

Changes:

  • Adjust veBlop span endBlock calculation to be inclusive (- 1) after a configured fix height.
  • Add test coverage to assert pre-/post-fork veBlop span duration behavior.
  • Introduce producerDowntimeSpanFixHeight config plumbing and getter in helper.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
x/bor/keeper/side_msg_server.go Implements inclusive end-block calculation for veBlop spans, gated by producerDowntimeSpanFixHeight.
x/bor/keeper/side_msg_server_test.go Adds a targeted test verifying span duration behavior before/after the fix height.
helper/config.go Adds producerDowntimeSpanFixHeight and accessor, with chain-specific initialization placeholders.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread helper/config.go Outdated
Comment thread x/bor/keeper/side_msg_server.go
Comment thread x/bor/keeper/side_msg_server_test.go
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 1, 2026

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix is correct in intent, but producerDowntimeSpanFixHeight = 0 for mainnet and amoy makes the fork gate permanently inactive — the pre-fork else-branch is dead code from block 0. Since the producer downtime feature is already live on both networks, a rolling upgrade with height=0 will cause validators on old and new code to compute different spanEndBlock values, producing a consensus split. Real activation heights need to be chosen and filled in before this is safe to deploy.

Extended reasoning...

Overview

This PR adds a producerDowntimeSpanFixHeight fork gate in helper/config.go and updates PostHandleSetProducerDowntime in x/bor/keeper/side_msg_server.go to use startBlock + spanDuration - 1 (inclusive) above the gate height, preserving the old startBlock + spanDuration behavior below it. A new test (TestPostHandleSetProducerDowntime_VeBlopSpanDuration) exercises both branches.

Security / Consensus Risk

All four chain entries in InitHeimdallConfigWith set producerDowntimeSpanFixHeight = 0 (with TBD comments for mainnet, mumbai, and amoy). Because valid block heights are always >= 1, the guard ctx.BlockHeight() >= helper.GetProducerDowntimeSpanFixHeight() is always true, and the else-branch (old formula) is permanently unreachable. Mainnet has had producer downtime active since block ~34,966,593 and amoy since ~20,457,139; with height=0 a rolling upgrade causes nodes on old code to commit spanEndBlock = start + spanDuration while nodes on new code commit start + spanDuration - 1, diverging state roots and halting or forking the chain. The same divergence affects full-history replay.

Level of Scrutiny

This is a production-critical consensus change touching the span calculation that drives Bor chain block production. The correct activation heights for mainnet and amoy must be decided in coordination with the validator set and cannot be placeholder zeros. This requires human review and sign-off before merge.

Other Factors

The test pre-fork case uses blockHeight = fixHeight - 1 = -1, an impossible production height that does not validate the gate in any realistic scenario. The fixHeight +/- 1 pattern is intentionally future-proof, but until real heights are filled in the test provides false confidence. SonarQube passes because it cannot reason about consensus semantics.

Comment thread helper/config.go
Comment thread x/bor/keeper/side_msg_server_test.go
@claude
Copy link
Copy Markdown

claude Bot commented Apr 1, 2026

Code review

1 issue found. Checked for bugs and CLAUDE.md compliance.

Issue: Consensus-critical fork gate is ineffective with height = 0

File: helper/config.go

producerDowntimeSpanFixHeight is set to 0 for all chains (Mainnet, Mumbai, Amoy, default). The gating condition ctx.BlockHeight() >= 0 is always true, making the new formula unconditionally active and the else branch dead code.

On Mainnet (downtime spans from block 34966593) and Amoy (from block 20457139), replaying historical blocks will compute a different spanEndBlock, causing app hash mismatch.

This violates .claude/rules/consensus-critical.md: "Any change to post-handler state logic applied to previous blocks requires a hard fork."

Suggested fix: Use math.MaxInt64 as the default sentinel so the gate is closed until a real fork height is set:

producerDowntimeSpanFixHeight = math.MaxInt64 // TBD: set to actual fork height before deployment

marcello33
marcello33 previously approved these changes Apr 1, 2026
cffls
cffls previously approved these changes Apr 1, 2026
Copy link
Copy Markdown
Contributor

@cffls cffls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days.

Copilot AI review requested due to automatic review settings May 4, 2026 09:08
@kamuikatsurgi kamuikatsurgi dismissed stale reviews from cffls and marcello33 via 602bb2e May 4, 2026 09:08
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 4, 2026

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread helper/config.go
marcello33
marcello33 previously approved these changes May 19, 2026
marcello33
marcello33 previously approved these changes May 19, 2026
Copilot AI review requested due to automatic review settings May 19, 2026 14:12
@marcello33 marcello33 merged commit 3f642d8 into develop May 19, 2026
15 of 16 checks passed
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comment thread x/bor/keeper/side_msg_server.go
Comment thread x/bor/keeper/side_msg_server_test.go
@claude
Copy link
Copy Markdown

claude Bot commented May 19, 2026

Code review

1 issue found. Checked for bugs and CLAUDE.md compliance.


Issue: Hardfork gate is a no-op when v080HardforkHeight = 0

File: x/bor/keeper/side_msg_server.go, line 697

ctx.BlockHeight() >= helper.GetV080HardforkHeight() evaluates to ctx.BlockHeight() >= 0, which is always true when v080HardforkHeight = 0 (the current value for all networks). The else branch preserving the old formula is unreachable dead code.

The codebase has helper.IsV080Hardfork() (helper/config.go:931-932) which correctly handles zero with a v080HardforkHeight > 0 guard. Using IsV080Hardfork() would make the pre-fork branch reachable and match the pattern used in other modules (e.g. clerk).

From .claude/rules/hardfork-rollout.md: A network is not silently left at 0 or TODO when the behavior should activate at a specific height. Check each helper's zero semantics.

Note: Other >= GetV080HardforkHeight() usages in x/bor/keeper/ are safe because they gate V080-specific field validation (e.g. TargetProducerId != RoundRobinDefault), which has a secondary guard. This usage has no such guard -- it changes the span end-block computation for all messages.

Suggested fix: Replace ctx.BlockHeight() >= helper.GetV080HardforkHeight() with helper.IsV080Hardfork(ctx.BlockHeight())

@kamuikatsurgi kamuikatsurgi deleted the kamui/fix-downtime-span-length branch May 19, 2026 14:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants