Deterministic state syncs#2177
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
There was a problem hiding this comment.
Pull request overview
Introduces a fork-gated “deterministic state sync” flow for Bor by resolving a cutoff-time → Heimdall height, then querying state-sync events “visible at” that fixed Heimdall height to avoid non-determinism from querying a moving Heimdall head.
Changes:
- Added Bor config fork flag
DeterministicStateSyncBlockand plumbing to include it in fork ID gathering. - Extended Heimdall client interfaces (HTTP/gRPC/in-process app + failover wrapper) with
GetBlockHeightByTimeandStateSyncEventsAtHeight, and switchedBor.CommitStatesto use them post-fork. - Added/updated tests and mocks to cover fork switching and URL formatting.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
consensus/bor/bor.go |
Switches state sync fetching logic to deterministic (height-anchored) mode post-fork. |
consensus/bor/heimdall.go |
Extends IHeimdallClient with the new deterministic state-sync APIs. |
consensus/bor/heimdall/client.go |
Implements the new endpoints for the HTTP Heimdall client + URL builders. |
consensus/bor/heimdallgrpc/state_sync.go |
Implements gRPC equivalents for height-anchored state sync and height-by-time lookup. |
consensus/bor/heimdallapp/state_sync.go |
Implements in-process HeimdallApp equivalents using keeper/query server. |
consensus/bor/heimdall/failover_client.go |
Extends failover endpoint interface and wrapper calls for the new APIs. |
consensus/bor/heimdall/metrics.go |
Adds request-type metrics for the new endpoints. |
consensus/bor/heimdall/state_sync_url_test.go |
Adds URL-format tests for the new HTTP paths (and validates RFC3339 formatting). |
params/config.go |
Adds DeterministicStateSyncBlock and IsDeterministicStateSync. |
params/config_test.go |
Unit tests for IsDeterministicStateSync. |
core/forkid/forkid.go |
Includes the new fork block in Bor fork gathering for fork ID computation. |
consensus/bor/bor_test.go |
Adds tests verifying pre-/post-fork behavior and failure semantics post-fork. |
tests/bor/mocks/IHeimdallClient.go |
Regenerated gomock for updated IHeimdallClient. |
consensus/bor/span_store_test.go |
Updates test mocks to satisfy the extended Heimdall client interface. |
eth/handler_bor_test.go |
Updates test mock to satisfy the extended Heimdall client interface. |
eth/ethconfig/config_test.go |
Updates test mock to satisfy the extended Heimdall client interface. |
consensus/bor/heimdall/failover_client_test.go |
Updates mock endpoint used by failover tests to satisfy the extended interface. |
go.mod |
Pins/upgrades github.com/0xPolygon/heimdall-v2 to a newer pseudo-version (via replace). |
go.sum |
Updates checksums for the new heimdall-v2 version. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@claude review |
|
Claude finished @marcello33's task in 1m 50s —— View job Code Review: Deterministic State Syncs
Overall this is a well-structured PR. The fork-gating approach is correct, the error handling post-fork is properly strict (returning errors instead of silently swallowing), and the test coverage for the fork switch is good. Below are findings organized by severity. HIGH: HTTP client does client-side filtering that gRPC/app clients do notIn if e.Id >= fromID && e.RecordTime.Before(time.Unix(toTime, 0)) {But the gRPC implementation ( This means the three transport implementations have different behavior if the Heimdall endpoint returns events outside the expected range. For a deterministic state sync feature, all paths must produce identical results. Either:
The existing MEDIUM:
|
|
test |
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (24.71%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #2177 +/- ##
===========================================
- Coverage 52.29% 52.28% -0.02%
===========================================
Files 885 885
Lines 155727 155897 +170
===========================================
+ Hits 81441 81510 +69
- Misses 69055 69156 +101
Partials 5231 5231
... and 24 files with indirect coverage changes
🚀 New features to boost your workflow:
|
|
@claude addressed your points.
Re-review. |
|
Claude finished @marcello33's task in 1m 14s —— View job Re-review: Deterministic State Syncs
All six items from the previous review have been addressed correctly:
Remaining finding: Sort inconsistency across transportsThe HTTP client sorts events after fetching ( sort.SliceStable(eventRecords, func(i, j int) bool {
return eventRecords[i].ID < eventRecords[j].ID
})But neither the gRPC client ( Either all three should sort, or none should (if the server guarantees ordering). Given this feature targets determinism, I'd recommend all three apply the same sort for defense-in-depth. Everything else looks good. The core design is sound and the error handling is correct. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@claude addressed your points. Review once more and resolve the related open conversations if the comments are addressed. |
There was a problem hiding this comment.
My previous comment flagging a missing timeout in StateSyncEventsAtHeight was incorrect — the 1-minute context.WithTimeout is already present in the current diff at client.go:313. This PR is ready for human sign-off.
Extended reasoning...
Overview
This PR adds deterministic state sync support to Bor's consensus engine. It introduces DeterministicStateSyncBlock fork gating, a new StateSyncEventsByTime combined endpoint, StateSyncEventsAtHeight and GetBlockHeightByTime for backward-compatibility, and wires all three through the HTTP/gRPC/in-process-app transports plus the MultiHeimdallClient failover wrapper. 19 files changed.
Security risks
The PR touches consensus-critical code (CommitStates), Heimdall client authentication, and fork-ID peer compatibility. The error-handling choice (log + return empty post-fork, matching pre-fork behavior) was explicitly discussed and accepted. No auth bypass or injection risks identified.
Level of scrutiny
This is a high-scrutiny change. It modifies how validators derive state sync event sets — any divergence between validators produces consensus failures. The PR has gone through six rounds of review; all previously raised issues (filtering inconsistency, sort ordering, URL encoding, context shadowing, timeout propagation through callWithFailover, missing mock stubs) have been addressed. My last inline comment was a false positive — the fix was already in place.
Other factors
- Codecov shows 11.8% patch coverage on the new transport code; the gRPC and app-client paths are 0% covered by automated tests. SonarQube flagged 26.4% duplication, primarily from the repeated mock stubs across test files.
- The pre-existing dead code in
internalFetchWithTimeout(unconditional 30s override) was acknowledged but not fixed in this PR. go.moduses a pseudo-version replace directive forheimdall-v2; a TODO marks it for cleanup post-release.DeterministicStateSyncBlockisnilon all live networks, so the new path is not yet active in production.
A human reviewer should confirm the overall design and sign off before merge.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Review1 issue found. Checked for bugs and CLAUDE.md compliance. Bug: Per-attempt timeout (30s) silently caps the intended 1-minute pagination budgetFile: consensus/bor/heimdall/failover_client.go:113-122
Additionally, HeimdallClient.StateSyncEventsByTime (client.go:278) sets its own redundant 1-minute timeout inside the already-tighter 30s context - this inner timeout can never fire. Effect: With default settings, the paginated fetch for all pages must complete within 30s per failover attempt, not the intended 1 minute. During periods with many queued state sync events requiring multiple pages, this could cause premature timeouts, triggering unnecessary failover cascading and ultimately returning empty state syncs (the proposer misses a slot). Note: The test Suggested fix: Remove the 1-minute timeout from |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
…789) closes #32741
|
This PR is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
|
Looks good overall! One thing might be worth discussion is whether we need a HF in bor. Because the query parameter is exactly the same for heimdall endpoint, bor could be agnostic to this change. Basically, no code change is required from bor side, and only heimdall will change the state sync to deterministic silently after a heimdall block height. Do you think it is feasible? |
This is actually a good idea, @cffls, thanks! Haven't noticed since the original implementation required a different endpoint. |
|
| // StateSyncsByTimeResponse uses the proto-generated response type from heimdall-v2. | ||
| type StateSyncsByTimeResponse = clerkTypes.StateSyncsByTimeResponse | ||
|
|
| func TestStateSyncURL_ToTimeIsRFC3339(t *testing.T) { | ||
| t.Parallel() | ||
|
|
||
| fromID := uint64(10) | ||
| to := int64(100) | ||
|
|
||
| u, err := stateSyncURL("http://bor0", fromID, to) | ||
| require.NoError(t, err) | ||
|
|
||
| expectedTime := time.Unix(to, 0).UTC().Format(time.RFC3339Nano) | ||
| require.Contains(t, u.RawQuery, fmt.Sprintf("to_time=%s", expectedTime), | ||
| "to_time should be RFC3339Nano formatted") |
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|


Description
Implements the bor side of deterministic state syncs. Post-fork, Bor switches from the non-deterministic
(from_id, to_time)state sync query pattern to the combinedGetStateSyncsByTimeendpoint that resolves the Heimdall height internally and returns events in a single call. This ensures all validators derive identical state-sync sets regardless of when they query Heimdall.Changes
DeterministicStateSyncBlocktoBorConfigwithIsDeterministicStateSync()checkCommitStatesbranches on fork activation — post-fork uses the combinedStateSyncEventsByTimeendpoint. On errors, logs and returns empty (matching pre-fork resilience) instead of crashing block productionStateSyncEventsByTimeto all three transports (HTTP, gRPC, in-process app). Consistent sorting and retry behavior across all transports.MultiHeimdallClientwraps all new methods with the existing failover mechanism. Fixed flaky failover tests that used global hit counters instead of method-specific tracking.DeterministicStateSyncBlockincluded inGatherForks()for peer compatibilityChainConfig.Description()Activation
DeterministicStateSyncBlock:nil(disabled) on mainnet/amoy until HF is scheduledChanges
Breaking changes
HF needed.
Checklist
Cross repository changes
Testing
Manual tests
Tested on kurtosis devnets with chaos testing (2-vs-2 network partition) and self-heal recovery. Many rounds of chaos, zero errors, state roots match erigon.