Add streaming one-wesolowski compaction APIs #325

Open
hoffmang9 wants to merge 9 commits into Chia-Network:main from hoffmang9:pr1-streaming-prover-upstream

hoffmang9 (Member) commented Feb 24, 2026

Summary

  • add a new c_bindings/fast_wrapper API surface for streaming one-wesolowski proof generation when y_ref is known up front
  • include incremental GetBlock and memory-budgeted (k, l) tuning to improve compaction-worker throughput/memory behavior
  • add minimal embedding/build support (vdf_fast_pairindex, quiet_mode, fastlib target, PIC/PIE options) and document the path in docs/bluebox_compaction.md

Test plan

  • Verify commit signature on the branch tip (git log -1 --show-signature)
  • Confirm branch diff against origin/main only includes intended PR1 files
  • Build/test on target platforms (Linux x86_64, macOS Intel, macOS ARM, Windows)

Made with Cursor


Note

High Risk
Touches core VDF execution (repeated_square callbacks, counter slotting, stdout behavior) and adds substantial new proving logic and build targets, so regressions could affect correctness/performance or embedded concurrency.

Overview
Adds a new src/c_bindings/fast_wrapper C API for fast one-wesolowski proving, including a streaming mode that requires y_ref up front (plus an optional incremental GetBlock implementation), a memory-budgeted (k,l) tuner, progress callbacks, and optional per-thread debug stats/parameter introspection; also adds a batch API for multiple streaming jobs.

Updates the core VDF loop to support embedding/multi-worker use: introduces quiet_mode to suppress stdout, assigns per-thread vdf_fast_pairindex() slots for fast counters, and adds OnBatchStart/OnBatchReplay callback hooks to allow streaming bucket updates to be rolled back when the fast path replays a batch.
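The rollback idea behind the OnBatchStart/OnBatchReplay hooks can be sketched with an undo log. This is a minimal illustration, not the PR's implementation: only the hook names come from the text above, while the class name, the hook signatures, and the integer buckets (standing in for group elements) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical streaming state. Only the hook names OnBatchStart,
// OnIteration, and OnBatchReplay come from the PR text; the signatures
// and the integer "buckets" are illustrative assumptions.
class SketchBucketState {
public:
    explicit SketchBucketState(std::size_t bucket_count)
        : buckets_(bucket_count, 0) {}

    // A new batch begins: the previous batch is final, so drop its undo log.
    void OnBatchStart() { undo_log_.clear(); }

    // Record the old bucket value, then apply the update.
    void OnIteration(std::size_t bucket, std::uint64_t contribution) {
        assert(bucket < buckets_.size());
        undo_log_.emplace_back(bucket, buckets_[bucket]);
        buckets_[bucket] += contribution;  // stand-in for a group multiplication
    }

    // The fast path discarded the batch: restore old values in reverse
    // order so that repeated writes to one bucket unwind correctly.
    void OnBatchReplay() {
        for (auto it = undo_log_.rbegin(); it != undo_log_.rend(); ++it) {
            buckets_[it->first] = it->second;
        }
        undo_log_.clear();
    }

    std::uint64_t bucket(std::size_t i) const { return buckets_[i]; }

private:
    std::vector<std::uint64_t> buckets_;
    std::vector<std::pair<std::size_t, std::uint64_t>> undo_log_;
};
```

An undo log keeps the per-batch cost proportional to the number of updates in the batch rather than to the total number of buckets, which is why it is used here instead of copying the whole bucket array at each batch start.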

Improves build/CI support by adding a fastlib static library target (libchiavdf_fastc.a), optional PIC/PIE flags and asm compilation rules in Makefile.vdf-client, ensures cmake is present on macOS GitHub runners, and documents the compaction path in docs/bluebox_compaction.md.

Written by Cursor Bugbot for commit 707b2f4.

Introduce a fast C wrapper with streaming proof generation, incremental GetBlock optimization, and memory-budgeted (k,l) tuning, plus the minimal runtime/build infrastructure needed to embed chiavdf in multi-worker clients.

Co-authored-by: Cursor <cursoragent@cursor.com>

hoffmang9 (Member, Author) commented Feb 24, 2026

@Ealrann I'm hoping to cleanly upstream your changes so you can rely on chiavdf directly and not a fork. See the plan here:
https://gist.github.com/hoffmang9/6a848ff22f7cd29f4b6507600b099a5c

Guard the fast pairindex slot selection behind the existing x86/asm feature checks and return slot 0 on non-x86 targets, where threading counters are not compiled.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/c_bindings/fast_wrapper.cpp
Comment thread src/vdf.h Outdated
hoffmang9 and others added 2 commits February 23, 2026 23:57
Install cmake via Homebrew and export its bin path in the C libraries and wheel workflows so self-hosted macOS jobs don't fail when cmake is missing from PATH.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ocation.

Track and roll back per-batch checkpoints when replaying a failed fast batch, and switch pairindex slot allocation to unsigned atomics to avoid negative modulo indexing after counter wraparound.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread src/vdf.h
Document that batch bounds use completed-iteration base values while OnIteration is normalized to 1-based indices to avoid ambiguity in replay tracking.

Co-authored-by: Cursor <cursoragent@cursor.com>
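
One reading of the indexing convention in this commit message: a batch that starts after `base` completed iterations and attempts `count` iterations reports OnIteration indices `base + 1` through `base + count`. The struct and helper names below are hypothetical, and the interpretation itself is an assumption rather than something the PR text spells out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one fast-path batch.
struct SketchBatch {
    std::uint64_t base;   // iterations already completed before the batch
    std::uint64_t count;  // iterations the batch attempts
};

// 1-based index of the first iteration inside the batch.
inline std::uint64_t first_iteration(const SketchBatch& b) {
    return b.base + 1;
}

// 1-based index of the last iteration inside the batch.
inline std::uint64_t last_iteration(const SketchBatch& b) {
    assert(b.count > 0);
    return b.base + b.count;
}

// True when a 1-based OnIteration index belongs to the batch, which is
// what a replay handler needs to decide whether an update must be undone.
inline bool batch_contains(const SketchBatch& b, std::uint64_t iteration) {
    return iteration > b.base && iteration <= b.base + b.count;
}
```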
Comment thread src/vdf.h
Comment thread src/c_bindings/fast_wrapper.cpp Outdated
hoffmang9 and others added 4 commits February 24, 2026 00:48
Expose missing batch C bindings and debug visibility so downstream Rust tests can validate tuner behavior end-to-end.

Co-authored-by: Cursor <cursoragent@cursor.com>
Default CHIA_VDF_FAST_COUNTER_SLOTS to 100 in threading.h so upstream builds keep lower BSS usage while allowing embedded deployments to override via compiler defines.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use one program-wide atomic slot allocator for `vdf_fast_pairindex()` so concurrent VDF computations started from different translation units cannot collide on shared fast counter slots.

Co-authored-by: Cursor <cursoragent@cursor.com>
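
A minimal sketch of a program-wide slot allocator, assuming a header-only layout. The `sketch_` names and the fixed slot count are hypothetical; the real `vdf_fast_pairindex()` may differ. The counter lives in a function-local static inside an inline function, so every translation unit that includes the header shares one instance, and the unsigned type keeps the remainder non-negative after wraparound.

```cpp
#include <atomic>
#include <cassert>

// Assumed slot count for illustration (the PR mentions a default of 100).
constexpr unsigned kSketchSlots = 100;

// One counter for the whole program: an inline function's local static
// is shared across all translation units that include this header.
inline std::atomic<unsigned>& sketch_slot_counter() {
    static std::atomic<unsigned> counter{0};
    return counter;
}

// Hand out the next slot. Unsigned arithmetic wraps to 0 on overflow,
// so the remainder can never be negative.
inline unsigned sketch_allocate_slot() {
    static_assert(kSketchSlots > 0, "slot count must be positive");
    return sketch_slot_counter().fetch_add(1, std::memory_order_relaxed) %
           kSketchSlots;
}

// Each thread allocates once and then reuses its slot.
inline unsigned sketch_pairindex() {
    thread_local const unsigned slot = sketch_allocate_slot();
    assert(slot < kSketchSlots);
    return slot;
}
```

With more live threads than slots, two threads can still map to the same slot; the allocator only guarantees that indices stay in range and that allocation order is shared program-wide.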
Reject k>=64 before any 64-bit left-shift and reuse validated bucket spans for allocation, indexing, and finalization loops so invalid parameter tuning cannot trigger undefined behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
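
The guard described in this commit can be sketched as follows. The helper names and the layout of `l` rows with `2^k` buckets each are assumptions for illustration; the point is that `k` is validated before the shift and that the validated span is the only value later code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: writes 2^k to *span_out, or returns false when
// k is out of range. Shifting a 64-bit value by 64 or more bits is
// undefined behavior, so the check must come before the shift.
inline bool sketch_bucket_span(std::uint32_t k, std::uint64_t* span_out) {
    assert(span_out != nullptr);
    if (k >= 64) {
        return false;
    }
    *span_out = std::uint64_t{1} << k;
    return true;
}

// Assumed layout: l rows of 2^k buckets each. Rejects l == 0 and any
// product that would overflow 64 bits, so allocation, indexing, and
// finalization can all reuse the same validated total.
inline bool sketch_total_buckets(std::uint32_t k, std::uint32_t l,
                                 std::uint64_t* total_out) {
    assert(total_out != nullptr);
    std::uint64_t span = 0;
    if (l == 0 || !sketch_bucket_span(k, &span)) {
        return false;
    }
    if (span > std::numeric_limits<std::uint64_t>::max() / l) {
        return false;
    }
    *total_out = span * l;
    return true;
}
```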

cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread src/vdf.h
#else
return 0;
#endif
}

Pairindex modulo can divide by zero

High Severity

vdf_fast_pairindex() computes slot % kSlots where kSlots comes from sizeof(master_counter). With the new CHIA_VDF_FAST_COUNTER_SLOTS macro, setting it to 0 makes master_counter length 0, and the modulo becomes undefined behavior (likely crash) in fast VDF mode.
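
One possible mitigation is a compile-time guard, sketched below. Only the macro name `CHIA_VDF_FAST_COUNTER_SLOTS` and its default of 100 come from the PR; the counter type, array name, and helper are hypothetical stand-ins, and the PR may resolve the issue differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Default taken from the PR description; builds may override it with
// a compiler define such as -DCHIA_VDF_FAST_COUNTER_SLOTS=512.
#ifndef CHIA_VDF_FAST_COUNTER_SLOTS
#define CHIA_VDF_FAST_COUNTER_SLOTS 100
#endif

// Fail the build with a clear message instead of reaching a modulo by
// zero at run time.
static_assert(CHIA_VDF_FAST_COUNTER_SLOTS > 0,
              "CHIA_VDF_FAST_COUNTER_SLOTS must be at least 1");

// Hypothetical stand-ins for the real counter type and array.
struct sketch_counter {
    std::uint64_t value;
};
sketch_counter sketch_master_counter[CHIA_VDF_FAST_COUNTER_SLOTS];

// Element count derived from the array itself, so it cannot drift
// from the macro.
constexpr std::size_t kGuardedSlots =
    sizeof(sketch_master_counter) / sizeof(sketch_master_counter[0]);

// Map a raw allocator value onto a valid slot index.
inline std::size_t sketch_slot_index(unsigned raw) {
    const std::size_t idx = raw % kGuardedSlots;
    assert(idx < kGuardedSlots);
    return idx;
}
```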

Ealrann commented Feb 24, 2026

Oh, yes the plan is perfect. I'll definitely update the wesoforge client to use the main chiavdf.
It's ok if you keep enable_threads=true indeed, I'll use repeated_square_fast_single_thread instead.

Note: the plan mentions "Trick 2 (unreleased, on side branch)", but trick 2 was in fact merged into the bbr branch (commit Ealrann@f3c73bf). It is important because it enables the group optimisation.
Commit Ealrann@445cb0d (also on the bbr branch) is very important as well: it fixes a memory leak that was problematic on high-core-count servers.
