Add BlindFold ZK support for all operators by Acentelles · Pull Request #239 · ICME-Lab/jolt-atlas

Acentelles · 2026-05-03T18:51:07Z

This PR subsumes all the other previous PRs on BlindFold. I will leave the other ones for reference, but we can close them. In particular, it implements BlindFold constraint methods on:

Single-stage: Square, Add, Sub, Neg, Mul, And, Cube, Iff, Concat, Reshape, Slice, ScalarConstDiv
Custom-flow: Div (with range checks + one-hot), Rsqrt (dual range checks)
Neural teleportation: Sigmoid, Tanh, Erf (default flow), Cos, Sin (custom flow with transcript-derived r_node_output)
Lookup: Relu (OpLookupProvider + shout one-hot)
Matrix ops: Einsum (8 variants), Sum
Gather: GatherLarge (shout one-hot), GatherSmall (HammingBooleanity)
SoftmaxLastAxis (4-stage pipeline with Pedersen-committed inter-stage claims)

Test PR: cargo run --release --features zk -p jolt-atlas-core --example gpt2_zk_bench.

On a 14-core Apple M3:

Path	Prove (mean ± range)	Verify (mean ± range)	Proof size
non-zk on `blindfold-all-ops`	12.81 s (12.5-13.2)	1.04 s (1.0-1.1)	2504.6 KiB
zk (BlindFold) on `blindfold-all-ops`	23.41 s (23.2-23.9)	2.43 s (2.2-2.6)	2729.96 KiB
ratio	1.83×	2.34×	1.09×

Both paths run on the same workload (gpt2_zk_bench, 16-token GPT-2, max_num_vars=20, deterministic Pedersen generators). ZK_BENCH_THREADS=14 for the ZK rayon pool.

Prover (target: ~23.3 s)

Stage	Wall / busy	Share	What it does
`commit_witness_polynomials`	725 ms	3.1%	Pedersen-commit every witness MLE (62.9 ms `polynomial_map` + 631 ms `commit_to_polynomials`).
Per-node ZK sumcheck loop	~11 s wall	~47%	1103× `BatchedSumcheck::prove_zk`, cumulative busy 10.96 s (avg 9.94 ms per call). One per ONNX operator stage. Dominated by per-round Pedersen commits.
`prove_batch_opening_sumcheck_zk` (Stage 7)	1.75 s	7.5%	Single batched ZK sumcheck reducing all witness-MLE openings to one point.
`build_materialized_rlc` (Step 8)	335 ms	1.4%	Build gamma-weighted RLC of all witness polys.
`HyperKZG::prove` (Step 8)	1.96 s	8.4%	Single HyperKZG opening at `r_sumcheck`.
Bundle / Hyrax-row commits (uninstrumented)	~5.6 s	~24%	Build R1CS, assign witness, Pedersen-commit witness rows.
`BlindFoldProver::prove`	1.99 s	8.5%	Nova folding + Spartan tail. Sub-split below.

Inside BlindFoldProver::prove (1.99 s):

BlindFold::sample_random_satisfying_pair (Nova cross-term sampling): 1.04 s (~52%)
BlindFold::commit_cross_term_rows: 271 ms
BlindFoldInnerSumcheckProver::new: 117 ms
RelaxedR1CSInstance::fold: 109 ms
BlindFoldSpartanProver::witness_contributions: 86 ms
RelaxedR1CSWitness::fold: 35 ms
BlindFold::compute_cross_term: 11 ms
BlindFoldSpartanProver::new: 7 ms

Top three prover costs: per-node ZK sumchecks (~47%), bundle assembly + Hyrax row MSMs (~24%), BlindFold folding tail (~9%). Step 8 (~10%) and Stage 7 (~7.5%) are the remaining major chunks.

Verifier (target: ~2.44 s)

Stage	Wall / busy	Share	What it does
*Per-op `verify_zk_` loop**	~1.4 s	~57%	299× `Constant::verify`-equivalent MLE eval (cum 645 ms on the non-zk path for comparison) + 1× `Input::verify`. `BatchedSumcheck::verify_zk` itself is essentially free (~60 ms across 1102 calls — just transcript absorbs).
Public-node R1CS binding consistency	~0.6 s	~25%	Per-binding: recompute `public_mle(opening_point)`, check baked challenge equals it, check eval-commitment is identity G1. Currently does the per-op MLE eval a second time — dedup opportunity.
`BlindFoldVerifier::verify`	385 ms	16%	Spartan + Nova-folded `eval_commitments` check.
Step 8 `HyperKZG::verify`	1.4 ms	0.06%	One pairing + MSM.
Other (R1CS build, transcript)	~50 ms	~2%	`VerifierR1CSBuilder::build` + bookkeeping.

Top three verifier costs: Constant/Input local MLE evaluations (~57%), R1CS binding consistency (~25%, mostly redundant), BlindFold Spartan-verify (~16%). Step 8 PCS verify is essentially free.

BlindFold constraint methods on all sumcheck params: - Single-stage: Square, Add, Sub, Neg, Mul, And, Cube, Iff, Concat, Reshape, Slice, ScalarConstDiv - Custom-flow: Div (with range checks + one-hot), Rsqrt (dual range checks) - Neural teleportation: Sigmoid, Tanh, Erf (default flow), Cos, Sin (custom flow with transcript-derived r_node_output) - Lookup: Relu (OpLookupProvider + shout one-hot) - Matrix ops: Einsum (8 variants), Sum - Gather: GatherLarge (shout one-hot), GatherSmall (HammingBooleanity) - SoftmaxLastAxis (4-stage pipeline with Pedersen-committed inter-stage claims) Sub-protocol constraints: - ReadRafSumcheckParams (ps_shout), ReadRafParams (simple shout) - HammingWeightSumcheckParams, BooleanitySumcheckParams, RaSumcheckParams - HammingBooleanitySumcheckParams, IdentityRCParams - TeleportDivisionParams ZK pipeline infrastructure: - prove/verify helpers for each custom-flow pattern - Batched constraint combining with offset challenge indices - Opening value reordering to match constraint required_openings order - Round polynomial coefficient padding for UniPoly trailing-zero stripping - Multi-chain initial claims for BlindFoldWitness - ZK-mode fallback for verifier append_virtual/append_dense/append_sparse - Prover append_sparse pushes to pending_claims for BlindFold - Production Pedersen generators from HyperKZG SRS - Softmax auxiliary_claims (public only) + inter_stage_commitments - zk_sumcheck_num_instances for transcript batching coefficient replay 29 ZK tests (25 single-op + 4 multi-op chains).

joltworks: ReadRafParams.table is only read inside the zk-gated output_constraint_challenge_values method. jolt-atlas-core: recip_mult_params is only mutated inside the zk-gated inv_sum_evals assignment in build_stage1_instances. Without the zk feature both trip dead_code/unused_mut, which CI promotes to errors via -D warnings.

Refactor SoftmaxLastAxisVerifier so its stage methods take decomposed args (accumulator, transcript, scale_bits) instead of &mut Verifier, and split each stageN into a build_stageN_verifiers method that returns the boxed instance vector. Drive the actual sumcheck verification from a new run_stage helper on the verifier struct (non-ZK path) or from verify_zk_sumcheck_instances (ZK path). This lets verify_softmax_zk drop the temp-Verifier construction with mem::take, drop the manual transcript-replay closure that was emulating the IOP appends, and call full per-stage verify_zk_sumcheck_instances for stages 1-4 instead of just absorbing the proof commitments. Add cache_R_zk and cache_r_exp_zk for the two private inter-stage claims: insert explicit (r, F::zero()) placeholders at the right OpeningId rather than relying on the accumulator's permissive zk_mode silent-zero fallback, and absorb the Pedersen commitment from the bundle into the transcript (mirror of the prover side in prove_softmax_zk). operand_link is left out of the ZK path with an explanatory comment: it is a verifier-side algebraic identity over SoftmaxZHi, SoftmaxZLo, SoftmaxSatDiff, X(r2), all of which are private openings that the verifier only sees as zk_mode placeholder zeros. Encoding the identity as a BlindFold R1CS constraint is the right fix and is tracked separately.

Expand the NOTE in verify_softmax_zk so whoever picks up the operand_link follow-up has the full recipe: which BlindFold-side primitive is missing (an AlgebraicIdentity layout variant emitting (SoP)*1 = 0 with no output_var / blinding / Pedersen commitment), why the existing extra_constraints mechanism is the wrong shape (commits and binds output_value via Pedersen, used today only for PCS binding and currently dormant in prove_zk), the exact OutputClaimConstraint terms for the softmax operand_link, and the estimated cost. No code change; comment only.

Replace the if-else-if matches!() chains in prove_zk and verify_zk with exhaustive match expressions over &node.operator. Adding a new Operator variant now fails to compile until it is routed to a specific flow or to one of the default branches, instead of silently falling through to the catch-all. Verifier dispatch additionally separates the previous `has_sumcheck` runtime check into two explicit arms: a no-sumcheck arm (Input, Identity, Broadcast, MoveAxis, Constant, IsNan) that runs only verify_zk_eval_reduction, and a default-with-sumcheck arm for the ordinary algebraic ops. Behavior preserved exactly: Clamp stays in the with-sumcheck arm to match the original verifier (the prover side returns None for Clamp, so this remains a latent assert mismatch if ever exercised; tracked as a separate follow-up not addressed here). MeanOfSquares similarly kept in the with-sumcheck arm to match the original (the prover panics on it via the unimplemented-arm panic in create_prover_instance, so it is broken either way until ZK support lands).

Clamp: the prover treats Clamp as a no-op (create_prover_instance returns None), but the verifier was routing it to the with-sumcheck arm, where it would hit an assert_eq! mismatch on proof_node_idx if ever exercised. Move Clamp to the no-sumcheck arm on the verifier so both sides only run eval_reduction. Also split the prover's default branch into explicit no-sumcheck and with-sumcheck arms so the dispatch shape is symmetric with the verifier and the no-op semantics for Clamp are visible at dispatch time rather than buried inside a runtime create_prover_instance branch. MeanOfSquares: not implemented in the OperatorProofTrait system at all (see ops/mod.rs:254 macro fallback). Make this surface as an explicit unimplemented!() at both prove and verify dispatch with a matching message, instead of a deep panic in create_prover_instance and an opaque assertion failure on the verifier. Behavior preserved: any model that contains MeanOfSquares fails loudly on either side.

The prover's `append_virtual`/`append_dense`/`append_sparse` on `ProverOpeningAccumulator` unconditionally appended cleartext claims to the transcript while the verifier in `zk_mode` did not. Add a `zk_mode` flag on the prover that suppresses those cleartext appends (keeping the `pending_claims` push so the downstream Pedersen commit step still hides them). `prove_zk` sets `zk_mode = true` at entry and toggles it off only for the two intentionally-public scalars: the output-claim append and Softmax's `send_auxiliary_vectors` / `cache_exp_sum`. Verifier-side, reorder `append_virtual` / `append_dense` / `append_sparse` to check `zk_mode` first. Without this, when the same `opening_id` is registered twice in a sumcheck batch (e.g. `SoftmaxExpQ` from both `RecipMult` and `ExpSum` cache_openings), the second call would fall into the standard branch (key now exists) and emit a stray `transcript.append_scalar(F::zero())` the prover never produces. Pull Broadcast / MoveAxis / Identity / IsNan / Clamp out of the no-sumcheck arm in `prove_zk` / `verify_zk` and run the operator's input-cache step. Without this their inputs (e.g. ScalarConstDiv feeding a Broadcast) saw empty openings and the next default-branch sumcheck panicked looking up `reduced_evaluations[node]`. `prove_zk_eval_reduction` now skips empty openings, matching the pre-existing skip in `verify_zk_eval_reduction`. `examples/gpt2_zk_bench.rs`: bump `PedersenGenerators::deterministic` to 4096 (16 was sized for the synthetic single-op ZK tests). `e2e_tests.rs`: add `test_gpt2_zk` as a regression target, mirroring `test_gpt2` and `#[ignore]`d on the model download. GPT-2 seq_len=16, MacBook M3: | stage | non-zk | zk | ratio | | ------ | ------ | ------- | ----- | | prove | 25.6 s | 50.4 s | 1.97x | | verify | 2.5 s | 4.4 s | 1.78x | | size | 2.43 M | 2.68 M | 1.10x | Run with `RUST_MIN_STACK=33554432 RAYON_NUM_THREADS=2` on macOS: the ZK pipeline overflows rayon's default 2 MB worker stack on GPT-2-sized MLE work, and the patched arkworks MSM (`ec/src/scalar_mul/variable_base/mod.rs:813-857`) builds a nested 2-thread `ThreadPoolBuilder` per chunk per call, so without the RAYON cap macOS hits `pthread_create EAGAIN` under heavy parallelism.

Removes the need to invoke them with `RUST_MIN_STACK=33554432 RAYON_NUM_THREADS=2`. The constraint comes from the patched arkworks MSM (`ec/src/scalar_mul/variable_base/mod.rs:813-857`) building a fresh nested 2-thread `rayon::ThreadPoolBuilder` per chunk per MSM call: with default rayon (≈ num CPUs) and many MSMs in rapid succession on GPT-2-sized polynomials, macOS hits the per-process pthread limit (`pthread_create` → EAGAIN). The ZK pipeline also overflows rayon's default 2 MB worker stack on deep MLE recursion. Both call sites now call `rayon::ThreadPoolBuilder::new().num_threads(2) .stack_size(32 MB).build_global()` at entry. `build_global` is a no-op if rayon was already initialised, so user-set env vars still win. A proper fix would patch arkworks-algebra to reuse a single pool instead of building one per MSM chunk, but that's upstream work.

The prover materializes Q = sum_i gamma_i * P_i_padded (`build_materialized_rlc`) and produces a HyperKZG opening on Q at the batched-opening `r_sumcheck`. `joint_claim` includes the per-poly Lagrange factor for zero-padded variables, matching the existing non-zk `compute_joint_claim`. The verifier reconstructs the joint commitment via `combine_commitments(bundle.commitments, gamma_powers)` and calls `PCS::verify`, binding `bundle.commitments` to the batched-opening reduction. `effective_coeffs = gamma_i * lagrange_i` are threaded as the BlindFold extra-constraint challenge values so the R1CS enforces the same identity on `y_com`. Adds `prove_batch_opening_sumcheck_zk` on `ProverOpeningAccumulator`, `sumchecks_keys()` on `VerifierOpeningAccumulator`, and a `BlindFoldBatchOpening` opening-id registration in `Opening::cache_openings` on both sides. Bench scopes the bounded rayon pool to ZK calls only so the non-zk path keeps full parallelism; Pedersen generators bumped to 4096 to cover the larger sumcheck stage. Adds `test_gpt2_zk` (ignored) mirroring `test_gpt2`. All 53 zk e2e tests pass; `gpt2_zk_bench` runs end-to-end. Known gap: the verifier reads `effective_coeffs` from `bundle.baked.extra_constraint_challenges` instead of regenerating them from `gamma_powers_v` and `r_sumcheck`. Tracked in wiki/jolt-atlas/.../batched-opening-sound-verifier.md.

The ZK verifier was skipping `Constant::verify` (`ops/constant.rs:28-43`) and `Input::verify` (`ops/input.rs:27-49`), which in non-zk locally evaluate the public tensor's MLE at the reduced opening point and check against the prover's reduced-eval claim. On GPT-2 `Constant::verify` alone accumulates 618 ms across 299 invocations, so the gap meant ZK verify ran faster than non-zk verify by skipping a real check, not by doing equivalent work cheaper. Port the check: - New `ZkProofBundle::public_node_reduced_claims: BTreeMap<usize, F>` carries cleartext reduced-eval claims for Input/Constant nodes. The prover populates from `prover.accumulator.reduced_evaluations[node_idx]`, which is already cleartext on the prover side. - The `Input | Constant` arm of `verify_zk` now evaluates `MultilinearPolynomial::from(public_tensor).evaluate(r_reduced)` and asserts equality with the bundle value, then writes the verified claim back into `accumulator.reduced_evaluations` (overwriting the `F::zero()` placeholder). Bench (patched arkworks, full 14-thread parallelism, GPT-2): | | non-zk | zk (before) | zk (after) | |--------|---------|-------------|-------------| | prove | 13.14 s | 22.4 s | 21.8 s | | verify | 1.04 s | 0.63 s ⚠ | 1.53 s ✓ | | ratio | - | 0.60× | 1.47× | ZK verify is now correctly slower than non-zk (1.47×) instead of faster (the previous 0.60× was the soundness symptom). All 53 zk e2e tests pass. Bench harness: `gpt2_zk_bench.rs` accepts a `ZK_BENCH_THREADS` env var (defaults to 2 to preserve baseline behaviour with upstream arkworks) to flip the local ZK rayon pool size between the throttled and the full-parallelism configurations without code edits. Remaining soundness gap (future work, documented in wiki/jolt-atlas/.../zk-prove-overhead.md): the cleartext check is honest-prover defense. Active malice (correct bundle value but wrong BlindFold witness value for the constant opening) would still verify. Jolt closes the analogous gap by baking public values into R1CS matrices via `ValueSource::Constant(i128)` or by committing public data as PCS polynomials.

Picks up formatting drift in zk.rs / opening_proof.rs / gpt2_zk_bench.rs from earlier commits on this branch so `cargo fmt --check` (CI) passes.

Acentelles requested review from AntoineF4C5, ClankPan and Forpee May 3, 2026 18:51

Acentelles changed the base branch from main to blindfold-einsum-sum May 3, 2026 23:14

Acentelles changed the base branch from blindfold-einsum-sum to main May 3, 2026 23:15

Acentelles marked this pull request as draft May 3, 2026 23:16

Acentelles added 6 commits May 4, 2026 00:20

cargo fmt

7e7ad34

Acentelles marked this pull request as ready for review May 4, 2026 17:58

Acentelles added 5 commits May 12, 2026 23:50

chore: cargo fmt

b166aaa

Picks up formatting drift in zk.rs / opening_proof.rs / gpt2_zk_bench.rs from earlier commits on this branch so `cargo fmt --check` (CI) passes.

Acentelles force-pushed the blindfold-all-ops branch from 913f2b6 to b166aaa Compare May 14, 2026 15:10

Forpee merged commit 625259a into main May 18, 2026
7 checks passed

Forpee deleted the blindfold-all-ops branch May 18, 2026 09:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add BlindFold ZK support for all operators#239

Add BlindFold ZK support for all operators#239
Forpee merged 12 commits into
mainfrom
blindfold-all-ops

Acentelles commented May 3, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Acentelles commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Prover (target: ~23.3 s)

Verifier (target: ~2.44 s)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Acentelles commented May 3, 2026 •

edited

Loading