Gaussian math: cdf and pdf improvements (Step-1)#431
Conversation
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
✅ Files skipped from review due to trivial changes (4)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR increases Gaussian CDF/PDF fixed-point evaluation precision to ChangesWAD rescaling of Gaussian CDF/PDF pipeline
Estimated code review effort: 4 (Complex) | ~75 minutes Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #431 +/- ##
==========================================
- Coverage 96.62% 96.59% -0.03%
==========================================
Files 34 34
Lines 3405 3405
Branches 793 794 +1
==========================================
- Hits 3290 3289 -1
Misses 70 70
- Partials 45 46 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
math/fixed_point/tests/sd29x9_tests/pdf_tests.move (1)
85-92: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winUse
pdf()here, notcdf().This guard still exercises the CDF path and compares against
ONE_RAW, so it never validates the PDF tail behavior you changed. Switch it topdf()and bound it toPDF_0_RAW(or assert the zero cutoff directly) so the test covers the intended contract.🐛 Proposed fix
- probes.destroy!(|raw| assert!(fixed(raw).cdf().unwrap() <= ONE_RAW)); + probes.destroy!(|raw| assert!(fixed(raw).pdf().unwrap() <= PDF_0_RAW));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@math/fixed_point/tests/sd29x9_tests/pdf_tests.move` around lines 85 - 92, The guard test in max_z_raw_is_analytical_saturation_point is still exercising the CDF path and checking ONE_RAW instead of validating the PDF tail behavior. Update the test to call pdf() on the relevant sd29x9 value, and assert against PDF_0_RAW or the zero cutoff directly so the contract matches the intended saturation behavior. Keep the change localized to the existing pdf_tests.move test and use the sd29x9::pdf symbol to verify the corrected path.scripts/gaussian_codegen/pdf/emit_test_vectors.py (1)
97-102: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winStale saturation-bound reference in docstring.
The docstring still says
not on a real-valued > 6.5 comparison, referencing the pre-PR saturation boundary.MAX_Z_RAWis now6.402729806(line 64); this example should match to avoid confusing future readers about the actual cutoff.📝 Proposed fix
Saturation mirrors the on-chain boundary exactly: it triggers when the *quantized* input `z_raw` meets or exceeds `MAX_Z_RAW` (the `>=` test in - `pdf.move::pdf_nonneg_raw`), not on a real-valued `> 6.5` comparison.""" + `pdf.move::pdf_nonneg_raw`), not on a real-valued `> 6.402729806` comparison."""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/gaussian_codegen/pdf/emit_test_vectors.py` around lines 97 - 102, Update the docstring in expected_pdf_raw to remove the stale “> 6.5” saturation wording and replace it with the current MAX_Z_RAW-based cutoff so it matches the actual boundary used by pdf_nonneg_raw; keep the explanation aligned with the quantized-input >= MAX_Z_RAW behavior and reference the existing MAX_Z_RAW constant rather than a hardcoded numeric example.scripts/gaussian_codegen/cdf/emit_test_vectors.py (1)
89-94: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winStale saturation-bound reference in docstring.
The docstring still says
not on a real-valued > 6.3 comparison, referencing the old saturation boundary. WithMAX_Z_RAWnow6.109410205(line 62), this example value is outdated and could mislead future maintainers about the actual cutoff.📝 Proposed fix
Saturation mirrors the on-chain boundary exactly: it triggers when the *quantized* input `z_raw` meets or exceeds `MAX_Z_RAW` (the `>=` test in - `cdf.move::cdf_nonneg_raw`), not on a real-valued `> 6.3` comparison.""" + `cdf.move::cdf_nonneg_raw`), not on a real-valued `> 6.109410205` comparison."""🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/gaussian_codegen/cdf/emit_test_vectors.py` around lines 89 - 94, The docstring on expected_phi_raw still references the old real-valued saturation cutoff and should be updated to match the current MAX_Z_RAW-based behavior. Edit the documentation in emit_test_vectors.py so it describes saturation using the quantized boundary handled by cdf_nonneg_raw and replaces the outdated “> 6.3” example with the current cutoff value or a neutral reference to MAX_Z_RAW.
🧹 Nitpick comments (1)
scripts/gaussian_codegen/cdf/emit_coefficients.py (1)
56-58: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winHardcoded
10^36in doc text could drift fromconstants.CDF_WAD.Unlike the
MAX_Z_RAWnote at Line 98, which dynamically interpolates{fmt_u128(MAX_Z_RAW)}, this doc text hardcodes`10^36`as a literal string rather than deriving it fromconstants.CDF_WAD. If the WAD scale changes again (as it just did in this PR, from10^18), this comment could silently go stale.♻️ Proposed fix to interpolate the WAD value dynamically
return f"""{banner} /// Numerator and denominator coefficients for the AAA-rational standard-normal /// CDF approximation on the central domain `[0, {constants.MAX_Z}]`. All values are -/// sign-magnitude pairs at CDF WAD (`10^36`) scale, indexed in ascending power +/// sign-magnitude pairs at CDF WAD (`{constants.CDF_WAD}`) scale, indexed in ascending power /// order (index 0 is the constant term).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/gaussian_codegen/cdf/emit_coefficients.py` around lines 56 - 58, The CDF docstring in emit_coefficients currently hardcodes the WAD scale as 10^36, which can drift from constants.CDF_WAD if the scale changes again. Update the comment near the central-domain CDF approximation to interpolate the value dynamically using the existing formatting helper pattern used elsewhere (for example, the MAX_Z_RAW note), so the documentation always reflects constants.CDF_WAD.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@math/fixed_point/sources/internal/pdf.move`:
- Around line 32-45: Clarify the doc comment in pdf.move so the “degree 10” note
refers to the PDF approximation, not the CDF. Update the wording around
WAD/WAD_PER_RAW to say the headroom is tighter than the PDF/Horner polynomial
used here (and optionally mention TARGET_DEGREE or the PDF coefficient set), so
it no longer suggests the CDF is degree 10. Use the existing constant comments
near WAD and WAD_PER_RAW to locate the text.
In `@math/fixed_point/tests/gaussian_tests/cdf_monotonicity_tests.move`:
- Around line 45-49: The comment in monotone_just_below_saturation has the wrong
endpoint description for the assert_neighbor_monotone call. Update the note to
match the actual scanned range for assert_neighbor_monotone(6_108_000_000,
WINDOW), using the correct raw-input end value and keeping the explanation
aligned with the WINDOW-based span in cdf_monotonicity_tests.
In `@math/fixed_point/tests/gaussian_tests/pdf_monotonicity_tests.move`:
- Around line 46-50: The comment in monotone_just_below_saturation uses the
wrong end value for the scanned range, since
assert_neighbor_monotone(6_400_000_000, WINDOW) only covers inputs through
6_400_000_999, not 6.400999999. Update the inline note in
monotone_just_below_saturation to match the actual range implied by
assert_neighbor_monotone and WINDOW, and keep the wording consistent with the
analogous CDF test to avoid the same digit-grouping mistake.
In `@scripts/gaussian_codegen/shared/gates.py`:
- Around line 130-134: The overflow margin check in the sampling loop is not a
proof, since `range(0, max_z_raw, step)` only probes a coarse subset and
`horner_peak_product` may peak between samples. Replace the current
sampling-based gate in this block with an exhaustive or analytically bounded
check, such as an interval/derivative-based bound over the full domain, and only
use the result to enforce the `MIN_HEADROOM_BITS` margin if it is proven for all
inputs. Keep the logic centered around the existing `step`, `peak`, and
`horner_peak_product` flow so the location is easy to update.
- Around line 88-107: The monotonicity gate in the pair-checking loop is still
relying on the float64 proxy to skip some cases, so it is not formally provable.
Update the logic around `_proxy_output`, `_eval_int`, and the `bad | ambiguous`
filtering so every pair that cannot be ruled out by a coefficient-specific error
bound is exact-checked, or add a rigorous interval proof that excludes all
skipped pairs before trusting the proxy result.
---
Outside diff comments:
In `@math/fixed_point/tests/sd29x9_tests/pdf_tests.move`:
- Around line 85-92: The guard test in max_z_raw_is_analytical_saturation_point
is still exercising the CDF path and checking ONE_RAW instead of validating the
PDF tail behavior. Update the test to call pdf() on the relevant sd29x9 value,
and assert against PDF_0_RAW or the zero cutoff directly so the contract matches
the intended saturation behavior. Keep the change localized to the existing
pdf_tests.move test and use the sd29x9::pdf symbol to verify the corrected path.
In `@scripts/gaussian_codegen/cdf/emit_test_vectors.py`:
- Around line 89-94: The docstring on expected_phi_raw still references the old
real-valued saturation cutoff and should be updated to match the current
MAX_Z_RAW-based behavior. Edit the documentation in emit_test_vectors.py so it
describes saturation using the quantized boundary handled by cdf_nonneg_raw and
replaces the outdated “> 6.3” example with the current cutoff value or a neutral
reference to MAX_Z_RAW.
In `@scripts/gaussian_codegen/pdf/emit_test_vectors.py`:
- Around line 97-102: Update the docstring in expected_pdf_raw to remove the
stale “> 6.5” saturation wording and replace it with the current MAX_Z_RAW-based
cutoff so it matches the actual boundary used by pdf_nonneg_raw; keep the
explanation aligned with the quantized-input >= MAX_Z_RAW behavior and reference
the existing MAX_Z_RAW constant rather than a hardcoded numeric example.
---
Nitpick comments:
In `@scripts/gaussian_codegen/cdf/emit_coefficients.py`:
- Around line 56-58: The CDF docstring in emit_coefficients currently hardcodes
the WAD scale as 10^36, which can drift from constants.CDF_WAD if the scale
changes again. Update the comment near the central-domain CDF approximation to
interpolate the value dynamically using the existing formatting helper pattern
used elsewhere (for example, the MAX_Z_RAW note), so the documentation always
reflects constants.CDF_WAD.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5a7db493-6b00-4b26-831b-8fed4bef3faa
📒 Files selected for processing (38)
CHANGELOG.mdmath/fixed_point/README.mdmath/fixed_point/sources/internal/cdf.movemath/fixed_point/sources/internal/cdf_coefficients.movemath/fixed_point/sources/internal/horner.movemath/fixed_point/sources/internal/pdf.movemath/fixed_point/sources/internal/pdf_coefficients.movemath/fixed_point/sources/sd29x9/sd29x9_base.movemath/fixed_point/sources/ud30x9/ud30x9_base.movemath/fixed_point/tests/gaussian_tests/cdf_monotonicity_tests.movemath/fixed_point/tests/gaussian_tests/horner_tests.movemath/fixed_point/tests/gaussian_tests/pdf_monotonicity_tests.movemath/fixed_point/tests/sd29x9_tests/cdf_test_vectors.movemath/fixed_point/tests/sd29x9_tests/cdf_tests.movemath/fixed_point/tests/sd29x9_tests/pdf_test_vectors.movemath/fixed_point/tests/sd29x9_tests/pdf_tests.movemath/fixed_point/tests/ud30x9_tests/cdf_test_vectors.movemath/fixed_point/tests/ud30x9_tests/cdf_tests.movemath/fixed_point/tests/ud30x9_tests/pdf_test_vectors.movemath/fixed_point/tests/ud30x9_tests/pdf_tests.movescripts/gaussian_codegen/README.mdscripts/gaussian_codegen/cdf/.derive_output.jsonscripts/gaussian_codegen/cdf/derive.pyscripts/gaussian_codegen/cdf/emit_coefficients.pyscripts/gaussian_codegen/cdf/emit_test_vectors.pyscripts/gaussian_codegen/cdf/validate.pyscripts/gaussian_codegen/pdf/.derive_output.jsonscripts/gaussian_codegen/pdf/derive.pyscripts/gaussian_codegen/pdf/emit_coefficients.pyscripts/gaussian_codegen/pdf/emit_test_vectors.pyscripts/gaussian_codegen/pdf/validate.pyscripts/gaussian_codegen/shared/arithmetic.pyscripts/gaussian_codegen/shared/constants.pyscripts/gaussian_codegen/shared/gates.pyscripts/gaussian_codegen/shared/move_emit.pyscripts/gaussian_codegen/tests/test_emit.pyscripts/gaussian_codegen/tests/test_shared.pyscripts/gaussian_codegen/tests/test_validate_mirror.py
|
bidzyyys
left a comment
There was a problem hiding this comment.
Great job @immrsd, please remember to resolve @coderabbitai comments.
|
All remaining review follow-ups addressed:
Note: the Step-2 follow-up branch was renamed |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/gaussian_codegen/shared/constants.py`:
- Around line 99-104: The docstring for the inverse-CDF saturation constant is
stale and still claims it matches the CDF bound. Update the rationale near
INVERSE_CDF_MAX_Z in constants.py to reflect that CDF MAX_Z is now 6.109410205
while INVERSE_CDF_MAX_Z remains 6.3, and rewrite the explanation so it no longer
says the two bounds are equal at the corner. Keep the note focused on why
inverse_cdf still clamps to 6.3 and preserve the existing context around the
endpoint p = 1.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 41ea5c74-afee-42fb-8826-7fef4b51d64b
📒 Files selected for processing (8)
math/fixed_point/README.mdmath/fixed_point/sources/internal/horner.movemath/fixed_point/sources/internal/inverse_cdf.movemath/fixed_point/sources/sd29x9/sd29x9_base.movemath/fixed_point/sources/ud30x9/ud30x9_base.movescripts/gaussian_codegen/README.mdscripts/gaussian_codegen/shared/arithmetic.pyscripts/gaussian_codegen/shared/constants.py
✅ Files skipped from review due to trivial changes (4)
- math/fixed_point/sources/sd29x9/sd29x9_base.move
- math/fixed_point/sources/ud30x9/ud30x9_base.move
- math/fixed_point/README.md
- scripts/gaussian_codegen/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/gaussian_codegen/shared/arithmetic.py
- math/fixed_point/sources/internal/horner.move
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/gaussian_codegen/shared/constants.py`:
- Around line 99-104: The docstring for the inverse-CDF saturation constant is
stale and still claims it matches the CDF bound. Update the rationale near
INVERSE_CDF_MAX_Z in constants.py to reflect that CDF MAX_Z is now 6.109410205
while INVERSE_CDF_MAX_Z remains 6.3, and rewrite the explanation so it no longer
says the two bounds are equal at the corner. Keep the note focused on why
inverse_cdf still clamps to 6.3 and preserve the existing context around the
endpoint p = 1.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 41ea5c74-afee-42fb-8826-7fef4b51d64b
📒 Files selected for processing (8)
math/fixed_point/README.mdmath/fixed_point/sources/internal/horner.movemath/fixed_point/sources/internal/inverse_cdf.movemath/fixed_point/sources/sd29x9/sd29x9_base.movemath/fixed_point/sources/ud30x9/ud30x9_base.movescripts/gaussian_codegen/README.mdscripts/gaussian_codegen/shared/arithmetic.pyscripts/gaussian_codegen/shared/constants.py
✅ Files skipped from review due to trivial changes (4)
- math/fixed_point/sources/sd29x9/sd29x9_base.move
- math/fixed_point/sources/ud30x9/ud30x9_base.move
- math/fixed_point/README.md
- scripts/gaussian_codegen/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/gaussian_codegen/shared/arithmetic.py
- math/fixed_point/sources/internal/horner.move
🛑 Comments failed to post (1)
scripts/gaussian_codegen/shared/constants.py (1)
99-104: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Update the stale CDF-bound rationale.
Line 100 through Line 102 still say the inverse-CDF clamp is equal to CDF
MAX_Z, but CDFMAX_Zis now6.109410205whileINVERSE_CDF_MAX_Zremains6.3.Suggested wording update
"""Output saturation clamp for the quantile, as an exact decimal string: the -largest `|z|` `inverse_cdf` returns. Deliberately equal to the CDF domain bound -`MAX_Z` so the two functions agree at the corner (`cdf(6.3)` saturates to `1`, so -`inverse_cdf(1)` saturates to `6.3`). The deepest real value at `10^9` input +largest `|z|` `inverse_cdf` returns. This endpoint clamp intentionally remains +`6.3`; CDF now rounds to saturation before this bound, so `cdf(6.3)` is already +`1` and `inverse_cdf(1)` returns `6.3`. The deepest real value at `10^9` input resolution is `Φ⁻¹(1 − 10⁻⁹) ≈ 5.998`, so `6.3` is only ever returned for the exact endpoint `p = 1`."""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements."""Output saturation clamp for the quantile, as an exact decimal string: the largest `|z|` `inverse_cdf` returns. This endpoint clamp intentionally remains `6.3`; CDF now rounds to saturation before this bound, so `cdf(6.3)` is already `1` and `inverse_cdf(1)` returns `6.3`. The deepest real value at `10^9` input resolution is `Φ⁻¹(1 − 10⁻⁹) ≈ 5.998`, so `6.3` is only ever returned for the exact endpoint `p = 1`."""🧰 Tools
🪛 Ruff (0.15.20)
[warning] 103-103: String contains ambiguous
−(MINUS SIGN). Did you mean-(HYPHEN-MINUS)?(RUF001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/gaussian_codegen/shared/constants.py` around lines 99 - 104, The docstring for the inverse-CDF saturation constant is stale and still claims it matches the CDF bound. Update the rationale near INVERSE_CDF_MAX_Z in constants.py to reflect that CDF MAX_Z is now 6.109410205 while INVERSE_CDF_MAX_Z remains 6.3, and rewrite the explanation so it no longer says the two bounds are equal at the corner. Keep the note focused on why inverse_cdf still clamps to 6.3 and preserve the existing context around the endpoint p = 1.
TL;DR
cdf/pdfare now monotone between every pair of adjacent representableinputs — the previously documented 1-ULP far-tail inversions are gone.
Fixed by raising the internal Horner accumulation scale from
10^18to10^36: gas-neutral (the arithmetic already runs inu256), coefficientsstill fit
u128, no public API change.CDF
6.3 → 6.109410205(the smallestzwhoseΦrounds to1at the10^-9output resolution), PDF6.5 → 6.402729806(whereφrounds to0). The cut-off is lossless, and the fit no longer spends approximationbudget on the dead tail.
grid-based check stepped ~10^5 raw inputs at a time and could not see
single-input inversions by construction.
a bit-exact model of the on-chain integer pipeline, raising the
correctly-rounded rate at the same degree, gas, and API.
Problem
10^18, per-step floor-truncation noise(~2×10⁻⁵ ULP) exceeds
Φ's true per-step increment in the far tail, soneighboring inputs could invert (first at
cdf(4.533726001) < cdf(4.533726000); ~6.4k cases, with analogouspdfup-blips). This was adocumented caveat, and it breaks consumers that assume a monotone CDF —
e.g. a future binary-search
inverse_cdf.Φalready rounds to1.000000000forz ≥ 6.109410205, but saturation sat at6.3— so ~35M tail inputsreturned
0.999999999where the correctly-rounded answer is1.0(PDF: ~97M inputs,
6.5vs6.402729806). Fitting across that flat tailalso degraded the approximation everywhere else.
Changes
horner.move: the accumulation scale is now a per-call parameter ofmul_wad/horner_eval!;cdf.move/pdf.movepromote raw inputs by10^27and evaluate atWAD = 10^36.100-dps
mpmathoracle); coefficients regenerated on the clamped domain withthe degrees pinned (CDF 9 / PDF 10) so a regeneration can never silently
change them. The fitting method itself is unchanged in this PR.
shared/gates.py, run bymake ci):at-risk tail (2.1B / 2.4B pairs) via a vectorized proxy with exact
big-integer recheck — 0 inversions.
u256Horner product is 246 / 248 bits; thegate enforces clearance below
2^256, since10^36leaves only ~10 / ~8bits and any future degree or domain change must re-prove it.
at-risk inputs plus peak-product no-abort probes — 568 tests, up from 558.
precision, and error-bound prose updated. The 5-ULP accuracy contract is
unchanged (empirical worst case on the validation grid stays < 1 ULP).
Generated files (
*_coefficients.move,*_test_vectors.move,.derive_output.json) are emitter output, protected by the byte-for-byte--checkdrift guard.PR Checklist
Summary by CodeRabbit