Underwriter simplification: consolidate duplication, vectorize bootstrap, generalize ceiling #10

Open
ree2raz wants to merge 1 commit into master from refactor/underwriter-simplify

ree2raz (Owner) commented Jun 12, 2026
Behavior-preserving cleanup pass over the underwriter module, from a /simplify review with four cleanup angles: reuse, simplification, efficiency, and altitude. Net −169 lines across tracked files; all 83 scoring tests pass.

Verification

  • pytest underwriter/tests → 83/83 pass.
  • bootstrap_ci checked numerically identical to the old loop on 50 random cases (a row-major rng.integers(size=(iters, n)) call consumes the RNG stream in the same order as iters sequential size-n draws), so published CI values do not move.
  • All 7 build scripts compile + import + resolve their aliased helpers through _common.
  • No full eval was run.
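
The stream-order argument in the bootstrap_ci bullet can be illustrated with a small sketch. This is not the module's actual bootstrap_ci: the function names, the mean statistic, the percentile interval, and the seed handling below are all assumptions chosen for illustration. It relies on numpy's Generator.integers with its default integer dtype, where a single (iters, n) draw is filled row by row.

```python
import numpy as np


def bootstrap_ci_loop(values, iters=1000, alpha=0.05, seed=0):
    """Loop form (hypothetical): one size-n index draw per Python iteration."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    means = np.empty(iters)
    for i in range(iters):
        idx = rng.integers(0, n, size=n)
        means[i] = values[idx].mean()
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def bootstrap_ci_matrix(values, iters=1000, alpha=0.05, seed=0):
    """Matrix form (hypothetical): all index draws in one call."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    # Row i of the index matrix plays the role of the i-th sequential draw.
    idx = rng.integers(0, n, size=(iters, n))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Compare both forms on one synthetic sample with the same seed.
sample = np.random.default_rng(123).normal(size=37)
ci_loop = bootstrap_ci_loop(sample)
ci_matrix = bootstrap_ci_matrix(sample)
```

The two intervals are compared with a tolerance rather than bit-for-bit here, since the sketch does not try to pin down floating-point summation order; the index draws themselves are expected to match exactly.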

Library code (test-covered)

  • bootstrap_ci — 1000-iter Python loop → one numpy resample matrix. Same draws, no per-iteration interpreter overhead.
  • bootstrap_index — hoisted the loop-invariant np.asarray conversions out of the hot loop (exact same RNG draws).
  • price() ceiling generalized — the hardcoded if ax == "bias" pair-divergence branch became a per-axis _ceiling_candidates() generator. Adding a new secondary ceiling signal (e.g. sensitive hard_leak_rate) is now one line, not a second special-case. Side benefit: removes a latent KeyError when pair-divergence was the binding constraint.
  • _synthetic_axis_risks() — one helper for the (ci_low, risk, ci_high) stand-in that was built in both price() and aggregate_model().
  • consensus_verdict() promoted to public; the risk→verdict ternary that was re-inlined (in opposite branch order) at two runner sites now calls it — one source of truth for the pricing-relevant cutoff.
  • combine() — reuse the has_hard_leak already computed in the sensitive branch instead of recomputing + redundant bool() wrap.
  • runner OSS helpers — extracted _oss_backend_or_none() + _ping_oss(), removing the triplicated modal_oss_url/provider guard and the byte-identical _ping; dropped the redundant from .config import settings as s re-import.
  • **{k: v for k, v in raw.items()} → **raw.

Build scripts

  • _git_sha (×7), _hf_commit_sha (×6), _download (×3) were copy-pasted verbatim → new scripts/_common.py, alias-imported so call sites are unchanged. Removed 4 now-unused subprocess imports.

Deliberately skipped (follow-ups)

  • Per-axis scorer registry — folding axis semantics from combine + tail_risk + aggregate_axis into one registry. Highest architectural value, but a scoring-semantics redesign too large to land behavior-preserving here.
  • Three-path _generate() unification in runner.py — real, but runner has no unit coverage and drives the live eval; not worth the risk without an integration test.
  • load_cards double-YAML-parse — runs once at end of a multi-hour run; negligible impact, and the clean fix adds caching semantics.

…generalize ceiling

Behavior-preserving cleanup (all 83 scoring tests pass; bootstrap_ci proven
numerically identical to the old loop on 50 random cases).

Library code:
- bootstrap_ci: 1000-iter Python loop -> one numpy resample matrix (row-major
  draws consume the RNG stream identically, so CI values are unchanged).
- bootstrap_index: hoist loop-invariant np.asarray conversions out of the loop.
- price(): replace the hardcoded `if ax == "bias"` pair-divergence branch with a
  per-axis _ceiling_candidates() generator; adding a new secondary ceiling signal
  is now one line. Also fixes a latent KeyError when pair_divergence was binding.
- _synthetic_axis_risks(): one helper for the (ci_low,risk,ci_high) stand-in that
  was built in both price() and aggregate_model().
- consensus_verdict(): promoted to public; runner's two re-inlined risk->verdict
  ternaries now call it (one source of truth for the cutoff).
- combine(): reuse the has_hard_leak already computed in the sensitive branch.
- runner: extract _oss_backend_or_none() + _ping_oss(), removing the triplicated
  OSS guard and duplicated ping; drop redundant `settings as s` re-import.

Build scripts:
- _git_sha (x7), _hf_commit_sha (x6), _download (x3) were copy-pasted verbatim
  -> scripts/_common.py, alias-imported so call sites are unchanged. Drop 4
  now-unused subprocess imports.

Net -169 lines across tracked files.