Skip to content

chore/sc-41640/reducing-tokens-of-disambiguator#3262

Open
nsantacruz wants to merge 49 commits intomasterfrom
chore/sc-41640/reducing-tokens-of-disambiguator
Open

chore/sc-41640/reducing-tokens-of-disambiguator#3262
nsantacruz wants to merge 49 commits intomasterfrom
chore/sc-41640/reducing-tokens-of-disambiguator

Conversation

@nsantacruz
Copy link
Copy Markdown
Contributor

Description

A brief description of the PR

Code Changes

The following changes were made to the files below

Notes

Any additional notes go here

…d build section-level base refs

  _get_commentary_base_ref previously matched nodes by internal key (e.g. 'OrachChaim'), causing misses when the key diverges from
   the English display title (e.g. Prisha → Tur). It also did not handle the case where the base index has an extra SchemaNode
  wrapper around its content leaf.

  Changes:
  - Match nodes by primary English title instead of schema key
  - After a successful match, append non-default node titles to base_title to form a valid complex ref (e.g. 'Tur, Orach Chayim')
  - Navigate through any intermediate SchemaNodes to the content leaf, then zip with the leaf's addressTypes[:-1] to produce a
  section-level (not segment-level) base ref
  - Return early from the complex/complex branch so the generic section loop only runs for simple bases

  Add disambiguator_helpers_test.py covering None/empty inputs, non-commentary, multiple base titles, book-level citing refs,
  both-simple Torah and Talmud cases, both-complex with matching and mismatching titles, complex/simple XOR cases, and missing
  base_text_titles.
nsantacruz and others added 19 commits April 29, 2026 17:21
High-score Dicta matches (score >= 8) previously bypassed LLM confirmation,
causing false positives when the citation was a vague book/folio reference
with no direct quote. Key changes:

- Add _verify_high_score: a dedicated LLM check for high-score candidates
  with a focused prompt warning about book-title citations, bare folio
  references, and incidental topical similarity
- Add VERIFICATION_EXAMPLES: 6 few-shot examples (4 NO, 2 YES) covering
  book titles, folio refs with/without inline quotes, paraphrases, and
  whole-book references; placed in SystemMessage with cache_control
- Add _get_candidate_text_for_confirmation: pads short candidates with
  neighboring segments for better confirmation context
- Update _llm_form_prior to flag whole-book/chapter citations early
- Move examples into SystemMessage with explicit cache_control in both
  _llm_confirm_candidate and _verify_high_score

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Skip non-segment spans that already have `llm_resolved_ref_non_segment` set,
and treat ambiguous losers (llm_ambiguous_option_valid=False) as resolved so
they are not re-grouped and re-dispatched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
False positives occurred when Dicta matched a high-score quote elsewhere in
the document body rather than adjacent to the actual citation marker. Now only
bypass LLM verification when the matched phrase is within 60 characters of
the citation in the windowed text, measuring from the nearest edges of the
phrase and citation intervals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ref_but_rejected

When _confirm_candidate returns NO (in both the Dicta and fallback search paths),
save the rejected candidate ref on the span under llm_resolved_ref_but_rejected.
Adds rejected_ref to NonSegmentResolutionResult and threads it through the apply
functions so it is written even when no resolution is ultimately found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Text to best-matching window

- _dicta_phrase_distance: replace abs-distance formula with proper interval logic so
  any overlap between the resolution phrase and the citation returns 0 (previously a
  phrase containing the citation could return a large positive distance)
- resolution_phrase: when Dicta provides both baseMatchedText and compMatchedText,
  use Levenshtein sliding-window search (normalized to strip nikkud/cantillation) to
  find the sub-span of baseMatchedText closest to compMatchedText, then map positions
  back to the original string so the result is always a true substring

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the disabled `if False` gate with a rule derived from 1,600-row
statistical analysis:
- section_level=True AND score>=8 AND dist<=10: 99.3% precision (2 misses/280)
- score>=15 AND dist<=5 (any section level): 100% precision (0 misses/101)

Previous rule (score>=6 / score>=8 with no distance gate) had ~4.3% miss
rate and included a risky non-section-level branch at only ~88% precision.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@gitvelocity-reviewer
Copy link
Copy Markdown

📊 Code Quality Score: 27/100

34 × 0.8 (Large ESF) = 27.2, rounded to 27

Category Score Factors
🔭 Scope 8/20 3 new analysis scripts, 1 new dependency (levenshtein), gitignore update, CSV data file committed; touches production linker code paths but no new APIs or endpoints
🏗️ Architecture 5/20 Scripts reuse production helpers appropriately; duplicates production threshold logic creating maintenance risk; no new architectural patterns
⚙️ Implementation 10/20 ThreadPoolExecutor parallelism, multi-stage analysis logic, token counting via API, CSV I/O; moderate complexity; bugs include redundant retry logic and thread-unsafe env var mutation
⚠️ Risk 6/20 Analysis scripts only (not production code); committed CSV data file is repository hygiene risk; hardcoded model names may be incorrect; no migrations or auth changes
✅ Quality 3/15 No tests added; unused import (asdict); misleading help text (default 100 vs actual 2000); large data file committed; good docstrings partially offset quality concerns
🔒 Perf / Security 2/5 50-thread parallel API calls considered; no retry/backoff for rate limits; no security concerns for analysis scripts

Scored by GitVelocity · How are scores calculated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant