Record: Chunk-Based N-gram Backoff + Score-First TTT (0.295 BPB) #809

AayushBaniya2006 wants to merge 2 commits into openai:main from AayushBaniya2006:submission/chunk-ngram-0.295

Conversation

@AayushBaniya2006

Summary

  • val_bpb: 0.29519 (mean of 3 seeds, std 0.00013)
  • Artifact: 13.4MB (code 181KB + model 13.2MB)
  • Training: 525s on 8xH100 SXM (~6,091 steps at 86ms/step)
  • Eval: 340s (TTT 53s + N-gram 287s)

Approach

The primary technique is an eval-time order-9 N-gram backoff cache. The cache is built incrementally from already-scored validation tokens (score-first, which is legal under the competition rules). Tokens are processed in sequential 1M-token chunks, and all GPU ranks share cache state, which maximizes cache utilization.
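
A minimal sketch of the chunked score-first loop. Class and function names (NgramCache, score_chunk, eval_bpb) are illustrative, not the PR's actual code; only the order-9 limit, 1M-token chunks, and score-before-update ordering are stated in the PR:

```python
import math
from collections import defaultdict


class NgramCache:
    """Backoff N-gram counts over already-scored tokens (orders 2..9)."""

    def __init__(self, max_order=9):
        self.max_order = max_order
        # counts[k][context_tuple][next_token] -> count, one dict per order k
        self.counts = {k: defaultdict(lambda: defaultdict(int))
                       for k in range(2, max_order + 1)}

    def update(self, tokens):
        """Fold a chunk into the counts; only ever called AFTER scoring it."""
        for k in range(2, self.max_order + 1):
            for i in range(k - 1, len(tokens)):
                ctx = tuple(tokens[i - k + 1:i])
                self.counts[k][ctx][tokens[i]] += 1

    def longest_match(self, context):
        """Back off from order 9 down to 2; return (order, next-token counts)."""
        for k in range(self.max_order, 1, -1):
            ctx = tuple(context[-(k - 1):])
            if ctx in self.counts[k]:
                return k, self.counts[k][ctx]
        return 0, None


def eval_bpb(score_chunk, val_tokens, chunk_size=1_000_000, max_order=9):
    """Sequential chunked eval: score each chunk, THEN update the cache."""
    cache = NgramCache(max_order)
    total_nll = 0.0  # summed negative log-likelihood in nats
    for start in range(0, len(val_tokens), chunk_size):
        chunk = val_tokens[start:start + chunk_size]
        total_nll += score_chunk(cache, chunk)  # cache holds only PRIOR chunks
        cache.update(chunk)                     # every rank applies the same update
    return total_nll / (len(val_tokens) * math.log(2))  # bits per (byte-level) token
```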

Key innovations:

  • Entropy-adaptive mixing: the blend weight (alpha) varies with model confidence and N-gram match order (see the sketch after this list)
  • Per-order multipliers: high-order matches (orders 5-9) are boosted 2x, low-order matches (orders 2-3) are suppressed to 0.3x
  • Chunk-synchronized multi-GPU: all ranks update the cache with the full chunk data after it is scored
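
A sketch of the mixing step. The 2x/0.3x order multipliers are quoted above; the sigmoid entropy schedule, its center, and the 0.60/0.95 alpha bounds are assumptions (the bounds echo values mentioned in follow-up forks below):

```python
import torch

# Per-order multipliers from the PR: high-order matches boosted 2x,
# low-order suppressed to 0.3x. The order-4 value is an assumed middle ground.
ORDER_MULT = {2: 0.3, 3: 0.3, 4: 1.0, 5: 2.0, 6: 2.0, 7: 2.0, 8: 2.0, 9: 2.0}


def blend_logprobs(model_logprobs, order, ngram_counts,
                   alpha_max=0.60, alpha_clip=0.95):
    """Mix model and N-gram next-token distributions.

    model_logprobs: (vocab,) tensor of model log-probs for the next token.
    order, ngram_counts: result of NgramCache.longest_match(context).
    """
    if order == 0:  # no cache hit: use the model alone
        return model_logprobs

    probs = model_logprobs.exp()
    # Entropy-adaptive alpha: the less confident the model (higher entropy),
    # the more weight the N-gram gets. Center and scale are assumptions.
    entropy = -(probs * model_logprobs).sum()
    alpha = alpha_max * torch.sigmoid(entropy - 2.0) * ORDER_MULT[order]
    alpha = alpha.clamp(0.0, alpha_clip)

    # Empirical N-gram distribution from the matched context's counts.
    ngram = torch.zeros_like(probs)
    total = sum(ngram_counts.values())
    for tok, cnt in ngram_counts.items():
        ngram[tok] = cnt / total

    return torch.log((1.0 - alpha) * probs + alpha * ngram + 1e-12)
```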

The submission also includes score-first test-time training (LoRA rank 8, trained with AdamW), contributing roughly 0.015 BPB of the improvement; a minimal sketch follows.
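
Assumed shape of that TTT loop. Only "rank 8" and "AdamW" come from the PR; the adapter wiring, learning rate, and scheduling details are illustrative:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable rank-8 additive adapter."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))  # starts as a no-op

    def forward(self, x):
        return self.base(x) + x @ self.A @ self.B


def ttt_step(model, chunk_ids, optimizer):
    """One adaptation step on a chunk that has ALREADY been scored."""
    logits = model(chunk_ids[:, :-1])  # assumes the model returns raw logits
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), chunk_ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Usage: score first, adapt after, so adaptation only helps FUTURE chunks.
#   lora_params = [p for m in model.modules() if isinstance(m, LoRALinear)
#                  for p in (m.A, m.B)]
#   opt = torch.optim.AdamW(lora_params, lr=1e-4)  # lr is an assumption
#   for chunk in chunks:
#       total_nll += score(model, chunk)
#       ttt_step(model, chunk, opt)
```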

3-Seed Results

Seed   Steps   Pre-Quant BPB   N-gram BPB
1337   6,084   1.1408          0.2953
42     6,094   1.1483          0.2950
2024   6,096   1.1490          0.2952

Timing

Phase                             Time   Budget
Training + GPTQ + export          592s   600s
Eval (roundtrip + TTT + N-gram)   424s   600s

Architecture

11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)^2, BigramHash(4096), GPTQ int5, 27.3M params.
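
The same spec line read as a config sketch; field names are illustrative, and "XSA-4" is carried over verbatim since the PR does not expand it:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    n_layer: int = 11       # "11L"
    d_model: int = 512      # "512d"
    n_head: int = 8         # "GQA 8/4": 8 query heads...
    n_kv_head: int = 4      # ...grouped over 4 key/value heads
    mlp_ratio: float = 3.0  # MLP hidden width = 3.0 * d_model
    xsa: int = 4            # "XSA-4" as written; expansion not given in the PR
    activation: str = "leaky_relu(0.9)**2"  # squared LeakyReLU; 0.9 presumably the slope
    bigram_hash: int = 4096  # "BigramHash(4096)", presumably the hash-table size
    quant: str = "gptq-int5"  # post-training GPTQ to 5-bit weights
    # ~27.3M parameters total before quantization
```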

Compliance

  • N-gram cache: score-first (cache updated AFTER scoring each chunk)
  • TTT: score-first with hard enforcement (raises if disabled; see the sketch after this list)
  • No hindsight/oracle selection
  • GPTQ calibration fits within the training budget (525s training plus post-train quantization and export, 592s total, under the 600s limit)
  • No training data accessed during eval phase
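
The hard-enforcement item suggests a guard along these lines (hypothetical code, not from the PR):

```python
def require_score_first(score_first_enabled: bool) -> None:
    """TTT hard guard: refuse to run eval if score-first ordering is off.

    Scoring must happen before the adapters (or the N-gram cache) ever see
    a token; otherwise the eval would be hindsight-contaminated.
    """
    if not score_first_enabled:
        raise RuntimeError("score-first mode disabled: refusing to run eval")
```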

Order-9 chunk-based N-gram eval cache with entropy-adaptive alpha
and per-order multipliers, combined with score-first TTT (LoRA).
Mean val_bpb 0.29519 across 3 seeds (std 0.00013).

Architecture: 11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)^2,
BigramHash(4096), GPTQ int5. 13.4MB artifact, 525s training + 340s eval
on 8xH100 SXM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match depth of PR openai#549 README: explain why techniques work,
full N-gram cache walkthrough, entropy-adaptive alpha details,
compliance section, timing budget with data access column,
ablation with deltas, and proper credits to prior work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@newjordan

HGNGNGNGHGNGNGN bro.... my brain

XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
Today (2026-03-26) the leaderboard was transformed by the eval-time n-gram backoff cache technique. Add comprehensive context for agents:

- URGENT_ngram_backoff_breakthrough.md: full implementation guide with
  NgramEvalCache code, entropy-adaptive alpha, complementary training,
  priority order for implementation
- latest_sota_snapshot.md: updated with new PR landscape
- 3 reference code files from top PRs (openai#809 0.295, openai#803 0.442, openai#813 0.667)

The n-gram backoff is purely eval-time — adding it to our existing best
checkpoint should immediately jump from 1.119 to ~0.67 BPB.
Implementing it is now the single highest-priority task.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
Three variants targeting the 0.187 BPB gap to openai#1:
- bwing_alpha: clip 0.95, alpha 0.05-0.60 (isolate alpha curve)
- bwing_entropy_shift: per-order entropy center shift (isolate)
- bwing_full_port: all openai#809 techniques + fixed order mults (fire first)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
- Cubric 3D back online (CADENCE=32, warm-start)
- Per-order entropy center shift from openai#809
- Alpha 0.05-0.60, clip 0.95
- Our sliding-window TTT spliced in (1 epoch, SGD, freeze 2 blocks)
- TTT runs BEFORE n-gram eval → adapted model feeds n-gram

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
- Port openai#809 LoRA TTT: rank-8 adapters on Q/V/LM head, AdamW, Polyak
- Add LoRA injection to CausalSelfAttention, Block, GPT forward paths
- 53s vs our old 410s TTT, 6x better BPB gain
- Cubric 3D ON + entropy shift + alpha 0.05-0.60 clip 0.95

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Programmerryoki added a commit to Programmerryoki/parameter-golf that referenced this pull request Mar 26, 2026
Implements the breakthrough eval-time technique from PR openai#809 (0.295 BPB):
- BackoffNgramMixer: order-2 to order-9 N-gram cache
- Entropy-adaptive alpha blending (model + N-gram predictions)
- Sequential eval building cache from scored tokens (legal/backward-looking)
- Configurable via NGRAM_EVAL=1 and NGRAM_MAX_ORDER=9 env vars
- GPT.forward() now supports _return_logits mode for N-gram blending

Enable with: export NGRAM_EVAL=1 NGRAM_MAX_ORDER=9
Robby955 added a commit to Robby955/parameter-golf that referenced this pull request Mar 26, 2026
Add complementary training (from @pentxayc openai#803) and per-order
multipliers (from @AayushBaniya2006 openai#809) on top of distributed
prefill + 15-gram + order-adaptive gating.

New 3-seed results: 0.28798 / 0.28804 / 0.28810
All seeds under 16MB, training under 560s, eval under 330s.

Updated README with legality hedge, full ablation, credits.
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
…k trivial proposals

- research_memory.md: add PARADIGM SHIFT header, correct the eval_011 conclusion
  (failed due to naive/slow implementation, not because n-gram doesn't work),
  add OVERRIDING note in Open Hypotheses directing agents to PR openai#809 code
- codex_research_prompt.txt: add explicit ban on trivial proposals (random seed,
  minor hyperparams) in aggressive phase; add eval_011 correction note so agents
  use the correct vectorized chunk-based n-gram approach

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
The Negative Results section said 'do not retry n-gram/lambda sweeps' and
'eval_011 does not justify cross-seed confirmation'. These entries would
block agents from implementing the correct PR openai#809 vectorized n-gram cache.

Replace with correct framing: eval_011's naive per-segment implementation
was the problem (1901s, 3× over budget), not the concept. The correct
vectorized chunk-based approach achieves 0.2952 BPB in 287s.

Also supersede the 'next single-variable refinement' hypothesis entry
which assumed refinement phase; we are now in aggressive phase (gap=0.827).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
…(legality review)

- SOTA target is now PR openai#803: Complementary Training + Backoff N-gram + TTT
- PR openai#809 (0.2952) excluded pending legality review
- research_memory.md: fix Working SOTA Anchor section (agent had written it
  to explicitly ignore the URGENT file and stick to 1.1194 — removed that)
- All PR openai#809 references updated to PR openai#803/openai#813
- Dashboard: SOTA now 0.4416, gap 0.681

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
quietsmile added a commit to quietsmile/parameter-golf that referenced this pull request Mar 26, 2026
Extended eval-time n-gram backoff from order 9 to order 12, reduced chunk size
from 1M to 256K tokens for faster cache refresh, and increased alpha_max from
0.60 to 0.70. Two-seed validation: 0.2835 (seed=1337), 0.2833 (seed=42).
Improvement over PR openai#809 baseline: -0.0118 BPB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>