Record: Chunk-Based N-gram Backoff + Score-First TTT (0.295 BPB) #809

AayushBaniya2006 wants to merge 2 commits into openai:main from AayushBaniya2006:submission/chunk-ngram-0.295

Conversation

@AayushBaniya2006

Summary

  • val_bpb: 0.29519 (mean of 3 seeds, std 0.00013)
  • Artifact: 13.4MB (code 181KB + model 13.2MB)
  • Training: 525s on 8xH100 SXM (~6,091 steps at 86ms/step)
  • Eval: 340s (TTT 53s + N-gram 287s)

Approach

The primary technique is an eval-time order-9 N-gram backoff cache. The cache is built incrementally from already-scored validation tokens (score-first, which is legal under the competition rules). Tokens are processed in sequential 1M-token chunks, and all GPU ranks share cache state, which maximizes cache utilization.
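
A minimal sketch of the chunked score-first loop. Class and function names (NgramCache, score_chunk, eval_bpb) are illustrative, not the PR's actual code; only the order-9 limit, 1M-token chunks, and score-before-update ordering are stated in the PR:

```python
import math
from collections import defaultdict


class NgramCache:
    """Backoff N-gram counts over already-scored tokens (orders 2..9)."""

    def __init__(self, max_order=9):
        self.max_order = max_order
        # counts[k][context_tuple][next_token] -> count, one dict per order k
        self.counts = {k: defaultdict(lambda: defaultdict(int))
                       for k in range(2, max_order + 1)}

    def update(self, tokens):
        """Fold a chunk into the counts; only ever called AFTER scoring it."""
        for k in range(2, self.max_order + 1):
            for i in range(k - 1, len(tokens)):
                ctx = tuple(tokens[i - k + 1:i])
                self.counts[k][ctx][tokens[i]] += 1

    def longest_match(self, context):
        """Back off from order 9 down to 2; return (order, next-token counts)."""
        for k in range(self.max_order, 1, -1):
            ctx = tuple(context[-(k - 1):])
            if ctx in self.counts[k]:
                return k, self.counts[k][ctx]
        return 0, None


def eval_bpb(score_chunk, val_tokens, chunk_size=1_000_000, max_order=9):
    """Sequential chunked eval: score each chunk, THEN update the cache."""
    cache = NgramCache(max_order)
    total_nll = 0.0  # summed negative log-likelihood in nats
    for start in range(0, len(val_tokens), chunk_size):
        chunk = val_tokens[start:start + chunk_size]
        total_nll += score_chunk(cache, chunk)  # cache holds only PRIOR chunks
        cache.update(chunk)                     # every rank applies the same update
    return total_nll / (len(val_tokens) * math.log(2))  # bits per (byte-level) token
```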

Key innovations:

  • Entropy-adaptive mixing: the blend weight (alpha) varies with model confidence and N-gram match order (see the sketch after this list)
  • Per-order multipliers: high-order matches (orders 5-9) are boosted 2x, low-order matches (orders 2-3) are suppressed to 0.3x
  • Chunk-synchronized multi-GPU: all ranks update the cache with the full chunk data after it is scored
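
A sketch of the mixing step. The 2x/0.3x order multipliers are quoted above; the sigmoid entropy schedule, its center, and the 0.60/0.95 alpha bounds are assumptions (the bounds echo values mentioned in follow-up forks below):

```python
import torch

# Per-order multipliers from the PR: high-order matches boosted 2x,
# low-order suppressed to 0.3x. The order-4 value is an assumed middle ground.
ORDER_MULT = {2: 0.3, 3: 0.3, 4: 1.0, 5: 2.0, 6: 2.0, 7: 2.0, 8: 2.0, 9: 2.0}


def blend_logprobs(model_logprobs, order, ngram_counts,
                   alpha_max=0.60, alpha_clip=0.95):
    """Mix model and N-gram next-token distributions.

    model_logprobs: (vocab,) tensor of model log-probs for the next token.
    order, ngram_counts: result of NgramCache.longest_match(context).
    """
    if order == 0:  # no cache hit: use the model alone
        return model_logprobs

    probs = model_logprobs.exp()
    # Entropy-adaptive alpha: the less confident the model (higher entropy),
    # the more weight the N-gram gets. Center and scale are assumptions.
    entropy = -(probs * model_logprobs).sum()
    alpha = alpha_max * torch.sigmoid(entropy - 2.0) * ORDER_MULT[order]
    alpha = alpha.clamp(0.0, alpha_clip)

    # Empirical N-gram distribution from the matched context's counts.
    ngram = torch.zeros_like(probs)
    total = sum(ngram_counts.values())
    for tok, cnt in ngram_counts.items():
        ngram[tok] = cnt / total

    return torch.log((1.0 - alpha) * probs + alpha * ngram + 1e-12)
```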

The submission also includes score-first test-time training (LoRA rank 8, trained with AdamW), contributing roughly 0.015 BPB of the improvement; a minimal sketch follows.
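
Assumed shape of that TTT loop. Only "rank 8" and "AdamW" come from the PR; the adapter wiring, learning rate, and scheduling details are illustrative:

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable rank-8 additive adapter."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))  # starts as a no-op

    def forward(self, x):
        return self.base(x) + x @ self.A @ self.B


def ttt_step(model, chunk_ids, optimizer):
    """One adaptation step on a chunk that has ALREADY been scored."""
    logits = model(chunk_ids[:, :-1])  # assumes the model returns raw logits
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), chunk_ids[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# Usage: score first, adapt after, so adaptation only helps FUTURE chunks.
#   lora_params = [p for m in model.modules() if isinstance(m, LoRALinear)
#                  for p in (m.A, m.B)]
#   opt = torch.optim.AdamW(lora_params, lr=1e-4)  # lr is an assumption
#   for chunk in chunks:
#       total_nll += score(model, chunk)
#       ttt_step(model, chunk, opt)
```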

3-Seed Results

Seed   Steps   Pre-Quant BPB   N-gram BPB
1337   6,084   1.1408          0.2953
42     6,094   1.1483          0.2950
2024   6,096   1.1490          0.2952

Timing

Phase                             Time   Budget
Training + GPTQ + export          592s   600s
Eval (roundtrip + TTT + N-gram)   424s   600s

Architecture

11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)^2, BigramHash(4096), GPTQ int5, 27.3M params.
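
The same spec line read as a config sketch; field names are illustrative, and "XSA-4" is carried over verbatim since the PR does not expand it:

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    n_layer: int = 11       # "11L"
    d_model: int = 512      # "512d"
    n_head: int = 8         # "GQA 8/4": 8 query heads...
    n_kv_head: int = 4      # ...grouped over 4 key/value heads
    mlp_ratio: float = 3.0  # MLP hidden width = 3.0 * d_model
    xsa: int = 4            # "XSA-4" as written; expansion not given in the PR
    activation: str = "leaky_relu(0.9)**2"  # squared LeakyReLU; 0.9 presumably the slope
    bigram_hash: int = 4096  # "BigramHash(4096)", presumably the hash-table size
    quant: str = "gptq-int5"  # post-training GPTQ to 5-bit weights
    # ~27.3M parameters total before quantization
```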

Compliance

  • N-gram cache: score-first (cache updated AFTER scoring each chunk)
  • TTT: score-first with hard enforcement (raises if disabled; see the sketch after this list)
  • No hindsight/oracle selection
  • GPTQ calibration fits within the training budget (525s training plus post-train quantization and export, 592s total, under the 600s limit)
  • No training data accessed during eval phase
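
The hard-enforcement item suggests a guard along these lines (hypothetical code, not from the PR):

```python
def require_score_first(score_first_enabled: bool) -> None:
    """TTT hard guard: refuse to run eval if score-first ordering is off.

    Scoring must happen before the adapters (or the N-gram cache) ever see
    a token; otherwise the eval would be hindsight-contaminated.
    """
    if not score_first_enabled:
        raise RuntimeError("score-first mode disabled: refusing to run eval")
```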

Order-9 chunk-based N-gram eval cache with entropy-adaptive alpha
and per-order multipliers, combined with score-first TTT (LoRA).
Mean val_bpb 0.29519 across 3 seeds (std 0.00013).

Architecture: 11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)^2,
BigramHash(4096), GPTQ int5. 13.4MB artifact, 525s training + 340s eval
on 8xH100 SXM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Match depth of PR openai#549 README: explain why techniques work,
full N-gram cache walkthrough, entropy-adaptive alpha details,
compliance section, timing budget with data access column,
ablation with deltas, and proper credits to prior work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@newjordan

HGNGNGNGHGNGNGN bro.... my brain

XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
Today (2026-03-26) the leaderboard was transformed by the eval-time n-gram backoff cache technique. Add comprehensive context for agents:

- URGENT_ngram_backoff_breakthrough.md: full implementation guide with
  NgramEvalCache code, entropy-adaptive alpha, complementary training,
  priority order for implementation
- latest_sota_snapshot.md: updated with new PR landscape
- 3 reference code files from top PRs (openai#809 0.295, openai#803 0.442, openai#813 0.667)

The n-gram backoff is purely eval-time — adding it to our existing best
checkpoint should immediately jump from 1.119 to ~0.67 BPB.
Implementing it is now the single highest-priority task.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
Three variants targeting the 0.187 BPB gap to openai#1:
- bwing_alpha: clip 0.95, alpha 0.05-0.60 (isolate alpha curve)
- bwing_entropy_shift: per-order entropy center shift (isolate)
- bwing_full_port: all openai#809 techniques + fixed order mults (fire first)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
- Cubric 3D back online (CADENCE=32, warm-start)
- Per-order entropy center shift from openai#809
- Alpha 0.05-0.60, clip 0.95
- Our sliding-window TTT spliced in (1 epoch, SGD, freeze 2 blocks)
- TTT runs BEFORE n-gram eval → adapted model feeds n-gram

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request Mar 26, 2026
- Port openai#809 LoRA TTT: rank-8 adapters on Q/V/LM head, AdamW, Polyak
- Add LoRA injection to CausalSelfAttention, Block, GPT forward paths
- 53s vs our old 410s TTT, 6x better BPB gain
- Cubric 3D ON + entropy shift + alpha 0.05-0.60 clip 0.95

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Programmerryoki added a commit to Programmerryoki/parameter-golf that referenced this pull request Mar 26, 2026
Implements the breakthrough eval-time technique from PR openai#809 (0.295 BPB):
- BackoffNgramMixer: order-2 to order-9 N-gram cache
- Entropy-adaptive alpha blending (model + N-gram predictions)
- Sequential eval building cache from scored tokens (legal/backward-looking)
- Configurable via NGRAM_EVAL=1 and NGRAM_MAX_ORDER=9 env vars
- GPT.forward() now supports _return_logits mode for N-gram blending

Enable with: export NGRAM_EVAL=1 NGRAM_MAX_ORDER=9
Robby955 added a commit to Robby955/parameter-golf that referenced this pull request Mar 26, 2026
Add complementary training (from @pentxayc openai#803) and per-order
multipliers (from @AayushBaniya2006 openai#809) on top of distributed
prefill + 15-gram + order-adaptive gating.

New 3-seed results: 0.28798 / 0.28804 / 0.28810
All seeds under 16MB, training under 560s, eval under 330s.

Updated README with legality hedge, full ablation, credits.
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
…k trivial proposals

- research_memory.md: add PARADIGM SHIFT header, correct the eval_011 conclusion
  (failed due to naive/slow implementation, not because n-gram doesn't work),
  add OVERRIDING note in Open Hypotheses directing agents to PR openai#809 code
- codex_research_prompt.txt: add explicit ban on trivial proposals (random seed,
  minor hyperparams) in aggressive phase; add eval_011 correction note so agents
  use the correct vectorized chunk-based n-gram approach

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
The Negative Results section said 'do not retry n-gram/lambda sweeps' and
'eval_011 does not justify cross-seed confirmation'. These entries would
block agents from implementing the correct PR openai#809 vectorized n-gram cache.

Replace with correct framing: eval_011's naive per-segment implementation
was the problem (1901s, 3× over budget), not the concept. The correct
vectorized chunk-based approach achieves 0.2952 BPB in 287s.

Also supersede the 'next single-variable refinement' hypothesis entry
which assumed refinement phase; we are now in aggressive phase (gap=0.827).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
XinghanLi66 added a commit to XinghanLi66/parameter-golf that referenced this pull request Mar 26, 2026
…(legality review)

- SOTA target is now PR openai#803: Complementary Training + Backoff N-gram + TTT
- PR openai#809 (0.2952) excluded pending legality review
- research_memory.md: fix Working SOTA Anchor section (agent had written it
  to explicitly ignore the URGENT file and stick to 1.1194 — removed that)
- All PR openai#809 references updated to PR openai#803/openai#813
- Dashboard: SOTA now 0.4416, gap 0.681

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
quietsmile added a commit to quietsmile/parameter-golf that referenced this pull request Mar 26, 2026
Extended eval-time n-gram backoff from order 9 to order 12, reduced chunk size
from 1M to 256K tokens for faster cache refresh, and increased alpha_max from
0.60 to 0.70. Two-seed validation: 0.2835 (seed=1337), 0.2833 (seed=42).
Improvement over PR openai#809 baseline: -0.0118 BPB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>