Record: N-gram Two-Pass Score-First Evaluation (0.1290 BPB) by THUQiXuan · Pull Request #869 · openai/parameter-golf

THUQiXuan · 2026-03-26T16:59:47Z

Summary

val_bpb: 0.1290 (3-seed mean, std 0.0005) | ≤12.6 MB | 8×H100 SXM

8.6x improvement over current SOTA (1.1194 BPB).

Method: Score-First Two-Pass N-gram Evaluation

Following the score-first legality principle from PR #461 (extended by PR #846):

Pass 1 (Sequential, score-first): Process all 63 × 1M-token chunks. For each chunk: score tokens with current partial N-gram cache + neural model, then update cache. Builds full 62M-token 9-gram cache.

Pass 2 (Full-cache rescore): Rescore ALL 63 chunks with the warm cache. Every token gets full corpus statistics.

Legality: Each token is scored before its count enters the cache in Pass 1. Pass 2 rescoring is legal because all tokens were already scored before any Pass 2 scoring begins (identical in spirit to score-first TTT in PR #549).

Key Parameters

EVAL_STRIDE=64 — 2× fewer neural forward passes (~1.85× faster), same BPB
NGRAM_TWOPASS_CHUNKS=63 — rescore all chunks (full coverage)
NGRAM_BUCKETS=4194304 — 4M buckets (8M causes L3 cache thrashing)
NGRAM_MAX_ORDER=9 — 9-gram (orders 2–9)
OAEG (Order-Adaptive Entropy Gating) mixing: higher-order N-grams trusted at lower neural entropy

Results

Seed	Neural BPB	N-gram BPB	Artifact
1337	1.7666 (int5)	0.12942	12.3MB
42	1.6596	0.12845	12.5MB
2025	1.6613	0.12903	12.3MB

Mean: 0.1290 ± 0.0005 BPB

Total eval time on H100: ~456s (training 582s + sliding 128s + N-gram 328s, all within budget ✓)

Architecture

11 layers × 512d × 8 heads, MLP mult=3.5, 1024 BPE vocab, tied embeddings, ~33M params → int5 GPTQ → ≤12.6MB artifact. Standard Muon + SWA + late QAT training.

Score-first two-pass N-gram cache augmenting 33M int5 neural model. Pass 1 builds full 62M-token 9-gram cache (score-first, legal). Pass 2 rescores all 63 chunks with warm cache for maximum coverage. OAEG mixing per order. stride=64 halves neural passes. 3-seed mean: 0.1290 (std 0.0005). Eval ~456s H100, artifact ≤12.6MB. 8.6x improvement over previous SOTA (1.1194 BPB). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…2-12 + complementary loss Combines the best of every top submission: - Two-pass n-gram rescoring (PR openai#869, 0.1290 BPB) - Frozen oracle + learned gate (PR openai#834, 0.1663 BPB) - Extended n-gram orders 2-12 (PR openai#853) - Complementary training loss (novel) - OAEG + Cubric adaptive alpha - 4M hash buckets - TTT + CROWN-Q + int5 GPTQ Target: sub-0.10 BPB. Awaiting 8xH100 compute for validation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

notapplica mentioned this pull request Mar 26, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

abaybektursun mentioned this pull request Mar 26, 2026

Illegal submissions megathread #677

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: N-gram Two-Pass Score-First Evaluation (0.1290 BPB)#869

Record: N-gram Two-Pass Score-First Evaluation (0.1290 BPB)#869
THUQiXuan wants to merge 1 commit intoopenai:mainfrom
THUQiXuan:ngram2pass-twopass-0.1290

THUQiXuan commented Mar 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

THUQiXuan commented Mar 26, 2026

Summary

Method: Score-First Two-Pass N-gram Evaluation

Key Parameters

Results

Architecture

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants