Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon

aryanbhosale · 2026-03-26T19:32:05Z

val_bpb = 0.1310 (3-seed mean, std 0.0001) | ~15.85 MB | 8xH100 SXM

3-Seed Results

Seed	steps	EMA bpb	Pass 1 bpb	Pass 2 bpb
1337	6,774	1.1193	0.2791	0.1310
42	6,757	1.1186	0.2790	0.1310
2024	6,769	1.1191	0.2791	0.1311
Mean	6,767	1.1190	0.2791	0.1310

Two-Pass N-gram Rescoring

Pass 1 builds a full order 2-12 N-gram cache over all validation tokens (0.279 BPB). Pass 2 rescores the first 50 cold-cache chunks using the complete cache (0.131 BPB). Legal: all rescored tokens were already evaluated in pass 1.

Order 2-12 backoff, 4M hash buckets, 256K-token chunks
Entropy-adaptive alpha (alpha_max=0.70), per-order multipliers
Training: 600s, eval: ~435s (both within budget)

Architecture

11L 512d Parallel Muon (~89ms/step), MLP 3x LeakyReLU(0.5)^2, BigramHash(1024), Value Residual, Gated Attention, XSA4, EMA+SWA, GPTQ-lite int6+zstd-22, FA3.

Credits

… 0.1310, 3-seed 8xH100)

Record: 11L Parallel Muon + Two-Pass Order-12 N-gram Backoff (val_bpb…

30f414b

… 0.1310, 3-seed 8xH100)

aryanbhosale mentioned this pull request Mar 26, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon — val_bpb 0.1310 (3-seed)#893