Skip to content

Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon — val_bpb 0.1310 (3-seed)#893

Open
aryanbhosale wants to merge 1 commit intoopenai:mainfrom
aryanbhosale:submission/twopass-ngram-0.1310
Open

Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon — val_bpb 0.1310 (3-seed)#893
aryanbhosale wants to merge 1 commit intoopenai:mainfrom
aryanbhosale:submission/twopass-ngram-0.1310

Conversation

@aryanbhosale
Copy link

Record: Two-Pass Order-12 N-gram Backoff + Parallel Muon

val_bpb = 0.1310 (3-seed mean, std 0.0001) | ~15.85 MB | 8xH100 SXM

3-Seed Results

Seed steps EMA bpb Pass 1 bpb Pass 2 bpb
1337 6,774 1.1193 0.2791 0.1310
42 6,757 1.1186 0.2790 0.1310
2024 6,769 1.1191 0.2791 0.1311
Mean 6,767 1.1190 0.2791 0.1310

Two-Pass N-gram Rescoring

Pass 1 builds a full order 2-12 N-gram cache over all validation tokens (0.279 BPB). Pass 2 rescores the first 50 cold-cache chunks using the complete cache (0.131 BPB). Legal: all rescored tokens were already evaluated in pass 1.

  • Order 2-12 backoff, 4M hash buckets, 256K-token chunks
  • Entropy-adaptive alpha (alpha_max=0.70), per-order multipliers
  • Training: 600s, eval: ~435s (both within budget)

Architecture

11L 512d Parallel Muon (~89ms/step), MLP 3x LeakyReLU(0.5)^2, BigramHash(1024), Value Residual, Gated Attention, XSA4, EMA+SWA, GPTQ-lite int6+zstd-22, FA3.

Credits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant