
Record: Order-12 N-gram Backoff + 256K Chunks — 0.2834 BPB#843

Open
quietsmile wants to merge 2 commits into openai:main from quietsmile:submission/order12-chunk256k-alpha070

Conversation

@quietsmile

Summary

  • Extended eval-time n-gram backoff from order 9 to order 12 with 6 additional hash primes
  • Reduced chunk size from 1M to 256K tokens, giving 4× more frequent cache refreshes during eval
  • Increased alpha_max from 0.60 to 0.70 for stronger n-gram mixing at high entropy
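The mechanism described above can be sketched as follows. This is a minimal illustration under assumed names (`HashedNgramBackoff`, `mix_prob`) and assumed design choices (bucketed hash tables, one multiplier prime per order, entropy-gated mixing); the prime values are illustrative and the PR's actual implementation is not shown.

```python
from collections import defaultdict

# One multiplier prime per order (the PR adds 6 primes for the new orders
# 10-12; these specific values are illustrative, not from the PR).
PRIMES = [1000003, 1000033, 1000037, 1000039, 1000081, 1000099,
          1000117, 1000121, 1000133, 1000151, 1000159, 1000171]

class HashedNgramBackoff:
    """Count-based n-gram model with hashed contexts and backoff."""

    def __init__(self, max_order=12, num_buckets=1 << 20):
        self.max_order = max_order
        self.num_buckets = num_buckets
        # counts[order][context_bucket][next_token] -> count (index 0 unused)
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def _bucket(self, context, order):
        h = 0
        for tok in context[-order:]:
            h = (h * PRIMES[order - 1] + tok) % self.num_buckets
        return h

    def update(self, context, next_token):
        # Record the observation under every order up to max_order.
        for order in range(1, min(self.max_order, len(context)) + 1):
            self.counts[order][self._bucket(context, order)][next_token] += 1

    def prob(self, context, next_token, vocab_size):
        # Back off from the longest context with any counts.
        for order in range(min(self.max_order, len(context)), 0, -1):
            bucket = self.counts[order][self._bucket(context, order)]
            total = sum(bucket.values())
            if total > 0:
                return bucket[next_token] / total
        return 1.0 / vocab_size  # uniform fallback

def mix_prob(p_model, p_ngram, model_entropy_bits, alpha_max=0.70):
    # Lean on the n-gram distribution more when the model is uncertain;
    # the exact gating schedule is an assumption, only the alpha_max=0.70
    # cap comes from the PR.
    alpha = alpha_max * min(1.0, model_entropy_bits / 8.0)
    return (1 - alpha) * p_model + alpha * p_ngram
```

The backoff order here controls how long a match is attempted before falling through; raising it from 9 to 12 only helps when the eval stream actually contains long repeated spans, which the chunked cache refresh (below in the commits) makes more likely.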

val_bpb: 0.2834 (2-seed mean, std 0.0001) | ~13.4 MB artifact | 525s training + 431s eval

| Seed | Pre-Quant BPB | N-gram BPB |
|------|---------------|------------|
| 1337 | 1.1454        | 0.2835     |
| 42   | 1.1454        | 0.2833     |
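For reference, bits-per-byte is the total negative log-likelihood converted from nats to bits and normalized by the raw byte count of the eval data. A one-liner (hypothetical helper, not from the PR):

```python
import math

def bits_per_byte(token_nll_nats, n_bytes):
    """Sum of per-token NLL in nats -> bits, divided by byte count."""
    return sum(token_nll_nats) / (math.log(2) * n_bytes)
```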

Improvement over PR #809 (0.2952 BPB): -0.0118 BPB

All changes are eval-time only. No training modifications. Score-first compliance maintained.

Test plan

  • 2-seed validation on 8xL20Z (H100 equivalent)
  • Artifact size under 16MB (13.4MB)
  • Training under 600s (525s)
  • Eval under 600s (437s total)
  • Score-first compliance verified

🤖 Generated with Claude Code

quietsmile and others added 2 commits March 26, 2026 11:02
Key innovation: reduce NGRAM_EVAL_CHUNK_TOKENS from 1M to 65K.
The N-gram cache updates after each chunk, so smaller chunks mean
more frequent cache refreshes and richer n-gram statistics.
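The score-then-refresh loop described above can be sketched as follows; the constant name `NGRAM_EVAL_CHUNK_TOKENS` comes from the commit message, while the helper names and structure are assumptions.

```python
NGRAM_EVAL_CHUNK_TOKENS = 256 * 1024  # reduced from 1M in this PR

def chunked_eval(tokens, score_fn, update_cache_fn,
                 chunk_tokens=NGRAM_EVAL_CHUNK_TOKENS):
    """Score tokens chunk by chunk. Each chunk is scored against a cache
    built only from earlier chunks (score-first), then folded into the
    cache, so smaller chunks mean fresher n-gram statistics."""
    total_nll_bits, n = 0.0, 0
    for start in range(0, len(tokens), chunk_tokens):
        chunk = tokens[start:start + chunk_tokens]
        total_nll_bits += score_fn(chunk)   # uses the current cache only
        n += len(chunk)
        update_cache_fn(chunk)              # refresh cache after scoring
    return total_nll_bits / n  # mean bits per token
```

The ordering is what keeps this score-first legal: a chunk never contributes to the statistics used to score itself.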

Results (3-seed mean): 0.2873 BPB (std 0.0001)
Fully legal: no pre-eval TTT, score-first N-gram only.
11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)²,
BigramHash(4096), GPTQ int5, LZMA. 600s train + 405s eval.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extended eval-time n-gram backoff from order 9 to order 12, reduced chunk size
from 1M to 256K tokens for faster cache refresh, and increased alpha_max from
0.60 to 0.70. Two-seed validation: 0.2835 (seed=1337), 0.2833 (seed=42).
Improvement over PR openai#809 baseline: -0.0118 BPB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
