
Record: 0.3212 BPB — Complementary N-gram 65K + Int5 GPTQ + LoRA TTT #850

Open
callithyia wants to merge 1 commit into openai:main from callithyia:record/complementary-ngram-65k-0.3212

Conversation

@callithyia

Summary

  • val_bpb: 0.3212 (3-seed mean, std 0.0003)
  • Complementary training (alpha=0.50) + order-9 n-gram eval cache with 65K-token chunks (cache refreshes 15x more often than with 1M chunks)
  • Full Hessian GPTQ int5 + LZMA compression (~14.9 MB artifact)
  • LoRA TTT (rank 8, Polyak averaging, score-first backward-looking)
  • LeakyReLU(0.9)² + XSA-4 + VRL + Gated Attention + Parallel Muon
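
The ~14.9 MB artifact budget comes from packing the int5-quantized weights into a dense 5-bit stream and compressing with LZMA. A minimal sketch of that packing step, assuming unsigned 5-bit codes (the `pack_int5`/`unpack_int5` helpers are illustrative, not the PR's actual code):

```python
import lzma
import numpy as np

def pack_int5(values: np.ndarray) -> bytes:
    """Pack unsigned 5-bit integers (0..31) into a dense bitstream."""
    # unpackbits is MSB-first, so columns 3: hold the low 5 bits of each byte
    bits = np.unpackbits(values.astype(np.uint8)[:, None], axis=1)[:, 3:]
    return np.packbits(bits.ravel()).tobytes()

def unpack_int5(data: bytes, n: int) -> np.ndarray:
    """Recover n 5-bit values from a packed bitstream."""
    bits = np.unpackbits(np.frombuffer(data, dtype=np.uint8))[: n * 5].reshape(n, 5)
    pad = np.zeros((n, 3), dtype=np.uint8)  # restore the 3 high zero bits
    return np.packbits(np.concatenate([pad, bits], axis=1), axis=1).ravel()

rng = np.random.default_rng(0)
q = rng.integers(0, 32, size=100_000, dtype=np.uint8)  # stand-in quantized weights
packed = pack_int5(q)                   # 5/8 of a byte per weight
artifact = lzma.compress(packed, preset=9)
assert np.array_equal(unpack_int5(packed, q.size), q)
```

Packing alone gives 0.625 bytes/weight; LZMA then squeezes out residual redundancy in the quantized codes, which is what keeps the artifact under the 16,000,000-byte cap.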

Results (8xH100 SXM)

| Seed | Steps | ms/step | val_bpb | Post-quant BPB | Artifact |
|------|-------|---------|---------|----------------|----------|
| 1337 | 5,457 | 101 | 0.3211 | 1.1817 | 14,965,401 bytes |
| 42 | 5,437 | 101 | 0.3210 | 1.1794 | 14,926,117 bytes |
| 2024 | 5,498 | 101 | 0.3216 | 1.1831 | 14,874,853 bytes |
| Mean | 5,464 | 101 | 0.3212 | 1.1814 | 14,922,124 bytes |

Key Techniques

  • Complementary training: Downweights bigram-predictable tokens, making the model deliberately weaker where n-grams are strong
  • 65K-token chunks: Cache updates 15x more frequently than 1M chunks, reducing cold-cache penalty
  • Per-order entropy centers + multipliers: Orders 5-9 boosted 2x, orders 2-3 suppressed 0.3x
  • Full Hessian GPTQ: Activation-order column permutation + Cholesky error compensation (not naive quantization)
  • LoRA TTT: Rank 8, Q+V on blocks 9-10, Polyak averaging decay=0.998
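
One plausible form of the complementary-training weighting above, sketched with alpha=0.50: scale each token's loss down in proportion to how confidently a bigram model already predicts it. (The exact rule lives in PR #803; this linear form is an assumption for illustration.)

```python
import numpy as np

def complementary_weights(bigram_probs: np.ndarray, alpha: float = 0.50) -> np.ndarray:
    """Per-token loss weights that downweight bigram-predictable tokens.

    bigram_probs[t] = p_bigram(x_t | x_{t-1}); the weight shrinks toward
    (1 - alpha) as the bigram becomes certain. Assumed linear form --
    the PR's actual rule may differ.
    """
    return 1.0 - alpha * bigram_probs

# A bigram-blind token keeps full weight; a fully predicted one drops to 0.5.
probs = np.array([0.0, 0.5, 1.0])
weights = complementary_weights(probs)  # -> [1.0, 0.75, 0.5]
```

Training then minimizes the weighted cross-entropy `mean(weights * nll)`, steering model capacity toward tokens where n-grams are weak, since the order-9 eval cache covers the predictable ones at test time.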

Compliance

  • 3 seeds on 8xH100 SXM (1337, 42, 2024)
  • All seeds train ≤600s, eval ≤600s (~570s)
  • Artifact ≤16,000,000 bytes (~14.9MB)
  • No validation data during training
  • TTT backward-looking (score-first per chunk)
  • No multi-pass rescoring
  • Reproducible single script
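
The "TTT backward-looking (score-first per chunk)" constraint means each 65K chunk must be scored before any gradient step touches it, so adaptation only ever uses past chunks. A minimal sketch of that loop, with a scalar standing in for the LoRA parameters and the Polyak average (decay=0.998 in the PR) used for scoring (illustrative, not the PR's code):

```python
def ttt_score_first(chunks, score_fn, update_fn, decay=0.998):
    """Backward-looking test-time training loop.

    Chunk i is scored with parameters adapted only on chunks 0..i-1,
    then the adapter is updated on chunk i. `theta` stands in for the
    raw LoRA params; `avg` is the Polyak average used for scoring.
    """
    theta = avg = 0.0
    losses = []
    for chunk in chunks:
        losses.append(score_fn(avg, chunk))        # score first: no look-ahead
        theta = update_fn(theta, chunk)            # then adapt on this chunk
        avg = decay * avg + (1.0 - decay) * theta  # Polyak averaging
    return losses

losses = ttt_score_first([1.0, 2.0, 3.0],
                         score_fn=lambda avg, c: avg,
                         update_fn=lambda t, c: t + c,
                         decay=0.5)
# losses[0] is produced with the unadapted parameters, as compliance requires
```

Because scoring happens strictly before the update, there is no multi-pass rescoring and no leakage of a chunk into its own score.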

Credits

Built on: PR #809 (n-gram cache), PR #803 (complementary training), PR #798 (entropy centers, Polyak TTT), PR #840 (65K chunks), PR #779 (integrated eval), PR #414 (GPTQ baseline).

3-seed mean 0.3212 (std 0.0003). Complementary training + order-9
n-gram eval cache with 65K-token chunks + Full Hessian GPTQ int5 +
LoRA TTT with Polyak averaging.