
Int6 GPTQ-lite + LeakyReLU(0.5)^2 + EMA + 11L MLP3x #805

Open
zeytx wants to merge 1 commit into openai:main from zeytx:submission/int6-leakyrelu2-ema-11l-mlp3x

Conversation


@zeytx zeytx commented Mar 26, 2026

Summary

  • 11 transformer layers (vs 9 baseline) with 3x MLP expansion
  • LeakyReLU(0.5)^2 activation (~0.003 BPB improvement over ReLU^2)
  • Int6 per-row GPTQ-lite quantization with clip search + zstd-22 compression
  • Late QAT via STE (activates when LR scale < 0.15)
  • EMA weight averaging (decay 0.997)
  • GQA: 8 query heads, 4 KV heads
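A minimal, framework-free sketch of the LeakyReLU(0.5)^2 activation from the summary, assuming the square is applied elementwise to the LeakyReLU output (the function name is illustrative; a real model would apply this to tensors):

```python
def leaky_relu_sq(x, slope=0.5):
    """LeakyReLU(slope) followed by an elementwise square.

    Illustrative stand-in for the activation described above;
    operates on a single float for clarity.
    """
    y = x if x > 0 else slope * x
    return y * y

# Unlike ReLU^2, negative inputs still contribute gradient:
# leaky_relu_sq(2.0) -> 4.0, leaky_relu_sq(-2.0) -> 1.0
```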

Results

  • val_bpb: 1.1807 (pre-quantization, 8xH100 SXM, 9881 steps in 10 min)
  • Compressed model size: ~3.9 MB (well under the 16 MB limit)
  • 26.5M parameters

Run command

torchrun --standalone --nproc_per_node=8 train_gpt.py

Requires the zstandard package (pip install zstandard).
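A sketch of per-row symmetric int6 quantization with a simple clip search. This is illustrative only: the clip grid, rounding, and squared-error metric are assumptions, not the PR's exact GPTQ-lite procedure, and the packed codes would subsequently be compressed with zstd at level 22 (e.g. zstandard.ZstdCompressor(level=22)):

```python
def quantize_row_int6(row, clip_grid=(1.0, 0.95, 0.9, 0.85, 0.8)):
    """Per-row symmetric int6 quantization with clip search.

    Signed int6 spans [-32, 31]; we use the symmetric part [-31, 31].
    For each candidate clip ratio, quantize the row and keep the scale
    that minimizes squared reconstruction error.
    """
    amax = max(abs(v) for v in row) or 1.0  # avoid div-by-zero on all-zero rows
    best = None
    for r in clip_grid:
        scale = (r * amax) / 31.0
        q = [max(-31, min(31, round(v / scale))) for v in row]
        err = sum((v - qi * scale) ** 2 for v, qi in zip(row, q))
        if best is None or err < best[0]:
            best = (err, q, scale)
    _, q, scale = best
    return q, scale

def dequantize_row(q, scale):
    """Reconstruct approximate float values from int6 codes."""
    return [qi * scale for qi in q]
```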

11 layers, 3x MLP, LeakyReLU(0.5)^2, Int6 per-row GPTQ-lite + zstd-22,
Late QAT via STE, EMA(0.997). Achieved val_bpb 1.1807 pre-quant on 8xH100.
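The EMA weight averaging (decay 0.997) mentioned above can be sketched as follows; a dict of float lists is an illustrative stand-in for named parameter tensors in a real training loop:

```python
def ema_update(ema_weights, model_weights, decay=0.997):
    """In-place exponential moving average of model weights.

    ema <- decay * ema + (1 - decay) * model, applied per parameter.
    Sketch only: real training code would update torch tensors under
    no_grad and evaluate with ema_weights instead of the raw weights.
    """
    for name, w in model_weights.items():
        ema_weights[name] = [decay * e + (1.0 - decay) * v
                             for e, v in zip(ema_weights[name], w)]
```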
