
Int6 GPTQ-lite + LeakyReLU(0.5)^2 + EMA + 11L MLP3x #805

Open
zeytx wants to merge 1 commit into openai:main from zeytx:submission/int6-leakyrelu2-ema-11l-mlp3x

Conversation


@zeytx zeytx commented Mar 26, 2026

Summary

  • 11 transformer layers (vs 9 baseline) with 3x MLP expansion
  • LeakyReLU(0.5)^2 activation (~0.003 BPB improvement over ReLU^2)
  • Int6 per-row GPTQ-lite quantization with clip search + zstd-22 compression
  • Late QAT via STE (activates when LR scale < 0.15)
  • EMA weight averaging (decay 0.997)
  • GQA: 8 query heads, 4 KV heads
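A minimal, framework-free sketch of the LeakyReLU(0.5)^2 activation from the summary, assuming the square is applied elementwise to the LeakyReLU output (the function name is illustrative; a real model would apply this to tensors):

```python
def leaky_relu_sq(x, slope=0.5):
    """LeakyReLU(slope) followed by an elementwise square.

    Illustrative stand-in for the activation described above;
    operates on a single float for clarity.
    """
    y = x if x > 0 else slope * x
    return y * y

# Unlike ReLU^2, negative inputs still contribute gradient:
# leaky_relu_sq(2.0) -> 4.0, leaky_relu_sq(-2.0) -> 1.0
```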

Results

  • val_bpb: 1.1807 (pre-quantization, 8xH100 SXM, 9881 steps in 10 min)
  • Compressed model size: ~3.9 MB (well under the 16 MB limit)
  • 26.5M parameters

Run command

torchrun --standalone --nproc_per_node=8 train_gpt.py

Requires the zstandard package (pip install zstandard).
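A sketch of per-row symmetric int6 quantization with a simple clip search. This is illustrative only: the clip grid, rounding, and squared-error metric are assumptions, not the PR's exact GPTQ-lite procedure, and the packed codes would subsequently be compressed with zstd at level 22 (e.g. zstandard.ZstdCompressor(level=22)):

```python
def quantize_row_int6(row, clip_grid=(1.0, 0.95, 0.9, 0.85, 0.8)):
    """Per-row symmetric int6 quantization with clip search.

    Signed int6 spans [-32, 31]; we use the symmetric part [-31, 31].
    For each candidate clip ratio, quantize the row and keep the scale
    that minimizes squared reconstruction error.
    """
    amax = max(abs(v) for v in row) or 1.0  # avoid div-by-zero on all-zero rows
    best = None
    for r in clip_grid:
        scale = (r * amax) / 31.0
        q = [max(-31, min(31, round(v / scale))) for v in row]
        err = sum((v - qi * scale) ** 2 for v, qi in zip(row, q))
        if best is None or err < best[0]:
            best = (err, q, scale)
    _, q, scale = best
    return q, scale

def dequantize_row(q, scale):
    """Reconstruct approximate float values from int6 codes."""
    return [qi * scale for qi in q]
```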

11 layers, 3x MLP, LeakyReLU(0.5)^2, Int6 per-row GPTQ-lite + zstd-22,
Late QAT via STE, EMA(0.997). Achieved val_bpb 1.1807 pre-quant on 8xH100.
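The EMA weight averaging (decay 0.997) mentioned above can be sketched as follows; a dict of float lists is an illustrative stand-in for named parameter tensors in a real training loop:

```python
def ema_update(ema_weights, model_weights, decay=0.997):
    """In-place exponential moving average of model weights.

    ema <- decay * ema + (1 - decay) * model, applied per parameter.
    Sketch only: real training code would update torch tensors under
    no_grad and evaluate with ema_weights instead of the raw weights.
    """
    for name, w in model_weights.items():
        ema_weights[name] = [decay * e + (1.0 - decay) * v
                             for e, v in zip(ema_weights[name], w)]
```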
