Non-record: LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer by zlxi02 · Pull Request #830 · openai/parameter-golf

zlxi02 · 2026-03-26T07:55:54Z

Swapped relu^2 for leaky_relu(0.5)^2, bumped the model to 11 layers, and added a 3500-step warmdown. Also wrote a backoff n-gram mixer in C that runs at eval time: it builds a token cache as it scores the val set and mixes the cache's predictions with the neural logits using an entropy-adaptive alpha.
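For readers unfamiliar with the activation swap, here is a minimal scalar sketch. It assumes "leaky_relu(0.5)^2" means a leaky ReLU with negative slope 0.5 followed by squaring; the function names are illustrative and not taken from the submission, which presumably applies the same thing elementwise to tensors.

```python
def relu_sq(x: float) -> float:
    """Baseline activation: relu(x)^2 (zero output and zero gradient for x < 0)."""
    return max(x, 0.0) ** 2


def leaky_relu_sq(x: float, negative_slope: float = 0.5) -> float:
    """leaky_relu(x)^2 with the slope assumed to be 0.5 on the negative side."""
    y = x if x >= 0.0 else negative_slope * x
    return y * y


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 1.5):
        print(x, relu_sq(x), leaky_relu_sq(x))
```

Under this reading, a negative input x maps to 0.25 * x^2 rather than 0, so negative pre-activations still pass gradient.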
Ran on 4xA100 SXM (was low on credits), so training only reached 849 steps. val_bpb came out to 1.4096 neural-only. On 8xH100 this is projected to reach ~1700 steps and around 1.20 BPB, with the n-gram mixer expected to lower it further.
artifact size: 13.49 MB
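The eval-time mixer described above can be sketched roughly as follows. This is a Python illustration, not the C code in the submission, and several details are assumptions: the class name `BackoffNgramMixer`, the backoff rule (use the longest context already in the cache), mixing in probability space, and the rule alpha = alpha_max * H / log(V), where H is the entropy of the neural distribution and V the vocabulary size. The one property taken directly from the description is that the cache is built while scoring; here each token is scored before it is added, so the cache never sees the token it is predicting.

```python
import math
from collections import defaultdict


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class BackoffNgramMixer:
    """Hypothetical score-then-update n-gram cache mixed with neural probabilities."""

    def __init__(self, vocab_size, max_order=3, alpha_max=0.5):
        self.vocab_size = vocab_size
        self.max_order = max_order      # 3 means contexts of up to 2 tokens
        self.alpha_max = alpha_max      # assumed cap on the n-gram weight
        # counts[context tuple][next token] -> occurrences seen so far
        self.counts = defaultdict(lambda: defaultdict(int))
        self.history = []

    def _ngram_probs(self):
        # Back off from the longest context to shorter ones; None if nothing matches.
        for n in range(self.max_order - 1, 0, -1):
            if len(self.history) < n:
                continue
            seen = self.counts.get(tuple(self.history[-n:]))
            if seen:
                total = sum(seen.values())
                return [seen.get(t, 0) / total for t in range(self.vocab_size)]
        return None

    def _update(self, token):
        # Record the token under every context length, then extend the history.
        for n in range(1, self.max_order):
            if len(self.history) >= n:
                self.counts[tuple(self.history[-n:])][token] += 1
        self.history.append(token)

    def score(self, logits, target):
        """Return the mixed probability of `target`, then add it to the cache."""
        p_nn = softmax(logits)
        p_ng = self._ngram_probs()
        if p_ng is None:
            p = p_nn[target]            # no cache hit: neural-only
        else:
            entropy = -sum(q * math.log(q) for q in p_nn if q > 0.0)
            # Assumed rule: trust the cache more when the model is uncertain.
            alpha = self.alpha_max * entropy / math.log(self.vocab_size)
            p = (1.0 - alpha) * p_nn[target] + alpha * p_ng[target]
        self._update(target)
        return p


def demo():
    mixer = BackoffNgramMixer(vocab_size=4)
    uniform_logits = [0.0, 0.0, 0.0, 0.0]
    return [mixer.score(uniform_logits, t) for t in [0, 1, 0, 1, 0, 1]]


if __name__ == "__main__":
    print(demo())
```

In the toy `demo`, the neural model is uniform over four tokens, so the first three tokens score 0.25 each; once the alternating pattern is in the cache, the remaining tokens score about 0.625. This shows only the mechanism and says nothing about the gains on the real val set.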

add LeakyMixer submission — 11L leaky_relu(0.5)^2 + n-gram mixer

64da311

zlxi02 changed the title ~~LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer~~ Non-record: LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer Mar 26, 2026

zlxi02 wants to merge 1 commit into openai:main from zlxi02:leaky-mixer-submission
