Non-record: LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer #830

Open
zlxi02 wants to merge 1 commit into openai:main from zlxi02:leaky-mixer-submission


zlxi02 commented Mar 26, 2026

  • Swapped relu^2 for leaky_relu(0.5)^2, bumped the model to 11 layers, and added a 3500-step warmdown. Also wrote a backoff n-gram mixer in C that runs at eval time: it builds a token cache as it scores the val set and mixes the cache's predictions with the neural logits using an entropy-adaptive alpha.
  • Ran on 4xA100 SXM (was low on credits), so training only got through 849 steps. val_bpb came out to 1.4096 neural-only. On 8xH100 I'd expect ~1700 steps and around 1.20 BPB, with the n-gram mixer dropping it further; that is a projection, not a measured result.
  • Artifact size: 13.49 MB
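
For readers who want a concrete picture of the two changes, here is a minimal pure-Python sketch. It is not the code in this PR (the mixer here is written in C and the model code is not shown). It assumes that leaky_relu(0.5)^2 means the square of a leaky ReLU with negative slope 0.5, and that "entropy-adaptive alpha" means the n-gram weight grows with the normalized entropy of the neural distribution. The names `leaky_relu_sq`, `BackoffNgramCache`, `mix`, `score`, `max_order`, and `alpha_max`, the add-one smoothing, and the linear alpha rule are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def leaky_relu_sq(x, slope=0.5):
    """Square of leaky ReLU for a scalar x (assumed reading of leaky_relu(0.5)^2)."""
    y = x if x >= 0.0 else slope * x
    return y * y


class BackoffNgramCache:
    """Token cache that backs off from the longest seen context to shorter ones."""

    def __init__(self, vocab_size, max_order=3):
        self.vocab_size = vocab_size
        self.max_order = max_order
        # counts[k][context][token], where context is a tuple of k previous tokens
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order)]

    def predict(self, history):
        """Return a smoothed distribution, or None if nothing has been cached yet."""
        longest = min(self.max_order - 1, len(history))
        for k in range(longest, -1, -1):
            ctx = tuple(history[len(history) - k:])
            seen = self.counts[k].get(ctx)
            if seen:
                total = sum(seen.values())
                # Add-one smoothing (an assumption) keeps every probability positive.
                return [(seen.get(t, 0) + 1) / (total + self.vocab_size)
                        for t in range(self.vocab_size)]
        return None

    def update(self, history, token):
        """Record token under every context length up to max_order - 1."""
        longest = min(self.max_order - 1, len(history))
        for k in range(longest + 1):
            ctx = tuple(history[len(history) - k:])
            self.counts[k][ctx][token] += 1


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mix(neural_logits, ngram_probs, alpha_max=0.5):
    """Blend neural and n-gram distributions; alpha rises with neural entropy."""
    p = softmax(neural_logits)
    if ngram_probs is None:
        return p
    # Assumed rule: alpha = alpha_max * H(p) / log(V), so a confident model is trusted more.
    alpha = alpha_max * entropy(p) / math.log(len(p))
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(p, ngram_probs)]


def score(tokens, logits_per_pos, vocab_size):
    """Mean negative log-likelihood (nats per token), predicting before updating."""
    cache = BackoffNgramCache(vocab_size)
    nll = 0.0
    for i, tok in enumerate(tokens):
        probs = mix(logits_per_pos[i], cache.predict(tokens[:i]))
        nll -= math.log(probs[tok])
        cache.update(tokens[:i], tok)  # cache only ever contains already-scored tokens
    return nll / len(tokens)
```

The ordering inside `score` is the important design point: each token is scored first and only then added to the cache, so the mixer never sees the token it is predicting.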

zlxi02 changed the title from "LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer" to "Non-record: LeakyMixer: 11L leaky_relu(0.5)^2 + backoff n-gram mixer" on Mar 26, 2026