
Add non-record 16MB submission: MOEA outer-loop proxy F2 on 1xRTX 3090 #823

Open

ai-wes wants to merge 1 commit into openai:main from ai-wes:wes/moea-outerloop-proxy-nonrecord

Conversation


ai-wes commented Mar 26, 2026

Summary

This PR adds a non-record 16MB submission documenting the first artifact handoff from an offline MOEA (multi-objective
evolutionary algorithm) outer-loop search workflow into a runnable parameter-golf submission folder.

This is not a leaderboard attempt. It is a single-GPU proxy run intended to validate the search-to-artifact workflow and provide
a concrete non-record reference point for future 8xH100 campaigns.

What is included

  • README.md
  • submission.json
  • train.log
  • train_gpt.py

Notes

  • Offline MOEA was used to search over architecture, training, and systems choices.
  • The optimizer itself is not part of the final artifact.
  • This run uses a reduced validation window (VAL_TOKEN_LIMIT=1048576) and disables compilation (ENABLE_COMPILE=0), so the
    reported metric is explicitly a proxy-tier, non-record result; a sketch of these flags follows this list.
  • No test-time training is used in this submission.
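
For reference, here is a minimal sketch of how such environment switches are typically consumed. The flag names and values (VAL_TOKEN_LIMIT=1048576, ENABLE_COMPILE=0) come from this PR; the parsing code itself is an assumption, not the actual train_gpt.py source.

```python
import os

import torch
import torch.nn as nn

# Sketch only: the flag names and values come from this PR's notes; how
# train_gpt.py actually reads them is an assumption.
VAL_TOKEN_LIMIT = int(os.environ.get("VAL_TOKEN_LIMIT", "1048576"))  # cap validation at 2**20 tokens
ENABLE_COMPILE = os.environ.get("ENABLE_COMPILE", "1") == "1"        # "0" disables torch.compile

model = nn.Linear(8, 8)  # stand-in for the real GPT model
if ENABLE_COMPILE:
    model = torch.compile(model)  # full-length runs would compile; the proxy run skips this
```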

@MatoTeziTanka

Community Review — Add non-record 16MB submission: MOEA outer-loop proxy F2 on 1xRTX 3090

Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

---

## Analysis

### N-gram / BigramHash family bug check

No n-gram, bigram, hash, or XOR constructs anywhere in train_gpt.py. The model is a standard GQA transformer (GPT class, lines 649–725) with RMSNorm, RoPE, relu^2 MLP, logit softcap, and tied embeddings. No hash-based memorization or target-XOR key contamination. CLEAN.

### ILLEGAL Pre-Quant TTT (multi-epoch on val_tokens without score-first)

The README explicitly states: "No test-time training is used in this submission." Confirmed by code inspection: eval_val (lines 220–279) runs in torch.inference_mode() with no gradient updates, and val_tokens is never passed to the optimizer. The training loop (lines 981–1065) reads only from train_loader (a DistributedTokenLoader backed by train shards). No gradients are computed on val_tokens at any point. NO TTT OF ANY KIND.

### LEGAL score-first TTT (PR #1413 pattern, is_last_chunk guard)

Not applicable; no TTT is present.

### HOLD scored-region SLOT

The submission is filed under track_non_record_16mb with a non-record proxy track designation, and the README is explicit that this is not a leaderboard attempt. submission.json sets "track": "non-record-proxy-16mb". No scored-region slot is claimed. NOT APPLICABLE.

### Pure neural check

The model is a vanilla GQA transformer trained entirely on FineWeb train shards: no external data, no retrieval, no symbolic components, no n-gram augmentation. The MOEA outer loop referenced in the folder name was an offline architecture-search tool and is not part of the submitted artifact (README: "The optimizer itself is not part of the artifact"). The submitted train_gpt.py (1,139 lines) is self-contained pure neural training. CLEAN.

### Notes

- Uses VAL_TOKEN_LIMIT=1048576 (reduced...
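
To make the architecture claims above concrete, here is an illustrative sketch of two of the named components, the relu^2 MLP and logit softcapping. Dimensions and the softcap constant are assumptions; only the component names come from the audited train_gpt.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReluSquaredMLP(nn.Module):
    """relu^2 MLP: down(relu(up(x))^2). The hidden width is illustrative."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.relu(self.up(x)).square())

def softcap_logits(logits: torch.Tensor, cap: float = 30.0) -> torch.Tensor:
    # Smoothly bounds logits to (-cap, cap); the constant 30.0 is illustrative.
    return cap * torch.tanh(logits / cap)
```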
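And a minimal sketch of the score-only evaluation pattern the TTT check describes: running under torch.inference_mode() makes gradient updates from val_tokens impossible. Function and variable names are illustrative, not the actual eval_val code.

```python
import torch

@torch.inference_mode()
def eval_val(model, val_batches, val_token_limit):
    # Forward passes only: inference_mode disallows gradient tracking, so no
    # update on val_tokens is possible. Names are illustrative; the real
    # eval_val signature may differ.
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for inputs, targets in val_batches:
        if total_tokens >= val_token_limit:
            break  # honors the VAL_TOKEN_LIMIT cap from the proxy configuration
        loss = model(inputs, targets)  # assumed to return mean cross-entropy
        total_loss += loss.item() * targets.numel()
        total_tokens += targets.numel()
    return total_loss / max(total_tokens, 1)
```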

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the audit — this looks like a clean pure-neural submission.


Reviewed by @MatoTeziTanka (The Agora). Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.

