Dev/molt feature dashboard #23

Open
Ky-Ng wants to merge 4 commits into master from dev/MOLT-feature-dashboard

Ky-Ng (Collaborator) commented May 6, 2026

No description provided.

KyleNg2868 and others added 4 commits April 29, 2026 07:05
Adds the Molt model class, MoltModule Lightning wrapper, and the
necessary plumbing to support a single-layer MoLT alongside the existing
CrossLayerTranscoder. Includes:

- crosslayer_transcoder/model/molt.py: new Molt nn.Module
- model/__init__.py: export Molt
- model/jumprelu.py: allow n_layers=1 to produce a 2-D theta parameter
- model/clt_lightning.py: import Molt, widen model type to
  Union[CrossLayerTranscoder, Molt], wrap the encoder/decoder
  assertions and last_active buffer in an isinstance check, and append
  the MoltModule subclass with its own training_step
- data/datamodule.py: guard self.data_loader teardown with is not None

Known limitations (follow-ups):
- MoltModule.training_step is hardcoded to layer 8
- compute_dead_features config flags are inert for MoLT
- Molt does not yet inherit from SerializableModule / save_pretrained

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
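
The isinstance guard described for model/clt_lightning.py can be sketched roughly as follows. This is a minimal illustration, not the PR's code: both classes are stand-ins, ModuleSketch is a hypothetical name, and the sketch assumes the guard tests for CrossLayerTranscoder and that the Molt model has no encoder/decoder attributes.

```python
from typing import Union


class CrossLayerTranscoder:
    """Stand-in for the real class; only the attributes the guard touches."""

    def __init__(self, n_layers: int, d_features: int):
        self.encoder = object()
        self.decoder = object()
        self.n_layers = n_layers
        self.d_features = d_features


class Molt:
    """Stand-in; assumed here to expose no encoder/decoder attributes."""


class ModuleSketch:
    """Hypothetical wrapper that accepts either model type."""

    def __init__(self, model: Union[CrossLayerTranscoder, Molt]):
        self.model = model
        self.last_active = None
        # CLT-specific assertions and state run only for the CLT model,
        # so a Molt instance passes through without tripping them.
        if isinstance(model, CrossLayerTranscoder):
            assert hasattr(model, "encoder") and hasattr(model, "decoder")
            self.last_active = [
                [0] * model.d_features for _ in range(model.n_layers)
            ]


clt_module = ModuleSketch(CrossLayerTranscoder(n_layers=2, d_features=3))
molt_module = ModuleSketch(Molt())
```

With last_active left unset for Molt, dead-feature tracking has nothing to read, which would be consistent with the listed limitation that the compute_dead_features flags are inert for MoLT.
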
Eight YAML configs for the Lightning CLI:
- config/molt.yaml, config/molt-long.yaml: baseline
- config/molt-5090.yaml: tuned for a 5090 with 31 GB /dev/shm
- config/molt-5090_20M_tokens_*.yaml: sparsity sweep at 20M tokens
- config/molt-5090_50M_tokens_0_00015.yaml: 50M-token run

class_path entries point at the master package (crosslayer_transcoder.*).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
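
For readers unfamiliar with class_path entries: the Lightning CLI turns a dotted path plus init_args into an object. The sketch below shows only the general mechanism using the standard library; resolve_class_path and instantiate are hypothetical helpers, not the CLI's implementation, and fractions.Fraction is used purely as an importable stand-in for a crosslayer_transcoder.* class.

```python
import importlib


def resolve_class_path(class_path: str):
    """Import the module part of a dotted path and return the named class."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'package.module.Class', got {class_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def instantiate(spec: dict):
    """Build an object from a {'class_path': ..., 'init_args': {...}} mapping."""
    cls = resolve_class_path(spec["class_path"])
    return cls(**spec.get("init_args", {}))


# Stand-in spec; a real config would name a crosslayer_transcoder.* class.
spec = {
    "class_path": "fractions.Fraction",
    "init_args": {"numerator": 3, "denominator": 4},
}
obj = instantiate(spec)
```

A class_path that does not match the installed package layout fails at import time, which is why the note about pointing at the master package matters.
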
tests/test_molt_smoke.py covers three cases:
- test_molt_cpu_forward: builds a tiny Molt, runs one forward pass,
  checks shapes and finiteness — no GPU, no Lightning, no dataset
- test_molt_gpu_fp32_train_step: forward + backward + Adam step on
  synthetic activations on cuda; asserts loss and params remain finite
- test_molt_gpu_amp_train_step: same, inside torch.amp.autocast(float16)
  with a GradScaler — mirrors Lightning's precision="16-mixed"

Both GPU tests are guarded with skipif(not cuda.is_available()), so they
silently skip on CPU CI runners while still exercising mixed-precision
locally.

Verified locally on RTX 5090: 3/3 tests pass; full suite is 210 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
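
The test layout described above might look roughly like this. It is a sketch under stated assumptions, not the contents of tests/test_molt_smoke.py: TinyTranscoder is a hypothetical stand-in for Molt, the loss is a plain MSE, and torch.amp.GradScaler("cuda") assumes a recent PyTorch release.

```python
import pytest
import torch
from torch import nn


class TinyTranscoder(nn.Module):
    """Hypothetical stand-in for Molt: one encoder and one decoder layer."""

    def __init__(self, d_model: int = 16, d_features: int = 32):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(torch.relu(self.encoder(x)))


def test_cpu_forward():
    # No GPU, no Lightning, no dataset: just shapes and finiteness.
    model = TinyTranscoder()
    x = torch.randn(4, 16)
    out = model(x)
    assert out.shape == x.shape
    assert torch.isfinite(out).all()


@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
def test_gpu_amp_train_step():
    model = TinyTranscoder().to("cuda")
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scaler = torch.amp.GradScaler("cuda")
    x = torch.randn(4, 16, device="cuda")

    # The forward pass runs under float16 autocast; the scaler guards
    # the backward pass against gradient underflow.
    with torch.amp.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.mse_loss(model(x), x)
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

    assert torch.isfinite(loss)
    assert all(torch.isfinite(p).all() for p in model.parameters())
```

Because the skipif condition is evaluated at collection time, a CPU-only runner reports the GPU test as skipped rather than failed.
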