Dev/molt feature dashboard by Ky-Ng · Pull Request #23 · Goreg12345/crosslayer-transcoder

Ky-Ng · 2026-05-06T13:39:11Z

No description provided.

Adds the Molt model class, MoltModule Lightning wrapper, and the necessary plumbing to support a single-layer MoLT alongside the existing CrossLayerTranscoder.

Includes:
- crosslayer_transcoder/model/molt.py: new Molt nn.Module
- model/__init__.py: export Molt
- model/jumprelu.py: allow n_layers=1 to produce a 2-D theta parameter
- model/clt_lightning.py: import Molt, widen model type to Union[CrossLayerTranscoder, Molt], wrap the encoder/decoder assertions and last_active buffer in an isinstance check, and append the MoltModule subclass with its own training_step
- data/datamodule.py: guard self.data_loader teardown with is not None

Known limitations (follow-ups):
- MoltModule.training_step is hardcoded to layer 8
- compute_dead_features config flags are inert for MoLT
- Molt does not yet inherit from SerializableModule / save_pretrained

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
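The data/datamodule.py item describes a common defensive pattern: teardown can run before setup has created the loader, so the cleanup is guarded with `is not None`. The following is a minimal sketch of that pattern only. `SketchDataModule`, `FakeLoader`, and `close()` are hypothetical stand-ins; the real datamodule and the cleanup call it makes are not shown in this PR text.

```python
class FakeLoader:
    """Hypothetical stand-in for whatever object self.data_loader holds."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Assumed cleanup hook; the real loader's API may differ.
        self.closed = True


class SketchDataModule:
    """Illustrative only, not the project's datamodule."""

    def __init__(self):
        # The loader does not exist until setup() runs.
        self.data_loader = None

    def setup(self):
        self.data_loader = FakeLoader()

    def teardown(self):
        # Guard: teardown may be called when setup() never ran,
        # in which case there is nothing to clean up.
        if self.data_loader is not None:
            self.data_loader.close()
            self.data_loader = None


dm = SketchDataModule()
dm.teardown()  # safe even though setup() was never called
```

Without the guard, the first `teardown()` call above would raise an `AttributeError` on `None`.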

Eight YAML configs for the Lightning CLI:
- config/molt.yaml, config/molt-long.yaml: baseline
- config/molt-5090.yaml: tuned for a 5090 with 31 GB /dev/shm
- config/molt-5090_20M_tokens_*.yaml: sparsity sweep at 20M tokens
- config/molt-5090_50M_tokens_0_00015.yaml: 50M-token run

class_path entries point at the master package (crosslayer_transcoder.*).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
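For readers unfamiliar with Lightning CLI configs: each component is named by a dotted `class_path` string plus optional `init_args`. The sketch below illustrates how such a dotted path can be resolved to a class and instantiated. It is not Lightning's own implementation, `resolve_class_path` and `instantiate` are hypothetical helper names, and a standard-library class stands in for the project's classes so the example runs anywhere.

```python
import importlib


def resolve_class_path(class_path: str):
    """Split 'package.module.ClassName' and import the named class."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"class_path must be dotted, got {class_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def instantiate(spec: dict):
    """Build an object from a {'class_path': ..., 'init_args': ...} mapping."""
    cls = resolve_class_path(spec["class_path"])
    return cls(**spec.get("init_args", {}))


# Stand-in spec: a stdlib class instead of a crosslayer_transcoder one.
spec = {
    "class_path": "fractions.Fraction",
    "init_args": {"numerator": 3, "denominator": 4},
}
obj = instantiate(spec)
print(obj)  # prints 3/4
```

A `class_path` that names a module or class missing from the installed package fails at import time, which is why the configs need to match the package layout on master.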

tests/test_molt_smoke.py covers three cases:
- test_molt_cpu_forward: builds a tiny Molt, runs one forward pass, checks shapes and finiteness (no GPU, no Lightning, no dataset)
- test_molt_gpu_fp32_train_step: forward + backward + Adam step on synthetic activations on cuda; asserts loss and params remain finite
- test_molt_gpu_amp_train_step: same, inside torch.amp.autocast(float16) with a GradScaler, mirroring Lightning's precision="16-mixed"

Both GPU tests are guarded with skipif(not cuda.is_available()), so they silently skip on CPU CI runners while still exercising mixed-precision locally.

Verified locally on RTX 5090: 3/3 tests pass; full suite is 210 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

KyleNg2868 and others added 4 commits April 29, 2026 07:05

support visualization of high activating transform contexts

e738d71

Ky-Ng wants to merge 4 commits into master from dev/MOLT-feature-dashboard


2 participants
