root@ly2026050700229-8474747479-rv525:~/TIDE# python examples/quickstart.py --model /root/Qwen3-8B
[TIDE INFO] CUDA kernels loaded via torch.ops.load_library
Loading /root/Qwen3-8B...
torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:09<00:00, 1.97s/it]
Calibrating routers (200 samples)...
[TIDE INFO] === TIDE Calibration ===
[TIDE INFO] Config: interval=4, threshold=0.98
[TIDE INFO] Step 1/3: Collecting hidden states...
[TIDE INFO] No registered adapter for 'Qwen3ForCausalLM', trying UniversalAdapter
[TIDE INFO] UniversalAdapter probed Qwen3ForCausalLM: 36 layers, hidden_dim=4096
README.md: 10.5kB [00:00, 3.40MB/s]
[TIDE INFO] Collected 200 calibration texts
[TIDE INFO] Collected hidden states at 9 checkpoints, 33671 total tokens
[TIDE INFO] Step 2/3: Computing convergence labels...
[TIDE INFO] Layer 3: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 7: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 11: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 15: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 19: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 23: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 27: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 31: 0.0% tokens converged (cosine > 0.98)
[TIDE INFO] Layer 35: 100.0% tokens converged (cosine > 0.98)
[TIDE INFO] Step 3/3: Training routers...
[TIDE INFO] Layer 3 epoch 25: loss=0.0001 acc=1.000
[TIDE INFO] Layer 3 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 3 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 3 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 3 final loss: 0.0000
[TIDE INFO] Layer 7 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 7 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 7 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 7 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 7 final loss: 0.0000
[TIDE INFO] Layer 11 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 11 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 11 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 11 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 11 final loss: 0.0000
[TIDE INFO] Layer 15 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 15 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 15 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 15 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 15 final loss: 0.0000
[TIDE INFO] Layer 19 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 19 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 19 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 19 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 19 final loss: 0.0000
[TIDE INFO] Layer 23 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 23 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 23 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 23 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 23 final loss: 0.0000
[TIDE INFO] Layer 27 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 27 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 27 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 27 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 27 final loss: 0.0000
[TIDE INFO] Layer 31 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 31 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 31 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 31 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 31 final loss: 0.0000
[TIDE INFO] Layer 35 epoch 25: loss=0.0000 acc=1.000
[TIDE INFO] Layer 35 epoch 50: loss=0.0000 acc=1.000
[TIDE INFO] Layer 35 epoch 75: loss=0.0000 acc=1.000
[TIDE INFO] Layer 35 epoch 100: loss=0.0000 acc=1.000
[TIDE INFO] Layer 35 final loss: 0.0000
[TIDE INFO] Saved router checkpoint to router.pt
Saved to router.pt
[TIDE INFO] No registered adapter for 'Qwen3ForCausalLM', trying UniversalAdapter
[TIDE INFO] UniversalAdapter probed Qwen3ForCausalLM: 36 layers, hidden_dim=4096
[TIDE INFO] TIDERuntime initialized: 36 layers, 9 routers, CUDA=on
[TIDE INFO] Generation: 128 tokens, 128 exits (100.0%), estimated 1.00x equivalent speedup
============================================================
Explain how transformers work in simple terms: the basic concept of the inductance process in the primary and the secondary coiled wire parts the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the process the
Total tokens: 128, Exited: 128 (100.0%)
Layer 35: 128 exits (100.0%)
Ran all layers: 0
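
Note on the numbers above: the sketch below is one way to read the "tokens converged (cosine > 0.98)" and "estimated 1.00x equivalent speedup" lines. It is not TIDE's code. It assumes a token counts as converged at a checkpoint when the cosine similarity between its hidden state there and its final-layer hidden state exceeds the threshold, and that the speedup estimate is total layers divided by the average number of layers actually run. The function names `cosine`, `converged_fraction`, and `estimated_speedup` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def converged_fraction(checkpoint_states, final_states, threshold=0.98):
    """Fraction of tokens whose checkpoint state is close to the final state.

    Assumption: "converged" means cosine(checkpoint, final) > threshold.
    Both arguments are lists with one hidden-state vector per token.
    """
    hits = sum(
        1
        for h, f in zip(checkpoint_states, final_states)
        if cosine(h, f) > threshold
    )
    return hits / len(final_states)


def estimated_speedup(exit_layers, num_layers):
    """Assumed estimate: layers a full run would use / layers actually run.

    exit_layers holds the 0-indexed layer at which each token exited,
    so a token exiting at layer k has run k + 1 layers.
    """
    layers_run = sum(layer + 1 for layer in exit_layers)
    return num_layers * len(exit_layers) / layers_run


if __name__ == "__main__":
    final = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
    early = [[3.0, -1.0, 0.0], [2.0, 1.0, 0.0]]
    print(converged_fraction(early, final))      # early states far from final
    print(converged_fraction(final, final))      # last layer compared with itself
    print(estimated_speedup([35] * 128, 36))     # every token exits at the last layer
```

Under these assumptions the last checkpoint (layer 35 of 36) is compared with itself, so it always reports full convergence, and exiting there skips no layers. That matches the log's pattern of 0.0% at every earlier checkpoint, 100.0% at layer 35, and a 1.00x estimate.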