Add Golden Gate Claude interactive tutorial and CLI demo #7
Conversation
Adds an interactive tutorial that teaches SAE feature steering by recreating Anthropic's famous Golden Gate Claude experiment at miniature scale using nanochat-SAE.

New files:
- golden_gate_demo.py: Standalone CLI walkthrough (~2 min on CPU)
- golden_gate_walkthrough.ipynb: Rich Colab notebook with visualizations (probability distributions, heatmaps, steering landscape plots)

Both walk through the full pipeline: build model -> collect activations -> train SAE -> find features -> steer behavior, with detailed educational explanations at every step. README updated with a new section and tutorial links.

https://claude.ai/code/session_01RuUhm1SYvQYE61feFRAc3e
Code Review
This pull request introduces a 'Golden Gate Claude' tutorial to the repository, featuring a new interactive Jupyter notebook (golden_gate_walkthrough.ipynb) and a CLI demo script (golden_gate_demo.py). These resources guide users through building a miniature transformer, training a Sparse Autoencoder (SAE), and performing feature steering to manipulate model behavior. Feedback on the demo script highlights a missing comparison in the feature landscape visualization step and suggests simplifying redundant tensor slicing for better readability.
```python
# We need to manually steer and track features
steering_config = {hook_point: (golden_feature, 10.0)}
with interp_model.steering_enabled(steering_config):
    interp_model.model(test_tokens)
```
The demo script for Step 6 claims to "compare" unsteered vs steered feature landscapes, but it currently only prints the unsteered landscape (lines 382-390). Line 401 runs the model with steering enabled, but the resulting feature activations are never retrieved or printed. To make this comparison meaningful for the user, you should enable interpretation alongside steering, capture the active features, and print them similarly to the unsteered section.
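A minimal sketch of the pattern the reviewer describes, using a toy module in place of the demo's real model (the `interp_model`, `steering_enabled`, and SAE encoding step are specific to the PR and not reproduced here): register a forward hook at the layer of interest, run the steered pass, and record which features fired so they can be printed next to the unsteered landscape.

```python
import torch
import torch.nn as nn

# Toy stand-in for the demo's transformer; in the real script the hook
# would sit at the SAE hook point inside interp_model (an assumption here).
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
hook_layer = model[1]

captured = {}

def capture_steered_features(module, inputs, output):
    # Record how many units are active; the demo would instead record the
    # SAE feature activations of the steered residual stream.
    captured["active"] = (output > 0).sum().item()

handle = hook_layer.register_forward_hook(capture_steered_features)
try:
    with torch.no_grad():
        model(torch.randn(1, 16))  # the steered forward pass goes here
finally:
    handle.remove()

print(f"{captured['active']} active features on the steered pass")
```

The same `captured` dict can then be printed with the formatting already used for the unsteered landscape, making the side-by-side comparison concrete.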
golden_gate_demo.py
```python
steered_probs = F.softmax(steered_logits[:, -1, :], dim=-1)
diff = (steered_logits - baseline_logits).abs()
top_tok = steered_probs[:, :].argmax(dim=-1).item()
```
The slicing [:, :] on steered_probs is redundant as the tensor is already 2D with shape (1, vocab_size). You can simplify this to steered_probs.argmax(dim=-1).item() or even steered_probs.argmax().item() for better readability, matching the style used later in the script (e.g., line 447).
Suggested change:
```diff
-top_tok = steered_probs[:, :].argmax(dim=-1).item()
+top_tok = steered_probs.argmax().item()
```
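A quick check of the reviewer's point: on a (1, vocab_size) tensor the `[:, :]` slice is a no-op, so all three argmax spellings pick the same token id (a 50-entry dummy distribution stands in for the real vocabulary).

```python
import torch
import torch.nn.functional as F

steered_probs = F.softmax(torch.randn(1, 50), dim=-1)  # shape (1, vocab_size)

a = steered_probs[:, :].argmax(dim=-1).item()  # redundant slice
b = steered_probs.argmax(dim=-1).item()        # same result without the slice
c = steered_probs.argmax().item()              # same result, simplest form
assert a == b == c
```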
Address code review feedback:
- Step 6 now captures and prints the steered feature landscape alongside the unsteered one, making the comparison meaningful for readers
- Simplify redundant `[:, :]` slice to `.argmax().item()`

https://claude.ai/code/session_01RuUhm1SYvQYE61feFRAc3e
/gemini review
Code Review
This pull request introduces a "Golden Gate Claude" tutorial, featuring a new CLI demo and a Jupyter notebook. These resources demonstrate how to use Sparse Autoencoders (SAEs) to identify and steer internal model features, recreating Anthropic's famous interpretability experiment at a smaller scale. The review feedback suggests optimizing the activation collection process by terminating the loop once the maximum activation limit is reached and ensuring robust hook management through the use of try...finally blocks to prevent persistent hooks in the event of an error.
```python
with torch.no_grad(), collector:
    for i in range(30):
        # Feed random "text" through the model
        tokens = torch.randint(0, config.vocab_size, (8, config.sequence_len))
        model(tokens)
```
The activation collection loop continues for all 30 iterations even after the ActivationCollector has reached its max_activations limit (15,000). Since each batch collects 1,024 activations (8 samples * 128 tokens), the collector will be full after approximately 15 batches. Breaking the loop early will save unnecessary computation.
Suggested change:
```diff
 with torch.no_grad(), collector:
     for i in range(30):
+        if collector.counts[hook_point] >= collector.max_activations:
+            break
         # Feed random "text" through the model
         tokens = torch.randint(0, config.vocab_size, (8, config.sequence_len))
         model(tokens)
```
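The arithmetic behind the suggestion can be sketched without the real `ActivationCollector` (its `counts`/`max_activations` attributes are taken from the review, not verified here): with 8 × 128 = 1,024 activations per batch and a 15,000 cap, the early break stops the loop after 15 batches instead of running all 30.

```python
# Minimal stand-in for the capped collection loop; CappedCollector is a
# hypothetical simplification of the repo's ActivationCollector.
class CappedCollector:
    def __init__(self, max_activations=15_000):
        self.max_activations = max_activations
        self.count = 0

    def add(self, n):
        # Clamp at the cap, like a fixed-size activation buffer.
        self.count = min(self.count + n, self.max_activations)

collector = CappedCollector()
batches_run = 0
for i in range(30):
    if collector.count >= collector.max_activations:
        break  # buffer is full; further batches are wasted compute
    collector.add(8 * 128)  # one batch: 8 samples x 128 tokens = 1,024
    batches_run += 1

print(batches_run)  # 15: ceil(15_000 / 1_024) batches fill the buffer
```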
```python
handle = module.register_forward_hook(capture_steered_features)
interp_model.model(test_tokens)
handle.remove()
```
The manual hook registration and removal should be wrapped in a try...finally block. This ensures that the hook is properly removed even if an exception occurs during the model forward pass, preventing the hook from persisting and affecting subsequent model executions.
Suggested change:
```diff
 handle = module.register_forward_hook(capture_steered_features)
-interp_model.model(test_tokens)
-handle.remove()
+try:
+    interp_model.model(test_tokens)
+finally:
+    handle.remove()
```
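A runnable illustration of why the try...finally matters, using a plain `nn.Linear` in place of the demo's model: if the forward pass raises while a hook is attached, only the `finally` guarantees the hook is detached before later calls.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 4)

def failing_hook(module, inputs, output):
    raise RuntimeError("boom")  # simulate an error during the forward pass

handle = model.register_forward_hook(failing_hook)
try:
    try:
        model(torch.randn(1, 4))
    finally:
        handle.remove()  # runs even though the hook raised
except RuntimeError:
    pass  # the error propagates, but the hook is already gone

# Subsequent forward passes work because the hook no longer fires.
out = model(torch.randn(1, 4))
print(out.shape)  # torch.Size([1, 4])
```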
Summary
This PR adds two comprehensive educational resources that teach users how to recreate Anthropic's famous "Golden Gate Claude" experiment using nanochat-SAE: an interactive Jupyter notebook for Google Colab and a standalone CLI walkthrough script.
Key Changes
- golden_gate_walkthrough.ipynb - Interactive Jupyter notebook (202 cells) that guides users through the full pipeline
- golden_gate_demo.py - Standalone Python script (535 lines) providing a CLI walkthrough
- README.md - Updated documentation with a new section and tutorial links

Implementation Details
Both resources follow the same pedagogical structure:
The notebook includes matplotlib visualizations of training loss, feature statistics, and probability distributions. The CLI script uses formatted text output with progress bars and ASCII art for engagement.
Both implementations use the existing nanochat and SAE infrastructure (GPT, TopKSAE, ActivationCollector, SAETrainer, SAEEvaluator, FeatureVisualizer, InterpretableModel) without requiring modifications to core libraries.
https://claude.ai/code/session_01RuUhm1SYvQYE61feFRAc3e