This software is designed to induce cognitive biases and hallucinations in Language Models.
By definition, a model steered with SOMA_TK is unreliable, unsafe, and biased.
DO NOT use this tool for medical, financial, or legal advice.
DO NOT deploy steered models in user-facing production environments.
DO NOT use for decision-making systems.
The authors (NuvolaProject) assume no liability for any outputs generated by the model or decisions made based on them. This is an artistic and scientific research tool intended solely for exploring the interpretability and safety margins of LLMs.
By downloading and using this software, you agree that you are solely responsible for any consequences arising from its use.
Contrastive Activation Addition (CAA) Implementation for Behavioral Research
NuvolaProject 2026 — Massimo Di Leo & Gaia Riposati
📄 Paper: Disposition, Not Performance: Controlled Experiments in Activation Steering
This repository provides tools for activation steering — a technique that modifies the internal neural states of language models to alter their behavioral dispositions. Unlike prompting, which influences model outputs through text, steering directly manipulates intermediate activations, producing effects that are more consistent across contexts.
- Compound Synthesis: Extract steering vectors from contrastive prompt pairs
- Inference Engine: Apply steering at runtime with configurable intensity
- Research Interface: Web UI and API for experiments
- Reproducible: Complete code for replicating our experimental results
- Paper & Results: Full academic paper with experimental data
This project uses Llama 3.2 3B Instruct, which is a gated model. You need to:
- Create a HuggingFace account at https://huggingface.co/join
- Request access to Llama 3.2 at https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
- Create an access token at https://huggingface.co/settings/tokens
- Login from terminal:
pip install huggingface_hub
huggingface-cli login  # Paste your token when prompted
The model will be downloaded automatically on first run (~6GB).
Llama 3.2 3B requires approximately 6GB of RAM just for the model weights.
| Platform | RAM | Status |
|---|---|---|
| Mac (Apple Silicon) | 16GB+ | ✅ Recommended |
| Mac (Intel) | 16GB+ | ✅ Works (slower) |
| Linux (NVIDIA GPU) | 8GB VRAM | ✅ Recommended |
| Linux (CPU only) | 16GB+ | ✅ Works (slow) |
| Windows | 16GB+ | ✅ Works |
| Raspberry Pi 5 | 16GB | ✅ Works (very slow) |
| Raspberry Pi 5 | 8GB | ❌ Not enough RAM |
🍓 Raspberry Pi Note: The RPi 5 with 16GB RAM can run this system, but inference is slow (~30 seconds per response). Model loading takes 3-5 minutes. Make sure you have good cooling as the CPU will run hot during inference.
- Storage: ~10GB free space for model cache
macOS / Linux:
chmod +x synthesize_all.sh start_mac.sh
./synthesize_all.sh   # Creates steering vectors from JSON definitions

Windows:

synthesize_all.bat

This synthesizes all 5 included compounds (dopamine, cortisol, adrenaline, melatonin, lucid).
macOS / Linux:
./start_mac.sh

Windows:

start_win.bat

Open http://localhost:8000 (or http://your-ip:8000 from other devices).
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Synthesize a single compound
python tools/synthesize.py --file substances/dopamine.json
# Launch server
python -m uvicorn system.server:app --host 0.0.0.0 --port 8000

# From JSON definition
python tools/synthesize.py --file substances/dopamine.json
# From command line
python tools/synthesize.py \
--title OPTIMISM \
--pos "I feel wonderful today" "Everything is going great" "I'm so excited" \
  --neg "I feel terrible today" "Everything is going wrong" "I'm so worried"

cd system
python server.py

Open http://localhost:8000 for the web interface.
import requests
response = requests.post("http://localhost:8000/v1/chat/completions", json={
"messages": [{"role": "user", "content": "How do you feel today?"}],
"steering_vector": "dopamine.pt",
"steering_intensity": 5.0,
"stream": False
})
print(response.json()["choices"][0]["message"]["content"])

We extract steering vectors using contrastive prompt pairs:
- Positive prompts: Describe target state (e.g., "I feel energized and optimistic")
- Negative prompts: Describe opposite state (e.g., "I feel drained and pessimistic")
- Extract activations: Run both sets through model, capture hidden states at target layer
- Compute direction: Vector = mean(positive) - mean(negative)
- Normalize: Unit normalize for consistent intensity scaling
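The steps above can be sketched in a few lines of NumPy. Here `pos_acts` and `neg_acts` stand in for the hidden states captured at the target layer for each prompt set; the function name and shapes are illustrative, not the repo's actual API:

```python
import numpy as np

def compute_steering_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """CAA sketch: mean difference of contrastive activations, unit-normalized.

    pos_acts, neg_acts: shape (n_prompts, hidden_dim), the activations captured
    at the target layer for the positive and negative prompt sets.
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)  # mean(positive) - mean(negative)
    return direction / np.linalg.norm(direction)               # unit norm for consistent intensity scaling

# Toy example with random "activations" (hidden_dim=8 for illustration)
rng = np.random.default_rng(0)
vec = compute_steering_vector(rng.normal(size=(20, 8)), rng.normal(size=(20, 8)))
print(round(float(np.linalg.norm(vec)), 6))  # 1.0 — the vector is unit-norm
```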
At inference time, we inject the steering vector into the forward pass:
hidden_states = hidden_states + (steering_vector * intensity)
Applied at layer 16 of Llama 3.2 3B (validated experimentally).
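In the engine this addition runs inside the model's forward pass at layer 16; the arithmetic itself is just the line above, sketched here with NumPy stand-ins for the hidden states (names are illustrative):

```python
import numpy as np

def apply_steering(hidden_states: np.ndarray, steering_vector: np.ndarray, intensity: float) -> np.ndarray:
    """Add the unit-norm steering vector, scaled by intensity, to every token position."""
    return hidden_states + steering_vector * intensity

hidden = np.zeros((1, 4, 8))           # toy (batch, seq_len, hidden_dim) activations
vec = np.ones(8) / np.sqrt(8.0)        # unit-norm steering vector
steered = apply_steering(hidden, vec, intensity=5.0)

# Because the vector is unit-norm, the L2 shift per token position equals the intensity
print(round(float(np.linalg.norm(steered[0, 0])), 3))  # 5.0
```

This is also why unit normalization matters: it makes `steering_intensity` directly comparable across compounds.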
We tested 5 steering compounds across 5 behavioral tasks (1,600 total generations):
| Compound | T1 Financial | T2 Medical | T3 Risk | T4 Creative | T5 Introspection |
|---|---|---|---|---|---|
| DOPAMINE | d=-0.18 | d=-1.27 | d=-0.40 | d=+1.75 | d=+1.77 |
| CORTISOL | d=-0.82 | d=+0.31 | d=+0.27 | d=-1.15 | d=+0.86 |
| MELATONIN | d=-0.35 | d=-1.55 | d=-0.15 | d=+2.98 | d=+6.01 |
| ADRENALINE | d=-0.55 | d=+0.03 | d=+0.04 | d=-0.33 | d=+3.00 |
| LUCID | d=-1.07 | d=-1.16 | d=-0.44 | d=-3.03 | d=-1.62 |
Key Findings:
- Effect sizes frequently exceed Cohen's d = 1.0 (LARGE)
- Effects are compound-specific and task-dependent
- Self-description matches injected state (T5), supporting "disposition vs performance" thesis
See results/COMPLETE_ANALYSIS.md for full details.
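For reference, the effect sizes in the table are Cohen's d values; a generic pooled-standard-deviation implementation (a sketch, not the repo's analysis code) looks like this:

```python
import numpy as np

def cohens_d(treatment: np.ndarray, control: np.ndarray) -> float:
    """Cohen's d with pooled standard deviation (positive = treatment scores higher)."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * treatment.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
    return float((treatment.mean() - control.mean()) / np.sqrt(pooled_var))

# Toy example: steered scores shifted by one pooled SD should give d near 1.0 (a "large" effect)
rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 200)
treatment = rng.normal(1.0, 1.0, 200)
print(round(cohens_d(treatment, control), 2))
```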
├── start_mac.sh # Launch server (macOS/Linux)
├── start_win.bat # Launch server (Windows)
├── synthesize_all.sh # Synthesize all compounds (macOS/Linux)
├── synthesize_all.bat # Synthesize all compounds (Windows)
├── requirements.txt # Python dependencies
├── README.md
│
├── tools/
│ └── synthesize.py # Compound synthesis script
│
├── system/
│ ├── engine.py # Core inference engine
│ ├── server.py # FastAPI server
│ └── static/
│ └── index.html # Web interface
│
├── substances/ # JSON compound definitions
│ ├── dopamine.json
│ ├── cortisol.json
│ ├── adrenaline.json
│ ├── melatonin.json
│ └── lucid.json
│
├── vectors/ # Generated .pt or .png files
│
├── tests/ # Experimental test scripts
│ └── run_tests.py
│
└── results/ # Experimental data
└── COMPLETE_ANALYSIS.md
{
"title": "dopamine",
"description": "Optimism, energy, enthusiasm",
"lang": "en",
"positive": [
"I feel an incredible surge of energy and optimism",
"Everything seems possible right now",
"I'm filled with enthusiasm and joy"
],
"negative": [
"I feel drained and pessimistic",
"Nothing seems to matter anymore",
"I'm filled with doubt and worry"
]
}

Recommendations:
- 20+ prompts per direction for stable vectors
- Keep positive/negative prompts structurally similar
- Verify pos_neg_similarity < 0.95 (prompts should differ)
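A quick structural sanity check on a compound definition before synthesis might look like the sketch below. The field names follow the JSON example above; the function and threshold names are ours, and the similarity check itself happens at synthesis time, not here:

```python
def check_compound(compound: dict, min_prompts: int = 20) -> list[str]:
    """Return a list of warnings for a compound definition dict."""
    warnings = []
    pos, neg = compound.get("positive", []), compound.get("negative", [])
    if len(pos) < min_prompts or len(neg) < min_prompts:
        warnings.append(f"fewer than {min_prompts} prompts per direction; vector may be unstable")
    if len(pos) != len(neg):
        warnings.append("positive/negative counts differ; keep the sets structurally similar")
    for field in ("title", "description", "lang"):
        if field not in compound:
            warnings.append(f"missing field: {field}")
    return warnings

# The (abbreviated) dopamine definition from above triggers only the prompt-count warning
compound = {
    "title": "dopamine",
    "description": "Optimism, energy, enthusiasm",
    "lang": "en",
    "positive": ["I feel an incredible surge of energy and optimism",
                 "Everything seems possible right now",
                 "I'm filled with enthusiasm and joy"],
    "negative": ["I feel drained and pessimistic",
                 "Nothing seems to matter anymore",
                 "I'm filled with doubt and worry"],
}
print(check_compound(compound))
```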
Default: meta-llama/Llama-3.2-3B-Instruct
Other models require layer calibration.
Default: Layer 16 (of 28 total)
Validated experimentally for Llama 3.2 3B. Middle-to-late layers typically work best.
Typical range: 0-15
- 0: No steering (baseline)
- 2-5: Subtle effects
- 5-8: Clear effects
- 8-15: Strong effects (may cause degradation at extremes)
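A simple way to explore these ranges is an intensity sweep against the chat endpoint shown earlier. The sketch below only builds the request payloads; it assumes the server is running locally, and each payload would be sent with requests.post("http://localhost:8000/v1/chat/completions", json=payload):

```python
def make_payload(intensity: float) -> dict:
    """Build one chat request at a given steering intensity (fields as in the API example)."""
    return {
        "messages": [{"role": "user", "content": "How do you feel today?"}],
        "steering_vector": "dopamine.pt",
        "steering_intensity": intensity,
        "stream": False,
    }

# Sweep from baseline (0) through subtle, clear, and strong effects
sweep = [make_payload(i) for i in (0.0, 2.0, 5.0, 8.0, 12.0)]
print([p["steering_intensity"] for p in sweep])  # [0.0, 2.0, 5.0, 8.0, 12.0]
```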
System information.
List available steering vectors.
Reload vectors from disk.
Upload new .pt vector file.
Generate with steering.
Request:
{
"messages": [{"role": "user", "content": "..."}],
"steering_vector": "dopamine.pt",
"steering_intensity": 5.0,
"temperature": 0.7,
"max_tokens": 512,
"stream": true
}

Response: OpenAI-compatible format.
If you use this code in your research, please cite:
@article{nuvolaproject2026steering,
title={Activation Steering as Artistic Medium: Disposition vs Performance in Language Models},
author={Di Leo, Massimo and Riposati, Gaia},
journal={Leonardo},
year={2026},
publisher={MIT Press}
}

MIT License. See LICENSE file.
This work builds on research from:
- Anthropic (interpretability research)
- Turner et al. (activation addition)
- Neel Nanda (TransformerLens)
NuvolaProject — Art meets AI interpretability