This document outlines the standards and best practices for contributing to the OdyssNet project — whether you are adding example scripts, improving the library, or fixing bugs.
OdyssNet relies on a highly modular library structure. To ensure long-term maintainability and performance, all contributions must adhere to these guidelines.
```shell
# Clone the repository
git clone <repo-url> && cd odyssnet

# Install in development mode (required — makes `from odyssnet import ...` work everywhere)
pip install -e ".[dev]"

# Verify installation
python -m pytest tests/
# OR simply
pytest tests/   # not recommended: it will not see your env setup
```

CUDA note: `requirements.txt` pins CUDA 11.8. For RTX 4000/5000 series GPUs, install PyTorch separately:

```shell
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```
```
odyssnet/               # Library source code
  core/network.py       # OdyssNet model
  training/trainer.py   # OdyssNetTrainer
  utils/                # Data, checkpointing, neurogenesis, history
tests/                  # Test suite (mirrors odyssnet/ structure)
examples/               # Core validation scripts (identity, XOR, MNIST)
  advanced/             # Complex experiments (reasoning, generation, transfer)
```
We distinguish between Core Validations and Feature Experiments.
Core Validations (`examples/`):

- Purpose: Contains minimal, "hello world" style scripts that validate the core laws of OdyssNet physics.
- Examples: `convergence_identity.py` (Can signals pass?), `convergence_gates.py` (Can it solve XOR?).
- Rule: Scripts here should be extremely simple, fast, and prove a fundamental property of the architecture.
Feature Experiments (`examples/advanced/`):

- Purpose: Contains complex tasks, task-specific logic, and demonstrations of advanced cognitive behaviors.
- Examples: `convergence_detective_thinking.py` (Reasoning), `convergence_latch.py` (Willpower).
- Rule: If you are building a task (like adding numbers, generating waves, or playing a game), it goes here.
⛔ DO NOT re-invent the wheel. ✅ DO use the Library.
Never write your own manual PyTorch training loop (optimizer.step(), loss.backward(), etc.) unless absolutely necessary for low-level research.
- Why? The `OdyssNetTrainer` handles:
  - Automatic Mixed Precision (AMP): Faster training on Tensor Cores.
  - Gradient Accumulation: Simulating large batches.
  - Ghost Gradients (Persistence): Advanced stabilization.
  - State Management: Resetting hidden states automatically.
```python
# ❌ BAD: Manual Loop
output = model(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()
```

```python
# ✅ GOOD: Trainer
trainer = OdyssNetTrainer(model, device='cuda')
trainer.train_batch(input, target, thinking_steps=10)
```

If you need a new feature (e.g., a new loss function or a custom metric), extend `OdyssNetTrainer` or pass arguments to it. If the library is missing a critical feature, implement it in the library first, then use it in your example.
OdyssNet is sensitive to initialization. The default weight_init='resonant' is the recommended starting point for all tasks — it places the weight matrix at the Edge of Chaos (ρ(W) = 1.0) from the start and works across all network sizes.
For most tasks without specific constraints, use the default resonant initialization.
- Activation: `'tanh'`
- Weight Init: `'resonant'` (Default) — Rademacher ±1 skeleton + spectral normalization to ρ = 1.0. Ensures signal fidelity without exploding or vanishing. Projection layers (embed/proj/decoder) automatically use `'quiet'` init.
- Gate: `None` (Default) — resolves to `['none', 'none', 'identity']` (memory identity gate enabled; starts closed with zero gate init and opens if training needs it).
- Dropout: `0.0` (Default) — Enable explicitly (e.g., `0.1`) only when overfitting is observed.
```python
model = OdyssNet(..., activation='tanh')  # weight_init='resonant' is already the default
```

If resonant convergence is too slow on very small circuits:
- Activation: `'gelu'` — Better gradient flow in sparse/small graphs.
- Weight Init: `'xavier_uniform'` — High variance ensures signals don't die in small circuits.
- Gate (Optional): `'sigmoid'` — Stronger branch control if needed.
- Dropout: `0.0` — Every neuron is vital in small networks.
```python
model = OdyssNet(..., activation='gelu', weight_init='xavier_uniform', dropout_rate=0.0)
trainer = OdyssNetTrainer(model, ..., synaptic_noise=0.0)  # Disable noise for pure logic
```

For long-horizon temporal stability:
- Activation: `'tanh'`
- Weight Init: `'orthogonal'` — Solid fallback for pure stability.
- Gate (Optional): `['none', 'none', 'sigmoid']` — Memory-only gating.
- Dropout: `0.0` (Default) — Enable explicitly when overfitting is a concern.
```python
model = OdyssNet(..., activation='tanh', weight_init='orthogonal')
```

- `gate=None`: Default branch layout `['none', 'none', 'identity']`.
- `gate='sigmoid'`: Applies the same gate activation to all `[encoder_decoder, core, memory]` branches.
- `gate=['none', 'none', 'sigmoid']`: Only the memory branch is gated.
- `gate=['none', 'none', 'none']`: Disables all gating.
- A list supports 1-3 entries, right-padded from defaults.
- `'none'`: Disables the gate branch entirely (no learnable parameters).
- `'identity'`: Enables explicit identity gating (learnable gate params exist, starts at identity).
- Gate parameter initialization uses the 4th `weight_init` slot. Default layout is `['quiet', 'resonant', 'quiet', 'zero']`.
- Activation layout supports 1-4 entries with default `['none', 'tanh', 'tanh', 'none']`; the 4th slot is reserved for config symmetry.
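The layout rules (single string broadcast, short list right-padded from defaults) can be sketched as a small resolver. This is a hypothetical illustration only; `resolve_layout` is not a library function, and the actual resolution happens inside `OdyssNet`:

```python
def resolve_layout(value, defaults):
    """Illustrative layout resolution: None keeps the defaults, a single
    string broadcasts to every slot, and a short list is right-padded
    from the defaults."""
    if value is None:
        return list(defaults)
    if isinstance(value, str):
        return [value] * len(defaults)
    return list(value) + list(defaults[len(value):])

GATE_DEFAULTS = ['none', 'none', 'identity']
```

For example, `resolve_layout(['none'], GATE_DEFAULTS)` keeps the default core and memory slots, while `resolve_layout('sigmoid', GATE_DEFAULTS)` gates all three branches.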
For tasks requiring precise storage and retrieval of values over time (e.g., Neural Database):
- Structure: High neuron count (256+) to provide storage space for memories.
- Init: The default `resonant` initialization is appropriate.
```python
model = OdyssNet(num_neurons=256, ...)  # resonant default works well
```

For tasks requiring high input/output dimensionality (like vision or LLMs) without scaling the core state size:
- Feature: Use `vocab_size=(V_IN, V_OUT)` to decouple input/output resolution from the internal neuron count.
- Optimization: Allows a tiny "Thinking Core" (e.g., 10 neurons) to process high-resolution signals (e.g., 784 pixels), achieving extreme parametric efficiency.
- Usage: Best used in conjunction with sequential signal processing for maximum compression.
- Note: When `weight_init='resonant'`, projection layers (embed/proj/decoder) automatically use `'quiet'` init (Normal(0, 0.02)), so no manual override is needed.
```python
# OdyssNet core has N=10 neurons, but processes 784 input channels and 10 output classes
model = OdyssNet(num_neurons=10, ..., vocab_size=(784, 10))
```

For tasks where online synaptic plasticity may help — e.g., fast adaptation, continual learning, or tasks with shifting statistics — enable one of the three resolution levels:
| `hebb_type` | Extra Params | When to use |
|---|---|---|
| `"global"` | +2 | Quick experiments; uniform plasticity across all synapses. |
| `"neuron"` | +2N | RL and reactive environments; per-neuron "caste" differentiation. |
| `"synapse"` | +2N² | Logic, NLP, and reasoning tasks requiring dynamic variable binding. |
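The parameter counts in the table follow from one `hebb_factor` plus one `hebb_decay` at each resolution. A quick sanity check (hypothetical helper, not library code):

```python
def hebb_extra_params(hebb_type, n):
    # One hebb_factor + one hebb_decay per unit of resolution:
    # whole network (2), per neuron (2N), or per synapse (2N^2)
    return {"global": 2, "neuron": 2 * n, "synapse": 2 * n * n}[hebb_type]
```

So a 32-neuron network with `hebb_type='synapse'` adds 2048 learnable parameters.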
- What it does: At each step the network accumulates temporal cross-neuron correlations $C_t = h_t \otimes h_{t-1}$ and applies them as $W_\text{eff} = W + (f_h \odot C_t)$, where $f_h$ is `hebb_factor`. Both `hebb_factor` and `hebb_decay` are learnable — the network discovers how plastic each synapse should be.
- State: Correlations are persisted via buffers (`hebb_state_W`, `hebb_state_mem`) across intra-sequence forward calls and are explicitly cleared on `reset_state()` between sequences.
- Best Use Case (Generation / Sequential Building): Hebbian shines in tasks where step T relies heavily on expanding or completing a pattern from step T-1. It provides a powerful short-term working memory between steps, acting as a dynamic shortcut that fast-tracks sequence generation.
- When not to use it (Classification / Independent Features): Avoid Hebbian in classification tasks where each step processes distinct, independent chunks of information (e.g., sequential MNIST classification). In these tasks, inter-step short-term memory acts as "overfit noise".
- Compatibility: Fully compatible with `gradient_checkpointing=True`.
- Combined with gating: Hebbian and gate parameters are independent groups; both can be active simultaneously.
```python
# NLP / Logic / Reasoning — synapse-level plasticity for dynamic variable binding
model = OdyssNet(
    num_neurons=32,
    input_ids=[0, 1],
    output_ids=[31],
    activation='tanh',
    hebb_type='synapse',  # Per-synapse plasticity
    device='cuda',
)

# RL / reactive environments — neuron-level caste differentiation
model = OdyssNet(
    num_neurons=64,
    input_ids=list(range(8)),
    output_ids=list(range(56, 64)),
    activation='tanh',
    hebb_type='neuron',  # Per-neuron plasticity
    device='cuda',
)

# Quick experiment — global plasticity
model = OdyssNet(..., hebb_type='global')
```
```python
# Default: Prodigy optimizer — auto-calibrates LR, no tuning needed
trainer = OdyssNetTrainer(model)

# AdamW: pass an explicit learning rate
trainer = OdyssNetTrainer(model, lr=3e-4)
```

Always enable TF32 on Ampere+ GPUs for a free speedup.
```python
import torch
torch.set_float32_matmul_precision('high')
```

For production or long training runs, compile the model.

```python
model.compile()  # Uses torch.compile (PyTorch 2.0+)
```

Experiments should handle training stagnation intelligently by adding neurons when needed.
- Metric: If `loss` has not improved for `N` epochs.
- Action: Call `trainer.expand(amount=...)`.
- Amount:
  - Small nets (< 100 neurons): +1 neuron per expansion.
  - Large nets (≥ 100 neurons): +10 neurons or +1% of current size per expansion.
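The sizing rule can be captured in a small helper (hypothetical name; `trainer.expand(amount=...)` remains the actual library call):

```python
def expansion_amount(num_neurons):
    # Small nets (< 100 neurons): +1 per expansion.
    # Large nets: +10, or +1% of the current size, whichever is larger.
    if num_neurons < 100:
        return 1
    return max(10, num_neurons // 100)
```

Usage: `trainer.expand(amount=expansion_amount(n))`, where `n` is the current neuron count.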
```python
# Expand on stagnation (no improvement for N epochs), not on every single increase
if epochs_without_improvement >= N:
    trainer.expand(amount=10)
    epochs_without_improvement = 0
```

Experiments that run for a long time should handle training stagnation or spikes intelligently without manual restarts.
You can pass an `anomaly_hook` to the `OdyssNetTrainer` to automate recovery and logging.
```python
def my_hook(anomaly_type, loss_val):
    if anomaly_type == "plateau":
        print("Triggering plateau escape!")

trainer = OdyssNetTrainer(model, anomaly_hook=my_hook)
```

Since `tanh` is our primary activation for robust systems, avoid using 0.0 and 1.0 for logical states.
- OFF: `-1.0`
- ON: `1.0`
- Neutral/Silence: `0.0`

This symmetric encoding lets gradients flow much better than a 0.0/1.0 scheme: 0.0 sits at the zero crossing of `tanh`, its least stable point, so it is reserved for silence rather than a logical state.
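A minimal encoding helper following this convention (a hypothetical illustration, not a library function):

```python
def encode_logic(bits):
    # OFF -> -1.0, ON -> +1.0; reserve 0.0 for neutral/silence timesteps
    return [1.0 if b else -1.0 for b in bits]
```

For example, the target sequence for ON, OFF, ON is `[1.0, -1.0, 1.0]`.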
Use the `prepare_input` utility implicitly via the Trainer.

- Pulse: Single event at t=0.
- Stream: Continuous sequence. Pass `full_sequence=True` to `trainer.predict()` or `train_batch()` if you need frame-by-frame monitoring.
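The two shapes can be illustrated with toy builders (hypothetical helpers; `prepare_input` handles the real tensor shaping inside the Trainer):

```python
def make_pulse(value, length):
    # Pulse: a single event at t=0, neutral silence (0.0) afterwards
    return [value] + [0.0] * (length - 1)

def make_stream(values):
    # Stream: a continuous sequence, one value per timestep
    return list(values)
```

Note that the silence padding uses `0.0`, consistent with the logical-state convention above.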
- Reproducibility & Seeding: 🔴 MANDATORY — All example and experiment scripts MUST set a fixed seed for reproducible results.
  - Why? Reproducible results are essential for debugging, comparing strategies, and publishing findings.
  - How? Always call `set_seed(42)` at the very start of your `main()` function, before any random operations.
  - Import: `from odyssnet import set_seed`
  - Example:

    ```python
    def main():
        set_seed(42)  # ← FIRST LINE in main()
        # Now all randomness is locked: model init, data shuffling, dropout, etc.
        model = OdyssNet(...)
        trainer = OdyssNetTrainer(model, lr=1e-4)  # pin lr for deterministic curves
        trainer.fit(X, Y, epochs=100)
    ```
  - Applies to:
    - Model weight initialization (deterministic via `torch.manual_seed`).
    - Data shuffling / batch sampling.
    - Dropout and stochastic regularization.
    - CUDA random state (for GPU consistency).
  - Test: If you run the script twice with the same seed, loss curves and final results should be identical, byte-for-byte. Note: this requires passing an explicit `lr` to `OdyssNetTrainer` — the default `lr=None` (Prodigy) adapts its learning rate online and will produce different curves across runs.
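The byte-for-byte test rests on a basic property: a fixed seed fully determines the pseudo-random stream. The same principle, illustrated with the standard library (`set_seed` extends it to torch and CUDA state):

```python
import random

def draws(seed, n=3):
    # A fixed seed fully determines the sequence of pseudo-random draws
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

Two runs with seed 42 produce identical draws; changing the seed changes the stream.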
- Visuals: Your example should print a cool visualization. Don't just print "Loss: 0.01". Print the timeline.
  - Example: `t=05 | Input: 1 | Output: 0.99 🟢`
- Comments: Explain why you chose a specific setup.
  - Example: `# GAP=3 allows the model time to digest the previous bit.`
- File Paths: Never use hardcoded absolute paths or assume the CWD. Always construct paths relative to the script file.
  - Example: `DATA_FILE = os.path.join(os.path.dirname(__file__), '..', '..', 'data', 'file.txt')`
- Checkpointing: Always use the library's `save_checkpoint`, `load_checkpoint`, and `transplant_weights` functions from `odyssnet.utils.odyssstore`. Do NOT write custom checkpoint code. If the library is missing a feature, extend the library instead.
  - Example: `from odyssnet import save_checkpoint, load_checkpoint, transplant_weights`
- Training History & Plotting: All finite-duration examples MUST use `TrainingHistory` to record metrics (loss, accuracy, lr, etc.) and call `history.plot()` at the end. This generates a multi-panel plot of all tracked metrics over time. (Note: When examples are run via `test_all.py`, the `ODYSSNET_DISABLE_PLOT=1` environment variable automatically bypasses the interactive plotting UI.)
  - Example:

    ```python
    from odyssnet import TrainingHistory

    history = TrainingHistory()
    for epoch in range(epochs):
        loss = trainer.train_batch(...)
        history.record(loss=loss, lr=current_lr)
    history.plot(title="My Experiment")
    ```
- Imports: Import `odyssnet` directly (installed via `pip install -e .`). Never use `sys.path.append` hacks.
If your experiment produces Loss nan / PPL nan, enable the built-in diagnosis mode before anything else:
```python
model = OdyssNet(..., debug=True)
```

With `debug=True` the model checks every critical forward-pass operation (linear recurrence, memory feedback, activation, StepNorm, Hebbian correlation and accumulation) and raises a `RuntimeError` at the first operation that produces a non-finite value, with the operation name and step index. `debug=True` also automatically enables `torch.autograd.set_detect_anomaly(True)`, so backward-pass NaN is caught with a full stack trace at no extra setup cost. Overhead is zero when `debug=False`.
If your model trains but doesn't converge or gets stuck:
Track and visualize all key metrics to identify patterns:
```python
from odyssnet import TrainingHistory

history = TrainingHistory()
for epoch in range(epochs):
    loss = trainer.train_batch(x, y, thinking_steps=10)
    acc = evaluate_accuracy(...)
    lr = trainer.optimizer.param_groups[0]['lr'] if hasattr(trainer.optimizer, 'param_groups') else trainer.initial_lr
    history.record(loss=loss, accuracy=acc, lr=lr)

# Visual inspection reveals patterns
history.plot(title="Training Diagnosis")

# Or save for later analysis
history.plot(save_path="diagnosis/training.png", title="Debug Run")
```

What to look for:
- Flat loss: May need more thinking steps, a different initialization, or a learning-rate adjustment.
- Oscillating loss: Reduce the learning rate or enable gradient persistence.
- Sudden spikes: Check for batch corruption, or use `anomaly_hook` to catch them.
Monitor training health in real-time:
```python
for epoch in range(epochs):
    loss = trainer.train_batch(x, y, thinking_steps=10)

    # Get comprehensive diagnostics (add debug=True for detailed stats)
    diag = trainer.get_diagnostics()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}")
        print(f"  Step count: {diag['step_count']}")
        print(f"  Last loss: {diag['last_loss']:.6f}")

    # For detailed debugging, use debug=True
    if need_detailed_analysis:
        diag = trainer.get_diagnostics(debug=True)
        # Now includes gradient_stats, persistent_grads_active, anomaly_tracking,
        # loss_tracking, scaler_state, and detailed optimizer per-param stats
```

Key metrics to monitor:
- `last_loss` trend: Sustained increase suggests instability.
- `gradient_stats`: Large norm swings can indicate optimization difficulty.
- `step_count` progression: Confirms a stable optimizer stepping cadence.

Debug mode additions:

- `gradient_stats`: Per-parameter gradient norms and means (min/max/std).
- `persistent_grads_active`: Number of parameters with persistent gradients.
- `anomaly_tracking`: EWMA, variance, and plateau detection state.
- `loss_tracking`: Recent losses and buffer statistics.
Set up intelligent automated responses to training anomalies:
```python
def handle_anomaly(anomaly_type, loss_val):
    """Called automatically on training anomalies."""
    global patience_counter
    if anomaly_type == "spike":
        # Violent loss surge (possible gradient explosion)
        print(f"⚠️ SPIKE detected! Loss: {loss_val:.4f}")
        # Could reduce LR, reload checkpoint, etc.
    elif anomaly_type == "plateau":
        # Loss stagnated over a window
        print("🔄 PLATEAU detected. Triggering escape...")
    elif anomaly_type == "increase":
        # Loss increased from the previous step (fires every time loss goes up)
        # Useful for custom patience counters or early stopping
        patience_counter += 1
        if patience_counter > 50:
            print("⛔ 50 consecutive increases. Early stopping.")
            raise KeyboardInterrupt

# Initialize trainer with hook
patience_counter = 0
trainer = OdyssNetTrainer(
    model,
    anomaly_hook=handle_anomaly
)

# Now train — anomalies trigger automatic responses
for epoch in range(1000):
    loss = trainer.train_batch(x, y, thinking_steps=10)
```

Anomaly types:
- `"spike"`: Sudden violent surge in loss (exploding gradient).
- `"plateau"`: Loss stagnated and barely moving over a window.
- `"increase"`: Loss strictly greater than the previous step (fires every time, even for a 0.0001 increase).
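To make the three categories concrete, here is a toy classifier over a loss history. This is an illustration only; the trainer's internal detector uses EWMA, variance, and plateau tracking (see the diagnostics section), and the thresholds here are arbitrary:

```python
def classify_anomaly(losses, spike_factor=3.0, plateau_window=10, plateau_eps=1e-4):
    """Toy anomaly classifier with priority spike > plateau > increase."""
    if len(losses) < 2:
        return None
    prev, cur = losses[-2], losses[-1]
    if cur > prev * spike_factor:
        return "spike"      # violent surge
    window = losses[-plateau_window:]
    if len(losses) >= plateau_window and max(window) - min(window) < plateau_eps:
        return "plateau"    # barely moving over a window
    if cur > prev:
        return "increase"   # any strict increase
    return None
```

Note that "increase" fires on any strict increase, mirroring the library's behavior described above.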
If loss oscillates or training is unstable:

- Enable gradient persistence for smoother optimization:
  `trainer = OdyssNetTrainer(model, gradient_persistence=0.1)`
- Use AdamW with a lower explicit learning rate (bypasses Prodigy):
  `trainer = OdyssNetTrainer(model, lr=1e-4)`
- Try a different initialization if using tiny networks:
  `model = OdyssNet(..., weight_init='xavier_uniform', activation='gelu')`
If loss doesn't decrease at all:

- Verify data preprocessing: Check that inputs/targets are properly normalized and on the correct device.
- Increase thinking steps: The model may need more temporal depth:
  `trainer.train_batch(x, y, thinking_steps=20)  # was 10`
- Check initialization: For very small networks (< 10 neurons), try:
  `model = OdyssNet(..., weight_init='xavier_uniform', activation='gelu')`
- Use `anomaly_hook` and adjust lr/steps dynamically based on diagnostics.
If training is too slow:

- Enable TF32 on Ampere+ GPUs:
  `import torch; torch.set_float32_matmul_precision('high')`
- Compile the model (PyTorch 2.0+):
  `model.compile()`
- Use gradient accumulation instead of larger batches (simulates `batch_size=128` with `batch_size=32`):
  `trainer.train_batch(x, y, thinking_steps=10, gradient_accumulation_steps=4)`
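The accumulation arithmetic is simple: the optimizer steps once per `gradient_accumulation_steps` micro-batches, so the effective batch is their product (illustrative helper, not library code):

```python
def effective_batch(batch_size, accumulation_steps):
    # Gradients from `accumulation_steps` micro-batches are combined
    # before a single optimizer step
    return batch_size * accumulation_steps
```

Hence `batch_size=32` with 4 accumulation steps behaves like a batch of 128 for the optimizer, at the memory cost of 32.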
If running out of VRAM:

- Reduce the batch size and use gradient accumulation.
- Enable gradient checkpointing:
  `model = OdyssNet(..., gradient_checkpointing=True)`
- Use vocab projection for high-dimensional inputs (e.g., `num_neurons=10, vocab_size=[784, 10]` instead of `num_neurons=784` for MNIST).
When modifying the library itself (not examples), follow these additional rules:
- Every change to a public interface must have a corresponding test update under `tests/`.
- The test suite mirrors `odyssnet/`: `tests/core/`, `tests/training/`, `tests/utils/`. Place new tests in the matching subdirectory.
- New behavior requires new test cases. Changed signatures require updated tests. Deleted code requires removing orphaned tests.
- Run `python -m pytest tests/` after every change to confirm the suite stays green (plain `pytest tests/` also works, but is not recommended as it will not see your env setup).
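The mirroring convention can be expressed as a path transform (a hypothetical helper for illustration, not part of the library):

```python
def mirrored_test_path(source_path):
    # odyssnet/<subdirs>/<module>.py -> tests/<subdirs>/test_<module>.py
    parts = source_path.split("/")
    assert parts[0] == "odyssnet", "expects a path inside the library"
    *dirs, fname = parts[1:]
    return "/".join(["tests", *dirs, "test_" + fname])
```

So a change to `odyssnet/core/network.py` belongs with an update to `tests/core/test_network.py`.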
- Every public API change must be reflected in the relevant markdown files (`docs/LIBRARY.md`, `CONTRIBUTING.md`).
- Document what the system is, not what it was. Version history belongs in `CHANGELOG.md` only.
- Comments explain why, not what — the code already says what.
- Use precise, professional language. Avoid filler phrases ("simply", "just", "obviously").
- Do not use comments or docs as changelog.
- Tests pass (`python -m pytest tests/` or `pytest tests/`).
- No `sys.path.append` hacks — imports use `from odyssnet import ...` directly.
- Corresponding test added/updated under `tests/`.
- Documentation updated in the relevant markdown files (`docs/LIBRARY.md`, `CONTRIBUTING.md`).
- Does your script call `set_seed(42)` at the START of `main()`? (MANDATORY for reproducibility.)
- Does `OdyssNetTrainer` receive an explicit `lr` (e.g. `lr=1e-4`)? The default `lr=None` activates Prodigy, which adapts LR online and breaks byte-for-byte reproducibility. Examples must pin a float lr.
- Did you place it in the correct folder (`examples/` for core validations, `examples/advanced/` for complex tasks)?
- Are you using `OdyssNetTrainer`?
- Did you select the correct `activation`, `weight_init`, and `gate` setup? (Default `resonant` + `gate=None` is fine for most tasks.)
- If you set `hebb_type`, did you review the Hebbian Optimizer Contract above and confirm weight decay is not applied to the Hebbian group?
- Does it converge reliably? (If you see `Loss nan`, see Troubleshooting above.)
- Does the terminal output clearly explain what is happening?
- Does the script use `TrainingHistory` and call `history.plot()` at the end?
- Are file paths relative to `__file__`, not hardcoded?
Welcome to the Order of the Algorithm. Let's code Time.