Add modular RLSSM simulator framework #278
Milestone 1, Commit 1: Defines the LearningProcess protocol (the handshake contract between learning and decision processes) and the first built-in implementation, RescorlaWagnerDeltaRule, which is numerically equivalent to HSSM's compute_v_trial_wise(). Includes 13 unit tests covering Q-value trajectories, drift ordering, HSSM numerical equivalence, and protocol compliance.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
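The handshake described in this commit can be sketched as a small `typing.Protocol` plus one conforming learning rule. This is a minimal illustration, not the PR's code: the method names (`reset`, `compute_params`, `update`), the initial Q-value of 0.5, and the drift mapping `v = scaler * (Q[1] - Q[0])` are assumptions, and `RWDeltaRuleSketch` is a hypothetical name.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class LearningProcess(Protocol):
    """Hypothetical handshake contract between a learning rule and an SSM."""

    def reset(self) -> None: ...

    def compute_params(self) -> dict[str, float]: ...

    def update(self, action: int, reward: float) -> None: ...


class RWDeltaRuleSketch:
    """Rescorla-Wagner delta rule (illustrative, not the PR's class)."""

    def __init__(self, rl_alpha: float, scaler: float,
                 n_actions: int = 2, q_init: float = 0.5) -> None:
        self.rl_alpha = rl_alpha
        self.scaler = scaler
        self.n_actions = n_actions
        self.q_init = q_init  # assumed starting value for every arm
        self.reset()

    def reset(self) -> None:
        # Start every arm at the same value, so the initial drift is 0
        self.q = np.full(self.n_actions, self.q_init, dtype=float)

    def compute_params(self) -> dict[str, float]:
        # Drift rate from the scaled difference of the two arms' values
        return {"v": float(self.scaler * (self.q[1] - self.q[0]))}

    def update(self, action: int, reward: float) -> None:
        # Delta rule: move the chosen arm's value toward the observed reward
        prediction_error = reward - self.q[action]
        self.q[action] += self.rl_alpha * prediction_error
```

With `rl_alpha=0.5`, a reward of 1 on arm 1 moves `Q[1]` from 0.5 to 0.75, so with `scaler=2.0` the drift becomes 0.5. Keeping the contract structural (a `Protocol`) means any object with these three methods can drive the decision process without inheriting from a base class.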
Milestone 1, Commit 2: Defines the TaskEnvironment protocol for reward generation, TwoArmedBandit (a Bernoulli bandit with configurable per-arm probabilities), and the TaskConfig convenience dataclass for common paradigms. Includes 14 unit tests covering reward statistics, reproducibility, input validation, protocol compliance, and the TaskConfig builder.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
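A Bernoulli two-armed bandit with input validation and a builder dataclass can be sketched as follows. The `reward(action, rng)` signature, the default probabilities `(0.8, 0.2)`, and the names `TwoArmedBanditSketch` and `TaskConfigSketch` are assumptions for illustration; the PR's `TwoArmedBandit` and `TaskConfig` may expose a different interface.

```python
from dataclasses import dataclass
from typing import Protocol

import numpy as np


class TaskEnvironment(Protocol):
    """Hypothetical reward-generation contract."""

    def reward(self, action: int, rng: np.random.Generator) -> float: ...


class TwoArmedBanditSketch:
    """Bernoulli bandit: arm k pays 1 with probability reward_probs[k]."""

    def __init__(self, reward_probs=(0.8, 0.2)) -> None:
        probs = np.asarray(reward_probs, dtype=float)
        if probs.shape != (2,):
            raise ValueError("expected exactly two arm probabilities")
        if np.any((probs < 0.0) | (probs > 1.0)):
            raise ValueError("arm probabilities must lie in [0, 1]")
        self.reward_probs = probs

    def reward(self, action: int, rng: np.random.Generator) -> float:
        # Draw a Bernoulli outcome for the chosen arm
        return float(rng.random() < self.reward_probs[action])


@dataclass
class TaskConfigSketch:
    """Convenience wrapper that builds a bandit from a short spec."""

    reward_probs: tuple[float, float] = (0.8, 0.2)

    def build(self) -> TwoArmedBanditSketch:
        return TwoArmedBanditSketch(self.reward_probs)
```

Passing the `numpy.random.Generator` into `reward()` rather than storing one inside the environment lets the simulator own a single seeded generator, which is what makes whole runs reproducible.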
Milestone 1, Commit 3: Defines RLSSMModelConfig, the structural model specification that resolves the handshake between the learning process and the decision process (SSM). Auto-derives list_params, bounds, and defaults from components. Includes validate() for config consistency checking and to_hssm_config_dict() for bridging to HSSM's RLSSMConfig. 13 tests cover auto-derivation, handshake validation, computed_param_mapping, TaskConfig auto-build, and the HSSM dict contract.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
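One way to auto-derive `list_params`, `bounds`, and defaults is to merge the learning rule's parameters with the SSM's free parameters, dropping any SSM parameter the learning process computes (here `v`). This is a hypothetical sketch: the `(lower, upper, default)` spec format, the parameter ordering, and the `to_dict()` shape are assumptions, and the numeric bounds are made up; only the angle-model parameter names `v, a, z, t, theta` come from ssm-simulators.

```python
class ModelConfigSketch:
    """Derives list_params / bounds / defaults from two components (illustrative)."""

    def __init__(self, learning_params, ssm_params, computed_params=("v",)):
        # Each spec maps name -> (lower, upper, default)
        self.learning_params = dict(learning_params)
        self.ssm_params = dict(ssm_params)
        self.computed_params = tuple(computed_params)
        # SSM params supplied by the learning process are not free parameters
        free_ssm = {k: s for k, s in self.ssm_params.items()
                    if k not in self.computed_params}
        merged = {**self.learning_params, **free_ssm}
        self.list_params = list(merged)
        self.bounds = {k: (lo, hi) for k, (lo, hi, _) in merged.items()}
        self.params_default = [d for (_, _, d) in merged.values()]

    def validate(self) -> None:
        # Handshake: every computed param must be an input the SSM expects
        missing = [p for p in self.computed_params if p not in self.ssm_params]
        if missing:
            raise ValueError(f"computed params not accepted by the SSM: {missing}")
        if len(self.list_params) != len(self.params_default):
            raise ValueError("list_params and params_default lengths differ")
        for name, default in zip(self.list_params, self.params_default):
            lo, hi = self.bounds[name]
            if not lo <= default <= hi:
                raise ValueError(f"default for {name!r} is outside its bounds")

    def to_dict(self) -> dict:
        # Hypothetical export shape; HSSM's RLSSMConfig schema may differ
        return {"list_params": self.list_params, "bounds": self.bounds,
                "params_default": self.params_default}


cfg = ModelConfigSketch(
    learning_params={"rl_alpha": (0.0, 1.0, 0.3), "scaler": (0.0, 10.0, 2.0)},
    # Angle-model parameter names; the bounds/defaults here are made up
    ssm_params={"v": (-3.0, 3.0, 0.0), "a": (0.3, 3.0, 1.0), "z": (0.1, 0.9, 0.5),
                "t": (0.001, 2.0, 0.3), "theta": (-0.1, 1.3, 0.0)},
)
cfg.validate()
print(cfg.list_params)  # ['rl_alpha', 'scaler', 'a', 'z', 't', 'theta']
```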
Milestone 1, Commit 4: Implements the core RLSSMSimulator class that runs the trial-by-trial interleaved loop: compute SSM params from the learning state, simulate one SSM trial, observe the choice, generate a reward, update learning. Reuses ssm-simulators' existing simulator() with n_samples=1, so all 40+ SSM models work as decision processes. Includes 15 tests covering DataFrame output shape, balanced panel, reproducibility, theta validation, edge cases, and omission handling.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
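The five-step interleaved loop can be sketched end to end with a toy decision process. To keep the example self-contained, `stub_ssm_trial` is a hypothetical stand-in for a single `simulator()` draw from ssm-simulators (a logistic choice rule plus a gamma-distributed RT), not an SSM. The initial Q-values of 0.5, the scaled-difference drift, and the mapping of response `-1`/`1` to arm `0`/`1` are likewise assumptions.

```python
import numpy as np


def stub_ssm_trial(v: float, rng: np.random.Generator) -> tuple[float, int]:
    """Stand-in for one SSM draw (NOT ssm-simulators): returns (rt, choice)."""
    p_upper = 1.0 / (1.0 + np.exp(-v))      # toy logistic choice rule
    choice = 1 if rng.random() < p_upper else -1
    rt = 0.3 + rng.gamma(2.0, 0.25)          # toy RT: offset plus gamma noise
    return float(rt), choice


def simulate_rlssm(theta: dict, n_trials: int,
                   reward_probs=(0.8, 0.2), seed: int = 0) -> dict:
    rng = np.random.default_rng(seed)
    q = np.full(2, 0.5)                      # assumed initial Q-values
    out = {"trial": [], "rt": [], "response": [], "feedback": []}
    for trial in range(n_trials):
        # 1. SSM params from the current learning state
        v = theta["scaler"] * (q[1] - q[0])
        # 2. Simulate a single SSM trial
        rt, choice = stub_ssm_trial(v, rng)
        # 3. Map the SSM response (-1 / 1) to a bandit arm (0 / 1)
        action = 1 if choice == 1 else 0
        # 4. Generate a Bernoulli reward for the chosen arm
        reward = float(rng.random() < reward_probs[action])
        # 5. Delta-rule update of the chosen arm only
        q[action] += theta["rl_alpha"] * (reward - q[action])
        for key, val in zip(out, (trial, rt, choice, reward)):
            out[key].append(val)
    return {key: np.asarray(vals) for key, vals in out.items()}
```

The result is a dict of equal-length arrays, so `pandas.DataFrame(result)` gives the trial-wise table. Because every random draw comes from one seeded generator, the same `seed` reproduces the same run.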
Milestone 1, Commit 5: Adds the preset registry (register/get/list_rlssm_preset) with the rlssm1 preset (RW delta rule + angle SSM + two-armed bandit). Wires up the ssms.rl public API with full __all__ exports and adds `from . import rl` to ssms/__init__.py. Fixes a circular import in rl_simulator.py (OMISSION_SENTINEL). Includes 13 contract tests for HSSM compatibility (output dtypes, no NaNs, config dict schema) and registry smoke tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
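A name-keyed preset registry of this kind can be sketched with a module-level dict and three small functions. The function names `register_preset`, `get_preset`, and `list_presets`, the `overwrite` flag, and the spec layout are hypothetical; only the `rlssm1` components (RW delta rule, angle SSM, two-armed bandit) come from the commit message.

```python
from typing import Any

# Module-level registry: preset name -> component spec (hypothetical layout)
_PRESETS: dict[str, dict[str, Any]] = {}


def register_preset(name: str, spec: dict[str, Any], *, overwrite: bool = False) -> None:
    if name in _PRESETS and not overwrite:
        raise ValueError(f"preset {name!r} is already registered")
    _PRESETS[name] = dict(spec)  # store a copy so later caller edits don't leak in


def get_preset(name: str) -> dict[str, Any]:
    if name not in _PRESETS:
        raise KeyError(f"unknown preset {name!r}; available: {list_presets()}")
    return dict(_PRESETS[name])  # return a copy so the registry stays unchanged


def list_presets() -> list[str]:
    return sorted(_PRESETS)


# Built-in preset mirroring the components named in the commit message
register_preset("rlssm1", {
    "learning": "rescorla_wagner_delta_rule",
    "decision": "angle",
    "task": "two_armed_bandit",
})
```

Refusing silent overwrites by default protects built-in presets from being clobbered by user code that happens to reuse a name.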
Pull request overview
This PR introduces a new modular RLSSM simulation framework under ssms.rl, designed to interleave trial-wise reinforcement learning updates with existing SSM decision simulators and to export HSSM-compatible configuration/data.
Changes:
- Added `ssms.rl` core components: `ModelConfig`, `Simulator`, task environments (bandits), learning rules (Rescorla–Wagner variants), and a preset registry.
- Added comprehensive RLSSM-focused tests (learning, env, simulator behavior, and HSSM compatibility/contract checks).
- Added an MkDocs tutorial entry for the new RLSSM simulator workflow.
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `ssms/rl/config.py` | Defines the structural RLSSM configuration and HSSM export support. |
| `ssms/rl/simulator.py` | Implements the interleaved learning + SSM simulation loop and output formatting. |
| `ssms/rl/env.py` | Adds a task environment protocol plus Bernoulli/Gaussian bandit implementations and task registry. |
| `ssms/rl/learning.py` | Adds the learning process protocol and Rescorla–Wagner learning rules. |
| `ssms/rl/preset.py` | Adds an RLSSM preset registry and a built-in `rlssm1` preset. |
| `ssms/rl/__init__.py` | Exposes the public `ssms.rl` API surface. |
| `ssms/__init__.py` | Re-exports the `rl` module at the package top level. |
| `tests/rl/test_task_environment.py` | Tests bandit environment behavior, validation, and task config building. |
| `tests/rl/test_learning_process.py` | Tests RW learning rules' numerical behavior and protocol compliance. |
| `tests/rl/test_rl_config.py` | Tests config auto-derivation, handshake validation, response mapping, and HSSM dict export. |
| `tests/rl/test_rl_simulator.py` | Tests simulation output schema, reproducibility, omission handling, and response/action mapping. |
| `tests/rl/test_hssm_compatibility.py` | Contract tests for HSSM consumability and preset registry behavior. |
| `tests/rl/__init__.py` | Initializes the RL test package. |
| `mkdocs.yml` | Adds the RLSSM tutorial notebook to the docs nav. |
```python
# list_params / params_default consistency
if self.list_params and self.params_default:
    if len(self.list_params) != len(self.params_default):
        raise ValueError(
            f"list_params length ({len(self.list_params)}) != "
            # ... (remainder not shown in the review excerpt)
```

```python
def _build_bandit(reward: str | None, options: dict) -> TaskEnvironment:
    reward = reward or "bernoulli"
    # ... (remainder not shown in the review excerpt)
```
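The `_build_bandit` excerpt defaults the reward kind to `"bernoulli"`. A minimal sketch of how such a factory could dispatch between the Bernoulli and Gaussian bandits listed for `ssms/rl/env.py` follows; the class names, constructor arguments, the `_REWARD_KINDS` table, and the public name `build_bandit` are all hypothetical, not the PR's implementation.

```python
from __future__ import annotations

import numpy as np


class BernoulliBandit:
    def __init__(self, probs=(0.8, 0.2)):
        self.probs = np.asarray(probs, dtype=float)

    def reward(self, action: int, rng: np.random.Generator) -> float:
        # 1.0 with probability probs[action], else 0.0
        return float(rng.random() < self.probs[action])


class GaussianBandit:
    def __init__(self, means=(1.0, 0.0), sd: float = 1.0):
        self.means = np.asarray(means, dtype=float)
        self.sd = float(sd)

    def reward(self, action: int, rng: np.random.Generator) -> float:
        # Normally distributed payoff centred on the chosen arm's mean
        return float(rng.normal(self.means[action], self.sd))


# Reward-kind name -> environment class (hypothetical registry)
_REWARD_KINDS = {"bernoulli": BernoulliBandit, "gaussian": GaussianBandit}


def build_bandit(reward: str | None, options: dict):
    reward = reward or "bernoulli"  # same default as the excerpt
    if reward not in _REWARD_KINDS:
        raise ValueError(f"unknown reward kind {reward!r}; "
                         f"expected one of {sorted(_REWARD_KINDS)}")
    return _REWARD_KINDS[reward](**options)
```

A lookup table keeps the factory closed to edits: adding a new reward kind means registering one more class rather than growing an `if`/`elif` chain.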
Addresses #279
Summary
- `ssms.rl` simulator framework that composes learning processes, task environments, and existing SSM decision processes

Why
This fills the simulation-side gap for RLSSM workflows by letting users generate HSSM-compatible trial-wise RLSSM datasets without adding new Cython simulator code.
Validation
- `uv run pre-commit run --all-files`
- `uv run pytest tests/rl tests/test_hssm_support.py tests/test_simulator.py -q --no-cov`
- `MPLCONFIGDIR=/tmp/.mpl uv run --extra docs mkdocs build`

Notes

- … (`AGENTS.md`, `CONTEXT.md`) were left out of this PR.