Add modular RLSSM simulator framework #278

Open
krishnbera wants to merge 11 commits into main from feature/rlssm-simulator


krishnbera commented May 19, 2026

Addresses #279

Summary

  • add a modular ssms.rl simulator framework that composes learning processes, task environments, and existing SSM decision processes
  • add Rescorla-Wagner learning rules, generic Bernoulli/Gaussian bandit environments, response-label to action-index mapping, and HSSM config export support
  • add RLSSM tests and a rendered MkDocs tutorial under Core Tutorials
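The response-label-to-action-index mapping bridges two conventions: an SSM emits response labels (assumed here to be -1 and 1 for a two-boundary model), while a bandit indexes its arms from 0. Below is a minimal sketch of that translation. `make_response_mapping` and `to_action` are hypothetical helpers, not the `ssms.rl` API, and using a label's list position as its action index is an assumption.

```python
from __future__ import annotations

from typing import Sequence


def make_response_mapping(response_labels: Sequence[int]) -> dict[int, int]:
    """Hypothetical helper: map each SSM response label to a 0-based action index.

    A label's position in ``response_labels`` becomes its action index, so
    ``[-1, 1]`` sends the lower boundary to action 0 and the upper to action 1.
    """
    mapping = {label: idx for idx, label in enumerate(response_labels)}
    if len(mapping) != len(response_labels):
        raise ValueError(f"response labels must be unique, got {list(response_labels)}")
    return mapping


def to_action(response: int, mapping: dict[int, int]) -> int:
    """Translate one simulated response into the action index the task expects."""
    try:
        return mapping[response]
    except KeyError:
        raise ValueError(f"unexpected response label {response!r}") from None


mapping = make_response_mapping([-1, 1])
print(mapping)                 # {-1: 0, 1: 1}
print(to_action(-1, mapping))  # 0
```

Failing loudly on an unknown label is a deliberate choice in this sketch. A silent fallback would feed the wrong arm into that trial's learning update.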

Why

This fills the simulation-side gap for RLSSM workflows by letting users generate HSSM-compatible trial-wise RLSSM datasets without adding new Cython simulator code.

Validation

  • uv run pre-commit run --all-files
  • uv run pytest tests/rl tests/test_hssm_support.py tests/test_simulator.py -q --no-cov
  • MPLCONFIGDIR=/tmp/.mpl uv run --extra docs mkdocs build

Notes

  • The docs build currently emits pre-existing MkDocs/autorefs warnings unrelated to the new RLSSM tutorial.
  • Untracked local instruction files (AGENTS.md, CONTEXT.md) were left out of this PR.

krishnbera and others added 10 commits May 13, 2026 14:45
Milestone 1, Commit 1: Defines the LearningProcess protocol (the handshake
contract between learning and decision processes) and the first built-in
implementation — RescorlaWagnerDeltaRule — which is numerically equivalent
to HSSM's compute_v_trial_wise(). Includes 13 unit tests covering Q-value
trajectories, drift ordering, HSSM numerical equivalence, and protocol compliance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
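The handshake this commit describes can be sketched as follows: a learning process turns its internal state into SSM parameters, then updates on the observed choice and reward. Several details are illustrative assumptions and do not reproduce the signatures of `RescorlaWagnerDeltaRule` or HSSM's `compute_v_trial_wise()`. These are the class names `LearningProcessSketch` and `DeltaRuleSketch`, the method names, the 0.5 starting value, and the drift mapping `v = scaler * (Q[1] - Q[0])`.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class LearningProcessSketch(Protocol):
    """Hypothetical handshake: learning state in, SSM parameters out."""

    def reset(self) -> None: ...

    def decision_params(self) -> dict[str, float]: ...

    def update(self, action: int, reward: float) -> None: ...


class DeltaRuleSketch:
    """Two-armed Rescorla-Wagner delta rule (illustrative, not the ssms.rl class)."""

    def __init__(self, alpha: float, scaler: float, q_init: float = 0.5) -> None:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError(f"alpha must lie in [0, 1], got {alpha}")
        self.alpha = alpha    # learning rate
        self.scaler = scaler  # maps the Q-value difference onto the drift rate
        self.q_init = q_init  # assumed starting value for both arms
        self.reset()

    def reset(self) -> None:
        self.q = [self.q_init, self.q_init]

    def decision_params(self) -> dict[str, float]:
        # Assuming upper-boundary responses map to action 1, a positive drift favours it.
        return {"v": self.scaler * (self.q[1] - self.q[0])}

    def update(self, action: int, reward: float) -> None:
        # The prediction error moves only the chosen arm's value toward the reward.
        self.q[action] += self.alpha * (reward - self.q[action])


rule = DeltaRuleSketch(alpha=0.2, scaler=3.0)
rule.update(action=1, reward=1.0)  # q[1] = 0.5 + 0.2 * (1.0 - 0.5) = 0.6
print(rule.decision_params())      # v is approximately 3.0 * (0.6 - 0.5) = 0.3
```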
Milestone 1, Commit 2: Defines the TaskEnvironment protocol for reward
generation, TwoArmedBandit (Bernoulli bandit with configurable per-arm
probabilities), and TaskConfig convenience dataclass for common paradigms.
Includes 14 unit tests covering reward statistics, reproducibility,
input validation, protocol compliance, and TaskConfig builder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
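Below is a sketch of a reward-generating environment in the spirit of `TwoArmedBandit`. `TaskEnvironmentSketch`, `BernoulliBanditSketch` and the `reward(action, rng)` method are hypothetical names. The sketch uses the standard-library `random.Random` so that it has no third-party dependencies.

```python
from __future__ import annotations

import random
from typing import Protocol, Sequence


class TaskEnvironmentSketch(Protocol):
    """Hypothetical reward-generation contract: action in, reward out."""

    n_actions: int

    def reward(self, action: int, rng: random.Random) -> float: ...


class BernoulliBanditSketch:
    """Arm ``k`` pays 1.0 with probability ``reward_probs[k]``, otherwise 0.0."""

    def __init__(self, reward_probs: Sequence[float]) -> None:
        probs = [float(p) for p in reward_probs]
        if len(probs) < 2:
            raise ValueError("a bandit needs at least two arms")
        if any(not 0.0 <= p <= 1.0 for p in probs):
            raise ValueError(f"reward probabilities must lie in [0, 1], got {probs}")
        self.reward_probs = probs
        self.n_actions = len(probs)

    def reward(self, action: int, rng: random.Random) -> float:
        if not 0 <= action < self.n_actions:
            raise ValueError(f"action {action} outside 0..{self.n_actions - 1}")
        # rng.random() is uniform on [0, 1): p = 1.0 always pays, p = 0.0 never does.
        return float(rng.random() < self.reward_probs[action])


env = BernoulliBanditSketch([0.2, 0.8])
rng = random.Random(0)
rewards = [env.reward(1, rng) for _ in range(10_000)]
print(sum(rewards) / len(rewards))  # close to 0.8
```

The caller passes the generator in rather than the environment seeding itself. This lets one seed drive both the decision and reward streams, which is one way to obtain the reproducibility the commit's tests check.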
Milestone 1, Commit 3: Defines RLSSMModelConfig — the structural model
specification that resolves the handshake between learning process and
decision process (SSM). Auto-derives list_params, bounds, and defaults
from components. Includes validate() for config consistency checking and
to_hssm_config_dict() for bridging to HSSM's RLSSMConfig. 13 tests cover
auto-derivation, handshake validation, computed_param_mapping, TaskConfig
auto-build, and HSSM dict contract.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
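One way to picture the auto-derivation of `list_params`, bounds and defaults is as a merge of the two components' parameters. The merge drops any SSM parameter that the learning process computes per trial. In this sketch `ParamBlock` and `derive_model_params` are hypothetical, and putting learning parameters first is an assumed ordering. The bound and default numbers are placeholders, not the values ssm-simulators ships for `angle`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParamBlock:
    """Hypothetical: parameter names, bounds and defaults from one component."""

    names: tuple[str, ...]
    bounds: dict[str, tuple[float, float]]
    defaults: dict[str, float]


def derive_model_params(learning: ParamBlock, ssm: ParamBlock, computed: set[str]) -> ParamBlock:
    """Merge both components, dropping SSM parameters the learner computes."""
    unknown = computed - set(ssm.names)
    if unknown:
        raise ValueError(f"computed parameters {sorted(unknown)} are not SSM parameters")
    clash = set(learning.names) & set(ssm.names)
    if clash:
        raise ValueError(f"parameter names used by both components: {sorted(clash)}")
    # Assumed ordering: learning parameters first, then the free SSM parameters.
    names = learning.names + tuple(n for n in ssm.names if n not in computed)
    bounds = {**learning.bounds, **ssm.bounds}
    defaults = {**learning.defaults, **ssm.defaults}
    return ParamBlock(
        names=names,
        bounds={n: bounds[n] for n in names},
        defaults={n: defaults[n] for n in names},
    )


# Placeholder numbers for illustration only.
learning = ParamBlock(
    names=("rl_alpha", "scaler"),
    bounds={"rl_alpha": (0.0, 1.0), "scaler": (0.0, 10.0)},
    defaults={"rl_alpha": 0.2, "scaler": 2.0},
)
ssm = ParamBlock(
    names=("v", "a", "z", "t", "theta"),
    bounds={"v": (-3.0, 3.0), "a": (0.3, 3.0), "z": (0.1, 0.9), "t": (0.001, 2.0), "theta": (-0.1, 1.3)},
    defaults={"v": 0.0, "a": 1.0, "z": 0.5, "t": 0.3, "theta": 0.0},
)
print(derive_model_params(learning, ssm, computed={"v"}).names)
# ('rl_alpha', 'scaler', 'a', 'z', 't', 'theta')
```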
Milestone 1, Commit 4: Implements the core RLSSMSimulator class that runs
the trial-by-trial interleaved loop: compute SSM params from learning state,
simulate one SSM trial, observe choice, generate reward, update learning.
Reuses ssm-simulators' existing simulator() with n_samples=1 — all 40+ SSM
models work as decision processes. Includes 15 tests covering DataFrame
output shape, balanced panel, reproducibility, theta validation, edge cases,
and omission handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
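The five-step interleaved loop in this commit can be sketched for a single participant as follows. To stay self-contained, the sketch does not call ssm-simulators' `simulator()`. It uses `toy_decision` instead, a hypothetical logistic-choice / exponential-RT stand-in that is not an SSM. The `OMITTED` marker and the deadline-based omission policy are also assumptions, not the PR's `OMISSION_SENTINEL` behaviour.

```python
from __future__ import annotations

import math
import random

OMITTED = -999.0  # hypothetical marker; ssms.rl defines its own OMISSION_SENTINEL


def toy_decision(v: float, rng: random.Random) -> tuple[float, int]:
    """Stand-in for one SSM trial (NOT an ssm-simulators model): returns (rt, response)."""
    p_upper = 1.0 / (1.0 + math.exp(-v))
    response = 1 if rng.random() < p_upper else -1
    rt = 0.3 + rng.expovariate(2.0)  # fixed offset plus an exponential decision time
    return rt, response


def simulate_subject(
    n_trials: int,
    alpha: float,
    scaler: float,
    reward_probs: tuple[float, float],
    seed: int,
    deadline: float | None = None,
) -> list[dict[str, float | int]]:
    """Interleaved learning/decision loop for one participant (illustrative sketch)."""
    rng = random.Random(seed)
    q = [0.5, 0.5]                   # assumed initial Q-values
    label_to_action = {-1: 0, 1: 1}  # lower boundary -> arm 0, upper -> arm 1
    rows = []
    for trial in range(n_trials):
        v = scaler * (q[1] - q[0])           # 1. SSM parameters from the learning state
        rt, response = toy_decision(v, rng)  # 2. simulate a single decision trial
        if deadline is not None and rt > deadline:
            # Assumed omission policy: no choice, no reward, no learning update.
            rows.append({"trial": trial, "rt": OMITTED, "response": OMITTED, "feedback": OMITTED})
            continue
        action = label_to_action[response]                   # 3. observe the choice
        reward = float(rng.random() < reward_probs[action])  # 4. generate the reward
        q[action] += alpha * (reward - q[action])            # 5. update the learner
        rows.append({"trial": trial, "rt": rt, "response": response, "feedback": reward})
    return rows


for row in simulate_subject(n_trials=5, alpha=0.3, scaler=2.0, reward_probs=(0.2, 0.8), seed=1):
    print(row)
```

Every trial yields a row, omitted or not, so each participant keeps a fixed trial count. This mirrors the balanced-panel property the commit's tests mention.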
Milestone 1, Commit 5: Adds the preset registry (register/get/list_rlssm_preset)
with rlssm1 preset (RW delta rule + angle SSM + two-armed bandit). Wires up the
ssms.rl public API with full __all__ exports and adds `from . import rl` to
ssms/__init__.py. Fixes circular import in rl_simulator.py (OMISSION_SENTINEL).
Includes 13 contract tests for HSSM compatibility (output dtypes, no NaNs,
config dict schema) and registry smoke tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
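A preset registry of the kind described here can be as small as a module-level dictionary behind register/get/list functions. The sketch below uses hypothetical names (`PresetSketch`, `register_preset`, `get_preset`, `list_presets`) and stores components as plain strings. The real `ssms.rl` functions and preset contents may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class PresetSketch:
    """Hypothetical named bundle of components (not the ssms.rl preset type)."""

    name: str
    learning_rule: str
    decision_model: str
    task: str
    options: dict[str, Any] = field(default_factory=dict)


_PRESETS: dict[str, PresetSketch] = {}


def register_preset(preset: PresetSketch, *, overwrite: bool = False) -> None:
    # Refuse silent replacement so two modules cannot clobber the same name.
    if preset.name in _PRESETS and not overwrite:
        raise ValueError(f"preset {preset.name!r} is already registered")
    _PRESETS[preset.name] = preset


def get_preset(name: str) -> PresetSketch:
    try:
        return _PRESETS[name]
    except KeyError:
        raise KeyError(f"unknown preset {name!r}; available: {list_presets()}") from None


def list_presets() -> list[str]:
    return sorted(_PRESETS)


# Mirrors the commit message: RW delta rule + angle SSM + two-armed bandit.
register_preset(PresetSketch("rlssm1", "rescorla_wagner_delta", "angle", "two_armed_bandit"))
print(list_presets())  # ['rlssm1']
```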

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 94.35666% with 25 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ssms/rl/config.py | 92.56% | 11 Missing ⚠️ |
| ssms/rl/env.py | 94.69% | 7 Missing ⚠️ |
| ssms/rl/learning.py | 90.27% | 7 Missing ⚠️ |

| Flag | Coverage Δ |
|---|---|
| unittests | 92.71% <94.35%> (+0.41%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ssms/rl/preset.py | 100.00% <100.00%> (ø) |
| ssms/rl/simulator.py | 100.00% <100.00%> (ø) |
| ssms/rl/env.py | 94.69% <94.69%> (ø) |
| ssms/rl/learning.py | 90.27% <90.27%> (ø) |
| ssms/rl/config.py | 92.56% <92.56%> (ø) |

... and 1 file with indirect coverage changes


krishnbera marked this pull request as ready for review May 19, 2026 23:36
Copilot AI review requested due to automatic review settings May 19, 2026 23:36
Copilot AI left a comment


Pull request overview

This PR introduces a new modular RLSSM simulation framework under ssms.rl, designed to interleave trial-wise reinforcement learning updates with existing SSM decision simulators and to export HSSM-compatible configuration/data.

Changes:

  • Added ssms.rl core components: ModelConfig, Simulator, task environments (bandits), learning rules (Rescorla–Wagner variants), and a preset registry.
  • Added comprehensive RLSSM-focused tests (learning, env, simulator behavior, and HSSM compatibility/contract checks).
  • Added an MkDocs tutorial entry for the new RLSSM simulator workflow.

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| ssms/rl/config.py | Defines the structural RLSSM configuration and HSSM export support. |
| ssms/rl/simulator.py | Implements the interleaved learning + SSM simulation loop and output formatting. |
| ssms/rl/env.py | Adds a task environment protocol plus Bernoulli/Gaussian bandit implementations and task registry. |
| ssms/rl/learning.py | Adds the learning process protocol and Rescorla–Wagner learning rules. |
| ssms/rl/preset.py | Adds an RLSSM preset registry and a built-in rlssm1 preset. |
| ssms/rl/__init__.py | Exposes the public ssms.rl API surface. |
| ssms/__init__.py | Re-exports the rl module at the package top level. |
| tests/rl/test_task_environment.py | Tests bandit environment behavior, validation, and task config building. |
| tests/rl/test_learning_process.py | Tests RW learning rules’ numerical behavior and protocol compliance. |
| tests/rl/test_rl_config.py | Tests config auto-derivation, handshake validation, response mapping, and HSSM dict export. |
| tests/rl/test_rl_simulator.py | Tests simulation output schema, reproducibility, omission handling, and response/action mapping. |
| tests/rl/test_hssm_compatibility.py | Contract tests for HSSM consumability and preset registry behavior. |
| tests/rl/__init__.py | Initializes the RL test package. |
| mkdocs.yml | Adds the RLSSM tutorial notebook to the docs nav. |


Comment threads:
  • ssms/rl/env.py (outdated)
  • ssms/rl/env.py (outdated)
  • ssms/rl/config.py
  • ssms/rl/config.py (outdated)
  • ssms/rl/simulator.py
  • ssms/rl/simulator.py
  • ssms/rl/learning.py
Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 2 comments.

Comment thread ssms/rl/config.py
Comment on lines +307 to +311
# list_params / params_default consistency
if self.list_params and self.params_default:
    if len(self.list_params) != len(self.params_default):
        raise ValueError(
            f"list_params length ({len(self.list_params)}) != "
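The thread above shows only the start of the `list_params`/`params_default` length check. As a hedged illustration of where such validation can go, consider the standalone `check_param_consistency` below, a hypothetical function and not the code in `config.py`. It repeats that guard and adds three checks of its own: duplicate names, bounds coverage, and defaults inside bounds. All problems are reported in one `ValueError`.

```python
from __future__ import annotations

from typing import Sequence


def check_param_consistency(
    list_params: Sequence[str],
    params_default: Sequence[float],
    bounds: dict[str, tuple[float, float]],
) -> None:
    """Hypothetical validator: collect every problem, then raise once."""
    problems: list[str] = []
    # Same guard as the review snippet: compare lengths only when both are given.
    if list_params and params_default and len(list_params) != len(params_default):
        problems.append(
            f"list_params length ({len(list_params)}) != "
            f"params_default length ({len(params_default)})"
        )
    # Extra checks added for this sketch.
    if len(set(list_params)) != len(list_params):
        problems.append("list_params contains duplicate names")
    missing = [name for name in list_params if name not in bounds]
    if missing:
        problems.append(f"no bounds for {missing}")
    if len(list_params) == len(params_default):
        for name, value in zip(list_params, params_default):
            if name in bounds:
                low, high = bounds[name]
                if not low <= value <= high:
                    problems.append(f"default {name}={value} outside [{low}, {high}]")
    if problems:
        raise ValueError("; ".join(problems))


# Passes silently: lengths match and both defaults sit inside their bounds.
check_param_consistency(["rl_alpha", "a"], [0.2, 1.0], {"rl_alpha": (0.0, 1.0), "a": (0.3, 3.0)})
```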
Comment thread ssms/rl/env.py


def _build_bandit(reward: str | None, options: dict) -> TaskEnvironment:
    reward = reward or "bernoulli"
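The `_build_bandit` fragment above falls back to a Bernoulli reward when none is given. One way such a factory could dispatch between the Bernoulli and Gaussian bandits named in this PR is a lookup table of builders. In this sketch the classes, the `build_bandit_sketch` name and the option keys (`reward_probs`, `means`, `sds`) are hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class BernoulliArms:
    probs: list[float]

    def reward(self, action: int, rng: random.Random) -> float:
        return float(rng.random() < self.probs[action])


@dataclass
class GaussianArms:
    means: list[float]
    sds: list[float]

    def reward(self, action: int, rng: random.Random) -> float:
        return rng.gauss(self.means[action], self.sds[action])


# Hypothetical option keys; the real ssms.rl env module may use different names.
_BUILDERS: dict[str, Callable[[dict[str, Any]], Any]] = {
    "bernoulli": lambda opts: BernoulliArms(list(opts["reward_probs"])),
    "gaussian": lambda opts: GaussianArms(
        list(opts["means"]), list(opts.get("sds", [1.0] * len(opts["means"])))
    ),
}


def build_bandit_sketch(reward: str | None, options: dict[str, Any]) -> Any:
    reward = reward or "bernoulli"  # same fallback as the fragment above
    try:
        builder = _BUILDERS[reward]
    except KeyError:
        raise ValueError(
            f"unknown reward type {reward!r}; expected one of {sorted(_BUILDERS)}"
        ) from None
    return builder(options)


env = build_bandit_sketch(None, {"reward_probs": [0.3, 0.7]})
print(type(env).__name__)  # BernoulliArms
```

With a dictionary of builders, each new reward type is one extra entry. The error for an unknown type also lists every valid choice in one place.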