reversion-study

Anthropic's Nov 2025 post described an external-state pattern for long-running coding agents. This repo implements a minimal version of that pattern and measures one failure mode it may help constrain: reversion.

What reversion means here

A reversion event occurs when a session, while assigned to work on one feature, modifies a file that feature_to_files.json explicitly maps to a different, already-passing feature.

The task may still complete despite such events, so completion metrics alone do not capture this instability.
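A minimal sketch of a detector for this definition, under stated assumptions: feature_to_files.json is assumed to load as a dict from feature name to a list of file paths, the set of files a session modified is assumed to be known already (for example from a diff), and one event is counted per modified file. detect_reversions and ReversionEvent are hypothetical names, not this repo's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ReversionEvent:
    session_id: str
    assigned_feature: str
    path: str
    touched_features: tuple[str, ...]  # already-passing features that map to `path`


def detect_reversions(
    session_id: str,
    assigned_feature: str,
    modified_files: set[str],
    passing_features: set[str],
    feature_to_files: dict[str, list[str]],  # assumed schema: {"feature": ["file", ...]}
) -> list[ReversionEvent]:
    """Return one event per modified file mapped to another already-passing feature.

    Counting one event per file (not per feature/file pair) is an assumption.
    """
    events: list[ReversionEvent] = []
    for path in sorted(modified_files):
        touched = tuple(
            sorted(
                feature
                for feature in passing_features
                if feature != assigned_feature  # the session's own feature is fair game
                and path in feature_to_files.get(feature, [])
            )
        )
        if touched:
            events.append(ReversionEvent(session_id, assigned_feature, path, touched))
    return events
```

For example, with a hypothetical mapping where "billing" (passing) and "search" (assigned) share models.py, a search session that edits models.py yields one event touching ("billing",), while editing only search.py yields none.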

What this repo does

Implements a minimal external-state harness
Assigns one feature per session with explicit file boundaries
Measures reversion_rate = events / completed_features
Compares harness arm vs baseline arm on one synthetic task
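The reversion_rate metric and the per-arm comparison can be sketched as below. Returning None when no features completed, and representing each arm as an (events, completed_features) pair, are assumptions for illustration; reversion_rate and compare_arms are hypothetical helpers, not necessarily how agent.py computes the metric.

```python
from __future__ import annotations


def reversion_rate(events: int, completed_features: int) -> float | None:
    """reversion_rate = events / completed_features; None if nothing completed."""
    if events < 0 or completed_features < 0:
        raise ValueError("counts must be non-negative")
    if completed_features == 0:
        return None  # undefined: no completed features to normalize by
    return events / completed_features


def compare_arms(
    harness: tuple[int, int], baseline: tuple[int, int]
) -> dict[str, float | None]:
    """Each arm is an (events, completed_features) pair from a run on the same task."""
    return {
        "harness": reversion_rate(*harness),
        "baseline": reversion_rate(*baseline),
    }
```

Normalizing by completed features rather than by sessions keeps an arm that finishes more features from looking worse simply because it ran longer.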

Results

BENCHMARK.md is a placeholder. Run python3 agent.py --task broken_saas --run both to generate real numbers.

The dry run shows synthetic mock data only; see the banner in its output.

Running

pip install -r requirements.txt
python3 -m pytest tests/ -v
python3 agent.py --task broken_saas --dry-run

Real runs (these require ANTHROPIC_API_KEY):

python3 agent.py --task broken_saas --run harness
python3 agent.py --task broken_saas --run baseline
python3 agent.py --task broken_saas --run both

Limitations

Single synthetic task. Hand-authored file mapping. Same model in both arms. See ARCHITECTURE.md.
