This folder is a standalone replication bundle for the chunk-elicitation simulation and analysis pipeline. Simulation outputs are saved as local JSON files, and the analysis script reads those files to regenerate LaTeX tables and figures.
src/db_ops/: local JSON database layer with a small PyMongo-like API.
scripts/01_Process_Benchmark.py: rebuilds processed benchmark JSON from raw benchmark files.
scripts/02_Run_Experiment_1.py: runs Experiment 1 into data/exp1/.
scripts/03_Run_Experiment_2.py: runs Experiment 2 into data/exp2/.
scripts/04_Run_Experiment_3.py: runs Experiment 3 into data/exp3/.
scripts/05_Run_Analysis.py: reads local JSON and writes tex/ artifacts.
data/raw/: duplicated raw benchmark inputs used by the benchmark processor.
data/benchmark/benchmarks.json: generated benchmark records.
data/exp*/simulations.json: simulation-level records.
data/exp*/simulation_sessions.json: session-level model outputs.
tex/tables/ and tex/figs/: analysis outputs for paper tables/figures.
From this folder:
uv sync
cp .env.example .env
Fill in OPENROUTER_API_KEY in .env if you plan to run new simulations.
The included runner scripts default to OpenRouter model IDs.
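
Before launching a paid run, it can be useful to confirm that the key is actually visible. The snippet below is a minimal stdlib-only sketch, not part of the bundle: `read_env_file` and `has_api_key` are hypothetical helper names, and how the runner scripts themselves load `.env` is not documented here, so treat the lookup order (process environment first, then the file) as an assumption.

```python
import os
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines; skips blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(env_path=".env", name="OPENROUTER_API_KEY"):
    """Return True if the variable is set in the environment or in the env file."""
    if os.environ.get(name):
        return True
    env_file = Path(env_path)
    return env_file.is_file() and bool(read_env_file(env_file).get(name))


# Hypothetical usage: report whether a key was found, without printing it.
print("API key found" if has_api_key() else "API key missing")
```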
The raw benchmark inputs are duplicated inside this replication folder under
data/raw/. The benchmark processor does not read from the main project and
does not use MongoDB. It rebuilds the processed benchmark JSON from scratch.
uv run python scripts/01_Process_Benchmark.py
This overwrites:
data/benchmark/benchmarks.json
data/benchmark/benchmark_manifest.json
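
A quick sanity check of the regenerated records can be sketched as follows. This is an illustration rather than code from the bundle: `load_records` and `summarize` are hypothetical names, the only fields used are the `game_type` and `decisions` fields described in this section, and the assumption that `benchmarks.json` is a top-level JSON array has not been verified.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Load a JSON file assumed to contain a top-level array of records."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(records):
    """Count records and decision rows per game_type."""
    records_per_game = Counter()
    decisions_per_game = Counter()
    for record in records:
        records_per_game[record["game_type"]] += 1
        decisions_per_game[record["game_type"]] += len(record["decisions"])
    return records_per_game, decisions_per_game


# In-memory sample mirroring the documented record shape.
sample = [{"_id": "uuid", "game_type": "Dictator", "decisions": [[50], [0], [20]]}]
records_per_game, decisions_per_game = summarize(sample)
print(dict(records_per_game), dict(decisions_per_game))
```

To run it against real output, replace `sample` with `load_records("data/benchmark/benchmarks.json")`, provided the file layout matches the assumption above.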
The JSON shape matches the main project benchmark documents:
{
  "_id": "uuid",
  "game_type": "Dictator",
  "decisions": [[50], [0], [20]]
}
First inspect each plan without calling any LLM APIs:
uv run python scripts/02_Run_Experiment_1.py --dry-run
uv run python scripts/03_Run_Experiment_2.py --dry-run
uv run python scripts/04_Run_Experiment_3.py --dry-run
Then run a script with confirmation skipped:
uv run python scripts/02_Run_Experiment_1.py --yes --max-workers 1
Each runner saves to its experiment folder. For example, Experiment 1 writes
data/exp1/simulations.json and data/exp1/simulation_sessions.json.
The top-level output folder is controlled by --data-root:
uv run python scripts/03_Run_Experiment_2.py --data-root data --yes
After rebuilding benchmarks and running simulations:
uv run python scripts/05_Run_Analysis.py
The analysis script reads data/exp1, data/exp2, data/exp3, and
data/benchmark, then writes:
tex/tables/*.tex
tex/figs/*.png
tex/result.tex
The generated tex/result.tex is a compact article-style wrapper that inputs
the tables and figures.
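
Because the analysis depends on files produced by the earlier steps, a pre-flight check can catch a missing experiment before any tables are regenerated. The sketch below is an assumption-laden helper, not part of the bundle: `expected_inputs` and `missing_inputs` are hypothetical names, and the file list is inferred from the paths named in this README.

```python
from pathlib import Path

EXPERIMENTS = ("exp1", "exp2", "exp3")
COLLECTIONS = ("simulations.json", "simulation_sessions.json")


def expected_inputs(data_root="data"):
    """List the JSON files the analysis is described as reading."""
    root = Path(data_root)
    paths = [root / "benchmark" / "benchmarks.json"]
    for experiment in EXPERIMENTS:
        for collection in COLLECTIONS:
            paths.append(root / experiment / collection)
    return paths


def missing_inputs(data_root="data"):
    """Return the expected files that do not exist on disk."""
    return [path for path in expected_inputs(data_root) if not path.is_file()]


# Report anything that still needs to be generated.
for path in missing_inputs("data"):
    print(f"missing: {path}")
```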
Each experiment folder stores two collections:
simulations.json: one record per simulation configuration.
simulation_sessions.json: one record per LLM call/session.
The simulation record keeps references to session IDs:
{
  "_id": "simulation uuid",
  "phase_name": "phase_2",
  "simulation_config": {"game_type": "Dictator"},
  "instruction_config": {"explain_reasoning": true},
  "llm_config": {"model": "openai/gpt-5.2"},
  "simulation_sessions": ["session uuid"],
  "failed_sessions": [],
  "completed": true
}
The local database layer supports the small subset of queries used by the
simulation and analysis code: find, find_one, dotted keys, $in, $or,
and $exists.
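
To make the query subset concrete, here is a minimal sketch of how such matching could work over a list of dicts. It is not the code in src/db_ops/: the function names and signatures are hypothetical, `$in` is simplified to a plain membership test on the field value, and the sample records (including the "Ultimatum" game type and the bare session documents) are invented for illustration.

```python
_MISSING = object()  # sentinel for "field not present"


def get_path(doc, dotted_key):
    """Resolve 'a.b.c' against nested dicts; return _MISSING if a step is absent."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for key, condition in query.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in condition):
                return False
            continue
        value = get_path(doc, key)
        if isinstance(condition, dict) and any(k.startswith("$") for k in condition):
            for operator, argument in condition.items():
                if operator == "$in":
                    if value is _MISSING or value not in argument:
                        return False
                elif operator == "$exists":
                    if (value is not _MISSING) != bool(argument):
                        return False
                else:
                    raise ValueError(f"unsupported operator: {operator}")
        elif value is _MISSING or value != condition:
            return False
    return True


def find(records, query=None):
    """Return all matching records as a list."""
    return [doc for doc in records if matches(doc, query or {})]


def find_one(records, query=None):
    """Return the first matching record, or None."""
    return next((doc for doc in records if matches(doc, query or {})), None)


# Illustrative data only; field names follow the simulation record shown above.
simulations = [
    {"_id": "sim-1", "phase_name": "phase_2",
     "simulation_config": {"game_type": "Dictator"},
     "simulation_sessions": ["s-1", "s-2"], "failed_sessions": [], "completed": True},
    {"_id": "sim-2", "phase_name": "phase_1",
     "simulation_config": {"game_type": "Ultimatum"},
     "simulation_sessions": ["s-3"], "completed": False},
]
sessions = [{"_id": "s-1"}, {"_id": "s-2"}, {"_id": "s-3"}]

# Dotted key lookup, then follow the session ID references with $in.
dictator = find_one(simulations, {"simulation_config.game_type": "Dictator"})
linked = find(sessions, {"_id": {"$in": dictator["simulation_sessions"]}})
either = find(simulations, {"$or": [{"completed": True}, {"phase_name": "phase_1"}]})
no_failures_field = find(simulations, {"failed_sessions": {"$exists": False}})
```

The second query shows the intended use of the session ID references: a simulation record is fetched first, and its `simulation_sessions` list then selects the matching session records.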