Tsimulation: balanced data collection (circle_small + chunked replay) #497
ElmoPA wants to merge 1 commit into
- scripted_collect: BucketTracker + per-frame replay drift validation (DEFAULT_REPLAY_DRIFT_THRESHOLD=0.5, shared _replay_step_loop with replay_zarr); rejection reasons surfaced to caller.
- collect/balance.py: bucket assignment helpers used to balance scripted collections across coverage buckets.
- collect/gympusht_collect.py: gym-pushT collection front-end.
- mouse_collect: expanded recording flow with bucket awareness.
- zarr_writer: one-chunk-per-array bulk-write path (matches scripts/rechunk_zarr_dataset.py output).
- pushshapes/obstacles.py: obstacle-level definitions feeding new collection variants.
- pushshapes/env.py, shapes.py, render.py: support circle_small pusher and align rendering with new shape set.
- examples/play_random.py: '--pusher circle_small' option.
- examples/replay_zarr.py: _replay_step_loop with early-stop drift check.
- README + SCHEMA_NOTES: document the compact one-chunk layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude Code Review
Review of PR #497: Tsimulation balanced data collection
Summary
Adds bucket-balanced data collection (16-cell quadrant grid), a
Key concerns
Suggestions
Verdict: Request Changes
The L-inf/L2 docstring mismatch and the unconfirmed dataloader compatibility for the bulk one-chunk layout are the blocking items. The bucket-balancing and replay-validation logic itself is well-structured and the shared
Reviewed by Claude
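To illustrate why an L-inf/L2 mismatch between docstring and code can matter for a fixed threshold, here is a minimal sketch. The function names and the sample states are hypothetical and are not taken from the PR; only the 0.5 threshold comes from the description. It shows one frame that passes under one norm and fails under the other.

```python
import math

# Mirrors DEFAULT_REPLAY_DRIFT_THRESHOLD=0.5 from the PR description.
DRIFT_THRESHOLD = 0.5


def drift_linf(recorded, replayed):
    """Largest absolute per-coordinate difference (L-infinity norm)."""
    return max(abs(a - b) for a, b in zip(recorded, replayed))


def drift_l2(recorded, replayed):
    """Euclidean distance between the two states (L2 norm)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(recorded, replayed)))


# Hypothetical 2-D positions: the replay is off by 0.4 on each axis.
recorded = (10.0, 20.0)
replayed = (10.4, 20.4)

linf = drift_linf(recorded, replayed)
l2 = drift_l2(recorded, replayed)

print(f"L-inf drift = {linf:.3f}, below threshold: {linf < DRIFT_THRESHOLD}")
print(f"L2 drift    = {l2:.3f}, below threshold: {l2 < DRIFT_THRESHOLD}")
```

With these sample values the same frame is accepted under L-inf and rejected under L2, so the docstring and the code should agree on which norm the threshold applies to.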
Summary
Stacked on top of #477. Adds the data-collection updates for the circle_small pusher and the bulk one-chunk-per-array Zarr layout.
- scripted_collect.py: BucketTracker + per-frame replay drift validation against a recorded episode (DEFAULT_REPLAY_DRIFT_THRESHOLD=0.5), sharing _replay_step_loop with examples/replay_zarr.py. Episodes are rejected with explicit reasons (bucket_full / low_coverage / replay_drift).
- collect/balance.py (new): coverage-bucket helpers (BucketTracker, bucket_for, count_existing_buckets, N_BUCKETS).
- collect/gympusht_collect.py (new): gym-pushT collection front-end.
- mouse_collect.py: extended manual recording flow with bucket awareness.
- zarr_writer.py: one-chunk-per-array bulk-write path (matches the layout produced by scripts/rechunk_zarr_dataset.py).
- pushshapes/: circle_small pusher (env.py, shapes.py, render.py) + obstacle-level definitions (obstacles.py).
- examples/play_random.py: --pusher circle_small choice.
- examples/replay_zarr.py: factored _replay_step_loop with early-stop drift check (shared with scripted_collect).
- README.md + SCHEMA_NOTES.md: document the compact one-chunk bulk-write layout.

13 files changed, +1753 / -91.
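The bucket-balancing and rejection flow described above could look roughly like the sketch below. Everything in it is an assumption for illustration: the mapping of two quadrant indices to 16 buckets, the arena size, the coverage cutoff, the order of the checks, and the names quadrant, bucket_index and QuotaTracker. None of these are the signatures in collect/balance.py. Only the three rejection reason strings and the 0.5 drift threshold come from the description.

```python
from collections import Counter

N_BUCKETS = 16        # assumed: 4 quadrants x 4 quadrants
ARENA_SIZE = 512.0    # hypothetical side length of a square workspace


def quadrant(x, y, size=ARENA_SIZE):
    """Return 0..3 for the quadrant of (x, y) inside a size x size square."""
    half = size / 2.0
    return (1 if x >= half else 0) + (2 if y >= half else 0)


def bucket_index(pusher_xy, block_xy):
    """Combine two quadrant indices into one bucket id in [0, N_BUCKETS)."""
    return 4 * quadrant(*pusher_xy) + quadrant(*block_xy)


class QuotaTracker:
    """Accept episodes until every bucket holds `per_bucket` of them."""

    def __init__(self, per_bucket):
        self.per_bucket = per_bucket
        self.counts = Counter()

    def try_add(self, bucket, coverage, drift_max,
                min_coverage=0.9, drift_threshold=0.5):
        """Return (accepted, reason); reason is None when accepted."""
        if self.counts[bucket] >= self.per_bucket:
            return False, "bucket_full"
        if coverage < min_coverage:
            return False, "low_coverage"
        if drift_max >= drift_threshold:
            return False, "replay_drift"
        self.counts[bucket] += 1
        return True, None

    def is_balanced(self):
        """True once every bucket has reached its quota."""
        return all(self.counts[b] >= self.per_bucket for b in range(N_BUCKETS))


tracker = QuotaTracker(per_bucket=1)
b = bucket_index((100.0, 100.0), (400.0, 400.0))
first = tracker.try_add(b, coverage=0.95, drift_max=0.1)
second = tracker.try_add(b, coverage=0.95, drift_max=0.1)
drifted = tracker.try_add(0, coverage=0.95, drift_max=0.7)
print(b, first, second, drifted)
```

Returning the reason alongside the accept flag lets the caller log why an episode was dropped, which is the behaviour the description calls "rejected with explicit reasons".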
Test plan
- python -m Tsimulation.examples.play_random --pusher circle_small runs without error
- python -m Tsimulation.collect.scripted_collect --num-episodes 10 produces a balanced bucket distribution
- python -m Tsimulation.examples.replay_zarr <ep.zarr> reports drift_max < 0.5 on a freshly collected episode
- ZarrDataset / packed dataloader with no schema drift

🤖 Generated with Claude Code
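The drift_max check in the test plan can be pictured as a per-frame replay loop that stops as soon as the threshold is crossed. The sketch below is a toy under stated assumptions: replay_drift, step and biased_step are hypothetical names, the "simulator" is plain addition, and the per-frame drift uses the L-inf norm, which may not be the norm the real _replay_step_loop applies.

```python
def replay_drift(recorded_states, step_fn, actions, initial_state, threshold=0.5):
    """Replay actions and compare each frame to the recording.

    Returns (drift_max, frames_checked, ok). Stops early at the first
    frame whose drift reaches the threshold.
    """
    state = initial_state
    drift_max = 0.0
    frames_checked = 0
    for action, expected in zip(actions, recorded_states):
        state = step_fn(state, action)
        frames_checked += 1
        # Assumed metric: largest per-coordinate deviation (L-inf).
        drift = max(abs(s - e) for s, e in zip(state, expected))
        drift_max = max(drift_max, drift)
        if drift >= threshold:
            return drift_max, frames_checked, False
    return drift_max, frames_checked, True


def step(state, action):
    """Toy deterministic dynamics: position += action."""
    return (state[0] + action[0], state[1] + action[1])


def biased_step(state, action):
    """Same dynamics with a 0.2 error on x per step, to simulate drift."""
    return (state[0] + action[0] + 0.2, state[1] + action[1])


# Record an episode with the clean dynamics.
actions = [(1.0, 0.0)] * 5
recorded = []
s = (0.0, 0.0)
for a in actions:
    s = step(s, a)
    recorded.append(s)

clean = replay_drift(recorded, step, actions, (0.0, 0.0))
drifting = replay_drift(recorded, biased_step, actions, (0.0, 0.0))
print("clean replay:   ", clean)
print("drifting replay:", drifting)
```

In this toy run the clean replay checks all five frames, while the biased replay accumulates 0.2 of error per step and is cut off at the third frame, once the drift reaches about 0.6.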