Tsimulation: balanced data collection (circle_small + chunked replay) by ElmoPA · Pull Request #497 · GaTech-RL2/EgoVerse

ElmoPA · 2026-06-14T01:03:27Z

Summary

Stacked on top of #477. Adds the data-collection updates for the circle_small pusher and the bulk one-chunk-per-array Zarr layout.

scripted_collect.py: BucketTracker + per-frame replay drift validation against a recorded episode (DEFAULT_REPLAY_DRIFT_THRESHOLD=0.5), sharing _replay_step_loop with examples/replay_zarr.py. Episodes are rejected with explicit reasons (bucket_full / low_coverage / replay_drift).
collect/balance.py (new): coverage-bucket helpers (BucketTracker, bucket_for, count_existing_buckets, N_BUCKETS).
collect/gympusht_collect.py (new): gym-pushT collection front-end.
mouse_collect.py: extended manual recording flow with bucket awareness.
zarr_writer.py: one-chunk-per-array bulk-write path (matches the layout produced by scripts/rechunk_zarr_dataset.py).
pushshapes/: circle_small pusher (env.py, shapes.py, render.py) + obstacle-level definitions (obstacles.py).
examples/play_random.py: --pusher circle_small choice.
examples/replay_zarr.py: factored _replay_step_loop with early-stop drift check (shared with scripted_collect).
README.md + SCHEMA_NOTES.md: document the compact one-chunk bulk-write layout.

13 files changed, +1753 / -91.

Test plan

python -m Tsimulation.examples.play_random --pusher circle_small runs without error
python -m Tsimulation.collect.scripted_collect --num-episodes 10 produces a balanced bucket distribution
python -m Tsimulation.examples.replay_zarr <ep.zarr> reports drift_max < 0.5 on a freshly collected episode
Bulk-written episodes load through the existing ZarrDataset / packed dataloader with no schema drift

🤖 Generated with Claude Code

- scripted_collect: BucketTracker + per-frame replay drift validation (DEFAULT_REPLAY_DRIFT_THRESHOLD=0.5, shared _replay_step_loop with replay_zarr); rejection reasons surfaced to caller. - collect/balance.py: bucket assignment helpers used to balance scripted collections across coverage buckets. - collect/gympusht_collect.py: gym-pushT collection front-end. - mouse_collect: expanded recording flow with bucket awareness. - zarr_writer: one-chunk-per-array bulk-write path (matches scripts/rechunk_zarr_dataset.py output). - pushshapes/obstacles.py: obstacle-level definitions feeding new collection variants. - pushshapes/env.py, shapes.py, render.py: support circle_small pusher and align rendering with new shape set. - examples/play_random.py: '--pusher circle_small' option. - examples/replay_zarr.py: _replay_step_loop with early-stop drift check. - README + SCHEMA_NOTES: document the compact one-chunk layout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-14T01:04:27Z

Claude Code Review

Review of PR #497: Tsimulation balanced data collection

Summary

Adds bucket-balanced data collection (16-cell quadrant grid), a circle_small pusher, replay-drift validation against recorded episodes, and a bulk one-chunk-per-array zarr layout for new collections. Code is scoped to Tsimulation/ and does not touch core EgoVerse training paths.

Key concerns

One-chunk-per-array layout for episodes — verify dataloader compatibility. The PR claims ZarrDataset / packed dataloader handles this "with no schema drift," but this needs to be confirmed by actually loading a bulk-written episode through egomimic/rldb/. Random access for image arrays stored as a single chunk requires decoding the whole chunk to get one frame — this is the opposite of the prior (1,) chunked image arrays where frames decode independently. Check whether ZarrDataset iteration triggers full-episode reads (memory + I/O regression at train time).
SCHEMA_NOTES.md change implies "one chunk for the whole image array" — this contradicts the prior "frames can be decoded independently" guarantee. If the dataloader uses random per-step indexing into image arrays, this will be significantly slower. Please benchmark dataloader throughput on a bulk-written vs. legacy episode.
Episode filename regex broadened to non-greedy .+?. The new _EPISODE_RE = r"^episode_.+?_obs\d+_(\d+)\.zarr$" is correct for circle_small but is also strictly looser than the old [A-Za-z0-9]+_[A-Za-z0-9]+. Add a unit test that exercises circle_small, stick, and tagged variants (_ontarget_, _gympusht_) to lock the index extraction behavior.
bucket_full rejection happens after env.reset(seed=seed) but before start_episode — fine, but note that the rejected reset still consumes a seed counter, which is desirable for reproducibility. Confirm intentional.
Replay drift validation uses L-inf in docstring but L2 in _replay_step_loop (np.linalg.norm is L2). The scripted_collect.py docstring and --replay-drift-threshold help text say "L-inf"; the implementation computes L2 over a 5-vec. Either fix the docstring or switch to np.max(np.abs(...)). The threshold 0.5 is interpreted very differently between the two.
_apply_on_target rotates only the object but doesn't re-check pusher/object collision. If the pusher was sampled close to the goal xy, forcing the object onto the goal could spawn it overlapping the pusher. Add a collision check or re-roll the pusher after on-target placement.
replay_inits pose-key fallback uses obstacle_level to disambiguate, good — but the _init_pose_key uses round(..., 6) while _load_replay_inits doesn't normalize. If the source JSON was written with different float precision, the resume check could miss matches. Consider standardizing.

Suggestions

Add a regression test for ZarrDemoWriter.commit_episode round-tripping through whatever loader egomimic/rldb/ uses for sim data, to confirm the one-chunk layout doesn't break training.
Add a unit test for _EPISODE_RE covering: circle, circle_small, stick, with and without tag, with multi-digit obstacle levels (obs19, obs29).
Fix the L-inf vs L2 inconsistency in replay drift — either the doc or the implementation, but pick one. Given the threshold 0.5 and the 5-vec including obj_theta in radians, L-inf seems more interpretable.
run_episode return tuple grew from 3 → 5 values. If this is called anywhere else (CI script, notebook), it will break. Quick grep recommended.
obstacles choices expanded [0,1,2,3] → range(30) — the level helpers go up to 29. Confirm OBSTACLE_LEVELS actually defines all 30 (the diff is truncated and only shows up to level 22). Levels referenced but not defined will crash at env init.
_BALANCE_MAX_REDRAWS = 200: on a heavily-skewed env this can silently overshoot a bucket by one. Worth logging when the cap is hit so it's not invisible.
gympusht_collect.py imports gym_pusht at module top — this will break import in the main emimic env if gym_pusht isn't installed. Move under if __name__ == "__main__" or lazy-import inside run() to keep the module importable for tooling.
The task_description JSON in env_args is convenient but the embodiment field is still pushshapes_sim for gym-pusht episodes — fine for now, but flag this for the dataloader's embodiment routing if these episodes are ever mixed with real PushShapesEnv data.

Verdict: Request Changes

The L-inf/L2 docstring mismatch and the unconfirmed dataloader compatibility for the bulk one-chunk layout are the blocking items. The bucket-balancing and replay-validation logic itself is well-structured and the shared _replay_step_loop is a nice refactor. Once (a) drift metric is reconciled, (b) a round-trip test through ZarrDataset confirms no training regression, and (c) the regex / obstacle-level expansion is tested, this should be good to land.

Reviewed by Claude · Review workflow

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tsimulation: balanced data collection (circle_small + chunked replay)#497

Tsimulation: balanced data collection (circle_small + chunked replay)#497
ElmoPA wants to merge 1 commit into
hpt-hnet-pusher-nc3from
elmond/tsim-data-collection

ElmoPA commented Jun 14, 2026

Uh oh!

github-actions Bot commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ElmoPA commented Jun 14, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented Jun 14, 2026

Claude Code Review

Review of PR #497: Tsimulation balanced data collection

Summary

Key concerns

Suggestions

Verdict: Request Changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant