Skip to content

Tsimulation: balanced data collection (circle_small + chunked replay)#497

Open
ElmoPA wants to merge 1 commit into
hpt-hnet-pusher-nc3from
elmond/tsim-data-collection
Open

Tsimulation: balanced data collection (circle_small + chunked replay)#497
ElmoPA wants to merge 1 commit into
hpt-hnet-pusher-nc3from
elmond/tsim-data-collection

Conversation

@ElmoPA

@ElmoPA ElmoPA commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Summary

Stacked on top of #477. Adds the data-collection updates for the circle_small pusher and the bulk one-chunk-per-array Zarr layout.

  • scripted_collect.py: BucketTracker + per-frame replay drift validation against a recorded episode (DEFAULT_REPLAY_DRIFT_THRESHOLD=0.5), sharing _replay_step_loop with examples/replay_zarr.py. Episodes are rejected with explicit reasons (bucket_full / low_coverage / replay_drift).
  • collect/balance.py (new): coverage-bucket helpers (BucketTracker, bucket_for, count_existing_buckets, N_BUCKETS).
  • collect/gympusht_collect.py (new): gym-pushT collection front-end.
  • mouse_collect.py: extended manual recording flow with bucket awareness.
  • zarr_writer.py: one-chunk-per-array bulk-write path (matches the layout produced by scripts/rechunk_zarr_dataset.py).
  • pushshapes/: circle_small pusher (env.py, shapes.py, render.py) + obstacle-level definitions (obstacles.py).
  • examples/play_random.py: --pusher circle_small choice.
  • examples/replay_zarr.py: factored _replay_step_loop with early-stop drift check (shared with scripted_collect).
  • README.md + SCHEMA_NOTES.md: document the compact one-chunk bulk-write layout.

13 files changed, +1753 / -91.

Test plan

  • python -m Tsimulation.examples.play_random --pusher circle_small runs without error
  • python -m Tsimulation.collect.scripted_collect --num-episodes 10 produces a balanced bucket distribution
  • python -m Tsimulation.examples.replay_zarr <ep.zarr> reports drift_max < 0.5 on a freshly collected episode
  • Bulk-written episodes load through the existing ZarrDataset / packed dataloader with no schema drift

🤖 Generated with Claude Code

- scripted_collect: BucketTracker + per-frame replay drift validation
  (DEFAULT_REPLAY_DRIFT_THRESHOLD=0.5, shared _replay_step_loop
  with replay_zarr); rejection reasons surfaced to caller.
- collect/balance.py: bucket assignment helpers used to balance scripted
  collections across coverage buckets.
- collect/gympusht_collect.py: gym-pushT collection front-end.
- mouse_collect: expanded recording flow with bucket awareness.
- zarr_writer: one-chunk-per-array bulk-write path (matches
  scripts/rechunk_zarr_dataset.py output).
- pushshapes/obstacles.py: obstacle-level definitions feeding new
  collection variants.
- pushshapes/env.py, shapes.py, render.py: support circle_small pusher
  and align rendering with new shape set.
- examples/play_random.py: '--pusher circle_small' option.
- examples/replay_zarr.py: _replay_step_loop with early-stop drift check.
- README + SCHEMA_NOTES: document the compact one-chunk layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

Claude Code Review

Review of PR #497: Tsimulation balanced data collection

Summary

Adds bucket-balanced data collection (16-cell quadrant grid), a circle_small pusher, replay-drift validation against recorded episodes, and a bulk one-chunk-per-array zarr layout for new collections. Code is scoped to Tsimulation/ and does not touch core EgoVerse training paths.

Key concerns

  1. One-chunk-per-array layout for episodes — verify dataloader compatibility. The PR claims ZarrDataset / packed dataloader handles this "with no schema drift," but this needs to be confirmed by actually loading a bulk-written episode through egomimic/rldb/. Random access for image arrays stored as a single chunk requires decoding the whole chunk to get one frame — this is the opposite of the prior (1,) chunked image arrays where frames decode independently. Check whether ZarrDataset iteration triggers full-episode reads (memory + I/O regression at train time).

  2. SCHEMA_NOTES.md change implies "one chunk for the whole image array" — this contradicts the prior "frames can be decoded independently" guarantee. If the dataloader uses random per-step indexing into image arrays, this will be significantly slower. Please benchmark dataloader throughput on a bulk-written vs. legacy episode.

  3. Episode filename regex broadened to non-greedy .+?. The new _EPISODE_RE = r"^episode_.+?_obs\d+_(\d+)\.zarr$" is correct for circle_small but is also strictly looser than the old [A-Za-z0-9]+_[A-Za-z0-9]+. Add a unit test that exercises circle_small, stick, and tagged variants (_ontarget_, _gympusht_) to lock the index extraction behavior.

  4. bucket_full rejection happens after env.reset(seed=seed) but before start_episode — fine, but note that the rejected reset still consumes a seed counter, which is desirable for reproducibility. Confirm intentional.

  5. Replay drift validation uses L-inf in docstring but L2 in _replay_step_loop (np.linalg.norm is L2). The scripted_collect.py docstring and --replay-drift-threshold help text say "L-inf"; the implementation computes L2 over a 5-vec. Either fix the docstring or switch to np.max(np.abs(...)). The threshold 0.5 is interpreted very differently between the two.

  6. _apply_on_target rotates only the object but doesn't re-check pusher/object collision. If the pusher was sampled close to the goal xy, forcing the object onto the goal could spawn it overlapping the pusher. Add a collision check or re-roll the pusher after on-target placement.

  7. replay_inits pose-key fallback uses obstacle_level to disambiguate, good — but the _init_pose_key uses round(..., 6) while _load_replay_inits doesn't normalize. If the source JSON was written with different float precision, the resume check could miss matches. Consider standardizing.

Suggestions

  • Add a regression test for ZarrDemoWriter.commit_episode round-tripping through whatever loader egomimic/rldb/ uses for sim data, to confirm the one-chunk layout doesn't break training.
  • Add a unit test for _EPISODE_RE covering: circle, circle_small, stick, with and without tag, with multi-digit obstacle levels (obs19, obs29).
  • Fix the L-inf vs L2 inconsistency in replay drift — either the doc or the implementation, but pick one. Given the threshold 0.5 and the 5-vec including obj_theta in radians, L-inf seems more interpretable.
  • run_episode return tuple grew from 3 → 5 values. If this is called anywhere else (CI script, notebook), it will break. Quick grep recommended.
  • obstacles choices expanded [0,1,2,3]range(30) — the level helpers go up to 29. Confirm OBSTACLE_LEVELS actually defines all 30 (the diff is truncated and only shows up to level 22). Levels referenced but not defined will crash at env init.
  • _BALANCE_MAX_REDRAWS = 200: on a heavily-skewed env this can silently overshoot a bucket by one. Worth logging when the cap is hit so it's not invisible.
  • gympusht_collect.py imports gym_pusht at module top — this will break import in the main emimic env if gym_pusht isn't installed. Move under if __name__ == "__main__" or lazy-import inside run() to keep the module importable for tooling.
  • The task_description JSON in env_args is convenient but the embodiment field is still pushshapes_sim for gym-pusht episodes — fine for now, but flag this for the dataloader's embodiment routing if these episodes are ever mixed with real PushShapesEnv data.

Verdict: Request Changes

The L-inf/L2 docstring mismatch and the unconfirmed dataloader compatibility for the bulk one-chunk layout are the blocking items. The bucket-balancing and replay-validation logic itself is well-structured and the shared _replay_step_loop is a nice refactor. Once (a) drift metric is reconciled, (b) a round-trip test through ZarrDataset confirms no training regression, and (c) the regex / obstacle-level expansion is tested, this should be good to land.


Reviewed by Claude · Review workflow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant