optional flag to generate validation videos for each task by RyanPCo · Pull Request #475 · GaTech-RL2/EgoVerse

RyanPCo · 2026-05-27T23:21:42Z

No description provided.

RyanPCo · 2026-05-27T23:21:56Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

remove rotation bounds checks for norm stats #480: 2 dependent PRs (#481, #490)
optional viz videos on train dataset #476
optional flag to generate validation videos for each task #475 (this PR)
pi train human new #473
main

This stack of pull requests is managed by Graphite.

github-actions · 2026-06-11T17:46:36Z

Claude Code Review

Summary

Adds an opt-in one_video_per_task mode to EvalVideo that buckets validation frames by (embodiment, task) and emits one mp4 per task, plumbs a per-sample task string from ZarrDataset through the PI batch processor, and ships a corresponding eval-only Mecka data config plus a small SQL reporting script.

Key concerns

1. task collation through the DataLoader is fragile. ZarrDataset.__getitem__ now adds data["task"] = self.metadata.get("task_name", "unknown") (a Python str). PyTorch's default collate does not tensorize strings; it returns them as a length-B sequence of str (a list for dict-style samples), which is what eval_video.py iterates over, so that part is good. But other consumers (e.g., anything that calls batch[emb].to(device) or assumes tensor-only values) may now choke on a non-tensor key. In pi.py you guard with if "task" in _batch, and the loop right below it (if isinstance(value, torch.Tensor)) correctly skips non-tensors. Confirm the same is true for the act.py/hpt.py paths; otherwise non-PI runs may break. Worth a quick grep for .items() over processed_batch.
2. Key name mismatch: task_name vs task. The script top_tasks_by_hours.py groups by df["task"] (SQL column), but ZarrDataset reads self.metadata.get("task_name", ...). Please confirm the zarr attr is actually called task_name (matching the CONTRIBUTING_DATA.md schema). If the canonical name is task, this will silently produce "unknown" for every sample and you'll get a single unknown.mp4 per embodiment with no error.
3. Filter lambdas hard-code episode hashes inline. The mecka_pi_eval.yaml filter pins 5 specific episode_hash values via a lambda. This is fine for a one-off eval recipe, but (a) the hashes look truncated (24 chars vs the usual 26-char UTC-derived format YYYY-MM-DD-HH-MM-SS-ffffff), so please double-check that these are real episode_hash values and not Mongo-style ids from another table, and (b) the comment maps them to task names, so consider filtering by task directly so that the config self-documents and survives episode re-ingestion.
4. max_frames_per_task=1000 default + no mid-flush. In per-task mode you removed the 1000-frame mid-flush "because limit_val_batches bounds it." With limit_val_batches=400 and batch_size=32, that's up to 400 × 32 = 12,800 frames per task before the cap. The 1000 cap saves you, but if a user sets max_frames_per_task=null to "disable" it, they'll buffer all 12k+ frames per task in CPU RAM as a Python list of tensors before the single torch.stack at on_validation_end. Worth a comment in the config that null is dangerous, or just keep a hard mid-flush ceiling.
5. CLAUDE.md committed to repo. Not a blocker, but confirm the team wants this tracked rather than gitignored. It duplicates CONTRIBUTING_DATA.md content and will drift.
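
The hard mid-flush ceiling mentioned in the max_frames_per_task concern could look roughly like the sketch below. It is not the PR's code: PerTaskFrameBuffer, hard_ceiling, and the flush callback are hypothetical names, and frames are treated as opaque objects so the example runs without torch. It assumes the real flush would append each chunk to an already-open per-task video writer, so that one mp4 per task is still produced.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (embodiment, task)


class PerTaskFrameBuffer:
    """Hypothetical helper: buckets frames by (embodiment, task) with two bounds."""

    def __init__(
        self,
        flush: Callable[[Key, List[Any]], None],
        max_frames_per_task: Optional[int] = 1000,
        hard_ceiling: int = 2000,
    ) -> None:
        self._flush = flush
        self._cap = max_frames_per_task  # None disables the per-task cap
        self._ceiling = hard_ceiling     # always enforced, even when cap is None
        self._pending: Dict[Key, List[Any]] = defaultdict(list)
        self._seen: Dict[Key, int] = defaultdict(int)

    def add(self, embodiment: str, task: str, frame: Any) -> None:
        key = (embodiment, task)
        if self._cap is not None and self._seen[key] >= self._cap:
            return  # per-task cap reached: drop the frame
        self._pending[key].append(frame)
        self._seen[key] += 1
        # Mid-flush: never hold more than hard_ceiling frames for one task.
        if len(self._pending[key]) >= self._ceiling:
            self._flush(key, self._pending.pop(key))

    def close(self) -> None:
        """Flush whatever is still buffered (e.g. at the end of validation)."""
        for key in list(self._pending):
            frames = self._pending.pop(key)
            if frames:
                self._flush(key, frames)
```

With this shape, setting max_frames_per_task to None removes the per-task cap but peak memory per task stays bounded by hard_ceiling frames.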

Suggestions

1. In eval_video.py, after tasks = batch[embodiment_id].get("task"), assert len(tasks) == frames_tensor.shape[0]. Silent misalignment here would assign frames to the wrong task bucket with no error.
2. The _sanitize_task fallback to "unknown" will collide all unlabeled episodes into one mp4. Consider including episode_hash[:8] in the filename when task is "unknown" so debugging is easier.
3. mecka_pi_eval.yaml: the valid_datasets block reuses _target_ via interpolation but redeclares filters. Using Hydra's ${...} interpolation and overriding only filters and mode is the cleaner approach, and that is what you did, good. But _target_: ${data.train_datasets.mecka_bimanual._target_} is unusual; just write the literal string to avoid resolver-order surprises with +evaluator=.
4. top_tasks_by_hours.py: add --lab / --embodiment as first-class flags rather than free-form --filter col=val. This protects against typos like lab=Mecka silently returning empty.
5. The train filter and valid filter in mecka_pi_eval.yaml are identical. Either (a) the train side should be narrower (the docstring says "train side is unchanged (single task)", but it is not; it includes all 5), or (b) update the comment. As written, train and valid sample the same episodes, which will produce optimistic eval numbers.
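
The alignment check and the "unknown" filename suggestions above can be sketched together as follows. This is an illustration under assumptions, not the PR's implementation: check_aligned and video_stem are hypothetical helpers, the sanitization rule (replace anything outside letters, digits, dot, underscore, and hyphen) is assumed rather than taken from _sanitize_task, and the episode hash is assumed to be available as a plain string. An explicit raise is used instead of a bare assert so the check survives python -O.

```python
import re
from typing import Optional, Sequence


def check_aligned(tasks: Sequence[str], num_frames: int) -> None:
    """Fail loudly if the per-sample task labels do not match the frame count."""
    if len(tasks) != num_frames:
        raise ValueError(
            f"task/frame mismatch: {len(tasks)} task labels for {num_frames} frames"
        )


def video_stem(task: Optional[str], episode_hash: str) -> str:
    """Return a filesystem-safe file stem for a task's video.

    Unlabeled samples get an episode-hash prefix appended so they do not
    all collapse into a single 'unknown' file.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", task or "").strip("_")
    if not cleaned or cleaned == "unknown":
        return f"unknown_{episode_hash[:8]}"
    return cleaned
```

A call site would run check_aligned(tasks, frames_tensor.shape[0]) before bucketing, and use video_stem(task, episode_hash) when naming the output file.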

Verdict

Request Changes, primarily to resolve key concern 2 (the task_name vs task key check) and suggestion 5 (the train/valid filter mismatch with the stated intent). The core EvalVideo change is clean and well-scoped.
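
For the task_name vs task check named in the verdict, one defensive option is to make the fallback to "unknown" noisy instead of silent. The sketch below is a hypothetical helper, not code from the PR: resolve_task and its keys parameter are assumptions, and which key is canonical in the zarr metadata is exactly the open question, so the sketch tries both and warns when neither is present.

```python
import warnings
from typing import Any, Mapping, Sequence


def resolve_task(
    metadata: Mapping[str, Any],
    keys: Sequence[str] = ("task_name", "task"),
    default: str = "unknown",
) -> str:
    """Return the first non-empty string found under `keys`, else `default`.

    Emits a warning on fallback so a wrong key name shows up in the logs
    instead of silently producing a single 'unknown' bucket.
    """
    for key in keys:
        value = metadata.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    warnings.warn(
        f"no task label under any of {tuple(keys)}; using {default!r}",
        stacklevel=2,
    )
    return default
```

Earlier keys take precedence, so with the default order a task_name entry wins over task when both exist.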

Reviewed by Claude · Review workflow

This was referenced May 27, 2026

pi train human new #473

Open

optional viz videos on train dataset #476

Open

add custom configs to specify tasks for viz #481

Closed

remove rotation bounds checks for norm stats #480

Open

RyanPCo mentioned this pull request Jun 5, 2026

6D rotation normalization to prevent large changes from euler #490

Open

RyanPCo force-pushed the ryanco/pi-train-human-new branch from c4a5205 to fd0e414 on June 6, 2026 21:45

RyanPCo force-pushed the ryanco/valid-per-task branch from 6fba0a3 to d6ea265 on June 6, 2026 21:45

RyanPCo force-pushed the ryanco/pi-train-human-new branch from fd0e414 to 4580ba3 on June 7, 2026 19:43

RyanPCo force-pushed the ryanco/valid-per-task branch from d6ea265 to 561fa3d on June 7, 2026 19:43

RyanPCo mentioned this pull request Jun 8, 2026

exp short state history memory #492

Draft

RyanPCo force-pushed the ryanco/pi-train-human-new branch from 4580ba3 to 9fcdebd on June 8, 2026 20:45

RyanPCo force-pushed the ryanco/valid-per-task branch from 561fa3d to 8ad9764 on June 8, 2026 20:45

RyanPCo marked this pull request as ready for review June 8, 2026 20:46

RyanPCo mentioned this pull request Jun 8, 2026

local: personal env overrides — Do not merge #493

Draft

optional flag to generate validation videos for each task

35ef260

RyanPCo force-pushed the ryanco/pi-train-human-new branch from 9fcdebd to c66446a on June 11, 2026 17:45

RyanPCo force-pushed the ryanco/valid-per-task branch from 8ad9764 to 35ef260 on June 11, 2026 17:45

This was referenced Jun 11, 2026

data-quality: episode review tool + mecka folding_clothes labels #494

Draft

exp task segment labels #498

Draft

better val metrics #499

Draft

RyanPCo wants to merge 1 commit into ryanco/pi-train-human-new from ryanco/valid-per-task