Pass norm stats to collate for proprio tokenization by ElmoPA · Pull Request #400 · GaTech-RL2/EgoVerse

ElmoPA · 2026-05-08T04:22:32Z

No description provided.

ElmoPA · 2026-05-08T04:22:45Z

This stack of pull requests is managed by Graphite. Learn more about stacking.

github-actions · 2026-06-02T02:11:04Z

Claude Code Review

Summary

Threads data_schematic into build_tokenized_collate so proprio tokenization uses properly normalized values (via normalize_data) instead of assuming upstream normalization. This fixes a latent bug where raw proprio was being clipped to [-1,1] and binned.
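To illustrate why binning raw values matters, here is a minimal sketch of the difference between clipping raw proprio and clipping normalized proprio. It does not reproduce the PR's code: `clip_and_bin`, `minmax_normalize`, the 256-bin count, and the min/max statistics are all assumptions made for illustration, and the real `normalize_data` may use a different scheme (for example mean/std or quantiles).

```python
def clip_and_bin(values, n_bins=256):
    """Clip each value to [-1, 1] and map it to an integer bin in [0, n_bins - 1]."""
    bins = []
    for v in values:
        clipped = max(-1.0, min(1.0, v))
        # Scale [-1, 1] onto [0, n_bins] and cap the top edge at n_bins - 1.
        idx = int((clipped + 1.0) / 2.0 * n_bins)
        bins.append(min(idx, n_bins - 1))
    return bins


def minmax_normalize(values, lo, hi):
    """Hypothetical min-max normalization of each dimension to [-1, 1]."""
    return [2.0 * (v - l) / (h - l) - 1.0 for v, l, h in zip(values, lo, hi)]


# Made-up joint angles (radians) and made-up per-dimension stats.
raw = [0.35, 1.8, -2.4]
lo = [-3.0, -3.0, -3.0]
hi = [3.0, 3.0, 3.0]

raw_bins = clip_and_bin(raw)                                 # old path: saturates
normed_bins = clip_and_bin(minmax_normalize(raw, lo, hi))    # new path: spread out

print(raw_bins)     # [172, 255, 0]
print(normed_bins)  # [142, 204, 25]
```

In the raw path, any value outside [-1, 1] collapses into the first or last bin, so distinct joint angles become indistinguishable in the prompt. Normalizing first keeps them in separate bins.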

Key concerns

MultiDataModuleWrapper signature — The diff updates the hydra.utils.instantiate call to pass data_schematic=data_schematic, but doesn't show the corresponding change in MultiDataModuleWrapper.__init__ or the wiring through to build_tokenized_collate. If the wrapper doesn't accept/forward data_schematic, this will fail at instantiate time. Please confirm that change is in this PR.
Silent skip of unregistered keys could mask config errors. When zarr_key_to_keyname returns None, the key is silently dropped. If all proprio_keys are unregistered for a given embodiment, raw is empty and the function returns None — no prompt state, no error. Given the hard-fail philosophy elsewhere in this function, consider warning or erroring when a key in proprio_keys is configured but unregistered for an active embodiment. At minimum, a one-time warning per (embodiment, key) would help catch config drift.
Order preservation relies on dict insertion order. The comment says "Iterate in raw insertion order (which mirrors proprio_keys ordering)" — this is correct in Py3.7+, but if normalize_data ever returns a re-ordered dict (or if a downstream refactor changes that), the bin layout in the prompt will silently shift and break trained checkpoints. Safer to iterate proprio_keys directly and look up the translated keyname:
```
for k in proprio_keys:
    keyname = keyname_to_zarr_inverse.get(k)
    if keyname is None or keyname not in normed:
        continue
    v = normed[keyname]...
```
Norm stats checkpoint compatibility — Any existing pi0.5 checkpoints trained with proprio=True under the old "assumes upstream normalization" path will have learned bins on a different value distribution. This silently invalidates prior runs that used this code path. Worth a callout in the PR description / release notes so nobody resumes a stale checkpoint.

Suggestions

Show / verify the MultiDataModuleWrapper.__init__ change in this PR.
Iterate proprio_keys (not raw.keys()) for the concat to make ordering robust to dict semantics.
Add at least one unit test: build a collate with a stub data_schematic, feed a sample with known stats, assert bins match a hand-computed expectation. Also test the three failure modes (data_schematic=None, norm_stats=None, missing embodiment).
Consider logging (once) when a proprio_keys entry is dropped because zarr_key_to_keyname returned None.
Add a PR description — this is a semantically meaningful change for any pi0.5 experiment in flight.
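The suggested unit test might take roughly the following shape. Everything here is hypothetical: `StubSchematic`, `tokenize_proprio`, the min/max stats layout, and the 256-bin count stand in for the real `data_schematic` and collate code, which this page does not show. The sketch only demonstrates the structure of the test: one hand-computed expectation plus the three failure modes.

```python
class StubSchematic:
    """Hypothetical stand-in for data_schematic holding per-embodiment stats."""

    def __init__(self, norm_stats):
        # {embodiment: {key: (lo, hi)}}, or None when stats are missing.
        self.norm_stats = norm_stats


def tokenize_proprio(sample, data_schematic, embodiment, n_bins=256):
    """Toy function under test: min-max normalize to [-1, 1], clip, then bin."""
    if data_schematic is None:
        raise ValueError("data_schematic is required")
    if data_schematic.norm_stats is None:
        raise ValueError("norm_stats is required")
    if embodiment not in data_schematic.norm_stats:
        raise KeyError(f"no norm stats for embodiment {embodiment!r}")
    bins = []
    for key, (lo, hi) in data_schematic.norm_stats[embodiment].items():
        for v in sample[key]:
            x = max(-1.0, min(1.0, 2.0 * (v - lo) / (hi - lo) - 1.0))
            bins.append(min(int((x + 1.0) / 2.0 * n_bins), n_bins - 1))
    return bins


def expect_raises(exc_type, fn, *args):
    """Return True if fn(*args) raises exc_type, False if it returns normally."""
    try:
        fn(*args)
    except exc_type:
        return True
    return False


def test_bins_match_hand_computed():
    schematic = StubSchematic({"arm": {"joints": (0.0, 4.0)}})
    sample = {"joints": [0.0, 1.0, 2.0, 4.0]}
    # 0.0 -> -1.0 -> bin 0;   1.0 -> -0.5 -> bin 64
    # 2.0 ->  0.0 -> bin 128; 4.0 ->  1.0 -> bin 255 (top edge capped)
    assert tokenize_proprio(sample, schematic, "arm") == [0, 64, 128, 255]


def test_failure_modes():
    sample = {"joints": [0.0]}
    assert expect_raises(ValueError, tokenize_proprio, sample, None, "arm")
    assert expect_raises(ValueError, tokenize_proprio, sample, StubSchematic(None), "arm")
    assert expect_raises(KeyError, tokenize_proprio, sample, StubSchematic({}), "arm")


test_bins_match_hand_computed()
test_failure_modes()
```

Choosing stats such as `(0.0, 4.0)` keeps the normalized values exactly representable, so the expected bins can be verified by hand without floating-point ambiguity.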

Verdict: Request Changes

Mainly to (a) confirm the MultiDataModuleWrapper signature change is included and (b) harden the ordering and add a minimal test, given this affects token alignment in a way that's hard to spot post-hoc.

Reviewed by Claude · Review workflow

ElmoPA marked this pull request as ready for review May 8, 2026 04:23

Pass norm stats to collate for proprio tokenization

c044ed8

Boey-li force-pushed the elmo/norm-collate branch from 1b82282 to c044ed8 Compare June 2, 2026 02:10

rl2aloha mentioned this pull request Jun 2, 2026

rollout: config-driven viz, wristframe revert, HPT annotation support #482

Open

ElmoPA wants to merge 1 commit into main from elmo/norm-collate