You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Threads data_schematic into build_tokenized_collate so proprio tokenization uses properly normalized values (via normalize_data) instead of assuming upstream normalization. This fixes a latent bug where raw proprio was being clipped to [-1,1] and binned.
Key concerns
MultiDataModuleWrapper signature — The diff updates the hydra.utils.instantiate call to pass data_schematic=data_schematic, but doesn't show the corresponding change in MultiDataModuleWrapper.__init__ or the wiring through to build_tokenized_collate. If the wrapper doesn't accept/forward data_schematic, this will fail at instantiate time. Please confirm that change is in this PR.
Silent skip of unregistered keys could mask config errors. When zarr_key_to_keyname returns None, the key is silently dropped. If allproprio_keys are unregistered for a given embodiment, raw is empty and the function returns None — no prompt state, no error. Given the hard-fail philosophy elsewhere in this function, consider warning or erroring when a key in proprio_keys is configured but unregistered for an active embodiment. At minimum, a one-time warning per (embodiment, key) would help catch config drift.
Order preservation relies on dict insertion order. The comment says "Iterate in raw insertion order (which mirrors proprio_keys ordering)" — this is correct in Py3.7+, but if normalize_data ever returns a re-ordered dict (or if a downstream refactor changes that), the bin layout in the prompt will silently shift and break trained checkpoints. Safer to iterate proprio_keys directly and look up the translated keyname:
Norm stats checkpoint compatibility — Any existing pi0.5 checkpoints trained with proprio=True under the old "assumes upstream normalization" path will have learned bins on a different value distribution. This silently invalidates prior runs that used this code path. Worth a callout in the PR description / release notes so nobody resumes a stale checkpoint.
Suggestions
Show / verify the MultiDataModuleWrapper.__init__ change in this PR.
Iterate proprio_keys (not raw.keys()) for the concat to make ordering robust to dict semantics.
Add at least one unit test: build a collate with a stub data_schematic, feed a sample with known stats, assert bins match a hand-computed expectation. Also test the three failure modes (data_schematic=None, norm_stats=None, missing embodiment).
Consider logging (once) when a proprio_keys entry is dropped because zarr_key_to_keyname returned None.
Add a PR description — this is a semantically meaningful change for any pi0.5 experiment in flight.
Verdict: Request Changes
Mainly to (a) confirm the MultiDataModuleWrapper signature change is included and (b) harden the ordering and add a minimal test, given this affects token alignment in a way that's hard to spot post-hoc.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.