Skip to content

feat(policies): WBCPolicy provider for GR00T Whole-Body-Control (SONIC, ONNX)#483

Merged
cagataycali merged 14 commits into
strands-labs:mainfrom
rmncardoso:feat/wbc-policy
Jun 18, 2026
Merged

feat(policies): WBCPolicy provider for GR00T Whole-Body-Control (SONIC, ONNX)#483
cagataycali merged 14 commits into
strands-labs:mainfrom
rmncardoso:feat/wbc-policy

Conversation

@rmncardoso

@rmncardoso rmncardoso commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds WBCPolicy — a new non-VLA policy provider wrapping NVIDIA's GR00T
Whole-Body-Control (SONIC / decoupled-WBC) ONNX controllers for deploy-grade
Unitree G1 locomotion. Clean-room implementation against the upstream
reference (NVlabs/GR00T-WholeBodyControl,
decoupled_wbc/sim2mujoco/scripts/run_mujoco_gear_wbc.py +
resources/robots/g1/g1_gear_wbc.yaml). Closes #466.

Validated end-to-end against the real shipped weights
(GR00T-WholeBodyControl-{Balance,Walk}.onnx): through the torque-control deploy
harness the G1 produces a stable forward walk (~0.38 m/s for a 0.5 m/s
command, height held, no fall), and holds balance in place on a zero command.

What's included

  • strands_robots/policies/wbc/WBCPolicy (provider wbc, shorthand
    sonic), WBCConfig, the observation builder, and pure-NumPy control/quat
    helpers.
    • requires_images = False; reads the goal from the well-known kwargs
      (target_velocity = [vx, vy, omega], optional target_orientation for base
      RPY, optional height). Drives the 15 leg+waist DOFs; the 14 arm joints
      are held at defaults (upper-body composition is left to a future
      CompositePolicy, feat(policies): CompositePolicy - stack policies across DOF groups #468).
    • Faithful to the reference contract: 86-dim observation
      (command[7] + base_ang_vel[3] + projected_gravity[3] + qj[29] + dqj[29] + prev_action[15]) stacked over obs_history_len=6 → 516-wide network input;
      whole-body qj/dqj (all 29 joints, not just the 15 controlled); two ONNX
      sessions (policy + walk_policy) selected by norm(raw velocity) <= 0.05;
      PD→torque law via compute_torques(...); zero-warm-started history deque;
      quaternion math numerically identical to upstream (verified to 1e-10).
    • WBCConfig loads the upstream g1_gear_wbc.yaml directly (JSON or YAML;
      flat *_scale keys normalised). Joints resolved by name (handles the
      G1's leading floating_base_joint + interleaved arms).
    • No weights bundled — fetched/pointed-at at runtime under the NVIDIA Open
      Model License.
  • Precursor (feat(sim)): a per-call policy_kwargs channel on
    SimEngine.run_policy / start_policy (+ MuJoCo overrides), forwarded
    verbatim to policy.get_actions, so the docs(policies): clarify Policy ABC supports VLA + classical planners (#299) #300 well-known goal kwargs reach
    non-VLA providers (WBC, cuRobo, MoveIt2) through the local sim path — not just
    the mesh. Pinned by tests/simulation/test_policy_kwargs_forwarding.py.
  • Integration: registered in registry/policies.json; wbc/sonic added
    to the mesh + Device Connect policy-provider allowlist so WBC can be driven
    over tell() / Device Connect; new [wbc] extra
    (onnxruntime + pyyaml + huggingface_hub); NOTICE attribution.
  • Docs & example: docs/policies/wbc.md (+ mkdocs nav, README rows);
    examples/wbc_g1_torque_deploy.py — a torque-control deploy harness with an
    importable simulate_rollout (the real weights walk the G1).

Testing

  • 82 WBC unit tests (stub ONNX, no GPU): observation layout, PD/quat
    hand-values, walk-selection, history determinism, config/YAML loading, factory
    resolution, mesh allowlist, error paths — plus TestUpstreamFidelity pins
    transcribed from the upstream YAML/runner.
  • No-weights harness smoke test (tests/policies/wbc/test_torque_harness.py,
    runs in CI with mujoco): ground plane, torque actuators, observation shape,
    no fall-through.
  • Gated real-weights integration (tests_integ/policies/wbc/,
    WBC_LIVE=1 + checkpoint): ONNX I/O-dims match config, action shape, load.
    The gait-quality + standing-balance checks run the torque-deploy loop behind
    WBC_GAIT_CHECK=1 and pass with real weights.
  • format / lint / mypy clean; zero regressions vs main.

Reviewer notes / scope

  • Gait-quality tests are opt-in (WBC_GAIT_CHECK=1 + real checkpoint): they
    need the torque-control deploy environment, not sim.run_policy's
    position-servo path. The harness + the I/O-contract tests are the runnable
    evidence; a reviewer with the real weights can flip the flag to watch it walk.
  • policy_kwargs is scoped to the control/deploy path (run_policy /
    start_policy / mesh), not eval_policy / evaluate_benchmark (which are
    instruction-driven benchmark entry points). Static-velocity WBC eval works via
    the constructor target_velocity. Documented in docs/policies/wbc.md.
  • Pre-existing observation: the already-merged cosmos3 / curobo / moveit2
    providers are also absent from the mesh policy-provider allowlist; this PR only
    adds wbc/sonic (the providers it introduces) and leaves the broader gap for
    a follow-up.

Follow-up (not in this PR)

An optional SonicPlannerPolicy wrapping the GR00T-WBC kinematic planner
(planner_sonic.onnx: mode/target_velqpos[T,36]) — a different SONIC
component, complementary to this tracking controller. Tracked separately.


Review changelog

Round 1 — CodeQL code-scanning findings (commit d92904f)

Resolved both code-scanning alerts; no behavioral change, no API change.

  1. Empty except without comment (strands_robots/policies/wbc/policy.py,
    per-joint observation reader). Added an explanatory comment: a non-numeric
    observation entry intentionally degrades to its pre-zeroed neutral value
    (the full qj/dqj block is validated downstream), rather than aborting the
    observation build.
  2. Module imported with both import and import from
    (tests/policies/wbc/test_policy.py). Consolidated to a single top-level
    from strands_robots.policies.wbc import policy as wbc_policy; the
    monkeypatch (require_optional) and _looks_like_hf_repo_id heuristic
    checks now reference the module object consistently.

Validation: ruff check + ruff format --check clean; WBC unit suite
73 passed / 1 skipped.

@cagataycali

Copy link
Copy Markdown
Member

Test report (aarch64 / MuJoCo EGL headless)

Functionally this looks solid, but the lint gate (mypy strands_robots tests tests_integ from hatch run lint) is not cleantest-lint.yml will fail on 6 type errors in the PR's own test files. The PR description says "format / lint / mypy clean"; that does not reproduce here (mypy 1.20.2).

What passes

  • tests/policies/wbc/ + tests/simulation/test_policy_kwargs_forwarding.py: 82 passed (incl. the mujoco torque-harness smoke tests and the 4 kwargs-forwarding tests — all run, none skipped).
  • Provider wiring: wbc is in list_providers(); the sonic shorthand resolves to wbc via resolve_policy.
  • No regressions in tests/simulation/test_policy_runner.py (30 passed, 2 skipped) from the policy_runner.py / base.py / mujoco/simulation.py changes.
  • ruff check / ruff format --check clean on strands_robots/policies/wbc.

mypy failures (block CI)

tests_integ/policies/wbc/test_wbc_live.py:156: error: "Policy" has no attribute "policy_session"  [attr-defined]
tests_integ/policies/wbc/test_wbc_live.py:159,160,164,165: error: "Policy" has no attribute "config"  [attr-defined]
tests/policies/wbc/test_policy.py:661: error: Incompatible types in assignment (expression has type "list[float]", target has type "float")  [assignment]

Root causes:

  1. create_policy("wbc", ...) is typed to return the Policy base, so .policy_session / .config (WBCPolicy-specific) are invisible to mypy in test_real_onnx_io_dims_match_config.
  2. obs = {k: 0.0 for k in _g1_keys()} infers dict[str, float], then obs["base_ang_vel"] = [0.0, 0.0, 0.1] assigns a list into a float-valued dict.

Minimal fix (verified — Success: no issues found in 363 source files, and tests/policies/wbc/test_policy.py still 74 passed)

  • In test_real_onnx_io_dims_match_config, narrow after construction:
    policy = create_policy("wbc", checkpoint=CHECKPOINT, walk=True)
    assert isinstance(policy, WBCPolicy)   # add
    sess = policy.policy_session
    (WBCPolicy is already imported in this file.)
  • At test_policy.py:660, widen the annotation:
    obs: dict[str, object] = {k: 0.0 for k in _g1_keys()}

Other

Real-weights gait checks (WBC_LIVE=1 / WBC_GAIT_CHECK=1) were not exercised here — no GR00T-WBC ONNX checkpoint available — so this report covers the no-weights path only.

Comment thread strands_robots/policies/wbc/policy.py
Comment thread tests/policies/wbc/test_policy.py Outdated
@rmncardoso

Copy link
Copy Markdown
Collaborator Author

Rebased on main + fixed the CI dependency conflict

1. call-test-lint (fixed): failure was at pip install .[all,dev], not tests/lint — the [wbc] extra pinned huggingface_hub<1.0.0, conflicting with lerobot's >=1.0.0. Widened to >=0.20.0,<2.0.0 (matches lerobot); the full [all,dev] surface now resolves.

2. Merge conflict (resolved): rebased onto current main (incl. the #484 CHANGELOG restructure); WBC entry reconciled under ## [Unreleased]. Now MERGEABLE, 12 linear commits.

Re-verified locally: WBC + integration tests pass, ruff/mypy clean, zero new failures vs main. CI is held on the fork-PR workflow-approval gate — ready for a maintainer to approve + review.

rmncardoso and others added 14 commits June 18, 2026 00:30
…ctions

Non-VLA providers (cuRobo, MoveIt2, and the incoming WBCPolicy) read their
goal from the well-known strands-labs#300 kwargs (target_pose / target_joints /
target_velocity / world_update). The mesh tell() path already forwards these,
but the local-sim path (run_policy/start_policy -> PolicyRunner.run ->
get_actions) dropped them: get_actions was called positionally with no kwargs.

Add an explicit policy_kwargs dict (default None) threaded through
SimEngine.run_policy/start_policy, the MuJoCo overrides, and PolicyRunner.run,
forwarded verbatim to every policy.get_actions(obs, instruction,
**policy_kwargs) call. Explicit dict over **kwargs per the project's
'reject silently-dropped kwargs' convention. VLA providers ignore unknown
kwargs per strands-labs#300 so forwarding is always safe; None reproduces the historical
no-kwargs behaviour exactly.

Unblocks dynamic per-call goals for WBCPolicy (strands-labs#466) and fixes the same
latent gap for cuRobo/MoveIt2 driven through sim.run_policy. Pinned by
tests/simulation/test_policy_kwargs_forwarding.py.
…C, ONNX) (closes strands-labs#466)

Clean-room non-VLA locomotion provider wrapping NVIDIA's GR00T
Whole-Body-Control (SONIC / decoupled-WBC) ONNX controllers for deploy-grade
Unitree G1 locomotion. Drives the 15 leg+waist DOFs; arm joints held at
defaults (composition with an upper-body policy is left to a future
CompositePolicy, strands-labs#468).

New package strands_robots/policies/wbc/:
- config.py    WBCConfig dataclass (paths, gains, dims) + JSON loader, fail-fast
               on dimension mismatch.
- control.py   pure-NumPy pd_control / compute_targets / quat_rotate_inverse /
               projected_gravity (hand-verified, no torch/onnx).
- observation.py  build_single_frame (exact 86-dim layout) + ObservationHistory
               deque (warm-started, oldest-first stacking).
- policy.py    WBCPolicy(Policy): dual ONNX sessions loaded in __init__
               (centralised dep check; RuntimeError on missing onnxruntime /
               checkpoint - never silent zero torques), requires_images=False,
               reads target_velocity/target_orientation from strands-labs#300 kwargs,
               explicit WBC_G1_LEG_WAIST_JOINTS actuator<->15-dim mapping with
               validation in set_robot_state_keys (no positional guessing),
               compute_torques() public PD helper, reset() clears history.
               Checkpoint resolution: local path | HF download | cache.

Wiring:
- registry/policies.json: canonical 'wbc', shorthands ['wbc','sonic'].
- pyproject.toml: [wbc] extra = onnxruntime (light, no torch); added to [all];
                  onnxruntime in mypy ignore; 'wbc' pytest marker.
- NOTICE: attributes GR00T-WBC (Apache-2.0) + ONNX Runtime (MIT).
- docs/policies/wbc.md + mkdocs nav + README extras/provider rows.

Tests:
- tests/policies/wbc/test_policy.py: 45 unit tests with a stub ONNX session
  (no GPU/onnxruntime) - obs layout, PD/quat hand values, action shape,
  history deque, factory wbc/sonic resolution, missing-dep+missing-model
  RuntimeError, reset, mapping-table validation, prev-action feedback,
  checkpoint resolution.
- tests_integ/policies/wbc/test_wbc_live.py: gated (WBC_LIVE=1 + checkpoint)
  real-ONNX G1 forward-walk integration test; skips cleanly otherwise.

Builds on the policy_kwargs forwarding fix (ff5dbee) so per-call
target_velocity reaches get_actions through sim.run_policy.
…s-labs#466)

Tested against the REAL MuJoCo G1 (robot_descriptions) + REAL onnxruntime
sessions (not stubs), plus an adversarial code review. Fixes:

SHIP-BLOCKER - name-based joint resolution:
  The real sim's robot_joint_names('unitree_g1') leads with
  'floating_base_joint' (the free joint), so the leg+waist joints are at
  [1:16], not [0:15]. The old set_robot_state_keys validated keys[:15] ==
  WBC_G1_LEG_WAIST_JOINTS and RAISED against the real G1 - run_policy would
  have crashed. Now resolve the 15 WBC joints BY NAME within the robot's key
  list (any surrounding free-base/arm joints ignored), and read/emit by those
  resolved names. Also fixes the flat observation.state path, which now indexes
  by name instead of a positional [:n] slice. (was hidden because the unit-test
  fixture hard-coded leg+waist-first; fixture now uses the real layout.)

velocity feedback (adversarial strands-labs#4):
  The MuJoCo backend's unified observation is positions-only (no '<name>.vel',
  no observation.velocity), so WBC ran open-loop on velocity while a comment
  falsely claimed it 'converges'. Now read .vel by resolved name when present,
  and warn ONCE when no velocity signal exists (a dead channel can destabilise
  the gait) instead of silently pretending otherwise.

target_orientation overflow (adversarial strands-labs#1):
  A target_orientation longer than command_dim-3 overflowed the command block
  and raised on every tick - a documented kwarg that deterministically crashed.
  The command is now fitted to command_dim (truncate-with-debug-log).

num_actions guard (adversarial strands-labs#8):
  num_actions > len(WBC_G1_LEG_WAIST_JOINTS) silently truncated the 15-entry
  mapping table everywhere and failed late with a confusing zip error. Now
  rejected at construction with a clear message.

HF-id heuristic (adversarial strands-labs#6/strands-labs#7):
  Tightened to _looks_like_hf_repo_id (rejects ./ ../ absolute, backslash, .onnx,
  >1 slash, non-HF charset) so a non-existent relative dir isn't mistaken for a
  repo; log BEFORE the network download so an unexpected fetch is visible.

Integration test fixes (found by running it for real):
  - precondition asserted joint_names[:15] positionally (same floating_base
    bug) -> now checks WBC joints present by name.
  - get_body_state json block is content[1], not content[0] -> find it robustly.

Tests: 59 unit (was 49; +10 regressions incl. the floating_base ordering, the
velocity warning, target_orientation no-crash, num_actions guard, HF heuristic),
all green. 2 deterministic integration tests pass against real MuJoCo G1 + real
ONNX; gait-quality test correctly requires real SONIC weights. Zero regressions
vs. pristine baseline (verified: same 20 pre-existing macOS-GL failures on main).
format + lint + mypy clean.
…; dup-name first-wins

Second adversarial-review pass found two more issues in _read_joint_vector:

- strands-labs#1 (BUG): the flat observation.state / observation.velocity branch was gated
  on 'self._robot_state_keys' being set, so a direct-API / replay caller that
  passed observation.state but never called set_robot_state_keys got all-zero
  qj/dqj silently - diverging from cuRobo/MoveIt2, which consume observation.state
  unconditionally. Now: with keys -> name-resolved indexing (free-base/arm safe);
  without keys -> consume the flat vector positionally in WBC leg+waist order
  (the same contract the sibling planners use), zero-padding a short vector.

- strands-labs#8 (minor): name-resolved indexing used a dict comprehension (last-occurrence
  wins), so a duplicated joint name could shift the resolved slot. Now first
  occurrence wins via setdefault.

+2 regression tests (61 total in wbc+forwarding). format/lint/mypy clean.
…y_config static-walk path

Two edge cases verified during the second testing round (real onnxruntime +
stub introspection):
- history_len=3: each tick feeds a 258-wide stack whose blocks are
  [oldest..newest] over a rolling 3-frame window, warm-filled at tick 0.
- mesh tell()/policy_config path: create_policy('wbc', target_velocity=...)
  sets the static-walk default command (how a command reaches WBC without
  per-call policy_kwargs).

63 wbc+forwarding tests pass; format/lint/mypy clean.
…s#466

Reviewed against the ACTUAL source strands-labs#466 cites - NVlabs/GR00T-WholeBodyControl
decoupled_wbc/sim2mujoco/scripts/run_mujoco_gear_wbc.py + resources/robots/g1/
g1_gear_wbc.yaml (fetched from upstream main) - rather than the paraphrase in
the issue body or the SageMaker endpoint (which serves a different SONIC
component, the kinematic planner). The reference VINDICATES the core design
(single_obs_dim=86, two ONNX sessions, PD law, action_scale, deque history,
name-based G1 joints, quaternion math) but revealed concrete contract details
my first pass got wrong:

- Command block layout (the big one). compute_observation builds the 7-dim
  command as [vx*cmd_scale, vy*cmd_scale, omega*cmd_scale, height_cmd, roll,
  pitch, yaw] - NOT zero-padded velocity. Rewrote _resolve_command to produce
  exactly this; target_orientation now means base RPY (slots [4:7]) and a new
  'height' kwarg drives slot [3]. With real weights the old zero-padded command
  would have been wrong in 6 of 7 slots.
- Defaults corrected to g1_gear_wbc.yaml: obs_history_len 1 -> 6 (num_obs
  86 -> 516), ang_vel_scale 0.25 -> 0.5. Added cmd_scale [2.0,2.0,0.5] and
  height_cmd 0.74 config fields.
- Walk-vs-main selection: upstream uses norm(RAW loco_cmd) <= 0.05 -> main
  (standing) policy. Was norm(scaled command) > 1e-6. _run_session now takes
  the raw velocity and uses the 0.05 threshold (constant _WALK_VELOCITY_THRESHOLD).
- Quaternion math verified numerically IDENTICAL to upstream's conjugate-based
  quat_rotate_inverse (max diff 1.8e-15 over 2000 random quats).

Tests: +6 TestUpstreamFidelity pins (config matches yaml, command layout,
height default, 0.05 walk threshold on raw velocity, raw-not-scaled selection,
quat-matches-upstream-formula). 69 wbc+forwarding pass; 2 integration tests
pass against real onnxruntime+MuJoCo G1 with the corrected 516-wide input.
Docs updated with the exact observation layout + new kwargs. format/lint/mypy clean.
…onfig

Round-4 review (adversarial pass + validation against the REAL shipped weights,
which are in the upstream git: GR00T-WholeBodyControl-Balance.onnx /-Walk.onnx,
ONNX input [batch,516] output [batch,15] - confirming the corrected 516 default).

Observation warm-start (BUG, behavioral on the first obs_history_len-1 ticks):
  ObservationHistory pre-filled the deque with COPIES of the first frame; the
  upstream reference (run_mujoco_gear_wbc.py:47-50) pre-fills with ZERO frames
  and appends the live frame each tick. Since frame 0 carries gravity ~ -1 and
  height_cmd 0.74, the copies-vs-zeros difference was O(1) in the older history
  blocks during the first ~5 control steps of every episode (and post-reset).
  Now zero-warm-started to match upstream exactly; reset() re-seeds zeros. The
  old unit test + docstring asserted the WRONG behavior - both corrected.

Real-weights readiness - YAML config:
  WBCConfig.from_file now reads .yaml/.yml (via pyyaml, added to [wbc] extra)
  in addition to .json, so the upstream g1_gear_wbc.yaml loads directly.
  from_dict normalizes the upstream FLAT scale keys (ang_vel_scale/dof_pos_scale/
  dof_vel_scale) into obs_scales; an explicit obs_scales map still wins. Verified
  by loading the real upstream YAML (516/6, scales, cmd_scale, 15-len vectors).

Config validation: cmd_scale must be length 3 (the one previously-unguarded
field), matching the per-joint-vector guards.

Scope: documented that this targets the NON-gait reference (single_obs_dim 86,
7-wide command, two policies). The upstream gait variant (run_mujoco_gear_wbc_gait.py:
single_obs_dim 95, 8-wide command + clock + torso) is a separate embodiment,
explicitly out of scope.

Integration: added test_real_onnx_io_dims_match_config (pins ONNX input==num_obs,
output==num_actions). Reframed the gait-quality test as a deploy-fidelity check
gated behind WBC_GAIT_CHECK=1 - WBCPolicy emits POSITION targets; a real gait
needs the upstream TORQUE-PD loop (compute_torques) on a torque-actuator model,
which is a deploy concern beyond the policy. The 3 deterministic real-weights
integration tests pass; gait check skips with a clear explanation.

Tests: 73 wbc+forwarding pass (+4 config/YAML); zero regressions vs pristine
main (verified by failure-set diff). format/lint/mypy clean.
…d (critical)

Found while building the torque-deploy harness, by confronting the decisive
upstream math: single_obs_dim is hardcoded 86, but 7+3+3+15+15+15 = 58. The
only way to reach 86 is qj/dqj observing ALL joints (upstream n_joints = nq-7
= 29): 7+3+3+29+29+15 = 86. Re-reading compute_observation confirms
qj = qpos[7:7+n_joints], dqj = qvel[6:6+n_joints] with n_joints=29, while the
action block is the 15 leg+waist outputs.

My implementation used num_actions (15) for the qj/dqj blocks, so it populated
only 58 of 86 values and put dqj/prev_action in the WRONG slots - the arm-joint
observations (28 values) were ZEROS. The real ONNX accepts the 516 width so
earlier 'it runs' tests passed, but the network saw a malformed observation
(verified: frame[28], a real arm-joint reading, was 0 before and is 0.16 now).

Fix - separate the two joint sets:
- New WBCConfig.n_obs_joints (default 29) = observed whole-body joints; distinct
  from num_actions (15) = controlled leg+waist. Validates n_obs_joints>=num_actions.
- build_single_frame writes qj/dqj at n_obs_joints width, action at num_actions:
  [0:c] cmd, [c:c+3] omega, [c+3:c+6] grav, [c+6:c+6+no] qj, [c+6+no:c+6+2no] dqj,
  [c+6+2no:c+6+2no+na] action. default_angles (15) zero-padded to no for the
  qj subtraction, exactly as upstream padded_defaults.
- New WBC_G1_ALL_JOINTS (29, legs+waist+arms) = the observed-joint order; its
  first 15 are exactly WBC_G1_LEG_WAIST_JOINTS. set_robot_state_keys now resolves
  BOTH the controlled (15) and observed (29) sets by name; _extract_state reads
  all 29 via _read_joint_vector(names=...).

Re-validated end-to-end against the REAL GR00T-WholeBodyControl-Balance/Walk.onnx
(input 516) - arm-joint observations now populated; the upstream g1_gear_wbc.yaml
still loads (n_obs_joints defaults to 29). 76 unit tests pass (+ pins for the
29-vs-15 layout and n_obs_joints>=num_actions); 3 real-weights integration tests
pass. format/lint/mypy clean.
Adds examples/wbc_g1_torque_deploy.py: a standalone harness that reproduces the
upstream GearWbcController loop closely enough to drive the real
GR00T-WholeBodyControl-{Balance,Walk}.onnx weights and watch the G1 locomote -
the deploy-fidelity path that sim.run_policy (position-servo) can't exercise.

What it does:
- Loads the robot_descriptions G1 and converts its position-servo actuators to
  pure-torque motors in-process (+ adds a ground plane the bare robot MJCF lacks).
- Per physics tick: tau = policy.compute_torques(target, q, dq) on the 15 leg+
  waist motors; arms held with a stiff PD; mj_step.
- Every control_decimation (4) ticks: builds the whole-body 86-dim observation
  (29-joint qj/dqj + real joint velocities + base IMU) and queries WBCPolicy
  (Balance when standing, Walk when moving); the returned targets become the PD
  setpoints.
- Reports base x/z over the rollout, a verdict, and an optional MP4.

Validated with the real SONIC weights (all errors found were harness setup, not
WBCPolicy - the layout fix in the prior commit was the real bug):
- vx=0.5, 5s  -> base advances +1.89 m, height holds ~0.74 m (WALKED FORWARD)
- vx=1.0, 4s  -> +3.07 m (faster command, faster gait)
- vx=0 stand  -> holds balance in place (static-hold PD collapses by ~1s; the
  policy actively balances) - proving the controller is doing real work
- omega / vy  -> stays upright
Rendered a 167-frame MP4 of the forward walk (real motion, not static).

Docs: WBC page gains a 'Watching it walk' section pointing at the harness and
explaining the position-servo vs torque-deploy distinction; examples README row.
… + no-weights harness smoke test

The gait check was a permanently-skipped placeholder that asserted something the
position-servo sim.run_policy path cannot achieve. Replace it with tests that
actually validate the torque-deploy gait through the harness loop.

Harness refactor: extract a pure, importable simulate_rollout(policy, vx, ...,
duration) from the CLI run() - returns {x0,z0,x1,z1,forward,fell,steps,frames}.
The CLI and both tests now drive the IDENTICAL loop (no duplicated control code).

No-weights smoke test (tests/policies/wbc/test_torque_harness.py, runs in CI
with mujoco, stub ONNX):
- _build_torque_g1 adds a ground plane (REGRESSION: the bare robot_descriptions
  G1 MJCF has none -> the robot fell through space, z -> -18) and converts all
  actuators to pure-torque motors.
- _model_observation yields whole-body per-joint positions + .vel + base IMU.
- a short static-hold rollout (zero-action stub) stays above the floor and does
  not fall through the world; rollout returns the documented metric keys.

Real-weights gait test (gated WBC_GAIT_CHECK=1 + WBC_CHECKPOINT): now calls
simulate_rollout with the real Balance/Walk weights and asserts forward progress
+ no fall - and PASSES (verified: +1.5-1.9 m forward, height held). Added a
companion standing-balance test (zero command holds height). Both skip cleanly
without the env flag.

80 WBC unit tests pass (+4 harness smoke); gated gait+balance pass with real
weights; integration suite skips cleanly in CI; zero regressions; format/lint clean.
…, docs)

Final pre-PR review pass (whole changeset + adversarial review) found and fixed:

- PACKAGING (ship-blocker): the [wbc] extra was missing huggingface_hub, but
  the default create_policy('wbc') downloads nvidia/GEAR-SONIC via it - so a
  clean 'pip install strands-robots[wbc]' + zero-arg construct raised
  'huggingface_hub not installed (the [wbc] extra)' while [wbc] didn't carry it.
  Added huggingface_hub>=0.20,<1.0 to the extra.

- SECURITY/INTEGRATION (real gap): 'wbc'/'sonic' were absent from the mesh +
  Device Connect policy-provider allowlist (_DEFAULT_POLICY_TYPES), so
  tell(..., policy_provider='wbc') and the DC drivers rejected WBC at
  validate_command / is_safe_policy_provider. Added them + 2 regression tests
  (allowlist membership + validate_command accepts a wbc execute).

- DOCS (stale, safety-relevant): observation.py module docstring still described
  the OLD 15-wide qj/dqj / 58-padded layout; corrected to the real 29-joint
  whole-body layout (7+3+3+29+29+15=86, no padding) matching the function
  docstring and code. Reserved-tail comment clarified (no tail by default).

- CHANGELOG: added a WBCPolicy 'Unreleased' entry (matching the existing format).

- Scope note in docs/policies/wbc.md: policy_kwargs is on the control/deploy
  path (run_policy/start_policy/mesh), not eval_policy/evaluate_benchmark
  (instruction-driven); static-velocity WBC eval uses the constructor. This is
  a deliberate scope decision, now documented rather than silent.

- Cosmetics: WBC_G1_ALL_JOINTS added to policy.py __all__; _build_torque_g1
  docstring corrected (4-tuple, not a phantom 5-tuple).

Verified: 82 WBC unit tests pass; mesh security suite 231 pass; real-weights
integration (incl. gait + balance) 5 pass; full-repo regression diff vs
pristine main is EMPTY (zero new failures); format/lint/mypy clean.
CI dependency-resolution failure: the [wbc] extra pinned huggingface_hub
<1.0.0, but lerobot 0.5.x requires huggingface-hub>=1.0.0. Since [wbc] is pulled
into [all], 'pip install .[all,dev]' (what CI installs) hit ResolutionImpossible.

Widen to >=0.20.0,<2.0.0 (cap the major per AGENTS.md dependency-bounds rule),
matching lerobot's own <2.0.0 bound so the two co-resolve (verified: resolves to
huggingface-hub 1.19.0 + lerobot 0.5.1). The <1.0.0 cap was an over-tightening
introduced when adding huggingface_hub to the extra.
…t style)

Address two CodeQL code-scanning findings on the WBC policy PR:

- policy.py: the empty except (TypeError, ValueError) in the per-joint
  observation reader now carries an explanatory comment documenting that a
  non-numeric entry intentionally degrades to its pre-zeroed neutral value
  (the full qj/dqj block is validated downstream).
- test_policy.py: the wbc.policy module was imported both with 'import ... as'
  and 'from ... import' across test methods. Consolidate to a single top-level
  'from strands_robots.policies.wbc import policy as wbc_policy' and reference
  require_optional / _looks_like_hf_repo_id through the module object.

No behavioral change. 73 passed / 1 skipped; ruff check + format clean.
CI runs 'mypy strands_robots tests tests_integ' over the whole tree; my local
mypy was scoped to the wbc source files and missed 6 errors in the tests:

- tests_integ/policies/wbc/test_wbc_live.py: create_policy() is typed to return
  the base Policy, so .policy_session / .config (WBCPolicy attrs) tripped
  [attr-defined] (x5). Added the isinstance(policy, WBCPolicy) narrowing the
  other tests already use - also a genuinely correct assertion.
- tests/policies/wbc/test_policy.py: obs inferred as dict[str, float] from the
  comprehension, then assigned a list base_ang_vel -> [assignment]. Annotated
  that obs as dict[str, Any].

Verified with the CI-equivalent invocation: mypy clean across all wbc source +
test + integ files; ruff + format clean; 82 WBC tests still pass.
@rmncardoso

Copy link
Copy Markdown
Collaborator Author

Thanks — spot-on, all three addressed. My miss: I ran mypy scoped to wbc/ locally instead of the whole-tree check CI uses.

  • mypy: fixed as you suggested — assert isinstance(policy, WBCPolicy) in test_real_onnx_io_dims_match_config, and widened the obs annotation (used dict[str, Any] to match Policy.get_actions's signature; either works).
  • CHANGELOG conflict: rebased onto current main (incl. chore(release): collapse Unreleased CHANGELOG sections into dated v0.4.0 heading #484) and re-anchored the entry under [Unreleased]. The huggingface_hub<1.0.0↔lerobot conflict is also fixed (widened to <2.0.0).

Re-verified CI-style: mypy/ruff/format clean, 82 tests pass. FWIW the WBC_GAIT_CHECK=1 path against the real GR00T-WBC weights also passes (G1 walks ~1.5–1.9 m, holds balance) — just opt-in since it needs a checkpoint. Ready for a re-run.

@cagataycali cagataycali merged commit 7ec9e3c into strands-labs:main Jun 18, 2026
9 checks passed
@github-project-automation github-project-automation Bot moved this from In review to Done in Strands Labs - Robots Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

feat(policies): WBCPolicy provider for GR00T Whole-Body-Control (SONIC, ONNX)

3 participants