Add NVIDIA Warp environment support #342

Open
Denys88 wants to merge 11 commits into master from feature/warp-env-support

Denys88 commented Mar 27, 2026

Summary

  • Add WarpVecEnv adapter for GPU-accelerated Warp environments (vecenv_type: WARP)
  • Add WarpCartPole example env with discrete and continuous actions, running on GPU via Warp kernels
  • Fix SAC has_cnn attribute bug in SACBuilder.Network.load()
  • Add example PPO (discrete and continuous) and SAC configs for Warp CartPole

Relates to #341 — provides built-in Warp integration so users don't need custom IVecEnv wrappers.

Details

WarpVecEnv (rl_games/envs/warp_vecenv.py):

  • Handles Warp array <-> PyTorch tensor conversion
  • Supports both 4-tuple and 5-tuple step() returns
  • Computes done = terminated | truncated and passes the truncation flags as time_outs in info for value bootstrapping
  • Falls back gracefully if Warp is not installed
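
The optional Warp import and the handling of the two step() conventions listed above can be sketched roughly as follows. This is an illustration only, not the code in rl_games/envs/warp_vecenv.py: the names `WARP_AVAILABLE` and `normalize_step` are hypothetical, and storing the truncation flags under the `time_outs` key of `info` is an assumption based on the bullet above.

```python
# Optional dependency: keep the module importable when Warp is absent.
try:
    import warp as wp  # noqa: F401
    WARP_AVAILABLE = True
except ImportError:
    wp = None
    WARP_AVAILABLE = False


def normalize_step(result):
    """Convert a 4- or 5-tuple step() result to (obs, reward, done, info).

    Hypothetical helper. The flags may be Python bools, NumPy arrays, or
    torch tensors; anything that supports the `|` operator works.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        info = dict(info)  # copy so the env's own dict is not mutated
        # Report truncations separately so the trainer can bootstrap the
        # value of the last state instead of treating it as terminal.
        info["time_outs"] = truncated  # key name is an assumption
        return obs, reward, terminated | truncated, info
    if len(result) == 4:
        obs, reward, done, info = result
        return obs, reward, done, dict(info)
    raise ValueError(f"step() returned {len(result)} values, expected 4 or 5")
```

With a 5-tuple input, an episode that hit the time limit comes back with done set and `info["time_outs"]` set; a 4-tuple input passes through unchanged.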

WarpCartPole (rl_games/envs/warp_cartpole.py):

  • Exact CartPole-v1 physics implemented as Warp GPU kernels
  • 1024 parallel envs, ~8M env fps on RTX 5090
  • Auto-reset on termination/truncation
  • Discrete (left/right) and continuous (force) action modes
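
For reference, the CartPole-v1 dynamics that the bullets above refer to can be written for a single environment in plain Python as below. This is a sketch of the standard equations and limits, not the Warp kernel from rl_games/envs/warp_cartpole.py. The function names are hypothetical, and the continuous-action mapping (clip to [-1, 1], then scale by the force magnitude) is an assumption, since the PR text only says the continuous mode takes a force.

```python
import math

# Constants of the classic CartPole-v1 task.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                      # half the pole length
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0
TAU = 0.02                             # seconds per step
X_LIMIT = 2.4
THETA_LIMIT = 12 * 2 * math.pi / 360   # 12 degrees in radians
MAX_STEPS = 500


def action_to_force(action, discrete=True):
    """Discrete: 0 pushes left, 1 pushes right. Continuous: assumed scaling."""
    if discrete:
        return FORCE_MAG if action == 1 else -FORCE_MAG
    return FORCE_MAG * max(-1.0, min(1.0, float(action)))


def cartpole_step(state, force):
    """Advance one env by one explicit-Euler step and return the new state."""
    x, x_dot, theta, theta_dot = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


def episode_flags(state, step_count):
    """Return (terminated, truncated); an auto-reset would fire on either."""
    x, _, theta, _ = state
    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    truncated = step_count >= MAX_STEPS
    return terminated, truncated
```

A GPU version would run the same arithmetic once per environment index inside a kernel and reset the state of any environment whose flags are set.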

WSL2 note: Warp needs LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH on WSL2 to find the correct CUDA driver.

Test plan

  • PPO discrete CartPole: solved at epoch 173 (reward 515)
  • PPO continuous CartPole: trained 500 epochs
  • SAC continuous CartPole: trained 500 epochs
  • 1024 parallel GPU envs working

🤖 Generated with Claude Code

DenSumy and others added 3 commits March 26, 2026 23:00
- Add WarpVecEnv adapter in rl_games/envs/warp_vecenv.py
- Register 'WARP' vecenv type for GPU-accelerated Warp environments
- Handle Warp array <-> PyTorch tensor conversion via warp.to_torch()
- Support both 4-tuple and 5-tuple step() returns
- Add example PPO and SAC configs for Warp environments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add WarpCartPole: GPU-accelerated CartPole using Warp kernels
- Supports both discrete (left/right) and continuous (force) actions
- 1024 parallel envs on GPU with auto-reset
- PPO discrete, PPO continuous, and SAC configs
- Runner script with env registration example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix int64->int32 conversion for discrete actions in WarpCartPole
- Pass torch tensors directly through WarpVecEnv (env handles conversion)
- Fix SAC network missing has_cnn attribute (cherry-pick from PR #340)
- All 3 configs tested: PPO discrete (solved!), PPO continuous, SAC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Denys88 force-pushed the feature/warp-env-support branch from bc0520f to a493f99 on March 27, 2026 06:01
DenSumy and others added 8 commits March 27, 2026 08:32
- GPU-accelerated Ant using Newton physics engine
- MuJoCo solver with joint_f for actuation
- 1024 parallel envs, ~600K env-steps/s
- Physics working (contact, articulation, locomotion)
- Training not yet converging — needs auto-reset and reward tuning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Implement per-world auto-reset for terminated/truncated episodes
- Match MuJoCo Ant-v4 reward: forward_vel + healthy - 0.5*ctrl
- Fix force scale to match nv_ant.xml gear=15
- Fix body_qd ordering (linear first, angular second)
- Use joint_f for MuJoCo solver actuation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
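
The reward shape named in the commit above (forward_vel + healthy - 0.5*ctrl) can be sketched for one environment as follows. The function name and argument layout are hypothetical. The default weights are the ones quoted in the commit messages (healthy bonus 1.0, control weight 0.5), the control cost is taken to be the sum of squared actions as in the MuJoCo Ant task, and any contact cost is left out as a simplifying assumption.

```python
def ant_reward(x_before, x_after, dt, action,
               healthy_reward=1.0, ctrl_cost_weight=0.5):
    """Hypothetical per-env reward: forward velocity + alive bonus - control cost."""
    forward_vel = (x_after - x_before) / dt          # velocity from position delta
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    return forward_vel + healthy_reward - ctrl_cost
```

For example, moving 0.05 m forward in a 0.05 s step with zero torques gives a velocity term of 1 plus the alive bonus of 1.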
Tuned force scale to 50 (gear=15 * ~3.3x for Newton's joint_f).
PPO training on 1024 GPU envs reaches reward 585 (max) / 125 (final).
Clear learning curve from -1.7 to 232 over 1000 epochs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Local frame velocities via quat_rotate_inverse
- DOF positions from eval_ik, normalized to [-1,1]
- DOF velocities scaled
- Previous actions in observation (31 dims total)
- Forward velocity from position delta
- action_scale * joint_gears for torques
- Fix video rendering with eval_ik and Y/Z coord mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
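
The local-frame velocities mentioned in the commit above come from rotating a world-frame vector by the inverse of the body orientation. A scalar Python sketch of that operation is shown below. It assumes a unit quaternion stored as (x, y, z, w); the name matches the helper mentioned in the commit, but the body here is a generic textbook formula, not the PR's implementation.

```python
def quat_rotate_inverse(q, v):
    """Rotate world-frame vector v into the local frame of unit quaternion q.

    q is (x, y, z, w) and v is (x, y, z). Equivalent to conj(q) * v * q.
    """
    qx, qy, qz, qw = q
    vx, vy, vz = v
    # cross(q_vec, v)
    cx = qy * vz - qz * vy
    cy = qz * vx - qx * vz
    cz = qx * vy - qy * vx
    dot = qx * vx + qy * vy + qz * vz      # dot(q_vec, v)
    s = 2.0 * qw * qw - 1.0
    return (
        s * vx - 2.0 * qw * cx + 2.0 * dot * qx,
        s * vy - 2.0 * qw * cy + 2.0 * dot * qy,
        s * vz - 2.0 * qw * cz + 2.0 * dot * qz,
    )
```

For a body yawed 90 degrees about z, its local x-axis points along world +y, so a world velocity of (0, 1, 0) maps to a local velocity along +x.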
- Replace eval_ik with relative body positions (fixes memory leak)
- Obs 47 dims: local frame body positions/velocities for all legs
- horizon=8, mini_epochs=3, entropy=0 (per user suggestion)
- Stable ~40K fps, reward plateaus at 20-30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use position delta for forward_vel (body_qd unreliable with MuJoCo solver)
- ctrl_cost=0.005, alive_bonus=0.05 (matches old working env)
- Configurable solver_type: mujoco or xpbd
- Simple 27-dim obs: z + quat + vel + angvel + leg relative positions
- Training reaches reward 17 (steadily climbing over 2000 epochs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use XY speed (sqrt(dx^2+dy^2)/dt) instead of just x velocity
- The ant naturally moves in Y direction, not X
- Register MuJoCo custom attributes on BOTH ant and replication builders
- 10x forward_vel scaling for stronger locomotion signal
- Training: reward 316 final, 938 max over 2000 epochs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
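
The XY-speed term from the commit above, sqrt(dx^2+dy^2)/dt with a 10x scale, can be sketched as below. The function names and the `vel_scale` parameter are hypothetical; positions are assumed to be (x, y, z) tuples sampled one control step apart.

```python
import math


def xy_speed(prev_pos, curr_pos, dt):
    """Horizontal speed from the position delta over one step."""
    dx = curr_pos[0] - prev_pos[0]
    dy = curr_pos[1] - prev_pos[1]
    return math.sqrt(dx * dx + dy * dy) / dt


def forward_reward(prev_pos, curr_pos, dt, vel_scale=10.0):
    """Scaled speed term; vel_scale=10 mirrors the 10x factor in the commit."""
    return vel_scale * xy_speed(prev_pos, curr_pos, dt)
```

Because speed has no sign, this term rewards horizontal motion in any direction, including backwards, which differs from a reward on x velocity alone.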
- action_scale=10 matches MuJoCo gear=150 (was 15 in nv_ant.xml)
- XY speed reward (ant moves in Y, not just X)
- Wider termination z-range (0.2, 5.0)
- register_custom_attributes on both ant and replication builders
- MuJoCo reward: forward_vel + 1.0*healthy - 0.5*ctrl
- Best: max reward 938 (XY speed run), 33 (gear=150 run)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2 participants