Add NVIDIA Warp environment support by Denys88 · Pull Request #342 · Denys88/rl_games

Denys88 · 2026-03-27T05:50:00Z

Summary

Add WarpVecEnv adapter for GPU-accelerated Warp environments (vecenv_type: WARP)
Add WarpCartPole example env with discrete and continuous actions, running on GPU via Warp kernels
Fix SAC has_cnn attribute bug in SACBuilder.Network.load()
Add example PPO (discrete + continuous) and SAC configs for Warp CartPole

Relates to #341 — provides built-in Warp integration so users don't need custom IVecEnv wrappers.

Details

WarpVecEnv (rl_games/envs/warp_vecenv.py):

Handles Warp array <-> PyTorch tensor conversion
Supports both 4-tuple and 5-tuple step() returns
Computes done = terminated | truncated and reports truncations through the time_outs info entry for value bootstrapping
Falls back gracefully if Warp is not installed
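
The 4-tuple/5-tuple handling described above could look roughly like the sketch below. It is not the code from warp_vecenv.py: the helper name `normalize_step` is hypothetical, and it assumes the flags support the `|` operator (true for PyTorch bool tensors, NumPy arrays, and plain Python bools).

```python
def normalize_step(result):
    """Convert a 4- or 5-tuple step() result to (obs, reward, done, info).

    Hypothetical helper for illustration; not taken from the PR.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        info = dict(info)  # copy so the env's own dict is not mutated
        # An episode is over if it either failed or hit the time limit.
        done = terminated | truncated
        # Keep the truncation flags so the trainer can still bootstrap the
        # value of states that ended only because of the time limit.
        info["time_outs"] = truncated
        return obs, reward, done, info
    if len(result) == 4:
        # Old-style API: done is already merged, pass it through unchanged.
        obs, reward, done, info = result
        return obs, reward, done, dict(info)
    raise ValueError(f"step() returned {len(result)} values, expected 4 or 5")
```

With tensors in place of the bools, the same expression works element-wise over all parallel envs.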

WarpCartPole (rl_games/envs/warp_cartpole.py):

Exact CartPole-v1 physics implemented as Warp GPU kernels
1024 parallel envs, ~8M env fps on RTX 5090
Auto-reset on termination/truncation
Discrete (left/right) and continuous (force) action modes
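
For reference, the standard Gymnasium CartPole-v1 dynamics that the Warp kernels are said to reproduce can be written for a single env in plain Python as below. This is a CPU sketch, not the PR's kernel code. The constants are the usual CartPole-v1 values; the mapping of a continuous action to a force fraction in [-1, 1] is an assumption.

```python
import math

# Constants of the standard Gymnasium CartPole-v1 environment.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_POLE_LENGTH = 0.5
POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH
FORCE_MAG = 10.0
TAU = 0.02  # seconds between state updates
X_THRESHOLD = 2.4
THETA_THRESHOLD = 12 * 2 * math.pi / 360  # 12 degrees in radians


def cartpole_step(state, action, continuous=False):
    """Advance one env by one step; returns (new_state, terminated)."""
    x, x_dot, theta, theta_dot = state
    if continuous:
        # Assumption: continuous actions are a force fraction in [-1, 1].
        force = max(-1.0, min(1.0, float(action))) * FORCE_MAG
    else:
        # Discrete: 1 pushes right, anything else pushes left.
        force = FORCE_MAG if action == 1 else -FORCE_MAG

    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler integration, as in the reference environment.
    x = x + TAU * x_dot
    x_dot = x_dot + TAU * x_acc
    theta = theta + TAU * theta_dot
    theta_dot = theta_dot + TAU * theta_acc

    terminated = abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD
    return (x, x_dot, theta, theta_dot), terminated
```

A GPU version would run this same update once per env index inside a kernel and reset the state of any env that terminated or reached the step limit.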

WSL2 note: Warp needs LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH on WSL2 to find the correct CUDA driver.
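
When training is launched from another Python script, the same setting can be applied to the child process. The dynamic loader generally reads LD_LIBRARY_PATH when a process starts, so the variable is set before launching rather than inside the already running interpreter. The helper name `wsl_env` and the script name in the usage comment are hypothetical.

```python
import os

WSL_LIB_DIR = "/usr/lib/wsl/lib"


def wsl_env(base_env=None):
    """Return a copy of the environment with the WSL2 driver dir first
    in LD_LIBRARY_PATH. Hypothetical helper, not part of the PR."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]
    # Drop any existing entry so the WSL2 directory always ends up first.
    parts = [p for p in parts if p != WSL_LIB_DIR]
    env["LD_LIBRARY_PATH"] = ":".join([WSL_LIB_DIR] + parts)
    return env


# Usage (script name is a placeholder):
#   import subprocess, sys
#   subprocess.run([sys.executable, "my_runner.py"], env=wsl_env(), check=True)
```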

Test plan

PPO discrete CartPole: solved at epoch 173 (reward 515)
PPO continuous CartPole: trained 500 epochs
SAC continuous CartPole: trained 500 epochs
1024 parallel GPU envs working

🤖 Generated with Claude Code

- Add WarpVecEnv adapter in rl_games/envs/warp_vecenv.py
- Register 'WARP' vecenv type for GPU-accelerated Warp environments
- Handle Warp array <-> PyTorch tensor conversion via warp.to_torch()
- Support both 4-tuple and 5-tuple step() returns
- Add example PPO and SAC configs for Warp environments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add WarpCartPole: GPU-accelerated CartPole using Warp kernels
- Supports both discrete (left/right) and continuous (force) actions
- 1024 parallel envs on GPU with auto-reset
- PPO discrete, PPO continuous, and SAC configs
- Runner script with env registration example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix int64->int32 conversion for discrete actions in WarpCartPole
- Pass torch tensors directly through WarpVecEnv (env handles conversion)
- Fix SAC network missing has_cnn attribute (cherry-pick from PR #340)
- All 3 configs tested: PPO discrete (solved!), PPO continuous, SAC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- GPU-accelerated Ant using Newton physics engine
- MuJoCo solver with joint_f for actuation
- 1024 parallel envs, ~600K env-steps/s
- Physics working (contact, articulation, locomotion)
- Training not yet converging; needs auto-reset and reward tuning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Implement per-world auto-reset for terminated/truncated episodes
- Match MuJoCo Ant-v4 reward: forward_vel + healthy - 0.5*ctrl
- Fix force scale to match nv_ant.xml gear=15
- Fix body_qd ordering (linear first, angular second)
- Use joint_f for MuJoCo solver actuation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
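
The reward formula named in this commit, forward_vel + healthy - 0.5*ctrl, can be sketched for one env as follows. The function and parameter names are hypothetical, and it assumes the control term is the sum of squared actions, as in the Gymnasium Ant reward.

```python
def ant_reward(forward_vel, action, healthy,
               healthy_reward=1.0, ctrl_cost_weight=0.5):
    """Per-step Ant reward: forward velocity + healthy bonus - control cost.

    Illustrative only; names and defaults are assumptions, not PR code.
    """
    # Control cost penalizes large torques: weight * sum of squared actions.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    # The healthy bonus is paid only while the agent is in a valid state.
    bonus = healthy_reward if healthy else 0.0
    return forward_vel + bonus - ctrl_cost
```

For example, moving at 1.0 with zero action while healthy gives 1.0 + 1.0 - 0.0 = 2.0.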

Tuned force scale to 50 (gear=15 * ~3.3x for Newton's joint_f). PPO training on 1024 GPU envs reaches reward 585 (max) / 125 (final). Clear learning curve from -1.7 to 232 over 1000 epochs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Local frame velocities via quat_rotate_inverse
- DOF positions from eval_ik, normalized to [-1,1]
- DOF velocities scaled
- Previous actions in observation (31 dims total)
- Forward velocity from position delta
- action_scale * joint_gears for torques
- Fix video rendering with eval_ik and Y/Z coord mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
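
Converting a world-frame velocity into the body's local frame means rotating it by the inverse of the body orientation. The plain-Python sketch below shows that operation for one vector. It assumes a unit quaternion stored as (x, y, z, w); it is not the PR's implementation.

```python
import math


def quat_rotate_inverse(q, v):
    """Rotate vector v by the inverse of unit quaternion q = (x, y, z, w)."""
    qx, qy, qz, qw = q
    vx, vy, vz = v
    # cross(q_vec, v)
    cx = qy * vz - qz * vy
    cy = qz * vx - qx * vz
    cz = qx * vy - qy * vx
    # dot(q_vec, v)
    d = qx * vx + qy * vy + qz * vz
    s = 2.0 * qw * qw - 1.0
    # v' = v*(2w^2 - 1) - 2w*cross(q_vec, v) + 2*q_vec*dot(q_vec, v)
    return (
        vx * s - 2.0 * qw * cx + 2.0 * d * qx,
        vy * s - 2.0 * qw * cy + 2.0 * d * qy,
        vz * s - 2.0 * qw * cz + 2.0 * d * qz,
    )


# A body yawed +90 degrees about Z that moves along world +Y
# is moving along its own +X axis.
yaw_90 = (0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
local_vel = quat_rotate_inverse(yaw_90, (0.0, 1.0, 0.0))
```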

- Replace eval_ik with relative body positions (fixes memory leak)
- Obs 47 dims: local frame body positions/velocities for all legs
- horizon=8, mini_epochs=3, entropy=0 (per user suggestion)
- Stable ~40K fps, reward plateaus at 20-30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use position delta for forward_vel (body_qd unreliable with MuJoCo solver)
- ctrl_cost=0.005, alive_bonus=0.05 (matches old working env)
- Configurable solver_type: mujoco or xpbd
- Simple 27-dim obs: z + quat + vel + angvel + leg relative positions
- Training reaches reward 17 (steadily climbing over 2000 epochs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Use XY speed (sqrt(dx^2+dy^2)/dt) instead of just x velocity
- The ant naturally moves in Y direction, not X
- Register MuJoCo custom attributes on BOTH ant and replication builders
- 10x forward_vel scaling for stronger locomotion signal
- Training: reward 316 final, 938 max over 2000 epochs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
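
The XY speed term from this commit, sqrt(dx^2+dy^2)/dt computed from the position delta, might be written for one env as below. The function name and the `vel_scale` parameter are hypothetical; the default of 10 mirrors the "10x forward_vel scaling" bullet.

```python
import math


def xy_speed_reward(prev_pos, cur_pos, dt, vel_scale=10.0):
    """Scaled planar speed of the torso between two consecutive steps.

    prev_pos / cur_pos are (x, y, z) positions; z is ignored.
    Illustrative sketch, not the PR's code.
    """
    dx = cur_pos[0] - prev_pos[0]
    dy = cur_pos[1] - prev_pos[1]
    # Direction-agnostic speed: rewards motion along X, Y, or any diagonal.
    speed = math.hypot(dx, dy) / dt
    return vel_scale * speed
```

Because the term ignores direction, it rewards any planar movement, including circling in place, which is a trade-off against the pure x-velocity reward it replaces.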

- action_scale=10 matches MuJoCo gear=150 (was 15 in nv_ant.xml)
- XY speed reward (ant moves in Y, not just X)
- Wider termination z-range (0.2, 5.0)
- register_custom_attributes on both ant and replication builders
- MuJoCo reward: forward_vel + 1.0 healthy - 0.5 ctrl
- Best: max reward 938 (XY speed run), 33 (gear=150 run)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

DenSumy and others added 3 commits March 26, 2026 23:00

Denys88 force-pushed the feature/warp-env-support branch from bc0520f to a493f99 Compare March 27, 2026 06:01

DenSumy and others added 8 commits March 27, 2026 08:32

Denys88 wants to merge 11 commits into master from feature/warp-env-support

2 participants
