- Add WarpVecEnv adapter in rl_games/envs/warp_vecenv.py
- Register 'WARP' vecenv type for GPU-accelerated Warp environments
- Handle Warp array <-> PyTorch tensor conversion via warp.to_torch()
- Support both 4-tuple and 5-tuple step() returns
- Add example PPO and SAC configs for Warp environments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
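The 4-tuple versus 5-tuple handling listed in this commit could look roughly like the sketch below. It is an illustration only, not the code in `rl_games/envs/warp_vecenv.py`: the helper name `normalize_step_result` is hypothetical, and the `time_outs` key is taken from the PR description.

```python
def normalize_step_result(result):
    """Return (obs, reward, done, info) from a 4- or 5-tuple step() result.

    Hypothetical helper for illustration. It works with any flag values that
    support the `|` operator (Python bools, NumPy arrays, torch tensors).
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        info = dict(info)  # copy so the caller's dict is not mutated
        # Keep the truncation flags so the trainer can bootstrap values
        # for episodes that were cut off by a time limit.
        info.setdefault("time_outs", truncated)
        done = terminated | truncated
        return obs, reward, done, info
    if len(result) == 4:
        obs, reward, done, info = result
        return obs, reward, done, info
    raise ValueError(f"expected a 4- or 5-tuple, got length {len(result)}")


# Usage: a 5-tuple with a truncated (timed-out) episode.
obs, reward, done, info = normalize_step_result(("obs", 1.0, False, True, {}))
```

Merging the two flags into `done` keeps the trainer-facing interface at four values, while `time_outs` preserves the distinction that value bootstrapping needs.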
- Add WarpCartPole: GPU-accelerated CartPole using Warp kernels
- Supports both discrete (left/right) and continuous (force) actions
- 1024 parallel envs on GPU with auto-reset
- PPO discrete, PPO continuous, and SAC configs
- Runner script with env registration example

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
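For readers unfamiliar with the environment, here is a single-env, pure-Python sketch of a CartPole step with discrete or continuous actions and auto-reset. It is not the Warp kernel from this PR: the constants follow the classic CartPole formulation, and the function name, the `continuous` flag, and the reset-to-zero behaviour are assumptions made for illustration.

```python
import math

# Constants from the classic CartPole formulation (assumed, not read from the PR).
GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
HALF_LENGTH, FORCE_MAG, TAU = 0.5, 10.0, 0.02
X_LIMIT, THETA_LIMIT = 2.4, 12.0 * math.pi / 180.0


def cartpole_step(state, action, continuous=False):
    """Advance one env by one Euler step; returns (next_state, reward, terminated)."""
    x, x_dot, theta, theta_dot = state
    if continuous:
        # Continuous action in [-1, 1], scaled to a force.
        force = FORCE_MAG * max(-1.0, min(1.0, float(action)))
    else:
        # Discrete action: 1 pushes right, anything else pushes left.
        force = FORCE_MAG if action == 1 else -FORCE_MAG

    total_mass = MASS_CART + MASS_POLE
    pole_ml = MASS_POLE * HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + pole_ml * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_ml * theta_acc * cos_t / total_mass

    next_state = (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )
    terminated = abs(next_state[0]) > X_LIMIT or abs(next_state[2]) > THETA_LIMIT
    if terminated:
        # Auto-reset: a real env would likely randomize the initial state.
        next_state = (0.0, 0.0, 0.0, 0.0)
    return next_state, 1.0, terminated
```

In a vectorized GPU version, the same arithmetic would run once per environment, with the reset applied only to the environments whose `terminated` flag is set.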
- Fix int64->int32 conversion for discrete actions in WarpCartPole
- Pass torch tensors directly through WarpVecEnv (env handles conversion)
- Fix SAC network missing has_cnn attribute (cherry-pick from PR #340)
- All 3 configs tested: PPO discrete (solved!), PPO continuous, SAC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from bc0520f to a493f99.
- GPU-accelerated Ant using Newton physics engine
- MuJoCo solver with joint_f for actuation
- 1024 parallel envs, ~600K env-steps/s
- Physics working (contact, articulation, locomotion)
- Training not yet converging; needs auto-reset and reward tuning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Implement per-world auto-reset for terminated/truncated episodes
- Match MuJoCo Ant-v4 reward: forward_vel + healthy - 0.5*ctrl
- Fix force scale to match nv_ant.xml gear=15
- Fix body_qd ordering (linear first, angular second)
- Use joint_f for MuJoCo solver actuation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
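The reward shape named in this commit (`forward_vel + healthy - 0.5*ctrl`) can be written out as a small per-env function. This is a sketch under assumptions: the function name and parameters are hypothetical, `ctrl` is taken to be the sum of squared actions, and the default z-range is illustrative rather than read from the PR's code.

```python
def ant_reward(forward_vel, actions, z,
               healthy_z_range=(0.2, 1.0),
               healthy_reward=1.0,
               ctrl_cost_weight=0.5):
    """Return (reward, terminated) for one env.

    Assumed form: forward velocity, plus a bonus while the torso height z
    stays inside healthy_z_range, minus a quadratic control cost.
    """
    z_min, z_max = healthy_z_range
    is_healthy = z_min <= z <= z_max
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in actions)
    reward = forward_vel + (healthy_reward if is_healthy else 0.0) - ctrl_cost
    # The episode terminates once the torso leaves the healthy range,
    # which is what triggers the per-world auto-reset.
    return reward, not is_healthy


# Usage: 8 actuators at half effort, torso at a healthy height.
reward, terminated = ant_reward(1.0, [0.5] * 8, z=0.6)
```

Here the control cost is 0.5 * 8 * 0.25 = 1.0, so the forward velocity and healthy bonus sum to 2.0 and the net reward is 1.0.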
Tuned force scale to 50 (gear=15 * ~3.3x for Newton's joint_f). PPO training on 1024 GPU envs reaches reward 585 (max) / 125 (final). Clear learning curve from -1.7 to 232 over 1000 epochs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Local frame velocities via quat_rotate_inverse
- DOF positions from eval_ik, normalized to [-1,1]
- DOF velocities scaled
- Previous actions in observation (31 dims total)
- Forward velocity from position delta
- action_scale * joint_gears for torques
- Fix video rendering with eval_ik and Y/Z coord mapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
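To make the "local frame velocities via quat_rotate_inverse" item concrete, here is a scalar pure-Python version of that operation. It is a sketch, not the PR's implementation, and it assumes a unit quaternion stored as `(x, y, z, w)`; if the env uses a different component order, the unpacking would need to change.

```python
import math


def quat_rotate_inverse(q, v):
    """Rotate vector v by the inverse of unit quaternion q = (x, y, z, w)."""
    qx, qy, qz, qw = q
    vx, vy, vz = v
    # a = v * (2 w^2 - 1)
    s = 2.0 * qw * qw - 1.0
    a = (vx * s, vy * s, vz * s)
    # b = 2 w * (q_vec x v)
    cx = qy * vz - qz * vy
    cy = qz * vx - qx * vz
    cz = qx * vy - qy * vx
    b = (2.0 * qw * cx, 2.0 * qw * cy, 2.0 * qw * cz)
    # c = 2 * q_vec * (q_vec . v)
    d = 2.0 * (qx * vx + qy * vy + qz * vz)
    c = (qx * d, qy * d, qz * d)
    return tuple(ai - bi + ci for ai, bi, ci in zip(a, b, c))


# Usage: a body yawed 90 degrees about z and moving along world +y
# is moving along its own +x axis, so local_vel is close to (1, 0, 0).
half = math.pi / 4.0
q_yaw90 = (0.0, 0.0, math.sin(half), math.cos(half))
local_vel = quat_rotate_inverse(q_yaw90, (0.0, 1.0, 0.0))
```

Expressing velocities in the body frame makes the observation independent of the robot's heading, which is the usual motivation for this transform in locomotion tasks.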
- Replace eval_ik with relative body positions (fixes memory leak)
- Obs 47 dims: local frame body positions/velocities for all legs
- horizon=8, mini_epochs=3, entropy=0 (per user suggestion)
- Stable ~40K fps, reward plateaus at 20-30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use position delta for forward_vel (body_qd unreliable with MuJoCo solver)
- ctrl_cost=0.005, alive_bonus=0.05 (matches old working env)
- Configurable solver_type: mujoco or xpbd
- Simple 27-dim obs: z + quat + vel + angvel + leg relative positions
- Training reaches reward 17 (steadily climbing over 2000 epochs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use XY speed (sqrt(dx^2+dy^2)/dt) instead of just x velocity
- The ant naturally moves in Y direction, not X
- Register MuJoCo custom attributes on BOTH ant and replication builders
- 10x forward_vel scaling for stronger locomotion signal
- Training: reward 316 final, 938 max over 2000 epochs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
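The XY-speed term from this commit, computed from a position delta and scaled by 10, might look like the following. The function name, argument layout, and the `dt` value in the usage line are assumptions for illustration; only the formula and the 10x factor come from the commit message.

```python
import math


def xy_speed_reward(prev_pos, pos, dt, scale=10.0):
    """Scaled planar speed from two consecutive torso positions (x, y, ...)."""
    dx = pos[0] - prev_pos[0]
    dy = pos[1] - prev_pos[1]
    # sqrt(dx^2 + dy^2) / dt is direction-agnostic, so motion along Y
    # is rewarded just as much as motion along X.
    return scale * math.hypot(dx, dy) / dt


# Usage: the torso moved 3 cm in x and 4 cm in y over an assumed 0.05 s step.
speed_term = xy_speed_reward((0.0, 0.0), (0.03, 0.04), dt=0.05)
```

One consequence of this design is that the reward no longer prefers a heading: any planar movement counts, including circling in place, which may or may not be desirable depending on the task.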
- action_scale=10 matches MuJoCo gear=150 (was 15 in nv_ant.xml)
- XY speed reward (ant moves in Y, not just X)
- Wider termination z-range (0.2, 5.0)
- register_custom_attributes on both ant and replication builders
- MuJoCo reward: forward_vel + 1.0 healthy - 0.5 ctrl
- Best: max reward 938 (XY speed run), 33 (gear=150 run)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- `WarpVecEnv` adapter for GPU-accelerated Warp environments (`vecenv_type: WARP`)
- `WarpCartPole` example env with discrete and continuous actions, running on GPU via Warp kernels
- Fix for the `has_cnn` attribute bug in `SACBuilder.Network.load()`

Relates to #341: provides built-in Warp integration so users don't need custom IVecEnv wrappers.
Details
WarpVecEnv (`rl_games/envs/warp_vecenv.py`):
- `step()` returns `done = terminated | truncated` with `time_outs` info for value bootstrapping

WarpCartPole (`rl_games/envs/warp_cartpole.py`):

WSL2 note: Warp needs `LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH` on WSL2 to find the correct CUDA driver.

Test plan
🤖 Generated with Claude Code