A reinforcement learning project that trains a simulated robotic arm to reach random target positions using Proximal Policy Optimization (PPO).
This project uses a custom Gymnasium environment built on MuJoCo to simulate a KUKA iiwa robotic arm. The agent learns to control the arm's joints to reach randomly placed 3D target points, trained entirely from scratch using PPO from Stable-Baselines3.
The trained agent controls a 7-DOF robotic arm in a MuJoCo physics simulation, learning to minimize the distance between the end-effector and a randomly placed target.
RL---Robotics/
├── arm_reach_env.py # Custom Gymnasium environment (MuJoCo simulation)
├── train.py # PPO training script
├── evaluate.py # Evaluate a trained model
├── requirements.txt # Python dependencies
├── models/ # Saved model checkpoints
│ └── ppo_arm_reach.zip # Pre-trained model
└── logs/ # Training logs (TensorBoard)
`ArmReachEnv` is a custom `gym.Env` wrapping MuJoCo:
| Property | Value |
|---|---|
| Robot | KUKA iiwa (7 DOF) |
| Action Space | Continuous joint position targets, normalized to [-1, 1] |
| Observation Space | Joint positions + velocities + target position + end-effector position + distance |
| Episode Length | Max 500 steps |
| Success Condition | End-effector within 5 cm of target |
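
The action and observation rows above can be illustrated with a small NumPy sketch. It is not taken from `arm_reach_env.py`: the names `scale_action` and `build_observation` are hypothetical, and the joint limits are placeholder values that should be read from the actual robot model.

```python
import numpy as np

# Assumed joint limits in radians for a 7-DOF arm (placeholder values;
# read the real limits from the robot model instead of hard-coding them).
JOINT_LOW = np.array([-2.97, -2.09, -2.97, -2.09, -2.97, -2.09, -3.05])
JOINT_HIGH = -JOINT_LOW


def scale_action(action, low=JOINT_LOW, high=JOINT_HIGH):
    """Map a normalized action in [-1, 1] to joint position targets."""
    action = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    return low + 0.5 * (action + 1.0) * (high - low)


def build_observation(joint_pos, joint_vel, target_pos, ee_pos):
    """Concatenate the observation components listed in the table."""
    distance = np.linalg.norm(np.asarray(ee_pos) - np.asarray(target_pos))
    return np.concatenate(
        [joint_pos, joint_vel, target_pos, ee_pos, [distance]]
    ).astype(np.float32)
```

Under these assumptions the observation has 7 + 7 + 3 + 3 + 1 = 21 entries, and an action of all zeros maps to the midpoint of each joint range.
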
Reward shaping:
- Distance penalty: `-distance` each step
- Progress bonus: reward for getting closer to the target
- Success bonus: `+100` when within 5 cm
- Time penalty: `-0.01` per step to encourage efficiency
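
The four reward terms could be combined per step roughly as follows. This is a minimal sketch, not the code in `arm_reach_env.py`: the function name `compute_reward` is hypothetical, and `PROGRESS_SCALE` is an assumed weight, since no value for the progress bonus is stated here.

```python
import math

SUCCESS_RADIUS = 0.05   # 5 cm, from the success condition
SUCCESS_BONUS = 100.0   # from the reward list
TIME_PENALTY = 0.01     # from the reward list
PROGRESS_SCALE = 10.0   # assumed weight; not specified in this README


def compute_reward(ee_pos, target_pos, prev_distance):
    """Return (reward, distance, success) for one simulation step."""
    distance = math.dist(ee_pos, target_pos)

    reward = -distance                                      # distance penalty
    reward += PROGRESS_SCALE * (prev_distance - distance)   # progress bonus
    reward -= TIME_PENALTY                                  # time penalty

    success = distance < SUCCESS_RADIUS
    if success:
        reward += SUCCESS_BONUS                             # success bonus
    return reward, distance, success
```

The caller is expected to pass the returned `distance` back in as `prev_distance` on the next step, so the progress term is positive only when the end-effector actually moved closer.
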
- Python 3.8+
- (Optional) CUDA GPU for faster training
git clone https://github.com/vihaan-glitch/RL---Robotics.git
cd RL---Robotics
pip install -r requirements.txt

Start training:

python train.py

Training runs for 500,000 timesteps. Checkpoints are saved to `models/` every 10,000 steps, and the best model is saved to `models/best_model/`.
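
For orientation, a training setup with this schedule might look like the sketch below. It is an assumption-laden outline rather than the contents of `train.py`: it assumes `ArmReachEnv` takes no constructor arguments, uses a single environment, and the `vecnormalize.pkl` filename is hypothetical.

```python
TOTAL_TIMESTEPS = 500_000
CHECKPOINT_FREQ = 10_000


def train():
    # Imports live inside the function so the constants above can be
    # inspected without the RL stack installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback
    from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

    from arm_reach_env import ArmReachEnv  # assumed: no constructor arguments

    # Training env normalizes observations and rewards.
    env = VecNormalize(DummyVecEnv([lambda: ArmReachEnv()]),
                       norm_obs=True, norm_reward=True)
    # Eval env reuses observation statistics but reports raw rewards.
    eval_env = VecNormalize(DummyVecEnv([lambda: ArmReachEnv()]),
                            training=False, norm_obs=True, norm_reward=False)

    callbacks = [
        CheckpointCallback(save_freq=CHECKPOINT_FREQ, save_path="models/",
                           name_prefix="ppo_arm_reach"),
        EvalCallback(eval_env, best_model_save_path="models/best_model/",
                     eval_freq=CHECKPOINT_FREQ),
    ]

    model = PPO("MlpPolicy", env, ent_coef=0.01,
                tensorboard_log="logs/tensorboard/", verbose=1)
    model.learn(total_timesteps=TOTAL_TIMESTEPS, callback=callbacks)

    model.save("models/ppo_arm_reach")
    env.save("models/vecnormalize.pkl")  # hypothetical filename
```

Calling `train()` starts the run. With one environment, `save_freq=10_000` corresponds to a checkpoint every 10,000 timesteps, which gives 50 checkpoints over the full run.
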
Optional: Monitor training with TensorBoard:
pip install tensorboard
tensorboard --logdir logs/tensorboard/

Evaluate the trained model:

python evaluate.py

This loads the trained model and runs it in the GUI-rendered simulation so you can watch the arm move.
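
Beyond watching the arm, an evaluation run can also report a success rate. The helper below is a sketch, not part of `evaluate.py`: `run_episodes` is a hypothetical name, and it assumes the Gymnasium step API with `terminated=True` raised only when the target is reached.

```python
def run_episodes(env, policy, n_episodes=10):
    """Roll out `policy` and return (success_rate, mean_episode_length).

    Assumes `terminated` signals success and `truncated` signals the
    step limit; this is an assumption about ArmReachEnv.
    """
    successes = 0
    lengths = []
    for _ in range(n_episodes):
        obs, _info = env.reset()
        steps = 0
        done = False
        while not done:
            action = policy(obs)
            obs, _reward, terminated, truncated, _info = env.step(action)
            steps += 1
            done = terminated or truncated
        successes += int(terminated)
        lengths.append(steps)
    return successes / n_episodes, sum(lengths) / len(lengths)
```

With a Stable-Baselines3 model, `policy` could be `lambda obs: model.predict(obs, deterministic=True)[0]`. If the model was trained with `VecNormalize`, observations would also need the saved normalization statistics applied before prediction.
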
- Stable-Baselines3 — PPO implementation
- MuJoCo — Physics simulation
- Gymnasium — RL environment interface
- NumPy
- Observation and reward normalization is applied via `VecNormalize` for training stability
- Entropy coefficient (`ent_coef=0.01`) encourages exploration early in training
- The `__pycache__` folder can be safely added to `.gitignore`
- MuJoCo requires a valid installation; see the MuJoCo installation guide
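
To make the normalization note concrete, the idea behind `VecNormalize` can be sketched as a running mean and variance that is updated batch by batch and then used to standardize and clip inputs. The class below is a simplified illustration with a hypothetical name (`RunningNormalizer`), not the Stable-Baselines3 implementation.

```python
import numpy as np


class RunningNormalizer:
    """Simplified running mean/variance normalizer (illustrative only)."""

    def __init__(self, shape, clip=10.0, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 1e-4  # tiny prior count avoids division by zero
        self.clip = clip
        self.eps = eps

    def update(self, batch):
        """Merge a batch of samples (rows) into the running statistics."""
        batch = np.atleast_2d(np.asarray(batch, dtype=np.float64))
        b_mean = batch.mean(axis=0)
        b_var = batch.var(axis=0)
        b_count = batch.shape[0]

        delta = b_mean - self.mean
        total = self.count + b_count
        # Parallel-variance merge of the old and new statistics.
        m2 = (self.var * self.count + b_var * b_count
              + delta ** 2 * self.count * b_count / total)
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        """Standardize `x` and clip it to [-clip, clip]."""
        z = (np.asarray(x) - self.mean) / np.sqrt(self.var + self.eps)
        return np.clip(z, -self.clip, self.clip)
```

Keeping inputs near zero mean and unit variance tends to make policy-gradient updates less sensitive to the raw scale of joint angles, velocities, and distances.
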
- Add obstacle avoidance
- Train on more complex manipulation tasks (grasp, push, stack)
- Deploy to a real robot using ROS
- Try SAC or TD3 for comparison
MIT