RL Robotics — Robotic Arm Reach with PPO

A reinforcement learning project that trains a simulated robotic arm to reach random target positions using Proximal Policy Optimization (PPO).

Overview

This project uses a custom Gymnasium environment built on MuJoCo to simulate a KUKA iiwa robotic arm. The agent learns to control the arm's joints to reach randomly placed 3D target points, trained entirely from scratch using PPO from Stable-Baselines3.

Demo

The trained agent controls a 7-DOF robotic arm in a MuJoCo physics simulation, learning to minimize the distance between the end-effector and a randomly placed target.

Project Structure

RL---Robotics/
├── arm_reach_env.py       # Custom Gymnasium environment (MuJoCo simulation)
├── train.py               # PPO training script
├── evaluate.py            # Evaluate a trained model
├── requirements.txt       # Python dependencies
├── models/                # Saved model checkpoints
│   └── ppo_arm_reach.zip  # Pre-trained model
└── logs/                  # Training logs (TensorBoard)

Environment Details

ArmReachEnv is a custom gym.Env that wraps the MuJoCo simulation:

Property	Value
Robot	KUKA iiwa (7 DOF)
Action Space	Continuous joint position targets, normalized to `[-1, 1]`
Observation Space	Joint positions, joint velocities, target position, end-effector position, and end-effector-to-target distance
Episode Length	Maximum of 500 steps
Success Condition	End-effector within 5 cm of the target
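The observation and action spaces in the table can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the code in arm_reach_env.py: the helper names build_observation and scale_action, the concatenation order, and the joint limits are all hypothetical. With 7 joints, the observation has 7 + 7 + 3 + 3 + 1 = 21 entries.

```python
import numpy as np

N_JOINTS = 7  # KUKA iiwa: 7 revolute joints

def build_observation(qpos, qvel, target_pos, ee_pos):
    """Concatenate the quantities listed in the table into one flat vector.

    Hypothetical helper; the ordering is an assumption and the real
    environment may arrange (or scale) these values differently.
    """
    qpos = np.asarray(qpos, dtype=np.float64)
    qvel = np.asarray(qvel, dtype=np.float64)
    target_pos = np.asarray(target_pos, dtype=np.float64)
    ee_pos = np.asarray(ee_pos, dtype=np.float64)
    distance = np.linalg.norm(ee_pos - target_pos)  # Euclidean end-effector-to-target distance
    return np.concatenate([qpos, qvel, target_pos, ee_pos, [distance]]).astype(np.float32)

def scale_action(action, joint_low, joint_high):
    """Map a normalized action in [-1, 1] to joint position targets.

    Out-of-range values are clipped first, then mapped linearly so that
    -1 -> joint_low and +1 -> joint_high.
    """
    action = np.clip(np.asarray(action, dtype=np.float64), -1.0, 1.0)
    joint_low = np.asarray(joint_low, dtype=np.float64)
    joint_high = np.asarray(joint_high, dtype=np.float64)
    return joint_low + 0.5 * (action + 1.0) * (joint_high - joint_low)

# Hypothetical symmetric joint limits in radians, for illustration only.
JOINT_LOW = -np.full(N_JOINTS, 2.0)
JOINT_HIGH = np.full(N_JOINTS, 2.0)

obs = build_observation(np.zeros(7), np.zeros(7), [0.5, 0.0, 0.5], [0.5, 0.0, 0.2])
targets = scale_action(np.array([-1.0, 0.0, 1.0, 0.5, -0.5, 2.0, -2.0]), JOINT_LOW, JOINT_HIGH)
print(obs.shape)  # (21,)
```

Keeping the action normalized to [-1, 1] and rescaling inside the environment lets PPO's Gaussian policy work in a fixed, symmetric range regardless of each joint's actual limits.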

Reward shaping:

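Distance penalty: the negative end-effector-to-target distance, applied every step
Progress bonus: positive reward for moving closer to the target
Success bonus: +100 when the end-effector is within 5 cm of the target
Time penalty: -0.01 per step, to encourage reaching the target quickly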
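The four reward terms can be combined into a single per-step function. This is a sketch, not the project's implementation: distances are assumed to be in metres, and because the README does not say how the progress bonus is computed, it is assumed here to be a hypothetical weight PROGRESS_SCALE times the decrease in distance since the previous step.

```python
SUCCESS_THRESHOLD = 0.05  # 5 cm, from the README (assuming metres)
SUCCESS_BONUS = 100.0
TIME_PENALTY = 0.01
PROGRESS_SCALE = 10.0     # hypothetical weight; not given in the README

def compute_reward(distance, prev_distance):
    """Return (reward, success) for one step of the reach task.

    Assumption: the progress bonus is proportional to how much the
    end-effector-to-target distance shrank since the previous step
    (it becomes a penalty if the arm moves away).
    """
    reward = -distance                                     # distance penalty
    reward += PROGRESS_SCALE * (prev_distance - distance)  # progress bonus
    reward -= TIME_PENALTY                                 # time penalty
    success = bool(distance < SUCCESS_THRESHOLD)
    if success:
        reward += SUCCESS_BONUS                            # success bonus
    return reward, success

print(compute_reward(0.20, 0.25))  # closer, but not yet within 5 cm
print(compute_reward(0.04, 0.06))  # inside the success radius
```

The large success bonus dominates once the target is reached, while the dense distance and progress terms give PPO a useful learning signal long before the first success occurs.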

Getting Started

Prerequisites

Python 3.8+
(Optional) CUDA GPU for faster training

Installation

git clone https://github.com/vihaan-glitch/RL---Robotics.git
cd RL---Robotics
pip install -r requirements.txt

Train

python train.py

Training runs for 500,000 timesteps. Checkpoints are saved to models/ every 10,000 steps, and the best model is saved to models/best_model/.
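The checkpoint interval is measured in environment timesteps. If train.py uses Stable-Baselines3's CheckpointCallback (an assumption; the README does not name the callback), note that its save_freq counts calls to the vectorized environment's step(), and each call advances n_envs timesteps, so the interval must be divided by the number of parallel environments. The same applies to EvalCallback's eval_freq, if that is how models/best_model/ is produced (also an assumption). The hypothetical helper below sketches the arithmetic.

```python
def checkpoint_schedule(total_timesteps, save_every, n_envs=1):
    """Compute a per-callback-call save_freq and the expected checkpoint count.

    Hypothetical helper. A callback's save_freq counts vectorized env.step()
    calls, and each call advances n_envs timesteps, so dividing by n_envs
    keeps checkpoints roughly save_every timesteps apart.
    Returns (save_freq, expected_number_of_checkpoints).
    """
    if total_timesteps <= 0 or save_every <= 0 or n_envs <= 0:
        raise ValueError("all arguments must be positive")
    save_freq = max(save_every // n_envs, 1)
    timesteps_between_saves = save_freq * n_envs
    n_checkpoints = total_timesteps // timesteps_between_saves
    return save_freq, n_checkpoints

print(checkpoint_schedule(500_000, 10_000))            # single environment
print(checkpoint_schedule(500_000, 10_000, n_envs=4))  # four parallel environments
```

With one environment this gives a save_freq of 10,000 and 50 checkpoints over the 500,000-timestep run; with four environments, save_freq drops to 2,500 while the number of checkpoints stays at 50.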

Optional: Monitor training with TensorBoard:

pip install tensorboard
tensorboard --logdir logs/tensorboard/

Evaluate

python evaluate.py

This loads the trained model and runs it with on-screen rendering so you can watch the arm move.
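An evaluation run boils down to a rollout loop over the Gymnasium reset/step API. The sketch below is not the contents of evaluate.py: ToyReachEnv and greedy_policy are hypothetical stand-ins so that the loop runs without MuJoCo, and the info["is_success"] key is an assumption. With a trained Stable-Baselines3 model, the policy would typically be `lambda obs: model.predict(obs, deterministic=True)[0]`.

```python
import numpy as np

def evaluate(env, policy, n_episodes=10):
    """Run full episodes and return (success_rate, mean_return).

    Uses the Gymnasium API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    Success is read from a hypothetical info["is_success"] flag.
    """
    successes, returns = 0, []
    for episode in range(n_episodes):
        obs, info = env.reset(seed=episode)
        episode_return, done = 0.0, False
        while not done:
            action = policy(obs)
            obs, reward, terminated, truncated, info = env.step(action)
            episode_return += reward
            done = terminated or truncated
        successes += int(info.get("is_success", False))
        returns.append(episode_return)
    return successes / n_episodes, float(np.mean(returns))


class ToyReachEnv:
    """Hypothetical stand-in for ArmReachEnv: a 3D point moved directly by actions."""

    def __init__(self, max_steps=500, step_size=0.02, threshold=0.05):
        self.max_steps = max_steps
        self.step_size = step_size
        self.threshold = threshold

    def reset(self, *, seed=None, options=None):
        rng = np.random.default_rng(seed)
        self.ee = np.zeros(3)
        self.target = rng.uniform(-0.5, 0.5, size=3)
        self.t = 0
        return self._obs(), {}

    def _obs(self):
        return np.concatenate([self.ee, self.target])

    def step(self, action):
        self.ee = self.ee + self.step_size * np.clip(action, -1.0, 1.0)
        self.t += 1
        dist = np.linalg.norm(self.ee - self.target)
        success = dist < self.threshold          # terminate on success
        truncated = self.t >= self.max_steps     # time limit, like the 500-step cap
        return self._obs(), -dist, success, truncated, {"is_success": success}


def greedy_policy(obs):
    """Move straight toward the target (unit direction vector)."""
    ee, target = obs[:3], obs[3:]
    direction = target - ee
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else np.zeros(3)

success_rate, mean_return = evaluate(ToyReachEnv(), greedy_policy, n_episodes=5)
print(success_rate, mean_return)
```

Separating terminated (target reached) from truncated (step limit hit) follows the Gymnasium convention and keeps the success count from being confused with episodes that simply ran out of time.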

Dependencies

Stable-Baselines3 — PPO implementation
MuJoCo — Physics simulation
Gymnasium — RL environment interface
NumPy

Training Notes

Observation and reward normalization is applied via VecNormalize for training stability
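The entropy coefficient (ent_coef=0.01) adds an entropy bonus to the PPO objective, encouraging exploration and discouraging premature convergence to a near-deterministic policy
The __pycache__ folder can be safely added to .gitignore
MuJoCo must be installed correctly; see the MuJoCo installation guide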
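VecNormalize standardizes observations using running statistics. The class below is a simplified re-implementation of that idea for illustration, modeled on the running mean/variance approach Stable-Baselines3 uses but not copied from it; the names RunningMeanStd and normalize_obs are local to this sketch, and clip=10.0 mirrors VecNormalize's default observation clip.

```python
import numpy as np

class RunningMeanStd:
    """Running mean/variance, merged batch by batch with the parallel update rule."""

    def __init__(self, shape, epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # small prior count avoids division by zero

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        # Combine the stored statistics with the new batch's statistics.
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta**2 * self.count * batch_count / total)
        self.mean, self.var, self.count = new_mean, m2 / total, total


def normalize_obs(obs, rms, clip=10.0, eps=1e-8):
    """Standardize observations with running stats, then clip to [-clip, clip]."""
    return np.clip((obs - rms.mean) / np.sqrt(rms.var + eps), -clip, clip)

rng = np.random.default_rng(0)
data = rng.normal(3.0, 2.0, size=(10_000, 4))
rms = RunningMeanStd(4)
for chunk in np.array_split(data, 10):  # statistics arrive in batches, as during rollouts
    rms.update(chunk)
normalized = normalize_obs(data, rms)
```

If train.py and evaluate.py follow the usual Stable-Baselines3 pattern, the normalization statistics must be saved after training (VecNormalize.save) and reloaded for evaluation (VecNormalize.load), with training = False and norm_reward = False; otherwise the policy sees observations on a different scale than it was trained on.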

Future Ideas

Add obstacle avoidance
Train on more complex manipulation tasks (grasp, push, stack)
Deploy to a real robot using ROS
Try SAC or TD3 for comparison

License

MIT
