APF-SAC: Artificial Potential Field Guided Soft Actor-Critic for 6-DoF Mobile Manipulator Obstacle Avoidance
An ASCF Path Planning Method for a Hybrid Robot in Coating Inspection Scenarios
This repository implements a three-stage progressive training framework that combines Artificial Potential Field (APF) guidance with the Soft Actor-Critic (SAC) algorithm for real-time obstacle-avoidance trajectory planning of a 6-DoF mobile manipulator. At deployment, a Control Barrier Function Quadratic Program (CBF-QP) acts as a safety filter to guarantee collision-free operation at 100 Hz.
- Algorithm: Stable Baselines3 SAC with automatic entropy adjustment
- Guidance Mechanism: Distance-dynamic APF guidance (adaptive potential field)
- Curriculum Learning: Three-stage progressive training strategy
- First 30%: Fixed scenarios (static obstacle positions)
- Middle 30%: Randomized scenarios (random obstacle initialization)
- Final 40%: Complex scenarios (dual dynamic obstacles)
- State Space: 29-dimensional (6 joint positions + APF gradient information + end-effector pose)
- Action Space: 6-dimensional continuous (joint velocity increments Δq̇)
- Safety Filter: Full multi-point CBF-QP constraint optimization running at 100Hz
- Protected Points: End-effector + Elbow + Wrist (3 critical collision points)
- Solver: CVXOPT QP optimizer with warm-starting
- Real-time Performance: Average solving time < 5ms per control loop
- Safety Radius: 0.2m (configurable padding distance)
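The distance-dynamic APF guidance can be pictured as a gain λ that ramps up toward the upper bound λ_max (the `OMEGA` parameter) as the nearest obstacle approaches, pulling the policy action along the negative potential gradient. A minimal pure-Python sketch of the idea (`d_influence` and the linear ramp are illustrative assumptions, not the repo's exact schedule):

```python
def apf_guidance_gain(d_min, omega=4.0, d_influence=0.5):
    """Distance-dynamic guidance gain: 0 outside the influence zone,
    ramping linearly toward omega (lambda_max) as d_min shrinks to 0."""
    if d_min >= d_influence:
        return 0.0
    return omega * (1.0 - d_min / d_influence)

def blend_action(a_sac, apf_gradient, d_min, omega=4.0):
    """Blend the SAC policy action with the negative APF gradient,
    weighted by the distance-dependent gain."""
    lam = apf_guidance_gain(d_min, omega)
    return [a - lam * g for a, g in zip(a_sac, apf_gradient)]
```

Far from obstacles the policy acts alone (λ = 0); close to an obstacle the repulsive term dominates up to λ_max = 4.0.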
├── APF_SAC_train.py # Main training script with curriculum learning
├── env.py # Training environment (PyBullet + APF)
├── env_test.py # Testing environment with CBF constraints
├── reward.py # Reward function shaping (collision + goal + smoothness)
├── test_apf_sac.py # Basic testing script (SAC only)
├── test_apf_sac_cbf.py # Safety-critical testing (SAC + CBF-QP filter)
├── deploy_filter.py # CBF safety filter wrapper
├── cbf_qp.py # CBF constraint formulation and QP solver
└── utils/ # Helper functions (optional)
    └── config.py # Hyperparameter configuration

Before running training or testing, modify the URDF and model paths in env.py and env_test.py to your absolute local paths:

# In env.py / env_test.py, change:
self.urdf_path = "/absolute/path/to/your/openarm_v10.urdf"
# Create conda environment (recommended)
conda create -n apf_sac python=3.8
conda activate apf_sac
# Install core dependencies
pip install stable-baselines3==2.0.0
pip install pybullet==3.2.5
pip install gymnasium==0.28.1
pip install numpy scipy matplotlib
pip install loguru # Logging utility
# Install CBF-QP solver (critical for deployment)
pip install cvxopt==1.3.0

Basic training with default parameters:
python APF_SAC_train.py \
--mode train \
--episodes 6000 \
--save_dir ./models \
--log_dir ./logs

Custom hyperparameters:
python APF_SAC_train.py \
--mode train \
--episodes 10000 \
--omega 4.0 \
--safety_radius 0.25 \
--curriculum True \
--save_dir ./checkpoints

Standard testing (SAC policy only):
python test_apf_sac.py \
--model_path ./models/sac_final.zip \
--episodes 100 \
--render True

Safety-critical testing (SAC + CBF-QP filter):
python test_apf_sac_cbf.py \
--model_path ./models/sac_final.zip \
--cbf_radius 0.2 \
--kappa1 1.25 \
--kappa2 1.25 \
--test_episodes 1000 \
--mode complex # complex, random, or fixed

| Parameter | Default Value | Description |
|---|---|---|
| `CRITIC_LR` | 3e-4 | Critic network learning rate |
| `ACTOR_LR` | 3e-4 | Actor network learning rate |
| `BUFFER_SIZE` | 200000 | Experience replay buffer size |
| `BATCH_SIZE` | 256 | Mini-batch size for training |
| `ENT_COEF` | "auto_0.125" | Automatic entropy coefficient tuning |
| `OMEGA` | 4.0 | APF guidance gain upper bound (λ_max) |
| `GAMMA` | 0.99 | Discount factor for reward |
| `TAU` | 0.005 | Soft update coefficient for target networks |
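For orientation, the table's values map onto Stable Baselines3 keyword arguments roughly as follows. Note that SB3's `SAC` exposes a single `learning_rate` for both networks; separate actor/critic rates as listed above would require a custom policy setup. A hedged sketch:

```python
# Hyperparameters from the table, collected as SB3 SAC keyword arguments.
# SB3's SAC uses one learning_rate for actor and critic; the table's
# CRITIC_LR / ACTOR_LR happen to coincide at 3e-4, so this is equivalent.
SAC_KWARGS = dict(
    learning_rate=3e-4,
    buffer_size=200_000,
    batch_size=256,
    ent_coef="auto_0.125",   # automatic entropy tuning, initial coef 0.125
    gamma=0.99,
    tau=0.005,
)
# model = SAC("MlpPolicy", env, **SAC_KWARGS)   # env built from env.py
```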
| Parameter | Default Value | Description |
|---|---|---|
| `safety_radius` | 0.2 | Safety padding radius (m) |
| `kappa1` | 1.25 | CBF class-$\mathcal{K}$ function parameter 1 |
| `kappa2` | 1.25 | CBF class-$\mathcal{K}$ function parameter 2 |
| `control_bounds` | [-1.0, 1.0] | Joint velocity increment limits (rad/s) |
| `protected_points` | ['ee', 'elbow', 'wrist'] | Critical collision check points |
- Training Convergence: ~5000 episodes for complex scenarios
- Inference Frequency: 100Hz (10ms control loop)
- CBF-QP Solving Time: <5ms average (cvxopt with warm-start)
- Success Rate: >95% in cluttered environments (obstacle density >0.3/m³)
If you use this code in your research, please cite the original paper:
@article{apf_sac_manipulator,
title={An ASCF Path Planning Method for a Hybrid Robot in Coating Inspection Scenarios},
author={Junhao Hu},
journal={Control and Decision (CCDC)},
year={2025},
publisher={Chinese Association of Automation}
}

This project is licensed under the MIT License - see the LICENSE file for details.
- Stable Baselines3 for the SAC implementation
- PyBullet for physics simulation
- CVXOPT for real-time QP optimization
- The CBF formulation follows the control barrier function theory from [Ames et al., 2019]