Towards Practical PPO: Implementation and Validation of 8 PPO Optimization Methods Based on SB3
Proximal Policy Optimization (PPO) is one of the most widely used algorithms in reinforcement learning, and Stable Baselines3 (SB3) provides an efficient baseline implementation of it. However, in practical settings there is still room for improvement in areas such as adaptation to complex environments, convergence speed, and training stability. This paper extends SB3's PPO framework with eight targeted improvement tricks: dynamic clip_range adjustment (linear scheduling or KL-divergence-based adaptive adjustment), Dual-Clip, entropy-coefficient decay, Winsorization (advantage clipping plus normalization), PopArt value-network normalization, policy regularization, Actor-Critic layer sharing (fully split / deeply shared with separate heads / half shared), and value-function clipping (clip_range_vf). Experiments in the standard environments CartPole-v1, LunarLander-v3, and MountainCarContinuous-v0 show that most of the tricks are effective: dynamic clip_range, Dual-Clip, entropy decay, deep Actor-Critic sharing, and, in specific scenarios, policy regularization significantly accelerate convergence, raise peak rewards, and improve training stability; Winsorization and PopArt yield no clear improvement but do not impair baseline performance; value-function clipping makes value estimates more reasonable. The extended framework enriches SB3's functional options, improves the algorithm's adaptability across scenarios, and provides a more flexible and efficient solution for applying reinforcement learning in practice.
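Of the tricks listed, Dual-Clip is the most self-contained to illustrate. A minimal NumPy sketch of the dual-clip surrogate follows; the function name and the default hyperparameters (`clip_eps`, `dual_clip`) are illustrative assumptions, not values or APIs from the paper or from SB3. Dual-Clip keeps the standard PPO clipped objective but, when the advantage is negative, additionally bounds the objective from below by `dual_clip * advantage`, preventing the surrogate from becoming arbitrarily negative under large probability ratios.

```python
import numpy as np

def dual_clip_ppo_loss(ratio, advantage, clip_eps=0.2, dual_clip=3.0):
    """Dual-clip PPO surrogate loss (to be minimized), per batch.

    ratio:     pi_new(a|s) / pi_old(a|s), shape (N,)
    advantage: estimated advantages, shape (N,)
    Hyperparameter defaults are illustrative, not from the paper.
    """
    surr1 = ratio * advantage
    surr2 = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    clipped = np.minimum(surr1, surr2)                 # standard PPO objective
    dual = np.maximum(clipped, dual_clip * advantage)  # lower bound when A < 0
    objective = np.where(advantage < 0.0, dual, clipped)
    return -objective.mean()  # negate: maximize objective = minimize loss
```

For example, with `ratio = 10` and `advantage = -1`, the standard clipped objective is `-10`, while the dual clip lifts it to `-3`, bounding the gradient magnitude of that sample.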
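The KL-divergence-based adaptive variant of dynamic clip_range can be sketched as a simple feedback rule applied once per update. All constants below (`kl_target`, `factor`, the bounds) are illustrative assumptions, not the schedule used in the paper or an SB3 API:

```python
def adapt_clip_range(clip_range, approx_kl, kl_target=0.01, factor=1.5,
                     min_clip=0.05, max_clip=0.3):
    """KL-feedback rule for clip_range: tighten when updates overshoot the
    KL target, loosen when they undershoot. Constants are illustrative."""
    if approx_kl > 2.0 * kl_target:    # updates too aggressive -> shrink
        clip_range = max(min_clip, clip_range / factor)
    elif approx_kl < 0.5 * kl_target:  # updates too timid -> widen
        clip_range = min(max_clip, clip_range * factor)
    return clip_range
```

In SB3 this kind of rule could be wired in via a callback that measures the approximate KL after each rollout and feeds the new value back into the clip-range schedule; the linear-scheduling alternative mentioned above is already supported natively by passing a callable `clip_range` to SB3's `PPO`.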
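Winsorization of advantages, as described above, combines percentile clipping with the usual standardization. A minimal sketch, assuming a symmetric percentile cutoff (the `pct` default is an illustrative assumption, not a value from the paper):

```python
import numpy as np

def winsorize_advantages(adv, pct=5.0, eps=1e-8):
    """Clip advantages to the [pct, 100 - pct] percentile range, then
    standardize to zero mean and unit variance. pct is illustrative."""
    lo, hi = np.percentile(adv, [pct, 100.0 - pct])
    adv = np.clip(adv, lo, hi)
    return (adv - adv.mean()) / (adv.std() + eps)
```

The clipping step caps the influence of outlier returns before normalization, which is the mechanism by which the trick is expected to stabilize updates even when, as reported above, it does not improve headline performance.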