dyth/doublegum

DoubleGum

Code for Double Gumbel Q-Learning

[.pdf] [Reviews] [Poster (.png)] [5-min talk] [1-hour seminar] [Errata]

Data (5.4 MB): https://drive.google.com/file/d/12wyYZ92bvVdkEQIHms8mVR5zYJZue-cd/view?usp=sharing

Logs (4.21 GB): https://drive.google.com/file/d/1LpR3lrKUx-qTaCrI4YViAjc0QA5kb8P2/view?usp=sharing

Due to an accident in saving data, the logs are incomplete and do not contain data for Figs. 2 and 7.

Installation

The instructions below assume Python 3.9 with CUDA 12.2.1 and cuDNN 8.8.0.

git clone git@github.com:dyth/doublegum.git
cd doublegum

create virtualenv

virtualenv <VIRTUALENV_LOCATION>/doublegum
source <VIRTUALENV_LOCATION>/doublegum/bin/activate

or conda

conda create --name doublegum python=3.9
conda activate doublegum

install mujoco

mkdir .mujoco
cd .mujoco
wget https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz
tar -xf mujoco210-linux-x86_64.tar.gz

install packages

pip install -r requirements.txt
pip install "jax[cuda12_pip]==0.4.14" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
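Before running the full test script, it can help to confirm that JAX imports and to see which backend it picked up. The snippet below is a minimal sketch and not part of the repository: the helper name `describe_jax_backend` is hypothetical, and it relies only on `jax.__version__`, `jax.default_backend()` and `jax.devices()`.

```python
def describe_jax_backend() -> str:
    """Return a one-line summary of the JAX install (hypothetical helper)."""
    try:
        import jax  # imported lazily so a missing install is reported, not raised
    except ImportError:
        return "jax is not installed in this environment"

    devices = ", ".join(str(d) for d in jax.devices())
    return (
        f"jax {jax.__version__} | backend: {jax.default_backend()} "
        f"| devices: {devices}"
    )


if __name__ == "__main__":
    print(describe_jax_backend())
```

If the reported backend is `cpu`, the CUDA wheels were probably not picked up, and the `pip install "jax[cuda12_pip]==0.4.14"` step is worth repeating inside the active environment.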

test that the code runs

./test.sh

Continuous Control

main_cont.py --env <ENV_NAME> --policy <POLICY>

MetaWorld envs are run with --env MetaWorld_<ENVNAME>

Policies benchmarked in our paper were:

Policies we created/modified as additional benchmarks were:

  • QR-DDPG: Quantile Regression [Dabney et al., 2018] with DDPG; uses Twin Critics by default
  • QR-DDPG --ensemble 1: QR-DDPG without Twin Critics
  • SAC --ensemble 1: SAC without Twin Critics
  • XQL: XQL with Twin Critics
  • TD3 --ensemble 5 --pessimism <p>: Finer TD3, where p is an integer between 0 and 4
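The flag combinations listed above can be assembled programmatically when launching several runs. The following is a minimal sketch under stated assumptions: `build_command` and `finer_td3_sweep` are hypothetical helpers, not part of the repository; the `python` prefix is assumed; and "between 0 and 4" is read as inclusive. Only the `--env`, `--policy`, `--ensemble` and `--pessimism` flags shown in this README are used.

```python
import shlex
from typing import List, Optional


def build_command(env: str, policy: str, ensemble: Optional[int] = None,
                  pessimism: Optional[int] = None) -> List[str]:
    """Assemble a main_cont.py invocation as an argument list (hypothetical helper)."""
    cmd = ["python", "main_cont.py", "--env", env, "--policy", policy]
    if ensemble is not None:
        cmd += ["--ensemble", str(ensemble)]
    if pessimism is not None:
        cmd += ["--pessimism", str(pessimism)]
    return cmd


def finer_td3_sweep(env: str) -> List[List[str]]:
    """One command per pessimism value p in 0..4 (assumed inclusive) for Finer TD3."""
    return [build_command(env, "TD3", ensemble=5, pessimism=p) for p in range(5)]


if __name__ == "__main__":
    # "<ENV_NAME>" is a placeholder; substitute a real environment name.
    for cmd in finer_td3_sweep("<ENV_NAME>"):
        print(shlex.join(cmd))
```

Each returned list can be passed directly to `subprocess.run`, or printed with `shlex.join` and pasted into a job script.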

Policies included in this repository but not benchmarked in our paper were:

Discrete Control

main_disc.py --env <ENV_NAME> --policy <POLICY>

Policies benchmarked in our paper were:

Policies we created/modified as additional benchmarks were:

  • DuellingDDQN: Duelling Double DQN

Graphs and Tables

Graphs and tables are reproduced from the raw data in Data and Logs. Logs (4.21 GB) contains the data for Figs. 1 and 6, while Data (5.4 MB) contains the benchmark results for DoubleGum and the baselines used in all other graphs, results and tables.

Run with

python plotting/fig<x>.py
python tables/tab<x>.py
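To regenerate every figure and table in one pass, the two commands above can be wrapped in a small driver. This is a sketch, not a script shipped with the repository: `find_scripts` and `run_all` are hypothetical names, and it assumes the scripts follow the `plotting/fig<x>.py` and `tables/tab<x>.py` naming shown above and are run from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def find_scripts(root: Path) -> List[Path]:
    """Collect plotting/fig*.py and tables/tab*.py under root, sorted by name."""
    figs = sorted((root / "plotting").glob("fig*.py"))
    tabs = sorted((root / "tables").glob("tab*.py"))
    return figs + tabs


def run_all(root: Path) -> None:
    """Run each script from the repository root, stopping at the first failure."""
    for script in find_scripts(root):
        rel = script.relative_to(root)
        print(f"running {rel}")
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run([sys.executable, str(rel)], cwd=root, check=True)


if __name__ == "__main__":
    run_all(Path.cwd())
```

Because the logs are incomplete for some figures, a script whose input data is missing may exit with an error; with `check=True` the driver stops there, so dropping that argument is one way to let the remaining scripts run.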

Acknowledgements

About

NeurIPS 2023 Spotlight
