Code for our paper PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav.
Ram Ramrakhya, Dhruv Batra, Erik Wijmans, Abhishek Das
PIRLNav is a two-stage training scheme for ObjectNav: imitation learning (IL) pretraining on human demonstrations, followed by RL finetuning. To make RL finetuning from an IL-pretrained policy work, we use a two-phase setup: a critic-only learning phase first, which gradually transitions into training both the actor and the critic.
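Below is a minimal sketch of that two-phase schedule, for illustration only. The parameter naming (a value head whose parameters contain `critic`) and the `critic_warmup_updates` knob are assumptions; the actual logic lives in this repository's PPO finetuning trainer.

```python
import torch.nn as nn

# Illustrative sketch of the two-phase RL-finetuning schedule (hypothetical
# names, not the repository's actual trainer code).
# Phase 1: the IL-initialized actor is frozen and only the (randomly
#          initialized) critic is trained.
# Phase 2: the actor is unfrozen and actor + critic are trained jointly.
def set_finetune_phase(policy: nn.Module, update_idx: int, critic_warmup_updates: int) -> None:
    critic_only = update_idx < critic_warmup_updates
    for name, param in policy.named_parameters():
        if "critic" in name:            # assumed naming of the value head
            param.requires_grad = True
        else:                            # actor / encoder parameters
            param.requires_grad = not critic_only

# Hypothetical usage inside a PPO loop:
#   for update_idx in range(num_updates):
#       set_finetune_phase(policy, update_idx, critic_warmup_updates=50)
#       ... collect rollouts and run the PPO update as usual ...
```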
Scaling laws of IL→RL for ObjectNav
Using this IL→RL training recipe, we present a rigorous empirical analysis of design choices. We study how RL-finetuning performance scales with the size of the IL pretraining dataset. We find that as we increase the size of the IL-pretraining dataset and get to high IL accuracies, the improvements from RL-finetuning are smaller, and that 90% of the performance of our best IL→RL policy can be achieved with less than half the number of IL demonstrations.
Read more in the paper.
Run the following commands to clone the repository and set up the conda environment:
git clone https://github.com/Ram81/pirlnav.git
cd pirlnav/
git submodule update --init
conda create -n pirlnav python=3.7 cmake=3.14.0
conda activate pirlnav
cd habitat-sim/
pip install -r requirements.txt
./build.sh --headless
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
cd ../habitat-lab/
pip install -r requirements.txt
pip install -e habitat-lab
pip install -e habitat-baselines
cd ../
pip install -e .
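As a quick sanity check (not part of the official setup), you can confirm that both habitat-sim and habitat-lab import from the new environment:

```python
# Verify the installation: both packages should import without errors
# from the `pirlnav` conda environment.
import habitat
import habitat_sim

print("habitat-sim and habitat-lab imported successfully")
```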
- Download the HM3D dataset using the instructions here (download the full HM3D dataset for use with Habitat).
- Move the HM3D scene dataset or create a symlink at `data/scene_datasets/hm3d`.
- Download the ObjectNav HM3D episode dataset from here.
You can use the following command to download the demonstration datasets needed to reproduce the results reported in our paper:
git clone git@hf.co:datasets/axel81/pirlnav data/datasets/objectnav
| Dataset | Scene dataset | # Demonstrations | Link | Dataset path |
|---|---|---|---|---|
| ObjectNav-HD | HM3D | 77k | objectnav_hm3d_hd | data/datasets/objectnav/objectnav_hm3d_hd/ |
| ObjectNav-SP | HM3D | 240k | objectnav_hm3d_sp | data/datasets/objectnav/objectnav_hm3d_sp/ |
| ObjectNav-FE | HM3D | 70k | objectnav_hm3d_fe | data/datasets/objectnav/objectnav_hm3d_fe/ |
The demonstration datasets released as part of this project are licensed under a Creative Commons Attribution-NonCommercial 4.0 License.
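After downloading, you can sanity-check an episode dataset with a short script like the one below. It assumes Habitat's usual episode-dataset layout (per-scene `content/*.json.gz` files under each split directory); the path shown matches the ObjectNav-HD row in the table above.

```python
import glob
import gzip
import json

# Count episodes in a downloaded demonstration dataset (assumes the standard
# Habitat layout of per-scene content/*.json.gz files under the split folder).
split_dir = "data/datasets/objectnav/objectnav_hm3d_hd/train"
scene_files = sorted(glob.glob(f"{split_dir}/content/*.json.gz"))

total_episodes = 0
for path in scene_files:
    with gzip.open(path, "rt") as f:
        total_episodes += len(json.load(f)["episodes"])

print(f"{len(scene_files)} scene files, {total_episodes} episodes")
```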
To train policies using the OVRL pretrained RGB encoder, download the model weights from here and move them to `data/visual_encoders/`.
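If you want to confirm the encoder weights file is readable before launching training, a quick check like this works; the filename below is a placeholder for whatever file you downloaded.

```python
import torch

# Load the downloaded OVRL RGB-encoder weights on CPU and list a few keys.
# "ovrl_rgb_encoder.pth" is a placeholder filename, not the actual file name.
ckpt = torch.load("data/visual_encoders/ovrl_rgb_encoder.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(list(state_dict.keys())[:5])
```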
The code expects the datasets to be placed in the `data/` folder in the following format:
├── pirlnav/
│ ├── data
│ │ ├── scene_datasets/
│ │ │ ├── hm3d/
│ │ │ │ ├── JeFG25nYj2p.glb
│ │ │ │ └── JeFG25nYj2p.navmesh
│ │ ├── datasets/
│ │ │ ├── objectnav/
│ │ │ │ ├── objectnav_hm3d/
│ │ │ │ │ ├── objectnav_hm3d_hd/
│ │ │ │ │ │ ├── train/
│ │ │ │ │ ├── objectnav_hm3d_v1/
│ │ │ │ │ │ ├── train/
│ │ │ │ │ │ ├── val/
│ │ ├── visual_encoders/

To train a behavior cloning policy on the ObjectGoal Navigation task, use the following script:

sbatch scripts/1-objectnav-il.sh <dataset_name>

where `dataset_name` can be `objectnav_hm3d_hd`, `objectnav_hm3d_sp`, or `objectnav_hm3d_fe`.
To RL-finetune the behavior-cloned policy on the ObjectGoal Navigation task, use the following script:

sbatch scripts/2-objectnav-rl-ft.sh /path/to/initial/checkpoint

To evaluate a checkpoint trained with behavior cloning, use the following command:

sbatch scripts/1-objectnav-il-eval.sh /path/to/checkpoint

To evaluate a checkpoint trained with RL finetuning, use the following command:

sbatch scripts/1-objectnav-rl-ft-eval.sh /path/to/checkpoint

We provide the best checkpoints for agents trained on the ObjectNav task with imitation learning and RL finetuning. You can use the following checkpoints to reproduce the results reported in our paper.
| Task | Checkpoint | Success Rate | SPL |
|---|---|---|---|
| 🆕ObjectNav | objectnav_il_hd.ckpt | 64.1 | 27.1 |
| 🆕ObjectNav | objectnav_rl_ft_hd.ckpt | 70.4 | 34.1 |
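Before launching evaluation, you can check that a downloaded checkpoint loads correctly. Habitat-baselines checkpoints typically bundle the policy weights with the training config, but treat the exact key names as an assumption.

```python
import torch

# Peek inside a released checkpoint before passing it to the eval script.
ckpt = torch.load("objectnav_rl_ft_hd.ckpt", map_location="cpu")
# Expected keys are typically "state_dict" and "config"; adjust if they differ.
print(ckpt.keys() if isinstance(ckpt, dict) else type(ckpt))
```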
If you use this code in your research, please consider citing:
@inproceedings{ramrakhya2023pirlnav,
title={PIRLNav: Pretraining with Imitation and RL Finetuning for ObjectNav},
author={Ram Ramrakhya and Dhruv Batra and Erik Wijmans and Abhishek Das},
booktitle={CVPR},
year={2023},
}
