This is a PyTorch implementation of our IMCNet for unsupervised video object segmentation.
Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation.
Install deformable convolution (DCNv2). The MCM module performs feature alignment based on deformable convolution.
bash ./models/libs/make.sh
Our MCM uses features from the adjacent frames to dynamically predict the offsets of the convolution kernels' sampling positions (./models/libs/DCNv2/dcn_conv.py).
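To make the idea of offset-based sampling concrete, here is a minimal, framework-free sketch of a single-channel 3x3 deformable convolution. It is not the DCNv2 code in ./models/libs/DCNv2/dcn_conv.py: the function names `bilinear_sample` and `deform_conv2d` are hypothetical, and the offsets are supplied by hand here, whereas the MCM predicts them from the features of adjacent frames.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly sample the 2D list `feat` at fractional (y, x); zero outside."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * feat[yy][xx]
    return value


def deform_conv2d(feat, weight, offsets):
    """Single-channel 3x3 deformable convolution (stride 1, zero padding 1).

    feat:    H x W list of floats (features of the adjacent frame)
    weight:  3 x 3 list of floats (kernel weights)
    offsets: H x W x 9 list of (dy, dx) pairs, one pair per kernel tap
    """
    h, w = len(feat), len(feat[0])
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, (ky, kx) in enumerate(taps):
                dy, dx = offsets[y][x][k]
                # Each tap reads from its regular grid position plus an offset.
                acc += weight[ky + 1][kx + 1] * bilinear_sample(
                    feat, y + ky + dy, x + kx + dx
                )
            out[y][x] = acc
    return out


if __name__ == "__main__":
    # Toy example: the adjacent frame is the reference shifted right by one pixel.
    ref = [[float(10 * y + x) for x in range(5)] for y in range(4)]
    adj = [[ref[y][x - 1] if x > 0 else 0.0 for x in range(5)] for y in range(4)]
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    # An offset of (0, +1) on every tap compensates for the one-pixel shift.
    shift = [[[(0.0, 1.0)] * 9 for _ in range(5)] for _ in range(4)]
    aligned = deform_conv2d(adj, identity, shift)
    print(aligned[1][:4], ref[1][:4])
```

With all offsets set to zero the function reduces to an ordinary 3x3 convolution; non-zero offsets let each output position read from wherever the corresponding content has moved to in the adjacent frame.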
The training and testing experiments were conducted using PyTorch 1.8.1 on a single NVIDIA TITAN RTX GPU with 24 GB of memory.
- python 3.8
- pytorch 1.8.1
- torchvision 0.9.1
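A quick way to compare the local environment with the versions listed above is sketched below. The helper `check_environment` is hypothetical and not part of this repository; it assumes the packages were installed under the distribution names `torch` and `torchvision`, and it ignores local build suffixes such as `+cu111`.

```python
import sys
from importlib import metadata

# Versions listed in this README; keys are assumed pip distribution names.
EXPECTED = {"torch": "1.8.1", "torchvision": "0.9.1"}


def check_environment(expected=EXPECTED, python=(3, 8)):
    """Return {name: (installed_version_or_None, matches_expected)}."""
    current = sys.version_info[:2]
    report = {"python": (".".join(map(str, current)), current == python)}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Strip a local build tag such as "+cu111" before comparing.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        print(f"{name}: {installed} ({'ok' if ok else 'differs from README'})")
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the setup used for the reported experiments.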
Other minor Python modules can be installed by running
pip install opencv-python tqdm tensorboard
- DAVIS dataset: We use all the data in the train and validation subsets of DAVIS 2016. However, please download DAVIS 2017 (Unsupervised 480p) to match the directory layout the code expects. Download Link
- YouTube-VOS dataset: The training set of YouTube-VOS (2019 version) is used to train our IMCNet. We select a subset of 18K frames from it by sampling images that contain a single object per sequence (./dataloaders/ytvos_train.txt). We first pre-train our network for 200K iterations on this subset (see Section III.B).
- DUTS dataset: DUTS-TR, the training set of DUTS, is used to train our IMCNet with our joint training strategy (see Section II.E in our paper).
- Path configuration: Dataset paths are set in ./conf/global_settings.py.
DATASET_CONF = {
'davis2016': {
...,
db_root_dir = 'path to dataset',
...
},
'youtubevos2019': {
...,
db_root_dir = 'path to dataset',
...
},
...
}
The datasets folder should be organized as follows:
|--datasets
    |--DAVIS2017
        |--Annotations_unsupervised
            |--480p
        |--ImageSets
            |--2016
        |--JPEGImages
            |--480p
    |--YouTubeVOS
        |--2019
            |--train
                |--Annotations
                |--JPEGImages
    |--DUTS
        |--DUTS-TR
            |--DUTS-TR-Image
            |--DUTS-TR-Mask
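Before training, it can help to confirm that the dataset folders are where the code expects them. The sketch below is not part of the repository: `missing_dataset_dirs` is a hypothetical helper, and the nesting in `EXPECTED_DIRS` (including the folder name DUTS-TR-Mask) is an assumption read off the listing above.

```python
from pathlib import Path

# Assumed nesting of the folders listed in this README.
EXPECTED_DIRS = [
    "DAVIS2017/Annotations_unsupervised/480p",
    "DAVIS2017/ImageSets/2016",
    "DAVIS2017/JPEGImages/480p",
    "YouTubeVOS/2019/train/Annotations",
    "YouTubeVOS/2019/train/JPEGImages",
    "DUTS/DUTS-TR/DUTS-TR-Image",
    "DUTS/DUTS-TR/DUTS-TR-Mask",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # "./datasets" is a placeholder; use the root configured in your settings.
    for d in missing_dataset_dirs("./datasets"):
        print("missing:", d)
```

An empty result means every expected folder was found; it does not check that the folders actually contain images or annotations.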
- Download the pretrained backbone (ResNet101) from Google Drive into the ./checkpoints/pre folder.
- The training process is divided into two stages. Stage 1: we first pre-train our network for 200K iterations on a subset of YouTube-VOS. Stage 2: we fine-tune the entire network on the training set of DAVIS 2016 and DUTS with our joint training strategy.
- Stage 1:
bash ./scripts/train_s1.sh
- Stage 2:
bash ./scripts/train_s2.sh
- Run infer.py to obtain binary segmentation results.
bash ./scripts/infer_davis.sh # DAVIS 2016
bash ./scripts/infer_davis_multi.sh # DAVIS 2016 with multi-scale inference
bash ./scripts/infer_ytboj.sh # YouTube-Objects
bash ./scripts/infer_ytboj_multi.sh # YouTube-Objects with multi-scale inference
- Run CRF post-processing on the results obtained without multi-scale inference.
- The segmentation results on DAVIS 2016 val can be downloaded from Google Drive, and the results with multi-scale inference can be downloaded from Google Drive.
- The segmentation results on YouTube-Objects can be downloaded from Google Drive, and the results with multi-scale inference can be downloaded from Google Drive.
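For readers unfamiliar with multi-scale inference, the sketch below shows one common form of it: run the model on several rescaled copies of a frame, resize each foreground-probability map back to the original resolution, average them, and threshold the average to get a binary mask. This is an illustration only and may differ from what infer.py does; the scale set, the 0.5 threshold, the nearest-neighbour resizing, and the names `resize_nearest` and `multi_scale_predict` are all assumptions, and `predict` stands in for the network.

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2D list to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [
            grid[min(in_h - 1, int(y * in_h / out_h))][
                min(in_w - 1, int(x * in_w / out_w))
            ]
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def multi_scale_predict(frame, predict, scales=(0.75, 1.0, 1.25), threshold=0.5):
    """Average `predict` outputs over several input scales, then binarise.

    frame:   H x W list (a stand-in for an input image)
    predict: callable mapping an h x w list to an h x w list of
             foreground probabilities (a stand-in for the network)
    """
    h, w = len(frame), len(frame[0])
    total = [[0.0] * w for _ in range(h)]
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        prob = predict(resize_nearest(frame, sh, sw))  # prediction at scale s
        prob = resize_nearest(prob, h, w)              # back to original size
        for y in range(h):
            for x in range(w):
                total[y][x] += prob[y][x] / len(scales)
    return [[1 if p > threshold else 0 for p in row] for row in total]


if __name__ == "__main__":
    # Toy "frame" whose values already act as foreground probabilities.
    frame = [[0.9, 0.9, 0.1, 0.1] for _ in range(4)]
    mask = multi_scale_predict(frame, predict=lambda g: g, scales=(0.5, 1.0, 2.0))
    for row in mask:
        print(row)
```

Passing `scales=(1.0,)` reduces this to single-scale inference followed by thresholding.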
- Lin Xi, Weihai Chen, Xingming Wu, Zhong Liu and Zhengguo Li, "Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 9, pp. 6279-6292, Sept. 2022.
@ARTICLE{9751597,
author={Xi, Lin and Chen, Weihai and Wu, Xingming and Liu, Zhong and Li, Zhengguo},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Implicit Motion-Compensated Network for Unsupervised Video Object Segmentation},
year={2022},
volume={32},
number={9},
pages={6279-6292}
}