robot-learning-freiburg/BALViT

Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters

LiDAR semantic segmentation traditionally relies on fully supervised training, which requires large-scale labeled datasets that are expensive to acquire. Unlike for images, self-supervised pre-training for LiDAR remains difficult due to limited data diversity and strong sensor-specific biases. Existing cross-modal transfer methods mitigate this gap by leveraging pre-trained image models, but they still train LiDAR backbones from scratch and depend on tightly synchronized, accurately calibrated camera–LiDAR pairs.

BALViT overcomes these limitations by directly adapting frozen vision foundation models to LiDAR. We tailor only the patch embedding, decoder, and a lightweight 2D–3D adapter, enabling seamless integration of LiDAR geometry into a pre-trained ViT. Through bidirectional feature exchange between range-view and BEV representations inside the frozen backbone, BALViT preserves the amodal reasoning capabilities of vision transformers while drastically reducing the need for labeled LiDAR data.
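
To make the two representations concrete, the following is a minimal sketch of how a single LiDAR point can be mapped to a range-view pixel (spherical projection) and to a BEV grid cell. It is a generic illustration, not the projection code of this repository: the function names, the image size, the field-of-view values (typical for a 64-beam sensor), and the BEV extent are all assumptions chosen for the example.

```python
import math

def range_view_pixel(x, y, z, width=2048, height=64,
                     fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map one point to (row, col, depth) in a range image.

    All default values are illustrative assumptions, not BALViT settings.
    Returns None for a point at the sensor origin.
    """
    depth = math.sqrt(x * x + y * y + z * z)
    if depth == 0.0:
        return None
    yaw = math.atan2(y, x)                               # azimuth in [-pi, pi]
    pitch = math.asin(max(-1.0, min(1.0, z / depth)))    # elevation angle
    fov_up = math.radians(fov_up_deg)
    fov = fov_up - math.radians(fov_down_deg)            # total vertical field of view
    u = 0.5 * (1.0 - yaw / math.pi)                      # normalized column in [0, 1]
    v = (fov_up - pitch) / fov                           # normalized row, 0 at the top
    col = min(width - 1, max(0, int(u * width)))
    row = min(height - 1, max(0, int(v * height)))
    return row, col, depth

def bev_cell(x, y, extent=50.0, grid=256):
    """Map one point to (row, col) in a square BEV grid covering
    [-extent, extent) metres on both axes; None if the point is outside."""
    if not (-extent <= x < extent and -extent <= y < extent):
        return None
    cell = 2.0 * extent / grid                           # cell edge length in metres
    return int((x + extent) / cell), int((y + extent) / cell)
```

Because every point has an index in both views, features can in principle be gathered from one view and scattered into the other, which is the kind of correspondence a 2D–3D adapter can build on.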

Figure: Overview of the BALViT architecture.

This repository contains the PyTorch re-implementation of our IROS 2025 paper Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters.

If you find this code useful for your research, we kindly ask you to consider citing our paper:

@article{hindel2025label,
  title={Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters},
  author={Hindel, Julia and Mohan, Rohit and Bratulic, Jelena and Cattaneo, Daniele and Brox, Thomas and Valada, Abhinav},
  journal={arXiv preprint arXiv:2503.03299},
  year={2025}
}

System Requirements

  • Linux
  • Python 3.9
  • PyTorch 2.4
  • CUDA 11.7
  • GCC 7 or higher

IMPORTANT NOTE: These requirements are not strictly mandatory. However, we have only tested the code with the settings above and cannot provide support for other setups.
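
To compare a local setup against the tested configuration, a small script along the following lines can print the relevant versions. This is a convenience sketch and not part of the repository; `environment_report` is a hypothetical helper name, and the script only reports values without enforcing any of them.

```python
import importlib
import platform
import sys

def environment_report():
    """Collect the versions that correspond to the tested setup listed above."""
    report = {
        "os": platform.system(),
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report  # PyTorch is not installed in this interpreter
    report["torch"] = torch.__version__
    report["cuda"] = torch.version.cuda  # None for CPU-only builds
    return report

if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```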

Installation

1. Environment Setup

We provide a pre-configured Conda environment with all required dependencies.

conda env create -f environment.yaml
conda activate balvit

2. Build Deformable Attention Ops

The Deformable Attention module must be compiled before running the code.

cd balvit/models/vit_adapter/ops
sh make.sh
cd -
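
After the build, it can help to confirm that the compiled extension is visible to the active Python environment. The sketch below only checks whether a module can be located; the module name passed to it is an assumption (ViT-Adapter-style builds commonly install `MultiScaleDeformableAttention`), so check the build script in the ops directory for the actual name.

```python
import importlib.util

def extension_available(module_name):
    """Return True if the named module can be found by the import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names with a missing parent or for malformed names.
        return False

if __name__ == "__main__":
    # Assumed module name; replace it with the one produced by make.sh.
    name = "MultiScaleDeformableAttention"
    print(f"{name}: {'found' if extension_available(name) else 'not found'}")
```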

Dataset Preparation

SemanticKITTI

  1. Download the dataset from the official website.
  2. Organize the directory as:
$DATA_ROOT/
└── sequences/
    ├── 00/
    ├── 01/
    ├── ...
    └── 21/
  3. Place label files under each sequence directory following the official SemanticKITTI format.
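
The layout above can be sanity-checked before training with a short script such as the one below. It is a sketch that assumes the official SemanticKITTI convention of a `velodyne/` folder in every sequence and a `labels/` folder in the publicly labeled sequences 00 to 10; `check_semantickitti` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path

def check_semantickitti(data_root, labeled=range(0, 11), all_seqs=range(0, 22)):
    """Return a list of problems found under data_root/sequences (empty if none)."""
    problems = []
    seq_root = Path(data_root) / "sequences"
    for idx in all_seqs:
        seq = seq_root / f"{idx:02d}"
        if not (seq / "velodyne").is_dir():
            problems.append(f"missing scans: {seq / 'velodyne'}")
        # Only sequences with public annotations are expected to contain labels.
        if idx in labeled and not (seq / "labels").is_dir():
            problems.append(f"missing labels: {seq / 'labels'}")
    return problems

if __name__ == "__main__":
    import sys
    for line in check_semantickitti(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(line)
```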

nuScenes

Follow the official nuScenes LiDAR semantic segmentation preprocessing instructions and ensure the converted files are stored under:

$DATA_ROOT/
└── nuscenes/
    ├── lidarseg/
    └── samples/

Usage

Training

To train the model:

torchrun --nproc_per_node=$NUM_GPUS --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    $CONFIG_FILE \
    --data_root=$DATA_ROOT \
    --save_path=$SAVE_PATH

Parameters:

  • $NUM_GPUS: Number of GPUs to use
  • $CONFIG_FILE: Path to config file (e.g., config/kitti.yaml)
  • $DATA_ROOT: Path to dataset sequences directory
  • $SAVE_PATH: Directory to save checkpoints and logs

Example:

torchrun --nproc_per_node=1 --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    config/kitti.yaml \
    --data_root=/path/to/semantickitti/dataset/sequences \
    --save_path=./output

Inference

To evaluate a trained model:

torchrun --nproc_per_node=$NUM_GPUS --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    $CONFIG_FILE \
    --data_root=$DATA_ROOT \
    --save_path=$SAVE_PATH \
    --test_only \
    --checkpoint $CHECKPOINT_PATH

Parameters:

  • $NUM_GPUS: Number of GPUs to use
  • $CONFIG_FILE: Path to config file (e.g., config/kitti.yaml)
  • $DATA_ROOT: Path to dataset sequences directory
  • $SAVE_PATH: Directory to save evaluation results
  • $CHECKPOINT_PATH: Path to trained model checkpoint
  • --test_only: Flag to run evaluation only

Example:

torchrun --nproc_per_node=8 --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    config/kitti/kitti_1.yaml \
    --data_root=/path/to/semantickitti/dataset/sequences \
    --save_path=./eval_results \
    --test_only \
    --checkpoint ./checkpoints/skitti_1.pth
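
When launching several runs from a script, the training and evaluation commands above can be assembled programmatically. The following sketch uses only the flags documented in this README; `build_torchrun_command` is a hypothetical helper, not part of the repository.

```python
def build_torchrun_command(config_file, data_root, save_path, num_gpus=1,
                           checkpoint=None, master_addr="127.0.0.1",
                           master_port=30322):
    """Build the argument list for a training run, or for an evaluation run
    when a checkpoint path is given."""
    cmd = [
        "torchrun",
        f"--nproc_per_node={num_gpus}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "main.py",
        config_file,
        f"--data_root={data_root}",
        f"--save_path={save_path}",
    ]
    if checkpoint is not None:
        # Evaluation only: mirrors the inference command shown above.
        cmd += ["--test_only", "--checkpoint", checkpoint]
    return cmd

# To execute a command, pass the list to subprocess.run(cmd, check=True)
# from the repository root.
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.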

Pre-Trained Models

Model               mIoU (%)   Download
SemanticKITTI 1%    51.87      checkpoint
nuScenes 1%         59.30      checkpoint
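
For reference, mIoU is the per-class intersection over union, TP / (TP + FP + FN), averaged over classes. The sketch below computes it from a confusion matrix in plain Python. It illustrates the standard definition only; the evaluation code in this repository may differ in details such as the handling of ignored labels.

```python
def mean_iou(confusion):
    """confusion[i][j] = number of points with ground-truth class i
    that were predicted as class j. Returns the mean IoU in [0, 1]."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                           # class c missed
        fp = sum(confusion[r][c] for r in range(n)) - tp      # wrongly predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For the two-class matrix `[[3, 1], [1, 5]]`, the per-class IoUs are 3/5 and 5/7, so the mean is about 0.657.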

Acknowledgements

We have used utility functions from other open-source projects. We especially thank the authors of:

Contacts

License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.
