Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters

LiDAR semantic segmentation traditionally relies on fully supervised training, requiring large-scale labeled datasets that are expensive to acquire. Unlike in the image domain, self-supervised pre-training for LiDAR remains difficult due to limited data diversity and strong sensor-specific biases. Existing cross-modal transfer methods mitigate this gap by leveraging pre-trained image models, but they still require training LiDAR backbones from scratch and depend on tightly synchronized, accurately calibrated camera–LiDAR pairs.

BALViT overcomes these limitations by directly adapting frozen vision foundation models to LiDAR. We tailor only the patch embedding, decoder, and a lightweight 2D–3D adapter, enabling seamless integration of LiDAR geometry into a pre-trained ViT. Through bidirectional feature exchange between range-view and BEV representations inside the frozen backbone, BALViT preserves the amodal reasoning capabilities of vision transformers while drastically reducing the need for labeled LiDAR data.
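To make the two views mentioned above concrete, the following is a minimal sketch of how a single LiDAR point can be mapped to a range-view pixel (spherical projection) and to a BEV grid cell. It is not taken from the BALViT code: the function names, the image size, the vertical field of view (+3° to -25°, a common choice for 64-beam sensors), and the BEV extent are all assumptions for illustration, and a real pipeline would vectorize this over the whole point cloud.

```python
import math


def range_view_pixel(x, y, z, height=64, width=2048,
                     fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project one point to (row, col) of a spherical range image (assumed sizes)."""
    depth = math.sqrt(x * x + y * y + z * z)
    yaw = math.atan2(y, x)                     # azimuth in [-pi, pi]
    pitch = math.asin(z / max(depth, 1e-8))    # elevation angle
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down  # total vertical field of view
    col = math.floor(0.5 * (1.0 - yaw / math.pi) * width)
    row = math.floor((1.0 - (pitch - fov_down) / fov) * height)
    # Clamp points that fall outside the assumed field of view.
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col, depth


def bev_cell(x, y, grid_size=256, extent=50.0):
    """Map one point to (row, col) of a top-down grid covering [-extent, extent] m."""
    cell = 2.0 * extent / grid_size            # cell edge length in metres
    col = math.floor((x + extent) / cell)
    row = math.floor((y + extent) / cell)
    col = min(max(col, 0), grid_size - 1)
    row = min(max(row, 0), grid_size - 1)
    return row, col


if __name__ == "__main__":
    print(range_view_pixel(10.0, 0.0, 0.0))
    print(bev_cell(10.0, 0.0))
```

Both functions return integer indices for the same 3D point, which is the kind of correspondence a 2D-3D feature exchange between the two views can build on.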

This repository contains the PyTorch re-implementation of our IROS 2025 paper Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters.

If you find this code useful for your research, we kindly ask you to consider citing our paper:

@article{hindel2025label,
  title={Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters},
  author={Hindel, Julia and Mohan, Rohit and Bratulic, Jelena and Cattaneo, Daniele and Brox, Thomas and Valada, Abhinav},
  journal={arXiv preprint arXiv:2503.03299},
  year={2025}
}

System Requirements

Linux
Python 3.9
PyTorch 2.4
CUDA 11.7
GCC 7 or higher

IMPORTANT NOTE: These requirements are not strictly mandatory. However, we have only tested the code with the configuration above and cannot provide support for other setups.
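A quick way to compare a local setup against the list above is to print the relevant versions. The helper below is a hypothetical sketch (the name `check_environment` is not part of this repository); it only reports what it finds and does not enforce anything.

```python
import importlib.util
import platform
import sys


def check_environment(expected_python=(3, 9)):
    """Collect the versions relevant to the tested configuration."""
    report = {
        "os": platform.system(),
        "python": sys.version_info[:2],
        "python_matches": sys.version_info[:2] == expected_python,
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check also runs without PyTorch

        report["torch"] = torch.__version__
        report["cuda_build"] = torch.version.cuda        # CUDA version PyTorch was built with
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```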

Installation

1. Environment Setup

We provide a pre-configured Conda environment with all required dependencies.

conda env create -f environment.yaml
conda activate balvit

2. Build Deformable Attention Ops

The Deformable Attention module must be compiled before running the code.

cd balvit/models/vit_adapter/ops
sh make.sh
cd -

Dataset Preparation

SemanticKITTI

Download the dataset from the official website.
Organize the directory as:

$DATA_ROOT/
└── sequences/
    ├── 00/
    ├── 01/
    ├── ...
    └── 21/

Place label files under each sequence directory following the official SemanticKITTI format.
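For reference, the official SemanticKITTI format stores each scan as `velodyne/<frame>.bin` (float32 values x, y, z, remission per point) and each annotation as `labels/<frame>.label` (one uint32 per point, with the semantic class in the lower 16 bits and the instance id in the upper 16 bits). The sketch below reads both files and lists scans without a label file. It uses only the standard library, the helper names are hypothetical, and it is not the data loader used in this repository.

```python
import struct
from pathlib import Path


def read_scan(bin_path):
    """Return a list of (x, y, z, remission) tuples from a .bin scan."""
    data = Path(bin_path).read_bytes()
    return list(struct.iter_unpack("<4f", data))


def read_labels(label_path):
    """Split each uint32 label into semantic class and instance id."""
    data = Path(label_path).read_bytes()
    semantic, instance = [], []
    for (raw,) in struct.iter_unpack("<I", data):
        semantic.append(raw & 0xFFFF)  # lower 16 bits: semantic class
        instance.append(raw >> 16)     # upper 16 bits: instance id
    return semantic, instance


def missing_labels(sequence_dir):
    """List frame names that have a scan but no matching label file."""
    sequence_dir = Path(sequence_dir)
    scans = sorted(p.stem for p in (sequence_dir / "velodyne").glob("*.bin"))
    labels = {p.stem for p in (sequence_dir / "labels").glob("*.label")}
    return [name for name in scans if name not in labels]
```

Note that sequences 11 to 21 form the SemanticKITTI test split and ship without public labels, so `missing_labels` reporting every frame there is expected.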

nuScenes

Follow the official nuScenes LiDAR semantic segmentation preprocessing instructions and ensure the converted files are stored under:

$DATA_ROOT/
└── nuscenes/
    ├── lidarseg/
    └── samples/

Usage

Training

To train the model:

torchrun --nproc_per_node=$NUM_GPUS --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    $CONFIG_FILE \
    --data_root=$DATA_ROOT \
    --save_path=$SAVE_PATH

Parameters:

$NUM_GPUS: Number of GPUs to use
$CONFIG_FILE: Path to config file (e.g., config/kitti.yaml)
$DATA_ROOT: Path to dataset sequences directory
$SAVE_PATH: Directory to save checkpoints and logs

Example:

torchrun --nproc_per_node=1 --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    config/kitti.yaml \
    --data_root=/path/to/semantickitti/dataset/sequences \
    --save_path=./output

Inference

To evaluate a trained model:

torchrun --nproc_per_node=$NUM_GPUS --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    $CONFIG_FILE \
    --data_root=$DATA_ROOT \
    --save_path=$SAVE_PATH \
    --test_only \
    --checkpoint $CHECKPOINT_PATH

Parameters:

$NUM_GPUS: Number of GPUs to use
$CONFIG_FILE: Path to config file (e.g., config/kitti.yaml)
$DATA_ROOT: Path to dataset sequences directory
$SAVE_PATH: Directory to save evaluation results
$CHECKPOINT_PATH: Path to trained model checkpoint
--test_only: Flag to run evaluation only

Example:

torchrun --nproc_per_node=8 --master_addr=127.0.0.1 --master_port=30322 \
    main.py \
    config/kitti/kitti_1.yaml \
    --data_root=/path/to/semantickitti/dataset/sequences \
    --save_path=./eval_results \
    --test_only \
    --checkpoint ./checkpoints/skitti_1.pth
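When several runs need to be launched (for example, one per config file), it can be convenient to assemble the commands above in a script. The following is a small sketch of that idea. The helper `build_command` is hypothetical and not part of this repository; it only uses the flags shown in the commands above.

```python
import shlex


def build_command(config_file, data_root, save_path,
                  num_gpus=1, checkpoint=None, master_port=30322):
    """Assemble the torchrun call; passing a checkpoint switches to evaluation."""
    cmd = [
        "torchrun",
        f"--nproc_per_node={num_gpus}",
        "--master_addr=127.0.0.1",
        f"--master_port={master_port}",
        "main.py",
        config_file,
        f"--data_root={data_root}",
        f"--save_path={save_path}",
    ]
    if checkpoint is not None:
        cmd += ["--test_only", "--checkpoint", checkpoint]
    return cmd


if __name__ == "__main__":
    cmd = build_command("config/kitti.yaml", "/path/to/sequences", "./output")
    print(shlex.join(cmd))  # inspect the command before running it
    # To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.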

Pre-Trained Models

Model              mIoU (%)   Download
SemanticKITTI 1%   51.87      checkpoint
nuScenes 1%        59.30      checkpoint
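For readers unfamiliar with the metric, mIoU is the per-class intersection over union, TP / (TP + FP + FN), averaged over classes. The sketch below computes it from a confusion matrix in plain Python. It is a generic illustration, not the evaluation code of this repository; the function names and the `ignore_index` handling are assumptions, and benchmark protocols usually accumulate the matrix over the whole validation set before averaging.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=None):
    """Count matrix[t][p]: points of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled points do not contribute
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average IoU over classes that occur in the predictions or the labels."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                 # true class c, predicted otherwise
        fp = sum(row[c] for row in matrix) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    matrix = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(f"mIoU: {100.0 * mean_iou(matrix):.2f}%")
```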

Acknowledgements

We have used utility functions from other open-source projects. We especially thank the authors of:

Contacts

Rohit Mohan

License

For academic usage, the code is released under the GPLv3 license. For any commercial purpose, please contact the authors.
