RadarGPT investigates how the architectural principles of large language models can be adapted for radar perception. The goal is to develop a radar-only foundation model that learns spatial and temporal radar representations useful for downstream tasks such as object detection, motion forecasting, and scene understanding.
Unlike cameras, radar sensors remain reliable in rain, fog, and low light, yet radar data is rarely treated as a first-class input for perception and scene understanding. RadarGPT bridges this gap by learning compact latent tokens, temporal dynamics, and task-specific reasoning entirely within the radar modality.
RadarGPT is developed in three main stages, each addressing a core research challenge: radar representation learning, temporal world modeling, and downstream adaptation.
Learning high-fidelity latent radar representations.
- Objective: Train a convolutional autoencoder to reconstruct radar intensity maps with minimal information loss.
- Input / Output: 1 × 128 × 128 → 128 × 8 × 8 → 1 × 128 × 128
- Losses: MSE + SSIM + HF-MSE composite, emphasizing spatial and high-frequency fidelity.
- Outcome: A frozen radar latent space encoding spatial geometry and backscatter patterns.
Input radar → encoder 🔥 (↓×4) → latent (128×8×8) → decoder 🔥 (↑×4) → reconstruction.
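The pipeline above can be sketched in PyTorch as follows. The channel widths, loss weights, Laplacian high-pass filter for HF-MSE, and the simplified single-window SSIM are illustrative assumptions, not the exact training configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RadarAutoEncoder(nn.Module):
    """Sketch: 1x128x128 -> 128x8x8 -> 1x128x128 via four stride-2 stages.
    Channel widths are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        chs = [1, 16, 32, 64, 128]
        enc = []
        for cin, cout in zip(chs[:-1], chs[1:]):
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)   # 128 -> 64 -> 32 -> 16 -> 8
        dec = []
        for cin, cout in zip(chs[::-1][:-1], chs[::-1][1:]):
            dec += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        dec[-1] = nn.Tanh()                  # match the [-1, 1] input range
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)                  # (B, 128, 8, 8)
        return self.decoder(z), z

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM; the actual loss presumably uses a sliding window."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def composite_loss(x_hat, x, w_mse=1.0, w_ssim=0.1, w_hf=0.1):
    """MSE + SSIM + HF-MSE composite; weights and the Laplacian
    high-pass filter used for HF-MSE are assumptions."""
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                       device=x.device).view(1, 1, 3, 3)
    hf_mse = F.mse_loss(F.conv2d(x_hat, lap, padding=1),
                        F.conv2d(x, lap, padding=1))
    return (w_mse * F.mse_loss(x_hat, x)
            + w_ssim * (1.0 - global_ssim(x_hat, x))
            + w_hf * hf_mse)
```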
The AutoEncoder-Base was trained and validated on radar intensity maps derived from the ROD2021 dataset.
The official training dataset contains 40,734 radar images, which were split into:
- Training split (80%): 32,587 radar images, used for model optimization.
- Validation split (20%): 8,147 radar images, held out for validation throughout training.
The test dataset is a completely unseen set of 10,437 radar images, reserved exclusively for final evaluation.
This setup ensures that reported reconstruction metrics reflect the model's true generalization rather than overfitting to training samples.
To improve robustness and generalization, multiple spatial and intensity augmentations were applied to the training split only.
Each augmentation is applied with its corresponding probability:
| Augmentation | Description | Probability |
|---|---|---|
| Additive Noise | Adds small Gaussian noise (σ ≈ 0.02) to simulate sensor noise and measurement uncertainty. | 0.8 |
| Shift (Pad & Crop) | Random spatial shifts of ±5 px along x/y axes with edge padding, preserving overall structure. | 0.3 |
| Rotation | Small random rotations within ±10° to simulate vehicle or sensor orientation variance. | 0.3 |
| Elastic Deformation | Smooth nonlinear spatial distortions (α = 20, σ = 4) to emulate radar beam distortions. | 0.7 |
| Patch Masking | Randomly masks 1–18 patches (16×16 px) to encourage spatial inpainting ability. | 0.7 |
All radar images are globally normalized to zero mean and unit variance (using dataset-wide statistics) and clipped to the range [−1, 1].
These augmentations ensure the AutoEncoder learns robust radar representations that generalize to unseen environments and sensor noise.
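The augmentation and normalization steps above can be sketched as follows, applying each transform with the probability from the table. The SciPy-based primitives and exact parameterizations are assumptions; the actual code may use different implementations:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, rotate

def normalize(img, mean, std):
    """Dataset-wide normalization, then clip to [-1, 1]."""
    return np.clip((img - mean) / std, -1.0, 1.0)

def augment(img, rng):
    """Apply each augmentation with its table probability (128x128 input)."""
    img = img.copy()
    h, w = img.shape
    if rng.random() < 0.8:                      # additive Gaussian noise, sigma ~ 0.02
        img = img + rng.normal(0.0, 0.02, img.shape)
    if rng.random() < 0.3:                      # shift +/-5 px with edge padding
        dy, dx = rng.integers(-5, 6, size=2)
        padded = np.pad(img, 5, mode="edge")
        img = padded[5 - dy:5 - dy + h, 5 - dx:5 - dx + w]
    if rng.random() < 0.3:                      # small rotation within +/-10 degrees
        img = rotate(img, rng.uniform(-10, 10), reshape=False, mode="nearest")
    if rng.random() < 0.7:                      # elastic deformation (alpha=20, sigma=4)
        ex = gaussian_filter(rng.uniform(-1, 1, img.shape), 4) * 20
        ey = gaussian_filter(rng.uniform(-1, 1, img.shape), 4) * 20
        yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        img = map_coordinates(img, [yy + ey, xx + ex], order=1, mode="nearest")
    if rng.random() < 0.7:                      # mask 1-18 random 16x16 patches
        for _ in range(rng.integers(1, 19)):
            r, c = rng.integers(0, h - 16), rng.integers(0, w - 16)
            img[r:r + 16, c:c + 16] = 0.0
    return img
```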
| Metric | Validation | Test | Δ (Test − Validation) | Notes |
|---|---|---|---|---|
| MSE | 0.000064 | 0.000096 | +0.000031 | Very low reconstruction error |
| MAE | 0.00460 | 0.00541 | +0.00081 | Stable generalization |
| PSNR (dB) | 47.94 | 46.25 | −1.69 | High-fidelity reconstruction |
| SSIM | 0.99866 | 0.99812 | −0.00054 | Structural preservation |
| HF-MSE | 0.00749 | 0.00980 | +0.00232 | Slight smoothing at high frequencies |
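As a sanity check, the reported PSNR values appear consistent with PSNR = 10·log10(range² / MSE) evaluated over the [−1, 1] range (range = 2.0). This is an inference from the numbers in the table, not a statement of the exact evaluation code:

```python
import math

def psnr_db(mse, data_range=2.0):
    """PSNR in dB for images spanning [-1, 1] (data_range = 2.0)."""
    return 10.0 * math.log10(data_range ** 2 / mse)

print(round(psnr_db(0.000064), 2))  # ~47.96 dB, close to the reported 47.94
```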
Bridging radar latents to GPT-ready tokens.
- Motivation: GPT models require 1-D token inputs, but direct compression from radar images destroys spatial detail.
- Approach: Introduce Projection / Unprojection modules trained after freezing the AutoEncoder-Base.
Input radar → frozen encoder ❄️ → latent (128×8×8) → projection 🔥 (flatten 8192 → 1024) → token (1×1024) → unprojection 🔥 (1024 → 8192 → reshape) → z_hat (128×8×8) → frozen decoder ❄️ → reconstruction.
- Training: Optimize projection and unprojection only, supervised by both latent and image reconstructions.
- Outcome: Continuous radar-aware tokenization that preserves semantics while producing compact GPT-compatible vectors.
- Impact:
- Decouples representation learning from sequence modeling.
- Removes the need for projection heads within GPT.
- Allows the Transformer to focus purely on temporal radar dynamics.
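A minimal sketch of the projection/unprojection pair. Single linear layers are an assumption; the actual modules may be deeper:

```python
import torch
import torch.nn as nn

class LatentBridge(nn.Module):
    """Sketch: map a frozen 128x8x8 latent to one 1024-d token and back."""
    def __init__(self, latent_shape=(128, 8, 8), token_dim=1024):
        super().__init__()
        self.latent_shape = latent_shape
        n = latent_shape[0] * latent_shape[1] * latent_shape[2]  # 8192
        self.project = nn.Linear(n, token_dim)     # 8192 -> 1024 (trainable)
        self.unproject = nn.Linear(token_dim, n)   # 1024 -> 8192 (trainable)

    def forward(self, z):
        token = self.project(z.flatten(1))                         # (B, 1024)
        z_hat = self.unproject(token).view(-1, *self.latent_shape) # (B, 128, 8, 8)
        return token, z_hat
```

During training, only `project` and `unproject` receive gradients; supervision combines a latent loss on `z_hat` versus `z` with an image loss through the frozen decoder, as described above.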
The LatentBridge model successfully learns a compact projectionβunprojection mapping that preserves the original autoencoderβs reconstruction fidelity while enabling token-based radar representations.
Quantitative results show almost identical performance to the AutoEncoder-Base, confirming that the projection bottleneck introduces minimal information loss.
| Metric | Validation | Test | Δ (Test − Validation) | Notes |
|---|---|---|---|---|
| MSE | 0.000065 | 0.000096 | +0.000031 | Very low reconstruction error |
| MAE | 0.004613 | 0.005422 | +0.000809 | Stable generalization |
| PSNR (dB) | 47.97 | 46.40 | −1.56 | High-fidelity reconstructions |
| SSIM | 0.998647 | 0.998108 | −0.000539 | Structural consistency preserved |
| HF-MSE | 0.007525 | 0.009842 | +0.002317 | Slight high-frequency smoothing |
| ReconLoss_like_train | 0.003675 | 0.004940 | +0.001265 | Consistent with base AE performance |
The close alignment between AutoEncoder-Base and LatentBridge results demonstrates that the learned tokenization bridge can effectively encode and decode radar latent spaces with negligible degradation, validating its use as the interface for RadarGPT pretraining.
| Metric | Base (Test) | LatentBridge (Test) | Δ (Bridge − Base) | Comment |
|---|---|---|---|---|
| MSE | 0.000096 | 0.000096 | ≈ 0 | Identical error |
| MAE | 0.00541 | 0.00542 | +0.00001 | Negligible change |
| PSNR (dB) | 46.25 | 46.40 | +0.15 | Slight gain |
| SSIM | 0.99812 | 0.99811 | −0.00001 | Unchanged |
| HF-MSE | 0.00980 | 0.00984 | +0.00004 | Negligible difference |
The LatentBridge achieves the same reconstruction fidelity as the base autoencoder, confirming that radar latents can be safely projected into a compact token space for transformer pretraining.
Modeling temporal radar dynamics via next-token prediction.
- Input: Sequences of radar tokens from the LatentBridge.
- Objective: Predict the next latent token using a causal Transformer decoder.
- Outcome: A radar-only foundation model capable of forecasting future radar observations and encoding implicit motion priors.
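The next-token objective above can be sketched as a causal Transformer regressing continuous tokens. Depth, width, context length, and the MSE regression target are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RadarGPT(nn.Module):
    """Sketch: causal Transformer over continuous 1024-d radar tokens."""
    def __init__(self, d_token=1024, d_model=512, n_layer=6, n_head=8, max_len=32):
        super().__init__()
        self.inp = nn.Linear(d_token, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.out = nn.Linear(d_model, d_token)

    def forward(self, tokens):                        # tokens: (B, T, d_token)
        T = tokens.shape[1]
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.inp(tokens) + self.pos[:, :T]
        h = self.blocks(h, mask=causal)               # each step attends only to the past
        return self.out(h)

def next_token_loss(model, tokens):
    """Regress token t+1 from tokens <= t (continuous tokens, no quantization)."""
    pred = model(tokens[:, :-1])
    return F.mse_loss(pred, tokens[:, 1:])
```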
Adapting the pretrained radar foundation model to perception tasks.
Applications include:
- Radar object detection & tracking
- Occupancy / BEV map prediction
Demonstrates the transferability of radar-only pretraining to real-world autonomous-driving tasks.
ROD2021 β Radar Object Detection benchmark.
Used for both AutoEncoder training and evaluation.
After downloading, set the dataset path in scripts or notebooks as required.
- Modular autoencoder architecture
- Reconstruction evaluation metrics: MSE, MAE, PSNR, SSIM, HF-MSE
- Visualization tools for qualitative analysis
- Integration with the ROD2021 radar object detection dataset
- Experiment tracking (Neptune integration)
RadarGPT/
├── models/        # Model definitions (AutoEncoder, etc.)
├── utils/         # Helpers: data handling, losses, visuals, train, test
└── notebooks/     # Experimental notebooks
    └── autoencoder/
        ├── checkpoints/
        ├── results/
        ├── autoencoder_Base_training.ipynb         # Base AE training notebook
        ├── autoencoder_Base_test.ipynb             # Base AE testing notebook
        ├── autoencoder_LatentBridge_training.ipynb # LatentBridge training notebook
        └── autoencoder_LatentBridge_test.ipynb     # LatentBridge testing notebook
- Two-Step Radar Representation Learning: separates spatial encoding (AutoEncoder-Base) from tokenization (LatentBridge).
- Continuous Radar Tokenization: learns projection/unprojection that preserves radar semantics without quantization.
- Modular World Model Design: isolates representation, tokenization, and temporal modeling for stable GPT training.
- Radar-Only Foundation Model: establishes a strong pretraining paradigm independent of camera or LiDAR.
Thanks to the ROD2021 team and the open-source community for their support.