RadarGPT investigates how the architectural principles of large language models can be adapted for radar perception. The goal is to develop a radar-only foundation model that learns spatial and temporal radar representations useful for downstream tasks such as object detection, motion forecasting, and scene understanding.
Unlike cameras, radar sensors remain reliable in rain, fog, and low light, yet radar data is rarely treated as a first-class input for perception and scene understanding. RadarGPT bridges this gap by learning compact latent tokens, temporal dynamics, and task-specific reasoning entirely within the radar modality.
RadarGPT is developed in three main stages, each addressing a core research challenge: radar representation learning, temporal world modeling, and downstream adaptation.
Learning high-fidelity latent radar representations.
- Objective: Train a convolutional autoencoder to reconstruct radar intensity maps with minimal information loss.
- Input / Output: 1 × 128 × 128 → 128 × 8 × 8 → 1 × 128 × 128
- Losses: MSE + SSIM + HF-MSE composite, emphasizing spatial and high-frequency fidelity.
- Outcome: A frozen radar latent space encoding spatial geometry and backscatter patterns.
Input radar → encoder 🔥 (↓×4) → latent (128×8×8) → decoder 🔥 (↑×4) → reconstruction.
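The pipeline above can be sketched in PyTorch as follows. The channel widths, loss weights, Laplacian high-pass filter for HF-MSE, and the simplified single-window SSIM are illustrative assumptions, not the exact training configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RadarAutoEncoder(nn.Module):
    """Sketch: 1x128x128 -> 128x8x8 -> 1x128x128 via four stride-2 stages.
    Channel widths are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        chs = [1, 16, 32, 64, 128]
        enc = []
        for cin, cout in zip(chs[:-1], chs[1:]):
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        self.encoder = nn.Sequential(*enc)   # 128 -> 64 -> 32 -> 16 -> 8
        dec = []
        for cin, cout in zip(chs[::-1][:-1], chs[::-1][1:]):
            dec += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU()]
        dec[-1] = nn.Tanh()                  # match the [-1, 1] input range
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)                  # (B, 128, 8, 8)
        return self.decoder(z), z

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM; the actual loss presumably uses a sliding window."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def composite_loss(x_hat, x, w_mse=1.0, w_ssim=0.1, w_hf=0.1):
    """MSE + SSIM + HF-MSE composite; weights and the Laplacian
    high-pass filter used for HF-MSE are assumptions."""
    lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                       device=x.device).view(1, 1, 3, 3)
    hf_mse = F.mse_loss(F.conv2d(x_hat, lap, padding=1),
                        F.conv2d(x, lap, padding=1))
    return (w_mse * F.mse_loss(x_hat, x)
            + w_ssim * (1.0 - global_ssim(x_hat, x))
            + w_hf * hf_mse)
```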
The AutoEncoder-Base was trained and validated on radar intensity maps derived from the ROD2021 dataset.
The official training dataset contains 40,734 radar images, which were split into:
- Training split (80%): 32,587 radar images, used for model optimization.
- Validation split (20%): 8,147 radar images, held out for validation throughout training.
The test dataset is a completely unseen set of 10,437 radar images, reserved exclusively for final evaluation.
This setup ensures that reported reconstruction metrics reflect the model's true generalization rather than overfitting to training samples.
To improve robustness and generalization, multiple spatial and intensity augmentations were applied to the training split only.
Each augmentation is applied with its corresponding probability:
| Augmentation | Description | Probability |
|---|---|---|
| Additive Noise | Adds small Gaussian noise (σ ≈ 0.02) to simulate sensor noise and measurement uncertainty. | 0.8 |
| Shift (Pad & Crop) | Random spatial shifts of ±5 px along x/y axes with edge padding, preserving overall structure. | 0.3 |
| Rotation | Small random rotations within ±10° to simulate vehicle or sensor orientation variance. | 0.3 |
| Elastic Deformation | Smooth nonlinear spatial distortions (α = 20, σ = 4) to emulate radar beam distortions. | 0.7 |
| Patch Masking | Randomly masks 1–18 patches (16×16 px) to encourage spatial inpainting ability. | 0.7 |
All radar images are globally normalized to zero mean and unit variance (using dataset-wide statistics) and clipped to the range [−1, 1].
These augmentations ensure the AutoEncoder learns robust radar representations that generalize to unseen environments and sensor noise.
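The augmentation and normalization steps above can be sketched as follows, applying each transform with the probability from the table. The SciPy-based primitives and exact parameterizations are assumptions; the actual code may use different implementations:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, rotate

def normalize(img, mean, std):
    """Dataset-wide normalization, then clip to [-1, 1]."""
    return np.clip((img - mean) / std, -1.0, 1.0)

def augment(img, rng):
    """Apply each augmentation with its table probability (128x128 input)."""
    img = img.copy()
    h, w = img.shape
    if rng.random() < 0.8:                      # additive Gaussian noise, sigma ~ 0.02
        img = img + rng.normal(0.0, 0.02, img.shape)
    if rng.random() < 0.3:                      # shift +/-5 px with edge padding
        dy, dx = rng.integers(-5, 6, size=2)
        padded = np.pad(img, 5, mode="edge")
        img = padded[5 - dy:5 - dy + h, 5 - dx:5 - dx + w]
    if rng.random() < 0.3:                      # small rotation within +/-10 degrees
        img = rotate(img, rng.uniform(-10, 10), reshape=False, mode="nearest")
    if rng.random() < 0.7:                      # elastic deformation (alpha=20, sigma=4)
        ex = gaussian_filter(rng.uniform(-1, 1, img.shape), 4) * 20
        ey = gaussian_filter(rng.uniform(-1, 1, img.shape), 4) * 20
        yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
        img = map_coordinates(img, [yy + ey, xx + ex], order=1, mode="nearest")
    if rng.random() < 0.7:                      # mask 1-18 random 16x16 patches
        for _ in range(rng.integers(1, 19)):
            r, c = rng.integers(0, h - 16), rng.integers(0, w - 16)
            img[r:r + 16, c:c + 16] = 0.0
    return img
```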
| Metric | Validation | Test | Δ (Test − Validation) | Notes |
|---|---|---|---|---|
| MSE | 0.000064 | 0.000096 | +0.000031 | Very low reconstruction error |
| MAE | 0.00460 | 0.00541 | +0.00081 | Stable generalization |
| PSNR (dB) | 47.94 | 46.25 | −1.69 | High-fidelity reconstruction |
| SSIM | 0.99866 | 0.99812 | −0.00054 | Structural preservation |
| HF-MSE | 0.00749 | 0.00980 | +0.00232 | Slight smoothing at high frequencies |
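As a sanity check, the reported PSNR values appear consistent with PSNR = 10·log10(range² / MSE) evaluated over the [−1, 1] range (range = 2.0). This is an inference from the numbers in the table, not a statement of the exact evaluation code:

```python
import math

def psnr_db(mse, data_range=2.0):
    """PSNR in dB for images spanning [-1, 1] (data_range = 2.0)."""
    return 10.0 * math.log10(data_range ** 2 / mse)

print(round(psnr_db(0.000064), 2))  # ~47.96 dB, close to the reported 47.94
```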
Bridging radar latents to GPT-ready tokens.
- Motivation: GPT models require 1-D token inputs, but direct compression from radar images destroys spatial detail.
- Approach: Introduce Projection / Unprojection modules trained after freezing the AutoEncoder-Base.
Input radar → frozen encoder ❄️ → latent (128×8×8) → projection 🔥 (flatten 8192 → 1024) → token (1×1024) → unprojection 🔥 (1024 → 8192 → reshape) → z_hat (128×8×8) → frozen decoder ❄️ → reconstruction.
- Training: Optimize projection and unprojection only, supervised by both latent and image reconstructions.
- Outcome: Continuous radar-aware tokenization that preserves semantics while producing compact GPT-compatible vectors.
- Impact:
- Decouples representation learning from sequence modeling.
- Removes the need for projection heads within GPT.
- Allows the Transformer to focus purely on temporal radar dynamics.
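A minimal sketch of the projection/unprojection pair. Single linear layers are an assumption; the actual modules may be deeper:

```python
import torch
import torch.nn as nn

class LatentBridge(nn.Module):
    """Sketch: map a frozen 128x8x8 latent to one 1024-d token and back."""
    def __init__(self, latent_shape=(128, 8, 8), token_dim=1024):
        super().__init__()
        self.latent_shape = latent_shape
        n = latent_shape[0] * latent_shape[1] * latent_shape[2]  # 8192
        self.project = nn.Linear(n, token_dim)     # 8192 -> 1024 (trainable)
        self.unproject = nn.Linear(token_dim, n)   # 1024 -> 8192 (trainable)

    def forward(self, z):
        token = self.project(z.flatten(1))                         # (B, 1024)
        z_hat = self.unproject(token).view(-1, *self.latent_shape) # (B, 128, 8, 8)
        return token, z_hat
```

During training, only `project` and `unproject` receive gradients; supervision combines a latent loss on `z_hat` versus `z` with an image loss through the frozen decoder, as described above.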
The LatentBridge model successfully learns a compact projectionβunprojection mapping that preserves the original autoencoderβs reconstruction fidelity while enabling token-based radar representations.
Quantitative results show almost identical performance to the AutoEncoder-Base, confirming that the projection bottleneck introduces minimal information loss.
| Metric | Validation | Test | Δ (Test − Validation) | Notes |
|---|---|---|---|---|
| MSE | 0.000065 | 0.000096 | +0.000031 | Very low reconstruction error |
| MAE | 0.004613 | 0.005422 | +0.000809 | Stable generalization |
| PSNR (dB) | 47.97 | 46.40 | −1.56 | High-fidelity reconstructions |
| SSIM | 0.998647 | 0.998108 | −0.000539 | Structural consistency preserved |
| HF-MSE | 0.007525 | 0.009842 | +0.002317 | Slight high-frequency smoothing |
| ReconLoss_like_train | 0.003675 | 0.004940 | +0.001265 | Consistent with base AE performance |
The close alignment between AutoEncoder-Base and LatentBridge results demonstrates that the learned tokenization bridge can effectively encode and decode radar latent spaces with negligible degradation, validating its use as the interface for RadarGPT pretraining.
| Metric | Base (Test) | LatentBridge (Test) | Δ (Bridge − Base) | Comment |
|---|---|---|---|---|
| MSE | 0.000096 | 0.000096 | ≈ 0 | Identical error |
| MAE | 0.00541 | 0.00542 | +0.00001 | Negligible change |
| PSNR (dB) | 46.25 | 46.40 | +0.15 | Slight gain |
| SSIM | 0.99812 | 0.99811 | −0.00001 | Unchanged |
| HF-MSE | 0.00980 | 0.00984 | +0.00004 | Negligible difference |
The LatentBridge achieves the same reconstruction fidelity as the base autoencoder, confirming that radar latents can be safely projected into a compact token space for transformer pretraining.
Modeling temporal radar dynamics via next-token prediction.
- Input: Sequences of radar tokens from the LatentBridge.
- Objective: Predict the next latent token using a causal Transformer decoder.
- Outcome: A radar-only foundation model capable of forecasting future radar observations and encoding implicit motion priors.
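The next-token objective above can be sketched as a causal Transformer regressing continuous tokens. Depth, width, context length, and the MSE regression target are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RadarGPT(nn.Module):
    """Sketch: causal Transformer over continuous 1024-d radar tokens."""
    def __init__(self, d_token=1024, d_model=512, n_layer=6, n_head=8, max_len=32):
        super().__init__()
        self.inp = nn.Linear(d_token, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_head, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layer)
        self.out = nn.Linear(d_model, d_token)

    def forward(self, tokens):                        # tokens: (B, T, d_token)
        T = tokens.shape[1]
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.inp(tokens) + self.pos[:, :T]
        h = self.blocks(h, mask=causal)               # each step attends only to the past
        return self.out(h)

def next_token_loss(model, tokens):
    """Regress token t+1 from tokens <= t (continuous tokens, no quantization)."""
    pred = model(tokens[:, :-1])
    return F.mse_loss(pred, tokens[:, 1:])
```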
Adapting the pretrained radar foundation model to perception tasks.
Applications include:
- Radar object detection & tracking
- Occupancy / BEV map prediction
Demonstrates the transferability of radar-only pretraining to real-world autonomous-driving tasks.
ROD2021 β Radar Object Detection benchmark.
Used for both AutoEncoder training and evaluation.
After downloading, set the dataset path in scripts or notebooks as required.
- Modular autoencoder architecture
- Reconstruction evaluation metrics: MSE, MAE, PSNR, SSIM, HF-MSE
- Visualization tools for qualitative analysis
- Integration with the ROD2021 radar object detection dataset
- Experiment tracking (Neptune integration)
RadarGPT/
├── models/        # Model definitions (AutoEncoder, etc.)
├── utils/         # Helpers: data handling, losses, visuals, train, test
└── notebooks/     # Experimental notebooks
    └── autoencoder/
        ├── checkpoints/
        ├── results/
        ├── autoencoder_Base_training.ipynb         # Base AE training notebook
        ├── autoencoder_Base_test.ipynb             # Base AE testing notebook
        ├── autoencoder_LatentBridge_training.ipynb # LatentBridge training notebook
        └── autoencoder_LatentBridge_test.ipynb     # LatentBridge testing notebook
- Two-Step Radar Representation Learning: separates spatial encoding (AutoEncoder-Base) from tokenization (LatentBridge).
- Continuous Radar Tokenization: learns projection/unprojection that preserves radar semantics without quantization.
- Modular World Model Design: isolates representation, tokenization, and temporal modeling for stable GPT training.
- Radar-Only Foundation Model: establishes a strong pretraining paradigm independent of camera or LiDAR.
Thanks to the ROD2021 team and the open-source community for their support.