ahmeddawy/RadarGPT

RadarGPT: A Foundation Model for Radar Perception

🧭 Vision

RadarGPT investigates how the architectural principles of large language models can be adapted for radar perception. The goal is to develop a radar-only foundation model that learns spatial and temporal radar representations useful for downstream tasks such as object detection, motion forecasting, and scene understanding.

Unlike cameras, radar sensors remain reliable in rain, fog, and low-light conditions, yet radar data is rarely treated as a first-class input for perception and scene understanding. RadarGPT bridges this gap by learning compact latent tokens, temporal dynamics, and task-specific reasoning entirely within the radar modality.



⚙️ Technical Roadmap

RadarGPT is developed in three main stages, each addressing a core research challenge — from radar representation learning to temporal world modeling and downstream adaptation.

Stage 1 — Radar Representation Learning

Step 1: AutoEncoder-Base

Learning high-fidelity latent radar representations.

  • Objective: Train a convolutional autoencoder to reconstruct radar intensity maps with minimal information loss.
  • Input / Output: 1 × 128 × 128 → 128 × 8 × 8 → 1 × 128 × 128
  • Losses: MSE + SSIM + HF-MSE composite emphasizing spatial and high-frequency fidelity.
  • Outcome: A frozen radar latent space encoding spatial geometry and backscatter patterns.
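The composite objective can be illustrated with a minimal NumPy sketch. The actual weights, SSIM windowing, and high-pass filter used in the repository are not documented, so this sketch uses a global (unwindowed) SSIM approximation, a Laplacian high-pass for the HF term, and placeholder weights:

```python
import numpy as np

def mse(x, y):
    return float(np.mean((x - y) ** 2))

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    # Unwindowed SSIM over the whole map; real implementations use
    # local Gaussian windows, so treat this as an approximation.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

def hf_mse(x, y):
    # MSE between Laplacian-filtered maps: penalizes blurring of fine
    # backscatter detail that plain MSE barely notices.
    k = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
    def lap(img):
        out = np.zeros_like(img, dtype=np.float64)
        for i in range(3):
            for j in range(3):
                out += k[i, j] * np.roll(np.roll(img, i - 1, axis=0), j - 1, axis=1)
        return out
    return float(np.mean((lap(x) - lap(y)) ** 2))

def composite_loss(x, y, w_mse=1.0, w_ssim=0.1, w_hf=0.5):
    # Weights are placeholders, not values from the repository.
    return w_mse * mse(x, y) + w_ssim * (1.0 - ssim_global(x, y)) + w_hf * hf_mse(x, y)
```

The SSIM term rewards structural agreement while the HF term keeps edges sharp; either alone would tolerate blurry reconstructions.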

🧱 Data flow Diagram

Input radar → encoder🔥 (↓×4) → latent (128×8×8) → decoder🔥 (↑×4) → reconstruction.
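The data flow above can be sketched in PyTorch. Only the 1×128×128 input and the 128×8×8 bottleneck are fixed by this README; the intermediate channel widths, kernel sizes, and activations below are assumptions for illustration:

```python
import torch
import torch.nn as nn

class RadarAutoEncoder(nn.Module):
    """Sketch of AutoEncoder-Base: four stride-2 conv stages (the ↓×4
    halvings: 128 -> 64 -> 32 -> 16 -> 8) and a mirrored transposed-conv
    decoder. Channel progression 1 -> 16 -> 32 -> 64 -> 128 is assumed."""
    def __init__(self):
        super().__init__()
        chs = [1, 16, 32, 64, 128]
        enc = []
        for cin, cout in zip(chs[:-1], chs[1:]):
            enc += [nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
        self.encoder = nn.Sequential(*enc)
        dec = []
        for cin, cout in zip(chs[::-1][:-1], chs[::-1][1:]):
            dec += [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1), nn.ReLU(inplace=True)]
        dec[-1] = nn.Tanh()  # final activation matches the [-1, 1] input range
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        z = self.encoder(x)          # (B, 128, 8, 8) latent
        return self.decoder(z), z    # reconstruction plus latent
```

After training, the encoder/decoder weights are frozen and only the latent space is reused downstream.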

Dataset Details:

The AutoEncoder-Base was trained and validated on radar intensity maps derived from the ROD2021 dataset.

  • The official training dataset contains a total of 40,734 radar images, which was split into:

    • Training split (80%) — 32,587 radar images, used for optimization.
    • Validation split (20%) — 8,147 radar images, used for validation throughout training.
  • The test dataset is a completely unseen and untouched set containing 10,437 radar images, reserved exclusively for final evaluation.
    This setup ensures that reported reconstruction metrics reflect the model’s true generalization rather than overfitting to training samples.
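The 80/20 split can be reproduced with a seeded shuffle over image indices (the exact seed and shuffling scheme used in the repository are not documented, so this is a plausible sketch; the test set lives in separate files and never enters this split):

```python
import random

def split_indices(n_total, train_frac=0.8, seed=0):
    # Reproducible shuffle-split over image indices.
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    n_train = round(n_total * train_frac)
    return idx[:n_train], idx[n_train:]

train_idx, val_idx = split_indices(40_734)
print(len(train_idx), len(val_idx))  # 32587 8147
```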

Data Augmentation

To improve robustness and generalization, multiple spatial and intensity augmentations were applied to the training split only.
Each augmentation is applied with its corresponding probability:

| Augmentation | Description | Probability |
|---|---|---|
| Additive Noise | Adds small Gaussian noise (σ ≈ 0.02) to simulate sensor noise and measurement uncertainty. | 0.8 |
| Shift (Pad & Crop) | Random spatial shifts of ±5 px along x/y axes with edge padding, preserving overall structure. | 0.3 |
| Rotation | Small random rotations within ±10° to simulate vehicle or sensor orientation variance. | 0.3 |
| Elastic Deformation | Smooth nonlinear spatial distortions (α = 20, σ = 4) to emulate radar beam distortions. | 0.7 |
| Patch Masking | Randomly masks 1–18 patches (16 × 16 px) to encourage spatial inpainting ability. | 0.7 |

All radar images are globally normalized to zero mean and unit variance (using dataset-wide statistics) and clipped to the range [−1, 1].
These augmentations ensure the AutoEncoder learns robust radar representations that generalize to unseen environments and sensor noise.
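A NumPy-only sketch of this pipeline is below. Rotation and elastic deformation are omitted because they need `scipy.ndimage`; everything else follows the table's probabilities and magnitudes:

```python
import numpy as np

def augment(img, rng):
    """Apply probabilistic augmentations to one normalized 128x128 radar
    map (rotation and elastic deformation, which require scipy.ndimage,
    are left out of this sketch)."""
    out = img.copy()
    if rng.random() < 0.8:                 # additive Gaussian noise, sigma ~ 0.02
        out = out + rng.normal(0.0, 0.02, out.shape)
    if rng.random() < 0.3:                 # shift via edge-pad & crop, +/-5 px
        dy, dx = rng.integers(-5, 6, size=2)
        padded = np.pad(out, 5, mode="edge")
        out = padded[5 + dy:5 + dy + 128, 5 + dx:5 + dx + 128]
    if rng.random() < 0.7:                 # patch masking: 1-18 patches of 16x16 px
        for _ in range(rng.integers(1, 19)):
            y, x = rng.integers(0, 128 - 16, size=2)
            out[y:y + 16, x:x + 16] = 0.0
    return np.clip(out, -1.0, 1.0)         # keep the global [-1, 1] range
```

Because masking and shifting happen after normalization, the clip at the end simply restores the documented value range.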

  • Training samples: *(example images available in the repository)*
  • Validation samples: *(example images available in the repository)*

Quantitative Results

| Metric | Validation | Test | Δ (Test − Validation) | Notes |
|---|---|---|---|---|
| MSE | 0.000064 | 0.000096 | +0.000031 | Very low reconstruction error |
| MAE | 0.00460 | 0.00541 | +0.00081 | Stable generalization |
| PSNR (dB) | 47.94 | 46.25 | −1.69 | High-fidelity reconstruction |
| SSIM | 0.99866 | 0.99812 | −0.00054 | Structural preservation |
| HF-MSE | 0.00749 | 0.00980 | +0.00232 | Slight smoothing at high frequencies |
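The PSNR column appears consistent with the MSE column under the assumption that the peak-to-peak data range is 2 (from the [−1, 1] clipping), i.e. PSNR = 10·log10(range²/MSE). A quick check:

```python
import math

def psnr_from_mse(mse, data_range=2.0):
    # Radar maps are normalized and clipped to [-1, 1], so the
    # peak-to-peak range assumed for PSNR here is 2.
    return 10.0 * math.log10(data_range ** 2 / mse)

print(round(psnr_from_mse(0.000064), 2))  # ~47.96 vs. 47.94 reported
print(round(psnr_from_mse(0.000096), 2))  # ~46.20 vs. 46.25 reported
```

The small discrepancies are what one would expect from the MSE values being rounded before tabulation.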

Qualitative Results:

*(qualitative reconstruction examples available in the repository)*

Step 2: AutoEncoder-LatentBridge (Tokenization Bridge)

Bridging radar latents to GPT-ready tokens.

  • Motivation: GPT models require 1-D token inputs, but direct compression from radar images destroys spatial detail.
  • Approach: Introduce Projection / Unprojection modules trained after freezing the AutoEncoder-Base.

🧱 Data flow Diagram

Input radar → frozen encoder❄️ → latent (128×8×8) → projection🔥 (flatten 8192 → 1024) → token (1×1024) → unprojection🔥 (1024 → 8192 → reshape) → z_hat (128×8×8) → frozen decoder❄️ → reconstruction.

  • Training: Optimize projection and unprojection only, supervised by both latent and image reconstructions.
  • Outcome: Continuous radar-aware tokenization that preserves semantics while producing compact GPT-compatible vectors.
  • Impact:
    • Decouples representation learning from sequence modeling.
    • Removes the need for projection heads within GPT.
    • Allows the Transformer to focus purely on temporal radar dynamics.
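The projection/unprojection pair can be sketched as follows. Only the 8192 → 1024 → 8192 dimensions are fixed by this README; the choice of plain linear maps is an assumption (the repository may use deeper MLPs):

```python
import torch
import torch.nn as nn

class LatentBridge(nn.Module):
    """Sketch of the tokenization bridge: flatten the frozen 128x8x8
    latent (8192 values) into a single 1024-d token and invert it."""
    def __init__(self, latent_shape=(128, 8, 8), token_dim=1024):
        super().__init__()
        c, h, w = latent_shape
        self.latent_shape = latent_shape
        self.project = nn.Linear(c * h * w, token_dim)    # 8192 -> 1024
        self.unproject = nn.Linear(token_dim, c * h * w)  # 1024 -> 8192

    def forward(self, z):
        token = self.project(z.flatten(1))                # (B, 1024) GPT-ready token
        z_hat = self.unproject(token).view(-1, *self.latent_shape)
        return token, z_hat
```

During training the base encoder/decoder stay frozen; gradients flow only through `project` and `unproject`, supervised by both the latent reconstruction (z vs. z_hat) and the decoded image.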

Quantitative Results (AutoEncoder-LatentBridge)

The LatentBridge model successfully learns a compact projection–unprojection mapping that preserves the original autoencoder’s reconstruction fidelity while enabling token-based radar representations.
Quantitative results show almost identical performance to the AutoEncoder-Base, confirming that the projection bottleneck introduces minimal information loss.

| Metric | Validation | Test | Δ (Test − Validation) | Notes |
|---|---|---|---|---|
| MSE | 0.000065 | 0.000096 | +0.000031 | Very low reconstruction error |
| MAE | 0.004613 | 0.005422 | +0.000809 | Stable generalization |
| PSNR (dB) | 47.97 | 46.40 | −1.56 | High-fidelity reconstructions |
| SSIM | 0.998647 | 0.998108 | −0.000539 | Structural consistency preserved |
| HF-MSE | 0.007525 | 0.009842 | +0.002317 | Slight high-frequency smoothing |
| ReconLoss_like_train | 0.003675 | 0.004940 | +0.001265 | Consistent with base AE performance |

The close alignment between AutoEncoder-Base and LatentBridge results demonstrates that the learned tokenization bridge can effectively encode and decode radar latent spaces with negligible degradation — validating its use as the interface for RadarGPT pretraining.

Comparison with AutoEncoder-Base

| Metric | Base (Test) | LatentBridge (Test) | Δ (Bridge − Base) | Comment |
|---|---|---|---|---|
| MSE | 0.000096 | 0.000096 | ≈ 0 | Identical error |
| MAE | 0.00541 | 0.00542 | +0.00001 | Negligible change |
| PSNR (dB) | 46.25 | 46.40 | +0.15 | Slight gain |
| SSIM | 0.99812 | 0.99811 | −0.00001 | Unchanged |
| HF-MSE | 0.00980 | 0.00984 | +0.00004 | Negligible difference |

The LatentBridge achieves the same reconstruction fidelity as the base autoencoder, confirming that radar latents can be safely projected into a compact token space for transformer pretraining.

Qualitative Results:

*(qualitative reconstruction examples available in the repository)*

Stage 2 — Transformer Pretraining (RadarGPT) ⏳ (Not Started Yet)

Modeling temporal radar dynamics via next-token prediction.

  • Input: Sequences of radar tokens from the LatentBridge.
  • Objective: Predict the next latent token using a causal Transformer decoder.
  • Outcome: A radar-only foundation model capable of forecasting future radar observations and encoding implicit motion priors.
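Since this stage has not started, the sketch below only illustrates the intended setup: a causal (decoder-only) Transformer over continuous 1024-d radar tokens, trained to regress the next token. Layer counts, the learned positional embedding, and the MSE next-token objective are all assumptions:

```python
import torch
import torch.nn as nn

class RadarGPT(nn.Module):
    """Causal Transformer over continuous radar tokens (a sketch of the
    planned Stage 2, not the repository's implementation)."""
    def __init__(self, token_dim=1024, n_layers=4, n_heads=8, max_len=64):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_len, token_dim))
        layer = nn.TransformerEncoderLayer(token_dim, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(token_dim, token_dim)  # regress the next token

    def forward(self, tokens):                       # tokens: (B, T, token_dim)
        T = tokens.size(1)
        # Upper-triangular mask so position t only attends to tokens <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(T)
        h = self.blocks(tokens + self.pos[:, :T], mask=mask)
        return self.head(h)

# Next-token training step (illustrative): predict token t+1 from tokens <= t.
# loss = F.mse_loss(model(tokens[:, :-1]), tokens[:, 1:])
```

Because the tokens are continuous (no quantization in the LatentBridge), next-token prediction here is a regression problem rather than a softmax over a vocabulary.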

Stage 3 — Downstream Fine-Tuning ⏳ (Not Started Yet)

Adapting the pretrained radar foundation model to perception tasks.

Applications include:

  • Radar object detection & tracking
  • Occupancy / BEV map prediction

This stage will demonstrate the transferability of radar-only pretraining to real-world autonomous-driving tasks.

πŸ“Š Dataset

ROD2021 — Radar Object Detection benchmark.
Used for both AutoEncoder training and evaluation.
After downloading, set the dataset path in scripts or notebooks as required.


Key Features

  • Modular autoencoder architecture
  • Reconstruction evaluation metrics: MSE, MAE, PSNR, SSIM, HF-MSE
  • Visualization tools for qualitative analysis
  • Integration with the ROD2021 radar object detection dataset
  • Experiment tracking (Neptune integration)

Repository Structure

RadarGPT/
├── models/                # Model definitions (AutoEncoder, etc.)
├── utils/                 # Helpers: data handling, losses, visualization, train, test
└── notebooks/             # Experimental notebooks
    └── autoencoder/
        ├── checkpoints/
        ├── results/
        ├── autoencoder_Base_training.ipynb          # Base AE training notebook
        ├── autoencoder_Base_test.ipynb              # Base AE testing notebook
        ├── autoencoder_LatentBridge_training.ipynb  # LatentBridge training notebook
        └── autoencoder_LatentBridge_test.ipynb      # LatentBridge testing notebook

🧠 Research Contributions

  • Two-Step Radar Representation Learning: separates spatial encoding (AutoEncoder-Base) from tokenization (LatentBridge).

  • Continuous Radar Tokenization: learns projection/unprojection that preserves radar semantics without quantization.

  • Modular World Model Design: isolates representation, tokenization, and temporal modeling for stable GPT training.

  • Radar-Only Foundation Model: establishes a strong pretraining paradigm independent of camera or LiDAR.

Acknowledgments

Thanks to the ROD2021 team and the open-source community for their support.
