titanpg (TorchTitan Playground) is a learning project for understanding how to build and run training workloads with TorchTitan.
Current examples in this repo:
- nanogpt (GPT-style language modeling)
- dinov3 (self-supervised vision pre-training)
The goal is practical learning, not a polished framework. This repo is where we implement, test, and compare a small set of TorchTitan-based training examples end to end.
- train.py: unified launcher for all examples
- nanogpt/: NanoGPT model, data pipeline, and parallelization wiring
- dinov3/: DINOv3 SSL model, data pipeline, trainer, and infra hooks
- docs/: guided walkthroughs (environment setup, data, first runs, scaling, debugging)
- tests/: behavior and integration tests for launcher, models, and distributed setup
Using uv:
uv sync

Or using pip in a virtualenv:

pip install -e .

Run NanoGPT for 20 steps on a single process:

torchrun --standalone --nnodes=1 --nproc-per-node=1 train.py \
--model.name=nanogpt \
--model.flavor=gpt2_small \
--training.dataset=tinyshakespeare \
--training.seq_len=128 \
--training.local_batch_size=4 \
--training.steps=20

Run DINOv3 for 20 steps on a single process:

torchrun --standalone --nnodes=1 --nproc-per-node=1 train.py \
--model.name=dinov3 \
--training.dataset_path='ImageFolder:root=/data/imagenet1k_hf:split=train' \
--training.local_batch_size=8 \
--training.steps=20

Prepare FineWeb shards for NanoGPT:

python nanogpt/populate_fineweb.py /data/edu_fineweb10B

Prepare ImageNet-style layout for DINOv3:

python dinov3/populate_imagenet.py /data/imagenet1k_hf train 0.33

Start here:
- docs/README.md
- docs/01-what-this-repo-is.md
- docs/06-first-training-run.md
- docs/09-practical-launch-recipes.md

Notes:
- The default launcher target is nanogpt if --model.name is omitted.
- For --model.name=dinov3, the launcher injects --job.custom_config_module=dinov3.job_config when it is not set.
- This repository is intentionally narrow in scope: it currently focuses only on NanoGPT and DINOv3 SSL pre-training.
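
The launcher defaults described in these notes can be sketched roughly as follows. This is a minimal illustration, not the actual code in train.py: the function names `get_flag` and `resolve_launch_args` are hypothetical, and appending an explicit `--model.name` when it is omitted is an assumption made here to keep the result easy to inspect.

```python
from typing import List, Optional

DEFAULT_MODEL = "nanogpt"                    # default target when --model.name is omitted
DINOV3_CONFIG_MODULE = "dinov3.job_config"   # module injected for dinov3 runs


def get_flag(args: List[str], name: str) -> Optional[str]:
    """Return the value of `name` given as `--flag=value` or `--flag value`, else None."""
    for i, arg in enumerate(args):
        if arg.startswith(name + "="):
            return arg.split("=", 1)[1]
        if arg == name and i + 1 < len(args):
            return args[i + 1]
    return None


def resolve_launch_args(args: List[str]) -> List[str]:
    """Hypothetical helper: apply the two defaults without overriding user input."""
    out = list(args)  # copy so the caller's list is left untouched

    model = get_flag(out, "--model.name")
    if model is None:
        # Assumption: make the default explicit by appending it.
        model = DEFAULT_MODEL
        out.append(f"--model.name={model}")

    # Inject the DINOv3 config module only when the user has not set one.
    if model == "dinov3" and get_flag(out, "--job.custom_config_module") is None:
        out.append(f"--job.custom_config_module={DINOV3_CONFIG_MODULE}")

    return out


if __name__ == "__main__":
    print(resolve_launch_args(["--training.steps=20"]))
    print(resolve_launch_args(["--model.name=dinov3"]))
```

Under this sketch, a user-supplied `--job.custom_config_module` is always left as-is, which matches the "when not set" wording in the note.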