An implementation of the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" in PyTorch.
- Clone the repository
git clone https://github.com/dakofler/vision_transformer.git
cd vision_transformer/
- (Optional) Create a Python virtual environment (Linux)
python3 -m venv .venv
source .venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Download the CIFAR-10 dataset
- Unpack the archive and put the files into a ./data directory
vision_transformer/
├─ data/
│ ├─ batches.meta
│ ├─ data_batch_1
│ ├─ data_batch_2
│ ├─ data_batch_3
│ ├─ data_batch_4
│ ├─ data_batch_5
│ ├─ readme.html
│ ├─ test_batch
...
- Run the training script
python3 train.py

Daniel Kofler - dkofler@outlook.com
2025
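
Before starting a training run, it can help to confirm that the dataset was unpacked into the layout listed in the setup steps. The following is a minimal sketch, not part of the repository: the helper names `missing_files` and `load_batch` are hypothetical, and it assumes the files are the pickled batches of the Python version of CIFAR-10, which is what the file names in the listing suggest. It is independent of how `train.py` itself reads the data.

```python
import pickle
from pathlib import Path

# File names taken from the directory listing in the setup steps
# (readme.html is left out because it carries no data).
EXPECTED_FILES = (
    ["batches.meta"]
    + [f"data_batch_{i}" for i in range(1, 6)]
    + ["test_batch"]
)


def missing_files(data_dir="data"):
    """Return the expected dataset files that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_batch(path):
    """Unpickle one batch file and return (data, labels).

    Assumes the Python-version batch format: a dict whose keys are bytes,
    because the files were pickled under Python 2.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        # Unpickling the real batches requires NumPy to be installed.
        data, labels = load_batch(Path("data") / "data_batch_1")
        print(len(labels), "labels loaded from data_batch_1")
```

Run from the repository root, the script either lists the files that are still missing from `./data` or reports how many labels it read from the first training batch.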