Transformer From Scratch

A Transformer model implementation from scratch using PyTorch.

Setup

This project requires conda for managing PyTorch and NumPy dependencies. Other development tools are installed via pip.

Installation

```
git clone https://github.com/henok3878/transformer-from-scratch.git
cd transformer-from-scratch

# CPU version
conda env create -f environment-cpu.yml
conda activate transformer-cpu

# GPU version (with CUDA)
conda env create -f environment-gpu.yml
conda activate transformer-gpu
```

Development

Code Quality Tools

```
black src/ tests/     # Format code
isort src/ tests/     # Sort imports
flake8 src/ tests/    # Lint code
mypy src/             # Type checking
```

Testing

```
pytest                    # Run tests
pytest --cov=transformer  # With coverage
```

Experiment Tracking

This project uses Weights & Biases (wandb) for experiment tracking. Key metrics such as loss, perplexity, and BLEU scores are logged automatically during training, along with model configurations and checkpoints.
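
Perplexity is conventionally the exponential of the mean per-token cross-entropy loss (measured in nats), so the two logged metrics carry the same information on different scales. The sketch below illustrates that relationship in plain Python. The function name `perplexity_from_loss` and the metric keys are illustrative assumptions, not names taken from train.py.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (natural log) into perplexity."""
    return math.exp(mean_cross_entropy)


# Hypothetical metrics dictionary of the kind one might log at each step.
loss = 2.0
metrics = {"train/loss": loss, "train/perplexity": perplexity_from_loss(loss)}
print(metrics)
```

A loss of 0 corresponds to a perplexity of 1 (the model is certain of every token), and a model that is uniformly unsure among 32 tokens has a perplexity of 32.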

Setup & Usage

Sign up for a free account at wandb.ai.
Log in to your account from your terminal by running the following command and providing your API key:
```
wandb login
```
Once you've logged in, the training script (train.py) will automatically create a new run in your wandb project (transformer-from-scratch). You can monitor your experiments live from your wandb dashboard.

Project Structure

```
src/transformer/
├── components/
│   ├── multi_head.py          # Multi-head attention
│   ├── encoder_block.py       # Transformer encoder
│   ├── decoder_block.py       # Transformer decoder
│   ├── input_embedding.py     # Token embeddings
│   ├── positional_encoding.py # Position embeddings
│   ├── feed_forward.py        # Position-wise feed-forward network
│   └── layer_norm.py          # Layer normalization
├── transformer.py             # Complete model
└── train.py                   # Training script

tests/                         # Unit tests
```
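
To give a feel for what one of these components computes, here is a minimal sketch of a positional encoding table. It assumes the fixed sinusoidal scheme from the original Transformer paper; the repository's positional_encoding.py is not shown here and may differ. The sketch uses plain Python instead of PyTorch to stay dependency-free, and the function name is hypothetical.

```python
import math
from typing import List


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Return a max_len x d_model table of sinusoidal position encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")
    table = []
    for pos in range(max_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # Each pair of dimensions (i, i + 1) shares one frequency.
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


pe = sinusoidal_positional_encoding(max_len=4, d_model=8)
print(pe[0])  # position 0: every sine is 0.0 and every cosine is 1.0
```

In a model, row `pos` of this table would be added to the token embedding at position `pos` before the first encoder or decoder block.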

Model Training

This project supports both single-node single-GPU and distributed multi-GPU training.

Single-Node Single-GPU Training

For basic training on one GPU, simply run:

```
python train.py --config configs/config_de-en.yaml
```

No .env or hostfile setup is required.

Multi-GPU Training (Single-Node or Multi-Node)

For training on multiple GPUs (either on one node or across several nodes), set up your .env file:

```
MASTER_ADDR=node001        # or your master node's hostname
MASTER_PORT=29500
NNODES=1                   # set to 1 for single-node, >1 for multi-node
GPUS_PER_NODE=2            # set to number of GPUs per node
NCCL_DEBUG=INFO
```
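
As a rough illustration of how these values fit together, the sketch below parses a .env-style text and derives the total number of processes, assuming the common convention of one process per GPU. How run_dist.sh actually consumes the file is not shown in this README, so the helpers `parse_env` and `global_rank` are hypothetical.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and trailing '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Global rank of the process driving GPU `local_rank` on node `node_rank`."""
    return node_rank * gpus_per_node + local_rank


env_text = """
MASTER_ADDR=node001      # or your master node's hostname
MASTER_PORT=29500
NNODES=1
GPUS_PER_NODE=2
"""
env = parse_env(env_text)
# Assumption: one training process per GPU.
world_size = int(env["NNODES"]) * int(env["GPUS_PER_NODE"])
print(world_size)
```

With `NNODES=3` and `GPUS_PER_NODE=2`, the same arithmetic gives six processes, and the first GPU on the second node would have global rank 2.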

For multi-node training, also create a hostfile listing all participating nodes, one per line, in rank order (master node first, then worker nodes):

```
node001
node002
node003
```
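
Because the hostfile is ordered by rank, a node's rank can be read off as its zero-based position in the file. The following is a minimal sketch of that lookup; the helper names are hypothetical, and the actual logic in run_dist.sh may differ.

```python
from typing import List


def read_hosts(hostfile_text: str) -> List[str]:
    """Return hostnames in rank order, skipping blank lines."""
    return [line.strip() for line in hostfile_text.splitlines() if line.strip()]


def node_rank(hosts: List[str], hostname: str) -> int:
    """Rank of `hostname`: its zero-based position in the hostfile."""
    return hosts.index(hostname)


hosts = read_hosts("node001\nnode002\nnode003\n")
print(node_rank(hosts, "node001"))  # the master node comes first
print(node_rank(hosts, "node003"))
```

On a real node, the hostname passed to `node_rank` could come from `socket.gethostname()`, provided it matches the spelling used in the hostfile.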

Then launch distributed training by running the following command on each node:

```
bash run_dist.sh
```

Checkpoints are saved under experiments/ and tracked with wandb.

Training & Evaluation Results

View Project on W&B →

Tracked metrics:
Training loss
Quick validation loss (subset, every 1000 steps)
Full validation loss (entire set, every 10,000 steps)
Validation BLEU (greedy search, every 10,000 steps)
Final Test BLEU (beam search, beam size=4): With the default configuration, the model achieved a BLEU score of 25.53 on the official WMT14 test set (news-test).
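
Validation uses greedy search while the final test score uses beam search. The toy sketch below shows why the two can produce different outputs: greedy search is beam search with a beam size of 1, and a wider beam can recover a sequence whose first token looked worse. The repository's decoding code is not shown in this README, so `beam_search` and the hard-coded `toy_next_probs` model are assumptions for illustration only; no length normalization is applied.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, Tuple[str, ...]]  # (cumulative log-prob, tokens)


def toy_next_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for a decoder's next-token distribution."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    # After two tokens the toy model always ends the sentence.
    return table.get(prefix, {"</s>": 1.0})


def beam_search(
    next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> Tuple[str, ...]:
    beams: List[Hypothesis] = [(0.0, ())]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for score, tokens in beams:
            for token, prob in next_probs(tokens).items():
                candidates.append((score + math.log(prob), tokens + (token,)))
        # Keep only the `beam_size` highest-scoring expansions.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beams = []
        for hyp in candidates[:beam_size]:
            (finished if hyp[1][-1] == eos else beams).append(hyp)
        if not beams:
            break
    # Fall back to unfinished beams if nothing reached the end token.
    pool = finished or beams
    return max(pool, key=lambda h: h[0])[1]


greedy = beam_search(toy_next_probs, beam_size=1, max_len=3)
beam = beam_search(toy_next_probs, beam_size=2, max_len=3)
print(greedy)
print(beam)
```

Here greedy search commits to "a" (probability 0.6) and ends with a sequence of probability 0.30, whereas a beam of size 2 also keeps "b" alive and finds the sequence starting with "b", which has probability 0.36.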

License

MIT
