GPT-style decoder-only Transformer language model built from scratch in PyTorch.
SmallGPT implements the core components behind modern autoregressive language models: custom BPE tokenization, causal self-attention, next-token prediction training, checkpointed inference, and Telegram deployment.
This project explores how GPT-style language models work under the hood by implementing a decoder-only Transformer from first principles.
Implemented:
- decoder-only Transformer architecture
- causal masked multi-head self-attention
- custom BPE tokenizer
- next-token prediction training
- mixed precision training (AMP)
- gradient accumulation
- validation + perplexity evaluation
- checkpointed inference
- Telegram bot deployment
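
The causal masked multi-head self-attention item above can be illustrated with a minimal PyTorch sketch. It is not the repository's actual code: the class name `CausalSelfAttention`, the fused `qkv` projection, and the default arguments (taken from the configuration values `d_model: 384`, `n_heads: 6`, `dropout: 0.1`) are assumptions made for illustration.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    """Illustrative multi-head self-attention with a causal (lower-triangular) mask."""

    def __init__(self, d_model=384, n_heads=6, dropout=0.1):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.n_heads = n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused query/key/value projection (assumption)
        self.proj = nn.Linear(d_model, d_model)     # output projection
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape (B, T, C) -> (B, n_heads, T, head_dim)
        q, k, v = (
            t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
            for t in (q, k, v)
        )
        # Scaled dot-product scores, shape (B, n_heads, T, T)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(q.size(-1))
        # Position i may only attend to positions j <= i
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        att = att.masked_fill(~mask, float("-inf"))
        att = self.drop(F.softmax(att, dim=-1))
        # Merge heads back to (B, T, C)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)
```

Because of the mask, changing tokens at later positions leaves the outputs at earlier positions unchanged, which is what makes next-token prediction training valid.
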
Configuration:
vocab_size: 16000
context_length: 256
n_layers: 6
n_heads: 6
d_model: 384
d_ff: 1536
dropout: 0.1
Model size:
16.88M parameters
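
As a rough sanity check, the parameter count can be estimated from the configuration values. The sketch below assumes a GPT-2-style layout that the README does not confirm: learned positional embeddings, an output head tied to the token embedding, biases on every linear layer, and two LayerNorms per block plus a final one. The function name `count_parameters` is hypothetical.

```python
def count_parameters(vocab_size=16000, context_length=256, n_layers=6,
                     d_model=384, d_ff=1536):
    """Estimate parameters for an assumed GPT-2-style layout (see lead-in)."""
    # Token embedding + learned positional embedding (output head assumed tied)
    embeddings = vocab_size * d_model + context_length * d_model
    # Fused QKV projection + attention output projection, both with biases
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer feed-forward network with biases
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a weight and a bias vector
    layer_norms = 2 * (2 * d_model)
    final_norm = 2 * d_model
    return embeddings + n_layers * (attention + mlp + layer_norms) + final_norm


total = count_parameters()
print(f"{total:,} parameters")  # 16,889,856 parameters
```

Under these assumptions the estimate is about 16.89M, within roughly 0.1% of the reported 16.88M. The small gap suggests the actual layout differs in a minor detail, such as which layers carry biases.
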
Training dataset:
roneneldan/TinyStories
Training subset:
200,000 train examples
5,000 validation examples
44.1M training tokens
980K validation tokens
The tokenizer is trained directly on the dataset using a custom BPE pipeline.
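
For readers unfamiliar with BPE, the core training loop can be sketched in a few lines of plain Python. This is a toy character-level version for illustration only, not the project's tokenizer; the function names `get_pair_counts`, `merge_pair`, and `train_bpe` are hypothetical, and a real pipeline would typically add byte-level handling, special tokens, and a target vocabulary size (16,000 here).

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        words = merge_pair(words, best)
        merges.append(best)
    return merges


print(train_bpe("low low low lower", 2))  # [('l', 'o'), ('lo', 'w')]
```

Each iteration merges the most frequent adjacent pair into a new symbol, so frequent words end up represented by a single token.
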
| Metric | Value |
|---|---|
| GPU | Tesla T4 |
| Parameters | 16.88M |
| Training Steps | 30,000 |
| Validation Loss | 1.633 |
| Validation Perplexity | 5.12 |
| Generation Length | 150 tokens |
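
The perplexity in the table is consistent with the exponential of the validation loss. The short check below assumes the loss is the mean next-token cross-entropy in nats, which is what PyTorch's cross-entropy loss reports by default.

```python
import math

val_loss = 1.633  # validation loss from the table above (assumed to be in nats)
perplexity = math.exp(val_loss)
print(f"{perplexity:.2f}")  # 5.12
```
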
Generated using the deployed Telegram bot (/tokens 150).
Prompt: A curious little robot wanted to learn how humans make friends
A curious little robot wanted to learn how humans make friends, so he asked his mom, "Can I speak to those who are bigger?"
His mom said, "You have to be careful, Tim. Some people are bigger if they are small."
Tim was still excited, but he knew he had to be careful...
Prompt: Every weekend we should go to
Every weekend we should go to the doctor and see if someone else is hurt.
Anna and Max nod. They say "Yes!"
Mom and Dad help them. They put some cream on their wounds...
Prompt: Once upon a time there was a small machine that wanted to understand people
Once upon a time there was a small machine that wanted to understand people.
The machine went to a store. The store had lots of things that weighed...
The machine was very proud of itself.
The trained model is deployed through a Telegram interface.
Commands:
/start
/settings
/tokens 150
Run:
PYTHONPATH=. python scripts/run_bot.py
Telegram demo:
Clone repository:
git clone https://github.com/AbdullohML/SmallGPT.git
cd SmallGPT
Install dependencies:
pip install -r requirements.txt
If using Git LFS:
git lfs pull
Run inference:
PYTHONPATH=. python scripts/generate.py \
--prompt "Once upon a time"
Project structure:
SmallGPT/
├── configs/
├── data/
├── runs/
│ └── smallgpt/
├── src/
├── bot/
├── scripts/
├── notebooks/
└── README.md
Future work:
- instruction fine-tuning
- GPT-2 fine-tuning comparison
- LoRA adaptation
- RAG integration
- Docker deployment

