SmallGPT

A GPT-style, decoder-only Transformer language model built from scratch in PyTorch.

SmallGPT implements the core components behind modern autoregressive language models: custom BPE tokenization, causal self-attention, next-token prediction training, checkpointed inference, and Telegram deployment.

Project Overview

This project explores how GPT-style language models work under the hood by implementing a decoder-only Transformer from first principles.

Implemented:

decoder-only Transformer architecture
causal masked multi-head self-attention
custom BPE tokenizer
next-token prediction training
mixed precision training (AMP)
gradient accumulation
validation + perplexity evaluation
checkpointed inference
Telegram bot deployment
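
As an illustration of the causal masked multi-head self-attention item above, here is a minimal PyTorch sketch. It is not the code from this repository: the class name `CausalSelfAttention`, the fused `qkv` projection, and the use of biases are assumptions made for the example. The default sizes match the configuration in the Architecture section.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    """Hypothetical sketch of multi-head self-attention with a causal mask."""

    def __init__(self, d_model=384, n_heads=6, context_length=256, dropout=0.1):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection
        self.dropout = nn.Dropout(dropout)
        # Lower-triangular mask: position t may attend to positions <= t only.
        mask = torch.tril(torch.ones(context_length, context_length, dtype=torch.bool))
        self.register_buffer("mask", mask)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, T, C) -> (B, n_heads, T, head_dim)
        q = q.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        # Future positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(~self.mask[:T, :T], float("-inf"))
        weights = self.dropout(F.softmax(scores, dim=-1))
        out = (weights @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)
```

A quick way to check the mask is to perturb the last token of an input and confirm that the outputs at all earlier positions do not change.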

Architecture

Configuration:

vocab_size: 16000
context_length: 256

n_layers: 6
n_heads: 6
d_model: 384
d_ff: 1536
dropout: 0.1

Model size:

16.88M parameters
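
The parameter count can be roughly cross-checked from the configuration. The sketch below is an estimate under assumptions that the source does not state: learned positional embeddings, an output head tied to the token embedding, biases on every linear layer, and two LayerNorms per block plus a final one. The names `Config` and `estimate_params` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Config:
    vocab_size: int = 16000
    context_length: int = 256
    n_layers: int = 6
    n_heads: int = 6
    d_model: int = 384
    d_ff: int = 1536


def estimate_params(cfg: Config) -> int:
    """Estimate trainable parameters under the assumptions stated above."""
    d = cfg.d_model
    tok_emb = cfg.vocab_size * d          # shared with the output head (assumed tied)
    pos_emb = cfg.context_length * d      # learned positions (assumed)
    attn = 4 * (d * d + d)                # Q, K, V and output projections, with biases
    ffn = (d * cfg.d_ff + cfg.d_ff) + (cfg.d_ff * d + d)
    norms = 2 * (2 * d)                   # two LayerNorms per block (weight + bias)
    final_norm = 2 * d
    return tok_emb + pos_emb + cfg.n_layers * (attn + ffn + norms) + final_norm


print(f"{estimate_params(Config()) / 1e6:.2f}M")
```

Under these assumptions the estimate is 16,889,856 parameters, which is within about 0.01M of the reported 16.88M. The small gap would be consistent with slightly different bias choices in the actual model.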

Dataset

Training dataset:

roneneldan/TinyStories

Training subset:

200,000 train examples
5,000 validation examples
44.1M training tokens
980K validation tokens

The tokenizer is trained directly on the dataset using a custom BPE pipeline.
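
For readers unfamiliar with BPE training, the core loop can be sketched in plain Python. This is a simplified character-level illustration, not the repository's tokenizer: the function names are hypothetical, and details such as byte-level handling, special tokens, and pre-tokenization are omitted.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    a, b = pair
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        merges.append(best)
        words = merge_pair(words, best)
    return merges


print(train_bpe("aab aab aac", 2))  # [('a', 'a'), ('aa', 'b')]
```

A real run would continue merging until the vocabulary reaches the target size (16,000 in this project's configuration).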

Training Results

Metric	Value
GPU	Tesla T4
Parameters	16.88M
Training Steps	30,000
Validation Loss	1.633
Validation Perplexity	5.12
Generation Length	150 tokens
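
The two validation figures in the table are related: perplexity is the exponential of the mean per-token cross-entropy loss, measured in nats. A short check, assuming the reported loss is a natural-log average (the function name is illustrative):

```python
import math


def perplexity(mean_nll: float) -> float:
    """Perplexity from mean per-token negative log-likelihood (in nats)."""
    return math.exp(mean_nll)


print(round(perplexity(1.633), 2))  # 5.12
```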

Example Generations

Generated with the deployed Telegram bot, with the generation length set to 150 tokens (/tokens 150).

Prompt

A curious little robot wanted to learn how humans make friends

Output

A curious little robot wanted to learn how humans make friends, so he asked his mom, "Can I speak to those who are bigger?"

His mom said, "You have to be careful, Tim. Some people are bigger if they are small."

Tim was still excited, but he knew he had to be careful...

Prompt

Every weekend we should go to

Output

Every weekend we should go to the doctor and see if someone else is hurt.

Anna and Max nod. They say "Yes!"

Mom and Dad help them. They put some cream on their wounds...

Prompt

Once upon a time there was a small machine that wanted to understand people

Output

Once upon a time there was a small machine that wanted to understand people.

The machine went to a store. The store had lots of things that weighed...

The machine was very proud of itself.

Telegram Bot

The trained model is deployed through a Telegram interface.

Commands:

/start
/settings
/tokens 150
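
The /tokens command takes the generation length as an argument. One way such a command could be parsed is sketched below. This is not the bot's actual handler: the function name, the default of 150, and the clamp to the 256-token context length are all assumptions made for the example.

```python
def parse_tokens_command(text, default=150, max_tokens=256):
    """Return the requested token count, or None if `text` is not a /tokens command.

    Hypothetical helper: falls back to `default` when the argument is missing
    or not an integer, and clamps the result to the range [1, max_tokens].
    """
    parts = text.strip().split()
    if not parts or parts[0] != "/tokens":
        return None
    if len(parts) < 2:
        return default
    try:
        requested = int(parts[1])
    except ValueError:
        return default
    return max(1, min(requested, max_tokens))


print(parse_tokens_command("/tokens 150"))   # 150
print(parse_tokens_command("/tokens 9999"))  # 256
print(parse_tokens_command("/start"))        # None
```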

Run:

PYTHONPATH=. python scripts/run_bot.py

Installation

Clone the repository:

git clone https://github.com/AbdullohML/SmallGPT.git
cd SmallGPT

Install dependencies:

pip install -r requirements.txt

If using Git LFS:

git lfs pull

Run inference:

PYTHONPATH=. python scripts/generate.py \
--prompt "Once upon a time"

Repository Structure

SmallGPT/
├── configs/
├── data/
├── runs/
│   └── smallgpt/
├── src/
├── bot/
├── scripts/
├── notebooks/
└── README.md

Future Work

instruction fine-tuning
GPT-2 fine-tuning comparison
LoRA adaptation
RAG integration
Docker deployment
