SmallGPT

A GPT-style decoder-only Transformer language model built from scratch in PyTorch.

SmallGPT implements the core components behind modern autoregressive language models: custom BPE tokenization, causal self-attention, next-token prediction training, checkpointed inference, and Telegram deployment.


Project Overview

This project explores how GPT-style language models work under the hood by implementing a decoder-only Transformer from first principles.

Implemented:

  • decoder-only Transformer architecture
  • causal masked multi-head self-attention
  • custom BPE tokenizer
  • next-token prediction training
  • mixed precision training (AMP)
  • gradient accumulation
  • validation + perplexity evaluation
  • checkpointed inference
  • Telegram bot deployment
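To make the causal masking item above concrete, here is a minimal single-head sketch in pure Python. It is an illustration only, not the repository's PyTorch implementation; the function name `causal_attention` and the toy inputs are assumptions made for this example.

```python
import math

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors (lists of floats); q and k share dimension d.
    Position t may only attend to positions 0..t.
    """
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [sum(a * b for a, b in zip(q_t, k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exps = [math.exp(x - m) for x in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out

# Toy example: two positions, two dimensions.
q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = causal_attention(q, k, v)
```

Here `out[0]` equals `v[0]` because the first position can only see itself, while `out[1]` is a mix of both value vectors. A multi-head version would run this per head on slices of the model dimension and concatenate the results.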

Architecture

Configuration:

vocab_size: 16000
context_length: 256

n_layers: 6
n_heads: 6
d_model: 384
d_ff: 1536
dropout: 0.1

Model size:

16.88M parameters
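
As a rough sanity check on the reported size, the parameter count can be estimated from the configuration above. The layout below is an assumption (GPT-2-style blocks, learned positional embeddings, bias terms everywhere, and an output head tied to the token embedding); the repository may differ in these details, and `estimate_params` is a hypothetical helper.

```python
def estimate_params(vocab_size, context_length, n_layers, d_model, d_ff):
    """Rough parameter count for a GPT-2-style decoder (assumed layout)."""
    tok_emb = vocab_size * d_model        # token embedding, assumed tied with the output head
    pos_emb = context_length * d_model    # learned positional embedding (assumption)
    attn = 4 * (d_model * d_model + d_model)                    # Q, K, V, output projections with biases
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # two linear layers with biases
    norms = 2 * 2 * d_model               # two LayerNorms per block (scale and shift)
    final_norm = 2 * d_model              # LayerNorm before the output head
    return tok_emb + pos_emb + n_layers * (attn + ffn + norms) + final_norm

total = estimate_params(vocab_size=16000, context_length=256,
                        n_layers=6, d_model=384, d_ff=1536)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the estimate is 16,889,856, within about 0.1% of the reported 16.88M. The small gap suggests the actual model differs slightly from this layout, for example in which layers carry bias terms. The number of heads does not appear in the formula because splitting `d_model` across heads does not change the projection sizes.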

Dataset

Training dataset:

roneneldan/TinyStories

Training subset:

200,000 train examples
5,000 validation examples
44.1M training tokens
980K validation tokens

The tokenizer is trained directly on the dataset using a custom BPE pipeline.
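
For readers unfamiliar with BPE training, the core merge loop can be sketched as follows. This is a toy character-level version written for illustration; the function name `train_bpe`, the tie-breaking rule, and the sample words are assumptions, and the repository's pipeline is likely to differ (for example in pre-tokenization and byte handling).

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Learn BPE merges from a list of words (toy, character-level start)."""
    # Each word is a tuple of symbols, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; break ties deterministically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Replace every occurrence of the pair with the merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```

On this tiny corpus the first two merges build up the shared prefix `low`. Repeating the loop until the vocabulary reaches the target size (16000 in the configuration above) yields the final merge table.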


Training Results

Metric                   Value
GPU                      Tesla T4
Parameters               16.88M
Training Steps           30,000
Validation Loss          1.633
Validation Perplexity    5.12
Generation Length        150 tokens
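
The two validation numbers are related: perplexity is the exponential of the mean per-token cross-entropy. The short check below assumes the reported loss is measured in nats, which is the default for PyTorch's cross-entropy loss.

```python
import math

val_loss = 1.633  # reported validation loss, assumed to be mean cross-entropy in nats
perplexity = math.exp(val_loss)
print(round(perplexity, 2))
```

This reproduces the reported perplexity of 5.12.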

Example Generations

Generated using the deployed Telegram bot (/tokens 150).

Prompt

A curious little robot wanted to learn how humans make friends

Output

A curious little robot wanted to learn how humans make friends, so he asked his mom, "Can I speak to those who are bigger?"

His mom said, "You have to be careful, Tim. Some people are bigger if they are small."

Tim was still excited, but he knew he had to be careful...

Prompt

Every weekend we should go to

Output

Every weekend we should go to the doctor and see if someone else is hurt.

Anna and Max nod. They say "Yes!"

Mom and Dad help them. They put some cream on their wounds...

Prompt

Once upon a time there was a small machine that wanted to understand people

Output

Once upon a time there was a small machine that wanted to understand people.

The machine went to a store. The store had lots of things that weighed...

The machine was very proud of itself.

Telegram Bot

The trained model is deployed through a Telegram interface.

Commands:

/start
/settings
/tokens 150
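
As an illustration of how a command such as `/tokens 150` might be handled, here is a small parsing sketch. It is not taken from the bot's source: `parse_tokens_command` is a hypothetical helper, and the `max_tokens` cap is an assumed safeguard rather than a documented limit.

```python
def parse_tokens_command(text, max_tokens=256):
    """Return the requested token count, or None if the message is not a valid /tokens command.

    max_tokens is a hypothetical cap, not a documented limit of the bot.
    """
    parts = text.strip().split()
    if len(parts) != 2 or parts[0] != "/tokens":
        return None
    if not parts[1].isdecimal():
        return None
    n = int(parts[1])
    return n if 1 <= n <= max_tokens else None

requested = parse_tokens_command("/tokens 150")
```

Returning `None` for malformed input lets the caller reply with a usage hint instead of raising an error inside the message handler.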

Run:

PYTHONPATH=. python scripts/run_bot.py

Telegram demo:

[Screenshot: Telegram Demo 1]

[Screenshot: Telegram Demo 2]


Installation

Clone the repository:

git clone https://github.com/AbdullohML/SmallGPT.git
cd SmallGPT

Install dependencies:

pip install -r requirements.txt

If using Git LFS:

git lfs pull

Run inference:

PYTHONPATH=. python scripts/generate.py \
--prompt "Once upon a time"

Repository Structure

SmallGPT/
├── configs/
├── data/
├── runs/
│   └── smallgpt/
├── src/
├── bot/
├── scripts/
├── notebooks/
└── README.md

Future Work

  • instruction fine-tuning
  • GPT-2 fine-tuning comparison
  • LoRA adaptation
  • RAG integration
  • Docker deployment
