This was heavily inspired by and based on the "Attention Is All You Need" paper.
- Custom Transformer architecture (decoder-only GPT)
- Custom Byte-Pair Encoding (BPE) tokenizer training pipeline
- Hackable attention implementation with stackable blocks
- Inference script (wrapper.py) made to be extremely accessible
- Can reach GPT-2 level if trained for long enough with the right data
monogpt is a lightweight Transformer implementation for GPT training and inference in PyTorch. It is designed to be a drop-in solution: raw text in, generative model out. It handles the entire pipeline, from raw data ingestion to a generative model, with an inference script included.
Simply create a "data/" folder, put in the text you want the model to learn from, and run wrapper.py!
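Under the hood, the pipeline first trains a BPE tokenizer on your data; the console log below shows the fast path using the Hugging Face tokenizers library. As a rough illustration of that step, here is a minimal sketch of training a 16,000-token BPE tokenizer with that library (file names here are examples, not necessarily what wrapper.py uses):

# bpe_sketch.py - illustrative only; paths and names are assumptions
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import ByteLevel

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = ByteLevel()                      # byte-level pre-tokenization
trainer = BpeTrainer(vocab_size=16000, special_tokens=["[UNK]"])
tokenizer.train(files=["data/my_book.txt"], trainer=trainer)
tokenizer.save("build/wrapper_w_tokenizer.json")           # the log below loads a file like this
print(tokenizer.encode("hello world").ids)                 # token ids the model trains on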
ubuntu@ubuntu:~$ python3 wrapper.py
>>>> Loading entire dataset to RAM...
Loaded 269352791 characters from 2 files.
>>>> Loaded 269352791 characters.
>>>> Using Hugging faces tokenizer! (i love rust implementations)
>>>> Found and loaded tokenizers weights on ./build/wrapper_w_tokenizer.json.
>>>> No existing model found at ./build/monogpt_v1.pth.
>>>> Starting fresh training run...
Using device: cuda
>>>> Starting to encode dataset with tokenizer(size=16000)
>>>> Finished tokenizing dataset (took 402.57 sec)
>>>> Saved computed tokenized dataset to ./build/tokenized_dataset.bin
>>>> Passing tokenized dataset(len=64673413) to tensor
>>>> Dataset is on: cpu
>>>> Data Tensor Shape: torch.Size([64673413]), using 90.0% for training
MONOGPT Initialized with:
CONTEXT_SIZE=512
VOCAB_SIZE=16000
EMBEDDING_DIM=384
BLOCK_NUM=6
HEADS_PER_BLOCK=6
TOTAL_TRANSFORMER_HEADS=36
(Most of these hyperparameters are the same as those used in the Attention Is All You Need paper.)
- Positional Embeddings: Learnable absolute positions
- Attention: Causal Self-Attention (lower-triangular mask; see the sketch after this list)
- Activation: GELU (Gaussian Error Linear Unit)
- Optimization: AdamW with Weight Decay
- Regularization: Dropout & LayerNorm
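To make the attention bullet above concrete, here is a minimal sketch of a single causal self-attention head with a lower-triangular mask. It illustrates the same idea monogpt uses, but it is not the exact code from nn.py:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One masked attention head - a sketch, not monogpt's exact implementation."""
    def __init__(self, embedding_dim: int, head_size: int, context_size: int, dropout: float):
        super().__init__()
        self.key = nn.Linear(embedding_dim, head_size, bias=False)
        self.query = nn.Linear(embedding_dim, head_size, bias=False)
        self.value = nn.Linear(embedding_dim, head_size, bias=False)
        # lower-triangular mask: position t can only attend to positions <= t
        self.register_buffer("tril", torch.tril(torch.ones(context_size, context_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape                                        # batch, time, embedding dim
        k, q, v = self.key(x), self.query(x), self.value(x)     # (B, T, head_size)
        scores = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # scaled dot-product scores
        scores = scores.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        weights = self.dropout(F.softmax(scores, dim=-1))
        return weights @ v                                       # (B, T, head_size)

With the default settings, each head's size would be EMBEDDING_DIM // HEADS_PER_BLOCK = 384 // 6 = 64.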
- The current version of monogpt is well suited to small and medium datasets: the network is not so complex that it quickly overfits or takes very long to train.
- It still has real depth, though. With the current parameters, training takes roughly 2-10 GB of VRAM, mainly because of the AdamW optimizer state and PyTorch's memory overhead (see the parameter-count sketch after this list).
It's very hackable, so:
- Lower the parameters for faster training (and lower VRAM usage) and for simpler datasets (the model won't learn very niche details in the data).
- Increase the parameters for more complex datasets and the potential to learn more connections between words, giving industry-standard and even GPT-2-level results.
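Before changing the sizes, a quick parameter count is an easy sanity check for VRAM: AdamW keeps two extra states per parameter, so optimizer memory alone is roughly three times the parameter memory, before activations are counted. A small sketch, assuming MONOGPT can be imported from nn.py as shown in the next snippet:

from nn import MONOGPT

model = MONOGPT(vocab_size=16000)
n_params = sum(p.numel() for p in model.parameters())
print(f"model has {n_params / 1e6:.1f}M parameters")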
# nn.py
import torch.nn as nn

class MONOGPT(nn.Module):
    def __init__(self, vocab_size: int, verbose: bool = False):
        super().__init__()
        self.verbose = verbose
        self.CONTEXT_SIZE: int = 512     # how many tokens the network sees before predicting the next one
        self.EMBEDDING_DIM: int = 384    # dimensionality each token is represented in
        self.BLOCK_NUM: int = 6          # number of transformer blocks
        self.HEADS_PER_BLOCK: int = 6    # number of attention heads in each block
        self.DROPOUT: float = 0.2        # standard feed-forward block dropout (as defined in the paper)

# 1. Install dependencies
pip install torch numpy
pip install tokenizers # if you want faster tokenization;
                       # otherwise it will automatically use monogpt's slower implementation
# 2. Add your text files to /data
cp my_book.txt ./data/ # more than 100 MB of data is advised for learning!
# 3. Run the pipeline (Train -> Chat)
python wrapper.py
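Once training finishes, wrapper.py drops into generation/chat. The sampling it performs looks roughly like the sketch below; the function and argument names are illustrative and assume the model returns logits of shape (batch, time, vocab_size), which may differ from the actual wrapper.py:

# generation sketch - illustrative only, not the actual wrapper.py code
import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=200, device="cuda"):
    idx = torch.tensor([tokenizer.encode(prompt).ids], dtype=torch.long, device=device)
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -model.CONTEXT_SIZE:]             # crop to the context window
        logits = model(idx_cond)                            # (1, T, VOCAB_SIZE)
        probs = torch.softmax(logits[:, -1, :], dim=-1)     # distribution over the next token
        next_id = torch.multinomial(probs, num_samples=1)   # sample rather than taking argmax
        idx = torch.cat([idx, next_id], dim=1)
    return tokenizer.decode(idx[0].tolist())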