ved1beta/GPU-sanghathan
GPU-sanghathan

A tiny proof-of-concept (POC) implementation of distributed training for sequential deep learning models, built with plain NumPy & mpi4py.

Currently implements:

  • Sequential models / deep MLPs, training using SGD.
  • Data parallel training with interleaved communication & computation, similar to PyTorch's DistributedDataParallel.
  • Pipeline parallel training:
    • Naive schedule without interleaved stages.
    • GPipe schedule with interleaved FWD & interleaved BWD.
    • (soon) PipeDream Flush schedule with additional inter-FWD & BWD interleaving.
  • Any combination of DP & PP algorithms.
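Data-parallel training boils down to averaging gradients across replicas after each backward pass. The following is a minimal in-process sketch of that all-reduce step, not the repo's actual code: the real implementation would call mpi4py's `Allreduce` across ranks, and `allreduce_mean` is a hypothetical helper name.

```python
import numpy as np

def allreduce_mean(grads_per_rank):
    """Average per-layer gradients across simulated data-parallel ranks.

    In a real run each rank holds only its own gradient list and the
    averaging is done with an MPI Allreduce; here we simulate all
    ranks in one process to show the arithmetic.
    """
    # zip(*...) groups the k-th layer's gradient from every rank together
    return [np.mean(np.stack(layer_grads), axis=0)
            for layer_grads in zip(*grads_per_rank)]

# two simulated ranks, each with gradients for two layers
rank0 = [np.array([1.0, 3.0]), np.array([[2.0]])]
rank1 = [np.array([3.0, 1.0]), np.array([[4.0]])]
avg = allreduce_mean([rank0, rank1])
# each replica would then apply the same averaged gradients in its SGD step
```

Because every replica applies identical averaged gradients, the model weights stay bit-for-bit synchronized without ever broadcasting parameters.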
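The naive and GPipe schedules differ mainly in how micro-batches are ordered per pipeline stage. Below is a rough sketch of a per-stage GPipe work order; the function name and `("fwd", mb)` step encoding are illustrative assumptions, not the repo's actual API.

```python
def gpipe_schedule(n_stages, n_microbatches):
    """Per-stage GPipe work order: run every micro-batch forward,
    then every micro-batch backward. FWD and BWD phases are not
    interleaved with each other (that is what a PipeDream-Flush
    style schedule would add)."""
    schedule = {}
    for stage in range(n_stages):
        fwd = [("fwd", mb) for mb in range(n_microbatches)]
        # backward consumes stashed activations in reverse micro-batch order
        bwd = [("bwd", mb) for mb in reversed(range(n_microbatches))]
        schedule[stage] = fwd + bwd
    return schedule

# 2 pipeline stages, 3 micro-batches per mini-batch
sched = gpipe_schedule(2, 3)
```

Note that every stage must stash activations for all `n_microbatches` forward passes before the first backward runs, which is the memory cost GPipe trades for better utilization than the naive schedule.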

Setup

python -m venv venv
source venv/bin/activate
pip install -e .
# M1 Macs: conda install "libblas=*=*accelerate"
python data.py
pytest

Usage

# Sequential training
python train.py
# Data parallel distributed training
mpirun -n 4 python train.py --dp 4
# Pipeline parallel distributed training
mpirun -n 4 python train.py --pp 4 --schedule naive
# Data & pipeline parallel distributed training
mpirun -n 8 python train.py --dp 2 --pp 4 --schedule gpipe
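Combining `--dp` and `--pp` means carving the flat MPI rank space into a 2-D grid, as in the last command above (8 ranks = 2 data-parallel replicas x 4 pipeline stages). A sketch of one possible dp-major mapping follows; the grouping convention is an assumption for illustration, not necessarily the repo's actual layout.

```python
def rank_grid(world_size, dp, pp):
    """Map each flat MPI rank to (dp_replica, pp_stage) coordinates.

    dp-major convention (assumed): consecutive ranks fill one pipeline
    before starting the next replica, so ranks 0..pp-1 form replica 0,
    ranks pp..2*pp-1 form replica 1, and so on.
    """
    assert world_size == dp * pp, "world size must equal dp * pp"
    return {rank: divmod(rank, pp) for rank in range(world_size)}

# mpirun -n 8 ... --dp 2 --pp 4
grid = rank_grid(8, dp=2, pp=4)
```

Under this convention, pipeline sends/receives happen between ranks with adjacent `pp_stage` in the same replica, while gradient all-reduces happen across ranks that share a `pp_stage` (one per replica).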

Internals
