aakritipp/Lenet5

Git link: https://github.com/prajjwalmehta123/Lenet5

LeNet-5 Implementation in C++

This repository contains a modern C++ implementation of the LeNet-5 convolutional neural network architecture, designed for MNIST digit classification. The implementation features both CPU (OpenMP) and GPU (CUDA) acceleration options.

Features

  • Complete LeNet-5 architecture implementation
  • MNIST dataset support
  • OpenMP parallel processing for CPU acceleration
  • Optional CUDA support for GPU acceleration
  • Batch processing capability
  • Adam optimizer implementation
  • Modular design with separate layer implementations

Prerequisites

Required:

  • C++17 compatible compiler
  • CMake (minimum version 3.16)
  • OpenMP

Optional:

  • CUDA Toolkit (for GPU acceleration)
  • Compatible NVIDIA GPU

Building the Project

  1. Clone the repository:
     git clone [repository-url]
     cd Lenet5
  2. Create a build directory:
     mkdir build
     cd build
  3. Configure with CMake:

     For a CPU-only build:

     cmake ..

     For a GPU-enabled build:

     cmake -DUSE_CUDA=ON ..
  4. Build the project:
     make

Usage

  1. Set environment variables for the MNIST dataset paths:
     export MNIST_IMAGES_PATH=/path/to/train-images-idx3-ubyte
     export MNIST_LABELS_PATH=/path/to/train-labels-idx1-ubyte
     export MNIST_TEST_IMAGES_PATH=/path/to/t10k-images-idx3-ubyte
     export MNIST_TEST_LABELS_PATH=/path/to/t10k-labels-idx1-ubyte
  2. Run the executable from the build directory:
     ./lenet5
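
For orientation, here is a minimal sketch of how a program could pick up the four environment variables listed above. The helper names `read_env`, `MnistPaths`, and `load_mnist_paths` are hypothetical and are not taken from this repository; only the variable names come from the README.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or std::nullopt when the variable is unset or empty.
std::optional<std::string> read_env(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') {
        return std::nullopt;
    }
    return std::string(value);
}

// Hypothetical container for the four dataset paths.
struct MnistPaths {
    std::string train_images;
    std::string train_labels;
    std::string test_images;
    std::string test_labels;
};

// Returns all four paths, or std::nullopt if any variable is missing.
std::optional<MnistPaths> load_mnist_paths() {
    auto train_images = read_env("MNIST_IMAGES_PATH");
    auto train_labels = read_env("MNIST_LABELS_PATH");
    auto test_images = read_env("MNIST_TEST_IMAGES_PATH");
    auto test_labels = read_env("MNIST_TEST_LABELS_PATH");
    if (!train_images || !train_labels || !test_images || !test_labels) {
        return std::nullopt;
    }
    return MnistPaths{*train_images, *train_labels, *test_images, *test_labels};
}
```

Failing early when a variable is missing tends to give a clearer error than a later file-open failure on an empty path.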

Dataset

The implementation uses the MNIST dataset, which must be downloaded separately.

The dataset should be in the IDX format:

  • Training images: train-images-idx3-ubyte
  • Training labels: train-labels-idx1-ubyte
  • Test images: t10k-images-idx3-ubyte
  • Test labels: t10k-labels-idx1-ubyte
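
As an illustration of the IDX layout, the sketch below parses the headers of an image file and a label file from an in-memory byte buffer. IDX headers store 32-bit big-endian integers; an idx3-ubyte file starts with the magic number 0x00000803 followed by the image count, rows, and columns, and an idx1-ubyte file starts with 0x00000801 followed by the label count. The function and struct names are hypothetical, and the repository's own loader may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads a 32-bit big-endian unsigned integer starting at `offset`.
std::uint32_t read_be32(const std::vector<unsigned char>& bytes, std::size_t offset) {
    if (offset + 4 > bytes.size()) {
        throw std::runtime_error("IDX header truncated");
    }
    return (static_cast<std::uint32_t>(bytes[offset]) << 24) |
           (static_cast<std::uint32_t>(bytes[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(bytes[offset + 2]) << 8) |
           static_cast<std::uint32_t>(bytes[offset + 3]);
}

// Hypothetical struct holding the fields of an idx3-ubyte header.
struct IdxImageHeader {
    std::uint32_t count;
    std::uint32_t rows;
    std::uint32_t cols;
};

// Parses the 16-byte header of an image file (magic 0x00000803).
IdxImageHeader parse_idx3_header(const std::vector<unsigned char>& bytes) {
    constexpr std::uint32_t kImageMagic = 0x00000803;
    if (read_be32(bytes, 0) != kImageMagic) {
        throw std::runtime_error("not an idx3-ubyte file");
    }
    IdxImageHeader header{};
    header.count = read_be32(bytes, 4);
    header.rows = read_be32(bytes, 8);
    header.cols = read_be32(bytes, 12);
    return header;
}

// Parses the 8-byte header of a label file (magic 0x00000801)
// and returns the number of labels.
std::uint32_t parse_idx1_header(const std::vector<unsigned char>& bytes) {
    constexpr std::uint32_t kLabelMagic = 0x00000801;
    if (read_be32(bytes, 0) != kLabelMagic) {
        throw std::runtime_error("not an idx1-ubyte file");
    }
    return read_be32(bytes, 4);
}
```

The pixel or label bytes follow immediately after the header, one unsigned byte per value.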

Implementation Details

Network Architecture

  • Input Layer: 32x32 grayscale images (padded from 28x28)
  • C1: Convolutional layer (6 feature maps, 5x5 kernels)
  • S2: Average pooling layer (2x2)
  • C3: Convolutional layer (16 feature maps, 5x5 kernels)
  • S4: Average pooling layer (2x2)
  • F5: Fully connected layer (120 neurons)
  • F6: Fully connected layer (84 neurons)
  • Output: Fully connected layer (10 neurons)
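
The layer sizes above imply the following shape arithmetic. The sketch assumes stride-1 convolutions without padding and non-overlapping 2x2 pooling, as in the classic LeNet-5 description; the README does not state strides, so treat those as assumptions. It also assumes the 28x28 input is zero-padded with a 2-pixel border on each side. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Output side length of a stride-1 convolution without padding (assumed).
constexpr int conv_out(int in, int kernel) { return in - kernel + 1; }

// Output side length of non-overlapping pooling (assumed stride == window).
constexpr int pool_out(int in, int window) { return in / window; }

constexpr int kInput = 32;
constexpr int kC1 = conv_out(kInput, 5);    // 6 maps of 28x28
constexpr int kS2 = pool_out(kC1, 2);       // 6 maps of 14x14
constexpr int kC3 = conv_out(kS2, 5);       // 16 maps of 10x10
constexpr int kS4 = pool_out(kC3, 2);       // 16 maps of 5x5
constexpr int kFlattened = 16 * kS4 * kS4;  // inputs to the 120-neuron layer

static_assert(kFlattened == 400, "16 maps of 5x5 flatten to 400 values");

constexpr std::size_t kImageSide = 28;
constexpr std::size_t kPaddedSide = 32;

// Copies a 28x28 image into the centre of a zero-filled 32x32 buffer.
std::array<float, kPaddedSide * kPaddedSide> pad_to_32(
    const std::array<float, kImageSide * kImageSide>& image) {
    std::array<float, kPaddedSide * kPaddedSide> padded{};  // zero-initialized
    constexpr std::size_t border = (kPaddedSide - kImageSide) / 2;
    for (std::size_t r = 0; r < kImageSide; ++r) {
        for (std::size_t c = 0; c < kImageSide; ++c) {
            padded[(r + border) * kPaddedSide + (c + border)] = image[r * kImageSide + c];
        }
    }
    return padded;
}
```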

Optimization

  • Adam optimizer with configurable parameters
  • ReLU activation functions
  • OpenMP parallelization for CPU
  • Optional CUDA acceleration for GPU
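
To make the optimizer concrete, here is a minimal sketch of one Adam update with bias correction, together with a ReLU helper. The struct names, the default hyperparameters (learning rate 1e-3, beta1 0.9, beta2 0.999, epsilon 1e-8), and the OpenMP loop are assumptions chosen for illustration; the repository's optimizer may use different defaults and a different layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hyperparameter bundle; defaults follow common Adam settings.
struct AdamConfig {
    float lr = 1e-3f;
    float beta1 = 0.9f;
    float beta2 = 0.999f;
    float eps = 1e-8f;
};

// Per-parameter-tensor optimizer state: first and second moment estimates
// plus the step counter used for bias correction.
struct AdamState {
    std::vector<float> m;
    std::vector<float> v;
    int t = 0;
};

// Applies one Adam step to `weights` in place, given gradients `grads`.
void adam_step(std::vector<float>& weights, const std::vector<float>& grads,
               AdamState& state, const AdamConfig& cfg = AdamConfig{}) {
    assert(weights.size() == grads.size());
    if (state.m.empty()) {
        state.m.assign(weights.size(), 0.0f);
        state.v.assign(weights.size(), 0.0f);
    }
    ++state.t;
    const float correction1 = 1.0f - std::pow(cfg.beta1, static_cast<float>(state.t));
    const float correction2 = 1.0f - std::pow(cfg.beta2, static_cast<float>(state.t));
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(weights.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const float g = grads[i];
        state.m[i] = cfg.beta1 * state.m[i] + (1.0f - cfg.beta1) * g;
        state.v[i] = cfg.beta2 * state.v[i] + (1.0f - cfg.beta2) * g * g;
        const float m_hat = state.m[i] / correction1;
        const float v_hat = state.v[i] / correction2;
        weights[i] -= cfg.lr * m_hat / (std::sqrt(v_hat) + cfg.eps);
    }
}

// ReLU activation: passes positive values through and clamps the rest to zero.
inline float relu(float x) { return x > 0.0f ? x : 0.0f; }
```

Each element is updated independently, which is why the loop can be parallelized without synchronization. The `#ifdef _OPENMP` guard keeps the code compiling when OpenMP is not enabled.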

Performance

  • CPU Version: Utilizes OpenMP for parallel processing
  • GPU Version: Supports CUDA acceleration (requires -DUSE_CUDA=ON during build)
  • Batch sizes:
    • CPU: Default batch size of 64
    • GPU: Default batch size of 256
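
One way the two default batch sizes could be selected at compile time is sketched below. It assumes the `USE_CUDA` CMake option is forwarded to the compiler as a preprocessor definition of the same name, which this README does not confirm; `kDefaultBatchSize`, `num_batches`, and `batch_bounds` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Assumption: the build passes -DUSE_CUDA to the compiler for GPU builds.
#ifdef USE_CUDA
constexpr std::size_t kDefaultBatchSize = 256;
#else
constexpr std::size_t kDefaultBatchSize = 64;
#endif

// Number of batches needed to cover `total` samples; the last may be partial.
constexpr std::size_t num_batches(std::size_t total, std::size_t batch_size) {
    return (total + batch_size - 1) / batch_size;
}

// Half-open sample range [begin, end) covered by batch number `index`.
std::pair<std::size_t, std::size_t> batch_bounds(std::size_t index, std::size_t total,
                                                 std::size_t batch_size) {
    assert(batch_size > 0);
    const std::size_t begin = std::min(index * batch_size, total);
    const std::size_t end = std::min(begin + batch_size, total);
    return {begin, end};
}
```

A training loop would then iterate `index` from 0 to `num_batches(total, kDefaultBatchSize)` and process the samples in each returned range.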
