A Python implementation of a fully connected deep neural network trained on the MNIST dataset without using deep learning frameworks.
This project builds a deep neural network entirely from scratch using NumPy and trains it on the MNIST handwritten digit dataset to classify digits from 0 to 9. The goal is to understand the inner workings of neural networks, including forward propagation, backpropagation, weight initialization, and gradient descent, without relying on high-level deep learning libraries.
Python
NumPy
scikit-learn
OpenML (MNIST dataset)
Custom implementation of a deep neural network
Two hidden layers with sigmoid activation
Softmax output layer for multi-class classification
Manual forward propagation and backpropagation
Stochastic Gradient Descent (SGD) training
One-hot encoded target labels
Accuracy evaluation on test data
Loaded and preprocessed the MNIST dataset.
Normalized image pixel values to the range [0, 1].
One-hot encoded the digit labels.
Initialized network weights using scaled random values.
Performed forward propagation to generate predictions.
Computed weight gradients using backpropagation.
Updated weights using gradient descent.
Repeated the process over multiple epochs.
Evaluated model accuracy after each epoch.
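The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names `normalize_pixels` and `one_hot` are hypothetical, and a tiny synthetic array stands in for the real MNIST data.

```python
import numpy as np

def normalize_pixels(x):
    """Scale 8-bit pixel intensities from [0, 255] to [0, 1]."""
    return np.asarray(x, dtype=np.float32) / 255.0

def one_hot(labels, num_classes=10):
    """Return a (n_samples, num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Synthetic stand-in for MNIST: one all-black and one all-white 784-pixel image.
x_demo = normalize_pixels(np.array([[0] * 784, [255] * 784]))
y_demo = one_hot([3, 7])  # labels assumed to be integers already
```

Keeping inputs in [0, 1] keeps the sigmoid pre-activations in a range where the gradient is not close to zero, which is why normalization matters for training stability.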
How neural networks work internally without using deep learning frameworks
The importance of data normalization for training stability
Implementation of forward propagation and backpropagation
How gradient descent updates weights in a neural network
Handling multi-class classification using softmax and one-hot encoding
Debugging shape mismatches and numerical stability issues
- sizes defines the number of neurons in each layer.
- epochs defines how many times the model iterates over the entire dataset.
- l_rate is the learning rate that controls weight updates.
- initialization() is called to initialize network weights.
- sigmoid is used in hidden layers.
- When derivative=True, the derivative of the sigmoid function is returned instead of its value.
- softmax converts output layer values into class probabilities.
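A possible shape for the two activation functions is sketched below. The `derivative` flag follows the description above; subtracting the maximum inside `softmax` is an assumed numerical-stability trick that the project may or may not use.

```python
import numpy as np

def sigmoid(x, derivative=False):
    """Logistic sigmoid; returns its derivative when derivative=True."""
    s = 1.0 / (1.0 + np.exp(-x))
    if derivative:
        return s * (1.0 - s)
    return s

def softmax(x):
    """Convert a column of scores into probabilities that sum to 1."""
    # Shifting by the max leaves the result unchanged but avoids overflow in exp.
    exps = np.exp(x - np.max(x, axis=0, keepdims=True))
    return exps / np.sum(exps, axis=0, keepdims=True)

scores = np.array([[2.0], [1.0], [0.1]])
probs = softmax(scores)
```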
- Input layer size is 784 (28×28 images).
- First hidden layer contains 128 neurons.
- Second hidden layer contains 64 neurons.
- Output layer contains 10 neurons (digits 0–9).
- Weight matrices W1, W2, and W3 are created.
- Random initialization is scaled to prevent vanishing or exploding gradients.
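The initialization described above might look like the sketch below. The exact scale factor is not stated in this README, so the `sqrt(1 / fan_in)` scaling and the `seed` parameter here are assumptions chosen for illustration.

```python
import numpy as np

def initialization(sizes, seed=0):
    """Create W1, W2, W3 with shape (layer_out, layer_in) for each layer pair."""
    rng = np.random.default_rng(seed)
    params = {}
    for i in range(1, len(sizes)):
        fan_in, fan_out = sizes[i - 1], sizes[i]
        # Assumed scaling: shrink standard-normal draws by sqrt(1 / fan_in)
        # so pre-activations keep a similar variance from layer to layer.
        params[f"W{i}"] = rng.standard_normal((fan_out, fan_in)) * np.sqrt(1.0 / fan_in)
    return params

params = initialization([784, 128, 64, 10])
```

Storing each matrix as (outputs, inputs) lets the forward pass multiply `W @ A` directly on column vectors.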
- Input sample is stored in A0.
- Reshaped into a column vector for matrix multiplication.
Lines 47–48: Hidden layer 1
- Z1 is computed using matrix multiplication of W1 and A0.
- A1 is obtained by applying the sigmoid activation function.
Lines 50–51: Hidden layer 2
- Z2 is computed using W2 and A1.
- A2 is obtained using the sigmoid function.
- Z3 is computed using W3 and A2.
- A3 is obtained using the softmax function.
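Putting the three layers together, the forward pass could be sketched as below. The names `A0` to `A3` and `Z1` to `Z3` follow the walkthrough; the function signature, the `cache` dictionary, and the random weights are assumptions made so the example runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    exps = np.exp(x - np.max(x, axis=0, keepdims=True))
    return exps / np.sum(exps, axis=0, keepdims=True)

def forward_pass(params, x):
    """Run one sample through the network and keep every intermediate value."""
    cache = {"A0": np.asarray(x, dtype=float).reshape(-1, 1)}  # column vector
    cache["Z1"] = params["W1"] @ cache["A0"]   # hidden layer 1
    cache["A1"] = sigmoid(cache["Z1"])
    cache["Z2"] = params["W2"] @ cache["A1"]   # hidden layer 2
    cache["A2"] = sigmoid(cache["Z2"])
    cache["Z3"] = params["W3"] @ cache["A2"]   # output layer
    cache["A3"] = softmax(cache["Z3"])
    return cache["A3"], cache

# Illustrative random weights and input, not trained values.
rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
params = {f"W{i}": rng.standard_normal((sizes[i], sizes[i - 1])) / np.sqrt(sizes[i - 1])
          for i in (1, 2, 3)}
probs, cache = forward_pass(params, rng.random(784))
```

The intermediate values are kept because backpropagation needs each `Z` and `A` again.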
- Network parameters are accessed.
- change_w dictionary stores gradients.
- Error is computed as the difference between predicted output and true label.
- Gradient for W3 is calculated using A2.
Lines 65–66: Hidden layer 2 gradients
- Error is propagated backward using the transpose of W3.
- Sigmoid derivative is applied to Z2.
- Gradient for W2 is calculated.
Lines 68–69: Hidden layer 1 gradients
- Error is propagated using the transpose of W2.
- Sigmoid derivative is applied to Z1.
- Gradient for W1 is calculated.
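The backward pass described above can be sketched as follows. It assumes the output error is simply `A3 - y`, which is the gradient of cross-entropy loss through a softmax output; the project's own code may include extra factors. The function signatures and random weights are illustrative.

```python
import numpy as np

def sigmoid(x, derivative=False):
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s) if derivative else s

def softmax(x):
    exps = np.exp(x - np.max(x, axis=0, keepdims=True))
    return exps / np.sum(exps, axis=0, keepdims=True)

def forward_pass(params, x):
    cache = {"A0": np.asarray(x, dtype=float).reshape(-1, 1)}
    cache["Z1"] = params["W1"] @ cache["A0"]
    cache["A1"] = sigmoid(cache["Z1"])
    cache["Z2"] = params["W2"] @ cache["A1"]
    cache["A2"] = sigmoid(cache["Z2"])
    cache["Z3"] = params["W3"] @ cache["A2"]
    cache["A3"] = softmax(cache["Z3"])
    return cache

def backward_pass(params, cache, y):
    """Return a dict of gradients, one per weight matrix."""
    change_w = {}
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # Output layer: predicted probabilities minus the one-hot label.
    error = cache["A3"] - y
    change_w["W3"] = error @ cache["A2"].T
    # Hidden layer 2: push the error back through W3, then through sigmoid at Z2.
    error = (params["W3"].T @ error) * sigmoid(cache["Z2"], derivative=True)
    change_w["W2"] = error @ cache["A1"].T
    # Hidden layer 1: push the error back through W2, then through sigmoid at Z1.
    error = (params["W2"].T @ error) * sigmoid(cache["Z1"], derivative=True)
    change_w["W1"] = error @ cache["A0"].T
    return change_w

rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
params = {f"W{i}": rng.standard_normal((sizes[i], sizes[i - 1])) / np.sqrt(sizes[i - 1])
          for i in (1, 2, 3)}
x = rng.random(784)
y = np.eye(10)[3]  # one-hot label for digit 3
grads = backward_pass(params, forward_pass(params, x), y)
```

Each gradient has the same shape as the weight matrix it updates, which is a quick way to catch shape mismatches.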
- Model processes one training sample at a time.
- Forward propagation generates predictions.
- Backpropagation computes gradients.
- Weights are updated using gradient descent.
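The weight update in the last step is plain gradient descent: each matrix moves a small step against its gradient. A minimal sketch, where the function name `update_network_parameters` and the toy values are hypothetical:

```python
import numpy as np

def update_network_parameters(params, change_w, l_rate):
    """Apply W <- W - l_rate * dW in place for every weight matrix."""
    for key, grad in change_w.items():
        params[key] -= l_rate * grad

# Toy example with a single 2x3 weight matrix.
params = {"W1": np.ones((2, 3))}
change_w = {"W1": np.full((2, 3), 0.5)}
update_network_parameters(params, change_w, l_rate=0.1)
```

In stochastic gradient descent this update runs once per training sample, right after that sample's forward and backward passes.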
- Model predictions are compared with true labels.
- Accuracy is computed on the validation dataset.
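Accuracy here means the fraction of samples whose highest-probability class matches the true label. The sketch below takes a `predict` callable as a stand-in for the network's forward pass; that parameter and the toy data are assumptions for illustration.

```python
import numpy as np

def compute_accuracy(predict, x_val, y_val):
    """Fraction of samples where argmax of the prediction equals argmax of the label."""
    correct = [np.argmax(predict(x)) == np.argmax(y) for x, y in zip(x_val, y_val)]
    return float(np.mean(correct))

# Toy data: "predictions" are the inputs themselves, labels are one-hot rows.
x_val = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.6, 0.3, 0.1]])
y_val = np.eye(3)[[1, 0, 2, 1]]
accuracy = compute_accuracy(lambda x: x, x_val, y_val)  # 3 of 4 correct
```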
- Displays epoch number, elapsed time, and accuracy.
- A DeepNeuralNetwork object is created with layer sizes [784, 128, 64, 10].