Skip to content

Man1ac-1773/minigrad

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

minigrad

Cool-Image

A scalar-valued automatic differentiation engine built from scratch in C++17. Implements reverse-mode autodiff over a dynamically constructed computation graph, with a neural network library on top. Header-only.

Inspired by Karpathy's micrograd, reimplemented independently in C++ over ~2 days.

What is this?

Modern deep learning frameworks like PyTorch and TensorFlow automatically compute gradients for you. This project implements that mechanism from scratch at the scalar level - every number is a node in a computation graph, and gradients flow backward through the graph via the chain rule.

This is not a production library. It's a learning project that demonstrates a complete, working autodiff in C++.

How it works

The Value class

Every scalar is wrapped in a Value object. Operations on Value objects don't just compute results - they record the computation graph.

Value a(2.0);
Value b(3.0);
Value c = a * b;  // c.node->data = 6.0
                  // c knows its inputs are a and b
                  // c knows how to backprop through multiplication
c.backward();     // dc/da = 3.0, dc/db = 2.0

Memory model

The core design problem in an autograd engine is keeping graph nodes alive for the backward pass. Expressions like:

Value loss = (pred - target) ^ 2;

create intermediate nodes that need to outlive the expression. Solved using shared_ptr<Node> - each Value is a lightweight handle to a heap-allocated Node. Copying a Value is cheap and shares the same underlying node, keeping the graph alive as long as any handle exists.

This also makes parameter updates intuitive:

// This works correctly even on copies:
for (auto& p : net.params()) {
    p -= eta * p.get_grad(); // mutates the underlying node
}

Backpropagation

backward() builds a topological ordering of the graph and traverses it in reverse, calling each node's stored _backward lambda to accumulate gradients.

void backward() {
    // build topological order
    // set output gradient to 1.0
    // traverse in reverse, call _backward() on each node
}

Features

  • Header-only - just include and compile
  • Operators - +, -, *, /, ^ (scalar and Value overloads)
  • Activations - tanh, sigmoid, relu
  • Neural network library - Neuron, Layer, NeuralNet
  • Loss functions - MSE (single sample and batched)
  • Computation graph visualization - via Graphviz
  • Batched training - feedforward and loss over multiple samples

Demo - XOR

XOR is the classic test for nonlinear learning. It cannot be solved by a linear model, requiring hidden layer representations.

NeuralNet net({2, 4, 1}); // 2 inputs, 4 hidden, 1 output

vector<vector<Value>> inp = {{1,1},{1,0},{0,1},{0,0}};
vector<vector<Value>> ans = {{0},{1},{1},{0}};

for (uint i = 0; i < 5000; i++) {
    auto out = net.feedforward_batch(inp);
    Value loss = net.mse_loss_batch(out, ans);
    loss.backward();
    for (auto& p : net.params())
        p -= 0.1 * p.get_grad();
    net.zero_grad();
}
// outputs: ~0.01, ~0.99, ~0.99, ~0.002
// loss : ~ 0.0003

Computation Graph

Graphviz visualization of the computation graph for a simple expression:

f = (a*b + c)^2

Example graph

Generate your own:

build_dot(loss.node, "graph.dot");
system("dot -Tpng graph.dot -o graph.png");

Build

Header-only. No dependencies beyond the standard library and Graphviz for visualization.

g++ -std=c++17 main.cpp -o main && ./main

For graph visualization, install Graphviz:

# Ubuntu
sudo apt install graphviz

# macOS
brew install graphviz

What I learned

Autograd design

The interesting engineering problem is graph lifetime management. In Python, garbage collection handles this transparently. In C++, you have to be explicit. The shared_ptr design means Value copies are O(1) and the graph stays alive for backprop without manual memory management.

Activation functions and initialization are coupled

Building this, I ran into vanishing gradients, dead neurons, and initialization sensitivity firsthand - the same problems that stalled deep learning research for decades.

  • Sigmoid - gradients shrink by 75% per layer minimum. Needed lr=10 just to compensate. Impractical for deep nets.
  • ReLU - neurons initialized in the negative region output zero, receive zero gradient, never recover. With uniform [-1,1] init, half the network was dead from step one.
  • Tanh - derivative peaks at 1.0 near zero, and uniform [-1,1] init keeps neurons in that region at startup. Gradients flow cleanly.

This is what Glorot & Bengio formalized in their 2010 paper. I rediscovered it by accident.

Gradient flow through the graph

Writing each backward pass by hand makes the chain rule concrete. The multiplication backward:

a_node->grad += b_node->data * out_node->grad;
b_node->grad += a_node->data * out_node->grad;

is just d(a*b)/da = b written in code. Doing this for every operation makes backprop feel less like magic.

Project Structure

minigrad/
│
├── src/
│   ├── value.h      # Value class, operators, activations
│   ├── engine.h     # Neuron, Layer, NeuralNet
│   └── graph.h      # Graphviz dot file generation
│
├── tests/
│   ├── xor         
│   │   ├── xor.cpp             # XOR demonstration with neural net 
│   │   └── xor-graph.png       # Computational graph of XOR 
│   └── simple-graphs     
│       ├── simple-graph.cpp    # Simple expression computation 
│       └── simple-graph.png    # Associated computation graph 
│
├── main.cpp                    # empty example file 
└── README.md

References

About

A scalar-valued automatic differentiation engine built from scratch in C++17.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages