● ML Foundations· Education·2026Live
Backpropagation from First Principles
The chain rule by hand — backprop derived and implemented in pure Python, no autograd, no NumPy in the gradient path.
The brief
I'd used loss.backward() a thousand times without being able to derive what it does on a whiteboard. This project closes that gap: backpropagation written out from the chain rule, in pure Python, where every gradient is a line of code I had to justify.
The build
- A 2-3-2 toy network — small enough that every weight's gradient can be checked by hand on paper
- Forward pass as explicit composition — each layer stores its inputs and pre-activations, because backprop is just bookkeeping about what you saw on the way forward
- Backward pass derived per-layer — ∂L/∂w written out for the output layer, then the recursive delta rule for hidden layers; no vectorization until the math was right scalar-by-scalar
- Gradient checking — every analytic gradient verified against central-difference numerical gradients to 1e-7 before trusting it
What I learned
Backprop is not magic and it is not even hard — it's the chain rule plus disciplined caching. The thing that is hard is the indexing. Writing the scalar version first, then vectorizing, is the only order that worked for me.