diff --git a/posts/2015-08-Backprop/index.html b/posts/2015-08-Backprop/index.html
index c94b9f0..db717f4 100644
--- a/posts/2015-08-Backprop/index.html
+++ b/posts/2015-08-Backprop/index.html
@@ -136,7 +136,7 @@

Factoring Paths

Instead of just naively summing over the paths, it would be much better to factor them:

\[\frac{\partial Z}{\partial X} = (\alpha + \beta + \gamma)(\delta + \epsilon + \zeta)\]

This is where “forward-mode differentiation” and “reverse-mode differentiation” come in. They’re algorithms for efficiently computing the sum by factoring the paths. Instead of summing over all of the paths explicitly, they compute the same sum more efficiently by merging paths back together at every node. In fact, both algorithms touch each edge exactly once!
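As a small numeric illustration (the edge values below are hypothetical, not from the post), summing the derivative products over all nine paths gives the same answer as the factored form, but the factored form does far less work:

```python
# Path factoring sketch with hypothetical edge derivatives.
# Edges X -> Y: alpha, beta, gamma; edges Y -> Z: delta, epsilon, zeta.
alpha, beta, gamma = 2.0, 3.0, 5.0
delta, epsilon, zeta = 7.0, 11.0, 13.0

# Naive: sum the product of edge derivatives over all 3 * 3 = 9 paths.
naive = sum(a * b for a in (alpha, beta, gamma)
                  for b in (delta, epsilon, zeta))

# Factored: merge the paths back together at the middle node.
factored = (alpha + beta + gamma) * (delta + epsilon + zeta)

print(naive, factored)  # both equal 10 * 31 = 310
```

With three edges per layer this saves little, but with millions of paths through a deep graph the factored sum is the difference between tractable and hopeless.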

-

Forward-mode differentiation starts at an input to the graph and moves towards the end. At every node, it sums all the paths feeding in. Each of those paths represents one way in which the input affects that node. By adding them up, we get the total way in which the node is affected by the input, it’s derivative.

+

Forward-mode differentiation starts at an input to the graph and moves towards the end. At every node, it sums all the paths feeding in. Each of those paths represents one way in which the input affects that node. By adding them up, we get the total way in which the node is affected by the input, its derivative.
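The forward sweep described above can be sketched in a few lines. This is a minimal illustration on a made-up graph (the node names and edge derivatives are assumptions for the example): each node's derivative with respect to the input is the sum, over incoming edges, of the parent's derivative times the local edge derivative, so every edge is touched exactly once.

```python
# Forward-mode differentiation sketch on a tiny hypothetical DAG.
# edges maps each node to (parent, local edge derivative) pairs.
edges = {
    "Y": [("X", 2.0), ("X", 3.0)],   # two edges from X into Y
    "Z": [("Y", 4.0), ("X", 5.0)],   # Z fed by Y and directly by X
}

deriv = {"X": 1.0}                   # dX/dX = 1 seeds the sweep
for node in ("Y", "Z"):              # visit nodes in topological order
    # Sum all paths feeding into this node.
    deriv[node] = sum(deriv[parent] * d for parent, d in edges[node])

print(deriv["Z"])                    # dZ/dX = (1*2 + 1*3)*4 + 1*5 = 25.0
```

Note that a single sweep gives the derivative of *every* node with respect to the one input X, which is exactly the trade-off the post goes on to contrast with reverse-mode differentiation.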