From 8a33167029820cd66c1864e0ce11e03c608d6d92 Mon Sep 17 00:00:00 2001
From: Jonathon Cai
Date: Wed, 2 Aug 2017 08:11:08 -0700
Subject: [PATCH] fix typo

---
 posts/2015-08-Backprop/index.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/posts/2015-08-Backprop/index.html b/posts/2015-08-Backprop/index.html
index c94b9f0..db717f4 100644
--- a/posts/2015-08-Backprop/index.html
+++ b/posts/2015-08-Backprop/index.html
@@ -136,7 +136,7 @@

Factoring Paths

Instead of just naively summing over the paths, it would be much better to factor them:

\[\frac{\partial Z}{\partial X} = (\alpha + \beta + \gamma)(\delta + \epsilon + \zeta)\]

This is where “forward-mode differentiation” and “reverse-mode differentiation” come in. They’re algorithms for efficiently computing the sum by factoring the paths. Instead of summing over all of the paths explicitly, they compute the same sum more efficiently by merging paths back together at every node. In fact, both algorithms touch each edge exactly once!

-Forward-mode differentiation starts at an input to the graph and moves towards the end. At every node, it sums all the paths feeding in. Each of those paths represents one way in which the input affects that node. By adding them up, we get the total way in which the node is affected by the input, it's derivative.

+Forward-mode differentiation starts at an input to the graph and moves towards the end. At every node, it sums all the paths feeding in. Each of those paths represents one way in which the input affects that node. By adding them up, we get the total way in which the node is affected by the input, its derivative.
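The forward-mode pass described in the patched paragraph can be sketched with dual numbers, where each value carries its derivative with respect to one chosen input and derivatives of incoming paths are summed at every node. This is an illustrative sketch, not code from the post; the `Dual` class name and the example function are assumptions.

```python
# Minimal sketch of forward-mode differentiation via dual numbers.
# Each node carries (value, derivative w.r.t. one seeded input).
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        # Sum rule: the derivatives of all paths feeding in simply add up.
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule: each incoming edge is handled exactly once.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

# Example (hypothetical): differentiate f(x, y) = (x + y) * x
# with respect to x at (x, y) = (3, 2).
x = Dual(3.0, 1.0)  # seed dx/dx = 1
y = Dual(2.0, 0.0)  # dy/dx = 0
z = (x + y) * x
# f = x^2 + x*y, so df/dx = 2x + y = 8 at (3, 2)
print(z.value, z.deriv)  # 15.0 8.0
```

Seeding a different input's derivative to 1 gives that input's derivative instead, which is why forward mode needs one pass per input, while reverse mode gets all input derivatives of one output in a single backward pass.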