Machine Learning

===Learning Rate===
===Hessian===
Second-order optimization methods (e.g. Newton's method) rely on the Hessian. For high-dimensional models the full Hessian is too expensive to form and store, so Hessian-vector products <math>Hv</math> are computed instead.<br>
;How to calculate <math>Hv</math>?<br>
Use finite differencing of the gradient <math>g(\theta) = \nabla_\theta f(\theta)</math> along the direction <math>\mathbf{v}</math>:<br>
*<math> H_{\theta} \mathbf{v} = \lim_{\epsilon \rightarrow 0} \frac{g(\theta + \epsilon \mathbf{v}) - g(\theta)}{\epsilon}</math>
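A minimal NumPy sketch of this approximation; <code>hvp_finite_diff</code>, <code>grad_fn</code>, and the quadratic test function are illustrative names, not part of any particular library:
<syntaxhighlight lang="python">
import numpy as np

def hvp_finite_diff(grad_fn, theta, v, eps=1e-5):
    """Approximate H(theta) @ v by forward-differencing the gradient g,
    following the limit definition above. A central difference,
    (g(theta + eps*v) - g(theta - eps*v)) / (2*eps), is more accurate."""
    return (grad_fn(theta + eps * v) - grad_fn(theta)) / eps

# Check on f(x) = 0.5 * x^T A x with symmetric A: grad f(x) = A x, so H = A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
theta = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(hvp_finite_diff(lambda x: A @ x, theta, v))  # ~ A @ v = [3.5, 4.5]
</syntaxhighlight>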
;How to calculate <math>H_{\theta}^{-1}v</math>?
Use gradient descent to minimize <math>f(x) = \frac{1}{2}x^T H x - v^T x</math>; each gradient evaluation needs only the Hessian-vector product <math>Hx</math>, so <math>H</math> never has to be formed explicitly.<br>
By first-order optimality, the minimum occurs where the gradient is zero (the minimum is unique when <math>H</math> is positive definite).<br>
* <math>\nabla_{x} \left(\frac{1}{2}x^T H x - v^T x\right) = \frac{1}{2}(H + H^T)x - v = Hx - v</math>, since the Hessian is symmetric (<math>H^T = H</math>)
* <math>\implies x^* = H^{-1}v</math>
* Using [https://math.stackexchange.com/questions/222894/how-to-take-the-gradient-of-the-quadratic-form Gradient of quadratic form]
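A minimal NumPy sketch of this approach, assuming <math>H</math> is symmetric positive definite; <code>solve_hinv_v</code>, the step size, and the iteration count are illustrative choices (conjugate gradient would converge faster in practice):
<syntaxhighlight lang="python">
import numpy as np

def solve_hinv_v(hvp, v, lr=0.1, steps=500):
    """Approximate H^{-1} v by gradient descent on
    f(x) = 0.5 x^T H x - v^T x, whose gradient is H x - v.
    Only Hessian-vector products are required, never H itself."""
    x = np.zeros_like(v)
    for _ in range(steps):
        x -= lr * (hvp(x) - v)  # gradient step on f
    return x

# Check against a direct solve, using an explicit SPD matrix as H.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
v = np.array([1.0, 1.0])
print(solve_hinv_v(lambda u: H @ u, v))  # ~ [0.2, 0.4]
print(np.linalg.solve(H, v))             # [0.2, 0.4]
</syntaxhighlight>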
==SVM==
[http://cs229.stanford.edu/notes/cs229-notes3.pdf Andrew Ng Notes]<br>