===Learning Rate===
===Hessian===
Some optimization methods may rely on the Hessian.<br>
;How to calculate <math>Hv</math>
Use finite differencing with directional derivatives:<br>
*<math> H_{\theta} \mathbf{v} = \lim_{\epsilon \rightarrow 0} \frac{g(\theta + \epsilon \mathbf{v}) - g(\theta)}{\epsilon}</math>
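A minimal sketch of this finite-difference Hessian-vector product, assuming a simple quadratic objective whose gradient <code>grad</code> is known (the matrix <code>A</code>, test point, and step size <code>eps</code> are illustrative choices, not from the original):

```python
import numpy as np

# Illustrative objective f(theta) = 0.5 * theta^T A theta, so grad(theta) = A @ theta
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad(theta):
    """Gradient g(theta) of the illustrative objective."""
    return A @ theta

def hvp(theta, v, eps=1e-6):
    """Approximate H(theta) @ v by finite differencing the gradient along v."""
    return (grad(theta + eps * v) - grad(theta)) / eps

theta = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(hvp(theta, v))  # should be close to A @ v, since the Hessian here is A
```

Note that this never forms the full Hessian; it only costs two gradient evaluations, which is the point of the trick.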
;How to calculate <math>H_{\theta}^{-1}v</math>?
Use gradient descent to minimize <math>\frac{1}{2}x^T H x - v^T x</math>; each descent step only needs the product <math>Hx</math>, which can be computed with the finite-difference trick above.<br>
By first-order optimality, the minimum occurs where the gradient is zero.<br>
* <math>\nabla_{x} (\frac{1}{2}x^T H x - v^T x) = \frac{1}{2}(H + H^T)x - v = Hx - v</math> (since the Hessian <math>H</math> is symmetric)
* <math>\implies x^* = H^{-1}v</math>
* Using [https://math.stackexchange.com/questions/222894/how-to-take-the-gradient-of-the-quadratic-form Gradient of quadratic form]
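The derivation above can be sketched in code: gradient descent on the quadratic converges to <math>H^{-1}v</math>. The matrix <code>H</code>, learning rate, and step count below are illustrative assumptions (in practice <code>H @ x</code> would be replaced by a Hessian-vector product):

```python
import numpy as np

# Illustrative symmetric positive-definite H and right-hand side v
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])

def solve_Hinv_v(H, v, lr=0.1, steps=1000):
    """Minimize 0.5 x^T H x - v^T x by gradient descent.

    The gradient is H @ x - v (H symmetric), so the minimizer is H^{-1} v.
    """
    x = np.zeros_like(v)
    for _ in range(steps):
        g = H @ x - v      # gradient of the quadratic objective
        x = x - lr * g
    return x

x_star = solve_Hinv_v(H, v)
print(x_star)  # should be close to np.linalg.solve(H, v)
```

Gradient descent converges here because <code>H</code> is positive definite and the learning rate is below <math>2/\lambda_{\max}(H)</math>.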
==SVM==
[http://cs229.stanford.edu/notes/cs229-notes3.pdf Andrew Ng Notes]<br>