Machine Learning

===Learning Rate===
===Hessian===
Second-order optimization methods (e.g. Newton's method) rely on the Hessian. For high-dimensional models the full Hessian is too expensive to form and store, so Hessian-vector products <math>Hv</math> are computed instead.<br>
;How to calculate <math>Hv</math>?<br>
Use finite differencing of the gradient <math>g(\theta) = \nabla_\theta f(\theta)</math> along the direction <math>\mathbf{v}</math>:<br>
*<math> H_{\theta} \mathbf{v} = \lim_{\epsilon \rightarrow 0} \frac{g(\theta + \epsilon \mathbf{v}) - g(\theta)}{\epsilon}</math>
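A minimal NumPy sketch of this approximation; <code>hvp_finite_diff</code>, <code>grad_fn</code>, and the quadratic test function are illustrative names, not part of any particular library:
<syntaxhighlight lang="python">
import numpy as np

def hvp_finite_diff(grad_fn, theta, v, eps=1e-5):
    """Approximate H(theta) @ v by forward-differencing the gradient g,
    following the limit definition above. A central difference,
    (g(theta + eps*v) - g(theta - eps*v)) / (2*eps), is more accurate."""
    return (grad_fn(theta + eps * v) - grad_fn(theta)) / eps

# Check on f(x) = 0.5 * x^T A x with symmetric A: grad f(x) = A x, so H = A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
theta = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(hvp_finite_diff(lambda x: A @ x, theta, v))  # ~ A @ v = [3.5, 4.5]
</syntaxhighlight>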
;How to calculate <math>H_{\theta}^{-1}v</math>?
Use gradient descent to minimize <math>f(x) = \frac{1}{2}x^T H x - v^T x</math>; each gradient evaluation needs only the Hessian-vector product <math>Hx</math>, so <math>H</math> never has to be formed explicitly.<br>
By first-order optimality, the minimum occurs where the gradient is zero (the minimum is unique when <math>H</math> is positive definite).<br>
* <math>\nabla_{x} \left(\frac{1}{2}x^T H x - v^T x\right) = \frac{1}{2}(H + H^T)x - v = Hx - v</math>, since the Hessian is symmetric (<math>H^T = H</math>)
* <math>\implies x^* = H^{-1}v</math>
* Using [https://math.stackexchange.com/questions/222894/how-to-take-the-gradient-of-the-quadratic-form Gradient of quadratic form]
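A minimal NumPy sketch of this approach, assuming <math>H</math> is symmetric positive definite; <code>solve_hinv_v</code>, the step size, and the iteration count are illustrative choices (conjugate gradient would converge faster in practice):
<syntaxhighlight lang="python">
import numpy as np

def solve_hinv_v(hvp, v, lr=0.1, steps=500):
    """Approximate H^{-1} v by gradient descent on
    f(x) = 0.5 x^T H x - v^T x, whose gradient is H x - v.
    Only Hessian-vector products are required, never H itself."""
    x = np.zeros_like(v)
    for _ in range(steps):
        x -= lr * (hvp(x) - v)  # gradient step on f
    return x

# Check against a direct solve, using an explicit SPD matrix as H.
H = np.array([[3.0, 1.0], [1.0, 2.0]])
v = np.array([1.0, 1.0])
print(solve_hinv_v(lambda u: H @ u, v))  # ~ [0.2, 0.4]
print(np.linalg.solve(H, v))             # [0.2, 0.4]
</syntaxhighlight>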
==SVM==
[http://cs229.stanford.edu/notes/cs229-notes3.pdf Andrew Ng Notes]<br>