5,321
edits
(One intermediate revision by the same user not shown) | |||
Line 28: | Line 28: | ||
<math> | <math> | ||
\nabla_\theta J(\theta) = -\nabla_\theta \sum [(y^{(i)})\log(g(\theta^t x^{(i)})) + (1-y^{(i)})\log(1-g(\theta^t x^{(i)}))] | \nabla_\theta J(\theta) = -\nabla_\theta \sum [(y^{(i)})\log(g(\theta^t x^{(i)})) + (1-y^{(i)})\log(1-g(\theta^t x^{(i)}))] | ||
</math><br> | |||
<math> | |||
= -\sum [(y^{(i)})\frac{g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{g(\theta^t x^{(i)})}x^{(i)} + (1-y^{(i)})\frac{-g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{1-g(\theta^t x^{(i)})}x^{(i)}] | = -\sum [(y^{(i)})\frac{g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{g(\theta^t x^{(i)})}x^{(i)} + (1-y^{(i)})\frac{-g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)}))}{1-g(\theta^t x^{(i)})}x^{(i)}] | ||
</math><br> | |||
<math> | |||
= -\sum [(y^{(i)})(1-g(\theta^t x^{(i)}))x^{(i)} - (1-y^{(i)})g(\theta^t x^{(i)})x^{(i)}] | = -\sum [(y^{(i)})(1-g(\theta^t x^{(i)}))x^{(i)} - (1-y^{(i)})g(\theta^t x^{(i)})x^{(i)}] | ||
</math><br> | |||
<math> | |||
= -\sum [(y^{(i)})x^{(i)} -(y^{(i)}) g(\theta^t x^{(i)}))x^{(i)} - g(\theta^t x^{(i)})x^{(i)} + y^{(i)}g(\theta^t x^{(i)})x^{(i)}] | = -\sum [(y^{(i)})x^{(i)} -(y^{(i)}) g(\theta^t x^{(i)}))x^{(i)} - g(\theta^t x^{(i)})x^{(i)} + y^{(i)}g(\theta^t x^{(i)})x^{(i)}] | ||
</math><br> | |||
<math> | |||
= -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}] | = -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}] | ||
</math><br> | </math><br> | ||
<math> | <math> | ||
\implies \nabla^2_\theta J(\theta) = \nabla_\theta -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}] | \implies \nabla^2_\theta J(\theta) = \nabla_\theta -\sum [(y^{(i)})x^{(i)} - g(\theta^t x^{(i)})x^{(i)}] | ||
</math><br> | |||
<math> | |||
= \sum_i g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)})) x^{(i)} (x^{(i)})^T | = \sum_i g(\theta^t x^{(i)})(1-g(\theta^t x^{(i)})) x^{(i)} (x^{(i)})^T | ||
</math><br> | </math><br> | ||
Line 149: | Line 159: | ||
</math> | </math> | ||
}} | }} | ||
====Common Kernels==== | |||
; Polynomial Kernel | |||
* See [[wikipedia:Polynomial kernel]] | |||
* <math>K(x,z) = (c+x^Tz)^d</math> | |||
* For <math>d=2</math> | |||
** we have <math>(1+x^Tz)^2 = 1 + 2(x^Tz) + (x^Tz)^2</math> | |||
** <math>= 1 + 2 \sum x_i z_i + (\sum x_i z_i)(\sum x_j z_j)</math> | |||
** <math>= 1 + 2 \sum x_i z_i + 2\sum_{i < j} (x_i x_j) (z_i z_j) + \sum (x_i^2)(z_i)^2</math> | |||
** <math>= 1 + \sum (\sqrt{2}x_i) (\sqrt{2}z_i) + \sum_{i < j} (\sqrt{2} x_i x_j)(\sqrt{2} z_i z_j) + \sum x_i^2 z_i^2</math> | |||
* The dimension of the feature map associated with this kernel is exponential in d | |||
** There are <math>1+n+\binom{n}{2} + ... + \binom{n}{d} = O(n^d)</math> terms | |||
==Learning Theory== | ==Learning Theory== |