

===Lagrangians===
The goal of the SVM is to maximize the margin:<br>
<math>
\begin{aligned}
\max_{\hat{\gamma}, w, b} &\frac{\hat{\gamma}}{\Vert w \Vert}\\
\text{s.t. }& y^{(i)}(w^Tx^{(i)} + b) \geq \hat{\gamma} \quad \forall i
\end{aligned}
</math><br>
Since rescaling <math>(w, b)</math> rescales <math>\hat{\gamma}</math> by the same factor without changing the objective, we can set <math>\hat{\gamma}=1</math>; maximizing <math>1/\Vert w \Vert</math> is then equivalent to<br>
<math>
\begin{aligned}
\min_{w, b} &\Vert w \Vert ^2\\
\text{s.t. }& y^{(i)}(w^Tx^{(i)} + b) \geq 1 \quad \forall i
\end{aligned}
</math><br><br>
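This primal can be solved numerically with any off-the-shelf constrained optimizer. The snippet below is a small illustrative sketch (the toy data and the choice of SciPy's SLSQP solver are assumptions, not part of these notes) that minimizes <math>\Vert w \Vert ^2</math> subject to <math>y^{(i)}(w^Tx^{(i)} + b) \geq 1</math>:
<syntaxhighlight lang="python">
# Illustrative sketch (assumed setup): solve  min ||w||^2
# s.t.  y_i (w^T x_i + b) >= 1  with SciPy's SLSQP solver.
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data; labels must be +1 / -1.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(params):
    w = params[:-1]            # params = [w_1, ..., w_d, b]
    return w @ w               # ||w||^2 (b does not appear in the objective)

def margin_constraints(params):
    w, b = params[:-1], params[-1]
    # SLSQP expects inequality constraints written as c(params) >= 0,
    # so return y_i (w^T x_i + b) - 1 for every training point.
    return y * (X @ w + b) - 1.0

result = minimize(
    objective,
    x0=np.zeros(X.shape[1] + 1),                      # start at w = 0, b = 0
    constraints=[{"type": "ineq", "fun": margin_constraints}],
    method="SLSQP",
)
w_opt, b_opt = result.x[:-1], result.x[-1]
print("w:", w_opt, "b:", b_opt)
print("margins:", y * (X @ w_opt + b_opt))            # every entry should be >= 1
</syntaxhighlight>
At the optimum, every printed margin should be at least 1, with the closest points (the support vectors) sitting approximately on the margin.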
In general, given an optimization problem in the primal form:<br>
<math>
\begin{aligned}
\min_w & f(w)\\
\text{s.t. }& h_i(w) \leq 0 \quad \forall i\\
& g_i(w) = 0 \quad \forall i
\end{aligned}
</math><br>
we can rewrite the optimization as <br>
<math>
\min_{w}\max_{\alpha, \beta \mid \alpha \geq 0} \mathcal{L}(w, \alpha, \beta)
</math><br>
where <math>\mathcal{L}(w, \alpha, \beta) = f(w) + \sum \alpha_i h_i(w) + \sum \beta_i g_i(w)</math> is called the Lagrangian (the multipliers <math>\alpha_i \geq 0</math> attach to the inequality constraints <math>h_i</math>, and <math>\beta_i</math> to the equality constraints <math>g_i</math>).
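For the SVM problem above, the inequality constraints can be written as <math>h_i(w, b) = 1 - y^{(i)}(w^Tx^{(i)} + b) \leq 0</math>, with no equality constraints, so its Lagrangian is<br>
<math>
\mathcal{L}(w, b, \alpha) = \Vert w \Vert ^2 + \sum_i \alpha_i \left(1 - y^{(i)}(w^Tx^{(i)} + b)\right)
</math><br>
Setting the derivatives with respect to <math>w</math> and <math>b</math> to zero gives <math>2w = \sum_i \alpha_i y^{(i)} x^{(i)}</math> and <math>\sum_i \alpha_i y^{(i)} = 0</math>, which is the starting point for deriving the dual problem.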


===Kernel Trick===