Line 48: | Line 48: | ||
===Lagrangians=== | ===Lagrangians=== | ||
The goal for svm is to maximize the margin: | The goal of an SVM is to maximize the margin:<br> | ||
<math> | <math> | ||
\begin{aligned} | \begin{aligned} | ||
\max_{\gamma, w, b | \max_{\hat{\gamma}, w, b} &\frac{\hat{\gamma}}{\Vert w \Vert}\\ | ||
\text{s.t.}& y^{(i)}(w^Tx^{(i)} + b) \geq \gamma \quad \forall i | \text{s.t. }& y^{(i)}(w^Tx^{(i)} + b) \geq \hat{\gamma} \quad \forall i | ||
\end{aligned} | \end{aligned} | ||
</math> | </math><br> | ||
By fixing <math>\hat{\gamma}=1</math>, this is equivalent to:<br> | |||
<math> | |||
\begin{aligned} | |||
\min_{w, b} &\Vert w \Vert ^2\\ | |||
\text{s.t. }& y^{(i)}(w^Tx^{(i)} + b) \geq 1 \quad \forall i | |||
\end{aligned} | |||
</math><br><br> | |||
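The rescaling step can be checked numerically. The sketch below is illustrative only: the data points, the weights `w`, `b`, and the helper names `functional_margin` and `geometric_margin` are assumptions made for this example, not part of the original notes. It shows that scaling <math>(w, b)</math> so that <math>\hat{\gamma}=1</math> leaves the geometric margin <math>\hat{\gamma}/\Vert w \Vert</math> unchanged, which is why maximizing the margin reduces to minimizing <math>\Vert w \Vert^2</math>.

```python
import math

# Toy, linearly separable data (hypothetical values chosen for illustration).
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
y = [1, 1, -1, -1]

def functional_margin(w, b):
    """Smallest y_i * (w^T x_i + b) over the data set, i.e. gamma-hat."""
    return min(yi * (w[0] * xi[0] + w[1] * xi[1] + b) for xi, yi in zip(X, y))

def geometric_margin(w, b):
    """gamma-hat divided by the Euclidean norm of w."""
    return functional_margin(w, b) / math.hypot(w[0], w[1])

# An assumed separating hyperplane for the toy data.
w, b = (1.0, 1.0), -1.0

# Rescale (w, b) by c so that the functional margin becomes 1.
c = 1.0 / functional_margin(w, b)
w_scaled, b_scaled = (c * w[0], c * w[1]), c * b

print(functional_margin(w_scaled, b_scaled))                   # close to 1
print(geometric_margin(w, b), geometric_margin(w_scaled, b_scaled))  # equal up to rounding
```

Because the geometric margin is invariant under this rescaling, nothing is lost by restricting attention to solutions with <math>\hat{\gamma}=1</math>.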
In general, given an optimization problem in the (primal) form:<br> | |||
<math> | |||
\begin{aligned} | |||
\min_w & f(w)\\ | |||
\text{s.t. }& g_i(w) \leq 0 \quad \forall i\\ | |||
& h_i(w) = 0 \quad \forall i | |||
\end{aligned} | |||
</math><br> | |||
we can rewrite the optimization as <br> | |||
<math> | |||
\min_{w}\max_{\alpha, \beta \mid \alpha \geq 0} \mathcal{L}(w, \alpha, \beta) | |||
</math><br> | |||
where <math>\mathcal{L}(w, \alpha, \beta) = f(w) + \sum_i \alpha_i g_i(w) + \sum_i \beta_i h_i(w)</math> is called the Lagrangian. | |||
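To see why the min-max form recovers the constrained problem, consider a minimal one-dimensional sketch. Everything here is an assumption made for illustration: the objective `f`, a single inequality constraint `ineq(w) <= 0` (with multiplier <math>\alpha \geq 0</math>), no equality constraints, and finite grids standing in for the continuous ranges of <math>w</math> and <math>\alpha</math>. When the constraint is violated, the inner maximization drives the Lagrangian up without bound. When it is satisfied, the best choice is <math>\alpha = 0</math> and the inner maximum is just <math>f(w)</math>.

```python
def f(w):
    """Objective to minimize (assumed for this example)."""
    return w * w

def ineq(w):
    """Inequality constraint written as ineq(w) <= 0, i.e. w >= 1."""
    return 1.0 - w

def lagrangian(w, alpha):
    """f(w) plus the multiplier-weighted inequality constraint."""
    return f(w) + alpha * ineq(w)

# Finite stand-in for the set alpha >= 0.
ALPHAS = [0.0, 1.0, 10.0, 100.0, 1000.0]

def inner_max(w):
    """Approximate max over alpha >= 0 of the Lagrangian at a fixed w."""
    return max(lagrangian(w, a) for a in ALPHAS)

# Grid search over w in [0, 3] for the outer minimization.
ws = [i / 100.0 for i in range(0, 301)]
w_star = min(ws, key=inner_max)

print(w_star, inner_max(w_star))  # constrained minimizer and its objective value
print(inner_max(0.5))             # infeasible point: heavily penalized
```

On this toy problem the grid search lands on the boundary of the feasible set, matching the solution of minimizing <math>w^2</math> subject to <math>w \geq 1</math>. A real solver would not use grids, which only serve here to make the penalty behaviour visible.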
===Kernel Trick=== | ===Kernel Trick=== |