SVM: Kernelized SVM
SVM (with features)
Maximum margin principle
Slack variables allow for margin violation
$$ \begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &\|\mathbf{w}\|^{2} + C \sum_{i=1}^{N} \xi_i \\\\ \text { s.t. } \quad & y_{i}\left(\mathbf{w}^{T} \color{red}{\phi(\mathbf{x}_{i})} + b\right) \geq 1 -\xi_i, \quad \xi_i \geq 0\end{array} $$
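At the optimum, each slack variable equals the hinge loss of its point, $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^T\phi(\mathbf{x}_i) + b))$, so the objective can be evaluated without carrying the constraints around explicitly. A minimal sketch of this (toy data and all names are illustrative assumptions; $\phi$ is taken as the identity):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin objective ||w||^2 + C * sum(xi).

    At the optimum, xi_i = max(0, 1 - y_i (w^T x_i + b)) (the hinge loss),
    so the slack variables need not be tracked as separate unknowns.
    """
    margins = y * (X @ w + b)            # y_i (w^T x_i + b)
    xi = np.maximum(0.0, 1.0 - margins)  # slack = hinge loss per point
    return w @ w + C * xi.sum()

# Toy data: two points per class, phi = identity.
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
print(soft_margin_objective(np.array([0.5, 0.5]), 0.0, X, y, C=1.0))
```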
Math basics
Solve the constrained optimization problem: Method of Lagrange Multipliers
- Primal optimization problem: $$ \underset{\boldsymbol{x}}{\operatorname{argmin}}\ f(\boldsymbol{x}), \quad \text { s.t. } \quad g_{i}(\boldsymbol{x}) \geq 0 \quad \text { for } i=1 \ldots K $$
- Lagrangian optimization: one multiplier $\lambda_i \geq 0$ per constraint gives $$ L(\boldsymbol{x}, \boldsymbol{\lambda})=f(\boldsymbol{x})-\sum_{i=1}^{K} \lambda_{i} g_{i}(\boldsymbol{x}) $$
Dual optimization problem
$$ \begin{aligned} \boldsymbol{\lambda}^{*}=\underset{\boldsymbol{\lambda}}{\arg \max }\ g(\boldsymbol{\lambda}), \quad & g(\boldsymbol{\lambda})=\min_{\boldsymbol{x}} L(\boldsymbol{x}, \boldsymbol{\lambda}) \\\\ \text { s.t. } \quad \lambda_{i} \geq 0, & \text { for } i=1 \ldots K \end{aligned} $$
- $g$: dual function of the optimization problem
- Essentially, the order of min and max over $L$ is swapped: the primal solves $\min_{\boldsymbol{x}} \max_{\boldsymbol{\lambda}} L$, the dual solves $\max_{\boldsymbol{\lambda}} \min_{\boldsymbol{x}} L$
Slater's condition: for a convex objective and convex constraints (with a strictly feasible point), strong duality holds, i.e., solving the dual is equivalent to solving the primal
- I.e., the optimal primal parameters can be obtained from the optimal dual parameters (a worked example follows below) $$ \boldsymbol{x}^* = \underset{\boldsymbol{x}}{\operatorname{argmin}}\, L(\boldsymbol{x}, \boldsymbol{\lambda}^*) $$
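As a one-dimensional worked example (not from the original notes): minimize $f(x) = x^2$ subject to $x \geq 1$. Following the recipe above,

$$ \begin{aligned} L(x, \lambda) &= x^{2}-\lambda(x-1), \qquad \frac{\partial L}{\partial x}=2 x-\lambda \overset{!}{=} 0 \ \Rightarrow\ x=\frac{\lambda}{2} \\\\ g(\lambda) &= \frac{\lambda^{2}}{4}-\lambda\left(\frac{\lambda}{2}-1\right)=-\frac{\lambda^{2}}{4}+\lambda, \qquad \lambda^{*}=\underset{\lambda \geq 0}{\arg \max }\ g(\lambda)=2 \\\\ x^{*} &= \frac{\lambda^{*}}{2}=1 \end{aligned} $$

which matches the primal solution, exactly as Slater's condition promises for this convex problem.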
Dual derivation of the SVM
SVM optimization:
$$ \begin{array}{ll} \underset{\boldsymbol{w}}{\operatorname{argmin}} \quad & \frac{1}{2}\|\boldsymbol{w}\|^2 \\\\ \text { s.t. } \quad & y_i(\boldsymbol{w}^T\phi(\boldsymbol{x}_i) + b) \geq 1 \end{array} $$Lagrangian function (the factor $\frac{1}{2}$ does not change the minimizer and keeps the derivatives clean):
$$ L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}-\sum_{i} \alpha_{i}\left(y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1\right) $$Compute the optimal $\boldsymbol{w}$:
$$ \begin{align} &\frac{\partial L}{\partial \boldsymbol{w}} = \boldsymbol{w} - \sum_i \alpha_i y_i \phi(\boldsymbol{x}_i) \overset{!}{=} 0 \\\\ \Leftrightarrow \quad & \color{CornflowerBlue}{\boldsymbol{w}^* = \sum_i \alpha_i y_i \phi(\boldsymbol{x}_i)} \end{align} $$Many of the $\alpha_i$ will be zero (their constraints are satisfied with strict inequality, i.e., inactive)
If $\alpha_i \neq 0 \overset{\text{complementary slackness}}{\Rightarrow} y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1 =0$
$\Rightarrow \phi(\boldsymbol{x}_i)$ is a support vector
The optimal weight vector $\boldsymbol{w}$ is a linear combination of the support vectors! 👏
Optimality condition for $b$:
$$ \frac{\partial L}{\partial b} = - \sum_i \alpha_i y_i \overset{!}{=} 0 \quad \Rightarrow \sum_i \alpha_i y_i = 0 $$- We do not obtain a solution for $b$
- But an additional condition for $\alpha$
$b$ can be computed from $w$:
If $\alpha_i > 0$, then $\boldsymbol{x}_i$ is on the margin due to the complementary slackness condition, i.e.:
$$ \begin{align}y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1 &= 0 \\\\y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right) &= 1 \\\\ \underbrace{y_{i} y_{i}}_{=1}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right) &= y_{i} \\\\ \Rightarrow b = y_{i} - \boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)\end{align} $$
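Both identities can be verified numerically. The sketch below (toy data and all parameter choices are my own; note that sklearn stores the products $\alpha_i y_i$ in `dual_coef_`) fits a linear-kernel SVM with a very large $C$ to approximate the hard-margin case of this derivation, then rebuilds $\boldsymbol{w}$ and $b$ from the dual parameters:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

# A very large C approximates the hard-margin SVM of the derivation above.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

# w* = sum_i alpha_i y_i x_i ; sklearn's dual_coef_ holds alpha_i * y_i.
w = svm.dual_coef_ @ svm.support_vectors_
print(np.allclose(w, svm.coef_))  # True

# b = y_i - w^T x_i for a support vector x_i on the margin.
i = svm.support_[0]
b = y[i] - X[i] @ w.ravel()
print(b, svm.intercept_)  # should agree up to numerical tolerance
```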
Applying the kernel trick to the SVM
- Lagrangian: $$ L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}-\sum_{i} \alpha_{i}\left(y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1\right) $$
- Dual function (Wolfe dual Lagrangian): inserting $\boldsymbol{w}^{*}=\sum_{i} \alpha_{i} y_{i} \phi(\boldsymbol{x}_{i})$ and $\sum_{i} \alpha_{i} y_{i}=0$ into $L$ gives $$ g(\boldsymbol{\alpha})=\sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \underbrace{\phi\left(\boldsymbol{x}_{i}\right)^{T} \phi\left(\boldsymbol{x}_{j}\right)}_{=k\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)} $$ The features enter only through inner products, which is exactly where the kernel trick applies
- Wolfe dual optimization problem: $$ \begin{array}{ll} \underset{\boldsymbol{\alpha}}{\operatorname{argmax}} \quad & \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right) \\\\ \text { s.t. } \quad & \alpha_{i} \geq 0 \quad \forall i=1, \ldots, N, \quad \sum_{i} \alpha_{i} y_{i}=0 \end{array} $$
Compute primal from dual parameters:
Weight vector
$$ \boldsymbol{w}^{*}=\sum_{i} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right) $$- Cannot be represented explicitly, as $\phi(\boldsymbol{x})$ is potentially infinite-dimensional. But don't worry, we never need the explicit representation
Bias: For any $i$ with $\alpha_i > 0$: $$ b=y_{i}-\boldsymbol{w}^{* T} \phi\left(\boldsymbol{x}_{i}\right)=y_{i}-\sum_{j} \alpha_{j} y_{j} k\left(\boldsymbol{x}_{j}, \boldsymbol{x}_{i}\right) $$
- Decision function (again using the kernel trick, so the explicit representation of the weight vector $\boldsymbol{w}^*$ is never needed; a numerical sketch follows below): $$ f(\boldsymbol{x})=\operatorname{sign}\left(\sum_{i} \alpha_{i} y_{i} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}\right)+b\right) $$
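To make concrete that predictions need only kernel evaluations, the following sketch (toy data and the fixed $\gamma$ are illustrative assumptions) reproduces sklearn's decision values from the dual parameters and an explicitly coded RBF kernel:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.5, 1.0, (30, 2)), rng.normal(-1.5, 1.0, (30, 2))])
y = np.array([1] * 30 + [-1] * 30)

gamma = 0.5
svm = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def rbf(A, B):
    # k(a, b) = exp(-gamma * ||a - b||^2), evaluated pairwise.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# f(x) = sum_i alpha_i y_i k(x_i, x) + b, using only the support vectors
# (dual_coef_ holds alpha_i * y_i; all other alpha_i are zero).
X_test = rng.normal(0.0, 2.0, (5, 2))
f = svm.dual_coef_ @ rbf(svm.support_vectors_, X_test) + svm.intercept_
print(np.allclose(f.ravel(), svm.decision_function(X_test)))  # True
```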
Relaxed constraints with slack variables
Primal optimization problem
$$ \begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &\|\mathbf{w}\|^{2} + \color{CornflowerBlue}{C \sum_{i=1}^{N} \xi_i} \\\\ \text { s.t. } \quad & y_{i}\left(\mathbf{w}^{T} \phi(\mathbf{x}_{i}) + b\right) \geq 1 - \color{CornflowerBlue}{\xi_i}, \quad \color{CornflowerBlue}{\xi_i} \geq 0\end{array} $$Dual optimization problem
$$ \begin{array}{ll}\underset{\boldsymbol{\alpha}}{\operatorname{argmax}} \quad & \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} k(\boldsymbol{x}_i, \boldsymbol{x}_j ) \\\\ \text { s.t. } \quad & \color{CornflowerBlue}{C \geq} \alpha_i \geq 0 \quad \forall i = 1, \dots, N \\\\ & \sum_i \alpha_i y_i = 0\end{array} $$This adds an upper bound of $\color{CornflowerBlue}{C}$ on each $\color{CornflowerBlue}{\alpha_i}$
- Without slack variables, $\alpha_i \to \infty$ for violated constraints (misclassified points): the hard-margin dual is unbounded when the data are not separable
- The upper bound $C$ limits each $\alpha_i$, so margin violations and misclassifications are allowed but penalized; a from-scratch sketch of this dual follows below
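A from-scratch sketch of this boxed dual (my own illustration, not the notes' implementation; a general-purpose SLSQP solver stands in for the specialized SMO solvers used in practice):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(1.0, 1.0, (15, 2)), rng.normal(-1.0, 1.0, (15, 2))])
y = np.array([1.0] * 15 + [-1.0] * 15)
C, gamma, N = 1.0, 0.5, len(y)

# Gram matrix of the RBF kernel, k(x_i, x_j) = exp(-gamma ||x_i - x_j||^2).
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * d2)
Q = (y[:, None] * y[None, :]) * K  # Q_ij = y_i y_j k(x_i, x_j)

# Maximize the dual = minimize its negative, subject to
# 0 <= alpha_i <= C and sum_i alpha_i y_i = 0.
res = minimize(
    fun=lambda a: 0.5 * a @ Q @ a - a.sum(),
    x0=np.zeros(N),
    jac=lambda a: Q @ a - np.ones(N),
    bounds=[(0.0, C)] * N,
    constraints=[{"type": "eq", "fun": lambda a: a @ y, "jac": lambda a: y}],
    method="SLSQP",
)
alpha = res.x

# Bias from a margin support vector (0 < alpha_i < C).
i = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
b = y[i] - (alpha * y) @ K[:, i]
print(f"support vectors: {(alpha > 1e-6).sum()}, b = {b:.3f}")
```

The box constraint $0 \leq \alpha_i \leq C$ appears directly as the `bounds` argument, and $\sum_i \alpha_i y_i = 0$ as the equality constraint.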