SVM: Kernelized SVM
SVM (with features)
Math basics
Solve the constrained optimization problem: Method of Lagrangian Multipliers
- Primal optimization problem:
xminβ s.t. βf(x)hiβ(x)β₯biβ, for i=1β¦KβxminβΞ»maxβ s.t. βL(x,Ξ»)=f(x)ββi=1KβΞ»iβ(hiβ(x)βbiβ)Ξ»iββ₯0,i=1β¦KβDual optimization problem
Ξ»\*=Ξ»argmaxβg(Ξ»), s.t. Ξ»iββ₯0,βg(Ξ»)=min_xL(x,Ξ») for i=1β¦Kβ- g : dual function of the optimization problem
- Essentially swapped min and max in the definition of L
Slaters condition: For a convex objective and convex constraints, solving the dual is equivalent to solving the primal
- I.e., optimal primal parameters can be obtained from optimal dual parameters
xβ=xargminβL(x,Ξ»β)
Dual derivation of the SVM
SVM optimization:
βwargminβ s.t. ββ₯wβ₯2yiβ(wTΟ(xiβ)+b)β₯1βLagrangian function:
L(w,Ξ»)=21βwTwβiββΞ±iβ(yiβ(wTΟ(xiβ)+b)β1)Compute optimal w
βββwβLβ=wβiββΞ±iβyiβΟ(xiβ)=!0wβ=iββΞ±iβyiβΟ(xiβ)ββMany of the Ξ±iβ will be zero (constraint satisfied)
If Ξ±iβξ =0βcomplementary slacknessyiβ(wTΟ(xiβ)+b)β1=0
βΟ(xiβ) is a support vector
The optimal weight vector w is a linear combination of the support vectors! π
Optimality condition for b:
βbβLβ=βiββΞ±iβyiβ=!0βiββΞ±iβyiβ=0- We do not obtain a solution for b
- But an additional condition for Ξ±
b can be computed from w:
If Ξ±_i>0, then x_i is on the margin due to complementary slackness condition. I.e.:
yiβ(wTΟ(xiβ)+b)β1yiβ(wTΟ(xiβ)+b)=1yiβyiβββ(wTΟ(xiβ)+b)βb=yiββwTΟ(xiβ)β=0=1=yiβββ
Apply kernel tricks for SVM
L(w,Ξ»)=21βwTwβiββΞ±_i(y_i(wTΟ(xiβ)+b)β1),w\*=β_iΞ±iβy_iΟ(x_i)- Dual function (Wolfe Dual Lagrangian function):
g(Ξ±)β=L(wβ,Ξ±)=21βwβTwβiββjββΞ±iβΞ±jβyiβyjβΟ(xiβ)TΟ(xjβ)βββiββΞ±iβyiβ(wβjββΞ±jβyjβΟ(xjβ)ββ)TΟ(xiβ)+iββΞ±iβ=iββΞ±iββ21βiββjββΞ±iβΞ±jβyiβyjβ=k(xiβ,xjβ)Ο(xiβ)TΟ(xjβ)ββ=iββΞ±iββ21βiββjββΞ±iβΞ±jβyiβyjβk(xiβ,xjβ)β- Wolfe dual optimization problem:
Ξ±minβ s.t ββiβΞ±iββ21ββiββjβΞ±iβΞ±jβyiβyjβk(xiβ,xjβ)Ξ±iββ₯0βi=1,β¦,NβiβΞ±iβyiβ=0βRelaxed constraints with slack variable
Primal optimization problem
wargminβ s.t. ββ₯wβ₯2+CβiNβΞΎiβyiβ(wTxiβ+b)β₯1βΞΎiβ,ΞΎiββ₯0βDual optimization problem
Ξ±minβ s.t ββiβΞ±iββ21ββiββjβΞ±iβΞ±jβyiβyjβk(xiβ,xjβ)Cβ₯Ξ±iββ₯0βi=1,β¦,NβiβΞ±iβyiβ=0βAdd upper bound of C on Ξ±iβ
- Without slack, Ξ±iβββ when constraints are violated (points misclassified)
- Upper bound of C limits the Ξ±iβ, so misclassifications are allowed