No assumptions about the underlying distributions -> non-parametric
Linear decision surfaces
Begin with supervised training (the class of each training sample is given)
Linear Discriminant Functions and Decision Surfaces
A discriminant function that is a linear combination of the components of x can be written as
g(x) = w^T x + w_0
x: feature vector
w: weight vector
w_0: bias or threshold weight
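A minimal sketch of evaluating such a discriminant in NumPy; the weight vector and bias values below are invented purely for illustration:

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant: g(x) = w^T x + w_0."""
    return np.dot(w, x) + w0

# Hypothetical 2-D parameters, chosen only for illustration.
w = np.array([2.0, -1.0])   # weight vector
w0 = -3.0                   # bias (threshold weight)

x = np.array([4.0, 1.0])    # a feature vector
print(g(x, w, w0))          # 2*4 + (-1)*1 - 3 = 4.0
```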
The two-category case
Decision rule:
Decide ω_1 if g(x) > 0 ⟺ w^T x + w_0 > 0 ⟺ w^T x > −w_0
Decide ω_2 if g(x) < 0 ⟺ w^T x + w_0 < 0 ⟺ w^T x < −w_0
g(x) = 0: x can be assigned to either class, or the decision can be left undefined
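The rule translates directly into code. A sketch reusing the illustrative w and w_0 from above (leaving the tie at g(x) = 0 undefined is one of the two conventions):

```python
import numpy as np

w = np.array([2.0, -1.0])
w0 = -3.0

def classify(x):
    """Two-category decision rule for the linear discriminant."""
    gx = np.dot(w, x) + w0
    if gx > 0:
        return "omega_1"      # positive side of H -> region R_1
    if gx < 0:
        return "omega_2"      # negative side of H -> region R_2
    return None               # g(x) = 0: left undefined

print(classify(np.array([4.0, 1.0])))  # g = 4.0  -> omega_1
print(classify(np.array([0.0, 0.0])))  # g = -3.0 -> omega_2
```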
The equation g(x) = 0 defines the decision surface that separates points assigned to ω_1 from points assigned to ω_2. When g(x) is linear, this decision surface is a hyperplane.
For arbitrary x_1 and x_2 on the decision surface, we have:
w^T x_1 + w_0 = w^T x_2 + w_0 ⟹ w^T(x_1 − x_2) = 0
⟹ w is normal to any vector lying in the hyperplane.
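This orthogonality can be checked numerically. The two points below were picked by hand to satisfy g(x) = 0 for the illustrative w = (2, −1), w_0 = −3 used earlier:

```python
import numpy as np

w = np.array([2.0, -1.0])

# Both points satisfy 2*x - y - 3 = 0, i.e. they lie on the hyperplane H.
x1 = np.array([2.0, 1.0])
x2 = np.array([0.0, -3.0])

print(np.dot(w, x1 - x2))   # 0.0: w is orthogonal to the in-plane vector x1 - x2
```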
In general, the hyperplane H divides the feature space into two half-spaces:
decision region R_1 for ω_1
decision region R_2 for ω_2
Because g(x) > 0 when x is in R_1, the normal vector w points into R_1. It is therefore sometimes said that any x in R_1 is on the positive side of H, and any x in R_2 is on the negative side of H.
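A quick numerical illustration with the same hypothetical parameters as above: starting from a point on H, a short step along w makes g positive (into R_1), and a step against w makes it negative (into R_2):

```python
import numpy as np

w = np.array([2.0, -1.0])
w0 = -3.0

def g(x):
    return np.dot(w, x) + w0

x_on_H = np.array([2.0, 1.0])        # g(x_on_H) = 0, so it lies on H
step = 0.5 * w / np.linalg.norm(w)   # short step along the unit normal

print(g(x_on_H + step))   # 0.5*||w|| ≈ +1.118: along w   -> positive side (R_1)
print(g(x_on_H - step))   # ≈ -1.118:           against w -> negative side (R_2)
```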
The discriminant function g(x) gives an algebraic measure of the distance from x to the hyperplane. We can write x as
x = x_p + r · (w / ||w||)
x_p: normal projection of x onto H
r: the desired algebraic distance, positive if x is on the positive side of H and negative otherwise
Since g(x_p) = 0, substituting x = x_p + r · (w / ||w||) into g gives g(x) = r ||w||, hence r = g(x) / ||w||.
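Continuing the running sketch, r = g(x) / ||w|| and x_p = x − r · (w / ||w||) follow directly; projecting any x onto H should drive g to zero:

```python
import numpy as np

w = np.array([2.0, -1.0])
w0 = -3.0

def g(x):
    return np.dot(w, x) + w0

def signed_distance(x):
    """Algebraic distance r = g(x) / ||w|| from x to H."""
    return g(x) / np.linalg.norm(w)

def project_onto_H(x):
    """Normal projection x_p = x - r * (w / ||w||) of x onto H."""
    return x - signed_distance(x) * w / np.linalg.norm(w)

x = np.array([4.0, 1.0])
print(signed_distance(x))    # 4/sqrt(5) ≈ 1.789 > 0: x is on the positive side
print(g(project_onto_H(x)))  # ≈ 0.0: the projection x_p lies on H
```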