Linear Discriminant Functions

  • No assumptions about the underlying class distributions -> non-parametric
  • Linear decision surfaces
  • Parameters are learned by supervised training (the class label of each training sample is given)

Linear Discriminant Functions and Decision Surfaces

A discriminant function that is a linear combination of the components of $\mathbf{x}$ can be written as

$$ g(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x} + w_{0} $$

  • $\mathbf{x}$: feature vector
  • $\mathbf{w}$: weight vector
  • $w_0$: bias or threshold weight
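
As a minimal sketch, with hypothetical values for $\mathbf{w}$, $w_0$, and $\mathbf{x}$, evaluating $g(\mathbf{x})$ in NumPy looks like this:

```python
import numpy as np

# Minimal sketch: evaluate g(x) = w^T x + w_0 for one feature vector.
# The values of w, w_0, and x are hypothetical examples.
w = np.array([2.0, -1.0])   # weight vector
w0 = 0.5                    # bias / threshold weight
x = np.array([1.0, 3.0])    # feature vector

g = w @ x + w0
print(g)                    # 2*1 + (-1)*3 + 0.5 = -0.5
```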

The two-category case

Decision rule:

  • Decide $\omega_1$ if $g(\mathbf{x}) > 0 \Leftrightarrow \mathbf{w}^{T}\mathbf{x} + w_0 > 0 \Leftrightarrow \mathbf{w}^{T}\mathbf{x} > -w_0$
  • Decide $\omega_2$ if $g(\mathbf{x}) < 0 \Leftrightarrow \mathbf{w}^{T}\mathbf{x} + w_0 < 0 \Leftrightarrow \mathbf{w}^{T}\mathbf{x} < -w_0$
  • $g(\mathbf{x}) = 0$: assign $\mathbf{x}$ to either class, or leave the decision undefined

The equation $g(\mathbf{x}) = 0$ defines the decision surface that separates points assigned to $\omega_1$ from points assigned to $\omega_2$. When $g(\mathbf{x})$ is linear, this decision surface is a hyperplane.
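
A small sketch of this decision rule (again with hypothetical $\mathbf{w}$ and $w_0$; the class labels are plain strings):

```python
import numpy as np

# Two-category decision rule: decide omega_1 if g(x) > 0, omega_2 if g(x) < 0.
# w and w_0 are hypothetical example values.
w = np.array([2.0, -1.0])
w0 = 0.5

def decide(x):
    g = w @ x + w0
    if g > 0:
        return "omega_1"    # x is on the positive side of H
    if g < 0:
        return "omega_2"    # x is on the negative side of H
    return "undefined"      # x lies exactly on the decision surface

print(decide(np.array([1.0, 0.0])))   # g =  2.5 -> omega_1
print(decide(np.array([0.0, 1.0])))   # g = -0.5 -> omega_2
```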

For arbitrary $\mathbf{x}_1$ and $\mathbf{x}_2$ on the decision surface, we have:

$$ \mathbf{w}^{T}\mathbf{x}_1 + w_0 = \mathbf{w}^{T}\mathbf{x}_2 + w_0 $$
$$ \mathbf{w}^{T}(\mathbf{x}_1 - \mathbf{x}_2) = 0 $$

$\Rightarrow \mathbf{w}$ is normal to any vector lying in the hyperplane.
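
This is easy to check numerically. The sketch below picks two points on the hyperplane defined by a hypothetical $\mathbf{w}$ and $w_0$ and verifies that $\mathbf{w}^{T}(\mathbf{x}_1 - \mathbf{x}_2) = 0$:

```python
import numpy as np

# Verify that w is orthogonal to any vector lying in the hyperplane H.
# w and w_0 are hypothetical; x1 and x2 are chosen so that g(x) = 0.
w = np.array([2.0, -1.0])
w0 = 0.5

# Points satisfying 2*x - y + 0.5 = 0, i.e. y = 2*x + 0.5
x1 = np.array([0.0, 0.5])
x2 = np.array([1.0, 2.5])

print(w @ x1 + w0)      # 0.0 -> x1 lies on H
print(w @ x2 + w0)      # 0.0 -> x2 lies on H
print(w @ (x1 - x2))    # 0.0 -> w is normal to x1 - x2
```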

In general, the hyperplane $H$ divides the feature space into two half-spaces:

  • decision region $R_1$ for $\omega_1$
  • decision region $R_2$ for $\omega_2$

Because $g(\mathbf{x}) > 0$ if $\mathbf{x}$ is in $R_1$, the normal vector $\mathbf{w}$ points into $R_1$. It is therefore sometimes said that any $\mathbf{x}$ in $R_1$ is on the positive side of $H$, and any $\mathbf{x}$ in $R_2$ is on the negative side of $H$.

The discriminant function $g(\mathbf{x})$ gives an algebraic measure of the distance from $\mathbf{x}$ to the hyperplane. We can write $\mathbf{x}$ as

$$ \mathbf{x} = \mathbf{x}_p + r\,\frac{\mathbf{w}}{\|\mathbf{w}\|} $$

  • $\mathbf{x}_p$: normal projection of $\mathbf{x}$ onto $H$
  • $r$: the desired algebraic distance, positive if $\mathbf{x}$ is on the positive side of $H$ and negative otherwise

Since $\mathbf{x}_p$ lies on the hyperplane:

$$
\begin{aligned}
g(\mathbf{x}_p) &= 0 \\
\mathbf{w}^{T}\mathbf{x}_p + w_0 &= 0 \\
\mathbf{w}^{T}\!\left(\mathbf{x} - r\,\frac{\mathbf{w}}{\|\mathbf{w}\|}\right) + w_0 &= 0 \\
\mathbf{w}^{T}\mathbf{x} - r\,\frac{\mathbf{w}^{T}\mathbf{w}}{\|\mathbf{w}\|} + w_0 &= 0 \\
\mathbf{w}^{T}\mathbf{x} - r\,\|\mathbf{w}\| + w_0 &= 0 \\
\underbrace{\mathbf{w}^{T}\mathbf{x} + w_0}_{=\,g(\mathbf{x})} &= r\,\|\mathbf{w}\| \\
\Rightarrow\quad r &= \frac{g(\mathbf{x})}{\|\mathbf{w}\|}
\end{aligned}
$$
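
The same relations are easy to verify numerically. The sketch below (hypothetical $\mathbf{w}$, $w_0$, and $\mathbf{x}$) computes $r = g(\mathbf{x}) / \|\mathbf{w}\|$ and reconstructs $\mathbf{x}$ from its projection $\mathbf{x}_p$:

```python
import numpy as np

# Signed distance r from x to the hyperplane H, and the projection x_p of x onto H.
# w, w_0, and x are hypothetical example values.
w = np.array([2.0, -1.0])
w0 = 0.5
x = np.array([1.0, 3.0])

w_norm = np.linalg.norm(w)
g = w @ x + w0                # g(x) = -0.5  (x is on the negative side of H)
r = g / w_norm                # signed distance, about -0.224
x_p = x - r * w / w_norm      # normal projection of x onto H

print(r)
print(w @ x_p + w0)           # ~0.0 -> x_p lies on H
print(x_p + r * w / w_norm)   # recovers x = x_p + r * w / ||w||
```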

In particular, the signed distance from the origin to the hyperplane $H$ is given by $\frac{w_0}{\|\mathbf{w}\|}$:

  • $w_0 > 0$: the origin is on the positive side of $H$
  • $w_0 < 0$: the origin is on the negative side of $H$
  • $w_0 = 0$: $g(\mathbf{x})$ has the homogeneous form $\mathbf{w}^{T}\mathbf{x}$, and the hyperplane passes through the origin
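
These three cases follow directly from $g(\mathbf{0})/\|\mathbf{w}\| = w_0/\|\mathbf{w}\|$; a short check with a hypothetical $\mathbf{w}$:

```python
import numpy as np

# Signed distance from the origin to H is w_0 / ||w||, since g(0) = w_0.
# w is a hypothetical example value.
w = np.array([2.0, -1.0])

for w0 in (0.5, -0.5, 0.0):
    print(w0, w0 / np.linalg.norm(w))
# w_0 > 0: origin on the positive side of H
# w_0 < 0: origin on the negative side of H
# w_0 = 0: H passes through the origin
```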

A linear discriminant function divides the feature space by a hyperplane decision surface:

  • orientation: determined by the normal vector $\mathbf{w}$
  • location: determined by the bias $w_0$
