<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Non-Parametric | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/non-parametric/</link><atom:link href="https://haobin-tan.netlify.app/tags/non-parametric/index.xml" rel="self" type="application/rss+xml"/><description>Non-Parametric</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sat, 07 Nov 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Non-Parametric</title><link>https://haobin-tan.netlify.app/tags/non-parametric/</link></image><item><title>Non-parametric Machine Learning Algorithms</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/</guid><description/></item><item><title>Linear Discriminant Functions</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/linear-discriminant-functions/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/linear-discriminant-functions/</guid><description>&lt;ul>
&lt;li>No assumption about distributions -&amp;gt; &lt;strong>non-parametric&lt;/strong>&lt;/li>
&lt;li>Linear decision surfaces&lt;/li>
&lt;li>Trained by supervised learning (class labels of the training data are given)&lt;/li>
&lt;/ul>
&lt;h2 id="linear-discriminant-functions-and-decision-surfaces">Linear Discriminant Functions and Decision Surfaces&lt;/h2>
&lt;p>A discriminant function that is a linear combination of the components of $x$ can be written as
&lt;/p>
$$
g(\mathbf{x})=\mathbf{w}^{T} \mathbf{x}+w\_{0}
$$
&lt;ul>
&lt;li>$\mathbf{x}$: feature vector&lt;/li>
&lt;li>$\mathbf{w}$: weight vector&lt;/li>
&lt;li>$w\_0$: bias or threshold weight&lt;/li>
&lt;/ul>
&lt;h3 id="the-two-category-case">The two category case&lt;/h3>
&lt;p>Decision rule:&lt;/p>
&lt;ul>
&lt;li>Decide $w\_1$ if $g(\mathbf{x}) > 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}+w\_{0} > 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}> -w\_{0}$&lt;/li>
&lt;li>Decide $w\_{2}$ if $g(\mathbf{x}) &lt; 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}+w\_{0} &lt; 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}&lt;-w\_{0}$&lt;/li>
&lt;li>$g(\mathbf{x}) = 0$: can be assigned to either class, or left undefined&lt;/li>
&lt;/ul>
&lt;p>The equation $g(\mathbf{x}) = 0$ defines the decision surface that separates points assigned to $w\_{1}$ from points assigned to $w\_{2}$. When $g(\mathbf{x})$ is linear, this decision surface is a &lt;strong>hyperplane&lt;/strong>.&lt;/p>
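&lt;p>The decision rule above can be sketched in a few lines of numpy. The weight vector and bias here are hypothetical values chosen for illustration, not taken from the text:&lt;/p>

```python
import numpy as np

# Hypothetical weight vector and bias (illustrative values only)
w = np.array([2.0, 1.0])
w0 = -4.0

def g(x):
    """Linear discriminant g(x) = w^T x + w_0."""
    return w @ x + w0

def decide(x):
    """Two-category rule: w1 if g(x) > 0, w2 otherwise; ties are on H."""
    value = g(x)
    if value > 0:
        return "w1"
    if value == 0:
        return "tie"  # on the decision surface: assign arbitrarily
    return "w2"

print(decide(np.array([3.0, 3.0])))  # g = 5  -> w1
print(decide(np.array([0.0, 0.0])))  # g = -4 -> w2
```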
&lt;p>For arbitrary $\mathbf{x}\_1$ and $\mathbf{x}\_2$ on the decision surface, we have:
&lt;/p>
$$
\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{1}+w\_{0}=\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{2}+w\_{0}
$$
$$
\mathbf{w}^{\mathrm{T}}\left(\mathbf{x}\_{1}-\mathbf{x}\_{2}\right)=0
$$
&lt;p>$\Rightarrow \mathbf{w}$ is &lt;strong>normal&lt;/strong> to any vector lying in the hyperplane.&lt;/p>
&lt;p>In general, the hyperplane $H$ divides the feature space into two half-spaces:&lt;/p>
&lt;ul>
&lt;li>decision region $R\_1$ for $w\_1$&lt;/li>
&lt;li>decision region $R\_2$ for $w\_2$&lt;/li>
&lt;/ul>
&lt;p>Because $g(\mathbf{x}) > 0$ if $\mathbf{x}$ is in $R\_1$, the normal vector $\mathbf{w}$ points into $R\_1$. It is therefore sometimes said that any $\mathbf{x}$ in $R\_1$ is on the &lt;em>positive&lt;/em> side of $H$, and any $\mathbf{x}$ in $R\_2$ is on the &lt;em>negative&lt;/em> side of $H$.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image015.jpg" alt="img">&lt;/p>
&lt;p>The discriminant function $g(\mathbf{x})$ gives an algebraic measure of the distance from $\mathbf{x}$ to the hyperplane. We can write $\mathbf{x}$ as
&lt;/p>
$$
\mathbf{x}=\mathbf{x}\_{p}+r \frac{\mathbf{w}}{\|\mathbf{w}\|}
$$
&lt;ul>
&lt;li>$\mathbf{x}\_{p}$: normal projection of $\mathbf{x}$ onto $H$&lt;/li>
&lt;li>$r$: desired algebraic distance which is positive if $\mathbf{x}$ is on the positive side, else negative&lt;/li>
&lt;/ul>
&lt;p>Since $\mathbf{x}\_p$ lies on the hyperplane:&lt;/p>
$$
\begin{array}{ll}
g\left(\mathbf{x}\_{p}\right)=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{p}+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}}\left(\mathbf{x}-r \frac{\mathbf{w}}{\|\mathbf{w}\|}\right)+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}-r \frac{\mathbf{w}^{\mathrm{T}} \mathbf{w}}{\|\mathbf{w}\|}+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}-r\|\mathbf{w}\| + w\_0 = 0 \\\\
\underbrace{\mathbf{w}^{\mathrm{T}} \mathbf{x} + w\_0}\_{=g(\mathbf{x})} = r\|\mathbf{w}\| \\\\
\Rightarrow g(\mathbf{x}) = r\|\mathbf{w}\| \\\\
\Rightarrow r = \frac{g(\mathbf{x})}{\|\mathbf{w}\|}
\end{array}
$$
&lt;p>In particular, the distance from the origin to hyperplane $H$ is given by $\frac{w\_0}{\|\mathbf{w}\|}$.&lt;/p>
&lt;ul>
&lt;li>$w\_0 > 0$: the origin is on the &lt;em>positive&lt;/em> side of $H$&lt;/li>
&lt;li>$w\_0 &lt; 0$: the origin is on the &lt;em>negative&lt;/em> side of $H$&lt;/li>
&lt;li>$w\_0 = 0$: $g(\mathbf{x})$ has the homogeneous form $\mathbf{w}^{\mathrm{T}} \mathbf{x}$ and the hyperplane passes through the origin&lt;/li>
&lt;/ul>
&lt;p>A linear discriminant function divides the feature space by a hyperplane decision surface:&lt;/p>
&lt;ul>
&lt;li>orientation: determined by the normal vector $\mathbf{w}$&lt;/li>
&lt;li>location: determined by the bias $w\_0$&lt;/li>
&lt;/ul>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.byclb.com/TR/Tutorials/neural_networks/ch9_1.htm">https://www.byclb.com/TR/Tutorials/neural_networks/ch9_1.htm&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Linear Discriminant Analysis (LDA)</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/</guid><description>&lt;p>&lt;strong>Linear Discriminant Analysis (LDA)&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>also called &lt;strong>Fisher’s Linear Discriminant&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>reduces dimension (like PCA)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>but focuses on &lt;strong>maximizing separability among known categories&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="-idea">💡 Idea&lt;/h2>
&lt;ol>
&lt;li>Create a new axis&lt;/li>
&lt;li>Project the data onto this new axis in a way to maximize the separation of two categories&lt;/li>
&lt;/ol>
&lt;h2 id="how-it-works">How it works?&lt;/h2>
&lt;h3 id="create-a-new-axis">Create a new axis&lt;/h3>
&lt;p>According to two criteria (considered simultaneously):&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Maximize the distance between means&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Minimize the variation $s^2$ (which LDA calls &amp;ldquo;scatter&amp;rdquo;) within each category&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.11.22.png" alt="截屏2020-05-14 15.11.22" style="zoom:50%;" />
&lt;/li>
&lt;/ul>
&lt;p>We have:
&lt;/p>
$$
\frac{(\overbrace{\mu_1 - \mu_2}^{=: d})^2}{s_1^2 + s_2^2} \qquad\left(\frac{\text{"ideally large"}}{\text{"ideally small"}}\right)
$$
&lt;p>
&lt;strong>Why both distance and scatter are important?&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-05-14%2015.17.59.png" alt="截屏2020-05-14 15.17.59">&lt;/p>
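&lt;p>The criterion above can be evaluated directly for any candidate axis. The sketch below uses synthetic 2-D data (hypothetical classes, for illustration only) and shows that the axis joining the class means scores much better than an axis orthogonal to it:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Two hypothetical 2-D classes (synthetic data, illustration only)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=0.5, size=(50, 2))

def fisher_criterion(v, X1, X2):
    """J(v) = (mu1 - mu2)^2 / (s1^2 + s2^2) after projecting onto axis v."""
    v = v / np.linalg.norm(v)
    p1, p2 = X1 @ v, X2 @ v                # 1-D projections onto the axis
    d2 = (p1.mean() - p2.mean()) ** 2      # squared distance between means
    scatter = ((p1 - p1.mean()) ** 2).sum() + ((p2 - p2.mean()) ** 2).sum()
    return d2 / scatter

# Axis through the class means vs. an axis orthogonal to it
good = fisher_criterion(X2.mean(axis=0) - X1.mean(axis=0), X1, X2)
poor = fisher_criterion(np.array([-1.0, 3.0]), X1, X2)
print(good > poor)  # True: large mean distance AND small scatter wins
```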
&lt;h4 id="more-than-2-dimensions">More than 2 dimensions&lt;/h4>
&lt;p>The process is the &lt;strong>same&lt;/strong> 👏:&lt;/p>
&lt;p>Create an axis that maximizes the distance between the means of the two categories while minimizing the scatter.&lt;/p>
&lt;h4 id="more-than-2-categories-eg-3-categories">More than 2 categories (e.g. 3 categories)&lt;/h4>
&lt;p>The procedure differs only slightly:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Measure the distances among the means&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Find the point that is &lt;strong>central&lt;/strong> to all of the data&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Then measure the distances between a point that is central in each category and the main central point&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.26.35.png" alt="截屏2020-05-14 15.26.35" style="zoom:50%;" />
&lt;/li>
&lt;li>
&lt;p>Maximize the distance between each category and the central point while minimizing the scatter for each category&lt;/p>
&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.28.40.png" alt="截屏2020-05-14 15.28.40" style="zoom:50%;" />
&lt;/li>
&lt;li>
&lt;p>Create 2 axes to separate the data (because the 3 central points for each category define a plane)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.30.16.png" alt="截屏2020-05-14 15.30.16" style="zoom:50%;" />
&lt;/li>
&lt;/ul>
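&lt;p>The multi-category steps above (central point, between-class vs. within-class scatter, 2 new axes for 3 categories) can be sketched with a standard eigen-decomposition. The data and class locations are hypothetical; this is a minimal illustration, not a full LDA implementation:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(1)
# Three hypothetical 3-D classes (synthetic data, illustration only)
classes = [rng.normal(loc=c, scale=0.4, size=(40, 3))
           for c in ([0, 0, 0], [2, 2, 0], [2, 0, 2])]

X = np.vstack(classes)
overall_mean = X.mean(axis=0)      # the point central to all of the data

# Within-class scatter S_w and between-class scatter S_b
S_w = np.zeros((3, 3))
S_b = np.zeros((3, 3))
for Xc in classes:
    mu = Xc.mean(axis=0)                       # central point of this class
    S_w += (Xc - mu).T @ (Xc - mu)             # scatter within the class
    diff = (mu - overall_mean).reshape(-1, 1)  # class center vs. main center
    S_b += Xc.shape[0] * diff @ diff.T

# LD axes: top eigenvectors of S_w^{-1} S_b (at most n_classes - 1 = 2)
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(eigvals.real)[::-1]
LD = eigvecs.real[:, order[:2]]    # the 2 axes that separate 3 categories
print(LD.shape)                    # (3, 2)
```

&lt;p>With 3 categories the three class centers span a plane, which is why only 2 discriminant axes carry information: the remaining eigenvalue is (numerically) zero.&lt;/p>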
&lt;h2 id="lda-and-pca">LDA and PCA&lt;/h2>
&lt;h3 id="similarities">Similarities&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Both rank the new axes in order of importance&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>PC1 (the first new axis that PCA creates) accounts for the most variation in the data
&lt;ul>
&lt;li>PC2 (the second new axis) does the second best job&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>LD1 (the first new axis that LDA creates) accounts for the most variation between the categories
&lt;ul>
&lt;li>LD2 does the second best job&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Both can let you dig in and see which features are driving the new axes&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Both try to reduce dimensions&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>PCA looks at the features with the most variation&lt;/li>
&lt;li>LDA tries to maximize the separation of known categories&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=azXCzI57Yfc">https://www.youtube.com/watch?v=azXCzI57Yfc&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>