<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Machine Learning | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/machine-learning/</link><atom:link href="https://haobin-tan.netlify.app/tags/machine-learning/index.xml" rel="self" type="application/rss+xml"/><description>Machine Learning</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sat, 07 Nov 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Machine Learning</title><link>https://haobin-tan.netlify.app/tags/machine-learning/</link></image><item><title>Machine Learning Fundamentals</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/</guid><description/></item><item><title>Math Basics</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/math-basics/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/math-basics/</guid><description>&lt;h2 id="linear-algebra">Linear Algebra&lt;/h2>
&lt;h3 id="vectors">Vectors&lt;/h3>
&lt;p>&lt;strong>Vector&lt;/strong>: multi-dimensional quantity&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Each dimension contains different information (e.g.: Age, Weight, Height&amp;hellip;)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/Vectors.png" alt="Vectors" style="zoom:70%;" />
&lt;/li>
&lt;li>
&lt;p>represented as &lt;strong>bold symbols&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A vector $\boldsymbol{x}$ is always a &lt;strong>column&lt;/strong> vector
&lt;/p>
$$
\boldsymbol{x}=\left[\begin{array}{l}
{1} \\\\
{2} \\\\
{4}
\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>A transposed vector $\boldsymbol{x}^T$ is a &lt;strong>row&lt;/strong> vector
&lt;/p>
$$
\boldsymbol{x}^{T}=\left[\begin{array}{lll}
{1} &amp; {2} &amp; {4}
\end{array}\right]
$$
&lt;/li>
&lt;/ul>
&lt;h4 id="vector-operations">Vector Operations&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Multiplication by scalars&lt;/strong>
&lt;/p>
$$
2\left[\begin{array}{l}
{1} \\\\
{2}
\end{array}\right]=\left[\begin{array}{l}
{2} \\\\
{4}
\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Addition of vectors&lt;/strong>
&lt;/p>
$$
\left[\begin{array}{l}{1} \\\\ {2} \end{array}\right]+\left[\begin{array}{l}{3} \\\\ {1}\end{array}\right]=\left[\begin{array}{l}{4} \\\\ {3} \end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Scalar (Inner) products&lt;/strong>: Sum the element-wise products
&lt;/p>
$$
\boldsymbol{v}=\left[\begin{array}{c}{1} \\\\ {2} \\\\ {4}\end{array}\right], \quad \boldsymbol{w}=\left[\begin{array}{l}{2} \\\\ {4} \\\\ {8}\end{array}\right]
$$
&lt;/li>
&lt;/ul>
$$
\langle\boldsymbol{v}, \boldsymbol{w}\rangle= 1 \cdot 2+2 \cdot 4+4 \cdot 8=42
$$
&lt;ul>
&lt;li>&lt;strong>Length of a vector&lt;/strong>: Square root of the inner product with itself
$$
\|\boldsymbol{v}\|=\langle\boldsymbol{v}, \boldsymbol{v}\rangle^{\frac{1}{2}}=\left(1^{2}+2^{2}+4^{2}\right)^{\frac{1}{2}}=\sqrt{21}
$$&lt;/li>
&lt;/ul>
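As a quick sanity check, the vector operations above can be reproduced with NumPy (NumPy is an assumption here; the original notes use only the math):

```python
import numpy as np

# The vectors from the examples above
v = np.array([1, 2, 4])
w = np.array([2, 4, 8])

# Multiplication by a scalar and addition of vectors
assert np.array_equal(2 * np.array([1, 2]), np.array([2, 4]))
assert np.array_equal(np.array([1, 2]) + np.array([3, 1]), np.array([4, 3]))

# Scalar (inner) product: sum of the element-wise products
inner = np.dot(v, w)            # 1*2 + 2*4 + 4*8 = 42

# Length of a vector: square root of the inner product with itself
length = np.sqrt(np.dot(v, v))  # sqrt(1 + 4 + 16) = sqrt(21)
```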
&lt;h3 id="matrices">Matrices&lt;/h3>
&lt;p>Matrix: rectangular array of numbers arranged in rows and columns&lt;/p>
&lt;ul>
&lt;li>
&lt;p>denoted with &lt;strong>bold upper-case letters&lt;/strong>
&lt;/p>
$$
\boldsymbol{X}=\left[\begin{array}{ll}{1} &amp; {3} \\\\ {2} &amp; {3} \\\\ {4} &amp; {7}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>Dimension: $\\#rows \\times \\#columns$ (E.g.: 👆$X \in \mathbb{R}^{3 \times 2}$)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Vectors are special cases of matrices
&lt;/p>
$$
\boldsymbol{x}^{T}=\underbrace{\left[\begin{array}{ccc}{1} &amp; {2} &amp; {4}\end{array}\right]}_{1 \times 3 \text { matrix }}
$$
&lt;/li>
&lt;/ul>
&lt;h4 id="matrices-in-ml">Matrices in ML&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>A data set can be represented as a matrix, where the individual samples are vectors&lt;/p>
&lt;p>e.g.:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Age&lt;/th>
&lt;th>Weight&lt;/th>
&lt;th>Height&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Joe&lt;/td>
&lt;td>37&lt;/td>
&lt;td>72&lt;/td>
&lt;td>175&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mary&lt;/td>
&lt;td>10&lt;/td>
&lt;td>30&lt;/td>
&lt;td>61&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Carol&lt;/td>
&lt;td>25&lt;/td>
&lt;td>65&lt;/td>
&lt;td>121&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Brad&lt;/td>
&lt;td>66&lt;/td>
&lt;td>67&lt;/td>
&lt;td>175&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
$$
\text { Joe: } \boldsymbol{x}\_{1}=\left[\begin{array}{c}{37} \\\\ {72} \\\\ {175}\end{array}\right], \qquad \text { Mary: } \boldsymbol{x}\_{2}=\left[\begin{array}{c}{10} \\\\ {30} \\\\ {61}\end{array}\right]
$$
$$
\text { Carol: } \boldsymbol{x}\_{3}=\left[\begin{array}{c}{25} \\\\ {65} \\\\ {121}\end{array}\right], \qquad \text { Brad: } \boldsymbol{x}\_{4}=\left[\begin{array}{c}{66} \\\\ {67} \\\\ {175}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Most typical representation:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>row ~ data sample (e.g. Joe)&lt;/li>
&lt;li>column ~ data entry (e.g. age)&lt;/li>
&lt;/ul>
$$
\boldsymbol{X}=\left[\begin{array}{l}{\boldsymbol{x}\_{1}^{T}} \\\\ {\boldsymbol{x}\_{2}^{T}} \\\\ {\boldsymbol{x}\_{3}^{T}} \\\\ {\boldsymbol{x}\_{4}^{T}}\end{array}\right]=\left[\begin{array}{ccc}{37} &amp; {72} &amp; {175} \\\\ {10} &amp; {30} &amp; {61} \\\\ {25} &amp; {65} &amp; {121} \\\\ {66} &amp; {67} &amp; {175}\end{array}\right]
$$
&lt;/li>
&lt;/ul>
&lt;h4 id="matrice-operations">Matrice Operations&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Multiplication with scalar&lt;/strong>
&lt;/p>
$$
3 \boldsymbol{M}=3\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]=\left[\begin{array}{ccc}{9} &amp; {12} &amp; {15} \\\\ {3} &amp; {0} &amp; {3}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Addition of matrices&lt;/strong>
&lt;/p>
$$
\boldsymbol{M} + \boldsymbol{N}=\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]+\left[\begin{array}{lll}{1} &amp; {2} &amp; {1} \\\\ {3} &amp; {1} &amp; {1}\end{array}\right]=\left[\begin{array}{lll}{4} &amp; {6} &amp; {6} \\\\ {4} &amp; {1} &amp; {2}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Transposed&lt;/strong>
&lt;/p>
$$
\boldsymbol{M}=\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right], \boldsymbol{M}^{T}=\left[\begin{array}{ll}{3} &amp; {1} \\\\ {4} &amp; {0} \\\\ {5} &amp; {1}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Matrix-Vector product&lt;/strong> (the vector must have the &lt;strong>same&lt;/strong> dimensionality as the number of columns of the matrix)
&lt;/p>
$$
\underbrace{\left[\boldsymbol{w}\_{1}, \ldots, \boldsymbol{w}\_{n}\right]}_{\boldsymbol{W}} \underbrace{\left[\begin{array}{c}{v\_{1}} \\\\ {\vdots} \\\\ {v\_{n}}\end{array}\right]}\_{\boldsymbol{v}}=\underbrace{\left[\begin{array}{c}{v\_{1} \boldsymbol{w}\_{1}+\cdots+v\_{n} \boldsymbol{w}\_{n}}\end{array}\right]}\_{\boldsymbol{u}}
$$
&lt;p>
E.g.:
&lt;/p>
$$
\boldsymbol{u}=\boldsymbol{W} \boldsymbol{v}=\left[\begin{array}{ccc}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]\left[\begin{array}{l}{1} \\\\ {0} \\\\ {2}\end{array}\right]=\left[\begin{array}{l}{3 \cdot 1+4 \cdot 0+5 \cdot 2} \\\\ {1 \cdot 1+0 \cdot 0+1 \cdot 2}\end{array}\right]=\left[\begin{array}{c}{13} \\\\ {3}\end{array}\right]
$$
&lt;p>
💡 &lt;em>Think as: We sum over the columns $\boldsymbol{w}_i$ of $\boldsymbol{W}$ weighted by $v_i$&lt;/em>&lt;/p>
&lt;/li>
&lt;/ul>
$$
u=v\_{1} w\_{1}+\cdots+v\_{n} w\_{n}=1\left[\begin{array}{l}{3} \\\\ {1}\end{array}\right]+0\left[\begin{array}{l}{4} \\\\ {0}\end{array}\right]+2\left[\begin{array}{l}{5} \\\\ {1}\end{array}\right]=\left[\begin{array}{c}{13} \\\\ {3}\end{array}\right]
$$
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Matrix-Matrix product&lt;/strong>
&lt;/p>
$$
\boldsymbol{U} = \boldsymbol{W} \boldsymbol{V}=\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]\left[\begin{array}{ll}{1} &amp; {0} \\\\ {0} &amp; {3} \\\\ {2} &amp; {4}\end{array}\right]=\left[\begin{array}{ll}{3 \cdot 1+4 \cdot 0+5 \cdot 2} &amp; {3 \cdot 0+4 \cdot 3+5 \cdot 4} \\\\ {1 \cdot 1+0 \cdot 0+1 \cdot 2} &amp; {1 \cdot 0+0 \cdot 3+1 \cdot 4}\end{array}\right]=\left[\begin{array}{cc}{13} &amp; {32} \\\\ {3} &amp; {4}\end{array}\right]
$$
&lt;p>
💡 &lt;em>Think of it as: Each column $\boldsymbol{u}\_i = \boldsymbol{W} \boldsymbol{v}\_i$ can be computed by a matrix-vector product&lt;/em>
&lt;/p>
$$
\boldsymbol{W} \underbrace{\left[\boldsymbol{v}\_{1}, \ldots, \boldsymbol{v}\_{n}\right]}\_{\boldsymbol{V}}=[\underbrace{\boldsymbol{W} \boldsymbol{v}\_{1}}_{\boldsymbol{u}\_{1}}, \ldots, \underbrace{\boldsymbol{W} \boldsymbol{v}\_{n}}\_{\boldsymbol{u}\_{n}}]=\boldsymbol{U}
$$
&lt;ul>
&lt;li>
&lt;p>Non-commutative: $\boldsymbol{V} \boldsymbol{W} \neq \boldsymbol{W} \boldsymbol{V}$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Associative: $\boldsymbol{V}(\boldsymbol{W} \boldsymbol{X})=(\boldsymbol{V} \boldsymbol{W}) \boldsymbol{X}$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Transpose product:
&lt;/p>
$$
(\boldsymbol{V} \boldsymbol{W}) ^{T}=\boldsymbol{W}^{T} \boldsymbol{V}^{T}
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Matrix inverse&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>scalar
&lt;/p>
$$
w \cdot w^{-1}=1
$$
&lt;/li>
&lt;li>
&lt;p>matrices
&lt;/p>
$$
\boldsymbol{W} \boldsymbol{W}^{-1}=\boldsymbol{I}, \quad \boldsymbol{W}^{-1} \boldsymbol{W}=\boldsymbol{I}
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
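The matrix operations above can likewise be checked with a short NumPy sketch (NumPy is assumed; the example values are the ones from the notes):

```python
import numpy as np

M = np.array([[3, 4, 5],
              [1, 0, 1]])

# Matrix-vector product: the columns of M weighted by the entries of v
v = np.array([1, 0, 2])
u = M @ v                       # [13, 3]

# Matrix-matrix product: each column of U is M times a column of V
V = np.array([[1, 0],
              [0, 3],
              [2, 4]])
U = M @ V                       # [[13, 32], [3, 4]]

# Transpose of a product: (M V)^T = V^T M^T
assert np.array_equal((M @ V).T, V.T @ M.T)

# Matrix inverse (square, invertible matrices only): A A^-1 = I
A = np.array([[2.0, 1.0], [1.0, 1.0]])
assert np.allclose(A @ np.linalg.inv(A), np.eye(2))
```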
&lt;h4 id="important-special-cases">Important Special Cases&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Scalar (Inner) product:&lt;/strong>
&lt;/p>
$$
\langle\boldsymbol{w}, \boldsymbol{v}\rangle = \boldsymbol{w}^{T} \boldsymbol{v}=\left[w\_{1}, \ldots, w\_{n}\right]\left[\begin{array}{c}{v\_{1}} \\\\ {\vdots} \\\\ {v\_{n}}\end{array}\right]=w\_{1} v\_{1}+\cdots+w\_{n} v\_{n}
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Compute row/column averages of matrix&lt;/strong>
&lt;/p>
$$
\boldsymbol{X}=\underbrace{\left[\begin{array}{ccc}{X\_{1,1}} &amp; {\dots} &amp; {X\_{1, m}} \\\\ {\vdots} &amp; {} &amp; {\vdots} \\\\ {X\_{n, 1}} &amp; {\dots} &amp; {X\_{n, m}}\end{array}\right]}\_{n \text { (samples) } \times m \text { (entries) }}
$$
&lt;ul>
&lt;li>
&lt;p>Vector of row averages (average over all entries per sample)
&lt;/p>
$$
\left[\begin{array}{cc}{\frac{1}{m} \sum\_{i=1}^{m} X\_{1, i}} \\\\ {\vdots} &amp; {} \\\\ {\frac{1}{m} \sum_{i=1}^{m} X\_{n, i}}\end{array}\right]=\boldsymbol{X}\left[\begin{array}{c}{\frac{1}{m}} \\\\ {\vdots} \\\\ {\frac{1}{m}}\end{array}\right]=\boldsymbol{X} \boldsymbol{a}, \quad \text { with } \boldsymbol{a}=\left[\begin{array}{c}{\frac{1}{m}} \\\\ {\vdots} \\\\ {\frac{1}{m}}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>Vector of column averages (average over all samples per entry)
&lt;/p>
$$
\left[\frac{1}{n} \sum_{i=1}^{n} X\_{i, 1}, \ldots, \frac{1}{n} \sum\_{i=1}^{n} X\_{i, m}\right]=\left[\frac{1}{n}, \ldots, \frac{1}{n}\right] \boldsymbol{X}=\boldsymbol{b}^{T} \boldsymbol{X}, \text { with } \boldsymbol{b}=\left[\begin{array}{c}{\frac{1}{n}} \\\\ {\vdots} \\\\ {\frac{1}{n}}\end{array}\right]
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
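The row/column averages above reduce to one matrix-vector product each; a minimal NumPy sketch using the sample matrix from the notes:

```python
import numpy as np

X = np.array([[37, 72, 175],
              [10, 30, 61],
              [25, 65, 121],
              [66, 67, 175]], dtype=float)
n, m = X.shape

# Row averages (average over all entries per sample): X a with a = [1/m, ..., 1/m]^T
a = np.full(m, 1 / m)
row_avg = X @ a

# Column averages (average over all samples per entry): b^T X with b = [1/n, ..., 1/n]^T
b = np.full(n, 1 / n)
col_avg = b @ X

# Same results as NumPy's built-in means
assert np.allclose(row_avg, X.mean(axis=1))
assert np.allclose(col_avg, X.mean(axis=0))
```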
&lt;hr>
&lt;h2 id="calculus">Calculus&lt;/h2>
&lt;ul>
&lt;li>
&lt;blockquote>
&lt;p>“The derivative of a function of a real variable measures &lt;strong>the sensitivity to change of a quantity&lt;/strong> (a function value or dependent variable) which is determined by another quantity (the independent variable)”&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Scalar&lt;/th>
&lt;th>Vector&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Function&lt;/td>
&lt;td>$f(x)$&lt;/td>
&lt;td>$f(\boldsymbol{x})$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Derivative&lt;/td>
&lt;td>$\frac{\partial f(x)}{\partial x}=g$&lt;/td>
&lt;td>$\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}=\left[\frac{\partial f(\boldsymbol{x})}{\partial x\_{1}}, \ldots, \frac{\partial f(\boldsymbol{x})}{\partial x\_{d}}\right]^{T} =: \nabla f(x)\quad$&lt;br />(👆 gradient of function $f$ at $\boldsymbol{x}$)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Min/Max&lt;/td>
&lt;td>$\frac{\partial f(x)}{\partial x}=0$&lt;/td>
&lt;td>$\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}=[0, \ldots, 0]^{T}=\mathbf{0}$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="matrix-calculus">Matrix Calculus&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Scalar&lt;/th>
&lt;th>Vector&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Linear&lt;/td>
&lt;td>$\frac{\partial a x}{\partial x}=a$&lt;/td>
&lt;td>$\nabla\_{\boldsymbol{x}} \boldsymbol{A} \boldsymbol{x}=\boldsymbol{A}^{T}$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Quadratic&lt;/td>
&lt;td>$\frac{\partial x^{2}}{\partial x}=2 x$&lt;/td>
&lt;td>$\begin{array}{l}{\nabla\_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{x}=2 \boldsymbol{x}} \\\\ {\nabla\_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x}=2 \boldsymbol{A} \boldsymbol{x}}\end{array}$ (for symmetric $\boldsymbol{A}$)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table></description></item><item><title>End-to-End Machine Learning Project</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/e2e-ml-project/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/e2e-ml-project/</guid><description>&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/e2e_ML_Project.png" alt="e2e_ML_Project" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="1-look-at-the-big-picture">1. Look at the big picture&lt;/h2>
&lt;h3 id="11-frame-the-problem">1.1 Frame the problem&lt;/h3>
&lt;p>Consider the business objective: How do we expect to use and benefit from this model?&lt;/p>
&lt;h3 id="12-select-a-performance-measure">1.2 Select a performance measure&lt;/h3>
&lt;h3 id="13-check-the-assumptions">1.3 Check the assumptions&lt;/h3>
&lt;p>List and verify the assumptions.&lt;/p>
&lt;h2 id="2-get-the-data">2. Get the data&lt;/h2>
&lt;h3 id="21-download-the-data">2.1 Download the data&lt;/h3>
&lt;p>Automate this process: Create a small function to handle downloading, extracting, and storing data.&lt;/p>
&lt;h3 id="22-take-a-quick-look-at-the-data">2.2 Take a quick look at the data&lt;/h3>
&lt;ul>
&lt;li>Use &lt;code>head()&lt;/code> to look at the top rows of the data&lt;/li>
&lt;li>Use &lt;code>info()&lt;/code> to get a quick description of the data
&lt;ul>
&lt;li>For categorical attributes, use &lt;code>value_counts()&lt;/code> to see categories and the #samples of each category&lt;/li>
&lt;li>For numerical attributes, use &lt;code>describe()&lt;/code> to get a summary of the numerical attributes.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="create-a-test-set">Create a test set&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>If the dataset is large enough, use &lt;strong>purely random sampling&lt;/strong>. (&lt;code>train_test_split&lt;/code>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If the test set needs to be representative of the overall data, use &lt;strong>stratified sampling&lt;/strong>.&lt;/p>
&lt;/li>
&lt;/ul>
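Both sampling strategies are available through scikit-learn's <code>train_test_split</code>; a minimal sketch on hypothetical toy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with a binary class used for stratification
X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Purely random sampling (fine when the dataset is large enough)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Stratified sampling: class proportions are preserved in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```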
&lt;h2 id="3-discover-and-visualize-the-data-to-gain-insights">3. Discover and visualize the data to gain insights&lt;/h2>
&lt;ol>
&lt;li>Make sure to put the test set aside and only explore the training set&lt;/li>
&lt;li>If the training set is very large, sample an exploration set to make manipulations easy and fast&lt;/li>
&lt;/ol>
&lt;h3 id="31-visualizing-data">3.1 Visualizing data&lt;/h3>
&lt;h3 id="32-look-for-correlations">3.2 Look for correlations&lt;/h3>
&lt;p>Two ways:&lt;/p>
&lt;ul>
&lt;li>Compute the &lt;strong>standard correlation coefficient&lt;/strong> (also called &lt;strong>Pearson&amp;rsquo;s r&lt;/strong>) between every pair of attributes using the &lt;code>corr()&lt;/code> method.&lt;/li>
&lt;li>Or use the &lt;code>scatter_matrix&lt;/code> function from &lt;code>pandas.plotting&lt;/code>&lt;/li>
&lt;/ul>
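A minimal sketch of the <code>corr()</code> approach, on a hypothetical toy DataFrame standing in for the exploration set (the column names are made up for illustration):

```python
import pandas as pd

# Hypothetical exploration set
df = pd.DataFrame({
    "age":    [37, 10, 25, 66],
    "weight": [72, 30, 65, 67],
    "height": [175, 61, 121, 175],
})

# Pearson's r between every pair of numeric attributes
corr_matrix = df.corr()

# Correlations of every attribute with one target attribute, strongest first
print(corr_matrix["height"].sort_values(ascending=False))
```

For the visual alternative, `pandas.plotting.scatter_matrix(df)` draws one scatter plot per attribute pair.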
&lt;h3 id="33-experimenting-with-attribute-combinations">3.3 Experimenting with attribute combinations&lt;/h3>
&lt;h2 id="4-prepare-the-data-for-ml-algorithms">4. Prepare the data for ML algorithms&lt;/h2>
&lt;p>&lt;strong>Firstly, ensure a clean training set and separate the predictors and labels.&lt;/strong>&lt;/p>
&lt;h3 id="41-data-cleaning">4.1 Data cleaning&lt;/h3>
&lt;p>Handle missing features:&lt;/p>
&lt;ul>
&lt;li>Get rid of the corresponding samples (districts) -&amp;gt; use &lt;code>dropna()&lt;/code>&lt;/li>
&lt;li>Get rid of the whole attribute -&amp;gt; use &lt;code>drop()&lt;/code>&lt;/li>
&lt;li>Set the values to some value (zero, the mean, the median, etc.) -&amp;gt; use &lt;code>fillna()&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>Or apply &lt;code>SimpleImputer&lt;/code> from Scikit-Learn to all the numerical attributes.&lt;/p>
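The three options and the <code>SimpleImputer</code> alternative can be sketched as follows (the DataFrame is hypothetical toy data):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [37.0, np.nan, 25.0, 66.0],
                   "weight": [72.0, 30.0, np.nan, 67.0]})

# Option 1: get rid of the samples with missing entries
dropped_rows = df.dropna()

# Option 2: get rid of the whole attribute
dropped_col = df.drop(columns=["weight"])

# Option 3: set missing values to some value, e.g. the median
filled = df.fillna(df.median())

# Or apply Scikit-Learn's SimpleImputer to all numerical attributes
imputer = SimpleImputer(strategy="median")
imputed = imputer.fit_transform(df)
```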
&lt;h3 id="42-handle-text-and-categorical-attributes">4.2 Handle text and categorical attributes&lt;/h3>
&lt;p>Most ML algorithms prefer to work with numbers.
Transform text and categorical attributes into numerical attributes, e.g. using one-hot encoding.&lt;/p>
&lt;h3 id="43-custom-transformers">4.3 Custom transformers&lt;/h3>
&lt;p>The custom transformer should work seamlessly with Scikit-Learn functionalities (such as pipelines).
-&amp;gt; Create a class and implement three methods:&lt;/p>
&lt;ul>
&lt;li>&lt;code>fit()&lt;/code>&lt;/li>
&lt;li>&lt;code>transform()&lt;/code>&lt;/li>
&lt;li>&lt;code>fit_transform()&lt;/code> (can get it by simply adding &lt;code>TransformerMixin&lt;/code> as a base class)&lt;/li>
&lt;/ul>
&lt;p>If we also add &lt;code>BaseEstimator&lt;/code> as a base class, we get two extra methods that are useful for automatic hyperparameter tuning:&lt;/p>
&lt;ul>
&lt;li>&lt;code>get_params()&lt;/code>&lt;/li>
&lt;li>&lt;code>set_params()&lt;/code>&lt;/li>
&lt;/ul>
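A minimal custom transformer following this recipe; the class name and the ratio feature it adds are hypothetical, chosen only to illustrate the pattern:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class AddRatioFeature(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: appends the ratio of column i to column j."""

    def __init__(self, i=0, j=1):
        self.i = i
        self.j = j

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        ratio = X[:, self.i] / X[:, self.j]
        return np.c_[X, ratio]

X = np.array([[2.0, 4.0], [9.0, 3.0]])
X_new = AddRatioFeature().fit_transform(X)  # fit_transform comes from TransformerMixin
params = AddRatioFeature().get_params()     # get_params/set_params come from BaseEstimator
```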
&lt;h3 id="44-feature-scaling">4.4 Feature scaling&lt;/h3>
&lt;p>Common ways:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Min-max scaling (normalization)&lt;/strong>: Use &lt;code>MinMaxScaler&lt;/code>&lt;/li>
&lt;li>&lt;strong>Standardization&lt;/strong>
&lt;ul>
&lt;li>Use &lt;code>StandardScaler&lt;/code>&lt;/li>
&lt;li>Less affected by outliers&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
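A minimal sketch contrasting the two scalers on hypothetical data with one outlier:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [4.0], [100.0]])  # 100 is an outlier

# Min-max scaling squeezes every value into [0, 1]
mm = MinMaxScaler().fit_transform(X)

# Standardization: zero mean, unit variance; less affected by outliers
std = StandardScaler().fit_transform(X)
```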
&lt;h3 id="45-transformation-pipelines">4.5 Transformation pipelines&lt;/h3>
&lt;p>Group sequences of transformations into one step.&lt;/p>
&lt;p>&lt;code>Pipeline&lt;/code> from &lt;code>scikit-learn&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>a list of name/estimator pairs defining a sequence of steps&lt;/li>
&lt;li>all estimators but the last must be transformers (i.e. they must have a &lt;code>fit_transform()&lt;/code> method)&lt;/li>
&lt;li>names can be anything but must be unique and must not contain double underscores &amp;ldquo;__&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;p>It is more convenient to use a &lt;strong>single&lt;/strong> transformer to handle both the categorical columns and the numerical columns.
-&amp;gt; Use &lt;code>ColumnTransformer&lt;/code>: it handles all columns, applies the appropriate transformations to each column, and also works great with Pandas DataFrames.&lt;/p>
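A minimal sketch of a numerical sub-pipeline combined with scikit-learn's <code>ColumnTransformer</code> (the toy DataFrame and column names are hypothetical):

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

df = pd.DataFrame({"age": [37.0, np.nan, 25.0],
                   "city": ["a", "b", "a"]})

# Sequence of transformations for the numerical columns, grouped into one step
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# One single transformer handling numerical and categorical columns together
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])

prepared = full_pipeline.fit_transform(df)  # 1 scaled column + 2 one-hot columns
```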
&lt;h2 id="5-select-a-model-and-train-it">5. Select a model and train it&lt;/h2>
&lt;h3 id="51-train-and-evaluate-on-the-trainging-set">5.1 Train and evaluate on the trainging set&lt;/h3>
&lt;h3 id="52-better-evaluation-using-cross-validation">5.2 Better evaluation using Cross-Validation&lt;/h3>
&lt;h2 id="6-fine-tune-the-model">6. Fine-tune the model&lt;/h2>
&lt;h3 id="61-grid-search">6.1 Grid search&lt;/h3>
&lt;p>When exploring &lt;strong>relatively few&lt;/strong> combinations, use &lt;code>GridSearchCV&lt;/code>: Tell it which hyperparameters we want to experiment with, and what values to try out. Then it will evaluate all the possible combinations of hyperparameter values, using cross-validation.&lt;/p>
&lt;h3 id="62-randomized-search">6.2 Randomized search&lt;/h3>
&lt;p>When the hyperparameter search space is &lt;strong>large&lt;/strong>, use &lt;code>RandomizedSearchCV&lt;/code>. It evaluates a given number of random combinations by selecting a random value for each hyperparameter at every iteration.&lt;/p>
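A minimal <code>GridSearchCV</code> sketch; the model, parameter grid, and synthetic data are illustrative choices, not part of the original notes:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=60, n_features=4, random_state=42)

# Hyperparameters to experiment with and the values to try out
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}

# Evaluates all 4 combinations with 3-fold cross-validation
grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid_search.fit(X, y)
best = grid_search.best_params_
```

`RandomizedSearchCV` has an almost identical interface; it takes parameter distributions instead of a fixed grid and an `n_iter` budget.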
&lt;h3 id="63-ensemble-methods">6.3 Ensemble methods&lt;/h3>
&lt;p>Try to combine the models that perform best.&lt;/p>
&lt;h3 id="64-analyze-the-best-models-and-their-errors">6.4 Analyze the best models and their errors&lt;/h3>
&lt;p>Gain good insights on the problem by inspecting the best models.&lt;/p>
&lt;h3 id="65-evaluate-the-system-on-the-test-set">6.5 Evaluate the system on the test set&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>Get the predictors and labels from test set&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Run full pipeline to transform the data&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Evaluate the final model on the test set&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="7-present-the-solution">7. Present the solution&lt;/h2>
&lt;h2 id="8-launch-monitor-and-maintain-the-system">8. Launch, monitor, and maintain the system&lt;/h2>
&lt;ul>
&lt;li>Plug the production input data source into the system and write tests&lt;/li>
&lt;li>Write monitoring code to check system&amp;rsquo;s live performance at regular intervals and trigger callouts when it drops&lt;/li>
&lt;li>Evaluate the system&amp;rsquo;s input data quality&lt;/li>
&lt;li>Train the models on a regular basis using fresh data (automate this process as much as possible!)&lt;/li>
&lt;/ul></description></item><item><title>Evaluation</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/evaluation/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/evaluation/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/Confusion_Matrix_and_ROC.png"
alt="Confusion matrix, ROC, and AUC">&lt;figcaption>
&lt;p>Confusion matrix, ROC, and AUC&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;h2 id="confuse-matrix">Confuse matrix&lt;/h2>
&lt;p>A confusion matrix tells you what your ML algorithm did right and what it did wrong.&lt;/p>
&lt;style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-cly1{text-align:left;vertical-align:middle}
.tg .tg-tab6{color:#77b300;text-align:left;vertical-align:top}
.tg .tg-viqs{color:#fe0000;text-align:left;vertical-align:top}
.tg .tg-0lax{text-align:left;vertical-align:top}
.tg .tg-hjor{font-weight:bold;color:#9698ed;text-align:center;vertical-align:middle}
.tg .tg-dsu0{color:#9698ed;text-align:left;vertical-align:top}
.tg .tg-0sd6{font-weight:bold;color:#3399ff;text-align:center;vertical-align:top}
.tg .tg-12v1{color:#3399ff;text-align:left;vertical-align:top}
&lt;/style>
&lt;table class="tg">
&lt;tr>
&lt;th class="tg-0lax" colspan="2" rowspan="2">&lt;/th>
&lt;th class="tg-hjor" colspan="2">Known Truth&lt;/th>
&lt;th class="tg-cly1" rowspan="2">&lt;/th>
&lt;/tr>
&lt;tr>
&lt;td class="tg-dsu0">Positive&lt;/td>
&lt;td class="tg-dsu0">Negative&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-0sd6" rowspan="2">&lt;br>Prediction&lt;/td>
&lt;td class="tg-12v1">Positive&lt;/td>
&lt;td class="tg-tab6">True Positive (TP)&lt;/td>
&lt;td class="tg-viqs">False Positive (FP)&lt;/td>
&lt;td class="tg-0lax">Precision = TP / (TP+FP)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-12v1">Negative&lt;/td>
&lt;td class="tg-viqs">False Negative (FN)&lt;/td>
&lt;td class="tg-tab6">True Negative (TN)&lt;/td>
&lt;td class="tg-0lax" rowspan="2">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-0lax" colspan="2">&lt;/td>
&lt;td class="tg-0lax">TPR = Sensitivity = Recall &lt;br> = TP / (TP + FN)&lt;/td>
&lt;td class="tg-0lax">Specificity = TN / (FP+TN) &lt;br> FPR = FP / (FP + TN) = 1 - Specificity &lt;/td>
&lt;/tr>
&lt;/table>
&lt;ul>
&lt;li>Row: Prediction&lt;/li>
&lt;li>Column: Known truth&lt;/li>
&lt;/ul>
&lt;p>Each cell:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Positive/negative: refers to the prediction&lt;/p>
&lt;/li>
&lt;li>
&lt;p>True/False: Whether the prediction matches the truth&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The numbers along the diagonal (green) tell us how many times the samples were correctly classified&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The numbers not on the diagonal (red) are samples the algorithm messed up.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="definition">Definition&lt;/h2>
&lt;h3 id="precision">&lt;strong>Precision&lt;/strong>&lt;/h3>
&lt;p>How many selected items are relevant?
&lt;/p>
$$
\text{ Precision } = \frac{TP}{TP + FP}
=\frac{\\# \text{ relevant item retrieved }}{\\# \text{ of items retrieved }}
$$
&lt;h3 id="recall--true-positive-rate-tpr--sensitivity">&lt;strong>Recall / True Positive Rate (TPR) / Sensitivity&lt;/strong>&lt;/h3>
&lt;p>How many relevant items are selected?
&lt;/p>
$$
\text { Recall } = \frac{TP}{TP + FN}
=\frac{\\# \text { relevant item retrieved }}{\\# \text { of relevant items in collection }}
$$
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/350px-Precisionrecall.svg.png" alt="img">&lt;/p>
&lt;details>
&lt;summary>Example&lt;/summary>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.38.png" alt="截屏2020-09-15 11.51.38" style="zoom: 33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.43.png" alt="截屏2020-09-15 11.51.43" style="zoom:33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.46.png" alt="截屏2020-09-15 11.51.46" style="zoom:33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.49.png" alt="截屏2020-09-15 11.51.49" style="zoom:33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.52.png" alt="截屏2020-09-15 11.51.52" style="zoom:33%;" />
&lt;/details>
&lt;h3 id="f-score--f-measure">&lt;strong>F-score / F-measure&lt;/strong>&lt;/h3>
&lt;h4 id="f_1-score">$F\_1$ score&lt;/h4>
&lt;p>The traditional F-measure or balanced F-score (&lt;strong>$F\_1$ score&lt;/strong>) is the &lt;a href="https://en.wikipedia.org/wiki/Harmonic_mean#Harmonic_mean_of_two_numbers">harmonic mean&lt;/a> of precision and recall:
&lt;/p>
$$
F\_1=\frac{2 \cdot \text {precison} \cdot \text {recall}}{\text {precision}+\text {recall}} = \frac{2TP}{2TP + FP + FN}
$$
&lt;h4 id="f_beta-score">$F\_\beta$ score&lt;/h4>
&lt;p>$F\_\beta$ uses a positive real factor $\beta$, where $\beta$ is chosen such that &lt;strong>recall is considered $\beta$ times as important as precision&lt;/strong>
&lt;/p>
$$
F\_{\beta}=\left(1+\beta^{2}\right) \cdot \frac{\text { precision } \cdot \text { recall }}{\left(\beta^{2} \cdot \text { precision }\right)+\text { recall }}
$$
&lt;p>
Two commonly used values for $\beta$:&lt;/p>
&lt;ul>
&lt;li>$2$: weighs recall &lt;strong>higher&lt;/strong> than precision&lt;/li>
&lt;li>$0.5$: weighs recall &lt;strong>lower&lt;/strong> than precision&lt;/li>
&lt;/ul>
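The definitions above translate directly into code; a small sketch with hypothetical confusion-matrix counts (the numbers are made up for illustration):

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_beta(tp, fp, fn, beta=1.0):
    # Recall is considered beta times as important as precision
    p, r = precision(tp, fp), recall(tp, fn)
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# Hypothetical counts
tp, fp, fn = 8, 2, 4

p = precision(tp, fp)           # 0.8
r = recall(tp, fn)              # 8/12
f1 = f_beta(tp, fp, fn)         # harmonic mean, equal to 2*TP/(2*TP + FP + FN)
f2 = f_beta(tp, fp, fn, beta=2.0)  # weighs recall higher than precision
```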
&lt;h3 id="specificity">Specificity&lt;/h3>
$$
\text{Specificity} = \frac{TN}{FP + TN}
$$
&lt;h3 id="false-positive-rate-fpr">False Positive Rate (FPR)&lt;/h3>
$$
\text{FPR} = \frac{FP}{FP + TN} \left(= 1- \frac{TN}{FP + TN} = 1- \text{Specificity}\right)
$$
&lt;h2 id="relation-between-sensitivity-specificity-fpr-and-threshold">Relation between Sensitivity, Specificity, FPR and Threshold&lt;/h2>
&lt;p>Assuming that the distributions of the actual positive and negative classes look like this:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/evaluation-metrics-Page-1.png" alt="evaluation-metrics-Page-1" style="zoom:67%;" />
&lt;p>Assume we have already defined a threshold: whatever is greater than the threshold is predicted as positive, and whatever is smaller is predicted as negative.&lt;/p>
&lt;p>If we set a lower threshold, we&amp;rsquo;ll get the following diagram:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/evaluation-metrics-2.png" alt="evaluation-metrics-2" style="zoom:67%;" />
&lt;p>We can notice that FP ⬆️ , and FN ⬇️ .&lt;/p>
&lt;p>Therefore, we have the relationship:&lt;/p>
&lt;ul>
&lt;li>Threshold ⬇️
&lt;ul>
&lt;li>FP ⬆️ , FN ⬇️&lt;/li>
&lt;li>$\text{Sensitivity} (= TPR) = \frac{TP}{TP + FN}$ ⬆️ , $\text{Specificity} = \frac{TN}{TN + FP}$ ⬇️&lt;/li>
&lt;li>$FPR (= 1 - \text{Specificity})$⬆️&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>And vice versa&lt;/li>
&lt;/ul>
&lt;h2 id="auc-roc-curve">AUC-ROC curve&lt;/h2>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/evaluation-metrics-ROC-AUC.png" alt="evaluation-metrics-ROC-AUC" style="zoom:80%;" />
&lt;p>AUC (&lt;strong>Area Under The Curve&lt;/strong>)-ROC (&lt;strong>Receiver Operating Characteristics&lt;/strong>) curve&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Performance measurement for the classification problems at various threshold settings.&lt;/p>
&lt;ul>
&lt;li>ROC is a curve of TPR against FPR at different thresholds&lt;/li>
&lt;li>AUC represents the degree or measure of separability&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Tells how much the model is capable of distinguishing between classes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="how-is-roc-plotted">How is ROC plotted?&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">threshold&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">thresholds&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="c1"># iterate over all thresholds&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TPR&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">FPR&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">classify&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">threshold&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># calculate TPR and FPR based on threshold&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plot_point&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FPR&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TPR&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># plot coordinate (FPR, TPR) in the diagram&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">connect_points&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># connect all plotted points to get ROC curve&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Example:&lt;/p>
&lt;p>Suppose that the probability of a series of samples being classified into positive classes has been derived and we sort them descendingly:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2021-02-24%2022.05.59.png" alt="截屏2021-02-24 22.05.59" style="zoom: 50%;" />
&lt;ul>
&lt;li>Class: actual label of test sample&lt;/li>
&lt;li>Score: probability of classifying test sample as positive&lt;/li>
&lt;/ul>
&lt;p>Next, we use the &amp;ldquo;Score&amp;rdquo; value as the threshold (from high to low).&lt;/p>
&lt;ul>
&lt;li>
&lt;p>When the probability that the test sample is a positive sample is greater than or equal to this threshold, we consider it a positive sample, otherwise it is a negative sample.&lt;/p>
&lt;ul>
&lt;li>For example, for the 4th sample, the &amp;ldquo;Score&amp;rdquo; value is 0.6. So Samples 1, 2, 3, and 4 are considered positive, because their &amp;ldquo;Score&amp;rdquo; values are $\geq 0.6$. The remaining samples are classified as negative.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>By picking a different threshold each time, we obtain one (FPR, TPR) pair, i.e., one point on the ROC curve. In this way, we get 20 pairs in total and plot them in the diagram:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/081955100088586.jpg" alt="img" style="zoom:80%;" />
&lt;/li>
&lt;/ul>
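&lt;p>The threshold sweep described above can be sketched as runnable Python. The toy labels and scores below are my own (not the 20 samples from the table), and &lt;code>tpr_fpr&lt;/code> is a hypothetical helper:&lt;/p>

```python
# Toy data (assumed for illustration): actual labels and P(positive),
# already sorted by score in descending order.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]

def tpr_fpr(threshold):
    # A sample is classified positive when its score is at least the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

# One (TPR, FPR) pair per threshold, i.e. one ROC point each.
roc_points = [tpr_fpr(t) for t in scores]
```

&lt;p>Connecting these points in the (FPR, TPR) plane yields the ROC curve.&lt;/p>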
&lt;h3 id="how-to-speculate-about-the-performance-of-the-model">How to speculate about the performance of the model?&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>An &lt;strong>excellent&lt;/strong> model has &lt;strong>AUC near 1&lt;/strong>, which means it has a good measure of separability.&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.02.34.png"
alt="Ideal situation: two curves don’t overlap at all means model has an ideal measure of separability. It is perfectly able to distinguish between positive class and negative class.">&lt;figcaption>
&lt;p>Ideal situation: two curves don’t overlap at all means model has an ideal measure of separability. It is perfectly able to distinguish between positive class and negative class.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;/li>
&lt;li>
&lt;p>When $0.5 &lt; \text{AUC} &lt; 1$, there is a high chance that the classifier will be able to distinguish positive class values from negative class values, because it detects more true positives and true negatives than false negatives and false positives.&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.09.30.png"
alt="When AUC is 0.7, it means there is a 70% chance that the model will be able to distinguish between positive class and negative class.">&lt;figcaption>
&lt;p>When AUC is 0.7, it means there is a 70% chance that the model will be able to distinguish between positive class and negative class.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;/li>
&lt;li>
&lt;p>When AUC is 0.5, it means the model has no class separation capacity whatsoever.&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.05.28.png" alt="截屏2021-02-24 21.05.28">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A &lt;strong>poor&lt;/strong> model has &lt;strong>AUC near 0&lt;/strong>, which means it has the worst measure of separability.&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.05.54.png"
alt="When AUC is approximately 0, the model is actually swapping the classes: it predicts the negative class as positive and vice versa.">&lt;figcaption>
&lt;p>When AUC is approximately 0, the model is actually swapping the classes: it predicts the negative class as positive and vice versa.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;/li>
&lt;/ul>
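&lt;p>As a numerical companion to these cases, the AUC of a piecewise-linear ROC curve can be approximated with the trapezoidal rule. This is a sketch with hand-picked (FPR, TPR) points, not output from a real model:&lt;/p>

```python
def auc(fpr, tpr):
    # Trapezoidal rule over ROC points sorted by increasing FPR.
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        area += width * (tpr[i] + tpr[i - 1]) / 2.0
    return area

perfect = auc([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])  # ideal separability: AUC = 1
random_ = auc([0.0, 1.0], [0.0, 1.0])            # diagonal: AUC = 0.5
```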
&lt;h2 id="-video-tutorials">🎥 Video tutorials&lt;/h2>
&lt;h3 id="the-confusion-matrix">The confusion matrix&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/Kdsp6soqA7o?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="sensitivity-and-specificity">Sensitivity and specificity&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/vP06aMoz4v8?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="roc-and-auc">ROC and AUC&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/4jRBRDbJemM?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5">Understanding AUC - ROC Curve&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deepai.org/machine-learning-glossary-and-terms/f-score">What is the F-score?&lt;/a>: very nice explanation with examples&lt;/li>
&lt;li>&lt;a href="http://www.cnblogs.com/dlml/p/4403482.html">机器学习之分类器性能指标之ROC曲线、AUC值&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Overview of Machine Learning Algorithms</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/ml-algo-overview/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/ml-algo-overview/</guid><description>&lt;h2 id="supervisedunsupervised-learning">Supervised/Unsupervised Learning&lt;/h2>
&lt;h3 id="supervised-learning">Supervised learning&lt;/h3>
&lt;p>The training data you feed to the algorithm &lt;strong>includes&lt;/strong> the desired solutions, called &lt;strong>labels&lt;/strong>&lt;/p>
&lt;p>Typical task:&lt;/p>
&lt;ul>
&lt;li>Classification&lt;/li>
&lt;li>Regression&lt;/li>
&lt;/ul>
&lt;p>Important supervised learning algo:&lt;/p>
&lt;ul>
&lt;li>k-Nearest Neighbors&lt;/li>
&lt;li>Linear Regression&lt;/li>
&lt;li>Logistic Regression&lt;/li>
&lt;li>Support Vector Machine (SVM)&lt;/li>
&lt;li>Decision Trees and Random Forests&lt;/li>
&lt;li>Neural Networks&lt;/li>
&lt;/ul>
&lt;h3 id="unsupervised-learning">Unsupervised learning&lt;/h3>
&lt;p>Training data is &lt;strong>unlabeled&lt;/strong>.&lt;/p>
&lt;p>Important unsupervised learning algo:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Clustering&lt;/p>
&lt;ul>
&lt;li>K-Means&lt;/li>
&lt;li>DBSCAN&lt;/li>
&lt;li>Hierarchical Cluster Analysis (HCA)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Anomaly detection and novelty detection&lt;/p>
&lt;ul>
&lt;li>One-class SVM&lt;/li>
&lt;li>Isolation Forest&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Visualization and dimensionality reduction&lt;/p>
&lt;ul>
&lt;li>Principal Component Analysis (PCA)&lt;/li>
&lt;li>Kernel PCA&lt;/li>
&lt;li>Locally-Linear Embedding (LLE)&lt;/li>
&lt;li>t-distributed Stochastic Neighbor Embedding (t-SNE)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Association rule learning&lt;/p>
&lt;ul>
&lt;li>Apriori&lt;/li>
&lt;li>Eclat&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="semisupervised-learning-supervised--unsupervised">Semisupervised learning (supervised + unsupervised)&lt;/h3>
&lt;p>Deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data&lt;/p>
&lt;h3 id="reinforcement-learning">Reinforcement Learning&lt;/h3>
&lt;p>The learning system, called an &lt;strong>agent&lt;/strong> in this context, can observe the environment, select and perform actions, and get rewards in return or penalties in the form of negative rewards.&lt;/p>
&lt;p>It must then learn by itself what is the best strategy, called a &lt;strong>policy&lt;/strong>, to get the most reward over time.&lt;/p>
&lt;p>A policy defines what action the agent should choose when it is in a given situation.&lt;/p>
&lt;h2 id="batch-and-online-learning">Batch and Online Learning&lt;/h2>
&lt;p>This categorization depends on whether or not the system can learn incrementally from a stream of incoming data.&lt;/p>
&lt;h3 id="batch-learning">Batch Learning&lt;/h3>
&lt;p>The system must be trained using all the available data (i.e., it is incapable of learning incrementally)&lt;/p>
&lt;p>First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called &lt;strong>offline learning&lt;/strong>.&lt;/p>
&lt;p>Want a batch learning system to know about new data?&lt;/p>
&lt;p>Need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data). Then stop the old system and replace it with the new one.&lt;/p>
&lt;h3 id="online-learning">Online Learning&lt;/h3>
&lt;p>Train the system &lt;strong>incrementally&lt;/strong> by feeding it data instances sequentially, either individually or by small groups called &lt;strong>mini-batches&lt;/strong>.&lt;/p>
&lt;p>Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.&lt;/p>
&lt;p>👍 Advantages:&lt;/p>
&lt;ul>
&lt;li>Great for systems that receive data as a continuous flow and need to adapt to change rapidly or autonomously&lt;/li>
&lt;li>Saves a huge amount of space (after learning from new data instances, the system no longer needs them and can discard them)&lt;/li>
&lt;/ul>
&lt;p>😠 Challenge: if bad data is fed to the system, the system&amp;rsquo;s performance will gradually decline.&lt;/p>
&lt;p>🔧 Solution:&lt;/p>
&lt;ul>
&lt;li>monitor the system closely&lt;/li>
&lt;li>promptly switch learning off if detect a drop in performance&lt;/li>
&lt;li>monitor the input data and react to abnormal data&lt;/li>
&lt;/ul>
&lt;h2 id="instance-based-vs-model-based-learning">Instance-Based Vs. Model-Based Learning&lt;/h2>
&lt;h3 id="instance-based-learning">Instance-based learning&lt;/h3>
&lt;p>The system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure&lt;/p>
&lt;h3 id="model-based-learning">Model-based learning&lt;/h3>
&lt;p>Build a model of these examples, then use that model to make predictions&lt;/p></description></item><item><title>Model Selection</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/</guid><description/></item><item><title>Objective Function</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/objective-function/</link><pubDate>Mon, 06 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/model-selection/objective-function/</guid><description>&lt;h2 id="how-does-the-objective-function-look-like">How does the objective function look like?&lt;/h2>
&lt;p>Objective function:&lt;/p>
$$
\operatorname{Obj}(\Theta)= \overbrace{L(\Theta)}^{\text {Training Loss}} + \underbrace{\Omega(\Theta)}_{\text{Regularization}}
$$
&lt;ul>
&lt;li>
&lt;p>Training loss: measures how well the model fits the training data
&lt;/p>
$$
L=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}\right)
$$
&lt;ul>
&lt;li>Square loss:
$$
l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2
$$&lt;/li>
&lt;li>Logistic loss:
$$
l(y_i, \hat{y}_i) = y_i \log(1 + e^{-\hat{y}_i}) + (1 - y_i) \log(1 + e^{\hat{y}_i})
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Regularization: How complicated is the model?&lt;/p>
&lt;ul>
&lt;li>$L_2$ norm (Ridge): $\Omega(w) = \lambda \|w\|^2$&lt;/li>
&lt;li>$L_1$ norm (Lasso): $\Omega(w) = \lambda \|w\|$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
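&lt;p>A minimal sketch of such an objective, assuming square loss plus an $L_2$ penalty (the function name and toy data below are mine):&lt;/p>

```python
import numpy as np

def objective(w, X, y, lam):
    # Obj(w) = training loss + regularization
    residual = y - X @ w
    train_loss = np.sum(residual ** 2)  # square loss L(w)
    penalty = lam * np.sum(w ** 2)      # L2 regularization Omega(w)
    return train_loss + penalty
```

&lt;p>With $\lambda = 0$ this reduces to ordinary least squares; a larger $\lambda$ trades training fit for a simpler model.&lt;/p>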
&lt;style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-fymr{border-color:inherit;font-weight:bold;text-align:left;vertical-align:top}
&lt;/style>
&lt;table class="tg">
&lt;thead>
&lt;tr>
&lt;th class="tg-0pky">&lt;/th>
&lt;th class="tg-fymr">Objective Function&lt;/th>
&lt;th class="tg-fymr">Linear model?&lt;/th>
&lt;th class="tg-fymr">Loss&lt;/th>
&lt;th class="tg-fymr">Regularization&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td class="tg-fymr">Ridge regression&lt;/td>
&lt;td class="tg-0pky">$\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\|w\|^{2}$&lt;/td>
&lt;td class="tg-0pky">✅&lt;/td>
&lt;td class="tg-0pky">square&lt;/td>
&lt;td class="tg-0pky">$L_2$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-fymr">Lasso regression&lt;/td>
&lt;td class="tg-0pky">$\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\|w\|$&lt;/td>
&lt;td class="tg-0pky">✅&lt;/td>
&lt;td class="tg-0pky">square&lt;/td>
&lt;td class="tg-0pky">$L_1$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-fymr">Logistic regression&lt;/td>
&lt;td class="tg-0pky">$\sum_{i=1}^{n}\left[y_{i} \cdot \ln \left(1+e^{-w^{\top} x_{i}}\right)+\left(1-y_{i}\right) \cdot \ln \left(1+e^{w^{\top} x_{i}}\right)\right]+\lambda\|w\|^{2}$&lt;/td>
&lt;td class="tg-0pky">✅&lt;/td>
&lt;td class="tg-0pky">logistic&lt;/td>
&lt;td class="tg-0pky">$L_2$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="why-do-we-want-to-contain-two-component-in-the-objective">Why do we want to contain two component in the objective?&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Optimizing training loss encourages predictive models&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;em>Fitting the training data well brings the model close to the training distribution, which is hopefully close to the underlying distribution&lt;/em>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Optimizing regularization encourages simple models&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;em>Simpler models tend to have smaller variance in future predictions, making predictions stable&lt;/em>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Regression</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/regression/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/regression/</guid><description/></item><item><title>Machine Learning (ML)</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/</guid><description>&lt;!-- TODO: Add `list_children` shortcode --></description></item><item><title>Classification</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/</guid><description/></item><item><title>Logistic Regression: Basics</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression/</guid><description>&lt;p>💡 &lt;strong>Use regression algorithm for classification&lt;/strong>&lt;/p>
&lt;p>Logistic regression: &lt;strong>estimate the probability that an instance belongs to a particular class&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>If the estimated probability is &lt;strong>greater than 50%&lt;/strong>, then the model predicts that the instance belongs to that class (called the &lt;strong>positive&lt;/strong> class, labeled “1”),&lt;/li>
&lt;li>or else it predicts that it does not (i.e., it belongs to the &lt;strong>negative&lt;/strong> class, labeled “0”).&lt;/li>
&lt;/ul>
&lt;p>This makes it a &lt;strong>binary&lt;/strong> classifier.&lt;/p>
&lt;h2 id="logistic--sigmoid-function">Logistic / Sigmoid function&lt;/h2>
&lt;img src="https://upload.wikimedia.org/wikipedia/commons/5/53/Sigmoid-function-2.svg" style="zoom:60%; background-color:white">
&lt;p>$\sigma(t)=\frac{1}{1+\exp (-t)}$&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Bounded: $\sigma(t) \in (0, 1)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Symmetric: $1 - \sigma(t) = \sigma(-t)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Derivative: $\sigma^{\prime}(t)=\sigma(t)(1-\sigma(t))$&lt;/p>
&lt;/li>
&lt;/ul>
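&lt;p>These identities are easy to check numerically (a quick sketch; the variable names are mine):&lt;/p>

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

t = 1.7
symmetry = 1.0 - sigmoid(t)                       # should equal sigmoid(-t)
h = 1e-6
numeric_deriv = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)
identity_deriv = sigmoid(t) * (1.0 - sigmoid(t))  # sigma'(t) = sigma(t)(1 - sigma(t))
```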
&lt;h2 id="estimating-probabilities-and-making-prediction">Estimating probabilities and making prediction&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Computes a weighted sum of the input features (plus a bias term)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Outputs the logistic of this result&lt;/p>
&lt;p>$\hat{p}=h_{\theta}(\mathbf{x})=\sigma\left(\mathbf{x}^{\mathrm{T}} \boldsymbol{\theta}\right)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Prediction:&lt;/p>
$$
\hat{y} = \begin{cases} 0 &amp; \text{ if } \hat{p}&lt;0.5\left(\Leftrightarrow \mathbf{x}^{\mathrm{T}} \boldsymbol{\theta}&lt;0\right) \\\\
1 &amp; \text{ if }\hat{p} \geq 0.5\left(\Leftrightarrow \mathbf{x}^{\mathrm{T}} \boldsymbol{\theta} \geq 0\right)\end{cases}
$$
&lt;/li>
&lt;/ol>
&lt;h2 id="train-and-cost-function">Train and cost function&lt;/h2>
&lt;p>Objective of training: to set the parameter vector $\boldsymbol{\theta}$ so that the model estimates:&lt;/p>
&lt;ul>
&lt;li>high probabilities ($\geq 0.5$) for positive instances ($y=1$)&lt;/li>
&lt;li>low probabilities ($&lt; 0.5$) for negative instances ($y=0$)&lt;/li>
&lt;/ul>
&lt;h3 id="cost-function-of-a-single-training-instance">Cost function of a single training instance:&lt;/h3>
$$
c(\boldsymbol{\theta}) = \begin{cases} -\log (\hat{p}) &amp; \text{ if } y=1 \\\\
-\log (1-\hat{p}) &amp; \text{ if } y=0\end{cases}
$$
&lt;blockquote>
&lt;img src="https://miro.medium.com/max/1621/1*_NeTem-yeZ8Pr9cVUoi_HA.png" style="zoom:30%; background-color:white">
&lt;ul>
&lt;li>Actual label: $y=1$, misclassification: $\hat{y} = 0 \Leftrightarrow$ $\hat{p} = h_{\boldsymbol{\theta}}(\mathbf{x})$ close to 0 $\Leftrightarrow c(\boldsymbol{\theta})$ large&lt;/li>
&lt;li>Actual label: $y=0$, misclassification: $\hat{y} = 1 \Leftrightarrow$ $\hat{p} = h_{\boldsymbol{\theta}}(\mathbf{x})$ close to 1 $\Leftrightarrow c(\boldsymbol{\theta})$ large&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;h3 id="the-cost-function-over-the-whole-training-set">The cost function over the whole training set&lt;/h3>
&lt;p>Simply the average cost over all training instances (Combining the expressions of two different cases above into one single expression):&lt;/p>
&lt;p>$\begin{aligned} J(\boldsymbol{\theta}) &amp;=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(\hat{p}^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)\right] \\\\ &amp;=\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(\hat{p}^{(i)}\right)-\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)\right] \end{aligned}$&lt;/p>
&lt;blockquote>
&lt;ul>
&lt;li>$y^{(i)} =1:-y^{(i)} \log \left(\hat{p}^{(i)}\right)-\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)=-\log \left(\hat{p}^{(i)}\right)$&lt;/li>
&lt;li>$y^{(i)} =0:-y^{(i)} \log \left(\hat{p}^{(i)}\right)-\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)=-\log \left(1-\hat{p}^{(i)}\right)$
(Exactly the same as $c(\boldsymbol{\theta})$ for a single instance above 👏)&lt;/li>
&lt;/ul>
&lt;/blockquote>
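&lt;p>The averaged cost takes only a few lines of numpy. The labels and predicted probabilities below are toy values of my own:&lt;/p>

```python
import numpy as np

def cross_entropy_cost(y, p_hat):
    # J(theta) averaged over all instances, given labels y and
    # predicted probabilities p_hat.
    y, p_hat = np.asarray(y, dtype=float), np.asarray(p_hat, dtype=float)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

confident_right = cross_entropy_cost([1, 0], [0.9, 0.1])  # low cost
confident_wrong = cross_entropy_cost([1, 0], [0.1, 0.9])  # high cost
```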
&lt;h3 id="training">Training&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>No closed-form equation 🤪&lt;/p>
&lt;/li>
&lt;li>
&lt;p>But it is convex so Gradient Descent (or any other optimization algorithm) is guaranteed to find the global minimum&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Partial derivatives of the cost function with regards to the $j$-th model parameter $\theta_j$:&lt;/p>
$$
\frac{\partial}{\partial \theta_{j}} J(\boldsymbol{\theta})=\frac{1}{m} \displaystyle \sum_{i=1}^{m}\left(\sigma\left(\boldsymbol{\theta}^{T} \mathbf{x}^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}
$$
&lt;/li>
&lt;/ul></description></item><item><title>Logistic Regression: Probabilistic view</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression-in-probabilistic-view/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression-in-probabilistic-view/</guid><description>&lt;p>Class label:&lt;/p>
$$
y_i \in \\{0, 1\\}
$$
&lt;p>Conditional probability distribution of the class label is&lt;/p>
$$
\begin{aligned}
p(y=1|\boldsymbol{x}) &amp;= \sigma(\boldsymbol{w}^T\boldsymbol{x}+b) \\\\
p(y=0|\boldsymbol{x}) &amp;= 1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}+b)
\end{aligned}
$$
&lt;p>
with&lt;/p>
$$
\sigma(x) = \frac{1}{1+\operatorname{exp}(-x)}
$$
&lt;p>This is a &lt;strong>conditional Bernoulli distribution&lt;/strong>. Therefore, the probability can be represented as&lt;/p>
$$
\begin{array}{ll}
p(y|\boldsymbol{x}) &amp;= p(y=1|\boldsymbol{x})^y p(y=0|\boldsymbol{x})^{1-y} \\\\
&amp; = \sigma(\boldsymbol{w}^T\boldsymbol{x}+b)^y (1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}+b))^{1-y}
\end{array}
$$
&lt;p>The &lt;strong>conditional Bernoulli log-likelihood&lt;/strong> is (assuming training data is i.i.d)&lt;/p>
$$
\begin{aligned}
\operatorname{loglik}(\boldsymbol{w}, \mathcal{D})
&amp;= \log(\operatorname{lik}(\boldsymbol{w}, \mathcal{D})) \\\\
&amp;= \log(\displaystyle\prod_i p(y_i|\boldsymbol{x}_i)) \\\\
&amp;= \log\left(\displaystyle\prod_i \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)^{y_i} \left(1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)\right)^{1-y_i}\right) \\\\
&amp;= \displaystyle\sum_i y_i\log\left(\sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)\right)+ (1-y_i)\log\left(1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)\right)
\end{aligned}
$$
&lt;p>Let&lt;/p>
$$
\tilde{\boldsymbol{w}}=\left(\begin{array}{c}b \\\\ \boldsymbol{w} \end{array}\right), \quad \tilde{\boldsymbol{x}_i}=\left(\begin{array}{c}1 \\\\ \boldsymbol{x}_i \end{array}\right)
$$
&lt;p>Then:&lt;/p>
$$
\operatorname{loglik}(\boldsymbol{w}, \mathcal{D}) = \operatorname{loglik}(\tilde{\boldsymbol{w}}, \mathcal{D}) = \displaystyle\sum_i y_i\log\left(\sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right)+ (1-y_i)\log\left(1 - \sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right)
$$
&lt;p>Our objective is to find the $\tilde{\boldsymbol{w}}^*$ that &lt;strong>maximize the log-likelihood&lt;/strong>, i.e.&lt;/p>
$$
\begin{array}{cl}
\tilde{\boldsymbol{w}}^* &amp;= \underset{\tilde{\boldsymbol{w}}}{\arg \max} \quad \operatorname{loglik}(\tilde{\boldsymbol{w}}, \mathcal{D}) \\\\
&amp;= \underset{\tilde{\boldsymbol{w}}}{\arg \min} \quad -\operatorname{loglik}(\tilde{\boldsymbol{w}}, \mathcal{D})\\\\
&amp;= \underset{\tilde{\boldsymbol{w}}}{\arg \min} \quad \underbrace{-\left(\displaystyle\sum_i y_i\log\left(\sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right) + (1-y_i)\log\left(1 - \sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right)\right)}_{\text{cross-entropy loss}}
\end{array}
$$
&lt;p>In other words, &lt;strong>maximizing the (log-)likelihood is the same as minimizing the cross entropy.&lt;/strong>&lt;/p></description></item><item><title>SVM: Basics</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/support-vector-machine/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/support-vector-machine/</guid><description>&lt;h2 id="-goal-of-svm">🎯 Goal of SVM&lt;/h2>
&lt;p>To find the optimal separating hyperplane which &lt;strong>maximizes the margin&lt;/strong> of the training data&lt;/p>
&lt;ul>
&lt;li>it &lt;strong>correctly&lt;/strong> classifies the training data&lt;/li>
&lt;li>it is the one which will generalize better with unseen data (as far as possible from data points from each category)&lt;/li>
&lt;/ul>
&lt;h2 id="svm-math-formulation">SVM math formulation&lt;/h2>
&lt;p>Assuming the data is linearly separable&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304135136513.png" alt="image-20200304135136513" style="zoom:50%;" />
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Decision boundary&lt;/strong>: Hyperplane $\mathbf{w}^{T} \mathbf{x}+b=0$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Support Vectors:&lt;/strong> Data points closest to the decision boundary (other samples can be ignored)&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Positive&lt;/strong> support vectors: $\mathbf{w}^{T} \mathbf{x}_{+}+b=+1$&lt;/li>
&lt;li>&lt;strong>Negative&lt;/strong> support vectors: $\mathbf{w}^{T} \mathbf{x}_{-}+b=-1$&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>Why do we use 1 and -1 as class labels?&lt;/p>
&lt;ul>
&lt;li>This makes the math manageable, because -1 and 1 are only different by the sign. We can write a single equation to describe the margin or how close a data point is to our separating hyperplane and not have to worry if the data is in the -1 or +1 class.&lt;/li>
&lt;li>If a point is far away from the separating plane on the positive side, then $w^Tx+b$ will be a large positive number, and $label*(w^Tx+b)$ will give us a large number. If it’s far from the negative side and has a negative label, $label*(w^Tx+b)$ will also give us a large positive number.&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Margin&lt;/strong> $\rho$ : distance between the support vectors and the decision boundary and should be &lt;strong>maximized&lt;/strong>
&lt;/p>
$$
\rho = \frac{\mathbf{w}^{T} \mathbf{x}\_{+}+b}{\|\mathbf{w}\|}-\frac{\mathbf{w}^{T} \mathbf{x}\_{-}+b}{\|\mathbf{w}\|}=\frac{2}{\|\mathbf{w}\|}
$$
&lt;/li>
&lt;/ul>
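&lt;p>A toy numerical check of these definitions (hand-picked hyperplane and support vectors, not taken from the figure):&lt;/p>

```python
import numpy as np

w = np.array([2.0, 0.0])  # normal vector of the decision boundary
b = 0.0
margin = 2.0 / np.linalg.norm(w)  # rho = 2 / ||w||

x_pos = np.array([0.5, 3.0])   # positive support vector: w.x + b = +1
x_neg = np.array([-0.5, 1.0])  # negative support vector: w.x + b = -1
```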
&lt;h3 id="svm-optimization-problem">SVM optimization problem&lt;/h3>
&lt;p>Requirement:&lt;/p>
&lt;ol>
&lt;li>Maximal margin&lt;/li>
&lt;li>Correct classification&lt;/li>
&lt;/ol>
&lt;p>Based on these requirements, we have:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200713164553044.png" alt="image-20200713164553044" style="zoom:67%;" />
&lt;p>Reformulation:
&lt;/p>
$$
\begin{aligned}
\underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\\|\mathbf{w}\\|^{2} \\\\ \text {s.t.} \quad &amp; y_{i}\left(\mathbf{w}^{T} \mathbf{x}\_{i}+b\right) \geq 1
\end{aligned}
$$
&lt;p>This is the &lt;strong>hard margin SVM&lt;/strong>.&lt;/p>
&lt;h3 id="soft-margin-svm">Soft margin SVM&lt;/h3>
&lt;h4 id="-idea">💡 Idea&lt;/h4>
&lt;p>&lt;strong>&amp;ldquo;Allow the classifier to make some mistakes&amp;rdquo;&lt;/strong> (Soft margin)&lt;/p>
&lt;p>➡️ &lt;strong>Trade-off between margin and classification accuracy&lt;/strong>&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304141838595.png" alt="image-20200304141838595" style="zoom:50%;" />
&lt;ul>
&lt;li>
&lt;p>Slack-variables: ${\color {blue}{\xi_{i}}} \geq 0$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>💡&lt;strong>Allows violating the margin conditions&lt;/strong>
&lt;/p>
$$
y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right) \geq 1- \color{blue}{\xi_{i}}
$$
&lt;ul>
&lt;li>$0 \leq \xi\_{i} \leq 1$ : sample is between margin and decision boundary (&lt;span style="color:red">&lt;strong>margin violation&lt;/strong>&lt;/span>)&lt;/li>
&lt;li>$\xi\_{i} \geq 1$ : sample is on the wrong side of the decision boundary (&lt;span style="color:red">&lt;strong>misclassified&lt;/strong>&lt;/span>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="soft-max-margin">Soft Max-Margin&lt;/h4>
&lt;p>Optimization problem
&lt;/p>
$$
\begin{array}{lll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\|\mathbf{w}\|^{2} + \color{blue}{C \sum_i^N \xi_i} \qquad \qquad &amp; \text{(Punish large slack variables)}\\\\
\text { s.t. } \quad &amp; y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right) \geq 1 -\color{blue}{\xi_i}, \quad \xi_i \geq 0 \qquad \qquad &amp; \text{(Condition for soft-margin)}\end{array}
$$
&lt;ul>
&lt;li>$C$ : regularization parameter, determines how important $\xi$ should be
&lt;ul>
&lt;li>&lt;strong>Small&lt;/strong> $C$: Constraints have &lt;strong>little&lt;/strong> influence ➡️ &lt;strong>large&lt;/strong> margin&lt;/li>
&lt;li>&lt;strong>Large&lt;/strong> $C$: Constraints have &lt;strong>large&lt;/strong> influence ➡️ &lt;strong>small&lt;/strong> margin&lt;/li>
&lt;li>$C$ infinite: Constraints are enforced ➡️ &lt;strong>hard&lt;/strong> margin&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="soft-svm-optimization">Soft SVM Optimization&lt;/h4>
&lt;p>Reformulate into an unconstrained optimization problem&lt;/p>
&lt;ol>
&lt;li>Rewrite constraints: $\xi_{i} \geq 1-y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right)=1-y_{i} f\left(\boldsymbol{x}_{i}\right)$&lt;/li>
&lt;li>Together with $\xi_{i} \geq 0 \Rightarrow \xi_{i}=\max \left(0,1-y_{i} f\left(\boldsymbol{x}_{i}\right)\right)$&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Unconstrained optimization&lt;/strong> (over $\mathbf{w}$):
&lt;/p>
$$
\underset{{\mathbf{w}}}{\operatorname{argmin}} \underbrace{\|\mathbf{w}\|^{2}}\_{\text {regularization }}+C \underbrace{\sum_{i=1}^{N} \max \left(0,1-y\_{i} f\left(\boldsymbol{x}\_{i}\right)\right)}_{\text {loss function }}
$$
&lt;p>
Points are in 3 categories:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>$y\_{i} f\left(\boldsymbol{x}\_{i}\right) > 1$ : Point &lt;strong>outside&lt;/strong> margin, &lt;strong>no contribution&lt;/strong> to loss&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$y\_{i} f\left(\boldsymbol{x}\_{i}\right) = 1$: Point is &lt;strong>on&lt;/strong> the margin, &lt;strong>no contribution&lt;/strong> to loss as &lt;strong>in hard margin&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$y\_{i} f\left(\boldsymbol{x}\_{i}\right) &lt; 1$: &lt;span style="color:red">&lt;strong>Point violates the margin, contributes to loss&lt;/strong>&lt;/span>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="loss-function">Loss function&lt;/h4>
&lt;p>SVMs use &amp;ldquo;hinge&amp;rdquo; loss (an approximation of the 0-1 loss)&lt;/p>
&lt;blockquote>
&lt;p>&lt;a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss&lt;/a>&lt;/p>
&lt;p>For an intended output $t=\pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as
&lt;/p>
$$
> \ell(y)=\max (0,1-t \cdot y)
> $$
&lt;p>
Note that $y$ should be the &amp;ldquo;raw&amp;rdquo; output of the classifier&amp;rsquo;s decision function, not the predicted class label. For instance, in linear SVMs, $y = \mathbf{w}\cdot \mathbf{x}+ b$, where $(\mathbf{w},b)$ are the parameters of the hyperplane and $\mathbf{x}$ is the input vector.&lt;/p>
&lt;/blockquote>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304172146690.png" alt="image-20200304172146690" style="zoom:40%;" />
&lt;p>The loss function of SVM is &lt;strong>convex&lt;/strong>:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304172349088.png" alt="image-20200304172349088" style="zoom: 33%;" />
&lt;p>I.e.,&lt;/p>
&lt;ul>
&lt;li>There is only &lt;strong>one&lt;/strong> minimum&lt;/li>
&lt;li>We can find it with gradient descent&lt;/li>
&lt;li>&lt;strong>However:&lt;/strong> Hinge loss is &lt;strong>not differentiable!&lt;/strong> 🤪&lt;/li>
&lt;/ul>
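&lt;p>The hinge loss and the three point categories from the previous section can be sketched as (toy values of mine):&lt;/p>

```python
def hinge(y, f_x):
    # max(0, 1 - y * f(x))
    return max(0.0, 1.0 - y * f_x)

outside = hinge(1, 2.0)         # y*f(x) above 1: outside margin, zero loss
on_margin = hinge(1, 1.0)       # y*f(x) equal to 1: on the margin, zero loss
violating = hinge(1, 0.5)       # y*f(x) below 1: margin violation, positive loss
misclassified = hinge(1, -0.5)  # wrong side of the boundary: even larger loss
```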
&lt;h2 id="sub-gradients">Sub-gradients&lt;/h2>
&lt;p>For convex function $f: \mathbb{R}^d \to \mathbb{R}$ :
&lt;/p>
$$
f(\boldsymbol{z}) \geq f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{T}(\boldsymbol{z}-\boldsymbol{x})
$$
&lt;p>
(Linear approximation underestimates function)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304172748278.png" alt="image-20200304172748278" style="zoom:33%;" />
&lt;p>A &lt;strong>subgradient&lt;/strong> of a convex function $f$ at point $\boldsymbol{x}$ is any $\boldsymbol{g}$ such that
&lt;/p>
$$
f(\boldsymbol{z}) \geq f(\boldsymbol{x})+\boldsymbol{g}^{T}(\boldsymbol{z}-\boldsymbol{x})
$$
&lt;ul>
&lt;li>Always exists (even if $f$ is not differentiable)&lt;/li>
&lt;li>If $f$ is differentiable at $\boldsymbol{x}$, then: $\boldsymbol{g}=\nabla f(\boldsymbol{x})$&lt;/li>
&lt;/ul>
&lt;h3 id="example">Example&lt;/h3>
&lt;p>$f(x)=|x|$&lt;/p>
&lt;ul>
&lt;li>$x \neq 0$ : unique sub-gradient is $g= \operatorname{sign}(x)$&lt;/li>
&lt;li>$x =0$ : $g \in [-1, 1]$&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/220px-Absolute_value.svg.png" alt="img">&lt;/p>
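&lt;p>The $f(x)=|x|$ example can be sketched directly (the function name is illustrative):&lt;/p>

```python
def abs_subgradient(x):
    """A sub-gradient of f(x) = |x|: sign(x) away from 0, any g in [-1, 1] at 0."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # any value in [-1, 1] is a valid sub-gradient at x = 0

print(abs_subgradient(3.0))   # 1.0
print(abs_subgradient(-2.0))  # -1.0
print(abs_subgradient(0.0))   # 0.0 (one valid choice among many)
```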
&lt;h3 id="sub-gradient-method">Sub-gradient Method&lt;/h3>
&lt;p>&lt;strong>Sub-gradient Descent&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>Given &lt;strong>convex&lt;/strong> $f$, not necessarily differentiable&lt;/li>
&lt;li>Initialize $\boldsymbol{x}_0$&lt;/li>
&lt;li>Repeat: $\boldsymbol{x}\_{t+1}=\boldsymbol{x}\_{t}-\eta \boldsymbol{g}$, where $\boldsymbol{g}$ is any sub-gradient of $f$ at point $\boldsymbol{x}_{t}$&lt;/li>
&lt;/ol>
&lt;p>‼️ Notes:&lt;/p>
&lt;ul>
&lt;li>Sub-gradient steps do not necessarily decrease $f$ at every iteration (not a true descent method)&lt;/li>
&lt;li>Need to keep track of the best iterate $\boldsymbol{x}^*$&lt;/li>
&lt;/ul>
&lt;h4 id="sub-gradients-for-hinge-loss">Sub-gradients for hinge loss&lt;/h4>
$$
\mathcal{L}\left(\mathbf{x}\_{i}, y\_{i} ; \mathbf{w}\right)=\max \left(0,1-y\_{i} f\left(\mathbf{x}\_{i}\right)\right) \quad f\left(\mathbf{x}\_{i}\right)=\mathbf{w}^{\top} \mathbf{x}\_{i}+b
$$
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304175930294.png" alt="image-20200304175930294" style="zoom:33%;" />
&lt;h4 id="sub-gradient-descent-for-svms">Sub-gradient descent for SVMs&lt;/h4>
&lt;p>Recall the &lt;strong>unconstrained optimization problem&lt;/strong> for SVMs:
&lt;/p>
$$
\underset{{\mathbf{w}}}{\operatorname{argmin}} \quad C \underbrace{\sum\_{i=1}^{N} \max \left(0,1-y\_{i} f\left(\boldsymbol{x}\_{i}\right)\right)}\_{\text {loss function }} + \underbrace{\|\mathbf{w}\|^{2}}\_{\text {regularization }}
$$
&lt;p>
At each iteration, pick random training sample $(\boldsymbol{x}_i, y_i)$&lt;/p>
&lt;ul>
&lt;li>
&lt;p>If $y_{i} f\left(\boldsymbol{x}_{i}\right)&lt;1$: ​
&lt;/p>
$$
\boldsymbol{w}\_{t+1}=\boldsymbol{w}\_{t}-\eta\left(2 \boldsymbol{w}\_{t}-C y\_{i} \boldsymbol{x}\_{i}\right)
$$
&lt;/li>
&lt;li>
&lt;p>Otherwise:
&lt;/p>
$$
\quad \boldsymbol{w}\_{t+1}=\boldsymbol{w}\_{t}-\eta 2 \boldsymbol{w}\_{t}
$$
&lt;/li>
&lt;/ul>
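&lt;p>The two cases above can be combined into a stochastic sub-gradient loop. The sketch below is illustrative: the function name and learning rate are assumptions, and a bias update (left implicit in the notes, which only show the $\mathbf{w}$ update) is included so the hyperplane need not pass through the origin:&lt;/p>

```python
import numpy as np

def svm_subgradient_descent(X, y, C=1.0, eta=0.01, epochs=200, seed=0):
    """Stochastic sub-gradient descent for the unconstrained SVM objective.

    X: (N, d) array of inputs, y: (N,) array of labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(N):  # pick training samples in random order
            if y[i] * (X[i] @ w + b) < 1:
                # margin violated: sub-gradient of regularizer + hinge loss
                w -= eta * (2 * w - C * y[i] * X[i])
                b += eta * C * y[i]
            else:
                # margin satisfied: only the regularizer contributes
                w -= eta * 2 * w
    return w, b

# Tiny linearly separable example
X = np.array([[2.0, 2.0], [1.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = svm_subgradient_descent(X, y)
print(np.sign(X @ w + b))  # matches y on this toy data
```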
&lt;h2 id="application-of-svms">Application of SVMs&lt;/h2>
&lt;ul>
&lt;li>Pedestrian tracking&lt;/li>
&lt;li>Text (and hypertext) categorization&lt;/li>
&lt;li>Image classification&lt;/li>
&lt;li>Bioinformatics (protein classification, cancer classification)&lt;/li>
&lt;li>Hand-written character recognition&lt;/li>
&lt;/ul>
&lt;p>Yet, in the last 5-8 years, neural networks have outperformed SVMs on most applications.🤪☹️😭&lt;/p></description></item><item><title>SVM: Kernel Methods</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernel-methods/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernel-methods/</guid><description>&lt;h2 id="kernel-function">Kernel function&lt;/h2>
&lt;p>Given a mapping function $\phi: \mathcal{X} \rightarrow \mathcal{V}$, the function&lt;/p>
$$
\mathcal{K}: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}, \quad \mathcal{K}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)=\left\langle\phi(\mathbf{x}), \phi\left(\mathbf{x}^{\prime}\right)\right\rangle_{\mathcal{V}}
$$
&lt;p>is called a &lt;strong>kernel function&lt;/strong>.&lt;/p>
&lt;p>&lt;em>&amp;ldquo;A kernel is a function that returns the result of a dot product performed in another space.&amp;rdquo;&lt;/em>&lt;/p>
&lt;h2 id="kernel-trick">Kernel trick&lt;/h2>
&lt;p>Applying the kernel trick simply means &lt;strong>replacing the dot product of two examples by a kernel function&lt;/strong>.&lt;/p>
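&lt;p>A small numeric check makes this concrete: for 2-D inputs, the degree-2 polynomial kernel equals an inner product under the explicit feature map $\phi(\boldsymbol{x}) = (x\_1^2, \sqrt{2} x\_1 x\_2, x\_2^2)$, yet the kernel never constructs $\phi(\boldsymbol{x})$ (illustrative sketch):&lt;/p>

```python
import numpy as np

def phi(x):
    """Explicit feature map matching the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, xp):
    """Degree-2 polynomial kernel: the squared dot product of x and x'."""
    return (x @ xp) ** 2

x = np.array([1.0, 2.0])
xp = np.array([3.0, 4.0])

# Both compute the same inner product; the kernel skips building phi(x).
print(poly_kernel(x, xp))  # 121.0
print(phi(x) @ phi(xp))    # 121.0 up to floating-point rounding
```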
&lt;h3 id="typical-kernels">Typical kernels&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Kernel Type&lt;/th>
&lt;th>Definition&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Linear kernel&lt;/strong>&lt;/td>
&lt;td>$k\left(\boldsymbol{x}, \boldsymbol{x}^{\prime}\right)=\left\langle\boldsymbol{x}, \boldsymbol{x}^{\prime}\right\rangle$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Polynomial kernel&lt;/strong>&lt;/td>
&lt;td>$k\left(\boldsymbol{x}, \boldsymbol{x}^{\prime}\right)=\left\langle\boldsymbol{x}, \boldsymbol{x}^{\prime}\right\rangle^{d}$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Gaussian / Radial Basis Function (RBF) kernel&lt;/strong>&lt;/td>
&lt;td>$k \left(\boldsymbol{x}, \boldsymbol{y}\right)=\exp \left(-\frac{\|\boldsymbol{x}-\boldsymbol{y}\|^{2}}{2 \sigma^{2}}\right)$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
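&lt;p>The three kernels in the table can be written down directly (a minimal sketch; function and parameter names are illustrative):&lt;/p>

```python
import numpy as np

def linear_kernel(x, xp):
    """Linear kernel: plain dot product."""
    return x @ xp

def polynomial_kernel(x, xp, d=2):
    """Polynomial kernel of degree d."""
    return (x @ xp) ** d

def rbf_kernel(x, xp, sigma=1.0):
    """Gaussian / RBF kernel with bandwidth sigma."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma**2))

x = np.array([1.0, 2.0])
xp = np.array([2.0, 1.0])
print(linear_kernel(x, xp))      # 4.0
print(polynomial_kernel(x, xp))  # 16.0
print(rbf_kernel(x, xp))         # exp(-1), about 0.3679
```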
&lt;h3 id="why-do-we-need-kernel-trick">Why do we need kernel trick?&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Kernels can be used for all feature based algorithms that can be rewritten such that they contain &lt;strong>inner products&lt;/strong> of feature vectors&lt;/p>
&lt;ul>
&lt;li>This is true for almost all feature based algorithms (Linear regression, SVMs, &amp;hellip;)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Kernels can be used to map the data $\mathbf{x}$ into an infinite-dimensional feature space (i.e., a function space)&lt;/p>
&lt;ul>
&lt;li>&lt;strong>The feature vector never has to be represented explicitly&lt;/strong>&lt;/li>
&lt;li>&lt;strong>As long as we can evaluate the inner product of two feature vectors&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>➡️ We can obtain a more powerful representation than standard linear feature models.&lt;/p>
&lt;p>&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1049px" viewBox="-0.5 -0.5 1049 675" content="&amp;lt;mxfile host=&amp;quot;app.diagrams.net&amp;quot; modified=&amp;quot;2020-07-13T14:50:43.530Z&amp;quot; agent=&amp;quot;5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&amp;quot; etag=&amp;quot;XPL9LKrkJbpFtgSQEWYO&amp;quot; version=&amp;quot;13.4.2&amp;quot; type=&amp;quot;device&amp;quot;&amp;gt;&amp;lt;diagram id=&amp;quot;Q3K544789h8GwBtGwhde&amp;quot; name=&amp;quot;Page-1&amp;quot;&amp;gt;7Vpdj5s4FP01kXYeZgQGDHmcJDOt9kNadVbd9qlywAFmALPGmST99WsHEz7sJKQTdhJ1K7XF1zaxz7n3XBt7ZE3T9QeK8ugPEuBkBIxgPbJmIwBsCPm/wrApDa5rloaQxkFpahie4u9YGg1pXcYBLloNGSEJi/O20SdZhn3WsiFKyardbEGS9q/mKMSK4clHiWr9Ow5YJK0mHNcVH3EcRvKnPeCWFSnaNS4NRYQCsipN28lZDyNrSglh5VO6nuJEYFfhUiLwuKd2NzCKM9anw4w+efnrX96vrx8/fV1Nku9P3uZWvuUVJUs5YTlYtqkQ4G/hYPPCZBXFDD/lyBc1K043t0UsTXjJ5I+LOEmmJCF0289aLBbA97m9YJS84EZNAOfQgbxGnUI1HkwZXjdMckofMEkxoxveRNZ60k+ke1ljWV7VZAFb2qIGT2bVEUkHCXevrjHkDxLGEyB1hoM0QNhbCEgXJGMyVjjrGoih7+H54jwQ27CDMVAxhkCDsWUNhTFQMXamv/C/Iup48I7cyZeRO+OGGwV8PmvWRrmNXkYy3IFemlAShxkv+hxLzO0TgWHMf+5eVqRxECT7aKVkmQVYzMroMAirshykJrRPJs3sBIZt2AppurgAQ1E2VohoA0Ioi0hIMpT8TkgumXnGjG0kTGjJSJs3jg7dfJH9t4WvonDnVMXZulk522iw10bP46PB/+xqKuEHh+KpIEvq4wMAWDJ7IRpidqCdzB84aCUmlWiKE8Ti13aeOjttJlTVi4+sctWatYfaOhmMWfcYtQqV9mw83kNl0w1cWdb1HJ5y0zk359uu95SiTaNBTuKMFY03/ykMDc0AHc1wOyuKI+256HT8rRxB7X27qfy4Q1r7pH/Ol3bFJuX/cfVfc/X/Fl+J/h90sjdIvqPmaZ3k20NJfrWG36MdEtWfLA3YPTXBu6w04F5SGgDXlQb6Un4hacBty4jT3Vcebm661vBJwO6fBJ5/7iTgGO+cBOA+quqt2ufr2qqdgSXX7myprX4sDbY7M613TdVHBf19crXbV7jNi0rW7r6Qy6N430oZwESE2pzyp1A83VxJRJ5JNx3otBfPQP1e4jpqRFpwqIi03zUij+603yciveuMSO/kiHz+PyI7EenYao7UReRw21ndYUlJkph+ixb4z5JUFbfFFph73gDAfF1XVsRKX0hQFvLuAKJUAJzNi7ysPE22wfSUV2j9jFPoTOl2ON/UlZrSofTL0sqRLcGQVZfvruc6iHI6X1+g6q7mWLOmg4P5q3cwgwy0Z7+srXe1pb62hGGqp4i/YZptz9YXy8xnMcmagdhDetx90vOyV1x0SqKRkJt9knDwwOeEo89KFN7qLv2jeXeOXH3
V0HxL1e7QxoNF81jxCPNOOOw6T2I/FumHcckuFoRyvd66x8VLr6GSuFs1noFE4IC7zhrC1dBoaWh0hqJR9FcWER2iQo5Y3n/+u8svaF69wTiIi+l2cYEaXDz4n+JiHMeliFAuHv0lTTYTivwXIenHhKR2vx6ycsgjKWFlYFmz2/HbtzyH3aP/7YzOIdztztD0cQ2V3mALD6AulH2S5ssSPpQkm1K3cFaIRCpuZYkUFvGx3QZxKsyimajAiC0p3gLiM9HqqjWteckDDOYRZue+jsYfdjrY+pQxmD+oZ7ZApK6Yq5YgPqckWPrsysk9Q8KC4AITluasZQhhPmnfcBYRH5/rQ4XRUWBL86VCq8CD3UEE6vYBvZI4KNoLxhTleZyF5X5f7AI+icFtr82WC3hD1W0/wpzsK1XinU81ldgaZKmp2f47Oif4gSMdXqyvDZdHqvXda+vhXw==&amp;lt;/diagram&amp;gt;&amp;lt;/mxfile&amp;gt;" onclick="(function(svg){var src=window.event.target||window.event.srcElement;while (src!=null&amp;amp;&amp;amp;src.nodeName.toLowerCase()!='a'){src=src.parentNode;}if(src==null){if(svg.wnd!=null&amp;amp;&amp;amp;!svg.wnd.closed){svg.wnd.focus();}else{var r=function(evt){if(evt.data=='ready'&amp;amp;&amp;amp;evt.source==svg.wnd){svg.wnd.postMessage(decodeURIComponent(svg.getAttribute('content')),'*');window.removeEventListener('message',r);}};window.addEventListener('message',r);svg.wnd=window.open('https://app.diagrams.net/?client=1&amp;amp;lightbox=1&amp;amp;edit=_blank');}}})(this);" style="cursor:pointer;max-width:100%;max-height:675px;">&lt;defs>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_Preview {color: #888}
#MathJax_Message {position: fixed; left: 1em; bottom: 1.5em; background-color: #E6E6E6; border: 1px solid #959595; margin: 0px; padding: 2px 8px; z-index: 102; color: black; font-size: 80%; width: auto; white-space: nowrap}
#MathJax_MSIE_Frame {position: absolute; top: 0; left: 0; width: 0px; z-index: 101; border: 0px; margin: 0px; padding: 0px}
.MathJax_Error {color: #CC0000; font-style: italic}
&lt;/style>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_Hover_Frame {border-radius: .25em; -webkit-border-radius: .25em; -moz-border-radius: .25em; -khtml-border-radius: .25em; box-shadow: 0px 0px 15px #83A; -webkit-box-shadow: 0px 0px 15px #83A; -moz-box-shadow: 0px 0px 15px #83A; -khtml-box-shadow: 0px 0px 15px #83A; border: 1px solid #A6D ! important; display: inline-block; position: absolute}
.MathJax_Menu_Button .MathJax_Hover_Arrow {position: absolute; cursor: pointer; display: inline-block; border: 2px solid #AAA; border-radius: 4px; -webkit-border-radius: 4px; -moz-border-radius: 4px; -khtml-border-radius: 4px; font-family: &amp;lsquo;Courier New&amp;rsquo;,Courier; font-size: 9px; color: #F0F0F0}
.MathJax_Menu_Button .MathJax_Hover_Arrow span {display: block; background-color: #AAA; border: 1px solid; border-radius: 3px; line-height: 0; padding: 4px}
.MathJax_Hover_Arrow:hover {color: white!important; border: 2px solid #CCC!important}
.MathJax_Hover_Arrow:hover span {background-color: #CCC!important}
&lt;/style>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_SVG_Display {text-align: center; margin: 1em 0em; position: relative; display: block!important; text-indent: 0; max-width: none; max-height: none; min-width: 0; min-height: 0; width: 100%}
.MathJax_SVG .MJX-monospace {font-family: monospace}
.MathJax_SVG .MJX-sans-serif {font-family: sans-serif}
#MathJax_SVG_Tooltip {background-color: InfoBackground; color: InfoText; border: 1px solid black; box-shadow: 2px 2px 5px #AAAAAA; -webkit-box-shadow: 2px 2px 5px #AAAAAA; -moz-box-shadow: 2px 2px 5px #AAAAAA; -khtml-box-shadow: 2px 2px 5px #AAAAAA; padding: 3px 4px; z-index: 401; position: absolute; left: 0; top: 0; width: auto; height: auto; display: none}
.MathJax_SVG {display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; padding: 0; margin: 0}
.MathJax_SVG * {transition: none; -webkit-transition: none; -moz-transition: none; -ms-transition: none; -o-transition: none}
.MathJax_SVG &amp;gt; div {display: inline-block}
.mjx-svg-href {fill: blue; stroke: blue}
.MathJax_SVG_Processing {visibility: hidden; position: absolute; top: 0; left: 0; width: 0; height: 0; overflow: hidden; display: block!important}
.MathJax_SVG_Processed {display: none!important}
.MathJax_SVG_test {font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-transform: none; letter-spacing: normal; word-spacing: normal; overflow: hidden; height: 1px}
.MathJax_SVG_test.mjx-test-display {display: table!important}
.MathJax_SVG_test.mjx-test-inline {display: inline!important; margin-right: -1px}
.MathJax_SVG_test.mjx-test-default {display: block!important; clear: both}
.MathJax_SVG_ex_box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
.mjx-test-inline .MathJax_SVG_left_box {display: inline-block; width: 0; float: left}
.mjx-test-inline .MathJax_SVG_right_box {display: inline-block; width: 0; float: right}
.mjx-test-display .MathJax_SVG_right_box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
.MathJax_SVG .noError {vertical-align: ; font-size: 90%; text-align: left; color: black; padding: 1px 3px; border: 1px solid}
&lt;/style>&lt;/defs>&lt;g>&lt;ellipse cx="138" cy="434" rx="120" ry="90" fill="#fff2cc" stroke="#d6b656" pointer-events="all"/>&lt;ellipse cx="708" cy="439" rx="310" ry="165" fill="#dae8fc" stroke="#6c8ebf" pointer-events="all"/>&lt;rect x="118" y="358" width="40" height="20" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 368px; margin-left: 119px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 26px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; font-weight: bold; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-1-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1.875ex" height="1.848ex" viewBox="0 -730.1 807.5 795.5" role="img" focusable="false" style="vertical-align: -0.152ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M324 614Q291 576 250 573Q231 573 231 584Q231 589 232 592Q235 601 244 614T271 643T324 671T400 683H403Q462 683 481 610Q485 594 490 545T498 454L501 413Q504 413 551 442T648 509T705 561Q707 565 707 578Q707 610 682 614Q667 614 667 626Q667 641 695 662T755 683Q765 683 775 680T796 662T807 623Q807 596 792 572T713 499T530 376L505 361V356Q508 346 511 278T524 148T557 75Q569 69 580 69Q585 69 593 77Q624 108 660 110Q667 110 670 110T676 106T678 94Q668 59 624 30T510 0Q487 0 471 
9T445 32T430 71T422 117T417 173Q416 183 416 188Q413 214 411 244T407 286T405 299Q403 299 344 263T223 182T154 122Q152 118 152 105Q152 69 180 69Q183 69 187 66T191 60L192 58V56Q192 41 163 21T105 0Q94 0 84 3T63 21T52 60Q52 77 56 90T85 131T155 191Q197 223 259 263T362 327T402 352L391 489Q391 492 390 505T387 526T384 547T379 568T372 586T361 602T348 611Q346 612 341 613T333 614H324Z"/>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-1">\mathcal{X}&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="138" y="376" fill="#000000" font-family="Helvetica" font-size="26px" text-anchor="middle" font-weight="bold">\ma&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 158 422.47 L 494.79 396.63" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 500.77 396.17 L 493.1 400.77 L 494.79 396.63 L 492.49 392.79 Z" fill="#ff0000" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;path d="M 118 424 L 58 424 L 58 171.5 L 319.76 171.5" fill="none" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 325.76 171.5 L 317.76 175.5 L 319.76 171.5 L 317.76 167.5 Z" fill="#4d9900" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="118" y="404" width="40" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 424px; margin-left: 119px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: 
#000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-2-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.331ex" height="1.636ex" viewBox="0 -496.4 1003.8 704.4" role="img" focusable="false" style="vertical-align: -0.483ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M184 600Q184 624 203 642T247 661Q265 661 277 649T290 619Q290 596 270 577T226 557Q211 557 198 567T184 600ZM21 287Q21 295 30 318T54 369T98 420T158 442Q197 442 223 419T250 357Q250 340 236 301T196 196T154 83Q149 61 149 51Q149 26 166 26Q175 26 185 29T208 43T235 78T260 137Q263 149 265 151T282 153Q302 153 302 143Q302 135 293 112T268 61T223 11T161 -11Q129 -11 102 10T74 74Q74 91 79 106T122 220Q160 321 166 341T173 380Q173 404 156 404H154Q124 404 99 371T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" 
id="MathJax-Element-2">\boldsymbol{x}_i&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="138" y="430" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">\bol&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 158 476.19 L 494.81 513.1" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 500.78 513.76 L 492.39 516.86 L 494.81 513.1 L 493.26 508.91 Z" fill="#ff0000" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;path d="M 118 474 L 8 474 L 8 126.5 L 319.76 126.5" fill="none" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 325.76 126.5 L 317.76 130.5 L 319.76 126.5 L 317.76 122.5 Z" fill="#4d9900" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="118" y="454" width="40" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 474px; margin-left: 119px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-3-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.441ex" height="2.019ex" viewBox="0 -496.4 1051.2 869.2" role="img" focusable="false" 
style="vertical-align: -0.866ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M297 596Q297 627 318 644T361 661Q378 661 389 651T403 623Q403 595 384 576T340 557Q322 557 310 567T297 596ZM288 376Q288 405 262 405Q240 405 220 393T185 362T161 325T144 293L137 279Q135 278 121 278H107Q101 284 101 286T105 299Q126 348 164 391T252 441Q253 441 260 441T272 442Q296 441 316 432Q341 418 354 401T367 348V332L318 133Q267 -67 264 -75Q246 -125 194 -164T75 -204Q25 -204 7 -183T-12 -137Q-12 -110 7 -91T53 -71Q70 -71 82 -81T95 -112Q95 -148 63 -167Q69 -168 77 -168Q111 -168 139 -140T182 -74L193 -32Q204 11 219 72T251 197T278 308T289 365Q289 372 288 376Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-3">\boldsymbol{x}_j&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="138" y="480" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">\bol&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="678" y="284" width="40" height="20" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: 
left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 294px; margin-left: 679px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 26px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-4-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1.529ex" height="1.921ex" viewBox="0 -730.1 658.5 827.1" role="img" focusable="false" style="vertical-align: -0.225ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M25 633Q25 647 47 665T100 683Q291 683 291 306Q291 264 288 213T282 132L279 102Q281 102 308 126T378 191T464 279T545 381T596 479Q600 490 600 502Q600 527 581 550T523 577Q505 577 505 601Q505 622 516 647T542 681Q546 683 558 683Q605 679 631 645T658 559Q658 423 487 215Q409 126 308 37T190 -52Q177 -52 177 -28Q177 -26 183 15T196 127T203 270Q203 356 192 421T165 523T126 583T83 613T41 620Q25 620 25 633Z"/>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-4">\mathcal{V}&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="698" y="302" fill="#000000" font-family="Helvetica" font-size="26px" text-anchor="middle">\ma&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 578 401 L 779.84 427.91" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 785.78 428.7 L 777.33 431.61 L 779.84 427.91 L 778.38 423.68 Z" fill="#ff0000" 
stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="503" y="378" width="75" height="36" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 73px; height: 1px; padding-top: 396px; margin-left: 504px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-5-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="5.526ex" height="2.689ex" viewBox="0 -826 2379.3 1157.6" role="img" focusable="false" style="vertical-align: -0.77ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M409 688Q413 694 421 694H429H442Q448 688 448 686Q448 679 418 563Q411 535 404 504T392 458L388 442Q388 441 397 441T429 435T477 418Q521 397 550 357T579 260T548 151T471 65T374 11T279 -10H275L251 -105Q245 -128 238 -160Q230 -192 227 -198T215 -205H209Q189 -205 189 -198Q189 -193 211 -103L234 -11Q234 -10 226 -10Q221 -10 206 -8T161 6T107 36T62 89T43 171Q43 231 76 284T157 370T254 422T342 441Q347 441 348 445L378 567Q409 686 409 688ZM122 150Q122 116 134 91T167 53T203 35T237 27H244L337 404Q333 404 326 403T297 395T255 379T211 350T170 304Q152 276 137 237Q122 191 122 150ZM500 282Q500 320 484 347T444 385T405 400T381 404H378L332 217L284 29Q284 27 285 
27Q293 27 317 33T357 47Q400 66 431 100T475 170T494 234T500 282Z"/>&lt;g transform="translate(596,0)">&lt;path stroke-width="1" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"/>&lt;/g>&lt;g transform="translate(986,0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M184 600Q184 624 203 642T247 661Q265 661 277 649T290 619Q290 596 270 577T226 557Q211 557 198 567T184 600ZM21 287Q21 295 30 318T54 369T98 420T158 442Q197 442 223 419T250 357Q250 340 236 301T196 196T154 83Q149 61 149 51Q149 26 166 26Q175 26 185 29T208 43T235 78T260 137Q263 149 265 151T282 153Q302 153 302 143Q302 135 293 112T268 61T223 11T161 -11Q129 -11 102 10T74 74Q74 91 79 106T122 220Q160 321 166 341T173 380Q173 404 156 404H154Q124 404 99 371T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Z"/>&lt;/g>&lt;/g>&lt;g transform="translate(1989,0)">&lt;path stroke-width="1" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 
-247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-5">\phi(\boldsymbol{x}_i )&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="541" y="402" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">\phi(\bo&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 578 505.67 L 779.96 460.79" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 785.82 459.49 L 778.88 465.13 L 779.96 460.79 L 777.14 457.32 Z" fill="#ff0000" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="503" y="494" width="75" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 73px; height: 1px; padding-top: 514px; margin-left: 504px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-6-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="5.636ex" height="2.784ex" viewBox="0 -826 2426.7 1198.8" role="img" focusable="false" style="vertical-align: -0.866ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M409 
"/>&lt;/g>&lt;/svg>&lt;/span>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;/switch>&lt;/g>&lt;/g>&lt;/svg>&lt;/p>
&lt;p>&lt;em>[Figure: the kernel trick. The kernel function $k(\boldsymbol{x}_i, \boldsymbol{x}_j)$ evaluates the inner product $\langle \phi(\boldsymbol{x}_i), \phi(\boldsymbol{x}_j) \rangle_{\mathcal{V}}$ directly: (1) the explicit transformation $\phi$ is computationally expensive for high-dimensional feature vectors, whereas (2) computing the inner product via the kernel avoids the explicit mapping and is computationally cheaper.]&lt;/em>&lt;/p></description></item><item><title>SVM: Kernelized SVM</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernelized-svm/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernelized-svm/</guid><description>&lt;h2 id="svm-with-features">SVM (with features)&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Maximum margin principle&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Slack variables allow for margin violation
&lt;/p>
$$
\begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\|\mathbf{w}\|^{2} + C \sum_i^N \xi_i \\\\ \text { s.t. } \quad &amp; y_{i}\left(\mathbf{w}^{T} \color{red}{\phi(\mathbf{x}_{i})} + b\right) \geq 1 -\xi_i, \quad \xi_i \geq 0\end{array}
$$
&lt;/li>
&lt;/ul>
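For a fixed boundary $(\boldsymbol{w}, b)$ the slack variables have the closed form $\xi_i = \max(0, 1 - y_i(\boldsymbol{w}^T\boldsymbol{x}_i + b))$, and the primal objective can be evaluated directly. A minimal numpy sketch on made-up toy data (the weights here are assumed for illustration, not optimized):

```python
import numpy as np

# Toy 2-D data with labels in {-1, +1}; w, b are an assumed (not optimized) boundary
X = np.array([[2.0, 2.0], [1.5, 0.5], [-1.0, -1.0], [-0.2, 0.1]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0
C = 1.0

margins = y * (X @ w + b)              # y_i (w^T x_i + b)
xi = np.maximum(0.0, 1.0 - margins)    # slack: margin violation, if any
objective = w @ w + C * xi.sum()       # ||w||^2 + C * sum_i xi_i

print(xi)         # points with margin >= 1 get xi = 0
print(objective)
```

Only the last point (margin 0.1 &lt; 1) needs a nonzero slack; the others satisfy the margin constraint with $\xi_i = 0$.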
&lt;h2 id="math-basics">Math basics&lt;/h2>
&lt;p>Solve the constrained optimization problem using the &lt;strong>Method of Lagrange Multipliers&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Primal optimization problem&lt;/strong>:&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
\underset{\boldsymbol{x}}{\min} \quad &amp; f(\boldsymbol{x}) \\\\
\text { s.t. } \quad &amp; h_{i}(\boldsymbol{x}) \geq b_{i}, \text { for } i=1 \ldots K
\end{array}
$$
&lt;ul>
&lt;li>&lt;strong>Lagrangian optimization&lt;/strong>:&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
\underset{\boldsymbol{x}}{\min} \underset{\boldsymbol{\lambda}}{\max} \quad &amp; L(\boldsymbol{x}, \boldsymbol{\lambda}) = f(\boldsymbol{x}) - \sum_{i=1}^K \lambda_i(h_i(\boldsymbol{x}) - b_i) \\\\
\text{ s.t. } &amp;\lambda_i\geq 0, \quad i = 1\dots K
\end{array}
$$
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Dual optimization problem&lt;/strong>
&lt;/p>
$$
\begin{aligned}
\boldsymbol{\lambda}^{*}=\underset{\boldsymbol{\lambda}}{\arg \max } g(\boldsymbol{\lambda}), \quad &amp; g(\boldsymbol{\lambda})=\min_{\boldsymbol{x}} L(\boldsymbol{x}, \boldsymbol{\lambda}) \\\\
\text { s.t. } \quad \lambda_{i} \geq 0, &amp; \text { for } i=1 \ldots K
\end{aligned}
$$
&lt;ul>
&lt;li>$g$ : &lt;strong>dual function&lt;/strong> of the optimization problem&lt;/li>
&lt;li>Essentially, the min and max in the definition of $L$ are swapped&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Slater&amp;rsquo;s condition:&lt;/strong> For a &lt;strong>convex&lt;/strong> objective and &lt;strong>convex&lt;/strong> constraints, &lt;strong>solving the dual is equivalent to solving the primal&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>I.e., optimal primal parameters can be obtained from optimal dual parameters
$$
\boldsymbol{x}^* = \underset{\boldsymbol{x}}{\operatorname{argmin}}L(\boldsymbol{x}, \boldsymbol{\lambda}^*)
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
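The primal–dual relationship can be checked on a tiny convex problem where everything is computable by hand (this example is mine, not from the original notes): minimize $f(x)=x^2$ subject to $x \geq 1$. The dual function is $g(\lambda) = \min_x \left(x^2 - \lambda(x-1)\right) = -\lambda^2/4 + \lambda$, maximized at $\lambda^* = 2$, and $x^* = \operatorname{argmin}_x L(x, \lambda^*) = \lambda^*/2 = 1$ recovers the primal optimum, as Slater's condition promises. A short numerical sketch:

```python
import numpy as np

# Primal: min x^2  s.t.  x >= 1   (convex objective, convex constraint)
f = lambda x: x**2

def g(lam):
    # Dual function: min_x L(x, lam) with L = x^2 - lam*(x - 1).
    # Setting dL/dx = 2x - lam = 0 gives the inner minimizer x = lam/2.
    x = lam / 2.0
    return f(x) - lam * (x - 1.0)

lams = np.linspace(0.0, 5.0, 5001)       # grid search over lambda >= 0
lam_star = lams[np.argmax(g(lams))]      # dual optimum: lambda* = 2
x_star = lam_star / 2.0                  # primal optimum recovered: x* = 1

print(lam_star, x_star)                  # strong duality: g(lambda*) = f(x*) = 1
```

The maximized dual value equals the primal optimum, illustrating the "swap min and max" construction above.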
&lt;h2 id="dual-derivation-of-the-svm">Dual derivation of the SVM&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>SVM optimization:
&lt;/p>
$$
\begin{array}{ll}
\underset{\boldsymbol{w}}{\operatorname{argmin}} \quad &amp;\|\boldsymbol{w}\|^2 \\\\
\text{ s.t. } \quad &amp;y_i(\boldsymbol{w}^T\phi(\boldsymbol{x}_i) + b) \geq 1
\end{array}
$$
&lt;/li>
&lt;li>
&lt;p>Lagrangian function:
&lt;/p>
$$
L(\boldsymbol{w}, \boldsymbol{\alpha})=\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}-\sum_{i} \alpha_{i}\left(y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1\right)
$$
&lt;/li>
&lt;li>
&lt;p>Compute optimal $\boldsymbol{w}$
&lt;/p>
$$
\begin{align}
&amp;\frac{\partial L}{\partial \boldsymbol{w}} = \boldsymbol{w} - \sum_i \alpha_i y_i \phi(\boldsymbol{x}_i) \overset{!}{=} 0 \\\\
\Leftrightarrow \quad &amp; \color{CornflowerBlue}{\boldsymbol{w}^* = \sum_i \alpha_i y_i \phi(\boldsymbol{x}_i)}
\end{align}
$$
&lt;ul>
&lt;li>
&lt;p>Many of the $\alpha_i$ will be zero (the corresponding constraint is inactive, i.e. satisfied with strict inequality)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If $\alpha_i \neq 0 \overset{\text{complementary slackness}}{\Rightarrow} y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1 =0$&lt;/p>
&lt;p>$\Rightarrow \phi(\boldsymbol{x}_i)$ is a support vector&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The optimal weight vector $\boldsymbol{w}$ is a &lt;strong>linear combination of the support vectors&lt;/strong>! 👏&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Optimality condition for $b$:
&lt;/p>
$$
\frac{\partial L}{\partial b} = - \sum_i \alpha_i y_i \overset{!}{=} 0 \quad \Rightarrow \sum_i \alpha_i y_i = 0
$$
&lt;ul>
&lt;li>We do not obtain a solution for $b$&lt;/li>
&lt;li>But an additional condition for $\alpha$&lt;/li>
&lt;/ul>
&lt;p>$b$ can be computed from $w$:&lt;/p>
&lt;p>If $\alpha_i > 0$, then $\boldsymbol{x}_i$ is on the margin due to the complementary slackness condition. I.e.:
&lt;/p>
$$
\begin{align}y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1 &amp;= 0 \\\\y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right) &amp;= 1 \\\\ \underbrace{y_{i} y_{i}}_{=1}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right) &amp;= y_{i} \\\\ \Rightarrow b = y_{i} - \boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)\end{align}
$$
&lt;/li>
&lt;/ol>
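The representer property $\boldsymbol{w}^* = \sum_i \alpha_i y_i \phi(\boldsymbol{x}_i)$ and the bias formula $b = y_k - \boldsymbol{w}^T \phi(\boldsymbol{x}_k)$ can be verified with an off-the-shelf solver. A sketch using scikit-learn's linear SVC (assumed available; its `dual_coef_` attribute stores $y_i \alpha_i$ for the support vectors, and the toy data are made up):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two well-separated Gaussian blobs, labels in {-1, +1}
X = np.r_[rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]]
y = np.r_[np.ones(20), -np.ones(20)]

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds y_i * alpha_i, one entry per support vector
w_manual = (clf.dual_coef_ @ clf.support_vectors_).ravel()  # sum_i alpha_i y_i x_i
print(np.allclose(w_manual, clf.coef_.ravel()))             # True

# b = y_k - w^T x_k for a support vector on the margin (0 < alpha_k < C)
free = np.where(np.abs(clf.dual_coef_).ravel() < C - 1e-6)[0]
k = free[0]
b_manual = y[clf.support_[k]] - clf.support_vectors_[k] @ w_manual
print(b_manual, clf.intercept_[0])  # approximately equal (solver tolerance)
```

The optimal weight vector is exactly the linear combination of the support vectors, and any on-margin support vector yields the bias.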
&lt;h2 id="apply-kernel-tricks-for-svm">Apply kernel tricks for SVM&lt;/h2>
&lt;ul>
&lt;li>Lagrangian:&lt;/li>
&lt;/ul>
$$
L(\boldsymbol{w}, \boldsymbol{\alpha}) = {\color{red}{\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}}} - \sum_{i} \alpha_{i}\left({\color{green}{y_{i} (\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)}}+ b)-\color{CornflowerBlue}{1}\right), \quad \boldsymbol{w}^{*}=\sum_{i} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right)
$$
&lt;ul>
&lt;li>Dual function (&lt;strong>Wolfe Dual Lagrangian function&lt;/strong>):&lt;/li>
&lt;/ul>
$$
\begin{aligned}
g(\boldsymbol{\alpha}) &amp;=L\left(\boldsymbol{w}^{*}, \boldsymbol{\alpha}\right) \\\\
&amp;=\color{red}{\frac{1}{2} \underbrace{\sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \phi\left(\boldsymbol{x}_{i}\right)^{T} \phi\left(\boldsymbol{x}_{j}\right)}_{{\boldsymbol{w}^*}^T \boldsymbol{w}^*}} - \color{green}{\sum_{i} \alpha_{i} y_{i}(\underbrace{\sum_{j} \alpha_{j} y_{j} \phi\left(x_{j}\right)}_{\boldsymbol{w}^*})^{T} \phi\left(x_{i}\right)} + \color{CornflowerBlue}{\sum_{i} \alpha_{i}} \\\\
&amp;=\sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \underbrace{\phi\left(\boldsymbol{x}_{i}\right)^{T} \phi\left(\boldsymbol{x}_{j}\right)}_{\overset{}{=} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j)} \\\\
&amp;= \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j )
\end{aligned}
$$
&lt;ul>
&lt;li>&lt;strong>Wolfe dual optimization problem&lt;/strong>:&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
\underset{\boldsymbol{\alpha}}{\max} \quad &amp; \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j ) \\\\
\text{ s.t. } \quad &amp; \alpha_i \geq 0 \quad \forall i = 1, \dots, N \\\\
&amp; \sum_i \alpha_i y_i = 0
\end{array}
$$
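The Wolfe dual is a quadratic program in $\boldsymbol{\alpha}$ and can be solved with a generic solver. A sketch using `scipy.optimize.minimize` (SLSQP): we minimize the negated dual under the constraints above, and also include the soft-margin upper bound $C \geq \alpha_i$ from the slack-variable formulation so the QP stays bounded even for non-separable data. The toy data are made up; a real implementation would use a dedicated QP/SMO solver:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.RandomState(1)
X = np.r_[rng.randn(10, 2) + 1.5, rng.randn(10, 2) - 1.5]
y = np.r_[np.ones(10), -np.ones(10)]
N, C = len(y), 1.0

K = X @ X.T                       # linear kernel Gram matrix: k(x_i, x_j) = x_i^T x_j

def neg_dual(a):                  # minimize -g(alpha)  <=>  maximize the Wolfe dual
    return 0.5 * (a * y) @ K @ (a * y) - a.sum()

res = minimize(neg_dual, np.zeros(N), method="SLSQP",
               bounds=[(0.0, C)] * N,                               # 0 <= alpha_i <= C
               constraints={"type": "eq", "fun": lambda a: a @ y})  # sum_i alpha_i y_i = 0
alpha = res.x
print(res.success, int((alpha > 1e-6).sum()))   # converged; number of support vectors
```

Only a few $\alpha_i$ end up nonzero; those index the support vectors.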
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Compute primal from dual parameters&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Weight vector&lt;/strong>
&lt;/p>
$$
\boldsymbol{w}^{*}=\sum_{i} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right)
\label{eq:weight vector}
$$
&lt;ul>
&lt;li>Cannot be represented explicitly (it is potentially infinite-dimensional). But don&amp;rsquo;t worry, we don&amp;rsquo;t need the explicit representation&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bias&lt;/strong>: For any $k$ with $\alpha_k > 0$ :&lt;/p>
&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
b &amp;=y_{k}-\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{k}\right) \\\\
&amp;=y_{k}-\sum_{i} y_{i} \alpha_{i} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{k}\right)
\end{array}
$$
&lt;ul>
&lt;li>&lt;strong>Decision function&lt;/strong> (Again, we use the kernel trick and therefore we don&amp;rsquo;t need the explicit representation of the weight vector $\boldsymbol{w}^*$)&lt;/li>
&lt;/ul>
$$
\begin{aligned}f(\boldsymbol{x}) &amp;= (\boldsymbol{w}^{*})^{T} \boldsymbol{\phi}(\boldsymbol{x}) + b \\\\
&amp;\overset{}{=} \left(\sum_{i} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right)\right)^{T} \boldsymbol{\phi}(\boldsymbol{x}) + b \\\\
&amp;= \sum_{i} \alpha_{i} y_{i} \boldsymbol{\phi}(\boldsymbol{x}_i)^{T} \boldsymbol{\phi}(\boldsymbol{x}) + b \\\\
&amp; \overset{}{=}\sum_i y_{i} \alpha_{i} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}\right)+b\end{aligned}
$$
&lt;/li>
&lt;/ul>
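The decision function above needs only kernel evaluations against the support vectors, never the explicit $\boldsymbol{w}^*$. A sketch with an RBF kernel, cross-checked against scikit-learn's `decision_function` (assumed available; `dual_coef_` stores $y_i \alpha_i$ and `intercept_` stores $b$; the dataset choice is arbitrary):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y01 = make_moons(n_samples=100, noise=0.15, random_state=0)
y = 2 * y01 - 1                                   # labels in {-1, +1}

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

def rbf(A, B):
    # k(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# f(x) = sum_i alpha_i y_i k(x_i, x) + b, summing over support vectors only
K = rbf(clf.support_vectors_, X)
f_manual = clf.dual_coef_ @ K + clf.intercept_    # shape (1, n_samples)

print(np.allclose(f_manual.ravel(), clf.decision_function(X)))  # True
```

The manual kernel sum reproduces the library's decision values exactly, even though $\phi$ for the RBF kernel is infinite-dimensional.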
&lt;h2 id="relaxed-constraints-with-slack-variable">Relaxed constraints with slack variable&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Primal optimization problem&lt;/strong>
&lt;/p>
$$
\begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\|\mathbf{w}\|^{2} + \color{CornflowerBlue}{C \sum_i^N \xi_i} \\\\
\text { s.t. } \quad &amp; y_{i}\left(\mathbf{w}^{T} \phi(\mathbf{x}_{i}) + b\right) \geq 1 - \color{CornflowerBlue}{\xi_i}, \quad \color{CornflowerBlue}{\xi_i} \geq 0\end{array}
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Dual optimization problem&lt;/strong>
&lt;/p>
$$
\begin{array}{ll}\underset{\boldsymbol{\alpha}}{\max} \quad &amp; \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j ) \\\\ \text{ s.t. } \quad &amp; \color{CornflowerBlue}{C \geq} \alpha_i \geq 0 \quad \forall i = 1, \dots, N \\\\ &amp; \sum_i \alpha_i y_i = 0\end{array}
$$
&lt;p>&lt;span style="color:CornflowerBlue">Add upper bound of &lt;/span> $\color{CornflowerBlue}{C}$ &lt;span style="color:CornflowerBlue">on&lt;/span> $\color{CornflowerBlue}{\alpha_i}$&lt;/p>
&lt;ul>
&lt;li>Without slack, $\alpha_i \to \infty$ when a constraint is violated (i.e., a point is misclassified)&lt;/li>
&lt;li>The upper bound $C$ limits the $\alpha_i$, so misclassifications are allowed&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Decision Trees</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/decision-tree/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/decision-tree/</guid><description/></item><item><title>Ensemble Learning</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/</guid><description/></item><item><title>Why ensemble learning?</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/why-ensemble-learning/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/why-ensemble-learning/</guid><description>&lt;p>&lt;strong>wisdom of the crowd&lt;/strong> : In many cases you will find that this aggregated answer is better than an expert’s answer.&lt;/p>
&lt;p>Similarly, if you aggregate the predictions of a group of predictors (such as classifiers or regressors), you will often get &lt;strong>better&lt;/strong> predictions than with the best individual predictor.&lt;/p>
&lt;p>A group of predictors is called an &lt;strong>ensemble&lt;/strong>;&lt;/p>
&lt;p>thus, this technique is called &lt;strong>Ensemble Learning&lt;/strong>,&lt;/p>
&lt;p>and an Ensemble Learning algorithm is called an &lt;strong>Ensemble method&lt;/strong>.&lt;/p>
&lt;p>Popular Ensemble methods:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/bagging-and-pasting/">Bagging and Pasting&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/boosting/">Boosting&lt;/a>&lt;/li>
&lt;li>Stacking&lt;/li>
&lt;li>&lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/voting-classifier/">Voting Classifier&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Voting Classifier</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/voting-classifier/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/voting-classifier/</guid><description>&lt;p>Suppose we have trained a few classifiers, each one achieving about 80% accuracy.&lt;/p>
&lt;p>A very simple way to create an even better classifier is to aggregate the predictions of each classifier and predict the class that gets the &lt;strong>most&lt;/strong> votes.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/Voting_Classifier.png" alt="Voting_Classifier" style="zoom:67%;" />
&lt;p>This majority-vote classifier is called a &lt;strong>hard voting classifier&lt;/strong>.&lt;/p>
&lt;blockquote>
&lt;p>Surprisingly, this voting classifier often achieves a higher accuracy than the best classifier in the ensemble. In fact, even if each classifier is a weak learner (meaning it does only slightly better than random guessing), the ensemble can still be a strong learner (achieving high accuracy), provided there are a sufficient number of weak learners and they are sufficiently diverse. (Reason behind: the law of large numbers)&lt;/p>
&lt;/blockquote>
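&lt;p>A quick simulation, under the assumption of fully independent classifiers, illustrates the law-of-large-numbers effect: 1000 weak learners that are each right only 51% of the time yield a far more accurate majority vote (all numbers below are illustrative):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

n_classifiers = 1000  # number of weak learners in the ensemble
p_correct = 0.51      # each one is only slightly better than random guessing
n_trials = 5_000      # number of test instances

# 1 where a classifier votes correctly, 0 otherwise (independent votes)
correct_votes = rng.binomial(1, p_correct, size=(n_trials, n_classifiers))

# Hard voting: the ensemble is right when the majority of votes are right
majority_correct = correct_votes.sum(axis=1) > n_classifiers / 2
accuracy = majority_correct.mean()
print(accuracy)  # roughly 0.73, far above the individual 0.51
```

&lt;p>In practice real classifiers are trained on the same data and make correlated errors, so the gain is smaller than this idealized simulation suggests.&lt;/p>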
&lt;p>&lt;strong>Ensemble methods work best when the predictors are as independent from one another as possible.&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>One way to get diverse classifiers is to &lt;strong>train them using very different algorithms.&lt;/strong> This increases the chance that they will make very different types of errors, improving the ensemble’s accuracy.&lt;/li>
&lt;li>Another approach is to use the &lt;strong>same&lt;/strong> training algorithm for every predictor, but to train them on different random subsets of the training set. (See &lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/bagging-and-pasting/">Bagging and Pasting&lt;/a>)&lt;/li>
&lt;/ul></description></item><item><title>Random Forest</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/random-forest/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/random-forest/</guid><description>&lt;img src="https://i.stack.imgur.com/iY55n.jpg" style="zoom:80%; background-color:white">
&lt;p>Train a group of Decision Tree classifiers (generally via the bagging method (or sometimes pasting)), each on a different random subset of the training set&lt;/p>
&lt;p>To make predictions, just obtain the predictions of all individual trees, then predict the class that gets the &lt;strong>most&lt;/strong> votes.&lt;/p>
&lt;h2 id="why-is-random-forest-good">Why is Random Forest good?&lt;/h2>
&lt;p>The Random Forest algorithm &lt;strong>introduces extra randomness&lt;/strong> when growing trees; instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features. &lt;strong>This results in a greater tree diversity, which (once again) trades a higher bias for a lower variance, generally yielding an overall better model.&lt;/strong> 👏&lt;/p></description></item><item><title>Ensemble Learners</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/ensemble-learners/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/ensemble-learners/</guid><description>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/Un9zObFjBH0?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h2 id="why-emsemble-learners">Why emsemble learners?&lt;/h2>
&lt;p>Lower error&lt;/p>
&lt;ul>
&lt;li>Each learner (model) has its own bias. If we put them together, the biases tend to be reduced (they counteract each other to some extent)&lt;/li>
&lt;li>Less overfitting&lt;/li>
&lt;li>Tastes great&lt;/li>
&lt;/ul></description></item><item><title>Boosting</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/boosting/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/boosting/</guid><description>&lt;h1 id="boosting">Boosting&lt;/h1>
&lt;p>Refers to any Ensemble method that can &lt;strong>combine several weak learners into a strong learner&lt;/strong>&lt;/p>
&lt;p>💡 &lt;strong>General idea: train predictors sequentially, each trying to correct its predecessor.&lt;/strong>&lt;/p>
&lt;p>Popular boosting methods:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/adaboost/">AdaBoost&lt;/a>&lt;/li>
&lt;li>Gradient Boost&lt;/li>
&lt;/ul></description></item><item><title>Bagging and Pasting</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/bagging-and-pasting/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/bagging-and-pasting/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Bootstrap Aggregating (Bagging): Sampling &lt;strong>with&lt;/strong> replacement&lt;/p>
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/Boostrap_Aggregating.png" alt="Boostrap_Aggregating" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pasting: Sampling &lt;strong>without&lt;/strong> replacement&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="explaination">Explaination&lt;/h2>
&lt;p>Ensemble methods work best when the predictors are as independent from one another as possible.&lt;/p>
&lt;p>One way to get a diverse set of classifiers: &lt;strong>use the same training algorithm for every predictor, but to train them on different random subsets of the training set&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Sampling &lt;strong>with&lt;/strong> replacement: &lt;strong>bootstrap aggregating (Bagging)&lt;/strong>&lt;/li>
&lt;li>Sampling &lt;strong>without&lt;/strong> replacement: &lt;strong>pasting&lt;/strong>&lt;/li>
&lt;/ul>
&lt;p>Once all predictors are trained, the ensemble can make a prediction for a new instance by simply aggregating the predictions of all predictors. The aggregation function is typically the &lt;strong>statistical mode&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>classification: the most frequent prediction (just like a hard voting classifier)&lt;/li>
&lt;li>regression: average&lt;/li>
&lt;/ul>
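&lt;p>A tiny sketch of both aggregation rules, with made-up predictions from five hypothetical predictors:&lt;/p>

```python
import numpy as np
from collections import Counter

# Hypothetical predictions from 5 predictors for one new instance
class_preds = ["cat", "dog", "cat", "cat", "dog"]
reg_preds = np.array([2.1, 1.9, 2.3, 2.0, 2.2])

# Classification: statistical mode (most frequent prediction)
mode_pred = Counter(class_preds).most_common(1)[0][0]
print(mode_pred)         # "cat"

# Regression: average of the individual predictions
print(reg_preds.mean())  # 2.1
```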
&lt;p>Each individual predictor has a higher bias than if it were trained on the original training set, but aggregation reduces both bias and variance. 👏&lt;/p>
&lt;p>Generally, the net result is that the ensemble has a &lt;strong>similar bias but a lower variance&lt;/strong> than a single predictor trained on the original training set.&lt;/p>
&lt;h2 id="advantages-of-bagging-and-pasting">Advantages of Bagging and Pasting&lt;/h2>
&lt;ul>
&lt;li>Predictors can all be trained in parallel, via different CPU cores or even different servers.&lt;/li>
&lt;li>Predictions can be made in parallel.&lt;/li>
&lt;/ul>
&lt;p>-&amp;gt; They scale very well 👍&lt;/p>
&lt;h2 id="bagging-vs-pasting">Bagging vs. Pasting&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Bootstrapping introduces a bit more diversity in the subsets that each predictor is trained on, so bagging ends up with a &lt;strong>slightly&lt;/strong> &lt;strong>higher bias&lt;/strong> than pasting, but this also means that predictors end up being &lt;strong>less correlated&lt;/strong> so the ensemble’s variance is reduced.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Overall, bagging often results in better models&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>However, if you have spare time and CPU power, you can use cross-validation to evaluate both bagging and pasting and select the one that works best.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="out-of-bag-evaluation">Out-of-Bag Evaluation&lt;/h2>
&lt;p>With bagging, some instances may be sampled several times for any given predictor, while others may not be sampled at all. This means that only about 63% of the training instances are sampled on average for each predictor.&lt;/p>
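&lt;p>The 63% figure can be checked with a short simulation (the training-set size below is an arbitrary choice): the fraction of distinct instances in a bootstrap sample of size $n$ approaches $1 - e^{-1} \approx 0.632$ as $n$ grows.&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # training set size

# One bootstrap sample: draw n instances uniformly with replacement
sample = rng.integers(0, n, size=n)
frac_sampled = np.unique(sample).size / n

print(frac_sampled)      # about 0.632
print(1 - np.exp(-1.0))  # theoretical limit 1 - 1/e = 0.6321...
```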
&lt;p>The remaining 37% of the training instances that are not sampled are called &lt;strong>out-of-bag (oob) instances.&lt;/strong> Note that they are &lt;strong>not the same 37%&lt;/strong> for all predictors.&lt;/p>
&lt;p>Since a predictor never sees the oob instances during training, it can be evaluated on these instances, without the need for a separate validation set. You can evaluate the ensemble itself by averaging out the oob evaluations of each predictor.&lt;/p></description></item><item><title>AdaBoost</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/adaboost/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ensemble-learning/adaboost/</guid><description>&lt;p>&lt;strong>Ada&lt;/strong>ptive &lt;strong>Boost&lt;/strong>ing:&lt;/p>
&lt;p>Correct its predecessor by paying a bit more attention to the training instance that the predecessor underfitted. This results in new predictors focusing more and more on the hard cases.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost.png" alt="AdaBoost" style="zoom:80%;" />
&lt;h2 id="pseudocode">Pseudocode&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Assign each observation $i$ the initial weight $d\_{1,i}=\frac{1}{n}$ (equal weights)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For $t=1:T$&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Train the weak learning algorithm using data weighted by $d\_{t,i}$. This produces weak classifier $h\_t$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Choose coefficient $\alpha\_t$ (tells us how good the classifier is at that round)&lt;/p>
&lt;/li>
&lt;/ol>
$$
\begin{aligned}
\operatorname{Error}\_{t} &amp;= \displaystyle\sum\_{i: h\_{t}\left(x\_{i}\right) \neq y\_{i}} d\_{t,i} \quad \text{(sum of weights of misclassified points)} \\\\
\alpha\_t &amp;= \frac{1}{2} \ln \left(\frac{1 - \operatorname{Error}\_{t}}{\operatorname{Error}\_{t}}\right)
\end{aligned}
$$
&lt;ol start="3">
&lt;li>
&lt;p>Update weights
&lt;/p>
$$
d\_{t+1, i}=\frac{d\_{t, i} \cdot \exp (-\alpha\_{t} y\_{i} h\_{t}\left(x\_{i}\right))}{Z\_{t}}
$$
&lt;ul>
&lt;li>
&lt;p>$Z\_t = \displaystyle \sum\_{i=1}^{n} d\_{t, i} \cdot \exp (-\alpha\_{t} y\_{i} h\_{t}\left(x\_{i}\right)) $: &lt;strong>normalization factor&lt;/strong> (ensures that the updated weights $d\_{t+1, i}$ sum to 1)&lt;/p>
&lt;blockquote>
&lt;ul>
&lt;li>If prediction $i$ is correct $\rightarrow y\_i h\_t(x\_i) = 1 \rightarrow $ Weight of observation $i$ will be decreased by $\exp(-\alpha\_t)$&lt;/li>
&lt;li>If prediction $i$ is incorrect $ \rightarrow y\_i h\_t(x\_i) = -1 \rightarrow $ Weight of observation $i$ will be increased by $\exp(\alpha\_t)$&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ol>
&lt;/li>
&lt;li>
&lt;p>Output the final classifier&lt;/p>
&lt;p>$
H(x)=\operatorname{sign}\left(\sum\_{t=1}^{T} \alpha\_{t} h\_{t}\left(x\right)\right)
$&lt;/p>
&lt;/li>
&lt;/ol>
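&lt;p>The pseudocode above can be sketched in NumPy with decision stumps as weak learners. The toy 1-D dataset, the stump candidate set, and $T=5$ rounds are all illustrative assumptions:&lt;/p>

```python
import numpy as np

# Toy 1-D dataset: class +1 on the outside, class -1 in the middle
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1, 1, -1, -1, 1, 1])
n = len(x)

def stump(threshold, polarity):
    """Weak learner: predict `polarity` where x is at least `threshold`."""
    return lambda xs: np.where(xs >= threshold, polarity, -polarity)

# Candidate weak learners: thresholds halfway between points (plus the ends)
candidates = [stump(t, p) for t in np.arange(-0.5, 6.5) for p in (1, -1)]

d = np.full(n, 1.0 / n)  # step 1: equal initial weights d_{1,i} = 1/n
learners, alphas = [], []

for t in range(5):  # T = 5 rounds
    # 2.1: train = pick the stump with the lowest weighted error
    errors = [d[h(x) != y].sum() for h in candidates]
    best = int(np.argmin(errors))
    h = candidates[best]
    err = max(errors[best], 1e-10)  # guard against division by zero
    # 2.2: alpha_t = (1/2) * ln((1 - Error_t) / Error_t)
    alpha = 0.5 * np.log((1.0 - err) / err)
    # 2.3: re-weight, emphasizing misclassified points, then normalize by Z_t
    d = d * np.exp(-alpha * y * h(x))
    d = d / d.sum()
    learners.append(h)
    alphas.append(alpha)

def H(xs):
    """Final classifier: sign of the alpha-weighted vote of weak learners."""
    return np.sign(sum(a * h(xs) for a, h in zip(alphas, learners)))

print(H(x))  # recovers y on this toy set
```

&lt;p>No single stump can separate this "interval" pattern, but the weighted vote of a few stumps can.&lt;/p>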
&lt;h2 id="example">Example&lt;/h2>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-00.png" alt="AdaBoost_Eg-00" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-01.png" alt="AdaBoost_Eg-01" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-02.png" alt="AdaBoost_Eg-02" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-03.png" alt="AdaBoost_Eg-03" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-04.png" alt="AdaBoost_Eg-04" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-05.png" alt="AdaBoost_Eg-05" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-06.png" alt="AdaBoost_Eg-06" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/AdaBoost_Eg-07.png" alt="AdaBoost_Eg-07" style="zoom:50%;" />
&lt;h2 id="tutorial">Tutorial&lt;/h2>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/-DUxtdeCiB4?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div></description></item><item><title>Non-parametric Machine Learning Alogrithms</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/</guid><description/></item><item><title>Linear Discriminant Functions</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/linear-discriminant-functions/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/linear-discriminant-functions/</guid><description>&lt;ul>
&lt;li>No assumption about distributions -&amp;gt; &lt;strong>non-parametric&lt;/strong>&lt;/li>
&lt;li>Linear decision surfaces&lt;/li>
&lt;li>Begin by supervised training (given class of training data)&lt;/li>
&lt;/ul>
&lt;h2 id="linear-discriminant-functions-and-decision-surfaces">Linear Discriminant Functions and Decision Surfaces&lt;/h2>
&lt;p>A discriminant function that is a linear combination of the components of $x$ can be written as
&lt;/p>
$$
g(\mathbf{x})=\mathbf{w}^{T} \mathbf{x}+w\_{0}
$$
&lt;ul>
&lt;li>$\mathbf{x}$: feature vector&lt;/li>
&lt;li>$\mathbf{w}$: weight vector&lt;/li>
&lt;li>$w\_0$: bias or threshold weight&lt;/li>
&lt;/ul>
&lt;h3 id="the-two-category-case">The two category case&lt;/h3>
&lt;p>Decision rule:&lt;/p>
&lt;ul>
&lt;li>Decide $w\_1$ if $g(\mathbf{x}) > 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}+w\_{0} > 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}> -w\_{0}$&lt;/li>
&lt;li>Decide $w\_{2}$ if $g(\mathbf{x}) &lt; 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}+w\_{0} &lt; 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}&lt;-w\_{0}$&lt;/li>
&lt;li>$g(\mathbf{x}) = 0$: assign to either class or can be left undefined&lt;/li>
&lt;/ul>
&lt;p>The equation $g(\mathbf{x}) = 0$ defines the decision surface that separates points assigned to $w\_{1}$ from points assigned to $w\_{2}$. When $g(\mathbf{x})$ is linear, this decision surface is a &lt;strong>hyperplane&lt;/strong>.&lt;/p>
&lt;p>For arbitrary $\mathbf{x}\_1$ and $\mathbf{x}\_2$ on the decision surface, we have:
&lt;/p>
$$
\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{1}+w\_{0}=\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{2}+w\_{0}
$$
$$
\mathbf{w}^{\mathrm{T}}\left(\mathbf{x}\_{1}-\mathbf{x}\_{2}\right)=0
$$
&lt;p>$\Rightarrow \mathbf{w}$ is &lt;strong>normal&lt;/strong> to any vector lying in the hyperplane.&lt;/p>
&lt;p>In general, the hyperplane $H$ divides the feature space into two half-spaces:&lt;/p>
&lt;ul>
&lt;li>decision region $R\_1$ for $w\_1$&lt;/li>
&lt;li>decision region $R\_2$ for $w\_2$&lt;/li>
&lt;/ul>
&lt;p>Because $g(\mathbf{x}) > 0$ if $\mathbf{x}$ in $R\_1$, it follows that the normal vector $\mathbf{w}$ points into $R\_1$. Therefore, it is sometimes said that any $\mathbf{x}$ in $R\_1$ is on the &lt;em>positive&lt;/em> side of $H$, and any $\mathbf{x}$ in $R\_2$ is on the &lt;em>negative&lt;/em> side of $H$&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image015.jpg" alt="img">&lt;/p>
&lt;p>The discriminant function $g(\mathbf{x})$ gives an algebraic measure of the distance from $\mathbf{x}$ to the hyperplane. We can write $\mathbf{x}$ as
&lt;/p>
$$
\mathbf{x}=\mathbf{x}\_{p}+r \frac{\mathbf{w}}{\|\mathbf{w}\|}
$$
&lt;ul>
&lt;li>$\mathbf{x}\_{p}$: normal projection of $\mathbf{x}$ onto $H$&lt;/li>
&lt;li>$r$: desired algebraic distance which is positive if $\mathbf{x}$ is on the positive side, else negative&lt;/li>
&lt;/ul>
&lt;p>As $\mathbf{x}\_p$ is on the hyperplane&lt;/p>
$$
\begin{array}{ll}
g\left(\mathbf{x}\_{p}\right)=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{p}+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}}\left(\mathbf{x}-r \frac{\mathbf{w}}{\|\mathbf{w}\|}\right)+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}-r \frac{\mathbf{w}^{\mathrm{T}} \mathbf{w}}{\|\mathbf{w}\|}+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}-r\|\mathbf{w}\| + w\_0 = 0 \\\\
\underbrace{\mathbf{w}^{\mathrm{T}} \mathbf{x} + w\_0}\_{=g(\mathbf{x})} = r\|\mathbf{w}\| \\\\
\Rightarrow g(\mathbf{x}) = r\|\mathbf{w}\| \\\\
\Rightarrow r = \frac{g(\mathbf{x})}{\|\mathbf{w}\|}
\end{array}
$$
&lt;p>In particular, the distance from the origin to hyperplane $H$ is given by $\frac{w\_0}{\|\mathbf{w}\|}$&lt;/p>
&lt;ul>
&lt;li>$w\_0 > 0$: the origin is on the &lt;em>positive&lt;/em> side of $H$&lt;/li>
&lt;li>$w\_0 &lt; 0$: the origin is on the &lt;em>negative&lt;/em> side of $H$&lt;/li>
&lt;li>$w\_0 = 0$: $g(\mathbf{x})$ has the homogeneous form $\mathbf{w}^{\mathrm{T}} \mathbf{x}$ and the hyperplane passes through the origin&lt;/li>
&lt;/ul>
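&lt;p>A small numerical check of the derivation above, with a hypothetical hyperplane in 2-D (the values of $\mathbf{w}$, $w\_0$, and $\mathbf{x}$ are arbitrary):&lt;/p>

```python
import numpy as np

# Hypothetical hyperplane g(x) = w^T x + w0 in 2-D
w = np.array([3.0, 4.0])  # normal vector, with norm 5
w0 = -10.0                # bias / threshold weight

def g(x):
    return np.dot(w, x) + w0

x = np.array([4.0, 3.0])
r = g(x) / np.linalg.norm(w)         # signed distance from x to H
x_p = x - r * w / np.linalg.norm(w)  # normal projection of x onto H

print(r)       # 2.8  (positive: x is on the positive side)
print(g(x_p))  # 0.0  (x_p lies on the hyperplane)
```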
&lt;p>A linear discriminant function divides the feature space by a hyperplane decision surface:&lt;/p>
&lt;ul>
&lt;li>orientation: determined by the normal vector $\mathbf{w}$&lt;/li>
&lt;li>location: determined by the bias $w\_0$&lt;/li>
&lt;/ul>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.byclb.com/TR/Tutorials/neural_networks/ch9_1.htm">https://www.byclb.com/TR/Tutorials/neural_networks/ch9_1.htm&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Linear Discriminant Analysis (LDA)</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/</guid><description>&lt;p>&lt;strong>Linear Discriminant Analysis (LDA)&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>also called &lt;strong>Fisher’s Linear Discriminant&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>reduces dimension (like PCA)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>but focuses on &lt;strong>maximizing separability among known categories&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="-idea">💡 Idea&lt;/h2>
&lt;ol>
&lt;li>Create a new axis&lt;/li>
&lt;li>Project the data onto this new axis in a way to maximize the separation of two categories&lt;/li>
&lt;/ol>
&lt;h2 id="how-it-works">How it works?&lt;/h2>
&lt;h3 id="create-a-new-axis">Create a new axis&lt;/h3>
&lt;p>According to two criteria (considered simultaneously):&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Maximize the distance between means&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Minimize the variation $s^2$ (which LDA calls &amp;ldquo;scatter&amp;rdquo;) within each category&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.11.22.png" alt="截屏2020-05-14 15.11.22" style="zoom:50%;" />
&lt;/li>
&lt;/ul>
&lt;p>We have:
&lt;/p>
$$
\frac{(\overbrace{\mu_1 - \mu_2}^{=: d})^2}{s_1^2 + s_2^2} \qquad\left(\frac{\text{"ideally large"}}{\text{"ideally small"}}\right)
$$
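&lt;p>This criterion can be evaluated for any candidate axis. A sketch with two synthetic 2-D categories (all data below are made up for illustration) shows that an axis aligned with the mean difference scores much higher than one orthogonal to it:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical 2-D categories, separated along the first feature
c1 = rng.normal([0.0, 0.0], [1.0, 0.3], size=(100, 2))
c2 = rng.normal([4.0, 0.0], [1.0, 0.3], size=(100, 2))

def fisher_score(axis, a, b):
    """(mu_1 - mu_2)^2 / (s_1^2 + s_2^2) for data projected onto `axis`."""
    axis = axis / np.linalg.norm(axis)
    p1, p2 = a @ axis, b @ axis
    return (p1.mean() - p2.mean()) ** 2 / (p1.var() + p2.var())

# The axis along the mean difference scores far higher than the other axis
print(fisher_score(np.array([1.0, 0.0]), c1, c2))  # large
print(fisher_score(np.array([0.0, 1.0]), c1, c2))  # near zero
```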
&lt;p>
&lt;strong>Why both distance and scatter are important?&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-05-14%2015.17.59.png" alt="截屏2020-05-14 15.17.59">&lt;/p>
&lt;h4 id="more-than-2-dimensions">More than 2 dimensions&lt;/h4>
&lt;p>The process is the &lt;strong>same&lt;/strong> 👏:&lt;/p>
&lt;p>Create an axis that maximizes the distance between the means for the two categories while minimizing the scatter&lt;/p>
&lt;h4 id="more-than-2-categories-eg-3-categories">More than 2 categories (e.g. 3 categories)&lt;/h4>
&lt;p>Little difference:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Measure the distances among the means&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Find the point that is &lt;strong>central&lt;/strong> to all of the data&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Then measure the distances between a point that is central in each category and the main central point&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.26.35.png" alt="截屏2020-05-14 15.26.35" style="zoom:50%;" />
&lt;/li>
&lt;li>
&lt;p>Maximize the distance between each category and the central point while minimizing the scatter for each category&lt;/p>
&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.28.40.png" alt="截屏2020-05-14 15.28.40" style="zoom:50%;" />
&lt;/li>
&lt;li>
&lt;p>Create 2 axes to separate the data (because the 3 central points for each category define a plane)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.30.16.png" alt="截屏2020-05-14 15.30.16" style="zoom:50%;" />
&lt;/li>
&lt;/ul>
&lt;h2 id="lda-and-pca">LDA and PCA&lt;/h2>
&lt;h3 id="similarities">Similarities&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Both rank the new axes in order of importance&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>PC1 (the first new axis that PCA creates) accounts for the most variation in the data
&lt;ul>
&lt;li>PC2 (the second new axis) does the second best job&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>LD1 (the first new axis that LDA creates) accounts for the most variation between the categories
&lt;ul>
&lt;li>LD2 does the second best job&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Both can let you dig in and see which features are driving the new axes&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Both try to reduce dimensions&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>PCA looks at the features with the most variation&lt;/li>
&lt;li>LDA tries to maximize the separation of known categories&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=azXCzI57Yfc">https://www.youtube.com/watch?v=azXCzI57Yfc&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Unsupervised Learning</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/</guid><description/></item><item><title>Gaussian Mixture Model</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/gaussian-mixture-model/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/gaussian-mixture-model/</guid><description>&lt;h2 id="gaussian-distribution">Gaussian Distribution&lt;/h2>
&lt;p>&lt;strong>Univariate&lt;/strong>: The Probability Density Function (PDF) is:
&lt;/p>
$$
P(x | \theta)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right)
$$
&lt;ul>
&lt;li>$\mu$: mean&lt;/li>
&lt;li>$\sigma$: standard deviation&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/gaussians.png" alt="gaussian mixture models">&lt;/p>
&lt;p>&lt;strong>Multivariate&lt;/strong>: The Probability Density Function (PDF) is:
&lt;/p>
$$
P(x | \theta)=\frac{1}{(2 \pi)^{\frac{D}{2}}|\Sigma|^{\frac{1}{2}}} \exp \left(-\frac{(x-\mu)^{T} \Sigma^{-1}(x-\mu)}{2}\right)
$$
&lt;ul>
&lt;li>$\mu$: mean&lt;/li>
&lt;li>$\Sigma$: covariance&lt;/li>
&lt;li>$D$: dimension of data&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/gaussians-3d-300x224.png" alt="gaussian mixture models">&lt;/p>
&lt;h3 id="learning">Learning&lt;/h3>
&lt;p>For univariate Gaussian model, we can use Maximum Likelihood Estimation (MLE) to estimate parameter $\theta$ :
&lt;/p>
$$
\theta= \underset{\theta}{\operatorname{argmax}} L(\theta)
$$
&lt;p>
Assuming data are i.i.d, we have:
&lt;/p>
$$
L(\theta)=\prod\_{j=1}^{N} P\left(x\_{j} | \theta\right)
$$
&lt;p>
For numerical stability, we usually use Maximum Log-Likelihood:
&lt;/p>
$$
\begin{align} \theta &amp;= \underset{\theta}{\operatorname{argmax}} L(\theta) \\\\
&amp;= \underset{\theta}{\operatorname{argmax}} \log(L(\theta)) \\\\
&amp;= \underset{\theta}{\operatorname{argmax}} \sum\_{j=1}^{N} \log P\left(x\_{j} | \theta\right)\end{align}
$$
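&lt;p>For a single univariate Gaussian, the MLE has a closed form (sample mean and the $1/N$ standard deviation). A sketch on synthetic data (the true parameters $\mu=2.0$, $\sigma=1.5$ are illustrative assumptions) confirms that these estimates maximize the log-likelihood:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)

# MLE for a univariate Gaussian has a closed form:
mu_hat = data.mean()
sigma_hat = data.std()  # MLE uses 1/N, i.e. the biased estimator

def log_likelihood(x, mu, sigma):
    """Sum of log P(x_j | theta) for the univariate Gaussian PDF."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

print(mu_hat, sigma_hat)  # close to the true 2.0 and 1.5
print(log_likelihood(data, mu_hat, sigma_hat))
print(log_likelihood(data, 2.5, 1.5))  # lower than at the MLE
```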
&lt;h2 id="gaussian-mixture-model">Gaussian Mixture Model&lt;/h2>
&lt;p>A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. One can think of mixture models as generalizing k-means clustering to incorporate information about the covariance structure of the data as well as the centers of the latent Gaussians.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/mYN2Q9VqZH-gaussian-mixture-example.png" alt="A Gaussian mixture of three normal distributions.">&lt;/p>
&lt;p>Define:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>$x\_j$: the $j$-th observed data, $j=1, 2,\dots, N$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$K$: number of Gaussian model components&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$\alpha\_k$: probability that the observed data belongs to the $k$-th model component&lt;/p>
&lt;ul>
&lt;li>$\alpha\_k \geq 0$&lt;/li>
&lt;li>$\displaystyle \sum\_{k=1}^{K}\alpha\_k=1$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>$\phi(x|\theta\_k)$: probability density function of the $k$-th model component&lt;/p>
&lt;ul>
&lt;li>$\theta\_k = (\mu\_k, \sigma\_k^2)$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>$\gamma\_{jk}$: probability that the $j$-th observed data belongs to the $k$-th model component&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Probability density function of Gaussian mixture model:
&lt;/p>
$$
P(x | \theta)=\sum\_{k=1}^{K} \alpha\_{k} \phi\left(x | \theta\_{k}\right)
$$
&lt;p>
For this model, parameter is $\theta=\left(\tilde{\mu}\_{k}, \tilde{\sigma}\_{k}, \tilde{\alpha}\_{k}\right)$.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="expectation-maximum-em">Expectation-Maximum (EM)&lt;/h2>
&lt;blockquote>
&lt;p>&lt;em>Expectation-Maximization (EM) is a statistical algorithm for finding the right model parameters. We typically use EM when the data has missing values, or in other words, when the data is incomplete.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>These missing variables are called &lt;strong>latent variables&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>&lt;em>NEVER&lt;/em> observed&lt;/li>
&lt;li>We do &lt;em>NOT&lt;/em> know the correct values in advance&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Since we do not have the values for the latent variables, Expectation-Maximization tries to use the existing data to determine the optimum values for these variables and then finds the model parameters.&lt;/strong> Based on these model parameters, we go back and update the values for the latent variable, and so on.&lt;/p>
&lt;p>The Expectation-Maximization algorithm has two steps:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>E-step:&lt;/strong> In this step, the available data is used to estimate (guess) the values of the missing variables&lt;/li>
&lt;li>&lt;strong>M-step:&lt;/strong> Based on the estimated values generated in the E-step, the complete data is used to update the parameters&lt;/li>
&lt;/ul>
&lt;h3 id="em-in-gaussian-mixture-model">EM in Gaussian Mixture Model&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Initialize the parameters ($K$ Gaussian distributions with means $\mu\_1, \mu\_2,\dots,\mu\_k$ and covariances $\Sigma\_1, \Sigma\_2, \dots, \Sigma\_k$)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Repeat&lt;/p>
&lt;ul>
&lt;li>&lt;strong>E-step&lt;/strong>: For each point $x\_j$, calculate the probability that it belongs to cluster/distribution $k$&lt;/li>
&lt;/ul>
$$
\begin{align}
\gamma\_{j k} &amp;= \frac{\text{Probability } x\_j \text{ belongs to cluster } k}{\text{Sum of probability } x\_j \text{ belongs to cluster } 1, 2, \dots, k} \\\\
&amp;= \frac{\alpha\_{k} \phi\left(x\_{j} | \theta\_{k}\right)}{\sum\_{k=1}^{K} \alpha\_{k} \phi\left(x\_{j} | \theta\_{k}\right)}\qquad j=1,2, \ldots, N ; k=1,2 \ldots, K
\end{align}
$$
&lt;p>The value will be high when the point is assigned to the right cluster and lower otherwise&lt;/p>
&lt;ul>
&lt;li>&lt;strong>M-step&lt;/strong>: update parameters&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
$$
\alpha\_k = \frac{\text{Number of points assigned to cluster } k}{\text{Total number of points}} = \frac{\sum\_{j=1}^{N} \gamma\_{j k}}{N} \qquad k=1,2, \ldots, K
$$
$$
\mu\_{k}=\frac{\sum\_{j}^{N}\left(\gamma\_{j k} x\_{j}\right)}{\sum\_{j}^{N} \gamma\_{j k}}\qquad k=1,2, \ldots, K
$$
$$
\Sigma\_{k}=\frac{\sum\_{j}^{N} \gamma\_{j k}\left(x\_{j}-\mu\_{k}\right)\left(x\_{j}-\mu\_{k}\right)^{T}}{\sum\_{j}^{N} \gamma\_{j k}} \qquad k=1,2, \ldots, K
$$
&lt;p>until convergence ($\left\|\theta\_{i+1}-\theta\_{i}\right\|&lt;\varepsilon$)&lt;/p>
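&lt;p>The E-step and M-step updates above can be sketched for a univariate two-component mixture. The synthetic data (a 30%/70% mixture of Gaussians at $-2$ and $3$) and the initial guesses are illustrative assumptions:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D data drawn from two Gaussians (30% / 70% mixture)
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
N, K = len(data), 2

# Initialize the parameters
alpha = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def pdf(x, m, s):
    """Univariate Gaussian PDF phi(x | theta_k)."""
    return np.exp(-((x - m) ** 2) / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

for _ in range(50):
    # E-step: responsibility gamma_{jk} of component k for point x_j
    weighted = alpha * pdf(data[:, None], mu, sigma)  # shape (N, K)
    gamma = weighted / weighted.sum(axis=1, keepdims=True)
    # M-step: re-estimate alpha_k, mu_k, sigma_k from the responsibilities
    Nk = gamma.sum(axis=0)
    alpha = Nk / N
    mu = (gamma * data[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((gamma * (data[:, None] - mu) ** 2).sum(axis=0) / Nk)

print(alpha, mu, sigma)  # close to (0.3, 0.7), (-2, 3), (0.5, 1.0)
```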
&lt;p>Visualization:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/ek1bu6ogj2-em_clustering_of_old_faithful_data.gif" alt="The EM algorithm updating the parameters of a two-component bivariate Gaussian mixture model.">&lt;/p>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://zhuanlan.zhihu.com/p/30483076">https://zhuanlan.zhihu.com/p/30483076&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.analyticsvidhya.com/blog/2019/10/gaussian-mixture-models-clustering/">https://www.analyticsvidhya.com/blog/2019/10/gaussian-mixture-models-clustering/&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://blog.pluskid.org/?p=39">http://blog.pluskid.org/?p=39&lt;/a> 👍&lt;/li>
&lt;/ul></description></item><item><title>Principal Component Analysis (PCA)</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/pca/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/pca/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;p>The usual procedure to calculate the $d$-dimensional principal component analysis consists of the following steps:&lt;/p>
&lt;ol start="0">
&lt;li>
&lt;p>Calculate&lt;/p>
&lt;ul>
&lt;li>
&lt;p>average
&lt;/p>
$$
\bar{m}=\frac{1}{N}\sum\_{i=1}^{N} m\_{i} \in \mathbb{R}^{d}
$$
&lt;/li>
&lt;li>
&lt;p>data matrix
&lt;/p>
$$
\mathbf{M}=\left(m\_{1}-\bar{m}, \ldots, m\_{N}-\bar{m}\right) \in \mathbb{R}^{d \times \mathrm{N}}
$$
&lt;/li>
&lt;li>
&lt;p>scatter matrix (covariance matrix)
&lt;/p>
$$
\mathbf{S}=\mathbf{M M}^{\mathrm{T}} \in \mathbb{R}^{d \times d}
$$
&lt;/li>
&lt;/ul>
&lt;p>of all feature vectors $m\_{1}, \ldots, m\_{N}$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Calculate the normalized ($\\|\cdot\\|=1$) eigenvectors $\mathbf{e}\_1, \dots, \mathbf{e}\_d$ of $\mathbf{S}$ and sort them such that the corresponding eigenvalues $\lambda\_1, \dots, \lambda\_d$ are decreasing, i.e. $\lambda\_1 \geq \lambda\_2 \geq \dots \geq \lambda\_d$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Construct a matrix
&lt;/p>
$$
\mathbf{A}:=\left(e\_{1}, \ldots, e\_{d^{\prime}}\right) \in \mathbb{R}^{d \times d^{\prime}}
$$
&lt;p>
with the first $d^{\prime}$ eigenvectors as its columns&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Transform each feature vector $m\_i$ into a new feature vector
&lt;/p>
$$
\mathrm{m}\_{\mathrm{i}}^{\prime}=\mathrm{A}^{\mathrm{T}}\left(\mathrm{m}\_{\mathrm{i}}-\overline{\mathrm{m}}\right) \quad \text { for } i=1, \ldots, N
$$
&lt;p>
of smaller dimension $d^{\prime}$&lt;/p>
&lt;/li>
&lt;/ol>
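&lt;p>The steps above translate almost line-for-line into NumPy (a sketch under the convention used here that feature vectors are columns of a $d \times N$ array; function and variable names are illustrative):&lt;/p>

```python
import numpy as np

def pca(m, d_prime):
    """PCA following steps 0-3: average, data matrix M, scatter matrix
    S = M M^T, sorted unit eigenvectors, projection A^T (m_i - mean)."""
    m_bar = m.mean(axis=1, keepdims=True)   # step 0: average, in R^d
    M = m - m_bar                           # step 0: data matrix, d x N
    S = M @ M.T                             # step 0: scatter matrix, d x d
    lam, e = np.linalg.eigh(S)              # eigh: S is symmetric, unit columns
    order = np.argsort(lam)[::-1]           # step 1: sort eigenvalues decreasingly
    A = e[:, order[:d_prime]]               # step 2: first d' eigenvectors
    return A.T @ M, A, m_bar                # step 3: new d'-dimensional features
```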
&lt;h2 id="dimensionality-reduction">Dimensionality reduction&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Goal: represent instances with fewer variables&lt;/p>
&lt;ul>
&lt;li>Try to preserve as much structure in the data as possible&lt;/li>
&lt;li>Discriminative: only structure that affects class separability&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Feature selection&lt;/p>
&lt;ul>
&lt;li>Pick a subset of the original dimensions&lt;/li>
&lt;li>Discriminative: pick good class &amp;ldquo;predictors&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Feature extraction&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Construct a new set of dimensions
&lt;/p>
$$
E\_{i} = f(X\_1 \dots X\_d)
$$
&lt;ul>
&lt;li>$X\_1, \dots, X\_d$: features&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>(Linear) combinations of original&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="direction-of-greatest-variance">Direction of greatest variance&lt;/h2>
&lt;ul>
&lt;li>Define a set of principal components
&lt;ul>
&lt;li>1st: direction of the &lt;strong>greatest variability&lt;/strong> in the data (i.e. Data points are spread out as far as possible)&lt;/li>
&lt;li>2nd: &lt;em>perpendicular&lt;/em> to 1st, greatest variability of what&amp;rsquo;s left&lt;/li>
&lt;li>&amp;hellip;and so on until $d$ (original dimensionality)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>First $m \ll d$ components become $m$ dimensions
&lt;ul>
&lt;li>Change coordinates of every data point to these dimensions&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-06%2023.51.17.png" alt="截屏2021-02-06 23.51.17">&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>Q: Why greatest variablility?&lt;/p>
&lt;p>A: If you pick the dimension with the highest variance, that will preserve the distances as much as possible&lt;/p>
&lt;/span>
&lt;/div>
&lt;h2 id="how-to-pca">How to PCA?&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&amp;ldquo;Center&amp;rdquo; the data at zero (subtract mean from each attribute)
&lt;/p>
$$
x\_{i, a} = x\_{i, a} - \mu\_{a}
$$
&lt;/li>
&lt;li>
&lt;p>Compute covariance matrix $\Sigma$&lt;/p>
&lt;blockquote>
&lt;p>The &lt;strong>covariance&lt;/strong> between two attributes is an indication of whether they change together (positive correlation) or in opposite directions (negative correlation).&lt;/p>
&lt;p>For example, $cov(x\_1, x\_2) = 0.8 > 0 \Rightarrow$ When $x\_1$ increases/decreases, $x\_2$ also increases/decreases.&lt;/p>
&lt;/blockquote>
$$
cov(b, a) = \frac{1}{n} \sum\_{i=1}^{n} x\_{ib} x\_{ia}
$$
&lt;/li>
&lt;li>
&lt;p>We want vectors $\mathbf{e}$ which aren&amp;rsquo;t turned by covariance matrix $\Sigma$:
&lt;/p>
$$
\Sigma \mathbf{e} = \lambda \mathbf{e}
$$
&lt;p>
$\Rightarrow$ $\mathbf{e}$ are eigenvectors of $\Sigma$, and $\lambda$ are corresponding eigenvalues&lt;/p>
&lt;p>&lt;strong>Principal components = eigenvectors with the largest eigenvalues&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="finding-principle-components">Finding principle components&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>Find eigenvalues by solving &lt;a href="https://en.wikipedia.org/wiki/Characteristic_polynomial">Characteristic Polynomial&lt;/a>
&lt;/p>
$$
\operatorname{det}(\Sigma - \lambda \mathbf{I}) = 0
$$
&lt;ul>
&lt;li>$\mathbf{I}$: Identity matrix&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Find $i$-th eigenvector by solving
&lt;/p>
$$
\Sigma \mathbf{e}\_i = \lambda\_i \mathbf{e}\_i
$$
&lt;p>
and we want $\mathbf{e}\_{i}$ to have unit length ($\\|\mathbf{e}\_{i}\\| = 1$)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The eigenvector with the largest eigenvalue is the first principal component, the eigenvector with the second largest eigenvalue is the second principal component, and so on.&lt;/p>
&lt;/li>
&lt;/ol>
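&lt;p>These steps can be checked numerically on a toy $2 \times 2$ covariance matrix, with NumPy's symmetric eigensolver standing in for solving the characteristic polynomial by hand:&lt;/p>

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])   # a toy covariance matrix

# Step 1: the eigenvalues are the roots of det(Sigma - lambda I) = 0.
# Step 2: eigh also returns unit-length eigenvectors as columns.
lam, e = np.linalg.eigh(Sigma)

for i in range(2):
    assert np.allclose(Sigma @ e[:, i], lam[i] * e[:, i])   # Sigma e = lambda e
    assert np.isclose(np.linalg.norm(e[:, i]), 1.0)         # unit length

# Step 3: the first principal component is the eigenvector
# belonging to the largest eigenvalue.
pc1 = e[:, np.argmax(lam)]
```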
&lt;details>
&lt;summary>Example&lt;/summary>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2021-02-07%2000.21.08.png" alt="截屏2021-02-07 00.21.08" style="zoom:67%;" />
&lt;/details>
&lt;h3 id="projecting-to-new-dimension">Projecting to new dimension&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>We pick the $m&lt;d$ eigenvectors $\mathbf{e}\_1, \dots, \mathbf{e}\_m$ with the biggest eigenvalues. Now $\mathbf{e}\_1, \dots, \mathbf{e}\_m$ are the new dimension vectors&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For an instance $\mathbf{x} = \{x\_1, \dots, x\_d\}$ (original coordinates), we want new coordinates $\mathbf{x}^{\prime} = \{x^{\prime}\_1, \dots, x^{\prime}\_m\}$&lt;/p>
&lt;ul>
&lt;li>&amp;ldquo;Center&amp;rdquo; the instance (subtract the mean): $\mathbf{x} - \mathbf{\mu}$&lt;/li>
&lt;li>&amp;ldquo;Project&amp;rdquo; to each dimension: $(\mathbf{x} - \mathbf{\mu})^T \mathbf{e}\_j$ for $j=1, \dots, m$&lt;/li>
&lt;/ul>
&lt;details>
&lt;summary>Example&lt;/summary>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/PCA.png" alt="PCA" style="zoom:80%;" />
&lt;/details>
&lt;/li>
&lt;/ul>
&lt;h2 id="go-deeper-in-details">Go deeper in details&lt;/h2>
&lt;h3 id="why-eigenvectors--greatest-variance">Why eigenvectors = greatest variance?&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/cIE2MDxyf80?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="why-eigenvalue--variance-along-eigenvector">Why eigenvalue = variance along eigenvector?&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/tL0wFZ9aJP8?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="how-many-dimensions-should-we-reduce-to">How many dimensions should we reduce to?&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Now we have eigenvectors $\mathbf{e}\_1, \dots, \mathbf{e}\_d$ and we want new dimension $m \ll d$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We pick $\mathbf{e}\_i$ that &amp;ldquo;explain&amp;rdquo; the most variance:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Sort eigenvectors s.t. $\lambda\_1 \geq \dots \geq \lambda\_d$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pick the first $m$ eigenvectors that explain 90% of the total variance (typical threshold values: 0.9 or 0.95)&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2013.06.46.png" alt="截屏2021-02-07 13.06.46">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Or we can use a scree plot&lt;/p>
&lt;/li>
&lt;/ul>
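&lt;p>The threshold rule is easy to make precise; here is a small helper (the name and interface are my own):&lt;/p>

```python
import numpy as np

def pick_m(eigenvalues, threshold=0.9):
    """Smallest m whose first m eigenvalues explain at least the given
    fraction of total variance (typical thresholds: 0.9 or 0.95)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    explained = np.cumsum(lam) / lam.sum()
    return int(np.argmax(explained >= threshold)) + 1
```

&lt;p>For example, eigenvalues $5, 3, 1, 0.5, 0.5$ give $m = 3$, since the first three already carry 90% of the variance.&lt;/p>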
&lt;h2 id="pca-in-a-nutshell">PCA in a nutshell&lt;/h2>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2013.09.32.png" alt="截屏2021-02-07 13.09.32">&lt;/p>
&lt;h2 id="pca-example-eigenfaces">PCA example: Eigenfaces&lt;/h2>
&lt;p>Perform PCA on bitmap images of human faces:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.22.02.png" alt="截屏2021-02-07 16.22.02">&lt;/p>
&lt;p>Belows are the eigenvectors after we perform PCA on the dataset:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.25.01.png" alt="截屏2021-02-07 16.25.01">&lt;/p>
&lt;p>Then we can project new face to space of eigen-faces, and represent vector of new face as a linear combination of principle components.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.24.28.png" alt="截屏2021-02-07 16.24.28">&lt;/p>
&lt;p>As we use more and more eigenvectors in this decomposition, we end up with a face that looks more and more like the original guy.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.33.28.png" alt="截屏2021-02-07 16.33.28">&lt;/p>
&lt;details>
&lt;summary>Why is eigenface neat and interesting?&lt;/summary>
&lt;ul>
&lt;li>This is neat because by taking the first few eigenvectors you can get a pretty close representation of the face. Suppose that this corresponds to maybe 20 eigenvectors. &lt;strong>This means you&amp;rsquo;re using only 20 numbers to represent a face bitmap which looks kind of like the original guy!&lt;/strong> Can you use only 20 pixels to represent him nearly? No, there&amp;rsquo;s no way!&lt;/li>
&lt;li>You&amp;rsquo;re effectively picking 20 numbers/mixture coefficients/coordinates. One really nice way to use this is you can use this for &lt;strong>massive compression&lt;/strong> of the data. If you communicate to others if they all have access to the same eigenvectors, all they need to send between each other are just the projection coordinates. Then they can transmit arbitrary faces between them. This is massive reduction in the size of data.&lt;/li>
&lt;li>Your classifier or regression system now operates in a low-dimensional space, so it has far less redundancy to cope with and can learn a better hyperplane. &amp;#x1f44f;&lt;/li>
&lt;/ul>
&lt;/details>
&lt;h3 id="application-of-eigenface">Application of eigenface&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Face similarity&lt;/p>
&lt;ul>
&lt;li>in the reduced space&lt;/li>
&lt;li>insensitive to lighting, expression, orientation&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Projecting new &amp;ldquo;faces&amp;rdquo;&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.49.58.png" alt="截屏2021-02-07 16.49.58">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="pratical-issues-of-pca">Pratical issues of PCA&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>PCA is based on the covariance matrix, and covariance is extremely sensitive to large values&lt;/p>
&lt;ul>
&lt;li>
&lt;p>E.g., multiply some dimension by 1000; this dimension then dominates the covariance and becomes a principal component.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Solution: normalize each dimension to zero mean and unit variance
&lt;/p>
$$
x^{\prime} = \frac{x - \text{mean}}{\text{standard deviation}}
$$
&lt;/li>
&lt;/ul>
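&lt;p>The normalization is one line in NumPy (a sketch assuming samples as rows and no zero-variance columns):&lt;/p>

```python
import numpy as np

def standardize(X):
    """Scale each dimension (column) to zero mean and unit variance so that
    no dimension dominates the covariance matrix by its scale alone."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```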
&lt;/li>
&lt;li>
&lt;p>PCA assumes underlying subspace is linear.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>PCA can sometimes hurt the performance of classification&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Because PCA doesn&amp;rsquo;t see the labels&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Solution: &lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/">Linear Discriminant Analysis (LDA)&lt;/a>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Picks a new dimension that gives&lt;/p>
&lt;ul>
&lt;li>maximum separation between the means of the projected classes&lt;/li>
&lt;li>minimum variance within each projected class&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2017.23.36.png" alt="截屏2021-02-07 17.23.36">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>But this relies on some assumptions of the data and does not always work. 🤪&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=IbE0tbjy6JQ&amp;amp;list=PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM&amp;amp;index=1">Principle Component Analysis&lt;/a>: a great series of video tutorials explaining PCA clearly 👍&lt;/li>
&lt;/ul></description></item></channel></rss>