<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>ML Basics | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/ml-basics/</link><atom:link href="https://haobin-tan.netlify.app/tags/ml-basics/index.xml" rel="self" type="application/rss+xml"/><description>ML Basics</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 07 Sep 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>ML Basics</title><link>https://haobin-tan.netlify.app/tags/ml-basics/</link></image><item><title>Machine Learning Fundamentals</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/</guid><description/></item><item><title>Math Basics</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/math-basics/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/math-basics/</guid><description>&lt;h2 id="linear-algebra">Linear Algebra&lt;/h2>
&lt;h3 id="vectors">Vectors&lt;/h3>
&lt;p>&lt;strong>Vector&lt;/strong>: multi-dimensional quantity&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Each dimension contains different information (e.g.: Age, Weight, Height&amp;hellip;)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/Vectors.png" alt="Vectors" style="zoom:70%;" />
&lt;/li>
&lt;li>
&lt;p>represented as &lt;strong>bold symbols&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A vector $\boldsymbol{x}$ is always a &lt;strong>column&lt;/strong> vector
&lt;/p>
$$
\boldsymbol{x}=\left[\begin{array}{l}
{1} \\\\
{2} \\\\
{4}
\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>A transposed vector $\boldsymbol{x}^T$ is a &lt;strong>row&lt;/strong> vector
&lt;/p>
$$
\boldsymbol{x}^{T}=\left[\begin{array}{lll}
{1} &amp; {2} &amp; {4}
\end{array}\right]
$$
&lt;/li>
&lt;/ul>
&lt;h4 id="vector-operations">Vector Operations&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Multiplication by scalars&lt;/strong>
&lt;/p>
$$
2\left[\begin{array}{l}
{1} \\\\
{2}
\end{array}\right]=\left[\begin{array}{l}
{2} \\\\
{4}
\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Addition of vectors&lt;/strong>
&lt;/p>
$$
\left[\begin{array}{l}{1} \\\\ {2} \end{array}\right]+\left[\begin{array}{l}{3} \\\\ {1}\end{array}\right]=\left[\begin{array}{l}{4} \\\\ {3} \end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Scalar (Inner) products&lt;/strong>: Sum the element-wise products
&lt;/p>
$$
\boldsymbol{v}=\left[\begin{array}{c}{1} \\\\ {2} \\\\ {4}\end{array}\right], \quad \boldsymbol{w}=\left[\begin{array}{l}{2} \\\\ {4} \\\\ {8}\end{array}\right]
$$
&lt;/li>
&lt;/ul>
$$
\langle v, w\rangle= 1 \cdot 2+2 \cdot 4+4 \cdot 8=42
$$
&lt;ul>
&lt;li>&lt;strong>Length of a vector&lt;/strong>: Square root of the inner product with itself
$$
\|\boldsymbol{v}\|=\langle\boldsymbol{v}, \boldsymbol{v}\rangle^{\frac{1}{2}}=\left(1^{2}+2^{2}+4^{2}\right)^{\frac{1}{2}}=\sqrt{21}
$$&lt;/li>
&lt;/ul>
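&lt;p>As a sanity check, the operations above map one-to-one onto NumPy (used here purely as an illustrative sketch; the library is not part of these notes):&lt;/p>

```python
import numpy as np

v = np.array([1, 2, 4])
w = np.array([2, 4, 8])

scaled = 2 * np.array([1, 2])                # scalar multiplication: [2, 4]
added = np.array([1, 2]) + np.array([3, 1])  # vector addition: [4, 3]
inner = v @ w                                # inner product: 1*2 + 2*4 + 4*8 = 42
length = np.sqrt(v @ v)                      # vector length: sqrt(21)
```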
&lt;h3 id="matrices">Matrices&lt;/h3>
&lt;p>Matrix: rectangular array of numbers arranged in rows and columns&lt;/p>
&lt;ul>
&lt;li>
&lt;p>denoted with &lt;strong>bold upper-case letters&lt;/strong>
&lt;/p>
$$
\boldsymbol{X}=\left[\begin{array}{ll}{1} &amp; {3} \\\\ {2} &amp; {3} \\\\ {4} &amp; {7}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>Dimension: $\\#rows \\times \\#columns$ (E.g.: 👆$X \in \mathbb{R}^{3 \times 2}$)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Vectors are special cases of matrices
&lt;/p>
$$
\boldsymbol{x}^{T}=\underbrace{\left[\begin{array}{ccc}{1} &amp; {2} &amp; {4}\end{array}\right]}_{1 \times 3 \text { matrix }}
$$
&lt;/li>
&lt;/ul>
&lt;h4 id="matrices-in-ml">Matrices in ML&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Data set can be represented as matrix, where single samples are vectors&lt;/p>
&lt;p>e.g.:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Age&lt;/th>
&lt;th>Weight&lt;/th>
&lt;th>Height&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Joe&lt;/td>
&lt;td>37&lt;/td>
&lt;td>72&lt;/td>
&lt;td>175&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Mary&lt;/td>
&lt;td>10&lt;/td>
&lt;td>30&lt;/td>
&lt;td>61&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Carol&lt;/td>
&lt;td>25&lt;/td>
&lt;td>65&lt;/td>
&lt;td>121&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Brad&lt;/td>
&lt;td>66&lt;/td>
&lt;td>67&lt;/td>
&lt;td>175&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
$$
\text { Joe: } \boldsymbol{x}\_{1}=\left[\begin{array}{c}{37} \\\\ {72} \\\\ {175}\end{array}\right], \qquad \text { Mary: } \boldsymbol{x}\_{2}=\left[\begin{array}{c}{10} \\\\ {30} \\\\ {61}\end{array}\right] \\\\
$$
$$
\text { Carol: } \boldsymbol{x}\_{3}=\left[\begin{array}{c}{25} \\\\ {65} \\\\ {121}\end{array}\right], \qquad \text { Brad: } \boldsymbol{x}\_{4}=\left[\begin{array}{c}{66} \\\\ {67} \\\\ {175}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Most typical representation:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>row ~ data sample (e.g. Joe)&lt;/li>
&lt;li>column ~ data entry (e.g. age)&lt;/li>
&lt;/ul>
$$
\boldsymbol{X}=\left[\begin{array}{l}{\boldsymbol{x}\_{1}^{T}} \\\\ {\boldsymbol{x}\_{2}^{T}} \\\\ {\boldsymbol{x}\_{3}^{T}} \\\\ {\boldsymbol{x}\_{4}^{T}}\end{array}\right]=\left[\begin{array}{ccc}{37} &amp; {72} &amp; {175} \\\\ {10} &amp; {30} &amp; {61} \\\\ {25} &amp; {65} &amp; {121} \\\\ {66} &amp; {67} &amp; {175}\end{array}\right]
$$
&lt;/li>
&lt;/ul>
&lt;h4 id="matrice-operations">Matrice Operations&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Multiplication with scalar&lt;/strong>
&lt;/p>
$$
3 \boldsymbol{M}=3\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]=\left[\begin{array}{ccc}{9} &amp; {12} &amp; {15} \\\\ {3} &amp; {0} &amp; {3}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Addition of matrices&lt;/strong>
&lt;/p>
$$
\boldsymbol{M} + \boldsymbol{N}=\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]+\left[\begin{array}{lll}{1} &amp; {2} &amp; {1} \\\\ {3} &amp; {1} &amp; {1}\end{array}\right]=\left[\begin{array}{lll}{4} &amp; {6} &amp; {6} \\\\ {4} &amp; {1} &amp; {2}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Transposed&lt;/strong>
&lt;/p>
$$
\boldsymbol{M}=\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right], \boldsymbol{M}^{T}=\left[\begin{array}{ll}{3} &amp; {1} \\\\ {4} &amp; {0} \\\\ {5} &amp; {1}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Matrix-Vector product&lt;/strong> (the vector&amp;rsquo;s dimensionality must equal the number of columns of the matrix)
&lt;/p>
$$
\underbrace{\left[\boldsymbol{w}\_{1}, \ldots, \boldsymbol{w}\_{n}\right]}_{\boldsymbol{W}} \underbrace{\left[\begin{array}{c}{v\_{1}} \\\\ {\vdots} \\\\ {v\_{n}}\end{array}\right]}\_{\boldsymbol{v}}=\underbrace{\left[\begin{array}{c}{v\_{1} \boldsymbol{w}\_{1}+\cdots+v\_{n} \boldsymbol{w}\_{n}}\end{array}\right]}\_{\boldsymbol{u}}
$$
&lt;p>
E.g.:
&lt;/p>
$$
\boldsymbol{u}=\boldsymbol{W} \boldsymbol{v}=\left[\begin{array}{ccc}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]\left[\begin{array}{l}{1} \\\\ {0} \\\\ {2}\end{array}\right]=\left[\begin{array}{l}{3 \cdot 1+4 \cdot 0+5 \cdot 2} \\\\ {1 \cdot 1+0 \cdot 0+1 \cdot 2}\end{array}\right]=\left[\begin{array}{c}{13} \\\\ {3}\end{array}\right]
$$
&lt;p>
💡 &lt;em>Think of it as: We sum over the columns $\boldsymbol{w}_i$ of $\boldsymbol{W}$, weighted by $v_i$&lt;/em>&lt;/p>
&lt;/li>
&lt;/ul>
$$
u=v\_{1} w\_{1}+\cdots+v\_{n} w\_{n}=1\left[\begin{array}{l}{3} \\\\ {1}\end{array}\right]+0\left[\begin{array}{l}{4} \\\\ {0}\end{array}\right]+2\left[\begin{array}{l}{5} \\\\ {1}\end{array}\right]=\left[\begin{array}{c}{13} \\\\ {3}\end{array}\right]
$$
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Matrix-Matrix product&lt;/strong>
&lt;/p>
$$
\boldsymbol{U} = \boldsymbol{W} \boldsymbol{V}=\left[\begin{array}{lll}{3} &amp; {4} &amp; {5} \\\\ {1} &amp; {0} &amp; {1}\end{array}\right]\left[\begin{array}{ll}{1} &amp; {0} \\\\ {0} &amp; {3} \\\\ {2} &amp; {4}\end{array}\right]=\left[\begin{array}{ll}{3 \cdot 1+4 \cdot 0+5 \cdot 2} &amp; {3 \cdot 0+4 \cdot 3+5 \cdot 4} \\\\ {1 \cdot 1+0 \cdot 0+1 \cdot 2} &amp; {1 \cdot 0+0 \cdot 3+1 \cdot 4}\end{array}\right]=\left[\begin{array}{cc}{13} &amp; {32} \\\\ {3} &amp; {4}\end{array}\right]
$$
&lt;p>
💡 &lt;em>Think of it as: Each column $\boldsymbol{u}\_i = \boldsymbol{W} \boldsymbol{v}\_i$ can be computed by a matrix-vector product&lt;/em>
&lt;/p>
$$
\boldsymbol{W} \underbrace{\left[\boldsymbol{v}\_{1}, \ldots, \boldsymbol{v}\_{n}\right]}\_{\boldsymbol{V}}=[\underbrace{\boldsymbol{W} \boldsymbol{v}\_{1}}_{\boldsymbol{u}\_{1}}, \ldots, \underbrace{\boldsymbol{W} \boldsymbol{v}\_{n}}\_{\boldsymbol{u}\_{n}}]=\boldsymbol{U}
$$
&lt;ul>
&lt;li>
&lt;p>Non-commutative: $\boldsymbol{V} \boldsymbol{W} \neq \boldsymbol{W} \boldsymbol{V}$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Associative: $\boldsymbol{V}(\boldsymbol{W} \boldsymbol{X})=(\boldsymbol{V} \boldsymbol{W}) \boldsymbol{X}$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Transpose product:
&lt;/p>
$$
(\boldsymbol{V} \boldsymbol{W}) ^{T}=\boldsymbol{W}^{T} \boldsymbol{V}^{T}
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Matrix inverse&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>scalar
&lt;/p>
$$
w \cdot w^{-1}=1
$$
&lt;/li>
&lt;li>
&lt;p>matrices
&lt;/p>
$$
\boldsymbol{W} \boldsymbol{W}^{-1}=\boldsymbol{I}, \quad \boldsymbol{W}^{-1} \boldsymbol{W}=\boldsymbol{I}
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
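&lt;p>The matrix operations above can be reproduced with NumPy (an illustrative sketch, not part of the original notes); the numbers match the worked examples:&lt;/p>

```python
import numpy as np

M = np.array([[3, 4, 5],
              [1, 0, 1]])

tripled = 3 * M      # scalar multiplication: [[9, 12, 15], [3, 0, 3]]
Mt = M.T             # transpose, shape (3, 2)

v = np.array([1, 0, 2])
u = M @ v            # matrix-vector product: [13, 3]

V = np.array([[1, 0],
              [0, 3],
              [2, 4]])
U = M @ V            # matrix-matrix product: [[13, 32], [3, 4]]
```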
&lt;h4 id="important-special-cases">Important Special Cases&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Scalar (Inner) product:&lt;/strong>
&lt;/p>
$$
\langle\boldsymbol{w}, \boldsymbol{v}\rangle = \boldsymbol{w}^{T} \boldsymbol{v}=\left[w\_{1}, \ldots, w\_{n}\right]\left[\begin{array}{c}{v\_{1}} \\\\ {\vdots} \\\\ {v\_{n}}\end{array}\right]=w\_{1} v\_{1}+\cdots+w\_{n} v\_{n}
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Compute row/column averages of matrix&lt;/strong>
&lt;/p>
$$
\boldsymbol{X}=\underbrace{\left[\begin{array}{ccc}{X\_{1,1}} &amp; {\dots} &amp; {X\_{1, m}} \\\\ {\vdots} &amp; {} &amp; {\vdots} \\\\ {X\_{n, 1}} &amp; {\dots} &amp; {X\_{n, m}}\end{array}\right]}\_{n \text { (samples) } \times m \text { (entries) }}
$$
&lt;ul>
&lt;li>
&lt;p>Vector of row averages (average over all entries per sample)
&lt;/p>
$$
\left[\begin{array}{c}{\frac{1}{m} \sum\_{i=1}^{m} X\_{1, i}} \\\\ {\vdots} \\\\ {\frac{1}{m} \sum\_{i=1}^{m} X\_{n, i}}\end{array}\right]=\boldsymbol{X}\left[\begin{array}{c}{\frac{1}{m}} \\\\ {\vdots} \\\\ {\frac{1}{m}}\end{array}\right]=\boldsymbol{X} \boldsymbol{a}, \quad \text { with } \boldsymbol{a}=\left[\begin{array}{c}{\frac{1}{m}} \\\\ {\vdots} \\\\ {\frac{1}{m}}\end{array}\right]
$$
&lt;/li>
&lt;li>
&lt;p>Vector of column averages (average over all samples per entry)
&lt;/p>
$$
\left[\frac{1}{n} \sum_{i=1}^{n} X\_{i, 1}, \ldots, \frac{1}{n} \sum\_{i=1}^{n} X\_{i, m}\right]=\left[\frac{1}{n}, \ldots, \frac{1}{n}\right] \boldsymbol{X}=\boldsymbol{b}^{T} \boldsymbol{X}, \text { with } \boldsymbol{b}=\left[\begin{array}{c}{\frac{1}{n}} \\\\ {\vdots} \\\\ {\frac{1}{n}}\end{array}\right]
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
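&lt;p>The row/column averages as matrix-vector products can be checked with NumPy on the toy dataset from above (an illustrative sketch, not part of the notes):&lt;/p>

```python
import numpy as np

X = np.array([[37, 72, 175],
              [10, 30, 61],
              [25, 65, 121],
              [66, 67, 175]], dtype=float)
n, m = X.shape

a = np.full(m, 1 / m)    # vector of 1/m entries
b = np.full(n, 1 / n)    # vector of 1/n entries

row_avg = X @ a          # average over all entries per sample
col_avg = b @ X          # average over all samples per entry
```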
&lt;hr>
&lt;h2 id="calculus">Calculus&lt;/h2>
&lt;ul>
&lt;li>
&lt;blockquote>
&lt;p>“The derivative of a function of a real variable measures &lt;strong>the sensitivity to change of a quantity&lt;/strong> (a function value or dependent variable) which is determined by another quantity (the independent variable)”&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Scalar&lt;/th>
&lt;th>Vector&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Function&lt;/td>
&lt;td>$f(x)$&lt;/td>
&lt;td>$f(\boldsymbol{x})$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Derivative&lt;/td>
&lt;td>$\frac{\partial f(x)}{\partial x}=g$&lt;/td>
&lt;td>$\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}=\left[\frac{\partial f(\boldsymbol{x})}{\partial x\_{1}}, \ldots, \frac{\partial f(\boldsymbol{x})}{\partial x\_{d}}\right]^{T} =: \nabla f(x)\quad$&lt;br />(👆 gradient of function $f$ at $\boldsymbol{x}$)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Min/Max&lt;/td>
&lt;td>$\frac{\partial f(x)}{\partial x}=0$&lt;/td>
&lt;td>$\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}=[0, \ldots, 0]^{T}=\mathbf{0}$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="matrix-calculus">Matrix Calculus&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Scalar&lt;/th>
&lt;th>Vector&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Linear&lt;/td>
&lt;td>$\frac{\partial a x}{\partial x}=a$&lt;/td>
&lt;td>$\nabla\_{\boldsymbol{x}} \boldsymbol{A} \boldsymbol{x}=\boldsymbol{A}^{T}$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Quadratic&lt;/td>
&lt;td>$\frac{\partial x^{2}}{\partial x}=2 x$&lt;/td>
&lt;td>$\begin{array}{l}{\nabla\_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{x}=2 \boldsymbol{x}} \\\\ {\nabla\_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x}=2 \boldsymbol{A} \boldsymbol{x} \text { (for symmetric } \boldsymbol{A})}\end{array}$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table></description></item><item><title>End-to-End Machine Learning Project</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/e2e-ml-project/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/e2e-ml-project/</guid><description>&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/e2e_ML_Project.png" alt="e2e_ML_Project" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2 id="1-look-at-the-big-picture">1. Look at the big picture&lt;/h2>
&lt;h3 id="11-frame-the-problem">1.1 Frame the problem&lt;/h3>
&lt;p>Consider the business objective: How do we expect to use and benefit from this model?&lt;/p>
&lt;h3 id="12-select-a-performance-measure">1.2 Select a performance measure&lt;/h3>
&lt;h3 id="13-check-the-assumptions">1.3 Check the assumptions&lt;/h3>
&lt;p>List and verify the assumptions.&lt;/p>
&lt;h2 id="2-get-the-data">2. Get the data&lt;/h2>
&lt;h3 id="21-download-the-data">2.1 Download the data&lt;/h3>
&lt;p>Automate this process: Create a small function to handle downloading, extracting, and storing data.&lt;/p>
&lt;h3 id="22-take-a-quick-look-at-the-data">2.2 Take a quick look at the data&lt;/h3>
&lt;ul>
&lt;li>Use &lt;code>DataFrame.head()&lt;/code> to look at the top rows of the data&lt;/li>
&lt;li>Use &lt;code>DataFrame.info()&lt;/code> to get a quick description of the data
&lt;ul>
&lt;li>For categorical attributes, use &lt;code>value_counts()&lt;/code> to see the categories and the number of samples in each&lt;/li>
&lt;li>For numerical attributes, use &lt;code>describe()&lt;/code> to get a summary of the numerical attributes.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
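&lt;p>A minimal sketch of this first look, using a tiny hypothetical DataFrame (the column names and values are invented for illustration):&lt;/p>

```python
import pandas as pd

# tiny stand-in dataset (hypothetical values, for illustration only)
df = pd.DataFrame({
    "age": [37, 10, 25, 66],
    "height": [175, 61, 121, 175],
    "category": ["a", "b", "a", "a"],
})

top = df.head()                         # first rows of the data
df.info()                               # column dtypes and non-null counts
counts = df["category"].value_counts()  # samples per category
summary = df.describe()                 # stats of the numerical attributes
```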
&lt;h3 id="create-a-test-set">Create a test set&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>If the dataset is large enough, use &lt;strong>purely random sampling&lt;/strong> (&lt;code>train_test_split&lt;/code>).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If the test set needs to be representative of the overall data, use &lt;strong>stratified sampling&lt;/strong>.&lt;/p>
&lt;/li>
&lt;/ul>
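&lt;p>Both splits in one hedged sketch (toy data; the &lt;code>cat&lt;/code> column is invented for illustration):&lt;/p>

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"x": range(10), "cat": ["a"] * 5 + ["b"] * 5})

# purely random sampling
train_set, test_set = train_test_split(df, test_size=0.2, random_state=42)

# stratified sampling: keep the "cat" proportions identical in both sets
strat_train, strat_test = train_test_split(
    df, test_size=0.2, stratify=df["cat"], random_state=42)
```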
&lt;h2 id="3-discover-and-visualize-the-data-to-gain-insights">3. Discover and visualize the data to gain insights&lt;/h2>
&lt;ol>
&lt;li>Make sure to put the test set aside and explore only the training set&lt;/li>
&lt;li>If the training set is very large, sample an exploration set to make manipulations easy and fast&lt;/li>
&lt;/ol>
&lt;h3 id="31-visualizing-data">3.1 Visualizing data&lt;/h3>
&lt;h3 id="32-look-for-correlations">3.2 Look for correlations&lt;/h3>
&lt;p>Two ways:&lt;/p>
&lt;ul>
&lt;li>Compute the &lt;strong>standard correlation coefficient&lt;/strong> (also called &lt;strong>Pearson&amp;rsquo;s r&lt;/strong>) between every pair of attributes using the &lt;code>corr()&lt;/code> method.&lt;/li>
&lt;li>Or use the &lt;code>scatter_matrix&lt;/code> function from &lt;code>pandas&lt;/code>&lt;/li>
&lt;/ul>
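&lt;p>A small sketch of the first approach, on invented data chosen so the correlations are obvious (&lt;code>scatter_matrix&lt;/code> is left commented out because it needs matplotlib):&lt;/p>

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4],
                   "y": [2, 4, 6, 8],
                   "z": [4, 3, 2, 1]})

corr = df.corr()   # Pearson's r for every pair of attributes
# x and y are perfectly positively correlated, x and z negatively:
# corr.loc["x", "y"] is 1.0 and corr.loc["x", "z"] is -1.0

# pandas.plotting.scatter_matrix(df) would draw the pairwise scatter plots
```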
&lt;h3 id="33-experimenting-with-attribute-combinations">3.3 Experimenting with attribute combinations&lt;/h3>
&lt;h2 id="4-prepare-the-data-for-ml-algorithms">4. Prepare the data for ML algorithms&lt;/h2>
&lt;p>&lt;strong>First, revert to a clean training set and separate the predictors from the labels.&lt;/strong>&lt;/p>
&lt;h3 id="41-data-cleaning">4.1 Data cleaning&lt;/h3>
&lt;p>Handle missing features:&lt;/p>
&lt;ul>
&lt;li>Get rid of the corresponding samples -&amp;gt; use &lt;code>dropna()&lt;/code>&lt;/li>
&lt;li>Get rid of the whole attribute -&amp;gt; use &lt;code>drop()&lt;/code>&lt;/li>
&lt;li>Set the values to some value (zero, the mean, the median, etc.) -&amp;gt; use &lt;code>fillna()&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>Or apply &lt;code>SimpleImputer&lt;/code> from Scikit-Learn to all the numerical attributes.&lt;/p>
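&lt;p>All four options in one hedged sketch (the DataFrame and its columns are invented for illustration):&lt;/p>

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

no_rows = df.dropna()                        # drop samples with missing values
no_col = df.drop(columns=["a"])              # drop the whole attribute
filled = df.fillna({"a": df["a"].median()})  # fill with e.g. the median (2.0)

# same idea applied to every numerical column at once
imputer = SimpleImputer(strategy="median")
imputed = imputer.fit_transform(df)
```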
&lt;h3 id="42-handle-text-and-categorical-attributes">4.2 Handle text and categorical attributes&lt;/h3>
&lt;p>Most ML algorithms prefer to work with numbers.
Transform text and categorical attributes into numerical attributes using one-hot encoding.&lt;/p>
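&lt;p>A minimal one-hot sketch using &lt;code>pandas.get_dummies&lt;/code> (Scikit-Learn&amp;rsquo;s &lt;code>OneHotEncoder&lt;/code> is the usual choice inside pipelines; the &lt;code>color&lt;/code> column here is invented):&lt;/p>

```python
import pandas as pd

cats = pd.DataFrame({"color": ["red", "blue", "red"]})

# one 0/1 column per category
onehot = pd.get_dummies(cats["color"])
```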
&lt;h3 id="43-custom-transformers">4.3 Custom transformers&lt;/h3>
&lt;p>The custom transformer should work seamlessly with Scikit-Learn functionalities (such as pipelines).
-&amp;gt; Create a class and implement three methods:&lt;/p>
&lt;ul>
&lt;li>&lt;code>fit()&lt;/code>&lt;/li>
&lt;li>&lt;code>transform()&lt;/code>&lt;/li>
&lt;li>&lt;code>fit_transform()&lt;/code> (can get it by simply adding &lt;code>TransformerMixin&lt;/code> as a base class)&lt;/li>
&lt;/ul>
&lt;p>If we add &lt;code>BaseEstimator&lt;/code> as a base class, we get two extra methods that are useful for automatic hyperparameter tuning:&lt;/p>
&lt;ul>
&lt;li>&lt;code>get_params()&lt;/code>&lt;/li>
&lt;li>&lt;code>set_params()&lt;/code>&lt;/li>
&lt;/ul>
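&lt;p>A sketch of such a custom transformer; the class &lt;code>RatioAdder&lt;/code> and its behaviour are hypothetical, but the three methods and the two base classes are exactly the ones listed above:&lt;/p>

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RatioAdder(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: appends column i divided by column j."""

    def __init__(self, i=0, j=1):
        self.i = i
        self.j = j

    def fit(self, X, y=None):
        return self                       # nothing to learn here

    def transform(self, X):
        ratio = X[:, self.i] / X[:, self.j]
        return np.c_[X, ratio]

X = np.array([[2.0, 4.0],
              [9.0, 3.0]])
Xt = RatioAdder().fit_transform(X)   # fit_transform() from TransformerMixin
params = RatioAdder().get_params()   # get_params() from BaseEstimator
```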
&lt;h3 id="44-feature-scaling">4.4 Feature scaling&lt;/h3>
&lt;p>Common ways:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Min-max scaling (normalization)&lt;/strong>: Use &lt;code>MinMaxScaler&lt;/code>&lt;/li>
&lt;li>&lt;strong>Standardization&lt;/strong>
&lt;ul>
&lt;li>Use &lt;code>StandardScaler&lt;/code>&lt;/li>
&lt;li>Less affected by outliers&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
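&lt;p>Both scalers on a one-column toy matrix (illustrative only):&lt;/p>

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0]])

minmax = MinMaxScaler().fit_transform(X)      # rescales to the range [0, 1]
standard = StandardScaler().fit_transform(X)  # zero mean, unit variance
```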
&lt;h3 id="45-transformation-pipelines">4.5 Transformation pipelines&lt;/h3>
&lt;p>Group sequences of transformations into one step.&lt;/p>
&lt;p>&lt;code>Pipeline&lt;/code> from &lt;code>scikit-learn&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>takes a list of name/estimator pairs defining a sequence of steps&lt;/li>
&lt;li>all but the last estimator must be transformers (i.e., have a &lt;code>fit_transform()&lt;/code> method)&lt;/li>
&lt;li>names can be anything as long as they are unique and contain no double underscores (&amp;ldquo;__&amp;rdquo;)&lt;/li>
&lt;/ul>
&lt;p>It is more convenient to use a &lt;strong>single&lt;/strong> transformer to handle both the categorical and the numerical columns.
-&amp;gt; Use &lt;code>ColumnTransformer&lt;/code>: it handles all columns, applies the appropriate transformations to each, and works well with Pandas DataFrames.&lt;/p>
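&lt;p>A small end-to-end preparation sketch combining &lt;code>Pipeline&lt;/code> and &lt;code>ColumnTransformer&lt;/code> (the DataFrame and column names are invented for illustration):&lt;/p>

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

df = pd.DataFrame({"age": [37.0, np.nan, 25.0],
                   "city": ["a", "b", "a"]})

# numerical columns: impute missing values, then standardize
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# one transformer routing each column type to the right steps
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])

prepared = full_pipeline.fit_transform(df)  # 1 scaled + 2 one-hot columns
```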
&lt;h2 id="5-select-a-model-and-train-it">5. Select a model and train it&lt;/h2>
&lt;h3 id="51-train-and-evaluate-on-the-trainging-set">5.1 Train and evaluate on the trainging set&lt;/h3>
&lt;h3 id="52-better-evaluation-using-cross-validation">5.2 Better evaluation using Cross-Validation&lt;/h3>
&lt;h2 id="6-fine-tune-the-model">6. Fine-tune the model&lt;/h2>
&lt;h3 id="61-grid-search">6.1 Grid search&lt;/h3>
&lt;p>When exploring &lt;strong>relatively few&lt;/strong> combinations, use &lt;code>GridSearchCV&lt;/code>: Tell it which hyperparameters we want to experiment with, and what values to try out. Then it will evaluate all the possible combinations of hyperparameter values, using cross-validation.&lt;/p>
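&lt;p>A minimal grid search sketch (the model, dataset, and parameter values are arbitrary choices for illustration):&lt;/p>

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=42)

# 2 x 2 = 4 hyperparameter combinations, each scored with 3-fold CV
param_grid = {"max_depth": [2, 4], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeRegressor(random_state=42), param_grid,
                      cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)

best_params = search.best_params_
```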
&lt;h3 id="62-randomized-search">6.2 Randomized search&lt;/h3>
&lt;p>When the hyperparameter search space is &lt;strong>large&lt;/strong>, use &lt;code>RandomizedSearchCV&lt;/code>. It evaluates a given number of random combinations by selecting a random value for each hyperparameter at every iteration.&lt;/p>
&lt;h3 id="63-ensemble-methods">6.3 Ensemble methods&lt;/h3>
&lt;p>Try to combine the models that perform best.&lt;/p>
&lt;h3 id="64-analyze-the-best-models-and-their-errors">6.4 Analyze the best models and their errors&lt;/h3>
&lt;p>Gain good insights on the problem by inspecting the best models.&lt;/p>
&lt;h3 id="65-evaluate-the-system-on-the-test-set">6.5 Evaluate the system on the test set&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>Get the predictors and labels from test set&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Run full pipeline to transform the data&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Evaluate the final model on the test set&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="7-present-the-solution">7. Present the solution&lt;/h2>
&lt;h2 id="8-launch-monitor-and-maintain-the-system">8. Launch, monitor, and maintain the system&lt;/h2>
&lt;ul>
&lt;li>Plug the production input data source into the system and write tests&lt;/li>
&lt;li>Write monitoring code to check the system&amp;rsquo;s live performance at regular intervals and trigger alerts when it drops&lt;/li>
&lt;li>Evaluate the system&amp;rsquo;s input data quality&lt;/li>
&lt;li>Train the models on a regular basis using fresh data (automate this process as much as possible!)&lt;/li>
&lt;/ul></description></item><item><title>Evaluation</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/evaluation/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/evaluation/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/Confusion_Matrix_and_ROC.png"
alt="Confusion matrix, ROC, and AUC">&lt;figcaption>
&lt;p>Confusion matrix, ROC, and AUC&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;h2 id="confuse-matrix">Confuse matrix&lt;/h2>
&lt;p>A confusion matrix tells you what your ML algorithm did right and what it did wrong.&lt;/p>
&lt;style type="text/css">
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;}
.tg .tg-cly1{text-align:left;vertical-align:middle}
.tg .tg-tab6{color:#77b300;text-align:left;vertical-align:top}
.tg .tg-viqs{color:#fe0000;text-align:left;vertical-align:top}
.tg .tg-0lax{text-align:left;vertical-align:top}
.tg .tg-hjor{font-weight:bold;color:#9698ed;text-align:center;vertical-align:middle}
.tg .tg-dsu0{color:#9698ed;text-align:left;vertical-align:top}
.tg .tg-0sd6{font-weight:bold;color:#3399ff;text-align:center;vertical-align:top}
.tg .tg-12v1{color:#3399ff;text-align:left;vertical-align:top}
&lt;/style>
&lt;table class="tg">
&lt;tr>
&lt;th class="tg-0lax" colspan="2" rowspan="2">&lt;/th>
&lt;th class="tg-hjor" colspan="2">Known Truth&lt;/th>
&lt;th class="tg-cly1" rowspan="2">&lt;/th>
&lt;/tr>
&lt;tr>
&lt;td class="tg-dsu0">Positive&lt;/td>
&lt;td class="tg-dsu0">Negative&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-0sd6" rowspan="2">&lt;br>Prediction&lt;/td>
&lt;td class="tg-12v1">Positive&lt;/td>
&lt;td class="tg-tab6">True Positive (TP)&lt;/td>
&lt;td class="tg-viqs">False Positive (FP)&lt;/td>
&lt;td class="tg-0lax">Precision = TP / (TP+FP)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-12v1">Negative&lt;/td>
&lt;td class="tg-viqs">False Negative (FN)&lt;/td>
&lt;td class="tg-tab6">True Negative (TN)&lt;/td>
&lt;td class="tg-0lax" rowspan="2">&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td class="tg-0lax" colspan="2">&lt;/td>
&lt;td class="tg-0lax">TPR = Sensitivity = Recall &lt;br> = TP / (TP + FN)&lt;/td>
&lt;td class="tg-0lax">Specificity = TN / (FP+TN) &lt;br> FPR = FP / (FP + TN) = 1 - Specificity &lt;/td>
&lt;/tr>
&lt;/table>
&lt;ul>
&lt;li>Row: Prediction&lt;/li>
&lt;li>Column: Known truth&lt;/li>
&lt;/ul>
&lt;p>Each cell:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Positive/negative: refers to the prediction&lt;/p>
&lt;/li>
&lt;li>
&lt;p>True/False: whether the prediction matches the truth&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The numbers along the diagonal (green) tell us how many times the samples were correctly classified&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The numbers not on the diagonal (red) are samples the algorithm messed up.&lt;/p>
&lt;/li>
&lt;/ul>
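&lt;p>The four cells can be computed with Scikit-Learn. Note that &lt;code>confusion_matrix&lt;/code> puts the known truth in the &lt;em>rows&lt;/em> and the predictions in the &lt;em>columns&lt;/em>, transposed relative to the table here (the labels are toy data):&lt;/p>

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # known truth
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # model predictions

# rows = truth, columns = prediction; ravel() flattens to TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)   # 3 / 4
recall = tp / (tp + fn)      # 3 / 4
```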
&lt;h2 id="definition">Definition&lt;/h2>
&lt;h3 id="precision">&lt;strong>Precision&lt;/strong>&lt;/h3>
&lt;p>How many selected items are relevant?
&lt;/p>
$$
\text{ Precision } = \frac{TP}{TP + FP}
=\frac{\\# \text{ relevant items retrieved }}{\\# \text{ items retrieved }}
$$
&lt;h3 id="recall--true-positive-rate-tpr--sensitivity">&lt;strong>Recall / True Positive Rate (TPR) / Sensitivity&lt;/strong>&lt;/h3>
&lt;p>How many relevant items are selected?
&lt;/p>
$$
\text { Recall } = \frac{TP}{TP + FN}
=\frac{\\# \text { relevant items retrieved }}{\\# \text { relevant items in collection }}
$$
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/350px-Precisionrecall.svg.png" alt="img">&lt;/p>
&lt;details>
&lt;summary>Example&lt;/summary>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.38.png" alt="截屏2020-09-15 11.51.38" style="zoom: 33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.43.png" alt="截屏2020-09-15 11.51.43" style="zoom:33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.46.png" alt="截屏2020-09-15 11.51.46" style="zoom:33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.49.png" alt="截屏2020-09-15 11.51.49" style="zoom:33%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-09-15%2011.51.52.png" alt="截屏2020-09-15 11.51.52" style="zoom:33%;" />
&lt;/details>
&lt;h3 id="f-score--f-measure">&lt;strong>F-score / F-measure&lt;/strong>&lt;/h3>
&lt;h4 id="f_1-score">$F\_1$ score&lt;/h4>
&lt;p>The traditional F-measure or balanced F-score (&lt;strong>$F\_1$ score&lt;/strong>) is the &lt;a href="https://en.wikipedia.org/wiki/Harmonic_mean#Harmonic_mean_of_two_numbers">harmonic mean&lt;/a> of precision and recall:
&lt;/p>
$$
F\_1=\frac{2 \cdot \text {precison} \cdot \text {recall}}{\text {precision}+\text {recall}} = \frac{2TP}{2TP + FP + FN}
$$
&lt;h4 id="f_beta-score">$F\_\beta$ score&lt;/h4>
&lt;p>$F\_\beta$ uses a positive real factor $\beta$, where $\beta$ is chosen such that &lt;strong>recall is considered $\beta$ times as important as precision&lt;/strong>
&lt;/p>
$$
F\_{\beta}=\left(1+\beta^{2}\right) \cdot \frac{\text { precision } \cdot \text { recall }}{\left(\beta^{2} \cdot \text { precision }\right)+\text { recall }}
$$
&lt;p>
Two commonly used values for $\beta$:&lt;/p>
&lt;ul>
&lt;li>$2$: weighs recall &lt;strong>higher&lt;/strong> than precision&lt;/li>
&lt;li>$0.5$: weighs recall &lt;strong>lower&lt;/strong> than precision&lt;/li>
&lt;/ul>
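&lt;p>These metrics are available directly in Scikit-Learn (the labels below are toy data; with TP=3, FN=1, FP=1 precision and recall are both 0.75, so every F-score is 0.75 as well):&lt;/p>

```python
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]   # TP=3, FN=1, FP=1, TN=3

p = precision_score(y_true, y_pred)        # 0.75
r = recall_score(y_true, y_pred)           # 0.75
f1 = f1_score(y_true, y_pred)              # harmonic mean of p and r
f2 = fbeta_score(y_true, y_pred, beta=2)   # weighs recall higher
```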
&lt;h3 id="specificity">Specificity&lt;/h3>
$$
\text{Specificity} = \frac{TN}{FP + TN}
$$
&lt;h3 id="false-positive-rate-fpr">False Positive Rate (FPR)&lt;/h3>
$$
\text{FPR} = \frac{FP}{FP + TN} \left(= 1- \frac{TN}{FP + TN} = 1- \text{Specificity}\right)
$$
&lt;h2 id="relation-between-sensitivity-specificity-fpr-and-threshold">Relation between Sensitivity, Specificity, FPR and Threshold&lt;/h2>
&lt;p>Assume that the distributions of the actual positive and negative classes look like this:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/evaluation-metrics-Page-1.png" alt="evaluation-metrics-Page-1" style="zoom:67%;" />
&lt;p>And we have already defined a threshold: samples scoring above it are predicted as positive, and samples scoring below it as negative.&lt;/p>
&lt;p>If we set a lower threshold, we&amp;rsquo;ll get the following diagram:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/evaluation-metrics-2.png" alt="evaluation-metrics-2" style="zoom:67%;" />
&lt;p>We can notice that FP ⬆️ , and FN ⬇️ .&lt;/p>
&lt;p>Therefore, we have the relationship:&lt;/p>
&lt;ul>
&lt;li>Threshold ⬇️
&lt;ul>
&lt;li>FP ⬆️ , FN ⬇️&lt;/li>
&lt;li>$\text{Sensitivity} (= TPR) = \frac{TP}{TP + FN}$ ⬆️ , $\text{Specificity} = \frac{TN}{TN + FP}$ ⬇️&lt;/li>
&lt;li>$FPR (= 1 - \text{Specificity})$⬆️&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>And vice versa&lt;/li>
&lt;/ul>
&lt;h2 id="auc-roc-curve">AUC-ROC curve&lt;/h2>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/evaluation-metrics-ROC-AUC.png" alt="evaluation-metrics-ROC-AUC" style="zoom:80%;" />
&lt;p>AUC (&lt;strong>Area Under The Curve&lt;/strong>)-ROC (&lt;strong>Receiver Operating Characteristics&lt;/strong>) curve&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Performance measurement for classification problems at various threshold settings.&lt;/p>
&lt;ul>
&lt;li>ROC is a probability curve&lt;/li>
&lt;li>AUC represents the degree or measure of separability&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Tells how much the model is capable of distinguishing between classes.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="how-is-roc-plotted">How is ROC plotted?&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-python" data-lang="python">&lt;span class="line">&lt;span class="cl">&lt;span class="k">for&lt;/span> &lt;span class="n">threshold&lt;/span> &lt;span class="ow">in&lt;/span> &lt;span class="n">thresholds&lt;/span>&lt;span class="p">:&lt;/span> &lt;span class="c1"># iterate over all thresholds&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">TPR&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">FPR&lt;/span> &lt;span class="o">=&lt;/span> &lt;span class="n">classify&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">threshold&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># calculate TPR and FPR based on threshold&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl"> &lt;span class="n">plot_point&lt;/span>&lt;span class="p">(&lt;/span>&lt;span class="n">FPR&lt;/span>&lt;span class="p">,&lt;/span> &lt;span class="n">TPR&lt;/span>&lt;span class="p">)&lt;/span> &lt;span class="c1"># plot coordinate (FPR, TPR) in the diagram&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="n">connect_points&lt;/span>&lt;span class="p">()&lt;/span> &lt;span class="c1"># connect all plotted points to get ROC curve&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Example:&lt;/p>
&lt;p>Suppose that the probability of a series of samples being classified as positive has been derived, and we sort the samples in descending order:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2021-02-24%2022.05.59.png" alt="截屏2021-02-24 22.05.59" style="zoom: 50%;" />
&lt;ul>
&lt;li>Class: actual label of test sample&lt;/li>
&lt;li>Score: probability of classifying test sample as positive&lt;/li>
&lt;/ul>
&lt;p>Next, we use the &amp;ldquo;Score&amp;rdquo; value as the threshold (from high to low).&lt;/p>
&lt;ul>
&lt;li>
&lt;p>When the probability that the test sample is a positive sample is greater than or equal to this threshold, we consider it a positive sample, otherwise it is a negative sample.&lt;/p>
&lt;ul>
&lt;li>For example, for the 4th sample, the &amp;ldquo;Score&amp;rdquo; value is 0.6. Samples 1, 2, 3, and 4 are then considered positive, because their &amp;ldquo;Score&amp;rdquo; values are $\geq$ 0.6; all other samples are classified as negative.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>By picking a different threshold each time, we obtain a pair of FPR and TPR values, i.e., a point on the ROC curve. In this way, we get a total of 20 such pairs. We plot them in the diagram:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/081955100088586.jpg" alt="img" style="zoom:80%;" />
&lt;/li>
&lt;/ul>
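&lt;p>The threshold loop above can be made concrete with a small self-contained sketch. The &lt;code>labels&lt;/code> and &lt;code>scores&lt;/code> below are hypothetical toy data (not the 20 samples from the figure):&lt;/p>

```python
def roc_points(labels, scores):
    """Compute (FPR, TPR) pairs, using each score as a threshold (high to low)."""
    P = sum(labels)            # number of actual positives
    N = len(labels) - P        # number of actual negatives
    points = [(0.0, 0.0)]      # threshold above the highest score: nothing is positive
    for threshold in sorted(set(scores), reverse=True):
        # classify as positive when score >= threshold
        TP = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        FP = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((FP / N, TP / P))
    return points

# Toy data: 1 = positive class, scores = predicted probability of being positive
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_points(labels, scores))
```

&lt;p>Connecting the returned points from (0, 0) to (1, 1) gives the ROC curve.&lt;/p>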
&lt;h3 id="how-to-speculate-about-the-performance-of-the-model">How to speculate about the performance of the model?&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>An &lt;strong>excellent&lt;/strong> model has an &lt;strong>AUC near 1&lt;/strong>, which means it has a good measure of separability.&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.02.34.png"
alt="Ideal situation: two curves don’t overlap at all means model has an ideal measure of separability. It is perfectly able to distinguish between positive class and negative class.">&lt;figcaption>
&lt;p>Ideal situation: two curves don’t overlap at all means model has an ideal measure of separability. It is perfectly able to distinguish between positive class and negative class.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;/li>
&lt;li>
&lt;p>When $0.5 &lt; \text{AUC} &lt; 1$, there is a high chance that the classifier will be able to distinguish the positive class values from the negative class values. This is because the classifier detects more true positives and true negatives than false negatives and false positives.&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.09.30.png"
alt="When AUC is 0.7, it means there is a 70% chance that the model will be able to distinguish between positive class and negative class.">&lt;figcaption>
&lt;p>When AUC is 0.7, it means there is a 70% chance that the model will be able to distinguish between positive class and negative class.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;/li>
&lt;li>
&lt;p>When AUC is 0.5, it means the model has no class separation capacity whatsoever.&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.05.28.png" alt="When AUC is 0.5, the model has no class separation capacity">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>A &lt;strong>poor&lt;/strong> model has an &lt;strong>AUC near 0&lt;/strong>, which means it has the worst measure of separability.&lt;/p>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-24%2021.05.54.png"
alt="When AUC is approximately 0, the model is actually inverting the classes. It means the model is predicting a negative class as a positive class and vice versa.">&lt;figcaption>
&lt;p>When AUC is approximately 0, the model is actually inverting the classes. It means the model is predicting a negative class as a positive class and vice versa.&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;/li>
&lt;/ul>
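&lt;p>The probabilistic reading of AUC used above (an AUC of 0.7 means a 70% chance of ranking a positive above a negative) can be checked directly: AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. A minimal sketch with hypothetical toy data:&lt;/p>

```python
def auc_by_ranking(labels, scores):
    """AUC = P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_by_ranking(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

&lt;p>A perfectly separating model scores every positive above every negative and gets AUC 1; a model that inverts the classes gets AUC near 0.&lt;/p>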
&lt;h2 id="-video-tutorials">🎥 Video tutorials&lt;/h2>
&lt;h3 id="the-confusion-matrix">The confusion matrix&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/Kdsp6soqA7o?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="sensitivity-and-specificity">Sensitivity and specificity&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/vP06aMoz4v8?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="roc-and-auc">ROC and AUC&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/4jRBRDbJemM?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5">Understanding AUC - ROC Curve&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://deepai.org/machine-learning-glossary-and-terms/f-score">What is the F-score?&lt;/a>: very nice explanation with examples&lt;/li>
&lt;li>&lt;a href="http://www.cnblogs.com/dlml/p/4403482.html">机器学习之分类器性能指标之ROC曲线、AUC值&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Overview of Machine Learning Algorithms</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/ml-algo-overview/</link><pubDate>Mon, 17 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/ml-fundamentals/ml-algo-overview/</guid><description>&lt;h2 id="supervisedunsupervised-learning">Supervised/Unsupervised Learning&lt;/h2>
&lt;h3 id="supervised-learning">Supervised learning&lt;/h3>
&lt;p>The training data you feed to the algorithm &lt;strong>includes&lt;/strong> the desired solutions, called &lt;strong>labels&lt;/strong>&lt;/p>
&lt;p>Typical task:&lt;/p>
&lt;ul>
&lt;li>Classification&lt;/li>
&lt;li>Regression&lt;/li>
&lt;/ul>
&lt;p>Important supervised learning algorithms:&lt;/p>
&lt;ul>
&lt;li>k-Nearest Neighbors&lt;/li>
&lt;li>Linear Regression&lt;/li>
&lt;li>Logistic Regression&lt;/li>
&lt;li>Support Vector Machine (SVM)&lt;/li>
&lt;li>Decision Trees and Random Forests&lt;/li>
&lt;li>Neural Networks&lt;/li>
&lt;/ul>
&lt;h3 id="unsupervised-learning">Unsupervised learning&lt;/h3>
&lt;p>Training data is &lt;strong>unlabeled&lt;/strong>.&lt;/p>
&lt;p>Important unsupervised learning algorithms:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Clustering&lt;/p>
&lt;ul>
&lt;li>K-Means&lt;/li>
&lt;li>DBSCAN&lt;/li>
&lt;li>Hierarchical Cluster Analysis (HCA)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Anomaly detection and novelty detection&lt;/p>
&lt;ul>
&lt;li>One-class SVM&lt;/li>
&lt;li>Isolation Forest&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Visualization and dimensionality reduction&lt;/p>
&lt;ul>
&lt;li>Principal Component Analysis (PCA)&lt;/li>
&lt;li>Kernel PCA&lt;/li>
&lt;li>Locally-Linear Embedding (LLE)&lt;/li>
&lt;li>t-distributed Stochastic Neighbor Embedding (t-SNE)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Association rule learning&lt;/p>
&lt;ul>
&lt;li>Apriori&lt;/li>
&lt;li>Eclat&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
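&lt;p>As a taste of the clustering family, here is a minimal K-Means sketch (1-D data for brevity; the data points are hypothetical): it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its cluster.&lt;/p>

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D K-Means: alternate nearest-centroid assignment
    and centroid re-estimation as the cluster mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Hypothetical data with two obvious groups around 1 and 10
data = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
print(kmeans(data, k=2))  # centroids settle near 1.0 and 10.0
```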
&lt;h3 id="semisupervised-learning-supervised--unsupervised">Semisupervised learning (supervised + unsupervised)&lt;/h3>
&lt;p>Deals with partially labeled training data: usually a lot of unlabeled data and a little labeled data&lt;/p>
&lt;h3 id="reinforcement-learning">Reinforcement Learning&lt;/h3>
&lt;p>The learning system, called an &lt;strong>agent&lt;/strong> in this context, can observe the environment, select and perform actions, and get rewards in return or penalties in the form of negative rewards.&lt;/p>
&lt;p>It must then learn by itself the best strategy, called a &lt;strong>policy&lt;/strong>, to get the most reward over time.&lt;/p>
&lt;p>A policy defines what action the agent should choose when it is in a given situation.&lt;/p>
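&lt;p>As a minimal illustration, a policy can be represented as a plain state-to-action lookup table, and &amp;ldquo;learning&amp;rdquo; as adjusting that table toward the actions that earned the most reward. The states, actions, and rewards below are hypothetical, chosen only to make the idea concrete:&lt;/p>

```python
# A policy maps each situation (state) to the action the agent should take.
policy = {"low_battery": "explore", "clear_path": "wait"}

def greedy_update(policy, experience):
    """Replace each state's action with the best-rewarded action observed so far."""
    best = {}  # state -> (reward, action) with the highest reward seen
    for state, action, reward in experience:
        if state not in best or reward > best[state][0]:
            best[state] = (reward, action)
    for state, (_, action) in best.items():
        policy[state] = action
    return policy

# Observed (state, action, reward) triples from interacting with the environment
experience = [
    ("low_battery", "explore", -10),
    ("low_battery", "recharge", 5),
    ("clear_path", "forward", 1),
    ("clear_path", "wait", 0),
]
greedy_update(policy, experience)
print(policy)  # the agent now prefers "recharge" and "forward"
```

&lt;p>Real RL algorithms (e.g. Q-learning) refine this idea with value estimates and exploration rather than a one-shot greedy replacement.&lt;/p>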
&lt;h2 id="batch-and-online-learning">Batch and Online Learning&lt;/h2>
&lt;p>This criterion is whether or not the system can learn incrementally from a stream of incoming data.&lt;/p>
&lt;h3 id="batch-learning">Batch Learning&lt;/h3>
&lt;p>The system must be trained using all the available data (i.e., it is incapable of learning incrementally)&lt;/p>
&lt;p>First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called &lt;strong>offline learning&lt;/strong>.&lt;/p>
&lt;p>Want a batch learning system to know about new data?&lt;/p>
&lt;p>Need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data). Then stop the old system and replace it with the new one.&lt;/p>
&lt;h3 id="online-learning">Online Learning&lt;/h3>
&lt;p>Train the system &lt;strong>incrementally&lt;/strong> by feeding it data instances sequentially, either individually or in small groups called &lt;strong>mini-batches&lt;/strong>.&lt;/p>
&lt;p>Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.&lt;/p>
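&lt;p>A minimal sketch of such an incremental step, assuming a simple linear model $y \approx wx + b$ trained with a mean-squared-error gradient (the data stream below is hypothetical):&lt;/p>

```python
def sgd_step(w, b, batch, lr=0.1):
    """One incremental update of a linear model y ≈ w*x + b
    using the mean squared-error gradient over one mini-batch."""
    n = len(batch)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in batch) / n
    return w - lr * grad_w, b - lr * grad_b

# Data arrives as a stream of (x, y) mini-batches (hypothetical: roughly y = 2x + 1)
stream = [[(0, 1), (1, 3)], [(2, 5), (3, 7)], [(1, 3), (4, 9)]]
w, b = 0.0, 0.0
for batch in stream:
    w, b = sgd_step(w, b, batch)  # learn from the batch, then discard it
print(w, b)  # parameters have moved from (0, 0) toward the underlying (2, 1)
```

&lt;p>Each step touches only the current mini-batch, which is why the instances can be thrown away afterwards.&lt;/p>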
&lt;p>👍 Advantages:&lt;/p>
&lt;ul>
&lt;li>Great for systems that receive data as a continuous flow and need to adapt to change rapidly or autonomously&lt;/li>
&lt;li>Saves a huge amount of space (after learning from new data instances, the system no longer needs them and can simply discard them)&lt;/li>
&lt;/ul>
&lt;p>😠 Challenge: if bad data is fed to the system, the system&amp;rsquo;s performance will gradually decline.&lt;/p>
&lt;p>🔧 Solution:&lt;/p>
&lt;ul>
&lt;li>monitor the system closely&lt;/li>
&lt;li>promptly switch learning off if a drop in performance is detected&lt;/li>
&lt;li>monitor the input data and react to abnormal data&lt;/li>
&lt;/ul>
&lt;h2 id="instance-based-vs-model-based-learning">Instance-Based Vs. Model-Based Learning&lt;/h2>
&lt;h3 id="instance-based-learning">Instance-based learning&lt;/h3>
&lt;p>The system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure&lt;/p>
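&lt;p>A minimal sketch of this idea, using a 1-nearest-neighbor rule with Euclidean distance as the similarity measure (the stored examples below are hypothetical):&lt;/p>

```python
def nearest_neighbor_predict(examples, query):
    """Instance-based learning: memorize the examples, then predict the label
    of the stored instance most similar (smallest Euclidean distance) to the query."""
    def distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    features, label = min(examples, key=lambda ex: distance(ex[0], query))
    return label

# Memorized training examples: (feature vector, label)
examples = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbor_predict(examples, (4.5, 5.2)))  # closest stored instance is "B"
```

&lt;p>There is no training step beyond storing the data; all the work happens at prediction time.&lt;/p>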
&lt;h3 id="model-based-learning">Model-based learning&lt;/h3>
&lt;p>Build a model of these examples, then use that model to make predictions&lt;/p></description></item></channel></rss>