<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Classification | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/classification/</link><atom:link href="https://haobin-tan.netlify.app/tags/classification/index.xml" rel="self" type="application/rss+xml"/><description>Classification</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sat, 07 Nov 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Classification</title><link>https://haobin-tan.netlify.app/tags/classification/</link></image><item><title>Classification</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/</guid><description/></item><item><title>K Nearest Neighbors</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/k-nearest-neighbor/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/k-nearest-neighbor/</guid><description>&lt;h2 id="non-parametric-methods">Non-parametric Methods&lt;/h2>
&lt;ul>
&lt;li>Store all the training data&lt;/li>
&lt;li>Use the training data for doing predictions&lt;/li>
&lt;li>Do &lt;strong>NOT&lt;/strong> adapt parameters&lt;/li>
&lt;li>Often referred to as &lt;em>&lt;strong>instance-based methods&lt;/strong>&lt;/em>&lt;/li>
&lt;/ul>
&lt;p>👍 &lt;strong>Advantages&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Complexity adapts to training data&lt;/li>
&lt;li>Very fast at training&lt;/li>
&lt;/ul>
&lt;p>👎 &lt;strong>Disadvantages&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Slow for prediction&lt;/li>
&lt;li>Hard to use for high-dimensional input&lt;/li>
&lt;/ul>
&lt;h2 id="k-nearest-neighbour-classifiers">$k$-Nearest Neighbour Classifiers&lt;/h2>
&lt;p>To classify a new input vector $x,$&lt;/p>
&lt;ol>
&lt;li>Examine the $k$-closest training data points to $x$ (common values for $k$: $k=3$, $k=5$)&lt;/li>
&lt;li>Assign the object to the &lt;strong>most frequently&lt;/strong> occurring class&lt;/li>
&lt;/ol>
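&lt;p>The two steps above can be sketched with NumPy (a minimal illustration; the helper name &lt;code>knn_predict&lt;/code> and the toy data are chosen here, not taken from a library):&lt;/p>

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from x to every stored training point
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Step 2: assign the most frequently occurring class among the k neighbours
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.15, 0.1])))  # 0
print(knn_predict(X, y, np.array([1.0, 0.95])))  # 1
```

&lt;p>Note that "training" is just storing $(X, y)$; all the work happens at prediction time, which is exactly the trade-off listed above.&lt;/p>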
&lt;p>
&lt;figure >
&lt;div class="flex justify-center ">
&lt;div class="w-100" >&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200128213429949.png" alt="image-20200128213429949" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>🤔 &lt;strong>When to consider?&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Can measure distances between data-points&lt;/li>
&lt;li>Fewer than 20 attributes per instance&lt;/li>
&lt;li>Lots of training data&lt;/li>
&lt;/ul>
&lt;p>👍 &lt;strong>Advantages&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Training is very fast&lt;/li>
&lt;li>Learn complex target functions&lt;/li>
&lt;li>Similar algorithm can be used for regression&lt;/li>
&lt;li>High accuracy&lt;/li>
&lt;li>Insensitive to outliers&lt;/li>
&lt;li>No assumptions about data&lt;/li>
&lt;/ul>
&lt;p>👎 &lt;strong>Disadvantages&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Computationally expensive&lt;/li>
&lt;li>Requires a lot of memory&lt;/li>
&lt;/ul>
&lt;h3 id="decision-boundaries">Decision Boundaries&lt;/h3>
&lt;ul>
&lt;li>The nearest neighbour algorithm does &lt;strong>NOT&lt;/strong> explicitly compute decision boundaries.&lt;/li>
&lt;li>The decision boundaries form a subset of the Voronoi diagram for the training data.&lt;/li>
&lt;li>The &lt;em>more data&lt;/em> points we have, the &lt;em>more complex the decision boundary&lt;/em> can become&lt;/li>
&lt;/ul>
&lt;h3 id="distance-metrics">Distance Metrics&lt;/h3>
&lt;p>Most common distance metric: &lt;strong>Euclidean distance (ED)&lt;/strong>&lt;/p>
$$
d(\boldsymbol{x}, \boldsymbol{y})=\|\boldsymbol{x}-\boldsymbol{y}\|=\sqrt{\left(\sum_{k=1}^{d}\left(\boldsymbol{x}_{k}-\boldsymbol{y}_{k}\right)^{2}\right)}
$$
&lt;ul>
&lt;li>
&lt;p>makes sense when the different features are &lt;strong>commensurate&lt;/strong>, i.e., each variable is measured in the &lt;strong>same units.&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If the units are different (e.g., length and weight), the data needs to be &lt;strong>normalised&lt;/strong> (so that the resulting input dimensions have zero mean and unit variance)
&lt;/p>
$$
\tilde{\boldsymbol{x}}=(\boldsymbol{x}-\boldsymbol{\mu}) \oslash \boldsymbol{\sigma}
$$
&lt;ul>
&lt;li>$\mu$: Mean&lt;/li>
&lt;li>$\sigma$: Standard deviation&lt;/li>
&lt;li>$\oslash$: &lt;strong>element-wise&lt;/strong> division&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
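&lt;p>A quick sketch of this normalisation with NumPy (toy numbers, assuming height in cm and weight in kg):&lt;/p>

```python
import numpy as np

# Features in different units: height in cm, weight in kg
X = np.array([[170.0, 60.0],
              [180.0, 80.0],
              [160.0, 70.0]])

mu = X.mean(axis=0)        # per-feature mean
sigma = X.std(axis=0)      # per-feature standard deviation
X_norm = (X - mu) / sigma  # element-wise division, as in the formula above

print(X_norm.mean(axis=0))  # ~[0, 0]
print(X_norm.std(axis=0))   # [1, 1]
```

&lt;p>After this step the Euclidean distance weights both dimensions comparably instead of being dominated by whichever feature happens to have the larger numeric range.&lt;/p>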
&lt;p>Other distance metrics:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Cosine Distance:&lt;/strong> Good for documents, images
$$
d(\boldsymbol{x}, \boldsymbol{y})=1-\frac{\boldsymbol{x}^{T} \boldsymbol{y}}{\|\boldsymbol{x}\|\|\boldsymbol{y}\|}
$$
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Hamming Distance:&lt;/strong> For string data / categorical features
$$
d(\boldsymbol{x}, \boldsymbol{y})=\sum_{k=1}^{d}\left(\boldsymbol{x}_{k} \neq \boldsymbol{y}_{k}\right)
$$
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Manhattan Distance:&lt;/strong> Coordinate-wise distance
$$
d(\boldsymbol{x}, \boldsymbol{y})=\sum_{k=1}^{d}\left|\boldsymbol{x}_{k}-\boldsymbol{y}_{k}\right|
$$
&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Mahalanobis Distance:&lt;/strong> Normalized by the sample covariance matrix – unaffected by coordinate transformations
$$
d(\boldsymbol{x}, \boldsymbol{y})=\|\boldsymbol{x}-\boldsymbol{y}\|_{\Sigma^{-1}}=\sqrt{(\boldsymbol{x}-\boldsymbol{y})^{T} \boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{y})}
$$
&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Logistic Regression: Basics</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression/</guid><description>&lt;p>💡 &lt;strong>Use regression algorithm for classification&lt;/strong>&lt;/p>
&lt;p>Logistic regression: &lt;strong>estimate the probability that an instance belongs to a particular class&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>If the estimated probability is &lt;strong>greater than 50%&lt;/strong>, then the model predicts that the instance belongs to that class (called the &lt;strong>positive&lt;/strong> class, labeled “1”),&lt;/li>
&lt;li>or else it predicts that it does not (i.e., it belongs to the &lt;strong>negative&lt;/strong> class, labeled “0”).&lt;/li>
&lt;/ul>
&lt;p>This makes it a &lt;strong>binary&lt;/strong> classifier.&lt;/p>
&lt;h2 id="logistic--sigmoid-function">Logistic / Sigmoid function&lt;/h2>
&lt;img src="https://upload.wikimedia.org/wikipedia/commons/5/53/Sigmoid-function-2.svg" style="zoom:60%; background-color:white">
&lt;p>$\sigma(t)=\frac{1}{1+\exp (-t)}$&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Bounded: $\sigma(t) \in (0, 1)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Symmetric: $1 - \sigma(t) = \sigma(-t)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Derivative: $\sigma^{\prime}(t)=\sigma(t)(1-\sigma(t))$&lt;/p>
&lt;/li>
&lt;/ul>
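&lt;p>The three properties can be verified numerically (a small self-check, not part of the original notes):&lt;/p>

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-5, 5, 101)
s = sigmoid(t)

# Bounded: sigma(t) lies strictly between 0 and 1
assert np.all(s > 0) and np.all(1 - s > 0)
# Symmetric: 1 - sigma(t) = sigma(-t)
assert np.allclose(1 - s, sigmoid(-t))
# Derivative: sigma'(t) = sigma(t) (1 - sigma(t)), checked via central differences
h = 1e-6
numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)
assert np.allclose(numeric, s * (1 - s))
```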
&lt;h2 id="estimating-probabilities-and-making-prediction">Estimating probabilities and making prediction&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>Computes a weighted sum of the input features (plus a bias term)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Outputs the logistic of this result&lt;/p>
&lt;p>$\hat{p}=h_{\theta}(\mathbf{x})=\sigma\left(\mathbf{x}^{\mathrm{T}} \boldsymbol{\theta}\right)$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Prediction:&lt;/p>
$$
\hat{y} = \begin{cases} 0 &amp; \text{ if } \hat{p}&lt;0.5\left(\Leftrightarrow \mathbf{x}^{\mathrm{T}} \boldsymbol{\theta}&lt;0\right) \\\\
1 &amp; \text{ if }\hat{p} \geq 0.5\left(\Leftrightarrow \mathbf{x}^{\mathrm{T}} \boldsymbol{\theta} \geq 0\right)\end{cases}
$$
&lt;/li>
&lt;/ol>
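&lt;p>The three steps can be sketched as follows (the bias term is folded into $\boldsymbol{\theta}$ via a leading 1-feature; names and numbers are illustrative):&lt;/p>

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def predict(theta, X):
    z = X.dot(theta)                     # 1. weighted sum of inputs (incl. bias)
    p_hat = sigmoid(z)                   # 2. logistic of the result
    y_hat = (p_hat >= 0.5).astype(int)   # 3. threshold at 0.5 (i.e. z at 0)
    return y_hat, p_hat

theta = np.array([-1.0, 2.0])            # bias -1, weight 2
X = np.array([[1.0, 0.0], [1.0, 1.0]])   # first column is the bias feature
y_hat, p_hat = predict(theta, X)
print(y_hat)  # [0 1]
```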
&lt;h2 id="train-and-cost-function">Train and cost function&lt;/h2>
&lt;p>Objective of training: to set the parameter vector $\boldsymbol{\theta}$ so that the model estimates:&lt;/p>
&lt;ul>
&lt;li>high probabilities ($\geq 0.5$) for positive instances ($y=1$)&lt;/li>
&lt;li>low probabilities ($&lt; 0.5$) for negative instances ($y=0$)&lt;/li>
&lt;/ul>
&lt;h3 id="cost-function-of-a-single-training-instance">Cost function of a single training instance:&lt;/h3>
$$
c(\boldsymbol{\theta}) = \begin{cases} -\log (\hat{p}) &amp; \text{ if } y=1 \\\\
-\log (1-\hat{p}) &amp; \text{ if } y=0\end{cases}
$$
&lt;blockquote>
&lt;img src="https://miro.medium.com/max/1621/1*_NeTem-yeZ8Pr9cVUoi_HA.png" style="zoom:30%; background-color:white">
&lt;ul>
&lt;li>Actual label: $y=1$, Misclassification: $\hat{y} = 0 \Leftrightarrow$ $\hat{p} = h_{\boldsymbol{\theta}}(x)$ close to 0 $\Leftrightarrow c(\boldsymbol{\theta})$ large&lt;/li>
&lt;li>Actual label: $y=0$, Misclassification: $\hat{y} = 1 \Leftrightarrow$ $\hat{p} = h_{\boldsymbol{\theta}}(x)$ close to 1 $\Leftrightarrow c(\boldsymbol{\theta})$ large&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;h3 id="the-cost-function-over-the-whole-training-set">The cost function over the whole training set&lt;/h3>
&lt;p>Simply the average cost over all training instances (combining the two cases above into a single expression):&lt;/p>
&lt;p>$\begin{aligned} J(\boldsymbol{\theta}) &amp;=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \left(\hat{p}^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)\right] \\\\ &amp;=\frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log \left(\hat{p}^{(i)}\right)-\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)\right] \end{aligned}$&lt;/p>
&lt;blockquote>
&lt;ul>
&lt;li>$y^{(i)} =1:-y^{(i)} \log \left(\hat{p}^{(i)}\right)-\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)=-\log \left(\hat{p}^{(i)}\right)$&lt;/li>
&lt;li>$y^{(i)} =0:-y^{(i)} \log \left(\hat{p}^{(i)}\right)-\left(1-y^{(i)}\right) \log \left(1-\hat{p}^{(i)}\right)=-\log \left(1-\hat{p}^{(i)}\right)$
(Exactly the same as $c(\boldsymbol{\theta})$ for a single instance above 👏)&lt;/li>
&lt;/ul>
&lt;/blockquote>
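&lt;p>A minimal numerical check of this cost (with made-up probabilities): confident correct predictions give a small average cost, confident wrong ones a large one.&lt;/p>

```python
import numpy as np

def cost(p_hat, y):
    """Average cross-entropy over m training instances."""
    m = len(y)
    return -(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat)).sum() / m

y    = np.array([1.0, 0.0, 1.0])
good = np.array([0.9, 0.1, 0.8])  # confident and correct
bad  = np.array([0.1, 0.9, 0.2])  # confident and wrong
print(cost(good, y))  # small (~0.14)
print(cost(bad, y))   # large (~2.07)
assert cost(bad, y) > cost(good, y)
```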
&lt;h3 id="training">Training&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>No closed-form equation 🤪&lt;/p>
&lt;/li>
&lt;li>
&lt;p>But it is convex so Gradient Descent (or any other optimization algorithm) is guaranteed to find the global minimum&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Partial derivatives of the cost function with regards to the $j$-th model parameter $\theta_j$:&lt;/p>
$$
\frac{\partial}{\partial \theta_{j}} J(\boldsymbol{\theta})=\frac{1}{m} \displaystyle \sum_{i=1}^{m}\left(\sigma\left(\boldsymbol{\theta}^{T} \mathbf{x}^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}
$$
&lt;/li>
&lt;/ul></description></item><item><title>Logistic Regression: Probabilistic view</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression-in-probabilistic-view/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/logistic-regression/logistic-regression-in-probabilistic-view/</guid><description>&lt;p>Class label:&lt;/p>
$$
y_i \in \\{0, 1\\}
$$
&lt;p>Conditional probability distribution of the class label is&lt;/p>
$$
\begin{aligned}
p(y=1|\boldsymbol{x}) &amp;= \sigma(\boldsymbol{w}^T\boldsymbol{x}+b) \\\\
p(y=0|\boldsymbol{x}) &amp;= 1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}+b)
\end{aligned}
$$
&lt;p>
with&lt;/p>
$$
\sigma(x) = \frac{1}{1+\operatorname{exp}(-x)}
$$
&lt;p>This is a &lt;strong>conditional Bernoulli distribution&lt;/strong>. Therefore, the probability can be represented as&lt;/p>
$$
\begin{array}{ll}
p(y|\boldsymbol{x}) &amp;= p(y=1|\boldsymbol{x})^y p(y=0|\boldsymbol{x})^{1-y} \\\\
&amp; = \sigma(\boldsymbol{w}^T\boldsymbol{x}+b)^y (1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}+b))^{1-y}
\end{array}
$$
&lt;p>The &lt;strong>conditional Bernoulli log-likelihood&lt;/strong> is (assuming training data is i.i.d)&lt;/p>
$$
\begin{aligned}
\operatorname{loglik}(\boldsymbol{w}, \mathcal{D})
&amp;= \log(\operatorname{lik}(\boldsymbol{w}, \mathcal{D})) \\\\
&amp;= \log(\displaystyle\prod_i p(y_i|\boldsymbol{x}_i)) \\\\
&amp;= \log\left(\displaystyle\prod_i \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)^{y_i} \left(1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)\right)^{1-y_i}\right) \\\\
&amp;= \displaystyle\sum_i y_i\log\left(\sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)\right)+ (1-y_i)\log\left(1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}_i+b)\right)
\end{aligned}
$$
&lt;p>Let&lt;/p>
$$
\tilde{\boldsymbol{w}}=\left(\begin{array}{c}b \\\\ \boldsymbol{w} \end{array}\right), \quad \tilde{\boldsymbol{x}_i}=\left(\begin{array}{c}1 \\\\ \boldsymbol{x}_i \end{array}\right)
$$
&lt;p>Then:&lt;/p>
$$
\operatorname{loglik}(\boldsymbol{w}, \mathcal{D}) = \operatorname{loglik}(\tilde{\boldsymbol{w}}, \mathcal{D}) = \displaystyle\sum_i y_i\log\left(\sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right)+ (1-y_i)\log\left(1 - \sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right)
$$
&lt;p>Our objective is to find the $\tilde{\boldsymbol{w}}^*$ that &lt;strong>maximize the log-likelihood&lt;/strong>, i.e.&lt;/p>
$$
\begin{array}{cl}
\tilde{\boldsymbol{w}}^* &amp;= \underset{\tilde{\boldsymbol{w}}}{\arg \max} \quad \operatorname{loglik}(\tilde{\boldsymbol{w}}, \mathcal{D}) \\\\
&amp;= \underset{\tilde{\boldsymbol{w}}}{\arg \min} \quad -\operatorname{loglik}(\tilde{\boldsymbol{w}}, \mathcal{D})\\\\
&amp;= \underset{\tilde{\boldsymbol{w}}}{\arg \min} \quad \underbrace{-\left(\displaystyle\sum_i y_i\log\left(\sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right) + (1-y_i)\log\left(1 - \sigma(\tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{x}_i})\right)\right)}_{\text{cross-entropy loss}}
\end{array}
$$
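&lt;p>This identity is easy to confirm numerically: the log of the Bernoulli likelihood product equals the sum form, and its negation is the cross-entropy loss (random toy data, bias folded into the weight vector):&lt;/p>

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # 20 samples, 3 features (incl. a bias column)
y = rng.integers(0, 2, size=20)
w = rng.normal(size=3)

p = sigmoid(X.dot(w))               # p(y=1 | x) for each sample
lik = np.prod(p ** y * (1 - p) ** (1 - y))                # product of Bernoulli terms
loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # sum form derived above
cross_entropy = -loglik

assert np.isclose(np.log(lik), loglik)
print(loglik, cross_entropy)
```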
&lt;p>In other words, &lt;strong>maximizing the (log-)likelihood is the same as minimizing the cross entropy.&lt;/strong>&lt;/p></description></item><item><title>SVM: Basics</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/support-vector-machine/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/support-vector-machine/</guid><description>&lt;h2 id="-goal-of-svm">🎯 Goal of SVM&lt;/h2>
&lt;p>To find the optimal separating hyperplane which &lt;strong>maximizes the margin&lt;/strong> of the training data&lt;/p>
&lt;ul>
&lt;li>it &lt;strong>correctly&lt;/strong> classifies the training data&lt;/li>
&lt;li>it is the one which will generalize better with unseen data (as far as possible from data points from each category)&lt;/li>
&lt;/ul>
&lt;h2 id="svm-math-formulation">SVM math formulation&lt;/h2>
&lt;p>Assuming the data is linearly separable&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304135136513.png" alt="image-20200304135136513" style="zoom:50%;" />
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Decision boundary&lt;/strong>: Hyperplane $\mathbf{w}^{T} \mathbf{x}+b=0$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Support Vectors:&lt;/strong> Data points closest to the decision boundary (other examples can be ignored)&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Positive&lt;/strong> support vectors: $\mathbf{w}^{T} \mathbf{x}_{+}+b=+1$&lt;/li>
&lt;li>&lt;strong>negative&lt;/strong> support vectors: $\mathbf{w}^{T} \mathbf{x}_{-}+b=-1$&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>Why do we use 1 and -1 as class labels?&lt;/p>
&lt;ul>
&lt;li>This makes the math manageable, because -1 and 1 are only different by the sign. We can write a single equation to describe the margin or how close a data point is to our separating hyperplane and not have to worry if the data is in the -1 or +1 class.&lt;/li>
&lt;li>If a point is far away from the separating plane on the positive side, then $w^Tx+b$ will be a large positive number, and $label*(w^Tx+b)$ will give us a large number. If it’s far from the negative side and has a negative label, $label*(w^Tx+b)$ will also give us a large positive number.&lt;/li>
&lt;/ul>
&lt;/blockquote>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Margin&lt;/strong> $\rho$ : distance between the support vectors and the decision boundary and should be &lt;strong>maximized&lt;/strong>
&lt;/p>
$$
\rho = \frac{\mathbf{w}^{T} \mathbf{x}\_{+}+b}{\|\mathbf{w}\|}-\frac{\mathbf{w}^{T} \mathbf{x}\_{-}+b}{\|\mathbf{w}\|}=\frac{2}{\|\mathbf{w}\|}
$$
&lt;/li>
&lt;/ul>
&lt;h3 id="svm-optimization-problem">SVM optimization problem&lt;/h3>
&lt;p>Requirement:&lt;/p>
&lt;ol>
&lt;li>Maximal margin&lt;/li>
&lt;li>Correct classification&lt;/li>
&lt;/ol>
&lt;p>Based on these requirements, we have:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200713164553044.png" alt="image-20200713164553044" style="zoom:67%;" />
&lt;p>Reformulation:
&lt;/p>
$$
\begin{aligned}
\underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\\|\mathbf{w}\\|^{2} \\\\ \text {s.t.} \quad &amp; y_{i}\left(\mathbf{w}^{T} \mathbf{x}\_{i}+b\right) \geq 1
\end{aligned}
$$
&lt;p>This is the &lt;strong>hard margin SVM&lt;/strong>.&lt;/p>
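&lt;p>A small sanity check of this formulation on toy data (the hyperplane below is picked by hand for illustration, not found by an optimizer):&lt;/p>

```python
import numpy as np

# Toy linearly separable data, labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])

# Candidate hyperplane w^T x + b = 0
w = np.array([0.25, 0.25])
b = 0.0

# Hard-margin constraint: y_i (w^T x_i + b) >= 1 for all i
margins = y * (X.dot(w) + b)
assert np.all(margins >= 1)

# Geometric margin rho = 2 / ||w||; [2,2] and [-2,-2] sit exactly on the margin
rho = 2.0 / np.linalg.norm(w)
print(rho)  # ~5.657
```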
&lt;h3 id="soft-margin-svm">Soft margin SVM&lt;/h3>
&lt;h4 id="-idea">💡 Idea&lt;/h4>
&lt;p>&lt;strong>&amp;ldquo;Allow the classifier to make some mistakes&amp;rdquo;&lt;/strong> (Soft margin)&lt;/p>
&lt;p>➡️ &lt;strong>Trade-off between margin and classification accuracy&lt;/strong>&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304141838595.png" alt="image-20200304141838595" style="zoom:50%;" />
&lt;ul>
&lt;li>
&lt;p>Slack-variables: ${\color {blue}{\xi_{i}}} \geq 0$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>💡&lt;strong>Allows violating the margin conditions&lt;/strong>
&lt;/p>
$$
y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right) \geq 1- \color{blue}{\xi_{i}}
$$
&lt;ul>
&lt;li>$0 \leq \xi\_{i} \leq 1$ : sample is between margin and decision boundary (&lt;span style="color:red">&lt;strong>margin violation&lt;/strong>&lt;/span>)&lt;/li>
&lt;li>$\xi\_{i} \geq 1$ : sample is on the wrong side of the decision boundary (&lt;span style="color:red">&lt;strong>misclassified&lt;/strong>&lt;/span>)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="soft-max-margin">Soft Max-Margin&lt;/h4>
&lt;p>Optimization problem
&lt;/p>
$$
\begin{array}{lll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\|\mathbf{w}\|^{2} + \color{blue}{C \sum_i^N \xi_i} \qquad \qquad &amp; \text{(Punish large slack variables)}\\\\
\text { s.t. } \quad &amp; y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right) \geq 1 -\color{blue}{\xi_i}, \quad \xi_i \geq 0 \qquad \qquad &amp; \text{(Condition for soft-margin)}\end{array}
$$
&lt;ul>
&lt;li>$C$ : regularization parameter, determines how important $\xi$ should be
&lt;ul>
&lt;li>&lt;strong>Small&lt;/strong> $C$: Constraints have &lt;strong>little&lt;/strong> influence ➡️ &lt;strong>large&lt;/strong> margin&lt;/li>
&lt;li>&lt;strong>Large&lt;/strong> $C$: Constraints have &lt;strong>large&lt;/strong> influence ➡️ &lt;strong>small&lt;/strong> margin&lt;/li>
&lt;li>$C \to \infty$: Constraints are strictly enforced ➡️ &lt;strong>hard&lt;/strong> margin&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h4 id="soft-svm-optimization">Soft SVM Optimization&lt;/h4>
&lt;p>Reformulate into an unconstrained optimization problem&lt;/p>
&lt;ol>
&lt;li>Rewrite constraints: $\xi_{i} \geq 1-y_{i}\left(\mathbf{w}^{T} \mathbf{x}_{i}+b\right)=1-y_{i} f\left(\boldsymbol{x}_{i}\right)$&lt;/li>
&lt;li>Together with $\xi_{i} \geq 0 \Rightarrow \xi_{i}=\max \left(0,1-y_{i} f\left(\boldsymbol{x}_{i}\right)\right)$&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Unconstrained optimization&lt;/strong> (over $\mathbf{w}$):
&lt;/p>
$$
\underset{{\mathbf{w}}}{\operatorname{argmin}} \underbrace{\|\mathbf{w}\|^{2}}\_{\text {regularization }}+C \underbrace{\sum_{i=1}^{N} \max \left(0,1-y\_{i} f\left(\boldsymbol{x}\_{i}\right)\right)}_{\text {loss function }}
$$
&lt;p>
Points are in 3 categories:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>$y\_{i} f\left(\boldsymbol{x}\_{i}\right) > 1$ : Point &lt;strong>outside&lt;/strong> margin, &lt;strong>no contribution&lt;/strong> to loss&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$y\_{i} f\left(\boldsymbol{x}\_{i}\right) = 1$: Point is &lt;strong>on&lt;/strong> the margin, &lt;strong>no contribution&lt;/strong> to loss as &lt;strong>in hard margin&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$y\_{i} f\left(\boldsymbol{x}\_{i}\right) &lt; 1$: &lt;span style="color:red">&lt;strong>Point violates the margin, contributes to loss&lt;/strong>&lt;/span>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="loss-function">Loss function&lt;/h4>
&lt;p>SVMs use the &amp;ldquo;hinge&amp;rdquo; loss (an approximation of the 0-1 loss)&lt;/p>
&lt;blockquote>
&lt;p>&lt;a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss&lt;/a>&lt;/p>
&lt;p>For an intended output $t=\pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as
&lt;/p>
$$
> \ell(y)=\max (0,1-t \cdot y)
> $$
&lt;p>
Note that $y$ should be the &amp;ldquo;raw&amp;rdquo; output of the classifier&amp;rsquo;s decision function, not the predicted class label. For instance, in linear SVMs, $y = \mathbf{w}\cdot \mathbf{x}+ b$, where $(\mathbf{w},b)$ are the parameters of the hyperplane and $\mathbf{x}$ is the input variable(s).&lt;/p>
&lt;/blockquote>
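&lt;p>A one-line implementation makes the three regimes of the hinge loss concrete (illustrative values):&lt;/p>

```python
import numpy as np

def hinge(t, y):
    """Hinge loss max(0, 1 - t*y) for label t in {-1, +1} and raw score y."""
    return np.maximum(0.0, 1.0 - t * y)

print(hinge(1, 2.0))   # 0.0: score beyond the margin, no loss
print(hinge(1, 1.0))   # 0.0: exactly on the margin
print(hinge(1, 0.5))   # 0.5: margin violation
print(hinge(1, -1.0))  # 2.0: misclassified, large loss
```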
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304172146690.png" alt="image-20200304172146690" style="zoom:40%;" />
&lt;p>The loss function of SVM is &lt;strong>convex&lt;/strong>:&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304172349088.png" alt="image-20200304172349088" style="zoom: 33%;" />
&lt;p>I.e.,&lt;/p>
&lt;ul>
&lt;li>There is only &lt;strong>one&lt;/strong> minimum&lt;/li>
&lt;li>We can find it with gradient descent&lt;/li>
&lt;li>&lt;strong>However:&lt;/strong> Hinge loss is &lt;strong>not differentiable!&lt;/strong> 🤪&lt;/li>
&lt;/ul>
&lt;h2 id="sub-gradients">Sub-gradients&lt;/h2>
&lt;p>For convex function $f: \mathbb{R}^d \to \mathbb{R}$ :
&lt;/p>
$$
f(\boldsymbol{z}) \geq f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{T}(\boldsymbol{z}-\boldsymbol{x})
$$
&lt;p>
(Linear approximation underestimates function)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304172748278.png" alt="image-20200304172748278" style="zoom:33%;" />
&lt;p>A &lt;strong>subgradient&lt;/strong> of a convex function $f$ at point $\boldsymbol{x}$ is any $\boldsymbol{g}$ such that
&lt;/p>
$$
f(\boldsymbol{z}) \geq f(\boldsymbol{x})+\boldsymbol{g}^{T}(\boldsymbol{z}-\boldsymbol{x})
$$
&lt;ul>
&lt;li>Always exists (even $f$ is not differentiable)&lt;/li>
&lt;li>If $f$ is differentiable at $\boldsymbol{x}$, then: $\boldsymbol{g}=\nabla f(\boldsymbol{x})$&lt;/li>
&lt;/ul>
&lt;h3 id="example">Example&lt;/h3>
&lt;p>$f(x)=|x|$&lt;/p>
&lt;ul>
&lt;li>$x \neq 0$ : unique sub-gradient is $g= \operatorname{sign}(x)$&lt;/li>
&lt;li>$x =0$ : $g \in [-1, 1]$&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/220px-Absolute_value.svg.png" alt="img">&lt;/p>
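&lt;p>This example can be checked directly against the subgradient inequality (a small script, choosing $g=0$ at $x=0$):&lt;/p>

```python
def subgradient_abs(x):
    """A subgradient of f(x) = |x|: sign(x) away from 0, any g in [-1, 1] at 0."""
    if x > 0:
        return 1.0
    if 0 > x:
        return -1.0
    return 0.0  # at x = 0 any value in [-1, 1] works; pick 0

# Subgradient inequality f(z) >= f(x) + g*(z - x) must hold for every z
x = 0.0
g = subgradient_abs(x)
for z in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert abs(z) >= abs(x) + g * (z - x)
```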
&lt;h3 id="sub-gradient-method">Sub-gradient Method&lt;/h3>
&lt;p>&lt;strong>Sub-gradient Descent&lt;/strong>&lt;/p>
&lt;ol>
&lt;li>Given &lt;strong>convex&lt;/strong> $f$, not necessarily differentiable&lt;/li>
&lt;li>Initialize $\boldsymbol{x}_0$&lt;/li>
&lt;li>Repeat: $\boldsymbol{x}\_{t+1}=\boldsymbol{x}\_{t}-\eta \boldsymbol{g}$, where $\boldsymbol{g}$ is any sub-gradient of $f$ at point $\boldsymbol{x}_{t}$&lt;/li>
&lt;/ol>
&lt;p>‼️ Notes:&lt;/p>
&lt;ul>
&lt;li>Sub-gradients do not necessarily decrease $f$ at every step (no real descent method)&lt;/li>
&lt;li>Need to keep track of the best iterate $\boldsymbol{x}^*$&lt;/li>
&lt;/ul>
&lt;h4 id="sub-gradients-for-hinge-loss">Sub-gradients for hinge loss&lt;/h4>
$$
\mathcal{L}\left(\mathbf{x}\_{i}, y\_{i} ; \mathbf{w}\right)=\max \left(0,1-y\_{i} f\left(\mathbf{x}\_{i}\right)\right) \quad f\left(\mathbf{x}\_{i}\right)=\mathbf{w}^{\top} \mathbf{x}\_{i}+b
$$
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image-20200304175930294.png" alt="image-20200304175930294" style="zoom:33%;" />
&lt;h4 id="sub-gradient-descent-for-svms">Sub-gradient descent for SVMs&lt;/h4>
&lt;p>Recall the &lt;strong>Unconstrained optimization&lt;/strong> for SVMs:
&lt;/p>
$$
\underset{{\mathbf{w}}}{\operatorname{argmin}} \quad C \underbrace{\sum\_{i=1}^{N} \max \left(0,1-y_{i} f\left(\boldsymbol{x}\_{i}\right)\right)}\_{\text {loss function }} + \underbrace{\|\mathbf{w}\|^{2}}\_{\text {regularization }}
$$
&lt;p>
At each iteration, pick random training sample $(\boldsymbol{x}_i, y_i)$&lt;/p>
&lt;ul>
&lt;li>
&lt;p>If $y_{i} f\left(\boldsymbol{x}_{i}\right)&lt;1$:
&lt;/p>
$$
\boldsymbol{w}\_{t+1}=\boldsymbol{w}\_{t}-\eta\left(2 \boldsymbol{w}\_{t}-C y\_{i} \boldsymbol{x}\_{i}\right)
$$
&lt;/li>
&lt;li>
&lt;p>Otherwise:
&lt;/p>
$$
\quad \boldsymbol{w}\_{t+1}=\boldsymbol{w}\_{t}-\eta 2 \boldsymbol{w}\_{t}
$$
&lt;/li>
&lt;/ul>
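&lt;p>Putting the two update rules together gives a minimal sub-gradient descent trainer (a sketch on toy data; the function name, step size, and step count are chosen for illustration):&lt;/p>

```python
import numpy as np

def train_svm_sgd(X, y, C=1.0, eta=0.01, steps=500, seed=0):
    """Sub-gradient descent on ||w||^2 + C * sum(hinge); bias folded into w."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)              # pick a random training sample
        if 1 > y[i] * X[i].dot(w):       # margin violated: hinge term is active
            w = w - eta * (2 * w - C * y[i] * X[i])
        else:                            # only the regularizer contributes
            w = w - eta * 2 * w
    return w

# Toy data with a bias feature appended, labels in {-1, +1}
X = np.array([[2.0, 2.0, 1.0], [3.0, 2.5, 1.0],
              [-2.0, -2.0, 1.0], [-3.0, -2.5, 1.0]])
y = np.array([1, 1, -1, -1])
w = train_svm_sgd(X, y, C=10.0)
print(np.sign(X.dot(w)))  # predictions should match the labels
```

&lt;p>As discussed above, making $C$ larger pushes the solution toward satisfying every margin constraint, at the price of a smaller margin.&lt;/p>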
&lt;h2 id="application-of-svms">Application of SVMs&lt;/h2>
&lt;ul>
&lt;li>Pedestrian Tracking&lt;/li>
&lt;li>text (and hypertext) categorization&lt;/li>
&lt;li>image classification&lt;/li>
&lt;li>bioinformatics (Protein classification, cancer classification)&lt;/li>
&lt;li>hand-written character recognition&lt;/li>
&lt;/ul>
&lt;p>Yet, in the last 5-8 years, neural networks have outperformed SVMs on most applications.🤪☹️😭&lt;/p></description></item><item><title>SVM: Kernel Methods</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernel-methods/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernel-methods/</guid><description>&lt;h2 id="kernel-function">Kernel function&lt;/h2>
&lt;p>Given a mapping function $\phi: \mathcal{X} \rightarrow \mathcal{V}$, the function&lt;/p>
$$
\mathcal{K}: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}, \quad \mathcal{K}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)=\left\langle\phi(\mathbf{x}), \phi\left(\mathbf{x}^{\prime}\right)\right\rangle_{\mathcal{V}}
$$
&lt;p>is called a &lt;strong>kernel function&lt;/strong>.&lt;/p>
&lt;p>&lt;em>&amp;ldquo;A kernel is a function that returns the result of a dot product performed in another space.&amp;rdquo;&lt;/em>&lt;/p>
&lt;h2 id="kernel-trick">Kernel trick&lt;/h2>
&lt;p>Applying the kernel trick simply means &lt;strong>replacing the dot product of two examples by a kernel function&lt;/strong>.&lt;/p>
&lt;h3 id="typical-kernels">Typical kernels&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Kernel Type&lt;/th>
&lt;th>Definition&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Linear kernel&lt;/strong>&lt;/td>
&lt;td>$k\left(\boldsymbol{x}, \boldsymbol{x}^{\prime}\right)=\left\langle\boldsymbol{x}, \boldsymbol{x}^{\prime}\right\rangle$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Polynomial kernel&lt;/strong>&lt;/td>
&lt;td>$k\left(\boldsymbol{x}, \boldsymbol{x}^{\prime}\right)=\left\langle\boldsymbol{x}, \boldsymbol{x}^{\prime}\right\rangle^{d}$&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Gaussian / Radial Basis Function (RBF) kernel&lt;/strong>&lt;/td>
&lt;td>$k \left(\boldsymbol{x}, \boldsymbol{y}\right)=\exp \left(-\frac{\|\boldsymbol{x}-\boldsymbol{y}\|^{2}}{2 \sigma^{2}}\right)$&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
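&lt;p>The three kernels from the table, written out directly (straightforward translations of the definitions above):&lt;/p>

```python
import numpy as np

def linear_kernel(x, y):
    return x.dot(y)

def poly_kernel(x, y, d=2):
    return x.dot(y) ** d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([2.0, 1.0])
print(linear_kernel(x, z))  # 4.0
print(poly_kernel(x, z))    # 16.0
print(rbf_kernel(x, x))     # 1.0 (distance zero)
```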
&lt;h3 id="why-do-we-need-kernel-trick">Why do we need kernel trick?&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Kernels can be used for all feature based algorithms that can be rewritten such that they contain &lt;strong>inner products&lt;/strong> of feature vectors&lt;/p>
&lt;ul>
&lt;li>This is true for almost all feature based algorithms (Linear regression, SVMs, &amp;hellip;)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Kernels can be used to map the data $\mathbf{x}$ in an infinite dimensional feature space (i.e., a function space)&lt;/p>
&lt;ul>
&lt;li>&lt;strong>The feature vector never has to be represented explicitly&lt;/strong>&lt;/li>
&lt;li>&lt;strong>As long as we can evaluate the inner product of two feature vectors&lt;/strong>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>➡️ We can obtain a more powerful representation than standard linear feature models.&lt;/p>
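&lt;p>For the degree-2 polynomial kernel in 2-D this can be made explicit: with the feature map $\phi(\mathbf{x})=(x_1^2, \sqrt{2} x_1 x_2, x_2^2)$, the inner product of mapped vectors equals $\langle\mathbf{x}, \mathbf{x}^{\prime}\rangle^2$, so the kernel evaluation never needs to form $\phi$ (a standard textbook identity, checked here numerically):&lt;/p>

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def k(x, y):
    """Kernel: dot product in input space, squared."""
    return x.dot(y) ** 2

x = np.array([1.0, 3.0])
z = np.array([2.0, 0.5])
assert np.isclose(phi(x).dot(phi(z)), k(x, z))  # both are 12.25 here
```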
&lt;p>&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="1049px" viewBox="-0.5 -0.5 1049 675" content="&amp;lt;mxfile host=&amp;quot;app.diagrams.net&amp;quot; modified=&amp;quot;2020-07-13T14:50:43.530Z&amp;quot; agent=&amp;quot;5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&amp;quot; etag=&amp;quot;XPL9LKrkJbpFtgSQEWYO&amp;quot; version=&amp;quot;13.4.2&amp;quot; type=&amp;quot;device&amp;quot;&amp;gt;&amp;lt;diagram id=&amp;quot;Q3K544789h8GwBtGwhde&amp;quot; name=&amp;quot;Page-1&amp;quot;&amp;gt;7Vpdj5s4FP01kXYeZgQGDHmcJDOt9kNadVbd9qlywAFmALPGmST99WsHEz7sJKQTdhJ1K7XF1zaxz7n3XBt7ZE3T9QeK8ugPEuBkBIxgPbJmIwBsCPm/wrApDa5rloaQxkFpahie4u9YGg1pXcYBLloNGSEJi/O20SdZhn3WsiFKyardbEGS9q/mKMSK4clHiWr9Ow5YJK0mHNcVH3EcRvKnPeCWFSnaNS4NRYQCsipN28lZDyNrSglh5VO6nuJEYFfhUiLwuKd2NzCKM9anw4w+efnrX96vrx8/fV1Nku9P3uZWvuUVJUs5YTlYtqkQ4G/hYPPCZBXFDD/lyBc1K043t0UsTXjJ5I+LOEmmJCF0289aLBbA97m9YJS84EZNAOfQgbxGnUI1HkwZXjdMckofMEkxoxveRNZ60k+ke1ljWV7VZAFb2qIGT2bVEUkHCXevrjHkDxLGEyB1hoM0QNhbCEgXJGMyVjjrGoih7+H54jwQ27CDMVAxhkCDsWUNhTFQMXamv/C/Iup48I7cyZeRO+OGGwV8PmvWRrmNXkYy3IFemlAShxkv+hxLzO0TgWHMf+5eVqRxECT7aKVkmQVYzMroMAirshykJrRPJs3sBIZt2AppurgAQ1E2VohoA0Ioi0hIMpT8TkgumXnGjG0kTGjJSJs3jg7dfJH9t4WvonDnVMXZulk522iw10bP46PB/+xqKuEHh+KpIEvq4wMAWDJ7IRpidqCdzB84aCUmlWiKE8Ti13aeOjttJlTVi4+sctWatYfaOhmMWfcYtQqV9mw83kNl0w1cWdb1HJ5y0zk359uu95SiTaNBTuKMFY03/ykMDc0AHc1wOyuKI+256HT8rRxB7X27qfy4Q1r7pH/Ol3bFJuX/cfVfc/X/Fl+J/h90sjdIvqPmaZ3k20NJfrWG36MdEtWfLA3YPTXBu6w04F5SGgDXlQb6Un4hacBty4jT3Vcebm661vBJwO6fBJ5/7iTgGO+cBOA+quqt2ufr2qqdgSXX7myprX4sDbY7M613TdVHBf19crXbV7jNi0rW7r6Qy6N430oZwESE2pzyp1A83VxJRJ5JNx3otBfPQP1e4jpqRFpwqIi03zUij+603yciveuMSO/kiHz+PyI7EenYao7UReRw21ndYUlJkph+ixb4z5JUFbfFFph73gDAfF1XVsRKX0hQFvLuAKJUAJzNi7ysPE22wfSUV2j9jFPoTOl2ON/UlZrSofTL0sqRLcGQVZfvruc6iHI6X1+g6q7mWLOmg4P5q3cwgwy0Z7+srXe1pb62hGGqp4i/YZptz9YXy8xnMcmagdhDetx90vOyV1x0SqKRkJt9knDwwOeEo89KFN7qLv2jeXeOXH3
V0HxL1e7QxoNF81jxCPNOOOw6T2I/FumHcckuFoRyvd66x8VLr6GSuFs1noFE4IC7zhrC1dBoaWh0hqJR9FcWER2iQo5Y3n/+u8svaF69wTiIi+l2cYEaXDz4n+JiHMeliFAuHv0lTTYTivwXIenHhKR2vx6ycsgjKWFlYFmz2/HbtzyH3aP/7YzOIdztztD0cQ2V3mALD6AulH2S5ssSPpQkm1K3cFaIRCpuZYkUFvGx3QZxKsyimajAiC0p3gLiM9HqqjWteckDDOYRZue+jsYfdjrY+pQxmD+oZ7ZApK6Yq5YgPqckWPrsysk9Q8KC4AITluasZQhhPmnfcBYRH5/rQ4XRUWBL86VCq8CD3UEE6vYBvZI4KNoLxhTleZyF5X5f7AI+icFtr82WC3hD1W0/wpzsK1XinU81ldgaZKmp2f47Oif4gSMdXqyvDZdHqvXda+vhXw==&amp;lt;/diagram&amp;gt;&amp;lt;/mxfile&amp;gt;" onclick="(function(svg){var src=window.event.target||window.event.srcElement;while (src!=null&amp;amp;&amp;amp;src.nodeName.toLowerCase()!='a'){src=src.parentNode;}if(src==null){if(svg.wnd!=null&amp;amp;&amp;amp;!svg.wnd.closed){svg.wnd.focus();}else{var r=function(evt){if(evt.data=='ready'&amp;amp;&amp;amp;evt.source==svg.wnd){svg.wnd.postMessage(decodeURIComponent(svg.getAttribute('content')),'*');window.removeEventListener('message',r);}};window.addEventListener('message',r);svg.wnd=window.open('https://app.diagrams.net/?client=1&amp;amp;lightbox=1&amp;amp;edit=_blank');}}})(this);" style="cursor:pointer;max-width:100%;max-height:675px;">&lt;defs>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_Preview {color: #888}
#MathJax_Message {position: fixed; left: 1em; bottom: 1.5em; background-color: #E6E6E6; border: 1px solid #959595; margin: 0px; padding: 2px 8px; z-index: 102; color: black; font-size: 80%; width: auto; white-space: nowrap}
#MathJax_MSIE_Frame {position: absolute; top: 0; left: 0; width: 0px; z-index: 101; border: 0px; margin: 0px; padding: 0px}
.MathJax_Error {color: #CC0000; font-style: italic}
&lt;/style>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_Hover_Frame {border-radius: .25em; -webkit-border-radius: .25em; -moz-border-radius: .25em; -khtml-border-radius: .25em; box-shadow: 0px 0px 15px #83A; -webkit-box-shadow: 0px 0px 15px #83A; -moz-box-shadow: 0px 0px 15px #83A; -khtml-box-shadow: 0px 0px 15px #83A; border: 1px solid #A6D ! important; display: inline-block; position: absolute}
.MathJax_Menu_Button .MathJax_Hover_Arrow {position: absolute; cursor: pointer; display: inline-block; border: 2px solid #AAA; border-radius: 4px; -webkit-border-radius: 4px; -moz-border-radius: 4px; -khtml-border-radius: 4px; font-family: 'Courier New',Courier; font-size: 9px; color: #F0F0F0}
.MathJax_Menu_Button .MathJax_Hover_Arrow span {display: block; background-color: #AAA; border: 1px solid; border-radius: 3px; line-height: 0; padding: 4px}
.MathJax_Hover_Arrow:hover {color: white!important; border: 2px solid #CCC!important}
.MathJax_Hover_Arrow:hover span {background-color: #CCC!important}
&lt;/style>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_SVG_Display {text-align: center; margin: 1em 0em; position: relative; display: block!important; text-indent: 0; max-width: none; max-height: none; min-width: 0; min-height: 0; width: 100%}
.MathJax_SVG .MJX-monospace {font-family: monospace}
.MathJax_SVG .MJX-sans-serif {font-family: sans-serif}
#MathJax_SVG_Tooltip {background-color: InfoBackground; color: InfoText; border: 1px solid black; box-shadow: 2px 2px 5px #AAAAAA; -webkit-box-shadow: 2px 2px 5px #AAAAAA; -moz-box-shadow: 2px 2px 5px #AAAAAA; -khtml-box-shadow: 2px 2px 5px #AAAAAA; padding: 3px 4px; z-index: 401; position: absolute; left: 0; top: 0; width: auto; height: auto; display: none}
.MathJax_SVG {display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; padding: 0; margin: 0}
.MathJax_SVG * {transition: none; -webkit-transition: none; -moz-transition: none; -ms-transition: none; -o-transition: none}
.MathJax_SVG &amp;gt; div {display: inline-block}
.mjx-svg-href {fill: blue; stroke: blue}
.MathJax_SVG_Processing {visibility: hidden; position: absolute; top: 0; left: 0; width: 0; height: 0; overflow: hidden; display: block!important}
.MathJax_SVG_Processed {display: none!important}
.MathJax_SVG_test {font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-transform: none; letter-spacing: normal; word-spacing: normal; overflow: hidden; height: 1px}
.MathJax_SVG_test.mjx-test-display {display: table!important}
.MathJax_SVG_test.mjx-test-inline {display: inline!important; margin-right: -1px}
.MathJax_SVG_test.mjx-test-default {display: block!important; clear: both}
.MathJax_SVG_ex_box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
.mjx-test-inline .MathJax_SVG_left_box {display: inline-block; width: 0; float: left}
.mjx-test-inline .MathJax_SVG_right_box {display: inline-block; width: 0; float: right}
.mjx-test-display .MathJax_SVG_right_box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
.MathJax_SVG .noError {vertical-align: ; font-size: 90%; text-align: left; color: black; padding: 1px 3px; border: 1px solid}
&lt;/style>&lt;/defs>&lt;g>&lt;ellipse cx="138" cy="434" rx="120" ry="90" fill="#fff2cc" stroke="#d6b656" pointer-events="all"/>&lt;ellipse cx="708" cy="439" rx="310" ry="165" fill="#dae8fc" stroke="#6c8ebf" pointer-events="all"/>&lt;rect x="118" y="358" width="40" height="20" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 368px; margin-left: 119px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 26px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; font-weight: bold; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-1-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1.875ex" height="1.848ex" viewBox="0 -730.1 807.5 795.5" role="img" focusable="false" style="vertical-align: -0.152ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M324 614Q291 576 250 573Q231 573 231 584Q231 589 232 592Q235 601 244 614T271 643T324 671T400 683H403Q462 683 481 610Q485 594 490 545T498 454L501 413Q504 413 551 442T648 509T705 561Q707 565 707 578Q707 610 682 614Q667 614 667 626Q667 641 695 662T755 683Q765 683 775 680T796 662T807 623Q807 596 792 572T713 499T530 376L505 361V356Q508 346 511 278T524 148T557 75Q569 69 580 69Q585 69 593 77Q624 108 660 110Q667 110 670 110T676 106T678 94Q668 59 624 30T510 0Q487 0 471 
9T445 32T430 71T422 117T417 173Q416 183 416 188Q413 214 411 244T407 286T405 299Q403 299 344 263T223 182T154 122Q152 118 152 105Q152 69 180 69Q183 69 187 66T191 60L192 58V56Q192 41 163 21T105 0Q94 0 84 3T63 21T52 60Q52 77 56 90T85 131T155 191Q197 223 259 263T362 327T402 352L391 489Q391 492 390 505T387 526T384 547T379 568T372 586T361 602T348 611Q346 612 341 613T333 614H324Z"/>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-1">\mathcal{X}&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="138" y="376" fill="#000000" font-family="Helvetica" font-size="26px" text-anchor="middle" font-weight="bold">\ma&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 158 422.47 L 494.79 396.63" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 500.77 396.17 L 493.1 400.77 L 494.79 396.63 L 492.49 392.79 Z" fill="#ff0000" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;path d="M 118 424 L 58 424 L 58 171.5 L 319.76 171.5" fill="none" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 325.76 171.5 L 317.76 175.5 L 319.76 171.5 L 317.76 167.5 Z" fill="#4d9900" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="118" y="404" width="40" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 424px; margin-left: 119px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: 
#000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-2-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.331ex" height="1.636ex" viewBox="0 -496.4 1003.8 704.4" role="img" focusable="false" style="vertical-align: -0.483ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M184 600Q184 624 203 642T247 661Q265 661 277 649T290 619Q290 596 270 577T226 557Q211 557 198 567T184 600ZM21 287Q21 295 30 318T54 369T98 420T158 442Q197 442 223 419T250 357Q250 340 236 301T196 196T154 83Q149 61 149 51Q149 26 166 26Q175 26 185 29T208 43T235 78T260 137Q263 149 265 151T282 153Q302 153 302 143Q302 135 293 112T268 61T223 11T161 -11Q129 -11 102 10T74 74Q74 91 79 106T122 220Q160 321 166 341T173 380Q173 404 156 404H154Q124 404 99 371T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" 
id="MathJax-Element-2">\boldsymbol{x}_i&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="138" y="430" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">\bol&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 158 476.19 L 494.81 513.1" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 500.78 513.76 L 492.39 516.86 L 494.81 513.1 L 493.26 508.91 Z" fill="#ff0000" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;path d="M 118 474 L 8 474 L 8 126.5 L 319.76 126.5" fill="none" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 325.76 126.5 L 317.76 130.5 L 319.76 126.5 L 317.76 122.5 Z" fill="#4d9900" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="118" y="454" width="40" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 474px; margin-left: 119px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-3-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.441ex" height="2.019ex" viewBox="0 -496.4 1051.2 869.2" role="img" focusable="false" 
style="vertical-align: -0.866ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M297 596Q297 627 318 644T361 661Q378 661 389 651T403 623Q403 595 384 576T340 557Q322 557 310 567T297 596ZM288 376Q288 405 262 405Q240 405 220 393T185 362T161 325T144 293L137 279Q135 278 121 278H107Q101 284 101 286T105 299Q126 348 164 391T252 441Q253 441 260 441T272 442Q296 441 316 432Q341 418 354 401T367 348V332L318 133Q267 -67 264 -75Q246 -125 194 -164T75 -204Q25 -204 7 -183T-12 -137Q-12 -110 7 -91T53 -71Q70 -71 82 -81T95 -112Q95 -148 63 -167Q69 -168 77 -168Q111 -168 139 -140T182 -74L193 -32Q204 11 219 72T251 197T278 308T289 365Q289 372 288 376Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-3">\boldsymbol{x}_j&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="138" y="480" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">\bol&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="678" y="284" width="40" height="20" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: 
left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 38px; height: 1px; padding-top: 294px; margin-left: 679px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 26px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-4-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="1.529ex" height="1.921ex" viewBox="0 -730.1 658.5 827.1" role="img" focusable="false" style="vertical-align: -0.225ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M25 633Q25 647 47 665T100 683Q291 683 291 306Q291 264 288 213T282 132L279 102Q281 102 308 126T378 191T464 279T545 381T596 479Q600 490 600 502Q600 527 581 550T523 577Q505 577 505 601Q505 622 516 647T542 681Q546 683 558 683Q605 679 631 645T658 559Q658 423 487 215Q409 126 308 37T190 -52Q177 -52 177 -28Q177 -26 183 15T196 127T203 270Q203 356 192 421T165 523T126 583T83 613T41 620Q25 620 25 633Z"/>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-4">\mathcal{V}&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="698" y="302" fill="#000000" font-family="Helvetica" font-size="26px" text-anchor="middle">\ma&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 578 401 L 779.84 427.91" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 785.78 428.7 L 777.33 431.61 L 779.84 427.91 L 778.38 423.68 Z" fill="#ff0000" 
stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="503" y="378" width="75" height="36" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 73px; height: 1px; padding-top: 396px; margin-left: 504px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-5-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="5.526ex" height="2.689ex" viewBox="0 -826 2379.3 1157.6" role="img" focusable="false" style="vertical-align: -0.77ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M409 688Q413 694 421 694H429H442Q448 688 448 686Q448 679 418 563Q411 535 404 504T392 458L388 442Q388 441 397 441T429 435T477 418Q521 397 550 357T579 260T548 151T471 65T374 11T279 -10H275L251 -105Q245 -128 238 -160Q230 -192 227 -198T215 -205H209Q189 -205 189 -198Q189 -193 211 -103L234 -11Q234 -10 226 -10Q221 -10 206 -8T161 6T107 36T62 89T43 171Q43 231 76 284T157 370T254 422T342 441Q347 441 348 445L378 567Q409 686 409 688ZM122 150Q122 116 134 91T167 53T203 35T237 27H244L337 404Q333 404 326 403T297 395T255 379T211 350T170 304Q152 276 137 237Q122 191 122 150ZM500 282Q500 320 484 347T444 385T405 400T381 404H378L332 217L284 29Q284 27 285 
27Q293 27 317 33T357 47Q400 66 431 100T475 170T494 234T500 282Z"/>&lt;g transform="translate(596,0)">&lt;path stroke-width="1" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"/>&lt;/g>&lt;g transform="translate(986,0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M184 600Q184 624 203 642T247 661Q265 661 277 649T290 619Q290 596 270 577T226 557Q211 557 198 567T184 600ZM21 287Q21 295 30 318T54 369T98 420T158 442Q197 442 223 419T250 357Q250 340 236 301T196 196T154 83Q149 61 149 51Q149 26 166 26Q175 26 185 29T208 43T235 78T260 137Q263 149 265 151T282 153Q302 153 302 143Q302 135 293 112T268 61T223 11T161 -11Q129 -11 102 10T74 74Q74 91 79 106T122 220Q160 321 166 341T173 380Q173 404 156 404H154Q124 404 99 371T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Z"/>&lt;/g>&lt;/g>&lt;g transform="translate(1989,0)">&lt;path stroke-width="1" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 
-247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-5">\phi(\boldsymbol{x}_i )&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="541" y="402" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">\phi(\bo&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 578 505.67 L 779.96 460.79" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 785.82 459.49 L 778.88 465.13 L 779.96 460.79 L 777.14 457.32 Z" fill="#ff0000" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="503" y="494" width="75" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 73px; height: 1px; padding-top: 514px; margin-left: 504px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-6-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="5.636ex" height="2.784ex" viewBox="0 -826 2426.7 1198.8" role="img" focusable="false" style="vertical-align: -0.866ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M409 
688Q413 694 421 694H429H442Q448 688 448 686Q448 679 418 563Q411 535 404 504T392 458L388 442Q388 441 397 441T429 435T477 418Q521 397 550 357T579 260T548 151T471 65T374 11T279 -10H275L251 -105Q245 -128 238 -160Q230 -192 227 -198T215 -205H209Q189 -205 189 -198Q189 -193 211 -103L234 -11Q234 -10 226 -10Q221 -10 206 -8T161 6T107 36T62 89T43 171Q43 231 76 284T157 370T254 422T342 441Q347 441 348 445L378 567Q409 686 409 688ZM122 150Q122 116 134 91T167 53T203 35T237 27H244L337 404Q333 404 326 403T297 395T255 379T211 350T170 304Q152 276 137 237Q122 191 122 150ZM500 282Q500 320 484 347T444 385T405 400T381 404H378L332 217L284 29Q284 27 285 27Q293 27 317 33T357 47Q400 66 431 100T475 170T494 234T500 282Z"/>&lt;g transform="translate(596,0)">&lt;path stroke-width="1" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"/>&lt;/g>&lt;g transform="translate(986,0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M297 596Q297 627 318 644T361 661Q378 661 389 651T403 623Q403 595 384 576T340 557Q322 557 310 567T297 596ZM288 376Q288 405 262 405Q240 405 
220 393T185 362T161 325T144 293L137 279Q135 278 121 278H107Q101 284 101 286T105 299Q126 348 164 391T252 441Q253 441 260 441T272 442Q296 441 316 432Q341 418 354 401T367 348V332L318 133Q267 -67 264 -75Q246 -125 194 -164T75 -204Q25 -204 7 -183T-12 -137Q-12 -110 7 -91T53 -71Q70 -71 82 -81T95 -112Q95 -148 63 -167Q69 -168 77 -168Q111 -168 139 -140T182 -74L193 -32Q204 11 219 72T251 197T278 308T289 365Q289 372 288 376Z"/>&lt;/g>&lt;/g>&lt;g transform="translate(2037,0)">&lt;path stroke-width="1" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-6">\phi(\boldsymbol{x}_j )&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="541" y="520" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">\phi(\bo&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="788" y="414" width="190" height="60" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 188px; height: 1px; padding-top: 444px; margin-left: 789px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 26px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;font style="font-size: 26px">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-7-Frame" tabindex="0" 
style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="15.319ex" height="2.656ex" viewBox="0 -793.5 6595.8 1143.7" role="img" focusable="false" style="vertical-align: -0.813ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M333 -232Q332 -239 327 -244T313 -250Q303 -250 296 -240Q293 -233 202 6T110 250T201 494T296 740Q299 745 306 749L309 750Q312 750 313 750Q331 750 333 732Q333 727 243 489Q152 252 152 250T243 11Q333 -227 333 -232Z"/>&lt;g transform="translate(389,0)">&lt;path stroke-width="1" d="M409 688Q413 694 421 694H429H442Q448 688 448 686Q448 679 418 563Q411 535 404 504T392 458L388 442Q388 441 397 441T429 435T477 418Q521 397 550 357T579 260T548 151T471 65T374 11T279 -10H275L251 -105Q245 -128 238 -160Q230 -192 227 -198T215 -205H209Q189 -205 189 -198Q189 -193 211 -103L234 -11Q234 -10 226 -10Q221 -10 206 -8T161 6T107 36T62 89T43 171Q43 231 76 284T157 370T254 422T342 441Q347 441 348 445L378 567Q409 686 409 688ZM122 150Q122 116 134 91T167 53T203 35T237 27H244L337 404Q333 404 326 403T297 395T255 379T211 350T170 304Q152 276 137 237Q122 191 122 150ZM500 282Q500 320 484 347T444 385T405 400T381 404H378L332 217L284 29Q284 27 285 27Q293 27 317 33T357 47Q400 66 431 100T475 170T494 234T500 282Z"/>&lt;/g>&lt;g transform="translate(986,0)">&lt;path stroke-width="1" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"/>&lt;/g>&lt;g transform="translate(1375,0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 
377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M184 600Q184 624 203 642T247 661Q265 661 277 649T290 619Q290 596 270 577T226 557Q211 557 198 567T184 600ZM21 287Q21 295 30 318T54 369T98 420T158 442Q197 442 223 419T250 357Q250 340 236 301T196 196T154 83Q149 61 149 51Q149 26 166 26Q175 26 185 29T208 43T235 78T260 137Q263 149 265 151T282 153Q302 153 302 143Q302 135 293 112T268 61T223 11T161 -11Q129 -11 102 10T74 74Q74 91 79 106T122 220Q160 321 166 341T173 380Q173 404 156 404H154Q124 404 99 371T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Z"/>&lt;/g>&lt;/g>&lt;g transform="translate(2379,0)">&lt;path stroke-width="1" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"/>&lt;/g>&lt;g transform="translate(2768,0)">&lt;path stroke-width="1" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 -156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"/>&lt;/g>&lt;g transform="translate(3213,0)">&lt;path stroke-width="1" d="M409 688Q413 694 421 694H429H442Q448 688 448 686Q448 679 418 563Q411 535 404 504T392 458L388 442Q388 441 397 441T429 435T477 418Q521 397 550 357T579 260T548 151T471 65T374 11T279 -10H275L251 -105Q245 -128 238 -160Q230 -192 227 -198T215 
-205H209Q189 -205 189 -198Q189 -193 211 -103L234 -11Q234 -10 226 -10Q221 -10 206 -8T161 6T107 36T62 89T43 171Q43 231 76 284T157 370T254 422T342 441Q347 441 348 445L378 567Q409 686 409 688ZM122 150Q122 116 134 91T167 53T203 35T237 27H244L337 404Q333 404 326 403T297 395T255 379T211 350T170 304Q152 276 137 237Q122 191 122 150ZM500 282Q500 320 484 347T444 385T405 400T381 404H378L332 217L284 29Q284 27 285 27Q293 27 317 33T357 47Q400 66 431 100T475 170T494 234T500 282Z"/>&lt;/g>&lt;g transform="translate(3810,0)">&lt;path stroke-width="1" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"/>&lt;/g>&lt;g transform="translate(4199,0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M297 596Q297 627 318 644T361 661Q378 661 389 651T403 623Q403 595 384 576T340 557Q322 557 310 567T297 596ZM288 376Q288 405 262 405Q240 405 220 393T185 362T161 325T144 293L137 279Q135 278 121 278H107Q101 284 101 286T105 299Q126 348 164 391T252 441Q253 441 260 441T272 442Q296 441 316 432Q341 418 354 401T367 348V332L318 133Q267 -67 264 -75Q246 -125 194 -164T75 
-204Q25 -204 7 -183T-12 -137Q-12 -110 7 -91T53 -71Q70 -71 82 -81T95 -112Q95 -148 63 -167Q69 -168 77 -168Q111 -168 139 -140T182 -74L193 -32Q204 11 219 72T251 197T278 308T289 365Q289 372 288 376Z"/>&lt;/g>&lt;/g>&lt;g transform="translate(5251,0)">&lt;path stroke-width="1" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"/>&lt;/g>&lt;g transform="translate(5640,0)">&lt;path stroke-width="1" d="M55 732Q56 739 61 744T75 750Q85 750 92 740Q95 733 186 494T278 250T187 6T92 -240Q85 -250 75 -250Q67 -250 62 -245T55 -232Q55 -227 145 11Q236 248 236 250T145 489Q55 727 55 732Z"/>&lt;g transform="translate(389,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M25 633Q25 647 47 665T100 683Q291 683 291 306Q291 264 288 213T282 132L279 102Q281 102 308 126T378 191T464 279T545 381T596 479Q600 490 600 502Q600 527 581 550T523 577Q505 577 505 601Q505 622 516 647T542 681Q546 683 558 683Q605 679 631 645T658 559Q658 423 487 215Q409 126 308 37T190 -52Q177 -52 177 -28Q177 -26 183 15T196 127T203 270Q203 356 192 421T165 523T126 583T83 613T41 620Q25 620 25 633Z"/>&lt;/g>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-7">\langle \phi(\boldsymbol{x}_i ), \phi(\boldsymbol{x}_j ) \rangle_\mathcal{V} &lt;/script>&lt;/font>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="883" y="452" fill="#000000" font-family="Helvetica" font-size="26px" text-anchor="middle">\langle \phi(\b&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 568 149 L 883 149 L 883 405.76" fill="none" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 883 411.76 L 879 403.76 L 883 405.76 L 887 403.76 Z" fill="#4d9900" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" pointer-events="all"/>&lt;rect x="328" y="104" 
width="240" height="90" fill="none" stroke="#000000" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 238px; height: 1px; padding-top: 149px; margin-left: 329px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 27px; font-family: Helvetica; color: #4D9900; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">Kernel function&lt;br style="font-size: 27px" />&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-8-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="8.827ex" height="2.63ex" viewBox="0 -795 3800.7 1132.5" role="img" focusable="false" style="vertical-align: -0.784ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M121 647Q121 657 125 670T137 683Q138 683 209 688T282 694Q294 694 294 686Q294 679 244 477Q194 279 194 272Q213 282 223 291Q247 309 292 354T362 415Q402 442 438 442Q468 442 485 423T503 369Q503 344 496 327T477 302T456 291T438 288Q418 288 406 299T394 328Q394 353 410 369T442 390L458 393Q446 405 434 405H430Q398 402 367 380T294 316T228 255Q230 254 243 252T267 246T293 238T320 224T342 206T359 180T365 147Q365 130 360 106T354 66Q354 26 381 26Q429 26 459 145Q461 153 479 153H483Q499 153 499 144Q499 139 496 130Q455 -11 378 -11Q333 -11 305 15T277 90Q277 108 280 121T283 145Q283 167 269 183T234 206T200 217T182 220H180Q168 178 159 139T145 81T136 44T129 20T122 7T111 -2Q98 -11 83 -11Q66 -11 57 -1T48 
16Q48 26 85 176T158 471L195 616Q196 629 188 632T149 637H144Q134 637 131 637T124 640T121 647Z"/>&lt;g transform="translate(521,0)">&lt;path stroke-width="1" d="M94 250Q94 319 104 381T127 488T164 576T202 643T244 695T277 729T302 750H315H319Q333 750 333 741Q333 738 316 720T275 667T226 581T184 443T167 250T184 58T225 -81T274 -167T316 -220T333 -241Q333 -250 318 -250H315H302L274 -226Q180 -141 137 -14T94 250Z"/>&lt;/g>&lt;g transform="translate(911,0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M184 600Q184 624 203 642T247 661Q265 661 277 649T290 619Q290 596 270 577T226 557Q211 557 198 567T184 600ZM21 287Q21 295 30 318T54 369T98 420T158 442Q197 442 223 419T250 357Q250 340 236 301T196 196T154 83Q149 61 149 51Q149 26 166 26Q175 26 185 29T208 43T235 78T260 137Q263 149 265 151T282 153Q302 153 302 143Q302 135 293 112T268 61T223 11T161 -11Q129 -11 102 10T74 74Q74 91 79 106T122 220Q160 321 166 341T173 380Q173 404 156 404H154Q124 404 99 371T61 287Q60 286 59 284T58 281T56 279T53 278T49 278T41 278H27Q21 284 21 287Z"/>&lt;/g>&lt;/g>&lt;g transform="translate(1914,0)">&lt;path stroke-width="1" d="M78 35T78 60T94 103T137 121Q165 121 187 96T210 8Q210 -27 201 -60T180 -117T154 -158T130 -185T117 -194Q113 -194 104 -185T95 -172Q95 -168 106 
-156T131 -126T157 -76T173 -3V9L172 8Q170 7 167 6T161 3T152 1T140 0Q113 0 96 17Z"/>&lt;/g>&lt;g transform="translate(2359,0)">&lt;path stroke-width="1" d="M74 282H63Q43 282 43 296Q43 298 45 307T56 332T76 365T110 401T159 433Q200 451 233 451H236Q273 451 282 450Q358 437 382 400L392 410Q434 452 483 452Q538 452 568 421T599 346Q599 303 573 280T517 256Q494 256 478 270T462 308Q462 343 488 367Q501 377 520 385Q520 386 516 389T502 396T480 400T462 398Q429 383 415 341Q354 116 354 80T405 44Q449 44 485 74T535 142Q539 156 542 159T562 162H568H579Q599 162 599 148Q599 135 586 111T550 60T485 12T397 -8Q313 -8 266 35L258 44Q215 -7 161 -7H156Q99 -7 71 25T43 95Q43 143 70 165T125 188Q148 188 164 174T180 136Q180 101 154 77Q141 67 122 59Q124 54 136 49T161 43Q183 43 200 61T226 103Q287 328 287 364T236 400Q200 400 164 377T107 302Q103 288 100 285T80 282H74Z"/>&lt;g transform="translate(659,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M297 596Q297 627 318 644T361 661Q378 661 389 651T403 623Q403 595 384 576T340 557Q322 557 310 567T297 596ZM288 376Q288 405 262 405Q240 405 220 393T185 362T161 325T144 293L137 279Q135 278 121 278H107Q101 284 101 286T105 299Q126 348 164 391T252 441Q253 441 260 441T272 442Q296 441 316 432Q341 418 354 401T367 348V332L318 133Q267 -67 264 -75Q246 -125 194 -164T75 -204Q25 -204 7 -183T-12 -137Q-12 -110 7 -91T53 -71Q70 -71 82 -81T95 -112Q95 -148 63 -167Q69 -168 77 -168Q111 -168 139 -140T182 -74L193 -32Q204 11 219 72T251 197T278 308T289 365Q289 372 288 376Z"/>&lt;/g>&lt;/g>&lt;g transform="translate(3411,0)">&lt;path stroke-width="1" d="M60 749L64 750Q69 750 74 750H86L114 726Q208 641 251 514T294 250Q294 182 284 119T261 12T224 -76T186 -143T145 -194T113 -227T90 -246Q87 -249 86 -250H74Q66 -250 63 -250T58 -247T55 -238Q56 -237 66 -225Q221 -64 221 250T66 725Q56 737 55 738Q55 746 60 749Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-8">k(\boldsymbol{x}_i, \boldsymbol{x}_j) 
&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="448" y="157" fill="#4D9900" font-family="Helvetica" font-size="27px" text-anchor="middle">Kernel function&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="190.5" y="524" width="230" height="50" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 228px; height: 1px; padding-top: 549px; margin-left: 192px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #FF0000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">1. explicit transformation&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="306" y="555" fill="#FF0000" font-family="Helvetica" font-size="20px" text-anchor="middle">1. 
explicit transformat&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 550.5 204 L 545.5 204 Q 540.5 204 540.5 214 L 540.5 624 Q 540.5 634 535.5 634 L 533 634 Q 530.5 634 535.5 634 L 538 634 Q 540.5 634 540.5 644 L 540.5 1054 Q 540.5 1064 545.5 1064 L 550.5 1064" fill="none" stroke="#ff0000" stroke-width="2" stroke-miterlimit="10" transform="rotate(-90,540.5,634)" pointer-events="all"/>&lt;rect x="270.5" y="644" width="570" height="30" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 568px; height: 1px; padding-top: 659px; margin-left: 272px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #FF0000; line-height: 1.2; pointer-events: all; font-style: italic; white-space: normal; word-wrap: normal; ">computationally expensive for high-dimensional feature vector&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="556" y="665" fill="#FF0000" font-family="Helvetica" font-size="20px" text-anchor="middle" font-style="italic">computationally expensive for high-dimensional feature ve&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="560.5" y="524" width="230" height="50" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 228px; height: 1px; 
padding-top: 549px; margin-left: 562px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #FF0000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">2. inner product&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="676" y="555" fill="#FF0000" font-family="Helvetica" font-size="20px" text-anchor="middle">2. inner product&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 458 -386 L 453 -386 Q 448 -386 448 -376 L 448 44 Q 448 54 443 54 L 440.5 54 Q 438 54 443 54 L 445.5 54 Q 448 54 448 64 L 448 484 Q 448 494 453 494 L 458 494" fill="none" stroke="#4d9900" stroke-width="2" stroke-miterlimit="10" transform="rotate(90,448,54)" pointer-events="all"/>&lt;rect x="190.5" y="14" width="520" height="20" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 518px; height: 1px; padding-top: 24px; margin-left: 192px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #4D9900; line-height: 1.2; pointer-events: all; font-weight: bold; font-style: italic; white-space: normal; word-wrap: normal; ">avoids explicit mapping &lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-9-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.324ex" height="1.54ex" viewBox="0 -578.8 1000.5 663.2" role="img" focusable="false" 
style="vertical-align: -0.196ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M580 514Q580 525 596 525Q601 525 604 525T609 525T613 524T615 523T617 520T619 517T622 512Q659 438 720 381T831 300T927 263Q944 258 944 250T935 239T898 228T840 204Q696 134 622 -12Q618 -21 615 -22T600 -24Q580 -24 580 -17Q580 -13 585 0Q620 69 671 123L681 133H70Q56 140 56 153Q56 168 72 173H725L735 181Q774 211 852 250Q851 251 834 259T789 283T735 319L725 327H72Q56 332 56 347Q56 360 70 367H681L671 377Q638 412 609 458T580 514Z"/>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-9">\Rightarrow&lt;/script> computationally cheaper&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="451" y="30" fill="#4D9900" font-family="Helvetica" font-size="20px" text-anchor="middle" font-weight="bold" font-style="italic">avoids explicit mapping \Rightarrow computationally&amp;hellip;&lt;/text>&lt;/switch>&lt;/g>&lt;/g>&lt;switch>&lt;g requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"/>&lt;a transform="translate(0,-5)" xlink:href="https://desk.draw.io/support/solutions/articles/16000042487" target="_blank">&lt;text text-anchor="middle" font-size="10px" x="50%" y="100%">Viewer does not support full SVG 1.1&lt;/text>&lt;/a>&lt;/switch>&lt;/svg>&lt;/p></description></item><item><title>SVM: Kernelized SVM</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernelized-svm/</link><pubDate>Mon, 13 Jul 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/classification/svm/kernelized-svm/</guid><description>&lt;h2 id="svm-with-features">SVM (with features)&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Maximum margin principle&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Slack variables allow for margin violation
&lt;/p>
$$
\begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\|\mathbf{w}\|^{2} + C \sum_i^N \xi_i \\\\ \text { s.t. } \quad &amp; y_{i}\left(\mathbf{w}^{T} \color{red}{\phi(\mathbf{x}_{i})} + b\right) \geq 1 -\xi_i, \quad \xi_i \geq 0\end{array}
$$
&lt;/li>
&lt;/ul>
&lt;h2 id="math-basics">Math basics&lt;/h2>
&lt;p>Solve the constrained optimization problem: &lt;strong>Method of Lagrangian Multipliers&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Primal optimization problem&lt;/strong>:&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
\underset{\boldsymbol{x}}{\min} \quad &amp; f(\boldsymbol{x}) \\\\
\text { s.t. } \quad &amp; h_{i}(\boldsymbol{x}) \geq b_{i}, \text { for } i=1 \ldots K
\end{array}
$$
&lt;ul>
&lt;li>&lt;strong>Lagrangian optimization&lt;/strong>:&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
\underset{\boldsymbol{x}}{\min} \underset{\boldsymbol{\lambda}}{\max} \quad &amp; L(\boldsymbol{x}, \boldsymbol{\lambda}) = f(\boldsymbol{x}) - \sum_{i=1}^K \lambda_i(h_i(\boldsymbol{x}) - b_i) \\\\
\text{ s.t. } &amp;\lambda_i\geq 0, \quad i = 1\dots K
\end{array}
$$
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Dual optimization problem&lt;/strong>
&lt;/p>
$$
\begin{aligned}
\boldsymbol{\lambda}^{\*}=\underset{\boldsymbol{\lambda}}{\arg \max } g(\boldsymbol{\lambda}), \quad &amp; g(\boldsymbol{\lambda})=\min \_{\boldsymbol{x}} L(\boldsymbol{x}, \boldsymbol{\lambda}) \\\\
\text { s.t. } \quad \lambda_{i} \geq 0, &amp; \text { for } i=1 \ldots K
\end{aligned}
$$
&lt;ul>
&lt;li>$g$ : &lt;strong>dual function&lt;/strong> of the optimization problem&lt;/li>
&lt;li>Essentially swapped min and max in the definition of $L$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Slater&amp;rsquo;s condition:&lt;/strong> For a &lt;strong>convex&lt;/strong> objective and &lt;strong>convex&lt;/strong> constraints (with a strictly feasible point), &lt;strong>solving the dual is equivalent to solving the primal&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>I.e., optimal primal parameters can be obtained from optimal dual parameters
$$
\boldsymbol{x}^* = \underset{\boldsymbol{x}}{\operatorname{argmin}}L(\boldsymbol{x}, \boldsymbol{\lambda}^*)
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
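&lt;p>A minimal numeric sketch of this primal-dual relationship (the problem below is a made-up example): minimize $f(x) = x^2$ subject to $x \geq 1$. Here $g(\lambda) = \min_x \left[x^2 - \lambda(x - 1)\right] = -\frac{\lambda^2}{4} + \lambda$, maximized at $\lambda^* = 2$, and $x^* = \lambda^*/2 = 1$ recovers the primal optimum:&lt;/p>

```python
import numpy as np

# Primal: min f(x) = x^2  s.t.  x >= 1   (optimum: x* = 1, f(x*) = 1)
# Lagrangian: L(x, lam) = x^2 - lam * (x - 1)
# Dual function: g(lam) = min_x L(x, lam), with inner minimizer x = lam / 2

def g(lam):
    x = lam / 2.0
    return x ** 2 - lam * (x - 1.0)

lams = np.linspace(0.0, 4.0, 4001)    # lambda >= 0
lam_star = lams[np.argmax(g(lams))]   # dual optimum: lambda* = 2
x_star = lam_star / 2.0               # primal recovered from dual (Slater holds)

print(lam_star, x_star, g(lam_star))  # 2.0 1.0 1.0
```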
&lt;h2 id="dual-derivation-of-the-svm">Dual derivation of the SVM&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>SVM optimization:
&lt;/p>
$$
\begin{array}{ll}
\underset{\boldsymbol{w}}{\operatorname{argmin}} \quad &amp;\frac{1}{2}\|\boldsymbol{w}\|^2 \\\\
\text{ s.t. } \quad &amp;y_i(\boldsymbol{w}^T\phi(\mathbf{x}_i) + b) \geq 1
\end{array}
$$
&lt;/li>
&lt;li>
&lt;p>Lagrangian function:
&lt;/p>
$$
L(\boldsymbol{w}, \boldsymbol{\alpha})=\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}-\sum_{i} \alpha_{i}\left(y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1\right)
$$
&lt;/li>
&lt;li>
&lt;p>Compute optimal $\boldsymbol{w}$
&lt;/p>
$$
\begin{align}
&amp;\frac{\partial L}{\partial \boldsymbol{w}} = \boldsymbol{w} - \sum_i \alpha_i y_i \phi(\boldsymbol{x}_i) \overset{!}{=} 0 \\\\
\Leftrightarrow \quad &amp; \color{CornflowerBlue}{\boldsymbol{w}^* = \sum_i \alpha_i y_i \phi(\boldsymbol{x}_i)}
\end{align}
$$
&lt;ul>
&lt;li>
&lt;p>Many of the $\alpha_i$ will be zero (the corresponding constraint holds with strict inequality)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>If $\alpha_i \neq 0 \overset{\text{complementary slackness}}{\Rightarrow} y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1 =0$&lt;/p>
&lt;p>$\Rightarrow \phi(\boldsymbol{x}_i)$ is a support vector&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The optimal weight vector $\boldsymbol{w}$ is a &lt;strong>linear combination of the support vectors&lt;/strong>! 👏&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Optimality condition for $b$:
&lt;/p>
$$
\frac{\partial L}{\partial b} = - \sum_i \alpha_i y_i \overset{!}{=} 0 \quad \Rightarrow \sum_i \alpha_i y_i = 0
$$
&lt;ul>
&lt;li>We do not obtain a solution for $b$&lt;/li>
&lt;li>But an additional condition on the $\alpha_i$&lt;/li>
&lt;/ul>
&lt;p>$b$ can be computed from $w$:&lt;/p>
&lt;p>If $\alpha\_i > 0$, then $\boldsymbol{x}\_i$ lies on the margin due to the complementary slackness condition, i.e.:
&lt;/p>
$$
\begin{align}y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right)-1 &amp;= 0 \\\\y_{i}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right) &amp;= 1 \\\\ \underbrace{y_{i} y_{i}}_{=1}\left(\boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)+b\right) &amp;= y_{i} \\\\ \Rightarrow b = y_{i} - \boldsymbol{w}^{T} \phi\left(\boldsymbol{x}_{i}\right)\end{align}
$$
&lt;/li>
&lt;/ol>
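&lt;p>The derivation can be checked numerically on a toy problem (the data below is made up for illustration): two 1-D points $x_1 = 1, y_1 = +1$ and $x_2 = -1, y_2 = -1$, with $\phi$ the identity. The condition $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2$, the dual reduces to $g(\alpha) = 2\alpha - 2\alpha^2$ with maximum at $\alpha^* = \frac{1}{2}$, giving $\boldsymbol{w}^* = 1$, $b = 0$, and both points exactly on the margin:&lt;/p>

```python
import numpy as np

# Toy hard-margin SVM: x1 = +1 (y1 = +1), x2 = -1 (y2 = -1), phi = identity
x = np.array([1.0, -1.0])
y = np.array([1.0, -1.0])
K = np.outer(x, x)                     # linear kernel k(xi, xj) = xi * xj

def g(a):
    alpha = np.array([a, a])           # sum_i alpha_i y_i = 0 forces equality
    v = alpha * y
    return alpha.sum() - 0.5 * v @ K @ v

grid = np.linspace(0.0, 1.0, 1001)
a_star = grid[np.argmax([g(a) for a in grid])]   # 0.5

w_star = np.sum(a_star * y * x)        # w* = sum_i alpha_i y_i x_i = 1
b_star = y[0] - w_star * x[0]          # b = y_i - w* x_i = 0 (support vector)

print(w_star, b_star)                  # 1.0 0.0
print(y * (w_star * x + b_star))       # [1. 1.]  -> both points on the margin
```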
&lt;h2 id="apply-kernel-tricks-for-svm">Apply kernel tricks for SVM&lt;/h2>
&lt;ul>
&lt;li>Lagrangian:&lt;/li>
&lt;/ul>
$$
L(\boldsymbol{w}, \boldsymbol{\alpha}) = {\color{red}{\frac{1}{2} \boldsymbol{w}^{T} \boldsymbol{w}}} - \sum_{i} \alpha\_{i}\left({\color{green}{y\_{i} (w^{T} \phi\left(x_{i}\right)}}+ b)-\color{CornflowerBlue}{1}\right), \quad \boldsymbol{w}^{\*}=\sum\_{i} \alpha_{i} y\_{i} \phi\left(\boldsymbol{x}\_{i}\right)
$$
&lt;ul>
&lt;li>Dual function (&lt;strong>Wolfe Dual Lagrangian function&lt;/strong>):&lt;/li>
&lt;/ul>
$$
\begin{aligned}
g(\boldsymbol{\alpha}) &amp;=L\left(\boldsymbol{w}^{*}, \boldsymbol{\alpha}\right) \\\\
&amp;=\color{red}{\frac{1}{2} \underbrace{\sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \phi\left(\boldsymbol{x}_{i}\right)^{T} \phi\left(\boldsymbol{x}_{j}\right)}_{{\boldsymbol{w}^*}^T \boldsymbol{w}^*}} - \color{green}{\sum_{i} \alpha_{i} y_{i}(\underbrace{\sum_{j} \alpha_{j} y_{j} \phi\left(x_{j}\right)}_{\boldsymbol{w}^*})^{T} \phi\left(x_{i}\right)} + \color{CornflowerBlue}{\sum_{i} \alpha_{i}} \\\\
&amp;=\sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \underbrace{\phi\left(\boldsymbol{x}_{i}\right)^{T} \phi\left(\boldsymbol{x}_{j}\right)}_{\overset{}{=} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j)} \\\\
&amp;= \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j )
\end{aligned}
$$
&lt;ul>
&lt;li>&lt;strong>Wolfe dual optimization problem&lt;/strong>:&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
\underset{\boldsymbol{\alpha}}{\max} \quad &amp; \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j ) \\\\
\text{ s.t. } \quad &amp; \alpha_i \geq 0 \quad \forall i = 1, \dots, N \\\\
&amp; \sum_i \alpha_i y_i = 0
\end{array}
$$
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Compute primal from dual parameters&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Weight vector&lt;/strong>
&lt;/p>
$$
\boldsymbol{w}^{*}=\sum_{i} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right)
\label{eq:weight vector}
$$
&lt;ul>
&lt;li>Cannot be represented explicitly (as $\phi$ may map into an infinite-dimensional space). But don&amp;rsquo;t worry, we don&amp;rsquo;t need the explicit representation&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Bias&lt;/strong>: For any $i$ with $\alpha_i > 0$ :&lt;/p>
&lt;/li>
&lt;/ul>
$$
\begin{array}{ll}
b &amp;=y_{k}-\mathbf{w}^{T} \phi\left(\boldsymbol{x}_{k}\right) \\\\
&amp;=y_{k}-\sum_{i} y_{i} \alpha_{i} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{k}\right)
\end{array}
$$
&lt;ul>
&lt;li>&lt;strong>Decision function&lt;/strong> (Again, we use the kernel trick and therefore we don&amp;rsquo;t need the explicit representation of the weight vector $\boldsymbol{w}^*$)&lt;/li>
&lt;/ul>
$$
\begin{aligned}f(\boldsymbol{x}) &amp;= (\boldsymbol{w}^{*})^{T} \boldsymbol{\phi}(\boldsymbol{x}) + b \\\\
&amp;\overset{}{=} \left(\sum_{i} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right)\right)^{T} \boldsymbol{\phi}(\boldsymbol{x}) + b \\\\
&amp;= \sum_{i} \alpha_{i} y_{i} \boldsymbol{\phi}(\boldsymbol{x}_i)^{T} \boldsymbol{\phi}(\boldsymbol{x}) + b \\\\
&amp; \overset{}{=}\sum_i y_{i} \alpha_{i} k\left(\boldsymbol{x}_{i}, \boldsymbol{x}\right)+b\end{aligned}
$$
&lt;/li>
&lt;/ul>
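&lt;p>In code, the kernelized decision function is just a weighted sum of kernel evaluations against the support vectors. A sketch (the support vectors, dual coefficients, and the RBF kernel width below are made-up values; in practice they come from a dual solver):&lt;/p>

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2); the corresponding phi is never formed
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision_function(x, X_sv, y_sv, alpha_sv, b, gamma=1.0):
    # f(x) = sum_i alpha_i y_i k(x_i, x) + b -- no explicit phi(x) needed
    k_vals = np.array([rbf_kernel(xi, x, gamma) for xi in X_sv])
    return np.sum(alpha_sv * y_sv * k_vals) + b

# Hypothetical support vectors and dual coefficients, for illustration only:
X_sv = np.array([[0.0, 1.0], [1.0, 0.0]])
y_sv = np.array([1.0, -1.0])
alpha_sv = np.array([0.8, 0.8])
b = 0.0

x_new = np.array([0.1, 0.9])           # closer to the positive support vector
print(np.sign(decision_function(x_new, X_sv, y_sv, alpha_sv, b)))  # 1.0
```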
&lt;h2 id="relaxed-constraints-with-slack-variable">Relaxed constraints with slack variable&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Primal optimization problem&lt;/strong>
&lt;/p>
$$
\begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &amp;\|\mathbf{w}\|^{2} + \color{CornflowerBlue}{C \sum_i^N \xi_i} \\\\
\text { s.t. } \quad &amp; y_{i}\left(\mathbf{w}^{T} \phi(\mathbf{x}_{i}) + b\right) \geq 1 - \color{CornflowerBlue}{\xi_i}, \quad \color{CornflowerBlue}{\xi_i} \geq 0\end{array}
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Dual optimization problem&lt;/strong>
&lt;/p>
$$
\begin{array}{ll}\underset{\boldsymbol{\alpha}}{\max} \quad &amp; \sum_{i} \alpha_{i}-\frac{1}{2} \sum_{i} \sum_{j} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{k}(\boldsymbol{x}_i, \boldsymbol{x}_j ) \\\\ \text{ s.t. } \quad &amp; \color{CornflowerBlue}{C \geq} \alpha_i \geq 0 \quad \forall i = 1, \dots, N \\\\ &amp; \sum_i \alpha_i y_i = 0\end{array}
$$
&lt;p>&lt;span style="color:CornflowerBlue">Add upper bound of &lt;/span> $\color{CornflowerBlue}{C}$ &lt;span style="color:CornflowerBlue">on&lt;/span> $\color{CornflowerBlue}{\alpha_i}$&lt;/p>
&lt;ul>
&lt;li>Without slack, $\alpha_i \to \infty$ when constraints are violated (points misclassified)&lt;/li>
&lt;li>The upper bound $C$ limits the $\alpha_i$, so some misclassifications are tolerated&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul></description></item><item><title>Classification And Regression Tree (CART)</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/decision-tree/cart/</link><pubDate>Tue, 27 Oct 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/decision-tree/cart/</guid><description>&lt;h2 id="tree-based-methods">Tree-based Methods&lt;/h2>
&lt;p>&lt;strong>CART&lt;/strong>: &lt;strong>C&lt;/strong>lassification &lt;strong>A&lt;/strong>nd &lt;strong>R&lt;/strong>egression &lt;strong>T&lt;/strong>ree&lt;/p>
&lt;h3 id="grow-a-binary-tree">Grow a binary tree&lt;/h3>
&lt;ul>
&lt;li>At each node, “split” the data into two “daughter” nodes.&lt;/li>
&lt;li>Splits are chosen using a splitting criterion.&lt;/li>
&lt;li>Bottom nodes are “terminal” nodes.&lt;/li>
&lt;/ul>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>&lt;/th>
&lt;th>Type of tree&lt;/th>
&lt;th>Predicted value at a node&lt;/th>
&lt;th>Split criterion&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Regression&lt;/strong>&lt;/td>
&lt;td>Regression tree&lt;/td>
&lt;td>The predicted value at a node is the &lt;strong>average response&lt;/strong> variable for all observations in the node&lt;/td>
&lt;td>&lt;strong>Minimum residual sum of squares&lt;/strong> &lt;br />$$\mathrm{RSS}=\sum_{\text {left }}\left(y_{i}-\bar{y}_{L}\right)^{2}+\sum_{\text {right }}\left(y_{i}-\bar{y}_{R}\right)^{2}$$&lt;li />$\bar{y}_L$ / $\bar{y}_R$: average label values in the left / right subtree &lt;br />(Split such that variance in subtrees is minimized)&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Classification&lt;/strong>&lt;/td>
&lt;td>Decision tree&lt;/td>
&lt;td>The predicted class is the &lt;strong>most common class&lt;/strong> in the node (majority vote).&lt;/td>
&lt;td>&lt;strong>Minimum entropy&lt;/strong> in subtrees&lt;br />$$\text { score }=N_{L} H\left(p_{\mathrm{L}}\right)+N_{R} H\left(p_{\mathrm{R}}\right)$$&lt;li />$H\left(p_{L}\right)=-\sum_{k} p_{L}(k) \log p_{L}(k)$: entropy in the left sub-tree &lt;li /> $p_L(k)$: proportion of class $k$ in left tree&lt;br />(Split such that class-labels in sub-trees are &amp;ldquo;pure&amp;rdquo;)&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
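&lt;p>Both split criteria are straightforward to compute; a sketch in NumPy (function names are ours, not from the text):&lt;/p>

```python
import numpy as np

def rss_score(y_left, y_right):
    # Regression: sum of squared residuals around each subtree's mean label
    def rss(y):
        return np.sum((y - y.mean()) ** 2)
    return rss(y_left) + rss(y_right)

def entropy(labels):
    # H(p) = -sum_k p(k) log p(k), with p(k) the proportion of class k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def entropy_score(labels_left, labels_right):
    # Classification: score = N_L * H(p_L) + N_R * H(p_R)
    return (len(labels_left) * entropy(labels_left)
            + len(labels_right) * entropy(labels_right))

# A perfectly pure class split scores zero; mixed splits score higher
print(entropy_score(np.array([0, 0]), np.array([1, 1])))
print(rss_score(np.array([1.0, 1.0]), np.array([5.0, 9.0])))  # 0 + 8 = 8.0
```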
&lt;h3 id="when-stop">When stop?&lt;/h3>
&lt;p>&lt;strong>Stop if:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Minimum number of samples per node&lt;/li>
&lt;li>Maximum depth&lt;/li>
&lt;/ul>
&lt;p>&amp;hellip; has been reached&lt;/p>
&lt;p>(Both criteria again influence the &lt;strong>complexity&lt;/strong> of the tree)&lt;/p>
&lt;h3 id="controlling-the-tree-complexity">Controlling the tree complexity&lt;/h3>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Number of samples per leaf&lt;/th>
&lt;th>Effect&lt;/th>
&lt;th>&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>&lt;strong>Small&lt;/strong>&lt;/td>
&lt;td>Tree is &lt;strong>very sensitive&lt;/strong> to noise&lt;/td>
&lt;td>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/屏幕快照%202020-03-01%2023.26.23.png" alt="屏幕快照 2020-03-01 23.26.23" style="zoom:33%;" />&lt;br />&lt;img src="https://github.com/EckoTan0804/upic-repo/blob/master/uPic/%E5%B1%8F%E5%B9%95%E5%BF%AB%E7%85%A7%202020-03-01%2023.25.40.png?raw=true" alt="屏幕快照 2020-03-01 23.25.40.png" style="zoom:33%;" />&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>&lt;strong>Large&lt;/strong>&lt;/td>
&lt;td>Tree is &lt;strong>not expressive enough&lt;/strong>&lt;/td>
&lt;td>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/屏幕快照%202020-03-01%2023.25.50.png" alt="屏幕快照 2020-03-01 23.25.50" style="zoom:33%;" />&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h3 id="advantages-">Advantages 👍&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Applicable to both regression and classification problems.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Handle categorical predictors naturally.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Computationally simple and quick to fit, even for large problems.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>No formal distributional assumptions (non-parametric).&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Can handle highly non-linear interactions and classification boundaries.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Automatic variable selection.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Very easy to interpret if the tree is small.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="disadvantages-">Disadvantages 👎&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;em>&lt;strong>Accuracy&lt;/strong>&lt;/em>&lt;/p>
&lt;p>Current methods, such as support vector machines and ensemble classifiers, often have 30% lower error rates than CART.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;em>&lt;strong>Instability&lt;/strong>&lt;/em>&lt;/p>
&lt;p>If we change the data a little, the tree can change a lot, so the interpretation is not as straightforward as it appears.&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Linear Discriminant Functions</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/linear-discriminant-functions/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/linear-discriminant-functions/</guid><description>&lt;ul>
&lt;li>No assumption about distributions $\Rightarrow$ &lt;strong>non-parametric&lt;/strong>&lt;/li>
&lt;li>Linear decision surfaces&lt;/li>
&lt;li>Begin by supervised training (given class of training data)&lt;/li>
&lt;/ul>
&lt;h2 id="linear-discriminant-functions-and-decision-surfaces">Linear Discriminant Functions and Decision Surfaces&lt;/h2>
&lt;p>A discriminant function that is a linear combination of the components of $x$ can be written as
&lt;/p>
$$
g(\mathbf{x})=\mathbf{w}^{T} \mathbf{x}+w\_{0}
$$
&lt;ul>
&lt;li>$\mathbf{x}$: feature vector&lt;/li>
&lt;li>$\mathbf{w}$: weight vector&lt;/li>
&lt;li>$w\_0$: bias or threshold weight&lt;/li>
&lt;/ul>
&lt;h3 id="the-two-category-case">The two category case&lt;/h3>
&lt;p>Decision rule:&lt;/p>
&lt;ul>
&lt;li>Decide $w\_1$ if $g(\mathbf{x}) > 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}+w\_{0} > 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}> -w\_{0}$&lt;/li>
&lt;li>Decide $w\_{2}$ if $g(\mathbf{x}) &lt; 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}+w\_{0} &lt; 0 \Leftrightarrow \mathbf{w}^{T} \mathbf{x}&lt;-w\_{0}$&lt;/li>
&lt;li>$g(\mathbf{x}) = 0$: assign to either class or can be left undefined&lt;/li>
&lt;/ul>
&lt;p>The equation $g(\mathbf{x}) = 0$ defines the decision surface that separates points assigned to $w\_{1}$ from points assigned to $w\_{2}$. When $g(\mathbf{x})$ is linear, this decision surface is a &lt;strong>hyperplane&lt;/strong>.&lt;/p>
&lt;p>For arbitrary $\mathbf{x}\_1$ and $\mathbf{x}\_2$ on the decision surface, we have:
&lt;/p>
$$
\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{1}+w\_{0}=\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{2}+w\_{0}
$$
$$
\mathbf{w}^{\mathrm{T}}\left(\mathbf{x}\_{1}-\mathbf{x}\_{2}\right)=0
$$
&lt;p>$\Rightarrow \mathbf{w}$ is &lt;strong>normal&lt;/strong> to any vector lying in the hyperplane.&lt;/p>
&lt;p>In general, the hyperplane $H$ divides the feature space into two half-spaces:&lt;/p>
&lt;ul>
&lt;li>decision region $R\_1$ for $w\_1$&lt;/li>
&lt;li>decision region $R\_2$ for $w\_2$&lt;/li>
&lt;/ul>
&lt;p>Because $g(\mathbf{x}) > 0$ if $\mathbf{x}$ in $R\_1$, it follows that the normal vector $\mathbf{w}$ points into $R\_1$. Therefore, it is sometimes said that any $\mathbf{x}$ in $R\_1$ is on the &lt;em>positive&lt;/em> side of $H$, and any $\mathbf{x}$ in $R\_2$ is on the &lt;em>negative&lt;/em> side of $H$.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/image015.jpg" alt="img">&lt;/p>
&lt;p>The discriminant function $g(\mathbf{x})$ gives an algebraic measure of the distance from $\mathbf{x}$ to the hyperplane. We can write $\mathbf{x}$ as
&lt;/p>
$$
\mathbf{x}=\mathbf{x}\_{p}+r \frac{\mathbf{w}}{\|\mathbf{w}\|}
$$
&lt;ul>
&lt;li>$\mathbf{x}\_{p}$: normal projection of $\mathbf{x}$ onto $H$&lt;/li>
&lt;li>$r$: desired algebraic distance which is positive if $\mathbf{x}$ is on the positive side, else negative&lt;/li>
&lt;/ul>
&lt;p>As $\mathbf{x}\_p$ is on the hyperplane&lt;/p>
$$
\begin{array}{ll}
g\left(\mathbf{x}\_{p}\right)=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}\_{p}+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}}\left(\mathbf{x}-r \frac{\mathbf{w}}{\|\mathbf{w}\|}\right)+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}-r \frac{\mathbf{w}^{\mathrm{T}} \mathbf{w}}{\|\mathbf{w}\|}+w\_{0}=0 \\\\
\mathbf{w}^{\mathrm{T}} \mathbf{x}-r\|\mathbf{w}\| + w\_0 = 0 \\\\
\underbrace{\mathbf{w}^{\mathrm{T}} \mathbf{x} + w\_0}\_{=g(\mathbf{x})} = r\|\mathbf{w}\| \\\\
\Rightarrow g(\mathbf{x}) = r\|\mathbf{w}\| \\\\
\Rightarrow r = \frac{g(\mathbf{x})}{\|\mathbf{w}\|}
\end{array}
$$
&lt;p>In particular, the signed distance from the origin to the hyperplane $H$ is given by $\frac{w\_0}{\|\mathbf{w}\|}$&lt;/p>
&lt;ul>
&lt;li>$w\_0 > 0$: the origin is on the &lt;em>positive&lt;/em> side of $H$&lt;/li>
&lt;li>$w\_0 &lt; 0$: the origin is on the &lt;em>negative&lt;/em> side of $H$&lt;/li>
&lt;li>$w\_0 = 0$: $g(\mathbf{x})$ has the homogeneous form $\mathbf{w}^{\mathrm{T}} \mathbf{x}$ and the hyperplane passes through the origin&lt;/li>
&lt;/ul>
&lt;p>A linear discriminant function divides the feature space by a hyperplane decision surface:&lt;/p>
&lt;ul>
&lt;li>orientation: determined by the normal vector $\mathbf{w}$&lt;/li>
&lt;li>location: determined by the bias $w\_0$&lt;/li>
&lt;/ul>
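&lt;p>A quick numeric sketch of these quantities (the values of $\mathbf{w}$ and $w\_0$ below are example choices, not from the text):&lt;/p>

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector, ||w|| = 5 (example values)
w0 = -5.0                  # bias / threshold weight

def g(x):
    # linear discriminant g(x) = w^T x + w0
    return w @ x + w0

def signed_distance(x):
    # r = g(x) / ||w||: positive on the positive side of H, negative otherwise
    return g(x) / np.linalg.norm(w)

print(signed_distance(np.array([3.0, 4.0])))  # (25 - 5) / 5 = 4.0
print(w0 / np.linalg.norm(w))                 # origin at signed distance -1.0
```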
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.byclb.com/TR/Tutorials/neural_networks/ch9_1.htm">https://www.byclb.com/TR/Tutorials/neural_networks/ch9_1.htm&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Linear Discriminant Analysis (LDA)</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/</guid><description>&lt;p>&lt;strong>Linear Discriminant Analysis (LDA)&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>also called &lt;strong>Fisher’s Linear Discriminant&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>reduces dimension (like PCA)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>but focuses on &lt;strong>maximizing separability among known categories&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="-idea">💡 Idea&lt;/h2>
&lt;ol>
&lt;li>Create a new axis&lt;/li>
&lt;li>Project the data onto this new axis in a way that maximizes the separation of the two categories&lt;/li>
&lt;/ol>
&lt;h2 id="how-it-works">How it works?&lt;/h2>
&lt;h3 id="create-a-new-axis">Create a new axis&lt;/h3>
&lt;p>According to two criteria (considered simultaneously):&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Maximize the distance between means&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Minimize the variation $s^2$ (which LDA calls &amp;ldquo;scatter&amp;rdquo;) within each category&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.11.22.png" alt="截屏2020-05-14 15.11.22" style="zoom:50%;" />
&lt;/li>
&lt;/ul>
&lt;p>We have:
&lt;/p>
$$
\frac{(\overbrace{\mu_1 - \mu_2}^{=: d})^2}{s_1^2 + s_2^2} \qquad\left(\frac{\text{"ideally large"}}{\text{"ideally small"}}\right)
$$
&lt;p>&lt;strong>Why are both distance and scatter important?&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-05-14%2015.17.59.png" alt="截屏2020-05-14 15.17.59">&lt;/p>
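&lt;p>The ratio above can be evaluated directly for two classes of projected 1-D values. A minimal pure-Python sketch (the sample values are invented; following the LDA convention, scatter $s^2$ is the sum of squared deviations from the class mean):&lt;/p>

```python
def fisher_criterion(class1, class2):
    """(mu1 - mu2)^2 / (s1^2 + s2^2), where s^2 is the within-class
    scatter: the sum of squared deviations from the class mean."""
    def mean(xs):
        return sum(xs) / len(xs)

    def scatter(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs)

    d = mean(class1) - mean(class2)
    return d ** 2 / (scatter(class1) + scatter(class2))

# Well-separated, tight classes give a large score ...
print(fisher_criterion([1.0, 1.1, 0.9], [5.0, 5.1, 4.9]))
# ... while overlapping, spread-out classes give a small one
print(fisher_criterion([1.0, 3.0, 5.0], [2.0, 4.0, 6.0]))
```

&lt;p>A good projection axis is one whose projected samples score high on this criterion: large distance between the means &lt;em>and&lt;/em> small scatter within each category.&lt;/p>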
&lt;h4 id="more-than-2-dimensions">More than 2 dimensions&lt;/h4>
&lt;p>The process is the &lt;strong>same&lt;/strong> 👏:&lt;/p>
&lt;p>Create an axis that maximizes the distance between the means for the two categories while minimizing the scatter&lt;/p>
&lt;h4 id="more-than-2-categories-eg-3-categories">More than 2 categories (e.g. 3 categories)&lt;/h4>
&lt;p>There is only a small difference:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Measure the distances among the means&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Find the point that is &lt;strong>central&lt;/strong> to all of the data&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Then measure the distance from the central point of each category to the overall central point&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.26.35.png" alt="截屏2020-05-14 15.26.35" style="zoom:50%;" />
&lt;/li>
&lt;li>
&lt;p>Maximize the distance between each category and the central point while minimizing the scatter for each category&lt;/p>
&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.28.40.png" alt="截屏2020-05-14 15.28.40" style="zoom:50%;" />
&lt;/li>
&lt;li>
&lt;p>Create 2 axes to separate the data (because the 3 central points for each category define a plane)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-05-14%2015.30.16.png" alt="截屏2020-05-14 15.30.16" style="zoom:50%;" />
&lt;/li>
&lt;/ul>
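&lt;p>The multi-category recipe above can be sketched numerically: find the central point of all the data, then relate the spread of the category means around it to the within-category scatter. A toy pure-Python example with three invented 1-D categories (real LDA works with vectors and scatter matrices, but the idea is the same):&lt;/p>

```python
def multiclass_criterion(classes):
    """Weighted sum of squared distances from each category mean to the
    overall central point, divided by the total within-category scatter."""
    def mean(xs):
        return sum(xs) / len(xs)

    all_points = [x for c in classes for x in c]
    center = mean(all_points)  # central point of all the data
    # Spread of the category means around the central point
    between = sum(len(c) * (mean(c) - center) ** 2 for c in classes)
    # Total scatter within each category
    within = sum(sum((x - mean(c)) ** 2 for x in c) for c in classes)
    return between / within

classes = [[1.0, 1.2, 0.8], [5.0, 5.2, 4.8], [9.0, 9.2, 8.8]]
print(multiclass_criterion(classes))  # large: well-separated categories
```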
&lt;h2 id="lda-and-pca">LDA and PCA&lt;/h2>
&lt;h3 id="similarities">Similarities&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Both rank the new axes in order of importance&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>PC1 (the first new axis that PCA creates) accounts for the most variation in the data
&lt;ul>
&lt;li>PC2 (the second new axis) does the second best job&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>LD1 (the first new axis that LDA creates) accounts for the most variation between the categories
&lt;ul>
&lt;li>LD2 does the second best job&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Both can let you dig in and see which features are driving the new axes&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Both try to reduce dimensions&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>PCA looks at the features with the most variation&lt;/li>
&lt;li>LDA tries to maximize the separation of known categories&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=azXCzI57Yfc">https://www.youtube.com/watch?v=azXCzI57Yfc&lt;/a>&lt;/li>
&lt;/ul></description></item></channel></rss>