<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Unsupervised Learning | Haobin Tan</title><link>https://haobin-tan.netlify.app/tags/unsupervised-learning/</link><atom:link href="https://haobin-tan.netlify.app/tags/unsupervised-learning/index.xml" rel="self" type="application/rss+xml"/><description>Unsupervised Learning</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Sat, 07 Nov 2020 00:00:00 +0000</lastBuildDate><image><url>https://haobin-tan.netlify.app/media/icon_hu7d15bc7db65c8eaf7a4f66f5447d0b42_15095_512x512_fill_lanczos_center_3.png</url><title>Unsupervised Learning</title><link>https://haobin-tan.netlify.app/tags/unsupervised-learning/</link></image><item><title>Unsupervised Learning</title><link>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/</link><pubDate>Sun, 16 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/</guid><description/></item><item><title>Auto Encoder</title><link>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/auto-encoder/</link><pubDate>Sun, 16 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/auto-encoder/</guid><description>&lt;h2 id="supervised-vs-unsupervised-learning">Supervised vs. Unsupervised Learning&lt;/h2>
&lt;figure>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-08-17%2018.42.22.png"
alt="Supervised vs. unsupervised">&lt;figcaption>
&lt;p>Supervised vs. unsupervised&lt;/p>
&lt;/figcaption>
&lt;/figure>
&lt;ul>
&lt;li>&lt;strong>Supervised learning&lt;/strong>
&lt;ul>
&lt;li>Given data $(X, Y)$&lt;/li>
&lt;li>Estimate the posterior $P(Y|X)$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Unsupervised learning&lt;/strong>
&lt;ul>
&lt;li>Concerned with the (unseen) &lt;strong>structure&lt;/strong> of the data&lt;/li>
&lt;li>Try to estimate (implicitly or explicitly) the data distribution $P(X)$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="auto-encoder-structure">Auto-Encoder structure&lt;/h2>
&lt;p>In supervised learning, the hidden layers encapsulate features useful for classification. Even if there are no labels and no output layer, it is still possible to learn features in the hidden layer! &amp;#x1f4aa;&lt;/p>
&lt;h3 id="linear-auto-encoder">Linear auto-encoder&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2018.56.10.png" alt="截屏2020-08-17 18.56.10" style="zoom:80%;" />
$$
\begin{array}{l}
H=W\_{I} I+b\_{I} \\\\
\tilde{I}=W\_{O} H+b\_{O}
\end{array}
$$
&lt;ul>
&lt;li>Similar to linear compression method (such as PCA)&lt;/li>
&lt;li>Trying to find linear surfaces that most data points can lie on&lt;/li>
&lt;li>&lt;span style="color:red">Not very useful for complicated data&lt;/span> 🤪&lt;/li>
&lt;/ul>
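&lt;p>The two affine maps above can be sketched in a few lines of numpy (a minimal illustration; the sizes and random weights are made up, not from the notes):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

D_I, D_H = 8, 3                 # illustrative input / hidden sizes
I = rng.normal(size=D_I)        # one input vector

# Encoder and decoder are plain affine maps (no activation function)
W_I = rng.normal(size=(D_H, D_I)); b_I = np.zeros(D_H)
W_O = rng.normal(size=(D_I, D_H)); b_O = np.zeros(D_I)

H = W_I @ I + b_I               # code:           H = W_I I + b_I
I_tilde = W_O @ H + b_O         # reconstruction: I~ = W_O H + b_O
```

&lt;p>Because both maps are linear, training such an AE with a squared-error loss ends up finding a linear subspace of the data, which is why it behaves much like PCA.&lt;/p>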
&lt;h3 id="non-linear-auto-encoder">Non-linear auto-encoder&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2018.59.24.png" alt="截屏2020-08-17 18.59.24" style="zoom:80%;" />
$$
\begin{array}{l}
H=f(W\_{I} I+b\_{I}) \\\\
\tilde{I}=W\_{O} H+b\_{O}
\end{array}
$$
&lt;ul>
&lt;li>
&lt;p>When $D\_H > D\_I$, the activation function also prevents the network from simply copying the data over&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Goal: find optimized weights to minimize
&lt;/p>
$$
L=\frac{1}{2}(\tilde{I}-I)^{2}
$$
&lt;ul>
&lt;li>Optimized with &lt;em>Stochastic Gradient Descent (SGD)&lt;/em>&lt;/li>
&lt;li>Gradients computed with &lt;em>Backpropagation&lt;/em>&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
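&lt;p>The SGD-plus-backpropagation optimization above can be sketched on a single toy sample (illustrative numpy; the sizes and learning rate are made up, and sigmoid stands in for $f$):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

D_I, D_H = 6, 4                     # made-up sizes
I = rng.normal(size=D_I)            # a single toy input
W_I = rng.normal(scale=0.1, size=(D_H, D_I)); b_I = np.zeros(D_H)
W_O = rng.normal(scale=0.1, size=(D_I, D_H)); b_O = np.zeros(D_I)

def forward(I):
    H = sigmoid(W_I @ I + b_I)      # H = f(W_I I + b_I)
    return H, W_O @ H + b_O         # linear decoder

_, I_t = forward(I)
loss0 = 0.5 * np.sum((I_t - I) ** 2)

lr = 0.1
for _ in range(200):                # plain SGD on the one sample
    H, I_t = forward(I)
    err = I_t - I                   # dL/dI_t for L = 0.5 (I_t - I)^2
    dW_O = np.outer(err, H); db_O = err         # decoder gradients
    dz = (W_O.T @ err) * H * (1 - H)            # sigmoid'(z) = H (1 - H)
    dW_I = np.outer(dz, I); db_I = dz           # encoder gradients
    W_O -= lr * dW_O; b_O -= lr * db_O
    W_I -= lr * dW_I; b_I -= lr * db_I

_, I_t = forward(I)
loss = 0.5 * np.sum((I_t - I) ** 2)
```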
&lt;h3 id="general-auto-encoder-structure">General auto-encoder structure&lt;/h3>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-08-17%2019.12.15.png" alt="截屏2020-08-17 19.12.15">&lt;/p>
&lt;ul>
&lt;li>
&lt;p>2 components in general&lt;/p>
&lt;ul>
&lt;li>Encoder: maps input $I$ to hidden $H$&lt;/li>
&lt;li>Decoder: &lt;strong>reconstructs&lt;/strong> $\tilde{I}$ from $H$&lt;/li>
&lt;/ul>
&lt;p>($f$ and $f^*$ depend on input data type)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Encoder and Decoder often have similar/reversed architectures&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="why-auto-encoders">Why Auto-Encoders?&lt;/h2>
&lt;p>With auto-encoders we can do&lt;/p>
&lt;ul>
&lt;li>&lt;a href="#compression-and-reconstruction">Compression &amp;amp; Reconstruction&lt;/a>&lt;/li>
&lt;li>&lt;a href="#unsupervised-pretraining">MLP training assistance&lt;/a>&lt;/li>
&lt;li>&lt;a href="#restricted-boltzmann-machine">Feature learning&lt;/a>&lt;/li>
&lt;li>Representation learning&lt;/li>
&lt;li>&lt;a href="#variational-auto-encoder">Sampling different variations of the inputs&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>There are many types and variations of auto-encoders&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Different architectures for different data types&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Different loss functions for different learning purposes&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="compression-and-reconstruction">Compression and Reconstruction&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2018.59.24.png" alt="截屏2020-08-17 18.59.24" style="zoom:80%;" />
&lt;ul>
&lt;li>
&lt;p>$D\_H &lt; D\_I$&lt;/p>
&lt;ul>
&lt;li>For example a flattened image: $D\_I = 1920 \times 1080 \times 3$&lt;/li>
&lt;li>Common hidden layer sizes: $512$ or $1024$&lt;/li>
&lt;/ul>
&lt;p>$\to$ Sending $H$ takes less bandwidth than $I$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Sender uses $W\_I$ and $b\_I$ to compress $I$ into $H$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Receiver uses $W\_O$ and $b\_O$ to reconstruct $\tilde{I}$&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>With &lt;strong>corrupted inputs&lt;/strong>&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2021.23.56.png" alt="截屏2020-08-17 21.23.56" style="zoom: 50%;" />
&lt;ul>
&lt;li>
&lt;p>Deliberately corrupt inputs&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Train auto-encoders to regenerate the inputs before corruption&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$D\_H &lt; D\_I$ NOT required (no risk of learning an identity function)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Benefit from a network with large capacity&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Different ways of corruption&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Images&lt;/strong>
&lt;ul>
&lt;li>Adding noise filters&lt;/li>
&lt;li>downscaling&lt;/li>
&lt;li>shifting&lt;/li>
&lt;li>&amp;hellip;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Speech&lt;/strong>
&lt;ul>
&lt;li>simulating background noise&lt;/li>
&lt;li>Creating high-articulation effect&lt;/li>
&lt;li>&amp;hellip;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Text&lt;/strong>: masking words/characters&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Application&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Deep Learning super sampling&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>use neural auto-encoders to generate HD frames from SD frames&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-08-17%2021.30.49.png" alt="截屏2020-08-17 21.30.49">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Denoising Speech from Microphones&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
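&lt;p>The corruption schemes listed above (adding noise, masking parts of the input) can be sketched as follows; the target of the denoising AE is always the &lt;em>clean&lt;/em> input (illustrative numpy; shapes and noise levels are made up):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
I_clean = rng.uniform(size=(16,))            # stand-in for a flattened image

# Corruption 1: additive Gaussian noise
I_noisy = I_clean + rng.normal(scale=0.1, size=I_clean.shape)

# Corruption 2: masking -- drop about 25% of the entries
mask = rng.random(I_clean.shape) > 0.25
I_masked = I_clean * mask

# A denoising AE is fed the corrupted version, but its reconstruction
# loss is computed against the clean input:
#   L = 0.5 * (AE(I_noisy) - I_clean)**2
```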
&lt;h3 id="unsupervised-pretraining">Unsupervised Pretraining&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2021.33.04.png" alt="截屏2020-08-17 21.33.04" style="zoom:50%;" />
&lt;p>Normal training regime&lt;/p>
&lt;ol>
&lt;li>Initialize the networks with random $W\_1, W\_2, W\_3$&lt;/li>
&lt;li>Forward pass to compute output $O$&lt;/li>
&lt;li>Get the loss function $L(O, Y)$&lt;/li>
&lt;li>Backward pass and update weights to minimize $L$&lt;/li>
&lt;/ol>
&lt;p>&lt;strong>Pretraining regime&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Find a way to have $W\_1, W\_2, W\_3$ &lt;strong>pretrained&lt;/strong> &amp;#x1f4aa;&lt;/li>
&lt;li>The weights are first used to optimize auxiliary objectives (here: reconstruction losses) before the supervised training starts&lt;/li>
&lt;/ul>
&lt;h4 id="layer-wise-pretraining">Layer-wise pretraining&lt;/h4>
&lt;h5 id="pretraining-first-layer">&lt;strong>Pretraining first layer&lt;/strong>&lt;/h5>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2021.37.12.png" alt="截屏2020-08-17 21.37.12" style="zoom: 50%;" />
&lt;ol>
&lt;li>
&lt;p>Initialize $W\_1$ to encode, $W\_1^*$ to decode&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Forward pass&lt;/p>
&lt;ul>
&lt;li>
&lt;p>$I \to H\_1 \to I^*$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Reconstruction loss:
&lt;/p>
$$
L = \frac{1}{2}(I^* - I)^2
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Backward pass&lt;/p>
&lt;ul>
&lt;li>Compute gradients $\frac{\partial L}{\partial W_{1}}$ and $\frac{\partial L}{\partial W_{1}^*}$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Update $W\_1$, $W\_1^*$ with SGD&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Repeat 1 to 4 until convergence&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h5 id="pretraining-next-layers">Pretraining next layers&lt;/h5>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2021.41.39.png" alt="截屏2020-08-17 21.41.39" style="zoom: 50%;" />
&lt;ul>
&lt;li>&lt;strong>Use $W\_1$ from previous pretraining&lt;/strong>&lt;/li>
&lt;/ul>
&lt;ol>
&lt;li>
&lt;p>Initialize $W\_2$ to encode, $W\_2^*$ to decode&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Forward pass&lt;/p>
&lt;ul>
&lt;li>
&lt;p>$I \to H\_1 \to H\_2 \to I^*$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Reconstruction loss:
&lt;/p>
$$
L = \frac{1}{2}(H\_1^* - H\_1)^2
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Backward pass&lt;/p>
&lt;ul>
&lt;li>Compute gradients $\frac{\partial L}{\partial W_{2}}$ and $\frac{\partial L}{\partial W_{2}^*}$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Update $W\_2$, $W\_2^*$ with SGD and &lt;strong>keep $W\_1$ the same&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h5 id="hidden-layers-pretraining-in-general">Hidden layers pretraining in general&lt;/h5>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2021.46.02.png" alt="截屏2020-08-17 21.46.02" style="zoom:50%;" />
&lt;ul>
&lt;li>Each layer $H\_n$ is pretrained as an AE to reconstruct the input of that layer (i.e. $H\_{n-1}$)&lt;/li>
&lt;li>The backward pass is stopped at the input to prevent changing previous weights $W\_1, \dots, W\_{n-1}$ and ONLY update $W\_n, W\_n^*$&lt;/li>
&lt;li>&lt;span style="color:red">Complexity of each AE increases over depth&lt;/span> (since the forward pass requires all previously pretrained layers)&lt;/li>
&lt;/ul>
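&lt;p>The whole layer-wise procedure can be sketched as a loop: pretrain one AE per layer, freeze its encoder, and feed its codes to the next stage (a toy numpy sketch; the dataset, sizes, learning rate, and epoch count are all made up):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

X = rng.normal(size=(100, 20))      # toy dataset
sizes = [20, 12, 8, 4]              # input size + three hidden layers
encoders = []                       # frozen (W_n, b_n) after each stage

data = X
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    # Pretrain one AE stage: encode data -> H, decode H -> reconstruction
    W = rng.normal(scale=0.1, size=(d_out, d_in)); b = np.zeros(d_out)
    W_s = rng.normal(scale=0.1, size=(d_in, d_out)); b_s = np.zeros(d_in)
    for _ in range(20):             # a few SGD passes over the set
        for x in data:
            h = sigmoid(W @ x + b)
            err = (W_s @ h + b_s) - x
            dh = (W_s.T @ err) * h * (1 - h)
            W_s -= 0.05 * np.outer(err, h); b_s -= 0.05 * err
            W -= 0.05 * np.outer(dh, x); b -= 0.05 * dh
    encoders.append((W, b))
    # Freeze this encoder; the next stage only ever sees its codes,
    # so the backward pass never reaches the earlier weights
    data = sigmoid(data @ W.T + b)
```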
&lt;h5 id="finetuning">Finetuning&lt;/h5>
&lt;ul>
&lt;li>Start the networks with &lt;strong>pretrained&lt;/strong> $W\_1, W\_2, W\_3$&lt;/li>
&lt;li>Go back to supervised training:
&lt;ol>
&lt;li>Forward pass to compute output $O$&lt;/li>
&lt;li>Get the loss function $L(O, Y)$&lt;/li>
&lt;li>Backward pass and update weights to minimize $L$&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ul>
&lt;blockquote>
&lt;p>This process is called &lt;strong>finetuning&lt;/strong> because the weights are NOT randomly initialized, but &lt;strong>carried over from an external process&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;h4 id="what-does-unsupervised-pretraining-help">What does “unsupervised pretraining” help?&lt;/h4>
&lt;p>According to &lt;a href="https://dl.acm.org/doi/10.5555/1756006.1756025">Why Does Unsupervised Pre-training Help Deep Learning?&lt;/a>&lt;/p>
&lt;ul>
&lt;li>Pretraining helps to make networks with 5 hidden layers converge&lt;/li>
&lt;li>Lower classification error rate&lt;/li>
&lt;li>Create a better starting point for the non-convex optimization process&lt;/li>
&lt;/ul>
&lt;h3 id="restricted-boltzmann-machine">Restricted Boltzmann Machine&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2022.35.40.png" alt="截屏2020-08-17 22.35.40" style="zoom: 50%;" />
&lt;ul>
&lt;li>
&lt;p>Structure&lt;/p>
&lt;ul>
&lt;li>Visible units (input data points $V$)&lt;/li>
&lt;li>Hidden units ($H$)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Given input $V$, we can generate the probabilities of hidden units being &lt;em>On (1) / Off (0)&lt;/em>
&lt;/p>
$$
p\left(h\_{j}=1 \mid V\right)=\sigma\left(b\_{j}+\sum\_{i=1}^{m} W\_{i j} v\_{i}\right)
$$
&lt;/li>
&lt;li>
&lt;p>Given the hidden units, we can generate the probabilities of visible units being &lt;em>On/Off&lt;/em>
&lt;/p>
$$
p\left(v\_{i}=1 \mid H\right)=\sigma\left(a\_{i}+\sum\_{j=1}^{F} W\_{i j} h\_{j}\right)
$$
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Energy function&lt;/strong> of a visible-hidden system
&lt;/p>
$$
E(V, H)=-\sum\_{i=1}^{m} \sum\_{j=1}^{F} W\_{i j} h\_{j} v\_{i}-\sum\_{i=1}^{m} v\_{i} a\_{i}-\sum\_{j=1}^{F} h\_{j} b\_{j}
$$
&lt;ul>
&lt;li>Train the network to minimize the energy function&lt;/li>
&lt;li>Use &lt;em>&lt;strong>Contrastive Divergence&lt;/strong>&lt;/em> algorithm&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
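&lt;p>The two conditionals and the energy function above can be sketched directly (illustrative numpy; the unit counts and random weights are made up):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

m, F = 6, 4                              # visible / hidden unit counts
W = rng.normal(scale=0.5, size=(m, F))   # W[i, j] connects v_i and h_j
a = np.zeros(m)                          # visible biases
b = np.zeros(F)                          # hidden biases

V = rng.integers(0, 2, size=m)           # a binary visible vector

# p(h_j = 1 | V) = sigmoid(b_j + sum_i W_ij v_i)
p_h = sigmoid(b + V @ W)
H = (rng.random(F) < p_h).astype(int)    # sample the hidden units

# p(v_i = 1 | H) = sigmoid(a_i + sum_j W_ij h_j)
p_v = sigmoid(a + W @ H)

# Energy of the (V, H) configuration
E = -(V @ W @ H) - V @ a - H @ b
```

&lt;p>Contrastive Divergence training alternates exactly these two sampling steps to obtain its weight updates.&lt;/p>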
&lt;h4 id="layer-wise-pretraining-with-rbm">Layer-wise pretraining with RBM&lt;/h4>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">See also: &lt;a href="https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/rbm/">Restricted Boltzman Machine&lt;/a>&lt;/span>
&lt;/div>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2022.43.06.png" alt="截屏2020-08-17 22.43.06" style="zoom:67%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2022.43.51.png" alt="截屏2020-08-17 22.43.51" style="zoom:50%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2022.44.12.png" alt="截屏2020-08-17 22.44.12" style="zoom:50%;" />
&lt;h4 id="finetuning-rbm-deep-belief-network">Finetuning RBM: Deep Belief Network&lt;/h4>
&lt;ul>
&lt;li>The end result is called a &lt;strong>Deep Belief Network&lt;/strong>&lt;/li>
&lt;li>Use &lt;strong>pretrained&lt;/strong> $W\_1, W\_2, W\_3$ to convert the network into a typical MLP&lt;/li>
&lt;li>Go back to supervised training:
&lt;ol>
&lt;li>Forward pass to compute output $O$&lt;/li>
&lt;li>Get the loss function $L(O, Y)$&lt;/li>
&lt;li>Backward pass and update weights to minimize $L$&lt;/li>
&lt;/ol>
&lt;/li>
&lt;/ul>
&lt;h4 id="rbm-pretraining-application-in-speech">RBM Pretraining application in Speech&lt;/h4>
&lt;p>&lt;strong>Speech Recognition = Looking for the most probable transcription given an audio&lt;/strong>&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2022.52.35.png" alt="截屏2020-08-17 22.52.35" style="zoom:40%;" />
&lt;p>💡 We can use (deep) neural networks to replace the non-neural generative models (Gaussian Mixture Models) in the Acoustic Models&lt;/p>
&lt;h2 id="variational-auto-encoder">Variational Auto-Encoder&lt;/h2>
&lt;p>&lt;strong>💡 Main idea: Enforce the hidden units to follow a unit Gaussian distribution (or another known distribution)&lt;/strong>&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2023.16.51.png" alt="截屏2020-08-17 23.16.51" style="zoom:50%;" />
&lt;ul>
&lt;li>In AE we didn’t know the “distribution” of the (hidden) code&lt;/li>
&lt;li>Knowing the distribution in advance will make sampling easier&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2023.18.50.png" alt="截屏2020-08-17 23.18.50" style="zoom: 50%;" />
&lt;ul>
&lt;li>
&lt;p>Get the Gaussian restriction&lt;/p>
&lt;ul>
&lt;li>Each Gaussian is represented by mean $\mu$ and variance $\sigma$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Why do we sample?&lt;/p>
&lt;ul>
&lt;li>The hidden layers’ neurons are then “arranged” in the Gaussian distribution&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>We want to enforce the hidden layer to follow a known distribution, for example $N(0, I)$, so we can add a loss term to do so:
&lt;/p>
$$
L=\frac{1}{2}(O-I)^{2}+\mathrm{KL}(\mathrm{N}(\mu, \sigma) \Vert \mathrm{N}(0, I))
$$
&lt;/li>
&lt;li>
&lt;p>Variational methods allow us to take a sample of the distribution being estimated, then get a “noisy” gradient for SGD&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Convergence can be achieved in practice&lt;/p>
&lt;/li>
&lt;/ul>
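&lt;p>For a diagonal Gaussian the KL term above has a closed form, and the sampling is done with the reparameterization trick so the "noisy" gradient can flow through $\mu$ and $\sigma$ (a sketch; the encoder outputs below are made up):&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the encoder produced these per-dimension parameters
mu = np.array([0.5, -0.2, 0.0])
log_var = np.array([-0.1, 0.3, 0.0])
sigma = np.exp(0.5 * log_var)

# Reparameterization trick: H = mu + sigma * eps with eps ~ N(0, I),
# so mu and sigma stay differentiable while H is still a sample
eps = rng.normal(size=mu.shape)
H = mu + sigma * eps

# Closed-form KL(N(mu, sigma) || N(0, I)) for a diagonal Gaussian
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
```

&lt;p>The KL term is zero exactly when $\mu = 0$ and $\sigma = 1$, i.e. when the code already matches $N(0, I)$.&lt;/p>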
&lt;h2 id="structure-prediction">Structure Prediction&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Beyond auto-encoder&lt;/p>
&lt;ul>
&lt;li>Auto-Encoder
&lt;ul>
&lt;li>Given the object: reconstruct the object&lt;/li>
&lt;li>$P(X)$ is (implicitly) estimated via reconstructing the inputs&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Structure prediction
&lt;ul>
&lt;li>Given a part of the object: predict the remaining&lt;/li>
&lt;li>$P(X)$ is estimated by &lt;strong>factorizing&lt;/strong> the inputs&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2023.27.32.png" alt="截屏2020-08-17 23.27.32" style="zoom: 40%;" />
&lt;/li>
&lt;/ul>
&lt;h3 id="pixel-models">Pixel Models&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Assumption (biased): The pixels are generated from left to right, from top to bottom.&lt;/p>
&lt;p>(I.e. the content of each pixel depends only on the pixels to its left and in the rows above it)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2023.52.17.png" alt="截屏2020-08-17 23.52.17" style="zoom: 50%;" />
&lt;/li>
&lt;li>
&lt;p>We can estimate a probabilistic function to learn how to generate pixels&lt;/p>
&lt;ul>
&lt;li>Image $X = \\{x\_1, x\_2, \dots, x\_n\\}$ with $n$ pixels
$$
P(X)=\prod\_{i=1}^{n} p\left(x\_{i} \mid x\_{1}, \ldots x\_{i-1}\right)
$$&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2023.55.50.png" alt="截屏2020-08-17 23.55.50" style="zoom:50%;" />
&lt;p>Closer look:&lt;/p>
&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-17%2023.57.43.png" alt="截屏2020-08-17 23.57.43" style="zoom:50%;" />
&lt;ul>
&lt;li>But this is quite difficult
&lt;ul>
&lt;li>The number of input pixels is a variable&lt;/li>
&lt;li>There are many pixels in an image&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>We can model such context dependency using many types of neural networks:
&lt;ul>
&lt;li>
&lt;p>Recurrent neural networks&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Convolutional neural networks&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Transformers / self-attention NNs&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
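&lt;p>The factorization $P(X)=\prod\_{i} p(x\_i \mid x\_1, \ldots, x\_{i-1})$ can be sketched with a deliberately trivial stand-in for the neural conditional; here a Laplace-smoothed running frequency plays the role of the network (purely illustrative):&lt;/p>

```python
import numpy as np

# Chain-rule factorization P(X) = prod_i p(x_i | x_1 .. x_{i-1}) over a
# toy "image" of binary pixels, flattened left-to-right, top-to-bottom.
def pixel_log_prob(pixels):
    log_p = 0.0
    for i, x in enumerate(pixels):
        prev = pixels[:i]
        # Hypothetical conditional model: smoothed fraction of 1s so far
        p_one = (np.sum(prev) + 1.0) / (len(prev) + 2.0)
        log_p += np.log(p_one if x == 1 else 1.0 - p_one)
    return log_p

X = [1, 1, 0, 1, 0, 0, 1, 1]
lp = pixel_log_prob(X)
```

&lt;p>Because every conditional is a proper distribution, the product sums to one over all possible images, so $P(X)$ is a genuine (if biased) likelihood.&lt;/p>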
&lt;h3 id="neural-language-models">(Neural) Language Models&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>A common model/application in natural language processing and generation (E.g. chatbots, translation, question answering)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Similar to the pixel models, we can assume the words are generated &lt;strong>from left to right&lt;/strong>
&lt;/p>
$$
P( \text{the end of our life} )=P( \text{the} ) \times P( \text{end} \mid \text{the} ) \times P(\text{of} \mid \text{the end} ) \times P( \text{our} \mid \text{the end of} ) \times P(\text{life} \mid \text{the end of our})
$$
&lt;/li>
&lt;li>
&lt;p>Each term can be estimated using neural networks under the form $P(x|context)$ with context being a series of words&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2000.18.49.png" alt="截屏2020-08-18 00.18.49" style="zoom: 50%;" />
&lt;ul>
&lt;li>Input: context&lt;/li>
&lt;li>Output: classification with $V$ classes (the vocabulary size)
&lt;ul>
&lt;li>Most classes will have near 0 probabilities given the context&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
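&lt;p>The chain-rule product above can be sketched with a toy lookup table standing in for the neural $P(x \mid context)$ softmax over the vocabulary (all probabilities below are made up for illustration):&lt;/p>

```python
# Hypothetical conditional probabilities; a real language model would
# produce each of these with a V-class softmax over the vocabulary.
cond_prob = {
    ("the",): 0.05,
    ("end", "the"): 0.10,
    ("of", "the end"): 0.40,
    ("our", "the end of"): 0.02,
    ("life", "the end of our"): 0.15,
}

def sentence_prob(words):
    """P(sentence) = product of P(word | all previous words)."""
    p = 1.0
    for i, w in enumerate(words):
        context = " ".join(words[:i])
        key = (w,) if i == 0 else (w, context)
        p *= cond_prob[key]
    return p

p = sentence_prob(["the", "end", "of", "our", "life"])
```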
&lt;h3 id="summary">Summary&lt;/h3>
&lt;ul>
&lt;li>Structure prediction is
&lt;ul>
&lt;li>An explicit and flexible way to estimate the likelihood of data that can be factorized (at the cost of a bias)&lt;/li>
&lt;li>Motivation to develop a lot of flexible techniques
&lt;ul>
&lt;li>Such as sequence to sequence models, attention models&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>The bias is often the weakness 🤪&lt;/li>
&lt;/ul>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://zhuanlan.zhihu.com/p/24813602">Auto-Encoder intuition&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://www.edureka.co/blog/autoencoders-tutorial/">Autoencoders Tutorial : A Beginner’s Guide to Autoencoders&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://wiki.pathmind.com/restricted-boltzmann-machine">A Beginner&amp;rsquo;s Guide to Restricted Boltzmann Machines (RBMs)&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Hopfield Nets</title><link>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/hopfield-net/</link><pubDate>Tue, 18 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/hopfield-net/</guid><description>&lt;h2 id="binary-hopfield-nets">&lt;strong>Binary Hopfield Nets&lt;/strong>&lt;/h2>
&lt;h3 id="basic-structure-binary-unit">Basic Structure: Binary Unit&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Single layer of processing units&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each unit $i$ has an activity value or “state” $u\_i$&lt;/p>
&lt;ul>
&lt;li>Binary: $-1$ or $1$&lt;/li>
&lt;li>Denoted as $+$ and $-$ respectively&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2016.52.56.png" alt="截屏2020-08-18 16.52.56" style="zoom: 67%;" />
&lt;/li>
&lt;/ul>
&lt;h3 id="connections">Connections&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Processing units fully interconnected&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Weights from unit $j$ to unit $i$: $T\_{ij}$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>No unit has a connection with itself
&lt;/p>
$$
\forall i : \qquad T\_{ii} = 0
$$
&lt;/li>
&lt;li>
&lt;p>Weights between a pair of units are &lt;strong>symmetric&lt;/strong>
&lt;/p>
$$
T\_{ji} = T\_{ij}
$$
&lt;ul>
&lt;li>Symmetric weights guarantee that the network converges (relaxes into a stable state)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Example&lt;/p>
&lt;p>&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" width="501px" viewBox="-0.5 -0.5 501 403" content="&amp;lt;mxfile host=&amp;quot;app.diagrams.net&amp;quot; modified=&amp;quot;2020-08-18T20:20:59.064Z&amp;quot; agent=&amp;quot;5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36&amp;quot; etag=&amp;quot;icQIK4C8tDW4pi8ZyByC&amp;quot; version=&amp;quot;13.6.2&amp;quot; type=&amp;quot;device&amp;quot;&amp;gt;&amp;lt;diagram id=&amp;quot;5oDFmgM-eGvQfvP3TnwN&amp;quot; name=&amp;quot;Page-1&amp;quot;&amp;gt;7Zldb5swFIZ/DdJ20QlsyMdl89H1ots0Zdqaq8oCB7waHDkmgf36mWAgfGXtlgmS5iqc18bY57HfYykanPrRR47W3ifmYKoB3Yk0ONMAMEwwkD+JEqfKcGikgsuJozoVwoL8wkrUlRoSB29KHQVjVJB1WbRZEGBblDTEOduVu60YLX91jVxcExY2onX1B3GEl6oAwnHRcI+J66lPQ6irmfso663G2HjIYbtU2veBcw1OOWMiffKjKaZJ9rLEpAPdtbTmM+M4EC954WHMzOD569P8G7K/hPefw5nn3VhqbiLOVowdmQAVMi485rIA0XmhTjgLAwcno+oyKvo8MLaWoiHFn1iIWNFEoWBS8oRPVaucMI8f1fv7YJkEH6wsnEWHjbNYRSsWCDUoSFud24SxjAMW4FS5I5Sq/hvB2XPODUolXW2yxNYkZrRYyG18JHPZbkTcxeJIP5CjlocEMx/LFcn3OKZIkG15HkjtVjfvV/CUDwrpK/AOusVbEF2WgDbj/RPOKv5e4IVd4h3X8P47vIiIx4Pn5cFzwS0J4kOI9dN8ibDNLmGrcbeIhupLmjV99/1J6vo+lvY/MaT2vn7oKZUFM0nyziMCL9Zon46drNll+mizTqvoikTJLnoxhi3mAkfHQdQTp16AA1Uz1SXByOJdUXINVd1076Da5uLJkz26GudfniVwFmdJP8pXZfFkZpp55ivM1Lj4DdBp5QQtZgpyM705Sysdgt5Z6bAbK+3RiYDnYImw5UTAMz8R1cuFaXZ+IsyWVJtnf5Or2k8Pkm0035tbNrNcuignMs3TlFHGCzNZSSepSIgSN5ChLVOFpT5JEklsRG9Vg08ch7aRKxteldUJyFjVO/aoTmbUAMb8b1yaS/Bb4wKtMhcIuubSXAhaHeliyVh6mQwYdk2muW68PTID2Dcy1tXLkuo/7JuXDa5cGmp/7myn5yLD4h+tfdvBH4Nw/hs=&amp;lt;/diagram&amp;gt;&amp;lt;/mxfile&amp;gt;" onclick="(function(svg){var src=window.event.target||window.event.srcElement;while (src!=null&amp;amp;&amp;amp;src.nodeName.toLowerCase()!='a'){src=src.parentNode;}if(src==null){if(svg.wnd!=null&amp;amp;&amp;amp;!svg.wnd.closed){svg.wnd.focus();}else{var 
r=function(evt){if(evt.data=='ready'&amp;amp;&amp;amp;evt.source==svg.wnd){svg.wnd.postMessage(decodeURIComponent(svg.getAttribute('content')),'*');window.removeEventListener('message',r);}};window.addEventListener('message',r);svg.wnd=window.open('https://viewer.diagrams.net/?client=1&amp;amp;edit=_blank');}}})(this);" style="cursor:pointer;max-width:100%;max-height:403px;">&lt;defs>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_Preview {color: #888}
#MathJax_Message {position: fixed; left: 1em; bottom: 1.5em; background-color: #E6E6E6; border: 1px solid #959595; margin: 0px; padding: 2px 8px; z-index: 102; color: black; font-size: 80%; width: auto; white-space: nowrap}
#MathJax_MSIE_Frame {position: absolute; top: 0; left: 0; width: 0px; z-index: 101; border: 0px; margin: 0px; padding: 0px}
.MathJax_Error {color: #CC0000; font-style: italic}
&lt;/style>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_Hover_Frame {border-radius: .25em; -webkit-border-radius: .25em; -moz-border-radius: .25em; -khtml-border-radius: .25em; box-shadow: 0px 0px 15px #83A; -webkit-box-shadow: 0px 0px 15px #83A; -moz-box-shadow: 0px 0px 15px #83A; -khtml-box-shadow: 0px 0px 15px #83A; border: 1px solid #A6D ! important; display: inline-block; position: absolute}
.MathJax_Menu_Button .MathJax_Hover_Arrow {position: absolute; cursor: pointer; display: inline-block; border: 2px solid #AAA; border-radius: 4px; -webkit-border-radius: 4px; -moz-border-radius: 4px; -khtml-border-radius: 4px; font-family: &amp;lsquo;Courier New&amp;rsquo;,Courier; font-size: 9px; color: #F0F0F0}
.MathJax_Menu_Button .MathJax_Hover_Arrow span {display: block; background-color: #AAA; border: 1px solid; border-radius: 3px; line-height: 0; padding: 4px}
.MathJax_Hover_Arrow:hover {color: white!important; border: 2px solid #CCC!important}
.MathJax_Hover_Arrow:hover span {background-color: #CCC!important}
&lt;/style>&lt;style xmlns="http://www.w3.org/1999/xhtml" type="text/css">.MathJax_SVG_Display {text-align: center; margin: 1em 0em; position: relative; display: block!important; text-indent: 0; max-width: none; max-height: none; min-width: 0; min-height: 0; width: 100%}
.MathJax_SVG .MJX-monospace {font-family: monospace}
.MathJax_SVG .MJX-sans-serif {font-family: sans-serif}
#MathJax_SVG_Tooltip {background-color: InfoBackground; color: InfoText; border: 1px solid black; box-shadow: 2px 2px 5px #AAAAAA; -webkit-box-shadow: 2px 2px 5px #AAAAAA; -moz-box-shadow: 2px 2px 5px #AAAAAA; -khtml-box-shadow: 2px 2px 5px #AAAAAA; padding: 3px 4px; z-index: 401; position: absolute; left: 0; top: 0; width: auto; height: auto; display: none}
.MathJax_SVG {display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; padding: 0; margin: 0}
.MathJax_SVG * {transition: none; -webkit-transition: none; -moz-transition: none; -ms-transition: none; -o-transition: none}
.MathJax_SVG &amp;gt; div {display: inline-block}
.mjx-svg-href {fill: blue; stroke: blue}
.MathJax_SVG_Processing {visibility: hidden; position: absolute; top: 0; left: 0; width: 0; height: 0; overflow: hidden; display: block!important}
.MathJax_SVG_Processed {display: none!important}
.MathJax_SVG_test {font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-transform: none; letter-spacing: normal; word-spacing: normal; overflow: hidden; height: 1px}
.MathJax_SVG_test.mjx-test-display {display: table!important}
.MathJax_SVG_test.mjx-test-inline {display: inline!important; margin-right: -1px}
.MathJax_SVG_test.mjx-test-default {display: block!important; clear: both}
.MathJax_SVG_ex_box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
.mjx-test-inline .MathJax_SVG_left_box {display: inline-block; width: 0; float: left}
.mjx-test-inline .MathJax_SVG_right_box {display: inline-block; width: 0; float: right}
.mjx-test-display .MathJax_SVG_right_box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
.MathJax_SVG .noError {vertical-align: ; font-size: 90%; text-align: left; color: black; padding: 1px 3px; border: 1px solid}
&lt;/style>&lt;/defs>&lt;g>&lt;path d="M 130 61 L 370 61" fill="none" stroke="#000000" stroke-width="3" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 70 121 L 70 281" fill="none" stroke="#000000" stroke-width="3" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 112.43 103.43 L 387.57 298.57" fill="none" stroke="#000000" stroke-width="3" stroke-miterlimit="10" pointer-events="stroke"/>&lt;ellipse cx="70" cy="61" rx="60" ry="60" fill="#ffffff" stroke="#000000" stroke-width="3" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 61px; margin-left: 11px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-16-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="8.479ex" height="2.21ex" viewBox="0 -743.6 3650.5 951.6" role="img" focusable="false" style="vertical-align: -0.483ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M52 648Q52 670 65 683H76Q118 680 181 680Q299 680 320 683H330Q336 677 336 674T334 656Q329 641 325 637H304Q282 635 274 635Q245 630 242 620Q242 618 271 369T301 118L374 235Q447 352 520 471T595 594Q599 601 599 609Q599 633 555 637Q537 637 537 648Q537 649 539 661Q542 675 545 679T558 
683Q560 683 570 683T604 682T668 681Q737 681 755 683H762Q769 676 769 672Q769 655 760 640Q757 637 743 637Q730 636 719 635T698 630T682 623T670 615T660 608T652 599T645 592L452 282Q272 -9 266 -16Q263 -18 259 -21L241 -22H234Q216 -22 216 -15Q213 -9 177 305Q139 623 138 626Q133 637 76 637H59Q52 642 52 648Z"/>&lt;g transform="translate(583,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;g transform="translate(1315,0)">&lt;path stroke-width="1" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"/>&lt;/g>&lt;g transform="translate(2371,0)">&lt;path stroke-width="1" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"/>&lt;/g>&lt;g transform="translate(3149,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-16">V_1 = +1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="70" y="67" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">V_1 = +1&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 430 121 L 430 281" fill="none" stroke="#000000" stroke-width="3" stroke-miterlimit="10" pointer-events="stroke"/>&lt;path d="M 387.57 103.43 L 112.43 298.57" fill="none" stroke="#000000" stroke-width="3" stroke-miterlimit="10" 
pointer-events="stroke"/>&lt;ellipse cx="430" cy="61" rx="60" ry="60" fill="#ffffff" stroke="#000000" stroke-width="3" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 61px; margin-left: 371px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-4-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="8.479ex" height="2.21ex" viewBox="0 -743.6 3650.5 951.6" role="img" focusable="false" style="vertical-align: -0.483ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M52 648Q52 670 65 683H76Q118 680 181 680Q299 680 320 683H330Q336 677 336 674T334 656Q329 641 325 637H304Q282 635 274 635Q245 630 242 620Q242 618 271 369T301 118L374 235Q447 352 520 471T595 594Q599 601 599 609Q599 633 555 637Q537 637 537 648Q537 649 539 661Q542 675 545 679T558 683Q560 683 570 683T604 682T668 681Q737 681 755 683H762Q769 676 769 672Q769 655 760 640Q757 637 743 637Q730 636 719 635T698 630T682 623T670 615T660 608T652 599T645 592L452 282Q272 -9 266 -16Q263 -18 259 -21L241 -22H234Q216 -22 216 -15Q213 -9 177 305Q139 623 138 626Q133 637 76 637H59Q52 642 52 648Z"/>&lt;g transform="translate(583,-150)">&lt;path stroke-width="1" transform="scale(0.707)" 
d="M109 429Q82 429 66 447T50 491Q50 562 103 614T235 666Q326 666 387 610T449 465Q449 422 429 383T381 315T301 241Q265 210 201 149L142 93L218 92Q375 92 385 97Q392 99 409 186V189H449V186Q448 183 436 95T421 3V0H50V19V31Q50 38 56 46T86 81Q115 113 136 137Q145 147 170 174T204 211T233 244T261 278T284 308T305 340T320 369T333 401T340 431T343 464Q343 527 309 573T212 619Q179 619 154 602T119 569T109 550Q109 549 114 549Q132 549 151 535T170 489Q170 464 154 447T109 429Z"/>&lt;/g>&lt;g transform="translate(1315,0)">&lt;path stroke-width="1" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"/>&lt;/g>&lt;g transform="translate(2371,0)">&lt;path stroke-width="1" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"/>&lt;/g>&lt;g transform="translate(3149,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-4">V_2 = -1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="430" y="67" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">V_2 = -1&lt;/text>&lt;/switch>&lt;/g>&lt;path d="M 130 341 L 370 341" fill="none" stroke="#000000" stroke-width="3" stroke-miterlimit="10" pointer-events="stroke"/>&lt;ellipse cx="70" cy="341" rx="60" ry="60" fill="#ffffff" stroke="#000000" stroke-width="3" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; 
align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 341px; margin-left: 11px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-8-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="8.479ex" height="2.306ex" viewBox="0 -743.6 3650.5 992.8" role="img" focusable="false" style="vertical-align: -0.579ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M52 648Q52 670 65 683H76Q118 680 181 680Q299 680 320 683H330Q336 677 336 674T334 656Q329 641 325 637H304Q282 635 274 635Q245 630 242 620Q242 618 271 369T301 118L374 235Q447 352 520 471T595 594Q599 601 599 609Q599 633 555 637Q537 637 537 648Q537 649 539 661Q542 675 545 679T558 683Q560 683 570 683T604 682T668 681Q737 681 755 683H762Q769 676 769 672Q769 655 760 640Q757 637 743 637Q730 636 719 635T698 630T682 623T670 615T660 608T652 599T645 592L452 282Q272 -9 266 -16Q263 -18 259 -21L241 -22H234Q216 -22 216 -15Q213 -9 177 305Q139 623 138 626Q133 637 76 637H59Q52 642 52 648Z"/>&lt;g transform="translate(583,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M127 463Q100 463 85 480T69 524Q69 579 117 622T233 665Q268 665 277 664Q351 652 390 611T430 522Q430 470 396 421T302 350L299 348Q299 347 308 345T337 336T375 315Q457 262 457 175Q457 96 395 37T238 -22Q158 -22 100 21T42 130Q42 158 60 175T105 193Q133 193 151 175T169 130Q169 119 166 110T159 94T148 82T136 74T126 70T118 67L114 66Q165 21 238 21Q293 21 321 74Q338 107 338 175V195Q338 290 274 322Q259 328 213 329L171 330L168 332Q166 335 166 348Q166 
366 174 366Q202 366 232 371Q266 376 294 413T322 525V533Q322 590 287 612Q265 626 240 626Q208 626 181 615T143 592T132 580H135Q138 579 143 578T153 573T165 566T175 555T183 540T186 520Q186 498 172 481T127 463Z"/>&lt;/g>&lt;g transform="translate(1315,0)">&lt;path stroke-width="1" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"/>&lt;/g>&lt;g transform="translate(2371,0)">&lt;path stroke-width="1" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"/>&lt;/g>&lt;g transform="translate(3149,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-8">V_3 = -1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="70" y="347" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">V_3 = -1&lt;/text>&lt;/switch>&lt;/g>&lt;ellipse cx="430" cy="341" rx="60" ry="60" fill="#ffffff" stroke="#000000" stroke-width="3" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 118px; height: 1px; padding-top: 341px; margin-left: 371px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span 
class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-17-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="8.479ex" height="2.21ex" viewBox="0 -743.6 3650.5 951.6" role="img" focusable="false" style="vertical-align: -0.483ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M52 648Q52 670 65 683H76Q118 680 181 680Q299 680 320 683H330Q336 677 336 674T334 656Q329 641 325 637H304Q282 635 274 635Q245 630 242 620Q242 618 271 369T301 118L374 235Q447 352 520 471T595 594Q599 601 599 609Q599 633 555 637Q537 637 537 648Q537 649 539 661Q542 675 545 679T558 683Q560 683 570 683T604 682T668 681Q737 681 755 683H762Q769 676 769 672Q769 655 760 640Q757 637 743 637Q730 636 719 635T698 630T682 623T670 615T660 608T652 599T645 592L452 282Q272 -9 266 -16Q263 -18 259 -21L241 -22H234Q216 -22 216 -15Q213 -9 177 305Q139 623 138 626Q133 637 76 637H59Q52 642 52 648Z"/>&lt;g transform="translate(583,-150)">&lt;path stroke-width="1" transform="scale(0.707)" d="M462 0Q444 3 333 3Q217 3 199 0H190V46H221Q241 46 248 46T265 48T279 53T286 61Q287 63 287 115V165H28V211L179 442Q332 674 334 675Q336 677 355 677H373L379 671V211H471V165H379V114Q379 73 379 66T385 54Q393 47 442 46H471V0H462ZM293 211V545L74 212L183 211H293Z"/>&lt;/g>&lt;g transform="translate(1315,0)">&lt;path stroke-width="1" d="M56 347Q56 360 70 367H707Q722 359 722 347Q722 336 708 328L390 327H72Q56 332 56 347ZM56 153Q56 168 72 173H708Q722 163 722 153Q722 140 707 133H70Q56 140 56 153Z"/>&lt;/g>&lt;g transform="translate(2371,0)">&lt;path stroke-width="1" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"/>&lt;/g>&lt;g transform="translate(3149,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 
568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-17">V_4 = +1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="430" y="347" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">V_4 = +1&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="210" y="21" width="80" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 78px; height: 1px; padding-top: 41px; margin-left: 211px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-7-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.971ex" height="2.115ex" viewBox="0 -743.6 1279 910.4" role="img" focusable="false" style="vertical-align: -0.387ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"/>&lt;g transform="translate(778,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 
617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-7">-1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="250" y="47" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">-1&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="0" y="161" width="80" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 78px; height: 1px; padding-top: 181px; margin-left: 1px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-10-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.971ex" height="2.115ex" viewBox="0 -743.6 1279 910.4" role="img" focusable="false" style="vertical-align: -0.387ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"/>&lt;g transform="translate(778,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 
660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-10">-1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="40" y="187" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">-1&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="150" y="111" width="80" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 78px; height: 1px; padding-top: 131px; margin-left: 151px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-12-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.971ex" height="2.115ex" viewBox="0 -743.6 1279 910.4" role="img" focusable="false" style="vertical-align: -0.387ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"/>&lt;g transform="translate(778,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 
189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-12">+1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="190" y="137" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">+1&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="280" y="111" width="80" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 78px; height: 1px; padding-top: 131px; margin-left: 281px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-13-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.971ex" height="2.115ex" viewBox="0 -743.6 1279 910.4" role="img" focusable="false" style="vertical-align: -0.387ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M56 237T56 250T70 270H369V420L370 570Q380 583 389 583Q402 583 409 568V270H707Q722 262 722 250T707 230H409V-68Q401 -82 391 -82H389H387Q375 -82 369 -68V230H70Q56 237 56 250Z"/>&lt;g transform="translate(778,0)">&lt;path stroke-width="1" d="M213 
578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-13">+1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="320" y="137" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">+1&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="420" y="161" width="80" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 78px; height: 1px; padding-top: 181px; margin-left: 421px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-14-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.971ex" height="2.115ex" viewBox="0 -743.6 1279 910.4" role="img" focusable="false" style="vertical-align: -0.387ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"/>&lt;g transform="translate(778,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 
604 189 617T245 641T273 663Q275 666 285 666Q294 666 302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-14">-1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="460" y="187" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">-1&lt;/text>&lt;/switch>&lt;/g>&lt;rect x="210" y="341" width="80" height="40" fill="none" stroke="none" pointer-events="all"/>&lt;g transform="translate(-0.5 -0.5)">&lt;switch>&lt;foreignObject style="overflow: visible; text-align: left;" pointer-events="none" width="100%" height="100%" requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility">&lt;div xmlns="http://www.w3.org/1999/xhtml" style="display: flex; align-items: unsafe center; justify-content: unsafe center; width: 78px; height: 1px; padding-top: 361px; margin-left: 211px;">&lt;div style="box-sizing: border-box; font-size: 0; text-align: center; ">&lt;div style="display: inline-block; font-size: 20px; font-family: Helvetica; color: #000000; line-height: 1.2; pointer-events: all; white-space: normal; word-wrap: normal; ">&lt;span class="MathJax_Preview" style="">&lt;/span>&lt;span class="MathJax_SVG" id="MathJax-Element-15-Frame" tabindex="0" style="font-size: 100%; display: inline-block;">&lt;svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="2.971ex" height="2.115ex" viewBox="0 -743.6 1279 910.4" role="img" focusable="false" style="vertical-align: -0.387ex;">&lt;g stroke="currentColor" fill="currentColor" stroke-width="0" transform="matrix(1 0 0 -1 0 0)">&lt;path stroke-width="1" d="M84 237T84 250T98 270H679Q694 262 694 250T679 230H98Q84 237 84 250Z"/>&lt;g transform="translate(778,0)">&lt;path stroke-width="1" d="M213 578L200 573Q186 568 160 563T102 556H83V602H102Q149 604 189 617T245 641T273 663Q275 666 285 666Q294 666 
302 660V361L303 61Q310 54 315 52T339 48T401 46H427V0H416Q395 3 257 3Q121 3 100 0H88V46H114Q136 46 152 46T177 47T193 50T201 52T207 57T213 61V578Z"/>&lt;/g>&lt;/g>&lt;/svg>&lt;/span>&lt;script type="math/tex" id="MathJax-Element-15">-1&lt;/script>&lt;/div>&lt;/div>&lt;/div>&lt;/foreignObject>&lt;text x="250" y="367" fill="#000000" font-family="Helvetica" font-size="20px" text-anchor="middle">-1&lt;/text>&lt;/switch>&lt;/g>&lt;/g>&lt;switch>&lt;g requiredFeatures="http://www.w3.org/TR/SVG11/feature#Extensibility"/>&lt;a transform="translate(0,-5)" xlink:href="https://desk.draw.io/support/solutions/articles/16000042487" target="_blank">&lt;text text-anchor="middle" font-size="10px" x="50%" y="100%">Viewer does not support full SVG 1.1&lt;/text>&lt;/a>&lt;/switch>&lt;/svg>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>State vector (the states of all units):
&lt;/p>
$$
U = (+1, -1, -1, +1)^T
$$
&lt;p>
Weight matrix:&lt;/p>
$$
T=\left(\begin{array}{cccc}
T\_{11} &amp; T\_{12} &amp; T\_{13} &amp; T\_{14} \\\\
T\_{21} &amp; T\_{22} &amp; T\_{23} &amp; T\_{24} \\\\
T\_{31} &amp; T\_{32} &amp; T\_{33} &amp; T\_{34} \\\\
T\_{41} &amp; T\_{42} &amp; T\_{43} &amp; T\_{44}
\end{array}\right)
= \left(\begin{array}{cccc}
0 &amp; -1 &amp; -1 &amp; +1 \\\\
-1 &amp; 0 &amp; +1 &amp; -1 \\\\
-1 &amp; +1 &amp; 0 &amp; -1 \\\\
+1 &amp; -1 &amp; -1 &amp; 0
\end{array}\right)
$$
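&lt;p>The weight matrix above can be derived from the stored pattern $U$ with the Hebbian outer-product rule $T = UU^T - I$ (a minimal NumPy sketch, not part of the original notes):&lt;/p>

```python
import numpy as np

# Stored pattern of unit states
U = np.array([+1, -1, -1, +1])

# Hebbian outer-product rule with a zeroed diagonal:
# T_ij = u_i * u_j for i != j, and T_ii = 0
T = np.outer(U, U) - np.eye(len(U), dtype=int)

print(T)
```

The printed matrix matches the weight matrix $T$ given above entry by entry.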
&lt;h3 id="update-binary-unit">Update Binary Unit&lt;/h3>
$$
u\_i = \operatorname{sign}(\sum\_{j} T\_{ji} u\_j) = \begin{cases}
+1 &amp; \text{if }\sum\_{j} T\_{ji} u\_j \geq 0 \\\\
-1 &amp; \text {otherwise }
\end{cases}
$$
&lt;ol>
&lt;li>Evaluate the sum of the weighted inputs&lt;/li>
&lt;li>Set the state to $+1$ if the sum is greater than or equal to $0$, else to $-1$&lt;/li>
&lt;/ol>
&lt;h3 id="update-procedure">Update Procedure&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Network state is initialized in the beginning&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Update&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Asynchronous&lt;/strong>: Update one unit at a time&lt;/li>
&lt;li>&lt;strong>Synchronous&lt;/strong>: Update all nodes in parallel&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Continue updating until the network state does not change anymore&lt;/p>
&lt;/li>
&lt;/ul>
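&lt;p>The procedure above can be sketched in a few lines of Python (asynchronous variant; the weights are those of the four-unit example, the rest is illustrative):&lt;/p>

```python
import numpy as np

# Weight matrix of the four-unit example (symmetric, T_ii = 0)
T = np.outer([1, -1, -1, 1], [1, -1, -1, 1]) - np.eye(4, dtype=int)

def run_until_stable(u, T):
    u = list(u)
    while True:
        changed = False
        for i in range(len(u)):  # asynchronous: one unit at a time
            # T_ii = 0, so the j = i term contributes nothing
            s = sum(T[j][i] * u[j] for j in range(len(u)))
            new = 1 if s >= 0 else -1
            if new != u[i]:
                u[i] = new
                changed = True
        if not changed:  # state does not change anymore: stable
            return u

# A corrupted copy of the stored pattern settles back into it
print(run_until_stable([1, -1, -1, -1], T))  # [1, -1, -1, 1]
```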
&lt;h4 id="example">Example&lt;/h4>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2017.19.17.png" alt="截屏2020-08-18 17.19.17" style="zoom: 67%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2017.23.26.png" alt="截屏2020-08-18 17.23.26" style="zoom:67%;" />
&lt;blockquote>
$$
u\_4 = \operatorname{sign}(+1 \cdot (-1) + (-1) \cdot 1 + (-1) \cdot 1) = \operatorname{sign}(-3) = -1
$$
&lt;p>So the new state of unit 4 is $-1$&lt;/p>
&lt;/blockquote>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2017.23.29.png" alt="截屏2020-08-18 17.23.29" style="zoom:67%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2017.23.32.png" alt="截屏2020-08-18 17.23.32" style="zoom:67%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2017.23.34.png" alt="截屏2020-08-18 17.23.34" style="zoom:67%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18 17.23.37.png" alt="截屏2020-08-18 17.23.37" style="zoom:67%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2017.23.39.png" alt="截屏2020-08-18 17.23.39" style="zoom:67%;" />
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2017.23.42.png" alt="截屏2020-08-18 17.23.42" style="zoom:67%;" />
&lt;h4 id="order-of-updating">Order of updating&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Updates can be performed sequentially&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Random order (Hopfield networks)&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Same average update rate&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Advantages in implementation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Advantages in function (equiprobable stable states)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Randomized asynchronous&lt;/strong> updating is a closer match to biological neural nets&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="energy-function">Energy function&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Assign a numerical value to each possible state of the system (&lt;strong>Lyapunov Function&lt;/strong>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Corresponds to the “energy” of the net
&lt;/p>
$$
\begin{aligned}
E &amp;= -\frac{1}{2} \sum\_{j} \sum\_{i \neq j} u\_{i} T\_{j i} u\_{j} \\\\
&amp;= -\frac{1}{2}U^T TU
\end{aligned}
$$
&lt;/li>
&lt;/ul>
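&lt;p>As a quick numerical check (illustrative sketch), the stored pattern of the four-unit example sits at lower energy than a corrupted copy of it:&lt;/p>

```python
import numpy as np

U = np.array([1, -1, -1, 1])
T = np.outer(U, U) - np.eye(4, dtype=int)

def energy(u, T):
    # E = -1/2 * u^T T u  (the diagonal of T is zero, so i = j terms vanish)
    return -0.5 * u @ T @ u

print(energy(U, T))                           # -6.0: the stored pattern (a minimum)
print(energy(np.array([1, -1, -1, -1]), T))   # higher energy after flipping one unit
```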
&lt;h4 id="proof-on-convergence">Proof on Convergence&lt;/h4>
&lt;p>&lt;strong>Each updating step leads to lower or same energy in the net.&lt;/strong>&lt;/p>
&lt;p>Suppose only one unit $j$ is updated at a time. The energy contribution of unit $j$ is
&lt;/p>
$$
E\_{j}=-\frac{1}{2} \sum\_{i \neq j} u\_{i}T\_{j i} u\_{j}
$$
&lt;p>
Given a change in state, the difference in Energy $E$ is
&lt;/p>
$$
\begin{aligned}
\Delta E\_{j}&amp;=E\_{j\_{n e w}}-E\_{j\_{o l d}} \\\\
&amp;=-\frac{1}{2} \Delta u\_{j} \sum\_{i \neq j} T\_{j i} u\_{i}
\end{aligned}
$$
$$
\Delta u\_{j}=u\_{j\_{n e w}}-u\_{j\_{o l d}}
$$
&lt;ul>
&lt;li>
&lt;p>Change from $-1$ to $1$:
&lt;/p>
$$
\Delta u\_{j}=2, \quad \sum\_{i \neq j} T\_{j i} u\_{i} \geq 0 \Rightarrow \Delta E\_{j} \leq 0
$$
&lt;/li>
&lt;li>
&lt;p>Change from $1$ to $-1$:
&lt;/p>
$$
\Delta u\_{j}=-2, \quad \sum\_{i \neq j} T\_{j i} u\_{i}&lt;0 \Rightarrow \Delta E\_{j}&lt;0
$$
&lt;/li>
&lt;/ul>
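&lt;p>The claim can also be verified exhaustively for the four-unit example: over all $2^4$ states, updating any single unit never increases the energy (illustrative sketch):&lt;/p>

```python
from itertools import product

# Weight matrix of the four-unit example
T = [[0, -1, -1, 1],
     [-1, 0, 1, -1],
     [-1, 1, 0, -1],
     [1, -1, -1, 0]]

def energy(u):
    # E = -1/2 * sum over ordered pairs (i, j), i != j, of u_i T_ji u_j
    total = sum(u[i] * T[j][i] * u[j]
                for j in range(4) for i in range(4) if i != j)
    return -0.5 * total

# Exhaustive check of the convergence argument
for u in product([-1, 1], repeat=4):
    for j in range(4):
        s = sum(T[i][j] * u[i] for i in range(4))
        new = list(u)
        new[j] = 1 if s >= 0 else -1
        assert energy(u) >= energy(new)  # each update lowers or keeps the energy
print("every single-unit update is non-increasing in energy")
```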
&lt;h4 id="stable-states">Stable States&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>Stable states are minima of the energy function&lt;/p>
&lt;ul>
&lt;li>Can be global or local minima&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Analogous to finding a minimum in a mountainous terrain&lt;/p>
&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2022.36.03.png" alt="截屏2020-08-18 22.36.03" style="zoom: 67%;" />
&lt;h2 id="applications">Applications&lt;/h2>
&lt;h3 id="associative-memory">Associative memory&lt;/h3>
&lt;h3 id="optimization">Optimization&lt;/h3>
&lt;h2 id="limitations">Limitations&lt;/h2>
&lt;h3 id="found-stable-state-memory-is-not-guaranteed-the-most-similar-pattern-to-the-input-pattern">Found stable state (memory) is not guaranteed the most similar pattern to the input pattern&lt;/h3>
&lt;p>Not all memories are remembered with the same emphasis (the attractor regions are not all the same size)&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2022.39.28.png" alt="截屏2020-08-18 22.39.28" style="zoom: 67%;" />
&lt;h3 id="spurious-states">Spurious States&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Retrieval States&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Reversed States&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Mixture States: Any linear combination of an odd number of patterns&lt;/p>
&lt;/li>
&lt;li>
&lt;p>“Spinglass” states: Stable states that are not linear combinations of stored patterns (occur when too many patterns are stored)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="efficiency">Efficiency&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>In a net of $N$ units, patterns of length $N$ can be stored&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Assuming uncorrelated patterns, the capacity $C$ of a Hopfield net is
&lt;/p>
$$
C \approx 0.15N
$$
&lt;ul>
&lt;li>Tighter bound
$$
\frac{N}{4 \ln N}&lt;C&lt;\frac{N}{2 \ln N}
$$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
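&lt;p>For concreteness (a quick sketch), both estimates can be evaluated for a net of $N = 100$ units:&lt;/p>

```python
from math import log

N = 100
print(0.15 * N)          # rule-of-thumb capacity: 15.0 patterns
print(N / (4 * log(N)))  # lower bound, roughly 5.4
print(N / (2 * log(N)))  # upper bound, roughly 10.9
```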
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://towardsdatascience.com/hopfield-networks-are-useless-heres-why-you-should-learn-them-f0930ebeadcd">Hopfield Networks are useless. Here’s why you should learn them.&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=HoWJzeAT9uc">Working with a Hopfield neural network model&lt;/a>&lt;/li>
&lt;/ul></description></item><item><title>Boltzmann Machine</title><link>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/boltzmann-machine/</link><pubDate>Tue, 18 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/boltzmann-machine/</guid><description>&lt;h2 id="boltzmann-machine">&lt;strong>Boltzmann Machine&lt;/strong>&lt;/h2>
&lt;ul>
&lt;li>Stochastic recurrent neural network&lt;/li>
&lt;li>Introduced by Hinton and Sejnowski&lt;/li>
&lt;li>Learn internal representations&lt;/li>
&lt;li>&lt;span style="color:red">Problem: unconstrained connectivity&lt;/span>&lt;/li>
&lt;/ul>
&lt;h3 id="representation">Representation&lt;/h3>
&lt;p>Model can be represented by Graph:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Undirected graph&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Nodes: &lt;a href="#states">States&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="#connections">Edges: Dependencies between states&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2022.51.29.png" alt="截屏2020-08-18 22.51.29" style="zoom:50%;" />
&lt;h3 id="states">States&lt;/h3>
&lt;p>Types:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Visible states&lt;/strong>
&lt;ul>
&lt;li>Represent observed data&lt;/li>
&lt;li>Can be input/output data&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Hidden states&lt;/strong>
&lt;ul>
&lt;li>Latent variable we want to learn&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>&lt;strong>Bias states&lt;/strong>
&lt;ul>
&lt;li>Always one to encode the bias&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>Binary states&lt;/p>
&lt;ul>
&lt;li>unit value $\in \\{0, 1\\}$&lt;/li>
&lt;/ul>
&lt;p>Stochastic&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The decision whether a state is active or not is stochastic&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Depends on the input
&lt;/p>
$$
z\_{i}=b\_{i}+\sum\_{j} s\_{j} w\_{i j}
$$
&lt;ul>
&lt;li>$b\_i$: Bias&lt;/li>
&lt;li>$s\_j$: State $j$&lt;/li>
&lt;li>$w\_{ij}$: Weight between state $j$ and state $i$&lt;/li>
&lt;/ul>
$$
p\left(s\_{i}=1\right)=\frac{1}{1+e^{-z\_{i}}}
$$
&lt;/li>
&lt;/ul>
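&lt;p>The two formulas above translate directly into code (a minimal sketch; the bias, states, and weights are made up for illustration):&lt;/p>

```python
import math
import random

def total_input(b_i, states, weights_i):
    # z_i = b_i + sum_j s_j * w_ij
    return b_i + sum(s * w for s, w in zip(states, weights_i))

def p_active(z_i):
    # p(s_i = 1) = 1 / (1 + e^{-z_i}), the logistic sigmoid
    return 1.0 / (1.0 + math.exp(-z_i))

def sample_state(z_i, rng=random):
    # The unit turns on stochastically with probability p(s_i = 1)
    return 1 if p_active(z_i) > rng.random() else 0

z = total_input(0.0, [1, 0, 1], [2.0, -1.0, 0.5])
print(p_active(z))  # sigmoid(2.5), about 0.92
```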
&lt;h3 id="connections">Connections&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Graph can be fully connected (no restrictions)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Undirected:
&lt;/p>
$$
w\_{ij} = w\_{ji}
$$
&lt;/li>
&lt;li>
&lt;p>No self connections:
&lt;/p>
$$
w\_{ii} = 0
$$
&lt;/li>
&lt;/ul>
&lt;h3 id="energy">Energy&lt;/h3>
&lt;p>Energy of the network
&lt;/p>
$$
\begin{aligned}
E &amp;= -\frac{1}{2}S^TWS - b^TS \\\\
&amp;= -\sum\_{i&lt;j} w\_{i j} s\_{i} s\_{j}-\sum\_{i} b\_{i} s\_{i}
\end{aligned}
$$
&lt;p>
Probability of input vector $v$
&lt;/p>
$$
p(v)= \frac{e^{-E(v)}}{\displaystyle \sum\_{u} e^{-E(u)}}
$$
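&lt;p>For a tiny network, the energy and the Boltzmann probability $p(v)$ can be computed exactly by enumerating all binary states. A sketch (assuming symmetric $W$ with zero diagonal; the $1/2$ factor counts each pair once; only feasible for small networks, since the normalizer sums over all $2^n$ states):&lt;/p>

```python
import numpy as np
from itertools import product

def energy(s, W, b):
    # E = -1/2 s^T W s - b^T s  (W symmetric, w_ii = 0, each pair counted once)
    return -0.5 * s @ W @ s - b @ s

def state_prob(v, W, b):
    # p(v) = exp(-E(v)) / sum_u exp(-E(u)), enumerating all binary states u
    n = len(v)
    Z = sum(np.exp(-energy(np.array(u), W, b)) for u in product([0, 1], repeat=n))
    return np.exp(-energy(np.array(v), W, b)) / Z
```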
&lt;p>
Updating the nodes&lt;/p>
&lt;ul>
&lt;li>
&lt;p>decreases the energy of the network on average&lt;/p>
&lt;/li>
&lt;li>
&lt;p>reaches a local minimum (equilibrium)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The stochastic update process helps avoid local minima
&lt;/p>
$$
\begin{array}{c}
p\left(s\_{i}=1\right)=\frac{1}{1+e^{-z\_{i}}} \\\\
z\_{i}=\Delta E\_{i}=E\_{s\_i=0}-E\_{s\_i=1}
\end{array}
$$
&lt;/li>
&lt;/ul>
&lt;h3 id="simulated-annealing">Simulated Annealing&lt;/h3>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2020-08-18%2023.53.50.png" alt="截屏2020-08-18 23.53.50">&lt;/p>
&lt;p>Use a temperature $T$ to allow for more state changes in the beginning&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Start with high temperature&lt;/p>
&lt;/li>
&lt;li>
&lt;p>“&lt;strong>anneal&lt;/strong>” by slowly lowering T&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Can escape from local minima &amp;#x1f44f;&lt;/p>
&lt;/li>
&lt;/ul>
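&lt;p>A minimal sketch of this idea: with a temperature $T$, the flip probability becomes $\sigma(z\_i / T)$, so high $T$ makes updates nearly random and low $T$ nearly deterministic. The geometric decay schedule below is an illustrative assumption, not from the lecture:&lt;/p>

```python
import numpy as np

def flip_prob(z_i, T):
    # Temperature-scaled activation: sigma(z_i / T)
    return 1.0 / (1.0 + np.exp(-z_i / T))

def geometric_schedule(T0=10.0, decay=0.9, steps=50):
    # "Anneal" by slowly lowering T
    return [T0 * decay ** k for k in range(steps)]
```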
&lt;h3 id="search-problem">Search Problem&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2023.54.58.png" alt="截屏2020-08-18 23.54.58" style="zoom:67%;" />
&lt;ul>
&lt;li>
&lt;p>Input is set and fixed (clamped)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Annealing is done&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Answer is presented at the output&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Hidden units add extra representational power&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h3 id="learning-problem">Learning problem&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Situations&lt;/p>
&lt;ul>
&lt;li>Present data vectors to the network&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Problem&lt;/p>
&lt;ul>
&lt;li>Learn weights that generate these data with high probability&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Approach&lt;/p>
&lt;ul>
&lt;li>Perform small updates on the weights&lt;/li>
&lt;li>Each time, solve the search problem&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="pros--cons">Pros &amp;amp; Cons&lt;/h3>
&lt;p>✅ Pros&lt;/p>
&lt;ul>
&lt;li>Boltzmann machine with enough hidden units can compute any function&lt;/li>
&lt;/ul>
&lt;p>⛔️ Cons&lt;/p>
&lt;ul>
&lt;li>Training is very slow and computationally expensive &amp;#x1f622;&lt;/li>
&lt;/ul>
&lt;h2 id="restricted-boltzmann-machine">&lt;strong>Restricted Boltzmann Machine&lt;/strong>&lt;/h2>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">See also: &lt;a href="https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/rbm/">Restricted Boltzmann Machine&lt;/a>&lt;/span>
&lt;/div>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2020-08-18%2023.58.36.png" alt="截屏2020-08-18 23.58.36" style="zoom:67%;" />
&lt;ul>
&lt;li>
&lt;p>Boltzmann machine with restriction&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Graph must be &lt;strong>bipartite&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Set of visible units&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Set of hidden units&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>✅ Advantage&lt;/p>
&lt;ul>
&lt;li>No connection between hidden units&lt;/li>
&lt;li>Efficient training&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="energy-1">Energy&lt;/h3>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/v2-ede70fdae3090088792aab8607b3c2db_720w.jpg" alt="img" style="zoom:67%;" />
&lt;p>Energy:
&lt;/p>
$$
\begin{aligned}
E(v, h)
&amp;= -a^{\mathrm{T}} v-b^{\mathrm{T}} h-v^{\mathrm{T}} W h \\\\
&amp;= -\sum\_{i} a\_{i} v\_{i}-\sum\_{j} b\_{j} h\_{j}-\sum\_{i} \sum\_{j} v\_{i} w\_{i j} h\_{j}
\end{aligned}
$$
&lt;p>
Probability of hidden unit:
&lt;/p>
$$
p\left(h\_{j}=1 \mid v\right)=\sigma\left(b\_{j}+\sum\_{i=1}^{m} W\_{i j} v\_{i}\right)
$$
&lt;p>
Probability of input vector:
&lt;/p>
$$
p\left(v\_{i}=1 \mid h\right)=\sigma\left(a\_{i}+\sum\_{j=1}^{F} W\_{i j} h\_{j}\right)
$$
&lt;blockquote>
$$
\sigma(x)=\frac{1}{1+e^{-x}}
$$
&lt;/blockquote>
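&lt;p>Because the graph is bipartite, both conditionals factorize over units and can be computed in one matrix operation each. A minimal sketch (with $a$, $b$, $W$ as in the energy formula; binary units; names are illustrative):&lt;/p>

```python
import numpy as np

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, b):
    # p(h_j = 1 | v) = sigma(b_j + sum_i W_ij v_i), for all j at once
    return sigma(b + v @ W)

def p_v_given_h(h, W, a):
    # p(v_i = 1 | h) = sigma(a_i + sum_j W_ij h_j), for all i at once
    return sigma(a + W @ h)
```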
&lt;p>Free Energy:
&lt;/p>
$$
\begin{array}{l}
e^{-F(v)}=\sum\_{h} e^{-E(v, h)} \\\\
F(v)=-\sum\_{i=1}^{m} v\_{i} a\_{i}-\sum\_{j=1}^{F} \log \left(1+e^{z\_{j}}\right) \\\\
z\_{j}=b\_{j}+\sum\_{i=1}^{m} W\_{i j} v\_{i}
\end{array}
$$
$$</description></item><item><title>Restricted Boltzmann Machines (RBMs)</title><link>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/rbm/</link><pubDate>Sun, 16 Aug 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/deep-learning/unsupervised-learning/rbm/</guid><description>&lt;h2 id="definition">Definition&lt;/h2>
&lt;p>Invented by Geoffrey Hinton, a Restricted Boltzmann machine is an algorithm useful for&lt;/p>
&lt;ul>
&lt;li>dimensionality reduction&lt;/li>
&lt;li>classification&lt;/li>
&lt;li>regression&lt;/li>
&lt;li>collaborative filtering&lt;/li>
&lt;li>feature learning&lt;/li>
&lt;li>topic modeling&lt;/li>
&lt;/ul>
&lt;p>Given their relative simplicity and historical importance, restricted Boltzmann machines are the first neural network we’ll tackle.&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-yellow-100 dark:bg-yellow-900">
&lt;span class="pr-3 pt-1 text-red-400">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="M12 9v3.75m-9.303 3.376c-.866 1.5.217 3.374 1.948 3.374h14.71c1.73 0 2.813-1.874 1.948-3.374L13.949 3.378c-.866-1.5-3.032-1.5-3.898 0zM12 15.75h.007v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;em>While RBMs are occasionally used, most practitioners in the machine-learning community have deprecated them in favor of &lt;a href="https://wiki.pathmind.com/generative-adversarial-network-gan">generative adversarial networks or variational autoencoders&lt;/a>. RBMs are the Model T’s of neural networks – interesting for historical reasons, but surpassed by more up-to-date models.&lt;/em>&lt;/span>
&lt;/div>
&lt;h2 id="structure">Structure&lt;/h2>
&lt;p>RBMs are shallow, two-layer neural nets that constitute the building blocks of &lt;em>deep-belief networks&lt;/em>.&lt;/p>
&lt;ul>
&lt;li>The first layer of the RBM is called the &lt;strong>visible&lt;/strong>, or &lt;strong>input&lt;/strong>, layer.&lt;/li>
&lt;li>The second is the &lt;strong>hidden&lt;/strong> layer.&lt;/li>
&lt;/ul>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/two_layer_RBM.png" alt="two_layer_RBM" style="zoom: 70%;" />
&lt;p>Each circle in the graph above represents a neuron-like unit called a &lt;strong>node&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Nodes are simply where calculations take place&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Nodes are connected to each other across layers, but NO two nodes of the SAME layer are linked&lt;/p>
&lt;p>$\to$ NO intra-layer communication (&lt;em>restriction&lt;/em> in a restricted Boltzmann machine)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each node is a locus of computation that processes input, and begins by making &lt;a href="https://wiki.pathmind.com/glossary#stochasticgradientdescent">stochastic&lt;/a> decisions about whether to transmit that input or not&lt;/p>
&lt;blockquote>
&lt;p>&lt;em>Stochastic&lt;/em> means “randomly determined”, and in this case, the coefficients that modify inputs are randomly initialized.&lt;/p>
&lt;/blockquote>
&lt;/li>
&lt;/ul>
&lt;p>Each visible node takes a low-level feature from an item in the dataset to be learned.&lt;/p>
&lt;ul>
&lt;li>E.g., from a dataset of grayscale images, each visible node would receive one pixel-value for each pixel in one image. &lt;em>(MNIST images have 784 pixels, so neural nets processing them must have 784 input nodes on the visible layer.)&lt;/em>&lt;/li>
&lt;/ul>
&lt;h3 id="forward-pass">Forward pass&lt;/h3>
&lt;h4 id="one-input-path">One input path&lt;/h4>
&lt;p>Now let’s follow that single pixel value, &lt;em>x&lt;/em>, through the two-layer net. At node 1 of the hidden layer,&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/input_path_RBM.png" alt="input path RBM" style="zoom:80%;" />
&lt;ul>
&lt;li>x is multiplied by a &lt;em>weight&lt;/em> and added to a so-called &lt;em>bias&lt;/em>&lt;/li>
&lt;li>The result of those two operations is fed into an &lt;em>activation function&lt;/em>, which produces the node’s output&lt;/li>
&lt;/ul>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-fallback" data-lang="fallback">&lt;span class="line">&lt;span class="cl">activation f((weight w * input x) + bias b ) = output a
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h4 id="weighted-inputs-combine">Weighted inputs combine&lt;/h4>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/weighted_input_RBM.png" alt="weighted_input_RBM" style="zoom:80%;" />
&lt;ul>
&lt;li>Each x is multiplied by a separate weight&lt;/li>
&lt;li>The products are summed and added to a bias&lt;/li>
&lt;li>The result is passed through an activation function to produce the node’s output.&lt;/li>
&lt;/ul>
&lt;p>Because inputs from all visible nodes are being passed to all hidden nodes, an RBM can be defined as a &lt;strong>symmetrical bipartite graph&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>Symmetrical: each visible node is connected with each hidden node&lt;/li>
&lt;li>Bipartite: it has two parts, or layers, and the &lt;em>graph&lt;/em> is a mathematical term for a web of nodes&lt;/li>
&lt;/ul>
&lt;h4 id="multiple-inputs">Multiple inputs&lt;/h4>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/multiple_inputs_RBM.png" alt="multiple_inputs_RBM" style="zoom:80%;" />
&lt;ul>
&lt;li>At each hidden node, each input x is multiplied by its respective weight w.
&lt;ul>
&lt;li>12 weights altogether (4 input nodes x 3 hidden nodes)&lt;/li>
&lt;li>The weights between two layers will always form a matrix
&lt;ul>
&lt;li>#rows = #input nodes&lt;/li>
&lt;li>#columns = #output nodes&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>Each hidden node
&lt;ul>
&lt;li>receives the four inputs multiplied by their respective weights&lt;/li>
&lt;li>The sum of those products is again added to a bias (which forces at least some activations to happen)&lt;/li>
&lt;li>The result is passed through the activation algorithm producing one output for each hidden node&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
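&lt;p>The layer computation described above can be sketched directly: 4 input nodes, 3 hidden nodes, so the weight matrix is 4 × 3 (rows = input nodes, columns = output nodes, 12 weights altogether). The sigmoid is an assumed activation; values here are random placeholders:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(4)        # one input vector (4 visible nodes)
W = rng.random((4, 3))   # rows = input nodes, columns = output nodes
b = np.zeros(3)          # one bias per hidden node

hidden = 1.0 / (1.0 + np.exp(-(x @ W + b)))  # one output per hidden node
```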
&lt;h4 id="multiple-hidden-layers">Multiple hidden layers&lt;/h4>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/multiple_hidden_layers_RBM.png" alt="multiple_hidden_layers_RBM" style="zoom:80%;" />
&lt;p>If these two layers were part of a deeper neural network, the outputs of hidden layer no. 1 would be passed as inputs to hidden layer no. 2, and from there through as many hidden layers as you like until they reach a final classifying layer.&lt;/p>
&lt;p>(For simple feed-forward movements, the RBM nodes function as an &lt;em>autoencoder&lt;/em> and nothing more.)&lt;/p>
&lt;h2 id="reconstructions">Reconstructions&lt;/h2>
&lt;p>In this section, we’ll focus on how they learn to &lt;strong>reconstruct data by themselves&lt;/strong> in an unsupervised fashion, making several forward and backward passes between the visible layer and hidden layer no. 1 without involving a deeper network.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/reconstruction_RBM.png" alt="reconstruction_RBM" style="zoom:80%;" />
&lt;ul>
&lt;li>The activations of hidden layer no. 1 become the input in a backward pass.&lt;/li>
&lt;li>They are multiplied by the same weights, one per internode edge, just as x was weight-adjusted on the forward pass.&lt;/li>
&lt;li>The sum of those products is added to a visible-layer bias at each visible node&lt;/li>
&lt;li>The output of those operations is a &lt;strong>reconstruction&lt;/strong>; i.e. an approximation of the original input.&lt;/li>
&lt;/ul>
&lt;p>We can think of reconstruction error as the difference between the values of &lt;code>r&lt;/code> and the input values, and that error is then backpropagated against the RBM’s weights, again and again, in an iterative learning process until an error minimum is reached.&lt;/p>
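&lt;p>The forward and backward passes with shared weights, and the reconstruction error, can be sketched as follows (a toy illustration of the passes described above, not a full training procedure; names are hypothetical):&lt;/p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct(x, W, b_hidden, b_visible):
    h = sigmoid(x @ W + b_hidden)      # forward pass: hidden activations
    r = sigmoid(h @ W.T + b_visible)   # backward pass: same weights, visible biases
    return r

def reconstruction_error(x, W, b_hidden, b_visible):
    r = reconstruct(x, W, b_hidden, b_visible)
    return np.mean((x - r) ** 2)
```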
&lt;h3 id="kullback-leibler-divergence">Kullback Leibler Divergence&lt;/h3>
&lt;p>On its forward pass, an RBM uses inputs to make predictions about node activations, or the &lt;a href="https://en.wikipedia.org/wiki/Bayes'_theorem">probability of output given a weighted x&lt;/a>: &lt;code>p(a|x; w)&lt;/code>.&lt;/p>
&lt;p>On its backward pass, an RBM is attempting to estimate the probability of inputs &lt;code>x&lt;/code> given activations &lt;code>a&lt;/code>, which are weighted with the &lt;em>same&lt;/em> coefficients as those used on the forward pass: &lt;code>p(x|a; w)&lt;/code>&lt;/p>
&lt;p>Together, those two estimates will lead us to the joint probability distribution of inputs &lt;em>x&lt;/em> and activations &lt;em>a&lt;/em>, or &lt;code>p(x, a)&lt;/code>.&lt;/p>
&lt;p>Reconstruction is making guesses about the probability distribution of the original input; i.e. the values of many varied points at once. And this is known as &lt;a href="http://cs229.stanford.edu/notes/cs229-notes2.pdf">generative learning&lt;/a>.&lt;/p>
&lt;p>Imagine that both the input data and the reconstructions are normal curves of different shapes, which only partially overlap. To measure the distance between its estimated probability distribution and the ground-truth distribution of the input, RBMs use &lt;strong>&lt;a href="https://www.quora.com/What-is-a-good-laymans-explanation-for-the-Kullback-Leibler-Divergence">Kullback Leibler Divergence&lt;/a>&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>KL-Divergence measures the non-overlapping, or diverging, areas under the two curves&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/KL_divergence_RBM.png" alt="KL_divergence_RBM" style="zoom:67%;" />
&lt;/li>
&lt;/ul>
&lt;p>An RBM’s optimization algorithm attempts to &lt;em>minimize&lt;/em> those areas so that the shared weights, when multiplied by activations of hidden layer one, produce a close approximation of the original input. By iteratively adjusting the weights according to the error they produce, an RBM learns to approximate the original data.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>The learning process looks like two probability distributions converging, step by step.&lt;/p>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/KLD_update_RBM.png" alt="KLD_update_RBM" style="zoom:67%;" />
&lt;/li>
&lt;/ul>
&lt;h2 id="probabilistic-view">Probabilistic View&lt;/h2>
&lt;p>For example, image datasets have unique probability distributions for their pixel values, depending on the kind of images in the set.&lt;/p>
&lt;p>Assuming an RBM that was only fed images of elephants and dogs, and which had only two output nodes, one for each animal.&lt;/p>
&lt;ul>
&lt;li>The question the RBM is asking itself on the forward pass is: Given these pixels, should my weights send a stronger signal to the elephant node or the dog node?&lt;/li>
&lt;li>The question the RBM asks on the backward pass is: Given an elephant, which distribution of pixels should I expect?&lt;/li>
&lt;/ul>
&lt;p>That’s joint probability: the simultaneous probability of &lt;em>x&lt;/em> given &lt;em>a&lt;/em> and of &lt;em>a&lt;/em> given &lt;em>x&lt;/em>, expressed as the &lt;strong>shared weights&lt;/strong> between the two layers of the RBM.&lt;/p>
&lt;p>The process of learning reconstructions is, in a sense, &lt;strong>learning which groups of pixels tend to co-occur for a given set of images.&lt;/strong> The activations produced by nodes of hidden layers deep in the network represent significant co-occurrences; e.g. “nonlinear gray tube + big, floppy ears + wrinkles” might be one.&lt;/p>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://wiki.pathmind.com/restricted-boltzmann-machine">A Beginner&amp;rsquo;s Guide to Restricted Boltzmann Machines (RBMs)&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="http://deeplearning.net/tutorial/rbm.html">Restricted Boltzmann Machines (RBM)&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://www.quora.com/What-is-a-good-laymans-explanation-for-the-Kullback-Leibler-divergence">What is a good layman&amp;rsquo;s explanation for the Kullback-Leibler divergence?&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul></description></item><item><title>Unsupervised Learning</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/</link><pubDate>Mon, 07 Sep 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/</guid><description/></item><item><title>Gaussian Mixture Model</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/gaussian-mixture-model/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/gaussian-mixture-model/</guid><description>&lt;h2 id="gaussian-distribution">Gaussian Distribution&lt;/h2>
&lt;p>&lt;strong>Univariate&lt;/strong>: The Probability Density Function (PDF) is:
&lt;/p>
$$
P(x | \theta)=\frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp \left(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right)
$$
&lt;ul>
&lt;li>$\mu$: mean&lt;/li>
&lt;li>$\sigma$: standard deviation&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/gaussians.png" alt="gaussian mixture models">&lt;/p>
&lt;p>&lt;strong>Multivariate&lt;/strong>: The Probability Density Function (PDF) is:
&lt;/p>
$$
P(x | \theta)=\frac{1}{(2 \pi)^{\frac{D}{2}}|\Sigma|^{\frac{1}{2}}} \exp \left(-\frac{(x-\mu)^{T} \Sigma^{-1}(x-\mu)}{2}\right)
$$
&lt;ul>
&lt;li>$\mu$: mean&lt;/li>
&lt;li>$\Sigma$: covariance&lt;/li>
&lt;li>$D$: dimension of data&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/gaussians-3d-300x224.png" alt="gaussian mixture models">&lt;/p>
&lt;h3 id="learning">Learning&lt;/h3>
&lt;p>For univariate Gaussian model, we can use Maximum Likelihood Estimation (MLE) to estimate parameter $\theta$ :
&lt;/p>
$$
\theta= \underset{\theta}{\operatorname{argmax}} L(\theta)
$$
&lt;p>
Assuming data are i.i.d, we have:
&lt;/p>
$$
L(\theta)=\prod\_{j=1}^{N} P\left(x\_{j} | \theta\right)
$$
&lt;p>
For numerical stability, we usually use Maximum Log-Likelihood:
&lt;/p>
$$
\begin{align} \theta &amp;= \underset{\theta}{\operatorname{argmax}} L(\theta) \\\\
&amp;= \underset{\theta}{\operatorname{argmax}} \log(L(\theta)) \\\\
&amp;= \underset{\theta}{\operatorname{argmax}} \sum\_{j=1}^{N} \log P\left(x\_{j} | \theta\right)\end{align}
$$
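&lt;p>For the univariate Gaussian, maximizing the log-likelihood above has a closed form: $\mu$ is the sample mean and $\sigma^2$ the (biased) sample variance. A quick sketch:&lt;/p>

```python
import numpy as np

def gaussian_mle(x):
    mu = np.mean(x)
    sigma2 = np.mean((x - mu) ** 2)  # biased MLE variance (divides by N)
    return mu, sigma2

def log_likelihood(x, mu, sigma2):
    # sum_j log P(x_j | theta) for the univariate Gaussian PDF
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2))
```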
&lt;h2 id="gaussian-mixture-model">Gaussian Mixture Model&lt;/h2>
&lt;p>A Gaussian mixture model is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. One can think of mixture models as generalizing k-means clustering to incorporate information about the covariance structure of the data as well as the centers of the latent Gaussians.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/mYN2Q9VqZH-gaussian-mixture-example.png" alt="A Gaussian mixture of three normal distributions.">&lt;/p>
&lt;p>Define:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>$x\_j$: the $j$-th observed data, $j=1, 2,\dots, N$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$K$: number of Gaussian model components&lt;/p>
&lt;/li>
&lt;li>
&lt;p>$\alpha\_k$: probability that the observed data belongs to the $k$-th model component&lt;/p>
&lt;ul>
&lt;li>$\alpha\_k \geq 0$&lt;/li>
&lt;li>$\displaystyle \sum\_{k=1}^{K}\alpha\_k=1$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>$\phi(x|\theta\_k)$: probability density function of the $k$-th model component&lt;/p>
&lt;ul>
&lt;li>$\theta\_k = (\mu\_k, \sigma\_k^2)$&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>$\gamma\_{jk}$: probability that the $j$-th observed data belongs to the $k$-th model component&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Probability density function of Gaussian mixture model:
&lt;/p>
$$
P(x | \theta)=\sum\_{k=1}^{K} \alpha\_{k} \phi\left(x | \theta\_{k}\right)
$$
&lt;p>
For this model, parameter is $\theta=\left(\tilde{\mu}\_{k}, \tilde{\sigma}\_{k}, \tilde{\alpha}\_{k}\right)$.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="expectation-maximum-em">Expectation-Maximum (EM)&lt;/h2>
&lt;blockquote>
&lt;p>&lt;em>Expectation-Maximization (EM) is a statistical algorithm for finding the right model parameters. We typically use EM when the data has missing values, or in other words, when the data is incomplete.&lt;/em>&lt;/p>
&lt;/blockquote>
&lt;p>These missing variables are called &lt;strong>latent variables&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>&lt;em>NEVER&lt;/em> observed&lt;/li>
&lt;li>We do &lt;em>NOT&lt;/em> know the correct values in advance&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Since we do not have the values for the latent variables, Expectation-Maximization tries to use the existing data to determine the optimum values for these variables and then finds the model parameters.&lt;/strong> Based on these model parameters, we go back and update the values for the latent variable, and so on.&lt;/p>
&lt;p>The Expectation-Maximization algorithm has two steps:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>E-step:&lt;/strong> In this step, the available data is used to estimate (guess) the values of the missing variables&lt;/li>
&lt;li>&lt;strong>M-step:&lt;/strong> Based on the estimated values generated in the E-step, the complete data is used to update the parameters&lt;/li>
&lt;/ul>
&lt;h3 id="em-in-gaussian-mixture-model">EM in Gaussian Mixture Model&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Initialize the parameters ($K$ Gaussian distributions with means $\mu\_1, \mu\_2,\dots,\mu\_K$ and covariances $\Sigma\_1, \Sigma\_2, \dots, \Sigma\_K$)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Repeat&lt;/p>
&lt;ul>
&lt;li>&lt;strong>E-step&lt;/strong>: For each point $x\_j$, calculate the probability that it belongs to cluster/distribution $k$&lt;/li>
&lt;/ul>
$$
\begin{align}
\gamma\_{j k} &amp;= \frac{\text{Probability that } x\_j \text{ belongs to cluster } k}{\text{Sum of probabilities that } x\_j \text{ belongs to clusters } 1, 2, \dots, K} \\\\
&amp;= \frac{\alpha\_{k} \phi\left(x\_{j} | \theta\_{k}\right)}{\sum\_{l=1}^{K} \alpha\_{l} \phi\left(x\_{j} | \theta\_{l}\right)}\qquad j=1,2, \ldots, N ; k=1,2, \ldots, K
\end{align}
$$
&lt;p>The value will be high when the point is assigned to the right cluster and lower otherwise&lt;/p>
&lt;ul>
&lt;li>&lt;strong>M-step&lt;/strong>: update parameters&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
$$
\alpha\_k = \frac{\text{Number of points assigned to cluster } k}{\text{Total number of points}} = \frac{\sum\_{j=1}^{N} \gamma\_{j k}}{N} \qquad k=1,2, \ldots, K
$$
$$
\mu\_{k}=\frac{\sum\_{j}^{N}\left(\gamma\_{j k} x\_{j}\right)}{\sum\_{j}^{N} \gamma\_{j k}}\qquad k=1,2, \ldots, K
$$
$$
\Sigma\_{k}=\frac{\sum\_{j}^{N} \gamma\_{j k}\left(x\_{j}-\mu\_{k}\right)\left(x\_{j}-\mu\_{k}\right)^{T}}{\sum\_{j}^{N} \gamma\_{j k}} \qquad k=1,2, \ldots, K
$$
&lt;p>until convergence ($\left\|\theta\_{i+1}-\theta\_{i}\right\|&lt;\varepsilon$)&lt;/p>
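&lt;p>One E/M iteration above can be sketched for a one-dimensional mixture (a minimal illustration with hypothetical names; for practical use, scikit-learn's &lt;code>GaussianMixture&lt;/code> implements this):&lt;/p>

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def em_step(x, alpha, mu, sigma2):
    # E-step: responsibilities gamma[j, k]
    dens = alpha * gaussian_pdf(x[:, None], mu, sigma2)   # shape (N, K)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: closed-form parameter updates
    Nk = gamma.sum(axis=0)
    alpha_new = Nk / len(x)
    mu_new = (gamma * x[:, None]).sum(axis=0) / Nk
    sigma2_new = (gamma * (x[:, None] - mu_new) ** 2).sum(axis=0) / Nk
    return alpha_new, mu_new, sigma2_new
```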
&lt;p>Visualization:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/ek1bu6ogj2-em_clustering_of_old_faithful_data.gif" alt="The EM algorithm updating the parameters of a two-component bivariate Gaussian mixture model.">&lt;/p>
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://zhuanlan.zhihu.com/p/30483076">https://zhuanlan.zhihu.com/p/30483076&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://www.analyticsvidhya.com/blog/2019/10/gaussian-mixture-models-clustering/">https://www.analyticsvidhya.com/blog/2019/10/gaussian-mixture-models-clustering/&lt;/a>&lt;/li>
&lt;li>&lt;a href="http://blog.pluskid.org/?p=39">http://blog.pluskid.org/?p=39&lt;/a> 👍&lt;/li>
&lt;/ul></description></item><item><title>Principle Components Analysis (PCA)</title><link>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/pca/</link><pubDate>Sat, 07 Nov 2020 00:00:00 +0000</pubDate><guid>https://haobin-tan.netlify.app/docs/ai/machine-learning/unsupervised/pca/</guid><description>&lt;h2 id="tldr">TL;DR&lt;/h2>
&lt;p>The usual procedure to calculate the $d$-dimensional principal component analysis consists of the following steps:&lt;/p>
&lt;ol start="0">
&lt;li>
&lt;p>Calculate&lt;/p>
&lt;ul>
&lt;li>
&lt;p>average
&lt;/p>
$$
\bar{m}=\frac{1}{N}\sum\_{i=1}^{N} m\_{i} \in \mathbb{R}^{d}
$$
&lt;/li>
&lt;li>
&lt;p>data matrix
&lt;/p>
$$
\mathbf{M}=\left(m\_{1}-\bar{m}, \ldots, m\_{N}-\bar{m}\right) \in \mathbb{R}^{d \times \mathrm{N}}
$$
&lt;/li>
&lt;li>
&lt;p>scatter matrix (covariance matrix)
&lt;/p>
$$
\mathbf{S}=\mathbf{M M}^{\mathrm{T}} \in \mathbb{R}^{d \times d}
$$
&lt;/li>
&lt;/ul>
&lt;p>of all feature vectors $m\_{1}, \ldots, m\_{N}$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Calculate the normalized ($\\|\cdot\\|=1$) eigenvectors $\mathbf{e}\_1, \dots, \mathbf{e}\_d$ of $\mathbf{S}$ and sort them such that the corresponding eigenvalues $\lambda\_1, \dots, \lambda\_d$ are decreasing, i.e. $\lambda\_1 \geq \lambda\_2 \geq \dots \geq \lambda\_d$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Construct a matrix
&lt;/p>
$$
\mathbf{A}:=\left(e\_{1}, \ldots, e\_{d^{\prime}}\right) \in \mathbb{R}^{d \times d^{\prime}}
$$
&lt;p>
with the first $d^{\prime}$ eigenvectors as its columns&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Transform each feature vector $m\_i$ into a new feature vector
&lt;/p>
$$
\mathrm{m}\_{\mathrm{i}}^{\prime}=\mathrm{A}^{\mathrm{T}}\left(\mathrm{m}\_{\mathrm{i}}-\overline{\mathrm{m}}\right) \quad \text { for } i=1, \ldots, N
$$
&lt;p>
of smaller dimension $d^{\prime}$&lt;/p>
&lt;/li>
&lt;/ol>
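&lt;p>The four steps above can be sketched with NumPy (the columns of &lt;code>points&lt;/code> are the feature vectors $m\_i$; the function name and arguments are illustrative):&lt;/p>

```python
import numpy as np

def pca(points, d_prime):
    # Step 0: average, data matrix, scatter matrix
    m_bar = points.mean(axis=1, keepdims=True)   # shape (d, 1)
    M = points - m_bar                           # d x N data matrix
    S = M @ M.T                                  # d x d scatter matrix
    # Step 1: eigenpairs, sorted by decreasing eigenvalue
    lam, E = np.linalg.eigh(S)                   # eigh: S is symmetric
    order = np.argsort(lam)[::-1]
    # Step 2: matrix A with the first d' eigenvectors as columns
    A = E[:, order[:d_prime]]
    # Step 3: transform each feature vector to dimension d'
    return A.T @ M                               # d' x N
```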
&lt;h2 id="dimensionality-reduction">Dimensionality reduction&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>Goal: represent instances with fewer variables&lt;/p>
&lt;ul>
&lt;li>Try to preserve as much structure in the data as possible&lt;/li>
&lt;li>Discriminative: only structure that affects class separability&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Feature selection&lt;/p>
&lt;ul>
&lt;li>Pick a subset of the original dimensions&lt;/li>
&lt;li>Discriminative: pick good class &amp;ldquo;predictors&amp;rdquo;&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Feature extraction&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Construct a new set of dimensions
&lt;/p>
$$
E\_{i} = f(X\_1, \dots, X\_d)
$$
&lt;ul>
&lt;li>$X\_1, \dots, X\_d$: features&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>(Linear) combinations of original&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h2 id="direction-of-greatest-variance">Direction of greatest variance&lt;/h2>
&lt;ul>
&lt;li>Define a set of principal components
&lt;ul>
&lt;li>1st: direction of the &lt;strong>greatest variability&lt;/strong> in the data (i.e. Data points are spread out as far as possible)&lt;/li>
&lt;li>2nd: &lt;em>perpendicular&lt;/em> to 1st, greatest variability of what&amp;rsquo;s left&lt;/li>
&lt;li>&amp;hellip;and so on until $d$ (original dimensionality)&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>First $m \ll d$ components become $m$ dimensions
&lt;ul>
&lt;li>Change coordinates of every data point to these dimensions&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-06%2023.51.17.png" alt="截屏2021-02-06 23.51.17">&lt;/p>
&lt;div class="flex px-4 py-3 mb-6 rounded-md bg-primary-100 dark:bg-primary-900">
&lt;span class="pr-3 pt-1 text-primary-600 dark:text-primary-300">
&lt;svg height="24" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">&lt;path fill="none" stroke="currentColor" stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5" d="m11.25 11.25l.041-.02a.75.75 0 0 1 1.063.852l-.708 2.836a.75.75 0 0 0 1.063.853l.041-.021M21 12a9 9 0 1 1-18 0a9 9 0 0 1 18 0m-9-3.75h.008v.008H12z"/>&lt;/svg>
&lt;/span>
&lt;span class="dark:text-neutral-300">&lt;p>Q: Why greatest variability?&lt;/p>
&lt;p>A: If you pick the dimension with the highest variance, that will preserve the distances as much as possible&lt;/p>
&lt;/span>
&lt;/div>
&lt;h2 id="how-to-pca">How to PCA?&lt;/h2>
&lt;ol>
&lt;li>
&lt;p>&amp;ldquo;Center&amp;rdquo; the data at zero (subtract mean from each attribute)
&lt;/p>
$$
x\_{i, a} = x\_{i, a} - \mu\_{a}
$$
&lt;/li>
&lt;li>
&lt;p>Compute covariance matrix $\Sigma$&lt;/p>
&lt;blockquote>
&lt;p>The &lt;strong>covariance&lt;/strong> between two attributes is an indication of whether they change together (positive correlation) or in opposite directions (negative correlation).&lt;/p>
&lt;p>For example, $cov(x\_1, x\_2) = 0.8 > 0 \Rightarrow$ When $x\_1$ increases/decreases, $x\_2$ also increases/decreases.&lt;/p>
&lt;/blockquote>
$$
cov(b, a) = \frac{1}{n} \sum\_{i=1}^{n} x\_{ib} x\_{ia}
$$
&lt;/li>
&lt;li>
&lt;p>We want vectors $\mathbf{e}$ that are not rotated (only scaled) by the covariance matrix $\Sigma$:
&lt;/p>
$$
\Sigma \mathbf{e} = \lambda \mathbf{e}
$$
&lt;p>
$\Rightarrow$ $\mathbf{e}$ are eigenvectors of $\Sigma$, and $\lambda$ are corresponding eigenvalues&lt;/p>
&lt;p>&lt;strong>Principle components = eigenvectors with largest eigenvalues&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h3 id="finding-principle-components">Finding principle components&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>Find eigenvalues by solving &lt;a href="https://en.wikipedia.org/wiki/Characteristic_polynomial">Characteristic Polynomial&lt;/a>
&lt;/p>
$$
\operatorname{det}(\Sigma - \lambda \mathbf{I}) = 0
$$
&lt;ul>
&lt;li>$\mathbf{I}$: Identity matrix&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Find $i$-th eigenvector by solving
&lt;/p>
$$
\Sigma \mathbf{e}\_i = \lambda\_i \mathbf{e}\_i
$$
&lt;p>
and we want $\mathbf{e}\_{i}$ to have unit length ($\\|\mathbf{e}\_{i}\\| = 1$)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>The eigenvector with the largest eigenvalue is the first principle component, the eigenvector with the second largest eigenvalue is the second principle component, and so on.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;details>
&lt;summary>Example&lt;/summary>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/截屏2021-02-07%2000.21.08.png" alt="截屏2021-02-07 00.21.08" style="zoom:67%;" />
&lt;/details>
&lt;h3 id="projecting-to-new-dimension">Projecting to new dimension&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>We pick the $m&lt;d$ eigenvectors $\mathbf{e}\_1, \dots, \mathbf{e}\_m$ with the largest eigenvalues. Now $\mathbf{e}\_1, \dots, \mathbf{e}\_m$ are the new dimension vectors&lt;/p>
&lt;/li>
&lt;li>
&lt;p>For instance $\mathbf{x} = \{x\_1, \dots, x\_d\}$ (original coordinates), we want new coordinates $\mathbf{x}^{\prime} = \{x^{\prime}\_1, \dots, x^{\prime}\_m\}$&lt;/p>
&lt;ul>
&lt;li>&amp;ldquo;Center&amp;rdquo; the instance (subtract the mean): $\mathbf{x} - \mathbf{\mu}$&lt;/li>
&lt;li>&amp;ldquo;Project&amp;rdquo; to each dimension: $(\mathbf{x} - \mathbf{\mu})^T \mathbf{e}\_j$ for $j=1, \dots, m$&lt;/li>
&lt;/ul>
&lt;details>
&lt;summary>Example&lt;/summary>
&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/PCA.png" alt="PCA" style="zoom:80%;" />
&lt;/details>
&lt;/li>
&lt;/ul>
&lt;h2 id="go-deeper-in-details">Go deeper in details&lt;/h2>
&lt;h3 id="why-eigenvectors--greatest-variance">Why eigenvectors = greatest variance?&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/cIE2MDxyf80?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="why-eigenvalue--variance-along-eigenvector">Why eigenvalue = variance along eigenvector?&lt;/h3>
&lt;div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
&lt;iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="allowfullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/tL0wFZ9aJP8?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"
>&lt;/iframe>
&lt;/div>
&lt;h3 id="how-many-dimensions-should-we-reduce-to">How many dimensions should we reduce to?&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Now we have eigenvectors $\mathbf{e}\_1, \dots, \mathbf{e}\_d$ and we want a new dimensionality $m \ll d$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>We pick $\mathbf{e}\_i$ that &amp;ldquo;explain&amp;rdquo; the most variance:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Sort eigenvectors s.t. $\lambda\_1 \geq \dots \geq \lambda\_d$&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Pick the first $m$ eigenvectors which explain 90% of the total variance (typical threshold values: 0.9 or 0.95)&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2013.06.46.png" alt="截屏2021-02-07 13.06.46">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Or we can use a scree plot&lt;/p>
&lt;/li>
&lt;/ul>
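&lt;p>One way to sketch this selection rule, assuming a made-up set of already-sorted eigenvalues and a 0.9 threshold:&lt;/p>

```python
import numpy as np

# Hypothetical eigenvalues, already sorted in decreasing order
eigenvalues = np.array([4.0, 2.5, 1.0, 0.3, 0.2])

# Fraction of total variance explained by the first m eigenvectors
explained = np.cumsum(eigenvalues) / eigenvalues.sum()

# Smallest m whose cumulative explained variance reaches the threshold
m = int(np.searchsorted(explained, 0.90)) + 1  # here m == 3
```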
&lt;h2 id="pca-in-a-nutshell">PCA in a nutshell&lt;/h2>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2013.09.32.png" alt="截屏2021-02-07 13.09.32">&lt;/p>
&lt;h2 id="pca-example-eigenfaces">PCA example: Eigenfaces&lt;/h2>
&lt;p>Perform PCA on bitmap images of human faces:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.22.02.png" alt="截屏2021-02-07 16.22.02">&lt;/p>
&lt;p>Below are the eigenvectors after performing PCA on the dataset:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.25.01.png" alt="截屏2021-02-07 16.25.01">&lt;/p>
&lt;p>Then we can project a new face onto the space of eigenfaces, and represent the vector of the new face as a linear combination of principal components.&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.24.28.png" alt="截屏2021-02-07 16.24.28">&lt;/p>
&lt;p>As we use more and more eigenvectors in this decomposition, we end up with a face that looks more and more like the original guy:&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.33.28.png" alt="截屏2021-02-07 16.33.28">&lt;/p>
&lt;details>
&lt;summary>Why is eigenface neat and interesting?&lt;/summary>
&lt;ul>
&lt;li>This is neat because by taking just the first few eigenvectors you can get a pretty close representation of the face. Suppose this corresponds to maybe 20 eigenvectors. &lt;strong>This means you&amp;rsquo;re using only 20 numbers to represent a face bitmap that looks kind of like the original guy!&lt;/strong> Could you use only 20 pixels to represent him nearly as well? No, there&amp;rsquo;s no way!&lt;/li>
&lt;li>You&amp;rsquo;re effectively picking 20 numbers/mixture coefficients/coordinates. One really nice use of this is &lt;strong>massive compression&lt;/strong> of the data. If everyone has access to the same eigenvectors, all they need to send to each other are the projection coordinates, and then they can transmit arbitrary faces between them. This is a massive reduction in the size of the data.&lt;/li>
&lt;li>Your classifier or regression system now operates in a low-dimensional space, so it has plenty of redundancy to grab onto and can learn a better hyperplane. &amp;#x1f44f;&lt;/li>
&lt;/ul>
&lt;/details>
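&lt;p>The compression idea can be sketched as follows; random vectors stand in for real face bitmaps here, so only the mechanics (20 coefficients per face, reconstruction from those coefficients) carry over:&lt;/p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for 100 flattened face bitmaps of 64 pixels each
faces = rng.normal(size=(100, 64))
mu = faces.mean(axis=0)

# Eigenvectors of the covariance matrix play the role of eigenfaces
eigenvalues, E = np.linalg.eigh(np.cov(faces, rowvar=False))
E = E[:, np.argsort(eigenvalues)[::-1]]

k = 20
coeffs = (faces[0] - mu) @ E[:, :k]  # 20 numbers describe one face
approx = mu + E[:, :k] @ coeffs      # reconstruction from those 20 numbers
```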
&lt;h3 id="application-of-eigenface">Application of eigenface&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>Face similarity&lt;/p>
&lt;ul>
&lt;li>in the reduced space&lt;/li>
&lt;li>insensitive to lighting, expression, orientation&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>Projecting new &amp;ldquo;faces&amp;rdquo;&lt;/p>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2016.49.58.png" alt="截屏2021-02-07 16.49.58">&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="pratical-issues-of-pca">Pratical issues of PCA&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>PCA is based on the covariance matrix, and covariance is extremely sensitive to large values&lt;/p>
&lt;ul>
&lt;li>
&lt;p>E.g. multiply some dimension by 1000. Then this dimension dominates the covariance and becomes a principal component.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Solution: normalize each dimension to zero mean and unit variance
&lt;/p>
$$
x^{\prime} = \frac{x - \text{mean}}{\text{standard deviation}}
$$
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>
&lt;p>PCA assumes the underlying subspace is linear.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>PCA can sometimes hurt the performance of classification&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Because PCA doesn&amp;rsquo;t see the labels&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Solution: &lt;a href="https://haobin-tan.netlify.app/docs/ai/machine-learning/non-parametric/lda-summary/">Linear Discriminant Analysis (LDA)&lt;/a>&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Picks a new dimension that gives&lt;/p>
&lt;ul>
&lt;li>maximum separation between means of projected classes&lt;/li>
&lt;li>minimum variance within each projected class&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://raw.githubusercontent.com/EckoTan0804/upic-repo/master/uPic/%E6%88%AA%E5%B1%8F2021-02-07%2017.23.36.png" alt="截屏2021-02-07 17.23.36">&lt;/p>
&lt;/li>
&lt;li>
&lt;p>But this relies on some assumptions of the data and does not always work. 🤪&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
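&lt;p>The normalization step from the first point above can be sketched as (the data values are made up; note the second dimension has a much larger scale):&lt;/p>

```python
import numpy as np

# Hypothetical data where the second dimension would dominate the covariance
X = np.array([[1.0, 2000.0], [2.0, 1000.0], [3.0, 4000.0]])

# Normalize each dimension to zero mean and unit variance before PCA
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```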
&lt;h2 id="reference">Reference&lt;/h2>
&lt;ul>
&lt;li>&lt;a href="https://www.youtube.com/watch?v=IbE0tbjy6JQ&amp;amp;list=PLBv09BD7ez_5_yapAg86Od6JeeypkS4YM&amp;amp;index=1">Principal Component Analysis&lt;/a>: a great series of video tutorials explaining PCA clearly 👍&lt;/li>
&lt;/ul></description></item></channel></rss>