Vectorization

Consider a single layer in an MLP:

(Figure: a single unit $j$: inputs $x\_0, x\_1, \dots, x\_n$ weighted by $w\_{0j}, w\_{1j}, \dots, w\_{nj}$ are summed into $y\_j$, then passed through $\sigma$ to give $z\_j$.)

$$ \begin{aligned} y\_j &= \sum\_{i=0}^{n} w\_{ij}x\_i \\\\ z\_j &= \sigma(y\_j) = \frac{1}{1 + e^{-y\_j}} \end{aligned} $$

Naive implementation:
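A minimal sketch of the naive double loop next to its vectorized equivalent in NumPy (the function names `naive_layer` and `vectorized_layer` are illustrative, not from the original):

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def naive_layer(W, x):
    """Compute z_j = sigmoid(sum_i w_ij * x_i) with explicit loops."""
    n_in, n_out = W.shape
    z = np.zeros(n_out)
    for j in range(n_out):
        y_j = 0.0
        for i in range(n_in):
            y_j += W[i, j] * x[i]
        z[j] = sigmoid(y_j)
    return z

def vectorized_layer(W, x):
    """Same computation as a single matrix-vector product."""
    return sigmoid(W.T @ x)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 inputs, 3 units
x = rng.normal(size=4)
assert np.allclose(naive_layer(W, x), vectorized_layer(W, x))
```

The vectorized form also extends to a whole batch at once: for `X` of shape `(batch, n_in)`, `sigmoid(X @ W)` computes every unit for every example in one matrix product.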
2020-08-16
Motivation

Ensure shift-invariance: the model should produce the same output regardless of the position of the object under consideration.

Overview
- Multilayer Neural Network: Nonlinear Classifier
- Consider Context (Receptive Field)
- Shift-Invariant Learning
Definition

Invented by Geoffrey Hinton, a restricted Boltzmann machine is an algorithm useful for:
- dimensionality reduction
- classification
- regression
- collaborative filtering
- feature learning
- topic modeling

Given their relative simplicity and historical importance, restricted Boltzmann machines are the first neural network we’ll tackle.
Supervised vs. Unsupervised Learning

Supervised learning
- Given data $(X, Y)$
- Estimate the posterior $P(Y|X)$

Unsupervised learning
- Concerned with the (unseen) structure of the data
- Tries to estimate (implicitly or explicitly) the data distribution $P(X)$

Auto-Encoder structure: in supervised learning, the hidden layers encapsulate the features useful for classification.
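As an illustration of learning structure from $X$ alone, here is a minimal sketch of a tied-weight linear auto-encoder trained by gradient descent on reconstruction error (all names, sizes, and hyperparameters are illustrative assumptions, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))         # unlabeled data: only X, no Y

# Tied-weight linear auto-encoder: encode 8 dims -> 3, decode back to 8.
W = rng.normal(size=(8, 3)) * 0.1

def reconstruction_loss(X, W):
    X_hat = (X @ W) @ W.T             # encode, then decode
    return np.mean((X_hat - X) ** 2)

loss_before = reconstruction_loss(X, W)
lr = 0.01
for _ in range(500):
    H = X @ W                         # hidden code: compressed view of X's structure
    E = H @ W.T - X                   # reconstruction error
    # Gradient of the squared reconstruction error w.r.t. the tied weights
    grad = 2 * (X.T @ E @ W + E.T @ X @ W) / len(X)
    W -= lr * grad
loss_after = reconstruction_loss(X, W)
```

No labels are used anywhere: the hidden code is shaped purely by how well it lets the decoder reproduce $X$, which is one concrete sense in which the hidden layer captures the data's structure.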
TL;DR

Problem: During training, updating a lower layer changes the input distribution for the next layer → the next layer constantly needs to adapt to changing inputs.

💡 Idea: insert a mean/variance normalization step between layers
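A minimal sketch of that per-feature normalization step at training time (the learnable scale `gamma` and shift `beta` follow the standard batch-norm formulation; inference-time running statistics are omitted here):

```python
import numpy as np

def batch_norm(X, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale and shift.

    X: (batch, features). gamma and beta are learned per-feature parameters.
    """
    mu = X.mean(axis=0)                       # per-feature batch mean
    var = X.var(axis=0)                       # per-feature batch variance
    X_hat = (X - mu) / np.sqrt(var + eps)     # zero mean, unit variance
    return gamma * X_hat + beta

rng = np.random.default_rng(0)
X = 5.0 + 3.0 * rng.normal(size=(32, 4))      # shifted, scaled inputs
out = batch_norm(X, gamma=np.ones(4), beta=np.zeros(4))
```

Whatever distribution the previous layer produces, the next layer now sees inputs with (approximately) zero mean and unit variance, while `gamma` and `beta` let the network undo the normalization if that is what training prefers.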
Motivation

Overfitting happens when there are too few examples to train on, resulting in a model with poor generalization performance 😢. If we had infinite training data, we wouldn’t overfit, because we would have seen every possible instance.
Model Overfitting

To give neural nets more “capacity” to capture different features, we give them a lot of neurons. But this can cause overfitting.

Reason: Co-adaptation
- Neurons become dependent on others.
- Intuition: neuron $H\_i$ captures a particular feature $X$ which, however, is very frequently seen together with certain other inputs.
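One standard remedy for co-adaptation is to randomly zero out units during training so no neuron can rely on a fixed set of co-activated partners. A minimal sketch of inverted dropout (function name and drop rate are illustrative):

```python
import numpy as np

def dropout(h, p_drop=0.5, rng=None, train=True):
    """Randomly zero units at train time; identity at eval time."""
    if not train:
        return h
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(h.shape) >= p_drop      # keep each unit with prob 1 - p_drop
    # Divide by the keep probability so the expected activation is unchanged
    return h * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones(10000)
out = dropout(h, p_drop=0.5, rng=rng)         # mean stays near 1.0 in expectation
```

Because each forward pass sees a different random subset of neurons, a unit like $H\_i$ cannot depend on specific other units always being present, which discourages co-adaptation.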