Boltzmann Machine
- Stochastic recurrent neural network
- Introduced by Hinton and Sejnowski
- Learns internal representations
- Problem: unconstrained connectivity

Representation
- Model can be represented by an undirected graph
  - Nodes: states
  - Edges: dependencies between states
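A minimal sketch of the energy function and one stochastic update sweep. The weight matrix, biases, and 0/1 unit states are illustrative assumptions (not from the notes); the symmetric matrix with zero diagonal encodes the undirected graph, and the fully dense `W` reflects the unconstrained connectivity mentioned above.

```python
import numpy as np

# Energy of a joint state s: E(s) = -1/2 * s^T W s - b^T s
rng = np.random.default_rng(0)
n = 4
W = rng.normal(size=(n, n))
W = (W + W.T) / 2            # undirected graph -> symmetric weights
np.fill_diagonal(W, 0.0)     # no self-connections
b = rng.normal(size=n)

def energy(s, W, b):
    return -0.5 * s @ W @ s - b @ s

def gibbs_sweep(s, W, b, rng):
    """Stochastic update: each unit turns on with sigmoid probability."""
    for i in range(len(s)):
        activation = W[i] @ s + b[i]
        p_on = 1.0 / (1.0 + np.exp(-activation))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

s = rng.integers(0, 2, size=n).astype(float)
s = gibbs_sweep(s, W, b, rng)
```

The sigmoid update is what makes the network stochastic rather than deterministic: a unit's new state is sampled, not thresholded.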
2020-08-18
Binary Hopfield Nets
Basic Structure: Binary Unit
- Single layer of processing units
- Each unit $i$ has an activity value or "state" $u\_i$
- Binary: $-1$ or $1$, denoted as $+$ and $-$ respectively
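A small sketch of such binary $\pm 1$ units in action. The Hebbian weight rule and the sign-threshold update are standard Hopfield-net machinery but are not spelled out in the note above, so treat the details as assumed.

```python
import numpy as np

# Store one pattern of binary states u_i in {-1, +1} and recall it.
pattern = np.array([1, -1, 1, -1], dtype=float)
W = np.outer(pattern, pattern)     # Hebbian weights w_ij = u_i * u_j
np.fill_diagonal(W, 0.0)           # no self-connections

def update(u, W):
    # Each unit takes the sign of its weighted input: u_i <- sign(sum_j w_ij u_j)
    return np.where(W @ u >= 0, 1.0, -1.0)

noisy = pattern.copy()
noisy[0] = -noisy[0]               # corrupt one unit
recovered = update(noisy, W)       # one update restores the stored pattern
```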
2020-08-18
Generalization
- Ability to apply what was learned during training to new (test) data

Reasons for bad generalization
- Overfitting/overtraining (trained too long)
- Too little training material
- Too many parameters (weights) or inappropriate network architecture

Preventing Overfitting
- The obviously best approach: collect more data!
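Since "trained too long" is listed as a cause of overfitting, a common remedy is to stop training when validation loss stops improving. This skeleton is an assumption-laden illustration: the loss trace is made up, and `patience` is a hypothetical hyperparameter.

```python
# Early-stopping skeleton: stop once validation loss has not improved
# for `patience` consecutive epochs.
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch at which training would stop."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch          # no improvement for `patience` epochs
    return len(val_losses) - 1

# Made-up validation-loss trace: improves, then starts overfitting.
stop = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 1.1])
```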
2020-08-17
Multi-Layer Perceptron (MLP)
- Input layer $I \in R^{D\_{I} \times N}$
  - How we initially represent the features
  - Mini-batch processing with $N$ inputs
- Weight matrices
  - Input to hidden: $W\_{H} \in R^{D\_{I} \times D\_{H}}$
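With those shapes, the hidden activations come from $W\_{H}^{T} I$, applied column-wise over the mini-batch. The concrete dimensions and the tanh non-linearity below are illustrative choices, not taken from the notes.

```python
import numpy as np

# Forward pass for one layer with the shapes defined above.
D_I, D_H, N = 3, 5, 2
rng = np.random.default_rng(0)
I = rng.normal(size=(D_I, N))       # input mini-batch, I in R^{D_I x N}
W_H = rng.normal(size=(D_I, D_H))   # input-to-hidden weights

H = np.tanh(W_H.T @ I)              # hidden activations, H in R^{D_H x N}
```

Each of the $N$ columns of `H` is the hidden representation of one input in the mini-batch.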
2020-08-17
Loss Functions
- Quantify what it means to have a "good" model
- Different types of loss functions for different tasks, such as:
  - Classification
  - Regression
  - Metric learning
  - Reinforcement learning

Classification
- Classification: predicting a discrete class label
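For the classification case, a standard choice (an assumption here, since the note names no specific loss) is cross-entropy: the negative log-probability the model assigns to the true class.

```python
import numpy as np

# Cross-entropy for a single example: -log P(true class).
def cross_entropy(probs, label):
    return -np.log(probs[label])

probs = np.array([0.7, 0.2, 0.1])    # model output after softmax
loss_good = cross_entropy(probs, 0)  # correct and confident -> small loss
loss_bad = cross_entropy(probs, 2)   # true class got little mass -> large loss
```

A low loss thus directly formalises "good model": the model put high probability on the right discrete class label.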
2020-08-17
Activation functions should be:
- non-linear
- differentiable (since training uses backpropagation)

Q: Why can't the mapping between layers be linear?
A: A composition of linear functions is still linear, so the whole network collapses to a single linear regression.
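The collapse argument can be checked numerically: two stacked linear layers equal one linear layer with the product of the weight matrices. The matrix sizes are arbitrary illustrative choices.

```python
import numpy as np

# Without a non-linearity, W2 @ (W1 @ x) == (W2 @ W1) @ x:
# the two-layer "network" is just one linear map.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first linear layer
W2 = rng.normal(size=(2, 4))   # second linear layer
x = rng.normal(size=3)

deep = W2 @ (W1 @ x)           # pass through both layers
collapsed = (W2 @ W1) @ x      # single equivalent layer
```

Inserting a non-linearity such as `np.tanh` between the two matrix products is exactly what breaks this equivalence and gives depth its expressive power.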
2020-08-17
Core Idea
The main assumption in sequence-modelling networks such as RNNs, LSTMs and GRUs is that the current state holds information about the entire input seen so far. Hence the final state of an RNN, after reading the whole input sequence, should contain complete information about that sequence.
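A minimal recurrence makes this concrete: because each state is computed from the previous state and the current input, the final state depends on every input. The weights, dimensions, and tanh update below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(size=(3, 3)) * 0.5   # state-to-state weights
W_x = rng.normal(size=(3, 2)) * 0.5   # input-to-state weights

def encode(xs):
    h = np.zeros(3)
    for x in xs:                        # read the sequence step by step
        h = np.tanh(W_h @ h + W_x @ x)  # h depends on everything seen so far
    return h                            # final state = summary of the sequence

seq = rng.normal(size=(4, 2))           # a sequence of 4 two-dim inputs
summary = encode(seq)
```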
2020-08-16
Language Modeling
A language model is a model that computes the probability of a word sequence:
$$ \begin{aligned} P(W) &= P(W\_{1} W\_{2} \dots W\_{n}) \\\\ &= P\left(W\_{1}\right) P\left(W\_{2} \mid W\_{1}\right) P\left(W\_{3} \mid W\_{1} W\_{2}\right) \ldots P\left(W\_{n} \mid W\_{1 \dots n-1}\right) \end{aligned} $$

Softmax layer
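The chain-rule factorisation above can be sketched directly. The conditional probabilities here are made up for illustration; in a neural language model each one would come from a softmax layer over the vocabulary.

```python
# P(the cat sat) = P(the) * P(cat | the) * P(sat | the cat)
cond_probs = {
    ("the", ""): 0.2,           # P(the)
    ("cat", "the"): 0.1,        # P(cat | the)
    ("sat", "the cat"): 0.3,    # P(sat | the cat)
}

p = 1.0
history = []
for word in ["the", "cat", "sat"]:
    p *= cond_probs[(word, " ".join(history))]  # multiply in P(W_i | W_1..i-1)
    history.append(word)
```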
2020-08-16
For a detailed explanation and summary see: RNN Summary

Overview
- Specifically designed for long-range dependencies
- Main idea: connecting the hidden states together within a layer

Simple RNNs: Elman Networks
- The output of the hidden layer is used as input for the next time step
- They use a copy mechanism.
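The Elman copy mechanism can be sketched as follows: the hidden layer's output is copied into context units, which are fed back in as extra input at the next time step. The weight shapes and inputs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 2))    # input-to-hidden weights
W_ctx = rng.normal(size=(3, 3))   # context-to-hidden weights

context = np.zeros(3)             # context units start empty
for x in rng.normal(size=(5, 2)): # five time steps of two-dim input
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    context = hidden.copy()       # copy mechanism: store the hidden output
```

The explicit `copy()` is the defining trait: the previous hidden state is not recomputed, just stored and reused as input on the next step.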
2020-08-16