Boltzmann Machine
- Stochastic recurrent neural network
- Introduced by Hinton and Sejnowski
- Learns internal representations
- Problem: unconstrained connectivity

Representation
- Model can be represented by an undirected graph
  - Nodes: states
  - Edges: dependencies between states
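A minimal sketch of the energy function and one stochastic update sweep. The weight matrix, biases, and 0/1 unit states are illustrative assumptions (not from the notes); the symmetric matrix with zero diagonal encodes the undirected graph, and the fully dense `W` reflects the unconstrained connectivity mentioned above.

```python
import numpy as np

# Energy of a joint state s: E(s) = -1/2 * s^T W s - b^T s
rng = np.random.default_rng(0)
n = 4
W = rng.normal(size=(n, n))
W = (W + W.T) / 2            # undirected graph -> symmetric weights
np.fill_diagonal(W, 0.0)     # no self-connections
b = rng.normal(size=n)

def energy(s, W, b):
    return -0.5 * s @ W @ s - b @ s

def gibbs_sweep(s, W, b, rng):
    """Stochastic update: each unit turns on with sigmoid probability."""
    for i in range(len(s)):
        activation = W[i] @ s + b[i]
        p_on = 1.0 / (1.0 + np.exp(-activation))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

s = rng.integers(0, 2, size=n).astype(float)
s = gibbs_sweep(s, W, b, rng)
```

The sigmoid update is what makes the network stochastic rather than deterministic: a unit's new state is sampled, not thresholded.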
2020-08-18
Binary Hopfield Nets
Basic Structure: Binary Unit
- Single layer of processing units
- Each unit $i$ has an activity value or "state" $u\_i$
- Binary: $-1$ or $1$, denoted as $+$ and $-$ respectively
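A small sketch of such binary $\pm 1$ units in action. The Hebbian weight rule and the sign-threshold update are standard Hopfield-net machinery but are not spelled out in the note above, so treat the details as assumed.

```python
import numpy as np

# Store one pattern of binary states u_i in {-1, +1} and recall it.
pattern = np.array([1, -1, 1, -1], dtype=float)
W = np.outer(pattern, pattern)     # Hebbian weights w_ij = u_i * u_j
np.fill_diagonal(W, 0.0)           # no self-connections

def update(u, W):
    # Each unit takes the sign of its weighted input: u_i <- sign(sum_j w_ij u_j)
    return np.where(W @ u >= 0, 1.0, -1.0)

noisy = pattern.copy()
noisy[0] = -noisy[0]               # corrupt one unit
recovered = update(noisy, W)       # one update restores the stored pattern
```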
2020-08-18
Generalization
- Ability to apply what was learned during training to new (test) data

Reasons for bad generalization
- Overfitting/overtraining (trained too long)
- Too little training material
- Too many parameters (weights) or inappropriate network architecture

Preventing Overfitting
- The obviously best approach: collect more data!
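Since "trained too long" is listed as a cause of overfitting, a common remedy is to stop training when validation loss stops improving. This skeleton is an assumption-laden illustration: the loss trace is made up, and `patience` is a hypothetical hyperparameter.

```python
# Early-stopping skeleton: stop once validation loss has not improved
# for `patience` consecutive epochs.
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch at which training would stop."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch          # no improvement for `patience` epochs
    return len(val_losses) - 1

# Made-up validation-loss trace: improves, then starts overfitting.
stop = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 1.1])
```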
2020-08-17
Multi-Layer Perceptron (MLP)
- Input layer $I \in R^{D\_{I} \times N}$
  - How we initially represent the features
  - Mini-batch processing with $N$ inputs
- Weight matrices
  - Input to hidden: $W\_{H} \in R^{D\_{I} \times D\_{H}}$
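With those shapes, the hidden activations come from $W\_{H}^{T} I$, applied column-wise over the mini-batch. The concrete dimensions and the tanh non-linearity below are illustrative choices, not taken from the notes.

```python
import numpy as np

# Forward pass for one layer with the shapes defined above.
D_I, D_H, N = 3, 5, 2
rng = np.random.default_rng(0)
I = rng.normal(size=(D_I, N))       # input mini-batch, I in R^{D_I x N}
W_H = rng.normal(size=(D_I, D_H))   # input-to-hidden weights

H = np.tanh(W_H.T @ I)              # hidden activations, H in R^{D_H x N}
```

Each of the $N$ columns of `H` is the hidden representation of one input in the mini-batch.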
2020-08-17
Loss Functions
- Quantify what it means to have a "good" model
- Different types of loss functions for different tasks, such as:
  - Classification
  - Regression
  - Metric learning
  - Reinforcement learning

Classification
- Classification: predicting a discrete class label
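For the classification case, a standard choice (an assumption here, since the note names no specific loss) is cross-entropy: the negative log-probability the model assigns to the true class.

```python
import numpy as np

# Cross-entropy for a single example: -log P(true class).
def cross_entropy(probs, label):
    return -np.log(probs[label])

probs = np.array([0.7, 0.2, 0.1])    # model output after softmax
loss_good = cross_entropy(probs, 0)  # correct and confident -> small loss
loss_bad = cross_entropy(probs, 2)   # true class got little mass -> large loss
```

A low loss thus directly formalises "good model": the model put high probability on the right discrete class label.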
2020-08-17
Activation functions should be:
- non-linear
- differentiable (since training uses backpropagation)

Q: Why can't the mapping between layers be linear?
A: A composition of linear functions is still linear, so the whole network collapses to a single linear regression.
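The collapse argument can be checked numerically: two stacked linear layers equal one linear layer with the product of the weight matrices. The matrix sizes are arbitrary illustrative choices.

```python
import numpy as np

# Without a non-linearity, W2 @ (W1 @ x) == (W2 @ W1) @ x:
# the two-layer "network" is just one linear map.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first linear layer
W2 = rng.normal(size=(2, 4))   # second linear layer
x = rng.normal(size=3)

deep = W2 @ (W1 @ x)           # pass through both layers
collapsed = (W2 @ W1) @ x      # single equivalent layer
```

Inserting a non-linearity such as `np.tanh` between the two matrix products is exactly what breaks this equivalence and gives depth its expressive power.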
2020-08-17
Core Idea
The main assumption in sequence-modelling networks such as RNNs, LSTMs and GRUs is that the current state holds information about the entire input seen so far. Hence the final state of an RNN, after reading the whole input sequence, should contain complete information about that sequence.
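A minimal recurrence makes this concrete: because each state is computed from the previous state and the current input, the final state depends on every input. The weights, dimensions, and tanh update below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_h = rng.normal(size=(3, 3)) * 0.5   # state-to-state weights
W_x = rng.normal(size=(3, 2)) * 0.5   # input-to-state weights

def encode(xs):
    h = np.zeros(3)
    for x in xs:                        # read the sequence step by step
        h = np.tanh(W_h @ h + W_x @ x)  # h depends on everything seen so far
    return h                            # final state = summary of the sequence

seq = rng.normal(size=(4, 2))           # a sequence of 4 two-dim inputs
summary = encode(seq)
```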
2020-08-16
Language Modeling
A language model is a model that computes the probability of a word sequence:
$$ \begin{aligned} P(W) &= P(W\_{1} W\_{2} \dots W\_{n}) \\\\ &= P\left(W\_{1}\right) P\left(W\_{2} \mid W\_{1}\right) P\left(W\_{3} \mid W\_{1} W\_{2}\right) \ldots P\left(W\_{n} \mid W\_{1 \dots n-1}\right) \end{aligned} $$

Softmax layer
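The chain-rule factorisation above can be sketched directly. The conditional probabilities here are made up for illustration; in a neural language model each one would come from a softmax layer over the vocabulary.

```python
# P(the cat sat) = P(the) * P(cat | the) * P(sat | the cat)
cond_probs = {
    ("the", ""): 0.2,           # P(the)
    ("cat", "the"): 0.1,        # P(cat | the)
    ("sat", "the cat"): 0.3,    # P(sat | the cat)
}

p = 1.0
history = []
for word in ["the", "cat", "sat"]:
    p *= cond_probs[(word, " ".join(history))]  # multiply in P(W_i | W_1..i-1)
    history.append(word)
```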
2020-08-16
For a detailed explanation and summary see: RNN Summary

Overview
- Specifically designed for long-range dependencies
- Main idea: connecting the hidden states together within a layer

Simple RNNs: Elman Networks
- The output of the hidden layer is used as input for the next time step
- They use a copy mechanism.
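The Elman copy mechanism can be sketched as follows: the hidden layer's output is copied into context units, which are fed back in as extra input at the next time step. The weight shapes and inputs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W_in = rng.normal(size=(3, 2))    # input-to-hidden weights
W_ctx = rng.normal(size=(3, 3))   # context-to-hidden weights

context = np.zeros(3)             # context units start empty
for x in rng.normal(size=(5, 2)): # five time steps of two-dim input
    hidden = np.tanh(W_in @ x + W_ctx @ context)
    context = hidden.copy()       # copy mechanism: store the hidden output
```

The explicit `copy()` is the defining trait: the previous hidden state is not recomputed, just stored and reused as input on the next step.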
2020-08-16