Softmax We use the softmax activation function to predict the probabilities assigned to $n$ classes. For example, the probability of assigning an input sample to the $j$-th class is: $$ p\_j = \operatorname{softmax}(z\_j) = \frac{e^{z\_j}}{\sum\_{k=1}^n e^{z\_k}} $$ Furthermore, we use one-hot encoding to represent the ground truth $y$, which means $$ \sum\_{k=1}^n y\_k = 1 $$ Loss function (cross-entropy): $$ \begin{aligned} L &= -\sum\_{k=1}^n y\_k \log(p\_k) \\\\ &= - \left(y\_j \log(p\_j) + \sum\_{k \neq j}y\_k \log(p\_k)\right) \end{aligned} $$ Gradient w.r.t. the logits: since $y$ is one-hot, the derivative simplifies to $\frac{\partial L}{\partial z\_j} = p\_j - y\_j$.
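As a quick check of the formulas above, here is a minimal NumPy sketch (function names such as `cross_entropy` are my own, not from the note): it computes softmax probabilities and the cross-entropy loss for a one-hot target, where the gradient w.r.t. the logits is simply $p - y$.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: p_j = exp(z_j) / sum_k exp(z_k)."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(p, y):
    """L = -sum_k y_k log(p_k), with y a one-hot vector."""
    return -np.sum(y * np.log(p))

z = np.array([2.0, 1.0, 0.1])   # example logits
y = np.array([1.0, 0.0, 0.0])   # one-hot ground truth for class 0
p = softmax(z)
grad = p - y                    # gradient of L w.r.t. the logits z
```

The closed-form gradient `p - y` can be verified against a finite-difference approximation of the loss.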
2020-09-08
Structure A perceptron is a single-layer neural network used for supervised learning of binary classifiers. Perceptron: $$ g(x) = \underbrace{\sum\_{i=1}^n w\_i x\_i}\_{\text{linear separator}} + \underbrace{w\_0}\_{\text{offset/bias}} $$ Decision for classification: $$ \hat{y} = \begin{cases} 1 &\text{if } g(x) > 0 \\\\ -1 &\text{else}\end{cases} $$ Update rule: $w = w + y x$ if the prediction is wrong.
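The update rule can be sketched in a few lines of NumPy (the toy data and the helper name `train_perceptron` are illustrative, not from the note); the bias $w_0$ is folded in by appending a constant 1 to each input.

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Perceptron learning: apply w = w + y x on every misclassified point."""
    Xa = np.hstack([X, np.ones((len(X), 1))])  # augment with bias feature
    w = np.zeros(Xa.shape[1])                  # w[-1] plays the role of w_0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(Xa, y):
            if yi * (w @ xi) <= 0:             # wrong (or undecided) prediction
                w += yi * xi                   # update rule: w = w + y x
                mistakes += 1
        if mistakes == 0:                      # converged: all points correct
            break
    return w

# Toy linearly separable data with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
```

Convergence is only guaranteed when the data is linearly separable, which is why the loop also has an epoch limit.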
2020-09-01
Generalization: the ability to apply what was learned during training to new (test) data. Reasons for bad generalization: overfitting/overtraining (trained too long), too little training material, too many parameters (weights), or an inappropriate network architecture. Preventing overfitting: the obviously best approach is to collect more data!
2020-08-17
Multi-Layer Perceptron (MLP) Input layer $I \in \mathbb{R}^{D\_{I} \times N}$: how we initially represent the features, with mini-batch processing of $N$ inputs. Weight matrices: input to hidden, $W\_{H} \in \mathbb{R}^{D\_{I} \times D\_{H}}$
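Under the shapes given above, one forward step for the whole mini-batch is a single matrix product; this NumPy sketch assumes a tanh non-linearity and a hidden bias `b_H`, neither of which is stated in the excerpt.

```python
import numpy as np

# Shapes follow the note: mini-batch I in R^{D_I x N} (one column per input)
# and input-to-hidden weights W_H in R^{D_I x D_H}.
D_I, D_H, N = 4, 3, 5
rng = np.random.default_rng(0)
I = rng.standard_normal((D_I, N))
W_H = rng.standard_normal((D_I, D_H))
b_H = np.zeros((D_H, 1))          # hidden bias (assumed, not in the excerpt)

# Hidden activations for all N inputs at once:
H = np.tanh(W_H.T @ I + b_H)      # shape (D_H, N)
```

Processing the batch as one matrix product is exactly why the mini-batch is stored column-wise.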
2020-08-17
A loss function quantifies what it means to have a “good” model. Different types of loss functions exist for different tasks, such as classification, regression, metric learning, and reinforcement learning. Classification: predicting a discrete class label.
2020-08-17
Activation functions should be non-linear and differentiable (since training uses backpropagation). Q: Why can’t the mapping between layers be linear? A: A composition of linear functions is still linear, so the whole network would collapse to a single linear model (i.e., linear regression).
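The collapse argument can be verified numerically: stacking two linear layers with no activation in between gives exactly the same map as one layer with the product weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))   # "layer 1" weights
W2 = rng.standard_normal((2, 3))   # "layer 2" weights
x = rng.standard_normal(4)

two_layer = W2 @ (W1 @ x)          # forward pass through both linear layers
collapsed = (W2 @ W1) @ x          # ...equals a single linear layer
```

Inserting any non-linearity between the two products breaks this equality, which is the whole point of activation functions.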
2020-08-17
Motivation Overfitting happens because we have too few examples to train on, resulting in a model with poor generalization performance 😢. If we had infinite training data, we wouldn’t overfit, because we would see every possible instance.
2020-08-16
Model Overfitting To give neural nets more “capacity” to capture different features, we give them many neurons. But this can cause overfitting. Reason: co-adaptation, where neurons become dependent on others. Intuition: neuron $H\_i$ captures a particular feature $X$ which, however, is very frequently seen together with some inputs.
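The excerpt cuts off here; a standard remedy for co-adaptation (assumed, since the note does not name it in this preview) is dropout, which randomly zeroes activations during training so no neuron can rely on a particular partner always being present. A minimal sketch of inverted dropout:

```python
import numpy as np

def inverted_dropout(h, p_keep=0.8, rng=None):
    """Randomly drop hidden activations during training.

    Dividing by p_keep keeps the expected activation unchanged,
    so no rescaling is needed at test time.
    """
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) < p_keep   # keep each unit with prob p_keep
    return h * mask / p_keep

h = np.ones(10_000)                       # dummy hidden activations
dropped = inverted_dropout(h, p_keep=0.8, rng=np.random.default_rng(0))
```

On average about 20% of the units are zeroed while the mean activation stays near 1, which is what the `1 / p_keep` rescaling buys.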
2020-08-16
2020-07-31