Logistic Regression: Summary
Supervised classification
Input: feature vector $x = [x_1, x_2, \ldots, x_n]$
Output: class label $y \in \{0, 1\}$
Parameters:
- Weight: vector $w = [w_1, w_2, \ldots, w_n]$
- Bias: scalar $b$
Prediction
$\hat{y} = \sigma(w \cdot x + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function; predict class $1$ if $\hat{y} > 0.5$, class $0$ otherwise.
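The prediction step can be sketched as follows (the weights, bias, and input here are illustrative values, not from the notes):

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    # y_hat = sigma(w . x + b)
    return sigmoid(np.dot(w, x) + b)

def predict(x, w, b):
    # decision rule: class 1 if y_hat > 0.5, class 0 otherwise
    return int(predict_proba(x, w, b) > 0.5)

# illustrative parameters and input
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([1.0, 0.5])
print(predict_proba(x, w, b))  # sigmoid(2.0) ≈ 0.88
print(predict(x, w, b))        # 1
```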
Training/Learning
Loss function
For a single sample $(x, y)$, the model outputs $\hat{y} = \sigma(w \cdot x + b)$.
And we define $p(y = 1 \mid x) = \hat{y}$, $p(y = 0 \mid x) = 1 - \hat{y}$.
The probability of correct prediction can thus be expressed as $p(y \mid x) = \hat{y}^{\,y}(1 - \hat{y})^{1 - y}$. We want to maximize $\log p(y \mid x)$; equivalently, we minimize its negation $L_{CE}(\hat{y}, y) = -[\,y \log \hat{y} + (1 - y) \log(1 - \hat{y})\,]$, which is called the Cross-Entropy loss.
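The single-sample cross-entropy loss can be sketched directly from the formula (the probability values below are illustrative):

```python
import numpy as np

def cross_entropy(y_hat, y):
    # L_CE = -[ y log(y_hat) + (1 - y) log(1 - y_hat) ]
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# loss is small when y_hat agrees with the true label y,
# and grows without bound as y_hat moves toward the wrong label
print(cross_entropy(0.9, 1))  # -log(0.9), small
print(cross_entropy(0.9, 0))  # -log(0.1), large
```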
For a mini-batch of samples of size $m$
- $x^{(i)}$: $i$-th training sample
Loss function is the average loss over the examples: $L = \frac{1}{m} \sum_{i=1}^{m} L_{CE}(\hat{y}^{(i)}, y^{(i)})$
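The mini-batch average loss can be sketched in vectorized form, with each row of `X` holding one training sample (a hypothetical helper, not from the notes):

```python
import numpy as np

def batch_loss(X, y, w, b):
    # X: (m, n) matrix, row i is x^{(i)}; y: (m,) vector of 0/1 labels
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # average cross-entropy over the m samples
    return np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

# illustrative batch: with zero weights every y_hat is 0.5,
# so the average loss is log(2) regardless of the labels
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, 1.0])
print(batch_loss(X, y, np.zeros(2), 0.0))  # log(2) ≈ 0.693
```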
Algorithm: Gradient descent
Repeatedly update $w \leftarrow w - \eta \nabla_w L$ and $b \leftarrow b - \eta \frac{\partial L}{\partial b}$, where $\eta$ is the learning rate.
Gradient for single sample: $\frac{\partial L_{CE}}{\partial w_j} = (\hat{y} - y)\, x_j$, $\frac{\partial L_{CE}}{\partial b} = \hat{y} - y$
Gradient for mini-batch: $\frac{\partial L}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right) x_j^{(i)}$
- $x_j^{(i)}$: $j$-th feature of the $i$-th sample
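One gradient-descent step on a mini-batch can be sketched as below; the tiny one-feature dataset and learning rate are illustrative choices, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X, y, w, b, lr=0.1):
    # one mini-batch gradient-descent update
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)
    error = y_hat - y          # (y_hat^{(i)} - y^{(i)})
    grad_w = X.T @ error / m   # dL/dw_j = mean over i of error_i * x_j^{(i)}
    grad_b = error.mean()      # dL/db   = mean over i of error_i
    return w - lr * grad_w, b - lr * grad_b

# tiny linearly separable 1-feature dataset (illustrative)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = np.zeros(1), 0.0
for _ in range(1000):
    w, b = gradient_step(X, y, w, b, lr=0.5)
print(sigmoid(X @ w + b))  # probabilities move toward [0, 0, 1, 1]
```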
Multinomial Logistic Regression
Also called Softmax regression, MaxEnt classifier
Softmax function
Compute the probability of being in each potential class $c$, $1 \le c \le K$, using the softmax function: $p(y = c \mid x) = \frac{\exp(w_c \cdot x + b_c)}{\sum_{k=1}^{K} \exp(w_k \cdot x + b_k)}$
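A minimal sketch of the softmax computation, with one weight vector $w_c$ per class stacked as the rows of `W` (the 3-class parameters below are illustrative):

```python
import numpy as np

def softmax(z):
    # subtract the max logit for numerical stability; the result is unchanged
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def class_probs(x, W, b):
    # p(y = c | x) = softmax(W x + b)_c, with W: (K, n), b: (K,)
    return softmax(W @ x + b)

# illustrative 3-class, 2-feature example
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 1.0])
p = class_probs(x, W, b)
print(p, p.sum())  # a proper distribution: the probabilities sum to 1
```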
Prediction
$\hat{y} = \operatorname{argmax}_{c}\; p(y = c \mid x)$
Learning
For a single sample $(x, y)$, the loss function is $L_{CE} = -\sum_{c=1}^{K} \mathbb{1}\{y = c\} \log p(y = c \mid x) = -\log p(y \mid x)$
- $\mathbb{1}\{\cdot\}$: evaluates to $1$ if the condition in the brackets is true and to $0$ otherwise.
Gradient
$\frac{\partial L_{CE}}{\partial w_c} = \left(p(y = c \mid x) - \mathbb{1}\{y = c\}\right) x$
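The multinomial loss and its gradient can be sketched together; note how the gradient is just "predicted probabilities minus the one-hot indicator", outer-multiplied with $x$ (the helper name and shapes are assumptions for illustration):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def multinomial_loss_and_grad(x, y, W, b):
    # y: true class index; W: (K, n), b: (K,)
    p = softmax(W @ x + b)        # p[c] = p(y = c | x)
    loss = -np.log(p[y])          # L_CE = -log p(y | x)
    indicator = np.zeros_like(p)
    indicator[y] = 1.0            # one-hot vector: 1{y = c}
    error = p - indicator         # p(y = c | x) - 1{y = c}
    grad_W = np.outer(error, x)   # dL/dW[c, j] = error[c] * x[j]
    grad_b = error                # dL/db[c]    = error[c]
    return loss, grad_W, grad_b

# with all-zero parameters every class gets probability 1/K,
# so the loss is log(K) for any true label
loss, gW, gb = multinomial_loss_and_grad(np.array([1.0, 2.0]), 0,
                                         np.zeros((3, 2)), np.zeros(3))
print(loss)  # log(3) ≈ 1.099
```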