Objective Function
What does the objective function look like?
Objective function:
$$ \operatorname{Obj}(\Theta)= \overbrace{L(\Theta)}^{\text{Training loss}} + \underbrace{\Omega(\Theta)}_{\text{Regularization}} $$
Training loss: measures how well the model fits the training data
$$ L=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}\right) $$
- Square loss: $$ l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2 $$
- Logistic loss: $$ l(y_i, \hat{y}_i) = y_i \log(1 + e^{-\hat{y}_i}) + (1 - y_i) \log(1 + e^{\hat{y}_i}) $$
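As a quick sanity check, both losses can be written directly in NumPy; this is a minimal sketch, and the helper names and toy arrays are illustrative rather than part of any particular library:

```python
import numpy as np

def square_loss(y, y_hat):
    # Square loss: (y_i - y_hat_i)^2, the usual choice for regression.
    return (y - y_hat) ** 2

def logistic_loss(y, y_hat):
    # Logistic loss for labels y in {0, 1} and raw scores y_hat.
    return y * np.log1p(np.exp(-y_hat)) + (1 - y) * np.log1p(np.exp(y_hat))

# The training loss L sums the per-example loss over all n examples.
y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([2.0, -1.0, 0.5])
L = logistic_loss(y, y_hat).sum()
```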
Regularization: How complicated is the model?
- $L_2$ norm (Ridge): $\Omega(w) = \lambda \|w\|^2$
- $L_1$ norm (Lasso): $\Omega(w) = \lambda \|w\|_1$
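The two penalties are just as easy to write down; a minimal NumPy sketch, with the function names as illustrative assumptions:

```python
import numpy as np

def l2_penalty(w, lam):
    # Ridge-style penalty: lambda * ||w||^2.
    return lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    # Lasso-style penalty: lambda * ||w||_1.
    return lam * np.sum(np.abs(w))
```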
| Model | Objective function | Linear model? | Loss | Regularization |
|---|---|---|---|---|
| Ridge regression | $\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\|w\|^{2}$ | ✅ | square | $L_2$ |
| Lasso regression | $\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\|w\|_1$ | ✅ | square | $L_1$ |
| Logistic regression | $\sum_{i=1}^{n}\left[y_{i} \cdot \ln \left(1+e^{-w^{\top} x_{i}}\right)+\left(1-y_{i}\right) \cdot \ln \left(1+e^{w^{\top} x_{i}}\right)\right]+\lambda\|w\|^{2}$ | ✅ | logistic | $L_2$ |
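Putting the two pieces together, the ridge regression row of the table corresponds to an objective like the following sketch (NumPy, assuming `X` is an n-by-d design matrix; the names are illustrative):

```python
import numpy as np

def ridge_objective(w, X, y, lam):
    # Obj(w) = sum_i (y_i - w^T x_i)^2 + lambda * ||w||^2
    residuals = y - X @ w
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# Swapping the penalty for lam * np.sum(np.abs(w)) gives the lasso objective,
# and swapping the squared residuals for the logistic loss gives the
# regularized logistic regression objective from the table.
```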
Why do we want the objective to contain two components?
Optimizing training loss encourages predictive models
- Fitting the training data well at least gets the model close to the training data, which is hopefully close to the underlying distribution
Optimizing regularization encourages simple models
- Simpler models tend to have smaller variance in future predictions, making the predictions more stable