Objective Function

What does the objective function look like?

Objective function:

$$\operatorname{Obj}(\Theta)= \overbrace{L(\Theta)}^{\text{Training Loss}} + \underbrace{\Omega(\Theta)}_{\text{Regularization}}$$
  • Training loss: measures how well the model fits the training data

    $$L=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}\right)$$
    • Square loss: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
    • Logistic loss: $l(y_i, \hat{y}_i) = y_i \log(1 + e^{-\hat{y}_i}) + (1 - y_i) \log(1 + e^{\hat{y}_i})$
  • Regularization: how complicated is the model?

    • $L_2$ norm (Ridge): $\Omega(w) = \lambda \|w\|^2$
    • $L_1$ norm (Lasso): $\Omega(w) = \lambda \|w\|_1$ (see the code sketch after this list)
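
As a concrete illustration, here is a minimal NumPy sketch of the two losses and the two penalties above. The function names are hypothetical (not from any particular library), and `y_hat` stands for the raw prediction $\hat{y}_i$:

```python
import numpy as np

def square_loss(y, y_hat):
    """Square loss summed over examples: sum_i (y_i - y_hat_i)^2."""
    return np.sum((y - y_hat) ** 2)

def logistic_loss(y, y_hat):
    """Logistic loss for labels y in {0, 1} and raw scores y_hat:
    sum_i [ y_i * log(1 + e^{-y_hat_i}) + (1 - y_i) * log(1 + e^{y_hat_i}) ]."""
    # np.logaddexp(0, x) computes log(1 + e^x) in a numerically stable way
    return np.sum(y * np.logaddexp(0.0, -y_hat) + (1 - y) * np.logaddexp(0.0, y_hat))

def l2_penalty(w, lam):
    """Ridge penalty: lambda * ||w||^2."""
    return lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    """Lasso penalty: lambda * ||w||_1."""
    return lam * np.sum(np.abs(w))
```
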
| | Objective Function | Linear model? | Loss | Regularization |
|---|---|---|---|---|
| Ridge regression | $\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\lVert w\rVert^{2}$ | ✅ | square | $L_2$ |
| Lasso regression | $\sum_{i=1}^{n}\left(y_{i}-w^{\top} x_{i}\right)^{2}+\lambda\lVert w\rVert_{1}$ | ✅ | square | $L_1$ |
| Logistic regression | $\sum_{i=1}^{n}\left[y_{i} \cdot \ln \left(1+e^{-w^{\top} x_{i}}\right)+\left(1-y_{i}\right) \cdot \ln \left(1+e^{w^{\top} x_{i}}\right)\right]+\lambda\lVert w\rVert^{2}$ | ✅ | logistic | $L_2$ |
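
Each row of the table is just one choice of loss plus one choice of penalty applied to a linear model $\hat{y}_i = w^\top x_i$. A hedged sketch, reusing the hypothetical helpers from the previous block:

```python
def ridge_objective(w, X, y, lam):
    """Square loss on a linear model + L2 penalty."""
    return square_loss(y, X @ w) + l2_penalty(w, lam)

def lasso_objective(w, X, y, lam):
    """Square loss on a linear model + L1 penalty."""
    return square_loss(y, X @ w) + l1_penalty(w, lam)

def logistic_objective(w, X, y, lam):
    """Logistic loss on a linear model + L2 penalty (L2-regularized logistic regression)."""
    return logistic_loss(y, X @ w) + l2_penalty(w, lam)
```

In every case the structure is the same $\operatorname{Obj}(\Theta) = L(\Theta) + \Omega(\Theta)$; only the choice of loss and penalty changes.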

Why do we want the objective to contain two components?

  • Optimizing training loss encourages predictive models

    • Fitting the training data well gets the model close to the training distribution, which is hopefully close to the underlying distribution
  • Optimizing regularization encourages simple models

    • Simpler models tend to have smaller variance in future predictions, which makes predictions more stable