Polynomial Regression

Polynomial Regression (Generalized linear regression models)

💡 Idea

Use a linear model to fit nonlinear data: add powers of each feature as new features, then train a linear model on this extended set of features.
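A minimal sketch of this idea in NumPy (assuming a 1-D input; the helper name `expand` and the toy data are just for illustration):

```python
import numpy as np

def expand(x, degree):
    """Add powers of the feature as new features: columns 1, x, x^2, ..., x^degree."""
    return np.vstack([x**d for d in range(degree + 1)]).T

# toy data: a quadratic function of x plus noise
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 2 * x**2 - x + 0.1 * rng.standard_normal(x.shape)

X = expand(x, degree=2)                    # extended feature set
w, *_ = np.linalg.lstsq(X, y, rcond=None)  # plain linear least squares on the new features
print(w)                                   # roughly [0, -1, 2]
```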

Generalize Linear Regression to Polynomial Regression

In Linear Regression, $f$ is modelled as linear in $\boldsymbol{x}$ and $\boldsymbol{w}$:

$$f(\boldsymbol{x}) = \hat{\boldsymbol{x}}^T \boldsymbol{w}$$

Rewrite it more generally:

$$f(\boldsymbol{x}) = \phi(\boldsymbol{x})^T \boldsymbol{w}$$

  • $\phi(\boldsymbol{x})$: vector-valued function of the input vector $\boldsymbol{x}$ (such models are also called “linear basis function models”)
    • $\phi_i(\boldsymbol{x})$: basis functions

In principle, this allows us to learn any non-linear function, if we know suitable basis functions (which is typically not the case 🤪).

Example 1

$$\boldsymbol{x}=\left[\begin{array}{c}x_1 \\ x_2\end{array}\right] \in \mathbb{R}^{2}$$

$$\phi: \mathbb{R}^2 \to \mathbb{R}^3, \quad \left[\begin{array}{c}x_1 \\ x_2\end{array}\right] \mapsto \left[\begin{array}{c}1 \\ x_1 \\ x_2\end{array}\right]$$

(i.e. $\phi_1(\boldsymbol{x}) = 1$, $\phi_2(\boldsymbol{x}) = x_1$, $\phi_3(\boldsymbol{x}) = x_2$)

(Figure: Pol_Reg_Example_2)

Example 2

$$\boldsymbol{x}=\left[\begin{array}{c}x_1 \\ x_2\end{array}\right] \in \mathbb{R}^{2}$$

$$\phi: \mathbb{R}^2 \to \mathbb{R}^5, \quad \left[\begin{array}{c}x_1 \\ x_2\end{array}\right] \mapsto \left[\begin{array}{c}1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_2^2\end{array}\right]$$

(Figure: Pol_Reg_Example_3)
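The two feature maps from the examples, written out as code (a sketch; the function names are made up here):

```python
import numpy as np

def phi_example1(x):
    """phi: R^2 -> R^3, (x1, x2) |-> (1, x1, x2) -- linear regression with a bias feature."""
    x1, x2 = x
    return np.array([1.0, x1, x2])

def phi_example2(x):
    """phi: R^2 -> R^5, (x1, x2) |-> (1, x1, x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2])

print(phi_example2(np.array([2.0, 3.0])))  # [1. 2. 3. 4. 9.]
```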

Optimal value of $\boldsymbol{w}$

$$\boldsymbol{w}^{*}=\left(\boldsymbol{\Phi}^{T} \boldsymbol{\Phi}\right)^{-1} \boldsymbol{\Phi}^{T} \boldsymbol{y}, \qquad \boldsymbol{\Phi}=\left[\begin{array}{c}\phi(\boldsymbol{x}_1)^{T} \\ \vdots \\ \phi(\boldsymbol{x}_n)^{T}\end{array}\right]$$

(This is the same solution as in Linear Regression, except that the data matrix is replaced by the basis function matrix $\boldsymbol{\Phi}$.)
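A sketch of the closed-form fit with a basis function matrix (using Example 2's feature map; the toy data and names are illustrative, and `np.linalg.solve` stands in for the explicit inverse):

```python
import numpy as np

def phi(x):
    """Feature map from Example 2: (x1, x2) |-> (1, x1, x2, x1^2, x2^2)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2])

def fit(X, y):
    """Closed-form least squares: w* = (Phi^T Phi)^{-1} Phi^T y."""
    Phi = np.array([phi(x) for x in X])             # basis function matrix, one row per sample
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)  # solve the normal equations

# toy data: y = 1 + x1^2 - 2 * x2^2, 50 random 2-D inputs
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 2))
y = 1 + X[:, 0]**2 - 2 * X[:, 1]**2
print(fit(X, y).round(3))  # roughly [ 1.  0.  0.  1. -2.]
```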

Challenge of Polynomial Regression: Overfitting

Reason: the model is too complex (the degree of the polynomial is too high!). It fits the noise and behaves erratically between the training points. 😭
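A quick sketch of this effect (1-D data with a polynomial basis; exact numbers depend on the noise seed): the high-degree fit drives the training error towards zero but produces huge weights and oscillates wildly between the training points.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.shape)  # noisy training points

for degree in (3, 9):
    Phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^degree
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    train_mse = np.mean((Phi @ w - y) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, max |w| {np.abs(w).max():.1f}")
# the degree-9 model nearly interpolates the noise: tiny training error, very large weights
```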

Solution: Regularization

Regularization

Regularization: constrain a model to make it simpler and reduce the risk of overfitting.

💡 Avoid overfitting by forcing the weights $\boldsymbol{w}$ to be small

Assume our model has degree 3 (terms $x^1, x^2, x^3$), with corresponding parameters/weights $w_1, w_2, w_3$. If we force $w_3 = 0$, then $w_3 x^3 = 0$, so the model effectively has only degree 2. In other words, the model has become simpler.

In general, a regularized model has the following cost/objective function:

$$\underbrace{E_D(\boldsymbol{w})}_{\text{Data term}} + \underbrace{\lambda E_W(\boldsymbol{w})}_{\text{Regularization term}}$$

$\lambda$: regularization factor (a hyperparameter that needs to be tuned manually); it controls how strongly the model is regularized.

Regularized Least Squares (Ridge Regression)

Consists of:

  • Sum of Squared Errors (SSE) function
  • quadratic regularizer ($L_2$ regularization)

$$\begin{aligned} L_{\text{ridge}} &= \mathrm{SSE} + \lambda \|\boldsymbol{w}\|^2 \\ &= (\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w})^{T}(\boldsymbol{y}-\boldsymbol{\Phi} \boldsymbol{w}) + \lambda \boldsymbol{w}^{T} \boldsymbol{w} \end{aligned}$$

Solution:

$$\boldsymbol{w}_{\mathrm{ridge}}^{*}=\left(\boldsymbol{\Phi}^{T} \boldsymbol{\Phi}+\lambda \boldsymbol{I}\right)^{-1} \boldsymbol{\Phi}^{T} \boldsymbol{y}$$

  • $\boldsymbol{I}$: identity matrix
  • $\left(\boldsymbol{\Phi}^{T} \boldsymbol{\Phi}+\lambda \boldsymbol{I}\right)$ is full rank (for $\lambda > 0$) and can easily be inverted
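A sketch of the ridge solution, reusing the basis function matrix from above (the basis, toy data, and $\lambda$ values are placeholders; $\lambda$ still has to be tuned by hand, as noted above):

```python
import numpy as np

def fit_ridge(Phi, y, lam):
    """Ridge weights: w* = (Phi^T Phi + lam * I)^{-1} Phi^T y."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

# same noisy 1-D toy data as before, with a degree-9 polynomial basis
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.shape)
Phi = np.vander(x, 10, increasing=True)

for lam in (1e-6, 1e-3, 1.0):
    w = fit_ridge(Phi, y, lam)
    print(f"lambda={lam:g}: max |w| = {np.abs(w).max():.1f}")  # weights shrink as lambda grows
```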