Linear Regression
Linear Regression Model

A linear model makes a prediction $\hat{y}_i$ by computing a weighted sum of the input $\boldsymbol{x}_i$, plus a constant $w_0$ called the bias term.

For a single sample/instance:

$$\hat{y}_i = f\left(\boldsymbol{x}_i\right) = w_0 + \sum_{j=1}^{D} w_j x_{i,j}$$

In matrix form:

$$\hat{y}_i = w_0 + \sum_{j=1}^{D} w_j x_{i,j} = \tilde{\boldsymbol{x}}_i^{T} \boldsymbol{w}$$

  • $\tilde{\boldsymbol{x}}_i = \begin{bmatrix} 1 \\ \boldsymbol{x}_i \end{bmatrix} = \begin{bmatrix} 1 \\ x_{i,1} \\ \vdots \\ x_{i,D} \end{bmatrix} \in \mathbb{R}^{D+1}$

  • $\boldsymbol{w} = \begin{bmatrix} w_0 \\ \vdots \\ w_D \end{bmatrix} \in \mathbb{R}^{D+1}$
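A minimal NumPy sketch of this trick: prepending a 1 to the sample folds the bias into a single dot product. The weights and feature values here are made-up illustration data.

```python
import numpy as np

# Hypothetical weights: bias w0 = 0.5, then w1, w2 (D = 2 features)
w = np.array([0.5, 2.0, -1.0])

# One sample x_i with D = 2 features
x_i = np.array([3.0, 4.0])

# Augment with a leading 1, giving x_tilde in R^{D+1}
x_tilde = np.concatenate(([1.0], x_i))

# Single dot product: w0 + sum_j w_j * x_{i,j}
y_hat = x_tilde @ w
print(y_hat)  # 0.5 + 2*3 + (-1)*4 = 2.5
```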

On the full dataset:

$$\hat{\boldsymbol{y}} = \begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \begin{bmatrix} \tilde{\boldsymbol{x}}_1^{T} \boldsymbol{w} \\ \vdots \\ \tilde{\boldsymbol{x}}_n^{T} \boldsymbol{w} \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & \boldsymbol{x}_1^{T} \\ \vdots & \vdots \\ 1 & \boldsymbol{x}_n^{T} \end{bmatrix}}_{=: \boldsymbol{X}} \boldsymbol{w} = \boldsymbol{X} \boldsymbol{w}$$

  • $\hat{\boldsymbol{y}}$: vector containing the prediction for each sample
  • $\boldsymbol{X}$: data matrix whose first column is a vector of ones, accounting for the bias term

Written out element-wise:

$$\hat{\boldsymbol{y}} = \underbrace{\begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_n \end{bmatrix}}_{\in \mathbb{R}^{n \times 1}} = \begin{bmatrix} \tilde{\boldsymbol{x}}_1^{T} \boldsymbol{w} \\ \vdots \\ \tilde{\boldsymbol{x}}_n^{T} \boldsymbol{w} \end{bmatrix} = \begin{bmatrix} 1 \cdot w_0 + x_{1,1} \cdot w_1 + \cdots + x_{1,D} \cdot w_D \\ \vdots \\ 1 \cdot w_0 + x_{n,1} \cdot w_1 + \cdots + x_{n,D} \cdot w_D \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & x_{1,1} & \cdots & x_{1,D} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,D} \end{bmatrix}}_{=: \boldsymbol{X} \in \mathbb{R}^{n \times (1+D)}} \underbrace{\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_D \end{bmatrix}}_{=: \boldsymbol{w} \in \mathbb{R}^{(1+D) \times 1}}$$
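The full-dataset product $\boldsymbol{X}\boldsymbol{w}$ can be sketched the same way: stack the samples as rows, prepend a column of ones, and one matrix-vector product yields all $n$ predictions. The dataset and weights below are made-up illustration data.

```python
import numpy as np

# Hypothetical raw dataset: n = 4 samples, D = 2 features
X_raw = np.array([[3.0, 4.0],
                  [1.0, 0.0],
                  [0.0, 2.0],
                  [5.0, 1.0]])

w = np.array([0.5, 2.0, -1.0])  # [w0, w1, w2]

# Prepend a column of ones: X is n x (1+D)
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# All n predictions in one matrix-vector product
y_hat = X @ w
print(y_hat)  # predictions 2.5, 2.5, -1.5, 9.5
```

Note that the first row reproduces the single-sample result: its prediction is $0.5 + 2 \cdot 3 - 1 \cdot 4 = 2.5$.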