Math Basics

Linear Algebra

Vectors

Vector: multi-dimensional quantity

  • Each dimension contains different information (e.g. age, weight, height, …)

  • represented as bold symbols

  • A vector $\boldsymbol{x}$ is always a column vector

    $$\boldsymbol{x}=\left[\begin{array}{l} 1 \\ 2 \\ 4 \end{array}\right]$$
  • A transposed vector $\boldsymbol{x}^T$ is a row vector

    $$\boldsymbol{x}^{T}=\left[\begin{array}{lll} 1 & 2 & 4 \end{array}\right]$$
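To make the column/row convention concrete, here is a minimal NumPy sketch (the use of NumPy and an explicit 2-D `(3, 1)` array are my own illustration, not part of the notes):

```python
import numpy as np

# Column vector x as an explicit (3, 1) array, matching the convention above.
x = np.array([[1], [2], [4]])
print(x.shape)    # (3, 1)

# The transpose x^T is the corresponding (1, 3) row vector.
print(x.T)        # [[1 2 4]]
print(x.T.shape)  # (1, 3)
```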

Vector Operations

  • Multiplication by scalars

    $$2\left[\begin{array}{l} 1 \\ 2 \end{array}\right]=\left[\begin{array}{l} 2 \\ 4 \end{array}\right]$$
  • Addition of vectors

    $$\left[\begin{array}{l} 1 \\ 2 \end{array}\right]+\left[\begin{array}{l} 3 \\ 1 \end{array}\right]=\left[\begin{array}{l} 4 \\ 3 \end{array}\right]$$
  • Scalar (Inner) products: Sum the element-wise products

    $$\boldsymbol{v}=\left[\begin{array}{c} 1 \\ 2 \\ 4 \end{array}\right], \quad \boldsymbol{w}=\left[\begin{array}{c} 2 \\ 4 \\ 8 \end{array}\right]$$

    $$\langle\boldsymbol{v}, \boldsymbol{w}\rangle= 1 \cdot 2+2 \cdot 4+4 \cdot 8=42$$
  • Length of a vector: square root of the inner product with itself

    $$\|\boldsymbol{v}\|=\langle\boldsymbol{v}, \boldsymbol{v}\rangle^{\frac{1}{2}}=\left(1^{2}+2^{2}+4^{2}\right)^{\frac{1}{2}}=\sqrt{21}$$
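A short NumPy sketch of the operations above (NumPy is an assumed tool here; the numbers are the ones from the examples):

```python
import numpy as np

v = np.array([1, 2, 4])
w = np.array([2, 4, 8])

print(2 * v)              # multiplication by a scalar -> [2 4 8]
print(v + w)              # addition of vectors        -> [ 3  6 12]
print(np.dot(v, w))       # scalar (inner) product     -> 42
print(np.linalg.norm(v))  # length ||v|| = sqrt(21)    -> 4.5825...
```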

Matrices

Matrix: rectangular array of numbers arranged in rows and columns

  • denoted with bold upper-case letters

    $$\boldsymbol{X}=\left[\begin{array}{ll} 1 & 3 \\ 2 & 3 \\ 4 & 7 \end{array}\right]$$
  • Dimension: $\#\text{rows} \times \#\text{columns}$ (e.g. for the matrix above, $\boldsymbol{X} \in \mathbb{R}^{3 \times 2}$)

  • Vectors are special cases of matrices

    $$\boldsymbol{x}^{T}=\underbrace{\left[\begin{array}{ccc} 1 & 2 & 4 \end{array}\right]}_{1 \times 3 \text{ matrix}}$$

Matrices in ML

  • A data set can be represented as a matrix, where individual samples are vectors

    e.g.:

    |       | Age | Weight | Height |
    |-------|-----|--------|--------|
    | Joe   | 37  | 72     | 175    |
    | Mary  | 10  | 30     | 61     |
    | Carol | 25  | 65     | 121    |
    | Brad  | 66  | 67     | 175    |
    $$\text{Joe: } \boldsymbol{x}_{1}=\left[\begin{array}{c} 37 \\ 72 \\ 175 \end{array}\right], \qquad \text{Mary: } \boldsymbol{x}_{2}=\left[\begin{array}{c} 10 \\ 30 \\ 61 \end{array}\right]$$

    $$\text{Carol: } \boldsymbol{x}_{3}=\left[\begin{array}{c} 25 \\ 65 \\ 121 \end{array}\right], \qquad \text{Brad: } \boldsymbol{x}_{4}=\left[\begin{array}{c} 66 \\ 67 \\ 175 \end{array}\right]$$
  • Most typical representation:

    • row ~ data sample (e.g. Joe)
    • column ~ data entry (e.g. age)
    $$\boldsymbol{X}=\left[\begin{array}{l} \boldsymbol{x}_{1}^{T} \\ \boldsymbol{x}_{2}^{T} \\ \boldsymbol{x}_{3}^{T} \\ \boldsymbol{x}_{4}^{T} \end{array}\right]=\left[\begin{array}{ccc} 37 & 72 & 175 \\ 10 & 30 & 61 \\ 25 & 65 & 121 \\ 66 & 67 & 175 \end{array}\right]$$
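A possible way to build this data matrix in code (assuming NumPy; the values are taken from the table above):

```python
import numpy as np

# One vector per sample (Joe, Mary, Carol, Brad); entries are age, weight, height.
x1 = np.array([37, 72, 175])
x2 = np.array([10, 30, 61])
x3 = np.array([25, 65, 121])
x4 = np.array([66, 67, 175])

# Rows ~ samples, columns ~ entries.
X = np.vstack([x1, x2, x3, x4])
print(X.shape)   # (4, 3): 4 samples x 3 entries
print(X[1])      # Mary's row      -> [10 30 61]
print(X[:, 0])   # the "age" column -> [37 10 25 66]
```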

Matrix Operations

  • Multiplication with scalar

    $$3 \boldsymbol{M}=3\left[\begin{array}{lll} 3 & 4 & 5 \\ 1 & 0 & 1 \end{array}\right]=\left[\begin{array}{ccc} 9 & 12 & 15 \\ 3 & 0 & 3 \end{array}\right]$$
  • Addition of matrices

    $$\boldsymbol{M} + \boldsymbol{N}=\left[\begin{array}{lll} 3 & 4 & 5 \\ 1 & 0 & 1 \end{array}\right]+\left[\begin{array}{lll} 1 & 2 & 1 \\ 3 & 1 & 1 \end{array}\right]=\left[\begin{array}{lll} 4 & 6 & 6 \\ 4 & 1 & 2 \end{array}\right]$$
  • Transpose

    $$\boldsymbol{M}=\left[\begin{array}{lll} 3 & 4 & 5 \\ 1 & 0 & 1 \end{array}\right], \quad \boldsymbol{M}^{T}=\left[\begin{array}{ll} 3 & 1 \\ 4 & 0 \\ 5 & 1 \end{array}\right]$$
  • Matrix-vector product (the vector's dimensionality must equal the number of columns of the matrix; see the NumPy sketch at the end of this section)

    $$\underbrace{\left[\boldsymbol{w}_{1}, \ldots, \boldsymbol{w}_{n}\right]}_{\boldsymbol{W}} \underbrace{\left[\begin{array}{c} v_{1} \\ \vdots \\ v_{n} \end{array}\right]}_{\boldsymbol{v}}=\underbrace{\left[v_{1} \boldsymbol{w}_{1}+\cdots+v_{n} \boldsymbol{w}_{n}\right]}_{\boldsymbol{u}}$$

    E.g.:

    $$\boldsymbol{u}=\boldsymbol{W} \boldsymbol{v}=\left[\begin{array}{ccc} 3 & 4 & 5 \\ 1 & 0 & 1 \end{array}\right]\left[\begin{array}{l} 1 \\ 0 \\ 2 \end{array}\right]=\left[\begin{array}{l} 3 \cdot 1+4 \cdot 0+5 \cdot 2 \\ 1 \cdot 1+0 \cdot 0+1 \cdot 2 \end{array}\right]=\left[\begin{array}{c} 13 \\ 3 \end{array}\right]$$

    💡 Think of it as: we sum over the columns $\boldsymbol{w}_i$ of $\boldsymbol{W}$, weighted by the entries $v_i$ of $\boldsymbol{v}$

    $$\boldsymbol{u}=v_{1} \boldsymbol{w}_{1}+\cdots+v_{n} \boldsymbol{w}_{n}=1\left[\begin{array}{l} 3 \\ 1 \end{array}\right]+0\left[\begin{array}{l} 4 \\ 0 \end{array}\right]+2\left[\begin{array}{l} 5 \\ 1 \end{array}\right]=\left[\begin{array}{c} 13 \\ 3 \end{array}\right]$$
  • Matrix-Matrix product

    $$\boldsymbol{U} = \boldsymbol{W} \boldsymbol{V}=\left[\begin{array}{lll} 3 & 4 & 5 \\ 1 & 0 & 1 \end{array}\right]\left[\begin{array}{ll} 1 & 0 \\ 0 & 3 \\ 2 & 4 \end{array}\right]=\left[\begin{array}{ll} 3 \cdot 1+4 \cdot 0+5 \cdot 2 & 3 \cdot 0+4 \cdot 3+5 \cdot 4 \\ 1 \cdot 1+0 \cdot 0+1 \cdot 2 & 1 \cdot 0+0 \cdot 3+1 \cdot 4 \end{array}\right]=\left[\begin{array}{cc} 13 & 32 \\ 3 & 4 \end{array}\right]$$

    💡 Think of it as: each column $\boldsymbol{u}_i = \boldsymbol{W} \boldsymbol{v}_i$ can be computed by a matrix-vector product

    $$\boldsymbol{W} \underbrace{\left[\boldsymbol{v}_{1}, \ldots, \boldsymbol{v}_{n}\right]}_{\boldsymbol{V}}=[\underbrace{\boldsymbol{W} \boldsymbol{v}_{1}}_{\boldsymbol{u}_{1}}, \ldots, \underbrace{\boldsymbol{W} \boldsymbol{v}_{n}}_{\boldsymbol{u}_{n}}]=\boldsymbol{U}$$
    • Non-commutative (in general): $\boldsymbol{V} \boldsymbol{W} \neq \boldsymbol{W} \boldsymbol{V}$

    • Associative: $\boldsymbol{V}(\boldsymbol{W} \boldsymbol{X})=(\boldsymbol{V} \boldsymbol{W}) \boldsymbol{X}$

    • Transpose product:

      $$(\boldsymbol{V} \boldsymbol{W})^{T}=\boldsymbol{W}^{T} \boldsymbol{V}^{T}$$
  • Matrix inverse

    • scalar

      $$w \cdot w^{-1}=1$$
    • matrices (defined only for square, non-singular $\boldsymbol{W}$)

      $$\boldsymbol{W} \boldsymbol{W}^{-1}=\boldsymbol{I}, \quad \boldsymbol{W}^{-1} \boldsymbol{W}=\boldsymbol{I}$$
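The operations of this section can be sanity-checked with a small NumPy sketch (the tooling is my assumption; `@` is NumPy's matrix product, and the matrices are the ones from the examples):

```python
import numpy as np

W = np.array([[3, 4, 5],
              [1, 0, 1]])
N = np.array([[1, 2, 1],
              [3, 1, 1]])
V = np.array([[1, 0],
              [0, 3],
              [2, 4]])
v = np.array([1, 0, 2])

print(3 * W)   # multiplication with a scalar
print(W + N)   # addition of matrices
print(W.T)     # transpose
print(W @ v)   # matrix-vector product -> [13  3]
U = W @ V
print(U)       # matrix-matrix product -> [[13 32] [ 3  4]]

# Transpose of a product: (WV)^T = V^T W^T
print(np.allclose(U.T, V.T @ W.T))                    # True

# Inverse (only for square, non-singular matrices): U U^{-1} = I
print(np.allclose(U @ np.linalg.inv(U), np.eye(2)))   # True
```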

Important Special Cases

  • Scalar (Inner) product:

    $$\langle\boldsymbol{w}, \boldsymbol{v}\rangle = \boldsymbol{w}^{T} \boldsymbol{v}=\left[w_{1}, \ldots, w_{n}\right]\left[\begin{array}{c} v_{1} \\ \vdots \\ v_{n} \end{array}\right]=w_{1} v_{1}+\cdots+w_{n} v_{n}$$
  • Compute row/column averages of a matrix

    $$\boldsymbol{X}=\underbrace{\left[\begin{array}{ccc} X_{1,1} & \dots & X_{1, m} \\ \vdots & & \vdots \\ X_{n, 1} & \dots & X_{n, m} \end{array}\right]}_{n \text{ (samples) } \times m \text{ (entries)}}$$
    • Vector of row averages (average over all entries per sample)

      $$\left[\begin{array}{c} \frac{1}{m} \sum_{i=1}^{m} X_{1, i} \\ \vdots \\ \frac{1}{m} \sum_{i=1}^{m} X_{n, i} \end{array}\right]=\boldsymbol{X}\left[\begin{array}{c} \frac{1}{m} \\ \vdots \\ \frac{1}{m} \end{array}\right]=\boldsymbol{X} \boldsymbol{a}, \quad \text{with } \boldsymbol{a}=\left[\begin{array}{c} \frac{1}{m} \\ \vdots \\ \frac{1}{m} \end{array}\right]$$
    • Vector of column averages (average over all samples per entry)

      $$\left[\frac{1}{n} \sum_{i=1}^{n} X_{i, 1}, \ldots, \frac{1}{n} \sum_{i=1}^{n} X_{i, m}\right]=\left[\frac{1}{n}, \ldots, \frac{1}{n}\right] \boldsymbol{X}=\boldsymbol{b}^{T} \boldsymbol{X}, \quad \text{with } \boldsymbol{b}=\left[\begin{array}{c} \frac{1}{n} \\ \vdots \\ \frac{1}{n} \end{array}\right]$$
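A sketch of both averaging tricks in NumPy, reusing the data matrix from the ML example above (the comparison against `mean` is only an added sanity check of mine):

```python
import numpy as np

X = np.array([[37, 72, 175],
              [10, 30,  61],
              [25, 65, 121],
              [66, 67, 175]], dtype=float)
n, m = X.shape

a = np.full(m, 1.0 / m)   # vector a with entries 1/m
b = np.full(n, 1.0 / n)   # vector b with entries 1/n

row_avg = X @ a           # row averages:    X a   (one value per sample)
col_avg = b @ X           # column averages: b^T X (one value per entry)

print(np.allclose(row_avg, X.mean(axis=1)))  # True
print(np.allclose(col_avg, X.mean(axis=0)))  # True
```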

Calculus

  • "The derivative of a function of a real variable measures the sensitivity to change of a quantity (a function value or dependent variable) which is determined by another quantity (the independent variable)"

|            | Scalar | Vector |
|------------|--------|--------|
| Function   | $f(x)$ | $f(\boldsymbol{x})$ |
| Derivative | $\frac{\partial f(x)}{\partial x}=g$ | $\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}=\left[\frac{\partial f(\boldsymbol{x})}{\partial x_{1}}, \ldots, \frac{\partial f(\boldsymbol{x})}{\partial x_{d}}\right]^{T} =: \nabla f(\boldsymbol{x})$ (the gradient of $f$ at $\boldsymbol{x}$) |
| Min/Max    | $\frac{\partial f(x)}{\partial x}=0$ | $\frac{\partial f(\boldsymbol{x})}{\partial \boldsymbol{x}}=[0, \ldots, 0]^{T}=\mathbf{0}$ |
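A finite-difference check can make the gradient definition tangible; the function `f(x) = x^T x` and the helper `numerical_gradient` below are hypothetical examples of mine, not taken from the notes:

```python
import numpy as np

def f(x):
    # Example scalar-valued function of a vector: f(x) = x^T x
    return np.dot(x, x)

def numerical_gradient(f, x, eps=1e-6):
    # Central differences: one partial derivative per dimension x_1, ..., x_d
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

x = np.array([1.0, 2.0, 4.0])
print(numerical_gradient(f, x))  # ~[2. 4. 8.], i.e. the analytic gradient 2x
```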

Matrix Calculus

|           | Scalar | Vector |
|-----------|--------|--------|
| Linear    | $\frac{\partial a x}{\partial x}=a$ | $\nabla_{\boldsymbol{x}} \boldsymbol{A} \boldsymbol{x}=\boldsymbol{A}^{T}$ |
| Quadratic | $\frac{\partial x^{2}}{\partial x}=2 x$ | $\nabla_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{x}=2 \boldsymbol{x}$, $\quad \nabla_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x}=2 \boldsymbol{A} \boldsymbol{x}$ (for symmetric $\boldsymbol{A}$) |
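A quick numerical check of the quadratic rule $\nabla_{\boldsymbol{x}} \boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x}=2 \boldsymbol{A} \boldsymbol{x}$ for a symmetric $\boldsymbol{A}$ (the random matrix and the finite-difference comparison are my additions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
A = (A + A.T) / 2                  # symmetrize, so the 2Ax rule applies
x = np.array([1.0, 2.0, 4.0])

def quad(x):
    return x @ A @ x               # f(x) = x^T A x

eps = 1e-6
numeric = np.array([(quad(x + eps * e) - quad(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
analytic = 2 * A @ x               # gradient from the table above

print(np.allclose(numeric, analytic, atol=1e-5))  # True
```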