# Supervised/Unsupervised Learning

## Supervised learning

The training data you feed to the algorithm includes the desired solutions, called labels.

Typical tasks:

- Classification
- Regression

Important supervised learning algorithms:

- k-Nearest Neighbors
- Linear Regression
- Logistic Regression
- Support Vector Machines (SVMs)
- Decision Trees and Random Forests
- Neural Networks

## Unsupervised learning

The training data is unlabeled.
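The role of the labels can be illustrated with a tiny 1-nearest-neighbor sketch. The data points and labels below are made up for illustration; the classifier is "supervised" precisely because `y_train` (the desired solutions) is part of the training set:

```python
import numpy as np

# Toy training set: two clusters of 2-D points with known labels.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.8, 8.2]])
y_train = np.array([0, 0, 1, 1])  # the "desired solutions" (labels)

def predict_1nn(x):
    """Return the label of the closest training point."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

print(predict_1nn(np.array([1.1, 0.9])))  # near the first cluster -> 0
print(predict_1nn(np.array([8.1, 7.9])))  # near the second cluster -> 1
```

An unsupervised algorithm (e.g. k-Means) would receive only `X_train` and have to discover the two clusters on its own.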
2020-08-17
# TL;DR: Confusion Matrix, ROC, and AUC

## Confusion matrix

A confusion matrix tells you what your ML algorithm did right and what it did wrong. Each row is a prediction, each column a known truth, and each cell counts one of the four possible outcomes.

|                          | Known truth: Positive | Known truth: Negative |
| ------------------------ | --------------------- | --------------------- |
| **Prediction: Positive** | True Positive (TP)    | False Positive (FP)   |
| **Prediction: Negative** | False Negative (FN)   | True Negative (TN)    |

Derived metrics:

- Precision = TP / (TP + FP)
- TPR = Sensitivity = Recall = TP / (TP + FN)
- Specificity = TN / (FP + TN)
- FPR = FP / (FP + TN) = 1 - Specificity
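A minimal NumPy sketch of these counts and metrics; the prediction and truth vectors are made up for illustration:

```python
import numpy as np

# Hypothetical predictions vs. known truth (1 = positive, 0 = negative).
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))  # predicted +, truly +
FP = np.sum((y_pred == 1) & (y_true == 0))  # predicted +, truly -
FN = np.sum((y_pred == 0) & (y_true == 1))  # predicted -, truly +
TN = np.sum((y_pred == 0) & (y_true == 0))  # predicted -, truly -

precision   = TP / (TP + FP)
recall      = TP / (TP + FN)   # = TPR = sensitivity
specificity = TN / (FP + TN)
fpr         = FP / (FP + TN)   # = 1 - specificity

print(TP, FP, FN, TN)          # 3 1 1 3
print(precision, recall)       # 0.75 0.75
```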
2020-08-17
# 1. Look at the big picture

## 1.1 Frame the problem

Consider the business objective: how do we expect to use and benefit from this model?

## 1.2 Select a performance measure
2020-08-17
# Linear Algebra

## Vectors

A vector is a multi-dimensional quantity; each dimension contains a different piece of information (e.g. age, weight, height, …). Vectors are represented as bold symbols.

A vector $\boldsymbol{x}$ is always a column vector:

$$
\boldsymbol{x}=\left[\begin{array}{l} {1} \\\\ {2} \\\\ {4} \end{array}\right]
$$

A transposed vector $\boldsymbol{x}^T$ is a row vector:

$$
\boldsymbol{x}^{T}=\left[\begin{array}{lll} {1} & {2} & {4} \end{array}\right]
$$

## Vector operations

Multiplication by a scalar:

$$
2\left[\begin{array}{l} {1} \\\\ {2} \end{array}\right]=\left[\begin{array}{l} {2} \\\\ {4} \end{array}\right]
$$

Addition of vectors:

$$
\left[\begin{array}{l}{1} \\\\ {2} \end{array}\right]+\left[\begin{array}{l}{3} \\\\ {1}\end{array}\right]=\left[\begin{array}{l}{4} \\\\ {3} \end{array}\right]
$$

Scalar (inner) product: sum the element-wise products.

$$
\boldsymbol{v}=\left[\begin{array}{c}{1} \\\\ {2} \\\\ {4}\end{array}\right], \quad \boldsymbol{w}=\left[\begin{array}{l}{2} \\\\ {4} \\\\ {8}\end{array}\right]
$$

$$
\langle \boldsymbol{v}, \boldsymbol{w}\rangle= 1 \cdot 2+2 \cdot 4+4 \cdot 8=42
$$

Length of a vector: square root of the inner product with itself.

$$
\|\boldsymbol{v}\|=\langle\boldsymbol{v}, \boldsymbol{v}\rangle^{\frac{1}{2}}=\left(1^{2}+2^{2}+4^{2}\right)^{\frac{1}{2}}=\sqrt{21}
$$

## Matrices

A matrix is a rectangular array of numbers arranged in rows and columns.
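These operations map directly onto NumPy, using the same example vectors as above:

```python
import numpy as np

v = np.array([1, 2, 4])
w = np.array([2, 4, 8])

print(2 * np.array([1, 2]))                 # scalar multiplication -> [2 4]
print(np.array([1, 2]) + np.array([3, 1]))  # vector addition       -> [4 3]
print(np.dot(v, w))                         # inner product         -> 42
print(np.linalg.norm(v))                    # length = sqrt(21) ≈ 4.583
```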
2020-08-17
# SVM (with Features)

Maximum margin principle: slack variables $\xi_i$ allow for margin violations.

$$
\begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &\|\mathbf{w}\|^{2} + C \sum_i^N \xi_i \\\\ \text { s.t. } \quad & y_{i}\left(\mathbf{w}^{T} \color{red}{\phi(\mathbf{x}_{i})} + b\right) \geq 1 -\xi_i, \quad \xi_i \geq 0\end{array}
$$

## Math basics

Solve the constrained optimization problem with the method of Lagrange multipliers.
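At the optimum each slack $\xi_i$ equals the hinge loss $\max(0,\, 1 - y_i(\mathbf{w}^T\phi(\mathbf{x}_i) + b))$, so the objective can be evaluated directly for given parameters. A minimal sketch with made-up data, assuming the identity feature map $\phi(\mathbf{x}) = \mathbf{x}$:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """||w||^2 + C * sum of slacks, where each slack is the hinge loss
    max(0, 1 - y_i (w^T x_i + b)).  Identity feature map assumed."""
    margins = y * (X @ w + b)
    slacks = np.maximum(0.0, 1.0 - margins)  # xi_i >= 0, nonzero on violations
    return w @ w + C * slacks.sum()

# Hypothetical toy data, labels in {-1, +1}.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([1.0, 0.0]); b = 0.0

# The third point has margin 0.5 < 1, so its slack is 0.5.
print(soft_margin_objective(w, b, X, y, C=1.0))  # 1 + 1*0.5 = 1.5
```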
2020-07-13
# Kernel Function

Given a mapping function $\phi: \mathcal{X} \rightarrow \mathcal{V}$, the function

$$
\mathcal{K}: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}, \quad \mathcal{K}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)=\left\langle\phi(\mathbf{x}), \phi\left(\mathbf{x}^{\prime}\right)\right\rangle_{\mathcal{V}}
$$

is called a kernel function.

> "A kernel is a function that returns the result of a dot product performed in another space."
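A concrete check of this definition, using the homogeneous degree-2 polynomial kernel $\mathcal{K}(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T\mathbf{x}')^2$, whose explicit feature map in 2-D is $\phi(\mathbf{x}) = (x_1^2,\, x_2^2,\, \sqrt{2}\,x_1 x_2)$; the input vectors are made up for illustration:

```python
import numpy as np

x  = np.array([1.0, 2.0])
xp = np.array([3.0, 1.0])

def phi(x):
    """Explicit feature map of the degree-2 polynomial kernel."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

# Dot product performed in the feature space V ...
lhs = phi(x) @ phi(xp)
# ... equals the kernel evaluated in the input space X, no mapping needed.
rhs = (x @ xp) ** 2

print(lhs, rhs)  # both 25.0
```

This is the point of the quote above: the kernel computes the feature-space dot product without ever constructing $\phi(\mathbf{x})$.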
2020-07-13
# 🎯 Goal of SVM

To find the optimal separating hyperplane, which:

- maximizes the margin of the training data,
- correctly classifies the training data,
- is the one that will generalize best to unseen data (it stays as far as possible from the data points of each category).

## SVM math formulation

Assume the data is linearly separable.
2020-07-13
Class label:

$$
y_i \in \\{0, 1\\}
$$

The conditional probability distribution of the class label is

$$
\begin{aligned} p(y=1|\boldsymbol{x}) &= \sigma(\boldsymbol{w}^T\boldsymbol{x}+b) \\\\ p(y=0|\boldsymbol{x}) &= 1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}+b) \end{aligned}
$$

with the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$.
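A minimal numeric check of these two probabilities; the weights and input below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input.
w = np.array([0.5, -0.25]); b = 0.1
x = np.array([2.0, 1.0])

z  = w @ x + b        # linear score: 0.5*2 - 0.25*1 + 0.1 = 0.85
p1 = sigmoid(z)       # p(y=1 | x)
p0 = 1.0 - p1         # p(y=0 | x)
print(p1 + p0)        # the two probabilities sum to 1
```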
2020-07-13
# 💡 Use a Regression Algorithm for Classification

Logistic regression estimates the probability that an instance belongs to a particular class. If the estimated probability is greater than 50%, the model predicts that the instance belongs to that class (the positive class, labeled "1"); otherwise it predicts that it does not (i.e., the instance belongs to the negative class, labeled "0").
2020-07-13
# What Does the Objective Function Look Like?

Objective function:

$$
\operatorname{Obj}(\Theta)= \overbrace{L(\Theta)}^{\text {Training Loss}} + \underbrace{\Omega(\Theta)}_{\text{Regularization}}
$$

Training loss: measures how well the model fits the training data.

$$
L=\sum_{i=1}^{n} l\left(y_{i}, \hat{y}_{i}\right)
$$

- Square loss: $l(y_i, \hat{y}_i) = (y_i - \hat{y}_i)^2$
- Logistic loss: $l(y_i, \hat{y}_i) = y_i \log(1 + e^{-\hat{y}_i}) + (1 - y_i) \log(1 + e^{\hat{y}_i})$

Regularization: measures how complicated the model is.
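The two losses can be sketched directly; note that in the logistic loss $\hat{y}_i$ is a raw score (a logit), not a probability. Labels and scores below are made up for illustration:

```python
import numpy as np

def square_loss(y, y_hat):
    """(y - y_hat)^2 -- for regression targets."""
    return (y - y_hat) ** 2

def logistic_loss(y, y_hat):
    """y in {0, 1}; y_hat is the raw model score (logit)."""
    return y * np.log1p(np.exp(-y_hat)) + (1 - y) * np.log1p(np.exp(y_hat))

print(square_loss(2.0, 1.5))   # 0.25
# A confident correct score gives a small loss, a wrong one a large loss:
print(logistic_loss(1, 3.0))   # small (~0.049)
print(logistic_loss(1, -3.0))  # large (~3.049)
```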
2020-07-06