Linear Discriminant Analysis (LDA), also called Fisher’s Linear Discriminant, reduces dimensionality (like PCA) but focuses on maximizing separability among known categories 💡 Idea Create a new axis Project the data onto this new axis so as to maximize the separation of the two categories How does it work?
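The projection idea can be sketched with scikit-learn's `LinearDiscriminantAnalysis` (a minimal example on synthetic two-class data; the cluster means and sizes are illustrative assumptions):

```python
# Minimal LDA sketch: project 2-D, two-class data onto the single
# discriminant axis that maximizes between-class separation.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),   # class 0 cluster
               rng.normal(3, 1, (50, 2))])  # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1)  # one new axis
Z = lda.fit_transform(X, y)                       # projected data

print(Z.shape)          # (100, 1): each sample mapped to the new axis
print(lda.score(X, y))  # training accuracy on the separated projection
```

Unlike PCA, the axis is chosen using the class labels, not just the variance of the data.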
2020-11-07
No assumption about the underlying distributions -> non-parametric Linear decision surfaces Begins with supervised training (the class of each training sample is given) Linear Discriminant Functions and Decision Surfaces A discriminant function that is a linear combination of the components of $x$ can be written as $$ g(\mathbf{x})=\mathbf{w}^{T} \mathbf{x}+w\_{0} $$ $\mathbf{x}$: feature vector $\mathbf{w}$: weight vector $w\_0$: bias or threshold weight The two-category case Decision rule:
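The two-category decision rule (decide class 1 if $g(\mathbf{x}) > 0$, class 2 otherwise) can be sketched directly; the weight values below are illustrative assumptions, not learned parameters:

```python
# Sketch of a linear discriminant function g(x) = w^T x + w0
# and the two-category decision rule based on its sign.
import numpy as np

w = np.array([1.0, -2.0])   # weight vector (illustrative values)
w0 = 0.5                    # bias / threshold weight

def g(x):
    """Linear discriminant function g(x) = w^T x + w0."""
    return w @ x + w0

def decide(x):
    """Decide class 1 if g(x) > 0, otherwise class 2."""
    return 1 if g(x) > 0 else 2

print(decide(np.array([3.0, 0.0])))  # g = 3.5 > 0  -> class 1
print(decide(np.array([0.0, 2.0])))  # g = -3.5 < 0 -> class 2
```

The decision surface $g(\mathbf{x}) = 0$ is the hyperplane $\mathbf{w}^T\mathbf{x} = -w_0$, with $\mathbf{w}$ normal to it.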
2020-11-07
Tree-based Methods CART: Classification And Regression Tree Grow a binary tree At each node, “split” the data into two “daughter” nodes. Splits are chosen using a splitting criterion. Bottom nodes are “terminal” nodes.
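A small CART-style sketch using scikit-learn's `DecisionTreeClassifier`, which grows a binary tree by repeatedly splitting nodes with a criterion (Gini impurity here); the dataset and depth limit are illustrative choices:

```python
# Grow a binary classification tree on the iris dataset and
# inspect its terminal ("leaf") nodes.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(tree.get_n_leaves())  # number of terminal nodes
print(tree.score(X, y))     # training accuracy
```

Each internal node splits the data into two daughter nodes; predictions are made at the terminal nodes.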
2020-10-27
Assign a class label to the input sample.
2020-09-07
SVM (with features) Maximum margin principle Slack variables allow for margin violations $$ \begin{array}{ll} \underset{\mathbf{w}}{\operatorname{argmin}} \quad &\|\mathbf{w}\|^{2} + C \sum_{i=1}^{N} \xi_i \\\\ \text { s.t. } \quad & y_{i}\left(\mathbf{w}^{T} \color{red}{\phi(\mathbf{x}_{i})} + b\right) \geq 1 -\xi_i, \quad \xi_i \geq 0\end{array} $$ Math basics Solve the constrained optimization problem: method of Lagrange multipliers
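The role of $C$ in the soft-margin objective can be seen with scikit-learn's `SVC` (a sketch on synthetic overlapping clusters; the cluster parameters and $C$ values are illustrative assumptions):

```python
# Soft-margin SVM sketch: C trades off the margin term ||w||^2
# against the total slack sum_i xi_i (margin violations).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (40, 2)),  # class -1 cluster
               rng.normal(1, 1, (40, 2))])  # class +1 cluster
y = np.array([-1] * 40 + [1] * 40)

loose = SVC(kernel="linear", C=0.01).fit(X, y)   # small C: more violations tolerated
tight = SVC(kernel="linear", C=100.0).fit(X, y)  # large C: violations penalized heavily

# A looser margin typically touches or violates more points,
# so it tends to keep more support vectors.
print(len(loose.support_), len(tight.support_))
```

In the limit $C \to \infty$ all $\xi_i$ are forced to zero and the hard-margin formulation is recovered.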
2020-07-13
Kernel function Given a mapping function $\phi: \mathcal{X} \rightarrow \mathcal{V}$, the function $$ \mathcal{K}: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}, \quad \mathcal{K}\left(\mathbf{x}, \mathbf{x}^{\prime}\right)=\left\langle\phi(\mathbf{x}), \phi\left(\mathbf{x}^{\prime}\right)\right\rangle_{\mathcal{V}} $$ is called a kernel function. “A kernel is a function that returns the result of a dot product performed in another space.”
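The definition can be checked numerically for the degree-2 polynomial kernel: with the feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, the kernel $\mathcal{K}(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T\mathbf{x}')^2$ equals the explicit dot product in feature space (a minimal sketch; the input vectors are arbitrary):

```python
# Verify K(x, x') = <phi(x), phi(x')> for the homogeneous
# degree-2 polynomial kernel in two dimensions.
import numpy as np

def phi(x):
    """Explicit feature map into 3-D space."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def K(x, xp):
    """Kernel: same dot product, computed in the original space."""
    return (x @ xp) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(K(x, xp), phi(x) @ phi(xp))  # both sides agree
```

The kernel avoids ever constructing $\phi(\mathbf{x})$, which is what makes high- (or infinite-) dimensional feature spaces tractable.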
2020-07-13
🎯 Goal of SVM To find the optimal separating hyperplane which maximizes the margin of the training data it correctly classifies the training data it is the one that will generalize better to unseen data (as far as possible from the data points of each category) SVM math formulation Assuming the data is linearly separable
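For a linear SVM the margin width is $2 / \|\mathbf{w}\|$, the quantity the optimization maximizes. A sketch with scikit-learn (the toy points are illustrative, and a very large $C$ is used to approximate the hard-margin case):

```python
# Fit a (near) hard-margin linear SVM on separable points and
# recover the margin width 2 / ||w|| from the learned weights.
import numpy as np
from sklearn.svm import SVC

X = np.array([[-2.0, 0.0], [-1.5, 1.0],   # class -1
              [2.0, 0.0], [1.5, -1.0]])   # class +1
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)
print(margin)  # width between the two supporting hyperplanes
```

Maximizing $2 / \|\mathbf{w}\|$ is equivalent to minimizing $\|\mathbf{w}\|^2$, which gives the quadratic program above.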
2020-07-13
Class label: $$ y_i \in \\{0, 1\\} $$ Conditional probability distribution of the class label is $$ \begin{aligned} p(y=1|\boldsymbol{x}) &= \sigma(\boldsymbol{w}^T\boldsymbol{x}+b) \\\\ p(y=0|\boldsymbol{x}) &= 1 - \sigma(\boldsymbol{w}^T\boldsymbol{x}+b) \end{aligned} $$ with the logistic sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$.
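The two conditional probabilities can be computed directly (a sketch; the weight vector, bias, and input are illustrative values):

```python
# p(y=1|x) = sigma(w^T x + b), p(y=0|x) = 1 - p(y=1|x),
# with the logistic sigmoid sigma(z) = 1 / (1 + exp(-z)).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.0   # illustrative parameters
x = np.array([1.0, 1.0])

p1 = sigmoid(w @ x + b)  # p(y=1 | x)
p0 = 1.0 - p1            # p(y=0 | x)
print(p1, p0)            # the two probabilities sum to 1
```

Because $\sigma$ maps $\mathbb{R}$ into $(0, 1)$, the linear score $\boldsymbol{w}^T\boldsymbol{x}+b$ is turned into a valid probability.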
2020-07-13
💡 Use a regression algorithm for classification Logistic regression: estimate the probability that an instance belongs to a particular class If the estimated probability is greater than 50%, then the model predicts that the instance belongs to that class (called the positive class, labeled “1”), or else it predicts that it does not (i.e., that it belongs to the negative class, labeled “0”).
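The 50% decision rule can be sketched with scikit-learn's `LogisticRegression` (the tiny 1-D dataset below is an illustrative assumption):

```python
# Threshold the estimated probability at 0.5: predict the positive
# class ("1") when p(y=1|x) > 50%, otherwise the negative class ("0").
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba([[4.0]])[0, 1]  # estimated p(y=1 | x=4)
pred = clf.predict([[4.0]])[0]            # thresholded decision

print(proba > 0.5, pred)  # probability above 50% -> class 1
```

`predict` is exactly `predict_proba` followed by the 50% threshold, so the two always agree.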
2020-07-13
Classification models.
2020-07-13