ML Algo overview

Overview of Machine Learning Algorithms

Supervised/Unsupervised Learning

Supervised learning

The training data you feed to the algorithm includes the desired solutions, called labels

Typical task:

  • Classification
  • Regression

Important supervised learning algo:

  • k-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
  • Support Vector Machine (SVM)
  • Decision Trees and Random Forests
  • Neural Networks

Unsupervised learning

Training data is unlabeled.

Important unsupervised learning algo:

  • Clustering

    • K-Means
    • DBSCAN
    • Hierarchical Cluster Analysis (HCA)
  • Anomaly detection and novelty detection

    • One-class SVM
    • Isolation Forest
  • Visualization and dimensionality reduction

    • Principal Component Analysis (PCA)
    • Kernel PCA
    • Locally-Linear Embedding (LLE)
    • t-distributed Stochastic Neighbor Embedding (t-SNE)
  • Association rule learning

    • Apriori
    • Eclat

Semisupervised learning (supervised + unsupervised)

Deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data

Reinforcement Learning

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return or penalties in the form of negative rewards.

It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.

A policy defines what action the agent should choose when it is in a given situation.

Batch and Online Learning

whether the system can learn incrementally from a stream of incoming data or not

Batch Learning

The system muss be trained using all the available data (I.e., it is incapable of learning incrementally)

First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.

Want a batch learning system to know about new data?

Need to train a new version of the system from scratch on the full dataset (not just the new data, but also the old data). Then stop the old system and replace it with the new one.

Online Learning

Train the system incrementally by feeding it data instances sequentially, either individually or by small groups called mini-batches.

Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.

👍 Advantages:

  • Great for systems that receive data as a continuous flow and need to adapt to chagne rapidly or autonomously
  • Save a huge amount of space (After learning the new data instance, do not need them anymore and can just discard them)

😠 Challenge: if bad data is fed to the system, the system’s performance will gradually decline.

🔧 Solution:

  • monitor the system closely
  • promptly switch learning off if detect a drop in performance
  • monitor the input data and react to abnormal data

Instance-Based Vs. Model-Based Learning

Instance-based learning

The system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure

Model-based learning

Build a model of these examples, then use that model to make predictions