Machine Learning (ML)#

Machine Learning is a branch of Artificial Intelligence that focuses on models and algorithms that let computers learn from data and improve with experience without being explicitly programmed. There are many types of machine learning.

Note

Some methods fit into multiple categories or can be adapted to be used for other categories. For the sake of brevity, these cases are not always mentioned here.

Note

Neural Networks, Deep Learning, AutoEncoders, DenseNets, etc. don’t fit neatly into these categories: they are model architectures rather than learning paradigms, so they can be trained under several of the categories below (supervised, unsupervised, self-supervised, or reinforcement).

Categories#

  • Supervised Learning - Use labeled data.

    • Classification - Predict categorical (discrete) values.

    • Regression - Predict continuous numerical values.

    • Classification|Regression - Some models can perform either Classification or Regression.

    • Ensemble Learning - Combine multiple models of either type into one better model.

      • Bagging (Bootstrap Aggregating) Method - Train models independently on bootstrap samples (random subsets of the data drawn with replacement), then combine their predictions (e.g., by voting or averaging).

      • Boosting Method - Train models sequentially, with each model focusing on the errors of prior models, then combine their predictions with weights.

      • Stacking (Stacked Generalization) Method - Train multiple different models (often of different types), then use their predictions as inputs to a final “meta-model”.

  • Unsupervised Learning - Use unlabeled data.

    • Clustering - Group data into clusters based on similarity.

      • Centroid-Based (Partitioning) Clustering - cluster points around centroids; the number of clusters must be chosen in advance.

      • Distribution-Based Clustering - model the data as a mixture of probability distributions and assign points by likelihood.

      • Connectivity-Based (Hierarchical) Clustering - cluster with tree-like nested groupings based on connections between points.

      • Density-Based Clustering - treat clusters as contiguous regions of high data density separated by areas of lower density.

    • Dimensionality Reduction - Simplify datasets by reducing features while keeping important information (often used to select features for other models).

    • Association Rule Mining - Discover rules where the presence of one item in a dataset implies the likely presence of another (e.g., shoppers who buy bread often also buy butter).

  • Reinforcement Learning - Agent learns by interacting with environment via trial and error and receiving reward feedback.

    • Model-Based Methods - interact with a simulated model of the environment, helping the agent plan actions by simulating potential results.

    • Model-Free Methods - interact with the actual environment, learning directly from experience.

  • Forecasting Models - Use past data to predict future trends (often time series problems).

  • Semi-Supervised Learning - Use a small amount of labeled data together with a larger amount of unlabeled data.

  • Self-Supervised Learning - Generates its own labels from unlabeled data.

Supervised Learning#

Classification#

Regression#

Classification|Regression#

Ensemble Learning#

Bagging#
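
A rough sketch using scikit-learn’s BaggingClassifier (assuming that library is installed; Random Forests are the most famous bagging method):

```python
# Bagging sketch: many trees trained on bootstrap samples, predictions combined by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))
```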

Boosting#
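
A rough sketch using scikit-learn’s GradientBoostingClassifier (assuming that library is installed; the hyperparameter values are arbitrary):

```python
# Boosting sketch: shallow trees added sequentially, each one fitting the
# errors of the ensemble built so far.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
boost.fit(X_train, y_train)
print(boost.score(X_test, y_test))
```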

Stacking#

  • Stacks methods discussed above, like K-Nearest Neighbors, Perceptron, and Logistic Regression (see the sketch below)
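
A rough sketch of the idea using scikit-learn (assuming it’s installed): K-Nearest Neighbors and a Perceptron act as base models, with Logistic Regression as the meta-model.

```python
# Stacking sketch: base-model predictions become the meta-model's inputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("perceptron", Perceptron())],
    final_estimator=LogisticRegression(),  # meta-model trained on base predictions
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```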

Unsupervised Learning#

Clustering#

Centroid-Based#

  • K-Means Clustering - groups data into K clusters based on how close the points are to each other. Iteratively assigns each point to its nearest centroid, then recomputes the centroids, repeating until the assignments stabilize. The Elbow Method can help choose a good value for K (see the sketch after this list)

  • K-Means++ Clustering - improves K-Means by choosing the initial cluster centers intelligently (spread far apart) instead of randomly

  • K-Medoids Clustering - similar to K-Means, but uses actual data points (medoids) as the centers, making it more robust to outliers

  • FCM (Fuzzy C-Means Clustering) - similar to K-Means but uses Fuzzy Clustering, allowing each data point to belong to multiple clusters with varying degrees of membership

  • K-Modes Clustering - works on categorical data, unlike K-Means which is for numerical data
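
A minimal K-Means sketch using scikit-learn (assuming it’s installed); the printed inertia values are what the Elbow Method plots to pick K:

```python
# K-Means sketch; init="k-means++" is the smarter seeding described above
# and is scikit-learn's default.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Elbow Method: fit for several K and look for the "elbow" in inertia
# (the within-cluster sum of squared distances).
for k in range(1, 8):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)
```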

Distribution-Based#

  • GMM (Gaussian Mixture Models) - fits data as a weighted mixture of Gaussian distributions and assigns data points based on likelihood

  • DPMMs (Dirichlet Process Mixture Models) - extension of Gaussian Mixture Models that can automatically decide the number of clusters based on the data

  • EM (Expectation-Maximization) Algorithm - estimates unknown parameters by alternating an E-Step (Expectation Step), which computes the expected values of the missing/hidden variables under the current parameters, with an M-Step (Maximization Step), which updates the parameters to maximize the expected log-likelihood from the E-Step. This is how GMMs are typically fit (see the sketch after this list)
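
A minimal GMM sketch using scikit-learn (assuming it’s installed); GaussianMixture fits the mixture with the EM algorithm internally:

```python
# GMM sketch: fit a 3-component Gaussian mixture, then read off
# hard assignments and soft (per-cluster) membership probabilities.
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)        # hard assignment by highest likelihood
probs = gmm.predict_proba(X)   # soft membership, one probability per cluster
print(probs[:3].round(2))
```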

Connectivity-Based#

  • Hierarchical Clustering - create clusters by building a tree step-by-step, merging or splitting groups

  • Agglomerative Clustering - (Bottom-up) starts with each point as its own cluster and iteratively merges the closest ones (see the sketch after this list)

  • Divisive Clustering - (Top-down) starts with one cluster and splits iteratively into smaller clusters

  • Spectral Clustering - groups data by analyzing connections between points using graphs

  • AP (Affinity Propagation) - identifies clusters by sending messages between data points and determines the number of clusters automatically
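
A minimal agglomerative (bottom-up) sketch using scikit-learn (assuming it’s installed):

```python
# Agglomerative (bottom-up hierarchical) clustering sketch.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# linkage="ward" merges the pair of clusters that least increases variance.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)
print(agg.labels_[:20])
```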

Density-Based#
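
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise) - the canonical density-based algorithm: grows clusters from points that have enough nearby neighbors, labels points in sparse regions as noise, and does not need the number of clusters chosen in advance

A minimal sketch using scikit-learn (assuming it’s installed); the eps and min_samples values are arbitrary:

```python
# DBSCAN sketch: eps is the neighborhood radius, min_samples the density
# threshold; points that belong to no dense region get the noise label -1.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
print(set(db.labels_))  # cluster ids, plus -1 for any noise points
```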

Dimensionality Reduction#
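
  • PCA (Principal Component Analysis) - probably the most common example: projects the data onto the directions of greatest variance and keeps only the top few components as the new features

A minimal sketch using scikit-learn (assuming it’s installed), squeezing 64-pixel digit images down to 2 features:

```python
# PCA sketch: reduce the 64-dimensional digits dataset to 2 components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)
print(X_2d.shape, pca.explained_variance_ratio_)  # variance kept per component
```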

Association Rule Mining#
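
  • Apriori Algorithm - the classic example: finds itemsets that frequently appear together, then derives rules like “bread ⇒ butter” from them

A sketch using the mlxtend package (an assumption on my part; the tiny basket data is made up):

```python
# Apriori sketch with mlxtend (assuming: pip install mlxtend).
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded transactions: each row is one shopping basket.
baskets = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 1, 0]],
    columns=["bread", "butter", "jam"],
).astype(bool)

frequent = apriori(baskets, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```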

Reinforcement Learning#

Model-Based#

  • MDPs (Markov Decision Processes) - describe step-by-step decisions where the results of actions are uncertain (each action has probabilistic outcomes). Dynamic-programming solvers such as Value Iteration do sweep over every state and action (see the sketch after this list).

  • Monte Carlo Tree Search - designed to solve problems with huge decision spaces, like the board game Go with \(10^{170}\) possible board states, by building a search tree iteratively/randomly instead of exploring all possible moves.
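
A tiny Value Iteration sketch on a made-up two-state MDP (all transitions and rewards here are illustrative assumptions), showing the “evaluate every state and action” idea:

```python
# Value Iteration on a toy MDP (hypothetical transitions/rewards for illustration).
# P[state][action] = list of (probability, next_state, reward) outcomes.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9                    # discount factor
V = {s: 0.0 for s in P}        # value of each state, initially zero

for _ in range(100):           # sweep until the values converge
    V = {
        s: max(                # best action = max over expected returns
            sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for outcomes in P[s].values()
        )
        for s in P
    }
print(V)  # state values under the optimal policy
```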

Model-Free#

  • Q-Learning - makes trial-and-error guesses, building and updating a Q-table of Q-values that estimate how good it is to take a specific action in a given state (see the sketch after this list).

  • Deep Q-Learning - Regular Q-Learning is good for small problems, but struggles on complex ones (like images) since the Q-table gets huge and computationally expensive. Deep Q-Learning fixes this by using a neural network to estimate the Q-values instead of a Q-table

  • SARSA (State-Action-Reward-State-Action) - learns by exploring and receiving feedback like Q-Learning, but on-policy: it updates each Q-value using the action the agent actually takes next (including exploratory moves), rather than the best possible next action.

  • REINFORCE Algorithm - instead of estimating how good each action is, just tries actions and adjusts the chances of those actions based on the total reward afterwards

  • Actor-Critic Algorithm - combines an Actor (which selects actions via a Policy Gradient) and Critic (which evaluates the Actor via a Value Function), both of which learn (like your Loss function is getting smarter alongside your model)

  • A3C (Asynchronous Advantage Actor-Critic) - uses multiple agents which learn in parallel, each interacting with their own private environments, then contribute their updates to a shared global model.
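
A bare-bones tabular Q-Learning sketch on a made-up 5-state corridor (the environment and hyperparameters are toy assumptions, not a standard benchmark):

```python
# Tabular Q-Learning on a toy corridor: states 0..4, reward only at state 4.
import random

n_states, actions = 5, [-1, +1]        # move left or right
alpha, gamma, eps = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}

for _ in range(500):                   # episodes
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit, sometimes explore
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-Learning update: bootstrap from the best action in the next state
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

# Greedy policy learned per state (should always move right, +1)
print({s: max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states - 1)})
```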

Forecasting Models#

  • ARIMA (Auto-Regressive Integrated Moving Average) - Combines Autoregression (AR), Differencing (I), and Moving Averages (MA) to capture patterns and predict future values based on historical data. Not great with seasonal data (see the sketch after this list).

  • SARIMA (Seasonal ARIMA) - extension of ARIMA designed for time series data with seasonal patterns.

  • Exponential Smoothing - assumes future patterns will be similar to recent past data and focuses on learning the average demand level over time. Simple and accurate for short-term forecasts, but not great for long-term ones. Variants include Simple, Double, and Triple (Holt-Winters) Exponential Smoothing.

  • RNNs (Recurrent Neural Networks) (TensorFlow Example) - neural networks with recurrent connections, so the output from one step is fed back in as part of the next step’s input, giving the network a memory of earlier inputs. They have many uses beyond forecasting, such as text generation
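
A short ARIMA sketch using statsmodels (assuming it’s installed); the (p, d, q) order here is an arbitrary illustration, not a tuned choice:

```python
# ARIMA sketch with statsmodels: fit on a toy trending series, forecast 5 steps.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(0.5, 1.0, size=100))  # toy upward-drifting series

model = ARIMA(series, order=(1, 1, 1)).fit()  # AR=1, I=1 (one differencing), MA=1
print(model.forecast(steps=5))
```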

Semi-Supervised Learning#

  • Self-Training - The model is first trained on labeled data. It then predicts labels for unlabeled data, iteratively adding high-confidence predictions to the labeled set to refine the model. This includes Pseudo-Labelling (see the sketch after this list).

  • Co-Training - Two or more models are trained on different feature subsets of the data (like one model looks at the body of an email, another looks at the subject and sender, etc). Each model labels unlabeled data for the other, enabling them to learn from complementary views.

  • Multi-View Training - A variation of co-training where models train on different data representations (e.g., images and text) to predict the same output.

  • Graph-Based Models (Label Propagation) - Data is represented as a graph with nodes (data points) and edges (similarities). Labels are propagated from labeled nodes to unlabeled ones based on graph connectivity.

  • GAN (Generative Adversarial Network) (PyTorch Example) - create new, realistic data by learning from existing examples (creates good synthetic data)

  • Few-Shot Learning - a meta-learning process where the model is trained to adapt quickly to new, unseen tasks from only a handful of examples, so it doesn’t need a large task-specific training set. The quick adaptation happens later, at inference time, using the few examples provided with the task.
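
A minimal Self-Training/Pseudo-Labelling loop (a sketch using scikit-learn’s LogisticRegression; the 0.9 confidence threshold and 10% labeled fraction are arbitrary assumptions):

```python
# Self-Training sketch: iteratively promote high-confidence pseudo-labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:40] = True                      # pretend only 10% of labels are known

model = LogisticRegression()
for _ in range(5):                       # a few self-training rounds
    model.fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[~labeled])
    confident = probs.max(axis=1) > 0.9  # keep only high-confidence predictions
    if not confident.any():
        break
    idx = np.flatnonzero(~labeled)[confident]
    y[idx] = model.predict(X[idx])       # pseudo-labels fill in unknown labels
    labeled[idx] = True

print(labeled.sum(), "points labeled after self-training")
```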

Self-Supervised Learning#

  • Haven’t found specific library examples for this yet; most links are to research papers. A generic sketch of the idea is below.
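
For a concrete flavor of the idea (a generic sketch, not taken from any specific source): hide part of each input and train a model to predict the hidden part, so the unlabeled data supplies its own labels.

```python
# Self-supervised sketch: the "labels" come from the data itself.
# Here a linear model learns to predict one masked feature from the others.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
X[:, 7] = X[:, :7] @ rng.normal(size=7)  # toy structure for the model to recover

inputs, targets = X[:, :7], X[:, 7]      # "mask" column 7 and predict it
model = LinearRegression().fit(inputs, targets)
print(model.score(inputs, targets))      # performance on the pretext task
```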