Review 5#

These notes were completed with the assistance of ChatGPT.

ToC#

  • Probabilistic Graphical Models

  • Representation and inference

  • Hidden Markov Model

  • Gaussian Mixture Model

HMM#

Learning Resources#

K-NN & K-Means#

| Feature/Aspect | K-NN (K-Nearest Neighbors) | K-Means |
| --- | --- | --- |
| Type of algorithm | Supervised learning | Unsupervised learning |
| Purpose | Classification or regression | Clustering |
| Input data | Labeled data for training | Unlabeled data |
| Output | Predicted label/value | Cluster centroids |
| Training requirement | No explicit training phase (lazy learner); stores all training data | Yes, iteratively finds centroids |
| Prediction mechanism | Votes from the ‘K’ closest points | Assigns to the nearest centroid |
| Parameter ‘K’ | Number of neighbors to consider | Number of clusters |
| Distance metric | Typically Euclidean, but can be others | Typically Euclidean |
| Sensitivity to outliers | Sensitive (depends on ‘K’) | Can be sensitive |
| Scalability | Can be computationally expensive for large datasets (unless optimized, e.g. with KD-trees) | Can be more scalable with techniques like MiniBatch K-Means |
| Real-time updates | Easy: just add data to the dataset | May require re-running the algorithm |
| Interpretability | Direct relationship between input and output based on distance | Centroids represent cluster ‘centers’ but may not correspond to actual data points |
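The contrast in the table can be made concrete with small from-scratch sketches: a K-NN prediction is a vote among the ‘K’ nearest labeled points, while K-Means alternates between assigning points to the nearest centroid and recomputing centroids. A minimal illustration (function names like `knn_predict` and `kmeans` are my own, not from any library; the first-k-points init is deliberately naive):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    # Supervised: vote among the labels of the k closest training points.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def kmeans(X, k=2, iters=10):
    # Unsupervised: alternate assignment to nearest centroid and centroid update.
    centroids = list(X[:k])  # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in X:
            idx = min(range(k), key=lambda i: math.dist(x, centroids[i]))
            clusters[idx].append(x)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids
```

For example, with `train_X = [(0, 0), (0, 1), (5, 5), (6, 5)]` and labels `['a', 'a', 'b', 'b']`, `knn_predict(train_X, labels, (5, 6), k=3)` returns `'b'`, while `kmeans(train_X, k=2)` converges to one centroid near each group of points.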

Gaussian#

EM#

  • Soft assignment: the E-step computes, for each data point, the probability (responsibility) that it was generated by each component distribution, rather than assigning it to a single component.
