Review 5#
These notes were completed with the assistance of ChatGPT.
ToC#
- Probabilistic Graphical Models 
  - representation, inference 
- Hidden Markov Model 
- Gaussian Mixture Model 
HMM#
Learning Resources#
KNN & K-Means#
| Feature/Aspect | K-NN (K-Nearest Neighbors) | K-Means | 
|---|---|---|
| Type of Algorithm | Supervised Learning | Unsupervised Learning | 
| Purpose | Classification or Regression | Clustering | 
| Input Data | Labeled data for training | Unlabeled data | 
| Output | Predicted label/value | Cluster centroids | 
| Training Requirement | No explicit training phase (lazy learner); it simply stores the training data | Yes; iteratively updates centroids until convergence | 
| Prediction Mechanism | Votes from ‘K’ closest points | Assigns to nearest centroid | 
| Parameter ‘K’ | Number of neighbors to consider | Number of clusters | 
| Distance Metric | Typically Euclidean, but can be others | Typically Euclidean | 
| Sensitivity to Outliers | Sensitive for small ‘K’; larger ‘K’ smooths the effect | Sensitive, since centroids are means and outliers pull them | 
| Scalability | Can be computationally expensive for large datasets (unless optimized e.g., with KD-trees) | Can be more scalable with techniques like MiniBatch K-means | 
| Real-time Updates | Easy, just add data to dataset | May require re-running the algorithm | 
| Interpretability | Direct relationship between input and output based on distance | Centroids represent cluster ‘centers’ but may not correspond to actual data points | 
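As a minimal sketch of the supervised vs. unsupervised split in the table above, assuming scikit-learn's `KNeighborsClassifier` and `KMeans` (the toy data and parameter choices are purely illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # points near the origin
              [2.0, 2.0], [2.1, 1.9], [1.9, 2.2]])  # points near (2, 2)
y = np.array([0, 0, 0, 1, 1, 1])                    # labels, used only by K-NN

# Supervised: K-NN stores (X, y) and predicts by a vote among the 3 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[1.8, 2.0]]))      # -> [1]

# Unsupervised: K-Means ignores y and learns 2 centroids from X alone.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)            # the two learned centroids
print(km.predict([[1.8, 2.0]]))       # index of the nearest centroid
```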
Gaussian#
List two similarities and two differences between K-Means and the Gaussian Mixture Model (GMM).
Similarities:
- Objective: Both K-Means and GMM are used for clustering, that is, partitioning a dataset into groups (clusters) based on similarities. 
- Initialization: Both can use similar initialization techniques. For example, the initial cluster centers in K-Means can be used as the initial means in GMM. 
Differences:
- Model Assumption:
  - K-Means: Assumes that clusters are spherical and equally sized; it assigns each point to the nearest centroid based on Euclidean distance. 
  - GMM: Assumes that the data is generated from a mixture of several Gaussian distributions. Each cluster can have a different elliptical shape, size, and orientation because GMM captures the variance and covariance of the data. 

- Soft vs. Hard Assignment:
  - K-Means: Performs hard assignments, meaning each point is assigned to one and only one cluster. 
  - GMM: Performs soft assignments, meaning each point receives a probability for each cluster, indicating how likely it is to belong to that cluster. This allows for a more nuanced understanding of cluster membership. 
 
K-Means can be viewed as a limiting case of GMM in which every cluster shares a spherical covariance \( \epsilon I \) with \( \epsilon \to 0 \), so the soft assignments collapse into hard ones; GMM provides the more general and flexible approach, at the cost of being computationally more expensive.
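A small sketch of the hard vs. soft assignment difference, assuming scikit-learn's `KMeans` and `GaussianMixture` (the synthetic two-blob data below is only for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),   # blob around (0, 0)
               rng.normal(4.0, 1.0, size=(100, 2))])  # blob around (4, 4)

# Hard assignment: each point gets exactly one cluster index.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_[:3])            # e.g. [0 0 0]

# Soft assignment: each point gets a probability for every component.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X[:3]))  # each row sums to 1
```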
How many free parameters do we need at least for a Gaussian Mixture Model in a D-dimensional space with k clusters?
For a Gaussian Mixture Model (GMM) in a \( D \)-dimensional space with \( k \) clusters, the free parameters include:
- Means: For each of the \( k \) clusters, we have a \( D \)-dimensional mean vector. Total parameters for means = \( k \times D \) 
- Covariances: Each cluster has a \( D \times D \) covariance matrix. However, since the covariance matrix is symmetric, the number of unique values in the matrix is \( \frac{D(D + 1)}{2} \). Total parameters for covariances = \( k \times \frac{D(D + 1)}{2} \) 
- Mixture weights (priors): For \( k \) clusters, there are \( k \) mixture weights. But they sum to 1, so you only need to know \( k - 1 \) of them to determine the last one. Total parameters for mixture weights = \( k - 1 \) 
Now, summing up all the parameters: Total parameters = \( k \times D + k \times \frac{D(D + 1)}{2} + k - 1 \)
Simplifying this expression, the total number of free parameters for a Gaussian Mixture Model with full covariance matrices in a \( D \)-dimensional space with \( k \) clusters is:
\[ kD + \frac{kD(D+1)}{2} + (k - 1) = \frac{k(D+1)(D+2)}{2} - 1 \]
This formula accounts for the means, covariance matrices, and mixture weights for each of the \( k \) clusters in the \( D \)-dimensional space.
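A quick way to sanity-check the count is a tiny helper function; `gmm_free_params` is a hypothetical name of my own, not something from the notes:

```python
def gmm_free_params(D: int, k: int) -> int:
    """Free parameters of a GMM with full covariance matrices."""
    means = k * D                        # k mean vectors, each D-dimensional
    covariances = k * D * (D + 1) // 2   # k symmetric D x D covariance matrices
    weights = k - 1                      # mixture weights constrained to sum to 1
    return means + covariances + weights

print(gmm_free_params(D=2, k=3))         # 6 + 9 + 2 = 17
```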
EM#
- Soft assignment (E-step): for each data point, compute the probability (responsibility) that it was generated by each component distribution; see the sketch below. 
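A minimal sketch of that E-step computation, assuming NumPy/SciPy; the two-component parameters below are made up for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

X = np.array([[0.2, 0.1], [3.9, 4.1], [2.0, 2.0]])    # a few 2-D points

weights = np.array([0.5, 0.5])                         # mixture weights pi_j
means = [np.zeros(2), np.array([4.0, 4.0])]            # component means mu_j
covs = [np.eye(2), np.eye(2)]                          # component covariances Sigma_j

# Responsibility gamma_ij = pi_j * N(x_i | mu_j, Sigma_j), normalized over j.
weighted = np.column_stack([
    w * multivariate_normal.pdf(X, mean=m, cov=c)
    for w, m, c in zip(weights, means, covs)
])
responsibilities = weighted / weighted.sum(axis=1, keepdims=True)
print(responsibilities)    # each row sums to 1
```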
 
 
