week7 lec1#
PCA vs PLS#
Here’s a simplified comparison of PCA and PLS in a table format:
| Aspect | PCA | PLS |
|---|---|---|
| Purpose | Dimensionality reduction | Dimensionality reduction & regression |
| Method | Unsupervised | Supervised |
| Component extraction | Maximizes variance of the data | Maximizes covariance between predictors & response |
| Use cases | Data visualization, exploratory data analysis | Predictive modeling with collinear predictors |
| Outputs | Orthogonal principal components | PLS components related to outcome prediction |
| Assumptions | Maximum variance is informative | Variance shared with the response is informative |
| Limitations | May not capture patterns relevant for prediction | Might miss informative variance not shared with the response |
| Interpretation | Typically lacks clear physical meaning | Somewhat more interpretable, but still complex |
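To make the contrast concrete, here is a minimal scikit-learn sketch (the synthetic data and all parameter choices are illustrative assumptions, not from the lecture). The variance-dominating direction is deliberately unrelated to the response, so a single PCA component misses the predictive signal while a single PLS component finds it:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data: the direction with the most variance is NOT the one related to y.
rng = np.random.default_rng(0)
n = 300
high_var = rng.normal(scale=5.0, size=n)   # dominates total variance, unrelated to y
signal = rng.normal(scale=1.0, size=n)     # low variance, drives the response
X = np.column_stack([high_var, signal, signal + rng.normal(scale=0.1, size=n)])
y = 2.0 * signal + rng.normal(scale=0.2, size=n)

# PCA (unsupervised): keep 1 component of maximum variance, then regress y on it.
Z = PCA(n_components=1).fit_transform(X)
y_pca = LinearRegression().fit(Z, y).predict(Z)

# PLS (supervised): 1 component chosen to covary with y.
pls = PLSRegression(n_components=1).fit(X, y)
y_pls = pls.predict(X).ravel()

print("R^2 using 1 PCA component:", round(r2_score(y, y_pca), 3))
print("R^2 using 1 PLS component:", round(r2_score(y, y_pls), 3))
```

With this setup the first principal component essentially tracks the high-variance (but irrelevant) predictor, while the PLS component picks up the two collinear predictors that actually carry the response.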
6 Factor Analysis#
Factor Analysis (FA) is a statistical method that seeks to uncover hidden factors or latent structures behind observed data. In essence, it aims to identify the underlying reasons for the correlations among the observed variables.
The Factor Analysis model can be expressed as:

\[ X = QF + \mu \]
In this model, \( Q \) is the Loading Matrix, which contains the loadings of the observed variables on the factors, \( F \) is the Factor Scores Matrix, representing the scores of each observation on the latent factors, and \( \mu \) is the mean of the observed variables.
Both the Loading Matrix \( Q \) and the Factor Scores Matrix \( F \) are estimated from the observed data.
The goal of FA is to explain \( p \) variables using \( q \) factors, where \( q < p \).
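As a concrete (assumed) example, the principal-component method estimates both matrices from the eigendecomposition of the sample covariance matrix \( S \); the notation \( \Gamma_q \), \( \Lambda_q \) and the convention that the columns of \( X \) are observations are assumptions made here for illustration, not taken from the notes:

\[
S = \Gamma \Lambda \Gamma^\top, \qquad
\hat{Q} = \Gamma_q \Lambda_q^{1/2}, \qquad
\hat{F} = \Lambda_q^{-1/2} \Gamma_q^\top (X - \mu),
\]

where \( \Gamma_q \) contains the eigenvectors belonging to the \( q \) largest eigenvalues and \( \Lambda_q \) is the diagonal matrix of those eigenvalues; \( \hat{F} \) is then just the least-squares solution of \( X - \mu \approx \hat{Q}F \).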
However, the data can rarely be represented exactly as \( X = QF + \mu \). We therefore add the specific (unique) factors \( U \), an error-term matrix that accounts for the random noise in the data and the variability in the observed variables that the latent factors do not capture.
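In matrix form, and assuming the convention that \( X \) is \( p \times n \) with one observation per column (a dimension choice made here for illustration), the full model becomes:

\[
X = QF + U + \mu, \qquad Q \in \mathbb{R}^{p \times q}, \quad F \in \mathbb{R}^{q \times n}, \quad U \in \mathbb{R}^{p \times n}.
\]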
It is important to note that the specific factors have zero expectation, \( E(U) = 0 \), that any two distinct specific factors are uncorrelated, \( \text{COV}(U_i, U_j) = 0 \) for \( i \neq j \), and that they are uncorrelated with the factor scores, \( \text{COV}(F, U) = 0 \). In other words, the errors \( U \) are uncorrelated with each other and with \( F \).
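To see these pieces in code, here is a minimal sketch using scikit-learn's FactorAnalysis; the synthetic data, dimensions, and parameter choices are assumptions for illustration only. Note that scikit-learn stores observations as rows, so the fitted `components_` correspond to \( Q^\top \), `fit_transform` returns the factor scores \( F \), and `noise_variance_` estimates the per-variable variances of the unique factors \( U \).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data: p = 6 observed variables driven by q = 2 latent factors plus noise.
rng = np.random.default_rng(0)
n, p, q = 500, 6, 2
F_true = rng.normal(size=(n, q))           # latent factor scores
Q_true = rng.normal(size=(p, q))           # true loadings
U = rng.normal(scale=0.3, size=(n, p))     # unique factors (noise)
X = F_true @ Q_true.T + U                  # observed data, shape (n, p)

fa = FactorAnalysis(n_components=q, random_state=0)
F_hat = fa.fit_transform(X)                # estimated factor scores, shape (n, q)

print("loadings (Q^T) shape:", fa.components_.shape)            # (q, p)
print("unique-factor variances:", fa.noise_variance_.round(2))  # one per observed variable
print("factor scores shape:", F_hat.shape)
```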