
Quick Summary of PCA:

1. Organize the data as an m × n matrix, where m is the number of measurement types and n is the number of samples.

2. Subtract off the mean for each measurement type.

3. Calculate the SVD of the mean-centered data, or the eigenvectors of its covariance matrix (see the sketch below).
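
The three steps above can be sketched with NumPy. This is a minimal illustration, assuming a toy data matrix `X` of shape m × n invented purely for demonstration; both routes, eigendecomposition of the covariance matrix and SVD of the centered data, yield the same principal directions.

```python
import numpy as np

# Hypothetical toy data: m = 2 measurement types, n = 200 samples.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.5], [1.5, 1.0]], size=200).T  # shape (2, 200)

# Step 2: subtract the mean of each measurement type (each row).
Xc = X - X.mean(axis=1, keepdims=True)

# Step 3a: eigenvectors of the covariance matrix are the principal directions.
cov = np.cov(Xc)                         # (m, m) covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order

# Step 3b: equivalently, the left singular vectors of the centered data.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
# S**2 / (n - 1) matches eigvals (up to ordering); columns of U match eigvecs (up to sign).
```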


A deeper appreciation of the limits of PCA requires some consideration of the underlying assumptions and, in tandem, a more rigorous description of the source of the data. Generally speaking, the primary motivation behind this method is to decorrelate the data set, i.e. to remove second-order dependencies.


In the context of dimensionality reduction, one measure of success is the degree to which a reduced representation can predict the original data. In statistical terms, we must define an error function (or loss function). It can be proved that under a common loss function, the mean squared error (i.e. the L2 norm), PCA provides the optimal reduced representation of the data. This means that selecting orthogonal directions for the principal components is the best solution for predicting the original data.
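
As a rough illustration of this optimality claim, the sketch below (assuming NumPy and synthetic data invented for the example) compares the mean squared reconstruction error of a rank-k PCA projection with that of a random orthonormal basis of the same rank; the PCA error is the smallest achievable at that rank.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: m = 5 measurement types with very different variances, n = 500 samples.
X = rng.standard_normal((5, 500)) * np.array([[5.0], [3.0], [1.0], [0.5], [0.1]])
Xc = X - X.mean(axis=1, keepdims=True)

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2

# Rank-k PCA reconstruction: project onto the top-k principal directions.
P = U[:, :k]
X_pca = P @ (P.T @ Xc)

# Rank-k reconstruction from a random orthonormal basis, for comparison.
Q, _ = np.linalg.qr(rng.standard_normal((5, k)))
X_rand = Q @ (Q.T @ Xc)

mse_pca = np.mean((Xc - X_pca) ** 2)     # the smallest achievable at rank k
mse_rand = np.mean((Xc - X_rand) ** 2)   # generally larger
print("PCA reconstruction MSE:", mse_pca)
print("random-basis MSE:      ", mse_rand)
```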


The goal of the analysis is to decorrelate the data or, said in other terms, to remove the second-order dependencies that exist between the variables.
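
A quick numerical check of this decorrelation, again only a sketch on synthetic data: after rotating the centered data into the principal-component basis, the off-diagonal entries of the covariance matrix vanish.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical correlated data: mix 3 independent variables with a random matrix.
A = rng.standard_normal((3, 3))
X = A @ rng.standard_normal((3, 1000))
Xc = X - X.mean(axis=1, keepdims=True)

print(np.round(np.cov(Xc), 3))   # off-diagonal entries are clearly nonzero

# Rotate into the principal-component basis.
U, _, _ = np.linalg.svd(Xc, full_matrices=False)
Y = U.T @ Xc

# The covariance is now diagonal (up to floating-point error): the
# second-order dependencies between the variables have been removed.
print(np.round(np.cov(Y), 3))
```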


Multiple solutions exist for removing higher-order dependencies. For instance, if prior knowledge about the problem is available, a nonlinearity can be applied to the data to transform it to a more appropriate naive basis.
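
A toy example of such a transformation, assuming data scattered around a circle (an assumption made only for illustration): rewriting the Cartesian coordinates in polar form concentrates nearly all of the variance along one dimension, the angle, which no linear change of basis can achieve.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical example: points scattered around a circle. In Cartesian
# coordinates the two variables are nonlinearly related, so no linear
# change of basis reduces the data to one dimension.
theta = rng.uniform(0.0, 2.0 * np.pi, 500)
r = 1.0 + 0.05 * rng.standard_normal(500)
X = np.vstack([r * np.cos(theta), r * np.sin(theta)])   # shape (2, n)

# Apply the nonlinearity suggested by prior knowledge: switch to polar
# coordinates. In this more appropriate "naive basis" almost all of the
# variance lies along the angle, so one dimension describes the data well.
P = np.vstack([np.hypot(X[0], X[1]), np.arctan2(X[1], X[0])])
print(np.var(P, axis=1))   # variance of the radius is tiny compared to the angle
```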


Another direction is to impose more general statistical definitions of dependency within a data set, e.g. requiring that data along reduced dimensions be statistically independent. This class of algorithms, termed independent component analysis (ICA), has been demonstrated to succeed in many domains where PCA fails.
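
A minimal sketch of this contrast, assuming scikit-learn's FastICA and PCA and a synthetic mixture of two non-Gaussian sources invented for the example: PCA only decorrelates the observed mixtures, while ICA recovers the statistically independent sources (up to sign and scale).

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

rng = np.random.default_rng(4)
# Hypothetical sources: two independent, non-Gaussian signals.
t = np.linspace(0.0, 8.0, 2000)
S = np.c_[np.sign(np.sin(3 * t)),   # square wave
          np.sin(5 * t)]            # sine wave
S += 0.02 * rng.standard_normal(S.shape)

# Observed data: an unknown linear mixture of the sources, shape (n_samples, n_features).
A = np.array([[1.0, 0.5],
              [0.4, 1.2]])
X = S @ A.T

# PCA decorrelates the mixtures but its components remain mixtures of the
# sources; ICA additionally recovers statistically independent components
# (up to sign and scale).
S_pca = PCA(n_components=2).fit_transform(X)
S_ica = FastICA(n_components=2, random_state=0).fit_transform(X)
```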
