"PCA is an important data preprocessing step for large, redundant data sets. Data is compressed to a smaller number of variables reflecting the majority of the variation. In principle, any analysis (cluster analysis, regression, etc.) can be performed on the compressed data. We can also use PCA in the context of exploratory data analysis: we can find the rank of the data, we can look at the loadings, etc., to determine what the data tells us and what steps we should take next."
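The compression described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not a reference implementation: the function name `pca_compress`, the 95% variance threshold, and the synthetic data (two hidden factors spread over six redundant variables) are assumptions made for the example and do not come from the quoted text.

```python
import numpy as np


def pca_compress(X, var_threshold=0.95):
    """Project X (n_samples x n_features) onto the fewest principal
    components whose cumulative explained variance reaches var_threshold.

    Hypothetical helper for illustration; assumes X is not constant.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable
    # SVD of the centred data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance share per component
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(s))                           # guard against rounding at 1.0
    loadings = Vt[:k].T                          # n_features x k
    scores = Xc @ loadings                       # compressed data, n_samples x k
    return scores, loadings, explained[:k]


# Synthetic redundant data: 6 observed variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
W = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ W + 0.05 * rng.standard_normal((200, 6))

scores, loadings, explained = pca_compress(X, var_threshold=0.95)
print(scores.shape)        # compressed representation
print(explained)           # variance share of each retained component
```

The `scores` array can then be passed to a clustering or regression routine in place of `X`, while `loadings` shows how strongly each original variable contributes to each retained component.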
In statistics, the Mahalanobis distance is a distance measure introduced by P. C. Mahalanobis in 1936. It is based on the correlations between variables, through which different patterns can be identified and analysed. It is a useful way of determining the similarity of an unknown sample set to a known one. It differs from Euclidean distance in that it takes into account the correlations of the data set and is scale-invariant, i.e. it does not depend on the scale of the measurements.
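As a rough illustration of the scale-invariance property, the sketch below computes the distance of a point x from a reference sample as the square root of (x - mu)^T S^-1 (x - mu), where mu and S are the sample mean and covariance matrix of the reference data. The helper name `mahalanobis`, the synthetic correlated sample, and the factor of 1000 used for rescaling are assumptions chosen for the example; the code also assumes at least two variables and a non-singular covariance matrix.

```python
import numpy as np


def mahalanobis(x, reference):
    """Mahalanobis distance of point x from the sample `reference`
    (n_samples x n_features). Hypothetical helper for illustration."""
    reference = np.asarray(reference, dtype=float)
    mu = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)        # sample covariance matrix
    diff = np.asarray(x, dtype=float) - mu
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


# Known sample: two positively correlated variables.
rng = np.random.default_rng(0)
reference = rng.multivariate_normal([0.0, 0.0],
                                    [[1.0, 0.8], [0.8, 1.0]],
                                    size=500)
x = np.array([1.0, -1.0])                        # point lying against the correlation

d_original = mahalanobis(x, reference)
e_original = float(np.linalg.norm(x - reference.mean(axis=0)))

# Change the unit of the first variable (for example, metres to millimetres).
scale = np.array([1000.0, 1.0])
d_rescaled = mahalanobis(x * scale, reference * scale)
e_rescaled = float(np.linalg.norm(x * scale - (reference * scale).mean(axis=0)))

print(d_original, d_rescaled)   # Mahalanobis: unchanged up to rounding
print(e_original, e_rescaled)   # Euclidean: changes with the unit
```

Because the point lies against the direction of correlation in the reference sample, its Mahalanobis distance is larger than its Euclidean distance from the mean, which is how the measure reflects the correlation structure of the data.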