Principal Component Analysis
The application provides the traditional Principal Component Analysis (PCA) and Min/Max Autocorrelation Factors (MAF) tools in Isatis.neo.
The Principal Component Analysis can be applied to numerical variables on any type of data and helps you construct:
- the resulting orthogonal factors;
- the translation matrix, which allows converting raw variables into factors and vice versa. These application-specific parameters are attached to the output macro variable(s).
This method is a well-known statistical transformation which consists in expressing a set of raw variables (possibly correlated) as linear combinations of a new set of uncorrelated variables, called factors or principal components (PCs). It makes it possible to investigate correlations between important groups of variables and then to perform dimension reduction.
To cope with any numerical variable, the input variables are automatically normalized beforehand.
The statistical criterion is based on the variance-covariance matrix (note that this technique does not involve the distance between samples and therefore does not really belong to the geostatistical framework). This matrix is established experimentally, and its eigenvalues and eigenvectors are derived and used in the calculation of the factors.
The eigenvectors, calculated from the variance-covariance matrix of the N variables, define the principal directions of the N-dimensional cloud made by the values of the variables. These directions are orthogonal and explain successively decreasing parts of the total variability of the cloud, from the first direction, which explains the largest part, to the last, which explains the smallest. Each eigenvector is defined as a linear combination of the variables and is associated with an eigenvalue, equal to its variance and to the part of the total variability that this eigenvector explains. By normalizing the eigenvectors, we obtain the factors: linear combinations of the variables that are uncorrelated and have variance 1.
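As an illustration, here is a minimal numpy sketch of this construction. The function name pca_factors and the array layout (samples in rows, variables in columns) are assumptions made for the example, not the application's API, and a full-rank covariance matrix is assumed:

```python
import numpy as np

def pca_factors(Z):
    """Compute unit-variance PCA factors from raw variables.

    Z: array of shape (n_samples, n_vars), one raw variable per column.
    Returns the factors, the translation matrix and the eigenvalues.
    """
    # Normalize each raw variable beforehand (zero mean, unit variance).
    Zn = (Z - Z.mean(axis=0)) / Z.std(axis=0)

    # Experimental variance-covariance matrix of the normalized variables.
    C = np.cov(Zn, rowvar=False)

    # Eigendecomposition (eigh returns eigenvalues in ascending order).
    eigvals, eigvecs = np.linalg.eigh(C)

    # Re-sort so the first factor explains the largest part of the variability.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Normalize the eigenvectors so that each factor has variance 1;
    # A is the translation matrix from normalized variables to factors.
    A = eigvecs / np.sqrt(eigvals)
    Y = Zn @ A
    return Y, A, eigvals
```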
By definition, the total variability is fully explained when the number of factors is exactly equal to the number of raw variables. However, a limited number of factors is often enough to explain a large percentage of the total variability. This is why the factors are sorted in decreasing order of explained variance.
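Continuing the sketch above, the part of the total variability explained by the leading factors can be checked from the eigenvalues (the 95% threshold is purely illustrative):

```python
# Part of the total variability explained by each factor, in decreasing order.
ratio = eigvals / eigvals.sum()

# Smallest number of leading factors explaining at least 95% of the variability.
n_kept = int(np.searchsorted(np.cumsum(ratio), 0.95) + 1)
```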
However, one known problem with PCA is its inability to reproduce the cross-variograms after simulating the PCA dimensions independently, since the factors are only guaranteed to be uncorrelated at distance 0. The Min/Max Autocorrelation Factors method is a partial solution to this problem: it transforms spatially correlated variables into factors that are also uncorrelated at a given lag distance, thus reducing the number of variables to consider. This transformation, which decorrelates at a user-set distance, is built from the experimental variogram of the sphered, i.e. normalized, PCs.
Note: This method was developed by Switzer and Green [1] and applied more recently by Desbarats and Dimitrakopoulos [2].
[1] - Switzer P. and Green A. A., April 1984, Min/Max Autocorrelation Factors for Multivariate Spatial Imagery, Technical Report No. 6, Department of Statistics, Stanford University.
[2] - Desbarats A. J. and Dimitrakopoulos R., 2000, Geostatistical Simulation of Regionalized Pore-Size Distributions Using Min/Max Autocorrelation Factors, Mathematical Geology, Vol. 32, No. 8.
Note: When simulating PCA components, it is strongly advised to set a different seed for each simulated component. Otherwise, you risk introducing a bias (i.e. a correlation) between the simulated variables.
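A minimal way to honour this advice with numpy generators; the seed values are hypothetical, and the Gaussian white noise merely stands in for the actual simulation of each factor's spatial model:

```python
import numpy as np

n_nodes = 10_000
seeds = [101, 202, 303]  # hypothetical: one distinct seed per simulated factor

# One independent generator per component; distinct seeds avoid introducing
# a spurious correlation between the simulated variables.
sims = np.column_stack([np.random.default_rng(s).standard_normal(n_nodes)
                        for s in seeds])
```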
This method is based on a principal components approach and consists in linearly transforming a family of n geo-referenced variables {Zi(x)} into orthogonal factors ranked in order of increasing spatial correlation.
The MAF decomposition should be contrasted with the Principal Component Analysis (PCA): the PCA decomposition only uses the variance-covariance matrix and therefore makes no explicit use of the spatial dependency between variables. It is a pointwise procedure.
In MAF theory, the set of factors {Yi(x)} are orthogonal linear combinations of the original multivariate observations (at the same point): Yi(x) = sum_j[aij.Zj(x)]. Each transform Yi(x) is determined so as to exhibit greater spatial correlation than any of the previously determined transforms {Yj<i(x)}, while remaining orthogonal to these transforms.
This property introduces a distance h at which this correlation is minimized. This distance should be small (it usually corresponds to the most represented sampling distance).
The MAF decomposition is used to express an initial set of variables into a family of factors which are spatially uncorrelated (at least at distance 0 and at distance h). The factors are sorted by increasing spatial correlation. The factors consisting largely of noise and exhibiting pure nugget-effect correlation structures are isolated in the lower rankings. The factors capturing most of the spatial correlation in the data are isolated in the highest rankings.
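Under the same assumptions as the PCA sketch above, and further assuming samples regularly spaced along a line so that a shift of lag array steps plays the role of the distance h, the MAF construction could be sketched as:

```python
import numpy as np

def maf_factors(Z, lag=1):
    """Two-stage MAF sketch: sphered PCs, then a second rotation
    diagonalizing the experimental variogram matrix at distance h."""
    # Stage 1: sphere the data (reuses the pca_factors sketch above).
    Y, A, _ = pca_factors(Z)

    # Stage 2: experimental variogram matrix of the sphered PCs at lag h.
    D = Y[lag:] - Y[:-lag]
    G = 0.5 * (D.T @ D) / D.shape[0]

    # A high variogram value at h means weak spatial correlation there,
    # so sorting the eigenvalues in decreasing order ranks the factors
    # by increasing spatial correlation: noise first, structure last.
    gvals, gvecs = np.linalg.eigh(G)
    order = np.argsort(gvals)[::-1]
    gvecs = gvecs[:, order]

    M = A @ gvecs  # full translation matrix, variables to MAF factors
    return Y @ gvecs, M
```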
The MAF factors are used:
- to reduce the number of variables to be estimated, for filtering purposes: the first factors do not need to be accounted for since they correspond, by construction, to the spatially uncorrelated part of the phenomenon, which is precisely what a filtering procedure seeks to remove (see the sketch after this list);
- to allow easy processing of each factor independently, as they are uncorrelated.
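As a rough illustration of the filtering use, continuing the sketch above; the number of noise factors k is hypothetical and would normally be chosen from the variograms of the factors:

```python
# Zero out the k first (noisiest) MAF factors, then map the filtered
# factors back to the normalized variables.
Ymaf, M = maf_factors(Z, lag=1)
k = 2                               # hypothetical number of noise factors
Ymaf_filtered = Ymaf.copy()
Ymaf_filtered[:, :k] = 0.0          # first factors carry the pure nugget effect
Zn_filtered = Ymaf_filtered @ np.linalg.inv(M)
```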
Note: This method should only be applied in a stationary part of a field, and preferably on Gaussian variables.