Correspondence analysis (CA), or reciprocal averaging, is a multivariate statistical technique proposed by Herman Otto Hartley and later developed by Jean-Paul Benzécri. It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. Like principal component analysis, it provides a means of displaying or summarising a set of data in two-dimensional graphical form. All data should be on the same scale for CA to be applicable, keeping in mind that the method treats rows and columns equivalently. It is traditionally applied to contingency tables, where CA decomposes the chi-squared statistic associated with the table into orthogonal factors. Because CA is a descriptive technique, it can be applied to tables whether or not the chi-squared statistic is appropriate.
Details
Like principal component analysis, correspondence analysis creates orthogonal components and, for each item in a table, a set of scores. Correspondence analysis is performed on a contingency table, C, of size m×n, where m is the number of rows and n is the number of columns.
Preprocessing
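From table $\mathbf{C}$, compute a set of weights for the rows and the columns, where the row weights are $\mathbf{w}_m = \frac{1}{n_C}\mathbf{C}\mathbf{1}$ and the column weights are $\mathbf{w}_n = \frac{1}{n_C}\mathbf{1}^T\mathbf{C}$, where $n_C = \sum_{i=1}^{m}\sum_{j=1}^{n} C_{ij}$ is the sum of $\mathbf{C}$ and $\mathbf{1}$ is a column vector of ones with the appropriate dimension. Next, compute a table $\mathbf{S}$, where $\mathbf{C}$ is divided by the sum of $\mathbf{C}$: $\mathbf{S} = \frac{1}{n_C}\mathbf{C}$. Finally, compute a table $\mathbf{M}$ from $\mathbf{S}$ and the weights as such: $\mathbf{M} = \mathbf{S} - \mathbf{w}_m\mathbf{w}_n$.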
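The preprocessing steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the table of counts is made up, and the variable names (`n_C`, `w_m`, `w_n`, `S`, `M`) are chosen here for readability.

```python
import numpy as np

# Hypothetical 3x3 contingency table (made-up counts, for illustration only).
C = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n_C = C.sum()                 # sum of all entries of C
w_m = C.sum(axis=1) / n_C     # row weights, one per row of C
w_n = C.sum(axis=0) / n_C     # column weights, one per column of C
S = C / n_C                   # C divided by its sum (table of proportions)
M = S - np.outer(w_m, w_n)    # S minus the outer product of the weights

print(M.round(4))
```

Because the weights are the margins of S, every row and every column of M sums to zero, which is a convenient sanity check when preprocessing a real table.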
Interpretation of preprocessing
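The vectors $\mathbf{w}_m$ and $\mathbf{w}_n$ give the marginal probabilities of the row and column classes, respectively, while $\mathbf{S}$ gives the joint probability distribution of rows and columns. Therefore $\mathbf{M}$, the difference between $\mathbf{S}$ and the product of the marginals $\mathbf{w}_m\mathbf{w}_n$, gives the deviations from independence. These deviations, appropriately scaled and then squared, are summed up to yield the chi-squared statistic on $\mathbf{C}$.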
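The link to the chi-squared statistic can be checked numerically. The sketch below assumes the usual scaling, in which each squared deviation is divided by the product of its row and column weight and the total is multiplied by the sum of the table, and compares the result with Pearson's statistic computed directly from observed and expected counts. The table and all variable names are illustrative.

```python
import numpy as np

# Hypothetical contingency table (made-up counts, for illustration only).
C = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n_C = C.sum()
w_m = C.sum(axis=1) / n_C                  # marginal row probabilities
w_n = C.sum(axis=0) / n_C                  # marginal column probabilities
independence = np.outer(w_m, w_n)          # joint distribution under independence
M = C / n_C - independence                 # deviations from independence

# Scaled, squared deviations summed over the table.
chi2_from_deviations = n_C * np.sum(M**2 / independence)

# Pearson's chi-squared from observed and expected counts.
expected = np.outer(C.sum(axis=1), C.sum(axis=0)) / n_C
chi2_classic = np.sum((C - expected) ** 2 / expected)

print(chi2_from_deviations, chi2_classic)
```

The two printed values agree up to floating-point error.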
Orthogonal components
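The table $\mathbf{M}$ is then decomposed with the generalized singular value decomposition, where the left and right singular vectors are constrained by weights. The weights are diagonal tables $\mathbf{W}_m$ and $\mathbf{W}_n$, where the diagonal elements of $\mathbf{W}_m$ are $1/\mathbf{w}_m$, the diagonal elements of $\mathbf{W}_n$ are $1/\mathbf{w}_n$, and the off-diagonal elements are all 0. The decomposition is $\mathbf{M} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, where $\mathbf{U}^T\mathbf{W}_m\mathbf{U} = \mathbf{V}^T\mathbf{W}_n\mathbf{V} = \mathbf{I}$.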
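One common way to obtain a generalized singular value decomposition is to rescale the table, apply an ordinary SVD, and rescale the singular vectors back. The sketch below takes that route, under the assumption that the constraint matrices hold the reciprocals of the row and column weights on their diagonals. The table and all variable names are illustrative.

```python
import numpy as np

# Hypothetical contingency table (made-up counts, for illustration only).
C = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 15.0]])

n_C = C.sum()
w_m = C.sum(axis=1) / n_C
w_n = C.sum(axis=0) / n_C
M = C / n_C - np.outer(w_m, w_n)

# Assumed constraint matrices: reciprocal weights on the diagonal, zeros elsewhere.
W_m = np.diag(1.0 / w_m)
W_n = np.diag(1.0 / w_n)

# Rescale M by the square roots of the constraints, then take a plain SVD.
Z = M / np.sqrt(np.outer(w_m, w_n))
P, sigma, Qt = np.linalg.svd(Z, full_matrices=False)

# Undo the scaling to get the generalized singular vectors.
U = np.sqrt(w_m)[:, None] * P
V = np.sqrt(w_n)[:, None] * Qt.T

# Checks: U and V reconstruct M and satisfy the weighted orthogonality constraints.
print(np.allclose(U @ np.diag(sigma) @ V.T, M))
print(np.allclose(U.T @ W_m @ U, np.eye(len(sigma))))
print(np.allclose(V.T @ W_n @ V, np.eye(len(sigma))))
```

The smallest singular value is numerically zero, because the rows and columns of M each sum to zero and the table therefore has rank at most min(m, n) − 1.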
Factor scores
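Factor scores for the row items of table $\mathbf{C}$ are $\mathbf{F}_m = \mathbf{W}_m\mathbf{U}\boldsymbol{\Sigma}$, and for the column items they are $\mathbf{F}_n = \mathbf{W}_n\mathbf{V}\boldsymbol{\Sigma}$.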
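Putting the steps together, factor scores can be computed with a short helper. The function name `correspondence_analysis` is hypothetical, and the sketch assumes the common convention in which the scores are the generalized singular vectors divided by the corresponding weights and multiplied by the singular values. The table is made up.

```python
import numpy as np

def correspondence_analysis(C):
    """Return row scores, column scores and singular values for table C.

    Hypothetical helper for illustration; assumes no row or column of C sums to zero.
    """
    C = np.asarray(C, dtype=float)
    n_C = C.sum()
    w_m = C.sum(axis=1) / n_C                 # row weights
    w_n = C.sum(axis=0) / n_C                 # column weights
    M = C / n_C - np.outer(w_m, w_n)          # deviations from independence

    # Generalized SVD of M via an ordinary SVD of the rescaled table.
    Z = M / np.sqrt(np.outer(w_m, w_n))
    P, sigma, Qt = np.linalg.svd(Z, full_matrices=False)
    U = np.sqrt(w_m)[:, None] * P
    V = np.sqrt(w_n)[:, None] * Qt.T

    # Scores: divide by the weights (reciprocal-weight constraint) and
    # multiply each column by its singular value.
    F_m = (U / w_m[:, None]) * sigma
    F_n = (V / w_n[:, None]) * sigma
    return F_m, F_n, sigma

# Hypothetical 4x3 contingency table (made-up counts).
C = np.array([[20, 10, 5],
              [10, 15, 10],
              [5, 10, 15],
              [8, 6, 11]])

F_m, F_n, sigma = correspondence_analysis(C)
print(F_m[:, :2])   # row coordinates on the first two components
print(F_n[:, :2])   # column coordinates on the first two components
```

Plotting the first two columns of the row and column scores on shared axes gives the familiar two-dimensional correspondence analysis map.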
The statistical system R includes the packages MASS, ade4, ca, vegan, ExPosition, and FactoMineR, which perform correspondence analysis and multiple correspondence analysis.