cd10-0-maha

Using Mahalanobis Distance and PCA for Quality Control


Description

Compute the Mahalanobis distance of each sample from the center of an N-dimensional principal component space.

Usage

mahalanobisQC(spca, N)

Arguments

spca

object of class SamplePCA representing the results of a principal components analysis.

N

integer scalar specifying the number of components to use when assessing QC.

Details

Under the null hypothesis that all samples arise from the same multivariate normal distribution, the squared Mahalanobis distance of a sample from the center of an N-dimensional principal component space should follow a chi-squared distribution with N degrees of freedom. This lets us compute a p-value for the Mahalanobis distance of each sample; samples with small p-values are candidate outliers. The method can be used for quality control or outlier identification.
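The idea above can be sketched in base R, without ClassDiscovery. This is only an illustration under stated assumptions, not the package's implementation: the matrix x is simulated, and prcomp, mahalanobis and pchisq stand in for whatever mahalanobisQC does internally. The names x, scores, d2 and qc, and the column name statistic, are hypothetical.

```r
set.seed(1)

# Hypothetical data: 50 features (rows) by 20 samples (columns)
x <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20,
            dimnames = list(NULL, paste0("S", 1:20)))
N <- 2  # number of principal components to use

# PCA with samples as observations, hence the transpose
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)
scores <- pca$x[, seq_len(N), drop = FALSE]

# Squared Mahalanobis distance of each sample from the center
# of the N-dimensional score space
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

# Upper-tail chi-squared probability with N degrees of freedom
p <- pchisq(d2, df = N, lower.tail = FALSE)

# One row per sample (that is, per column of x)
qc <- data.frame(statistic = d2, p.value = p, row.names = colnames(x))
qc[qc$p.value < 0.01, ]
```

Because the data here are simulated from a single normal distribution, few or no samples should be flagged; real outliers would show up as rows with very small p-values.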

Value

Returns a data frame containing two columns, with the rows corresponding to the columns of the original data set on which PCA was performed. The first column is the chi-squared statistic, with N degrees of freedom. The second column is the associated p-value.
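To make the relationship between the two columns concrete, the following sketch maps some made-up statistics to upper-tail chi-squared p-values. The values in statistic are hypothetical, and the assumption that the p-value is the upper-tail probability follows from the Details section rather than from the package source. For N = 2 the chi-squared upper tail has the closed form exp(-x/2), which gives an easy check.

```r
N <- 2
statistic <- c(0.5, 2.0, 13.8)  # hypothetical chi-squared statistics

# Upper-tail probability under a chi-squared distribution with N df
p.value <- pchisq(statistic, df = N, lower.tail = FALSE)

# With 2 degrees of freedom this equals exp(-statistic / 2)
all.equal(p.value, exp(-statistic / 2))

data.frame(statistic = statistic, p.value = p.value)
```

Larger statistics give smaller p-values, so a cutoff such as p.value < 0.01 selects the samples farthest from the center.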

Author(s)

Kevin R. Coombes krc@silicovore.com

References

Coombes KR, et al. Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization. Clin Chem 2003; 49:1615-23.

Examples

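library(ClassDiscovery)  # provides SamplePCA and mahalanobisQC
library(oompaData)       # provides the example data
data(lungData)
# PCA on the samples, after dropping rows with missing values
spca <- SamplePCA(na.omit(lung.dataset))
# QC statistics based on the first 2 principal components
mc <- mahalanobisQC(spca, 2)
# samples flagged as potential outliers
mc[mc$p.value < 0.01,]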

ClassDiscovery

Classes and Methods for "Class Discovery" with Microarrays or Proteomics

v3.3.13
Apache License (== 2.0)
Authors
Kevin R. Coombes
Initial release
2020-11-10