Check Genotype Matrix
Check that the provided genotype matrix is in the correct format, and check for low call rate samples and SNPs.
CheckGeno( GenoM, quiet = FALSE, Plot = FALSE, Return = "GenoM", DumPrefix = c("F0", "M0") )
GenoM: the genotype matrix.
quiet: suppress messages.
Plot: display the plots of
Return: either 'GenoM' to return the cleaned-up genotype matrix, or 'excl' to return a list with excluded SNPs and individuals (see Value).
DumPrefix: length-2 vector of prefixes; the function checks that these do not occur among the IDs of genotyped individuals.
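The DumPrefix check can be pictured with a small base-R sketch. This is an illustration only, not the package's implementation: it assumes the check amounts to testing whether any ID starts with one of the two prefixes, and the IDs below are hypothetical.

```r
# Hypothetical IDs; "F0012" starts with the default prefix "F0"
IDs <- c("A001", "A002", "F0012", "B003")
DumPrefix <- c("F0", "M0")

# One logical column per prefix: TRUE where an ID starts with that prefix
clash <- sapply(DumPrefix, function(p) startsWith(IDs, p))
rownames(clash) <- IDs

# IDs that collide with either prefix
Clashing <- IDs[rowSums(clash) > 0]
Clashing
```

Renaming such individuals before running the check avoids the clash.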
If Return='excl', a list with, if any are found:
ExcludedSNPs: SNPs scored for <10% of individuals; automatically excluded when running pedigree reconstruction.
ExcludedSnps-mono: monomorphic (fixed) SNPs; automatically excluded when running pedigree reconstruction.
ExcludedIndiv: individuals scored for <5% of SNPs; these cannot be reliably included during pedigree reconstruction. Individual call rate is calculated after removal of 'Excluded SNPs'.
Snps-LowCallRate: SNPs scored for 10% to 50% of individuals; recommended to be filtered out.
Indiv-LowCallRate: individuals scored for <50% of SNPs; recommended to be filtered out.
When Return='excl' the return is invisible, i.e. a check is run and warnings or errors are always displayed, but nothing may be returned.
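As a reminder of what an invisible return means in R, here is a minimal sketch using a hypothetical stand-in function `f` (not CheckGeno itself): the value is not auto-printed at the console, but it is still available when assigned.

```r
# Hypothetical stand-in that returns its result invisibly
f <- function() invisible(list(ExcludedIndiv = "i1"))

f()          # nothing is printed at the console
out <- f()   # the value is still returned and can be assigned
names(out)
```

In practice this means the result of a call with Return='excl' should be assigned to an object if the exclusion list is needed later.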
Appropriate call rate thresholds for SNPs and individuals depend on the total number of SNPs, distribution of call rates, genotyping errors, and the proportion of candidate parents that are SNPd (sibship clustering is more prone to false positives). Note that filtering first on SNP call rate tends to keep more individuals in.
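The effect of filtering order can be sketched with simulated data in base R. Everything here is an assumption made for illustration: the matrix is random rather than produced by SimGeno, missing genotypes are coded -9 (as in the example code), and the thresholds 0.6 and 0.8 are arbitrary.

```r
set.seed(1)
nInd <- 200
# Hypothetical per-SNP call rates: 20 poorly scored SNPs, 80 well scored ones
SnpCR <- c(rep(0.3, 20), rep(0.95, 80))

# Individuals in rows, SNPs in columns; genotypes 0/1/2, missing = -9
G <- sapply(SnpCR, function(cr) {
  g <- sample(0:2, nInd, replace = TRUE)
  g[runif(nInd) > cr] <- -9
  g
})
rownames(G) <- paste0("i", seq_len(nInd))

# Proportion of non-missing calls per row (margin = 1) or column (margin = 2)
call_rate <- function(M, margin) apply(M, margin, function(x) mean(x != -9))

# Order A: filter individuals on call rate across all SNPs
keepIndA <- call_rate(G, 1) > 0.8

# Order B: drop low call rate SNPs first, then filter individuals
G_B <- G[, call_rate(G, 2) > 0.6, drop = FALSE]
keepIndB <- call_rate(G_B, 1) > 0.8

c(IndivFirst = sum(keepIndA), SnpFirst = sum(keepIndB))
```

Because the poorly scored SNPs no longer pull down each individual's call rate, more individuals pass the same threshold when SNPs are filtered first.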
data(Ped_HSg5)
GenoM <- SimGeno(Ped_HSg5, nSnp=400, CallRate = runif(400, 0.2, 0.8))

# quick alternative:
GenoM.checked <- CheckGeno(GenoM)

# user supervised alternative:
Excl <- CheckGeno(GenoM, Return = "excl")
GenoM.orig <- GenoM   # make a 'backup' copy
if ("ExcludedSnps" %in% names(Excl))
  GenoM <- GenoM[, -Excl[["ExcludedSnps"]]]
if ("ExcludedInd" %in% names(Excl))
  GenoM <- GenoM[!rownames(GenoM) %in% Excl[["ExcludedInd"]], ]
if ("ExcludedIndiv" %in% names(Excl))
  GenoM <- GenoM[!rownames(GenoM) %in% Excl[["ExcludedIndiv"]], ]

# warning about SNPs scored for <50% of individuals ?
SnpCallRate <- apply(GenoM, MARGIN=2,
                     FUN = function(x) sum(x!=-9)) / nrow(GenoM)
hist(SnpCallRate, breaks=50, col="grey")
GenoM <- GenoM[, SnpCallRate > 0.6]

# to be on the safe side, filter out low call rate individuals
IndivCallRate <- apply(GenoM, MARGIN=1,
                       FUN = function(x) sum(x!=-9)) / ncol(GenoM)
hist(IndivCallRate, breaks=50, col="grey")
GoodSamples <- rownames(GenoM)[ IndivCallRate > 0.8]