Read proteinGroups.txt files exported from MaxQuant
Quantification results from MaxQuant can be read using this function, and the relevant information is extracted.
Input files compressed as .gz can be read as well. Besides protein abundance values (XIC), peptide counting information such as the number of unique razor-peptides or PSM values can be extracted, too.
The protein abundance values may be normalized using multiple methods (median normalization is the default); the determination of normalization values can be restricted to specific proteins
(normalization to bait protein(s), or to the matrix in UPS1 spike-in experiments).
In addition, a graphical display of the distribution of protein abundance values may be generated.
The final output is a list containing these elements: $raw, $quant, $annot, $counts, $quantNotes, $notes, or (if separateAnnot=FALSE) data.frame with annotation- and main quantification-content.
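The idea of restricting the determination of normalization values to specific proteins can be illustrated with a minimal base-R sketch. This is not the wrProteo implementation: the helper `medianNormSketch`, its argument `refLines`, and the toy values are hypothetical, and the data are assumed to be on a log2 scale.

```r
# Toy log2 abundance matrix: 5 proteins (rows) x 3 samples (columns); values are made up
logAbund <- matrix(c(20, 21,   22,
                     18, 19,   20,
                     25, 26,   27,
                     15, 15.5, 17,
                     22, 23,   24),
                   ncol = 3, byrow = TRUE,
                   dimnames = list(paste0("prot", 1:5), paste0("sample", 1:3)))

# Hypothetical helper (not part of wrProteo): shift each column so that the
# column medians agree; the medians may be estimated from reference rows only
medianNormSketch <- function(x, refLines = NULL) {
  if (is.null(refLines)) refLines <- seq_len(nrow(x))       # default: use all proteins
  colMed <- apply(x[refLines, , drop = FALSE], 2, median, na.rm = TRUE)
  shift <- colMed - mean(colMed)                            # deviation from the common level
  sweep(x, 2, shift, FUN = "-")                             # subtract per-column shift
}

normAll <- medianNormSketch(logAbund)                        # medians from all proteins
normRef <- medianNormSketch(logAbund, refLines = c(1, 2, 5)) # medians from reference proteins only
```

With `refLines` set, only the chosen rows determine the per-sample shift, but the shift is applied to every protein. This mirrors the purpose of normalizing to bait proteins or to the matrix in a spike-in experiment.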
readMaxQuantFile(
  path,
  fileName = "proteinGroups.txt",
  normalizeMeth = "median",
  quantCol = "LFQ.intensity",
  contamCol = "Potential.contaminant",
  pepCountCol = c("Razor...unique.peptides.", "MS.MS.count."),
  uniqPepPat = NULL,
  refLi = NULL,
  extrColNames = c("Majority.protein.IDs", "Fasta.headers", "Number.of.proteins"),
  specPref = c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "OS=Homo sapiens"),
  remRev = TRUE,
  separateAnnot = TRUE,
  tit = NULL,
  wex = 1.6,
  plotGraph = TRUE,
  silent = FALSE,
  callFrom = NULL
)
path |
(character) path of file to be read |
fileName |
(character) name of file to be read (default 'proteinGroups.txt' as typically generated by MaxQuant in txt folder). Gz-compressed files can be read, too. |
normalizeMeth |
(character) normalization method (for details see |
quantCol |
(character or integer) exact col-names, or if length=1 content of |
contamCol |
(character or integer, length=1) which column should be used for contaminants marked by MaxQuant |
pepCountCol |
(character) pattern to search among column-names for count data of PSM and NoOfPeptides |
uniqPepPat |
(character, length=1) deprecated, please use |
refLi |
(character or integer) custom specification of which lines of data are the main species; if character (e.g. 'mainSpe'), the column 'SpecType' in $annot will be searched for an exact match of the (single) term given |
extrColNames |
(character) column names to be read (1: prefix for LFQ quantitation, default 'LFQ.intensity'; 2: column name for protein-IDs, default 'Majority.protein.IDs'; 3: column names of fasta-headers, default 'Fasta.headers', 4: column name for number of protein IDs matching, default 'Number.of.proteins') |
specPref |
(character) prefix to identifiers allowing to recognize i) the contamination database, ii) the species of main identifications and iii) the spike-in species |
remRev |
(logical) option to remove all protein-identifications based on reverse-peptides |
separateAnnot |
(logical) if |
tit |
(character) custom title to plot |
wex |
(numeric) relative expansion factor of the violin in plot |
plotGraph |
(logical) optionally plot a vioplot of initial and normalized data (using |
silent |
(logical) suppress messages |
callFrom |
(character) allows easier tracking of messages produced |
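The role of the patterns in specPref and of remRev can be illustrated with a small base-R sketch on made-up annotation lines. It is only an illustration, not the code used inside readMaxQuantFile: the helper `classifySketch` is hypothetical, the identifiers are invented, and the 'REV__' prefix for reverse hits is an assumption about how such entries are labelled in the input.

```r
# Made-up annotation lines resembling entries of a proteinGroups.txt file
annot <- data.frame(
  Majority.protein.IDs = c("P12345", "CON__P00761", "REV__Q9XYZ1", "P02768ups"),
  Fasta.headers = c("sp|P12345|ABC_YEAST Some protein OS=Saccharomyces cerevisiae",
                    "", "",
                    "P02768ups|ALBU_HUMAN_UPS Serum albumin"),
  stringsAsFactors = FALSE)

# Patterns as in the example of this help page
specPref <- c(conta = "conta|CON_|LYSC_CHICK", mainSpecies = "YEAST", spike = "HUMAN_UPS")

# Hypothetical helper (not part of wrProteo): label each line with the name of
# the matching pattern; patterns listed first take priority
classifySketch <- function(annot, specPref) {
  txt <- paste(annot$Majority.protein.IDs, annot$Fasta.headers)
  specType <- rep(NA_character_, nrow(annot))
  for (nm in rev(names(specPref))) {          # reverse order, so earlier names overwrite later ones
    specType[grepl(specPref[[nm]], txt)] <- nm
  }
  specType
}

annot$SpecType <- classifySketch(annot, specPref)

# Assumed convention: reverse (decoy) hits carry the prefix 'REV__' in the protein ID
isRev <- grepl("^REV__", annot$Majority.protein.IDs)
annotKept <- annot[!isRev, ]                  # comparable in spirit to remRev=TRUE
```

Lines that match no pattern stay NA. In practice it is worth checking how many lines are left unassigned before relying on a species-specific normalization.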
This function has been developed using MaxQuant versions 1.6.10.x to 1.6.17.x; the format of the resulting file 'proteinGroups.txt' is typically well conserved.
list with $raw (initial/raw abundance values), $quant with final normalized quantitations, $annot (columns ), $counts an array with 'PSM' and 'NoOfRazorPeptides', $quantNotes and $notes; or a data.frame with quantitation and annotation if separateAnnot=FALSE
library(wrProteo)
path1 <- system.file("extdata", package="wrProteo")
# Here we'll load a short/trimmed example file (thus not MaxQuant default name)
fiNa <- "proteinGroupsMaxQuant1.txt.gz"
specPr <- c(conta="conta|CON_|LYSC_CHICK", mainSpecies="YEAST", spike="HUMAN_UPS")
dataMQ <- readMaxQuantFile(path1, fileName=fiNa, specPref=specPr, tit="tiny MaxQuant")
summary(dataMQ$quant)
matrixNAinspect(dataMQ$quant, gr=gl(3,3))