wrProteo: readProlineFile – R documentation

readProlineFile

Read csv or txt files exported from Proline and MS-Angel

Description

Quantification results from MS-Angel and Proline exported in xlsx format can be read directly. Files in tsv, csv (European and US format) or tab-delimited txt format can be read, too. Relevant information is then extracted, and the data can optionally be normalized and displayed as boxplot or vioplot. The final output is a list containing 6 elements: $raw, $quant, $annot, $counts, $quantNotes and $notes. Alternatively, a data.frame with annotation and quantitation data may be returned if separateAnnot=FALSE. Note: There is no normalization by default, since data produced by Proline are quite frequently already sufficiently normalized. The figure produced with the argument plotGraph=TRUE may help in judging whether the data appear sufficiently normalized (distributions should align).

Usage

readProlineFile(
  fileName,
  path = NULL,
  normalizeMeth = NULL,
  logConvert = TRUE,
  sampleNames = NULL,
  quantCol = "^abundance_",
  annotCol = c("accession", "description", "is_validated", "protein_set_score",
    "X.peptides", "X.specific_peptides"),
  remStrainNo = TRUE,
  pepCountCol = c("^psm_count_", "^peptides_count_"),
  trimColnames = FALSE,
  refLi = NULL,
  separateAnnot = TRUE,
  plotGraph = TRUE,
  tit = NULL,
  graphTit = NULL,
  wex = 2,
  specPref = c(conta = "_conta\\|", mainSpecies = "OS=Homo sapiens"),
  silent = FALSE,
  callFrom = NULL
)

Arguments

`fileName`	(character) name of file to read; .xlsx, .csv, .txt and .tsv files can be read (csv, txt and tsv may be gz-compressed). Reading xlsx requires package 'readxl'.
`path`	(character) optional path (note: Windows backslashes should be protected or written as '/')
`normalizeMeth`	(character) normalization method (for details and options see `normalizeThis`)
`logConvert`	(logical) convert numeric data to log2; the result will be placed in $quant
`sampleNames`	(character) new column-names for quantification data (ProteomeDiscoverer does not automatically use file-names from spectra); please use with care since the order of samples might differ from what you expect
`quantCol`	(character or integer) columns with main quantitation-data: precise colnames to extract, or if length=1 the content of `quantCol` will be used as pattern to search among column-names for $quant using `grep`
`annotCol`	(character) precise colnames or if length=1 pattern to search among column-names for $annot
`remStrainNo`	(logical) if `TRUE`, the organism annotation will be trimmed to uppercaseWord+space+lowercaseWord (eg Homo sapiens)
`pepCountCol`	(character) pattern to search among column-names for count data of PSM and NoOfPeptides
`trimColnames`	(logical) optional trimming of column-names of any redundant characters from beginning and end
`refLi`	(integer) custom choice of which lines of data represent the main species; if a single character entry is given, it will be used to choose a group of species (eg 'mainSpe')
`separateAnnot`	(logical) separate annotation from numeric data (quantCol and annotCol must be defined)
`plotGraph`	(logical or matrix of integer) optionally plot a vioplot of the initial data; if integer, it will be passed to `layout` when plotting
`tit`	(character) custom title to plot
`graphTit`	(character) deprecated custom title to plot; please use 'tit'
`wex`	(integer) relative expansion factor of the violin-plot (will be passed to `vioplotW`)
`specPref`	(character or list) define characteristic text for recognizing (main) groups of species (1st for contaminants - will be marked as 'conta', 2nd for main species - marked as 'mainSpe', and optional following ones for supplemental tags/species - marked as 'species2','species3',...); if list and list-element has multiple values they will be used for exact matching of accessions (ie 2nd of argument `annotCol`)
`silent`	(logical) suppress messages
`callFrom`	(character) allow easier tracking of message(s) produced
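
As a rough illustration of how a length-1 `quantCol` and the patterns in `pepCountCol` select columns, the base-R sketch below applies `grep` to a set of column names. The column names are hypothetical (invented here to resemble a Proline export), and the code only mirrors the pattern search described above; it is not the function's internal implementation.

```r
# Hypothetical column names resembling a Proline export (invented for illustration)
colNa <- c("accession", "description", "abundance_A1", "abundance_B1",
  "psm_count_A1", "psm_count_B1", "peptides_count_A1")

# A length-1 quantCol is used as a pattern searched among the column names
quantCol <- "^abundance_"
quantInd <- grep(quantCol, colNa)
colNa[quantInd]                      # columns that would feed $quant

# pepCountCol holds one pattern per type of count data
pepCountCol <- c("^psm_count_", "^peptides_count_")
countInd <- lapply(pepCountCol, grep, colNa)
names(countInd) <- c("PSM", "NoOfPeptides")
countInd                             # column indices per type of count
```

Because the default patterns are anchored with `^`, a column such as `total_abundance_A1` would not be picked up; in such a case precise column names can be given instead.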

Details

This function has been developed using Proline version 1.6.1 coupled with MS-Angel 1.6.1. The format of the exported file depends on the columns chosen for export; the default settings from Proline and MS-Angel work fine.

Value

list with $raw (initial/raw abundance values), $quant (final normalized quantitations), $annot (annotation columns), $counts (an array with 'PSM' and 'NoOfPeptides'), $quantNotes and $notes; or a data.frame with quantitation and annotation if separateAnnot=FALSE
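
The structure of the returned list can be inspected with `str`, as in the minimal sketch below. It assumes that wrProteo and its bundled example file are installed and is skipped otherwise; `plotGraph=FALSE` is set here only to avoid opening a graphics device.

```r
# Sketch: look at the top-level structure of the returned list
# (runs only if the wrProteo package is available)
if(requireNamespace("wrProteo", quietly=TRUE)) {
  path1 <- system.file("extdata", package="wrProteo")
  dataABC <- wrProteo::readProlineFile("exampleProlineABC.csv.gz", path=path1,
    plotGraph=FALSE)
  str(dataABC, max.level=1)          # one line per list element
  print(dim(dataABC$quant))          # dimensions of the quantitation data
}
```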

Examples

path1 <- system.file("extdata", package="wrProteo")
fiNa <- "exampleProlineABC.csv.gz"
dataABC <- readProlineFile(file.path(path1, fiNa))
summary(dataABC$quant)
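
As a possible follow-up to the example above, the column medians of `$quant` give a quick numeric check of whether the samples appear aligned, complementing the figure from `plotGraph=TRUE`. This is a sketch that assumes wrProteo and its example file are installed; `colMed` is a name introduced here for illustration, and no particular threshold for "sufficiently normalized" is implied.

```r
# Sketch: compare per-sample medians of the log2 quantitation data
# (runs only if the wrProteo package is available)
if(requireNamespace("wrProteo", quietly=TRUE)) {
  path1 <- system.file("extdata", package="wrProteo")
  fiNa <- "exampleProlineABC.csv.gz"
  dataABC <- wrProteo::readProlineFile(file.path(path1, fiNa), plotGraph=FALSE,
    silent=TRUE)
  # median per sample (column), ignoring missing values
  colMed <- apply(dataABC$quant, 2, median, na.rm=TRUE)
  print(round(colMed, 2))
  # spread between the lowest and highest sample median
  print(diff(range(colMed)))
}
```

If the medians differ markedly, a method can be supplied via `normalizeMeth` (see `normalizeThis` for the available options).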

wrProteo

Proteomics Data Analysis Functions

v1.4.1

GPL-3

Authors

Wolfgang Raffelsberger [aut, cre]

Initial release