
readFasta2

Read file of protein sequences in fasta format


Description

Reads a fasta-formatted file (e.g. from UniProt) to extract protein sequences and their names. If tableOut=TRUE, the output is organized as a matrix, with the meta-annotation (e.g. GeneName, OrganismName, ProteinName) placed in separate columns.

Usage

readFasta2(
  filename,
  delim = "|",
  databaseSign = c("sp", "tr", "generic", "gi"),
  tableOut = FALSE,
  UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV="),
  cleanCols = TRUE,
  silent = FALSE,
  callFrom = NULL,
  debug = FALSE
)

Arguments

filename

(character) name of the fasta file to be read

delim

(character) delimiter used in the header line

databaseSign

(character) characters at the beginning of the header, right after the '>' (typically specifying the database of origin); they will be excluded from the sequence header

tableOut

(logical) toggle to return either a named character vector or a matrix with enhanced parsing of the fasta header. The resulting matrix will contain the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on the argument UniprSep

UniprSep

(character) separators used to split the header into further entry fields if tableOut=TRUE; see also the UniProt fasta header format

cleanCols

(logical) remove columns in which all entries are NA, if tableOut=TRUE

silent

(logical) suppress messages

callFrom

(character) allows easier tracking of the message(s) produced

debug

(logical) supplemental messages for debugging
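
To illustrate what delim, databaseSign and UniprSep refer to, here is a minimal base-R sketch that splits a single UniProt-style header line into fields. It is not the implementation used by readFasta2 and its handling of edge cases may differ. The function name parseUniprotHeader and the header line are made up for illustration.

```r
# Hypothetical helper (not part of wrProteo): split one UniProt-style header
parseUniprotHeader <- function(hdr, delim = "|",
    databaseSign = c("sp", "tr", "generic", "gi"),
    UniprSep = c("OS=", "OX=", "GN=", "PE=", "SV=")) {
  hdr <- sub("^>", "", hdr)                       # drop the leading '>'
  # fixed = TRUE because "|" is a regex metacharacter
  parts <- strsplit(hdr, delim, fixed = TRUE)[[1]]
  database <- if (parts[1] %in% databaseSign) parts[1] else NA_character_
  rest <- paste(parts[-(1:2)], collapse = delim)  # everything after the ID
  entryName <- sub(" .*", "", rest)
  # leading space so that every tag can be searched as " OS=", " OX=", ...
  txt <- paste0(" ", sub("^[^ ]+ ?", "", rest))
  pos <- vapply(UniprSep, function(tag)
    as.integer(regexpr(paste0(" ", tag), txt, fixed = TRUE)), integer(1))
  found <- sort(pos[pos > 0])                     # tags present, in order
  out <- c(database = database, uniqueIdentifier = parts[2],
           entryName = entryName)
  if (length(found) == 0) {
    return(c(out, proteinName = trimws(txt)))
  }
  proteinName <- trimws(substr(txt, 1, found[1] - 1))
  # each value runs from the end of its tag to just before the next tag
  ends <- c(found[-1] - 1, nchar(txt))
  fields <- substring(txt, found + nchar(names(found)) + 1, ends)
  names(fields) <- sub("=$", "", names(found))
  c(out, proteinName = proteinName, fields)
}

# Made-up header line in UniProt style
hdr <- ">sp|P12345|EXMP_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=1"
res <- parseUniprotHeader(hdr)
print(res)
```

Tags that are absent from a header are simply skipped in this sketch, which is one reason an option such as cleanCols is useful once many headers are combined into a matrix.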

Value

Depending on 'tableOut', returns either a simple character vector of sequences with the UniProt IDs as names, or a matrix with the columns 'database', 'uniqueIdentifier', 'entryName', 'proteinName', 'sequence' and further columns depending on the argument UniprSep
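
The simple return form (a character vector of sequences named by identifier) can be pictured with the following base-R sketch. It writes a tiny made-up fasta file to a temporary location and assembles the named vector by hand. It does not call readFasta2, and the exact names and contents that readFasta2 produces may differ.

```r
# Write a tiny, made-up fasta file (two entries, the first spanning two lines)
tmp <- tempfile(fileext = ".fasta")
writeLines(c(
  ">sp|P12345|EXMP_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=1",
  "MKTAYIAKQR",
  "QISFVKSHFS",
  ">tr|Q99999|EXM2_HUMAN Another example OS=Homo sapiens OX=9606 PE=4 SV=1",
  "MSDNEL"), tmp)

lin <- readLines(tmp)
isHead <- grepl("^>", lin)
# each sequence line belongs to the most recent header above it
grp <- cumsum(isHead)
seqs <- vapply(split(lin[!isHead], grp[!isHead]), paste, character(1),
               collapse = "")
# the second field of each header is taken as the identifier
ids <- vapply(strsplit(lin[isHead], "|", fixed = TRUE),
              function(x) x[2], character(1))
names(seqs) <- ids[as.integer(names(seqs))]

str(seqs)     # named character vector, one element per entry
nchar(seqs)   # sequence lengths, named by identifier
unlink(tmp)
```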

See Also

scan or read.fasta from the package seqinr

Examples

# tiny example with common contaminants
path1 <- system.file("extdata", package="wrProteo")
fiNa <- "conta1.fasta"
# default: named character vector of sequences
fasta1 <- readFasta2(file.path(path1, fiNa))
## now let's read and further separate annotation-fields (returns a matrix)
fasta2 <- readFasta2(file.path(path1, fiNa), tableOut=TRUE)
str(fasta1)

wrProteo

Proteomics Data Analysis Functions

v1.4.1
GPL-3
Authors
Wolfgang Raffelsberger [aut, cre]
Initial release
