Read protein annotation as exported from UniProt batch-conversion
This function allows reading and importing protein-ID conversion results from UniProt.
To do so, first copy/paste your query IDs into the UniProt 'Retrieve/ID mapping' field called '1. Provide your identifiers' (or upload them as a file), then verify '2. Select options'.
In a typical case of 'enst000xxx' IDs you may leave the default settings, i.e. 'Ensembl Transcript' as input and 'UniProt KB' as output. Then 'Submit' your search and retrieve the results via 'Download'; you need to specify a 'Tab-separated' format! If you download as 'Compressed', you need to decompress the .gz file before running the function readUCSCtable.
In addition, a file with UCSC annotation (Ensrnot accessions and chromosomic locations, obtained using readUCSCtable) can be integrated.
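The decompression step mentioned above can be done in base R. The following is a minimal sketch; the helper name `gunzipText` and the toy file content (made-up IDs) are assumptions for illustration and are not part of wrProteo.

```r
# Hypothetical helper (not part of wrProteo): decompress a gzipped text file using base R only
gunzipText <- function(gzPath, outPath = sub("\\.gz$", "", gzPath)) {
  stopifnot(file.exists(gzPath), outPath != gzPath)
  con <- gzfile(gzPath, open = "rt")     # open compressed file for reading as text
  on.exit(close(con))
  txt <- readLines(con)
  writeLines(txt, outPath)               # write the uncompressed copy
  invisible(outPath)
}

# Toy demonstration with a temporary file (IDs are invented)
tmpTab <- tempfile(fileext = ".tab")
tmpGz <- paste0(tmpTab, ".gz")
con <- gzfile(tmpGz, open = "wt")
writeLines(c("From\tEntry", "ENST00000000001\tP00001"), con)
close(con)

outFi <- gunzipText(tmpGz)               # path of the decompressed file
read.delim(outFi)
```

The decompressed file path returned by the helper can then be passed on as a regular tab-separated file.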
readUniProtExport( UniProtFileNa, deUcsc = NULL, targRegion = NULL, useUniPrCol = NULL, silent = FALSE, callFrom = NULL )
UniProtFileNa |
(character) name (and path) of file exported from UniProt (tabulated text file including headers) |
deUcsc |
(data.frame) object produced by |
targRegion |
(character or list) optional marking of chromosomal locations to be part of a given chromosomal target region,
may be given as character like |
useUniPrCol |
(character) optional declaration of which columns from the UniProt exported file should be used/imported (default 'EnsID','Entry','Entry.name','Status','Protein.names','Gene.names','Length'). |
silent |
(logical) suppress messages |
callFrom |
(character) allows easier tracking of message(s) produced |
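To illustrate the character form of targRegion seen in the examples of this page (e.g. "chr11:1-135,086,622"), the sketch below splits such a string into a chromosome name and a numeric start/end pair. The helper `parseRegion` is hypothetical; it only shows one plausible reading of the format and is not the parsing code used inside wrProteo.

```r
# Hypothetical helper, not the parsing code used inside wrProteo
parseRegion <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)[[1]]            # chromosome vs. position range
  if (length(parts) != 2) stop("expected format 'chr:start-end'")
  rng <- strsplit(parts[2], "-", fixed = TRUE)[[1]]       # start vs. end
  pos <- as.numeric(gsub(",", "", rng))                   # drop thousands separators
  if (length(pos) != 2 || any(is.na(pos))) stop("could not read start/end")
  list(chr = parts[1], pos = pos)
}

reg <- parseRegion("chr11:1-135,086,622")
str(reg)
```

The result resembles the list form of targRegion (chromosome name plus a `pos` vector of two positions).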
In a typical use case, chromosomic location annotation is first extracted from UCSC for the species of interest and imported to R using readUCSCtable.
However, the tables provided by UCSC don't contain UniProt IDs. Thus, an additional (batch-)conversion step needs to be added.
For this reason readUCSCtable allows writing a file with Ensembl transcript IDs, which can be converted to UniProt IDs at the UniProt site.
Then, the UniProt annotation (downloaded as tab-separated) can be imported and combined with the genomic annotation using this function.
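The idea of combining both annotation sources by a shared transcript ID can be sketched with base R `merge()`. All IDs and positions below are invented toy data, and this is not necessarily how readUniProtExport performs the combination internally.

```r
# Toy data; all IDs and positions are invented for illustration
genomic <- data.frame(
  EnsID = c("ENST0001", "ENST0002", "ENST0003"),
  chr   = c("chr11", "chr11", "chr11"),
  start = c(100, 500, 900),
  end   = c(200, 650, 1200),
  stringsAsFactors = FALSE)

uniprot <- data.frame(
  EnsID = c("ENST0001", "ENST0003"),
  Entry = c("P00001", "Q00003"),
  stringsAsFactors = FALSE)

# Keep all UniProt rows and add the genomic location where the transcript ID matches
combined <- merge(uniprot, genomic, by = "EnsID", all.x = TRUE)
combined
```

Using `all.x = TRUE` keeps proteins without a matching genomic entry (their location columns would then be NA), which is one reasonable design choice when the UniProt table is the primary object of interest.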
data.frame (with columns $EnsID, $Entry, $Entry.name, $Status, $Protein.names, $Gene.names, $Length; if deUcsc is integrated, plus: $chr, $type, $start, $end, $score, $strand, $Ensrnot, $avPos)
path1 <- system.file("extdata", package="wrProteo")
deUniProtFi <- file.path(path1, "deUniProt_hg38chr11extr.tab")
deUniPr1a <- readUniProtExport(deUniProtFi)
str(deUniPr1a)

## Workflow starting with UCSC annotation (gtf) files :
gtfFi <- file.path(path1, "UCSC_hg38_chr11extr.gtf.gz")
UcscAnnot1 <- readUCSCtable(gtfFi)

## Results of conversion at UniProt are already available (file "deUniProt_hg38chr11extr.tab")
myTargRegion <- list("chr1", pos=c(198110001,198570000))
myTargRegion2 <- "chr11:1-135,086,622"      # works equally well
deUniPr1 <- readUniProtExport(deUniProtFi, deUcsc=UcscAnnot1, targRegion=myTargRegion)

## Now UniProt IDs and genomic locations are both available :
str(deUniPr1)