
dustyScore

Summarize low-complexity sequences


Description

dustyScore identifies low-complexity sequences, in a manner inspired by the dust implementation in BLAST.

Usage

dustyScore(x, batchSize=NA, ...)

Arguments

x

A DNAStringSet object, or object derived from ShortRead, containing a collection of reads to be summarized.

batchSize

NA or an integer(1) vector indicating the maximum number of reads to be processed at any one time.

...

Additional arguments, not currently used.

Details

The following methods are defined:

dustyScore

signature(x = "DNAStringSet"): operating on an object derived from class DNAStringSet.

dustyScore

signature(x = "ShortRead"): operating on the sread of an object derived from class ShortRead.

The dust-like calculations used here follow the implementation described at https://stat.ethz.ch/pipermail/bioc-sig-sequencing/2009-February/000170.html. Scores range from 0 (all triplets unique) to the square of the width of the longest sequence (reached by poly-A, -C, -G, or -T reads).
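To make the triplet-based idea concrete, here is a minimal base R sketch of a dust-like score. It is an illustration under stated assumptions, not the ShortRead implementation: the helper name dust_like_score and the choice to let each distinct triplet contribute (count - 1)^2 are assumptions, and the exact formula used by dustyScore may differ.

```r
# Illustrative dust-like score in base R; NOT the ShortRead implementation.
# Assumption: every distinct triplet contributes (count - 1)^2, so only
# repeated triplets add to the score.
dust_like_score <- function(read) {
    n <- nchar(read)
    if (n < 3L) return(0)
    # Enumerate the n - 2 overlapping triplets of the read
    starts <- seq_len(n - 2L)
    triplets <- substring(read, starts, starts + 2L)
    # Tally each distinct triplet and penalize repeats quadratically
    counts <- as.vector(table(triplets))
    sum((counts - 1)^2)
}

reads <- c(unique_triplets = "ACGTTGCA", poly_a = "AAAAAAAAAA")
scores <- vapply(reads, dust_like_score, numeric(1))
scores
```

Under this assumed formula, a read whose triplets are all distinct scores 0, while a homopolymer of width w has a single triplet repeated w - 2 times and scores (w - 3)^2, which grows roughly with the square of the width.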

The batchSize argument can be used to reduce the memory requirements of the algorithm by processing the x argument in batches of the specified size. Smaller batch sizes use less memory, but are computationally less efficient.
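The batching idea can be sketched in base R as follows. This is a hypothetical illustration, not the package's code: score_in_batches and the stand-in scoring function widths are invented names, and the sketch only assumes that the score of each read is independent of the other reads, so that batching changes memory use but not the result.

```r
# Hypothetical helper: apply score_fun to x in chunks of at most batch_size
# elements, then concatenate the per-chunk results in the original order.
score_in_batches <- function(x, batch_size = NA, score_fun) {
    # NA means "process everything at once"
    if (is.na(batch_size)) return(score_fun(x))
    batch_id <- ceiling(seq_along(x) / batch_size)
    unlist(lapply(split(x, batch_id), score_fun), use.names = FALSE)
}

# Stand-in per-read scoring function (sequence width), used only to
# demonstrate the batching; it is not a complexity score.
widths <- function(x) nchar(x)

reads <- c("ACGT", "AAAAAA", "GATTACA", "CC", "TTTGGG")
all_at_once <- score_in_batches(reads, NA, widths)
in_batches  <- score_in_batches(reads, 2L, widths)
identical(all_at_once, in_batches)
```

Only one chunk of intermediate results is held at a time, which is the source of the memory saving; the extra function calls per chunk are why smaller batches are slower overall.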

Value

A vector of numeric scores, with length equal to the length of x.

Author(s)

Herve Pages (code); Martin Morgan

References

Morgulis, Gertz, Schaffer and Agarwala, 2006. WindowMasker: window-based masker for sequenced genomes, Bioinformatics 22: 134-141.

Examples

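library(ShortRead)
# Locate the example Solexa run shipped with the package and read lane 1
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
# Smallest and largest dust-like scores across all reads
range(dustyScore(rfq))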

ShortRead

FASTQ input and manipulation

v1.48.0
Artistic-2.0
Authors
Martin Morgan, Michael Lawrence, Simon Anders
Initial release
