Functions for user-created and built-in ShortRead filters
These functions create user-defined (srFilter) or built-in instances of SRFilter objects. Filters can be applied to objects from ShortRead, returning a logical vector that can be used to subset the object, keeping only those components that satisfy the filter.
srFilter(fun, name = NA_character_, ...)
## S4 method for signature 'missing'
srFilter(fun, name=NA_character_, ...)
## S4 method for signature 'function'
srFilter(fun, name=NA_character_, ...)

compose(filt, ..., .name)

idFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
         .name="idFilter")
occurrenceFilter(min=1L, max=1L,
                 withSread=c(NA, TRUE, FALSE),
                 duplicates=c("head", "tail", "sample", "none"),
                 .name=.occurrenceName(min, max, withSread, duplicates))
nFilter(threshold=0L, .name="CleanNFilter")
polynFilter(threshold=0L, nuc=c("A", "C", "T", "G", "other"),
            .name="PolyNFilter")
dustyFilter(threshold=Inf, batchSize=NA, .name="DustyFilter")
srdistanceFilter(subject=character(0), threshold=0L,
                 .name="SRDistanceFilter")

##
## legacy filters for ungapped alignments
##

chromosomeFilter(regex=character(0), fixed=FALSE, exclude=FALSE,
                 .name="ChromosomeFilter")
positionFilter(min=-Inf, max=Inf, .name="PositionFilter")
strandFilter(strandLevels=character(0), .name="StrandFilter")
alignQualityFilter(threshold=0L, .name="AlignQualityFilter")
alignDataFilter(expr=expression(), .name="AlignDataFilter")
fun |
An object of class |
name |
A |
filt |
A |
.name |
An optional |
regex |
Either |
fixed |
|
exclude |
|
min |
|
max |
|
strandLevels |
Either |
withSread |
A |
duplicates |
Either |
threshold |
A |
nuc |
A |
batchSize |
|
subject |
A |
expr |
A |
... |
Additional arguments for subsequent methods; these arguments are not currently used. |
srFilter allows users to construct their own filters. The fun argument to srFilter must be a function accepting a single argument x and returning a logical vector that can be used to select elements of x satisfying the filter with x[fun(x)]. The signature(fun="missing") method creates a default filter that returns a vector of TRUE values with length equal to length(x).
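As a minimal sketch of a user-defined filter, the following builds a small ShortRead object from made-up sequences and keeps only the reads of a minimum width. The read data, the width cutoff, and the filter name "MinWidth6" are illustrative assumptions, not part of the package.

```r
library(ShortRead)

# Three toy reads of different widths (made-up data for illustration)
reads <- ShortRead(
    sread = DNAStringSet(c("ACGTACGT", "ACG", "ACGTAC")),
    id = BStringSet(c("read1", "read2", "read3"))
)

# Hypothetical filter: keep reads at least 6 nucleotides wide.
# The function takes a single argument 'x' and returns one logical per read.
minWidth <- srFilter(function(x) width(x) >= 6L, name = "MinWidth6")

keep <- minWidth(reads)   # logical vector, one element per read
passing <- reads[keep]    # reads satisfying the filter

# Calling srFilter() without 'fun' gives the default pass-everything filter
allPass <- srFilter()
everything <- reads[allPass(reads)]
```

Here passing should hold the two reads of width 6 or more, while everything retains all three.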
compose constructs a new filter from one or more existing filters. The result is a filter that returns a logical vector with indices corresponding to components of x that pass all filters. If not provided, the name of the filter consists of the names of all component filters, each separated by " o ".
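A short sketch of composing two built-in filters is shown here. The toy reads and the threshold values are assumptions chosen so that exactly one read clearly passes both filters; they are not recommended settings.

```r
library(ShortRead)

# Toy reads (made-up): read2 contains two 'N' symbols, read3 is all 'A'
reads <- ShortRead(
    sread = DNAStringSet(c("ACGTACGT", "ACGNNCGT", "AAAAAAAA")),
    id = BStringSet(c("read1", "read2", "read3"))
)

# Illustrative thresholds, chosen away from the boundary cases
fewN <- nFilter(threshold = 1L)                    # rejects read2
notPolyA <- polynFilter(threshold = 4L, nuc = "A") # rejects read3

# A single filter that passes only reads accepted by both components
both <- compose(fewN, notPolyA)
name(both)                # default name built from the component names

kept <- reads[both(reads)]
```

Supplying .name to compose replaces the default name assembled from the component filters.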
The remaining functions documented on this page are built-in filters that accept an argument x and return a logical vector of length(x) indicating which components of x satisfy the filter.
idFilter selects elements satisfying grep(regex, id(x), fixed=fixed).
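For illustration, a sketch of idFilter applied to reads whose identifiers carry a lane prefix. The identifiers and the regular expression are made up for this example.

```r
library(ShortRead)

# Toy reads with made-up identifiers encoding a lane prefix
reads <- ShortRead(
    sread = DNAStringSet(c("ACGT", "TTGA", "GGCC")),
    id = BStringSet(c("lane1_read1", "lane2_read1", "lane1_read2"))
)

# Select reads whose identifier starts with "lane1_"
lane1 <- idFilter(regex = "^lane1_")
lane1Reads <- reads[lane1(reads)]
```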
chromosomeFilter selects elements satisfying grep(regex, chromosome(x), fixed=fixed).
positionFilter selects elements satisfying min <= position(x) <= max.
strandFilter selects elements satisfying match(strand(x), strandLevels, nomatch=0) > 0.
occurrenceFilter selects elements that occur >=min and <=max times. withSread determines how reads will be treated: TRUE to include the sread, chromosome, strand, and position when determining occurrence, FALSE to include chromosome, strand, and position, and NA to include only sread. The default is withSread=NA. duplicates determines how reads that occur more than max times are treated. head selects the first max reads of each set of duplicates, tail the last max reads, and sample a random sample of max reads. none removes all reads represented more than max times. The user can also provide a function (as used by tapply) of a single argument to select amongst reads.
nFilter selects elements with fewer than threshold 'N' symbols in each element of sread(x).
polynFilter selects elements with fewer than threshold copies of any nucleotide indicated by nuc.
dustyFilter selects elements with high sequence complexity, as characterized by their dustyScore. This emulates the dust command from WindowMasker software. Calculations can be memory intensive; use batchSize to process the argument to dustyFilter in batches of the specified size.
srdistanceFilter selects elements at an edit distance greater than threshold from all sequences in subject.
alignQualityFilter selects elements with alignQuality(x) greater than threshold.
alignDataFilter selects elements with pData(alignData(x)) satisfying expr. expr should be formulated as though it were to be evaluated as eval(expr, pData(alignData(x))).
srFilter returns an object of class SRFilter.
Built-in filters return a logical vector of length(x), with TRUE indicating components that pass the filter.
Martin Morgan <mtmorgan@fhcrc.org>
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
aln <- readAligned(sp, "s_2_export.txt") # Solexa export file, as example

# a 'chromosome 5' filter
filt <- chromosomeFilter("chr5.fa")
aln[filt(aln)]
# filter during input
readAligned(sp, "s_2_export.txt", filter=filt)

# x- and y- coordinates stored in alignData, when source is SolexaExport
xy <- alignDataFilter(expression(abs(x-500) > 200 & abs(y-500) > 200))
aln[xy(aln)]

# both filters as a single filter
chr5xy <- compose(filt, xy)
aln[chr5xy(aln)]

# both filters as a collection
filters <- c(filt, xy)
subsetByFilter(aln, filters)
summary(filters, aln)

# read, chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=TRUE, duplicates="none")(aln)]
# reads occurring exactly once
aln[occurrenceFilter(withSread=NA, duplicates="none")(aln)]
# chromosome, strand, position tuples occurring exactly once
aln[occurrenceFilter(withSread=FALSE, duplicates="none")(aln)]

# custom filter: minimum calibrated base call quality >20
goodq <- srFilter(function(x) {
    apply(as(quality(x), "matrix"), 1, min, na.rm=TRUE) > 20
}, name="GoodQualityBases")
goodq
aln[goodq(aln)]