Sampling and streaming records from fastq files
FastqFile represents a path and connection to a fastq file. FastqFileList is a list of such connections.
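A FastqFile pairs a path with a connection to the records stored in it. To give a rough picture of what reading through such a connection involves, here is a minimal base-R sketch. It is not ShortRead's parser. It assumes the common unwrapped layout of four lines per record (`@id`, sequence, `+`, quality). The function name `read_fastq_records` is hypothetical.

```r
## Hypothetical sketch (not ShortRead's parser): read unwrapped 4-line
## fastq records (@id / sequence / + / quality) into a data.frame.
read_fastq_records <- function(path) {
    con <- file(path, open = "r")      # open a connection to the file
    on.exit(close(con))                # always release the connection
    lines <- readLines(con)
    if (length(lines) %% 4L != 0L)
        stop("file does not contain a whole number of 4-line records")
    ## first line of each record: 1, 5, 9, ...
    idx <- seq.int(1L, by = 4L, length.out = length(lines) %/% 4L)
    data.frame(
        id       = sub("^@", "", lines[idx]),
        sequence = lines[idx + 1L],
        quality  = lines[idx + 3L],
        stringsAsFactors = FALSE
    )
}

## Usage with a tiny temporary file
fl <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "GGCA", "+", "HHHH"), fl)
recs <- read_fastq_records(fl)
recs$sequence
```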
FastqSampler draws a subsample from a fastq file. yield is the method used to extract the sample from the FastqSampler instance; a short illustration is in the example below. FastqSamplerList is a list of FastqSampler elements.
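This page does not describe how FastqSampler chooses its records. One standard way to draw a uniform sample of n records in a single pass over a file of unknown length is reservoir sampling (Algorithm R). The base-R sketch below uses that technique only as an illustration. It relies on a hypothetical `reservoir_sample_fastq()` and the same 4-line record assumption. Its `ordered` flag mimics the idea of the `ordered` argument by sorting the kept records back into file order.

```r
## Illustrative sketch only (not ShortRead's code): Algorithm R keeps a
## uniform random sample of `n` 4-line records in one pass over the file.
reservoir_sample_fastq <- function(path, n, ordered = FALSE) {
    con <- file(path, open = "r")
    on.exit(close(con))
    keep  <- vector("list", n)   # the reservoir of sampled records
    where <- integer(n)          # file position (record index) of each slot
    seen  <- 0L
    repeat {
        rec <- readLines(con, n = 4L)      # next fastq record
        if (length(rec) < 4L) break        # end of file
        seen <- seen + 1L
        if (seen <= n) {
            slot <- seen                   # fill the reservoir first
        } else {
            j <- sample.int(seen, 1L)      # uniform on 1..seen
            if (j > n) next                # this record is not kept
            slot <- j                      # overwrite a random slot
        }
        keep[[slot]] <- rec
        where[slot] <- seen
    }
    m <- min(seen, n)                      # fewer records than n: keep all
    keep <- keep[seq_len(m)]
    if (ordered)                           # restore order of appearance
        keep <- keep[order(where[seq_len(m)])]
    keep
}

## Usage: 20 records, sample 5 and return them in file order
fl <- tempfile(fileext = ".fastq")
writeLines(unlist(lapply(1:20, function(i)
    c(paste0("@r", i), "ACGT", "+", "IIII"))), fl)
set.seed(1)
s <- reservoir_sample_fastq(fl, n = 5, ordered = TRUE)
vapply(s, `[`, character(1), 1L)   # ids of the sampled records
```

Each record ends up in the reservoir with probability n / (number of records), which is what "uniform" means here.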
FastqStreamer draws successive subsets from a fastq file; a short illustration is in the example below. FastqStreamerList is a list of FastqStreamer elements.
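The streaming idea can be pictured with plain R connections. An open connection remembers its position, so each read continues where the previous one stopped, and an empty result marks the end of the file. The sketch below is a hypothetical base-R stand-in. `make_fastq_streamer()` is not a ShortRead function. On each call it returns the lines of the next `n` 4-line records, mirroring the `while (length(...))` loop used with yield.

```r
## Hypothetical base-R stand-in for streaming (not ShortRead's code).
make_fastq_streamer <- function(path, n) {
    con <- file(path, open = "r")   # stays open between calls
    list(
        ## lines of the next n records; character(0) once the file is exhausted
        yield = function() readLines(con, n = 4L * n),
        close = function() base::close(con)
    )
}

## Usage: 7 records streamed 3 at a time
fl <- tempfile(fileext = ".fastq")
writeLines(unlist(lapply(1:7, function(i)
    c(paste0("@r", i), "ACGT", "+", "IIII"))), fl)
s <- make_fastq_streamer(fl, n = 3)
sizes <- integer(0)
while (length(chunk <- s$yield())) {
    sizes <- c(sizes, length(chunk) %/% 4L)   # records in this chunk
}
s$close()
sizes   # 3 3 1
```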
## FastqFile and FastqFileList
FastqFile(con, ...)
FastqFileList(..., class="FastqFile")
## S3 method for class 'ShortReadFile'
open(con, ...)
## S3 method for class 'ShortReadFile'
close(con, ...)
## S4 method for signature 'FastqFile'
readFastq(dirPath, pattern=character(), ...)
## S4 method for signature 'FastqFile'
countFastq(dirPath, pattern=character(), ...)

## FastqSampler and FastqStreamer
FastqSampler(con, n=1e6, readerBlockSize=1e8, verbose=FALSE, ordered = FALSE)
FastqSamplerList(..., n=1e6, readerBlockSize=1e8, verbose=FALSE, ordered = FALSE)
FastqStreamer(con, n, readerBlockSize=1e8, verbose=FALSE)
FastqStreamerList(..., n, readerBlockSize=1e8, verbose=FALSE)
yield(x, ...)
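con, dirPath
A character string naming a connection, or (for con) an R connection.
n
For FastqSampler, the size of the sample (number of records) to be drawn. For FastqStreamer, a numeric(1) giving the number of successive records returned by each yield, or an IRanges delimiting the (1-based) indices of the records returned by each yield.
readerBlockSize
The number of bytes or characters to be read at one time; a smaller readerBlockSize reduces memory requirements but is less efficient.
verbose
Display progress.
ordered
logical(1) indicating whether sampled reads should be returned in the same order as they were encountered in the file.
x
An instance of the FastqSampler or FastqStreamer class.
...
Additional arguments. For FastqFileList, FastqSamplerList, or FastqStreamerList, either a character vector of paths to fastq files or several instances of the corresponding FastqFile, FastqSampler, or FastqStreamer class.
pattern
Ignored.
class
For developer use, to specify the underlying class contained in the FastqFileList.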
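The trade-off described for readerBlockSize applies to any block-wise reader. The hypothetical sketch below is not ShortRead's reader. It counts records by reading a file `block_size` bytes at a time, so at most one block is held in memory, and smaller blocks mean more read calls. It assumes newline-terminated, 4-line records.

```r
## Hypothetical sketch (not ShortRead's reader): count 4-line records by
## reading `block_size` bytes at a time; memory use is bounded by one block.
count_records_blockwise <- function(path, block_size) {
    con <- file(path, open = "rb")
    on.exit(close(con))
    newlines <- 0
    calls <- 0L
    repeat {
        block <- readBin(con, what = "raw", n = block_size)
        if (length(block) == 0L) break          # end of file
        calls <- calls + 1L
        newlines <- newlines + sum(as.integer(block) == 10L)  # count '\n'
    }
    list(records = newlines / 4, read_calls = calls)
}

## Usage: same answer, different number of reads
fl <- tempfile(fileext = ".fastq")
writeLines(unlist(lapply(1:10, function(i)
    c(paste0("@r", i), "ACGT", "+", "IIII"))), fl)
small <- count_records_blockwise(fl, block_size = 16)
large <- count_records_blockwise(fl, block_size = 1e4)
c(small$read_calls, large$read_calls)
```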
Available classes include:
FastqFile
A file path and connection to a fastq file.
FastqFileList
A list of FastqFile instances.
FastqSampler
Uniformly sample records from a fastq file.
FastqStreamer
Iterate over a fastq file, returning successive parts of the file.
The following methods are available to users:
readFastq,FastqFile-method
See ?readFastq.
writeFastq,ShortReadQ,FastqFile-method
See ?writeFastq and ?"writeFastq,ShortReadQ,FastqFile-method".
countFastq,FastqFile-method
See ?countFastq.
yield
Draw a single sample (FastqSampler) or the next set of records (FastqStreamer) from the instance. Operationally, for a FastqSampler this requires visiting the underlying data (e.g., file) that the instance represents, which may be time consuming.
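Both countFastq and yield on a FastqSampler have to visit the underlying file. To illustrate such a single pass, the hypothetical `count_fastq_totals()` below accumulates the number of records, nucleotides, and quality characters one chunk at a time. The column names are chosen for this sketch, so consult ?countFastq for what ShortRead actually reports.

```r
## Hypothetical one-pass summary (not countFastq itself): read
## `chunk_records` 4-line records at a time and accumulate totals.
count_fastq_totals <- function(path, chunk_records = 1e5) {
    con <- file(path, open = "r")
    on.exit(close(con))
    records <- 0; nucleotides <- 0; scores <- 0
    repeat {
        lines <- readLines(con, n = 4L * chunk_records)
        if (length(lines) == 0L) break
        k <- length(lines) %/% 4L                      # records in this chunk
        seq_idx  <- seq.int(2L, by = 4L, length.out = k)  # sequence lines
        qual_idx <- seq.int(4L, by = 4L, length.out = k)  # quality lines
        records     <- records + k
        nucleotides <- nucleotides + sum(nchar(lines[seq_idx]))
        scores      <- scores + sum(nchar(lines[qual_idx]))
    }
    data.frame(records = records, nucleotides = nucleotides, scores = scores)
}

## Usage: three reads of different lengths, two records per chunk
fl <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT",  "+", "IIII",
             "@r2", "AC",    "+", "II",
             "@r3", "ACGTA", "+", "IIIII"), fl)
totals <- count_fastq_totals(fl, chunk_records = 2)
totals
```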
FastqSampler and FastqStreamer use OpenMP threads (when available) while creating the return value. This can cause problems when a process is already running on multiple threads, e.g., with an error message like

libgomp: Thread creation failed: Resource temporarily unavailable

One solution is to disable threading by preceding the problematic code with the following snippet:
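## save the current OpenMP thread count and switch to a single thread
nthreads <- .Call(ShortRead:::.set_omp_threads, 1L)
## restore the saved thread count when the enclosing function exits
on.exit(.Call(ShortRead:::.set_omp_threads, nthreads))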
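on.exit() registers code to run when the enclosing function exits, so the snippet above is meant to sit inside a function body. Below is a minimal sketch of that pattern. It assumes, as the snippet implies, that the setter returns the previous thread count. `with_one_thread()` is hypothetical and takes the setter as an argument, so it can be exercised with a mock instead of `.Call(ShortRead:::.set_omp_threads, n)`.

```r
## Hypothetical helper (not part of ShortRead): evaluate `expr` with the
## thread count set to 1, restoring the previous count on exit (even on error).
## `set_threads(n)` must set the count to n and return the previous count.
with_one_thread <- function(expr, set_threads) {
    old <- set_threads(1L)
    on.exit(set_threads(old), add = TRUE)
    expr   # lazy evaluation: `expr` is evaluated here, after the switch
}

## Demonstration with a mock setter, so the sketch runs without ShortRead
state <- new.env()
state$threads <- 8L
mock_set_threads <- function(n) {
    old <- state$threads
    state$threads <- n
    old
}
during <- with_one_thread(state$threads, mock_set_threads)
after  <- state$threads
c(during, after)
```

With ShortRead, the setter would be `function(n) .Call(ShortRead:::.set_omp_threads, n)`, following the snippet above.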
library(ShortRead)

sp <- SolexaPath(system.file('extdata', package='ShortRead'))
fl <- file.path(analysisPath(sp), "s_1_sequence.txt")

f <- FastqFile(fl)
rfq <- readFastq(f)
close(f)

f <- FastqSampler(fl, 50)
yield(f)    # sample of size n=50
yield(f)    # independent sample of size 50
close(f)

## Return sample as ordered in original file
f <- FastqSampler(fl, 50, ordered=TRUE)
yield(f)
close(f)

f <- FastqStreamer(fl, 50)
yield(f)    # records 1 to 50
yield(f)    # records 51 to 100
close(f)

## iterating over an entire file
f <- FastqStreamer(fl, 50)
while (length(fq <- yield(f))) {
    ## do work here
    print(length(fq))
}
close(f)

## iterating over IRanges
rng <- IRanges(c(50, 100, 200), width=10:8)
f <- FastqStreamer(fl, rng)
while (length(fq <- yield(f))) {
    print(length(fq))
}
close(f)

## Internal fields, methods, and help; for developers
ShortRead:::.FastqSampler_g$methods()
ShortRead:::.FastqSampler_g$fields()
ShortRead:::.FastqSampler_g$help("yield")
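In the IRanges example, each range is assumed to delimit the 1-based, inclusive indices of the records returned by one yield. Under that assumption, the base-R sketch below lists the records each yield would cover. It uses a hypothetical `ranges_to_records()` and needs no IRanges. For a plain 4-line fastq file, it also gives the line on which each chunk starts.

```r
## Hypothetical sketch: expand (start, width) ranges into record indices,
## assuming 1-based, inclusive ranges as in rng <- IRanges(c(50, 100, 200), width=10:8)
ranges_to_records <- function(start, width) {
    stopifnot(length(start) == length(width), all(width > 0))
    Map(function(s, w) seq.int(s, length.out = w), start, width)
}

chunks <- ranges_to_records(start = c(50, 100, 200), width = 10:8)
lengths(chunks)                        # 10 9 8 records per yield
vapply(chunks, range, numeric(2))      # first and last record of each chunk

## In a plain 4-line fastq file, record r occupies lines 4*r - 3 to 4*r
first_line <- 4 * vapply(chunks, min, numeric(1)) - 3
first_line                             # 197 397 797
```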