plotComplexity

Plot sequence complexity profile of a fastq file.


Description

This function plots a histogram of the distribution of sequence complexities, expressed as effective numbers of kmers as determined by seqComplexity. By default, kmers of size 2 are used, in which case a perfectly random sequence will approach an effective kmer number of 16 = 4 (nucleotides) ^ 2 (kmer size).
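
To make the "effective number of kmers" concrete, here is a minimal base-R sketch. It assumes the effective number is the exponential of the Shannon entropy of the observed kmer frequencies. The helper name `effective_kmers` is hypothetical and is not part of dada2; seqComplexity may differ in details such as how ambiguous bases are handled.

```r
# Hypothetical helper (not part of dada2): effective number of kmers,
# assumed here to be exp(Shannon entropy) of the kmer frequencies.
effective_kmers <- function(x, k = 2) {
  n <- nchar(x)
  if (n < k) return(NA_real_)            # sequence too short to hold one kmer
  starts <- seq_len(n - k + 1)           # start position of every overlapping kmer
  kmers <- substring(x, starts, starts + k - 1)
  p <- as.numeric(table(kmers)) / length(kmers)  # observed kmer frequencies
  exp(-sum(p * log(p)))
}

# A homopolymer contains a single distinct 2-mer.
low <- effective_kmers("AAAAAAAA")            # 1

# This 17-nt sequence contains each of the 16 dinucleotides exactly once.
high <- effective_kmers("AACAGATCCGCTGGTTA")  # 16, up to floating-point rounding
```

Low-complexity reads therefore pile up at the left of the histogram, while typical amplicon reads sit closer to the upper bound of 4^kmerSize.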

Usage

plotComplexity(
  fl,
  kmerSize = 2,
  window = NULL,
  by = 5,
  n = 1e+05,
  bins = 100,
  aggregate = FALSE,
  ...
)

Arguments

fl

(Required). character. File path(s) to fastq or fastq.gz file(s).

kmerSize

(Optional). Default 2. The size of the kmers (or "oligonucleotides" or "words") to use.

window

(Optional). Default NULL. The width in nucleotides of the moving window. If NULL the whole sequence is used.

by

(Optional). Default 5. The step size in nucleotides between each moving window tested.

n

(Optional). Default 100,000. The number of records to sample from the fastq file.

bins

(Optional). Default 100. The number of bins to use for the histogram.

aggregate

(Optional). Default FALSE. If TRUE, compute an aggregate complexity profile for all fastq files provided.

...

(Optional). Arguments passed on to geom_histogram.
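
The interaction of window and by can be illustrated with a small base-R sketch of where moving windows would start. The helper `window_starts` is hypothetical and only shows one plausible layout (full-width windows only, starting at position 1); how seqComplexity treats a trailing partial window, or how it summarizes the per-window values, is not stated here.

```r
# Hypothetical helper (not part of dada2): start positions of full-width
# moving windows of width `window`, advanced `by` nucleotides at a time.
window_starts <- function(seq_length, window, by = 5) {
  if (window > seq_length) return(numeric(0))  # no full window fits
  seq(1, seq_length - window + 1, by = by)
}

read <- "ACGTACGTACGTACGTACGTACGTA"            # 25 nt
starts <- window_starts(nchar(read), window = 10, by = 5)
starts                                         # 1 6 11 16

# The subsequences that each window would cover.
windows <- substring(read, starts, starts + 10 - 1)
```

A smaller by tests more windows per read at a higher computational cost; a larger by is faster but can step over a short low-complexity stretch.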

Value

A ggplot2 object. It is rendered to the default graphics device when printed, or it can be stored and further modified. See ggsave for additional options.

See Also

Examples

plotComplexity(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
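
Because the return value is a ggplot2 object, it can be stored, annotated and saved. The following is a sketch that assumes the dada2 and ggplot2 packages are installed; the reference line at 16 and the output file are illustrative choices, not part of the function.

```r
library(dada2)
library(ggplot2)

fq <- system.file("extdata", "sam1F.fastq.gz", package = "dada2")

# Store the plot instead of printing it, using documented arguments only.
p <- plotComplexity(fq, kmerSize = 2, n = 1e4, bins = 50)

# Mark the theoretical maximum for 2-mers: 4^2 = 16 (illustrative annotation).
p2 <- p + geom_vline(xintercept = 4^2, linetype = "dashed")

# Save to a temporary file; width and height are in inches by default.
out <- tempfile(fileext = ".png")
ggsave(out, plot = p2, width = 6, height = 4)
```

Reads with an effective kmer number far below the dashed line are candidates for low-complexity filtering.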

dada2

Accurate, high-resolution sample inference from amplicon sequencing data

v1.18.0
LGPL-3
Authors
Benjamin Callahan <benjamin.j.callahan@gmail.com>, Paul McMurdie, Susan Holmes
Initial release
2020-08-07