dada2: plotComplexity – R documentation

plotComplexity

Plot sequence complexity profile of a fastq file.

Description

This function plots a histogram of the distribution of sequence complexities, expressed as the effective number of kmers as determined by seqComplexity. By default, kmers of size 2 are used, in which case a perfectly random sequence will approach an effective kmer number of 16 = 4 (nucleotides) ^ 2 (kmer size).
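
To make the "effective number of kmers" concrete, here is a minimal base-R sketch. It assumes the measure is the Shannon richness of the kmer frequencies, i.e. exp of the Shannon entropy. The helper `effective_kmers` is hypothetical and is not part of dada2; seqComplexity may differ in details such as the handling of ambiguous bases.

```r
# Hypothetical helper, not a dada2 function: effective number of kmers,
# assumed here to be exp(Shannon entropy) of the observed kmer frequencies.
effective_kmers <- function(sq, k = 2) {
  n <- nchar(sq)
  if (n < k) return(NA_real_)                      # too short to contain any kmer
  starts <- seq_len(n - k + 1)
  kmers <- substring(sq, starts, starts + k - 1)   # all overlapping kmers
  p <- as.numeric(table(kmers)) / length(kmers)    # relative kmer frequencies
  exp(-sum(p * log(p)))
}

# A homopolymer contains a single dinucleotide ("AA"), so its value is 1.
low <- effective_kmers(strrep("A", 50))

# This 17-nt sequence contains each of the 16 dinucleotides exactly once,
# so it reaches the maximum of 4^2 = 16.
high <- effective_kmers("AACAGATCCGCTGGTTA")
```

Under this assumption, low-complexity reads such as homopolymer runs fall near the left edge of the histogram, while typical amplicon reads sit closer to the upper limit of 16.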

Usage

plotComplexity(
  fl,
  kmerSize = 2,
  window = NULL,
  by = 5,
  n = 1e+05,
  bins = 100,
  aggregate = FALSE,
  ...
)

Arguments

`fl`	(Required). `character`. File path(s) to fastq or fastq.gz file(s).
`kmerSize`	(Optional). Default 2. The size of the kmers (or "oligonucleotides" or "words") to use.
`window`	(Optional). Default NULL. The width in nucleotides of the moving window. If NULL the whole sequence is used.
`by`	(Optional). Default 5. The step size in nucleotides between each moving window tested.
`n`	(Optional). Default 100,000. The number of records to sample from the fastq file.
`bins`	(Optional). Default 100. The number of bins to use for the histogram.
`aggregate`	(Optional). Default FALSE. If TRUE, compute an aggregate complexity profile for all fastq files provided.
`...`	(Optional). Arguments passed on to `geom_histogram`.

Value

A ggplot2 object. It is rendered to the default graphics device when printed, or it can be stored and further modified. See ggsave for options when saving the plot to a file.

Examples

plotComplexity(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
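
Because the return value is a ggplot2 object, it can be stored, modified and saved. The sketch below assumes that dada2 and ggplot2 are installed and that the second example file `sam2F.fastq.gz` ships with the package alongside `sam1F.fastq.gz`. The plot title and output file name are illustrative choices.

```r
library(dada2)
library(ggplot2)

# Two example fastq files shipped with dada2 (second file name is an assumption)
fls <- system.file("extdata", c("sam1F.fastq.gz", "sam2F.fastq.gz"),
                   package = "dada2")

# Store the plot instead of printing it; use fewer histogram bins than the default
p <- plotComplexity(fls, kmerSize = 2, bins = 50)

# Modify the stored plot with ordinary ggplot2 layers
p <- p + ggtitle("Dinucleotide complexity of example reads")

# Save to a temporary location (illustrative file name)
out <- file.path(tempdir(), "complexity.pdf")
ggsave(out, plot = p, width = 6, height = 4)
```

Setting `aggregate = TRUE` in the same call would combine the files into a single histogram.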

dada2

Accurate, high-resolution sample inference from amplicon sequencing data

v1.18.0

LGPL-3

Authors

Benjamin Callahan <benjamin.j.callahan@gmail.com>, Paul McMurdie, Susan Holmes

Initial release

2020-08-07