
msaTrim

Trimming multiple sequence alignments


Description

Trimming a multiple sequence alignment by discarding columns with too many gaps.

Usage

msaTrim(msa, gap.end = 0.5, gap.mid = 0.9)

Arguments

msa

A fasta object containing a multiple alignment.

gap.end

Fraction of gaps tolerated at the ends of the alignment (0-1).

gap.mid

Fraction of gaps tolerated inside the alignment (0-1).

Details

A multiple alignment is trimmed by removing columns with too many indels (gap symbols). Any column whose fraction of gaps is larger than gap.mid is discarded. For this reason, gap.mid should always be fairly close to 1.0; otherwise too many columns may be discarded, destroying the alignment.
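The gap.mid rule can be illustrated with a minimal base R sketch. This is not the package's own code: the toy alignment, the use of "-" as the gap symbol, and the variable names are assumptions made for illustration only.

```r
# Toy alignment: four sequences of equal length; "-" is assumed to mark a gap.
aln <- c("--ACG-TA-",
         "-TACG-TAC",
         "--ACGGTA-",
         "--AC--TA-")

# Character matrix with one row per sequence and one column per position.
cmat <- do.call(rbind, strsplit(aln, split = ""))

# Fraction of gaps in each alignment column.
gap.frac <- colMeans(cmat == "-")

# Keep only the columns whose gap fraction does not exceed gap.mid.
gap.mid <- 0.9
keep <- gap.frac <= gap.mid
trimmed <- apply(cmat[, keep, drop = FALSE], 1, paste, collapse = "")
print(trimmed)
```

Here only the first column, which consists entirely of gaps, exceeds gap.mid = 0.9 and is dropped. Lowering gap.mid to 0.5 would also remove the three columns with a gap fraction of 0.75, which shows how quickly a small threshold erodes the alignment.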

Due to the heuristics of multiple alignment methods, both ends of the alignment tend to be uncertain, and most of the trimming should be done at the ends. Starting at each end, columns are discarded as long as their fraction of gaps exceeds gap.end. Typically gap.end can be much smaller than gap.mid, but if it is set too low, you risk discarding all columns!
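The end trimming described above can be sketched in base R as follows. The helper trim_ends is hypothetical and not part of microseq. It assumes a vector of per-column gap fractions is already available, and it returns the indices of the columns that survive trimming from both ends.

```r
# Hypothetical helper (not part of microseq): given per-column gap fractions,
# return the indices of the columns that remain after trimming both ends.
trim_ends <- function(gap.frac, gap.end = 0.5) {
  ok <- which(gap.frac <= gap.end)
  if (length(ok) == 0) return(integer(0))  # every column would be discarded
  # Everything before the first and after the last acceptable column is dropped.
  seq(min(ok), max(ok))
}

# Made-up gap fractions for a nine-column alignment.
gap.frac <- c(1, 0.75, 0, 0, 0.25, 0.75, 0, 0, 0.75)
print(trim_ends(gap.frac, gap.end = 0.5))

# A gap.end that is too strict for the data leaves nothing.
print(trim_ends(c(0.5, 0.25, 0.25, 0.5), gap.end = 0.1))
```

In the first call the interior column with gap fraction 0.75 is kept, because end trimming stops at the first acceptable column from each side. Interior columns are judged against gap.mid only.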

Value

The trimmed alignment is returned as a fasta object.

Author(s)

Lars Snipen.

See Also

Examples

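library(microseq)
library(stringr)  # provides str_length()

# Read the small example alignment shipped with the package
msa.file <- file.path(path.package("microseq"), "extdata", "small.msa")
msa <- readFasta(msa.file)
print(str_length(msa$Sequence))          # alignment width before trimming
msa.trimmed <- msaTrim(msa)
print(str_length(msa.trimmed$Sequence))  # alignment width after trimming
msa.mat <- msa2mat(msa)  # for use with ape::as.DNAbin(msa.mat)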

microseq

Basic Biological Sequence Handling

v2.1.4
GPL-2
Authors
Lars Snipen, Kristian Hovde Liland
Initial release
2021-01-25
