SplitStrelkaSBSVCF

Split an in-memory Strelka VCF into SBS, DBS, and variants involving > 2 consecutive bases


Description

SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving > 2 consecutive bases are rare, so this function just records them. These would be variants such as ATG>CCT, AGAT>TCTA, ...

Usage

SplitStrelkaSBSVCF(vcf.df, max.vaf.diff = 0.02, name.of.VCF = NULL)

Arguments

vcf.df

An in-memory data frame containing the contents of a Strelka VCF file.

max.vaf.diff

The maximum allowed difference in variant allele frequency (VAF) between adjacent SBSs; the default value is 0.02. If the absolute difference between the VAFs of adjacent SBSs is greater than max.vaf.diff, then these adjacent SBSs are likely to be "merely" asynchronous single base mutations, as opposed to a simultaneous doublet mutation or a variant involving more than two consecutive bases.
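The VAF comparison described above can be illustrated with a minimal base R sketch. It is not the ICAMS implementation. The toy data, the column names (CHROM, POS, REF, ALT, VAF) and the object pairs are assumptions made for illustration, and the rows are assumed to be sorted by chromosome and position.

```r
# Hypothetical toy data; the column names are assumptions for illustration
# and are not necessarily the ones ICAMS uses internally.
vcf.df <- data.frame(
  CHROM = c("1", "1", "1", "1", "2"),
  POS   = c(100L, 101L, 500L, 501L, 42L),
  REF   = c("C", "C", "A", "T", "G"),
  ALT   = c("T", "T", "G", "G", "A"),
  VAF   = c(0.30, 0.31, 0.40, 0.10, 0.25),
  stringsAsFactors = FALSE
)

max.vaf.diff <- 0.02

# Compare each row with the row that follows it (rows assumed sorted).
n <- nrow(vcf.df)
first <- seq_len(n - 1)
second <- first + 1

# Two SBSs are adjacent if they lie on the same chromosome at
# consecutive positions.
adjacent <- vcf.df$CHROM[first] == vcf.df$CHROM[second] &
  vcf.df$POS[second] - vcf.df$POS[first] == 1L

# Absolute VAF difference between the two members of each pair.
vaf.diff <- abs(vcf.df$VAF[first] - vcf.df$VAF[second])

# A pair is a candidate doublet only if it is adjacent AND the VAFs agree
# to within max.vaf.diff; otherwise the two SBSs are treated as separate.
pairs <- data.frame(
  CHROM = vcf.df$CHROM[first],
  POS = vcf.df$POS[first],
  adjacent = adjacent,
  vaf.diff = vaf.diff,
  candidate.DBS = adjacent & vaf.diff <= max.vaf.diff
)
print(pairs)
```

In this toy input, the pair at positions 100 and 101 has similar VAFs and is flagged as a candidate doublet. The pair at positions 500 and 501 is adjacent, but its VAFs differ by 0.30, so its members would remain separate SBSs.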

name.of.VCF

Name of the VCF file.

Value

A list of in-memory objects with the elements:

  1. SBS.vcf: Data frame of pure SBS mutations – no DBS or 3+BS mutations.

  2. DBS.vcf: Data frame of pure DBS mutations – no SBS or 3+BS mutations.

  3. discarded.variants: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.
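As a hedged sketch of how a caller might consume a return value of this shape, the following uses a mock list with the documented element names. The contents of the list and the helper CountVariants are hypothetical; only the element names come from the description above.

```r
# Mock return value with the documented element names; the contents are
# invented for illustration and were not produced by ICAMS.
split.result <- list(
  SBS.vcf = data.frame(CHROM = c("1", "2"), POS = c(500L, 42L)),
  DBS.vcf = data.frame(CHROM = "1", POS = 100L),
  discarded.variants = NULL
)

# Hypothetical helper: number of rows, treating NULL as zero variants.
CountVariants <- function(x) if (is.null(x)) 0L else nrow(x)

elements <- c("SBS.vcf", "DBS.vcf", "discarded.variants")
counts <- vapply(split.result[elements], CountVariants, integer(1))
print(counts)

# discarded.variants is non-NULL only when variants were excluded, so
# check for NULL before reading its discarded.reason column.
if (!is.null(split.result$discarded.variants)) {
  print(table(split.result$discarded.variants$discarded.reason))
}
```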


ICAMS

In-Depth Characterization and Analysis of Mutational Signatures ('ICAMS')

v2.3.10
GPL-3 | file LICENSE
Authors
Steve Rozen, Nanhai Jiang, Arnoud Boot, Mo Liu, Yang Wu