Split an in-memory Strelka VCF into SBS, DBS, and variants involving > 2 consecutive bases
SBSs are single base substitutions, e.g. C>T, A>G, ... DBSs are double base substitutions, e.g. CC>TT, AT>GG, ... Variants involving > 2 consecutive bases are rare, so this function just records them. These would be variants such as ATG>CCT, AGAT>TCTA, ...
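The three categories described above can be illustrated with a minimal sketch in base R. It labels a variant by the number of substituted bases. `classify_variant` and the `REF`/`ALT` column names are hypothetical, chosen only for illustration; the sketch does not reproduce how `SplitStrelkaSBSVCF` itself detects these variants.

```r
# Hypothetical helper, not part of the documented API: label a substitution
# by how many consecutive bases it replaces.
classify_variant <- function(ref, alt) {
  n.ref <- nchar(ref)
  n.alt <- nchar(alt)
  ifelse(n.ref != n.alt, "other",          # not a pure substitution
         ifelse(n.ref == 1, "SBS",         # single base substitution
                ifelse(n.ref == 2, "DBS",  # double base substitution
                       "3+BS")))           # > 2 consecutive bases
}

# The example variants mentioned in the description above.
variants <- data.frame(
  REF = c("C", "CC", "ATG", "AGAT"),
  ALT = c("T", "TT", "CCT", "TCTA"),
  stringsAsFactors = FALSE
)
variants$type <- classify_variant(variants$REF, variants$ALT)
print(variants)
```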
SplitStrelkaSBSVCF(vcf.df, max.vaf.diff = 0.02, name.of.VCF = NULL)
vcf.df
: An in-memory data frame containing the contents of a Strelka VCF file.
max.vaf.diff
: The maximum difference of VAF; the default value is 0.02. The absolute difference of VAFs for adjacent SBSs is compared against this threshold.
name.of.VCF
: Name of the VCF file.
A list of in-memory objects with the elements:
SBS.vcf
: Data frame of pure SBS mutations, with no DBS or 3+BS mutations.
DBS.vcf
: Data frame of pure DBS mutations, with no SBS or 3+BS mutations.
discarded.variants
: Non-NULL only if there are variants that were excluded from the analysis. See the added extra column discarded.reason for more details.