Independent LD blocks
Split a correlation matrix in blocks as independent as possible. This finds the splitting in blocks that minimizes the sum of squared correlation between these blocks (i.e. everything outside these blocks). In case of equivalent splits, it then minimizes the sum of squared sizes of the blocks.
snp_ldsplit( corr, thr_r2, min_size, max_size, max_K = 500, max_r2 = 0.3, max_cost = ncol(corr)/200 )
corr |
Sparse correlation matrix. Usually, the output of |
thr_r2 |
Threshold under which squared correlations are ignored.
This is useful to avoid counting noise, which should give clearer patterns
of costs vs. number of blocks. It is therefore possible to have a splitting
cost of 0. If this parameter is used, then |
min_size |
Minimum number of variants in each block. This is used not to have a disproportionate number of small blocks. |
max_size |
Maximum number of variants in each block. This is used not to have blocks that are too large, e.g. to limit computational and memory requirements of applications that would use these blocks. For some long-range LD regions, it may be needed to allow for large blocks. You can now provide a vector of values to try. |
max_K |
Maximum number of blocks to consider. All optimal solutions for K
from 1 to |
max_r2 |
Maximum squared correlation allowed for one pair of variants in
two different blocks. This is used to make sure that strong correlations are
not discarded and also to speed up the algorithm. Default is |
max_cost |
Maximum cost reported. Default is |
A tibble with seven columns:
$max_size
: Input parameter, useful when providing a vector of values to try.
$n_block
: Number of blocks.
$cost
: The sum of squared correlations outside the blocks.
$cost2
: The sum of squared sizes of the blocks.
$perc_kept
: Percentage of initial non-zero values kept within the blocks defined.
$all_last
: Last index of each block.
$all_size
: Sizes of the blocks.
$block_num
: Resulting block numbers for each variant. This is not reported
anymore, but can be computed with rep(seq_along(all_size), all_size)
.
## Not run: corr <- readRDS(url("https://www.dropbox.com/s/65u96jf7y32j2mj/spMat.rds?raw=1")) THR_R2 <- 0.02 m <- ncol(corr) (SEQ <- round(seq_log(m / 30, m / 5, length.out = 10))) (res <- snp_ldsplit(corr, thr_r2 = THR_R2, min_size = 10, max_size = SEQ)) library(ggplot2) # trade-off cost / number of blocks qplot(n_block, cost, color = factor(max_size, SEQ), data = res) + theme_bw(14) + scale_y_log10() + theme(legend.position = "top") + labs(x = "Number of blocks", color = "Maximum block size", y = "Sum of squared correlations outside blocks") # trade-off cost / number of non-zero values qplot(perc_kept, cost, color = factor(max_size, SEQ), data = res) + theme_bw(14) + # scale_y_log10() + theme(legend.position = "top") + labs(x = "Percentage of non-zero values kept", color = "Maximum block size", y = "Sum of squared correlations outside blocks") # trade-off cost / sum of squared sizes qplot(cost2, cost, color = factor(max_size, SEQ), data = res) + theme_bw(14) + scale_y_log10() + geom_vline(xintercept = 0)+ theme(legend.position = "top") + labs(x = "Sum of squared blocks", color = "Maximum block size", y = "Sum of squared correlations outside blocks") ## Pick one solution and visualize blocks library(dplyr) all_ind <- res %>% arrange(cost2 * sqrt(5 + cost)) %>% print() %>% slice(1) %>% pull(all_last) ## Transform sparse representation into (i,j,x) triplets corrT <- as(corr, "dgTMatrix") upper <- (corrT@i <= corrT@j & corrT@x^2 >= THR_R2) df <- data.frame( i = corrT@i[upper] + 1L, j = corrT@j[upper] + 1L, r2 = corrT@x[upper]^2 ) df$y <- (df$j - df$i) / 2 ggplot(df) + geom_point(aes(i + y, y, alpha = r2)) + theme_minimal() + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), strip.background = element_blank(), strip.text.x = element_blank()) + scale_alpha_continuous(range = 0:1) + scale_x_continuous(expand = c(0.02, 0.02), minor_breaks = NULL, breaks = head(all_ind[[1]], -1) + 0.5) + facet_wrap(~ cut(i + y, 4), scales = "free", ncol = 1) + labs(x = "Position", y = NULL) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.