wrMisc: searchDataPairs – R documentation


searchDataPairs

Search for duplicated data across multiple columns, i.e. pairs of data

Description

searchDataPairs searches a matrix for columns of similar data, i.e. 'duplicate' values in separate columns, or very similar columns if `realDupsOnly=FALSE`. Initial distance measures are normalized either to the diagonal of the 'window' (`normRange=TRUE`) or to the real maximum distance observed (equal to or less than the diagonal). The function returns a data.frame with the names of each sample pair, the percentage of identical values (100 for a completely identical pair) and the relative (Euclidean) distance (i.e. max distance observed = 1.0). Note that low distance values do not necessarily imply correlated data.
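To make the two reported measures more concrete, here is a minimal base-R sketch that computes, for every pair of columns, the percentage of identical values and a Euclidean distance scaled so that the largest observed distance equals 1. It is not the code used by searchDataPairs: the helper `pairSummary` and its output column names are hypothetical, and scaling to the largest observed distance is an assumption that mirrors only one of the normalization modes described above.

```r
# Hypothetical helper, not part of wrMisc: summarise all column pairs of a numeric matrix
pairSummary <- function(m) {
  stopifnot(is.matrix(m), ncol(m) >= 2)
  if (is.null(colnames(m))) colnames(m) <- seq_len(ncol(m))
  pairs <- t(combn(ncol(m), 2))            # one row per column pair (i < j)
  d <- as.matrix(dist(t(m)))               # Euclidean distances between columns
  out <- data.frame(
    samp1 = colnames(m)[pairs[, 1]],
    samp2 = colnames(m)[pairs[, 2]],
    # share of positions where both columns hold exactly the same value
    pcIdent = 100 * sapply(seq_len(nrow(pairs)), function(k)
      mean(m[, pairs[k, 1]] == m[, pairs[k, 2]])),
    dist = d[pairs],
    stringsAsFactors = FALSE
  )
  # Assumption: scale to the largest observed distance (NaN if all columns are identical)
  out$relDist <- out$dist / max(out$dist)
  out
}

toy <- cbind(a = c(1, 2, 3, 4), b = c(1, 2, 3, 4), c = c(1, 2, 3, 9), d = c(10, 20, 30, 40))
res <- pairSummary(toy)
res[order(res$relDist), ]
```

In this toy matrix, columns `a` and `b` are fully identical, while `a` and `c` agree in three of four values. Column `d` is perfectly correlated with `a` yet far away in Euclidean terms, which illustrates why distance and correlation should not be confused.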

Usage

searchDataPairs(
  dat,
  disThr = 0.01,
  byColumn = TRUE,
  normRange = TRUE,
  altNa = NULL,
  realDupsOnly = TRUE,
  silent = FALSE,
  callFrom = NULL
)

Arguments

`dat`	matrix or data.frame
`disThr`	(numeric) threshold for reporting similar data; applied to normalized distances (normalized to the diagonal of all data for the best relative 'unbiased' view), so lower values lead to fewer pairs being reported
`byColumn`	(logical) controls whether the main input is rotated by 90 degrees (using `t`), which allows reading by rows instead of by columns
`normRange`	(logical) normalize each column separately if `TRUE`
`altNa`	(character, default NULL) vector with alternative names (for display)
`realDupsOnly`	(logical) if `TRUE` will consider equal values only, otherwise will also consider very close values (based on argument `disThr`)
`silent`	(logical) suppress messages
`callFrom`	(character) allows easier tracking of message(s) produced

Value

data.frame with the names of each sample pair, the percentage of identical values (100 for a completely identical pair) and the relative (Euclidean) distance (i.e. max distance observed = 1.0)

Examples

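# 10 x 9 matrix: columns 1 and 9 are identical, column 6 differs from them in one value
mat <- round(matrix(c(11:40, runif(20) + 12, 11:19, 17, runif(20) + 18, 11:20), nrow = 10), 1)
colnames(mat) <- 1:9
searchDataPairs(mat, disThr = 0.05)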
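As an independent cross-check for exact duplicates, base R can flag identical columns without wrMisc. The sketch below rebuilds the example matrix so that it runs on its own. The `set.seed()` call is an addition made here only for reproducibility, and the approach finds exact duplicates only, not near-duplicates.

```r
set.seed(1)  # assumption: fixed seed, only to make this sketch reproducible
mat <- round(matrix(c(11:40, runif(20) + 12, 11:19, 17, runif(20) + 18, 11:20), nrow = 10), 1)
colnames(mat) <- 1:9

# duplicated() compares rows of a matrix, so transpose to compare columns;
# combining both directions flags every member of a duplicate group
tMat <- t(mat)
dupCols <- which(duplicated(tMat) | duplicated(tMat, fromLast = TRUE))
colnames(mat)[dupCols]
```

Columns 1 and 9 are both built from the sequence 11:20, so they are expected to be flagged. Column 6 differs from them in a single value and is therefore not an exact duplicate.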

wrMisc

Analyze Experimental High-Throughput (Omics) Data

v1.5.4

GPL-3

Authors

Wolfgang Raffelsberger [aut, cre]

Initial release