Generates missing data in a complete data matrix.
This function generates missing data in a complete data matrix. Both random and left-censored missing data can be generated. The overall percentage of missing data is controlled by mean.THR (together with sd.THR). The percentage of those missing data that are left-censored is controlled by MNAR.rate.
insertMVs(original, mean.THR, sd.THR, MNAR.rate)
original |
Original complete data matrix of peptide/protein expression. |
mean.THR |
Mean value of the threshold distribution which controls the total missing data rate ( |
sd.THR |
Standard deviation of the threshold distribution which controls the total missing data rate. |
MNAR.rate |
Percentage of MVs which are missing not at random. Among the total number of missing data ( |
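The way these arguments interact can be sketched in base R without calling insertMVs itself. This is a minimal illustration that follows the function definition listed on this page. The matrix `original` is simulated here purely for demonstration, and the names `below`, `frac.below`, `n.MNAR` and `n.MCAR` are hypothetical helpers, not part of the package.

```r
set.seed(1)

# Hypothetical complete matrix: 200 features x 6 samples (simulated, not real data)
original <- matrix(rnorm(200 * 6, mean = 20, sd = 1), nrow = 200, ncol = 6)

# Centre the threshold distribution on the 15% quantile of the data
mean.THR <- quantile(original, probs = 0.15)
sd.THR <- 0.1

# One random threshold per matrix entry
thr <- matrix(rnorm(length(original), mean.THR, sd.THR),
              nrow(original), ncol(original))

# Entries below their threshold are the candidates for left-censoring;
# their count also fixes the total number of values to be removed
below <- original < thr
frac.below <- mean(below)

# Split that total into MNAR and MCAR parts according to MNAR.rate
MNAR.rate <- 50
n.MNAR <- round(MNAR.rate / 100 * sum(below))
n.MCAR <- floor((100 - MNAR.rate) / 100 * sum(below))

print(frac.below)
print(c(MNAR = n.MNAR, MCAR = n.MCAR))
```

Under these assumptions, placing mean.THR at a given quantile of the data yields roughly that fraction of missing values overall, while MNAR.rate only changes how the total is split between the two mechanisms.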
A list including elements:
original |
Original complete data matrix |
original.mvs |
Data matrix derived from the original by generating missing data |
pNaNs |
The proportion of missing data generated in the original complete dataset (a fraction between 0 and 1)
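Judging from the function definition listed on this page, the returned list appears to be unnamed, so its elements are accessed by position. The sketch below uses a small stand-in list (`res`, a hypothetical object built by hand rather than by insertMVs) to show how the names documented above could be attached for readability.

```r
# Stand-in for the three-element list returned by insertMVs()
# (constructed by hand for illustration only)
res <- list(
  matrix(1:6, 2, 3),                    # original complete matrix
  matrix(c(1, NaN, 3, 4, NaN, 6), 2, 3), # matrix with missing values
  2 / 6                                 # proportion of missing values
)

# Attach the documented element names
names(res) <- c("original", "original.mvs", "pNaNs")

# Access by name instead of by position
print(res$pNaNs)
print(mean(is.na(res$original.mvs)))
```

Positional access such as `res[[2]]` keeps working after the names are attached.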
Cosmin Lazar
# generate expression data matrix
exprsDataObj = generate.ExpressionData(nSamples1 = 6, nSamples2 = 6,
                                       meanSamples = 0, sdSamples = 0.2,
                                       nFeatures = 1000, nFeaturesUp = 50, nFeaturesDown = 50,
                                       meanDynRange = 20, sdDynRange = 1,
                                       meanDiffAbund = 1, sdDiffAbund = 0.2)
exprsData = exprsDataObj[[1]]
# insert 15% missing data with 50% missing not at random
m.THR = quantile(exprsData, probs = 0.15)
sd.THR = 0.1
MNAR.rate = 50
exprsData.MD.obj = insertMVs(exprsData, m.THR, sd.THR, MNAR.rate)
exprsData.MD = exprsData.MD.obj[[2]]
## Not run:
hist(exprsData[,1])
hist(exprsData.MD[,1])
hist(exprsData.imputed[,1])  # requires an imputed matrix, which this example does not create
## End(Not run)
## The function is currently defined as
function (original, mean.THR, sd.THR, MNAR.rate)
{
    originalNaNs = original
    nProt = nrow(original)
    nSamples = ncol(original)
    thr = matrix(rnorm(nSamples * nProt, mean.THR, sd.THR), nProt,
        nSamples)
    indices.MNAR = which(original < thr)
    no.MNAR = round(MNAR.rate/100 * length(indices.MNAR))
    temp = matrix(original, 1, nSamples * nProt)
    temp[sample(indices.MNAR, no.MNAR)] = NaN
    indices.MCAR = which(!is.na(temp))
    no.MCAR = floor((100 - MNAR.rate)/100 * length(indices.MNAR))
    print(no.MCAR + no.MNAR)
    temp[sample(indices.MCAR, no.MCAR)] = NaN
    originalNaNs = matrix(temp, nProt, nSamples)
    originalNaNs_adjusted = originalNaNs
    noNaNs_Var = rowSums(is.na(originalNaNs))
    allNaNs_Vars = which(noNaNs_Var == nSamples)
    sampleIndexToReplace = sample(1:nSamples, length(allNaNs_Vars),
        replace = T)
    for (i in 0:length(sampleIndexToReplace)) {
        originalNaNs_adjusted[allNaNs_Vars[i], sampleIndexToReplace[i]] = original[allNaNs_Vars[i],
            sampleIndexToReplace[i]]
    }
    pNaNs = length(which(is.na(originalNaNs_adjusted)))/(nSamples *
        nProt)
    print(pNaNs)
    return(list(original, originalNaNs_adjusted, pNaNs))
}
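The last step of the definition restores one original value in every row that ended up entirely missing, so that no feature is left without any observation. The following sketch shows that step on its own, using matrix indexing in place of the loop. It is an illustrative alternative, not the package's code. The small matrix and the names `with.mvs`, `all.missing`, `cols`, `idx` and `adjusted` are hypothetical.

```r
set.seed(42)

# Hypothetical 4 x 3 complete matrix and a copy with missing values
original <- matrix(rnorm(12, mean = 20), nrow = 4, ncol = 3)
with.mvs <- original
with.mvs[2, ] <- NaN   # row 2 is entirely missing
with.mvs[4, 1] <- NaN  # row 4 is only partially missing

# Rows in which every sample is missing
all.missing <- which(rowSums(is.na(with.mvs)) == ncol(with.mvs))

# Pick one random column per fully missing row
cols <- sample(seq_len(ncol(with.mvs)), length(all.missing), replace = TRUE)

# Restore the original value at each (row, column) pair;
# a two-column index matrix addresses individual cells
idx <- cbind(all.missing, cols)
adjusted <- with.mvs
adjusted[idx] <- original[idx]

print(adjusted)
```

After this step every row keeps at least one observed value, which is why the reported proportion of missing data can be slightly lower than the number of removed entries would suggest.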