
impute

Imputation of missing values


Description

impute calculates imputation values for missing data depending on the selected method.

Usage

impute(
  data,
  sample,
  grouping,
  intensity,
  condition,
  comparison,
  missingness,
  noise = NULL,
  method,
  skip_log2_transform_error = FALSE,
  retain_columns = NULL
)

Arguments

data

a data frame that is ideally the output of the assign_missingness function. It should contain at least the input variables. For each "reference_vs_treatment" comparison, both the reference and the treatment condition should be present. That means the reference condition should be duplicated once for every treatment.

sample

the name of the column containing the sample names.

grouping

the name of the column containing precursor or peptide identifiers.

intensity

the name of the column containing intensity values. Note: The input intensities should be log2 transformed.

condition

the name of the column containing the conditions.

comparison

the name of the column containing the comparisons of treatment/reference pairs. This is an output of the assign_missingness function.

missingness

the name of the column that contains the missingness type of the data, which determines how values for imputation are sampled. This should at least contain "MAR" or "MNAR".

noise

the name of the column that contains the noise value for the precursor/peptide. Only required if method = "noise". Note: Noise values need to be log2 transformed.

method

the method to be used for imputation. For method = "ludovic", MNAR missingness is sampled from a normal distribution centred on a value that is three log2 units lower than the lowest intensity value recorded for the precursor/peptide, with a spread equal to the mean standard deviation for the precursor/peptide. For method = "noise", MNAR missingness is sampled from a normal distribution centred on the mean noise for the precursor/peptide, with a spread equal to the mean standard deviation (from each condition) for the precursor/peptide.

skip_log2_transform_error

logical. If FALSE, a check is performed to validate that input values are log2 transformed. If input values are > 40, the check fails and an error is thrown.

retain_columns

A vector indicating if certain columns should be retained from the input data frame. By default, no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector).
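
The MNAR sampling rule described for method = "ludovic" can be illustrated with a small base R sketch. This is not protti's implementation: the helper name sketch_ludovic_mnar is hypothetical, and "mean standard deviation" is read here as the mean of the per-condition standard deviations, which is an assumption.

```r
# Hypothetical helper, not part of protti: illustrates the "ludovic" rule for
# MNAR values of a single precursor/peptide.
sketch_ludovic_mnar <- function(intensity_log2, condition, n_missing) {
  # Centre: three log2 units below the lowest observed intensity.
  centre <- min(intensity_log2, na.rm = TRUE) - 3
  # Spread: mean of the per-condition standard deviations (assumed reading of
  # "mean standard deviation"). Conditions with fewer than two observed
  # values contribute no standard deviation.
  sds <- vapply(
    split(intensity_log2, condition),
    function(x) {
      x <- x[!is.na(x)]
      if (length(x) < 2) NA_real_ else sd(x)
    },
    numeric(1)
  )
  spread <- mean(sds, na.rm = TRUE)
  rnorm(n_missing, mean = centre, sd = spread)
}

set.seed(1)
intensity <- c(20.1, 20.4, 19.8, NA, NA, NA)
condition <- c("ref", "ref", "ref", "trt", "trt", "trt")
imputed_values <- sketch_ludovic_mnar(intensity, condition, n_missing = 3)
```

In this toy input the lowest observed intensity is 19.8, so the draws are centred on 16.8.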

Value

A data frame that contains an imputed_intensity and imputed column in addition to the required input columns. The imputed column indicates if a value was imputed. The imputed_intensity column contains imputed intensity values for previously missing intensities.
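
The relationship between the two output columns can be sketched on a toy data frame. This is only an illustration: the sampled values are placeholders rather than draws from an actual imputation method, and carrying observed intensities over unchanged into imputed_intensity is an assumption.

```r
# Toy data for one precursor; NA marks a missing intensity.
toy <- data.frame(
  sample = paste0("s", 1:4),
  intensity_log2 = c(20.1, NA, 19.8, NA)
)
# Placeholder values standing in for method-specific imputation draws.
sampled <- c(16.9, 16.5)

# imputed flags rows whose intensity was missing.
toy$imputed <- is.na(toy$intensity_log2)
# imputed_intensity: observed values kept (assumption), missing ones filled.
toy$imputed_intensity <- toy$intensity_log2
toy$imputed_intensity[toy$imputed] <- sampled
```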

Examples

## Not run: 
impute(
  data,
  sample = r_file_name,
  grouping = eg_precursor_id,
  intensity = intensity_log2,
  condition = r_condition,
  comparison = comparison,
  missingness = missingness,
  method = "ludovic"
)

## End(Not run)
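
Before calling impute, it can help to confirm that intensities are on the log2 scale, in the spirit of the check controlled by skip_log2_transform_error. The sketch below is a hypothetical pre-check in base R, not protti code: the helper looks_log2 is made up for illustration, and treating any single value above 40 as a failure is an assumption.

```r
# Hypothetical pre-check, not protti code: flags intensities that look
# untransformed, using the > 40 threshold from the argument description.
looks_log2 <- function(x, threshold = 40) {
  all(x[!is.na(x)] <= threshold)
}

raw <- c(1.2e6, 3.4e5, NA, 8.9e6)
raw_ok <- looks_log2(raw)          # raw intensities exceed the threshold
log2_ok <- looks_log2(log2(raw))   # log2 values fall well below it
```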

protti

Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

v0.1.1
MIT + file LICENSE
Authors
Jan-Philipp Quast [aut, cre], Dina Schuster [aut], ETH Zurich [cph, fnd]
Initial release
