protti: diff_abundance – R documentation

diff_abundance

Calculate differential abundance between conditions

Description

Performs differential abundance calculations and statistical hypothesis tests on data frames with protein, peptide or precursor data. Different methods for statistical testing are available.

Usage

diff_abundance(
  data,
  sample,
  condition,
  grouping,
  intensity_log2,
  missingness,
  comparison,
  mean = NULL,
  sd = NULL,
  n_samples = NULL,
  ref_condition,
  filter_NA_missingness = TRUE,
  method = c("t-test", "t-test_mean_sd", "moderated_t-test", "proDA"),
  p_adj_method = "BH",
  retain_columns = NULL
)

Arguments

`data`	A data frame containing at least the input variables that are required for the selected method. Ideally the output of `assign_missingness` or `impute` is used.
`sample`	The column in the data frame containing the sample name. Is not required if `method = "t-test_mean_sd"`.
`condition`	The column in the data frame containing the conditions.
`grouping`	The column in the data frame containing protein, peptide or precursor identifiers.
`intensity_log2`	The column in the data frame containing intensity values. The intensity values need to be log2 transformed. Is not required if `method = "t-test_mean_sd"`.
`missingness`	The column in the data frame containing missingness information. Can be obtained by calling `assign_missingness`. Is not required if `method = "t-test_mean_sd"`.
`comparison`	The column in the data frame containing comparison information of treatment/reference condition pairs. Can be obtained by calling `assign_missingness`. Is not required if `method = "t-test_mean_sd"`.
`mean`	The column in the data frame containing mean values for two conditions. Is only required if `method = "t-test_mean_sd"`.
`sd`	The column in the data frame containing standard deviations for two conditions. Is only required if `method = "t-test_mean_sd"`.
`n_samples`	The column in the data frame containing the number of samples per condition for two conditions. Is only required if `method = "t-test_mean_sd"`.
`ref_condition`	The condition that is used as a reference for differential abundance calculation.
`filter_NA_missingness`	A logical, default is `TRUE`. For all methods except `"t-test_mean_sd"` missingness information has to be provided. If a reference/treatment pair has too few samples to be considered robust, it is annotated with `NA` as missingness. If this argument is `TRUE`, these reference/treatment pairs are filtered out.
`method`	A character vector, specifies the method used for statistical hypothesis testing. Methods include a Welch test ("`t-test`"), a Welch test on means, standard deviations and number of replicates ("`t-test_mean_sd`") and a moderated t-test based on the `limma` package ("`moderated_t-test`"). More information on the moderated t-test can be found in the `limma` documentation. Furthermore, the `proDA` package specific method ("`proDA`") can be used to infer means across samples based on a probabilistic dropout model. This eliminates the need for data imputation since missing values are inferred from the model. More information can be found in the `proDA` documentation.
`p_adj_method`	A character vector, specifies the p-value correction method. Possible methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). Default method is `"BH"`.
`retain_columns`	A vector indicating if certain columns should be retained from the input data frame. By default no additional columns are retained (`retain_columns = NULL`). Specific columns can be retained by providing their names (not in quotation marks, just like other column names, but in a vector).
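
As an illustration of what a Welch test on means, standard deviations and sample numbers involves (the kind of input `mean`, `sd` and `n_samples` describe), here is a minimal base R sketch. It is not protti's internal code: the helper name `welch_from_summary`, the input values, and the direction of the difference (treatment minus reference) are assumptions made for this example.

```r
# Welch t-test computed from summary statistics only.
# Illustrative sketch, not protti's implementation; all numbers are invented.
welch_from_summary <- function(mean_ref, sd_ref, n_ref,
                               mean_trt, sd_trt, n_trt) {
  diff <- mean_trt - mean_ref            # assumed direction: treatment - reference
  var_ref <- sd_ref^2 / n_ref            # squared standard error, reference
  var_trt <- sd_trt^2 / n_trt            # squared standard error, treatment
  std_error <- sqrt(var_ref + var_trt)
  t_statistic <- diff / std_error
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (var_ref + var_trt)^2 /
    (var_ref^2 / (n_ref - 1) + var_trt^2 / (n_trt - 1))
  pval <- 2 * pt(-abs(t_statistic), df)  # two-sided p-value
  data.frame(diff, std_error, t_statistic, df, pval)
}

# Two hypothetical precursors with log2 intensities summarised per condition
res <- welch_from_summary(
  mean_ref = c(20.1, 18.4), sd_ref = c(0.30, 0.50), n_ref = c(3, 3),
  mean_trt = c(21.6, 18.5), sd_trt = c(0.40, 0.45), n_trt = c(3, 3)
)
print(res)
```

For raw data, the same statistic and p-value should match what `t.test(treated, reference)` reports with its default `var.equal = FALSE`.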

Value

A data frame that contains differential abundances (diff), p-values (pval) and adjusted p-values (adj_pval) for each protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair. Depending on the method the data frame contains additional columns:

"t-test": The std_error column contains the standard error of the differential abundances. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
"t-test_mean_sd": mean_control and mean_treated columns contain the means for the reference and treatment condition, respectively. sd_control and sd_treated columns contain the standard deviations for the reference and treatment condition, respectively. n_control and n_treated columns contain the numbers of samples for the reference and treatment condition, respectively. The std_error column contains the standard error of the differential abundances. t_statistic contains the t-statistic of the t-test.
"moderated_t-test": CI_2.5 and CI_97.5 give the 2.5% and 97.5% confidence interval borders for the differential abundance. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic of the t-test. B contains the B-statistic, which is the log-odds that the protein, peptide or precursor (depending on grouping) has a differential abundance between the two groups. Suppose B = 1.5. The odds of differential abundance are exp(1.5) = 4.48, i.e., about four and a half to one. The probability that there is a differential abundance is 4.48/(1 + 4.48) = 0.82, i.e., the probability is about 82% that the group is differentially abundant. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
"proDA": The std_error column contains the standard error of the differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic of the t-test. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
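
The B-statistic arithmetic described for the moderated t-test can be reproduced in base R. This is only a worked example of the log-odds conversion with B = 1.5; the variable names are chosen for illustration.

```r
# Convert a B-statistic (log-odds of differential abundance) to a probability.
b <- 1.5
odds <- exp(b)              # odds of differential abundance
prob <- odds / (1 + odds)   # corresponding probability

round(c(odds = odds, probability = prob), 2)
# odds = 4.48, probability = 0.82

# plogis() is the logistic function and performs the same conversion in one step
stopifnot(isTRUE(all.equal(prob, plogis(b))))
```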

Examples

## Not run: 
diff_abundance(
  data,
  sample = r_file_name,
  condition = r_condition,
  grouping = eg_precursor_id,
  intensity_log2 = normalised_intensity_log2,
  missingness = missingness,
  comparison = comparison,
  ref_condition = "control",
  method = "t-test",
  retain_columns = c(pg_protein_accessions)
)

## End(Not run)
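
To show how the `pval` and `adj_pval` columns relate, the following sketch applies Benjamini-Hochberg correction to a mock result table with base R's `p.adjust()`, whose method names match those listed for `p_adj_method`. The data frame, its values, and the cut-offs (adjusted p-value below 0.05, absolute log2 difference above 1) are invented for illustration and are not recommendations from the package.

```r
# Mock result with the documented diff and pval columns; all values are invented.
result <- data.frame(
  eg_precursor_id = c("pep_A", "pep_B", "pep_C", "pep_D"),
  diff = c(1.8, -0.2, -1.4, 0.9),
  pval = c(0.001, 0.60, 0.012, 0.04),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment, the default of p_adj_method ("BH")
result$adj_pval <- p.adjust(result$pval, method = "BH")

# Keep rows passing both (arbitrary) thresholds
hits <- result[result$adj_pval < 0.05 & abs(result$diff) > 1, ]
print(hits)
```

Note that pep_D has a raw p-value below 0.05 but no longer passes after adjustment, which is the usual reason to filter on `adj_pval` rather than `pval`.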

protti

Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

v0.1.1

MIT + file LICENSE

Authors

Jan-Philipp Quast [aut, cre], Dina Schuster [aut], ETH Zurich [cph, fnd]

Initial release