
diff_abundance

Calculate differential abundance between conditions


Description

Performs differential abundance calculations and statistical hypothesis tests on data frames with protein, peptide or precursor data. Different methods for statistical testing are available.

Usage

diff_abundance(
  data,
  sample,
  condition,
  grouping,
  intensity_log2,
  missingness,
  comparison,
  mean = NULL,
  sd = NULL,
  n_samples = NULL,
  ref_condition,
  filter_NA_missingness = TRUE,
  method = c("t-test", "t-test_mean_sd", "moderated_t-test", "proDA"),
  p_adj_method = "BH",
  retain_columns = NULL
)

Arguments

data

A data frame containing at least the input variables that are required for the selected method. Ideally, the output of assign_missingness or impute is used.

sample

The column in the data frame containing the sample names. Not required if method = "t-test_mean_sd".

condition

The column in the data frame containing the conditions.

grouping

The column in the data frame containing precursor or peptide identifiers.

intensity_log2

The column in the data frame containing intensity values. The intensity values need to be log2 transformed. Not required if method = "t-test_mean_sd".

missingness

The column in the data frame containing missingness information. Can be obtained by calling assign_missingness. Not required if method = "t-test_mean_sd".

comparison

The column in the data frame containing comparison information of treatment/reference condition pairs. Can be obtained by calling assign_missingness. Not required if method = "t-test_mean_sd".

mean

The column in the data frame containing mean values for two conditions. Only required if method = "t-test_mean_sd".

sd

The column in the data frame containing standard deviations for two conditions. Only required if method = "t-test_mean_sd".

n_samples

The column in the data frame containing the number of samples per condition for two conditions. Only required if method = "t-test_mean_sd".
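With method = "t-test_mean_sd", a Welch test can be computed from these summary statistics alone. The following is a minimal base-R sketch of that calculation, not protti's internal code: the helper welch_from_summary() is hypothetical, and the sign convention (treatment minus reference) is an assumption. The result is cross-checked against stats::t.test() on the raw values the summaries came from.

```r
# Hypothetical helper (not part of protti): Welch t-test from means,
# standard deviations and sample sizes. Sign convention is assumed.
welch_from_summary <- function(mean_treated, sd_treated, n_treated,
                               mean_control, sd_control, n_control) {
  v_t <- sd_treated^2 / n_treated   # squared standard error of the treated mean
  v_c <- sd_control^2 / n_control   # squared standard error of the control mean
  std_error <- sqrt(v_t + v_c)
  diff <- mean_treated - mean_control
  t_statistic <- diff / std_error
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (v_t + v_c)^2 / (v_t^2 / (n_treated - 1) + v_c^2 / (n_control - 1))
  pval <- 2 * pt(abs(t_statistic), df = df, lower.tail = FALSE)  # two-sided
  list(diff = diff, std_error = std_error, t_statistic = t_statistic,
       df = df, pval = pval)
}

# Hypothetical log2 intensities, used only to derive the summary statistics
control <- c(18.0, 18.3, 17.8)
treated <- c(17.1, 17.5, 16.9, 17.3)

res <- welch_from_summary(mean(treated), sd(treated), length(treated),
                          mean(control), sd(control), length(control))

# Cross-check against the Welch test on the raw values
ref <- t.test(treated, control, var.equal = FALSE)
print(c(t = res$t_statistic, df = res$df, pval = res$pval))
```

Because only means, standard deviations and sample counts enter the formula, this variant needs no sample, intensity_log2 or missingness columns.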

ref_condition

The condition that is used as a reference for differential abundance calculation.

filter_NA_missingness

A logical, default is TRUE. For all methods except "t-test_mean_sd", missingness information has to be provided. If a reference/treatment pair has too few samples to be considered robust, its missingness is annotated as NA. If this argument is TRUE, these reference/treatment pairs are filtered out.
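The effect of filter_NA_missingness = TRUE can be pictured with a small base-R sketch. The data frame, its column values and the missingness labels below are illustrative assumptions, not actual assign_missingness output, and this is not how protti implements the filter internally.

```r
# Hypothetical toy input: one row per precursor and treatment/reference pair
input <- data.frame(
  eg_precursor_id = c("PEP1", "PEP2", "PEP3"),
  comparison = "treated_vs_control",
  missingness = c("complete", NA, "MNAR"),  # illustrative labels only
  stringsAsFactors = FALSE
)

# Drop pairs whose missingness is NA (too few samples to be considered robust)
kept <- input[!is.na(input$missingness), ]
print(kept$eg_precursor_id)
```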

method

A character vector specifying the method used for statistical hypothesis testing. Methods include a Welch test ("t-test"), a Welch test on means, standard deviations and numbers of replicates ("t-test_mean_sd") and a moderated t-test based on the limma package ("moderated_t-test"). More information on the moderated t-test can be found in the limma documentation. Furthermore, the proDA package-specific method ("proDA") can be used to infer means across samples based on a probabilistic dropout model. This eliminates the need for data imputation, since missing values are inferred from the model. More information can be found in the proDA documentation.
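For intuition, the per-feature Welch test behind method = "t-test" can be sketched in base R for a single precursor. This is a minimal illustration under stated assumptions, not protti's implementation: the intensities are made up, and the sign convention (treatment minus reference) is assumed.

```r
# Hypothetical log2 intensities of one precursor in the reference and treatment condition
control <- c(20.1, 20.4, 19.9, 20.3)
treated <- c(21.2, 21.0, 21.5, 20.9)

# Welch two-sample t-test: var.equal = FALSE drops the equal-variance assumption
res <- t.test(treated, control, var.equal = FALSE)

# Quantities analogous to the documented output columns (sign convention assumed)
diff <- mean(treated) - mean(control)               # log2 difference, treatment minus reference
std_error <- sqrt(var(treated) / length(treated) +
                  var(control) / length(control))   # Welch standard error of diff
pval <- res$p.value

print(c(diff = diff, std_error = std_error, pval = pval))
```

The t-statistic reported by t.test() equals diff / std_error, which is why a standard error column accompanies the differential abundance.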

p_adj_method

A character vector specifying the p-value correction method. Possible methods are c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none"). The default method is "BH".
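The listed options match the method names accepted by base R's stats::p.adjust(). A minimal sketch of how "BH" and "bonferroni" behave on made-up p-values follows; whether diff_abundance passes p_adj_method to p.adjust() directly is an assumption here.

```r
# Hypothetical raw p-values for five precursors
pvals <- c(0.01, 0.02, 0.03, 0.04, 0.05)

# Benjamini-Hochberg (the documented default) controls the false discovery rate
adj_bh <- p.adjust(pvals, method = "BH")

# Bonferroni controls the family-wise error rate and is more conservative
adj_bonf <- p.adjust(pvals, method = "bonferroni")

print(adj_bh)    # all five print as 0.05
print(adj_bonf)  # 0.05, 0.10, 0.15, 0.20, 0.25
```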

retain_columns

A vector indicating whether certain columns should be retained from the input data frame. By default, no additional columns are retained (retain_columns = NULL). Specific columns can be retained by providing their names in a vector (without quotation marks, just like other column names).

Value

A data frame that contains differential abundances (diff), p-values (pval) and adjusted p-values (adj_pval) for each protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair. Depending on the method the data frame contains additional columns:

  • "t-test": The std_error column contains the standard error of the differential abundances. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.

  • "t-test_mean_sd": mean_control and mean_treated columns contain the means for the reference and treatment condition, respectively. sd_control and sd_treated columns contain the standard deviations for the reference and treatment condition, respectively. n_control and n_treated columns contain the numbers of samples for the reference and treatment condition, respectively. The std_error column contains the standard error of the differential abundances. t_statistic contains the t-statistic for the t-test.

  • "moderated_t-test": CI_2.5 and CI_97.5 give the 2.5% and 97.5% limits of the confidence interval for the differential abundance. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic for the t-test. B is the B-statistic, the log-odds that the protein, peptide or precursor (depending on grouping) has a differential abundance between the two groups. Suppose B = 1.5. The odds of differential abundance are exp(1.5) = 4.48, i.e., about four and a half to one. The probability that there is a differential abundance is 4.48/(1 + 4.48) = 0.82, i.e., there is about an 82% probability that the protein, peptide or precursor is differentially abundant. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.

  • "proDA": The std_error column contains the standard error of the differential abundances. avg_abundance contains average abundances for treatment/reference pairs (mean of the two group means). t_statistic contains the t-statistic for the t-test. n_obs contains the number of observations for the specific protein, peptide or precursor (depending on the grouping variable) and the associated treatment/reference pair.
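The conversion from the B-statistic to a probability described for "moderated_t-test" is the logistic function applied to the log-odds. A short base-R sketch reproducing the B = 1.5 arithmetic:

```r
# B-statistic: log-odds of differential abundance (value taken from the text above)
B <- 1.5

odds <- exp(B)              # about 4.48, i.e. roughly four and a half to one
prob <- odds / (1 + odds)   # about 0.82

# plogis() is the logistic function, so it yields the same probability directly
print(c(odds = odds, prob = prob, plogis = plogis(B)))
```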

Examples

## Not run: 
diff_abundance(
  data,
  sample = r_file_name,
  condition = r_condition,
  grouping = eg_precursor_id,
  intensity_log2 = normalised_intensity_log2,
  missingness = missingness,
  comparison = comparison,
  ref_condition = "control",
  method = "t-test",
  retain_columns = c(pg_protein_accessions)
)

## End(Not run)

protti

Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

v0.1.1
MIT + file LICENSE
Authors
Jan-Philipp Quast [aut, cre], Dina Schuster [aut], ETH Zurich [cph, fnd]
Initial release
