openEBGM: EBGM Scores for Mining Large Contingency Tables
openEBGM is a Bayesian data mining package for calculating Empirical Bayes scores based on the Gamma-Poisson Shrinker (GPS) model for large, sparse contingency (frequency) tables. openEBGM includes functions implementing DuMouchel's (1999, 2001) methods for calculating the EBGM (Empirical Bayes Geometric Mean) score and the quantile scores used to create credibility intervals. Some simple disproportionality scores (the relative reporting ratio and the proportional reporting ratio) are also included. Adverse event report data are used as an example application. Much of openEBGM's code is derived from the PhViD and mederrRank packages.
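The two simple disproportionality scores can be illustrated directly from raw product/event pairs. The Python sketch below computes, for each pair, the actual count N, an independence-based expected count E (product total × event total / grand total), the relative reporting ratio RR = N/E, and the proportional reporting ratio PRR (the share of the event among reports for the product, divided by its share among reports for all other products, as in Evans et al. 2001). It is not openEBGM's API: `disproportionality` is a hypothetical helper, every input row counts once, and stratification and report IDs are ignored, which processRaw may handle differently.

```python
from collections import Counter


def disproportionality(pairs):
    """Hypothetical helper: N, E, RR, and PRR for each (product, event) pair.

    `pairs` is a list of (product, event) tuples; each tuple counts once.
    Assumptions: no stratification, no de-duplication by report ID.
    """
    n_total = len(pairs)
    n_pair = Counter(pairs)
    n_prod = Counter(prod for prod, _ in pairs)   # row (product) totals
    n_event = Counter(ev for _, ev in pairs)      # column (event) totals
    scores = {}
    for (prod, ev), n in n_pair.items():
        expected = n_prod[prod] * n_event[ev] / n_total   # independence model
        rr = n / expected
        # PRR: share of `ev` among reports for `prod` divided by
        # share of `ev` among reports for all other products.
        other_with_event = n_event[ev] - n
        other_total = n_total - n_prod[prod]
        if other_with_event > 0:
            prr = (n / n_prod[prod]) / (other_with_event / other_total)
        else:
            prr = float("inf")                            # event never seen elsewhere
        scores[(prod, ev)] = {"N": n, "E": expected, "RR": rr, "PRR": prr}
    return scores


# Toy data: product A is mostly reported with event x, product B with event y.
pairs = [("A", "x")] * 4 + [("A", "y")] + [("B", "x")] + [("B", "y")] * 4
scores = disproportionality(pairs)
# scores[("A", "x")] -> N=4, E=2.5, RR=1.6, PRR=4.0
```

RR compares a pair with its own expected count, while PRR compares a product with all other products, so the two scores can rank pairs differently.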
The data preparation function, processRaw, converts raw data into actual and expected counts for product/event pairs. processRaw also adds the relative reporting ratio (RR) and proportional reporting ratio (PRR). The data squashing function, squashData, implements the simple version of data squashing described in DuMouchel and Pregibon (2001). Data squashing can be used to reduce the computational burden of hyperparameter estimation.
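The idea behind simple squashing can be sketched as follows, under the assumption that it works roughly this way: points that share the same actual count N are sorted by expected count E and merged in consecutive bins, and each bin becomes one point (N, mean E) whose weight is the number of points it replaces. The helper `squash` and its `bin_size` argument are hypothetical; squashData's actual binning rules and defaults may differ.

```python
from collections import defaultdict


def squash(points, bin_size=3):
    """Hypothetical simple squashing of (N, E) points.

    Returns a list of (N, mean_E, weight) tuples; the weights sum to len(points).
    """
    by_count = defaultdict(list)
    for n, e in points:
        by_count[n].append(e)                 # only points with the same N are merged
    squashed = []
    for n in sorted(by_count):
        es = sorted(by_count[n])              # neighbours in E end up in the same bin
        for start in range(0, len(es), bin_size):
            chunk = es[start:start + bin_size]
            squashed.append((n, sum(chunk) / len(chunk), len(chunk)))
    return squashed


points = [(1, 0.1), (1, 0.2), (1, 0.3), (1, 0.4), (2, 0.5), (2, 0.7)]
squashed = squash(points, bin_size=3)
# -> (1, 0.2, 3), (1, 0.4, 1), (2, 0.6, 2), up to floating-point rounding
```

Because each squashed point carries a weight, a likelihood evaluated on squashed data multiplies each point's log-likelihood term by that weight, so six points cost only three likelihood evaluations here.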
The negative log-likelihood functions (negLL, negLLsquash, negLLzero, and negLLzeroSquash) calculate the negative log-likelihoods described in the DuMouchel papers. DuMouchel uses this likelihood, which is based on the marginal distributions of the counts, to estimate the hyperparameters of the prior distribution.
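Under the GPS model each count N is Poisson with mean λE, and λ follows a mixture of two gamma distributions (shape α, rate β), so N marginally follows a mixture of two negative binomials with DuMouchel's density f(n; α, β, E) = (1 + β/E)^(−n) (1 + E/β)^(−α) Γ(α + n) / (Γ(α) n!). The Python sketch below evaluates the resulting negative log-likelihood. The function names are hypothetical, not openEBGM's. Optional weights stand in for squashed data, and the optional zero-truncated form (for when zero counts are left out) divides each component by its own probability of a nonzero count; that componentwise truncation is an assumption of this sketch.

```python
import math


def log_nb(n, alpha, beta, e):
    """log f(n; alpha, beta, E): Poisson(lambda * E) count with
    lambda ~ Gamma(shape=alpha, rate=beta), integrated over lambda."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + e))
            + n * math.log(e / (beta + e)))


def log_add(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def mixture_log_pmf(n, e, theta, zero_truncated=False):
    """log of P*f(n; a1, b1, E) + (1 - P)*f(n; a2, b2, E)."""
    a1, b1, a2, b2, p = theta
    l1, l2 = log_nb(n, a1, b1, e), log_nb(n, a2, b2, e)
    if zero_truncated:
        # Assumption: condition each component on N > 0.
        # log(1 - f(0)) is computed as log(-expm1(log f(0))) for accuracy.
        l1 -= math.log(-math.expm1(log_nb(0, a1, b1, e)))
        l2 -= math.log(-math.expm1(log_nb(0, a2, b2, e)))
    return log_add(math.log(p) + l1, math.log(1 - p) + l2)


def neg_log_lik(theta, counts, expected, weights=None, zero_truncated=False):
    """Weighted negative log-likelihood; weights default to 1 (unsquashed data)."""
    if weights is None:
        weights = [1] * len(counts)
    return -sum(w * mixture_log_pmf(n, e, theta, zero_truncated)
                for n, e, w in zip(counts, expected, weights))


theta = (0.2, 0.1, 2.0, 4.0, 1 / 3)   # illustrative (a1, b1, a2, b2, P), not fitted
full = neg_log_lik(theta, [1, 1, 3], [0.5, 0.5, 1.2])
squashed = neg_log_lik(theta, [1, 3], [0.5, 1.2], weights=[2, 1])
# full and squashed agree: the two identical points were merged with weight 2.
```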
The hyperparameter estimation functions (exploreHypers and autoHyper) use gradient-based approaches to estimate the hyperparameters, θ, of the prior distribution (a mixture of two gamma distributions) by minimizing the negative log-likelihood of the marginal distribution of the counts (a mixture of two negative binomial distributions). θ is a vector of five parameters: α_1, β_1, α_2, β_2, and P. hyperEM estimates θ using a version of the EM algorithm.
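To make the estimation step concrete, the sketch below minimizes that negative log-likelihood over θ. It swaps in a deliberately simple technique: plain gradient descent with a central-difference numerical gradient and a backtracking (Armijo) line search, working on unconstrained parameters (log α_i, log β_i, logit P) so that α_i, β_i > 0 and 0 < P < 1 hold automatically. It does not reproduce the optimizers, starting values, or convergence settings of exploreHypers, autoHyper, or hyperEM; `fit_hypers` and the toy data are hypothetical.

```python
import math


def neg_log_lik(theta, counts, expected):
    """Negative log-likelihood of a two-component negative binomial mixture."""
    a1, b1, a2, b2, p = theta

    def log_nb(n, a, b, e):
        return (math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
                + a * math.log(b / (b + e)) + n * math.log(e / (b + e)))

    total = 0.0
    for n, e in zip(counts, expected):
        x = math.log(p) + log_nb(n, a1, b1, e)
        y = math.log(1 - p) + log_nb(n, a2, b2, e)
        m = max(x, y)
        total -= m + math.log(math.exp(x - m) + math.exp(y - m))
    return total


def to_theta(phi):
    """Map unconstrained phi to theta = (a1, b1, a2, b2, P)."""
    return tuple(math.exp(v) for v in phi[:4]) + (1 / (1 + math.exp(-phi[4])),)


def fit_hypers(counts, expected, theta0, iters=300, h=1e-5):
    def f(phi):
        try:
            return neg_log_lik(to_theta(phi), counts, expected)
        except (OverflowError, ValueError):   # step left the numerically safe region
            return math.inf

    phi = [math.log(v) for v in theta0[:4]] + [math.log(theta0[4] / (1 - theta0[4]))]
    current = f(phi)
    for _ in range(iters):
        grad = []
        for i in range(len(phi)):             # central-difference gradient
            up, down = list(phi), list(phi)
            up[i] += h
            down[i] -= h
            grad.append((f(up) - f(down)) / (2 * h))
        g2 = sum(g * g for g in grad)
        if not g2 > 1e-10:                    # converged, or gradient not finite
            break
        step = 1.0
        while step > 1e-12:                   # backtracking line search
            trial = [x - step * g for x, g in zip(phi, grad)]
            value = f(trial)
            if value <= current - 1e-4 * step * g2:   # Armijo sufficient decrease
                phi, current = trial, value
                break
            step /= 2
        else:
            break                             # no acceptable step found
    return to_theta(phi), current


counts = [0, 0, 1, 0, 2, 1, 0, 5, 0, 1, 3, 0, 8, 0, 1, 0]
expected = [0.5, 0.2, 0.8, 1.0, 0.6, 0.3, 0.4, 1.2, 0.7, 0.9, 0.5, 0.3, 1.5, 0.2, 0.6, 0.8]
theta0 = (0.2, 0.1, 2.0, 4.0, 1 / 3)          # illustrative starting values
theta_hat, nll = fit_hypers(counts, expected, theta0)
```

Two-component mixture likelihoods can have several local minima, so in practice it is worth comparing fits from several starting values.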
The posterior distribution functions calculate the mixture fraction (Qn), geometric mean (ebgm), and quantiles (quantBisect) of the posterior distribution. Alternatively, ebScores can be used to create an object of class openEBGM that contains the EBGM and quantile scores. The generic functions print, summary, and plot have methods for openEBGM objects.
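Given hyperparameters, the posterior of λ for a pair with count N = n and expected count E is again a two-component gamma mixture: Gamma(α_1 + n, β_1 + E) with weight Qn and Gamma(α_2 + n, β_2 + E) with weight 1 − Qn, where Qn = P f(n; α_1, β_1, E) / [P f(n; α_1, β_1, E) + (1 − P) f(n; α_2, β_2, E)]. The EBGM is exp(E[log λ | N = n]), using E[log λ] = ψ(α) − log β for a Gamma(shape α, rate β) variable, and a quantile such as EB05 can be found by bisection on the posterior CDF. The Python sketch below uses only the standard library, so it carries its own digamma and regularized incomplete gamma routines. The names qn, ebgm, and quant_bisect echo the package functions for readability only; their signatures and numerical details are assumptions, not openEBGM's.

```python
import math


def log_nb(n, a, b, e):
    """log f(n; a, b, E), the negative binomial marginal density."""
    return (math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
            + a * math.log(b / (b + e)) + n * math.log(e / (b + e)))


def digamma(x):
    """psi(x), x > 0: shift up via psi(x) = psi(x + 1) - 1/x, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def gamma_cdf(a, x):
    """Regularized lower incomplete gamma P(a, x)."""
    if x <= 0:
        return 0.0
    log_pref = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1:                       # power series for P(a, x)
        term = total = 1.0 / a
        ap = a
        while abs(term) > abs(total) * 1e-16:
            ap += 1
            term *= x / ap
            total += term
        return total * math.exp(log_pref)
    tiny = 1e-300                       # modified Lentz continued fraction for Q(a, x)
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return 1.0 - math.exp(log_pref) * h


def qn(theta, n, e):
    """Posterior weight of the first gamma component."""
    a1, b1, a2, b2, p = theta
    d = (math.log(1 - p) + log_nb(n, a2, b2, e)) - (math.log(p) + log_nb(n, a1, b1, e))
    return 0.0 if d > 700 else 1.0 / (1.0 + math.exp(d))   # guard exp overflow


def ebgm(theta, n, e):
    """exp(E[log lambda | N = n]) under the posterior gamma mixture."""
    a1, b1, a2, b2, _ = theta
    q = qn(theta, n, e)
    mean_log = (q * (digamma(a1 + n) - math.log(b1 + e))
                + (1 - q) * (digamma(a2 + n) - math.log(b2 + e)))
    return math.exp(mean_log)


def posterior_cdf(x, theta, n, e):
    a1, b1, a2, b2, _ = theta
    q = qn(theta, n, e)
    return q * gamma_cdf(a1 + n, (b1 + e) * x) + (1 - q) * gamma_cdf(a2 + n, (b2 + e) * x)


def quant_bisect(prob, theta, n, e, tol=1e-10, max_iter=200):
    """Posterior quantile for 0 < prob < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while posterior_cdf(hi, theta, n, e) < prob:   # widen until the root is bracketed
        hi *= 2
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if posterior_cdf(mid, theta, n, e) < prob:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


theta = (0.2, 0.1, 2.0, 4.0, 1 / 3)   # illustrative hyperparameters, not fitted values
n, e = 5, 1.0
eb05, eb_gm, eb95 = quant_bisect(0.05, theta, n, e), ebgm(theta, n, e), quant_bisect(0.95, theta, n, e)
```

Bisection is slow compared with Newton's method but needs only the CDF, and it cannot overshoot once the quantile is bracketed.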
Ahmed I, Poncet A (2016). PhViD: an R package for PharmacoVigilance signal Detection. R package version 1.0.8.
Venturini S, Myers J (2015). mederrRank: Bayesian Methods for Identifying the Most Harmful Medication Errors. R package version 0.0.8.
DuMouchel W (1999). "Bayesian Data Mining in Large Frequency Tables, With an Application to the FDA Spontaneous Reporting System." The American Statistician, 53(3), 177-190.
DuMouchel W, Pregibon D (2001). "Empirical Bayes Screening for Multi-item Associations." In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp. 67-76. ACM, New York, NY, USA. ISBN 1-58113-391-X.
Evans SJW, Waller P, Davis S (2001). "Use of Proportional Reporting Ratios (PRRs) for Signal Generation from Spontaneous Adverse Drug Reaction Reports." Pharmacoepidemiology and Drug Safety, 10(6), 483-486.
FDA (2017). "CFSAN Adverse Event Reporting System (CAERS)." URL https://www.fda.gov/Food/ComplianceEnforcement/ucm494015.htm.