Statistical Software: Bayesian Analyses

Some further considerations on approaches to assessing software for Bayesian analysis:

The core of a Bayesian software package is the estimation of posterior distributions. Because closed-form posteriors are rarely available, they are generally approximated by drawing samples, and so this procedure is also commonly referred to as sampling. Bayesian software can accordingly be divided into two primary sub-categories: software which implements its own sampling procedures, and software which relies on external, pre-existing software to generate samples. The latter often implements Bayesian procedures in order to develop domain-specific models, while the former tends to be more general in application.

Bayesian Software with Internal Sampling Procedures

Numerous software packages for sampling posterior distributions have been developed in most computer languages, and standard algorithms for doing so are both stable and well developed. The implementation of new sampling procedures may thus be presumed to reflect some advance in the underlying sampling procedures, the algorithmic implementation thereof, or both. Given this presumption, the following standards may be considered to apply:

  1. The sampling procedure should have some form of support, typically either through published literature or previous implementations in other languages.
  2. Evidence should be provided, either through tests or perhaps separate vignettes, of how the implemented sampling procedure compares with other procedures.
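
As one illustration of the second point, a test might compare draws from a newly implemented sampler against a reference whose answer is known. The following is a minimal sketch only, assuming a random-walk Metropolis sampler and a normal model with known standard deviation, for which the posterior of the mean is available analytically. All function names, the step size, and the burn-in length are illustrative assumptions, not taken from any particular package.

```python
import math
import random
import statistics


def log_posterior(mu, data, prior_mean, prior_sd, sigma):
    """Unnormalised log posterior of a normal mean with known sigma."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(log_target, start, n_samples, step, rng):
    """Random-walk Metropolis sampler with a Gaussian proposal."""
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target ratio); 1 - random() lies in
        # (0, 1], which avoids taking the log of zero.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(1)
sigma, prior_mean, prior_sd = 1.0, 0.0, 10.0  # illustrative values
data = [rng.gauss(2.0, sigma) for _ in range(50)]

# Analytic reference: normal prior + normal likelihood gives a normal posterior.
post_precision = 1.0 / prior_sd**2 + len(data) / sigma**2
post_mean = (prior_mean / prior_sd**2 + sum(data) / sigma**2) / post_precision
post_sd = math.sqrt(1.0 / post_precision)

draws = metropolis(
    lambda mu: log_posterior(mu, data, prior_mean, prior_sd, sigma),
    start=0.0, n_samples=20000, step=0.3, rng=rng,
)
kept = draws[2000:]  # discard an (assumed) burn-in period

print(f"sampled  mean={statistics.fmean(kept):.3f} sd={statistics.stdev(kept):.3f}")
print(f"analytic mean={post_mean:.3f} sd={post_sd:.3f}")
```

In a real package the reference would more likely be an established sampler than an analytic result, but the structure of the comparison would be similar.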

Other general points for standards might usefully include:

  • The use of conjugate prior relationships for common distributional forms?
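
For context, a conjugate prior relationship means the posterior can be written down directly, with no sampling required. A minimal sketch for the Beta-Binomial case follows; the function names are hypothetical and the prior and data values are purely illustrative.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters for a Beta(alpha, beta) prior on a
    binomial success probability, after observing the given counts."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


prior = (2.0, 2.0)  # illustrative prior
posterior = beta_binomial_update(*prior, successes=7, failures=3)
print(posterior, beta_mean(*posterior))
```

Software which recognises such cases could return the closed-form posterior, or use it as a check on sampled output.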

Bayesian Software with External Sampling Procedures

Most packages which rely on other software to generate posterior samples are designed to construct specific classes of models, for which the following aspects may be considered in software assessment:

  • Are the sampling procedure(s) and associated packages documented?
  • Is it documented how to use or access different sampling procedures (where appropriate)?
  • Are methods implemented, or is guidance provided, for hyperparameter tuning?
  • Are improper prior distributions admitted? If so, are appropriate diagnostic messages provided, or can they be?
  • Is guidance provided for selecting appropriate prior distributions?
  • Are messages or warnings issued for low effective sample sizes for posterior distributions?
  • Do tests examine sensitivity to different forms of prior distributions?
  • Are any restrictions on admissible forms of prior distribution clearly explained? Do non-admissible forms generate appropriate messages, warnings, or errors?
  • Do tests consider and compare multiple sets of input data?
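
The point about effective sample sizes can be sketched as follows. This is a deliberately crude estimator, assuming a single chain and truncating the autocorrelation sum at the first non-positive value; established packages use more careful estimators. The function names and the threshold of 400 are assumptions made for illustration.

```python
import random
import warnings


def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    if var == 0.0:
        return 1.0  # a constant chain is treated as perfectly correlated
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(chain, max_lag=200):
    """Crude ESS: n / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(chain)
    rho_sum = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorrelation(chain, lag)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


def check_ess(chain, min_ess=400.0):
    """Return the ESS, warning if it falls below an (assumed) threshold."""
    ess = effective_sample_size(chain)
    if ess < min_ess:
        warnings.warn(
            f"Low effective sample size: {ess:.0f} < {min_ess:.0f}",
            stacklevel=2,
        )
    return ess


rng = random.Random(0)
iid_chain = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# Strongly autocorrelated AR(1) chain, standing in for a poorly mixing sampler.
ar_chain = [0.0]
for _ in range(1999):
    ar_chain.append(0.95 * ar_chain[-1] + rng.gauss(0.0, 1.0))

ess_iid = check_ess(iid_chain)
ess_ar = check_ess(ar_chain)  # expected to trigger the warning
print(f"ESS iid={ess_iid:.0f}, ESS AR(1)={ess_ar:.0f}")
```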

Many of these points also apply to the first category, and may be best treated as general points, leaving the preceding category to detail only those points which apply specifically to software which implements its own sampling procedures.

General Standards

Regardless of the above categories, Bayesian software may be generally expected to:

  • Implement output structures or classes which are as compatible as possible with those of other packages for Bayesian analyses (rstan is a special case here, and other packages might reasonably be expected to adhere only loosely to any such requirements or suggestions)
  • Include tests and/or vignettes which demonstrate conversion between output classes from various packages
  • Report, and provide methods to extract, prior distributional (hyper-)parameters from the output class
  • Report, and provide methods to extract, sampling parameters from the output class
  • Clearly describe how missing values are handled
  • Clearly describe acceptable types of input data
  • Clearly describe the structure of output data (including class implementations where appropriate)
  • Either implement generic plot and summary methods for return objects, or ensure object class structures work with external plot and summary methods.
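
To make several of these expectations concrete, here is a minimal sketch of what such an output class might look like. The class `PosteriorFit` and all of its methods are hypothetical, written only to illustrate extraction of prior and sampling parameters, a summary method, and conversion to a plain structure which other tools could consume.

```python
import statistics
from dataclasses import dataclass


@dataclass
class PosteriorFit:
    """Hypothetical container for the results of a Bayesian fit."""
    draws: dict    # parameter name -> list of posterior draws
    priors: dict   # parameter name -> dict of prior (hyper-)parameters
    sampler: dict  # sampling settings, such as chains and iterations

    def prior_parameters(self):
        """Return a copy of the prior (hyper-)parameters used in the fit."""
        return {name: dict(spec) for name, spec in self.priors.items()}

    def sampling_parameters(self):
        """Return a copy of the sampling settings used in the fit."""
        return dict(self.sampler)

    def summary(self):
        """Posterior mean and standard deviation for each parameter."""
        return {
            name: {
                "mean": statistics.fmean(values),
                "sd": statistics.stdev(values),
                "n": len(values),
            }
            for name, values in self.draws.items()
        }

    def as_columns(self):
        """Plain dict of lists, as a lowest-common-denominator exchange format."""
        return {name: list(values) for name, values in self.draws.items()}


fit = PosteriorFit(
    draws={"mu": [1.0, 2.0, 3.0]},  # toy values only
    priors={"mu": {"family": "normal", "mean": 0.0, "sd": 10.0}},
    sampler={"chains": 1, "iterations": 3},
)
print(fit.summary())
print(fit.prior_parameters())
```

Returning copies rather than the stored objects is one possible design choice here, so that downstream code cannot silently alter the record of how a fit was produced.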

Diagnostic Procedures

The performance of Bayesian algorithms and models may also be assessed via a number of diagnostic routines. It may be useful to recommend the use of one or more of the following:

  • The use of coda diagnostics
  • The use of Simulation-Based Calibration for rstan models, through the rstan::sbc() function, or equivalent procedures for other packages.
  • Messages or warnings for potential scale reduction factors notably greater than one (R_hat >> 1), which indicate that chains have not converged to a common distribution
  • Additional diagnostic procedures, such as checks for convergence of estimates
  • Agreement of posterior modes with maximum likelihood estimates when flat prior distributions are used.
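
As a sketch of the scale reduction diagnostic, the following computes the classic Gelman-Rubin potential scale reduction factor from several chains of equal length and warns when it exceeds a threshold. The function names are hypothetical, the threshold of 1.1 is an assumed conventional value, and established packages generally use refined (for example split-chain or rank-normalised) variants rather than this basic form.

```python
import math
import random
import statistics
import warnings


def potential_scale_reduction(chains):
    """Classic Gelman-Rubin R_hat for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


def check_rhat(chains, threshold=1.1):
    """Return R_hat, warning if it exceeds an (assumed) threshold."""
    rhat = potential_scale_reduction(chains)
    if rhat > threshold:
        warnings.warn(
            f"R_hat = {rhat:.2f} exceeds {threshold}; chains may not have converged",
            stacklevel=2,
        )
    return rhat


rng = random.Random(42)
# Four chains drawn from the same distribution: R_hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Chains centred on different values, standing in for a failure to converge.
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0, 3.0)]

r_mixed = check_rhat(mixed)
r_stuck = check_rhat(stuck)  # expected to trigger the warning
print(f"R_hat mixed={r_mixed:.3f}, stuck={r_stuck:.3f}")
```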