Statistical Software: Bayesian Analyses

mpadge · June 30, 2020, 11:27am

Here is a provisional draft of standards for Bayesian software - please comment! All those interested are also invited to edit these standards on this hackmd.io document. Note that the following list is intended to illustrate the nature of standards we envision implementing, and to serve as a concrete starting point for further discussion, rather than to be an exhaustive list. We are particularly interested in hearing opinions regarding aspects which we may have missed here.

Bayesian Software

Standards

Many of the following standards are written with reference to how software should function. Such aspects can and should often also be tested. Where testing of described functionality is expected, a “(TEST)” is added to the description.

Standards regarding documentation imply doing so at appropriate places within the software; either within functions themselves, within extended Vignettes, within the main README document of a package, or elsewhere.

Class Systems

Bayesian Software should use or implement explicit class systems
Bayesian Software may extend common class systems used in other packages for Bayesian analysis; see the CRAN Task view on Bayesian Inference. CRAN Task View: Bayesian Inference) for reference to other packages and class systems.
Class systems should offer default summary and plot methods

General

Bayesian software should:

Return appropriate metadata documenting types (or classes) and dimensions of input data
Either return the input function with results, or enable direct access to such via additional functions which accept the return object as single argument.
Implement convergence checkers and enable computations to be stopped on convergence
Ensure appropriate mechanisms are provided for models which do not converge. This is often achieved by having a default value for numbers of iterations, and stopping there if not converged (TEST).
Return convergence statistics or equivalent (such as full samples before and after convergence), to allow statistical comparison between different sampling algorithms (TEST)
Return appropriate values sufficient to indicate absence of convergence (TEST)
Allow convergence checkers to be over-ridden in order to generate chains of pre-specified lengths (TEST)
Allow fine control of output verbosity, including an option to suppress all output messages. This should generally be achieved through either at least two parameters, or one parameter specifying multiple levels of output. One minimal level of output should determine whether normal output (for example of computational progress) is produced or not, and a second should determine whether useful diagnostic messages (for example, regarding convergence, or missing-value substitution) are produced or not.

Documentation

Bayesian software should document:

How to specify and modify prior distributions
How to extract and compare convergence statistics
How to use results of function to restart chain(s) at previous point
How to use different sampling algorithms
How to use stopping criteria

Hyperparameters

Bayesian software should

Only use the term “hyperparameter” to refer to parameters of prior distributions, and not to other parameters controlling the computational algorithm (which are described further below).

Bayesian software should implement the following pre-processing checks on hyperparameters:

Ensure that hyperparameter vectors are commensurate with expected model input (TEST).
Implement relevant pre-processing checks on hyperparameters, particularly on all second-order hyperparameters (for example, variance or shape hyperparameters) to ensure non-negativity (TEST).

Consider the following example of a log-likelihood estimator for a linear regression, controlled via a vector of three parameters, p:

ll <- function (x, y, p) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)

Pre-processing stages should be used to determine:

Numbers of parameters represented by any of the input arguments to the main log-likelihood function (see example below) (TEST).
That the length of the vector of hyperparameters matches this length (TEST).
That dimensions of input data are commensurate; typically by ensuring that inputs and outputs have same numbers of rows or same lengths; non-commensurate inputs should error by default (TEST).

For example, the rstan package only allows named model parameters, and so largely avoids problems of mismatched lengths of parameter vectors, yet does not ensure the existence of named parameters prior to starting the computational chains. This ultimately results in each chain generating an error when non-existent parameters are named. Such controls should be part of a single pre-processing stage, and so should only generate a single error.

Determining numbers of parameters represented by input arguments requires parsing the underlying function objects passed to internal algorithms. Although not a trivial operation, the following code illustrates how this can be achieved. Presume the above line defines ll as an input function defining the log-likelihood of a linear regression model. R has very good utilities for parsing function calls, primarily through the getParseData function from the utils package. The parse data for a function can be extracted with the following lines:

p <- parse (deparse (ll))
x <- getParseData (p)

The object x is a data.frame of every R token (such as an expression, symbol, or operator) parsed from the function ll. Input arguments used to define parameter vectors in any R software are accessed through R’s standard vector access syntax of vec[i], for some element i of a vector vec. The parse data for such begins with the SYMBOL of vec, the [, a NUM_CONST for the value of i, and a closing ]. The following code can be used to extract elements of the parse data which match this pattern, and ultimately to extract the various values of i used to access members of vec.

vector_length <- function (x, i) {
    xn <- x [which (x$token %in% c ("SYMBOL", "NUM_CONST", "'['", "']'")), ]
    # split resultant data.frame at first "SYMBOL" entry
    xn <- split (xn, cumsum (xn$token == "SYMBOL"))
    # reduce to only those matching the above pattern
    xn <- xn [which (vapply (xn, function (j)
                             j$text [1] == i & nrow (j) > 3,
                             logical (1)))]
    ret <- NA_integer_ # default return value
    if (length (xn) > 0) {
        # get all values of NUM_CONST as integers
        n <- vapply (xn, function (j)
                         as.integer (j$text [j$token == "NUM_CONST"] [1]),
                         integer (1), USE.NAMES = FALSE)
        # and return max of these
        ret <- max (n)
    }
    return (ret)
}

That function can then be used to determine the length of any inputs which are used as hyperparameter vectors:

ll <- function (p, x, y) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
p <- parse (text = deparse (ll))
x <- utils::getParseData (p)

# extract the names of the parameters:
params <- unique (x$text [x$token == "SYMBOL"])
lens <- vapply (params, function (i) vector_length (x, i), integer (1))
lens
#>  y  p  x 
#> NA  3 NA

And the vector p is used as a hyperparameter vector containing three parameters. Any initial value vectors can then be examined to ensure that they have this same length.

Computational Parameters

Computational parameters are considered here as those passed to Bayesian functions other than hyperparameters determining the forms of prior distributions. They typically include parameters controlling lengths of runs, lengths of burn-in periods, numbers of parallel computations, other parameters controlling how samples are to be generated, or convergence criteria.

All Computational Parameters should be checked for general “sanity” prior to calling primary computational algorithms. Standard sanity checks include:

Checks that values are positive (for all length-type parameters) (TEST)
Checks that arguments are of expected classes or types (for example, check that integer-type arguments are indeed integer, with explicit conversion via as.integer where not) (TEST).
Automatic rejection of parameters of inappropriate type (for example character values passed for integer-type parameters).
Either automatic rejection or appropriate diagnostic messaging for parameters of inappropriate length; for example passing a vector of length > 1 to a parameter presumed to define a single value.

Data Structure

Bayesian software should:

Implement pre-processing routines to diagnose perfect collinearity, and provide appropriate diagnostic messages or warnings (TEST).
Provide distinct routines for processing perfectly collinear data, potentially bypassing sampling algorithms (TEST).
- An appropriate test would confirm that system.time() or equivalent timing expressions for perfectly collinear data should be less than equivalent routines called with non-collinear data.
- Alternatively, a test could ensure that perfectly collinear data passed to a function with a stopping criteria generated no results, while specifying a fixed number of iterations may generate results.
Provide options for users to specify how to handle missing data, with options minimally including:
- warn or error on missing data (TEST), in which case appropriate documentation should be provided.
- either remove or replace missing data, with the latter option further enabling specification of how values may be imputed (TEST).

Seed control

Bayesian software should:

Enable seeds to be passed as a parameter, either explicitly (a direct seed argument or similar) or implicitly (via starting values for chain) (TEST)
Enable results of previous runs to be used as starting points for subsequent runs
Include seed(s) or starting value(s) within return object (TEST)

Software which implements parallel processing should:

Ensure each chain is started with a different seed by default (TEST)
Issue diagnostic messages when same seeds are passed to distinct computational chains (TEST)
Explicitly document advice not to use set.seed()
Provide the parameter with a plural name: for example, “starting_values” and not “starting_value”
Include the seeds or starting values for each chain within return object (TEST)

To avoid potential confusion between separate parameters to control random seeds and starting values, we recommended a single “starting values” rather than “seeds” argument, which appropriate translation of these parameters into seeds where necessary.

Testing

Bayesian software should implement the following tests (where appropriate):

Sensitivity to hyperparameters

As an example of testing sensitivity to hyperparameters, consider the linear regression function from above,

ll <- function (x, y, p) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)

The hyperparameters are expressed as a single vector, p, with three elements. The first two values of p are the intercept and slope, the scales of which are entirely arbitrary within that function. Posterior distributions for these parameters should therefore be statistically independent of starting values. Such independence can be tested by writing tests to statistically compare posterior distributions using different initial values, and expecting no significant difference (for example, via a simple t.test).

Convergence

Convergence should be tested in the following two ways:

Convergence checkers should be tested by ensuring that algorithms with convergence checkers stop at some point less than an otherwise nominated number of steps, or less than a result run without a convergence checker.
Convergence should then be tested by using the results from the above test as the starting point for an additional run of the same fixed length (that is, without automatic convergence checking), and ensuring that the two posterior distributions are statistically equivalent.

Where convergence checkers are themselves parameterised, the effects of such parameters should also be tested. For threshold parameters, for example, lower values should taken result in longer chains needed for convergence, and that should be tested.

Internal Sampling Algorithms

Packages which implement their own sampling algorithms should document

Support for algorithm (via citation, or reference to other software)
Comparisons with external samplers which demonstrate intended advantage of implementation

Topic		Replies	Views
Statistical Software: Regression and Supervised Learning Statistical Software Peer Review regression , supervised-learning	18	1270	August 24, 2020
Statistical Software: Time Series Statistical Software Peer Review time-series	4	1018	June 7, 2021
Statistical Software: General Standards Statistical Software Peer Review	0	661	August 24, 2020
Statistical software peer review categories Statistical Software Peer Review	13	2165	June 19, 2020
Statistical Software: Exploratory Data Analysis and Summary Statistics Statistical Software Peer Review eda	7	1143	August 24, 2020

Statistical Software: Bayesian Analyses

Bayesian Software

Standards

Class Systems

General

Documentation

Hyperparameters

Computational Parameters

Data Structure

Seed control

Testing

Internal Sampling Algorithms

Related topics