Here is a provisional draft of standards for Bayesian software - please comment! All those interested are also invited to edit these standards on this hackmd.io document. Note that the following list is intended to illustrate the nature of standards we envision implementing, and to serve as a concrete starting point for further discussion, rather than to be an exhaustive list. We are particularly interested in hearing opinions regarding aspects which we may have missed here.

# Bayesian Software

## Standards

Many of the following standards are written with reference to how software should function. Such aspects can and should often also be *tested*. Where testing of described functionality is expected, a “(TEST)” is added to the description.

Standards regarding *documentation* imply documenting at appropriate places within the software; either within functions themselves, within extended vignettes, within the main `README` document of a package, or elsewhere.

### Class Systems

- Bayesian software should use or implement explicit class systems.
- Bayesian software may extend common class systems used in other packages for Bayesian analysis; see the CRAN Task View on Bayesian Inference for references to other packages and class systems.
- Class systems should offer default `summary` and `plot` methods.

### General

Bayesian software should:

- Return appropriate metadata documenting types (or classes) and dimensions of input data
- Either return the input function with results, or enable direct access to such via additional functions which accept the return object as their single argument.
- Implement convergence checkers and enable computations to be stopped on convergence
- Ensure appropriate mechanisms are provided for models which do not converge. This is often achieved by having a default value for numbers of iterations, and stopping there if not converged (TEST).
- Return convergence statistics or equivalent (such as full samples before and after convergence), to allow statistical comparison between different sampling algorithms (TEST)
- Return appropriate values sufficient to indicate absence of convergence (TEST)
- Allow convergence checkers to be over-ridden in order to generate chains of pre-specified lengths (TEST)
- Allow fine control of output verbosity, including an option to suppress all output messages. This should generally be achieved either through at least two parameters, or through one parameter specifying multiple levels of output. One minimal level of output should determine whether normal output (for example, of computational progress) is produced, and a second should determine whether useful diagnostic messages (for example, regarding convergence or missing-value substitution) are produced.
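As a minimal sketch of such verbosity control, the following function uses two parameters; the names `verbose` and `diagnostics`, and the function itself, are illustrative assumptions rather than prescribed interfaces:

```
# Sketch only: function and parameter names are hypothetical.
my_sampler <- function (n_samples = 1000, verbose = TRUE, diagnostics = TRUE) {
    if (verbose) {
        message ("sampling ", n_samples, " values ...") # normal progress output
    }
    samples <- rnorm (n_samples)
    if (diagnostics) {
        message ("note: no convergence check performed") # diagnostic output
    }
    invisible (samples)
}

# suppress all output:
s <- my_sampler (verbose = FALSE, diagnostics = FALSE)
```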

### Documentation

Bayesian software should *document*:

- How to specify and modify prior distributions
- How to extract and compare convergence statistics
- How to use results of function to restart chain(s) at previous point
- How to use different sampling algorithms
- How to use stopping criteria

### Hyperparameters

Bayesian software should:

- Only use the term “hyperparameter” to refer to parameters of prior distributions, and not to other parameters controlling the computational algorithm (which are described further below).

Bayesian software should implement the following pre-processing checks on hyperparameters:

- Ensure that hyperparameter vectors are commensurate with expected model input (TEST).
- Implement relevant pre-processing checks on hyperparameters, particularly on all second-order hyperparameters (for example, variance or shape hyperparameters) to ensure non-negativity (TEST).
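A minimal sketch of such checks, assuming (as in the example below) that hyperparameters are passed as a single numeric vector whose final element is a standard deviation; the function name is hypothetical:

```
# Sketch only: the function name and expected structure are assumptions.
check_hyperpars <- function (p, expected_length = 3L) {
    if (!is.numeric (p)) {
        stop ("hyperparameters must be numeric") # type check
    }
    if (length (p) != expected_length) {
        stop ("hyperparameter vector must have length ", expected_length)
    }
    if (p [3] < 0) {
        # second-order (scale) hyperparameters must be non-negative
        stop ("standard deviation must be non-negative")
    }
    invisible (p)
}

check_hyperpars (c (0, 1, 1)) # passes silently
```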

Consider the following example of a log-likelihood estimator for a linear regression, controlled via a vector of three parameters, `p`:

```
ll <- function (x, y, p) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
```

Pre-processing stages should be used to determine:

- Numbers of parameters represented by any of the input arguments to the main log-likelihood function (see example below) (TEST).
- That the length of the vector of hyperparameters matches this number (TEST).
- That dimensions of input data are commensurate; typically by ensuring that inputs and outputs have the same numbers of rows or the same lengths; non-commensurate inputs should error by default (TEST).

For example, the `rstan` package only allows named model parameters, and so largely avoids problems of mismatched lengths of parameter vectors, yet it does not ensure the existence of named parameters prior to starting the computational chains. This ultimately results in each chain generating an error when non-existent parameters are named. Such controls should be part of a single pre-processing stage, and so should generate only a single error.

Determining numbers of parameters represented by input arguments requires parsing the underlying `function` objects passed to internal algorithms. Although not a trivial operation, the following code illustrates how this can be achieved. Presume the above line defines `ll` as an input function defining the log-likelihood of a linear regression model. R has very good utilities for parsing function calls, primarily through the `getParseData` function from the `utils` package. The parse data for a function can be extracted with the following lines:

```
p <- parse (text = deparse (ll))
x <- getParseData (p)
```

The object `x` is a `data.frame` of every R token (such as an expression, symbol, or operator) parsed from the function `ll`. Input arguments used to define parameter vectors in any R software are accessed through R's standard vector access syntax of `vec[i]`, for some element `i` of a vector `vec`. The parse data for such begins with the `SYMBOL` of `vec`, the opening `[`, a `NUM_CONST` for the value of `i`, and a closing `]`. The following code can be used to extract elements of the parse data which match this pattern, and ultimately to extract the various values of `i` used to access members of `vec`.

```
vector_length <- function (x, i) {

    xn <- x [which (x$token %in% c ("SYMBOL", "NUM_CONST", "'['", "']'")), ]
    # split resultant data.frame at each "SYMBOL" entry
    xn <- split (xn, cumsum (xn$token == "SYMBOL"))
    # reduce to only those matching the above pattern
    xn <- xn [which (vapply (xn, function (j)
                             j$text [1] == i & nrow (j) > 3,
                             logical (1)))]

    ret <- NA_integer_ # default return value
    if (length (xn) > 0) {
        # get all values of NUM_CONST as integers
        n <- vapply (xn, function (j)
                     as.integer (j$text [j$token == "NUM_CONST"] [1]),
                     integer (1), USE.NAMES = FALSE)
        # and return max of these
        ret <- max (n)
    }

    return (ret)
}
```

That function can then be used to determine the length of any inputs which are used as hyperparameter vectors:

```
ll <- function (p, x, y) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
p <- parse (text = deparse (ll))
x <- utils::getParseData (p)
# extract the names of the parameters:
params <- unique (x$text [x$token == "SYMBOL"])
lens <- vapply (params, function (i) vector_length (x, i), integer (1))
lens
#> y p x
#> NA 3 NA
```

The vector `p` is thus identified as a hyperparameter vector containing three parameters. Any initial value vectors can then be examined to ensure that they have this same length.
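Such a check on a supplied vector of initial values might be sketched as follows, using the inferred length of 3 for `p`; the function and argument names are hypothetical:

```
# Sketch only: validate user-supplied initial values against inferred length.
check_inits <- function (inits, len_expected) {
    if (length (inits) != len_expected) {
        stop ("initial values must be of length ", len_expected,
              "; length ", length (inits), " was supplied")
    }
    invisible (inits)
}

check_inits (c (0, 1, 1), len_expected = 3) # passes silently
```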

### Computational Parameters

Computational parameters are considered here as those passed to Bayesian functions other than hyperparameters determining the forms of prior distributions. They typically include parameters controlling lengths of runs, lengths of burn-in periods, numbers of parallel computations, other parameters controlling how samples are to be generated, or convergence criteria.

All Computational Parameters should be checked for general “sanity” prior to calling primary computational algorithms. Standard sanity checks include:

- Checks that values are positive (for all length-type parameters) (TEST)
- Checks that arguments are of expected classes or types (for example, check that `integer`-type arguments are indeed `integer`, with explicit conversion via `as.integer` where not) (TEST).
- Automatic rejection of parameters of inappropriate type (for example, `character` values passed for `integer`-type parameters).
- Either automatic rejection or appropriate diagnostic messaging for parameters of inappropriate length; for example, passing a vector of length > 1 to a parameter presumed to define a single value.
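These checks might be sketched as follows for a single parameter controlling run length; the parameter name `n_iter` is illustrative only:

```
# Sketch only: parameter name is hypothetical.
check_n_iter <- function (n_iter) {
    if (is.character (n_iter)) {
        stop ("n_iter must be numeric, not character") # reject wrong type
    }
    if (length (n_iter) > 1) {
        warning ("n_iter has length > 1; only the first value will be used")
        n_iter <- n_iter [1]
    }
    if (n_iter <= 0) {
        stop ("n_iter must be positive") # length-type parameters are positive
    }
    as.integer (n_iter) # explicit conversion to expected type
}

check_n_iter (1000.0) # returns 1000L
```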

### Data Structure

Bayesian software should:

- Implement pre-processing routines to diagnose perfect collinearity, and provide appropriate diagnostic messages or warnings (TEST).
- Provide distinct routines for processing perfectly collinear data, potentially bypassing sampling algorithms (TEST).
    - An appropriate test would confirm that `system.time()` or equivalent timing expressions for perfectly collinear data are *less* than for equivalent routines called with non-collinear data.
    - Alternatively, a test could ensure that perfectly collinear data passed to a function with a stopping criterion generate no results, while specifying a fixed number of iterations may generate results.
- Provide options for users to specify how to handle missing data, with options minimally including:
    - warn or error on missing data (TEST), in which case appropriate documentation should be provided;
    - either remove or replace missing data, with the latter option further enabling specification of how values may be imputed (TEST).
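A perfect-collinearity diagnosis might, for example, compare the rank of the design matrix with its number of columns; the following is a sketch only, with a hypothetical function name:

```
# Sketch only: diagnose perfect collinearity via the matrix rank.
check_collinearity <- function (X) {
    r <- qr (X)$rank
    if (r < ncol (X)) {
        warning ("design matrix is perfectly collinear (rank ", r,
                 " < ", ncol (X), " columns)")
    }
    invisible (r)
}

X <- cbind (1, 1:10, 2 * (1:10)) # third column is twice the second
check_collinearity (X) # warns: rank 2 < 3 columns
```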

### Seed control

Bayesian software should:

- Enable seeds to be passed as a parameter, either explicitly (via a direct `seed` argument or similar) or implicitly (via starting values for chains) (TEST)
- Enable results of previous runs to be used as starting points for subsequent runs
- Include seed(s) or starting value(s) within the return object (TEST)

Software which implements parallel processing should:

- Ensure each chain is started with a different seed by default (TEST)
- Issue diagnostic messages when the same seeds are passed to distinct computational chains (TEST)
- Explicitly document advice *not* to use `set.seed()`
- Provide the parameter with a *plural* name: for example, “starting_values” and not “starting_value”
- Include the seeds or starting values for each chain within the return object (TEST)

To avoid potential confusion between separate parameters controlling random seeds and starting values, we recommend a single “starting values” argument rather than a “seeds” argument, with appropriate translation of these values into seeds where necessary.
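The translation of plural starting values into distinct per-chain seeds might be sketched as follows; the function and parameter names are hypothetical, and `set.seed()` is called only internally, so users need not call it themselves:

```
# Sketch only: derive distinct per-chain seeds from plural starting values.
run_chains <- function (starting_values, n_samples = 100) {
    if (anyDuplicated (starting_values)) {
        message ("identical starting values passed to distinct chains")
    }
    lapply (seq_along (starting_values), function (chain) {
        set.seed (chain) # distinct seed for each chain by default
        list (start = starting_values [[chain]],
              seed = chain, # seed returned within result object
              samples = starting_values [[chain]] + cumsum (rnorm (n_samples)))
    })
}

res <- run_chains (starting_values = list (0, 1, 2))
```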

### Testing

Bayesian software should implement the following tests (where appropriate):

**Sensitivity to hyperparameters**

As an example of testing sensitivity to hyperparameters, consider the linear regression function from above,

```
ll <- function (x, y, p) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
```

The hyperparameters are expressed as a single vector, `p`, with three elements. The first two values of `p` are the intercept and slope, the scales of which are entirely arbitrary within that function. Posterior distributions for these parameters should therefore be statistically independent of starting values. Such independence can be tested by writing tests which statistically compare posterior distributions using different initial values, and expecting no significant difference (for example, via a simple `t.test`).
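Such a test might be sketched as follows, with the posterior samples from two runs mocked here by independent draws from the same distribution:

```
# Sketch only: posterior samples from two runs with different starting
# values are mocked here by draws from the same distribution.
set.seed (1)
post_a <- rnorm (1000, mean = 2, sd = 1) # run with one set of initial values
post_b <- rnorm (1000, mean = 2, sd = 1) # run with different initial values
tt <- t.test (post_a, post_b)
# the posterior means should not differ significantly, so tt$p.value
# is expected to exceed any conventional significance threshold
```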

**Convergence**

Convergence should be tested in the following two ways:

- Convergence checkers should be tested by ensuring that algorithms with convergence checkers stop at some number of steps less than an otherwise nominated maximum, or less than the length of an equivalent run without a convergence checker.
- Convergence should then be tested by using the results from the above test as the starting point for an additional run of the same fixed length (that is, without automatic convergence checking), and ensuring that the two posterior distributions are statistically equivalent.

Where convergence checkers are themselves parameterised, the effects of such parameters should also be tested. For threshold parameters, for example, lower (stricter) values should result in longer chains being needed for convergence, and that effect should be tested.
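This behaviour can be illustrated with a toy iterative algorithm, in which a stricter tolerance requires more iterations before the stopping criterion is met; this is a sketch only, not a sampler:

```
# Sketch only: a toy algorithm whose convergence checker is parameterised
# by a threshold `tol`; stricter thresholds require longer runs.
run_until_converged <- function (tol, max_iter = 1000L) {
    val <- 1
    for (i in seq_len (max_iter)) {
        val <- val / 2 # toy update, converging towards zero
        if (val < tol) {
            return (i) # converged: return number of iterations used
        }
    }
    max_iter # stopped at nominated maximum without converging
}

# a stricter (lower) threshold requires a longer chain:
run_until_converged (tol = 1e-6) > run_until_converged (tol = 1e-2) # TRUE
```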

### Internal Sampling Algorithms

Packages which implement their own sampling algorithms should document:

- Support for the algorithm (via citation, or reference to other software)
- Comparisons with external samplers which demonstrate the intended advantages of the implementation