We now have preliminary drafts of both general standards for statistical software, and specific extensions into a few initial categories - Regression, Bayesian and Monte Carlo, Exploratory Data Analysis, and Time Series. We would very much appreciate any feedback, comments, suggestions, improvements to any and all of the current standards. Everybody is encouraged to peruse the single “master” document in bookdown form, and to provide feedback in increasing orders of formality in one or more of the following ways:
- The #stats-peer-review slack channel
- The relevant sections of the
discussion forum
- The github repository for the “master” document, either via issues for general discussion, or pull requests for more concrete improvements.
Note that we anticipate some degree of “shuffling” between general and category-specific standards, much of which will be deferred until we have developed standards for all of our anticipated categories. There is thus one and only one form of comment for which we are currently not seeking feedback, which is comments regard whether category-specific standard X might be better considered general standard Y - that will be worked out later.
Looking forward to any and all feedback from anybody interested in helping to create our system for peer reviewing statistical software. Without further ado, the relevant standards follow immediately below.
Standards for Bayesian Software
Bayesian and Monte Carlo Software (hereafter referred to for simplicity as “Bayesian Software”) is presumed to perform one or more of the following steps:
-
- Document how to specify inputs including:
- 1.1 Data
- 1.2 Hyperparameters determining prior distributions
- 1.3 Parameters determining the computational processes
-
- Accept and validate all of forms of input
-
- Apply data transformation and pre-processing steps
-
- Apply one or more analytic algorithms, generally sampling algorithms used to generate estimates of posterior distributions
-
- Return the result of that algorithmic application
-
- Offer additional functionality such as printing or summarising return results
This document details standards for each of these steps, each prefixed with “BS”.
1. Documentation of Inputs
Prior to actual standards for documentation of inputs, we note one terminological standard for Bayesian software:
- BS1.0 Bayesian software should use the term “hyperparameter” exclusively to refer to parameters determining the form of prior distributions, and should use either the generic term “parameter” or some conditional variant(s) such as “computation parameters” to refer to all other parameters.
Bayesian Software should provide the following documentation of how to specify inputs:
- BS2.1 Description of how to enter data, both in textual form and via code examples. Both of these should consider the simplest cases of single objects representing independent and dependent data, and potentially more complicated cases of multiple independent data inputs.
- BS3.1 Description of how to specify prior distributions, both in textual form describing the general principles of specifying prior distributions, along with more applied descriptions and examples, within:
- B32.1a The main package
README
, either as textual description or example code
- B32.1b At least one package vignette, both as general and applied textual descriptions, and example code
- B32.1c Function-level documentation, preferably with code included in examples
- B43.1 Description of all parameters which control the computational process (typically those determining aspects such as numbers and lengths of sampling processes, seeds used to start them, thinning parameters determining post-hoc sampling from simulated values, and convergence criteria). In particular:
- B43.1a Bayesian Software should document, both in text and examples, how to use the output of previous simulations as starting points of subsequent simulations.
- B43.1b Where applicable, Bayesian software should document, both in text and examples, how to use different sampling algorithms for a given model.
- BS5.1 For Bayesian Software which implements or otherwise enables convergence checkers, documentation should explicitly describe and provide examples of use with and without convergence checkers.
- BS5.2 For Bayesian Software which implements or otherwise enables multiple convergence checkers, differences between these should be explicitly tested.
2. Input Data Structures and Validation
This section contains standards primarily intended to ensure that input data, including model specifications, are validated prior to passing through to the main computational algorithms.
2.1 Input Data
Bayesian Software is commonly designed to accept generic one- or two-dimensional forms such as vector, matrix, or data.frame
objects. The first standards concerns the range of possible generic forms for input data:
- BS2.1 Bayesian Software which accepts one-dimensional input should ensure values are appropriately pre-processed regardless of class structures. The
units
package provides a good example, in creating objects that may be treated as vectors, yet which have a class structure that does not inherit from the vector
class. Using these objects as input often causes software to fail. The storage.mode
of the underlying objects may nevertheless be examined, and the objects transformed or processed accordingly to ensure such inputs do not lead to errors.
- BS2.2 Bayesian Software which accepts two-dimension input should implement pre-processing routines to ensure conversion of as many possible forms as possible to some standard format which is then passed to all analytic functions. In particular, tests should demonstrate that:
- BS2.2a
data.frame
or equivalent objects which have columns which do not themselves have standard class attributes (typically, vector
) are appropriately processed, and do not error without reason. This behaviour should be tested. Again, columns created by the units
package provide a good test case.
- BS2.2b
data.frame
or equivalent objects which have list columns should ensure that those columns are appropriately pre-processed either through being removed, converted to equivalent vector columns where appropriate, or some other appropriate treatment. This behaviour should be tested.
- BS2.3 Bayesian Software should implement pre-processing routines to ensure all input data is dimensionally commensurate, for example by ensuring commensurate lengths of vectors or numbers of rows of rectangular inputs.
2.2 Prior Distributions, Model Specifications, and Hyperparameters
The second set of standards in this section concern specification of prior distributions, model structures, or other equivalent ways of specifying hypothesised relationships among input data structures. R already has a diverse range of Bayesian Software with distinct approaches to this task, commonly either through specifying a model as a character vector representing an R function, or an external file either as R code, or encoded according to some alternative system (such as for rstan
).
As explicated above, the term “hyperparameters” is interpreted here to refer to parameters which define prior distributions, while a “model specification”, or simply “model”, is an encoded description of how those hyperparameters are hypothesised to transform to a posterior distribution.
Bayesian Software should:
- BS2.4 Ensure that all appropriate validation and pre-processing of hyperparameters are implemented as distinct pre-processing steps prior to submitting to analytic routines, and especially prior to submitting to multiple parallel computational chains.
- BS2.5 Ensure that lengths of hyperparameter vectors are checked, with no excess values silently discarded (unless such output is explicitly suppressed, as detailed below).
- BS2.6 Ensure that lengths of hyperparameter vectors are commensurate with expected model input (see example immediately below)
- BS2.7 Where possible, implement pre-processing checks to validate appropriateness of numeric values submitted for hyperparameters; for example, by ensuring that hyperparameters defining second-order moments such as distributional variance or shape parameters, or any parameters which are logarithmically transformed, are non-negative.
The following example demonstrates how standards like the above (BS2.6-2.7) might be addressed. Consider the following function which defines a log-likelihood estimator for a linear regression, controlled via a vector of three hyperparameters, p
:
ll <- function (x, y, p) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
Pre-processing stages should be used to determine:
- That the dimensions of the input data,
x
and y
, are commensurate (BS2.3); non-commensurate inputs should error by default.
- The length of the vector
p
(BS2.5)
The latter task is not necessarily straightforward, because the definition of the function, ll()
, will itself generally be part of the input to an actual Bayesian Software function. This functional input thus needs to be examined to determine expected lengths of hyperparameter vectors. The following code illustrates one way to achieve this, relying on utilities for parsing function calls in R, primarily through the getParseData
function from the utils
package. The parse data for a function can be extracted with the following line:
x <- getParseData (parse (text = deparse (ll)))
The object x
is a data.frame
of every R token (such as an expression, symbol, or operator) parsed from the function ll
. The following section illustrates how this data can be used to determine the expected lengths of vector inputs to the function, ll()
.
click to see details
Input arguments used to define parameter vectors in any R software are accessed through R’s standard vector access syntax of vec[i]
, for some element i
of a vector vec
. The parse data for such begins with the SYMBOL
of vec
, the [
, a NUM_CONST
for the value of i
, and a closing ]
. The following code can be used to extract elements of the parse data which match this pattern, and ultimately to extract the various values of i
used to access members of vec
.
vector_length <- function (x, i) {
xn <- x [which (x$token %in% c ("SYMBOL", "NUM_CONST", "'['", "']'")), ]
# split resultant data.frame at first "SYMBOL" entry
xn <- split (xn, cumsum (xn$token == "SYMBOL"))
# reduce to only those matching the above pattern
xn <- xn [which (vapply (xn, function (j)
j$text [1] == i & nrow (j) > 3,
logical (1)))]
ret <- NA_integer_ # default return value
if (length (xn) > 0) {
# get all values of NUM_CONST as integers
n <- vapply (xn, function (j)
as.integer (j$text [j$token == "NUM_CONST"] [1]),
integer (1), USE.NAMES = FALSE)
# and return max of these
ret <- max (n)
}
return (ret)
}
That function can then be used to determine the length of any inputs which are used as hyperparameter vectors:
ll <- function (p, x, y) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
p <- parse (text = deparse (ll))
x <- utils::getParseData (p)
# extract the names of the parameters:
params <- unique (x$text [x$token == "SYMBOL"])
lens <- vapply (params, function (i) vector_length (x, i), integer (1))
lens
#> y p x
#> NA 3 NA
And the vector p
is used as a hyperparameter vector containing three parameters. Any initial value vectors can then be examined to ensure that they have this same length.
Not all Bayesian Software is designed to accept model inputs expressed as R code. The rstan
package, for example, implements its own model specification language, and only allows hyperparameters to be named, and not addressed by index. While this largely avoids problems of mismatched lengths of parameter vectors, the software (at v2.21.1) does not ensure the existence of named parameters prior to starting the computational chains. This ultimately results in each chain generating an error when a model specification refers to a non-existent or undefined hyperparameter. Such controls should be part of a single pre-processing stage, and so should only generate a single error.
2.3 Computational Parameters
Computational parameters are considered here as those passed to Bayesian functions other than hyperparameters determining the forms of prior distributions. They typically include parameters controlling lengths of runs, lengths of burn-in periods, numbers of parallel computations, other parameters controlling how samples are to be generated, or convergence criteria. All Computational Parameters should be checked for general “sanity” prior to calling primary computational algorithms. The standards for such sanity checks include that Bayesian Software should:
- BS2.8 Check that values for parameters are positive (except where negative values may be accepted)
- BS2.9 Check lengths and/or dimensions of inputs, and either automatically reject or provide appropriate diagnostic messaging for parameters of inappropriate length or dimension; for example passing a vector of length > 1 to a parameter presumed to define a single value (unless such output is explicitly suppressed, as detailed below)
- BS2.10 Check that arguments are of expected classes or types (for example, check that
integer
-type arguments are indeed integer
, with explicit conversion via as.integer
where not)
- BS2.11 Automatically reject parameters of inappropriate type (for example
character
values passed for integer
-type parameters that are unable to be appropriately converted).
The following two sub-sections consider particular cases of computational parameters.
2.4 Seed Parameters
Bayesian software should:
- BS2.12 Enable seeds to be passed as a parameter (through a direct
seed
argument or similar), or as a vector of parameters, one for each chain.
- BS2.13 Enable results of previous runs to be used as starting points for subsequent runs
Bayesian Software which implements parallel processing should:
- BS2.14 Ensure each chain is started with a different seed by default
- BS2.15 Issue diagnostic messages when identical seeds are passed to distinct computational chains
- BS2.16 Explicitly document advice not to use
set.seed()
- BS2.17 Provide the parameter with a plural name: for example, “starting_values” and not “starting_value”
To avoid potential confusion between separate parameters to control random seeds and starting values, we recommended a single “starting values” rather than “seeds” argument, with appropriate translation of these parameters into seeds where necessary.
2.5 Output Verbosity
All Bayesian Software should implement computational parameters to control output verbosity. Bayesian computations are often time-consuming, and often performed as batch computations. The following standards should be adhered to in regard to output verbosity:
- BS2.18 Bayesian Software should implement at least one parameter controlling the verbosity of output, defaulting to verbose output of all appropriate messages, warnings, errors, and progress indicators.
- BS2.19 Bayesian Software should enable suppression of messages and progress indicators, while retaining verbosity of warnings and errors. This should be tested.
- BS2.20 Bayesian Software should enable suppression of warnings where appropriate. This should be tested.
- BS2.21 Bayesian Software should explicitly enable errors to be caught, and appropriately processed either through conversion to warnings, or otherwise captured in return values. This should be tested.
3. Pre-processing and Data Transformation
3.1 Missing Values
Bayesian Software should:
- BS3.1 Explicitly document assumptions made in regard to missing values; for example that data is assumed to contain no missing (
NA
, Inf
) values, and that such values, or entire rows including any such values, will be automatically removed from input data.
- BS3.2 Implement appropriate routines to pre-process missing values prior to passing data through to main computational algorithms.
3.2 Perfect Collinearity
Where appropriate, Bayesian Software should:
- BS3.3 Implement pre-processing routines to diagnose perfect collinearity, and provide appropriate diagnostic messages or warnings
- BS3.4 Provide distinct routines for processing perfectly collinear data, potentially bypassing sampling algorithms
An appropriate test for BS3.4 would confirm that system.time()
or equivalent timing expressions for perfectly collinear data should be less than equivalent routines called with non-collinear data. Alternatively, a test could ensure that perfectly collinear data passed to a function with a stopping criteria generated no results, while specifying a fixed number of iterations may generate results.
4. Analytic Algorithms
As mentioned, analytic algorithms for Bayesian Software are commonly algorithms to simulate posterior distributions, and to draw samples from those simulations. Numerous extent R packages implement and offer sampling algorithms, and not all Bayesian Software will internally implement sampling algorithms. The following standards apply to packages which do implement internal sampling algorithms:
- BS4.1 Packages should document sampling algorithms (generally via literary citation, or reference to other software)
- BS4.2 Packages should provide explicit comparisons with external samplers which demonstrate intended advantage of implementation (generally via tests, vignettes, or both).
Regardless of whether or not Bayesian Software implements internal sampling algorithms, it should:
- BS4.3 Implement at least one means to validate posterior estimates (for example through the functionality of the
BayesValidate
package, noting that that package has not been updated for almost 15 years, and such approaches may need adapting; or the Simulation Based Calibration approach implemented in the rstan
function sbc
).
Where possible or applicable, Bayesian Software should:
- BS4.4 Implement at least one type of convergence checker, and provide a documented reference for that implementation.
- BS4.5 Enable computations to be stopped on convergence (although not necessarily by default).
- BS4.6 Ensure that appropriate mechanisms are provided for models which do not converge. This is often achieved by having default behaviour to stop after specified numbers of iterations regardless of convergence.
- BS4.7 Implement tests to confirm that results with convergence checker are statistically equivalent to results from equivalent fixed number of samples without convergence checking.
- BS4.8 Where convergence checkers are themselves parametrised, the effects of such parameters should also be tested. For threshold parameters, for example, lower values should result in longer sequence lengths.
5. Return Values
Unlike software in many other categories, Bayesian Software should generally return several kinds of distinct data, both the raw data derived from statistical algorithms, and associated metadata. Such distinct and generally disparate forms of data will be generally best combined into a single object through implementing a defined class structure, although other options are possible, including (re-)using extant class structures (see the CRAN Task view on Bayesian Inference. CRAN Task View: Bayesian Inference) for reference to other packages and class systems). Regardless of the precise form of return object, and whether or not defined class structures are used or implemented, the objects returned from Bayesian Software should include:
- BS5.1 Seed(s) or starting value(s), including values for each sequences where multiple sequences are included
- BS5.2 Appropriate metadata on types (or classes) and dimensions of input data
With regard to the input function, or alternative means of specifying prior distributions:
- BS5.3 Bayesian Software should either:
- BS5.3a Return the input function or prior distributional specification in the return object; or
- BS5.3b Enable direct access to such via additional functions which accept the return object as single argument.
Where convergence checkers are implemented or provided, Bayesian Software should:
- BS5.4 Return convergence statistics or equivalent
- BS5.5 Where multiple checkers are enabled, return details of convergence checker used
- BS5.6 Appropriate diagnostic statistics to indicate absence of convergence are either returned or immediately able to be accessed.
6. Additional Functionality
Bayesian Software should:
- BS6.1 Implement a default
print
method for return objects
- BS6.2 Implement a default
plot
method for return objects
- BS6.3 Provide and document straightforward abilities to plot sequences of posterior samples, with burn-in periods clearly distinguished
- BS6.4 Provide and document straightforward abilities to plot posterior distributional estimates
Bayesian Software may:
- BS6.5 Provide
summary
methods for return objects
- BS6.6 Provide abilities to plot both sequences of posterior samples and distributional estimates together in single graphic