Provisional draft of standards for time-series software: please comment! All those interested are also invited to edit these standards on this hackmd.io document. Note that the following list is intended to illustrate the nature of the standards we envision implementing, and is not exhaustive. We are particularly interested in hearing opinions regarding aspects which we may have missed here.
Time Series Software
Many of the following standards are written with reference to how software should function. Such aspects can and should often also be tested. Where testing of described functionality is expected, a “(TEST)” is added to the description.
Standards regarding documentation imply that documentation should be provided at appropriate places within the software: within functions themselves, within extended vignettes, within the main README document of a package, or elsewhere.
- Time Series Software should use or implement explicit class systems
- Time Series Software may extend common class systems for time series; see the section “Time Series Classes” in the CRAN Task View on Time Series Analysis.
- Class systems should require units (unless justified otherwise), such as those offered by packages like lubridate. (Note that the stats::ts class does not directly support specification of units.) (TEST).
- Where units are used, class systems should work with units provided by as many of the above packages and unit systems as possible (TEST).
- Where time intervals or periods are admitted, and where these may be months or years, software should be explicit about the system used to represent them, particularly regarding whether a calendar system is used, or whether a year is presumed to have 365 days, 365.2422 days, or some other value (TEST).
- Where covariance matrices are returned from functions, these should also use a class system, and potentially also include specification of appropriate units (TEST).
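The following minimal sketch, written in Python purely for illustration (the standards themselves target R packages), shows a class that cannot be constructed without a known time unit, and a conversion helper that forces the caller to name the year-length convention explicitly. The class `UnitTimeSeries`, the helper `years_to_days`, and the tables of supported units and conventions are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical unit table: seconds per supported time unit.
SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}

# Hypothetical, explicitly named conventions for the length of a year in days.
DAYS_PER_YEAR = {"fixed_365": 365.0, "tropical": 365.2422}


@dataclass(frozen=True)
class UnitTimeSeries:
    """Illustrative series class that cannot be created without a known time unit."""
    times: tuple
    values: tuple
    time_unit: str

    def __post_init__(self):
        if self.time_unit not in SECONDS_PER_UNIT:
            raise ValueError(f"missing or unsupported time unit: {self.time_unit!r}")
        if len(self.times) != len(self.values):
            raise ValueError("times and values must have the same length")

    def in_unit(self, unit):
        """Return a copy with the time index converted to another supported unit."""
        if unit not in SECONDS_PER_UNIT:
            raise ValueError(f"unsupported time unit: {unit!r}")
        factor = SECONDS_PER_UNIT[self.time_unit] / SECONDS_PER_UNIT[unit]
        return UnitTimeSeries(tuple(t * factor for t in self.times), self.values, unit)


def years_to_days(years, convention):
    """Convert years to days only under an explicitly named year-length convention."""
    if convention not in DAYS_PER_YEAR:
        raise ValueError(f"unknown year convention {convention!r}; "
                         f"choose one of {sorted(DAYS_PER_YEAR)}")
    return years * DAYS_PER_YEAR[convention]
```

Because there is no default convention, a call such as `years_to_days(1, "tropical")` documents its own assumption at the call site; calendar-aware arithmetic would need its own, equally explicit, option.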
A Class System should:
- Ensure strict ordering of the time, frequency, or equivalent ordering index variable (TEST).
- Catch any violations of ordering in the pre-processing stages of all functions (TEST).
- Where covariance matrices are generated or used, ensure that ordering of rows and columns is maintained and/or not able to be violated (TEST).
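A sketch of these ordering requirements, again in Python with NumPy as an illustrative assumption: `check_strictly_increasing` is the kind of pre-processing check each function might call, and the hypothetical `OrderedCovariance` container permutes rows and columns together with the index, then marks both arrays read-only so the ordering cannot later be violated in place.

```python
import numpy as np


def check_strictly_increasing(index):
    """Pre-processing check: raise if the ordering index is not strictly increasing."""
    idx = np.asarray(index, dtype=float)
    bad = np.nonzero(np.diff(idx) <= 0)[0]
    if bad.size:
        raise ValueError(
            f"index is not strictly increasing at position(s) {(bad + 1).tolist()}")
    return idx


class OrderedCovariance:
    """Hypothetical covariance container tied to a time (or frequency) index."""

    def __init__(self, index, matrix):
        idx = np.asarray(index, dtype=float)
        m = np.asarray(matrix, dtype=float)
        if m.shape != (idx.size, idx.size):
            raise ValueError("matrix must be square with one row/column per index value")
        order = np.argsort(idx, kind="stable")
        # Permute rows and columns together, so entry (i, j) always refers to
        # the pair (index[i], index[j]); duplicate index values are rejected.
        self.index = check_strictly_increasing(idx[order])
        self.matrix = m[np.ix_(order, order)]
        # Read-only arrays prevent the ordering from being changed in place later.
        self.index.setflags(write=False)
        self.matrix.setflags(write=False)
```

Sorting rows and columns jointly, rather than rejecting unsorted input, is one design choice; a stricter class might simply raise instead.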
Time Series Software should explicitly document assumptions or requirements
made with respect to the stationarity or otherwise of all input data. In
particular, any (sub-)functions which assume or rely on stationarity should:
- Consider stationarity of all relevant moments, typically the first (mean) and second (variance) moments (TEST), or otherwise document why such consideration may be restricted to lower orders only.
- implement appropriate checks for such (TEST);
- issue diagnostic messages or warnings (TEST); or
- enable or advise on appropriate transformations to ensure stationarity (TEST).
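As an illustration of implementing checks, issuing warnings and advising on transformations, the Python sketch below compares block means and variances and warns when either drifts. It is a crude heuristic chosen for brevity, not a formal test such as the augmented Dickey-Fuller or KPSS tests; the function name and tolerance defaults are assumptions.

```python
import warnings

import numpy as np


def check_stationarity(x, n_segments=4, mean_tol=1.0, var_ratio_tol=4.0):
    """Crude heuristic check of first- and second-moment stationarity.

    Splits x into n_segments contiguous blocks, then flags
      * drift in the mean if the range of block means exceeds mean_tol
        overall standard deviations, and
      * change in variance if the largest/smallest block variance ratio
        exceeds var_ratio_tol.
    Returns True if nothing is flagged; otherwise warns and returns False.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 2 * n_segments:
        raise ValueError("series too short for the requested number of segments")
    blocks = np.array_split(x, n_segments)
    means = np.array([b.mean() for b in blocks])
    variances = np.array([b.var(ddof=1) for b in blocks])
    sd = x.std(ddof=1)
    problems = []
    if sd > 0 and (means.max() - means.min()) / sd > mean_tol:
        problems.append("the mean (first moment) appears to drift")
    if variances.min() > 0 and variances.max() / variances.min() > var_ratio_tol:
        problems.append("the variance (second moment) appears to change")
    if problems:
        warnings.warn("possible non-stationarity: " + "; ".join(problems)
                      + ". Consider a transformation such as differencing (np.diff).",
                      stacklevel=2)
        return False
    return True
```

For example, the linear trend `np.arange(100.0)` is flagged for mean drift, while its first difference (a constant series) is not.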
Time Series Software should deal appropriately with missing values.
- All functions which accept time series as input data should check for missing values, and take any associated steps, as part of initial pre-processing prior to passing data to analytic algorithms.
- Where possible, all functions should provide options for users to specify how to handle missing data, with options minimally including:
  - error on missing data (TEST).
  - warn or ignore missing data, and proceed to analyse the resulting irregular data, ensuring that results from function calls on regular data with missing values are identical to those obtained by submitting the equivalent irregular data with no missing values (TEST).
  - replace missing data with appropriately imputed values (TEST).
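The three missing-data options might be exposed through a single argument, as in this Python sketch. The analysis function (a least-squares trend slope), its name, and the `na_action` values are hypothetical; the point is that the `"omit"` path returns exactly what the same function returns when given the equivalent irregular series directly.

```python
import warnings

import numpy as np


def linear_trend(t, x, na_action="error"):
    """Hypothetical analysis function: least-squares slope of x against time t.

    Missing values (NaN) are handled during pre-processing, before any
    analysis, according to na_action:
      "error"  - raise an error;
      "omit"   - warn, drop missing observations and analyse the resulting
                 irregular series;
      "impute" - replace missing values by linear interpolation in time.
    """
    if na_action not in ("error", "omit", "impute"):
        raise ValueError(f"unknown na_action {na_action!r}")
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if missing.any():
        if na_action == "error":
            raise ValueError(f"{missing.sum()} missing value(s) in input")
        if na_action == "omit":
            warnings.warn(f"dropping {missing.sum()} missing value(s)", stacklevel=2)
            t, x = t[~missing], x[~missing]
        else:  # "impute"
            x = x.copy()  # do not modify the caller's array
            x[missing] = np.interp(t[missing], t[~missing], x[~missing])
    slope, _intercept = np.polyfit(t, x, deg=1)
    return slope
```

For example, with `t = [0, 1, 2, 3, 4, 5]` and `x = [0, 1, nan, 3, 4, 5]`, `linear_trend(t, x, na_action="omit")` equals `linear_trend([0, 1, 3, 4, 5], [0, 1, 3, 4, 5])`, which is the equivalence the second option asks to be tested.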
Where Time Series Software implements or otherwise enables forecasting abilities, it should:
- Permit limits on the forecasting horizon to be specified in terms of maximal thresholds or divergence criteria (such as in terms of standard errors), either as:
  - additional parameters to algorithmic routines alongside input data (TEST); or
  - additional post-processing functions to trim output data to only those values within the specified threshold.
- Always return either:
  - A distribution object, for example via one of the many packages described in the CRAN Task View on Probability Distributions (or the new distributional package as used in the fable package for time-series forecasting) (TEST).
  - At least two output variables for each variable being forecast (one for mean or first-order predictions, and a second for variance or second-order predictions) (TEST).
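A Python sketch of a forecast routine that returns both first- and second-order predictions, and that supports a standard-error limit on the horizon either as an argument or via a separate post-processing step. The random-walk ("naive") model, the function names and the dictionary return format are illustrative assumptions; a package could instead return a distribution object.

```python
import numpy as np


def naive_forecast(x, h, max_se=None):
    """Random-walk ("naive") forecast returning mean AND variance for each step.

    The random-walk model is an illustrative assumption: the h-step mean is
    the last observation and the h-step variance is h * sigma^2, with sigma^2
    estimated from first differences. If max_se is given, only steps whose
    standard error does not exceed max_se are returned.
    """
    x = np.asarray(x, dtype=float)
    sigma2 = np.var(np.diff(x), ddof=1)
    steps = np.arange(1, h + 1)
    fc = {"step": steps,
          "mean": np.full(h, x[-1]),
          "variance": steps * sigma2}
    return fc if max_se is None else trim_forecast(fc, max_se)


def trim_forecast(fc, max_se):
    """Post-processing alternative: keep only steps with standard error <= max_se."""
    keep = np.sqrt(fc["variance"]) <= max_se
    return {k: v[keep] for k, v in fc.items()}
```

With `x = [0, 1, 3, 6]` the first differences are `[1, 2, 3]`, so $\hat\sigma^2 = 1$ and the step-$h$ standard error is $\sqrt{h}$; a limit of `max_se=1.5` therefore keeps only the first two steps.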
Time Series Software should:
- Implement default plot methods for any implemented class system (TEST).
- When representing results in temporal domain(s), ensure that one axis is clearly labelled “time” (or equivalent), with continuous units.
- Default to placing the “time” (or equivalent) variable on the horizontal axis.
- Ensure that units of the time, frequency, or index variable are printed by default on the axis.
- For frequency visualization, an abscissa spanning $[-\pi, \pi]$ should be avoided in favour of positive ranges such as $[0, 2\pi]$ or $[0, 0.5]$, in all cases with appropriate additional explanation of units.
- Provide options to determine whether plots of data with missing values should generate continuous or broken lines.
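A sketch of these default plotting behaviours using matplotlib (an assumption; an R package would use its own graphics system): time on the horizontal axis labelled with its unit, a switch between broken and connected lines across missing values, and a periodogram restricted to non-negative frequencies from 0 up to the Nyquist frequency, i.e. $[0, 0.5]$ cycles per sampling interval. Function names and arguments are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_series(t, x, time_unit, gaps="break", ax=None):
    """Hypothetical default plot method for a univariate time series.

    Time goes on the horizontal axis and its unit is printed in the label.
    gaps="break" keeps NaNs, which matplotlib renders as breaks in the line;
    gaps="connect" drops missing points so the line joins across them.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    if gaps == "connect":
        keep = ~np.isnan(x)
        t, x = t[keep], x[keep]
    elif gaps != "break":
        raise ValueError("gaps must be 'break' or 'connect'")
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(t, x)
    ax.set_xlabel(f"time [{time_unit}]")
    return ax


def plot_periodogram(x, time_unit, spacing=1.0, ax=None):
    """Raw periodogram on non-negative frequencies only.

    np.fft.rfftfreq returns frequencies from 0 up to (at most) the Nyquist
    frequency, 0.5 / spacing cycles per time unit, so no negative
    frequencies are shown.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(x.size, d=spacing)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    if ax is None:
        _, ax = plt.subplots()
    ax.plot(freqs, power)
    ax.set_xlabel(f"frequency [cycles per {time_unit}]")
    ax.set_ylabel("power")
    return ax
```

Passing a singular unit name, as in `plot_periodogram(x, "day")`, yields the axis label "frequency [cycles per day]".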
For the results of forecast operations, Time Series Software should:
- By default, indicate the distributional limits of forecasts on plots.
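Forecast limits can be drawn by default as a shaded band. This matplotlib sketch assumes the forecast supplies a mean and a variance per step and that a normal approximation is adequate for the interval; the function name and its arguments are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from statistics import NormalDist


def plot_forecast(t_obs, x_obs, t_fc, mean, variance, time_unit,
                  level=0.95, show_intervals=True):
    """Hypothetical forecast plot: observed data, forecast mean and, by default,
    a central prediction interval computed under a normal approximation."""
    _, ax = plt.subplots()
    ax.plot(t_obs, x_obs, label="observed")
    ax.plot(t_fc, mean, label="forecast mean")
    if show_intervals:
        z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
        se = np.sqrt(np.asarray(variance, dtype=float))
        centre = np.asarray(mean, dtype=float)
        ax.fill_between(t_fc, centre - z * se, centre + z * se, alpha=0.3,
                        label=f"{level:.0%} prediction interval")
    ax.set_xlabel(f"time [{time_unit}]")
    ax.legend()
    return ax
```

Intervals are on unless `show_intervals=False` is passed explicitly, which matches the requirement that limits be shown by default.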
Refer to the examples below for further clarification of these points.
(… those examples are then given in the associated