Our project for reviewing statistical software now has another set of completed standards, this time for “Spatial Software”. The Standards can be viewed in the main project book. We’re keen to receive any and all feedback on these standards, particularly on the core section of “Algorithmic Standards” (5.9.3). The spatial standards differ notably from those for our other categories, in that the core algorithmic standards mostly apply to packages which also fit into other categories (regression, unsupervised learning, and machine learning). Specific questions on which we’d like input and discussion include:
- What might we be missing in these standards?
- Are there other aspects of spatial statistical algorithms which might be sufficiently general to be expressible and generally applicable as standards?
- Are there any other category-specific aspects of spatial software which might be appropriately expressed via our standards?
For ease of reference, these core algorithmic standards are:
The following standards will be conditionally applicable to some but not all spatial software. Procedures for standards deemed not applicable are described in the R package of this project.
- SP3.0 Spatial software which considers spatial neighbours should enable user control over neighbourhood forms and sizes. In particular:
  - SP3.0a Neighbours (able to be expressed) on regular grids should be able to be considered as either rectangular only, or rectangular and diagonal (respectively “rook” and “queen”, by analogy to chess).
  - SP3.0b Neighbourhoods in irregular spaces should be minimally able to be controlled via an integer number of neighbours, an area (or equivalent distance defining an area) in which to include neighbours, or otherwise equivalent user-controlled value.
- SP3.1 Spatial software which considers spatial neighbours should enable neighbour contributions to be weighted by distance (or other weighting variable), and not rely on a uniform-weight rectangular cut-off.
- SP3.2 Spatial software which relies on sampling from input data (even if only of spatial coordinates) should enable sampling procedures to be based on local spatial densities of those input data.
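As a rough illustration of what SP3.0 and SP3.1 ask for, the sketch below exposes the neighbourhood form ("rook" or "queen") and the neighbourhood size (`k`) as user-controlled arguments, and weights neighbours by inverse distance rather than uniformly. The function names `grid_neighbours` and `knn_weights`, and the `form`, `k`, and `power` parameters, are hypothetical and are not taken from the standards or from any particular package.

```python
from math import hypot


def grid_neighbours(row, col, nrows, ncols, form="rook"):
    """Return the in-bounds neighbour cells of (row, col) on a regular grid.

    form="rook" uses the 4 edge-sharing cells; form="queen" adds the
    4 diagonal cells (hypothetical argument names, for illustration).
    """
    if form not in ("rook", "queen"):
        raise ValueError("form must be 'rook' or 'queen'")
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if form == "queen":
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < nrows and 0 <= col + dc < ncols
    ]


def knn_weights(points, index, k, power=1.0):
    """Inverse-distance weights for the k nearest neighbours of points[index].

    Returns {neighbour_index: weight}, normalised to sum to 1. Neighbours at
    zero distance are dropped to avoid division by zero (a simplification).
    """
    x0, y0 = points[index]
    nearest = sorted(
        (hypot(x - x0, y - y0), j)
        for j, (x, y) in enumerate(points)
        if j != index
    )[:k]
    raw = {j: 1.0 / d ** power for d, j in nearest if d > 0}
    total = sum(raw.values())
    return {j: w / total for j, w in raw.items()} if total > 0 else {}


# A corner cell has 2 rook neighbours and 3 queen neighbours.
corner_rook = grid_neighbours(0, 0, 3, 3, form="rook")
corner_queen = grid_neighbours(0, 0, 3, 3, form="queen")

# Irregular points: the nearer of the two neighbours gets the larger weight.
weights = knn_weights([(0, 0), (1, 0), (0, 2), (5, 5)], index=0, k=2)
```

A distance-based neighbourhood (an area or radius, as in SP3.0b) could be offered in the same way, by filtering on a user-supplied maximum distance instead of truncating to `k` neighbours.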
Algorithms for spatial software are often related to other categories of statistical software, and it is anticipated that spatial software will commonly also be subject to standards from these other categories. Nevertheless, because spatial analyses frequently face unique challenges, some of these category-specific standards also have extension standards when applied to spatial software. The following standards will be applicable for any spatial software which also fits any of the other listed categories of statistical software.
Regression Software
- SP3.3 Spatial regression software should explicitly quantify and distinguish autocovariant or autoregressive processes from those covariant or regressive processes not directly related to spatial structure alone.
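SP3.3 does not prescribe a particular method. One widely used diagnostic for whether a spatially structured process remains after fitting a non-spatial model is Moran's I computed on the residuals. The sketch below is only an illustration of that statistic under simple assumptions: the function name `morans_i` and the dictionary-of-pairs weight format are hypothetical choices, and the residual values are made up for demonstration.

```python
def morans_i(values, weights):
    """Moran's I for `values`, given pairwise weights {(i, j): w_ij}.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(weights.values())
    num = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Symmetric unit weights between adjacent cells on a 1-D chain of n cells."""
    w = {}
    for i in range(n - 1):
        w[(i, i + 1)] = 1.0
        w[(i + 1, i)] = 1.0
    return w


# Made-up residuals: a smooth trend gives positive autocorrelation,
# an alternating pattern gives negative autocorrelation.
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain_weights(4))
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain_weights(4))
```

Values near zero suggest little residual spatial structure, while clearly positive or negative values suggest that an autocovariant component has not been separated out.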
Unsupervised Learning Software
The following standard applies to any spatial unsupervised learning software which uses clustering algorithms.
- SP3.4 Spatial clustering should not use standard non-spatial clustering algorithms in which spatial proximity is merely represented by an additional weighting factor. Rather, clustering schemes should be derived from explicitly spatial algorithms.
Machine Learning Software
One common application of machine learning algorithms in spatial software is the analysis of raster images. The first of the following standards applies because the individual cells or pixels of these raster images represent fixed spatial coordinates. (This standard also renders ML2.1 inapplicable.)
- SP3.5 Spatial machine learning software should ensure that broadcasting procedures for reconciling inputs of different dimensions are not applied.
- SP3.6 Spatial machine learning software should ensure that test and training data are spatially distinct, and not simply sampled uniformly from a common region.
The latter standard, SP3.6, is commonly met by applying some form of spatial partitioning to data, and using spatially distinct partitions to define test and training data.
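A minimal sketch of such spatial partitioning follows, assuming points in planar coordinates and square blocks. Every point in a block is assigned to the same set, so test and training data come from distinct blocks. The function name `spatial_block_split` and its `block_size`, `test_fraction`, and `seed` parameters are hypothetical.

```python
import random
from math import floor


def spatial_block_split(points, block_size, test_fraction=0.25, seed=0):
    """Split point indices into (train, test) by square spatial block.

    Points are grouped by the block containing them, and whole blocks are
    then assigned at random to the test set until roughly `test_fraction`
    of the blocks (not of the points) have been selected.
    """
    blocks = {}
    for idx, (x, y) in enumerate(points):
        key = (floor(x / block_size), floor(y / block_size))
        blocks.setdefault(key, []).append(idx)

    keys = sorted(blocks)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))

    test = sorted(i for k in keys[:n_test] for i in blocks[k])
    train = sorted(i for k in keys[n_test:] for i in blocks[k])
    return train, test


# 16 points on a 4 x 4 lattice, partitioned into four 2 x 2 blocks.
points = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
train, test = spatial_block_split(points, block_size=2.0)
```

Points close to a block boundary can still be near points in a neighbouring block, so an approach like this may need a buffer between test and training blocks where spatial dependence extends over longer distances.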