Hi all! This thread is for discussion of standards for peer review of Exploratory Data Analysis (EDA) and Summary Statistics packages, one of our initial statistical package categories. We’re interested in the community’s input as to what standards should apply specifically to such packages.
EDA/Summary software captures and presents statistical representations of data and their inter-relationships. This may mean transforming input data into summary output data in some novel form, including standard summary values specific to a field. EDA/Summary software also generally aids the understanding of data properties or distributions via quantitative procedures, commonly aided by visualisation tools. Note that categories are non-exclusive; EDA/Summary packages may also be, for instance, Time Series packages. We do, however, distinguish this category from unsupervised learning, clustering, or dimensionality-reduction packages, which we will address later.
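To make the category concrete, here is a minimal sketch of the kind of transformation described above: raw values in, standard summary values out. The function name, the handling of missing values, and the return shape are all illustrative choices, not a proposed standard.

```python
import statistics

def summarize(values):
    """Return basic summary statistics for a numeric sequence.

    None entries are treated as missing and excluded from the
    computations; this is one illustrative convention, not a
    recommendation for how reviewed packages must behave.
    """
    clean = [v for v in values if v is not None]
    # Quartiles via the "inclusive" method, which interpolates
    # between observed data points.
    q1, med, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "n": len(clean),                      # non-missing count
        "missing": len(values) - len(clean),  # missing count
        "mean": statistics.fmean(clean),
        "median": med,
        "sd": statistics.stdev(clean),        # sample standard deviation
        "q1": q1,
        "q3": q3,
    }

print(summarize([1.0, 2.0, 3.0, 4.0, None]))
```

Even a toy like this raises reviewable questions: how missing values are handled, which quantile algorithm is used, and whether those choices are documented, all of which are the kinds of concerns standards for this category might address.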
We’d like your answers to these questions:
- What should an EDA/Summary package do to pass a peer review process?
- How should EDA/Summary software do that?
- What should be documented in EDA/Summary packages?
- How should EDA/Summary software be tested?
Don’t worry if your contributions might apply to other categories. We will look at suggestions across topics to see which should be elevated to more general standards. Please do comment on whether you think a standard should be required or recommended for a package to pass peer-review. We’ll use these comments as input for prioritizing standards in the future.
For useful references, see our background chapter. Feel free to drop other references in the thread!