Statistical Software: Exploratory Data Analysis and Summary Statistics

stephaniehicks · June 23, 2020, 1:40pm

I think this will likely be one of the hardest categories, but here are a few thoughts that I had.

Let’s imagine a four-dimensional space with the following dimensions:

1. The type of EDA software, roughly categorized as (i) visualization only, (ii) non-visualization only, or (iii) both visualization and non-visualization.
2. The type of question being asked about the data being explored (predictive, inferential, associative, causal, etc.).
3. The data type (e.g. count data, time series, continuous data, genomic data, etc.).
4. The audience for whom the EDA software is being designed (e.g. a financial analyst without much R training, students in primary school, or an experienced R developer).

I think the answers to your questions @mpadge from your post on May 29 will depend on answering the questions above first.

For example, EDA software that has no visualization component (e.g. software that only identifies whether there is missing data in a dataset) will likely need a different set of standards and/or tests than EDA software that only provides data visualization functionality.
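To make that contrast concrete, here is a rough sketch of a non-visualization EDA function and its test. It is written in Python purely for illustration; the function name `count_missing` and the column-dictionary input format are hypothetical and not taken from any package under review. The point is that its output can be verified with plain equality assertions, whereas a plotting function would typically need a different kind of test (e.g. snapshot or image comparison).

```python
import math
from typing import Dict, List, Optional


def count_missing(columns: Dict[str, List[Optional[float]]]) -> Dict[str, int]:
    """Count missing entries (None or NaN) in each column of a hypothetical dataset."""

    def is_missing(value: Optional[float]) -> bool:
        # Treat both None and float NaN as missing.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {name: sum(is_missing(v) for v in values) for name, values in columns.items()}


# A plain, non-visual test: the expected counts are known in advance.
data = {"age": [34.0, None, 51.0], "income": [float("nan"), 42000.0, None]}
assert count_missing(data) == {"age": 1, "income": 2}
```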

I would suggest that a questionnaire covering the four questions above be given to the developer of the EDA package being submitted for peer review, and that it be submitted as part of the peer review process. The reviewer could then work from a tailored list of standards/tests for each position in this four-dimensional space. I’m probably forgetting other important dimensions, though, and would welcome feedback on this.
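One way such a questionnaire could drive a tailored checklist is sketched below in Python. The class `EDAQuestionnaire`, the function `tailored_standards`, the answer strings, and the standards they map to are all hypothetical placeholders chosen for illustration, not an agreed-upon set of standards.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EDAQuestionnaire:
    # The four dimensions of the space; the allowed values here are illustrative only.
    software_type: str  # "viz", "non-viz", or "both"
    question_type: str  # e.g. "predictive", "inferential", "causal"
    data_type: str      # e.g. "count", "time series", "genomic"
    audience: str       # e.g. "novice", "student", "experienced R developer"


def tailored_standards(q: EDAQuestionnaire) -> List[str]:
    """Map one point in the four-dimensional space to a hypothetical checklist."""
    standards = ["general statistical software standards"]
    if q.software_type in ("viz", "both"):
        standards.append("visualization standards, including color accessibility")
    if q.software_type in ("non-viz", "both"):
        standards.append("tests of numerical summaries, e.g. missing-data checks")
    if q.data_type == "time series":
        standards.append("time series standards")
    if q.audience in ("novice", "student"):
        standards.append("documentation aimed at users with little R training")
    # question_type is recorded but not used in this toy mapping.
    return standards


print(tailored_standards(EDAQuestionnaire("both", "inferential", "time series", "student")))
```

A lookup like this keeps the reviewer's checklist reproducible: two packages answering the questionnaire the same way receive the same list.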

Finally, I think one important set of standards/tests that could be automated for EDA software concerns accessibility. For example, if the EDA software produces data visualizations, one standard should be to make sure that the colors are accessible to individuals with a color vision deficiency. The “Tools” page of the 18F Accessibility Guide lists some other things we might think about too.
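Color contrast is one accessibility property that can be checked fully automatically. The sketch below, in Python, computes the contrast ratio defined in the WCAG 2.x guidelines (relative luminance of linearized sRGB, then (L1 + 0.05) / (L2 + 0.05)). Note that contrast is a complementary criterion to color vision deficiency, not a substitute for it, and the function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (threshold as written in WCAG 2.x)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (_linearize(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1:1 (identical) up to 21:1 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


# Black on white gives the maximum ratio of 21:1.
assert round(contrast_ratio("#000000", "#ffffff"), 2) == 21.0
# WCAG AA asks for at least 4.5:1 for normal-size text; mid-grey #777777 just misses it.
assert contrast_ratio("#777777", "#ffffff") < 4.5
```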

Here is a post about how to simulate color vision deficiency in ggplot2 figures, which would be helpful for developers: https://www.datanovia.com/en/blog/how-to-stimulate-colorblindness-vision-in-r-figures/
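The linked post works in R. As a rough illustration of what such a simulation does, here is a Python sketch that applies the full-severity simulation matrices from Machado, Oliveira & Fernandes (2009) to a palette in linear RGB and reports how close the closest pair of colors becomes. The matrix values are as commonly reproduced by CVD simulation tools, and using Euclidean distance in linear RGB is a crude stand-in for a perceptual metric such as CIELAB ΔE. Both are assumptions of this sketch, as are the function names.

```python
import itertools
import math

# Full-severity Machado et al. (2009) matrices, applied to linear RGB
# (assumption: values as commonly reproduced by CVD simulation tools).
MACHADO = {
    "deutan": [[0.367322, 0.860646, -0.227968],
               [0.280085, 0.672501, 0.047413],
               [-0.011820, 0.042940, 0.968881]],
    "protan": [[0.152286, 1.052583, -0.204868],
               [0.114503, 0.786281, 0.099216],
               [-0.003882, -0.048116, 1.051998]],
}


def to_linear(hex_color):
    """Decode '#rrggbb' with the standard sRGB transfer function."""
    h = hex_color.lstrip("#")
    out = []
    for i in (0, 2, 4):
        c = int(h[i:i + 2], 16) / 255
        out.append(c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4)
    return out


def simulate(hex_color, kind):
    """Simulated linear RGB of a color as seen with the given deficiency."""
    rgb = to_linear(hex_color)
    sim = [sum(m * c for m, c in zip(row, rgb)) for row in MACHADO[kind]]
    return [min(1.0, max(0.0, v)) for v in sim]  # clip to the displayable range


def min_pairwise_distance(palette, kind):
    """Smallest distance between any two palette colors after simulation."""
    sims = [simulate(c, kind) for c in palette]
    return min(math.dist(a, b) for a, b in itertools.combinations(sims, 2))


# Pure red and green (distance sqrt(2) ~ 1.41 unsimulated) move much closer together.
print(min_pairwise_distance(["#ff0000", "#00ff00"], "deutan"))
```

An automated test could flag a palette whose minimum distance falls below some threshold; choosing that threshold is a judgment call the standards would need to settle.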
