Statistical Software: Regression and Supervised Learning

I am coming to this quite late, but wanted to address the “how” a little bit more than the discussion has so far.

  • I would love to see encouragement for new modeling packages to adopt boring vanilla data formats for input and output (nothing fancy or unique) and S3 methods. This contributes both to correctness and to interoperability with the ecosystem of post-processing packages that @bbolker mentioned. Certainly hardhat offers scaffolding for this, if modelers are interested in using it.
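As a sketch of what "boring vanilla" might mean in practice (using a hypothetical `mymodel()` fitting function, not any real package): accept a plain data frame, return a classed list, and have the S3 `predict()` method return an ordinary numeric vector with one element per row of `newdata`, so downstream packages can consume it without surprises.

```r
# Hypothetical modeling function: takes a formula and a data frame,
# returns a small classed list (here just wrapping lm() for illustration)
mymodel <- function(formula, data) {
  fit <- list(coefs = coef(lm(formula, data)), formula = formula)
  class(fit) <- "mymodel"
  fit
}

# S3 predict method with boring output: a plain numeric vector,
# one prediction per row of `newdata`
predict.mymodel <- function(object, newdata, ...) {
  # formula[-2] drops the response, so newdata need not contain it
  X <- model.matrix(object$formula[-2], newdata)
  drop(X %*% object$coefs)
}
```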

  • Clarity about data validation and data transformation (how are factors handled? so differently by so many models! scaling/centering? :woman_shrugging: who even knows when this happens) is so important. I see this as something to consider in implementation (good practices around validation and transformation) and documentation, i.e. “factor predictors are converted to…” in docs.
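To make the factor point concrete: base R's `model.matrix()` silently expands factor predictors into treatment-contrast dummy columns, and that is exactly the kind of behavior worth stating explicitly in a package's documentation, since other models handle factors quite differently.

```r
d <- data.frame(y = rnorm(6), grp = factor(rep(c("a", "b", "c"), 2)))

# The factor `grp` is silently expanded into treatment-contrast dummies:
colnames(model.matrix(y ~ grp, d))
#> [1] "(Intercept)" "grpb"        "grpc"
```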

  • I haven’t seen much discussion here of just really straightforward testing for correctness and how to do it, although I guess that is covered in @alexpghayes’s blog post. That is what I would really like to see in peer review standards, even more than (what seem to me) subtler issues around assumptions. Some ideas to suggest there: fitting to simulated datasets with known parameters, resampling real ones, etc.
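The simulated-data idea can be a one-liner in a test suite: generate data from known parameters, fit, and assert that the estimates recover the truth within tolerance (shown here with `lm()` and `stopifnot()`, though the same pattern works with testthat expectations).

```r
set.seed(1)
n <- 1e4
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)   # simulate from known truth: intercept 2, slope 3

fit <- lm(y ~ x)

# Correctness check: estimates should recover the known parameters
stopifnot(all(abs(coef(fit) - c(2, 3)) < 0.1))
```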
