Statistical Software: Regression and Supervised Learning

I am coming to this quite late, but wanted to address the “how” a little bit more than the discussion has so far.

  • I would love to see encouragement for new modeling packages to adopt boring vanilla data formats for input and output (nothing fancy or unique) and S3 methods. This contributes both to correctness and to interoperability with the ecosystem of post-processing packages that @bbolker mentioned. Certainly hardhat offers scaffolding for this, if modelers are interested in using it.
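As a sketch of what "boring vanilla" might mean in practice (using a hypothetical `mymodel()` fitting function, not any real package): accept a plain data frame, return a classed list, and have the S3 `predict()` method return an ordinary numeric vector with one element per row of `newdata`, so downstream packages can consume it without surprises.

```r
# Hypothetical modeling function: takes a formula and a data frame,
# returns a small classed list (here just wrapping lm() for illustration)
mymodel <- function(formula, data) {
  fit <- list(coefs = coef(lm(formula, data)), formula = formula)
  class(fit) <- "mymodel"
  fit
}

# S3 predict method with boring output: a plain numeric vector,
# one prediction per row of `newdata`
predict.mymodel <- function(object, newdata, ...) {
  # formula[-2] drops the response, so newdata need not contain it
  X <- model.matrix(object$formula[-2], newdata)
  drop(X %*% object$coefs)
}
```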

  • Clarity about data validation and data transformation (how are factors handled? so differently by so many models! scaling/centering? :woman_shrugging: who even knows when this happens) is so important. I see this as something to consider in implementation (good practices around validation and transformation) and documentation, i.e. “factor predictors are converted to…” in docs.
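To make the factor point concrete: base R's `model.matrix()` silently expands factor predictors into treatment-contrast dummy columns, and that is exactly the kind of behavior worth stating explicitly in a package's documentation, since other models handle factors quite differently.

```r
d <- data.frame(y = rnorm(6), grp = factor(rep(c("a", "b", "c"), 2)))

# The factor `grp` is silently expanded into treatment-contrast dummies:
colnames(model.matrix(y ~ grp, d))
#> [1] "(Intercept)" "grpb"        "grpc"
```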

  • I haven’t seen much discussion here of just really straightforward testing for correctness and how to do it, although I guess that is covered in @alexpghayes’s blog post. That is what I would really like to see in peer review standards, even more than (what seem to me) subtler issues around assumptions. Some ideas to suggest there: fitting to simulated datasets with known parameters, resampling real ones, etc.
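The simulated-data idea can be a one-liner in a test suite: generate data from known parameters, fit, and assert that the estimates recover the truth within tolerance (shown here with `lm()` and `stopifnot()`, though the same pattern works with testthat expectations).

```r
set.seed(1)
n <- 1e4
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)   # simulate from known truth: intercept 2, slope 3

fit <- lm(y ~ x)

# Correctness check: estimates should recover the known parameters
stopifnot(all(abs(coef(fit) - c(2, 3)) < 0.1))
```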
