Some interesting feedback here (as well as on the Slack channel). My thoughts in response to it:
- Most packages would be expected to check off more than one category. For instance, one might have a Bayesian Time Series Regression package or a Machine-Learning Clustering package. In each case the guidance and standards for all relevant categories would apply.
- I think “Dimensionality Reduction and Feature Selection” should be “Dimensionality Reduction, Clustering, and Unsupervised Learning.” Feature selection, or even some feature engineering, might or might not belong in this category. For instance, I’d put LASSO in Regression because it is primarily a supervised technique.
- “Regression and Interpolation” should be “Regression and Supervised Learning.”
- It makes sense to have both “Time Series Analysis” and “Spatial Analysis”.
- “Machine Learning” is a term that means different things to different people, so we should define how we are using it here. For us, I think it can mean “non-likelihood, predictive approaches to model fitting.” Most packages checking off ML would also check off the unsupervised or supervised categories, and standards in the ML category would relate to things like how objective functions are defined and how out-of-sample prediction and regularization / validation are handled.
- I like Study Design and Meta Analysis (I’ve heard a few comments on this, too). Many packages in these categories would overlap with other categories. For instance, a Meta Analysis might ultimately be a form of hierarchical regression, or a power analysis for study design might be a simulation from a regression model. They might not be the first areas we tackle.
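
To make the overlap point concrete, here is a rough sketch of what "a power analysis as a simulation from a regression model" could look like. It is only an illustration, not a proposed standard: the function name `simulate_power`, its parameters, the standard-normal predictor, and the normal approximation to the t critical value are all my own assumptions.

```python
import math
import random
from statistics import NormalDist


def simulate_power(n, slope, sigma=1.0, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power to detect `slope` in y = slope * x + noise.

    Hypothetical helper: simulates data from a simple linear regression,
    refits it by ordinary least squares, and counts how often the slope
    is declared significant.
    """
    rng = random.Random(seed)
    # Two-sided critical value. Assumption: the normal approximation to the
    # t distribution is used, which is slightly anti-conservative for small n.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        # Simulate one data set from the assumed regression model.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [slope * xi + rng.gauss(0, sigma) for xi in x]
        # Closed-form OLS fit for a single predictor.
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        b1 = sxy / sxx
        b0 = y_bar - b1 * x_bar
        # Standard error of the slope from the residual variance.
        sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
        se_b1 = math.sqrt(sse / (n - 2) / sxx)
        if abs(b1 / se_b1) > z_crit:
            rejections += 1
    # Share of simulated studies in which the slope was detected.
    return rejections / n_sims


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        print(n, simulate_power(n=n, slope=0.3))
```

A package doing this would plausibly check off both Study Design and Regression, which is the kind of overlap I have in mind.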