rOpenSci onboarding: package fit

sckott · June 10, 2016, 8:27pm

One of the criteria we apply to packages submitted to our onboarding repo is how well they fit. See the guidelines about fit at https://github.com/ropensci/onboarding/blob/master/policies.md#package-fit

Those policies list a number of areas that we consider in scope, or a good fit. In brief bulleted form:

- data retrieval (from APIs, data storage services, journals, and other remote servers). The data retrieved must have a scientific application; merely wrapping an API that serves data does not meet our criteria. (e.g., rplos)
- data extraction tools that aid in retrieving data from unstructured sources such as text, images and PDFs. (e.g., pdftools)
- data visualization (interactive graphics in R that extend beyond base and ggplot2). (e.g., plotly)
- data deposition into research repositories, including metadata generation. (e.g., zenodo)
- data munging (In the context of the tools described above. Generic tools such as reshape2, tidyr do not fit this criteria). Geospatial tools fall under this category. (e.g., geojsonio)
- data packages that aggregate large, heterogeneous data sets of scientific value or provide R-specific formats for widely-used data (e.g., shapefiles for geographic boundaries). (e.g., rnaturalearth)
- reproducibility (tools that facilitate reproducible research, such as interfacing with git to track provenance or similar). (e.g., git2r)

These are rOpenSci’s onboarding policies for fit and scope at this time. This discussion is aimed at revising the scope: deciding which areas to broaden and which to focus more narrowly.

Let us know what you think: what should remain as is, and what should change?

maelle · June 13, 2016, 9:19am

Maybe add a few examples of existing repositories in the guidelines? For instance I’m not sure that “data munging (In the context of the tools described above. Generic tools such as reshape2, tidyr do not fit this criteria)” is very clear.

Another point would be to state how to make a “pre-submission enquiry” when package authors are not sure whether their package fits. Or to say that opening an issue does not demand much effort, so that when in doubt they should submit the package?

sckott · June 13, 2016, 4:06pm

Will do! Good idea.

Good point. Should we encourage people to simply submit and treat fit as part of the discussion, or, if they aren’t sure their package is a good fit, to open an issue to discuss fit first?

noamross · June 15, 2016, 8:15pm

It seems we have two sorts of categories of packages: those that have to do with specific data types, repositories and scientific sub-fields, and then more general R tools. The specific ones have included taxonomic, geospatial, scientific literature data, and more specialized data like odorant responses (DoOR). In the context of narrow data types, it’s easy to be inclusive of tools that do retrieval/extraction/manipulation/visualization for those specific things, as long as they are related to a scientific field.

Then we have more general packages, such as git2r, our database clients, and general data things like assertr. I think these are great, and can all be captured under “reproducibility”. But it’s sort of a catch-all for everything. The question is how to define the boundaries of this category. I note that this conversation partly kicked off over analogsea, Scott’s DigitalOcean client. This package is definitely a win for reproducibility, and probably would be (is?) very widely used both by scientists and others.

Data visualization is also potentially very broad - pretty much everything going on in R graphics these days has to do with interactive web graphics somehow.

noamross · June 15, 2016, 8:16pm

I concur on pre-submission. We could just add a note to onboarding docs to let people know to open an issue if they are unsure if their package is a good fit.

thosjleeper · June 16, 2016, 9:02am

I think these categories are mostly straightforward except for (1) data munging because that would seem to mean things like reshape2, tidyr, etc. but doesn’t mean those, so I’m not sure what it means, and (2) data visualization. Are all plotting packages in scope or are they limited to those that connect to a web service (like plotly)?

Regardless of the final set of categories, it may make sense to align those categories with the ones used as headers on the packages page. For example, that list would seem to imply that geospatial tools are particularly important, but that’s not one of the categories described in “package fit”.

noamross · June 16, 2016, 1:51pm

Quoting thosjleeper: “geospatial tools are particularly important, but that’s not one of the categories described in ‘package fit’.”

Yeah, geospatial is sort of a cross-cutting category, and I think that’s OK. If we update our packages page to be more dynamic perhaps this will be a “tag” across categories, along with “text analysis”.

For “data munging” the prototypical package I imagine is something for parsing data from formats generated by scientific equipment (sort of like genbankr, which parses genbank files rather than accessing the database). This isn’t too different from data extraction, except that the data may be structured, just in other formats. Similarly, there may be packages that generate specific output types, like those that would ultimately be used in data deposition or by other research software tools. assertr falls outside this, but I think it is well within the category of reproducibility.

sckott · June 21, 2016, 6:54pm

I’m in the process of getting the ropensci API up, which includes categories for all of the packages. Once it’s up and everyone can see which packages are assigned to which areas, we should have a way of getting feedback on those categories. Then we can update as needed.
