The rOpenSci registry
Most rOpenSci packages eventually go to CRAN, and some are on Bioconductor. Those packages are listed on CRAN/BioC, but many rOpenSci packages are not yet on either platform, and some may never be for various reasons. And even for the packages that are on CRAN/BioC, nothing there actually marks them as being part of rOpenSci.
We keep track of rOpenSci packages in a GitHub repo at https://github.com/ropensci/roregistry within a single large JSON file. We can’t simply assume all repositories in the ropensci or ropenscilabs GitHub organizations are R packages, because some are not, and some are abandoned. This registry JSON file helps us know what packages are in the rOpenSci suite, some basic metadata about them, who maintains each one, and more.
Here’s an example entry:
{
  "name": "rfishbase",
  "type": "package",
  "maintainer": "Carl Boettiger",
  "email": "cboettig@gmail.com",
  "status": "good",
  "installable": true,
  "ropensci_category": "data-access",
  "category": "biology",
  "on_cran": true,
  "on_bioc": false,
  "cran_archived": false,
  "url": "https://github.com/ropensci/rfishbase",
  "root": "",
  "fork": false,
  "description": "Access any fish data from Fishbase.org, including occurrence records, habitat data, and more",
  "badges": []
}
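To give a feel for how entries like this can be used, here is a minimal Python sketch that lists packages on neither CRAN nor Bioconductor. It assumes the registry file is a JSON array of entries shaped like the one above (the real top-level layout may differ), and the second entry, examplepkg, is hypothetical.

```python
import json

# Two entries in the shape shown above, trimmed to the fields used here.
# "examplepkg" is a hypothetical package, not a real registry entry.
# Assumption: the registry file is a JSON array of such entries.
SAMPLE_REGISTRY = """
[
  {"name": "rfishbase", "type": "package", "on_cran": true, "on_bioc": false},
  {"name": "examplepkg", "type": "package", "on_cran": false, "on_bioc": false}
]
"""


def not_on_cran_or_bioc(entries):
    """Return names of packages listed on neither CRAN nor Bioconductor."""
    return [
        entry["name"]
        for entry in entries
        if entry.get("type") == "package"
        and not entry.get("on_cran", False)
        and not entry.get("on_bioc", False)
    ]


entries = json.loads(SAMPLE_REGISTRY)
print(not_on_cran_or_bioc(entries))
```

A missing on_cran or on_bioc field is treated as false here, which is a choice made for the sketch rather than something the registry guarantees.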
Time for a change?
I (Scott) maintain this registry file manually. As you can imagine, it easily gets out of sync with the true state of rOpenSci packages, and that becomes more likely as our suite of packages grows. I think we would be better off if we automatically pulled in data from CRAN or elsewhere, or had package maintainers submit PRs to update the registry.
Rather than maintaining one huge JSON file, I like the idea of maintaining a separate file for each R package (we would most likely still generate the single file, but that step would be automated). A separate file for each package would make it easier for people other than me to contribute.
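As a rough illustration of the one-file-per-package idea, here is a Python sketch that generates the single combined file from a directory of per-package JSON files. The directory layout and the function names build_registry and write_registry are assumptions made for this example, not an existing tool.

```python
import json
from pathlib import Path


def build_registry(package_dir):
    """Read every *.json file in package_dir and return the entries sorted by name.

    Assumption: each file holds exactly one registry entry (a JSON object
    with at least a "name" field).
    """
    entries = []
    for path in sorted(Path(package_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            entries.append(json.load(handle))
    return sorted(entries, key=lambda entry: entry["name"].lower())


def write_registry(package_dir, output_path):
    """Write the combined registry as one JSON array and return the entry count."""
    entries = build_registry(package_dir)
    with open(output_path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=2, ensure_ascii=False)
        handle.write("\n")
    return len(entries)
```

The combined file should be written outside the per-package directory, otherwise the next run would read it back in as if it were a package entry. A step like this could run on every merged PR, so contributors only ever touch their own package's file.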
Carl and others, including our interns, have been making good progress on introducing codemeta.json files to rOpenSci packages. Only a small portion of packages have them so far, but it’s getting there. This consists of adding entries to the DESCRIPTION file like https://github.com/ropensci/RNeXML/blob/master/DESCRIPTION#L82-L84 and adding a codemeta.json file like https://github.com/ropensci/RNeXML/blob/master/codemeta.json with pretty rich metadata about the package. It’s possible we can use the DESCRIPTION files with the added metadata fields and/or the codemeta files instead of the custom JSON you see above.
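To make the codemeta option a bit more concrete, here is a Python sketch that derives part of a registry entry from an already-parsed codemeta.json. It assumes the file carries the CodeMeta properties name, description, codeRepository, and maintainer (a person object, or a list of them, with givenName, familyName, and email); real files may differ. The function name is hypothetical, and fields such as status or ropensci_category have no obvious codemeta source, so they are left out.

```python
def registry_entry_from_codemeta(meta):
    """Build a partial registry entry from a parsed codemeta.json dict.

    Assumptions: "maintainer" is a person object or a list of them, and
    every codemeta-described repo is a package. Missing values become None.
    """
    maintainer = meta.get("maintainer") or {}
    if isinstance(maintainer, list):
        # Use the first listed maintainer, if there is one.
        maintainer = maintainer[0] if maintainer else {}

    name_parts = (maintainer.get("givenName"), maintainer.get("familyName"))
    full_name = " ".join(part for part in name_parts if part)

    return {
        "name": meta.get("name"),
        "type": "package",  # assumption, not read from codemeta
        "maintainer": full_name or None,
        "email": maintainer.get("email"),
        "url": meta.get("codeRepository"),
        "description": meta.get("description"),
    }
```

Anything the codemeta file cannot supply (CRAN/BioC status, for example) would still need to come from somewhere else, such as the repositories themselves or a small hand-maintained file.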
Given
- the increasing chance of the registry being out of sync with reality, and
- the appeal of single files per package as opposed to one huge file, and
- the arrival of codemeta stuff
Perhaps we should change how we construct our registry.
Feedback
What do you think? Should I keep maintaining the registry myself? Should we make something that’s all automated? Should we get all rOpenSci packages to have codemeta files, then use those? Should the registry be, instead of a single repo, a decentralized registry (so to speak) made up of codemeta files in each repo? Do you have any other ideas that may help guide us here?
cc @cboettig since we talked about this earlier