Best practice for documenting raw data in a package?

sckott · August 25, 2017, 3:51pm

Via Jakub Nowosad on Twitter.

Thoughts, community?

brycem · August 25, 2017, 8:43pm

I think it’s pretty safe to say that “best practice” for documenting data is to use something that is machine-readable and stored as plain text. What do others think about this statement?

Now, what that machine-readable thing is, and where it goes, is a great space for discussion…

I would recommend that raw data be documented using the EML package. It produces machine-readable XML metadata, entirely from within R, following the Ecological Metadata Language (EML) XML schema, which is widely used around the world. There are many other XML metadata formats that would be reasonable to use, but having the EML R package lets us stay within the R ecosystem, which I think is a benefit. And EML is a really powerful and flexible metadata schema.

As for the how, I think it makes sense to produce one EML XML file for each dataset included in your package, and I guess a good place for it would be right next to the data file(s) inside the package.
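To make that concrete, here is a minimal sketch assuming version 2.x of the EML package and its list-based interface (`write_eml()` and `eml_validate()`). It writes one EML file right next to a CSV in `inst/extdata/`, a common place for raw files in an R package source tree. The dataset name `mydata`, the creator, and the temporary directory standing in for the package root are hypothetical placeholders.

```r
# Minimal sketch, assuming the EML package (2.x, list-based API) is installed.
# "mydata" and the creator below are hypothetical placeholders.
library(EML)

# A temporary directory stands in for the package source root here;
# in a real package you would write into <pkg>/inst/extdata/ directly.
extdata <- file.path(tempdir(), "inst", "extdata")
dir.create(extdata, recursive = TRUE, showWarnings = FALSE)

# The raw data file being documented
write.csv(data.frame(site = c("A", "B"), count = c(3L, 5L)),
          file.path(extdata, "mydata.csv"), row.names = FALSE)

# A minimal EML document: a dataset with a title, creator and contact
me <- list(individualName = list(givenName = "Jane", surName = "Doe"))
my_eml <- list(dataset = list(
  title   = "Example raw counts shipped with a package",
  creator = me,
  contact = me
))

# One EML file per dataset, written right next to its data file
eml_file <- file.path(extdata, "mydata.xml")
write_eml(my_eml, eml_file)

# Check the generated XML against the EML schema
eml_validate(eml_file)
```

Files under `inst/` are copied to the top level of the installed package, so once installed the metadata could be located with `system.file("extdata", "mydata.xml", package = "yourpkg")`, where `yourpkg` is again a placeholder.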

I’m sorta steeped in this world and I’m hoping others will have some quite different ideas about how to do this.

mbjones · August 26, 2017, 1:24am

At OS Codefest, @sckott, I, and others planned the rOpenSci datapack package to be a container for data in R that would include documentation in multiple formats. We also planned to make those data easily loadable in R using lazy loading, but that work has yet to be done; see https://github.com/ropensci/datapack/issues/2 for some use cases that people were thinking of. I’m totally with Bryce on the EML path being a good one, but I also think it would be good to be able to associate any metadata document with the data files, which is what datapack allows. See https://github.com/ropensci/datapack. I’d love to get the lazy loading feature working.
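As an illustration of associating a metadata document with a data file, here is a minimal sketch assuming datapack's `DataPackage` and `DataObject` classes and `addMember()`, whose optional third argument names the metadata object that documents the data object. The file names and the placeholder XML content are hypothetical; in practice the metadata file would be a real EML (or other) document.

```r
# Minimal sketch, assuming the datapack package is installed.
# File names and the XML content are hypothetical placeholders.
library(datapack)

dir <- tempdir()
csv_file <- file.path(dir, "mydata.csv")
write.csv(data.frame(site = c("A", "B"), count = c(3L, 5L)), csv_file,
          row.names = FALSE)
xml_file <- file.path(dir, "mydata.xml")
writeLines("<metadata/>", xml_file)  # placeholder, not a valid EML document

dp <- new("DataPackage")

# Wrap each file in a DataObject; `format` records what kind of file it is
md_obj  <- new("DataObject", format = "eml://ecoinformatics.org/eml-2.1.1",
               filename = xml_file)
sci_obj <- new("DataObject", format = "text/csv", filename = csv_file)

# Add the metadata, then the data with the metadata as its documenting object;
# datapack records this link as a documents/isDocumentedBy relationship
dp <- addMember(dp, md_obj)
dp <- addMember(dp, sci_obj, md_obj)

# Identifiers of both members of the package
getIdentifiers(dp)
```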
