I think it’s pretty safe to say that “best practice” for documenting data is to use something that is both machine-readable and in a plain text format. What do others think about this statement?
Now, what that machine-readable thing is and where it goes is a great space for discussion…
I would recommend that raw data be documented using the EML package. This package produces machine-readable XML metadata, all from within R, according to the Ecological Metadata Language XML schema, which is in wide use around the world. There are many other XML metadata formats that would be reasonable to use, but having the EML R package lets us stay within the R ecosystem, which I think is a benefit. And EML is a really powerful and flexible metadata schema.
As for the how, I think it makes sense that one EML XML file would be produced for each dataset included in your package and I guess a good place would be right next to the data file(s) inside the package.
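To make that a bit more concrete, here is a rough sketch of what it could look like, assuming the list-based interface of EML 2.0 or later. The directory layout, the file name measurements.csv, the dataset title, and the person are all made up for illustration, and a temporary directory stands in for the package source tree.

```r
library(EML)

# Hypothetical layout: a data file in a package subdirectory
# (a temporary directory stands in for the package root here).
data_dir <- file.path(tempdir(), "inst", "extdata")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

data_file <- file.path(data_dir, "measurements.csv")
write.csv(
  data.frame(site = c("A", "B"), temp_c = c(11.2, 9.8)),
  data_file,
  row.names = FALSE
)

# Minimal dataset-level metadata: a title, a creator, and a contact.
# The person and title are placeholders.
creator <- list(individualName = list(givenName = "Jane", surName = "Doe"))
metadata <- list(
  dataset = list(
    title = "Example temperature measurements",
    creator = creator,
    contact = creator
  )
)

# One EML file per dataset, written right next to the data file.
eml_file <- sub("\\.csv$", ".xml", data_file)
write_eml(metadata, eml_file)

# Check the result against the EML schema.
eml_validate(eml_file)
```

A real record would also want to describe the columns (names, units, definitions), which is where most of the value of the metadata comes from; this only shows where the file would live.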
I’m sorta steeped in this world and I’m hoping others will have some quite different ideas about how to do this.