Thanks Jeff, Ben. Great question and ideas. I don’t have any clearer answers either, but I can offer a few thoughts.
I’m currently opting for CC-BY in a LICENSE.md file, which GitHub and Zenodo can detect automatically. I currently use a minimal DESCRIPTION file that lacks metadata fields; it is just for dependency management. That may not actually be a good idea. (Recent example here: https://github.com/boettiger-lab/pomdp-intro)
With a possibly useful example out of the way, now for some rambling thoughts:
One option is to just use CC0 for the whole compendium. Public domain is equally applicable to all of these. The main downside is that you might prefer CC-BY for the manuscript-y stuff, though I doubt that makes a strong practical difference compared to CC0. (Somehow the question of citations always comes up, which in my mind is entirely orthogonal to the question of copyright. Norms of academic citation are older than copyright law and apply every bit as much to work that is out of copyright.) On the code side, CC0 isn’t approved as ‘open source’ by the Open Source Initiative (https://opensource.org/), though for arbitrary technical reasons that would probably have barred approval of older and even looser licenses like MIT had they not had so much historical precedent. But then, I don’t think compendia are really “software” anyway. CC0 is recognized for code by Creative Commons, and as GPL compatible by the Free Software Foundation.
Notably, StackOverflow licenses its site, including code posted there, as CC-BY-SA, which isn’t recognized as GPL compatible by the FSF or as a software license by CC. But then, both StackOverflow and compendia are arguably not software but more heterogeneous objects. So arguably you could license your compendium CC-BY (perhaps noting that the data is CC0, or that public data is by definition not covered by copyright law since it is not a creative work).
I suppose it is also worth noting that this issue isn’t really specific to compendia in any way: many R packages include both data and extensive ‘creative’ documentation, including vignettes and websites. One could interpret the License field in DESCRIPTION as applying specifically and only to the ‘software’ portion, while just noting the other copyrights where appropriate (e.g. a website or vignette could say “CC-BY” on it, and metadata for data could say CC0).
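To make that concrete, here is a minimal sketch of what the ‘software’ part might look like. Everything in it is hypothetical (the package name, the title, and the choice of GPL-3 for the code); it only illustrates the split and is not a recommendation for any particular license.

```
Package: examplecompendium
Title: Hypothetical Compendium Used Only as an Illustration
Version: 0.0.1
License: GPL-3
```

The vignette or website could then carry its own line such as “Text licensed under CC-BY”, and the metadata for the data could state CC0.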
Note that this approach is also what you see in RStudio’s professional packages, like rmarkdown, where they probably have legal advice to guide them. DESCRIPTION just says GPL-3: https://github.com/rstudio/rmarkdown/blob/bbd0786d82f82d9eaef7f0b9e2c553b523407161/DESCRIPTION, while inst/NOTICE includes full copies of the licenses from 7 other projects, as required by the terms of those licenses: https://github.com/rstudio/rmarkdown/blob/bbd0786d82f82d9eaef7f0b9e2c553b523407161/inst/NOTICE
As a final thought, I know that as programmers we pay close attention to where these things get stated (in the DESCRIPTION file, in a file called LICENSE) and treat existing licenses as reusable templates, almost like software dependencies. From a legal standpoint, though, I don’t think this is so algorithmically defined: I believe the text you include matters more than where you put it; the conventions are just for computers.