Licensing for research compendia?

Tags: reproducibility, research-compendia

#1

I am starting a new project and, like all new projects, I have grand plans to make this one the “perfect” one. I will be mostly following the suggestions for using an R package as a research compendium (e.g. https://www.tandfonline.com/doi/abs/10.1080/00031305.2017.1375986).

As I was creating the repo and was confronted with assigning a license, I was stumped on how best to license this project. In their paper, @benmarwick, @cboettig, and @lincoln discuss licensing and follow the suggestions in
https://web.stanford.edu/~vcs/papers/ERROLSI03092009.pdf. Essentially, the different parts of a research compendium should have different licenses. For example, the manuscript and figures could use CC-BY, the code MIT, and the data CC0.

My question is more about the nuts-and-bolts best practice for capturing the fact that a compendium is released under multiple licenses. Should I:

  • List License: CC-BY, MIT, CC0 in the DESCRIPTION
  • Explain in a README
  • Both
  • Something else?

I’ve dug around and can’t find an implementation of this multiple license concept for compendia. Thoughts most welcome!


#2

This is a great question, and I’d also love to know what other people are doing in practice. My casual observation is that there is a wide diversity of approaches to communicating licenses.

I’ve been doing the first option in your list (most recently here: https://github.com/benmarwick/guanyingdongstoneartefacts), but I’m just following what Stodden says because she seems to be an authority on the topic. Plus it seems like the most visible and efficient way to show my readers that I’m happy for them to use all my stuff, and to remind them that the scholarly norms of credit and citation still apply to these things. Explaining licenses in detail in a readme seems a bit of a burden for me and my readers; I’d prefer to just list them in the readme.

As far as I know, there has not been any legal test or other kind of resistance to how licenses are attached to compendia. A legal test is probably the only way we can have any certainty about what is definitely a bad way to use licenses. Until that happens, it probably doesn’t really matter exactly what we do; any of the first three options on your list would be fine. I don’t think anyone (i.e. scholarly publishers and other organisations with lots of lawyers) is paying much attention to this at the moment.

There are some considerations to be mindful of in the choice of code license, some of which are noted here: https://github.com/ropensci/unconf17/issues/32, but that might be a separate rabbit-hole!

Listing multiple licenses in the DESCRIPTION file could be awkward. You can do this, but it’s not clear which license belongs to which component:

License: LGPL (>= 2.0, < 3) | Mozilla Public License
License: GPL-2 | file LICENCE
License: GPL (>= 2) | BSD_3_clause + file LICENSE
License: Artistic-2.0 | AGPL-3 + file LICENSE
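
To make the ambiguity concrete, here is a minimal sketch in Python. It assumes, as I understand the R packaging convention, that the vertical bar separates alternative licenses for the package as a whole. The function name `license_alternatives` is made up for illustration and is not part of any R or Python tooling.

```python
def license_alternatives(field_value):
    """Split a DESCRIPTION License value on '|' into its alternatives.

    Assumption: '|' separates alternative licenses for the whole package,
    so the result says nothing about which component gets which license.
    """
    return [part.strip() for part in field_value.split("|")]


examples = [
    "LGPL (>= 2.0, < 3) | Mozilla Public License",
    "GPL-2 | file LICENCE",
    "GPL (>= 2) | BSD_3_clause + file LICENSE",
    "Artistic-2.0 | AGPL-3 + file LICENSE",
]

for value in examples:
    # Each entry is one alternative; no entry is tied to text, code, or data.
    print(license_alternatives(value))
```

Whatever comes out is a flat list of choices, so a field like "CC-BY | MIT | CC0" would read as "pick one" rather than "text, code, data".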

I think you will have to write in the DESCRIPTION “License: file LICENSE” and describe the individual licenses in that file. However I guess only a tiny fraction of your readers will ever look into that file, so it doesn’t seem like the most effective use of licenses as a formal communication tool, compared to listing them in the readme.
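
As a rough sketch of that idea, the snippet below builds the text of such a LICENSE file from a single mapping of components to licenses. The component names and their pairing with licenses are illustrative assumptions that follow the CC-BY / MIT / CC0 split mentioned at the top of the thread; `render_license_text` is a hypothetical helper, not an existing tool.

```python
# Hypothetical mapping: component labels and license choices are
# illustrative assumptions, not a recommendation or a legal template.
COMPONENT_LICENSES = {
    "Text and figures": "CC-BY",
    "Code": "MIT",
    "Data": "CC0",
}


def render_license_text(components):
    """Return a short plain-text summary, one line per component."""
    lines = ["This compendium is released under several licenses:", ""]
    for component, license_name in components.items():
        lines.append(f"- {component}: {license_name}")
    return "\n".join(lines) + "\n"


license_text = render_license_text(COMPONENT_LICENSES)
print(license_text)
```

The same string could be pasted into the readme, so the two places stay in agreement. The full legal text of each license would still need to be included or linked separately.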


#3

Thanks Jeff, Ben. Great question and ideas, I too don’t have any clearer answers but can offer a few thoughts.

I’m currently opting for CC-BY in a LICENSE.md file, which GitHub & Zenodo can automatically detect. I currently use a minimal DESCRIPTION file that lacks metadata fields; it is just for dependency management. That may not actually be a good idea. (Recent example here: https://github.com/boettiger-lab/pomdp-intro)

With the possibly useful example out of the way, now for some rambling thoughts:

One option is to just use CC0 for the whole compendium. Public domain is equally applicable to all of these. The main downside is that you might prefer CC-BY for the manuscript-y stuff, though I doubt that makes a strong practical difference compared to CC0. (Somehow the question of citations always comes up, which in my mind is entirely orthogonal to the question of copyright. Norms of academic citation are older than copyright law and apply every bit as much to work that is out of copyright.) On the code side, CC0 isn’t approved as ‘open source’ by the Open Source Initiative (https://opensource.org/), though for arbitrary technical reasons that would probably have barred approval of older and even looser licenses like MIT had they not had so much historical precedent. But then I don’t think compendia are really “software” anyway. CC0 is recognized for code by Creative Commons, and as GPL compatible by the Free Software Foundation.

Notably, Stack Overflow licenses its site, including code posted there, as CC-BY-SA, which isn’t recognized as GPL compatible by the FSF or as a software license by CC. But then, both Stack Overflow and compendia are arguably not software but more heterogeneous objects. So, arguably, you could license your compendium CC-BY (perhaps noting that the data is CC0, or that public data is by definition not covered by copyright law since it is not a creative work).

I suppose it is also worth noting that this issue isn’t really specific to compendia in any way: many R packages include both data and extensive ‘creative’ documentation, including vignettes and websites. One could thus interpret the License field in DESCRIPTION as applying specifically and only to the ‘software’ portion, while just noting the other copyrights where appropriate (e.g. a website or vignette could say “CC-BY” on it, and metadata for data could say CC0).

Note that this approach is also what you see in RStudio’s professional packages, like rmarkdown, where they probably have legal advice to guide them. DESCRIPTION just says GPL-3: https://github.com/rstudio/rmarkdown/blob/bbd0786d82f82d9eaef7f0b9e2c553b523407161/DESCRIPTION, while inst/NOTICE includes full copies of the licenses from 7 other projects, as required by the terms of those licenses: https://github.com/rstudio/rmarkdown/blob/bbd0786d82f82d9eaef7f0b9e2c553b523407161/inst/NOTICE

As a final thought, I know that as programmers we pay close attention to where these things get stated (in the DESCRIPTION field, in a file called LICENSE) and treat existing licenses as reusable templates, almost like software dependencies. From a legal standpoint, though, I don’t think this is so algorithmically defined. I believe the text you include matters more than where you put it; the conventions are just for computers.


#4

Licensing of compound data objects, e.g. research compendia, is a priority issue for the CODATA-Research Data Alliance Legal Interoperability IG and will be the focal point for our session at RDA Plenary 13 (Philadelphia) in April.

You are most welcome to join the session or participate in our electronic discussion group and weekly tele-meetings. Group info is here: https://rd-alliance.org/groups/rdacodata-legal-interoperability-ig.html

Best wishes, Gail


#5

Sorry for asking a question and then running away!

Thanks for all the responses. Looks like an area with some interest and no clear guidance as of yet.

@Repositorian thanks for the link to the RDA/CODATA session. I’ll dig around on there and look forward to what comes out of the plenary in April.

@benmarwick and @cboettig, thank you both as well for the thoughts. As I have been thinking through this, I think CC0 makes a lot of sense for the work I am doing now. It seems to have the broadest applicability and meshes nicely with the expectation that my work as a fed is in the public domain. I think this will be my default for the short term.