Licensing my R code

Hello!

Important disclaimer: I am a librarian with a background in the humanities, not a programmer, data person, or statistician. Please speak slowly and clearly without a lot of jargon so I can understand your reply. 🙂

Some colleagues and I are writing a chapter for a book on community college library assessment. We looked at the relationship between in-class library instruction and student retention and used R for our statistical analysis. I am trying to clean up the R code and make it sort of plug-and-play, so that we can share it and our data and other librarians can run similar analyses on data from their own institutions and/or refine our methodology.

I see some information about licensing “software,” “packages,” “analysis code,” and “repos,” but I’m not sure exactly what all these terms mean or which category our code falls under. Basically, we used RStudio and some existing packages (fmsb, readxl, and svDialogs) to make a little file of code (the file extension is .R) that other non-experts like us can use to reproduce our analysis.

I think this is probably either “analysis code” or a “repo,” but again, I’m not sure exactly what those mean. Also, are these things considered “software” and therefore not suitable for Creative Commons licensing? I think we are going to license the rest of our materials (directions and datasets) under CC-BY-SA, which seems like it might be compatible with a software license.

Do I need to find out which licenses R, RStudio, and all the packages we used are released under and use the same one? My directions require the user to download and install R and RStudio, and then the code has the user’s computer find, download, install, and load the packages. Does that mean that I am “distributing” them?

In summary, at this point it seems like I need to choose between CC-BY-SA and GPLv3. Also, I am seeing some people say that the license should be written into the code. Would this also apply in my case, or should I just state the license in the README file?

Does anyone have suggestions or input on this? Thanks for any advice you can share!


@kendraperry It’s so good that you’re doing this

> make it sort of plug-and-play so we can share it and our data for other librarians to do similar analysis on data from their institutions and/or refine our methodology.

Ping me here if you don’t get an answer within a few days.

Stefanie Butland
rOpenSci Community Manager

It’s great that you are thinking about this.
A license sets the rules for how other people can use your code. So the first thing you need to do is decide what you want to allow and what you want to prevent. Are you okay with commercial use? Are you okay with people using it without acknowledging you? And so on.

CC licenses are not designed for code, and using them for code ends up causing problems. That said, I have a package in development that is licensed CC BY-NC-SA (Attribution-NonCommercial-ShareAlike) because it is a data package and the person who collected the data wants to bar commercial use.

Normally GPLv3 is straightforward. The only catch is when someone wants to incorporate your code into a project that is not under GPLv3. But you can always give them permission under a separate license if they ask and you want to allow it. The GPL requires share-alike and attribution, but it does not bar commercial use.


Doesn’t the ShareAlike condition prevent commercial use? This also looks like it should not allow commercialization.

So if I use GPLv3 for the code, would I then need to use a separate CC-BY-SA license for the data and directions?

So great that you are doing this legwork to share this with other teams. I am offering an attempt at some answers, with the caveat that I am a biologist turned coder turned someone who had to learn a bit about licensing for sharing code, so not an expert.

> I think this is probably either “analysis code” or a “repo,” but again, not sure exactly what those mean.

Your .R script can be both of these things: it is a single script of analysis code. If you put the script into a remote repository (on GitHub, GitLab, or elsewhere), which is a fancy name for one folder holding everything for a project, then it is analysis code in a “repo”.

> Do I need to find out what license R, RStudio, and all the packages we used are under and emulate that?

Not unless your script copies and pastes source code from these tools. If the script relies on the user having these tools installed (e.g. library(dplyr)), then you do not need to consider the tools’ licences when you choose the licence for your work.
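To make that concrete, here is a minimal sketch of how a script can install and load packages purely by name, assuming you want to use the three packages you mentioned (fmsb, readxl, and svDialogs). The helper name ensure_packages() is made up for illustration; it is not from any package. Nothing from the packages’ own source code ends up in your file: each user’s R downloads them (usually from CRAN) when the script runs.

```r
# Hypothetical helper: make sure each named package is installed, then load it.
# The script only *names* the packages; none of their source code is copied in.
ensure_packages <- function(pkgs) {
  # TRUE for packages R can find on this computer, FALSE for ones it cannot
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]

  # Download and install only the packages that are not already present
  if (length(missing) > 0) {
    install.packages(missing)
  }

  # Load every package (same effect as calling library() on each one)
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }

  # Report which packages had to be installed (nothing is printed)
  invisible(missing)
}
```

Near the top of your analysis script you would then call ensure_packages(c("fmsb", "readxl", "svDialogs")). Because the packages are only referenced by name, this is the “relies on the user to have these tools installed” situation described above.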

> Then, are these things considered “software” and therefore not eligible for Creative Commons Licensing?

As already mentioned, CC licences are not really designed for code (or software). A common code licence is MIT; my place of work uses Apache 2.0. You can license the analysis script with a code licence, and the text or data inside the repo with a different licence (e.g. CC). Here is an example of a repo that licenses different pieces differently: the code is under Apache 2.0, and the data are under various open data licences from the source organizations: GitHub - bcgov/bcmaps: An R package of map layers for British Columbia

Once you select your code licence, the licence itself should explain how to apply it. With Apache 2.0, for example, the licence file needs to be in the repo with the code, and a short licence notice needs to be at the top of every script.
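As an illustration, this is roughly what the top of an Apache 2.0-licensed .R script could look like. The notice is the standard boilerplate given in the appendix of the Apache License 2.0; [yyyy] and [name of copyright owner] are placeholders to fill in. If you choose a different licence, check its own recommended notice (GPLv3, for example, has a different one).

```r
# Copyright [yyyy] [name of copyright owner]
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# ---- Analysis code starts below this line ----
```

Because every line of the notice starts with #, R treats it as a comment and the script runs exactly as before. The full licence text still goes in its own file in the repo.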

I hope this helps. Also, open to any feedback from others in case any of the above is not accurate. Good luck!


Thank you!! This is super helpful and also helps me understand some of the other threads and conversations I’ve looked at.

I appreciate your time!

Kendra
