Measuring usage of R packages

[Reposting this from the Google Groups per Scott Chamberlain’s suggestion:]

Hi all,

I’ve been working on a project called the Scientific Software Network Map, which tracks usage of R packages in the wild. The goal is to help a community that uses a set of related packages understand its usage patterns: which packages are being used, which ones are used together, and, as best we can discover, which academic publications relied on them. The idea is to give package authors data to justify the work they do, show them which other packages they should stay compatible with, and give users information about what other people are using.

I’m posting here at Karthik’s suggestion to solicit ideas and suggestions about the project. We have developed an R package that sends an occasional anonymous packet to our server saying which packages are in use, and a web page that lets anyone explore the accumulated data. You can take a look around and read more about it at http://scisoft-net-map.isri.cmu.edu. (The site has only a few test uses in it right now, but you can see some similar, real data imported from non-R supercomputer usage logs at http://scisoft-net-map.isri.cmu.edu:7777.)
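
To make the mechanism concrete, here is a minimal sketch of what such a report could contain. It is not the actual package code: the function names `build_usage_report` and `confirm_send` are hypothetical, and the confirmation prompt is only an assumption about how user consent might be handled. No network call is made.

```r
# Hypothetical sketch; none of these function names come from the real package.

# Build an anonymous report: package names and versions only,
# with no user name, file paths, or session contents.
build_usage_report <- function(pkgs = loadedNamespaces()) {
  pkgs <- sort(unique(pkgs))
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  data.frame(package = pkgs, version = unname(versions),
             stringsAsFactors = FALSE)
}

# Ask before anything leaves the machine. `ask` is injectable so the
# function can be exercised without a live prompt.
confirm_send <- function(report, ask = readline) {
  cat("The following would be reported:\n")
  print(report, row.names = FALSE)
  answer <- ask("Send this anonymous report? [y/N] ")
  tolower(trimws(answer)) %in% c("y", "yes")
}

report <- build_usage_report(c("stats", "utils"))
if (confirm_send(report, ask = function(prompt) "n")) {
  # An HTTP POST to the collection server would go here (endpoint assumed).
  message("Report would be sent.")
} else {
  message("Nothing sent.")
}
```

Defaulting to "no" unless the user explicitly answers yes keeps the sketch opt-in, which seems the safer reading of "anonymous" from a user's point of view.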

I’m especially interested in knowing if:

  • the information it shows would be useful to you as a user or author
  • you think users would feel comfortable installing it
  • you have any other suggestions about it.

We (the project’s originators, James Howison and Jim Herbsleb, and I) are doing this under an NSF grant. Our short-term plan is to collect data with it for a while from some scientific community, then do a survey and/or interviews with site visitors and package authors.

Thanks for any suggestions you might have!

Chris Bogart
Institute for Software Research, CMU

Hey @cbogart - Thanks for reposting here!

A few thoughts:

  • I’ll give this a try myself and see what it’s like
  • Do you have any sense yet of how open users are to allowing this kind of software use tracking? This is one of your questions, but maybe you have gotten some feedback already?
  • The information gathered IMO would definitely be useful to me as a package author.

There’s also an issue of CRAN policy:

“Packages should not send information about the R session to the maintainer’s or third-party sites without obtaining confirmation from the user.”

I would assume BDR interprets that rather strictly.

Sckott and thosjleeper:

In my limited experience so far, most people I’ve asked are willing to install it – the information it collects is abstract enough that people I’ve talked to (mostly profs and grad students around CMU) haven’t been worried about it. But, of course, they know me; the reaction may be different from strangers.

As you say, though, thos, BDR does interpret CRAN policy strictly on this point, and I’ll probably just have to distribute it through GitHub or R-Forge or somewhere. In corresponding with the CRAN team, though, I did get some good suggestions about better ways to deal with confirmation and privacy, which I’ll be incorporating in an upcoming version.

Thanks,
Chris