How do you review code that accompanies a research project or paper? Help rOpenSci plan a Community Call

I hope this is useful!

Fantastically useful. Thank you for the clear layout of your approach @brunj7.
We’ll be arranging the details of this community call soon.

A bit off topic, but I am giving a lot of thought these days to how to move towards having journal editors require the submission of code along with a paper and its data.

My experience is aligned with that of @jenniferthompson and @zabore, but much worse: my supervisor does not use R; nobody in my lab does any code review (even though most students use R); PIs only look at the results of analyses and question the statistical methods used, but never the code behind them; nobody ever refactors anything, and the very idea of it would surprise everybody I work with; even basic code formatting is all over the place, so forget about writing decent code that uses functional programming to replace copy-paste or crazy loops…; absolute paths, setwd(), and other forms of non-portable code cripple everybody’s scripts; nobody uses GitHub or even version control. Our culture is so far from anything acceptable on this front that it will take years and years to get to any reasonable place. And that is why I feel that things will only really start to change when there is pressure from higher up (meaning the journals) to provide code. Until then, people don’t know, don’t care, don’t have the time or the incentive, and don’t give any thought to the subject of writing readable, portable, and reviewed code.

Reading some of the posts in this thread, I was impressed to see that in other labs things are much further ahead. But I think that my lab is, unfortunately, more representative of a classic university research lab. There is a lot to do, and things are often so bad that changing them from the ground up seems unrealistic. A top-down incentive seems to me to be the only way to shake things up. It could also be a way to impose some form of norm. But I have no idea how to work towards this goal.


(I am aware that my post is very naive, and I am extremely thankful and excited to read all the compendiums and other great links in this thread, and the papers that were published on the importance of code publication. But all of this feels like bottom-up grind work, and I am pessimistic about when it might reach my lab and countless others like it. That’s why I would love to hear about approaches for reaching out to journal editors or funding agencies like NSF - things that are BIG incentives to research labs and can make things change on a large scale. In Canada, the 3 main funding bodies (the Tri-Agency) recently made a wonderful move towards open access, and that is really making things change over here (not for code, however). But maybe a lot of this sort of bottom-up work needs to be done before big funding agencies or journals can be convinced to set policies that will then force the generalization of these better practices to a much wider community of researchers?)

But I am getting more and more off-topic. Sorry about that.


Save the date :spiral_calendar:!
The Community Call on this topic takes place Tuesday, October 16, 2018, 9-10AM Pacific (find your timezone)


  • Stefanie Butland @stefanie - welcome, logistics, introduce presenters (5 min)
  • Carl Boettiger @cboettig , moderator and presenter (10 min)
  • Melanie Frazier (possibly tag-teaming with Julia Stewart-Lowndes @jules32) (10 min)
  • Hao Ye @hye (10 min)
  • Q&A (20 min)

More details to come very soon


@prosoitos - I feel your pain, and I have also encountered labs where spending time on code review or refactoring would be scoffed at. I also agree that a big lever here is for funding agencies and journals to be involved in promoting better practices. Unfortunately, I think it needs to be more than just requirements for code sharing, because that doesn’t address standards or enforcement. My hope is that funding agencies see the need to implement both requirements and support training for entire research labs, since I imagine there are plenty of places that would be interested in improving practices, but can’t overcome the barrier of changing on their own.

(And if you want to chat more, feel free to reach out via private channels.)


You are completely right about the leverage that journal editors and granting agencies have. Journal policies have been a very useful tool in similar areas, such as expanding data-publication requirements. In my field, ecology, the adoption of preprint and data-deposition policies by the major journals occurred largely in the past 10 years. We are slowly seeing this happen with code, too - partial policies like code upon request (example from Nature) are useful. They give reviewers the tools to request code and start to push for standards that can eventually make their way up to policy. I pretty much always make such requests if the journal has such a policy and attempt to reproduce results, and I know this provides a pretty powerful incentive for the authors! (This can also annoy the authors a great deal, so it’s important to be helpful and constructive when reviewing the results so that they appreciate the feedback.)

If you want an example of a lobbying effort: I sent this letter regarding data access and preprints to the editor-in-chief of a journal in my field about three years ago, and 80% of the recommendations were adopted. This was accompanied by some personal lobbying, which is the pattern I’ve seen with other journals - a few private and public letters plus some conversations with colleagues at a conference can go a long way. I imagine enough places have adopted minimal code-sharing policies by now that they could be used as examples. Most editors are eager to emulate the policies of what are perceived as prestige or competitor journals, so when a big campaign pushes a Nature to change its policies, it becomes much easier to leverage that to lobby for policies at more niche publications.


Wow. This is fantastic. Thank you!

Your lobbying efforts are really great and a beautiful example of how to have an impact at the individual level. This really answers a lot of my questions and is extremely inspirational. Thank you very much for sharing!


In addition to requesting / requiring code in relevant submissions, one could also imagine journals recruiting reviewers specifically to evaluate code / reproducibility.

While it is obviously not reasonable to expect such a reviewer to exhaustively evaluate the validity of large and complicated software, there are some very basic and easy things that could be checked with minimal effort.

For example, I recently participated in a three-day reproducibility workshop at NIH, which attempted to teach researchers the principles and best practices of reproducibility by reproducing ~10 bioinformatics / genomics papers that appeared to have all of the information / code necessary to be easily reproduced. Within 2 minutes of looking at the RMarkdown code for the very first paper, it was obvious that it had no hope of ever being run (it referenced variables not defined anywhere in the file).

Some things that would be easy to check:

  • Are all variables defined?
  • Is package information captured?
  • Is the code documented at some minimal level?
  • Does the file reference directories / data that are unique to some user’s system? (e.g. setwd("/home/jsmith"))

In about ten minutes of downloading a script / software and attempting to get it running, you could at least make sure it passes these minimum requirements.


Seems like these kinds of checks could be fairly easily bundled into a package. Does this functionality already exist (in devtools or elsewhere)?

I imagine something like

reprod_check <- check_reprod('myscript.R')

That would return lines containing undeclared variables, setwd() calls, etc.
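Something along those lines can be sketched in a few dozen lines of base R plus codetools (which ships with R). To be clear, `check_reprod` and its return format are hypothetical - as far as I know this doesn’t exist in devtools; this is just a sketch of how such checks might work:

```r
# Hypothetical check_reprod() sketch (not an existing devtools function).
# Flags setwd() calls, absolute paths, and variables used but never assigned.
check_reprod <- function(path) {
  lines <- readLines(path)

  # 1. Non-portable lines: setwd() calls and quoted absolute paths
  #    (Unix-style "/..." or Windows-style "C:\...").
  setwd_lines    <- grep("setwd\\(", lines)
  abs_path_lines <- grep("[\"'](/|[A-Za-z]:\\\\)", lines)

  # 2. Undeclared variables: wrap the parsed script in an argument-less
  #    function so that top-level assignments count as locals, then ask
  #    codetools for the free (global) variables that remain.
  exprs <- parse(file = path)
  fun   <- eval(call("function", NULL, as.call(c(as.name("{"), as.list(exprs)))))
  undeclared <- codetools::findGlobals(fun, merge = FALSE)$variables

  list(setwd_lines     = setwd_lines,
       abs_path_lines  = abs_path_lines,
       undeclared_vars = undeclared)
}
```

For instance, running it on a script containing `setwd("/home/jsmith")` followed by `y <- x + 1` flags line 1 on both counts and reports `x` as undeclared. Known functions like `mean` show up as function globals rather than variables, so they aren’t flagged.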

I suppose the alternative is to just encourage researchers to bundle their code into packages to accompany publications, thereby addressing the documentation issues, calls to libraries, etc., but that might be an unrealistic ask…


one could also imagine journals recruiting reviewers specifically to evaluate code / reproducibility

rOpenSci actually has a collaboration with Methods in Ecology and Evolution (MEE). Publications destined for MEE that include the development of a scientific R package now have the option of a joint review process whereby the R package is reviewed by rOpenSci, followed by fast-tracked review of the manuscript by MEE. Authors opting for this process will be recognized via a mark on both web and print versions of their paper.

Described here:

In this case, rOpenSci manages the package review process so it’s not the journals recruiting reviewers, but it’s a good start


Details of this Tues Oct 16 Community Call, including how to join:

Pass it on!


Resources on this topic, in no particular order (add yours!)


I would love to help you do whatever it would take to make this happen!


Very similar boat at my workplace. Additionally, all the coders we have are essentially at the same level - doesn’t mean we can’t help each other, but does mean we may have a bit of a plateau effect when it comes to improving code through review…


Thanks for the great community call today! As promised, a quick post about code review as part of the peer review process of journal articles. Code Ocean is currently piloting code review with Nature; you can find some details here, and the perspective of our developer advocate, Seth Green, on the code review process here.

Generally speaking, the code review process is as follows:

  1. Authors upload a working copy of their code to Code Ocean.
  2. Code Ocean verifies that the code runs and delivers results.
  3. Code Ocean provides Nature editors with a private link (blinded or unblinded) to the code capsule for peer review of the code.
  4. Once the code and article are approved by reviewers, Code Ocean will mint a DOI and include a link to the article in the metadata.
  5. Nature includes a link or embeds the Code Ocean widget in the article.
  6. Nature readers will be able to run the code and reproduce the results associated with an article by simply clicking a button, as well as edit the code or test it with new data and parameters.

Not everyone reading about this stuff knows what “refactoring” or “linters” are. In the summary blog post about this Community Call, we’d like to link those words to definitions. Anyone have a favourite? Otherwise I’ll go with Wikipedia.

Good call! For linting, it might be more helpful to most of our audience to link directly to

For code refactoring, the Wikipedia entry looks like a good choice to me.

Thanks Carl! Do you have recommendations for links for unit testing, continuous integration, and containers as well?


We’ve published a summary of the Community Call on this topic, written by the speakers, Hao Ye, Melanie Frazier, Julia Stewart-Lowndes, Carl Boettiger, and rOpenSci software peer review editor Noam Ross!