Do you have code that accompanies a research project or manuscript? How do you review and archive that code before you submit the paper?
Help us plan a community call on analytical code review, or “code review in the lab”. Please comment below to describe the scenarios in which you want to review your code, how you do it, what works well for you and what aspects pose a challenge. We’re imagining a 1-hr call with 2-3 presenters on good practice in analytical code review, examples of how it’s done in different groups, and plenty of time for Q&A.
Based on your comments and questions, we’ll invite presenters and choose a date.
I had questions on this earlier this week over here at RStudio Community!
I’m getting a little code cleaned up and am wondering what the next steps are before submitting it. I’d love to see some presenters with experience getting their code reviewed - especially if they didn’t have many nearby colleagues to rely on.
Question: Should this conversation cover reviewing code as a reviewer of the manuscript? Or only getting collaborators to review your code prior to submission?
I happen to be reviewing a student’s code right now, prior to submitting the associated manuscript for peer review.
- I meet with my students one-on-one on a weekly basis and occasionally do some informal code review, mostly to check on correctness and style. Usually more often for a new student, and less frequently for those who I can trust will produce decent code.
- As the project progresses, I examine code outputs and apply some sanity checks (“toy” examples, examining plots).
- For larger projects, I require students to write a suite of unit tests, which are sometimes helpful for shaking out bugs. (Writing unit tests also makes busy work for short-term students, e.g., summer volunteers.)
- At the present stage (pre-submission), I am rerunning the analysis to confirm that I can reproduce the outputs on my own workstation.
- Presently, I am going through the student’s code and refactoring some sections, particularly the portions that were written near the end of term. As a reviewer, I usually look through any source code that accompanies the manuscript, and I want the level of code quality in our paper to reflect well (at least not too poorly!) on my group.
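One of the “toy example” unit tests mentioned above can be very small. Here is a hedged sketch using the testthat package; the helper function is invented for illustration, and the idea is simply to test against an input where we already know the answer:

```r
# Hypothetical example: a small analysis helper plus a "trivial" test
# against a toy input with a hand-computed expected result.
library(testthat)

mean_by_group <- function(df) {
  aggregate(value ~ group, data = df, FUN = mean)
}

test_that("mean_by_group matches a hand-computed result", {
  toy <- data.frame(group = c("a", "a", "b"), value = c(1, 3, 10))
  out <- mean_by_group(toy)
  expect_equal(out$value[out$group == "a"], 2)   # (1 + 3) / 2
  expect_equal(out$value[out$group == "b"], 10)
})
```

Even a handful of tests like this, run before re-submission, catches the “silent breakage” that refactoring near a deadline tends to introduce.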
I have published code to recreate all analyses and figures with two manuscripts. I don’t have a systematic way of getting my code reviewed by colleagues (I have just asked informally), so I would be very interested to listen in on this call and hear what people do.
One point I’d like to add is that I prefer to deposit the code and data with the publisher and paper (rather than with a third party like GitHub or figshare). My reasoning is that there is then a canonical, unchangeable version that will persist as long as the paper is available (who knows how long GitHub will be around?). It is also directly linked to the paper. Unfortunately, I think some publishers may not support .zip files as supplementary information.
Good Q @noamross. I feel like this one should focus on getting collaborators to review your code prior to submission, or on reviewing code that is used internally (not necessarily for a paper).
@ArtPoon (great to see you here)
Your response gets at what I think many people will appreciate hearing. Many faculty members are trying to figure out how to manage this. Thank you!
No date set yet for the call but I hope you’re able to attend.
My motivation for suggesting this call was to further a), but I don’t see how these are too different. Perhaps we can empower more people to do in-lab code review prior to submission, and then have those practices make their way up to manuscript review.
One challenge I see is how to provide a roadmap of a code base to a reviewer that links specific outputs to the code that created them.
Makefiles are great for this but I’m not sure they are for everyone. Are there less intimidating (lighter-weight) alternatives or guidelines for doing this maybe in a
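One lighter-weight option is a plain “run everything” script whose comments map each output back to the code that produces it. A hypothetical sketch (the script and file names are invented for illustration):

```r
# make.R -- a lightweight alternative to a Makefile: run top-to-bottom
# to regenerate every output. Comments document the output-to-code map
# that a reviewer needs to navigate the project.
source("R/clean_data.R")   # reads data/raw.csv, writes data/clean.csv
source("R/fit_models.R")   # reads data/clean.csv, writes output/fits.csv
source("R/plot_fig1.R")    # reads output/fits.csv, writes figs/fig1.png
```

It lacks Make’s dependency tracking (everything reruns each time), but it gives a reviewer a single entry point and an explicit roadmap from outputs back to code.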
The irony of working for the govt is that everything we do/produce should be publicly available, but there are heavy restrictions on the tools/software at our disposal. For example, my agency received permission to use GitHub <2 years ago. The result is that not many are yet up to speed on its capabilities, particularly with respect to collaborative coding projects (e.g., linking issues and projects in GH). I think many people would benefit from a general overview of the many possible avenues and their pros/cons.
That said, I typically work with 1+ others to review my code, which involves simulated data and verification. I also started writing supplemental markdown docs for nearly everything I do, which usually contain enough of the background/math/stats to understand what I’m trying to accomplish, followed by the code to read, munge, analyze, plot, etc. In addition to the benefits for internal sharing/reviewing, external reviewers/editors of papers also find it enormously helpful when they can replicate your entire process, figures, etc.
@mark_scheuerell Helpful perspective, thank you! I hope you can join the call (date tbd).
re: gov’t, not sure if you met @boshek (Sam Albers) at the unconf dinner. He, Stephanie Hazlitt (https://twitter.com/stephhazlitt) and @andy_teucher work in a British Columbia government group that has forged a path of working in GitHub, some open code, and submitting a package for rOpenSci review.
So far my approach has been to focus more on structure / organization than on function.
I expect code and other material associated with a given research project of a trainee to be on a (public or private) GitHub repository. I encourage trainees to follow the R package / ‘compendium’ model of organization for their repo (e.g. https://github.com/cboettig/compendium), with analyses in .Rmd notebooks, functions in an R/ directory, a README.md, and a DESCRIPTION file.
Usually I don’t worry about these repos including unit tests or roxygen documentation. I like folks to have a .travis.yml that runs some or all of the .Rmd notebooks, but this isn’t always realistic. In many cases it would take too long, and in any event it would almost always be a better check to have more minimal unit tests exercising the functions/code against circumstances where we already know what the result should be; in practice, though, I haven’t been that successful in getting folks to write “trivial” tests. I’d also like to turn on linting as part of the automated checks, but so far haven’t inflicted that on anyone. (My classroom students will probably be the first guinea pigs for linting this semester.)
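For reference, a minimal .travis.yml for that “render the notebooks” check might look something like this. This is only a sketch: the notebook path is hypothetical, and it assumes Travis’s native R support (which installs dependencies declared in the repo’s DESCRIPTION file):

```yaml
# Hypothetical sketch: re-render the analysis notebook on every push
language: r
cache: packages
script:
  - Rscript -e 'rmarkdown::render("analysis/main.Rmd")'
```

The build fails if the notebook errors, which is a coarse but useful reproducibility check.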
Actual code review has been very ad hoc; i.e., I may comment on how code could be improved during a discussion with a trainee about a result or while attempting to debug something, but I have rarely done code review explicitly unless we are developing a software package as the main objective. I do code review in my classroom, but would like this practice to play a bigger role with my research trainees as well. It often feels hard to review code when you don’t understand what it is doing in the first place, but I’m slowly getting over that as I learn more about how robust and maintainable code should look (e.g. code smells) and feel more confident suggesting how to improve code I don’t fully understand.
Great question, I agree this is a challenge. Personally I think Makefiles are often a difficult entry-point; as a reviewer I first want to have more of a big picture overview of what the project is about and what the final outputs are (e.g. the key figures or tables that make up the results of a paper), and work backwards from there to see where they come from.
A README is a nice start to this, but I prefer a more long-form vignette / paper.Rmd for a finished project. I think a vignette or paper.Rmd should keep its embedded code as concise as possible, moving any complex operations into stand-alone functions in an R/ directory. I think long-running code should generate intermediate, cacheable objects in as generic a format as possible; e.g., output data tables as .csv files that can be read in if they already exist, or re-generated if they do not.
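That read-if-cached, regenerate-otherwise pattern is only a few lines. A sketch with hypothetical file and function names:

```r
# Hypothetical example of the cache-or-regenerate pattern: expensive
# results land in a generic format (.csv) and are only recomputed
# when the cached file is missing.
cache_file <- "output/model_results.csv"

if (file.exists(cache_file)) {
  results <- read.csv(cache_file)
} else {
  results <- run_expensive_model()   # hypothetical long-running step
  write.csv(results, cache_file, row.names = FALSE)
}
```

A reviewer rendering the paper.Rmd then gets fast, reproducible figures from the cached .csv, while anyone who wants to verify the full pipeline can delete the cache and re-run.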
Agree with a lot of points here, especially minimal unit tests against demo inputs, particularly when the real analysis takes a while to run.
I do want to add that my hesitation about unit tests has a lot to do with how much refactoring I’m doing as I go through the project. They would be great to include when submitting code to accompany a paper, where the analysis is relatively fixed, but they seem more questionable in day-to-day work.
It also makes sense to me that code review to help someone debug is going to look different from code review to make sure results are computationally reproducible.
@cboettig’s opinions remind me of what has been a perennial challenge on this front: to have a review process, one needs some common agreement on what a project structure should be and what the standards should be. Expectations need to be set at the lab/group level for this. My current opinion is that it’s not a problem that can be solved for large communities. Nonetheless, last year at the unconf some of us took a crack at creating a checklist that is agnostic to language and structure/build systems: https://docs.google.com/document/d/1OYcWJUk-MiM2C1TIHB1Rn6rXoF5fHwRX-7_C12Blx8g/edit#
I’m 100% with Hao’s comment about how refactoring makes the notion of unit tests pretty hard in research code. In software development, I often have a reasonable idea of what the API should look like, so even while I still need to do a good deal of refactoring of internals, it doesn’t break anything. In contrast, research code often isn’t even organized into functions to begin with. Still, it may be possible to think of a somewhat different testing/assertion paradigm for (rapidly evolving) research scripts. For instance, there was an interesting effort to develop a testing framework for Rmds at the 2017 unconf: https://github.com/ropenscilabs/testrmd.
Ironically, I think the more common problem I encounter in trainee code is too little refactoring. Am I alone in thinking this? I believe it’s easy to get stuck feeling “this giant block of code took me 2 weeks to write, I’ll just continue to modify and extend it”, when the more sensible thing is often to re-write it as smaller functions, not bigger ones. Refactoring a script that already “works” can seem like a waste of time that only risks breaking things. When is it time to refactor, and how do folks encourage refactoring?
A few thoughts:
I think a simple checklist of the most critical things to get checked on all analysis code before submission of a manuscript would be super helpful. There are a lot of niceties that could be included, but perhaps fewer necessities, especially for students and others who are relatively new to R. I think there’s also a distinction between guidelines for data/code/archival quality and guidelines for stats/viz/analysis quality, and those are related but not necessarily entirely overlapping.
Relatedly: I think there’s a difference between code that is central to the findings of a manuscript (say, for theoretical or simulation studies) and code that is used in the presentation of results. For example, an undergrad or graduate student who has analyzed their data and written it all up in an Rmd might not need the code to be fully functionalized, refactored, and unit tested, but I think it is absolutely still critical that it is documented and organized such that it at the very least runs easily on another machine without much fuss.

In classes, I have Travis check for successful knitting of student assignments (submitted via PR on GitHub), which requires machine-readable documentation of non-base packages (including any installed from GitHub), data in the repository (with proper relative paths!) or else pulled from the cloud, and working syntax. I also have everything strictly linted with lintr to enforce basic code style conventions.

For (often DNA sequencing-based) analyses with large data files (10s to 1000s of GB), the primary analysis scripts are pulled out into R scripts and run separately, and the intermediate results are saved as a csv when possible and Rdata when necessary; these then become the data that’s loaded at the top of the Rmd to further munge, visualize, run stats on, etc. Only the code in the Rmd has to run successfully to keep Travis happy. I haven’t yet found a good way to automate testing on scripts that take days to run on a huge server. I suppose linting those too would be a fair start, and perhaps encouraging more liberal use of
Repos that are associated with manuscripts get a version tag and are archived to Zenodo, which mints a DOI, which we then cite in the manuscript.
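As a rough illustration of the lintr-based style check described above, a CI step can fail the build on any lint. This is a sketch assuming a recent version of lintr (which provides `lint_dir()`); the strictness threshold is a choice, not a requirement:

```r
# Run as part of CI: fail if any file in the repo has lint.
library(lintr)

lints <- lint_dir(".")   # lint every R source file under the repo
if (length(lints) > 0) {
  print(lints)
  stop("Linting failed: please fix style issues before merging.")
}
```

Because it only reads source files and never executes the analysis, this check also works on the long-running scripts that are impractical to test end-to-end.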
I think a big part of doing code review on these types of projects efficiently and effectively is having a good default compendium-type structure, as @noamross and @cboettig mentioned. I have personally felt a little torn between some of the different options out there (ProjectTemplate, rrtools, the others listed at https://github.com/ropensci/rrrpkg, the Compendium Google Doc, etc.).
Hi – This is a really interesting topic and I’d definitely be interested to join the call to discuss it further (thanks for looping me in @stefanie!)
One quick thought is that with code review I immediately think about who would be involved. PIs? Collaborators? Lab Groups? I think there has to be a shared culture for coding at a pretty basic level for labs/teams, focusing on structure and commenting, hopefully with notebooks.
Our team has shared practices: we look over and run each other’s code, do individual checks with functions we’ve written, and ultimately do GitHub releases before submitting papers (and also annually for our repeated annual studies). But it really takes this culture of shared practices and time spent looking at each other’s code, and recognizing the value of that work enough to give people the time is an important piece.