How could the onboarding / package review process be even better?

Hello everyone,

I’ve recently contributed a review of the awesome Ropenaq package by Maelle Salmon (here’s a link to the review issue). The other reviewer was @andy_teucher . I had great time & learned a lot. Definitely the best review (of paper/code/anything) experience of my career so far!

That said, I was thinking we could have a discussion of how to make this review process even more awesome. Its already a positive experience for reviewers and (i think?) for package authors. Here’s a few points that I thought we could discuss:

  1. How to reference specific lines of code. I found myself frequently writing things like “So at the start of this function” or “in file X at line Y”. I know I could have linked to the specific line in via github, but I saw two problems with that. First, it is kind of laborious (if you have lots of such comments), and second, one can’t look at code and tell what the comments are (you can only go from the comments to the code that the reviewer indicated). I’m also not sure what the best practice is here. Link to a specific commit? will that get confusing? What helps the package author the most?

  2. interactions between reviewers: I was fortunate to review alongside @andy_teucher who did a very thorough job. Andy and I have similar (but not identical) skills and opinions about R packages. As it turned out, in some places we had nearly identical comments and in others, we covered different points. Where we were different, it probably would have been better to coordinate (ie Andy does the mapping part) and in others, it might have been interesting to compare notes (Andy and I both wrote nearly identical code as a suggestion)

  3. suggesting code changes I didn’t see anything in the reviewer guide about what the best approach here is. I settled on just dropping in code suggestions as formatted code blocks. But that quickly gets out of hand if there are lots of suggestions. PRs feel like a more natural way to make those suggestions, but that would fragment the review.

  4. carrying on the conversation. The discussion between Andy, Maelle and I is big. Andy and I each wrote about 1500 words in our reviews, and Maelle’s response to us is about the same length! This is awesome and very interesting. But it is also getting to the point of “regarding your fourth point in the third section”. There’s also things that Andy said that i want to respond to, and responses of Maelle’s to Andy that I’d comment on…

I was wondering if there is some github workflow that might make this process easier. For example, what if the entire package was submitted as a PR to an empty repository at ropensci? Say, as a pull request between an empty branch and a branch containing the entire source of the package. I can see some pros and cons to that approach

pros

  • reviewers could make inline comments on the diff.
  • Separate issues could be opened to cover larger points.
  • more economical: When one reviewer sees that the other has commented on something already, they can decide to either move on to a different part of the code, or respond to the first reviewer’s comments.
  • nice for future users, who could view this discussion where the package lives (ie as a closed issue at github.com/ropensci/newcoolpackage)

cons

  • if one reviewer is fast / thorough, the other might be reduced to just saying “+1” everywhere, which would be unfair
  • obliging authors to learn branching
  • cluttering ropensci’s gallery of repos with all these in-progress things

Those are some thoughts from me! What do you guys think?

4 Likes
  • Regarding #2, it’s been previously suggested that we try to diversify the reviewers by having one reviewer focused on usability - likely someone familiar with the field/data but not necessarily an experienced R package developer. This would have the additional benefit of expanding the pool of reviewers. We should probably make more use of the review board so the task of finding such reviewers doesn’t entirely fall on @sckott.
  • Regarding #3, I think once you’re suggesting enough change to necessitate a PR, do a PR. You can always reference it your review (e.g. https://github.com/ropensci/onboarding/issues/20#issuecomment-151892039)
  • As to your PR suggestion, I like it:
    • I don’t think clutter is an issue. RO has nice front-end web pages to display completed packages, and has plenty of deprecated, half-done (including a couple of mine), packages in the repos.
    • “+1” is the flip side of efficiency, and I think efficiency wins.
    • There shouldn’t be a lot of complexity if we ask users to submit by making a PR to onboarding/empty.
    • An additional benefit is that, if changes as a result of the review are done on the same branch, they will be visible in the review.
1 Like

hello @noamross :smile:

  • hm, interesting about the diverse reviewers. Though I’m not sure that quite describes the dynamic we had on our review, I can certainly see why that was/is an intention in choosing reviewers
  • re: finding new reviewers, why not ask previous reviewers to nominate people they know? The network would then grow very quickly
  • Indeed i’d seen your example and I’d considered emulating it. But then again, it’s not always a simple atomic change – sometimes more of a redesign is what the reviewer wants to suggest. Perhaps some guidelines about when and how to do a PR would be a good addition to the onboarding/ reviewing guides, rather than leaving it all up to taste.

An additional benefit is that, if changes as a result of the review are
done on the same branch, they will be visible in the review.

yes, that’s what i’d been thinking. I guess I was imagining something like this:

  • ropensci creates a new, empty repo for the onboarding package
  • adds a file to provide some boilerplate – perhaps the ropensci logo added to the top of README or something
  • package author adds that repo as a remote, fetches and merges (so that there is at least 1 commit in common between the ropensci repo and their local version)
  • author creates branch onboarding, containing everything in the package
  • author pushes onboarding branch to ropensci/coolnewpackage, creates pull request to master branch of ropensci/coolnewpackage
  • reviewers comment on the diff and on the PR

the review would be over when the PR is merged into master. Would that work? (perhaps that’s what you were thinking too – it just sounded like you meant conducting all this in the onboarding repo, which wasn’t quite what i had in mind so i wanted to clarify)

Hi @aammd ,
Best post on the forum so far! Thanks very much for sharing your thoughts on the review process. Much appreciated!

Re 1. I don’t have a better solution than linking directly to line numbers or line ranges. It is a bit tedious, but it’s hard to comment otherwise unless you revisit specific commits, which takes up more effort than doing this.

Identical comments are probably a good thing, since that way we do know that it’s not just one reviewer’s personal coding preferences. But we have discussed what @noamross has written about asking reviewers to work on separate aspects. It still requires some testing though since I don’t think that is the right solution.

Re: 4, I wonder then if it would be worth taking a different approach. For larger issues regarding a package, write those in the onboarding comments. When you have specific fixes, post those in the package issues and just link to those in the review. So that way, individual fixes or changes can be discussed and closed separately. Perhaps a series of checkboxes in the main review would help keep track (we can try and automate some of this).

Thanks so much for your feedback @aammd !!

  1. This is a tough one…
  2. Would it be easiest to have a reviewers slack room? Where reviewers could discuss these kinds of things if needed? I guess the article review system has reviewers not know anything about the other reviewers, but maybe that’s not efficient use of reviewers time.
  3. I’m all for PR and opening issues on the repo where the code is developed, and just referencing in the review - I’ve done that in reviews I’ve done.
  4. Are you talking here about it being hard to reply to specific things easily?

We should improve the guidelines with some or all of these things - will do soon…

I really like this idea. We could help automate some of this (steps 1, 2) and give the author some text to copy and paste into their terminal for 3,5.
We would most likely do it in the ropenscilabs (needs a better name) org rather than the main one to prevent clutter. That way, once a package is accepted, it can be moved top github/ropensci.

I wouldn’t ask reviewers to cover different aspects, I would just pick reviewers with different specialties, as an editor might pick an expert on the study system and another on statistical methods.

I do think there is some value in having the review as consolidated as possible in one place, as it’s an output of the reviewer and it’s nice to be able to point to it as a contribution.

That wont be open nor easy to link to though. The open review is part of the #win here.

You are all too fast, so I’m not sure how much I can add, but a couple of thoughts:

Re: 2. This was my first formal code review, and I was a bit anxious about doing a good job. I think I can say the same for @aammd based on our conversations. We actually had a Skype call to help ground our expectations for the review and general impressions of the package. I found this super helpful, and while it may not be everyone’s cup of tea, something like a slack room as @sckott suggested would be nice. As for reviewer selection, I assumed we were chosen because how awesome is it to have two Andrews from British Columbia tag team on the review?

Big +1 to be able to put inline comments against a diff, so submitting as a PR seems like a great idea to me.

I agree with @noamross; I think there is a lot of value to having the reviews being consolidated… If reviewers do decide to submit a PR or open a separate issue, I think a link to the PR/issue and a summary in the main review would be sufficient? Maelle also did a nice job of creating distinct issues to work on based on the reviews. Perhaps you could specifically ask authors to do that, and tag reviewers in the issues so that the conversation can continue there in a more directed fashion?

Great discussion!
Regarding the issues, @andy_teucher , do you mean:

  • small things / easy things modified as the author wishes, and discussed in the onboarding issue (I modified some things at the same time and I must say I didn’t have one commit = solves one review point)
  • biggest/more difficult things as separate issues of the package?
    Because one issue for everything or one commit for every point would make the authors’ work more tedious?

Automating some things would be good for not making authors anxious if they do not know branching etc. yet. Maybe also having a picture in the readme that shows the repositories and arrows, etc. for explaining the process (reading the part about Github in Hadley Wikckham’s R package book sure helps but having the specific picture with Ropensci/Ropenscilabs/etc would be even better).

On a side note, I also wondered if the onboarding guidelines should include more advertising for submission, e.g. “Why should you submit your package”. Submitting is clearly worth the effort but maybe it could be written explicitely in the readme?

1 Like

@maelle I wouldn’t want to bind the author to opening issues for every point, certainly. More like opening an issue to address review points for which the author may want to continue a more focused discussion with the reviewer.

1 Like

Good pt. We can better articulate why it’s beneficial

It occurs to me that by having a package submitted as a PR, we could automatically do CI for running R CMD check, lintr for style guidelines, and test coverage, and this would be a service to authors because they wouldn’t have to set it up themselves.

but someone has to add the .travis.yml and optionally appveyor.yml file, no?

Yes, but if they’re PR’ing into a ‘template’ repository those files could already be there. It also may be possible to keep them on another branch - not sure about this, I think it depends on the service.

I think the recently added features in Github for PR reviews, requesting changes, etc may provide a nice opportunity for changes to the review process. I’m wondering if anyone has poked around in there with an eye toward how it may be incorporated into the existing rOpenSci practices.

I’m playing with the new review tools, still figuring things out with them. They still have the main challenge we’ve found with every review system we’ve examined: they are all built around the concept of a PR as the unit of review, while we review whole packages. As discussed above, we could build a system where a whole package is submitted as a PR to an empty repo, but its still rather awkward - the diffs github displays become massive and its not the easiest way to explore the package.

As reviewers are ultimately going to mostly clone the package and explore it in the IDE, we’re thinking about an approach where reviewers could insert standard comments into the code (much like the TODO: convention) which could then be compiled into a review.

1 Like