Code review/onboarding milestones


#1

Proposal for the onboarding guide:

  • Submit your code for review by us, or have someone review the code using our guide, for every 400 lines of new code in the R/ and `src/’ directories.
  • After each review, consider whether the package is ready to release as-is.

The latter point is to encourage early release of well-reviewed, tested, and documented packages over waiting for feature-complete packages. Not everything is ready for release at 400 lines, but it seems like a good stage to consider it. Lines of code are of course a rough metric, but this encourages a regular check-in.

I note that rdrop2 was at ~800 lines when I reviewed it. geojsonio will be ~1500.

I added some more guidelines to the reviewers guide, too.


#2

Seems like this should line up with Milestones, if used. Hmmm, where some milestones could be very brief and require a few lines of code, and some may be much more than 400 lines. So not sure how these fit together?

Curious if when we talk about lines of code, we include oxygen lines in that count? Definitely it needs review, but I imagine it will perhaps need less attention than the code itself? E.g., a quick count of geojsonio suggests 1375 lines of non-oxygen stuff. As an example, in elastic I have very extensive examples, so perhaps half of the lines of code could be oxygen lines .

How do we consider C++ code? Perhaps only need to review the actual code if written from scratch, not simply wrapping a C++ library? (ignorance speaking here)


Multi-stage code review for manageable size
#3

Perhaps you should create a milestone for each review, so PRs/issues
associated with that review can be grouped by that milestone, and if
appropriate the same milestone can be used for a release.


#4

hey @noamross thanks for the review on geojsonio

you had a few things that apply to all packages:

Reading S3-intensive code is much more cognitively draining and time-consuming than functional code. Just an FYI for future reviews.

Any further thoughts on this? Why is that? Anything that can be done to help make this easier? I think for users, this approach is useful in that a single function can act appropriately on lots of inputs, and outputs are simple to manipulate, so i like the s3-intensive approach

I didn’t have an easy-to-grab use case to try out this package on, as I did for rdrop2. Working through that use case helped identify bugs and UI improvements. We might try selecting reviewers partially based on whether they have something available to try the package out on.

Good point about picking people to review that would have use cases to use to test. Lincoln comes to mind here

paging @cboettig any thoughts ?


#5

One more thing on picking a reviewer: I wonder if there’s significant value in someone reviewing that isn’t familiar with the use cases/etc. That is, they’ll have food feedback from a newcomer’s perspective. Could have a more thorough review, and ping one person that isn’t likely to be familiar (including maybe not have deps installed, e.g.) to do a quick look over


How could the onboarding / package review process be even better?
#6

Re: S3 - I think you did pretty well in geojsonio organizing the code so it could be reviewed/modified. Methods of one function were grouped together, and the relevant internal functions tended to be in the same file.

I think the advantage for usability is far outstrips any convenience for review. One thing to consider is whether a few comments in source would make it easier to decipher how the functions work.


#7

Re: picking reviewers - This is beginning to become remarkably like journal peer review: An “editor” can try to find someone familiar with the use case, API, etc., to do a code review, and a newcomer to do a usability review.


#8

good idea, like this you mean?

foo.list <- function(x) { 
   UseMethod("foo")  
}

# lists ------------------------------
## bar() does a and b to each element of the list
foo.list <- function(x) { 
  bar(x)
}

# data.frame ------------------------------
## hello_world() does c and d to each row of the data.frame
foo.data.frame <- function(x) { 
  hello_world(x)
}

#9

Great idea. This is generally harder to implement in practice though. Most packages don’t come to us in such an early (idea) stage. And the onboarding repos usually come to us as “I have a package I think fits ropensci…” so there isn’t time to go back.

We could however do this for our own work. Sometimes I want to review something but it will take me days to wade through all of the code. So at least internally, Scott and I could do this where we review stuff before it gets too big. And after that initial review, the next round becomes easier since we have read the first 400 lines of code. Of course we can expand this to others who develop within rOpenSci (like Rich, Carl, Jenny, you etc). But for standard onboarding, this is often not possible.


#10

Great idea. Suggestions (for diverse candidates) to our board are welcome. You can se the current list here: https://github.com/ropensci/onboarding#review-board


#11

Great idea! At least internally to begin with. I’ll do this for zenodo at least, that people can weight in before I dig in too deep.


#12

@sckott Yeah, I like those comments. Or something like

## Each of the methods below ultimately use process_data() to manipulate the data after converting the various types to vectors.

#13

@noamross got it, makes sense, can do