Minimal Package Standards for the R Journal


Hi All,

I’ve recently joined the R-Journal as an editor, and one of the things I would like to introduce is clear guidance on package standards for authors. Instead of reinventing the wheel, I thought I would ask the experts!

So a few comments/notes:

  • Not all papers submitted to the R-Journal have a package, but the vast majority do
  • Currently, there isn’t a minimal standard for a package, although, as all(?) packages are on CRAN, they do pass some checks.
  • In my head, the R-Journal should be the exemplar for writing packages. Any package in the R-Journal should be “good”
  • The R Journal has four (ish) editors who assign papers to reviewers. Getting reviewers can be hard and sometimes impossible
  • Papers may also have a strong (non-R) technical focus
  • Papers are reviewed by anonymous referees

Now my questions:

  • What are the minimal package standards you would like to see?
  • What are the expected package standards? E.g. you might not insist on CI, but it would be nice to have
  • Any other comments/suggestions?

To be clear, this topic is me finding options, not an official request from the R-Journal.

5 Likes

Would R-Journal in theory create a list of standards? Or would there just be a list of optional things for authors?

For minimal standards I’d say perhaps these:

  • use roxygen2 for documentation - hand making .Rd files shouldn’t be happening anymore
  • tests - at least some
  • examples are given for all exported functions
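To make the three points above concrete, here is a minimal sketch of what they could look like for a single exported function. The function `standardise()` and the test file name are made up for illustration, and testthat is just one possible harness, so treat this as an assumption rather than a recommended layout.

```r
#' Standardise a numeric vector
#'
#' Centres `x` on its mean and scales it by its standard deviation.
#'
#' @param x A numeric vector with at least two non-missing values.
#' @param na.rm Should missing values be dropped before computing the
#'   mean and standard deviation?
#' @return A numeric vector the same length as `x`.
#' @examples
#' standardise(c(1, 2, 3))
#' @export
standardise <- function(x, na.rm = FALSE) {
  stopifnot(is.numeric(x))
  (x - mean(x, na.rm = na.rm)) / stats::sd(x, na.rm = na.rm)
}

# A matching test, as it might appear in tests/testthat/test-standardise.R.
# Guarded so the snippet still runs where testthat is not installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("standardise() centres and scales", {
    z <- standardise(c(1, 2, 3))
    testthat::expect_equal(mean(z), 0)
    testthat::expect_equal(stats::sd(z), 1)
  })
}
```

The `#'` comments are what roxygen2 turns into an .Rd file, including the example section.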

Optional:

  • CI, hard to argue against having it given how easy it is now
  • vignette, or at least a readme with installation/examples
  • code of conduct (ideally all pkgs have one, but I’d imagine R journal wouldn’t want to require this)
  • contributing.md file

2 Likes

(rOpenSci Community Manager here) Thank you for asking here. I’ve shared a link to this with our Slack community and asked them to reply here.

2 Likes

Would R-Journal in theory create a list of standards?

If you mean for the “community”, I don’t see that happening.

If you mean for authors, then yes.

Or would there just be a list of optional things for authors?

I think it would be nice to have a list of things. Many can be optional, but if you don’t do them, then you should justify why.

Regarding your list, I agree with all. Although I seem to recall from an R-dev email thread, not everyone likes roxygen2. Other things would be:

  • A README
  • Linting (?)
  • On GitHub (or something similar)
  • For the R-Journal some history, e.g. it’s not a “very new package”

I disagree with @sckott about roxygen (and looking at the packages recently published in the R journal it’s not universal). I’d actually be super curious to know what fraction of CRAN packages use Roxygen, but a quick github search suggests ~4k of the 16k total. There are well documented packages that do not use roxygen, and plenty of roxygen packages with barely stubs for documentation.

It looks like many of the packages that are currently submitted to the R-journal have a moderately heavy stats focus - I’m not sure that many of these folks are quite as on board with the engineering-first approach to packaging that we tend to focus on here. At random, I looked through the list of packages in R-journal 2019-1 and found that, of the 24 packages obviously listed:

  • 23 were on CRAN (the other being on bioconductor)
  • 7/23 (30%) had tests (dependency on testthat or a test directory)
  • 9/23 (39%) had a vignette
  • 15/23 (65%) used roxygen for the docs
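For anyone wanting to repeat this kind of tally, a rough sketch using only base R follows. The function name `package_practices()` is made up, and the rules are my own heuristics (a `tests` directory or a mention of testthat, a `vignettes` directory, a `RoxygenNote` field in DESCRIPTION), so they will miss some cases, e.g. packages documented with an old roxygen version.

```r
# Report which development practices a package source tree shows signs of.
# `pkg_dir` is the (hypothetical) path to an unpacked source package.
package_practices <- function(pkg_dir) {
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  stopifnot(file.exists(desc_file))

  # Fields absent from DESCRIPTION come back as NA
  fields <- c("Suggests", "Imports", "Depends", "RoxygenNote")
  desc <- read.dcf(desc_file, fields = fields)[1, ]

  deps <- paste(desc[c("Suggests", "Imports", "Depends")], collapse = ",")

  c(
    tests    = dir.exists(file.path(pkg_dir, "tests")) ||
      grepl("testthat", deps, fixed = TRUE),
    vignette = dir.exists(file.path(pkg_dir, "vignettes")),
    roxygen  = !is.na(desc[["RoxygenNote"]])
  )
}

# Small self-contained demo on a throwaway package skeleton
demo_dir <- file.path(tempdir(), "demopkg")
dir.create(file.path(demo_dir, "tests"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("Package: demopkg", "Version: 0.1.0", "RoxygenNote: 7.3.1"),
  file.path(demo_dir, "DESCRIPTION")
)
package_practices(demo_dir)
```

Applying this over a set of unpacked sources with `vapply()` and taking row means would give proportions like the ones listed above.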

IMO if guidelines come out now saying, for example, “use CI or else” a good chunk of the community that is currently submitting articles will wonder what has happened and feel isolated. Particularly if that push emphasises a particular set of packages over the general guidelines (write tests vs use testthat, use version control vs use github). There are endless tedious twitter discussions where this distinction has been lost.

Perhaps some way of saying that the journal expects a package to contribute an important advance in statistics etc. and/or display a degree of engineering. Then encourage the stats folks to gradually improve their practices. We should also recognise that the evidence that software testing reduces bugs is scant (vs our belief, in my case strongly held, that it does - but see also meditation driven development).

TL;DR - the things that I think make good packages (testing, CI, review, focus on interface and design, thorough documentation, etc) might not be the best things to immediately propose as your guidelines.

2 Likes

@noamross comment on @csgillespie’s question?

No, just for submissions to the journal.

Agree on a README, and having the source somewhere public seems important too; it doesn’t have to be github of course.

I disagree with Rich. I think hand-writing .Rd files makes it far too easy to not keep docs up to date with the code in the package. The existence of packages not using roxygen is not sufficient evidence suggesting it shouldn’t be used. I’d be interested to hear an argument for having manually curated docs other than “it’s what I know”. And one added dependency in Suggests doesn’t seem like a solid argument against it. My goal in this argument is in having good documentation; it’s probably the maintainer thinking it matters and doing it more than whatever tool is used.

Docs are easily kept in sync with R CMD check - this is much less of an issue than with some other languages. I’m not making an argument based on dependencies (indeed Roxygen does not imply a dependency). But I would argue strongly that roxygen is neither necessary nor sufficient to write good docs.

Put another way, if another editor joining R-journal advocated for some hypothetical tool for automatically documenting things, or mandated use of (say) RUnit, or the use of Sweave for vignettes (it’s built-in after all so must be the standard) I don’t think many of us would be impressed.

Good docs and other general principles are worth arguing for, but not in my mind specific technologies

Good questions and discussion here!

I agree that Roxygen should not be required. I love Roxygen as much as anyone else, but it is a means to an end, and by no means perfect. (E.g. if someone creates Roxygen3 a year from now and it is way better, should the standards keep requiring roxygen2? Or should they force everyone to switch?) Neither of these makes sense to me. I think it is enough to say the code and documentation should be in sync (which CMD check validates) and be of high quality.

  • Maybe it goes without saying, but packages should pass R CMD check cleanly (or document why not persuasively).

  • I think packages should have a test suite and reasonable test coverage, ideally providing a rationale for why some functions are not covered. (Modern CI is a nice addition to this, though I believe RForge has been doing ‘CI’ before travis existed, and is in some regards still ahead of travis & GH actions :slight_smile: )

  • While I wouldn’t suggest a “clean” report from goodpractice::gp() be required, it is something I would probably mention in the recommendations. Package authors can have good reason to deviate from these, but deviations should be deliberate and not sloppy (spellcheck, complete DESCRIPTION, etc).
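As a rough sketch of how an author might run the three checks above locally, assuming the rcmdcheck, covr and goodpractice packages are installed (each step is skipped if not). The wrapper name `check_package()` is made up, and the `--as-cran` flag is my own choice rather than anything the journal has asked for.

```r
# Run whichever of the common package checks are available locally.
# `pkg_dir` is a hypothetical path to the package source directory.
check_package <- function(pkg_dir = ".") {
  stopifnot(dir.exists(pkg_dir))
  has <- function(pkg) requireNamespace(pkg, quietly = TRUE)
  out <- list()

  if (has("rcmdcheck")) {
    # Builds the package and runs R CMD check on it
    chk <- rcmdcheck::rcmdcheck(pkg_dir, args = "--as-cran", quiet = TRUE)
    out$check <- lengths(chk[c("errors", "warnings", "notes")])
  }
  if (has("covr")) {
    # Percentage of lines exercised by the test suite
    out$coverage <- covr::percent_coverage(covr::package_coverage(pkg_dir))
  }
  if (has("goodpractice")) {
    # Names of the goodpractice checks the package does not pass
    out$advice <- goodpractice::failed_checks(goodpractice::gp(pkg_dir))
  }
  out
}
```

Calling `check_package("path/to/pkg")` returns a list that an author could paste into a submission note, with any deliberate deviations explained alongside it.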

I agree with Rich that it’s a bit hard to think of things beyond the scope of R CMD check that are easy to automate as minimum standards. Most of the things you want to focus on (good documentation, clean interface, deep testing) aren’t easy to automate, which is what makes peer review so valuable.

One thing I think that would make the biggest benefit is merely encouraging reviewer attention to the code base. The gravity of the academic model makes it so easy to focus only on the “article” and the code therein, and not review the package itself.

4 Likes

Thanks for the comments. They’ve been really useful and most of the thoughts tie in with my own views. I think @cboettig’s point about not wanting to isolate folk is spot on. All of this stuff is “easy” when you know how, and really scary when you don’t.

A nice first step (IMHO), would be to indicate to authors that

  • these other tools exist - roxygen2, CI, gp, etc. Not necessarily essential, but they make life easier
  • ditto for NEWS and READMEs

I’m still keen on setting a minimum standard, but this will require some thought.

Thanks again. I’m sure I’ll be back with more queries.


An aside: I see that ropensci uses goodpractice a lot, but the package appears a bit neglected. Any comments?

I’m with Rich on this one: I think the package standards should be about the properties of the package, not the specific technologies used to get there. So in this case, the standard should be about having appropriate and sufficient documentation. By all means suggest to authors that roxygen2 is a useful tool to help achieve this, but just being a roxygen2 user doesn’t necessarily mean you’re writing good documentation (I can attest to this …)
Similar arguments for testing (give some guidance about the degree of testing expected in the package code, not the specific testthat/tinytest/myothertest harness used to do it). CI is perhaps a sub-facet of testing (evidence of regular testing) but depending on the setup it can be used to do more than that, so perhaps it also falls into an “advice for pkg authors to make your life easier” category rather than a “you must use …” one.

2 Likes

Thanks @raymondben, that is a really nice way of wording it. Much better than my blunt approach of you must use X, Y, Z.

Exciting stuff Colin, and a great discussion already, to which I might only add some of the aspects of the coreinfrastructure criteria, in my opinion particularly those relating to online provision of documentation / website availability - this is surely more important than how docs might be generated. Note also that CI is only a suggestion, not a requirement, which I would argue is good and right because access to CI services may simply not be possible for some people, and so demanding CI must be presumed to be an exclusionary act. Because it can’t be referred to often enough, github (for example) is largely at the whims of US trade laws, and these kinds of systemic dependencies have to be kept in mind.

Finally, note the semi-parallel development of a new rOpenSci project on peer-reviewing statistical software, with working documents here (and suitable warning that that is likely not permalink stable :smirk:) - there is likely to be scope for parallel development of our endeavours, so we’ll be sure to stay in touch.

2 Likes

Thanks @mpadge your comments are very sensible and I’ll definitely incorporate them.


A thought I’ve just had is perhaps this whole enterprise would be more suited for ropensci? Just now you have a very nice framework for on-boarding packages. What about a less opinionated framework for general usage? Something that everyone in tidyverse & base land can agree with :wink:

BTW, I agree with your opinions on your on-boarding, but obviously not everyone would.

@csgillespie I’m not sure we have the bandwidth for setting that additional set of standards, but one thing we will be trying to do with the standards @mpadge referred to is make them modular so different groups can adopt parts of them easily, including separation of high level (and language-agnostic) concepts and specific implementations. But as a whole they will probably be more opinionated than would make sense as minimum criteria. One thing we may adopt from the core infrastructure concepts is graded rating rather than binary pass/fail.

I’ve spoken some with Michael Kane about this - one idea is publishing an article about the new standards in the R Journal.

3 Likes

Oh, re goodpractice, we’re probably moving to something more expansive and custom, but it will take a bit. A lot of goodpractice functionality is now in lintr. Between that and covr you get nearly all the same measures.

2 Likes

Thanks @noamross We have a monthly R editors meeting (next one in a couple of weeks), so I’ll bring this up then.

Very interesting thread! I really like @cboettig’s sentence "One thing I think that would make the biggest benefit is merely encouraging reviewer attention to the code base. ".

A few thoughts/opinions. :slight_smile:

About package source:

  • I’d want the package source to be public and the paper and package metadata (URL in DESCRIPTION) to indicate it clearly. Maybe requiring a package-level man page to exist, and for this man page to showcase the URL of the source, would be good.

  • The version reviewed/accepted might be a release as opposed to the changing source.
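A check along these lines is easy to script. Below is a minimal base-R sketch that reports which metadata fields are missing from a DESCRIPTION file. The function name `missing_metadata()` and the choice of fields (URL, BugReports, Version) are assumptions for illustration, not an agreed list.

```r
# Return the names of recommended DESCRIPTION fields that are absent.
# The set of fields checked is an illustrative assumption.
missing_metadata <- function(desc_file,
                             fields = c("URL", "BugReports", "Version")) {
  stopifnot(file.exists(desc_file))
  # read.dcf() returns one column per requested field, NA where absent
  desc <- read.dcf(desc_file, fields = fields)
  colnames(desc)[is.na(desc[1, ])]
}

# Demo on a throwaway DESCRIPTION with a made-up source URL
desc_file <- tempfile("DESCRIPTION")
writeLines(
  c(
    "Package: demopkg",
    "Version: 1.0.0",
    "URL: https://example.org/demopkg"
  ),
  desc_file
)
missing_metadata(desc_file)
```

Pinning the reviewed version could then be as simple as recording the `Version` field (or a release tag) alongside the accepted paper.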

About package access/installation: I’d expect the package to be distributed by one of the “standard repositories”, maybe; and in any case for any system requirements to be properly documented. I.e. I want to be able to install the package easily as a reviewer (from source?) and as a user.

Looking through our (very opinionated :wink:) dev guide, two points that might be relevant are “Packages should run on all major platforms” and our recommending e.g. xml2 over XML.

2 Likes

Thanks very much for the input from the ROpenSci crew and thanks to @csgillespie for bringing it to their attention.

Borrowing from @cboettig’s input, I like the idea of describing the ends that we want to encourage or require and showing how these can be achieved through a standard set of means. “Opinionation” tends to come out more quickly when we discuss means.

Also, I’ve mentioned to @noamross that I think ROpenSci is ahead when it comes to development and reproducibility standards. Is there interest in coordinating some of these efforts? You have experience we don’t and may be able to reach a larger audience. At least half of the R Journal editorial board has already had interactions with the ROpenSci folks and a lot of our goals are complementary.

3 Likes