Best practices for testing API packages


#1

Hi all,

I wanted to check in and discuss what we should be recommending for testing of API packages for rOpenSci submissions. I find a lot of diversity in how these packages are tested and dealt with in CI, and I’d like to have some consensus.

Many times people just use skip_on_cran() for everything, and have Travis-CI act as CRAN. In this case, Travis-CI status is somewhat meaningless: we can’t count on it to give package status. We can usually run tests (and measure test coverage) locally, as I do on package submission, but we won’t see breakage as packages get updated during the review or during our regular testing of accepted packages. Alternatively, Travis-CI can use NOT_CRAN, but this will often give false errors due to connectivity issues.

Karina Marks and @gabor’s recent blog post about mocking gives a neat alternative that makes API calls CRAN compatible and not dependent on API service status. However, we want to know if there’s an error caused by changes to the API being wrapped, as well.

Here are some potential strategies:

  • Separate tests, both mocked and not (which can be run alternately depending on whether on CRAN)
  • Mocked tests, with a check that returns a message, not an error, for when a website can’t be reached
  • Explicit tests that check that the data returned from an API is the correct form, before the package processes it.
  • Just ask authors to be explicit in package submission what tests are running on Travis/CRAN and which are being run locally.

Thoughts?

P.S. Is there a way to automatically test example code even if it’s wrapped in \dontrun{}?


#2

Given experience in other langs (Ruby, Python, etc.) it seems mocking in tests is definitely the standard. And makes test suites run faster, etc. Of course testing the API itself is often still a concern, so need to do both.

I’m nearly there with a useable first version of vcr https://github.com/ropenscilabs/vcr ported from Ruby. It will be super flexible, allowing mocking of HTTP requests, matching patterns based on base URL, paths, query params, headers, any combination of the previous, etc.

> Separate tests, both mocked and not (which can be run alternately depending on whether on CRAN)

Perhaps a set of triggers is best: test whether we’re on CRAN and whether internet is available. If on CRAN, or if internet is not available, use mocked data; if not on CRAN and internet is available, run the full tests (or maybe run the full tests only on Travis, e.g.).
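A minimal sketch of that trigger logic in base R. The function name `choose_test_mode` and its arguments are hypothetical; the only thing taken from the thread is the NOT_CRAN environment variable. The connectivity check is passed in rather than performed, so the decision itself can be tested offline.

```r
# Decide whether a test run should hit the live API or use mocked data.
# `not_cran` defaults to the NOT_CRAN environment variable being "true".
# `online` must be supplied by the caller (for example from
# curl::has_internet(), if the curl package is available); it defaults to
# FALSE so that the conservative choice is mocked data.
choose_test_mode <- function(not_cran = identical(Sys.getenv("NOT_CRAN"), "true"),
                             online = FALSE) {
  if (isTRUE(not_cran) && isTRUE(online)) "live" else "mocked"
}

# Mocked data is used unless both conditions hold.
choose_test_mode(not_cran = TRUE, online = TRUE)
choose_test_mode(not_cran = TRUE, online = FALSE)
choose_test_mode(not_cran = FALSE, online = TRUE)
```

A test file could branch on the returned string, running the full tests only when it is "live".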

> Mocked tests, with a check that returns a message, not an error, for when a website can’t be reached

Agree, this is useful, since checks can still pass, but then we still know if the web service is down
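One way this could look, as a rough base R sketch: wrap the reachability check in tryCatch() so that a failure becomes a message and a FALSE return value instead of an error. `service_available` and `ping` are hypothetical names; `ping` stands for any zero-argument function that errors when the service can’t be reached.

```r
# Return TRUE if `ping()` succeeds; otherwise emit a message (not an error)
# and return FALSE, so the calling check can still pass.
service_available <- function(ping) {
  tryCatch({
    ping()
    TRUE
  }, error = function(e) {
    message("Web service unreachable: ", conditionMessage(e))
    FALSE
  })
}

# Stand-ins for a real request: one simulates an outage, one succeeds.
down <- function() stop("could not resolve host")
up <- function() invisible(NULL)

service_available(down)  # emits a message, returns FALSE
service_available(up)    # returns TRUE
```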

> Explicit tests that check that the data returned from an API is the correct form, before the package processes it.

Perhaps, but I guess the exported functions that call the API will presumably account for this, though depending on how the code is written they may not fail well when the API response is not in the correct form.

> Just ask authors to be explicit in package submission what tests are running on Travis/CRAN and which are being run locally.

Hmm, maybe we could scan the pkg code and see which fxns make http requests by looking for e.g., httr::GET, etc. then we’d know which tests make requests?


In the long term, as some have mentioned, it’s probably best to get all of our software to separate out the code that does the HTTP request and the code that parses the data, so they can be tested separately. I’m not sure what the best pattern is to do this, but could explore this separately.

There’s devtools::run_examples(run = FALSE) but presumably there’s a way to do it from R check, but not sure what the flag would be.


#3

There is the --run-dontrun flag for R CMD check. It’s what we use for rotl on Travis to make sure the examples included in the docs work.


In my view it would be great to have tools/best practices that allow the maintainers to detect if the tests fail because:

  1. the API or the endpoint is down
  2. the format of the response has changed
  3. there is a bug in the R code

We try to do that in rotl by having low-level functions (not exported) that return a minimally processed response from the API (typically as lists), and exported functions that process the responses from the API into something the user can work with (data frames). Unit tests on the low-level functions are aimed at detecting changes in the response format from the API, while tests on the exported functions aim at testing bugs in the code for the package.
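A simplified illustration of that split, in base R. This is not rotl’s actual code: the function names, the fields, and the canned response are all made up for the example, and the low-level function returns a fixed list where a real package would make the HTTP call.

```r
# Low-level (unexported) step: return the API response as a minimally
# processed list. A canned list stands in for the real request here.
.fetch_taxa <- function() {
  list(
    list(id = 1L, name = "Homo sapiens"),
    list(id = 2L, name = "Pan troglodytes")
  )
}

# Format check on the raw response: every record must carry the expected fields.
has_expected_fields <- function(resp, fields = c("id", "name")) {
  all(vapply(resp, function(rec) all(fields %in% names(rec)), logical(1)))
}

# Exported step: turn the raw list into a data frame the user can work with.
taxa_table <- function(resp) {
  data.frame(
    id = vapply(resp, function(rec) rec$id, integer(1)),
    name = vapply(resp, function(rec) rec$name, character(1)),
    stringsAsFactors = FALSE
  )
}

resp <- .fetch_taxa()
stopifnot(has_expected_fields(resp))  # fails if the response format changes
tbl <- taxa_table(resp)
stopifnot(nrow(tbl) == 2L)            # fails if the processing code has a bug
```

With this layout, a failure in the first check points at the API, while a failure in the second points at the package code.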


#4

I like this approach. I personally don’t separate out these concerns in packages I maintain, and I probably should. Since this can be somewhat involved, we could add a separate document, and link to it from https://github.com/ropensci/onboarding/blob/master/packaging_guide.md#testing

One way to do this is via testthat skipping if the API is down, but then again, I kind of like it when tests fail if the API is down, so I know that’s the case and can notify the data provider.

I like this, good idea

I like this too. And if tests for response data don’t fail, then some code in the pkg or a dependency pkg has changed and introduced bugs


#5

I am a big fan of mocking for API requests, but I’ve had trouble thinking through how to do this effectively in R. I’m very excited that someone is working on porting vcr.

I have this same problem not just with APIs but also things like database connectors. For example, I have a function that basically creates a RJDBC connection regardless of the database source, selecting the right driverClass and classPath, but it’s a bit hard to conceptualize how to test this and test code that’s meant to then successfully execute a SQL query on that connection.


Best practice for testing packages with database connections
#6

Cool, glad you’re excited about vcr.

Opened a new issue for database connection testing, pinged you there


#7

Thanks Noam for opening this important topic! A very useful addition to testthat::with_mock is the mockery package, particularly the stub function, which acts like with_mock but matches environments to enable, for example, mocking of functions called within other functions.

And Scott: Any thoughts on what to do about vcr cassette sizes? I’m guessing they could be quite large and might generate check warnings about large installed package size.


#8

Right, they could be quite large, but vcr and similar tools I think are mostly targeted at unit tests, which hopefully aren’t making HTTP requests that take a long time or generate large responses.

P.S. I realized I need to make something else before vcr is ready. That thing is webmockr https://github.com/ropenscilabs/webmockr and I’m still working on some fundamental problems. The library it’s ported from is in Ruby, which supports monkey-patching (essentially overriding methods within a package dependency that does the HTTP requests), whereas that’s not supported in R, so I’m still thinking about this.


#9

Hi all!

FWIW to test r-pkgs/gh I just wrote a small HTTP mocking package, which is similar in spirit to webmockr (which I only found now).

It is here: https://github.com/gaborcsardi/httrmock

It only works with httr, as you might have suspected. It is also not very flexible, and I am still trying to work out a nice workflow for recording and replaying HTTP requests and responses.

To continue with the bad things, the mocking part is sketchy, but there is just no good solution for it, at least the replay part. Hopefully with some help from Hadley (=httr changes) we can make it a bit more robust. Currently it only works with a single version of httr, 1.2.1, to be safe.

Anyway, @sckott is the most experienced person when it comes to writing API clients, so please let me know what you think. It would be really nice to work out a smooth method to test APIs.

Cheers,
Gabor


#10

Oh, I just saw this one, too: http://ropensci.org/blog/technotes/2016/11/09/crul-release

Missed crul completely… :frowning:


#11

Thanks for the link @gabor - hadn’t seen it. Been working on vcr for quite a while without finishing it, realized I needed webmockr first, then figured easiest to roll my own http client to integrate with webmockr and vcr

I’m hoping that webmockr and vcr can, in theory, be used with any HTTP client, just as the Ruby versions can.

It’s too bad we can’t monkey patch as in Ruby, so it makes it a bit more painful to try to integrate mocking with existing http clients.

Part of the reason for crul is that I make so many pkgs that wrap APIs that I want to control the HTTP client so that I can do things like incorporate mocking, incorporate my errors pkg (https://github.com/sckott/fauxpas), etc.

I’ll have a look at yours


#12

I’m by no means an expert in API testing best practices, but I’m (slowly) learning the ropes. Some resources that might be beneficial to others:


#13

Hi, just seeing this thread now. In the time since the initial discussion happened, I’ve released httptest, a package that offers solutions to many of the complications discussed here. Among other features, httptest makes it easy to record safely and then replay HTTP requests and responses, both in unit tests and in vignettes. See http://enpiar.com/r/httptest/ for an overview and docs.