I help a lot of users of rOpenSci packages that work with web APIs. Many of these APIs paginate their results: if an API allows only 10 results per page, you have to make 10 requests to get all 100 results.
Some users are surprised to find out that they don’t get all results. Many have questions about how to work with pagination and would like it to be done for them.
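To make the problem concrete, here is a minimal base R sketch of what paginating by hand looks like. It does not touch a real web API: `fetch_page()`, `all_results`, and the 10-per-page cap are hypothetical stand-ins chosen to match the numbers above.

```r
# Hypothetical stand-in for a paginated web API: 100 results in total,
# at most 10 returned per request. No real HTTP is involved.
all_results <- sprintf("result-%03d", 1:100)
total <- length(all_results)

fetch_page <- function(offset, limit = 10) {
  limit <- min(limit, 10)                 # server-side cap per page
  if (offset >= total) return(character(0))
  end <- min(offset + limit, total)
  all_results[(offset + 1):end]
}

# A single request only returns the first page, not everything.
first_page <- fetch_page(offset = 0)

# Manual pagination: keep requesting until all results are collected.
collected <- character(0)
n_requests <- 0
while (length(collected) < total) {
  page <- fetch_page(offset = length(collected))
  collected <- c(collected, page)
  n_requests <- n_requests + 1
}
```

This is the bookkeeping (offsets, page sizes, a stopping rule) that users usually want handled for them.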
In the HTTP client that I maintain (crul), I just added a new class: Paginator.
You pass an object of class HttpClient with connection details, and specify a few details about how pagination works with the specific API you’re working with - then Paginator takes care of the rest.
This works only with synchronous requests for now, but in theory it could be made to work for asynchronous requests too.
Install the latest dev version from GitHub to get the pagination feature.
Set up connection details.
(cli <- HttpClient$new(url = "http://api.crossref.org"))
#> <crul connection>
#>   url: http://api.crossref.org
#>   curl options:
#>   proxies:
#>   auth:
#>   headers:
Create a Paginator, and set the required details:
by: method to do pagination by (only one for now, query_params, also the default; see docs for future options)
limit_param: name of limit parameter
offset_param: name of offset parameter
limit: total results to get
limit_chunk: results to get per page (chunk)
Creating the Paginator doesn’t perform any HTTP requests.
(cc <- Paginator$new(client = cli, by = "query_params",
  limit_param = "rows", offset_param = "offset",
  limit = 50, limit_chunk = 10))
#> <crul paginator>
#>   base url: http://api.crossref.org
#>   by: query_params
#>   limit_chunk: 10
#>   limit_param: rows
#>   offset_param: offset
#>   limit: 50
#>   status: not run yet
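To see how limit and limit_chunk relate to individual requests, the sketch below builds one set of query parameters per page for the settings above. It is a rough illustration in base R, not crul's actual internals: `page_queries()` is a hypothetical helper, and trimming the final page when limit is not a multiple of limit_chunk is an assumption about sensible behavior rather than something stated here.

```r
# Hypothetical helper (not part of crul): build one query list per page.
page_queries <- function(limit, limit_chunk, limit_param, offset_param) {
  offsets <- seq(0, limit - 1, by = limit_chunk)
  lapply(offsets, function(off) {
    # Assumption: the last page asks only for what is still missing.
    n <- min(limit_chunk, limit - off)
    stats::setNames(list(n, off), c(limit_param, offset_param))
  })
}

# Same settings as the Paginator above: 50 results, 10 per page.
queries <- page_queries(limit = 50, limit_chunk = 10,
                        limit_param = "rows", offset_param = "offset")

# Number of requests needed, and the offset used by each one.
n_pages <- length(queries)
offsets <- vapply(queries, function(q) q[["offset"]], numeric(1))
```

With these settings the sketch yields five parameter sets, one per request, each asking for 10 rows at a different offset.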
Now call the HTTP method you want, here using get:
cc$get('works')
#> OK
The object now has an updated status showing the number of requests done:
cc
#> <crul paginator>
#>   base url: http://api.crossref.org
#>   by: query_params
#>   limit_chunk: 10
#>   limit_param: rows
#>   offset_param: offset
#>   limit: 50
#>   status: 5 requests done
Other methods you can use:
# get all responses
cc$responses()
# get all HTTP status objects
cc$status()
# get all HTTP status codes
cc$status_code()
# get all times
cc$times()
# get all raw bytes
cc$content()
# parse all responses
cc$parse()
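Once every page is parsed, a common next step is to combine the pages into one table. The sketch below shows one way to do that, under some assumptions: that cc$parse() gives one JSON string per page, that the jsonlite package is installed, and that the records sit under message$items as in Crossref responses. The `pages` vector and its DOIs are made-up placeholders so the example runs without a network connection.

```r
library(jsonlite)

# Stand-in for the result of cc$parse(): assumed to be one JSON string
# per page. The DOIs are made-up placeholders, not real records.
pages <- c(
  '{"message": {"items": [{"DOI": "10.5555/example-1"}, {"DOI": "10.5555/example-2"}]}}',
  '{"message": {"items": [{"DOI": "10.5555/example-3"}]}}'
)

# Parse each page, pull out its items, and stack them into one data frame.
items <- lapply(pages, function(txt) fromJSON(txt)$message$items)
combined <- do.call(rbind, items)
```

For real responses the items are often nested and ragged across pages, so a plain rbind may not be enough; treat this as a starting point only.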
I would love any feedback on this. There are bound to be lots of edge cases because there’s no one way to make a web API.