BISON record limit in spocc library


#1

I was doing some large-scale data gathering using the occ() function in spocc, and I noticed that setting the limit argument to 100,000 or more prevents any BISON records from being returned.

Is this a default in the spocc library itself or something triggered from the BISON API? As of this moment, I don’t believe BISON has occurrence data with more than 100,000 records, but they are almost certainly going to get there in the near future.

Thanks for clarifying!


#2

hi @PEngelstad

spocc::occ() uses rbison::bison or rbison::bison_solr depending on what the user passes in, see https://github.com/ropensci/spocc/blob/master/R/plugins.r#L203-L222

For rbison::bison (see https://bison.usgs.gov/#api), the maximum is 500 records per request. For rbison::bison_solr the maximum is much higher.

I would suggest iterating with a much smaller number, since network timeouts can happen with large requests and large requests put more stress on USGS’s service. That is, pass the limit and start parameters in your occ() requests, e.g.:

# if using bison_solr
occ(..., limit = 1000, start = 0)
occ(..., limit = 1000, start = 1000)
# etc. ...

# if using bison, can only do 500 at a time
occ(..., limit = 500, start = 0)
occ(..., limit = 500, start = 500)
# etc. ...

limit and start map to different parameters internally for bison vs. bison_solr, but you only need to pass limit and start to occ().
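
A rough sketch of how that paging could be wrapped in a loop. The helpers page_starts() and fetch_all() are hypothetical names made up for illustration, not part of spocc or rbison, and the sketch assumes you already know roughly how many records to expect in total. The occ() call is left commented out because it needs network access, and the query string in it is only a placeholder.

```r
# Hypothetical helper (not part of spocc): compute the `start` offsets
# needed to page through `total` records in chunks of `page_size`.
page_starts <- function(total, page_size) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = page_size)
}

# Hypothetical helper: call `fetch_page(limit, start)` once per offset
# and return the results as a list, one element per page.
fetch_all <- function(fetch_page, total, page_size = 500) {
  lapply(page_starts(total, page_size), function(s) {
    fetch_page(limit = page_size, start = s)
  })
}

# Example wiring (not run here; requires spocc and network access).
# The query below is a placeholder, and `total` is an assumed record count.
# pages <- fetch_all(
#   function(limit, start) {
#     spocc::occ(query = "Helianthus annuus", from = "bison",
#                limit = limit, start = start)
#   },
#   total = 2000,
#   page_size = 500
# )
```

Passing the fetch function in as an argument keeps the offset logic separate from the request itself, so you can check the offsets without hitting the service, and add a pause or retry around each request if timeouts turn out to be a problem.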

thoughts?