I was doing some large-scale data gathering using the occ() function in spocc, and I noticed that setting the limit argument to 100,000 or more prevents BISON records from being returned.
Is this a default in the spocc library itself or something triggered from the BISON API? As of this moment, I don’t believe BISON has occurrence data with more than 100,000 records but they are almost certainly going to get there in the near future.
Thanks for clarifying!
Hi @PEngelstad,
spocc::occ() uses rbison::bison or rbison::bison_solr depending on what the user passes in; see https://github.com/ropensci/spocc/blob/master/R/plugins.r#L203-L222
For rbison::bison (https://bison.usgs.gov/#api) the max is 500 records per request. For rbison::bison_solr the max is much higher.
I would suggest using a much smaller limit and iterating, since network timeouts can happen with large requests and large requests put more stress on USGS's service. That is, pass the limit and start parameters in your occ() requests, e.g.:
# if using bison_solr
occ(..., limit = 1000, start = 0)
occ(..., limit = 1000, start = 1000)
# ... and so on, increasing start by 1000 each time

# if using bison, can only do 500 at a time
occ(..., limit = 500, start = 0)
occ(..., limit = 500, start = 500)
# ... and so on, increasing start by 500 each time
limit and start map to different parameters internally for bison vs bison_solr, but you just need to pass limit/start.
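To avoid writing each call out by hand, the paging could be wrapped in a small loop. This is only a rough sketch: page_starts() and fetch_all() are hypothetical helpers, not part of spocc or rbison, and the total record count is assumed to be known or estimated up front.

```r
# Hypothetical helper (not part of spocc): compute the start offsets
# needed to page through `total` records, `page_size` at a time.
page_starts <- function(total, page_size) {
  stopifnot(total >= 0, page_size > 0)
  if (total == 0) return(numeric(0))
  seq(from = 0, to = total - 1, by = page_size)
}

# Hypothetical wrapper: call `fetch_page(limit, start)` once per offset and
# collect the results in a list. The last page is trimmed so that no more
# than `total` records are requested overall.
fetch_all <- function(total, page_size, fetch_page) {
  starts <- page_starts(total, page_size)
  lapply(starts, function(s) {
    fetch_page(limit = min(page_size, total - s), start = s)
  })
}

# Assumed usage with spocc (fill in your own query arguments for `...`):
# pages <- fetch_all(2000, 500, function(limit, start) {
#   spocc::occ(..., limit = limit, start = start)
# })
```

Keeping the fetch step as a function argument means the same loop works whether occ() ends up calling bison or bison_solr; only the page size needs to change (500 at most for bison).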
Thoughts?