rgbif: occ_count() and occ_search() results differ

I've just started using rgbif, and noticed that occ_count() and occ_search() return different numbers of records for a particular taxon key, but only when filtering with the georeferenced and hasCoordinate parameters:

key = 7264332
occ_search(taxonKey=key, limit=20, hasCoordinate=TRUE)
occ_count(taxonKey=key, georeferenced=TRUE)

The respective output is:

Records found [4537] 
4514

If I remove the hasCoordinate and georeferenced parameters then all is well, but I get nervous when the numbers don’t match up like this. Would appreciate any info or advice, thanks.


Thanks for your question. I don’t know the reason they differ. My guess is that GBIF may use slightly different methods for each of /count and /occurrence web services on the backend to do the pruning of points to those that have coordinates. But perhaps rgbif does something wrong - though, looking through our code I can’t find anything. I’ll ask on the GBIF mailing list about this and get back to you here…

note to self: here’s the equiv. API calls

http://api.gbif.org/v1/occurrence/search?taxonKey=7264332&hasCoordinate=true&limit=20

http://api.gbif.org/v1/occurrence/count?taxonKey=7264332

Thanks for your replies, Scott. Was wondering if you’d noticed this before, as it doesn’t seem to be the case for all species. No big deal though, I’ll just omit the parameters and get all records, then go from there.

I’ll also check out the GBIF mailing list!

Thanks, these return 4537 and 4635 respectively (the latter is all records).

Note that:

http://api.gbif.org/v1/occurrence/count?taxonKey=7264332&isGeoreferenced=true

returns 4514

I heard back from GBIF quickly. Here’s their response, with a few annotations:

Two things can cause this [discrepancy]:

  1. Eventual consistency
    The count service is an insanely high throughput service, while search is lower throughput - they have different backends, and a messaging bus keeps them in sync. Because of this there is often a short period (up to 1 hr but normally < 5 mins) where they can differ during indexing runs. Issues can creep in and they drift and occasionally we rebuild the count service. The search service is always the correct one.
  2. Geospatial issues
    The isGeoreferenced [parameter] only counts records with coordinates and no known geospatial issues - i.e. records we’d consider suitable for using the coordinates.

In this case it is 2. that provides the difference, and the search service should be using the hasGeospatialIssue parameter.

http://api.gbif.org/v1/occurrence/search?taxonKey=7264332&hasCoordinate=true&hasGeospatialIssue=false&limit=20

http://api.gbif.org/v1/occurrence/count?taxonKey=7264332&isGeoreferenced=true

Both report 4515 records.
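For anyone who wants to rebuild the two matching requests without typing the URLs by hand, here is a minimal sketch in base R. The helper gbif_occ_url() is hypothetical (it is not part of rgbif); it only assembles the query strings shown above and makes no network call.

```r
# Hypothetical helper, not an rgbif function: build a GBIF v1 occurrence URL
# from an endpoint name ("search" or "count") and a named list of parameters.
gbif_occ_url <- function(endpoint, params) {
  # Convert every value to text without scientific notation
  values <- vapply(
    params,
    function(x) format(x, scientific = FALSE),
    character(1)
  )
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0("http://api.gbif.org/v1/occurrence/", endpoint, "?", query)
}

key <- 7264332

# Search: records with coordinates and no known geospatial issues
search_url <- gbif_occ_url("search", list(
  taxonKey = key,
  hasCoordinate = "true",
  hasGeospatialIssue = "false",
  limit = 20
))

# Count: isGeoreferenced applies the same two conditions
count_url <- gbif_occ_url("count", list(
  taxonKey = key,
  isGeoreferenced = "true"
))

print(search_url)
print(count_url)
```

The total for the search request is in the count field of the JSON response, and that is the number to compare against what the count endpoint returns.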

I can add a note about this to the documentation for the package.

Make sense? @snubian

Many thanks for chasing this up @sckott, and for your detailed response. That all makes perfect sense.

So is the occ_search() parameter spatialIssues intended to correspond to the API search parameter hasGeospatialIssue? I tried using spatialIssues=FALSE but it didn’t reduce the number of records returned:

occ_search(taxonKey=7264332, limit=20)
occ_search(taxonKey=7264332, spatialIssues=FALSE, limit=20)

Both return all 4635 records.

occ_search(taxonKey=7264332, hasCoordinate=TRUE, spatialIssues=FALSE, limit=20)

returns 4537.

In any case, you’ve answered my question, so thank you again. And a bigger thank you for taking the time to create the rgbif package!

It looks as though GBIF changed the parameter spatialIssues to hasGeospatialIssue (see http://gbif.github.io/dwc-api/apidocs/org/gbif/dwc/terms/GbifTerm.html), but the API docs still list spatialIssues. I’ll see if they can update that. Sorry, sometimes I don’t hear about changes they make right away.

Of course! It’s fun to work on.

opened an issue https://github.com/ropensci/rgbif/issues/151 to fix this

All good, thanks again Scott.

Okay, changed in rgbif. Reinstall from GitHub:

devtools::install_github("ropensci/rgbif")
library("rgbif")
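After reinstalling, a comparison along these lines should show whether the two totals now agree. This is only a sketch: compare_georef_counts() is a hypothetical wrapper, and it assumes the occ_search() argument is now named hasGeospatialIssue and that occ_count() still accepts georeferenced, as used earlier in this thread. Argument names may differ in other rgbif versions.

```r
# Hypothetical wrapper, not an rgbif function: run both queries for one
# taxon key and return the two totals side by side.
compare_georef_counts <- function(key) {
  searched <- rgbif::occ_search(
    taxonKey = key,
    hasCoordinate = TRUE,
    hasGeospatialIssue = FALSE,  # assumed argument name after the rgbif change
    limit = 20
  )
  counted <- rgbif::occ_count(taxonKey = key, georeferenced = TRUE)
  # occ_search() reports the total number of matches in its metadata
  c(search = searched$meta$count, count = counted)
}

# Needs network access and rgbif installed:
# compare_georef_counts(7264332)
```

If the two numbers still differ slightly, the eventual-consistency point above (the count and search backends briefly drifting apart) is the other thing to rule out.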

Next time don’t take so long.

(Kidding!! Thanks!)


got it, faster next time :blush: