Retrieve data by Kingdom from iDigBio using spocc

Hi,

I’m trying to retrieve species occurrence data for the kingdom “Plantae” from iDigBio and GBIF using the spocc package.

For GBIF, the retrieved data seem fine. However, I’m getting very few records from iDigBio (only 9).
Here is the code:

df <- occ(query = 'Plantae', from = c('gbif', 'idigbio'),
          gbifopts = list(limit = 100000, hasCoordinate=TRUE, hasGeospatialIssue = FALSE, basisOfRecord = 'PRESERVED_SPECIMEN'), 
          idigbioopts = list(limit = 100000, basisOfRecord = 'specimen'),
          geometry = c(-51,-20,-29,0))

I suspect the problem is related to the way occ() searches for the data: every iDigBio record has “plantae” as the value of the “name” field, which should not be the case (see below).

          name longitude  latitude    prov       date                                  key
100001 plantae -41.57575 -12.54865 idigbio 2008-08-06 0589703b-f324-44a2-b6b2-45cb6b12a691
100002 plantae -39.17330 -13.78420 idigbio 2008-08-02 207410b7-c08a-4d29-ad2a-017b3eb5677a
100003 plantae -43.50000 -12.68330 idigbio 2005-04-17 4a9cbfd0-6269-41fa-9c8d-50e3698fbd45
100004 plantae -43.50000 -12.68330 idigbio 2005-04-17 912d9bc7-c8f1-4613-a7b0-2a72637bfc24
100005 plantae -41.46670 -12.45000 idigbio 2005-10-03 94c8c678-4287-4e91-bbdf-113dee58170b
100006 plantae -51.00000 -11.00000 idigbio 1985-10-16 aa4067af-9826-4470-afc5-b6edc2bdb335
100007 plantae -41.46670 -12.45000 idigbio 2005-10-03 ab12ed4c-57cf-42b3-83bd-c9d171003d25
100008 plantae -41.46670 -12.45000 idigbio 2005-02-20 bdb12f98-fa88-4ab1-88fe-9de20f2fc022
100009 plantae -41.46670 -12.45000 idigbio 2005-02-20 d696b1f4-2ed7-48e0-aaf3-d730037777f1

Any solution?

Thank you very much!

Cheers,

Juliana


Hi, thanks for your question @justropp

Sorry about the complicated nature of spocc, but it’s sort of expected when trying to wrangle so many different data sources :slight_smile:

So, with a few slight tweaks, I think this is what you want:

  • For the query, drop the global query parameter and instead search for plantae with the kingdom parameter in each data source.
  • For idigbio:
    • you need to use the rq syntax (sorry about that :frowning:, they require it); see the ridigbio package docs for help with it
    • write the field name basisofrecord in lowercase
    • there is no specimen category, so use preservedspecimen instead
library(spocc)

df <- occ(from = c('gbif', 'idigbio'), limit = 100,
          gbifopts = list(hasCoordinate = TRUE, hasGeospatialIssue = FALSE, basisOfRecord = 'PRESERVED_SPECIMEN', kingdom = "plantae"),
          idigbioopts = list(rq = list(basisofrecord = 'preservedspecimen', kingdom = "plantae")),
          geometry = c(-51, -20, -29, 0))

This returns 200 results in total, 100 from each source:

df
#> Searched: gbif, idigbio
#> Occurrences - Found: 2,217,707, Returned: 200
#> Search type: Geometry
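If you need the same search for several kingdoms, the per-source rules listed above (a plain kingdom argument for GBIF; an rq list with lowercase field names and values for iDigBio) can be collected in one helper. This is only a minimal sketch: kingdom_opts() is a hypothetical name, not a spocc function, and it assumes the option names used in the occ() call above remain valid for your versions of spocc, rgbif and ridigbio.

```r
# Hypothetical helper (not part of spocc): builds the per-source option
# lists used in the occ() call above for a given kingdom.
kingdom_opts <- function(kingdom) {
  kingdom <- tolower(kingdom)
  list(
    # GBIF takes kingdom and basisOfRecord as ordinary search parameters
    gbifopts = list(
      hasCoordinate = TRUE,
      hasGeospatialIssue = FALSE,
      basisOfRecord = "PRESERVED_SPECIMEN",
      kingdom = kingdom
    ),
    # iDigBio expects an rq list with lowercase field names and values
    idigbioopts = list(
      rq = list(basisofrecord = "preservedspecimen", kingdom = kingdom)
    )
  )
}

opts <- kingdom_opts("Plantae")
```

With spocc loaded, the lists could then be spliced into the call with something like do.call(occ, c(list(from = c('gbif', 'idigbio'), limit = 100, geometry = c(-51, -20, -29, 0)), kingdom_opts('plantae'))).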

Each source returned only records from the kingdom Plantae:

unique(df$gbif$data[[1]]$kingdom)
#> [1] "Plantae"
unique(df$idigbio$data[[1]]$kingdom)
#> [1] "plantae"
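Note that GBIF reports the kingdom capitalized ("Plantae") while iDigBio reports it in lowercase, so a check that should work for both sources needs to compare case-insensitively. A minimal sketch, assuming each source's data frame has a kingdom column as in the output above; all_in_kingdom() is a hypothetical helper, and the toy data frames only stand in for df$gbif$data[[1]] and df$idigbio$data[[1]].

```r
# Hypothetical helper (not part of spocc): TRUE if there is at least one
# non-missing kingdom value and every such value matches `kingdom`,
# ignoring case.
all_in_kingdom <- function(data, kingdom) {
  values <- data$kingdom[!is.na(data$kingdom)]
  length(values) > 0 && all(tolower(values) == tolower(kingdom))
}

# Toy stand-ins for df$gbif$data[[1]] and df$idigbio$data[[1]]
gbif_toy <- data.frame(kingdom = c("Plantae", "Plantae"))
idigbio_toy <- data.frame(kingdom = c("plantae", "plantae", NA))

all_in_kingdom(gbif_toy, "plantae")
all_in_kingdom(idigbio_toy, "Plantae")
```

On real results you would call, for example, all_in_kingdom(df$idigbio$data[[1]], "plantae").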

Hi Sckott,

Thank you very much again for the help! I now understand better how spocc works, and I was able to get the data I was looking for.

All the best,

Juliana
