Rgbif argument question

I’m curious if the rgbif package can retrieve a particular piece of information for me. When you use the website, you can get to a page of information for each record (example here: http://www.gbif.org/occurrence/918957709), which includes a section called ‘Occurrence Details’ and a field called ‘Remarks’. So my question is, can I pull down the information in the ‘Remarks’ field in an occ_search() call? I don’t see an argument listed for that…

Thanks!

Hi @kgturner - yep, you can most def get those fields in rgbif

e.g., try http://api.gbif.org/v1/occurrence/918957709 , which gives

{
key: 918957709,
datasetKey: "07fd0d79-4883-435f-bba1-58fef110cd13",
publishingOrgKey: "b542788f-0dc2-4a2b-b652-fceced449591",
publishingCountry: "CA",
protocol: "DWC_ARCHIVE",
lastCrawled: "2014-06-18T18:19:41.557+0000",
lastParsed: "2014-06-18T18:19:41.597+0000",
extensions: { },
basisOfRecord: "PRESERVED_SPECIMEN",
taxonKey: 3128962,
kingdomKey: 6,
phylumKey: 49,
classKey: 220,
orderKey: 414,
familyKey: 3065,
genusKey: 3127469,
scientificName: "Centaurea diffusa Lam.",
kingdom: "Plantae",
phylum: "Magnoliophyta",
order: "Asterales",
family: "Asteraceae",
genus: "Centaurea",
genericName: "Centaurea",
specificEpithet: "diffusa",
taxonRank: "SPECIES",
dateIdentified: "2009-12-31T23:00:00.000+0000",
decimalLongitude: -123.25143,
decimalLatitude: 49.26304,
stateProvince: "British Columbia",
year: 2010,
month: 2,
day: 18,
eventDate: "2010-02-17T23:00:00.000+0000",
issues: [
"COORDINATE_ROUNDED",
"GEODETIC_DATUM_INVALID",
"GEODETIC_DATUM_ASSUMED_WGS84"
],
modified: "2012-02-29T23:00:00.000+0000",
lastInterpreted: "2014-06-18T18:21:15.861+0000",
references: "http://bridge.botany.ubc.ca/herbarium/details.php?db=vwsp.fmp12&layout=vwsp_web_details&recid=195355&ass_num=V232687",
identifiers: [ ],
facts: [ ],
relations: [ ],
geodeticDatum: "WGS84",
class: "Magnoliopsida",
countryCode: "CA",
country: "Canada",
recordNumber: "RU008-16K",
rights: "http://creativecommons.org/publicdomain/zero/1.0/ & http://www.canadensys.net/norms",
rightsHolder: "University of British Columbia",
type: "PhysicalObject",
identifiedBy: "K.G. Turner",
catalogNumber: "V232687",
recordedBy: "K.G. Turner, K. Nurkowski",
bibliographicCitation: "Centaurea diffusa Lam. (UBC V232687)",
preparations: "mounted",
collectionID: "urn:lsid:biocol.org:col:15106",
nomenclaturalCode: "ICBN",
previousIdentifications: "Centaurea diffusa by K.G. Turner 2010",
verbatimSRS: "unknown",
occurrenceID: "153925",
ownerInstitutionCode: "University of British Columbia",
collectionCode: "UBC",
occurrenceRemarks: "Seed collected: 1119, 2006, A. Shipunov, Pjatigorsk City Russia, N43˚3'36'' E44˚3'0'', no alt, 1 sheet, white flowers. Voucher for a survey of phenotypic variation across the native and invaded ranges, in a common environment.",
gbifID: "918957709",
habitat: "Greenhouse Conditions, 16 hr days, water every third day",
institutionCode: "University of British Columbia",
datasetID: "http://dataset.canadensys.net/ubc-vascular-specimens",
datasetName: "University of British Columbia Herbarium (UBC) - Vascular Plant Collection",
locality: "Vancouver, University of British Columbia Horticultural Greenhouse, 6394 Stores Road University of British Columbia",
language: "en",
identifier: "153925"
}

Or in rgbif

library("rgbif")
occ_get(918957709, fields = "all")$data
       name       key decimalLatitude decimalLongitude                issues
1 Centaurea 918957709        49.26304        -123.2514 cdround,gass84,gdativ
                            datasetKey                     publishingOrgKey publishingCountry    protocol
1 07fd0d79-4883-435f-bba1-58fef110cd13 b542788f-0dc2-4a2b-b652-fceced449591                CA DWC_ARCHIVE
                   lastCrawled                   lastParsed extensions      basisOfRecord taxonKey
1 2014-06-18T18:19:41.557+0000 2014-06-18T18:19:41.597+0000       none PRESERVED_SPECIMEN  3128962
  kingdomKey phylumKey classKey orderKey familyKey genusKey         scientificName kingdom        phylum
1          6        49      220      414      3065  3127469 Centaurea diffusa Lam. Plantae Magnoliophyta
      order     family     genus genericName specificEpithet taxonRank               dateIdentified
1 Asterales Asteraceae Centaurea   Centaurea         diffusa   SPECIES 2009-12-31T23:00:00.000+0000
     stateProvince year month day                    eventDate                     modified
1 British Columbia 2010     2  18 2010-02-17T23:00:00.000+0000 2012-02-29T23:00:00.000+0000
               lastInterpreted
1 2014-06-18T18:21:15.861+0000
                                                                                                            references
1 http://bridge.botany.ubc.ca/herbarium/details.php?db=vwsp.fmp12&layout=vwsp_web_details&recid=195355&ass_num=V232687
  identifiers facts relations geodeticDatum         class countryCode country recordNumber
1        none  none      none         WGS84 Magnoliopsida          CA  Canada    RU008-16K
                                                                               rights
1 http://creativecommons.org/publicdomain/zero/1.0/ & http://www.canadensys.net/norms
                    rightsHolder           type identifiedBy catalogNumber                recordedBy
1 University of British Columbia PhysicalObject  K.G. Turner       V232687 K.G. Turner, K. Nurkowski
                 bibliographicCitation preparations                  collectionID nomenclaturalCode
1 Centaurea diffusa Lam. (UBC V232687)      mounted urn:lsid:biocol.org:col:15106              ICBN
                previousIdentifications verbatimSRS occurrenceID           ownerInstitutionCode
1 Centaurea diffusa by K.G. Turner 2010     unknown       153925 University of British Columbia
  collectionCode
1            UBC
                                                                                                                                                                                                                   occurrenceRemarks
1 Seed collected: 1119, 2006, A. Shipunov, Pjatigorsk City Russia, N43˚3'36'' E44˚3'0'', no alt, 1 sheet, white flowers. Voucher for a survey of phenotypic variation across the native and invaded ranges, in a common environment.
     gbifID                                                  habitat                institutionCode
1 918957709 Greenhouse Conditions, 16 hr days, water every third day University of British Columbia
                                             datasetID
1 http://dataset.canadensys.net/ubc-vascular-specimens
                                                                 datasetName
1 University of British Columbia Herbarium (UBC) - Vascular Plant Collection
                                                                                                             locality
1 Vancouver, University of British Columbia Horticultural Greenhouse, 6394 Stores Road University of British Columbia
  language identifier
1       en     153925

Ok, great.

In a related question (apologies for non-pretty post), how do I get occ_search and/or occ_get to search for a list of taxon keys? The documentation says “You can pass many keys by passing occ_search in a call to an lapply-family function (see last example below).” But I don’t see this example (certainly it’s not the last one), unless it refers to

# Search for many species
splist <- c('Cyanocitta stelleri', 'Junco hyemalis', 'Aix sponsa')
keys <- sapply(splist, function(x) name_suggest(x)$key[1], USE.NAMES=FALSE)
occ_search(taxonKey=keys, limit=5, return='data')

…I’m not sure what the sapply function is supposed to do there, but it doesn’t seem to work if you give it a vector of taxon keys instead of species names. The vector of taxon keys doesn’t work directly in occ_search either.

This is what I want to do, ideally returning a data frame or something I can turn into a dataframe:

ghouse <- c(918957709, 466334237, 918957719, 466334233, 918957739, 466334221)
ghousedat <- occ_search(taxonKey=ghouse,
fields=c("gbifID","species", "basisOfRecord","year","eventDate","countryCode",
"decimalLatitude","decimalLongitude", "genus","specificEpithet","collectionCode","institutionCode","locality","datasetKey","recordNumber","occurrenceRemarks"), return="data")

What I get is a list of empties:

head(ghousedat)
$`918957709`
[1] "no data found, try a different search"

$`466334237`
[1] "no data found, try a different search"

$`918957719`
[1] "no data found, try a different search"

$`466334233`
[1] "no data found, try a different search"

$`918957739`
[1] "no data found, try a different search"

$`466334221`
[1] "no data found, try a different search"

If I use occ_get, I get data, but a list of lists:

test2 <- occ_get(ghouse, fields = c("gbifID","species", "basisOfRecord","year","eventDate","countryCode",
"decimalLatitude", "decimalLongitude", "genus","specificEpithet","collectionCode",
"institutionCode","locality", "datasetKey","recordNumber","occurrenceRemarks"))

head(test2)
[[1]]
[[1]]$hierarchy
name key rank
1 Plantae 6 kingdom
2 Magnoliophyta 49 phylum
3 Magnoliopsida 220 class
4 Asterales 414 order
5 Asteraceae 3065 family
6 Centaurea 3127469 genus

[[1]]$media
list()

[[1]]$data
datasetKey basisOfRecord genus specificEpithet decimalLongitude decimalLatitude year eventDate
1 07fd0d79-4883-435f-bba1-58fef110cd13 PRESERVED_SPECIMEN Centaurea diffusa -123.2514 49.26304 2010 2010-02-17T23:00:00.000+0000
countryCode recordNumber collectionCode
1 CA RU008-16K UBC
occurrenceRemarks
1 Seed collected: 1119, 2006, A. Shipunov, Pjatigorsk City Russia, N43˚3'36'' E44˚3'0'', no alt, 1 sheet, white flowers. Voucher for a survey of phenotypic variation across the native and invaded ranges, in a common environment.
gbifID institutionCode
1 918957709 University of British Columbia
locality
1 Vancouver, University of British Columbia Horticultural Greenhouse, 6394 Stores Road University of British Columbia

So ok, what should I do?

Ah, one thing wrong here is that occurrence keys are different from taxon keys. The key 918957709 in http://www.gbif.org/occurrence/918957709 is an occurrence key, which you can pass to e.g. occ_get(), but not to occ_search(): its taxonKey parameter wants taxonomic keys, like 3128962 in http://www.gbif.org/species/3128962

more on the outputs later…

Ah, ok, I’m definitely confused about the difference between taxonKey, occurrence key, and gbifID. Are they all different things?

Maybe this helps in thinking about the various keys:

  • Every taxonomic name has a taxonKey, but that key can denote any taxonomic level: class, order, genus, species, etc.
  • Each taxonKey can have any number of occurrences, each denoted by its own key (see below). The taxonKey you search on should match the taxonKey returned by the occ_search() function.
  • gbifID I usually ignore, but it seems to be the same as the occurrence key
  • There’s another key called a nubKey that’s returned by the name_*() functions, which is usually, if not always, the same as the taxonKey

occ_search(3128962, limit=3, fields=c('name','key','taxonKey','gbifID'))
Records found [1520] 
Records returned [3] 
No. unique hierarchies [1] 
No. media records [1] 
Args [taxonKey=3128962, limit=3, offset=0, fields=name,key,taxonKey,gbifID] 
First 10 rows of data

       name        key taxonKey     gbifID
1 Centaurea 1038571715  3128962 1038571715
2 Centaurea  920996089  3128962  920996089
3 Centaurea  859763858  3128962  859763858
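
A quick way to check the relationship between these keys on returned data, as a rough sketch: the data frame below is typed in by hand from the three rows printed above (it does not call the GBIF API), and gbifID is assumed to come back as a string, as it appears in the JSON earlier in the thread.

```r
# Rows copied by hand from the occ_search() output above; no network call is made.
dat <- data.frame(
  name     = "Centaurea",
  key      = c(1038571715, 920996089, 859763858),
  taxonKey = 3128962,
  # assumption: gbifID is a string, as in the JSON shown earlier
  gbifID   = c("1038571715", "920996089", "859763858"),
  stringsAsFactors = FALSE
)

# Does every occurrence key match its gbifID?
same_id <- as.character(dat$key) == dat$gbifID
all(same_id)

# How many distinct taxon keys do these occurrences share?
length(unique(dat$taxonKey))
```

For these three rows the occurrence key and gbifID agree and all records share one taxonKey, which fits the description above, though three rows are not proof that it always holds.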

I’ll see if I can make the documentation more clear on all this…

@kgturner For combining outputs of functions:

works the same for occ_get() and occ_search()

library("plyr")
out <- occ_get(c(918957709, 466334237))
rbind.fill(lapply(out, "[[", "data"))
       name       key decimalLatitude decimalLongitude                issues
1 Centaurea 918957709        49.26304        -123.2514 cdround,gass84,gdativ
2 Centaurea 466334237        49.26304        -123.2514        cdround,gass84
res <- occ_search(taxonKey=c(2482598,2492010,2498387), limit=5, fields="minimal")
res$`2482598`$data
rbind.fill(lapply(res, "[[", "data"))
                  name        key decimalLatitude decimalLongitude              issues
1  Cyanocitta stelleri  891781350        37.73646       -122.48801 cdround,cudc,gass84
2  Cyanocitta stelleri  891046529        32.82392       -116.53233 cdround,cudc,gass84
3  Cyanocitta stelleri  891056081        37.76811       -122.47370 cdround,cudc,gass84
4  Cyanocitta stelleri  891046151        19.29372        -98.65598 cdround,cudc,gass84
5  Cyanocitta stelleri  891047537        37.86877       -122.23729 cdround,cudc,gass84
6       Junco hyemalis  891036043        45.56906       -122.54434 cdround,cudc,gass84
7       Junco hyemalis  891036414        44.32038        -73.09067 cdround,cudc,gass84
8       Junco hyemalis  891036487        37.87244       -122.25864 cdround,cudc,gass84
9       Junco hyemalis  891036897        32.89302       -117.24204 cdround,cudc,gass84
10      Junco hyemalis  891035392        44.24846        -73.07934 cdround,cudc,gass84
11          Aix sponsa 1022841740        50.55339          7.26308 cdround,cudc,gass84
12          Aix sponsa  920269599        51.76471          9.05524      cdround,gass84
13          Aix sponsa  920269677        49.82962          9.86967      cdround,gass84
14          Aix sponsa  891035006        32.74783        -97.35127 cdround,cudc,gass84
15          Aix sponsa  891035458        34.23871       -116.94964 cdround,cudc,gass84

Does that make sense? There are convenience functions in spocc to combine data outputs for users, but I guess we don’t have these in rgbif.
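
If plyr isn’t installed, the same combining step can be roughly sketched in base R. The `out` list here is a made-up stand-in for what occ_get() returned earlier in the thread (one element per occurrence key, each holding a $data data frame), and `bind_fill` is a hypothetical helper name, not something from rgbif or plyr.

```r
# Hypothetical stand-in for occ_get() output on two occurrence keys:
# one list element per occurrence, each with a $data data frame.
out <- list(
  list(data = data.frame(key = 918957709, name = "Centaurea",
                         issues = "cdround,gass84,gdativ",
                         stringsAsFactors = FALSE)),
  list(data = data.frame(key = 466334237, name = "Centaurea",
                         stringsAsFactors = FALSE))
)

# Bind data frames row-wise, filling columns a piece lacks with NA
# (a base-R approximation of what plyr::rbind.fill does).
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[, all_cols, drop = FALSE]
  })
  do.call(rbind, padded)
}

# Pull the $data element out of each result, then stack them.
combined <- bind_fill(lapply(out, "[[", "data"))
print(combined)
```

The `lapply(out, "[[", "data")` part is the same extraction used with rbind.fill above; only the binding step is swapped out, so it should apply to a list of occ_search() results in the same way.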