Date range searches for biodiversity data across data sources with spocc

spocc
r

#1

spocc is an R client for fetching species occurrence records from many different data sources.

We attempt to make it as easy as possible to do the same operation across all data sources. For example, you can set whether you want back only records with lat/long coordinates across data sources without having to know the internals of what each data source wants you to do (and they often differ). There are many other examples.


A user just asked about how to do date-based searching with a single data source. I started looking into it, and it made the most sense to simply implement date-based searching across all data sources.

The caveat right now is that spocc only supports date range searches with two dates (a start date and an end date). There are of course other ways to search with dates, but I thought I'd start with what I assume is the most common use case for date-based searching.
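
Since the date argument expects exactly two dates, it can help to check the pair before sending a query. The following is a minimal sketch in base R, not something spocc provides: `check_date_range` is a hypothetical helper, and it assumes plain YYYY-MM-DD strings.

```r
# Hypothetical helper (not part of spocc): validate a start/end date pair
# before handing it to the `date` argument of occ().
check_date_range <- function(date) {
  if (length(date) != 2) {
    stop("`date` must have exactly two elements: a start and an end date")
  }
  # With an explicit format, as.Date() returns NA for strings it cannot parse
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (anyNA(parsed)) {
    stop("both dates must be in YYYY-MM-DD format")
  }
  if (parsed[1] > parsed[2]) {
    stop("the start date must not be later than the end date")
  }
  # Return the dates as character strings, ready to pass along
  format(parsed, "%Y-%m-%d")
}

rng <- check_date_range(c("2010-08-08", "2010-08-21"))
rng
```

The checked vector could then be passed as the date argument, for example occ(query = 'Acer', date = rng, from = 'bison', limit = 5).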

install

Install the development version from GitHub. It requires some dependencies that are also development versions, which needed fixes for date searches to work.

remotes::install_github("ropensci/spocc")

date range searches

bison

occ(query = 'Acer', date = c('2010-08-08', '2010-08-21'), from = 'bison', limit=5)
#> Occurrences - Found: 570,483, Returned: 5
#> Search type: Scientific
#>   bison: Acer (5)

ala

occ(query = 'Alaba', date = c('2010-01-01T00:00:00Z', '2017-12-31T00:00:00Z'), from = 'ala', limit = 5)
#> Searched: ala
#> Occurrences - Found: 0, Returned: 4
#> Search type: Scientific
#>   ala: Alaba (4)
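
The ala example above passes full timestamps rather than plain dates. If you start from plain dates, strings in that shape can be built in base R. This is only a sketch: `to_midnight_utc_string` is a hypothetical helper, not part of spocc, and it simply appends a midnight UTC time to each date.

```r
# Hypothetical helper (not part of spocc): turn plain dates into
# timestamp strings of the form used in the ala example above.
to_midnight_utc_string <- function(date) {
  # "T00:00:00Z" is literal text in the format string; only %Y, %m, %d expand
  format(as.Date(date), "%Y-%m-%dT00:00:00Z")
}

ala_dates <- to_midnight_utc_string(c("2010-01-01", "2017-12-31"))
ala_dates
```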

gbif

occ(query = 'Accipiter striatus', date = c('2010-08-01', '2010-08-31'), from = 'gbif', limit=5)
#> Searched: gbif
#> Occurrences - Found: 1,044, Returned: 5
#> Search type: Scientific
#>   gbif: Accipiter striatus (5)

ecoengine

occ(date = c('2010-01-01', '2010-12-31'), from = 'ecoengine', limit=5)
#> Searched: ecoengine
#> Occurrences - Found: 41,026, Returned: 5
#> Search type: Scientific
#>   ecoengine: custom query (5)

antweb

occ(query = "acanthognathus", date = c('2010-01-01', '2010-12-31'), from = 'antweb', limit=5)
#> Searched: antweb
#> Occurrences - Found: 8, Returned: 5
#> Search type: Scientific
#>   antweb: acanthognathus (5)

vertnet

occ(query = 'Mustela nigripes', date = c('1990-01-01', '2015-12-31'), from = 'vertnet', limit=5)
#> Searched: vertnet
#> Occurrences - Found: 49, Returned: 5
#> Search type: Scientific
#>   vertnet: Mustela nigripes (5)

idigbio

occ(query = 'Acer', date = c('2010-01-01', '2015-12-31'), from = 'idigbio', limit=5)
#> Searched: idigbio
#> Occurrences - Found: 13, Returned: 5
#> Search type: Scientific
#>   idigbio: Acer (5)

obis

occ(query = 'Mola mola', date = c('2015-01-01', '2015-12-31'), from = 'obis', limit=5)
#> Searched: obis
#> Occurrences - Found: 456, Returned: 5
#> Search type: Scientific
#>   obis: Mola mola (5)

inat

occ(query = 'Danaus plexippus', date = c('2015-01-01', '2015-12-31'), from = 'inat', limit=5)
#> Searched: inat
#> Occurrences - Found: 4,882, Returned: 5
#> Search type: Scientific
#>   inat: Danaus plexippus (5)

#2

Scott, thanks again for implementing this feature.

In beta testing the dev version from GitHub, I've noticed that the BISON results may not be listening to parameters.

i.e. Running occ("Bromus tectorum", from = "bison") only returns results from eBird...

Just wanted to give you a heads up if you hadn't already run into this error!

-Peder


#3

Thanks for the feedback. Can you include the code you ran from start to finish and the output? I just ran the below and all seems fine:

library(spocc)
packageVersion('spocc')
#> `0.7.3.9310`
x <- occ('Bromus tectorum', from='bison')
x$bison
#> Species [Bromus tectorum (500)]
#> First 10 rows of [Bromus_tectorum]
#> 
#> # A tibble: 500 x 42
#>    name  longitude latitude prov  date       providedScienti…  year countryCode
#>    <chr>     <dbl>    <dbl> <chr> <date>     <chr>            <int> <chr>
#>  1 Zono…     -117.     34.1 bison NA         Zonotrichia leu…  2014 US
#>  2 Tyra…     -117.     34.1 bison 1982-01-05 Tyrannus vocife…  1982 US
#>  3 Stur…     -117.     34.1 bison 2011-02-05 Sturnus vulgari…  2011 US
#>  4 Stur…     -117.     34.1 bison 2006-01-10 Sturnus vulgari…  2006 US
#>  5 Cont…     -117.     34.1 bison NA         Contopus sordid…  1987 US
#>  6 Acti…     -117.     34.1 bison NA         Actitis macular…  2003 US
#>  7 Cath…     -117.     34.1 bison NA         Catharus guttat…  2013 US
#>  8 Bute…     -117.     34.1 bison 2010-08-05 Buteo jamaicens…  2010 US
#>  9 Caly…     -117.     34.1 bison 2014-03-05 Calypte costae …  2014 US
#> 10 Stre…     -117.     34.1 bison 2014-06-04 Streptopelia de…  2014 US
#> # ... with 490 more rows, and 34 more variables: providedCounty <chr>,
#> #   ambiguous <lgl>, verbatimLocality <chr>, latlon <chr>,
#> #   computedCountyFips <chr>, occurrenceID <chr>, basisOfRecord <chr>,
#> #   providedCommonName <chr>, ownerInstitutionCollectionCode <chr>,
#> #   institutionID <chr>, computedStateFips <chr>, license <chr>, TSNs <chr>,
#> #   providerID <int>, stateProvince <chr>, higherGeographyID <chr>,
#> #   verbatimEventDate <chr>, coordinatePrecision <chr>,
#> #   verbatimElevation <chr>, recordedBy <chr>, geo <chr>, provider <chr>,
#> #   calculatedCounty <chr>, verbatimDepth <chr>, catalogNumber <chr>,
#> #   ITISscientificName <chr>, coordinateUncertaintyInMeters <chr>,
#> #   pointPath <chr>, kingdom <chr>, calculatedState <chr>,
#> #   hierarchy_homonym_string <chr>, collectorNumber <chr>, resourceID <chr>,
#> #   ITIStsn <chr>

In this particular search, all of the data from bison comes from a single provider:

unique(x$bison$data$Bromus_tectorum$provider)
#> [1] "Cornell Lab of Ornithology"


#4

An important point of clarification: Bromus tectorum is a plant, so I was not expecting avian species to be returned in the results. That is what seems to be the issue.
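
One quick way to spot this kind of mismatch is to compare the returned names against the query. Below is a minimal sketch in base R; `matches_query` is a hypothetical helper, and the names are toy values for illustration rather than real query output.

```r
# Hypothetical helper (not part of spocc): TRUE where a record name
# starts with the queried taxon, ignoring case.
matches_query <- function(names, query) {
  startsWith(tolower(names), tolower(query))
}

# Toy values for illustration only, not real query output
toy_names <- c("Bromus tectorum", "Calypte costae", "Bromus tectorum L.")
ok <- matches_query(toy_names, "Bromus tectorum")

# Names that do not match the query
toy_names[!ok]
```

With real results, the names to check would come from the name column, e.g. x$bison$data$Bromus_tectorum$name.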


#5

Sorry about that, an error was recently introduced. Try again after reinstalling spocc with remotes::install_github("ropensci/spocc").

Now I get

Species [Bromus tectorum (10)]
First 10 rows of [Bromus_tectorum]

# A tibble: 10 x 46
   name  longitude latitude prov  date       providedScienti…  year countryCode
   <chr>     <dbl>    <dbl> <chr> <date>     <chr>            <int> <chr>
 1 Brom…     -122.     41.3 bison NA         Bromus tectorum…  1923 US
 2 Brom…     -121.     46.7 bison 1980-01-06 Bromus tectorum…  1980 US
 3 Brom…     -121.     46.7 bison 1980-01-06 Bromus tectorum…  1980 US
 4 Brom…     -121.     46.7 bison 1980-01-06 Bromus tectorum…  1980 US
 5 Brom…     -121.     46.7 bison 1980-01-06 Bromus tectorum…  1980 US
 6 Brom…     -121.     46.7 bison 1980-01-06 Bromus tectorum…  1980 US
 7 Brom…     -121.     46.7 bison 1993-01-06 Bromus tectorum…  1993 US
 8 Brom…     -121.     46.7 bison 1980-01-06 Bromus tectorum…  1980 US
 9 Brom…     -113.     38.6 bison 2020-10-02 Bromus tectorum…  2002 US
10 Brom…     -113.     38.7 bison 2020-10-02 Bromus tectorum…  2002 US

#6

Yep. Working smoothly over here too. Thanks for the fix!