GBIF data for Magnoliophyta back in Oct 13, 2010?


#1

Question from Nora https://twitter.com/laguanegna/status/659345136841793536

Is there a way I can tell GBIF to show what was the data they had for Magnoliophyta back in Oct 13/2010?

I assume two things:

  • you don’t mean specimens/records here, but taxa within Magnoliophyta
  • by 2010-10-13, you mean as of that date, not collected on that date

If yes to both, I’d approach it this way:

Search for Magnoliophyta against GBIF backbone taxonomy

name <- name_backbone(name = "Magnoliophyta")
name$usageKey
#> [1] 49

Search for occurrences (could use occ_count(), but year based searching isn’t working right now)

res <- occ_search(taxonKey = name$usageKey, year = "*,2010", limit = 10)
res$meta
#> $offset
#> [1] 0
#> 
#> $limit
#> [1] 10
#> 
#> $endOfRecords
#> [1] FALSE
#> 
#> $count
#> [1] 94648899

A total of 94648899 records. As far as I know, you can’t get more detailed than that with the date searching.

To actually get records, don’t use occ_search(); use the occ_download*() functions instead. They do the same thing as requesting a download on the GBIF website, but from R. For example:

occ_download('taxonKey = 49', 'year <= 2010')

which will ask GBIF to prepare a download for you. You could then take these results and filter further to the specific date, October 13, 2010. It will be a lot of data, so it is probably best to load it into a database first. I can help if you want.
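A minimal sketch of that final filtering step, assuming the download has been read into a data frame with an eventDate column of ISO 8601 strings. The toy data frame `occ` below is made up for illustration and is not GBIF output. Note that eventDate is the collection date, so this does not tell you when a record was published to GBIF.

```r
# Toy stand-in for an imported download; real downloads have many more columns
occ <- data.frame(
  gbifID = c(1, 2, 3, 4),
  eventDate = c("2009-05-01", "2010-10-13", "2010-10-14T00:00:00", NA),
  decimalLatitude = c(4.6, 10.2, -3.1, 51.5),
  decimalLongitude = c(-74.1, -66.9, 36.8, -0.1),
  stringsAsFactors = FALSE
)

# Keep only the date part (first 10 characters) and parse it
occ$date <- as.Date(substr(occ$eventDate, 1, 10), format = "%Y-%m-%d")

cutoff <- as.Date("2010-10-13")

# Records collected on or before the cutoff; rows with no date are dropped
occ_upto <- occ[!is.na(occ$date) & occ$date <= cutoff, ]

# Records collected exactly on the cutoff date
occ_on <- occ[!is.na(occ$date) & occ$date == cutoff, ]
```

For a full Magnoliophyta download the same comparison would more likely be run as a database query than in memory.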


#2

Scott,

This is great! I was trying to do this (but for the 13th of Oct):

occ_download('taxonKey = 49', 'year <= 2010')

I only need coordinates, as I want to reproduce the following map (yes, it’s a blurry screenshot I took five years ago!) and then calculate the average count per degree cell per country.

I’m happy if you have suggestions to produce this.

Thanks again,

Nora


#3

So you do want coordinates, not just a count of records?

You can use the GBIF mapping service (http://www.gbif.org/developer/maps). Or do you want to make the map in R or another tool?

I think we can maybe do this with the count API. One way is to use the count_facet() function, which is a wrapper around their count API. For example, here we get counts by country:

count_facet(keys = 49, by='country', countries=3, removezeros = TRUE)
#>   .id country    V1
#> 1  49      AD 71127
#> 2  49      AE  2099
#> 3  49      AF 52946

You can also feed in specific country codes. Does this do what you want, or do you want counts per degree cell within each country? If so, as far as I know, they don’t provide that data. I think they used to provide it in their old API, but no longer. I asked on the mailing list; see the question at http://lists.gbif.org/pipermail/api-users/2015-October/000239.html
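If the API can’t return counts per degree cell, one option is to compute them yourself from downloaded coordinates. Here is a rough sketch in base R, assuming a data frame with countryCode, decimalLatitude and decimalLongitude columns. The `pts` data below is made up, and only cells that contain at least one record enter the average.

```r
# Made-up coordinates standing in for downloaded GBIF records
pts <- data.frame(
  countryCode = c("CO", "CO", "CO", "KE", "KE"),
  decimalLatitude = c(4.6, 4.2, 5.9, -1.3, -1.7),
  decimalLongitude = c(-74.1, -74.8, -73.2, 36.8, 36.2),
  stringsAsFactors = FALSE
)

# Assign each record to a 1-degree cell, identified by its south-west corner
pts$cell_lat <- floor(pts$decimalLatitude)
pts$cell_lon <- floor(pts$decimalLongitude)

# Count records per country and cell
pts$n <- 1
cell_counts <- aggregate(n ~ countryCode + cell_lat + cell_lon,
                         data = pts, FUN = sum)

# Average record count per occupied cell, by country
country_mean <- aggregate(n ~ countryCode, data = cell_counts, FUN = mean)
```

Cells that straddle a border are counted once per country here, because grouping is by country code and cell together.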


#4

@ncastaneda The search-by-grid-cell feature is gone. It used to be available in the old API, but was too computationally expensive for them. I just asked, and it may return in the future.


#5

Hi Scott,

Thanks for answering and helping to find solutions.

I’m exploring whether count_facet can show data from a specific date (e.g., 13 Oct 2010).

Cheers!

Nora


#6

Dear Scott,

Reading this thread again, I think I have not explained myself correctly.

I was initially planning to download only coordinates for Magnoliophyta for the data available through GBIF by the 15th of October, 2010. Then I would produce a density raster (similar to the one shown in the screenshot I shared above) and explore the average occurrence record count per grid cell for all countries in the world.

I explored the count_facet function, but it does not let me set dates. I just tried to download the complete set of georeferenced records (but I guess this will be too slow).

Cheers,

Nora


#7

Did you use occ_search(), occ_download(), or the GBIF website? With occ_search() you can only get up to 200,000 records, so you probably want occ_download(). Let me know if you need help with that.


#8

Dear Scott

I used the GBIF website for this, although I’m aware of the limitations of occ_search() and thus I always use occ_download(). I will let you know how it goes.

Nora


#9

Following on from the question on Magnoliophyta above, I wonder whether GBIF data allows one to look at the change in densities of a given species over time. Say, the density of Magnoliophyta species in one given region over the last decade? Thanks! I found out about GBIF only today and I find this a fascinating idea.


#10

Thanks for your question, @PWaryszak

Can you explain a bit more what you are getting at? Not clear to me what you mean by density. Abundance? Richness?


#11

Thank you for the lightning-fast response. I am thinking of a plot-based scenario here. I think of density as the number of stems/individuals per 1 m² (but it can be any regular cell size really).


#12

The first thing that comes to mind is searching for datasets that collected data in plots, e.g.,

res <- dataset_search(query = "plots", limit=200)
res$data
# A tibble: 116 x 8
                                                                       datasetTitle
                                                                              <chr>
 1                            Data from vegetation plots at Atiquipa, Southern Peru
 2                    Species plots from the Norwegian Vegetation Mapping Programme
 3 Species checklist on the permanent sample plot (Prioksko-Terrasnyi Biosphere Res
 4                           2010-2013 Beetle Data from Machair LIFE+ Project Plots
 5  (Table 4) Acari abundance in control and warming plots, Abisco Research Station
 6 Alien plant presence dataset from the point-radius plot surveys in 2010-2015 in
 7                                                                   IPAS Kitanglad
 8                                                       SuLaMa reptile survey 2013
 9 Inventory of natural and agroforestry stands characterized by Xylopia aethiopica
10 Lama Forest reserve Inventory, South Benin. Data published in the framework of J
# ... with 106 more rows, and 7 more variables: datasetKey <chr>, type <chr>,
#   hostingOrganization <chr>, hostingOrganizationKey <chr>,
#   publishingOrganization <chr>, publishingOrganizationKey <chr>,
#   publishingCountry <chr>

descriptions gets you the long-form description of each dataset

res$descriptions

Then you can manually or programmatically filter through them to see which datasets you want to work with.
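A small sketch of the programmatic route, assuming the descriptions are available as a list of strings named by dataset title, with some entries possibly missing. The `descriptions` object and the search terms below are made up for illustration, not real GBIF output.

```r
# Made-up descriptions, named by dataset title (not real GBIF output)
descriptions <- list(
  "Forest inventory A" = "Stem counts in permanent 1 m2 plots, resurveyed yearly.",
  "Bird survey B" = "Opportunistic sightings along coastal transects.",
  "Grassland C" = NULL,
  "Vegetation D" = "Cover estimates recorded in 10 x 10 m quadrats."
)

# Datasets without a description become NA so positions stay aligned
desc_chr <- vapply(
  descriptions,
  function(x) if (is.null(x)) NA_character_ else as.character(x),
  character(1)
)

# Case-insensitive match on words suggesting plot-based sampling;
# grepl() returns FALSE for NA, so undescribed datasets are skipped
keep <- grepl("plot|quadrat", desc_chr, ignore.case = TRUE)
plot_titles <- names(desc_chr)[keep]
```

A keyword match like this is only a first pass. It will miss plot-based datasets that describe their sampling in other words, so a manual read of the shortlisted descriptions is still worthwhile.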

There’s no filter or flag to search for data collected in plots, so this seems like the quickest way.

Once you pick datasets, you can query on species or countries, etc., and include the dataset key like

> occ_data(datasetKey = "5accf920-492e-4641-9ba2-11481c116419")
Records found [9054]
Records returned [500]
Args [limit=500, offset=0, datasetKey=5accf920-492e-4641-9ba2-11481c116419]
# A tibble: 500 x 54
                   name        key decimalLatitude decimalLongitude issues
                  <chr>      <int>           <dbl>            <dbl>  <chr>
 1    Oenanthe oenanthe 1556661912        57.02629         -4.22566
 2                 <NA> 1556666800        57.12099         -3.93400
 3 Linaria flavirostris 1556664293        57.06609         -3.99711
 4     Larus glaucoides 1556664347        57.96485         -3.97924
 5     Larus glaucoides 1556663773        57.96485         -3.97924
 6          Gavia immer 1556664263        57.75044         -3.90029
 7     Acanthis flammea 1556666017        58.02025         -3.88071
 8           Alca torda 1556666590        57.65400         -4.29750
 9        Anas penelope 1556668573        57.12099         -3.93400
10      Calidris alpina 1556666153        57.49305         -4.25400
# ... with 490 more rows, and 49 more variables: datasetKey <chr>,
#   publishingOrgKey <chr>, publishingCountry <chr>, protocol <chr>,
#   lastCrawled <chr>, lastParsed <chr>, crawlId <int>, basisOfRecord <chr>,
#   taxonKey <int>, kingdomKey <int>, phylumKey <int>, classKey <int>,
#   orderKey <int>, familyKey <int>, genusKey <int>, scientificName <chr>,
#   kingdom <chr>, phylum <chr>, order <chr>, family <chr>, genus <chr>,
#   genericName <chr>, specificEpithet <chr>, taxonRank <chr>,
#   coordinateUncertaintyInMeters <dbl>, stateProvince <chr>, year <int>,
#   month <int>, day <int>, eventDate <chr>, lastInterpreted <chr>,
#   license <chr>, geodeticDatum <chr>, class <chr>, countryCode <chr>,
#   country <chr>, recordNumber <chr>, eventID <chr>, identifier <chr>,
#   occurrenceStatus <chr>, vernacularName <chr>, institutionCode <chr>,
#   taxonConceptID <chr>, locality <chr>, collectionCode <chr>, gbifID <chr>,
#   occurrenceID <chr>, dataGeneralizations <chr>, infraspecificEpithet <chr>

#13

Big thanks, @sckott. This is excellent!