An rgbif
user wrote with some questions about using the package. The following are the questions:
Taxonomy
The user had a genus Myosotis
that they were working with and ideally wanted to just get occurrences of the genus across all it’s species. They ended up getting GBIF taxon keys for each species, but wondered if there was a way to get occurrence data without havint to first get GIBF taxon keys for each taxon.
There’s a couple of different ways to go.
First, you can use the higher taxonomic name (e.g., genus) to fetch data rather than fetching data for each lower taxonomic name. Caveat being that I am not positive you’ll get the same results. And using the higher taxonomic name only works if you’re sure you want data for all the lower taxonomic names (e.g,. all the species in a genus). So with this approach you can do something like
# get the key for Myosotis
key <- name_backbone('Myosotis')$usageKey
# let's see how many occurrences GBIF has for that key
occ_count(taxonKey = key)
#> [1] 694310
# search for ocurrence data
(x <- occ_data(taxonKey = key, limit = 3))
#> Records found [694310]
#> Records returned [3]
#> Args [limit=3, offset=0, taxonKey=2925668]
#> # A tibble: 3 x 64
#> name key decimalLatitude decimalLongitude issues datasetKey
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 Myosotis… 1.81e9 -46.5 170. cdround… 50c9509d-22c7-4…
#> 2 Myosotis… 1.81e9 38.0 -123. cdround… 50c9509d-22c7-4…
#> 3 Myosotis… 1.81e9 -44.9 170. cdround… 50c9509d-22c7-4…
#> # ... with 20 more rows, and 58 more variables: publishingOrgKey <chr>,
#> # publishingCountry <chr>, protocol <chr>, lastCrawled <chr>,
#> # cutoff for brevity
# all names are species within the genus (don't know if these are all good names or not)
x$data$name
#> [1] "Myosotis rakiura" "Myosotis latifolia" "Myosotis australis"
Second, you can get lower taxonomic names using functions in the rgbif
package e.g.,
res <- name_lookup(higherTaxonKey=2925668, limit = 300, status = "accepted")
res$data
#> # A tibble: 250 x 36
#> key scientificName datasetKey constituentKey parentKey parent kingdom
#> <int> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 7930390 Myosotis rehst… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 2 5660749 Myosotis saxos… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 3 5660801 Myosotis pusil… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 4 5661619 Myosotis ambig… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 5 8075125 Myosotis ergak… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 6 8166715 Myosotis macra… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 7 7626464 Myosotis spelu… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 8 5341177 Myosotis disco… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 9 5660980 Myosotis marit… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> 10 5661200 Myosotis heter… d7dddbf4-2… 7ddf754f-d193-… 2925668 Myoso… Plantae
#> # ... with 240 more rows, and 29 more variables: phylum <chr>, order <chr>,
#> # family <chr>, genus <chr>, species <chr>, kingdomKey <int>,
#> # phylumKey <int>, classKey <int>, orderKey <int>, familyKey <int>,
#> # genusKey <int>, speciesKey <int>, canonicalName <chr>, authorship <chr>,
#> # publishedIn <chr>, nameType <chr>, taxonomicStatus <chr>, rank <chr>,
#> # origin <chr>, numDescendants <int>, numOccurrences <int>, habitats <chr>,
#> # nomenclaturalStatus <lgl>, threatStatuses <chr>, synonym <lgl>,
#> # class <chr>, nubKey <int>, basionymKey <int>, basionym <chr>
and then use those key’s to get occurence data for those specific taxa
Third, you can use taxize
package which has some tools to help you get taxonomic names specifically, e.g.,
# upgrade from github cause there's a bug fix i just pushed
remotes::install_github("ropensci/taxize")
library(taxize)
spp <- downstream("Myosotis", db = "gbif", downto = "species", limit = 300)
head(spp$Myosotis)
#> name rank key name_type
#> 1 Myosotis abyssinica species 5660280 canonicalname
#> 2 Myosotis acutata species 5660278 canonicalname
#> 3 Myosotis afropalustris species 5661736 canonicalname
#> 4 Myosotis agnetis species 5661737 canonicalname
#> 5 Myosotis albomarginata species 5661641 canonicalname
#> 6 Myosotis albosericea species 5661639 canonicalname
Mapping gbif data
The user asked about usage of rgbif::gbifmap()
, and why so many points are removed during the mapping process. Most plotting stuff is beyond my knowledge since I don’t do much of it, so there is that caveat.
There is a function rgbif::gbifmap()
that helps make a simple map with the output of both rgbif::occ_search()
or rgbif::occ_data()
. It does make a plot but I strongly recommend using other tools.
rgbif::gbifmap()
makes a point plot, and before plotting removes impossible locations and missing lat/long data. It plots points with different colors for each taxon, so if you have more than maybe 9 or 10 taxa it becomes kinda silly to have a different color for each taxon because it’s hard to discern colors when there’s so many. Plus the color scale we use scale_color_brewer
has a max of 9 different colors.
splist <- c('Myosotis scorpioides','Myosotis arvensis','Myosotis laxa',
'Myosotis ramosissima','Myosotis sylvatica','Myosotis discolor',
'Myosotis stricta','Myosotis alpestris','Myosotis secunda',
'Myosotis nemorosa','Myosotis ramosissima','Myosotis decumbens',
'Myosotis australis','Myosotis lithuanica','Myosotis sparsiflora',
'Myosotis verna','Myosotis lamottiana','Myosotis baltica',
'Myosotis sicula','Myosotis latifolia','Myosotis stolonifera',
'Myosotis forsteri','Myosotis macrosperma','Myosotis pygmaea',
'Myosotis asiatica','Myosotis incrassata','Myosotis drucei',
'Myosotis welwitschii','Myosotis debilis','Myosotis pusilla')
keys <- vapply(splist, function(x) name_suggest(x)$key[1],
numeric(1), USE.NAMES=FALSE)
dat2 <- occ_data(keys)
dd <- data.table::rbindlist(lapply(dat2, function(z) z$data),
fill = TRUE, use.names = TRUE)
gbifmap(dd)
gives some warnings
Rendering map...plotting 9187 points
Warning messages:
1: In RColorBrewer::brewer.pal(n, pal) :
n too large, allowed maximum for palette Set1 is 9
Returning the palette you asked for with that many colors
2: Removed 5916 rows containing missing values (geom_point).
So geom_point
is removing some points, not sure what is going on there, if anyone can suggest a fix? Although perhaps someday we should remove gbifmap()
as mapr
is a better thing for mapping
Using the mapr
package we can make this work better:
library(mapr)
map_ggplot(dat2)
there’s also options for in mapr
for leaflet, github gists, and base plots