Taxonomic databases from R

sckott · March 17, 2016, 1:10am

taxize is a taxonomic toolbelt we work on - it gets data from open web APIs and does a variety of tasks

however, some use cases really want all the data!

web APIs are great, but they aren’t great when you want all the data, or just more data than the API can deliver in the time you want it. This is a relative thing - some APIs are quite fast, and some are slow. The slower it is, the sooner you hit the "this is way too slow, get me out of here" point, and you really just want a database dump

So taxizedb

covers: ITIS, The Plant List, COL (to be covered: NCBI, maybe others)
downloads SQL databases
loads SQL DBs into MySQL or PostgreSQL
creates src objects that you can plug into dplyr for easy database queries/manipulation
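
A rough sketch of what the dplyr side of that last step can look like. This is not taxizedb's own API: it assumes an in-memory SQLite database (via DBI and RSQLite, with dbplyr installed) in place of MySQL/PostgreSQL so it runs without a server, and the `taxa` table and its columns are made up for illustration.

```r
library(DBI)
library(dplyr)

# Assumption: in-memory SQLite stands in for a locally loaded
# MySQL/PostgreSQL taxonomic database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy table; column names and rows are illustrative only.
taxa <- data.frame(
  id = 1:4,
  sci_name = c("Xantusia", "Xantusia vigilis", "Xantusia arizonae", "Quercus"),
  taxon_rank = c("genus", "species", "species", "genus"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "taxa", taxa)

# tbl() returns a lazy table reference; the dplyr verbs are translated
# to SQL and only run in the database when collect() is called.
species <- tbl(con, "taxa") %>%
  filter(taxon_rank == "species") %>%
  select(id, sci_name) %>%
  arrange(sci_name) %>%
  collect()

print(species)
dbDisconnect(con)
```

The point is that the whole table stays in the database and only the filtered rows come back into R, which is what you want when the dump is large.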

Let me know what you think. I’m sure there will be

A bit of history: We’ve been trying to integrate locally stored SQL DBs into taxize for a while now, see https://github.com/ropensci/taxize/issues?q=label%3Asql+is%3Aclosed - but it just seems a bit intractable given the complexity of the package already, the SQL dependency packages we would have to add on top of that, and the fact that we could replicate web API calls with very few of the DBs

p.s., maybe a better name is appropriate

agus.camacho · March 24, 2016, 5:37pm

Hi, I keep trying to download records species by species using occ_data. I am still running into some difficulties that might be of general interest. Basically, I often get data not for the species I am asking for, but for a different one. For example,

library(rgbif)
# limit and cols are assumed to be defined earlier
key <- name_backbone(name = "Xantusia arizonae", kingdom = "animals")$speciesKey
if (!is.null(key)) {
  r <- occ_data(taxonKey = key, hasCoordinate = TRUE,
                limit = limit, basisOfRecord = "PRESERVED_SPECIMEN")
  FF <- as.matrix(r$data[, cols])
}

This gives data for Xantusia vigilis. The two were treated as synonyms in the past, but it seems clear that they have not been for the last 15 years.

http://reptile-database.reptarium.cz/species?genus=Xantusia&species=arizonae

A similar thing happened for 54 of the 178 species I requested. That makes the data repository a bit difficult to use, because you often have per-species data that you want to associate with the distribution of the requested species. It also raises doubts about the validity of the downloaded records (am I getting records for those species, or for names formerly considered synonyms?). Is there any workaround or initiative to mitigate this problem?

Cheers and thanks for your helpful work!
Agus

sckott · March 24, 2016, 5:42pm

can you please put this in a new issue here https://github.com/ropensci/rgbif/issues ? thanks!
