taxize is a taxonomic toolbelt we work on - it gets data from open web APIs and does a variety of tasks
however, some use cases really want all the data!
web APIs are great, but they aren’t great when you want all the data, or just more data than the API can deliver in the time you want it. This is a relative thing - some APIs are quite fast, and some are slow. The slower it is, the sooner you hit the "this is way too slow, get me out of here" point, and you really just want a database dump
covers: ITIS, The Plant List, COL (to be covered: NCBI, maybe others)
downloads SQL databases
loads SQL DBs into MySQL or PostgreSQL
creates src objects that you can plug into dplyr for easy database queries/manipulation
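To make the download, load, query idea a bit more concrete, here is a minimal sketch of the pattern. It is not the package's own code: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL or PostgreSQL, and the `taxa` table, its columns, the toy rows, and the helper functions `children` and `count_by_rank` are all hypothetical (real ITIS or COL dumps have far more tables and columns).

```python
import sqlite3

# In-memory SQLite stands in for the MySQL/PostgreSQL database
# that a real database dump would be loaded into.
conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema (an assumption for illustration).
conn.executescript("""
CREATE TABLE taxa (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    taxon_rank TEXT NOT NULL,
    parent_id  INTEGER REFERENCES taxa(id)
);
""")

# Toy rows playing the role of the loaded dump.
rows = [
    (1, "Plantae", "kingdom", None),
    (2, "Pinaceae", "family", 1),
    (3, "Pinus", "genus", 2),
    (4, "Pinus contorta", "species", 3),
    (5, "Pinus ponderosa", "species", 3),
]
conn.executemany("INSERT INTO taxa VALUES (?, ?, ?, ?)", rows)


def children(conn, parent_name):
    """Return the names of the direct children of a taxon, sorted by name."""
    cur = conn.execute(
        """
        SELECT child.name
        FROM taxa AS child
        JOIN taxa AS parent ON child.parent_id = parent.id
        WHERE parent.name = ?
        ORDER BY child.name
        """,
        (parent_name,),
    )
    return [name for (name,) in cur.fetchall()]


def count_by_rank(conn):
    """Count taxa per rank in one local query, with no per-taxon web request."""
    cur = conn.execute(
        "SELECT taxon_rank, COUNT(*) FROM taxa GROUP BY taxon_rank"
    )
    return dict(cur.fetchall())


print(children(conn, "Pinus"))
print(count_by_rank(conn))
```

The point of the sketch is the second function: a whole-table question like "how many taxa per rank" is a single query against a local copy, whereas a web API would typically need many separate requests.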
Let me know what you think. I’m sure there will be
A bit of history: We’ve been trying to integrate use of locally stored SQL DBs in taxize for a while now, see https://github.com/ropensci/taxize/issues?q=label%3Asql+is%3Aclosed - but it just seems a bit intractable given the complexity of the package already, the SQL dependency packages it would add on top of that, and the fact that we could replicate web API calls with very few of the DBs.
Hi, I keep trying to download records species by species using occ_data, and I am still running into some difficulties that might be of general interest. Basically, I often do not get data for the species that I am asking for, but for a different one. For example,
A similar thing happened for 54 of the 178 species requested. That makes the data repository somewhat harder to use, because you often have data per species that you want to associate with the distribution of the requested species. It also raises doubts about the validity of the downloaded records (am I getting records for those species, or for names that used to be considered synonyms?). Is there any workaround or initiative to mitigate this problem?