taxadb: A High-Performance Local Taxonomic Database Interface

Author: Kari Norman

"Dealing with taxonomic inconsistencies within and across datasets is a fundamental challenge of ecology and evolutionary biology. Accounting for species synonyms, taxa splitting, and unification is especially important as aggregation of data across time and different data sources becomes increasingly common. One potentially powerful approach for addressing these issues is to resolve scientific names to taxonomic identifiers that follow a consistent taxonomic concept. In such a workflow, data from one of the many taxonomic providers (e.g. Integrated Taxonomic Information System, Catalogue of Life, National Center for Biotechnology Information) is integrated with biodiversity datasets to identify an accepted ID for each name. Multiple tools exist to facilitate this workflow, including R’s taxize package, which provides an interface to the APIs of taxonomic databases. However, because API queries are slow, limited in scope, and dependent on the current state of the database, it remains difficult to resolve names to a taxonomic authority in a quick, reproducible way. taxadb seeks to address these issues using a new approach for interfacing with taxonomic data via a local database of taxonomic providers.

The goal of this post is to illustrate the ease with which taxadb can be integrated into existing data munging workflows, as well as give a taste of the variety of other exploratory questions that are facilitated by the database backend infrastructure."
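
To make the idea of resolving names against a local database concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not taxadb's API or implementation: the `taxa` table, the `build_db` and `resolve_ids` helpers, and the `EX:` identifiers are all hypothetical placeholders chosen for illustration, and a real provider table would carry many more fields.

```python
import sqlite3

# Hypothetical provider rows: (scientific name, placeholder accepted ID, status).
# The "EX:" identifiers are invented for this sketch and come from no real provider.
ROWS = [
    ("Puma concolor", "EX:1", "accepted"),
    ("Felis concolor", "EX:1", "synonym"),  # synonym points at the accepted ID
    ("Canis lupus", "EX:2", "accepted"),
]


def build_db(rows):
    """Load the provider rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxa (name TEXT PRIMARY KEY, accepted_id TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO taxa VALUES (?, ?, ?)", rows)
    return conn


def resolve_ids(conn, names):
    """Return one accepted ID (or None if unmatched) per name, in input order."""
    resolved = []
    for name in names:
        row = conn.execute(
            "SELECT accepted_id FROM taxa WHERE name = ?", (name.strip(),)
        ).fetchone()
        resolved.append(row[0] if row else None)
    return resolved


conn = build_db(ROWS)
ids = resolve_ids(conn, ["Felis concolor", "Puma concolor", "Unknown name"])
print(ids)
```

Because the lookup runs against a local snapshot, the synonym and the accepted name resolve to the same identifier every time the script is rerun, with no network call and no dependence on the current state of a remote service.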

Read the full post: