I’m a bit confused by the various packages offered to work with taxonomies. Mainly, what is the difference between taxizedb and taxadb? I get that taxize makes API calls and that’s why taxizedb was created to work with local data. What does taxadb offer on top of that? And is there anything in taxlist that I should be aware of that goes beyond this?
For my specific use case, I want to work with the NCBI taxonomy but also a custom taxonomy available in taxdump format. So the ability to read local files and create a taxonomy from that is important to me.
Hi @Midnighter!
That’s a great question, because the documentation does not make the differences between these packages very clear.
With some co-authors we actually wrote a paper (and it’s open-access) that reviews many of the packages that work with taxonomical data: https://doi.org/10.1111/2041-210X.13802
The difference between taxizedb and taxadb lies in how each one accesses the databases and in the format of the data it returns.
taxadb uses archived snapshots of the databases that strictly follow the Darwin Core standard (you can find more about it in the Data sources vignette). One special thing about taxadb is that all of the databases share the same format, so the returned objects always have the same structure.
taxizedb, on the other hand, accesses SQL versions of the databases as supplied by the providers. It uses the latest available versions and strictly follows their structure, which means there is no common structure across databases. In exchange, it can give you more up-to-date versions of the data.
For your other questions I would point you to the paper I mentioned.
AFAIK there is no user-facing function in any of these packages to read data from NCBI taxdump format. @Rekyt, others please feel free to correct me if I’m wrong.
Thank you very much for the high-level overview and the links to useful resources. Although I work in a different field, the paper is still very useful to me.
Hi @Midnighter! While taxizedb cannot work with custom taxdump files at the moment, it may provide some clues for solving your issue. taxizedb::db_download_ncbi() downloads NCBI Taxonomy in taxdump format and converts it into SQL and then taxizedb::classification() interrogates this SQL file. I imagine these two functions could be used as inspiration to find a way to process your custom taxdump?
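To make that idea a bit more concrete, here is a minimal base-R sketch of reading a taxdump directly and walking up the tree. It is not how taxizedb does it internally (taxizedb goes through SQL); it only assumes the standard taxdump layout, where fields in `nodes.dmp` and `names.dmp` are separated by `\t|\t` and each row ends with `\t|`. The helper names `read_dmp()` and `lineage()` and the toy taxa are made up for illustration.

```r
# Read selected fields from a taxdump .dmp file.
# `keep` is a named vector mapping column names to field positions.
read_dmp <- function(path, keep) {
  lines <- readLines(path, warn = FALSE)
  lines <- sub("\t\\|$", "", lines)                 # drop the row terminator "\t|"
  parts <- strsplit(lines, "\t|\t", fixed = TRUE)   # split on the field separator
  cols <- lapply(keep, function(i) vapply(parts, function(p) p[i], character(1)))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Walk from a taxon up to the root (the root is its own parent in taxdump).
lineage <- function(tax_id, nodes, taxon_names) {
  sci <- taxon_names[taxon_names$name_class == "scientific name", ]
  path <- character(0)
  current <- as.character(tax_id)
  repeat {
    path <- c(current, path)
    parent <- nodes$parent_id[match(current, nodes$tax_id)]
    if (is.na(parent) || parent == current) break
    if (parent %in% path) stop("cycle detected in nodes.dmp")
    current <- parent
  }
  data.frame(
    id = path,
    name = sci$name[match(path, sci$tax_id)],
    rank = nodes$rank[match(path, nodes$tax_id)],
    stringsAsFactors = FALSE
  )
}

# Toy taxdump written to a temporary directory (hypothetical taxa).
dump_dir <- tempfile("taxdump")
dir.create(dump_dir)
writeLines(c(
  "1\t|\t1\t|\tno rank\t|",
  "10\t|\t1\t|\tgenus\t|",
  "11\t|\t10\t|\tspecies\t|"
), file.path(dump_dir, "nodes.dmp"))
writeLines(c(
  "1\t|\troot\t|\t\t|\tscientific name\t|",
  "10\t|\tExamplea\t|\t\t|\tscientific name\t|",
  "11\t|\tExamplea prima\t|\t\t|\tscientific name\t|",
  "11\t|\tE. prima\t|\t\t|\tsynonym\t|"
), file.path(dump_dir, "names.dmp"))

nodes <- read_dmp(file.path(dump_dir, "nodes.dmp"),
                  c(tax_id = 1, parent_id = 2, rank = 3))
taxon_names <- read_dmp(file.path(dump_dir, "names.dmp"),
                        c(tax_id = 1, name = 2, name_class = 4))

result <- lineage("11", nodes, taxon_names)
print(result)
```

For the full NCBI dump, `readLines()` on millions of rows will be slow, so loading the parsed tables into a database (as taxizedb does) is probably the better route; the sketch is mainly meant to show that a custom taxdump with the same layout can be parsed with the same two helpers.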
Hello @Midnighter and everyone else in the discussion. I see this is a hot topic, and we are planning a social co-working and office hours session about working with taxonomic lists.
As for taxlist, the main task of this package is to offer a structure in R that holds taxonomic information and to define functions for common data manipulation. The structure covers taxonomic ranks, synonyms and taxon attributes (e.g. functional traits, life forms, chorology, etc.).
Taxlist objects set rules (constraints) to preserve consistency in taxonomic information and enable the documentation of taxon views (i.e. a flora or database used to define taxonomic concepts).
Common functions produce subsets by taxonomic group, count taxa per taxonomic rank, and calculate statistics for taxonomic attributes. There is also an effort to use it in R Markdown documents, for instance for formatting taxon names and getting rid of typos.
In conclusion, taxlist is best seen as an option to consider after retrieving taxonomic lists from databases. Taxlist objects can also be connected to diversity records by using the package vegtable.