I’m a bit confused by the various packages offered to work with taxonomies. Mainly, what is the difference between taxizedb and taxadb? I get that taxize makes API calls and that’s why taxizedb was created to work with local data. What does taxadb offer on top of that? And is there anything in taxlist that I should be aware of that goes beyond this?
For my specific use case, I want to work with the NCBI taxonomy but also a custom taxonomy available in taxdump format. So the ability to read local files and create a taxonomy from that is important to me.
That’s a great question, because the documentation does not make the differences between these packages very clear.
With some co-authors, I actually wrote an open-access paper that reviews many of the packages that work with taxonomic data: https://doi.org/10.1111/2041-210X.13802
The difference between taxizedb and taxadb lies in how each one accesses the databases and in the format in which it returns them.
taxadb uses archived snapshots of the databases and strictly follows the Darwin Core standard (you can find more about it in the Data sources vignette). One special thing about taxadb is that all of the databases share the same format, so the returned objects always have the same structure.
taxizedb, on the other hand, accesses SQL versions of the databases as supplied by the providers. It uses the latest available versions and strictly follows their structure, meaning that there is no common structure across databases. It can give you more up-to-date versions of the data.
For your other questions I would point you to the paper I mentioned.
AFAIK there is no user-facing function in any of these packages to read data from NCBI taxdump format. @Rekyt, others please feel free to correct me if I’m wrong.
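In the meantime, the taxdump files are plain text and can be parsed by hand. Below is a minimal sketch in Python (not part of any of the packages above), assuming the standard NCBI layout: fields separated by a tab, a pipe and a tab, each line ending in a tab and a pipe, with `nodes.dmp` starting with tax id, parent tax id and rank, and `names.dmp` holding tax id, name, unique name and name class. The helper names (`parse_dmp_line`, `read_nodes`, `read_names`, `lineage`) and the three-taxon toy data are invented for illustration; a custom taxonomy may deviate from this layout.

```python
import io


def parse_dmp_line(line):
    """Split one taxdump line into its fields."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the end-of-line terminator
    return line.split("\t|\t")


def read_nodes(handle):
    """Map tax_id -> (parent_id, rank) from a nodes.dmp-style file."""
    nodes = {}
    for line in handle:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        # Only the first three columns are used; any others are ignored.
        nodes[fields[0]] = (fields[1], fields[2])
    return nodes


def read_names(handle):
    """Map tax_id -> scientific name from a names.dmp-style file."""
    names = {}
    for line in handle:
        if not line.strip():
            continue
        tax_id, name_txt, _unique_name, name_class = parse_dmp_line(line)[:4]
        if name_class == "scientific name":  # skip synonyms, common names, etc.
            names[tax_id] = name_txt
    return names


def lineage(tax_id, nodes, names):
    """Walk from a taxon up to the root; return (rank, name) pairs, root first."""
    path = []
    while True:
        parent_id, rank = nodes[tax_id]
        path.append((rank, names.get(tax_id, tax_id)))
        if parent_id == tax_id:  # the root node is its own parent
            break
        tax_id = parent_id
    return list(reversed(path))


# Toy data (invented, not real NCBI records) in taxdump layout.
NODES_DMP = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tgenus\t|\n"
    "11\t|\t10\t|\tspecies\t|\n"
)
NAMES_DMP = (
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "10\t|\tExamplea\t|\t\t|\tscientific name\t|\n"
    "11\t|\tExamplea prima\t|\t\t|\tscientific name\t|\n"
    "11\t|\tExamplea primus\t|\t\t|\tsynonym\t|\n"
)

nodes = read_nodes(io.StringIO(NODES_DMP))
names = read_names(io.StringIO(NAMES_DMP))
print(lineage("11", nodes, names))
# [('no rank', 'root'), ('genus', 'Examplea'), ('species', 'Examplea prima')]
```

For real files you would pass `open("nodes.dmp", encoding="utf-8")` and `open("names.dmp", encoding="utf-8")` instead of the `io.StringIO` objects. The same logic should translate fairly directly to R if you prefer to stay in one language.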
Thank you very much for the high-level overview and the links to useful resources. Although I work in a different field, the paper is still very useful to me.