neotoma & taxize - resolve taxon names from the Neotoma Paleoecological Database

SimonGoring · October 1, 2016, 6:55pm

I wanted to get full taxonomic resolution for Neotoma named taxa from ITIS.

library(neotoma)
library(taxize)

The big taxonomy table in Neotoma is organized in a hierarchical fashion, but often based (for the plant taxa in particular) on a morphological hierarchy, rather than a taxonomic/phylogenetic hierarchy.

neotoma_taxa <- neotoma::get_table("taxa")

This gives us the full taxonomic table in Neotoma, but there are some odd taxon names in there, especially given the number of cf. taxa or undifferentiated types (identified as undiff.). I use a straightforward regular expression replacement to get rid of most of them. As with most things in Neotoma, 80% of the data is fairly straightforward, 10% can be dealt with using minor fixes, and the last 10% is really “special case” data. The regular expression deals with the middle 10%.

get_class <- function(x) {
  
  # This just clears up some of the "uncertainty" fields in the taxon names.
  # This doesn't catch things like "sensu stricto" and others.
  taxa <- gsub("(\\?|\\-type|cf\\.\\s|aff\\.|\\sundiff\\.)", "", x, perl=TRUE)
  
  taxize::classification(taxa, db="itis", rows = 1)
  
}
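
To see what the replacement actually does, here is a small offline sketch using only base R. The helper name `clean_taxon` and the example names are made up for illustration (written in the style of Neotoma names, not pulled from the table), and the extra whitespace clean-up is my addition rather than part of `get_class()` above:

```r
# Hypothetical helper: the same pattern used in get_class(), followed by
# a whitespace clean-up step (an addition, not in the original function).
clean_taxon <- function(x) {
  x <- gsub("(\\?|\\-type|cf\\.\\s|aff\\.|\\sundiff\\.)", "", x, perl = TRUE)
  # Removing "aff." leaves a double space behind, so collapse runs of
  # whitespace and trim the ends.
  trimws(gsub("\\s+", " ", x))
}

# Illustrative names only; they are assumptions, not rows from Neotoma.
examples <- c("cf. Picea glauca", "Quercus undiff.", "Ambrosia-type",
              "Alnus?", "Salix aff. herbacea")

cleaned <- clean_taxon(examples)
print(cleaned)
```

Names such as "sensu stricto" still pass through untouched, as noted in the function comment.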

Then this is all looped:

all_taxa_list <- list()

# Start at the first row; `i` is not defined before the loop on a fresh run.
for (i in seq_len(nrow(neotoma_taxa))) {
  
  cat(paste0(neotoma_taxa$TaxaGroupID[i], ": ", neotoma_taxa$TaxonName[i], 
             ' - ', round(i/nrow(neotoma_taxa) * 100, 2), '% complete . . . '))
             
  all_taxa_list[[i]] <- list(neotoma_taxa[i,],
                             suppressMessages(get_class(neotoma_taxa$TaxonName[i])))
  
  if (!is.na(all_taxa_list[[i]][[2]])) {
    cat('Success!\n')
  } else {
    cat('Ugh. :(\n')
  }
  
  # Save after every taxon. It's slower, but an interruption won't lose the results so far.
  saveRDS(all_taxa_list, file = "all_taxa.RDS")
}

Suggestions are appreciated, but it works fine.

It’s looped, rather than passing the whole vector to classification(), because there are so many taxa being passed in. I didn’t want a time-out to crash things unrecoverably.
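
One way to make the loop recoverable is sketched below. It is a sketch under some assumptions, not what I actually ran: `resolve_all` and `fake_lookup` are hypothetical names, and `fake_lookup` stands in for `get_class()` so the example runs without a network connection. Each call is wrapped in `tryCatch()` so a single failure is stored as `NA`, and the run picks up from the checkpoint file if one already exists.

```r
# Hypothetical stand-in for get_class(); it fails on one name to mimic a time-out.
fake_lookup <- function(name) {
  if (name == "Bad taxon") stop("simulated time-out")
  data.frame(name = name, rank = "genus")
}

# Hypothetical wrapper: resolve each name, checkpointing after every call.
resolve_all <- function(taxa, lookup, checkpoint) {
  # Resume from an earlier run if the checkpoint file exists.
  results <- if (file.exists(checkpoint)) readRDS(checkpoint) else list()
  idx <- seq_along(taxa)
  for (i in idx[idx > length(results)]) {
    # An error in one lookup is recorded as NA instead of stopping the loop.
    results[[i]] <- tryCatch(lookup(taxa[i]), error = function(e) NA)
    saveRDS(results, file = checkpoint)
  }
  results
}

checkpoint <- tempfile(fileext = ".RDS")
taxa <- c("Picea", "Bad taxon", "Quercus")
out <- resolve_all(taxa, fake_lookup, checkpoint)

# A second call finds the finished checkpoint and does no new lookups.
out_again <- resolve_all(taxa, fake_lookup, checkpoint)
```

With the real data you would pass `neotoma_taxa$TaxonName` and `get_class` instead of the toy inputs. Failed lookups stay in the list as `NA`, so they are easy to find and retry later.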
