A user asked via email how to take the output of traits::ncbi_byname() and make DNAbin format objects. DNAbin is from the ape package. From what I understand, there are many things you can do with an object of class DNAbin once you have it.
We have three functions in the traits package to search for NCBI sequence data:
- ncbi_byname() - Retrieve gene sequences from NCBI by taxon name and gene names
- ncbi_byid() - Retrieve gene sequences from NCBI by accession number
- ncbi_searcher() - Search for gene sequences from NCBI by taxon names, taxonomic IDs, and more
Using the ncbi_byname() function, we can do:
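library(traits)
library(ape)
species <- c("Colletes similis", "Halictus ligatus", "Perdita californica")
out <- ncbi_byname(taxa = species, gene = c("coi", "co1"), seqrange = "1:2000")
# build one single-row DNAbin matrix per returned sequence
bins <- lapply(out, function(z) {
  # split the lowercased sequence into single bases, one per column
  mat <- t(as.matrix(strsplit(tolower(z$sequence), "")[[1]]))
  # label the row with the taxon name
  rownames(mat) <- z$taxon
  as.DNAbin(mat)
})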
Look at one of them:
bins[[1]]
#> 1 DNA sequence in binary format stored in a matrix.
#>
#> Sequence length: 658
#>
#> Label:
#> Colletes similis
#>
#> Base composition:
#>     a     c     g     t
#> 0.323 0.113 0.108 0.457
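In case it helps to see what the anonymous function passed to lapply() is doing, here is a minimal sketch that runs the same steps on a made-up record, with no NCBI request. The rec object, its taxon name, and its sequence are hypothetical stand-ins; I'm assuming each element of out carries taxon and sequence fields, as the code above implies.

```r
library(ape)

# Hypothetical record standing in for one element of `out`
rec <- list(taxon = "Example taxon", sequence = "ATGCATGCAT")

# Split the lowercased sequence into a character vector of single bases
bases <- strsplit(tolower(rec$sequence), "")[[1]]

# as.matrix() gives an n x 1 column; t() turns it into a 1 x n row
mat <- t(as.matrix(bases))
rownames(mat) <- rec$taxon

# Convert the character matrix to a DNAbin matrix
bin <- as.DNAbin(mat)

dim(mat)
class(bin)
```

The transpose matters because a DNAbin matrix holds one sequence per row, so the row name ends up as the sequence label.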
To write to a file you could do:
write.dna(bins[[1]], (f <- tempfile()))
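If you'd rather have FASTA than the default format, write.dna() takes a format argument. The sketch below uses a made-up six-base sequence (hypothetical, standing in for bins[[1]]) so it runs without an NCBI request, and reads the file back with read.dna() as a quick check.

```r
library(ape)

# Hypothetical one-sequence DNAbin matrix standing in for bins[[1]]
mat <- matrix(c("a", "t", "g", "c", "a", "t"), nrow = 1,
              dimnames = list("Example taxon", NULL))
bin <- as.DNAbin(mat)

# Write as FASTA to a temporary file
f <- tempfile(fileext = ".fasta")
write.dna(bin, f, format = "fasta")

# Read the file back in to check the round trip
back <- read.dna(f, format = "fasta")
class(back)
```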
The above makes a separate DNAbin object for each sequence. From what I understand, the sequences would all have to be the same length to make a single DNAbin object containing all of them, but am I wrong on that?
Better ways to do this?
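One possibility, as far as I can tell from the ape documentation: a DNAbin object can be stored as a list instead of a matrix, and the list form does not need sequences of equal length. Here is a minimal sketch with made-up sequences; the taxon names and bases are hypothetical, not NCBI results.

```r
library(ape)

# Hypothetical toy sequences of different lengths
seqs <- list(
  "Taxon A" = "ATGCATGC",
  "Taxon B" = "ATGCA"
)

# Split each sequence into a character vector of single bases
chars <- lapply(seqs, function(s) strsplit(tolower(s), "")[[1]])

# as.DNAbin() has a method for lists, giving one DNAbin object in list form
all_bins <- as.DNAbin(chars)

is.list(all_bins)
names(all_bins)

# A list-form DNAbin can be written out in one go, e.g. as FASTA
write.dna(all_bins, (f <- tempfile()), format = "fasta")
```

The same idea should carry over to the real data by building the named list of character vectors from out instead of seqs, assuming the sequences come back as character strings as they do above.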