Rentrez - problem using the web_history object

JC_NZ · January 11, 2016, 9:40pm

This is probably a question for an NCBI E-utils person rather than the Rentrez wrapper writer or this community, but I’m struggling to find enough info.

I’m trying to assemble a set of metadata for sequences deposited in Nuccore with Feature:Source /country=New Zealand, starting with the big list of IDs returned using a general query for ‘New Zealand’ (as it doesn’t seem possible to query the feature table directly). For that I need use_history=TRUE and step through the large set of results using the list stored on the history server. However, if I subsequently use the web_history object I don’t get the expected return data set.
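For readers following along, the "step through the large set of results" part could look roughly like the sketch below. It is only an illustration under some assumptions: `chunk_starts()` and `fetch_in_chunks()` are hypothetical helper names, the chunk size of 500 is arbitrary, and it relies on `entrez_fetch()` passing `retstart` and `retmax` through to E-utils.

```r
# Sketch only: page through a history-server result set in fixed-size chunks.
# chunk_starts() and fetch_in_chunks() are hypothetical helpers, not part of rentrez.

# Zero-based offsets (retstart values) needed to cover `total` records.
chunk_starts <- function(total, chunk_size) {
  if (total <= 0) return(integer(0))
  seq(0, total - 1, by = chunk_size)
}

# `search` is the object returned by entrez_search(..., use_history = TRUE).
# Returns a list with one raw text chunk per request.
fetch_in_chunks <- function(search, db = "nuccore", chunk_size = 500,
                            rettype = "gbc") {
  starts <- chunk_starts(search$count, chunk_size)
  lapply(starts, function(s) {
    rentrez::entrez_fetch(db = db, web_history = search$web_history,
                          rettype = rettype, retstart = s, retmax = chunk_size)
  })
}

# Usage (needs network access and the rentrez package):
# chunks <- fetch_in_chunks(my_search)
```

Keeping the offset arithmetic in its own function makes it easy to check without touching the network. A short pause between requests may also be sensible for very large sets.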

To simplify, here is an E-utils web query which returns a single nuccore record …
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=new+zealand+AND+ddbj_embl_genbank[filter]+AND+gerhardtia+pseudosaponacea[Organism]

If I use Rentrez and the history server then something like …

library(rentrez)
seq_NZ <- entrez_search(db = "nuccore",
                        term = "new+zealand AND ddbj_embl_genbank[filter] AND gerhardtia+pseudosaponacea[Organism]",
                        retmax = 0, use_history = TRUE)

Then if I use the web history object to fetch the record…

seqrecs <- entrez_fetch(db = "nuccore", web_history = seq_NZ$web_history, rettype = "xml", retmax = 1, parsed = TRUE)

I get back what looks like the associated popset of records and not the single record (for the single queried ID) I was expecting. What silly simple thing am I doing wrong?

sckott · January 11, 2016, 10:00pm

(Hope you don’t mind, I did a small edit to put code in code blocks. See Welcome to rOpenSci Discuss for some discussion of markdown)

paging @dwinter

dwinter · January 11, 2016, 10:51pm

Kia ora JC!

This does seem like something happening on the NCBI’s end. Playing around a little, it also affects “normal” (that is, non-web-history) queries. I don’t think sending one ID and getting back 8 records is expected behaviour, so it’s worth letting them know about it.

Depending on what you want to get from the records, I can suggest at least one workaround for now. If you download records in “gbc” format, which is an XML-ification of GenBank, you do get one record per ID. You can’t return a parsed object (for now; I’ll start an issue for this and other cases where XML records are not called XML), but it’s easy enough to create one and retrieve information from it:

seqrecs_gb_xml <- entrez_fetch(db = "nuccore", web_history = seq_NZ$web_history, rettype = "gbc")
parsed <- XML::xmlTreeParse(seqrecs_gb_xml, useInternalNodes = TRUE)
parsed["//INSDSeq_taxonomy"]

[[1]]
<INSDSeq_taxonomy>Eukaryota; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Agaricales; Lyophyllaceae; Gerhardtia</INSDSeq_taxonomy> 

attr(,"class")
[1] "XMLNodeSet"
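The same approach should extend to the source-feature qualifiers, such as the /country value the original question is after. Here is a rough sketch, assuming the "gbc" output follows the usual INSDSeq layout (INSDFeature, INSDFeature_quals, INSDQualifier). The `toy_record` string is a hand-written stand-in rather than a real NCBI record, and `source_country()` is a hypothetical helper name:

```r
library(XML)

# Hand-written fragment in the INSDSeq layout (illustrative only, not real data).
toy_record <- '<INSDSet><INSDSeq>
  <INSDSeq_feature-table><INSDFeature>
    <INSDFeature_key>source</INSDFeature_key>
    <INSDFeature_quals>
      <INSDQualifier>
        <INSDQualifier_name>country</INSDQualifier_name>
        <INSDQualifier_value>New Zealand</INSDQualifier_value>
      </INSDQualifier>
    </INSDFeature_quals>
  </INSDFeature></INSDSeq_feature-table>
</INSDSeq></INSDSet>'

# Hypothetical helper: pull every /country value found on a "source" feature.
source_country <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  xpath <- paste0("//INSDFeature[INSDFeature_key='source']",
                  "//INSDQualifier[INSDQualifier_name='country']",
                  "/INSDQualifier_value")
  xpathSApply(doc, xpath, xmlValue)
}

source_country(toy_record)
```

Note that this returns one flat vector across all records in the document, so records without a country qualifier are silently skipped. To keep values matched to accessions, you would probably want to loop over the individual INSDSeq nodes instead.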

Hope that helps, and let me know if you have other questions.

David

EDIT to add a link to the issue regarding parsing differently named XML files on the fly: Rentrez - problem using the web_history object

JC_NZ · January 12, 2016, 1:07am

Great response. Thank you. I will have a play!
