Retrieve a list of IDs with rentrez

package
r

#1

Hello,
I’m trying to retrieve a list of sequence IDs from a list of queries with rentrez.
My input file looks like this:

query
    1 CBS 119047 AND Botryosphaeriaceae[ORGN]
    2 CBS 119048 AND Botryosphaeriaceae[ORGN]
    3 CBS 119935 AND Botryosphaeriaceae[ORGN]
    4 CBS 113190 AND Botryosphaeriaceae[ORGN]

And this is my code:

id = list()
for (i in 1:nrow(dataNCBI)) {
  test2 <- entrez_search(db="nucleotide", term = dataNCBI$query[i], retmax= 40)
  ids5 <- data.frame(dataNCBI$query[i],test2$ids)
  id[[i]] <- ids5
  big_data2 = do.call(rbind, id)
}

And my output file:

                 dataNCBI.query.i. test2.ids

97 CBS 119048 AND Botryosphaeriaceae[ORGN] 14279559
98 CBS 119048 AND Botryosphaeriaceae[ORGN] 14279558

So far so good. However, my code only works if every query returns a result; when it reaches the first query without a result, it stops. I would like the loop to handle this case and produce something like this:

                 dataNCBI.query.i. test2.ids

97 CBS 119048 AND Botryosphaeriaceae[ORGN] 14279559
98 CBS 119048 AND Botryosphaeriaceae[ORGN] 14279558
99 CBS 113190 AND Botryosphaeriaceae[ORGN] No items found.
100 CBS 116741 AND Botryosphaeriaceae[ORGN] 51094092

Any tips on how to solve this problem?
Best regards,
Eduardo


#2

Thanks for your question @Batis007

In your for loop you can check whether there are any results, e.g., using your for loop above:

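library(rentrez)

id <- list()
for (i in seq_len(nrow(dataNCBI))) {
  test2 <- entrez_search(db="nucleotide", term = dataNCBI$query[i], retmax= 40)
  # use NA as a placeholder when the search returns no hits
  ids <- if (test2$count == 0) NA else test2$ids
  ids5 <- data.frame(dataNCBI$query[i], ids)
  id[[i]] <- ids5
}
# combine the per-query data frames once, after the loop
big_data2 <- do.call(rbind, id)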

This way you can insert NA, or whatever placeholder you prefer, when no results are found.
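
If it helps, the same check can be wrapped in a small helper so that each query always yields at least one row. This is only a sketch: `search_ids` and `fake_search` are names made up for illustration, and `fake_search` is a stand-in that mimics the `count`/`ids` fields of an `entrez_search()` result so the example runs without network access. It tests `length(res$ids) == 0` rather than `count`, on the assumption that an empty search comes back with zero-length `ids`.

```r
# Hypothetical helper: look up one query and always return at least one row.
# `search_fun` defaults to rentrez::entrez_search; it is a parameter only so
# the helper can be tried with a stand-in function and no network access.
search_ids <- function(query, search_fun = rentrez::entrez_search, retmax = 40) {
  res <- search_fun(db = "nucleotide", term = query, retmax = retmax)
  # A zero-length id vector cannot be paired with a length-one query in
  # data.frame(), so substitute NA when nothing was found.
  ids <- if (length(res$ids) == 0) NA_character_ else as.character(unlist(res$ids))
  data.frame(query = query, id = ids, stringsAsFactors = FALSE)
}

# Stand-in for entrez_search (made-up behaviour, for illustration only):
# returns two IDs for one query and nothing for any other.
fake_search <- function(db, term, retmax) {
  if (grepl("119048", term)) {
    list(count = 2, ids = c("14279559", "14279558"))
  } else {
    list(count = 0, ids = character(0))
  }
}

queries <- c("CBS 119048 AND Botryosphaeriaceae[ORGN]",
             "CBS 113190 AND Botryosphaeriaceae[ORGN]")

# Build one data frame per query, then bind them once at the end.
results <- do.call(rbind, lapply(queries, search_ids, search_fun = fake_search))
print(results)
```

With real data you would drop the `search_fun` argument and call `lapply(dataNCBI$query, search_ids)`, so the default `entrez_search` is used.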