Literature - Get references cited in a publication and Papers citing this publication?

Hi

I hope this fits the category. If not, please feel free to move it, or let me know and I can repost it somewhere else.

I am looking for a way to programmatically get the references cited in a publication. My use scenario is as follows:

I would like to do a literature analysis based on selected reviews. To do this, I would like to

  1. get the list of articles cited by this review
  2. for each cited article, get the articles it cites in turn
  3. repeat this (probably down to the 3rd or 4th level)

I would use this information to try to identify clusters of literature based on their cited literature to identify “schools of thought” and relevant literature to a certain topic.

Is there a way of batch downloading the references cited and, similarly, the articles citing a certain paper?

Thanks,

Rainer


This is a great fit, thanks for your question @rkrug

Do you care about where you get your data from? Publishers? Databases?

Are we talking millions of things to search on, or a few hundred or so?

First, citations to a focal paper X are very hard to get. Second, references of the same focal paper X are easier to get.

@sckott Thanks for the thumbs up and your comments.

No - I don’t actually care where I get the data from, as long as it is reliable.

The first level would be perhaps in the hundreds; the second and third levels obviously much more.

OK - let’s focus on the references of the focal paper X - the ones in Paper X. This would be the better approach, in my opinion, anyway.

Any suggestions on how to achieve this workflow?

There are a number of options with different data sources. One option that is broad with respect to data sources is fulltext.

Update to the latest version on GitHub to get a fix I just made:

# install the development version from GitHub (requires devtools)
devtools::install_github("ropensci/fulltext")
library(fulltext)
library(xml2)

# helper: pull the DOIs out of the reference list of one article's XML
ref_dois <- function(z) {
  xml_text(xml_find_all(read_xml(z), "//ref//pub-id[@pub-id-type='doi']"))
}

# get some articles
(res <- ft_search(query = 'ecology', from = 'entrez', limit = 10))
# get full text for those
out <- ft_get(res)
# extract the cited DOIs for each article
dois <- lapply(out$entrez$data$data, ref_dois)
# for one of the elements in `dois`, fetch the cited articles
bb <- ft_get(x = dois[[1]], from = "entrez")
# get the references of those articles (second level)
dois2 <- lapply(bb$entrez$data$data, ref_dois)
# and so on

Obviously you'll need to make some tweaks to this for whatever your needs are.
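
The "and so on" step could be wrapped in a loop that stops at a fixed depth. The following is only a rough sketch of that idea, not something fulltext provides: `crawl_refs` and `get_refs` are hypothetical names, and `get_refs` stands for any function that takes one DOI and returns the DOIs it cites (for example, a wrapper around the `ft_get()` and XPath steps above). A toy lookup is used here so the sketch runs offline.

```r
# Depth-limited, breadth-first collection of cited DOIs.
# `get_refs` is a hypothetical function(doi) that returns a character vector
# of the DOIs cited by `doi` (character(0) if none could be retrieved).
crawl_refs <- function(seeds, get_refs, max_depth = 3) {
  edges <- data.frame(from = character(), to = character(),
                      level = integer(), stringsAsFactors = FALSE)
  seen <- unique(seeds)   # DOIs already queued or expanded
  frontier <- seen        # DOIs to expand at the current level
  for (level in seq_len(max_depth)) {
    next_frontier <- character()
    for (doi in frontier) {
      refs <- unique(get_refs(doi))
      if (length(refs) == 0) next
      # record one citing -> cited edge per reference
      edges <- rbind(edges, data.frame(from = doi, to = refs, level = level,
                                       stringsAsFactors = FALSE))
      # only expand DOIs that have not been seen yet (this also breaks cycles)
      new <- setdiff(refs, c(seen, next_frontier))
      next_frontier <- c(next_frontier, new)
    }
    if (length(next_frontier) == 0) break
    seen <- c(seen, next_frontier)
    frontier <- next_frontier
  }
  edges
}

# Toy stand-in for a real lookup (made-up identifiers, not real DOIs)
toy <- list(A = c("B", "C"), B = c("C", "D"), C = character(), D = "A")
toy_refs <- function(doi) if (is.null(toy[[doi]])) character() else toy[[doi]]

edges <- crawl_refs("A", toy_refs, max_depth = 2)
print(edges)
```

The result is an edge list (citing paper, cited paper, level), which is probably a convenient starting point for the clustering you describe. Keeping track of DOIs already seen matters, because reference lists overlap heavily and the number of lookups grows quickly with each level.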