I hope this fits the category; if not, please feel free to move it, or let me know and I can repost somewhere else.
I am looking for a way to programmatically get the references cited in a publication. My use scenario is as follows:
I would like to do a literature analysis based on a set of selected reviews. To do this, I would like to:

1. get the list of articles cited by each review
2. for each cited article, get in turn the articles it cites
3. do this again (probably down to the 3rd or 4th level)
I would then use this information to identify clusters of literature based on shared cited references, with the aim of finding “schools of thought” and the literature relevant to a certain topic.
Is there a way to batch-download the references cited by a paper and, similarly, the articles citing it?
There are a number of options with different data sources. One option that is broad with respect to data sources is the fulltext package, which can search for and fetch articles from several providers, including Entrez.
Update to the latest version on GitHub to get a fix I just made:
devtools::install_github("ropensci/fulltext")
library(fulltext)
library(xml2)
# get some articles
(res <- ft_search(query='ecology', from='entrez', limit = 10))
# get full text for those
out <- ft_get(res)
# extract xml, then DOIs for each one
dois <- lapply(out$entrez$data$data, function(z) {
  xml_text(xml_find_all(read_xml(z), "//ref//pub-id[@pub-id-type=\"doi\"]"))
})
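A quick sanity check at this point (my suggestion, not part of the recipe above): not every reference in the article XML carries a DOI, so it's worth seeing how many were actually recovered per article.

# number of DOI-tagged references found in each article
lengths(dois)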
# for one of the elements in `dois`
bb <- ft_get(x = dois[[1]], from = "entrez")
# get refs again
dois <- lapply(bb$entrez$data$data, function(z) {
  xml_text(xml_find_all(read_xml(z), "//ref//pub-id[@pub-id-type=\"doi\"]"))
})
# and so on
Obviously you'll need to make some tweaks to this for whatever your needs are.
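For the "and so on" part: to go down to the 3rd or 4th level without copy-pasting the same block, the two steps above can be wrapped in a loop. This is just a sketch under a couple of assumptions: it reuses the `out$entrez$data$data` structure shown above, `get_refs()` is a helper name I made up, it assumes every extracted DOI is fetchable from Entrez (which won't always hold, so real code needs error handling), and it doesn't cap how many DOIs it fetches per level, although in practice you'd want to, since level 2 can already be thousands of articles.

library(fulltext)
library(xml2)

# helper: pull the DOI-tagged references out of a list of article XML strings
get_refs <- function(docs) {
  unique(unlist(lapply(docs, function(z) {
    xml_text(xml_find_all(read_xml(z), "//ref//pub-id[@pub-id-type=\"doi\"]"))
  })))
}

res <- ft_search(query = "ecology", from = "entrez", limit = 10)
docs <- ft_get(res)$entrez$data$data
refs_by_level <- list()  # keep each level's DOIs instead of overwriting `dois`
for (level in 1:3) {
  dois <- get_refs(docs)
  refs_by_level[[level]] <- dois
  # fetch the full text of this level's references to mine the next level
  docs <- ft_get(x = dois, from = "entrez")$entrez$data$data
}

Keeping each level in `refs_by_level`, rather than overwriting `dois` as in the two-step version above, also leaves you the raw material for the analysis you describe.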
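And since the end goal is clustering: the per-article DOI lists map naturally onto a citation network. A minimal sketch, assuming the elements of `dois` are named by an article identifier (if they aren't, the fallback below just numbers them) and using the igraph package, which is my suggestion and not part of fulltext:

library(igraph)

ids <- names(dois)
if (is.null(ids)) ids <- paste0("article_", seq_along(dois))  # fallback IDs
# build one row per citing -> cited pair
edges <- do.call(rbind, Map(function(citing, cited) {
  if (length(cited) == 0) return(NULL)
  data.frame(from = citing, to = cited, stringsAsFactors = FALSE)
}, ids, dois))
g <- graph_from_data_frame(edges, directed = TRUE)
# any community-detection method would do; walktrap is one option
cl <- cluster_walktrap(as.undirected(g))
membership(cl)

The community memberships are one rough take on your “schools of thought”; co-citation weighting would be a refinement on top of this.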