Thanks for your question @arw36
Thanks for sharing @dlebauer
One solution is rcrossref:
library(rcrossref)
library(data.table)
Define a function
species_cr_search <- function(x, ...) {
  data.frame(
    species = x,
    matches = cr_works(query = x, limit = 0, ...)$meta$total_results,
    stringsAsFactors = FALSE
  )
}
A species list
spp <- c("Poa annua", "Helianthus annuus", "Abies magnifica")
Apply function across species
rbindlist(lapply(spp, species_cr_search))
#>              species matches
#> 1:         Poa annua    2425
#> 2: Helianthus annuus    3446
#> 3:   Abies magnifica    4752
A cool thing about using Crossref is that you can set lots of different filters. Here, constrain results to works whose container title (e.g., the journal name) contains "ecology":
rbindlist(lapply(spp, species_cr_search, flq = c(`query.container-title` = 'ecology')))
#>              species matches
#> 1:         Poa annua      46
#> 2: Helianthus annuus      48
#> 3:   Abies magnifica     356
A caveat about Crossref is that it only searches the metadata it exposes through its web services: authors, title, and in some cases the abstract (http://api.crossref.org/works?filter=has-abstract:true&rows=0 shows about 824K papers). That is, it is not searching the full text of the papers.
You could use Google Scholar, but you have to jump through more hoops, as they don't want people to programmatically scrape their data.
You could also use Scopus (e.g., see "Wrapping Elsevier's Sciencedirect/Scopus API?"), but I don't know much about that.
One approach would be to create a pkg that interfaces with many different sources and lets the user choose which one to search.
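To make that idea concrete, here is a rough sketch of what such an interface could look like. The names default_backends and species_search are hypothetical (not from any existing pkg), and only a Crossref backend is filled in, reusing the cr_works() call from above. Other sources would need their own counting functions.

```r
# Hypothetical registry of backends: each entry takes a query string and
# returns the number of matching records from one source.
default_backends <- list(
  # Crossref backend, reusing the cr_works() call from the example above.
  crossref = function(x, ...) {
    rcrossref::cr_works(query = x, limit = 0, ...)$meta$total_results
  }
)

# Count matches for each species using the backend the user picks.
# Extra arguments in ... are passed through to the backend function.
species_search <- function(spp, source = "crossref",
                           backends = default_backends, ...) {
  if (!source %in% names(backends)) {
    stop("Unknown source: ", source, call. = FALSE)
  }
  counter <- backends[[source]]
  matches <- vapply(
    spp,
    function(x) as.numeric(counter(x, ...)),
    numeric(1),
    USE.NAMES = FALSE
  )
  data.frame(
    species = spp,
    source = source,
    matches = matches,
    stringsAsFactors = FALSE
  )
}
```

With rcrossref installed, species_search(spp, source = "crossref") should give the same counts as the rbindlist() call above. Adding another source would just mean adding another function to the list.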