Possible new pkg idea: publication bias

Thanks for your question @arw36

Thanks for sharing @dlebauer

One solution is rcrossref:

library(rcrossref)
library(data.table)

Define a function

species_cr_search <- function(x, ...) {
  data.frame(
    species = x, 
    matches = cr_works(query = x, limit = 0, ...)$meta$total_results,
    stringsAsFactors = FALSE
  )
}

A species list

spp <- c("Poa annua", "Helianthus annuus", "Abies magnifica")

Apply function across species

rbindlist(lapply(spp, species_cr_search))
#>              species matches
#> 1:         Poa annua    2425
#> 2: Helianthus annuus    3446
#> 3:   Abies magnifica    4752

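A cool thing about using Crossref is that you can set lots of different filters and field queries. Here, constrain the search to works whose container title (e.g., the journal name) matches “ecology”. Note that this is the title of the journal, not of the paper itself.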

rbindlist(lapply(spp, species_cr_search, flq = c(`query.container-title` = 'ecology')))
#>              species matches
#> 1:         Poa annua      46
#> 2: Helianthus annuus      48
#> 3:   Abies magnifica     356

A caveat about Crossref is that it only searches the metadata exposed through its web services, which is authors, title, and in some cases the abstract (http://api.crossref.org/works?filter=has-abstract:true&rows=0 shows about 824K papers with abstracts). That is, it is not searching the full text of the papers, so the counts above reflect metadata matches only.
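
To get a rough sense of how much that caveat matters for a given species, here is a minimal sketch that reports total matches next to matches that have an abstract deposited. It assumes rcrossref's `has_abstract` filter behaves like the `has-abstract:true` URL above. The function name `abstract_coverage` and the `works_fn` argument are hypothetical, added only so the function can be tried with a stand-in instead of the live API.

```r
# Sketch: compare all Crossref matches with those that have an abstract.
# `works_fn` is a hypothetical hook; by default it is rcrossref::cr_works.
abstract_coverage <- function(x, works_fn = rcrossref::cr_works) {
  # limit = 0 asks only for the metadata (including the total hit count)
  total <- works_fn(query = x, limit = 0)$meta$total_results
  with_abs <- works_fn(
    query = x,
    filter = c(has_abstract = TRUE),  # assumed equivalent of has-abstract:true
    limit = 0
  )$meta$total_results
  data.frame(
    species = x,
    matches = total,
    with_abstract = with_abs,
    stringsAsFactors = FALSE
  )
}
```

Something like `rbindlist(lapply(spp, abstract_coverage))` should then give one row per species. I have not run this against the live API, so treat it as a starting point.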


You could use Google Scholar, but you have to jump through more hoops because they don’t want people to scrape their data programmatically.

You could also use Scopus (e.g., “Wrapping Elsevier’s Sciencedirect/Scopus API?”), but I don’t know much about that.


One approach would be to create a pkg that can interface with many different sources and let the user choose which one to query.
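
A very rough sketch of what that interface might look like, assuming each source is wrapped in a small function that takes a query string and returns a hit count. All the names here (`lit_count`, `count_crossref`, the `backends` list) are hypothetical, and only Crossref is filled in; Scopus or other sources would need their own wrappers.

```r
# Hypothetical Crossref backend: returns the number of matching works.
count_crossref <- function(x, ...) {
  rcrossref::cr_works(query = x, limit = 0, ...)$meta$total_results
}

# Hypothetical front end: the user picks a source by name, and the
# matching backend function is applied to every species.
lit_count <- function(spp, source = "crossref",
                      backends = list(crossref = count_crossref), ...) {
  if (!source %in% names(backends)) {
    stop("Unknown source: ", source, call. = FALSE)
  }
  backend <- backends[[source]]
  data.frame(
    species = spp,
    source = source,
    matches = vapply(
      spp,
      function(x) as.numeric(backend(x, ...)),
      numeric(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}
```

Usage would be something like `lit_count(spp, source = "crossref")`, with new sources added by putting another function in `backends`. A plain list of functions is just the simplest thing that works here; a real pkg would probably want something more structured.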
