Recommended R packages (or tools) to build search strings for a systematic literature search (systematic literature review)

Hello All!

I am new to R and would appreciate any kind of help regarding the following!

I am currently performing a systematic literature review and will be using these databases: PubMed, Medline, CINAHL and Scopus. With a large number of MeSH terms, I need help making sure that all the possible search strings/terms have been exhausted. The OUTPUT I am after is a list of matching titles, ideally with abstracts. An example search string would be: Down syndrome AND ionizing radiation AND Leukemia; another example: Li-Fraumeni syndrome AND ionizing radiation AND Brain tumour.

Thank you for your time and hoping to hear some positive feedback!

Cheers!

Maëlle


:wave: @maelle.canet Thanks for your question.

Are you set on these specific sources, or do you have some flexibility in which sources you use?

What do you mean by “exhausted”?

In the end are you hoping to get the full texts of the articles, or are titles or abstracts enough?

Dear Scott, thanks for your quick reply!

Yes, these are the specific sources we intend to use. By exhausted we mean that all the possible synonym combinations have been created :slight_smile: In the end, it would be nice to get abstracts, ideally in RIS format.

Many thanks!!

Hello again!

To clarify the OUTPUT: we would like all the information for each citation (title, authors, publication date and so on) plus the abstract. Also, to amend my earlier answer, any format that can be imported into Zotero would work, as this is our reference manager. =)

data sources

Here’s a breakdown of what I know about the data sources you’re using:

  • PubMed/Medline: as far as I know, Medline is effectively covered by PubMed (it is PubMed’s main component; see the heading on the PubMed page). Both are available in the fulltext package under the name entrez (NCBI’s name for its web service that gives access to PubMed/Medline)
  • CINAHL: there’s currently no R package that gives access to this. I’ve asked my university librarians about this.
  • Scopus: Available in the fulltext package

authentication

PubMed is open; no authentication is required.

Scopus on the other hand requires jumping through some hoops. From the ?fulltext-package manual page:

Scopus requires two things: an API key and your institution must have access. For the API key, go to Elsevier Developer Portal, register for an account, then when you’re in your account, create an API key. Pass in as variable key to scopusopts, or store your key under the name ELSEVIER_SCOPUS_KEY as an environment variable in .Renviron, and we’ll read it in for you. See ?Startup in R for help. For the institution access go to a browser and see if you have access to the journal(s) you want. If you don’t have access in a browser you probably won’t have access via this package. If you aren’t physically at your institution you will likely need to be on a VPN or similar so that your IP address is in the range that the two publishers are accepting for that institution.
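
A minimal sketch of the key setup described in the manual text above, in base R. The helper name scopus_key_available is hypothetical (it is not part of fulltext); only the variable name ELSEVIER_SCOPUS_KEY and the key entry in scopusopts come from the manual page.

```r
# Hypothetical helper (not part of fulltext): checks whether the environment
# variable named in the manual page, ELSEVIER_SCOPUS_KEY, is set.
scopus_key_available <- function() {
  nzchar(Sys.getenv("ELSEVIER_SCOPUS_KEY"))
}

# Sets the key for the current session only. For a permanent setup, add the
# line
#   ELSEVIER_SCOPUS_KEY=your-key-here
# to your .Renviron file and restart R (see ?Startup).
Sys.setenv(ELSEVIER_SCOPUS_KEY = "your-key-here")  # placeholder, not a real key

scopus_key_available()
```

If you would rather not use an environment variable, the manual text says the key can instead be passed as key inside scopusopts when searching.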

searching

It’s best to start with searching. The examples here use entrez; the same applies to Scopus, which additionally requires the authentication above:

library(fulltext)

res <- ft_search(query='ecology', from='entrez')
res
#> Query:
#>   [ecology]
#> Found:
#>   [PLoS: 0; BMC: 0; Crossref: 0; Entrez: 180481; arxiv: 0; biorxiv: 0; Europe PMC: 0; Scopus: 0; Microsoft: 0]
#> Returned:
#>   [PLoS: 0; BMC: 0; Crossref: 0; Entrez: 10; arxiv: 0; biorxiv: 0; Europe PMC: 0; Scopus: 0; Microsoft: 0]
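
On the "all possible synonym combinations" part of the question: a minimal base R sketch for generating the search strings, assuming you keep one vector of synonyms per concept. The synonyms below are illustrative placeholders only, and the names concepts, quote_term, queries and combined are my own.

```r
# One vector of synonyms per concept. These terms are illustrative only;
# replace them with your own MeSH terms and free-text variants.
concepts <- list(
  condition = c("Down syndrome", "trisomy 21"),
  exposure  = c("ionizing radiation", "ionising radiation"),
  outcome   = c("leukemia", "leukaemia")
)

# Wrap each term in double quotes so multi-word terms stay together
quote_term <- function(x) paste0('"', x, '"')

# Option 1: one search string per combination (one synonym from each concept)
combos <- expand.grid(concepts, stringsAsFactors = FALSE)
queries <- apply(combos, 1, function(row) {
  paste(quote_term(row), collapse = " AND ")
})

# Option 2: a single string, OR within each concept and AND between concepts
or_groups <- vapply(concepts, function(terms) {
  paste0("(", paste(quote_term(terms), collapse = " OR "), ")")
}, character(1))
combined <- paste(or_groups, collapse = " AND ")

length(queries)  # 2 * 2 * 2 = 8 combinations
queries[1]
combined
```

Each element of queries (or combined on its own) can then be passed as the query argument of ft_search. The single combined string is usually easier to manage, since one search returns the union of all combinations without duplicates.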

You can index into the Entrez results; the data.frame itself is in res$entrez$data:

res$entrez
#> Query: [ecology]
#> Records found, returned: [180481, 10]
#> 
#>        uid     pubdate    epubdate printpubdate                          source volume issue     pages
#> 1  6783310  2019 Jul 9  2019 Jul 9   2019 Oct 1                         Ecology    100    10    e02794
#> 2  6783302  2019 Apr 3               2019 Apr 3                  Sci Transl Med     11   486  eaav0537
#> 3  6781247        2018                     2018             Environ Model Softw    109          93-103
#> 4  6781240        2017                     2017                 Estuaries Coast     41     2   404-420
#> 5  6781235        2018                     2018                   Hydrobiologia    818     1     71-86
#> 6  6781228        2018                     2018          J Am Water Works Assoc    110    11     64-68
#> 7  6773173 2000 Nov 15              2000 Nov 15                      J Neurosci     20    22 8533-8541
#> 8  6779586 2015 Dec 24 2015 Dec 24     2016 Feb J Exp Zool A Ecol Genet Physiol    325     2   106-115
#> 9  6778798  2019 Aug 8  2019 Aug 8                                G3 (Bethesda)      9    10 3181-3199
#> 10 6778791  2019 Aug 7  2019 Aug 7                                G3 (Bethesda)      9    10 3249-3262
#> Variables not shown: fulljournalname (chr), sortdate (chr), pmclivedate (chr), pmid (chr), doi (chr), pmcid (chr), mid
#>      (chr), title (chr), authors (chr)
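
Since RIS was requested for Zotero, here is a minimal base R sketch that turns a data.frame shaped like the one above into RIS records. The function name df_to_ris is hypothetical. It assumes the columns title, authors, fulljournalname, pubdate and doi are present (they appear in the printout above) and that authors is a single string of comma-separated names, which I have not verified. No abstract is written, because the search results above do not carry one.

```r
# Hypothetical helper: convert citation metadata in a data.frame to RIS lines.
# Assumes `authors` holds comma-separated names in one string (unverified).
df_to_ris <- function(df, author_sep = ",") {
  records <- lapply(seq_len(nrow(df)), function(i) {
    authors <- trimws(strsplit(df$authors[i], author_sep, fixed = TRUE)[[1]])
    # Publication year: first run of four digits in pubdate, e.g. "2019 Jul 9"
    year <- regmatches(df$pubdate[i], regexpr("[0-9]{4}", df$pubdate[i]))
    c(
      "TY  - JOUR",
      paste0("TI  - ", df$title[i]),
      paste0("AU  - ", authors),   # one AU line per author
      paste0("JO  - ", df$fulljournalname[i]),
      paste0("PY  - ", year),
      paste0("DO  - ", df$doi[i]),
      "ER  - ",
      ""
    )
  })
  unlist(records)
}

# Made-up record for illustration; with real results use res$entrez$data
toy <- data.frame(
  title = "Example title",
  authors = "Author A, Author B",
  fulljournalname = "Example Journal",
  pubdate = "2019 Jul 9",
  doi = "10.0000/example",
  stringsAsFactors = FALSE
)
ris <- df_to_ris(toy)
writeLines(ris, file.path(tempdir(), "results.ris"))
```

The sketch does not handle missing values, so records with an NA in any of these columns would need cleaning first.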

Then you can go to ft_get:

# get articles, writes the XML files to your computer
out <- ft_get(res)
# ft_collect gathers and parses the XML and puts it in the output
out <- ft_collect(out)
# then access the XML full text, e.g., for one article
out$entrez$data$data$`6783310`

You can use another package, pubchunks, to help pull out the parts of the articles you want from the XML, unless you are comfortable dealing with XML yourself.

fulltext also has a function specifically for abstracts, ft_abstract, but it does not support Entrez; see ?ft_abstract

citations

For citations you can use rcrossref:

library(rcrossref)
# pass in the DOIs from the previous search output
# you can request various citation formats, including bibtex
z <- cr_cn(res$entrez$data$doi, format = "bibtex")
z[[1]]
#> [1] "@article{Loreau_2019,\n\tdoi = {10.1002/ecy.2794},\n\turl = {https://doi.org/10.1002%2Fecy.2794},\n\tyear = 2019,\n\tmonth = {jul},\n\tpublisher = {Wiley},\n\tvolume = {100},\n\tnumber = {10},\n\tauthor = {Michel Loreau and Andy Hector},\n\ttitle = {Not even wrong: Comment by Loreau and Hector},\n\tjournal = {Ecology}\n}"
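
To get these into Zotero, the BibTeX strings can be written to a .bib file and imported from there. A minimal sketch in base R: the helper name write_citations is hypothetical, and z_demo is a shortened copy of the output above so that the block runs without a network call. As far as I know cr_cn also accepts format = "ris", but please check ?cr_cn to confirm.

```r
# Hypothetical helper: write a list or vector of citation strings (such as
# `z` from cr_cn above) to one file, skipping entries that are NULL or NA.
write_citations <- function(citations, path) {
  entries <- unlist(citations, use.names = FALSE)  # unlist() drops NULLs
  entries <- entries[!is.na(entries)]
  writeLines(entries, path, sep = "\n\n")          # blank line between entries
  invisible(length(entries))
}

# Shortened stand-in for z, so this runs offline
z_demo <- list(
  "@article{Loreau_2019,\n\tdoi = {10.1002/ecy.2794},\n\tyear = 2019,\n\tjournal = {Ecology}\n}",
  NULL
)
bib_path <- file.path(tempdir(), "references.bib")
n_written <- write_citations(z_demo, bib_path)
```

With real results you would call write_citations(z, "references.bib") and import that file into Zotero.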