Hi, thanks for re-posting here.
Some comments on your approach:
- You said you would get this list from Google. Is that Google Scholar? I think you could get this via the Crossref API.
- Is the affiliation here for an article, or for the journal? For articles that we can get full text for, we can scrape the affiliations from the metadata. For some non-OA articles, this info may still be in the metadata provided.
To get journals that do ecology, you could try rcrossref, e.g.,
library("rcrossref")
out <- cr_journals(query = "ecology")
out$meta
  total_results search_terms start_index items_per_page
1           143      ecology           0             20
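Note that the query matches 143 journals but only 20 come back per request. Here is a rough sketch of paging through the rest. It assumes cr_journals() takes limit and offset arguments (I believe it does, but check ?cr_journals), and the helper names page_offsets() and all_journals() are made up for this example.

```r
# Offsets needed to walk through `total` results, `per_page` at a time.
page_offsets <- function(total, per_page) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = per_page)
}

# Hypothetical helper: fetch every journal matching `query`.
# Assumes each page's $data has the same columns, so rbind() works.
all_journals <- function(query, per_page = 20) {
  first <- rcrossref::cr_journals(query = query, limit = per_page)
  total <- first$meta$total_results
  # The first page is already in hand, so skip offset 0.
  pages <- lapply(page_offsets(total, per_page)[-1], function(off) {
    rcrossref::cr_journals(query = query, limit = per_page, offset = off)$data
  })
  do.call(rbind, c(list(first$data), pages))
}
```

Then all_journals("ecology") should give you one data frame of all the matching journals, network permitting.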
For author affiliation, using the PLOS API you can dig into that data easily. Here’s a blog post I did with an example: http://recology.info/2014/12/rplos-pubs-country/
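As a rough idea of what that looks like with rplos: the sketch below asks for the author_affiliate field and then takes the last comma-separated piece of an affiliation as a guess at the country. The helper names are made up for this example, the country guess is a crude assumption that will be wrong for some affiliations, and you should check how multiple affiliations per article are separated in the returned data before splitting them.

```r
# Guess the country as the last comma-separated piece of an affiliation.
# This is a heuristic only; plenty of affiliations do not end in a country.
affiliation_country <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) == 0) return(NA_character_)
    trimws(p[length(p)])
  }, character(1))
}

# Hypothetical helper: fetch ids and affiliations for PLOS articles
# matching a query.
plos_affiliations <- function(query, limit = 10) {
  res <- rplos::searchplos(q = query,
                           fl = c("id", "author_affiliate"),
                           limit = limit)
  res$data
}
```

For example, affiliation_country("Department of Biology, Example University, Canada") gives "Canada" (that affiliation string is invented for illustration).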
With respect to fulltext, we are building a single function interface to searching for article metadata, see fulltext::ft_search(), which so far includes Crossref, anything available in rentrez, BMC, PLOS, and arXiv. More will be added.
For getting actual full text where available, see ft_get(), with wrappers so far for access to PLOS, BMC, rentrez, and eLife, with more in the works. Where only PDFs are available, see ft_extract(). We still have more work to do for the PDF workflow, since that's obviously a bit more complicated than if XML is provided. But if you get a chance to try these functions, we’d love the feedback!
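To give a feel for how these fit together, here is a minimal sketch of searching and then fetching full text. It assumes ft_search() takes query, from, and limit arguments and that ft_get() accepts the search result directly; the package is still changing, so check the help pages for the current signatures. The wrapper name fetch_plos_fulltext() is made up for this example.

```r
# Hypothetical wrapper: search PLOS for a query, then try to fetch
# full text for the hits.
fetch_plos_fulltext <- function(query, limit = 5) {
  hits <- fulltext::ft_search(query = query, from = "plos", limit = limit)
  fulltext::ft_get(hits)
}

# For a PDF you already have on disk, something like:
# txt <- fulltext::ft_extract("path/to/article.pdf")
```

So fetch_plos_fulltext("ecology") would be the one-liner to try first.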