Academic departments as networks: test case for package 'fulltext'

Hi, thanks for re-posting here :smile:

Some comments on your approach:

  1. You said you would get this list from Google. Is that Google Scholar? I think you could get it via the Crossref API.
  2. Is the affiliation here for an article, or for the journal? For articles that we can get full text for, we can scrape the metadata for affiliations. For some non-OA articles, this info may be in the metadata provided.

To find journals that cover ecology, you could try rcrossref, e.g.:

library("rcrossref")
out <- cr_journals(query = "ecology")
out$meta

  total_results search_terms start_index items_per_page
1           143      ecology           0             20

For author affiliation, using the PLOS API you can dig into that data easily. Here’s a blog post I did with an example: http://recology.info/2014/12/rplos-pubs-country/
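
As a rough sketch of what that could look like (not the code from the blog post): the snippet below asks PLOS for the `author_affiliate` field via `rplos::searchplos()` and tallies the affiliations. `count_affiliations()` is a hypothetical helper written for this example, not part of rplos. It assumes that multiple affiliations come back as one semicolon-separated string per article and that the results sit in `res$data`, so check that against your own output first.

```r
# Hypothetical helper (not part of rplos): split affiliation strings on ";"
# and count how often each affiliation occurs.
# Assumption: several affiliations arrive as one ";"-separated string per article.
count_affiliations <- function(x) {
  parts <- unlist(strsplit(x[!is.na(x)], ";", fixed = TRUE))
  parts <- trimws(parts)                      # drop padding around each entry
  sort(table(parts[nzchar(parts)]), decreasing = TRUE)
}

# Live query: needs rplos installed and network access, so it is guarded.
if (interactive() && requireNamespace("rplos", quietly = TRUE)) {
  res <- rplos::searchplos(q = "ecology",
                           fl = c("id", "author_affiliate"),
                           limit = 25)
  # Assumption: the results data.frame is in res$data
  print(head(count_affiliations(res$data$author_affiliate)))
}
```

The counts could then serve as a starting point for grouping articles by department or institution.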

With respect to fulltext, we are building a single-function interface for searching article metadata, see fulltext::ft_search(), which so far covers Crossref, anything available in rentrez, BMC, PLOS, and arXiv. More will be added.

For getting the actual full text where available, see ft_get(), with wrappers so far for access to PLOS, BMC, rentrez, and eLife, with more in the works. Where only PDFs are available, see ft_extract(). We still have more work to do on the PDF workflow, since that's obviously a bit more complicated than when XML is provided. But if you get a chance to try these functions, we’d love the feedback!