Academic departments as networks: test case for package 'fulltext'


#1

Hello all,

Brief intro, since Scott just invited me into this group: I'm a PhD student at Stony Brook University studying plant-animal interactions. My computational work uses high-performance R, computer vision and network analysis (website).

Use Case Aim

Using tools from network ecology, we can begin to examine patterns of collaboration, specialization and compartmentalization within academic departments. Using the journals faculty members publish in, and those journals' categories, as a measure of similarity among faculty, we can compare the relative specialization of each department and examine academic niche breadth as a function of department size, location and other measures of group interaction.

Approach

  1. Get a list of journals, with their subheadings and disciplines, from Google to build similarity lists.
  2. Get the names of the top academic departments in the US with a program in Ecology and Evolution.
  3. Search the PubMed (Scopus?) archives for journal titles and author affiliations.
  4. Parse the API results into R metadata.
  5. Perform network analysis on academic departments as a function of similarity in journal publications.
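Step 5 could start from something like the following base R sketch, which assumes the publication records have already been reduced to one row per (department, journal) pair. The department and journal names are made-up placeholders, and Jaccard similarity on shared journals is just one possible similarity metric, not a settled part of the plan.

```r
# Hypothetical toy records: one row per (department, journal) publication.
# All names are placeholders for illustration only.
pubs <- data.frame(
  department = c("Dept A", "Dept A", "Dept A", "Dept B", "Dept B", "Dept C"),
  journal    = c("Ecology", "Oikos", "Evolution", "Ecology", "Evolution", "Nature"),
  stringsAsFactors = FALSE
)

# Department x journal incidence matrix (1 = department has published there)
incidence <- (unclass(table(pubs$department, pubs$journal)) > 0) * 1

# Pairwise Jaccard similarity: journals in common / journals used by either
jaccard <- function(m) {
  shared <- m %*% t(m)                  # number of journals two departments share
  n      <- rowSums(m)                  # number of journals per department
  either <- outer(n, n, "+") - shared   # size of the union of their journal sets
  shared / either
}

sim <- jaccard(incidence)
round(sim, 2)
```

The resulting `sim` matrix could then be treated as a weighted adjacency matrix, for example with igraph's `graph_from_adjacency_matrix(sim, mode = "undirected", weighted = TRUE, diag = FALSE)`, to look at modularity or other network statistics.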

I made a Markdown document outlining the project on GitHub.

I'd love to hear how rOpenSci can help approach these questions with the upcoming fulltext package.


#2

Hi, thanks for re-posting here :smile:

Some comments on your Approach:

  1. You said you get this list from Google. Is that Google Scholar? I think you could get this via the Crossref API.
  2. Is the affiliation here for an article, or for the journal? For articles that we can get full text for, we can scrape the metadata for affiliations. For some non-OA articles, this information may be in the metadata provided.

To get journals that cover ecology, you could try rcrossref, e.g.,

```r
library("rcrossref")
out <- cr_journals(query = "ecology")
out$meta
#>   total_results search_terms start_index items_per_page
#> 1           143      ecology           0             20
```

For author affiliations, the PLOS API lets you dig into that data easily. Here's a blog post I did with an example: http://recology.info/2014/12/rplos-pubs-country/
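As a rough sketch of what tallying affiliations by country involves (and of step 4, turning affiliation strings into department metadata), the base R snippet below splits hypothetical affiliation strings on commas. It assumes the common "Department, Institution, City, Country" layout; the example strings and the first-field/last-field heuristic are assumptions, not necessarily how the PLOS API or the blog post structures the data.

```r
# Hypothetical affiliation strings in a "Department, Institution, City, Country" layout
affils <- c(
  "Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA",
  "Department of Zoology, University of Oxford, Oxford, United Kingdom",
  "Department of Biology, University of Washington, Seattle, WA, USA"
)

# Heuristic (assumption): the country is the text after the last comma...
country_of <- function(x) trimws(sub(".*,", "", x))
# ...and the department is the text before the first comma.
department_of <- function(x) trimws(sub(",.*", "", x))

countries   <- country_of(affils)
departments <- department_of(affils)

# Real affiliation data is messier (missing countries, postcodes, "U.S.A."),
# so treat this as a first pass only.
table(countries)
```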

With respect to fulltext, we are building a single-function interface for searching article metadata, fulltext::ft_search(), which so far covers Crossref, anything available through rentrez, BMC, PLOS, and arXiv. More sources will be added.

For getting the actual full text where available, see ft_get(), which so far wraps access to PLOS, BMC, rentrez and eLife, with more in the works. Where only PDFs are available, see ft_extract(). We still have more work to do on the PDF workflow, since that's obviously a bit more complicated than when XML is provided. But if you get a chance to try these functions, we'd love the feedback!


#3

Hi Scott,

Thanks for the thoughts!

  1. For the journal descriptions, I've been using the Google classification; see here, under section 3, and the link there as well.

That way I could do a principal components analysis and plot a biplot to see how the descriptors load on each department. Just a thought so far.

  2. By affiliation I mean the author's affiliation; thanks for the link! I'll have to look into PLOS. I was surprised to see that PubMed only keeps the affiliation of the lead author.
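The PCA/biplot idea from item 1 might look roughly like this in base R. The department-by-category counts are invented, and the category names are placeholders rather than the actual Google classification; converting counts to proportions first is one assumption about how to keep large departments from dominating the ordination just by size.

```r
# Hypothetical department x journal-category matrix: number of papers each
# department published in each category (counts invented for illustration)
counts <- matrix(
  c(12, 3, 5,  0,
     4, 9, 2,  1,
     7, 6, 8,  2,
     1, 2, 0, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Dept A", "Dept B", "Dept C", "Dept D"),
    c("Ecology", "Evolution", "Botany", "Entomology")
  )
)

# Row proportions: each department's publication profile across categories
props <- counts / rowSums(counts)

# PCA on the proportions; scale. = TRUE gives each category equal weight
pca <- prcomp(props, scale. = TRUE)
summary(pca)

# Biplot: departments as points, category loadings as arrows
biplot(pca)
```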

I’ll definitely try the fulltext functions and let you know!


#4

Thanks for the info. Great, do let us know how your fulltext experience goes.


#5

A couple of very small thoughts so far.

Installation went well, but since I had not previously used rplos, I needed to get an API key. Perhaps this could be mentioned in the fulltext README in the future, rather than only in the rplos dependency?

It looks like the PLOS API login (top right corner here) is down for the moment, so I haven't been able to follow your example yet.

Right now I'm looking for the guide on Crossref to see if I can query specific fields, such as author == or affiliation ==, rather than just a text query that may match any field. I think that corresponds to the fields under 'filter names' here.

Still working through everything, but it all looks really promising.


#6

Thanks for the feedback @bw4sz.

Good point about the PLOS API key; I'll make that clearer. Sorry the sign-in isn't working. In the meantime, put in any string as the key and it should work, but do get a real key once sign-in works again.

With respect to Crossref: do you want to query on a specific field, or get back specific fields? It sounds like you want the former. If so, that's not supported yet; see https://github.com/CrossRef/rest-api-doc/issues/3
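To illustrate the difference between a free-text query and filters, here is a minimal base R sketch that builds a Crossref `/works` request URL. The helper name `crossref_works_url()` is hypothetical; `from-pub-date` and `type` are Crossref filter names, and the `filter=name:value,name:value` syntax follows the Crossref REST API docs. Filters narrow results on structured metadata such as dates or work type; they do not restrict the `query` text to a single field like author or affiliation. (rcrossref's `cr_works()` also accepts a `filter` argument.)

```r
# Hypothetical helper: build a Crossref /works URL from a free-text query
# plus optional filters on structured metadata.
crossref_works_url <- function(query, filters = list(), rows = 20) {
  params <- c(query = query, rows = rows)
  if (length(filters) > 0) {
    # Crossref expects filter=name:value,name:value
    params <- c(params,
                filter = paste(names(filters), unlist(filters),
                               sep = ":", collapse = ","))
  }
  # Percent-encode each value (including ':' and ',') before joining
  values <- vapply(params, URLencode, "", reserved = TRUE)
  paste0("https://api.crossref.org/works?",
         paste(names(params), values, sep = "=", collapse = "&"))
}

url <- crossref_works_url(
  "ecology",
  filters = list(`from-pub-date` = "2014-01-01", type = "journal-article")
)
url
```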