A brief intro, since Scott just invited me into this group: I'm a PhD student at Stony Brook University studying plant-animal interactions. My computational work draws on high-performance computing in R, computer vision and network analysis (website).
Use Case Aim
Using tools from network ecology, we can begin to examine patterns of collaboration, specialization and compartmentalization within academic departments. By using the journals faculty members publish in, and those journals' subject categories, as a measure of similarity among faculty members, we can compare the relative specialization of each department and relate academic niche breadth to department size, location and other measures of group interaction.
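One way to make "academic niche breadth" concrete is the Shannon diversity index, a standard niche-breadth measure in ecology, computed over the journal categories a department publishes in. This is a suggestion rather than part of the plan above, and it assumes each publication carries a single journal category; the departments and categories below are made up. Pielou's evenness rescales the index to 0..1 so departments publishing in different numbers of categories can be compared.

```python
import math
from collections import Counter


def shannon_breadth(categories):
    """Shannon diversity H' = sum(p_i * ln(1/p_i)) over journal categories.

    Higher values mean a department's publications are spread across more
    categories, i.e. a broader academic niche."""
    counts = Counter(categories)
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())


def pielou_evenness(categories):
    """H' divided by ln(S), where S is the number of distinct categories.

    Returns 0.0 when fewer than two categories are present."""
    s = len(set(categories))
    return shannon_breadth(categories) / math.log(s) if s > 1 else 0.0


# Hypothetical departments: one journal category per publication.
departments = {
    "dept_A": ["ecology", "ecology", "evolution", "genetics"],
    "dept_B": ["ecology", "ecology", "ecology", "ecology"],
}
for name, cats in departments.items():
    print(name, round(shannon_breadth(cats), 3), round(pielou_evenness(cats), 3))
```

A fully specialized department (dept_B) scores zero; spreading publications over more categories raises the score.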
Approach
Get a list of journals with their subheadings and disciplines from Google to build similarity lists.
Get the names of the top US academic departments with a program in Ecology and Evolution.
Search the PubMed (Scopus?) archives for journal title and affiliation.
Parse the API results into metadata tables in R.
Perform network analysis on academic departments based on the similarity of their journal publications.
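The last step could start from a faculty-by-journal table. A minimal sketch, assuming we already have, for each faculty member, the set of journals they have published in (names below are hypothetical): Jaccard similarity on shared journals gives the edge weights of a faculty network, and the mean within-department similarity gives one rough specialization score. Jaccard is our choice of index here, not something fixed by the plan; for compartmentalization measures such as modularity, the edge list could be passed to a network package like igraph.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of journals two faculty members have in common: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(faculty_journals, min_weight=0.0):
    """Weighted edge list (u, v, w) for every faculty pair with similarity above min_weight."""
    edges = []
    for u, v in combinations(sorted(faculty_journals), 2):
        w = jaccard(faculty_journals[u], faculty_journals[v])
        if w > min_weight:
            edges.append((u, v, w))
    return edges


def mean_within_similarity(faculty_journals):
    """Mean pairwise Jaccard similarity within one department.

    Higher values suggest a more specialized (less diverse) department."""
    pairs = list(combinations(sorted(faculty_journals), 2))
    if not pairs:
        return 0.0
    total = sum(jaccard(faculty_journals[u], faculty_journals[v]) for u, v in pairs)
    return total / len(pairs)


# Hypothetical department: faculty member -> set of journals published in.
dept = {
    "smith": {"Ecology", "Oikos", "American Naturalist"},
    "jones": {"Ecology", "Oikos"},
    "lee": {"Evolution", "Molecular Ecology"},
}
print(similarity_edges(dept))
print(round(mean_within_similarity(dept), 3))
```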
You said you'd get this list from Google. Is that Google Scholar? I think you could get it via the Crossref API.
Is the affiliation here for an article, or for the journal? For articles whose full text we can get, we can scrape the metadata for affiliations. For some non-OA articles, this information may be in the metadata provided.
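Once affiliation strings are in hand, whether scraped from full text or taken from metadata, they are free text and still have to be mapped to departments. A minimal sketch using normalized keyword matching; the `targets` dictionary, its labels and keywords are all hypothetical, and real affiliation strings vary enough that the matches would need manual checking.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation/whitespace so matching is less brittle."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_affiliation(affiliation, targets):
    """Return the first target label whose keywords all appear in the affiliation.

    `targets` maps a department label to a list of required keywords
    (a hypothetical scheme). Keywords match on whole words, so "ecology"
    does not match inside "paleoecology". Returns None if nothing matches."""
    padded = f" {normalize(affiliation)} "
    for label, keywords in targets.items():
        if all(f" {normalize(k)} " in padded for k in keywords):
            return label
    return None


# Hypothetical target department and keywords.
targets = {"Stony Brook E&E": ["stony brook", "ecology", "evolution"]}

print(match_affiliation("Dept. of Ecology & Evolution, Stony Brook University, NY", targets))
print(match_affiliation("Department of Biology, Stony Brook University", targets))
```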
To get journals that publish ecology, you could try rcrossref, e.g.,
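Since rcrossref wraps the Crossref REST API, it may help to see the underlying request. A journal search goes to the `/journals` route with a free-text `query` parameter; `rows` caps the number of results and `offset` pages through them. This sketch only builds the URL (no network call), and it assumes the public endpoint https://api.crossref.org.

```python
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"


def journals_search_url(query, rows=20, offset=0):
    """Build a Crossref REST API URL that searches journal records by free text."""
    params = {"query": query, "rows": rows, "offset": offset}
    return f"{CROSSREF_API}/journals?{urlencode(params)}"


print(journals_search_url("ecology", rows=5))
```

Fetching that URL (e.g. with `urllib.request.urlopen`) returns JSON whose `message.items` list holds the matching journal records.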
With respect to fulltext, we are building a single-function interface for searching article metadata: see fulltext::ft_search(), which so far covers Crossref, anything available through rentrez, BMC, PLOS, and arXiv. More sources will be added.
For getting the actual full text where it's available, see ft_get(), which so far wraps access to PLOS, BMC, rentrez and eLife, with more in the works. Where only PDFs are available, see ft_extract(). We still have more work to do on the PDF workflow, since that's obviously more complicated than when XML is provided. But if you get a chance to try these functions, we'd love the feedback!
That way I could do a principal components analysis and plot a biplot to see how the descriptors load on each department. Just a thought so far.
By affiliation I mean the author's; thanks for the link! I'll have to look into PLOS. I was surprised to see that PubMed only keeps the affiliation of the lead author.
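The PCA idea can be sketched as follows, assuming a departments-by-descriptors table (the descriptor names and values below are hypothetical, e.g. faculty count, mean within-department similarity, number of journal categories). Descriptors are standardized first, since they are on different scales; the principal axes are the eigenvectors of the resulting correlation matrix, found here with Jacobi rotations so the sketch needs only the standard library. Scores place departments on the biplot, and loadings (eigenvector times the square root of its eigenvalue) are the descriptor arrows.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample variance."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(r, means, sds)] for r in rows]


def correlation_matrix(z):
    """Covariance of standardized data, i.e. the correlation matrix of the descriptors."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]


def jacobi_eigen(a, sweeps=50, tol=1e-12):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by decreasing eigenvalue; vectors[k] is the k-th eigenvector."""
    p = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (i, j) entry becomes zero.
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1))
                c = 1 / math.sqrt(t ** 2 + 1)
                s = t * c
                for k in range(p):  # columns i, j
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(p):  # rows i, j
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
                for k in range(p):  # accumulate eigenvectors
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i], v[k][j] = c * vki - s * vkj, s * vki + c * vkj
    order = sorted(range(p), key=lambda k: -a[k][k])
    return [a[k][k] for k in order], [[v[r][k] for r in range(p)] for k in order]


def pca(rows, k=2):
    """Eigenvalues, department scores and descriptor loadings for the first k components."""
    z = standardize(rows)
    values, vectors = jacobi_eigen(correlation_matrix(z))
    comps = vectors[:k]
    scores = [[sum(x * w for x, w in zip(r, comp)) for comp in comps] for r in z]
    loadings = [[comp[j] * math.sqrt(max(values[c], 0.0)) for c, comp in enumerate(comps)]
                for j in range(len(rows[0]))]
    return values, scores, loadings


# Hypothetical departments x descriptors: faculty count, mean similarity, journal categories.
descriptors = [[25, 0.20, 14], [12, 0.35, 8], [40, 0.15, 22], [18, 0.30, 10]]
values, scores, loadings = pca(descriptors)
print([round(v, 3) for v in values])
```

In R itself, `prcomp(x, scale. = TRUE)` followed by `biplot()` covers the same steps; the sketch just spells out what those functions compute.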
I’ll definitely try the fulltext functions and let you know!
Installation went well, but since I hadn't used rplos before, I needed to get an API key. Could this be mentioned in the fulltext README in the future, rather than only in the rplos dependency?
It looks like the PLOS API login (top right corner here) is down for the moment, so I haven't been able to follow your example yet.
Right now I'm looking for the Crossref guide to see if I can query specific fields, such as author == or affiliation ==, rather than running a text query that may match any field. I think that corresponds to the fields listed under 'filter names' here.
Still working through everything, but it all looks really promising.
Good point about the PLOS API key; I'll make that clearer. Sorry sign-in isn't working. In the meantime, put in any string as the key and it should work, but do get a real key once sign-in works again.
With respect to Crossref: Do you want to query on a specific field? Or get back specific fields? It sounds like you want the former. If so, that’s not supported yet, see https://github.com/CrossRef/rest-api-doc/issues/3
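Until field-specific queries are supported, filters can at least narrow a free-text `query` by structured metadata. A minimal sketch that builds a `/works` URL combining a free-text query with Crossref filters, written as `name:value` pairs separated by commas. `issn`, `from-pub-date` and `type` are Crossref filter names, but the ISSN below is a placeholder, and the available filters should be checked against the current API docs.

```python
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org"


def works_search_url(query, filters, rows=20):
    """Free-text query on /works, narrowed by Crossref filters.

    `filters` maps filter names to values and is joined as name:value pairs
    separated by commas; ':' and ',' are left unescaped for readability."""
    filter_str = ",".join(f"{name}:{value}" for name, value in filters.items())
    params = {"query": query, "filter": filter_str, "rows": rows}
    return f"{CROSSREF_API}/works?{urlencode(params, safe=':,')}"


url = works_search_url(
    "pollination network",
    {"issn": "1234-5678", "from-pub-date": "2010-01-01", "type": "journal-article"},  # placeholder ISSN
    rows=10,
)
print(url)
```

Note that the free-text part still matches across fields; the filters only restrict which records are eligible.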