Hi Chris and Scott,
It’s great news, Chris, that you are working on an R package for the USPTO data. In terms of use cases, there is the issue of access to patent data for patent offices and university researchers in developing countries who do not have the resources to pay the high costs of commercial databases. In the patent analytics training for patent offices and researchers in developing countries that WIPO is running (and that I teach on), we have been teaching introductory analytics including R and rOpenSci packages (mainly rplos at the moment as an easy intro). The USPTO has recently moved to an open access service in JSON format, and I assume, Chris, this is what you have been working with?
In addition to the uses by patent offices and researchers in developing countries, the bibliometrics and scientometrics community makes a lot of use of patent data, ranging from basic technology trends to text and technology mining, policy analysis and econometrics. Top journals in that area are Scientometrics and Research Policy. So, in my view, Scott, there is a large research community out there who want to work with patent data but typically can’t afford it except in quite limited form. In my own work at Manchester University with the scientometrics team at the business school, I am pushing R as a means of easier access to patent data and for wrangling patent data for a range of analytics purposes.
I would mention that I have also been experimenting with creating some R packages for patent data (opsr for the European Patent Office Open Patent Services API) and the Lens patent database (which advocates open source and open access). I am presently purrring my way through the eternal nested lists of European Patent data with a view to eventual submission to rOpenSci, and am also some way along with the lensr package. In the case of WIPO training for developing countries, we use an early dev version of the opsr package to introduce the idea and will also shortly start demoing the lensr package. If a USPTO data package becomes available, it is pretty much a racing certainty that it would be used in the analytics training around the world.
The work the USPTO has been doing for the API is pretty cool… it seems well formatted (compared with the other data sources) and includes lots of possibilities for things like mapping inventor locations with leaflet, digging into the literature citations in patents and linking across to other APIs such as crossref, or text mining with tidy text mining, and so on. The PatentsView API is the main point of interest at the moment.
Apologies for being long-winded, but the key point is that there are a number of different communities who would, I think, make use of a package to access USPTO data in R. My own view on the rOpenSci or rOpenGov side of things is that patent data is fundamentally about science and technology, and the exciting aspects of patent data are what it has to tell us about trends and developments in science and technology. In my case, that involves linking biology and patent data (e.g. taxize, rgbif and rcrossref). That, however, is just my take on things. So, I would like to support Chris in this exciting idea.
Text mining patents for biodiversity
opsrdev pre-purrr approach to the eternal nested lists
lensr to access the Lens patent database (prior to brand new Release 4.2.1)