R package to host WebPlotDigitizer locally and allow live interaction with data?

Hello rOpenSci,

While I was in graduate school, I developed a web-based tool called WebPlotDigitizer (http://arohatgi.info/WebPlotDigitizer) for extracting numerical data from images of plots. I have continued to develop it in my spare time, and it is now a fairly popular tool with over 1000 hits per day; it has also been mentioned in many published works, blogs, etc. The Google Chrome App (which is really just a link to the URL that loads the app) shows a little over 9000 users, and that accounts for only a fraction of total hits.

Over time, many users have shown interest in an offline version, and I have tried to cobble together something for Windows users (embedded Chrome + PHP from the PHP Desktop project), but it does not do the tool justice. Another common request is to be able to work with all the datasets together and even perform simple operations or analysis on the digitized data. Some users also want to be able to go through hundreds of images that might have been plotted on the same axes.

I was thinking that many of the above requests could be partially addressed if there were an R package that could locally host the HTML5 code (Rhttpd or Rook maybe?) and also let users access the image and digitized data in real time via WebSocket communication (httpuv maybe?).

The WPD code is quite simple (vanilla HTML5), with all the image processing done on the user’s side and not on the server (but it does need to be hosted on an HTTP server, mainly to satisfy some same-origin policies in HTML5). It also has a couple of very simple (easily replaceable) PHP scripts that help users download the data as CSV or JSON files.
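As a rough illustration of the local hosting idea, here is a minimal sketch of a request handler in the shape that httpuv expects (a list with a `call` function that returns `status`, `headers` and `body`). The helper name `make_static_app`, the `index.html` default and the small content-type table are assumptions made for this example; they are not taken from the WPD code.

```r
# Minimal sketch: serve files from a local directory through an
# httpuv-style app. `make_static_app` is a hypothetical helper name.
content_types <- c(
  html = "text/html",
  js   = "application/javascript",
  css  = "text/css",
  png  = "image/png",
  json = "application/json"
)

not_found <- function() {
  list(status = 404L,
       headers = list("Content-Type" = "text/plain"),
       body = "Not found")
}

make_static_app <- function(root) {
  root <- normalizePath(root, winslash = "/", mustWork = TRUE)
  list(
    call = function(req) {
      path <- req$PATH_INFO
      # Assumption: the app's entry page is called index.html
      if (is.null(path) || path == "/") path <- "/index.html"
      file <- normalizePath(file.path(root, path), winslash = "/",
                            mustWork = FALSE)
      # Refuse anything that resolves outside the hosted directory
      inside <- startsWith(file, paste0(root, "/"))
      if (!inside || !file.exists(file) || dir.exists(file)) {
        return(not_found())
      }
      ext <- tolower(tools::file_ext(file))
      type <- if (ext %in% names(content_types)) {
        content_types[[ext]]
      } else {
        "application/octet-stream"
      }
      list(status = 200L,
           headers = list("Content-Type" = type),
           body = readBin(file, "raw", n = file.size(file)))
    }
  )
}

# To actually serve a directory (blocks the R session until interrupted):
# httpuv::runServer("127.0.0.1", 8000, make_static_app("path/to/wpd"))
```

Because the handler is a plain function, it can be exercised without starting a server by passing it a list with a `PATH_INFO` element.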

So I am thinking of a package with an API that goes something like the following (assuming appropriate support in WPD code):

library('WPD-R') # Load the package

wpd.open('plot.png') # Opens WPD in the browser with plot.png loaded

datasets <- wpd.getDatasets() # Fetch digitized data as a list of data frames

wpd.loadJSON('myjson.json') # Load a JSON file into WPD that contains calibration, dataset info etc.

wpd.loadMask('mask.png') # Load a mask image that defines the region of interest

# and so on...
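To make the "list of data frames" idea concrete, here is a small sketch of the conversion step that something like `wpd.getDatasets()` might perform. The input structure (a list of datasets, each with a `name` and a list of `x`/`y` points) and the helper name `datasets_to_frames` are purely hypothetical; the real WPD export format may look different.

```r
# Hypothetical shape of digitized data after parsing: one entry per
# dataset, each holding a name and a list of (x, y) points.
raw_datasets <- list(
  list(name = "Series A",
       points = list(list(x = 0, y = 1.2), list(x = 1, y = 2.3))),
  list(name = "Series B",
       points = list(list(x = 0, y = 0.5)))
)

# Convert that structure into a named list of data frames.
datasets_to_frames <- function(datasets) {
  frames <- lapply(datasets, function(ds) {
    data.frame(
      x = vapply(ds$points, function(p) as.numeric(p$x), numeric(1)),
      y = vapply(ds$points, function(p) as.numeric(p$y), numeric(1))
    )
  })
  names(frames) <- vapply(datasets, function(ds) ds$name, character(1))
  frames
}

frames <- datasets_to_frames(raw_datasets)
```

Each element of `frames` is then an ordinary data frame, so the usual tools (`summary()`, `plot()`, `lm()` and so on) can be applied directly.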

I wanted to get some feedback on this to see if it would be genuinely useful to the R community. I was also wondering if rOpenSci could offer any development time, funding or even just some general guidance to help create such a package. I work on WPD only in my spare hours, so I would really appreciate any kind of help. Also, since I am a very new R user, I am not too familiar with all the packages that are available, so even initial pointers would be nice. I have hacked around a bit with httpuv, so I know that it is certainly possible to do this. I do have some experience building WebSocket-based apps in other programming environments.

Besides plot digitization, WPD can also do some basic distance and angle measurements that can be used for microscope images. As a stretch goal, I would like to see this develop into something that is used for scientific image analysis in general (like a mini ImageJ perhaps?), but that would require many man-hours of work.

Thanks,
Ankit

This looks like a great tool. I personally think it would be great to have programmatic access to a digitization tool in R.

We can definitely provide some guidance. If you or others do work on a package and want to submit it to our suite, you can do so through ropensci/software-review on GitHub (rOpenSci Software Peer Review), an entirely GitHub-based onboarding/review process.

P.S. There is one other R package I know of that does data extraction: tpoisot/digitize on GitHub (an R package to extract data from scatterplots).

Hello Scott,

Thank you for the feedback. I will take a look at the onboarding documents and start putting together some initial code. Is there a place where I can ask technical questions (e.g. whether to use Rook or Rhttpd directly)?

Also, is the digitization package that you have pointed to still in development? It appears that it has been neglected for quite some time.

Cheers,
Ankit

You can ask here; we do have a growing community on this forum. Though the 'r' tag on Stack Overflow is probably the first place to look for answers, since there are almost 90K R questions there.

I think digitize is not actively developed anymore. For example, it is archived on CRAN (see /src/contrib/Archive/digitize).

Thanks Scott.

I have started hacking around with some code here: https://github.com/ankitrohatgi/wpd-R

At the moment, it seems like httpuv alone might have everything that I need. I am starting with just a simple R script for now. Once I get the existing app working (including the CSV and JSON exports), I will start turning this into a package. I will have to add some code for WebSocket communication to WPD as well.
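For the WebSocket side, a minimal sketch of what the R end might look like with httpuv is shown below. It relies on httpuv's `onWSOpen` callback and the `ws$onMessage()` / `ws$send()` methods. The message format (plain CSV text sent by the browser), the `state` environment and the reply string are assumptions made for this example, since WPD does not yet send anything over a socket.

```r
# Shared place to keep the most recent data pushed from the browser.
# `state` and `latest` are hypothetical names for this sketch.
state <- new.env()

app <- list(
  # Plain HTTP requests get a placeholder response in this sketch
  call = function(req) {
    list(status = 200L,
         headers = list("Content-Type" = "text/plain"),
         body = "WPD bridge is running")
  },
  # Called by httpuv once per WebSocket connection
  onWSOpen = function(ws) {
    ws$onMessage(function(binary, message) {
      if (binary) return(invisible(NULL))  # ignore binary frames here
      # Assumption: the browser sends the digitized points as CSV text
      state$latest <- utils::read.csv(text = message)
      ws$send(sprintf("received %d rows", nrow(state$latest)))
    })
  }
)

# To run it in the background of an interactive session:
# server <- httpuv::startServer("127.0.0.1", 8000, app)
# ... later: httpuv::stopServer(server)
```

With `startServer()` the R prompt stays usable, so `state$latest` can be inspected from the console while the browser keeps sending updates.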

Cheers,
Ankit

Great, looking good already. Ping us if you need any help.

Scott

I made some decent progress and my little script does a pretty good job of hosting the app and is also able to handle all the things I was using PHP for (POST requests for generating files for download). The next step would be to get the WebSocket communication going.

I did run into a small issue which I couldn’t find a clear answer to by just Googling:

The app sends CSV or JSON data for download via a POST request to the server, which is picked up by the httpuv app’s call method. The method then responds with an attachment content-disposition header, resulting in a file download. When decoding the POST data, I noticed that only Rook’s Utils$unescape seems to get it right; decodeURIComponent was leaving a + sign wherever there should be a space, for some reason. So now my script depends on both httpuv and Rook. I would really like to cut down on the dependencies, so I was wondering if there’s something better I can do. I don’t mind manually regexing out the + if that is the best way to handle this. Has anyone here run into this problem? Is there some resource I can read that explains how I should handle the POST data?
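For what it is worth, the + signs are expected: HTML forms are posted as application/x-www-form-urlencoded, which encodes a space as + and a literal plus as %2B, while plain percent-decoding leaves + untouched. A base-R-only sketch of the decoding and the download response might look like the following. The field names `filename` and `data` and the helper names are assumptions for this example, and reading the body via `req$rook.input$read_lines()` follows the Rook-style request object that httpuv passes to `call`.

```r
# Decode one form-encoded value: turn "+" into a space first, then
# percent-decode (a literal plus arrives as "%2B", so order matters).
decode_form_value <- function(x) {
  unname(utils::URLdecode(gsub("+", " ", x, fixed = TRUE)))
}

# Split a form-encoded body ("a=1&b=2") into a named list of values.
parse_form_body <- function(body) {
  pairs <- strsplit(body, "&", fixed = TRUE)[[1]]
  pairs <- pairs[nzchar(pairs)]
  keys <- vapply(pairs, function(p) {
    decode_form_value(sub("=.*$", "", p))
  }, character(1), USE.NAMES = FALSE)
  values <- vapply(pairs, function(p) {
    if (grepl("=", p, fixed = TRUE)) {
      decode_form_value(sub("^[^=]*=", "", p))
    } else {
      ""
    }
  }, character(1), USE.NAMES = FALSE)
  stats::setNames(as.list(values), keys)
}

# Build a response that the browser treats as a file download.
download_response <- function(content, filename, type = "text/csv") {
  list(
    status = 200L,
    headers = list(
      "Content-Type" = type,
      "Content-Disposition" = sprintf('attachment; filename="%s"', filename)
    ),
    body = content
  )
}

# Hypothetical handler: assumes the form posts `filename` and `data`.
handle_download <- function(req) {
  body <- paste(req$rook.input$read_lines(), collapse = "\n")
  form <- parse_form_body(body)
  download_response(form$data, form$filename)
}
```

This keeps the dependency list to httpuv plus base R; `utils::URLdecode()` does the percent-decoding and the `gsub()` call handles the form-specific + convention.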

Also, I was wondering if there’s any embedded browser control that I can use with R. Kicking the browser open works well, but it would be nice if I could just open a simple window (like the plot window maybe). An embedded Chrome/WebKit-based widget would be fantastic!

I think I am at a good point to start reading about how R packages are created and documented. I have ordered a copy of Hadley Wickham’s book, as I realized I was spending way too much time googling questions that are all answered in the book. Looking at this week’s progress, I think I can get a decent package out soon.

There’s also:

  • URLdecode() in base R
  • RCurl::curlUnescape()
  • curl::curl_unescape()

Within RStudio there is the Viewer tab in which you can open interactive views of tables/plots/etc.
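A common pattern for this is to check the `viewer` option, which RStudio sets to a function, and fall back to the system browser elsewhere. The sketch below wraps that in a hypothetical helper `open_app`; the URL and port are placeholders for wherever the app is being served locally.

```r
# Open a locally hosted app in the RStudio Viewer pane when available,
# otherwise in the default browser. `open_app` is a hypothetical helper.
open_app <- function(url,
                     viewer = getOption("viewer"),
                     fallback = utils::browseURL) {
  if (is.function(viewer)) {
    viewer(url)    # inside RStudio: show in the Viewer pane
  } else {
    fallback(url)  # plain R session: open the system browser
  }
  invisible(url)
}

# Example (assumes something is already serving on this address):
# open_app("http://127.0.0.1:8000")
```

Taking the viewer and the fallback as arguments is only there to make the helper easy to try out without opening any windows.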

That’s great. Package development is far easier now with Hadley’s tools.

Thanks! I tried URLdecode(), but not the others. I will try it out this weekend.

Here is my first attempt at a working package: https://github.com/ankitrohatgi/digitizeR

I decided to call it digitizeR. At the moment, it only launches a fully functional WPD (with backend support). Real-time interaction from R is something I can work on next.

EDIT: I need to clean up the documentation, description etc. But this should work otherwise.

Looks great @ankitrohatgi

Tried it out, works great!