Google Drive/Docs Rmarkdown


#1

I just found this repo which contains a script to convert Google Docs to markdown, and it’s got me wondering whether Google Docs could be used somehow to go about Rmarkdown document editing differently, i.e. upstream of the existing RStudio/Rmarkdown framework.

The repo contains a Google Apps Script (essentially JavaScript). There are currently a few bugs in it: italicised/bold links don’t convert properly, which looks like an off-by-one error, but the debugging is pretty straightforward.
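
As a rough illustration of the styled-link case, here is a minimal sketch of one way to wrap a link before applying emphasis so the markers stay well formed. The run objects (`text`, `bold`, `italic`, `url`) are a hypothetical plain-object stand-in, not the Apps Script Document API, and this is not the fix used in any fork of the script.

```javascript
// Hypothetical input: an array of text runs with optional style flags.
// This is a plain-object stand-in, NOT the Apps Script Document API.
function runToMarkdown(run) {
  // Keep leading/trailing whitespace outside the markers, so "**bold **" never appears.
  const match = /^(\s*)([\s\S]*?)(\s*)$/.exec(run.text);
  const lead = match[1];
  const core = match[2];
  const trail = match[3];
  if (core === "") return run.text; // empty or whitespace-only run: nothing to style

  let out = core;
  if (run.url) out = "[" + out + "](" + run.url + ")"; // link first...
  if (run.italic) out = "*" + out + "*";               // ...then emphasis around it
  if (run.bold) out = "**" + out + "**";
  return lead + out + trail;
}

// Convert every run and concatenate the results into one markdown string.
function runsToMarkdown(runs) {
  return runs.map(runToMarkdown).join("");
}
```

Wrapping the link first and moving surrounding whitespace outside the markers avoids output such as `**[text ](url)**`, which many markdown renderers do not treat as bold.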

For those who follow it, the Jupyter (née IPython) project is working on Google Drive integration - see the jupyter/jupyter-drive repository for details.

Google Drive is potentially a far better environment for drafting research documents than starting directly in Rmarkdown from a text editor/RStudio: simple keyboard shortcuts such as Ctrl+K for hyperlink entry, the research panel, and more sophisticated features like change tracking.

When run, the script emails markdown as an attachment to the Gmail inbox associated with the Google account logged into Drive. Ideally, you could get the markdown output written to Drive instead, or failing that retrieve the attachment with twitteR (GitHub: geoffjentry/twitteR).

You could potentially then have a setup to preview changes to documents without actually having to edit Rmarkdown — the reproducible document would still be available.

One of my main complaints (and presumably other users’ experience) in writing in Rmarkdown is the cognitive overhead of looking at code rather than a WYSIWYG document. Basically, it could allow us to write more effectively, and also collaboratively (Google Docs are excellent for collaboration, they’re used for real time shared document editing at academic events sometimes).

I’m aware Microsoft Word, OpenOffice etc. have desktop publishing covered but Google Docs is a nice cross-platform, accessible and (currently) free program so could be worth considering or developing with.

I’d love to hear others’ thoughts on this :slight_smile:


Directions for rchie package / Google Docs
#2

Whoops, I meant gmailr there, not twitteR!

To update this thread, I looked through every fork of the above repo — only one seems to have fixed the basic functionality of italic/bold style handling, which I’ve now forked as gdocs2Rmd. Feel free to open an issue.


#3

Hi Louis,
I like the idea in general, but I’m having a bit of trouble following the workflow and the steps involved. Can you clarify whether I understand this correctly?

1. Users collaborate over a WYSIWYG Google Doc.
2. Then we convert to md using gdocs2Rmd.
3. This ends up getting pulled into an R session where it gets parsed.

Correct so far? Now what happens if I edit the document in R (say I’m fixing issues with code). Is there a way to send it back to the original doc, or do I send a rendered doc back to Google drive (like having a source and rendered file)?

Perhaps this could be integrated with Oliver Keyes’ Google drive package such that someone just calls gdocs2Rmd, which then programmatically retrieves the Google doc, converts to Rmd, parses Rmd, then pushes rendered doc back to the drive? I’m just spitballing here but the workflow could use a lot more refinement.
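
The retrieve, convert, parse and push-back steps floated above could be wired together roughly as follows. This is only a sketch: the four step functions (`fetchDoc`, `toRmd`, `render`, `pushRendered`) are hypothetical placeholders supplied by the caller, not existing functions of the Drive package, gdocs2Rmd or rmarkdown.

```javascript
// All four step functions are hypothetical placeholders passed in by the caller;
// none of them is an existing API of any package mentioned in this thread.
function roundTrip(docId, steps) {
  const doc = steps.fetchDoc(docId);                      // 1. retrieve the Google Doc
  const rmd = steps.toRmd(doc);                           // 2. convert it to Rmd text
  const rendered = steps.render(rmd);                     // 3. parse/knit the Rmd
  const uploadedId = steps.pushRendered(docId, rendered); // 4. push the rendered file back
  // Return both the source and the rendered output so neither overwrites the other.
  return { rmd: rmd, rendered: rendered, uploadedId: uploadedId };
}
```

Keeping the Rmd and the rendered output as separate results corresponds to the "source and rendered file" option; syncing edits back into the original doc would need an extra step that is not sketched here.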


#4

Totally like this idea! At the moment “driver” has download_file, which writes a specified file format to disc; is this satisfactory or would you prefer I built in an option based on httr’s streaming?


#5

I am using download_file() to write to a tempfile() and then reading
in, so streaming would make things slightly faster, but you’ll need a
file on disk in any case to use rmarkdown::pandoc_convert() and convert
the google doc to markdown.


#6

Glad to hear you like it!

@ Karthik yeah, it sounds like you follow me - in an ideal world, you’d end up with real-time changes in sync (in both directions) between Google Doc and the Rmarkdown, but (for now at least) the ‘triggers’ available for Apps Script are limited to time repeats, at most every hour.

This suggests to me there’s some other way to use this API; the google.script.run client-side API looks about right.

I’m sure it’s technically possible to set up updating with every edit, but I’m not clear on that so wouldn’t focus on it at this point. Note that every change of a Google Drive file is saved automatically (the document constantly displays “changes saved in Drive”) so it’s possible.

I see it converting to md, and then (I’m thinking of RStudio here) the changes are visible in real time, as when you edit a local file with a text editor. So there’d need to be some framework to update a local reference to the gdoc file. There’s no such thing as a local gdoc file to “retrieve” really: it’s just JSON containing a reference URL to the document, as seen by running cat on it and e.g. in the picture of my screen on this page, where I set up local ‘storage’ of gdocs in the same format they’re stored on Chrome OS.

Long story short, any interaction would be through web interfaces, not file storage. So the act of sending back to Drive would have to be through an API, either a good one in R or the one being worked on by Google Apps developers (which I think must surely be a good source of ready-to-roll code?). :smile:

So the Google Doc should contain some way of delineating code chunks as in Rmarkdown (as unobtrusive as possible but as I’ve noted there’s no code block).

The Code pretty GDocs add-on by Ian Kilpatrick (@bfgeek on GitHub/Twitter) interprets single-cell tables as code blocks for (implicit/assumptive) syntax highlighting - perhaps he could be persuaded to open source it and we could use it as a starting point to access single-cell tables to interpret them as code chunks.
By default, anything in such a single-cell table would be stuck inside a (non-chunk) code block:

```r
```

but then anything with some switch could go in an {r} knitr-usable chunk and with more switches the full scope of chunks could be accessed (including changing code language, eval=TRUE and all of that). Syntax highlighting would be the icing on the cake, and there are lots of Javascript libraries to do so (and lots of JS devs who might chip in their expertise).
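
To make the "switch" idea concrete, here is a minimal sketch that turns the text of a single-cell table into either a plain code block or a knitr chunk. The `#!` first-line convention is purely an assumption made for this example; it comes from neither Code pretty nor knitr.

```javascript
// Assumed convention (illustrative only): if the first line of the cell starts
// with "#!", the rest of that line becomes the knitr chunk header.
function cellToMarkdown(cellText) {
  const fence = "`".repeat(3); // three backticks, built this way to keep the example readable
  const lines = cellText.replace(/\r\n/g, "\n").split("\n");
  if (lines[0].startsWith("#!")) {
    const header = lines[0].slice(2).trim(); // e.g. "r eval=TRUE" or "python"
    const body = lines.slice(1).join("\n");
    return fence + "{" + header + "}\n" + body + "\n" + fence;
  }
  // No switch: emit a plain, non-evaluated code block.
  return fence + "\n" + lines.join("\n") + "\n" + fence;
}
```

A cell beginning with `#!r eval=TRUE` would come out as a chunk with the header `{r eval=TRUE}`, while a cell without the switch stays an ordinary fenced block.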

The YAML should be generated through UI rather than being visible in the document - it’s metadata. I’m sure there’ll be facilities to do so in the “UI” API or however Google’s implemented it (got a feeling I read they deprecated UI but would have to check up on it all).
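
Whatever UI ends up collecting the metadata, the last step would be serialising it into the YAML header at the top of the Rmd. A minimal sketch, assuming a flat object of string values (the function name and the example keys are illustrative, and nested options are not handled):

```javascript
// Build an Rmarkdown-style YAML header from a flat metadata object.
// Assumption: values are simple strings; nested output options are out of scope.
function toYamlHeader(meta) {
  const lines = ["---"];
  for (const key of Object.keys(meta)) {
    // JSON string quoting is also valid YAML double-quoted syntax for simple values.
    lines.push(key + ": " + JSON.stringify(String(meta[key])));
  }
  lines.push("---");
  return lines.join("\n") + "\n";
}
```

For example, `toYamlHeader({ title: "My report", output: "html_document" })` yields a three-line block between `---` markers that can be prepended to the converted markdown.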

I’ve written a Chrome extension before and it seems the same kind of setup - Javascript more or less, but with all their own commands and very tight but well-documented syntax, tons of examples because it’s an active area.

TeX doesn’t have much of a place here as I’m seeing it, but I’m sure it could find some way in at a later stage. Given all of that there’d be pretty good preservation of all of Rmarkdown’s features(?)

I’ll ask Ian if he’ll take a look at this thread and might consider open sourcing the add-on.

@ Oliver httr and oauth is probably how this would be handled… it’ll involve looking up how to access Google APIs and interpret the objects of the document (online), outputting a string with newlines to be saved to disk as Rmarkdown. Given the existing script I wouldn’t expect it to be too hard.

@ Noam like I say there’s no actual file - just a reference JSON with URL and id (the URL contains the ID so even that’s pretty redundant). The file on disk would be the Rmd, so perhaps we’re better off starting to think of the downloading as being to Rmd… conceptually it makes it a bit clearer to not be thinking about “downloading a Google doc” perhaps. The string that comes out of the online API is markdown - there’s not really anything else to talk about saving.
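
Since the URL already contains the ID, a local pointer file could be resolved with something like the sketch below. The `url` field name and the two URL shapes handled are assumptions for illustration, not a documented file format.

```javascript
// Assumption: the pointer file is JSON with a "url" field. The field name and the
// two URL shapes matched below are illustrative guesses, not a documented format.
function docIdFromPointer(jsonText) {
  const pointer = JSON.parse(jsonText);
  const url = String(pointer.url || "");
  const match =
    /\/d\/([A-Za-z0-9_-]+)/.exec(url) ||   // .../document/d/<id>/edit
    /[?&]id=([A-Za-z0-9_-]+)/.exec(url);   // ...?id=<id>
  return match ? match[1] : null;          // null when no ID can be found
}
```

The returned ID is what any Drive API call would need; nothing else in the pointer file has to be read.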

Edit: I’ve just been blocked from posting this comment twice due to the links…? Chrome webstore, GitHub and Twitter are all that are here. Removed them now… It’s also not letting me use more than 2 links / user mentions… or use an image in my post :confused: I vote to go back to GitHub issues or a Google group or something


#7

gisted my original reply with links/image


#8

I’ve fixed the issue…everything should be back


#9

Perfect! Then it’ll do what you want :). Still pre-alpha, mind; I need to get uploads working (anyone who wants to help work out why httr is choking at post + OAuth + upload_file(), drop me a line)


#10

and uploads are working! Time to write aaaall the unit tests (should have an initial release, w/vignette et al, Monday morning)


#11

I proposed a potential workflow with externalized code for working with Rmd and Google Docs some time ago. Perhaps it would be beneficial to this conversation:

http://bertelsen.ca/programming/google-docs-to-slidify-directly-from-r/


#12

The example was using slidify, but any Rmd document could work…


#13

We had a similar discussion recently, but just wanted Google Docs for change tracking, didn’t mind writing markdown. I then set up a quick way to check whether it renders right.


#14

This seems super helpful, thank you for the information. I’m a newbie, though, and can’t follow it all very well. Do you mind if I come back with some questions a bit later? I can see that you know these things much better than I do.