I’ve begun work on the rchie package, which parses the NYT’s new ArchieML format.
The idea of ArchieML is to let you include structured data in otherwise unstructured text documents. The main use case it seems to have been developed for is collaboration with non-technical writers. An example workflow: you want to create a graphic to accompany an article, so you collaborate on the article in Google Docs, and the article includes some hierarchical lists. When you parse the article with rchie::from_archie(), you get list objects you can use to make a figure with DiagrammeR.
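To make the idea concrete, here is a minimal base R sketch of the simplest ArchieML feature: `key: value` lines embedded in free text. `parse_key_values()` is a hypothetical helper written only for illustration. It is not how rchie is implemented, and it ignores the rest of the format (nested objects, arrays, multi-line values).

```r
# Toy parser for "key: value" lines; every other line is treated as
# ordinary prose and skipped. Hypothetical helper, not part of rchie.
parse_key_values <- function(text) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  # A key is letters, digits, underscores or hyphens, followed by a colon.
  pattern <- "^\\s*([A-Za-z0-9_-]+)\\s*:\\s*(.*)$"
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- hits[lengths(hits) == 3]  # keep only lines that matched
  values <- lapply(hits, function(m) trimws(m[3]))
  names(values) <- vapply(hits, function(m) m[2], character(1))
  values
}

doc <- "This sentence is ordinary prose and is skipped.
title: Wolves return to the valley
Another line of free text.
author: A. Writer"

parsed <- parse_key_values(doc)
str(parsed)
```

The result is a plain named list, which is the kind of object the rest of an R workflow (plotting, templating) can consume directly.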
I’d like rchie to work well with Google Docs. The question is: should I build a Google Drive API wrapper into the package? RGoogleDocs hasn’t been updated in several years and will no longer work after Google deprecates their old API in April. There’s also a nascent RGoogleDrive on GitHub, but it doesn’t look like it’s being actively developed and doesn’t do much of what I’d need. Thoughts? (Note this relates to @Louis’s question about Google Drive.)
Other things I’m trying with rchie:
Pulling ArchieML in from Word docs, using rmarkdown::pandoc_convert
Extracting ArchieML from the text portions of Rmd files, and making this available to chunks of those files when knitting.
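Both ideas come down to getting plain text out of the source file before parsing it. The sketch below is one possible shape for that step, not rchie code. `docx_to_text()` and `rmd_prose()` are hypothetical helper names; `docx_to_text()` assumes the rmarkdown package and a pandoc binary are installed, and `rmd_prose()` assumes chunks are delimited by triple-backtick fences and does not treat the YAML header specially.

```r
# Hypothetical helper: convert a Word file to plain text via pandoc.
# Assumes rmarkdown and pandoc are available on this machine.
docx_to_text <- function(docx) {
  stopifnot(file.exists(docx))
  out <- tempfile(fileext = ".txt")
  rmarkdown::pandoc_convert(normalizePath(docx), to = "plain", output = out)
  readLines(out, warn = FALSE, encoding = "UTF-8")
}

# Hypothetical helper: keep only the prose lines of an Rmd file,
# dropping fenced code chunks and the fence lines themselves.
rmd_prose <- function(lines) {
  fence <- grepl("^\\s*```", lines)
  # A line is inside a chunk when an odd number of fences
  # has been seen up to and including that line.
  inside <- cumsum(fence) %% 2 == 1
  lines[!inside & !fence]
}

rmd <- c(
  "headline: Rainfall by region",
  "",
  "```{r}",
  "x <- c(a = 1)  # code, not ArchieML",
  "```",
  "",
  "source: Field survey"
)

prose <- rmd_prose(rmd)
print(prose)
```

The surviving lines could then be pasted together and handed to the ArchieML parser, so that `a = 1` style code is never mistaken for document data.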
Let me know if there are features or use-cases I should think about.
I have a use-case demo for rchie up as a Shiny App. It shows how an R markdown doc can pull in dynamic text as ArchieML data from a Google Doc (using @Ironholds’ driver). I throw in something with gspreadr, for fun, too.
I dropped the gspreadr component because, for some reason, pinging the Google spreadsheet was very slow. The app and Google Doc now also show how you can have structured numeric data in ArchieML.
I’ve been thinking about the “mixed workflow” approach that I demo’d with the Shiny app. The main purpose of developing the mixed workflow approach, I think, is to enable collaboration between someone in a word processing / spreadsheet environment and someone in a text/R/git environment. Both can use their preferred approach, and you get a final product that’s a scientific paper or interactive web page.
There are a bunch of things one could do to improve this. One is to use the driver and git2r packages to import the version history of a Google Doc into a working git repository. Another is to improve the Shiny app, possibly using htmlwidgets so that text is imported into the final product live, and/or embedding a Google Doc directly into the Shiny app, so that both editing and output can be seen in one place.
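As a rough sketch of the version-history idea, the code below replays a set of document revisions as git commits with git2r. Fetching the revisions from Google is left out entirely: the `revisions` data frame, its contents, the `replay_revisions()` helper and the `article.aml` file name are all made up for illustration. Only the git2r calls (`init()`, `config()`, `add()`, `commit()`, `commits()`) are real, and the block assumes git2r is installed.

```r
# Hypothetical input: one row per saved revision, oldest first.
revisions <- data.frame(
  saved = c("revision 1", "revision 2"),
  text  = c("title: Draft", "title: Final draft"),
  stringsAsFactors = FALSE
)

# Write each revision to the same file and commit it in order.
replay_revisions <- function(revisions, repo_dir, file = "article.aml") {
  dir.create(repo_dir, showWarnings = FALSE, recursive = TRUE)
  repo <- git2r::init(repo_dir)
  # A committer identity is required; these values are placeholders.
  git2r::config(repo, user.name = "Doc importer",
                user.email = "importer@example.org")
  for (i in seq_len(nrow(revisions))) {
    writeLines(revisions$text[i], file.path(repo_dir, file))
    git2r::add(repo, file)
    git2r::commit(repo, message = paste("Google Doc", revisions$saved[i]))
  }
  repo
}

if (requireNamespace("git2r", quietly = TRUE)) {
  repo <- replay_revisions(revisions, tempfile("doc-history-"))
  n_commits <- length(git2r::commits(repo))
  print(n_commits)
}
```

Two limitations of this sketch: the commits carry the import time rather than the time the revision was saved, and two identical consecutive revisions would make `commit()` fail because nothing has changed.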
In what cases would the “mixed workflow” / ArchieML approach have an advantage? Right now I think the advantage is in (1) outputs more complex than *.Rmd documents, like Shiny apps and web pages with complex layouts, and (2) version control, because I think version control in Google Docs/Dropbox and in git have distinct advantages (ease and power, respectively) that can be combined with this approach.
The reason I put together RGoogleDocs in the first place was solely to be able to pull a shared Google Doc that a number of people had edited into R for compilation as markdown.
I did this with the specific intent of using slidify (this was before rmarkdown was a viable option).
The solution hinged on downloading HTML from Google Docs and relying on the document headings to make decisions on formatting, as well as a Python script that converts HTML to text. Presumably one could use pandoc, but at the time I wasn’t aware of it. Here’s an example of my workflow; it’s not very complicated and could definitely be adjusted from slidify to whatever package you’re working with (rmarkdown).
With that said, the RGoogleDocs package isn’t being developed because it already did exactly what I needed it to do. If you have something else you would like to see, post it as an issue on the GitHub page and I’ll see if I can work it into my schedule over the coming weeks.