My R Markdown project notebooks tend to be a bunch of Rmd files that include many exploratory graphs and models, and a lot of data wrangling to get the data into the appropriate form for each. I’ll often want to repeat the data-wrangling steps in a new Rmd before trying something else.
In the past I’ve copy-pasted some of the data-wrangling code into a new file, but it is easy to lose track of how two copies of the same task drift apart. Functionalizing or extracting the data-wrangling code, on the other hand, seems like it can be more effort than it’s worth.
So I created a source_rmd() function (below, and in a gist) that extracts the source code from an *.Rmd file and runs it, optionally setting a null graphics device to suppress plots. Now I run this function at the top of a notebook file to load the environment of a previous notebook and start off with wrangled data and models.
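To make the idea concrete, here is a minimal sketch of the two steps the function performs: extract the code with knitr::purl(), then source() it. The notebook contents and the `clean_data` name are made up for illustration, and the temporary Rmd stands in for a real earlier notebook.

```r
library(knitr)

# Hypothetical earlier notebook, written to a temp file for illustration.
# The chunk fence is built with strrep() to avoid literal triple backticks.
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "Some wrangling notes.",
  "",
  paste0(fence, "{r}"),
  "clean_data <- subset(mtcars, cyl == 4)",
  fence
), rmd)

# Step 1: extract only the R code from the Rmd into a plain script
script <- tempfile(fileext = ".R")
purl(rmd, output = script, quiet = TRUE)

# Step 2: run the script; by default source() evaluates in the global environment
source(script)

# The object defined in the notebook is now available here
nrow(clean_data)
```

With the function defined, the same effect should come from a single call such as source_rmd("earlier-notebook.Rmd") at the top of the new notebook, where the file name is again just a placeholder.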
Some questions:
- Am I just being lazy by not functionalizing my code? Is there a good rule of thumb for when to do so? I tend to think that it’s when you start wrangling more than one data set the same way.
- Is there a better approach to workflow within a project?
- Is this implemented more robustly elsewhere? Any thoughts on improving it?
- I thought about something that would select chunks to run, but decided that this was just the level of specificity at which functionalizing the code and putting it in an external R/ folder makes sense.
- Is there a good place for this function to live, where I might submit a PR?
#' Source the R code from a knitr file, optionally skipping plots
#'
#' @param file the knitr file to source
#' @param skip_plots whether to suppress plots. If TRUE (default), sets a null graphics device while the code runs
#'
#' @return This function is called for its side effects
#' @export
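source_rmd = function(file, skip_plots = TRUE) {
  # Extract the R code from the knitr file into a temporary script
  temp = tempfile(fileext = ".R")
  on.exit(unlink(temp), add = TRUE)
  knitr::purl(file, output = temp)

  if (skip_plots) {
    # Swap in a null graphics device; on.exit() restores the previous
    # device option even if source() stops with an error
    old_dev = getOption('device')
    on.exit(options(device = old_dev), add = TRUE)
    options(device = function(...) {
      .Call("R_GD_nullDevice", PACKAGE = "grDevices")
    })
  }

  source(temp)
  invisible(NULL)
}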