Should drake have a pipe?


A recent feature request intrigued me, and I am thinking about adding a %dp% pipe operator to drake's new DSL. What do you think? Do you see yourself using it in your own workflows? Refs:


  • magrittr and related pipes are readable, elegant, and domain-specific.
  • In cases like this one and this one, %dp% can help write the same drake_plan() with less code and less effort.


  • Complexity, in the colloquial sense. drake is already super large and complicated, and a major goal of development is to downsize it and speed it up. Do we really need %dp%? Would enough people benefit? When it comes to drake, I have a history of getting overexcited about potential new features.
  • Speed. %dp% incurs a penalty (benchmarks, probably fixable) even if we do not use it (more benchmarks, not as bad).


This is perhaps a curmudgeonly reply and specific to my own hang-ups, but I think a pipe makes drake more “modern R” / “tidy” focused, while I feel it could benefit from being more general. As much as I appreciate drake, I’ve always found it challenging to define my workflows in this imperative or functional programming way (workflow-as-code). I’m more comfortable with workflow-as-data, which is more like Make, remake, or the YAML files CI systems use. I would love it if some of the work on the drake interface enabled more of that paradigm.


Maybe a thin layer like redrake (currently a stub) could support a retro interface. Nothing wrong with that. In fact, it could also compensate for issues like #527, which come about because drake focuses on the user’s environment rather than script files. In general, the former is more brittle and less predictable than the latter, and you could falsely invalidate targets if you are not careful.

However, that is not the direction I plan to take drake itself. When I created drake two years ago, I was deliberately trying to break away from the old paradigm. As a user, I strongly felt that Makefiles, remake.yml files, and language-agnostic command line tools were obstructing my relationship with R. Yes, we should save our work in files, but not at the expense of interactivity, flexibility, good clean function-oriented programming, or user-side control, and not in a way that requires us to patch together R code with non-R code. Yes, a workflow should be data, but it should be the kind of data best suited to R: a tidy in-memory data frame, preferably generated by more R code.
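To make the "workflow as a data frame" point concrete, here is a minimal sketch. It assumes drake is installed; the target names (raw, fit, coefs) and the use of mtcars are purely illustrative and not taken from any real project.

```r
library(drake)

# Illustrative plan: each argument is a target, each value is the command
# that builds it. The names raw, fit, and coefs are made up for this sketch.
plan <- drake_plan(
  raw   = mtcars,
  fit   = lm(mpg ~ wt, data = raw),
  coefs = coef(fit)
)

# The plan is an ordinary data frame with one row per target
# and `target` and `command` columns.
print(plan)
is.data.frame(plan)

# Because it is plain data, the usual data frame tools apply,
# for example dropping a target by subsetting rows.
smaller <- plan[plan$target != "coefs", ]
print(smaller$target)
```

Nothing is built here; the plan only describes the work. Passing it to make() is what would actually run the commands.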

In the crowded, mature, competitive space of sophisticated pipeline tools, drake's strongest and most unique asset is its domain-specificity. Unconventional? Absolutely. But I believe it embraces what it means to program in R.


Fair enough! An eloquent reply, and clearly one you had thought through before.


Hmm… there might actually be a way to do both (bottom of Just an idea, not sure if it makes sense yet.


In fact, in this proposal, both kinds of functionality can probably coexist.


Re configuration files: an experimental implementation is now in the development version (described here) and the plan is to send it to CRAN next week. @noamross, when I responded to you earlier, I assumed that a configuration file would require changing existing drake functions and going against the design philosophy. I no longer believe that to be the case; I think we can have the best of both worlds.

The new experimental callr-like API does not change drake's modern/tidy worldview, but it does make it more amenable to file-based workflows. The pipe, if we decide to merge it, can coexist.
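For readers who want to see what a file-based workflow could look like, here is a rough sketch. It assumes the callr-like API in question is the r_make() family that reads a `_drake.R` script ending in a call to drake_config(), and that both drake and callr are installed. The scratch directory and the plan contents are invented for illustration.

```r
library(drake)

# Work in a scratch directory so the sketch does not touch a real project.
demo_dir <- tempfile("drake-demo-")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# The workflow lives in a script file. By default r_make() looks for
# `_drake.R`, and the script should end with a call to drake_config().
writeLines(c(
  "library(drake)",
  "plan <- drake_plan(",
  "  raw = mtcars,",
  "  fit = lm(mpg ~ wt, data = raw)",
  ")",
  "drake_config(plan)"
), "_drake.R")

# Run the workflow in a fresh R session, so the result depends on the
# script file rather than on whatever is in the interactive environment.
r_make()

# Ask which targets still need to be built.
r_outdated()

setwd(old_wd)
```

The plan itself is still written with drake_plan(), so this kind of interface sits on top of the existing design rather than replacing it.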