Should drake have a pipe?

wlandau · February 22, 2019, 4:27am

A recent feature request intrigued me, and I am thinking about adding a %dp% pipe operator to drake's new DSL. What do you think? Do you see yourself using it in your own workflows? Refs:

#746
#748

Advantages

magrittr and related pipes are readable, elegant, and domain-specific.
In cases like this one and this one, %dp% can help write the same drake_plan() with less code and less effort.

Drawbacks

Complexity, in the colloquial sense. drake is already super large and complicated, and a major goal of development is to downsize it and speed it up. Do we really need %dp%? Would enough people benefit? When it comes to drake, I have a history of getting overexcited about potential new features.
Speed. %dp% incurs a penalty (benchmarks, probably fixable) even if we do not use it (more benchmarks, not as bad).

noamross · February 22, 2019, 12:49pm

This is perhaps a curmudgeonly reply and specific to my own hang-ups, but I think a pipe makes drake more “modern R” / “tidy” focused, while I feel it could benefit from being more general. As much as I appreciate drake, I’ve always found it challenging to define my workflows in this imperative or functional programming way (workflow-as-code). I’m more comfortable with workflow-as-data, which is more like Make, remake, or the YAML files CI systems use. I would love it if some of the work on the drake interface would enable more of that paradigm.

wlandau · February 22, 2019, 8:47pm

Maybe a thin layer like redrake (currently a stub) could support a retro interface. Nothing wrong with that. In fact, it could also compensate for issues like #527, which come about because drake focuses on the user’s environment rather than script files. In general, the former is more brittle and less predictable than the latter, and you could falsely invalidate targets if you are not careful.

However, that is not the direction I plan to take drake itself. When I created drake two years ago, I was deliberately trying to break away from the old paradigm. As a user, I strongly felt that Makefiles, remake.yml files, and language-agnostic command line tools were obstructing my relationship with R. Yes, we should save our work in files, but not at the expense of interactivity, flexibility, good clean function-oriented programming, or user-side control, and not in a way that requires us to patch together R code with non-R code. Yes, a workflow should be data, but it should be the kind of data best suited to R: a tidy in-memory data frame, preferably generated by more R code.

In the crowded, mature, competitive space of sophisticated pipeline tools, drake's strongest and most unique asset is its domain-specificity. Unconventional? Absolutely. But I believe it embraces what it means to program in R.

noamross · February 23, 2019, 11:50am

Fair enough! An eloquent reply, and one clearly well thought through beforehand.

wlandau · March 2, 2019, 3:39am

Hmm… there might actually be a way to do both (bottom of https://github.com/ropensci/drake/issues/761#issuecomment-468870818). Just an idea, not sure if it makes sense yet.

wlandau · March 2, 2019, 6:01pm

In fact, in this proposal, both kinds of functionality can probably coexist.

wlandau · March 3, 2019, 6:21pm

Re configuration files: an experimental implementation is now in the development version (described here) and the plan is to send it to CRAN next week. @noamross, when I responded to you earlier, I assumed that a configuration file would require changing existing drake functions and going against the design philosophy. I no longer believe that to be the case; I think we can have the best of both worlds.

The new experimental callr-like API does not change drake's modern/tidy worldview, but it does make it more amenable to file-based workflows. The pipe, if we decide to merge it, can coexist.

wlandau · March 19, 2019, 1:54pm

Regarding the original topic, I have decided to not add the pipe to drake. I feel it does not improve code readability, and the complexity and performance penalties do not seem worth it anyway. Still, reimplementing magrittr for drake was a fun learning exercise.
