Using drake with Kubeflow

Hi rOpenSci community,

We are currently building a machine learning stack at our company.
One of the tools we plan to use heavily is Kubeflow. However, I am quite disappointed with its support for R.
Hence, I have been trying to figure out how R users could fit Kubeflow into their data science workflows.
One of Kubeflow's core features is KFP, i.e. building pipelines. I was wondering whether any of you have already tried to make drake communicate with Kubeflow, and if so, how, and how did it go?

If not, do you think this path is worth investigating and pursuing further?

I am looking forward to a lively discussion. Hopefully, some of you already have experience with this.

Thanks for the post! @wlandau, any thoughts?

drake relies on the R packages clustermq and future for all HPC: everything from local multicore processing to distributed computing on clusters. This allows drake to schedule and monitor targets at a high level without getting into the details of any one service in particular.

If the k8s ecosystem has a resource manager (analogous to traditional schedulers like SLURM) that allows users to tunnel into workers, then it should be possible to build a layer of R that sends jobs and receives in-memory data as output. With that in place, the next step is to wrap that layer into new backends for clustermq and future. Then, drake will automatically be able to talk to k8s. I have recently been discussing new cloud backends with @HenrikBengtsson and Michael Schubert, and they might be open to this.
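To make that division of labor concrete, here is a minimal sketch of what the current interface looks like (the toy plan is just mine for illustration); a hypothetical k8s backend would plug in at the same level as the two backends shown:

```r
library(drake)

toy_plan <- drake_plan(
  data  = data.frame(x = rnorm(100), y = rnorm(100)),
  model = lm(y ~ x, data = data)
)

# clustermq route: the scheduler is set via an option (or a template file
# for SLURM etc.), so drake never touches the service directly.
options(clustermq.scheduler = "multicore")
make(toy_plan, parallelism = "clustermq", jobs = 2)

# future route: the backend is chosen with future::plan(); a k8s backend
# would slot in here the same way. (The targets are already up to date
# after the first make(), so this call just shows the interface.)
future::plan(future::multisession)
make(toy_plan, parallelism = "future", jobs = 2)
```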

For Kubeflow specifically, it depends on where it fits into the stack. If it operates as a top-level workflow automation tool, with a DAG and everything, then it might be awkward to have it work in tandem with drake. But if Kubeflow is more like SLURM or AWS Batch, then the direction I described above might be possible.

From my perspective, cloud integration is by far the greatest unmet need in R. Traditional HPC on private supercomputers is dying out, and data science is moving to AWS, Google Cloud, k8s, etc. R users need the simple ability to submit an in-memory job and get in-memory data back, with minimal config, minimal manual setup/teardown, and minimal cost. I believe this alone is worthy of several R Consortium grants.
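For what it's worth, the future package already has the right shape for that interface; what's missing is a cloud backend behind it. A minimal sketch of the ergonomics I mean:

```r
library(future)
plan(multisession)  # imagine a cloud or k8s backend selected here instead

# Ship an in-memory job to a worker and get an in-memory result back,
# with no manual setup/teardown in between.
f <- future({
  fit <- lm(mpg ~ wt, data = mtcars)
  coef(fit)
})
value(f)
```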

Short answer: unfortunately, a lot of infrastructure needs to be built up before drake can seamlessly interact with the k8s ecosystem.

Thanks for the post, @wlandau, these are great insights. I basically agree with everything you said, especially your thoughts on the cloud integration of R.

My thoughts are probably a bit naive, but then again, I am not an infrastructure guy, so bear that in mind :wink:
Here is my line of thought. Kubeflow offers a pipelining tool, KFP. From my understanding and tinkering, it pretty much does what drake does, but in the k8s ecosystem. However, it relies on a Python SDK and YAML files, so it is quite a stretch to integrate into R. You could think about reticulate or some Docker images, but it feels cumbersome (see the sketch below). My idea was to kind of "translate" the drake pipeline into a KFP pipeline.
Again, maybe a very naive way of thinking.
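To illustrate why the reticulate route feels cumbersome, here is a rough sketch. It assumes the Python kfp package (v1 SDK) is installed, and pipeline_def.py / pipeline_fn are hypothetical names; the pipeline itself would still have to be written in Python:

```r
library(reticulate)

kfp <- import("kfp")

# The pipeline still has to be defined in Python, e.g. in a hypothetical
# pipeline_def.py that decorates pipeline_fn with @kfp.dsl.pipeline.
py_run_file("pipeline_def.py")

# Compile the Python-defined pipeline into the YAML that KFP consumes.
kfp$compiler$Compiler()$compile(py$pipeline_fn, "pipeline.yaml")
```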

Yeah, it sounds like the closest we could expect to get is an automated converter from a drake plan to the YAML for KFP.
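A very rough sketch of what such a converter could look like, assuming drake >= 7 (where a plan stores its commands as language objects) and the yaml package. The step schema below is deliberately simplified and is NOT real KFP YAML; a real converter would have to emit the actual KFP pipeline spec and wire up dependencies between steps:

```r
library(drake)
library(yaml)

# Turn each target into one container step that runs the target's command
# via Rscript. Illustrative output schema only.
plan_to_kfp_yaml <- function(plan, image = "rocker/tidyverse",
                             path = "pipeline.yaml") {
  steps <- lapply(seq_len(nrow(plan)), function(i) {
    cmd <- paste(deparse(plan$command[[i]]), collapse = " ")
    list(
      name    = plan$target[i],
      image   = image,
      command = list("Rscript", "-e", sprintf("%s <- %s", plan$target[i], cmd))
    )
  })
  write_yaml(list(steps = steps), path)
}

my_plan <- drake_plan(
  data  = read.csv(file_in("data.csv")),
  model = lm(y ~ x, data = data)
)
plan_to_kfp_yaml(my_plan)
```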

Incidentally, drake originally had something like this internally to turn plans into Makefiles. In other words, make(plan, parallelism = "Makefile") would create a Makefile based on the plan and then run it. Each Makefile recipe looked something like Rscript -e 'drake::run_drake_target("drake_target_name")', and there were hidden text files to translate drake's target invalidation rules into time stamps that GNU Make could understand. (That's why the name "drake" stands for "Data frames in R for Make".) The whole Makefile backend turned out to be super clunky. drake quickly outgrew it, and I removed it in version 7 in early 2019.

FYI, Christopher Paciorek at UC Berkeley has been working on using futures with Kubernetes. We had a meeting, and I've made some changes to future to make his life a bit easier. He said he'll refactor the documentation quite a bit, but you can already peek at it at https://github.com/paciorek/future-kubernetes.
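To connect that back to drake: once future can reach workers running in k8s pods, drake's existing future backend should work unchanged. A minimal sketch, using localhost workers as a stand-in for the pods that the future-kubernetes setup would launch:

```r
library(drake)
library(future)

# Local stand-in: with future-kubernetes, these workers would instead be
# pods launched inside the cluster.
cl <- makeClusterPSOCK(rep("localhost", 2))
plan(cluster, workers = cl)

sim_plan <- drake_plan(
  sims    = replicate(4, mean(rnorm(1e5))),
  overall = mean(sims)
)
make(sim_plan, parallelism = "future", jobs = 2)
```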

Thanks @HenrikBengtsson, that looks promising.
We are actually keen on contributing.