drake relies on the R packages future and clustermq for all HPC, everything from local multicore processing to distributed computing on clusters. This allows drake to schedule and monitor targets at a high level without getting into the details of any one service in particular. If the k8s ecosystem has a resource manager (analogous to traditional schedulers like SLURM) that allows users to tunnel into workers, then it should be possible to build a layer of R code that sends jobs and receives in-memory data as output. With that in place, the next step is to wrap that layer into new backends for future and clustermq, and then drake will automatically be able to talk to k8s. I have recently been discussing new cloud backends with @HenrikBengtsson and Michael Schubert, and they might be open to this.
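To illustrate the division of labor, here is a minimal sketch of how drake hands scheduling off to a future backend. The plan contents are made up for illustration, but `drake_plan()`, `make(parallelism = "future")`, and `future::plan()` are real API; a hypothetical k8s backend would only need to supply a new `plan()` for this same code to run on a cluster.

```r
library(drake)
library(future)

# Pick a future backend. multisession = local parallel R processes;
# a k8s-aware backend would slot in here without changing anything below.
plan(multisession)

# A toy drake plan (illustrative targets only).
analysis_plan <- drake_plan(
  data    = mtcars,
  model   = lm(mpg ~ wt, data = data),
  summary = summary(model)
)

# drake resolves the DAG and dispatches targets through future.
make(analysis_plan, parallelism = "future", jobs = 2)
```

The point is that drake never needs to know what a worker is; it only asks future to run a target and hand back the result.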
Regarding kubeflow specifically, it depends on where it fits into the stack. If it operates as a top-level workflow automation tool with a DAG and everything, then it might be awkward to have it work in tandem with drake. But if kubeflow is more like SLURM or AWS Batch, then the direction I described above might be possible.
From my perspective, cloud integration is by far the greatest unmet need in R. Traditional HPC on private supercomputers is dying out, and data science is moving to AWS, Google Cloud, k8s, etc. R users need the simple ability to submit an in-memory job and get in-memory data back, with minimal config, minimal manual setup/teardown, and minimal cost. I believe this alone is worthy of several R Consortium grants.
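The "submit an in-memory job, get in-memory data back" round trip already exists locally via the future package; the unmet need is a cloud backend behind the same interface. A sketch of that round trip (local `multisession` stands in for a hypothetical k8s/cloud plan):

```r
library(future)
plan(multisession)  # swap in a cloud/k8s plan() here, once one exists

# The job runs in a separate R process (conceptually, a remote worker).
job <- future({
  sum(rnorm(1e6))
})

# The result returns as in-memory data: no manual file transfer,
# no setup/teardown scripts.
result <- value(job)
```

Everything k8s-specific would live inside `plan()`, which is what makes new backends such a leveraged investment.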
Short answer: unfortunately, a lot of infrastructure needs to be built up before drake can seamlessly interact with the k8s ecosystem.