Any interest in an R package that interfaces with DVC?


Hi all,

I started putting together a little R package that interfaces with DVC. You can check it out here: GitHub - andrewcstewart/dvc-r: R Package for Data Version Control (DVC)

It currently implements only a limited set of DVC functionality: project setup and data tracking (along with remotes, push, pull, etc).

One of the primary motives behind this particular package was that I’ve been wanting to run through setting up an R package that 1) wraps a Python package, 2) wraps a CLI, 3) uses the {testthat} package to unit test that interface, and 4) uses GitHub Actions to build and test the whole thing. DVC fits that bill well, and I use its data tracking features from within R often enough that it made a nice little project.
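To give a rough idea of points 2 and 3, here is a minimal sketch of wrapping a CLI from R and unit testing the wrapper. This is not the package's actual code: `dvc_args()` and `run_dvc()` are hypothetical names made up for illustration, and the sketch assumes the `dvc` executable is on the PATH. Only base R's `system2()` and `Sys.which()` and the {testthat} functions are real APIs here.

```r
# Hypothetical helper (not from the package): build the argument
# vector for a dvc subcommand, e.g. dvc_args("remote", "add", "origin").
dvc_args <- function(subcommand, ...) {
  c(subcommand, as.character(c(...)))
}

# Hypothetical wrapper (not from the package): run the dvc CLI and
# return its exit status. Arguments are passed through unquoted, so
# paths containing spaces would need extra handling.
run_dvc <- function(subcommand, ..., dvc_bin = "dvc") {
  if (!nzchar(Sys.which(dvc_bin))) {
    stop("dvc executable not found on PATH")
  }
  system2(dvc_bin, args = dvc_args(subcommand, ...))
}

# Unit test for the argument builder; it does not need dvc installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("dvc_args builds the expected argument vector", {
    testthat::expect_equal(
      dvc_args("remote", "add", "origin"),
      c("remote", "add", "origin")
    )
  })
}
```

Keeping the argument building separate from the `system2()` call means most of the interface can be tested without DVC installed, which also keeps the CI job simple.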

From here, I’m curious whether there is any interest in developing the package further to cover other aspects of DVC. Otherwise I may just leave it as is, but if enough folks are interested, it could make for a fun project going forward.

Anyway, any thoughts or feedback welcome!



Thanks for sharing your project here @andrewcstewart

Yeah, it’s a nice package that brings together lots of tools and skills that are useful to have, for sure. I don’t have time to contribute myself, but I imagine there’ll be some interest in the community.


I see a lot of value for projects that use targets. targets avoids implementing a version control system of its own, but I tried to design the data store to be lightweight and simple enough for third-party data versioning tools to get involved. This prototype shows how continuous deployment to a remote Git branch can version historical changes to pipeline output, but it only works for data small enough for GitHub / Git LFS. dvc_pull() and dvc_push() could really help scale the size of the data. Related threads here and here.


Awesome. I’ve been wanting to check out targets in more detail, so this could be a great reason to. I’m thinking I should walk through a targets example and try to work dvc into the workflow.

If anyone is interested and willing to review the dvc package with me sometime, that would be awesome too.