Track fast-evolving custom R scripts via `freezr`


#1

Hi ropensci,

I’m working on freezr, an R package that records snapshots of your fast-evolving R scripts or Rmd notebooks. You can see it at https://github.com/ekernf01/freezr

The freezr package offers conveniences including time-stamped results folders, logs saying which commit you’re on, and an inventory system to load files into downstream scripts without hard-coding the (usually time-stamped and possibly quite ugly) paths.
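To make those three conveniences concrete, here is a minimal base-R sketch of the general idea. It is not freezr's actual API: the helper names `snapshot_script`, `inventory_add`, and `inventory_get`, the `commit.log` and `inventory.csv` file names, and the timestamp format are all assumptions made up for illustration.

```r
# Hypothetical helpers illustrating the idea; these are NOT freezr functions.

snapshot_script <- function(script, results_root) {
  # Time-stamped destination folder under the results root.
  stamp <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
  dest <- file.path(results_root, stamp)
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)

  # Keep an exact copy of the code that is about to run.
  file.copy(script, file.path(dest, basename(script)))

  # Record the current commit if git is available; otherwise say so.
  commit <- tryCatch(
    system2("git", c("rev-parse", "HEAD"), stdout = TRUE, stderr = FALSE),
    error = function(e) character(0),
    warning = function(w) character(0)
  )
  if (length(commit) == 0) commit <- "unknown (git unavailable or not a repository)"
  writeLines(paste("commit:", commit[1]), file.path(dest, "commit.log"))
  dest
}

# The inventory maps a short tag to a path, so downstream scripts
# never hard-code the time-stamped folder name.
inventory_add <- function(results_root, tag, path) {
  inv_file <- file.path(results_root, "inventory.csv")
  entry <- data.frame(tag = tag, path = path, stringsAsFactors = FALSE)
  if (file.exists(inv_file)) {
    old <- read.csv(inv_file, colClasses = "character")
    entry <- rbind(old[old$tag != tag, , drop = FALSE], entry)
  }
  write.csv(entry, inv_file, row.names = FALSE)
  invisible(entry)
}

inventory_get <- function(results_root, tag) {
  inv <- read.csv(file.path(results_root, "inventory.csv"), colClasses = "character")
  inv$path[inv$tag == tag]
}

# Usage: snapshot a toy script, register the run, look it up by tag.
root <- file.path(tempdir(), "results")
script <- tempfile(fileext = ".R")
writeLines("x <- 1", script)
run_dir <- snapshot_script(script, root)
inventory_add(root, "latest_run", run_dir)
inventory_get(root, "latest_run")
```

A downstream script would then call something like `inventory_get(root, "latest_run")` instead of spelling out the time-stamped path.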

Despite the availability of many other reproducibility-oriented, R-based tools, I believe freezr answers an unmet need. In my day-to-day work, it has saved me a huge amount of time (and embarrassment). Please check it out if you are interested, and let me know what would make freezr more practical for you or your students.

Thanks!


#2

Hi @ekernf01, freezr looks really interesting, and is quite similar to another rOpenSci package we’ve been working on for the last few years, recordr (https://github.com/NCEAS/recordr). Here’s a brief rundown of how recordr helps you to:

  • run your script with organizational tags to keep track of scenarios for each run,
  • gather the output into a cached results directory,
  • save a copy of your code and sessionInfo() along with the results,
  • track the dependencies among inputs and outputs of your code,
  • list, search, view, and manage previous runs with its run management functions,
  • publish the results of one or more runs as an archival package to any DataONE data repository using the rOpenSci datapack packaging format.

The provenance information about the run, inputs, outputs, and dependencies is all recorded using the ProvONE extension of the PROV model for provenance, so it is compatible with many other provenance tools.

I’m really amazed at how parallel freezr and recordr are, and would love to discuss synergies with you sometime. Your linkage to GitHub commit hashes is particularly interesting. Great work, and thanks for pointing it out.

You might also add RDataTracker to your list of related tools, which is an even more fine-grained provenance tracker for R.

Matt


#3

Oh, and I forgot: recordr can also watch and record your interactive console sessions to make sure the exact sequence of commands used is properly recorded, even if commands from a script are run out of order.


#4

Wow. I wish they’d taught me about this in school; I could have avoided writing freezr at all. The projects look similar right down to the max_archive_file_size and the choice of the term “tag”. It definitely makes sense for us to discuss what we have in common and what we could add to one another’s projects. I will try using recordr for my next big or medium-size project and get a feel for it. In the meantime, feel free to contact me via first.lastfour.13 at gmail.com, where first is eric and lastfour is kern.

I will also ponder whether freezr is doing the community any good by competing for this niche. Given that you are years ahead of me, it may be time to mothball freezr sooner than anticipated.

I am curious what sort of obstacles you ran into while working on your package. I had trouble setting up tests for code that functions almost exclusively to produce side effects, and that is still a barrier for me in terms of the examples embedded in the Roxygen docs.
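One common way to test code that exists only for its side effects is to point it at a throwaway directory and assert on what lands there. The sketch below uses base R only; `write_run_log` and the `run.log` file name are hypothetical stand-ins, not functions from freezr or recordr.

```r
# Hypothetical side-effect-only function: appends a line to a log file
# and returns nothing useful.
write_run_log <- function(dir, message) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  cat(message, "\n", file = file.path(dir, "run.log"), sep = "", append = TRUE)
  invisible(NULL)
}

# Run it inside a scratch directory so the test never touches real results.
scratch <- tempfile("sidefx_")
write_run_log(scratch, "first run")
write_run_log(scratch, "second run")

# Assert on the side effect itself: the file's contents.
log_lines <- readLines(file.path(scratch, "run.log"))
stopifnot(identical(log_lines, c("first run", "second run")))

# Clean up so repeated test runs start from a blank slate.
unlink(scratch, recursive = TRUE)
```

The same pattern may help with Roxygen examples: writing into `tempdir()` keeps an example runnable without leaving files in the user's working directory.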

P.S. I just added the GitHub commit hashes this afternoon. Definitely a nice feature for my current project in genomics.


Eric