I’m trying to understand the theory of how drake should be used.
I have a data analysis project with a workflow that looks like:
- Directories like data/raw/, data/clean/, code/prep/, code/analysis/, and output/ (which contains the .Rmd reporting file)
- Files in the code directories like 01_clean.R and 02_merge_and_transform.R
And then I make it “reproducible” by creating a master script that source()s each of these files in order.
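Something like this (the exact file names and the analysis/report steps are illustrative, based on the directory layout above):

```r
# master.R -- run the whole pipeline top to bottom
source("code/prep/01_clean.R")
source("code/prep/02_merge_and_transform.R")
# ...analysis scripts from code/analysis/...
rmarkdown::render("output/report.Rmd")  # report file name is a placeholder
```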
How do I move from this workflow to a drake plan? My understanding is that some changes in approach are:
- Keep data in the environment rather than in files. So should I no longer write to data/clean/ and then read from it in a subsequent script, but instead just refer to the same object names in subsequent steps/targets?
- Focus on functions. So if I currently have 100 lines of code in 01_clean.R that clean a raw data file and write it to a .csv, executed by source() in my current approach, I would need to turn that into function(s)? The project is a one-off with unique data, so I am not invested in creating functions.
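For concreteness, here is my guess at what the function-oriented version would look like. This is only a sketch: the function name, file paths, and target names are placeholders I made up, not anything from my actual project.

```r
library(drake)

# Hypothetical wrapper: the ~100 lines currently in 01_clean.R,
# rewritten to take a data frame and return the cleaned version
clean_raw <- function(raw) {
  # ...cleaning steps...
  raw
}

plan <- drake_plan(
  raw     = read.csv(file_in("data/raw/mydata.csv")),   # file path is a placeholder
  cleaned = clean_raw(raw),                             # in-memory, no data/clean/写 needed
  report  = rmarkdown::render(
    knitr_in("output/report.Rmd"),                      # report name is a placeholder
    output_file = file_out("output/report.html")
  )
)

make(plan)
```

As I understand it, drake tracks the bodies of functions like clean_raw(), so editing the cleaning code would invalidate the `cleaned` target and everything downstream. Is that the intended way to structure a one-off project like mine?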
I tried putting source("code/prep/01_clean.R") inside my drake_plan call, which worked - great! - but then when I changed something in that file and ran make(plan) again, it told me “All targets are already up to date”, not seeing the saved change to the .R file.
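For reference, what I tried looked roughly like this (the target name is illustrative):

```r
library(drake)

plan <- drake_plan(
  cleaned = source("code/prep/01_clean.R")  # script runs on the first make()
)

make(plan)
# On a second make(plan) after editing 01_clean.R, drake reports
# "All targets are already up to date" -- the edit isn't detected
```

Is the issue that drake only tracks the command text (`source(...)`), not the contents of the sourced file?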