@npjc These are good questions. I’ll give my take, but others may have different opinions and insights as well.
I think yes, containers do essentially give us an easy way to stop time, if done correctly. For instance, the rocker versioned images are fixed to a build date corresponding to the last day said version of R was current. So if my paper runs in, say, `rocker/binder:3.4.3`, then that container will always run R 3.4.3, and will always contain CRAN / bioconductor packages fixed to whatever versions were current on the day before the 3.4.4 release (i.e. 2018-03-14, as logged here). Similarly, system libraries installed from `apt` are fixed to the debian release, i.e. they will always come from `debian:stretch` (aka `debian:9`), even after later debian releases. Sure, it’s an open question how long people will be able to deploy today’s docker containers on the machines of the future, or, since the recipe isn’t docker-specific anyhow, to deploy old versions of debian directly; but so far past versions of debian have been pretty persistent. No snapshot is perfect, and docker doesn’t capture details at the level of the kernel or the hardware, but unless you’re writing papers specifically about hardware or kernel performance, we can hope those don’t impact reproducibility…
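To make that concrete, here’s a minimal sketch of what the freezing looks like from the R side, assuming the versioned rocker images pin the default CRAN mirror to a dated MRAN snapshot (the URL and date shown are illustrative, not checked against the image):

```r
# Inside a versioned image such as rocker/binder:3.4.3, the default CRAN
# repo points at a dated snapshot, so package installs are frozen in time.
# (Snapshot URL/date below are illustrative placeholders.)
getOption("repos")
#> CRAN
#> "https://mran.microsoft.com/snapshot/2018-03-14"

# Any install then resolves to whatever version was current on that date:
install.packages("dplyr")
```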
The dynamic model you pose is very interesting, but I don’t believe it has to be an entirely orthogonal approach. Like you say, for me, it’s often hard to justify the time to “update” an old analysis to the latest code base. However, my future work will often build on parts of a previous work, and I hope (often with little evidence to show for it) that some of the code will be useful down the road not just for me but for others, so there’s value in allowing it to evolve.
As I’ve mentioned in a related thread on the RStudio Community, I try to put most non-trivial code related to an analysis into a separate package. I try to keep the “research compendium” associated with a particular paper or result relatively free of custom functions: i.e. ideally only `.Rmd` notebooks, with no `R/` directory, namespace, etc. I move these custom functions into a separate R package that I can depend on (linking by GitHub release for version stability, as sketched below) across multiple projects, and keep that package up to date to the extent that I and others are using it. By treating anything meant for possible re-use as “software” that can both evolve and be snapshotted in time, separate from a particular paper which will inevitably fossilize at something close to its published form, I think I get something a bit more dynamic and hopefully still reproducible.
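For concreteness, a minimal sketch of that release-pinning; the package name and tag here are made-up placeholders:

```r
# Depend on a tagged GitHub release of the helper package, so the
# compendium always installs the same frozen version.
# ("myuser/analysistools" and "v0.2.0" are hypothetical placeholders.)
# install.packages("remotes")
remotes::install_github("myuser/analysistools@v0.2.0")

# Equivalently, declared in the compendium's DESCRIPTION file:
#   Imports:  analysistools
#   Remotes:  myuser/analysistools@v0.2.0
```

The tag pins the paper to a snapshot of the package, while the package itself remains free to evolve on its master branch for future projects.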
Not sure if that made any sense; it’s just the practice I’ve currently evolved towards by dint of trying various other permutations. Thoughts welcome!