Slides and some thoughts on a talk about reproducibility

I recently gave a presentation on reproducibility to my organization: “Reproducibility from a Mostly Selfish Point of View.” You can find the slides here on figshare.

I am grateful to the following for source material and inspiration:

I broke the talk into two sections: “Why” and “How”. “Why” was, as the title suggests, primarily focused on the benefits of reproducibility to us, and I proceeded from avoiding negatives (risk avoidance) to creating positives (more impact). In “How” I tried to be very high-level, talking about major concepts in reproducibility, and then talking generally about the tools that I have used for each, emphasizing that they may not be the right tools for everyone. Then we had a discussion about the most promising areas and tools to start with.

This went remarkably well. A few quick thoughts:

  1. Many people are primed for this topic. The steady drumbeat of reproducibility-related stories in the science press over the past few years has heightened awareness of this stuff.

  2. Despite my avoiding a focus on openness, open-data and code mandates came up a lot, because people want to reduce the effort involved in getting code and data in shape to share per these requirements.

  3. The biggest response in terms of positives came from talking about impact, rather than risk management or productivity (though those resonated as well). As I put it, “If we’re going to share, let’s share impressively.” There was a lot of talk about new things we could do, and new audiences we could reach, with additional research products that emerge from reproducible workflows. An example I like is Andrew Rambaut’s MERS data. Rambaut did the work of aggregating this for his own analyses, but by releasing (and updating) the data set with a nice little D3 data viewer on top of it he provided the community with a great tool and increased the visibility of his own work.



This is all fantastic, and timely! I roped myself into doing something similar, albeit for a different crowd: Fed Managers. That being said, a lot of what you include is going to be relevant even for them, especially the last three slides. I think including that information – caveats, realizing that reproducibility occurs along a gradient, and identifying resources required – is important. Anyway, thanks for sharing this. I plan to borrow heavily!


p.s. links to Karl’s course materials and the reproducibility guide are busted.

@noamross nice, interesting stuff

I think I missed how this connects to reproducibility - maybe I didn’t read the slides closely enough :slightly_smiling: can you explain?

p.s. hope you don’t mind, fixed two broken links in your post

The slides are pretty sparse, as I mostly just riffed on them, so I don’t think you missed anything. The point I was making was this: When you do your science in a reproducible fashion, every step along the way is a potential output, not just the manuscript. It’s a much smaller leap to dress up and publish your data set, workflow or method when they are prepared this way.

(Thanks for fixing the links!)