Associating an Rmd file with a commit

I’ve had the same problem as @noamross, and have handled it badly in previous projects. My solution so far has basically been to do what @jennybc, @cboettig, and others do: commit the output files and not worry about it. My 2 cents to add is that it’s pretty easy to create an index.Rmd with a code block that spits out a table of contents for all the Rmd files in the repository, and to modify _output.yaml to add a header to the files. From there you can either push everything to a gh-pages branch and get a website from GitHub, or rsync the HTML files to your own server. I’m usually collaborating with non-technical folks, and it’s often easier to point them to a regular website instead of GitHub. And I’m not really fond of having websites hosted by GitHub, so this lets me rsync the files to my own website.
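For anyone who wants to try the index idea, here is a minimal sketch of what such a chunk might look like. It is not the code from the demo repo: the function name `list_notebook_pages` is hypothetical, and it assumes every `foo.Rmd` is knit to `foo.html` in the same directory.

```r
# Sketch of a chunk for index.Rmd. Set the chunk option results = "asis"
# so that knitr passes the Markdown list through unchanged.
# Assumption: each foo.Rmd sits next to its rendered foo.html.
list_notebook_pages <- function(dir = ".") {
  rmd <- list.files(dir, pattern = "\\.Rmd$")
  rmd <- sort(setdiff(rmd, "index.Rmd"))   # the index should not list itself
  stem <- tools::file_path_sans_ext(rmd)
  sprintf("- [%s](%s.html)", stem, stem)   # one Markdown link per page
}

cat(list_notebook_pages(), sep = "\n")
```

Using the file name as the link text keeps the sketch free of dependencies; pulling the title out of each file's YAML header would be nicer but needs a YAML parser.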

Here is a demo repo:
https://lmullen.github.io/rmd-notebook/

And an actual project:
http://lmullen.github.io/civil-procedure-codes/index.html

My other reason for doing this was to have a simple, uniform way for students to show their work. No more committing Rmd files that don’t compile.

Thanks @lincoln, really nice examples!

Also great point about just using an auto-generated table of contents with links rather than browsing the repo directly. I should start doing that…

Yeah, I’m still torn on the HTML (on gh-pages) vs md (viewed on GitHub) as the output part. HTML has two drawbacks for me:

  • It doesn’t translate to private repos.
  • I can’t browse history as easily.

Private repos matter more to me now that I tell students to do this stuff and want them to have a private area to experiment. gh-pages rendering (or any of the rawgit HTML renderers) obviously only works for public content.

Likewise, it’s easy to click back in history on the GitHub side of things. It’s a good point that HTML is more user-friendly to navigate overall (and can look better since you control the CSS, and most importantly can include MathJax!). But for the use case @noamross brings up (scratch notebook entries), navigating back to earlier versions of results and being able to see the code and output in a single file is really nice. (If only GitHub could somehow render HTML preview content internally…)

That’s really nice, @lincoln. I also like the way you create an index. I’ve had that idea but never done squat about it.

Can you elaborate on what you mean by “My other reason for doing this was to have a simple, uniform way for students to show their work. No more committing Rmd files that don’t compile”? It sounds like you’re having students use this template? I’d like to hear more about that.

Have you wrestled with the vexing issue of having everything in the top-level directory? It’s really not sustainable, I find, which opens up a whole new set of issues.

@jennybc: In past semesters some students have had a hard time grasping that an Rmd document had to be knit. They would run code chunks individually and would get correct results, but they would also have a bunch of junk in the document. So when I marked the assignment by cloning the repo, it was a mess.

I haven’t entirely settled on what I’m going to do next semester. But I’m thinking about requiring them to commit the generated HTML and use that template to get a list of pages on GitHub Pages. I’ll look at the HTML first. So if it doesn’t knit, no credit. Then, I’ll go check the repository itself as a second step.
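One way students could check themselves before committing is sketched below. This is only an illustration, not part of the template: `stale_pages` is a hypothetical helper, and it assumes each `foo.Rmd` renders to `foo.html` in the same directory. Git does not preserve modification times, so the age comparison only means something in the working copy where the files were knit; in a fresh clone, only the missing-file part is informative.

```r
# Return the Rmd files in `dir` whose rendered HTML is missing or older than
# the source. Assumption: foo.Rmd renders to foo.html in the same directory.
stale_pages <- function(dir = ".") {
  rmd <- list.files(dir, pattern = "\\.Rmd$", full.names = TRUE)
  html <- paste0(tools::file_path_sans_ext(rmd), ".html")
  # file.mtime() is NA for a missing file, but TRUE | NA is still TRUE
  stale <- !file.exists(html) | file.mtime(html) < file.mtime(rmd)
  basename(rmd[stale])
}

stale_pages()
```

An empty result means every Rmd file has HTML at least as new as its source.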

The question of having everything in the top-level directory gets back to @noamross’s question, which I don’t have a good answer for. Like Noam, my scratch work starts out in Rmd files, then tends to migrate into scripts (in scripts/) and helper functions (in R/), if not into its own package. So when I’ve moved on from an Rmd file that is no longer useful, I move it plus the generated files into a subdirectory. That way I have my notes, but the files are no longer Make targets or cluttering the main repo. Probably not ideal in lots of ways, but I guess I figure that reproducibility for the end product is enough, and reproducibility of every step along the way is more than I’m willing to tackle.

A little addendum to this conversation. @cboettig mentioned the problem of gh-pages always being public, even for private repositories. I’ve recently been using jekyll-auth to make my project notebooks available as web pages visible only to my team. Setup may be slightly elaborate if students are setting up their own repos, but it works well if people are collaborating on a notebook or project that you set up under a GitHub Organization or Team.

I see I am late to the party here, but I am working on a package (eventually for submission to ropensci, I hope) that automatically records the commit hash and the sessionInfo (among other things). Check it out here, if you are interested:
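Separately from that package (this is not its code), the basic idea in the thread title can be sketched in a few lines of base R. The function name `record_provenance` is hypothetical, and the sketch assumes `git` is on the PATH and that the document is knit from inside the repository.

```r
# Capture the commit that is checked out at knit time, plus session details.
# The hash is NA outside a git repository or when git is not installed.
record_provenance <- function() {
  hash <- tryCatch(
    suppressWarnings(system2("git", c("rev-parse", "HEAD"),
                             stdout = TRUE, stderr = FALSE)),
    error = function(e) character(0)   # e.g. git could not be run at all
  )
  # A non-zero exit status is reported via the "status" attribute
  if (length(hash) != 1L || !is.null(attr(hash, "status"))) {
    hash <- NA_character_
  }
  list(commit = as.character(hash), time = Sys.time(), session = sessionInfo())
}

prov <- record_provenance()
prov$commit
```

Called from the last chunk of an Rmd file, this ties the output to a commit. The hash is that of the commit checked out while knitting, so it does not reflect uncommitted changes, and the commit that eventually contains the rendered output will be a later one.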