rOpenSci | Safeguards and Backups for GitHub Organizations

At rOpenSci, much of our code, content and infrastructure is hosted on GitHub over several organizations – described on our resources page. This post summarizes some steps we’ve taken to safeguard our GitHub organizations.


This is a companion discussion topic for the original entry at https://ropensci.org/blog/2022/03/22/safeguards-and-backups-for-github-organizations/

Hi Maëlle, great article!

I am attempting to back up all of our organization’s repositories following your steps. I am new to the gh package and the GitHub REST API. My understanding is that I first need to make a POST request to create a “migration archive” for each repo. Once the archive is exported, I can then download it. I am also new to what a “migration archive” is. My understanding is that when I archive a repo, it becomes read-only. Is this what the “migration archive” does as well? I would still need the repos to remain as they are, since we are not migrating to, say, GitHub Enterprise.

Thanks!

:wave: @DanWismer!

Right, archiving a repo makes it read-only, but here “archive” is used with a different meaning: downloading an archive means downloading the files, git history and metadata (including issues/PRs).

Regarding the mechanics of the process, this package that we published might help: Helps Download Archives of GitHub Repositories • gitcellar (this is what we use to do our own backups now).
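To make the mechanics a bit more concrete, here is a minimal sketch in Python (not the gitcellar code, which is R) of the three GitHub REST API endpoints involved and of the "wait until exported" step. The helper names `migration_endpoints` and `wait_until_exported`, the `get_state` callback, and the polling defaults are hypothetical and only there for illustration; nothing below contacts GitHub.

```python
import time
from typing import Callable

API = "https://api.github.com"


def migration_endpoints(org: str, migration_id: int) -> dict:
    """Return (method, URL) pairs for the organization migration endpoints."""
    base = f"{API}/orgs/{org}/migrations"
    return {
        "start": ("POST", base),                                  # request an archive
        "status": ("GET", f"{base}/{migration_id}"),              # check export state
        "download": ("GET", f"{base}/{migration_id}/archive"),    # fetch the archive
    }


def wait_until_exported(
    get_state: Callable[[], str],   # hypothetical callback returning the current state
    poll_seconds: float = 30.0,     # assumed polling interval
    max_polls: int = 120,           # assumed upper bound on attempts
) -> bool:
    """Poll until the migration state is 'exported'; stop early on 'failed'."""
    for _ in range(max_polls):
        state = get_state()
        if state == "exported":
            return True
        if state == "failed":
            return False
        time.sleep(poll_seconds)
    return False
```

In a real script, `get_state` would send the "status" request with an authenticated client and return the `state` field of the response. The archive is a copy of the repository data, so the repository on GitHub.com is not touched by any of these three calls as long as locking is left off.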

Thanks Maëlle,

Q. Does download_organization_repos “lock_repositories” before migration?

I am not 100% sure what locking a repo does, but I saw the term in the GitHub docs as recommended: Exporting migration data from GitHub.com.

The gitcellar package describes exactly what I want to achieve for our organization: downloading files, git history and metadata (including issues/PRs). To confirm, does “downloading an archive” via gitcellar still keep the GitHub.com repo as is? I want to avoid actually moving/migrating the repo. In other words, I want a “copy and paste” rather than a “cut and paste” (sorry for the awkward question, my fear is doing something destructive to our repo).

Thanks!

> Q. Does download_organization_repos “lock_repositories” before migration?
>
> I am not 100% sure what locking a repo does, but I saw the term in the GitHub docs as recommended: Exporting migration data from GitHub.com.

We use the default value of false; see the docs (Organization migrations - GitHub Docs) and our code: https://github.com/ropensci-org/gitcellar/blob/2bc062989eba0b8b1d22a671620042ce72ea149c/R/repo.R#L15 If you think we should surface the argument, feel free to open an issue (or a PR :wink:) with the feature request. However, I don’t think that’s what you want: I think locking is only needed by people who actually want to migrate their repo, not by those who back it up using the migration archive. If you lock a repo, for instance, there’s no write access to it.
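For illustration, the request body for starting a migration with that default could look like the sketch below. It is written in Python rather than R, and the helper name `migration_request_body` and the repository name `toy-org/toy-repo` are made up for the example. The `owner/name` form for repositories is my assumption based on the API documentation examples.

```python
import json


def migration_request_body(repositories, lock_repositories=False):
    """Build the JSON body for starting an organization migration.

    lock_repositories defaults to False, so the repositories stay writable
    while the archive is being generated.
    """
    return {
        "repositories": list(repositories),
        "lock_repositories": bool(lock_repositories),
    }


# Hypothetical toy repository, assumed to be given as "owner/name".
body = migration_request_body(["toy-org/toy-repo"])
print(json.dumps(body))
```

Passing `lock_repositories=True` is the part that matters for a real migration, and it is the option to leave alone for a plain backup.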

> The gitcellar package describes exactly what I want to achieve for our organization: downloading files, git history and metadata (including issues/PRs). To confirm, does “downloading an archive” via gitcellar still keep the GitHub.com repo as is? I want to avoid actually moving/migrating the repo. In other words, I want a “copy and paste” rather than a “cut and paste” (sorry for the awkward question, my fear is doing something destructive to our repo).

We back up all our repos weekly and they stay as they are. However, I’d recommend creating an organization with a few toy repos in it (with a few files in each, a few issues, a few PRs) to experiment, so that you can see what happens and feel safer.

One use case was reported on this forum.