Best practice for testing packages with database connections

Tags: #<Tag:0x00007f825f44a0a0> #<Tag:0x00007f825f449f60> #<Tag:0x00007f825f449e20>


via Best practices for testing API packages

pining @jsonbecker here

I don’t often do this, so I don’t really know what best practice is. Any thoughts anyone?

Best practices for testing packages with large downloads
#2 while not yet on CRAN is a good solution and correctly scrubs authentication from the stored quieries and responses.


I am currently forced into tackling this issue. I am working on a package that stores data in a database, potentially stored in a user-specified location (and so Gabor’s otherwise very nice solution can not be applied). My main problem is that CRAN tests on win-builderdo not allow files to be removed, so I currently resort to creating a test database and … simply leaving it there if on_cran. Obviously not best practice … Any advice appreciated!

Another downside is that the whole suite of tests can then only construct the database once, which is both not best proactice and not particularly useful.

It would be great to have a best practice document that helped resolve such issues, and this thread is obviously the place, so I’m hoping this contribution might kick it back into sufficient life for us to move towards that point.

Another important general issue is where best to store test databases? (tempdir(), getwd()?).


@mpadge Can’t you just avoid running examples and tests on CRAN? And instead use Travis/Appveyor/Circle-CI/Rhub, etc.

Travis has easy ways to setup many common databases

what’s the database?


@sckott Unfortunately no. The package workflow is:

data to disk (~TB) -> C++ -> SQLite -> R -> cleanup disk -> out

(using C++ sqlite bindings and not RSQLite) and the only way to test anything at all is

test_data -> disk -> C++ -> SQLite -> R -> (do not) cleanup disk -> out

Without the database connection there can be no tests, and having no tests or examples whatsoever actually running on cran can surely not be good practice. Travis is no probs at all. Issues only appear to arise in through the win-builder machines appearing not to allow the cleanup stage. (I also can not get appveyor to find the C++ sqlite3 bindings, even though win-builder and any other windows/docker configs I can find have no probs there, but that’s a slightly separate issue.)

The issue about storage location and cleanup policies is, i’d suggest, sufficiently general as to warrant consideration in terms of some kind of general statement about best practice. As soon as i have a satisfactory solution, i’d be keen to contribute in that regard to save others the pain of prolonged searching in vain for a workable solution.

General best practice questions for which answers are likely to be useful:

  1. At which stage of tests to establish the db? Just once at the start, and use same throughout, or establish anew for each new block of tests? Likely some kind of first option desirable and can best work so … but if not, then second option best handled so …
  2. How best to clean up the db at end of (block of) tests?
  3. Where to store the db?

Answering these questions is going to require consideration of differences between CRAN, travis, appveyor, and whatever else. The easy bit there is that answers should be generally database independent, so general approaches ought to be possible.


Why? No other language pkg distribution networks build/test pkgs, e.g, python, ruby, node, julia, etc. And its so easy to do testing on travis and many other CI sites - Plus, even if you run tests on CRAN, there’s no automated message/etc. to tell you builds/tests broke, etc. Sometimes cran maintainers email you about it, sometime they don’t. Doesn’t seem ideal


good point @sckott ! I shall then not shy away from simply switching all tests off on CRAN. I’d nevertheless still like to get to the bottom of why win-builder machines prohibit files from being removed from disk. Interestingly, this seems to be permitted for examples (for which cleanup works), yet not for tests, so there must be some kind of environmental variable that differs between the two.