I run a shared lab server that we use for big computational jobs, which often include geospatial routines using the raster package. raster has the neat feature of being able to work with larger-than-RAM data by storing it on disk. When raster objects get too large to handle in memory, it automatically moves them to disk as temporary files in the `/tmp/` directory.
This can result in the rapid accumulation of lots of large tempfiles. Moreover, many users of the server aren't aware of this, as `/tmp/` is a top-level directory that most users never look at. A user doing some ambitious geoprocessing can easily fill up the hard drive and grind all jobs to a halt.
Thanks to some feedback from rOpenSci colleagues, I came up with a 3-part solution to this:
Clear out tempfiles more frequently: By default, raster keeps tempfiles between sessions, removing only those of a certain age when a session begins. The default age is a week. Frankly, as far as I can tell, none of my users reuse these tempfiles, so I changed the age to one hour by adding `options(rasterTmpTime = 1)` to my `Rprofile.site` file. In practice, though, many users just leave their RStudio sessions open indefinitely, so in many cases this doesn't prevent accumulation.
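Because a session that never restarts never triggers the startup cleanup, one workaround is to apply the same age-based rule on demand. raster itself provides `removeTmpFiles(h = ...)` for its own files; the sketch below is a hypothetical base-R helper, `remove_old_files()`, that shows the rule explicitly: delete every file under a directory whose modification time is older than a given number of hours. The helper name and the use of modification time as "age" are assumptions for illustration.

```r
# Hypothetical helper (not part of raster): delete files under `dir` that
# have not been modified for more than `hours` hours. Returns removed paths.
remove_old_files <- function(dir, hours = 1) {
  files <- list.files(dir, full.names = TRUE, recursive = TRUE)
  if (length(files) == 0) return(character(0))
  info <- file.info(files)
  # Age in hours, measured from each file's last modification time
  age_h <- as.numeric(difftime(Sys.time(), info$mtime, units = "hours"))
  # which() drops NAs, e.g. for a file deleted between listing and stat
  stale <- files[which(!info$isdir & age_h > hours)]
  file.remove(stale)
  stale
}
```

In a long-running session one could point this at raster's scratch directory (`rasterOptions()` reports where that is), or simply call `removeTmpFiles(h = 1)` from raster.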
Move items to disk less frequently: raster has some nice machinery to estimate the memory needed for a task and move data to disk if that much memory isn't available. However, it turns out that it also has a default upper limit of 100MB for any task. I have much more RAM than this available, so I also added `options(rasterMaxMemory = 1e10)` to raise the limit to 10GB.
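To see why a 100MB ceiling is hit so quickly, here is a back-of-the-envelope check. It assumes each cell is stored as an 8-byte double and ignores any working copies raster may make during a task, so it is a rough lower bound, not raster's own estimate. The helper `raster_bytes()` is hypothetical.

```r
# Hypothetical helper: rough in-memory size of a raster in bytes.
# Assumes every cell is an 8-byte double and ignores working copies,
# so raster's own estimate for a real task may be larger.
raster_bytes <- function(nrow, ncol, nlayers = 1, bytes_per_cell = 8) {
  nrow * ncol * nlayers * bytes_per_cell
}

default_limit <- 1e8  # ~100MB, the default limit described above
new_limit <- getOption("rasterMaxMemory", default = 1e10)  # the 10GB setting

size <- raster_bytes(10000, 10000)  # one layer, 10,000 x 10,000 cells
size                  # 8e+08 bytes, i.e. ~800MB
size > default_limit  # TRUE: over the default, so it would go to disk
size <= new_limit     # TRUE: fits under the 10GB setting
```

A single moderately sized grid is already eight times the default, which is why raising the limit on a machine with plenty of RAM cuts down on disk writes.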
Move tempfiles to users' own home directories: Users weren't generally aware of the accumulation of temporary files because the files were outside their own directories. I wanted each user to have their own tempfile directory so they could view and delete the temporary files they generated.
In theory this is controlled by the `TMPDIR`, `TMP`, and `TEMP` environment variables, but because of the way this is implemented in R, one can't set a user-specific path (say, one relative to `~`) in `Renviron.site`; path expansion is not performed. Thankfully, Simon Urbanek's unixtools package (GitHub only) has a nice utility for resetting the temp directory. I've installed it on my server and added a few lines to the server's R startup configuration so that each session's temporary files land under the user's own home directory.
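A minimal sketch of what such a startup snippet might look like, assuming unixtools exports `set.tempdir(path)` (check the package's own documentation before relying on it). The function name `setup_user_tempdir()` and the per-session subdirectory are assumptions of this sketch, not necessarily what the original configuration does.

```r
# Hypothetical startup snippet: give each R session a temp directory under
# the user's home. ASSUMES unixtools (GitHub only) provides set.tempdir().
setup_user_tempdir <- function(base = path.expand("~/tmp")) {
  # Create ~/tmp on first use; stay quiet if it already exists
  dir.create(base, showWarnings = FALSE, recursive = TRUE)
  # One uniquely named subdirectory per session (an assumption of this
  # sketch), so concurrent sessions don't share a single directory
  session_dir <- tempfile(pattern = "Rtmp", tmpdir = base)
  dir.create(session_dir)
  # Only redirect R's temp directory if unixtools is actually installed
  if (requireNamespace("unixtools", quietly = TRUE)) {
    unixtools::set.tempdir(session_dir)
  }
  invisible(session_dir)
}

# Called once at startup
setup_user_tempdir()
```

The per-session subdirectory is a cautious choice: R normally removes its session temp directory when it exits, so pointing every session at one shared `~/tmp` could let one session's cleanup touch another session's files.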
Now all my users have a visible `~/tmp/` directory in their home folders where temporary R session files go. It's easy for them to see when they are using a lot of space, and easy for me to see who is taking up space with a quick `sudo ncdu /home`.
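For a scripted alternative to eyeballing `ncdu`, here is a hypothetical base-R helper, `user_tmp_usage()`, that totals each user's `~/tmp` and lists the largest first. It assumes home directories live directly under one root (e.g. `/home/<user>`) with temp files in `<home>/tmp`; like `ncdu` under `sudo`, it only sees files the running user has permission to read.

```r
# Hypothetical helper: total size in bytes of <home_root>/<user>/tmp per user.
user_tmp_usage <- function(home_root = "/home") {
  homes <- list.dirs(home_root, full.names = TRUE, recursive = FALSE)
  sizes <- vapply(homes, function(h) {
    # A missing tmp directory simply yields no files and a total of 0
    files <- list.files(file.path(h, "tmp"), full.names = TRUE,
                        recursive = TRUE, all.files = TRUE)
    sum(file.info(files)$size, na.rm = TRUE)
  }, numeric(1))
  # Name by user and put the largest consumers first
  sort(setNames(sizes, basename(homes)), decreasing = TRUE)
}
```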
P.S.: You'll find my `Rprofile.site` and the rest of the configuration in my server Docker image.