Safe Loading of RData Files

Unless you have configured R not to ask, every time you close R or RStudio you are prompted to save your workspace. This saves an RData file to the working directory. The functions save.image() and save() offer a little more flexibility, but basically do the same thing: they save all or part of your current workspace to disk.

Let’s say last week I did some analysis on the built-in dataset called iris and I executed the following right before ending my R session

> ls()
[1] 'fit1'    'iris'    'species'
> save.image('MyData.RData')

This saved the three objects in my global environment to a file called MyData.

Now I am ready to do a similar analysis on another data set about daisies. I load up the daisies data frame and create a unique list of all the species.

> ls()
[1] 'daisies' 'species'

I want to experiement with some models but I first want to take a look at what I did in the iris study, for reference. I load up the MyData file from the iris analysis using the following

> load('MyData.RData')
> ls()
[1] 'daisies' 'fit1'  'iris'  'species'

The problem with the default behavior of load() is that it does not allow me to load just one of the objects from the file but requires me to load all and throws them in my global environment. Sometimes, like here, this writes over objects that already exist in memory. My daisy species object got overwritten by the iris species object I had saved to disk.

This isn’t really a problem if you always give objects unique names or if you remember every object you have saved in every file, but really, who can possibly do that? There is another way to combat this and that is to not rely on load()‘s default behavior. The second parameter allows you to specify an environment other than the global environment in which to load the contents of an RData file. So, I could have, and should have, done this

  iris.env <- environment()
  load('MyData.RData', envir = iris.env)
  iris.fit1 <- iris.env$fit1

I’ve never really analyzed any iris or daisy data, but this illustrates what has happened to me on several different occasions when I need to compare the results from two separate analyses that have a similar structure and overlapping names for objects. I’ve written a convenience function to make this loading to an environment easier. My philosophy is that the only safe way to load data from an RData file is to load it to an environment, inspect that environment and then explicitly identify what it is I want in my global environment before putting it there. I never use the load() function directly any more and only ever use the following

  LoadToEnvironment <- function(RData, env = new.env()){
    load(RData, env)

If at some future point I wanted to compare the models from the iris and daisy analyses I would do the following

  iris.env <- LoadToEnvironment('iris.RData')
  daisy.env <- LoadToEnvironment('daisy.RData') <- iris.env$fit1 <- daisy.env$fit1
  # Compare and

I wish you happy and safe coding.