Using the Windows Clipboard, or Passing Data Quickly From Excel to R and Back Again

Two of my favorite functions are copy.table() and paste.table(). I’m going to turn this story on its head and give you the ending first.

# Copy data out of R
copy.table <- function(obj, size = 4096) {
  clip <- paste('clipboard-', size, sep = '')
  f <- file(description = clip, open = 'w')
  write.table(obj, f, row.names = FALSE, sep = '\t')
  close(f)  
}

# Paste data into R
paste.table <- function() {
  f <- file(description = 'clipboard', open = 'r')
  df <- read.table(f, sep = '\t', header = TRUE)
  close(f)
  return(df)
}

The first allows you to copy a data frame to the clipboard as tab-delimited text, a format that Excel knows how to deal with. All you have to do after running copy.table() is select the cell in Excel (or position in a text file) you want to paste to and press Ctrl+V. It works on anything that can be coerced to a data frame, too, so vectors, table objects, etc. are all valid objects for copying.

# These all work
copy.table(1:100)
copy.table(letters)
copy.table(my.df)
copy.table(table(my.df$col1))
copy.table(matrix(1:20, nrow = 2))
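To see what actually lands on the clipboard, here is a minimal sketch that produces the same tab-delimited text but captures it as a character vector instead, so it runs on any platform. The data frame is made up for illustration, and read.table(text = ...) stands in for reading from the clipboard.

```r
# A small example data frame (hypothetical data, for illustration only).
my.df <- data.frame(id = 1:3, name = c('a', 'b', 'c'))

# Produce the same tab-delimited text that copy.table() writes, but capture
# it as a character vector instead of sending it to the clipboard.
txt <- capture.output(write.table(my.df, row.names = FALSE, sep = '\t'))
txt

# Read the text back using the same settings as paste.table().
round.trip <- read.table(text = txt, sep = '\t', header = TRUE)
round.trip
```

The first element of txt is the header row and each remaining element is one row of data, which is why Excel splits the pasted text into columns at the tabs.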

If you get an error when trying to copy, it is most likely because obj needs more room than the clipboard connection allows when it is written out as text. The number after 'clipboard-' is a size in kilobytes, not bytes (R’s own default is 32 KB), so the default of 4096 here allows roughly 4 MB. You can pass a larger number as the second argument to make it work. I’ve tried experimenting with estimating an upper bound on the size of an object but it hasn’t worked out yet. For now, if I can’t get something to copy I just double the second argument.

# If my.df is of moderate size
copy.table(my.df)

# If my.df is huge
copy.table(my.df, 10000)
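One rough way to pick the second argument is to measure the text before copying it. The sketch below is not something I rely on yet: estimate.clip.kb() is a hypothetical helper, and it assumes two bytes per line ending and that the clipboard stores the text in the same encoding R reports. R's documentation for file() describes the number after 'clipboard-' as a size in Kb, so the helper returns kilobytes. Treat the result as an estimate and add some headroom.

```r
# Hypothetical helper: estimate how many kilobytes of text write.table()
# produces for obj, for use as the size argument of copy.table().
estimate.clip.kb <- function(obj) {
  lines <- capture.output(write.table(obj, row.names = FALSE, sep = '\t'))
  # Bytes on each line, plus an assumed two bytes per line for the line ending.
  bytes <- sum(nchar(lines, type = 'bytes')) + 2 * length(lines)
  ceiling(bytes / 1024)
}

# Example: size up a data frame before copying it.
big.df <- data.frame(x = 1:1000, y = rnorm(1000))
estimate.clip.kb(big.df)
```

On Windows you could then call something like copy.table(big.df, 2 * estimate.clip.kb(big.df)) to leave a margin.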

Pasting works in a similar way. Select the range in Excel you want to copy (including the header) and press Ctrl+C. Then run

other.df <- paste.table()

This one doesn’t take any arguments because it goes straight to the clipboard and pulls everything there.

I use both of these quite a bit when doing quick and dirty work where I need to have a more tactile, hands-on, “under the influence of Excel” workflow. They are interactive, so I don’t use them in the final production level code for any reproducible studies. But, for development and for quick stuff they are really helpful.

Have fun! Please forward on to someone else you know who uses R. Thanks!

Safe Loading of RData Files

Unless you have configured R not to ask, every time you close R or RStudio you are prompted to save your workspace. This saves an RData file to the working directory. The functions save.image() and save() offer a little more flexibility, but basically do the same thing: they save all or part of your current workspace to disk.

Let’s say last week I did some analysis on the built-in dataset called iris and I executed the following right before ending my R session

> ls()
[1] "fit1"    "iris"    "species"
> save.image('MyData.RData')

This saved the three objects in my global environment to a file called MyData.RData.

Now I am ready to do a similar analysis on another data set about daisies. I load up the daisies data frame and create a unique list of all the species.

> ls()
[1] "daisies" "species"

I want to experiment with some models but I first want to take a look at what I did in the iris study, for reference. I load up the MyData.RData file from the iris analysis using the following

> load('MyData.RData')
> ls()
[1] "daisies" "fit1"    "iris"    "species"

The problem with the default behavior of load() is that it does not let me load just one of the objects from the file. It loads all of them and puts them in my global environment. Sometimes, like here, this silently overwrites objects that already exist in memory. My daisy species object got overwritten by the iris species object I had saved to disk.
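The overwrite is easy to reproduce without touching your real workspace. In this minimal sketch, two throwaway environments (iris.session and daisy.session, both made-up names) stand in for the two R sessions, and a temporary file stands in for MyData.RData.

```r
# A pretend iris session saves its 'species' object to a temporary file.
iris.session <- new.env()
assign('species', levels(iris$Species), envir = iris.session)
tmp <- tempfile(fileext = '.RData')
save(list = 'species', file = tmp, envir = iris.session)

# A pretend daisy session already has its own 'species' object.
daisy.session <- new.env()
assign('species', c('oxeye', 'shasta'), envir = daisy.session)

# load() replaces the existing object without any warning. It returns,
# invisibly, the names of the objects it loaded.
loaded <- load(tmp, envir = daisy.session)
loaded
get('species', envir = daisy.session)
```

After the call to load(), the daisy session's species holds the three iris species and the original daisy values are gone.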

This isn’t really a problem if you always give objects unique names or if you remember every object you have saved in every file, but really, who can possibly do that? There is another way to combat this, and that is to not rely on load()’s default behavior. Its second parameter, envir, lets you specify an environment other than the global environment in which to load the contents of an RData file. So, I could have, and should have, done this

  iris.env <- new.env()  # a fresh environment; environment() here would return the global one
  load('MyData.RData', envir = iris.env)
  iris.fit1 <- iris.env$fit1

I’ve never really analyzed any iris or daisy data, but this illustrates what has happened to me on several different occasions when I need to compare the results from two separate analyses that have a similar structure and overlapping names for objects. I’ve written a convenience function to make this loading to an environment easier. My philosophy is that the only safe way to load data from an RData file is to load it to an environment, inspect that environment and then explicitly identify what it is I want in my global environment before putting it there. I never use the load() function directly any more and only ever use the following

  LoadToEnvironment <- function(RData, env = new.env()){
    load(RData, env)
    return(env) 
  }
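The inspect-before-copying step might look something like the sketch below. FindCollisions() is a hypothetical helper, not part of my usual toolkit, and the session environment is a stand-in for the global environment so the example can run without touching your workspace.

```r
# Hypothetical helper: list the names in an RData file that already exist in
# `target` (the global environment by default), without modifying `target`.
FindCollisions <- function(RData, target = globalenv()) {
  env <- new.env()
  loaded <- load(RData, envir = env)  # load() returns the names it loaded
  intersect(loaded, ls(target))
}

# Build a small RData file in a temporary location (made-up contents).
saved <- new.env()
assign('species', c('setosa', 'versicolor', 'virginica'), envir = saved)
assign('fit1', 'placeholder for a model', envir = saved)
tmp <- tempfile(fileext = '.RData')
save(list = ls(saved), file = tmp, envir = saved)

# A stand-in for the current session, with an overlapping name.
session <- new.env()
assign('species', c('oxeye', 'shasta'), envir = session)
assign('daisies', data.frame(id = 1:2), envir = session)

# Which names would a plain load() have overwritten?
FindCollisions(tmp, target = session)
```

Anything the helper reports is a name to copy across under a new name rather than load directly.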

If at some future point I wanted to compare the models from the iris and daisy analyses I would do the following

  iris.env <- LoadToEnvironment('iris.RData')
  daisy.env <- LoadToEnvironment('daisy.RData')
  iris.fit <- iris.env$fit1
  daisy.fit <- daisy.env$fit1
  # Compare iris.fit and daisy.fit

I wish you happy and safe coding. This is my inaugural post. My plan is to add a new post every few days so check back soon. Thanks!