An updated version of my package fancycut is on CRAN. The package has one function, fancycut(), that is more versatile than cut() for certain applications. cut() takes a numeric vector and turns it into a factor, or categorical, vector. It works best when there are no gaps in the intervals you have defined and when the intervals are all closed on the same side. It does not like point intervals or a mix of open and closed intervals. fancycut() is OK with all of this. It uses the interval notation we learned in high school math, which is why I like it. Here is an example:

fancycut(
  x = -10:10,
  SubZero = '(-Inf,0)',
  Zero = '[0,0]',
  Small = '(0,2]',
  Medium = '(2,5)',
  Large = '[5,10]',
  out.as.factor = FALSE
)

This returns

[1] "SubZero" "SubZero" "SubZero" "SubZero" "SubZero" "SubZero" "SubZero" "SubZero" "SubZero" "SubZero" "Zero"    "Small"   "Small"   "Medium"  "Medium" 
[16] "Large"   "Large"   "Large"   "Large"   "Large"   "Large"  
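For contrast, here is a sketch of the closest you can get with base cut() using the same breakpoints. Every interval must be closed on the same side, so the point interval [0,0] has no equivalent:

```r
# Base cut() with the same breakpoints: all intervals are right-closed by
# default, so there is no way to express the point interval [0,0].
buckets <- cut(-10:10,
               breaks = c(-Inf, 0, 2, 5, 10),
               labels = c('SubZero', 'Small', 'Medium', 'Large'))
buckets
```

Here 0 lands in SubZero because the first interval is (-Inf,0], not (-Inf,0).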

It has other features, too; please see the help files.

Desktop in the Cloud

I just wanted to share the good experience I have had with creating a virtual personal desktop on Paperspace. A few months ago I started researching whether there was anyone out there providing this service. Paperspace came up in the search but their service was not yet available. But, to my surprise, they were available the other night when I needed some extra computing power ASAP. Their souped-up desktop runs with 32 GB of RAM and 8 CPUs. You can pay monthly or by the hour and the rates are pretty reasonable. What is amazing is the speed. I was really impressed today with the download speeds, of all things. I had an entire R development stack set up and ready to go in 30 mins or less (R, RStudio, tools, github, git, Sublime, and on and on). AND, I’ve created a snapshot of the setup so I can spin up a new instance in the future, if I screw something up.

They are still a new company, small enough to still be personal, evidenced by the one-on-one conversation I’ve been having this evening with one of their outreach guys, Paul Kleen (paul AT paperspace DOT com). He’s offered free storage to anyone who emails him in response to this post, all you have to pay is the hourly usage fees. He offered me credits if anyone signs up but I am declining those because I don’t want you to think this is just a sales pitch. I really am impressed with this service and it totally hit the spot in a time of need. Try it out and you will love it.

Oh, and while I was setting this up, I tried to do the same on Azure and burned through $200 of credits before I even got to use the machine. Amazon suspended my account. The other smaller players I didn’t even try — too sketchy looking. Paperspace on the other hand is offering a good service that I can really stand behind.

Safe Loading of RData Files

Unless you have configured R not to ask, every time you close R or RStudio you are prompted to save your workspace. This saves an RData file to the working directory. The functions save.image() and save() offer a little more flexibility, but basically do the same thing: they save all or part of your current workspace to disk.
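A minimal sketch of the difference, using a temporary file to keep the example tidy: save() writes only the objects you name, while save.image() is essentially save(list = ls(), file = '.RData').

```r
x <- 1
y <- 'two'
f <- tempfile(fileext = '.RData')

save(x, file = f)    # only x is written; save.image(f) would write x and y

e <- new.env()
load(f, envir = e)
ls(e)                # just "x" -- y was never saved
```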

Let’s say last week I did some analysis on the built-in dataset called iris and I executed the following right before ending my R session:

> ls()
[1] 'fit1'    'iris'    'species'
> save.image('MyData.RData')

This saved the three objects in my global environment to a file called MyData.RData.

Now I am ready to do a similar analysis on another data set about daisies. I load up the daisies data frame and create a unique list of all the species.

> ls()
[1] 'daisies' 'species'

I want to experiment with some models, but first I want to take a look at what I did in the iris study for reference. I load up the MyData.RData file from the iris analysis using the following:

> load('MyData.RData')
> ls()
[1] 'daisies' 'fit1'  'iris'  'species'

The problem with the default behavior of load() is that it does not let me load just one of the objects from the file: it loads them all and throws them into my global environment. Sometimes, like here, this writes over objects that already exist in memory. My daisy species object got overwritten by the iris species object I had saved to disk.
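A minimal, self-contained sketch of that silent overwrite (a temp file stands in for MyData.RData):

```r
species <- c('setosa', 'versicolor', 'virginica')   # from the iris work
f <- tempfile(fileext = '.RData')
save(species, file = f)

species <- c('oxeye', 'shasta')   # now the daisy work
load(f)                           # default: loads into the global environment
species                           # the iris values are back; no warning given
```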

This isn’t really a problem if you always give objects unique names or if you remember every object you have saved in every file, but really, who can possibly do that? There is another way to combat this, and that is not to rely on load()‘s default behavior. The second parameter lets you specify an environment other than the global environment in which to load the contents of an RData file. So, I could have, and should have, done this:

  iris.env <- new.env()
  load('MyData.RData', envir = iris.env)
  iris.fit1 <- iris.env$fit1

I’ve never really analyzed any iris or daisy data, but this illustrates what has happened to me on several different occasions when I need to compare the results from two separate analyses that have a similar structure and overlapping names for objects. I’ve written a convenience function to make this loading to an environment easier. My philosophy is that the only safe way to load data from an RData file is to load it to an environment, inspect that environment and then explicitly identify what it is I want in my global environment before putting it there. I never use the load() function directly any more and only ever use the following

  LoadToEnvironment <- function(RData, env = new.env()) {
    load(RData, env)
    return(env)
  }
If at some future point I wanted to compare the models from the iris and daisy analyses I would do the following

  iris.env <- LoadToEnvironment('iris.RData')
  daisy.env <- LoadToEnvironment('daisy.RData')
  iris.fit1 <- iris.env$fit1
  daisy.fit1 <- daisy.env$fit1
  # Compare and contrast iris.fit1 and daisy.fit1

I wish you happy and safe coding.

SQL Server 2016 with R Install

I installed the “SQL Server 2016 Release Candidate” today to start playing around with what Microsoft is calling “SQL Server R Services (In-Database)”. I think this is really going to change the way we do things at the office, and it’s got me really excited.

You can download the Release Candidate from

I chose the CAB install which downloads two files, about 1.9GB. Running the executable then unpacks the setup files you will need for installation. Then, setup should run automatically. Microsoft has some good instructions at

The last part of their instructions is a test to make sure R is working.

exec sp_execute_external_script @language =N'R',
@script=N'OutputDataSet <- InputDataSet',
@input_data_1 =N'select 1 as hello'
with result sets (([hello] int not null));

The install actually adds a new instance of R running on the server, which for me is just my laptop.  A simple change to the code above helped me to find where the instance is.

exec sp_execute_external_script @language =N'R',
@script=N'OutputDataSet <- data.frame(R.home())'
with result sets (([R.home] sysname));

You can add more packages to this install quite easily. I’ve gone ahead and pinned this R instance to my taskbar. Run it as an admin and the installed packages are automatically available to SQL Server code. After installing the actuar package, just to test, I ran this in SSMS and it worked!

exec sp_execute_external_script @language =N'R',
@script=N'library(actuar)
OutputDataSet <- data.frame((.packages()))'
with result sets (([LoadedPackages] sysname));

Fibonacci Sequence in R with Memoization

This is in response to Andrew Z’s post on R-Bloggers Friday about using recursion to calculate numbers in the Fibonacci sequence.

I’ve re-written the author’s Fibonacci function here. The only real change is that this one is extended to negative integers.

fib <- function(n) {

    # Handle "vectors" by element
    if (length(n) > 1) {
        return(sapply(n, fib))
    }

    # Base cases
    if (n == 0) return(0)
    if (n == 1) return(1)

    # Check to see if n is an integer. Do not use is.integer, as that is
    # very strict.
    if (round(n, 0) != n) return(NA)

    # Negative numbers
    if (n < 0)
        return(fib(-1 * n) * ((-1)^((n + 1) %% 2)))

    # Everything else
    return(fib(n - 1) + fib(n - 2))
}

This is a great example of recursion, but not very useful in practice. It is pretty slow because it has to recurse all the way down to the base cases many times for every top-level call. Here are timings for successively larger inputs:

##    user  system elapsed 
##    0.19    0.00    0.19
##    user  system elapsed 
##    1.54    0.00    1.54
##    user  system elapsed 
##   15.98    0.00   15.97

Using fib to calculate Fibonacci numbers over 10 is really just too slow. This function is a perfect candidate for memoization. The memoise package is great for this, but gets tricky when the function being memoized is recursive. This example uses the principles of the memoise package and even steals a little code.

fibM <- (function() {

    # The code here related to the cache *mostly* comes from the memoise
    # package's object new_cache.

    cache <- NULL

    cache_reset <- function() {
        cache <<- new.env(TRUE, emptyenv())
        cache_set('0', 0)
        cache_set('1', 1)
    }

    cache_set <- function(key, value) {
        assign(key, value, envir = cache)
    }

    cache_get <- function(key) {
        get(key, envir = cache, inherits = FALSE)
    }

    cache_has_key <- function(key) {
        exists(key, envir = cache, inherits = FALSE)
    }

    # Initialize the cache
    cache_reset()

    # This is the function that gets returned by the anonymous function and
    # becomes fibM.
    function(n) {

        nc <- as.character(n)

        # Handle "vectors" by element
        if (length(n) > 1) {
            return(sapply(n, fibM))
        }

        # Check to see if n is an integer. Do not use is.integer, as that
        # is very strict.
        if (round(n, 0) != n) return(NA)

        # Cached cases
        if (cache_has_key(nc)) return(cache_get(nc))

        # Negative numbers
        if (n < 0)
            return(fibM(-1 * n) * ((-1)^((n + 1) %% 2)))

        # Everything else
        out <- fibM(n - 1) + fibM(n - 2)
        cache_set(nc, out)
        return(out)
    }
})()

This function demonstrates the use of memoization and using an anonymous function to create a closure. I wrote it this way so that the cache appears to be embedded in the function instead of leaving it in the Global Environment. This keeps it “safe” and promotes tidiness.
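The same define-and-immediately-call pattern in miniature (counter is a made-up example for this sketch, not from the post): the anonymous outer function runs once, and the inner function it returns keeps private state in the enclosing environment, just as fibM keeps its cache.

```r
counter <- (function() {
    count <- 0                 # private state, like fibM's cache
    function() {
        count <<- count + 1    # modify the enclosing environment
        count
    }
})()

counter()   # 1
counter()   # 2 -- count survives between calls but stays out of the
            # global environment
```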

Every time you call fibM, whether at the top level or through recursion, it checks its cache to see if it has already solved this problem. If it has it just returns the result. If not, it solves it and stores the answer in the cache for later use.

Before it has been called the first time, the cache is empty except for the “seed” values.

ls(environment(fibM)$cache)
## [1] '0' '1'

Call it once and see how fast it fills up.

fibM(30)
## [1] 832040

ls(environment(fibM)$cache)
##  [1] '0'  '1'  '10' '11' '12' '13' '14' '15' '16' '17' '18' '19' '2'  '20'
## [15] '21' '22' '23' '24' '25' '26' '27' '28' '29' '3'  '30' '4'  '5'  '6' 
## [29] '7'  '8'  '9'

Even on the first run this is faster than fib but that speed improvement intensifies once the cache starts to fill up.

To demonstrate, I’ll clear the cache first.

environment(fibM)$cache_reset()
ls(environment(fibM)$cache)
## [1] '0' '1'

Now for some speed tests.

nums <- -25:25

system.time(res.fib <- fib(nums))
##    user  system elapsed 
##    7.11    0.00    7.11
system.time(res.fibM.1 <- fibM(nums))
##    user  system elapsed 
##       0       0       0
system.time(res.fibM.2 <- fibM(nums))
##    user  system elapsed 
##       0       0       0

Let me know if you have any questions or comments on the techniques used here.

Using the Windows Clipboard, or Passing Data Quickly From Excel to R and Back Again

Two of my favorite functions are copy.table() and paste.table(). I’m going to turn this story on its head and give you the ending first.

# Copy data out of R
copy.table <- function(obj, size = 4096) {
  clip <- paste('clipboard-', size, sep = '')
  f <- file(description = clip, open = 'w')
  write.table(obj, f, row.names = FALSE, sep = '\t')
  close(f)
}

# Paste data into R
paste.table <- function() {
  f <- file(description = 'clipboard', open = 'r')
  df <- read.table(f, sep = '\t', header = TRUE)
  close(f)
  return(df)
}

The first allows you to copy a data frame to the clipboard in a format that Excel knows how to deal with. All you have to do after running copy.table() is select the cell in Excel (or position in a text file) you want to paste to and press CTRL+V. It works on anything that can be coerced to a data frame, too, so vectors, table objects, etc. are all valid objects for copying.

# These all work
copy.table(1:100)
copy.table(matrix(1:20, nrow = 2))
copy.table(table(iris$Species))

If you get an error when trying to copy it is most likely because obj requires more than 4096 bytes to be represented as text. You can pass a larger number as the second argument to make it work. I’ve tried experimenting with estimating an upper bound on the size of an object but it hasn’t worked out yet. For now, if I can’t get something to copy I just double the second argument.

# If my.df is of moderate size
copy.table(my.df)

# If my.df is huge
copy.table(my.df, 10000)

Pasting works in a similar way. Select the range in Excel you want to copy (including the header) and press CTRL+C. Then run

other.df <- paste.table()

This one doesn’t take any arguments because it goes straight to the clipboard and pulls everything there.
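Under the hood both functions are just write.table() and read.table() pointed at a clipboard connection. Here is the same round trip, sketched with a temp file instead of the Windows clipboard so it runs anywhere:

```r
f <- tempfile()

# Like copy.table(): tab-separated text with a header row, which is
# exactly the format Excel pastes and copies.
write.table(head(iris), f, row.names = FALSE, sep = '\t')

# Like paste.table(): read it straight back into a data frame.
df <- read.table(f, sep = '\t', header = TRUE)
dim(df)   # 6 rows, 5 columns
```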

I use both of these quite a bit when doing quick and dirty work where I need to have a more tactile, hands-on, “under the influence of Excel” workflow. They are interactive, so I don’t use them in the final production level code for any reproducible studies. But, for development and for quick stuff they are really helpful.

Have fun! Please forward on to someone else you know who uses R. Thanks!