Fibonacci Sequence in R with Memoization

This is in response to Andrew Z’s post on R-Bloggers Friday about using recursion to calculate numbers in the Fibonacci sequence.
http://heuristicandrew.blogspot.com/2014/12/fibonacci-sequence-in-r-and-sas.html

I’ve re-written the author’s Fibonacci function here. The only really change is that this one is extended to negative integers.

fib <- function(n) {

    # Handle "vectors" by element
    if (length(n) > 1) {
        return(sapply(n, fib))
    }

    # Base cases
    if (n == 0) 
        return(0)
    if (n == 1) 
        return(1)

    # Check to see if n is an integer Do not use is.integer as that is very
    # strict
    if (round(n, 0) != n) 
        return(NA)

    # Negative numbers
    if (n < 0) 
        return(fib(-1 * n) * ((-1)^((n + 1)%%2)))

    # Everything else
    return(fib(n - 1) + fib(n - 2))

}

This is a great example of recursion, but not very useful in practice.  It is pretty slow because it has to go to the base case several times for every top-level call.

system.time(fib(20))
##    user  system elapsed 
##    0.19    0.00    0.19
system.time(fib(25))
##    user  system elapsed 
##    1.54    0.00    1.54
system.time(fib(30))
##    user  system elapsed 
##   15.98    0.00   15.97

Using fib to calculate Fibonacci numbers over 10 is really just too slow. This function is a perfect candidate for memoization. The memoise package is great for this, but gets tricky when the function being memoized is recursive. This example uses the principles of the memoise package and even steals a little code.

fibM <- (function() {

    # The code here related to the cache *mostly* comes from the memoise
    # package's object new_cache.

    cache <- NULL

    cache_reset <- function() {
        cache <<- new.env(TRUE, emptyenv())
        cache_set('0', 0)
        cache_set('1', 1)
    }

    cache_set <- function(key, value) {
        assign(key, value, envir = cache)
    }

    cache_get <- function(key) {
        get(key, envir = cache, inherits = FALSE)
    }

    cache_has_key <- function(key) {
        exists(key, envir = cache, inherits = FALSE)
    }

    # Initialize the cache
    cache_reset()


    # This is the function that gets returned by the anonymous function and
    # becomes fibM.
    function(n) {

        nc <- as.character(n)

        # Handle "vectors" by element
        if (length(n) > 1) {
            return(sapply(n, fibM))
        }

        # Check to see if n is an integer Do not use is.integer as that is very
        # strict
        if (round(n, 0) != n) 
            return(NA)

        # Cached cases
        if (cache_has_key(nc)) 
            return(cache_get(nc))

        # Negative numbers
        if (n < 0) 
            return(fibM(-1 * n) * ((-1)^((n + 1)%%2)))

        # Everything else
        out <- fibM(n - 1) + fibM(n - 2)
        cache_set(nc, out)
        return(out)

    }

})()

This function demonstrates the use of memoization and using an anonymous function to create a closure. I wrote it this way so that the cache appears to be embedded in the function instead of leaving it in the Global Environment. This keeps it “safe” and promotes tidiness.

Every time you call fibM, whether at the top level or through recursion, it checks its cache to see if it has already solved this problem. If it has it just returns the result. If not, it solves it and stores the answer in the cache for later use.

Before it has been called the first time the cache is empty except for the “seed” values.

ls(environment(fibM)$cache)
## [1] '0' '1'

Call it once and see how fast it fills up.

fibM(30)
## [1] 832040
ls(environment(fibM)$cache)
##  [1] '0'  '1'  '10' '11' '12' '13' '14' '15' '16' '17' '18' '19' '2'  '20'
## [15] '21' '22' '23' '24' '25' '26' '27' '28' '29' '3'  '30' '4'  '5'  '6' 
## [29] '7'  '8'  '9'

Even on the first run this is faster than fib but that speed improvement intensifies once the cache starts to fill up.

To demonstrate, I’ll clear the cache first.

environment(fibM)$cache_reset()
ls(environment(fibM)$cache)
## [1] '0' '1'

Now for some speed tests.

nums <- -25:25

system.time(res.fib <- fib(nums))
##    user  system elapsed 
##    7.11    0.00    7.11
system.time(res.fibM.1 <- fibM(nums))
##    user  system elapsed 
##       0       0       0
system.time(res.fibM.2 <- fibM(nums))
##    user  system elapsed 
##       0       0       0

Let me know if you have any questions or comments on the techniques used here.

Using the Windows Clipboard, or Passing Data Quickly From Excel to R and Back Again

Two of my favorite functions are copy.table() and paste.table(). I’m going to turn this story on its head and give you the ending first.

# Copy data out of R
copy.table <- function(obj, size = 4096) {
  clip <- paste('clipboard-', size, sep = '')
  f <- file(description = clip, open = 'w')
  write.table(obj, f, row.names = FALSE, sep = '\t')
  close(f)  
}

# Paste data into R
paste.table <- function() {
  f <- file(description = 'clipboard', open = 'r')
  df <- read.table(f, sep = '\t', header = TRUE)
  close(f)
  return(df)
}

The first allows you to copy a data frame to the clipboard in a format that Excel knows how to deal with. All you have to do after running copy.table() is select the cell in Excel (or position in a text file) you want to paste to and press CNTL+V. It works on anything that can be coerced to a data frame, too, so vectors, table objects, etc. are all valid objects for copying.

# These all work
copy.table(1:100)
copy.table(letters)
copy.table(my.df)
copy.table(table(my.df$col1))
copy.table(matrix(1:20, nrow = 2))

If you get an error when trying to copy it is most likely because obj requires more than 4096 bytes to be represented as text. You can pass a larger number as the second argument to make it work. I’ve tried experimenting with estimating an upper bound on the size of an object but it hasn’t worked out yet. For now, if I can’t get something to copy I just double the second argument.

# If my.df is of moderate size
copy.table(my.df)

# If my.df is huge
copy.table(my.df, 10000)

Pasting works in a similar way. Select the range in Excel you want to copy (including the header) and press CNTL+C. Then run

other.df <- paste.table()

This one doesn’t take any arguments because it goes straight to the clipboard and pulls everything there.

I use both of these quite a bit when doing quick and dirty work where I need to have a more tactile, hands-on, “under the influence of Excel” workflow. They are interactive, so I don’t use them in the final production level code for any reproducible studies. But, for development and for quick stuff they are really helpful.

Have fun! Please forward on to someone else you know who uses R. Thanks!

Safe Loading of RData Files

Unless you have configured R not to ask, every time you close R or RStudio you are prompted to save your workspace. This saves an RData file to the working directory. The functions save.image() and save() offer a little more flexibility, but basically do the same thing: they save all or part of your current workspace to disk.

Let’s say last week I did some analysis on the built-in dataset called iris and I executed the following right before ending my R session

> ls()
[1] "fit1"    "iris"    "species"
> save.image('MyData.RData')

This saved the three objects in my global environment to a file called MyData.

Now I am ready to do a similar analysis on another data set about daisies. I load up the daisies data frame and create a unique list of all the species.

> ls()
[1] "daisies" "species"

I want to experiement with some models but I first want to take a look at what I did in the iris study, for reference. I load up the MyData file from the iris analysis using the following

> load('MyData.RData')
> ls()
[1] "daisies" "fit1"  "iris"  "species"

The problem with the default behavior of load() is that it does not allow me to load just one of the objects from the file but requires me to load all and throws them in my global environment. Sometimes, like here, this writes over objects that already exist in memory. My daisy species object got overwritten by the iris species object I had saved to disk.

This isn’t really a problem if you always give objects unique names or if you remember every object you have saved in every file, but really, who can possibly do that? There is another way to combat this and that is to not rely on load()‘s default behavior. The second parameter allows you to specify an environment other than the global environment in which to load the contents of an RData file. So, I could have, and should have, done this

  iris.env <- environment()
  load('MyData.RData', envir = iris.env)
  iris.fit1 <- iris.env$fit1

I’ve never really analyzed any iris or daisy data, but this illustrates what has happened to me on several different occasions when I need to compare the results from two separate analyses that have a similar structure and overlapping names for objects. I’ve written a convenience function to make this loading to an environment easier. My philosophy is that the only safe way to load data from an RData file is to load it to an environment, inspect that environment and then explicitly identify what it is I want in my global environment before putting it there. I never use the load() function directly any more and only ever use the following

  LoadToEnvironment <- function(RData, env = new.env()){
    load(RData, env)
    return(env) 
  }

If at some future point I wanted to compare the models from the iris and daisy analyses I would do the following

  iris.env <- LoadToEnvironment('iris.RData')
  daisy.env <- LoadToEnvironment('daisy.RData')
  iris.fit <- iris.env$fit1
  daisy.fit <- daisy.env$fit1
  # Compare iris.fit and daisy.fit

I wish you happy and safe coding. This is my inaugural post. My plan is to add a new post every few days so check back soon. Thanks!