While working on a panel matching project, I came across a section of my code that was taking an inordinate amount of time. This post is about a simple change that gave me a huge performance increase: that one small section initially took about 100 minutes to run, and after the change it ran in around 100 seconds. Here’s basically what the code looked like:

lagit<-function(x) {
  dyadid<-x$dyadid[1]
  year<-x$year
  var<-ts(x$var, start=x$year[1])
  Lag1var<-lag(var,-1)
  Lag2var<-lag(var,-2)
  Lag3var<-lag(var,-3)
  Lead1var<-lag(var,1)
  Lead2var<-lag(var,2)
  # data.frame() is called here once per dyad, which turns out to be the slow part
  out<-data.frame(cbind(dyadid,year,var,Lag1var,Lag2var,Lag3var,Lead1var,Lead2var))
  return(out)
}
d<-do.call("rbind",by(dat,dat$dyadid,lagit))

On its face, this code makes sense. I want to create a bunch of lagged variables in panel data, so I create the appropriate data frame for each dyad’s observations and then combine them. Unfortunately, despite their many strengths, data frames carry a lot of overhead, and this code pays that cost once per dyad. If we simply move the data.frame call outside of the “loop” (such as it is), we get the massive performance gain. In other words, we combine all the panels first and then create the data frame once. This makes more sense in the first place, so I’m not sure why I initially wrote it the other way. The following code is all that I changed to get the speedup I described at the beginning of this post:

lagit<-function(x) {
  dyadid<-x$dyadid[1]
  year<-x$year
  var<-ts(x$var, start=x$year[1])
  Lag1var<-lag(var,-1)
  Lag2var<-lag(var,-2)
  Lag3var<-lag(var,-3)
  Lead1var<-lag(var,1)
  Lead2var<-lag(var,2)
  # no data.frame() call inside the function anymore
  out<-cbind(dyadid,year,var,Lag1var,Lag2var,Lag3var,Lead1var,Lead2var)
  return(out)
}
# convert to a data frame only once, after all dyads have been combined
d<-data.frame(do.call("rbind",by(dat,dat$dyadid,lagit)))

It wasn’t surprising that this sped up the code, but the _extent_ to which it sped it up was startling. So my advice is to touch data frames as little as possible inside code that runs repeatedly. They add a lot of overhead if you’re constantly converting things back and forth.
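If you want to check this on your own machine, here is a minimal sketch of how the two approaches could be timed. It is not my original code: the data are simulated, and `lag_matrix` and `lag_df` are hypothetical stand-ins that build a single lag by shifting a vector instead of using `ts` and `lag`. The timings will depend on your data and hardware, so I haven't listed any.

```r
# Simulated panel (made-up data): 500 dyads observed for 20 years each
set.seed(1)
n_dyads <- 500
n_years <- 20
dat <- data.frame(
  dyadid = rep(seq_len(n_dyads), each = n_years),
  year   = rep(1990:2009, times = n_dyads),
  var    = rnorm(n_dyads * n_years)
)

# Hypothetical stand-in for lagit(): returns a numeric matrix with one lag
lag_matrix <- function(x) {
  n <- nrow(x)
  cbind(dyadid = x$dyadid, year = x$year, var = x$var,
        Lag1var = c(NA, x$var[-n]))
}

# Same thing, but wrapped in a data frame for every dyad
lag_df <- function(x) data.frame(lag_matrix(x))

# Approach 1: data.frame() inside the "loop", then rbind the data frames
t_inside <- system.time(
  d1 <- do.call("rbind", by(dat, dat$dyadid, lag_df))
)

# Approach 2: rbind the matrices, then call data.frame() once at the end
t_outside <- system.time(
  d2 <- data.frame(do.call("rbind", by(dat, dat$dyadid, lag_matrix)))
)

# Compare elapsed times; both approaches should hold the same values
print(rbind(inside = t_inside, outside = t_outside)[, "elapsed"])
same_values <- isTRUE(all.equal(unname(as.matrix(d1)), unname(as.matrix(d2))))
```

The only difference between the two approaches is where `data.frame()` is called, so any gap in elapsed time comes from building and binding many small data frames rather than from the lagging itself.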