I have recently been working on writing a package for R, and in the course of doing so wanted to run some fairly large simulations. I have a powerful multi-core processor on my desktop, but R doesn’t care about those extra cores. I wanted to run 5000 repetitions of my simulation function, but doing so takes several minutes, which is clearly unacceptable. system.time on replicate(5000, sim(1000)) reports an elapsed time of around 400 seconds. This seems ridiculous. I try to write well-behaved vectorized code (I mean, replicate should be trivial to parallelize, right?), so there is no reason that we shouldn’t be able to massively parallelize this code.

This is indeed the case. The package we are interested in is “snow,” an acronym for “Simple Network of Workstations”. As the name implies, it was written for clustering, but we don’t have to use it like that. Let’s say I just want two “nodes” (which, in this case, are just separate R worker processes on the local machine). What do we need to do?

First, we initiate the “cluster” by simply creating two nodes on the local machine. We do that as follows (Windows may pop up a firewall warning; I’m not sure how this will work on other OSes):

library(snow)
cl <- makeCluster(2, type = "SOCK")
There are other sorts of clusters, but SOCK clusters are easy to set up, so I’m just using that. Next, we need to set up the environment on each of these newly created nodes. For my simulation, I need the sandwich and lmtest packages attached, so I load them on each node, point the nodes at my working directory, and source my code:

clusterEvalQ(cl, library(sandwich))
clusterEvalQ(cl, library(lmtest))
clusterEvalQ(cl, setwd("[Your Working Directory Path]"))
clusterEvalQ(cl, source("[Source File]"))

As you can see, I am calling a function of my own creation in the simulation, so I need to source that file in each of the new R environments I created. The final step is simply to tell the cluster to get to work:

results <- parSapply(cl, 1:5000, function(i) sim(1000))
stopCluster(cl)
And it’s really that simple. There doesn’t seem to be a direct equivalent to replicate in the snow library, but this isn’t difficult to deal with. The parSapply function operates just like the normal sapply function in base R (and there are other similar functions which you can find in the snow documentation). Clearly, calling this function as I do will do exactly the same thing as replicate(5000, sim(1000)). stopCluster just stops the cluster and cleans up the connections.
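As a self-contained illustration of the sapply/parSapply correspondence, here is a minimal sketch. It uses base R’s parallel package, which later absorbed snow’s cluster interface (substitute library(snow) and type = "SOCK" to match the code above); toy_sim is a hypothetical stand-in for my sim function.

```r
library(parallel)  # base R; provides the same makeCluster/parSapply interface as snow

# Hypothetical stand-in for the post's sim() function
toy_sim <- function(n) mean(rnorm(n))

cl <- makeCluster(2)          # two local worker processes
clusterExport(cl, "toy_sim")  # workers start as fresh R sessions, so ship the function over

serial_res   <- sapply(1:10, function(i) toy_sim(100))
parallel_res <- parSapply(cl, 1:10, function(i) toy_sim(100))
stopCluster(cl)
```

Note that the two result vectors have the same shape but will not match numerically, since each worker draws from its own random stream; clusterSetRNGStream gives reproducible parallel RNG if that matters for your simulation.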

Wrapping this in system.time shows that the elapsed time is just a bit more than half (about 225 seconds) of the elapsed time of the single-threaded version. This isn’t surprising, as the clustering method requires some amount of overhead, so we shouldn’t expect the time to be exactly cut in half.
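You can reproduce this kind of comparison with a toy workload along the following lines (again using base R’s parallel package in place of snow; busy is a made-up CPU-bound function):

```r
library(parallel)

# Hypothetical CPU-bound stand-in for the real simulation
busy <- function(i) sum(sqrt(1:2e5))

t_serial <- system.time(sapply(1:200, busy))["elapsed"]

cl <- makeCluster(2)
t_par <- system.time(parSapply(cl, 1:200, busy))["elapsed"]
stopCluster(cl)

c(serial = t_serial, parallel = t_par)
```

On two workers you should see a speedup of somewhat less than 2x: starting the worker processes and shuttling data over the socket connections isn’t free.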

There may be a cleaner way to do this that feels like less of a hack, but I’m pretty pleased with this. It took me very little time to figure out and get working.