As of late I have been putting together some packages in R. There are a lot of great tools out there that make this easier (some of which, like Hadley’s devtools, I haven’t learned yet). I will provide a brief introduction to what I’ve been using so far to create packages. All of this is subject to change as I discover better tools (and fix any mistakes that may exist in this rundown).

The first thing to note is the broad strokes of what we need to accomplish. As we know, R commands have handy documentation accessible via ?. We need to create this documentation, formatted so that it has the same style as other R commands. We also need to ensure that all the proper dependencies are loaded. Likewise, we want to make sure that every function that should be accessible to the user is (and that every function that shouldn’t be, isn’t). We would like as much of this to be automatic as possible, too, and to allow for easy updating in the case of bugs or additions.

I’ll get into some of the zanier elements later, but to begin with, let’s talk setup. You should have each function in a separate R file whose name corresponds to the name of the function. All of the files for the functions which you want to put into your new package should be in a new directory (and for simplicity, this should probably be a folder specifically for your package). I will assume that this is the case, and that your working directory is set (setwd) to this folder. The next thing we want to do is create the documentation for each function.

So this is the long and boring part where you write out all the documentation for each and every function that is accessible to the user of your package. One does this by using roxygen2, which is a wonderful little package. Essentially, R documentation exists in files with an .Rd extension. Rather than edit these by hand, we just throw in some markup to the header of each of our files, and roxygen handles the rest. It’s very nifty. The markup is very simple. I used a couple different sources of documentation on this markup. Primarily, I looked at the function documentation available here, a vignette available here, and from Hadley’s really wonderful book that is still in progress, available here. I will copy an example header from Hadley’s page for reference:

#' Order a data frame by its columns.
#'
#' This function completes the subsetting, transforming and ordering triad
#' with a function that works in a similar way to \code{\link{subset}} and 
#' \code{\link{transform}} but for reordering a data frame by its columns.
#' This saves a lot of typing!
#'
#' @param df data frame to reorder
#' @param ... expressions evaluated in the context of \code{df} and 
#'   then fed to \code{\link{order}}
#' @keywords manip
#' @export
#' @examples
#' mtcars[with(mtcars, order(cyl, disp)), ]
#' arrange(mtcars, cyl, disp)
#' arrange(mtcars, cyl, desc(disp))
arrange <- function(df, ...) {
  ord <- eval(substitute(order(...)), df, parent.frame())
  unrowname(df[ord, ])
}

That is, the first line is the title and the next little paragraph is a longer description. The param lines document the individual parameters of the function. The examples are working examples of how to use the function. The export line should be included if you want the user to be able to use the function. If you don’t export it, the function will only be available for internal use. Some other possible tags include return, seealso, author, and various other options. The other tags of note are import and importFrom, which declare that your function depends on a particular package (or on specific functions from a package) in order to work. (The similarly named include tag does something different: it controls the order in which your own source files are loaded.) This is obviously pretty important. While one could rig up some code for dependency checking, that is not the preferred mode of operation.
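To make a few of those other tags concrete, here is a small made-up example. The function std_error is hypothetical and not part of any package discussed here; it is only a sketch, assuming you want the function exported and that importFrom is used to declare that sd comes from the stats package.

```r
#' Standard error of the mean.
#'
#' Divides the sample standard deviation by the square root of the
#' number of observations.
#'
#' @param x numeric vector
#' @param na.rm logical; should missing values be dropped first?
#' @return A single numeric value.
#' @seealso \code{\link{sd}}
#' @importFrom stats sd
#' @export
#' @examples
#' std_error(c(1, 2, 3))
std_error <- function(x, na.rm = FALSE) {
  # Drop missing values only when asked to
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}
```

A helper that you leave without the export line would be documented the same way, but would only be callable from inside the package.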

Assuming that the documentation is all written up properly, the next step is to create the package! One could do this all from inside the current R session, but I’ve found that this has some weird behavior associated with it, so we’ll instead do this with a batch script. If you’re on Linux, it’s very easy to cook up shell scripts that will do the job, but Windows presents a bit more trouble. I don’t particularly enjoy mucking around on the command line in Windows, so I wanted to create a simple batch file that wouldn’t require me to do so. I eventually worked out the following solution.

First off, we need to create a new R file. I called it roxy.R (for roxygen). This script will update our package’s source and documentation. It looks like the following:

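# Rscript does not reliably attach these, so load them explicitly
library(methods)
library(utils)
library(roxygen2)
# Build the package directory from the listed source files
package.skeleton("mypackage", code_files = c("mypackage-package.R", "function1.R", "function2.R", "function3.R"), force = TRUE)
# Generate the .Rd files in place (arguments as in the roxygen2 version used here; check ?roxygenize for yours)
roxygenize("mypackage", overwrite = TRUE, copy.package = FALSE, unlink.target = FALSE)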

This is for a package called “mypackage”. It is apparently necessary to load the methods and utils packages explicitly, even though they ship with base R: when the script is run through Rscript from the command line, they aren’t loaded the way they are in an interactive session. I should also point out that there is an additional file here called “mypackage-package.R” which you have not yet created. This is a dummy file to provide package-level documentation. Here is what you might want to put in this file:

#' Title of your package
#' 
#' Longer description of your package which might
#' even take multiple lines.
#'
#' @name mypackage-package
#' @aliases mypackage
#' @docType package
#' @author Your Name \email{Email@@website.com}
NULL

Note the NULL at the bottom of the file. This is because roxygen requires that there be something in the file that isn’t just a comment. This just creates a package level ?mypackage command (from the aliases line). You might want to include things like a listing of all the provided functions, and a general overview of the purpose of the package.
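One small convenience for roxy.R: rather than typing every file name into code_files by hand, the list can be built from the contents of the working directory. The following is a rough sketch rather than part of my actual script. The helper name find_code_files is made up, and it assumes that every .R file in the folder except roxy.R belongs in the package.

```r
# Hypothetical helper: collect the package's source files from a directory,
# leaving out the build script itself.
find_code_files <- function(dir = ".", exclude = "roxy.R") {
  # list.files() returns bare file names (no directory prefix) by default
  files <- list.files(dir, pattern = "\\.[Rr]$")
  sort(setdiff(files, exclude))
}

# Possible use inside roxy.R (commented out so the sketch has no side effects):
# package.skeleton("mypackage", code_files = find_code_files(), force = TRUE)
```

Since the names come back without a directory prefix, this only works as written when the working directory is the folder that holds the files, which is the setup assumed throughout.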

The next part is to actually create the batch script which will be building your package. First I will include the full script I’m using, and then I will explain what all the lines do.

set PKG_VER=0.1
set PKG_VER_NOTE=First version. Include as long a description as you like, although linebreaks do not do well.

set R_SCRIPT="C:\Program Files\R\R-2.15.0\bin\Rscript.exe"
set R_TERM="C:\Program Files\R\R-2.15.0\bin\R.exe"
rm -rf mypackage/R mypackage/man
%R_SCRIPT% roxy.R

sed -i 's/\(Title: \).*/\1 Package Title/g' mypackage/DESCRIPTION
sed -i 's/\(Version: \).*/\1 %PKG_VER%/g' mypackage/DESCRIPTION
sed -i 's/\(Description:\).*/\1 Package description with lines broken up by \\n after which you should include a space./g' mypackage/DESCRIPTION
sed -i 's/\(Author:\).*/\1 Your Name/g' mypackage/DESCRIPTION
sed -i 's/\(Maintainer:\).*/\1 Your Name ^<email@website.com^>/g' mypackage/DESCRIPTION
sed -i 's/\(License:\).*/\1 License name (^>=2.0) (See LICENSE)/g' mypackage/DESCRIPTION
sed -i 's/\(Collate:\).*/Depends: R (^>= 2.15.0), dependency1, dependency2\\n\1/g' mypackage/DESCRIPTION

echo --------------- >> mypackage\VERSION
echo Version: %PKG_VER% >> mypackage\VERSION
echo ----- >> mypackage\VERSION
echo Changes: %PKG_VER_NOTE% >> mypackage\VERSION
echo --------------- >> mypackage\VERSION

%R_TERM% CMD INSTALL --build mypackage
PAUSE

So there it is. Save this to a file with the extension .bat. For all of this to work, you need Rtools installed, as well as cygwin, which gives us more power on the command line by making UNIX-like commands (sed, for instance) work. Both need to be in your PATH, and I’m not sure whether the installers set this up automatically. If not, here is a pretty good explanation of how to set these variables for different versions of Windows (it’s specifically for Java, but you just need to add the appropriate directories). So you would just add “;c:/Rtools;c:/Rtools/bin” or whatnot on to the end of the string.

The first four lines just set some variables for later use. The first two are version specific: here you can edit the new version number of your package, along with a brief description of changes. These will automatically be appended to a changelog stored in VERSION (as seen in the lines towards the bottom that start with echo). The other two variables should be changed to reflect the location of your R installation. I think I installed R with the automatic location settings, so you may not need to change this. You will, however, need to update this when you update your R version.
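If you would rather keep the changelog step in R (say, at the end of roxy.R) instead of in the batch file, something along these lines should produce the same entry as the echo lines. This is only a sketch: append_changelog is a made-up name, and the layout simply mirrors what the batch script writes.

```r
# Hypothetical helper: append one changelog entry to a plain-text file,
# in the same layout the echo lines of the batch script produce.
append_changelog <- function(path, version, note) {
  rule <- "---------------"
  entry <- c(rule,
             paste("Version:", version),
             "-----",
             paste("Changes:", note),
             rule)
  con <- file(path, open = "a")  # "a" appends instead of overwriting
  on.exit(close(con))
  writeLines(entry, con)
  invisible(entry)
}

# Possible use (commented out so the sketch has no side effects):
# append_changelog("mypackage/VERSION", "0.1", "First version.")
```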

The rm bit removes the old versions of the documentation and source from the directory that the package will be built from. This seems like it shouldn’t be necessary due to the “force” switch in package.skeleton (in roxy.R), but it seems to be needed in my case, at least. The line that begins with %R_SCRIPT% simply runs the R script we created earlier to create and roxygenize our package.

All of the lines that begin with sed edit the DESCRIPTION file. I won’t go into exactly how sed works, but essentially, each of these lines searches for a line with some pattern, and then replaces that line with the pattern plus the bit you add onto the end. Because we aren’t in a god-loving UNIX-like shell, we have to add some additional quirky things in. For instance, if we want to print a greater than or less than symbol, we need to include a caret (^) first to escape it. These lines aren’t strictly necessary if you just want to edit the DESCRIPTION file yourself. What I suggest you do is to eventually just write out a nice DESCRIPTION file, copy it into your working directory, and then add in a line to your batch file “cp DESCRIPTION mypackage/DESCRIPTION”. You can then remove all of the sed lines except for the version number update.
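For what it’s worth, the same kind of edit can be done from R without sed, since DESCRIPTION is in the DCF format that base R’s read.dcf and write.dcf understand. Below is a minimal sketch, not what my batch file actually does. The helper name set_description_fields is made up, and one caveat is that write.dcf may re-wrap long fields such as Description.

```r
# Hypothetical helper: set (or add) fields in a DESCRIPTION file.
set_description_fields <- function(path, ...) {
  fields <- list(...)
  desc <- read.dcf(path)  # one-row character matrix, one column per field
  for (name in names(fields)) {
    if (name %in% colnames(desc)) {
      # Field already present: overwrite its value
      desc[1, name] <- fields[[name]]
    } else {
      # Field missing: add it as a new column
      new_col <- matrix(fields[[name]], nrow = 1, dimnames = list(NULL, name))
      desc <- cbind(desc, new_col)
    }
  }
  write.dcf(desc, file = path)
  invisible(desc)
}

# Possible use (commented out so the sketch has no side effects):
# set_description_fields("mypackage/DESCRIPTION",
#                        Version = "0.1",
#                        Depends = "R (>= 2.15.0), dependency1, dependency2")
```

This also sidesteps the caret escaping, because the greater-than and less-than symbols never pass through the Windows command line.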

Most of the lines changed in the DESCRIPTION file are pretty self-explanatory, but the License line might merit a little more expansion. Choosing a license is one of those things that a lot of people get really worked up about (RMS, for instance). I’m not really sure how to approach it in my own case. I think that when I eventually publish my package(s), I will probably either go with the ubiquitous GPL, or the Apache license. I’m rather partial to Apache licensing insomuch as it doesn’t include all of the copyleft provisions in GPL (and especially GPLv3). I’m not really interested in dealing with all of that, so something like Apache (which seems to be a very agnostic sort of open source licensing) seems to make a lot of sense. On the other hand, though, most R packages seem to use GPL, so there may not be much of a reason to buck the trend.

The line that begins with %R_TERM% packages up our package and saves it as mypackage_%PKG_VER%.zip. The final line pauses the batch file so that the DOS prompt stays on the screen so that we can inspect the output to ensure that there were no errors.

And you’ve done it! Your package should have been generated correctly, with all the proper documentation and whatnot! You can then install it to test with install.packages("mypackage_0.1.zip", repos=NULL) for whatever version number you are on.

So that’s the development workflow I have put together. Feel free to use whatever makes sense to you. I didn’t really see a very good centralized repository of an entire development workflow, so I’m putting my own out into the aether.

(EDIT: I’ve made vast changes in my workflow through the use of Hadley’s fantastic devtools package. It’s still not completely obvious how to manage the whole workflow, so I’ll be making a new post on the subject soon.)