R Package Development

rstats tutorial

There are many situations where it makes sense to develop your software project as an R package. R package development makes organization, documentation, collaboration, and publication of code easier. This post provides a short primer, with example code, on R package development using the devtools package.

Jakob Napiontek https://www.pik-potsdam.de/members/napiontek/homepage (Social Metabolism & Impacts, Potsdam Institute for Climate Impact Research) https://www.pik-potsdam.de
12-21-2020

Why do this?

When doing statistical analysis in R, the intuitive practice of accumulating numerous .R files and scattering fragments of code across various .Rmd files causes many problems, mainly in terms of organization, documentation, collaboration, and publication.

Packaging your functions addresses those problems and makes your research reproducible: you in three months' time, your collaborators, and anyone else interested in your research can replicate your results.

How to do it?

The most convenient way to build your first R package is using RStudio. If you are not already using RStudio for your analysis in R, you can download it for free. Once you have installed RStudio and made sure you are running the latest version of R (we are using 4.0.5 for this tutorial), we need to install a few packages:

install.packages(c("devtools", "roxygen2", "testthat", "knitr", "usethis"))

Those will be a massive help along the way, and we will explain what each of them does once we use it in our code.

Set up

The easiest way to start a new R package in RStudio is to click the “File” menu button, select “New Project”, and then click “New Directory” to get to the project type selection.

Choose “R Package using devtools” and enter a name for your package and a suitable location for the corresponding package directory.

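The devtools package will automatically fill our directory with two files: DESCRIPTION, which holds the metadata of the package, and NAMESPACE, which lists the functions the package exports and imports.

Furthermore, a folder R/ is created, which will contain all our functions.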

Alternatively, you can achieve the same result using this function from the usethis package:

usethis::create_package("/path/to/package")
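To see what this call generates without touching your real projects, here is a minimal sketch that creates a throwaway package in a temporary directory. The package name examplepkg is made up for illustration, and open = FALSE is used so that no new RStudio session is started.

```r
# Create a throwaway package in a temporary directory.
# "examplepkg" is a hypothetical name used only for this illustration.
pkg_path <- file.path(tempdir(), "examplepkg")
usethis::create_package(pkg_path, open = FALSE)

# List what was generated; DESCRIPTION, NAMESPACE and the R/ folder
# should be among the entries.
list.files(pkg_path, all.files = TRUE, no.. = TRUE)
```

The exact set of additional helper files can differ between usethis versions and between RStudio and plain R sessions.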

Functions

By convention, each function is written inside an .R file with the same name as the function. You can now copy your own functions into .R files or follow along with this example. Let’s say we want a function calculating the weighted mean of a data set. We would first create a file R/weighted_mean.R and then fill it with the following:

weighted_mean <- function(data, weight) {
    result <- sum(weight*data) / sum(weight)
    return(result)
}
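As a quick sanity check, the function can be tried out with a few made-up numbers and compared against stats::weighted.mean(), which ships with R. The values and weights below are purely illustrative.

```r
weighted_mean <- function(data, weight) {
  result <- sum(weight * data) / sum(weight)
  return(result)
}

values  <- c(1, 2, 3)  # made-up data
weights <- c(1, 1, 2)  # made-up weights

# (1*1 + 2*1 + 3*2) / (1 + 1 + 2) = 9 / 4 = 2.25
weighted_mean(values, weights)

# Base R equivalent, for comparison
stats::weighted.mean(values, weights)
```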

The function takes numeric vectors for the data and the weights and returns a single number, the weighted mean of our data set. If we want graphical output instead, we could plot the data points against their weights and add the weighted mean as a vertical line. The weighted_graph() function developed in the rest of this section does exactly that.

The weighted_mean() function only uses sum(), which is already included in base R, but to plot the data we need a graphics package like ggplot2. In package development, we don’t load packages with the usual library() call. Instead, we declare them as dependencies so that they are installed alongside our package when a user installs it. Therefore, we add an entry to our DESCRIPTION file listing all packages we depend on:

Imports:
    ggplot2

The easiest way to add Imports is to run the function usethis::use_package(), which automatically adds them to your DESCRIPTION file. When using functions from other packages inside your code, remember to always prefix the function with the package it comes from, separated by ::. After running usethis::use_package("ggplot2"), your code should look like this:

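weighted_graph <- function(data, weight) {
  # Weighted mean, drawn below as a dashed vertical line
  result <- sum(data * weight) / sum(weight)
  # Every ggplot2 function is prefixed with ggplot2::, since the package is not attached
  plot <- ggplot2::ggplot(data.frame(data, weight), ggplot2::aes(x = data, y = weight)) +
    ggplot2::geom_point() +
    ggplot2::geom_vline(xintercept = result, linetype = "dashed") +
    ggplot2::ggtitle(result) +
    ggplot2::theme_bw()
  return(plot)
}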

All this would implement the function in the package, but to “expose” the function to the users we need to add #' @export directly above the function definition. The call to usethis::use_package("ggplot2") is run once in the console and does not belong in the .R file:

#' @export
weighted_graph <- function(data, weight) {
  result <- sum(data * weight) / sum(weight)
  plot <- ggplot2::ggplot(data.frame(data, weight), ggplot2::aes(x = data, y = weight)) +
    ggplot2::geom_point() +
    ggplot2::geom_vline(xintercept = result, linetype = "dashed") +
    ggplot2::ggtitle(result) +
    ggplot2::theme_bw()
  return(plot)
}

If you want to use the pipe operator %>% in your functions, you need to import the pipe as well. The best way to do this is to use:

usethis::use_pipe(export = TRUE)

This function will create an R/utils-pipe.R file and make the pipe available to your users as well.

Further reading

If you need a package at or above a specific version, you can add the min_version parameter to the call, as in usethis::use_package("ggplot2", min_version = "3.3.2"). Always try to specify minimal versions instead of exact versions so as not to disrupt compatibility, since R cannot have multiple versions of the same package installed in the same library.

If another package merely improves your functions but is not needed for the basic functionality, you can suggest it instead of importing it by using the type = "Suggests" parameter. A suggested package is not installed automatically, so you have to check for it using requireNamespace("packagename", quietly = TRUE) before using it. Use this to avoid an unnecessarily long list of dependencies.
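The check for a suggested package could be sketched as follows. The function name weighted_summary() and the fallback behaviour (returning only the number) are assumptions made for this illustration, treating ggplot2 as if it were listed under Suggests rather than Imports.

```r
# Hypothetical helper: returns a plot if ggplot2 is available and
# falls back to the plain weighted mean otherwise.
weighted_summary <- function(data, weight) {
  result <- sum(data * weight) / sum(weight)

  # requireNamespace() returns FALSE instead of failing if the package is missing
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    message("Package 'ggplot2' is not installed; returning the weighted mean only.")
    return(result)
  }

  ggplot2::ggplot(data.frame(data, weight), ggplot2::aes(x = data, y = weight)) +
    ggplot2::geom_point() +
    ggplot2::geom_vline(xintercept = result, linetype = "dashed")
}

out <- weighted_summary(c(1, 2, 3), c(1, 1, 2))
```

Whether a silent fallback or an informative error via stop() is the better choice depends on how central the suggested package is to the function.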

Documentation

The documentation of each installed function in R is accessible using ?function_name. To achieve the same for all our functions, we make use of the roxygen2 package installed earlier. It provides us with an easy way to write documentation directly above the function in its .R file. Similar to the #' @export tag introduced above, we now add a title and a description, and specify the parameters and the return value. These lines start with #' to tell them apart from regular comments.

#' Graphical representation of weighted data set
#'
#' This function calculates the weighted mean and plots it alongside data and
#' weight. It requires numeric vectors for values and weights and returns a
#' graphic.
#'
#' @param data value vector
#' @param weight weights vector
#' @return Graph for data and weight with weighted mean.
#' @export
weighted_graph <- function(data, weight) {
  result <- sum(data * weight) / sum(weight)
  plot <- ggplot2::ggplot(data.frame(data, weight), ggplot2::aes(x = data, y = weight)) +
    ggplot2::geom_point() +
    ggplot2::geom_vline(xintercept = result, linetype = "dashed") +
    ggplot2::ggtitle(result) +
    ggplot2::theme_bw()
  return(plot)
}

In the end, we just run devtools::document() and a documentation file is automatically created in man/weighted_graph.Rd, which gets displayed once any user types ?weighted_graph.
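The same pattern applies to weighted_mean(). The sketch below shows one possible way to document it. The wording of the title, description, and parameters is our own suggestion, and the @examples tag is an addition that makes the help page show a runnable example.

```r
#' Weighted mean of a data set
#'
#' Calculates the mean of the data, weighting each value by the
#' corresponding entry of the weights vector.
#'
#' @param data Numeric vector of values.
#' @param weight Numeric vector of weights, the same length as data.
#' @return A single number, the weighted mean.
#' @examples
#' weighted_mean(c(1, 2, 3), c(1, 1, 2))
#' @export
weighted_mean <- function(data, weight) {
  result <- sum(weight * data) / sum(weight)
  return(result)
}
```

After running devtools::document() again, this block would be turned into man/weighted_mean.Rd and shown for ?weighted_mean.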

Data

Most of the packages you write will need to include some data, either to deliver a basic use case within the package or because it is needed for the analysis performed (e.g., census data). The straightforward way to do this is inside a data/ folder. It is best practice to store each object in its own .rda file. The easiest way to provide the data to your users this way is by using usethis::use_data():

x <- sample(100)
y <- sample(1000)

usethis::use_data(x,y)
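To get a feeling for what ends up in the data/ folder, the following sketch stores an object in an .rda file by hand and restores it again. It uses a temporary directory in place of data/ and only approximates what usethis::use_data() does for you, so treat it as an illustration rather than a replacement.

```r
x <- sample(100)

# Store the object in an .rda file; a temporary directory stands in for data/
rda_path <- file.path(tempdir(), "x.rda")
save(x, file = rda_path)

# Remove the object and restore it from disk under its original name
rm(x)
load(rda_path)
length(x)  # 100
```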

To avoid loading all included data into memory when your package is loaded, it is recommended to add LazyData: true to your DESCRIPTION file. Data sets are then only loaded once they are actually needed, which saves your users memory. Depending on your usethis version, this field is set automatically either when you create the package (via the “New Project” wizard of RStudio with “R Package using devtools”, or via usethis::create_package()) or when you first call usethis::use_data(), so check your DESCRIPTION to be sure.

Documenting your data works similarly to documenting a function. You create a .R file in the R/ folder and write the description, format, and source of your data in lines starting with #'. Below them, you put the name of the data set you want to document as a quoted string:

#' Sample of 100 data points between 1 and 100
#'
#' A data set to be used as example or value or weight
#'
#' @format numerical vector of length 100
#'
#' @source sample(100)
"x"

Testing

To see if the functions you wrote actually work as promised, you can use the devtools::load_all() function to temporarily load your functions into memory and experiment with them. load_all() is faster than fully installing the package, and the loaded functions disappear once you close your R session. If you are content with your experiments, you can move on to a more thorough check of all contents and connections in your package. The gold standard for checking your package in R is devtools::check(). This will provide you with rather lengthy output, but at the very end you will get a quick summary of your errors, warnings, and notes. Try to at least address the errors as soon as they arise. Run devtools::check() as often as possible to discover problems early and individually. The more functions you write between checks, the harder it becomes to isolate the error in your code.

To check that your package not only runs and is correctly formatted but actually produces the correct results, you can use unit tests. They let you check your functions' behavior against expected results. To set up the testing infrastructure, we call usethis::use_testthat() and then create our first test file by calling usethis::use_test(). This will open an example test file that looks like this:

test_that("multiplication works", {
  expect_equal(2 * 2, 4)
})

You can swap the calculation 2 * 2 for any operation using your function and the result 4 for your expected result. While working on your package, you can run devtools::test() periodically to execute all available tests and see if everything still works as expected. You can also check the type of a result using expect_type() or its length using expect_length(), and you can even check for expected errors using expect_error().
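Applied to weighted_mean(), a test file could look roughly like the sketch below. The test descriptions and input values are made up for illustration. Inside a package, the tests would live in tests/testthat/ and would neither call library(testthat) nor redefine the function; both are only included here so that the snippet runs on its own.

```r
library(testthat)

# Copy of the package function, so the snippet is self-contained
weighted_mean <- function(data, weight) {
  sum(weight * data) / sum(weight)
}

test_that("weighted_mean returns the expected value", {
  # (1*1 + 2*1 + 3*2) / 4 = 2.25
  expect_equal(weighted_mean(c(1, 2, 3), c(1, 1, 2)), 2.25)
  # Identical values give that value, whatever the weights
  expect_equal(weighted_mean(c(4, 4, 4), c(1, 2, 3)), 4)
})

test_that("weighted_mean returns a single double", {
  result <- weighted_mean(c(1, 2, 3), c(1, 1, 2))
  expect_type(result, "double")
  expect_length(result, 1)
})

test_that("weighted_mean fails on non-numeric input", {
  # Multiplying a character value raises an error in base R
  expect_error(weighted_mean("a", 1))
})
```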

Deployment

Once you have run devtools::document() and devtools::check() and confirmed that your functions and data are set, you can almost deploy your package.

But first we have to think about licensing!

“Pick a License, Any License” Jeff Atwood

To define how others may use, edit, and distribute your published code, you have to decide under which license to publish it. You need to make that decision now, before publishing, to give others the ability to use your code. Even if you do not intend to make any copyright claims, no experienced researcher will touch your unlicensed code because they have not been given the rights to use it.

There is a plethora of licenses available to you, but I suggest the short and simple MIT license for our example package. It is one of the most popular open source licenses and only requires the preservation of the license and copyright notice. It permits private and commercial use, modification, and redistribution under different terms. You can add it to your package with usethis::use_mit_license("Your Name"). This adds preconfigured LICENSE and LICENSE.md files to your package directory and sets the License field in your DESCRIPTION file.

The standard used at PIK is: usethis::use_mit_license(copyright_holder = "Potsdam Institute for Climate Impact Research, Your Name")

With your package checked and licensed, it is finally time to make it available to your audience. If you push it to a GitHub repository, anyone can install it in R using:

devtools::install_github("username/repository")

If you followed along, your package should look like the one available at devtools::install_github("napiontek/example"). Feel free to compare it to your own and start defining your very own functions. If you have acquired a taste for it and want to delve further into package development, you should check out this book.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Napiontek (2020, Dec. 21). FL Metab methods blog: R Package Development. Retrieved from http://www.pik-potsdam.de/~pichler/metab/blog/posts/2020-12-21-r-package-development/

BibTeX citation

@misc{napiontek2020r,
  author = {Napiontek, Jakob},
  title = {FL Metab methods blog: R Package Development},
  url = {http://www.pik-potsdam.de/~pichler/metab/blog/posts/2020-12-21-r-package-development/},
  year = {2020}
}