Installing your own R packages on the cluster.
Introduction
Please also refer to the FAQ on installing popular R packages here: http://otrs.pik-potsdam.de/otrs/customer.pl?Action=CustomerFAQZoom;ItemID=177
In most cases, installing your own R packages on the cluster is straightforward. Typically, install.packages() works fine from within R, as does R CMD INSTALL <package archive> from the command line.
You may, however, come across a package that depends on a system library which we have installed in a non-standard location.
This guide walks through the installation of a few such packages. The intention is not to provide a foolproof guide for installing a particular package, or set of packages, but to show typical errors that you might see, how to start interpreting them, and how to modify the installation to correct them.
Example 1 - package "rgeos"
We'll start by attempting to install the R package "rgeos". This is an interface to the "Geometry Engine - Open Source" library (https://trac.osgeo.org/geos), a C++ library which we have installed in the cluster system libraries directory (/p/system/) and made available as a module (geos/3.6.1).
First, we load a recent R module:
    # our optimised R was built with the Intel compilers, so we load this first:
    module load intel/2018.1
    # now load the version of R we need:
    module load R/3.4.4
Before going any further, check that you only have these two modules loaded:
    module list
    Currently Loaded Modulefiles:
      1) intel/2018.1   2) R/3.4.4
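The same check can be scripted. The sketch below assumes that your module system exports the colon-separated LOADEDMODULES variable (Environment Modules and Lmod both do this); the function names are illustrative and not part of any cluster tooling.

```shell
# Print the currently loaded modules, one per line.
# Assumes the module system exports LOADEDMODULES as a colon-separated list.
loaded_modules() {
    printf '%s\n' "${LOADEDMODULES:-}" | tr ':' '\n' | sed '/^$/d'
}

# Succeed only if exactly the two modules used in this guide are loaded.
# LC_ALL=C makes the sort order independent of the user's locale.
only_intel_and_r() {
    [ "$(loaded_modules | LC_ALL=C sort | tr '\n' ' ')" = "R/3.4.4 intel/2018.1 " ]
}

if only_intel_and_r; then
    echo "module environment is clean"
else
    echo "modules currently loaded:"
    loaded_modules
fi
```

If other modules are listed, unloading them before starting R makes it easier to attribute any later error to a single cause.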
First error: a missing library
Now we start R and use install.packages():
    > install.packages("rgeos")
    Installing package into '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4'
    (as 'lib' is unspecified)
    trying URL 'https://ftp.gwdg.de/pub/misc/cran/src/contrib/rgeos_0.3-28.tar.gz'
    Content type 'application/octet-stream' length 252833 bytes (246 KB)
    ==================================================
    downloaded 246 KB

    * installing *source* package 'rgeos' ...
    ** package 'rgeos' successfully unpacked and MD5 sums checked
    configure: CC: icc
    configure: CXX: icpc
    configure: rgeos: 0.3-28
    checking for /usr/bin/svnversion... yes
    configure: svn revision: 572
    checking for geos-config... no
    no
    configure: error: geos-config not found or not executable.
    ERROR: configuration failed for package 'rgeos'
    * removing '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/rgeos'
    * restoring previous '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/rgeos'

    The downloaded source packages are in
        '/p/tmp/R_tmp/RtmpkJJdOQ/downloaded_packages'
    Warning message:
    In install.packages("rgeos") :
      installation of package 'rgeos' had non-zero exit status
We see that the installation failed (note the non-zero exit status warning at the bottom). Working backwards, we see ERROR: configuration failed for package 'rgeos', and the line before that gives us a precise error: configure: error: geos-config not found or not executable.
The R installation doesn't know where to find the components of the base GEOS library, in particular the program which can inform it about the setup (geos-config). We know that GEOS is installed via a module, which should also set this path up correctly.
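When the output is long, it can help to save it to a file and search for the error lines rather than scrolling. A minimal sketch follows; the file name install.log is hypothetical and stands for wherever you captured the installer output.

```shell
# Print the lines of an install log that mention an error, with line numbers.
show_errors() {
    grep -n -i -E 'error' "$1"
}

# install.log is a hypothetical file holding the captured installer output.
log="install.log"
if [ -f "$log" ]; then
    show_errors "$log" || echo "no error lines found in $log"
fi
```

The line numbers make it easy to jump to the first match and read the few lines just before it, which is usually where the real cause is reported.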
So, let's quit R, and check:
    module avail geos
    geos/3.3.3  geos/3.5.0  geos/3.6.1
There are several versions, so we'll pick the latest. Does it provide geos-config?
    module load geos/3.6.1
    which geos-config
    /p/system/packages/geos/3.6.1/bin/geos-config
Yes! So we load the GEOS module before starting R again.
Now, module list will show three loaded modules:
    module list
    Currently Loaded Modulefiles:
      1) intel/2018.1   2) R/3.4.4   3) geos/3.6.1
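Before starting R, it is worth confirming that the helper tool really is on your PATH, since that is where the package's configure script will look for it. A small sketch, with an illustrative function name:

```shell
# Report whether a library's helper tool is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 is missing: load the module that provides it before starting R"
        return 1
    fi
}

# geos-config is the tool the rgeos installer asked for.
check_tool geos-config || true
```

The same function works for any other helper the installer names in its error message.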
Success!
Now, let's start R again and run install.packages('rgeos'):
    * installing *source* package 'rgeos' ...
    ** package 'rgeos' successfully unpacked and MD5 sums checked
    configure: CC: icc -std=c99
    configure: CXX: icpc
    configure: rgeos: 0.3-28
    checking for /usr/bin/svnversion... yes
    configure: svn revision: 572
    checking for geos-config... /p/system/packages/geos/3.6.1/bin/geos-config
    checking geos-config usability... yes
    configure: GEOS version: 3.6.1
    checking geos version at least 3.2.0... yes
    checking geos-config clibs... yes
    checking geos_c.h presence and usability... yes
    checking geos: linking with libgeos_c... yes
    configure: PKG_CPPFLAGS: -I/p/system/packages/geos/3.6.1/include
    configure: PKG_LIBS: -L/p/system/packages/geos/3.6.1/lib -lgeos -L/p/system/packages/geos/3.6.1/lib -lgeos_c
    configure: creating ./config.status
    config.status: creating src/Makevars
    ** libs
    icpc -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -daal=parallel -qopenmp -xCORE-AVX2 -fPIC -c dummy.cc -o dummy.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c init.c -o init.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c local_stubs.c -o local_stubs.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos.c -o rgeos.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_R2geos.c -o rgeos_R2geos.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_R2geosMP.c -o rgeos_R2geosMP.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_bbox.c -o rgeos_bbox.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_buffer.c -o rgeos_buffer.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_coord.c -o rgeos_coord.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_geos2R.c -o rgeos_geos2R.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_linearref.c -o rgeos_linearref.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_misc.c -o rgeos_misc.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_poly2nb.c -o rgeos_poly2nb.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_predicate_binary.c -o rgeos_predicate_binary.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_predicate_unary.c -o rgeos_predicate_unary.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_topology.c -o rgeos_topology.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_topology_binary.c -o rgeos_topology_binary.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_validate.c -o rgeos_validate.o
    icc -std=c99 -I/p/system/packages/R/3.4.4/lib64/R/include -DNDEBUG -I/p/system/packages/geos/3.6.1/include -I"/p/system/packages/R/3.4.4/lib64/R/library/sp/include" -I/usr/local/include -fpic -O3 -ipo -mkl -qopenmp -xCORE-AVX2 -fPIC -c rgeos_wkt.c -o rgeos_wkt.o
    icpc -shared -L/p/system/packages/R/3.4.4/lib64/R/lib -qopenmp -o rgeos.so dummy.o init.o local_stubs.o rgeos.o rgeos_R2geos.o rgeos_R2geosMP.o rgeos_bbox.o rgeos_buffer.o rgeos_coord.o rgeos_geos2R.o rgeos_linearref.o rgeos_misc.o rgeos_poly2nb.o rgeos_predicate_binary.o rgeos_predicate_unary.o rgeos_topology.o rgeos_topology_binary.o rgeos_validate.o rgeos_wkt.o -L/p/system/packages/geos/3.6.1/lib -lgeos -L/p/system/packages/geos/3.6.1/lib -lgeos_c -L/p/system/packages/R/3.4.4/lib64/R/lib -lR
    installing to /home/linstead/R/x86_64-pc-linux-gnu-library/3.4/rgeos/libs
    ** R
    ** inst
    ** preparing package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    * DONE (rgeos)

    The downloaded source packages are in
        '/p/tmp/R_tmp/RtmpoxQvcB/downloaded_packages'
We see no errors or warnings, and near the beginning, we see our change to the compiler options taking effect: configure: CC: icc -std=c99
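This guide does not show the step that introduced the -std=c99 flag. One common way to set such a flag, offered here as an assumption rather than a record of what was done on the cluster, is a personal Makevars file. The sketch writes the file to a scratch directory and points R at it through the R_MAKEVARS_USER environment variable, so that an existing ~/.R/Makevars is left untouched.

```shell
# Write a personal Makevars that asks R to compile C code with "icc -std=c99".
# Assumption: the guide does not say how the flag was set; this is one common way.
# A scratch directory is used so that an existing ~/.R/Makevars is not overwritten.
makevars_dir="$(mktemp -d)"
cat > "$makevars_dir/Makevars" <<'EOF'
CC=icc -std=c99
EOF

# When R_MAKEVARS_USER is set, R reads that file instead of ~/.R/Makevars.
export R_MAKEVARS_USER="$makevars_dir/Makevars"
echo "R will read personal compiler settings from $R_MAKEVARS_USER"
```

To make the setting permanent you would place the same line in ~/.R/Makevars instead, after checking that the file does not already define CC.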
Example 2 - missing libraries, but no helper config program
In the rgeos example, the R package needs to know where to find the GEOS (C++) library. It used a handy tool provided by GEOS, geos-config, to get these settings. But not every library comes with such a tool. In this example, we'll install a package that needs to be told explicitly where to find a library.
install.packages("udunits2")
Let's try to install the R package udunits2. This package provides an interface in R to the udunits library, a C library for the manipulation and conversion of units of physical quantities. The latest version is installed on the cluster in /p/system/packages/udunits/2.2.26/ (and available via module udunits/2.2.26). First we'll try a simple install.packages('udunits2') and see what happens.
    * installing *source* package 'udunits2' ...
    ** package 'udunits2' successfully unpacked and MD5 sums checked
    checking for gcc... icc -std=c99
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether icc -std=c99 accepts -g... yes
    checking for icc -std=c99 option to accept ISO C89... none needed
    checking for XML_ParserCreate in -lexpat... yes
    checking how to run the C preprocessor... icc -std=c99 -E
    checking for grep that handles long lines and -e... /usr/bin/grep
    checking for egrep... /usr/bin/grep -E
    checking for ANSI C header files... yes
    checking for sys/types.h... yes
    checking for sys/stat.h... yes
    checking for stdlib.h... yes
    checking for string.h... yes
    checking for memory.h... yes
    checking for strings.h... yes
    checking for inttypes.h... yes
    checking for stdint.h... yes
    checking for unistd.h... yes
    checking udunits2.h usability... no
    checking udunits2.h presence... no
    checking for udunits2.h... no
    checking for ut_read_xml in -ludunits2... no
    -----Error: libudunits2.a not found-----
         If the udunits2 library is installed in a non-standard location,
         use --configure-args='--with-udunits2-lib=/usr/local/lib' for example,
         or --configure-args='--with-udunits2-include=/usr/include/udunits2'
         replacing paths with appropriate values for your installation.
         You can alternatively use the UDUNITS2_INCLUDE and UDUNITS2_LIB
         environment variables.
         If udunits2 is not installed, please install it.
         It is required for this package.
    ERROR: configuration failed for package 'udunits2'
    * removing '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/udunits2'
    * restoring previous '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/udunits2'

    The downloaded source packages are in
        '/p/tmp/R_tmp/RtmpZSRurP/downloaded_packages'
    Warning message:
    In install.packages("udunits2") :
      installation of package 'udunits2' had non-zero exit status
There's a lot of information here, but it tells us fairly explicitly what the problem is (Error: libudunits2.a not found), and indeed, /p/system isn't a standard location.
Simply loading the udunits/2.2.26 module won't be enough though - we need to tell R directly where to find the library (the error message tells us how!).
The method suggested (--configure-args='--with-udunits2-lib=/usr/local/lib') applies to installations done directly from source (i.e. if you download the source code, unpack it, configure and make it).
We can pass these options to install.packages inside R though, which is more convenient. (Please read the documentation for install.packages.)
We first load the UDUNITS module (module load udunits/2.2.26), then:
install.packages("udunits2", configure.args='--with-udunits2-lib=/p/system/packages/udunits/2.2.26/lib --with-udunits2-include=/p/system/packages/udunits/2.2.26/include')
Notice that we passed the recommended configuration options via the configure.args parameter to install.packages.
How do we know what the correct paths are? By looking at the udunits module:
    module show udunits/2.2.26
    -------------------------------------------------------------------
    /p/system/modulefiles/tools/udunits/2.2.26:

    module-whatis    Enable usage for udunits version 2.2.26
    setenv           UDUNITSROOT /p/system/packages/udunits/2.2.26
    setenv           UDUNITS2_LIB /p/system/packages/udunits/2.2.26/lib
    setenv           UDUNITS2_INCLUDE /p/system/packages/udunits/2.2.26/include
    module           load intel/2018.1
    module           load compiler/gnu/7.3.0
    module           load expat bison
    prepend-path     PATH /p/system/packages/udunits/2.2.26/bin
    prepend-path     INCLUDE /p/system/packages/udunits/2.2.26/include
    prepend-path     LD_LIBRARY_PATH /p/system/packages/udunits/2.2.26/lib
    prepend-path     MANPATH /p/system/packages/udunits/2.2.26/share/man
    -------------------------------------------------------------------
We see a few things here, but the INCLUDE and LD_LIBRARY_PATH parts tell us where the components are.
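If you prefer the command line to install.packages(), the same two options can be assembled from the module's install prefix and passed to R CMD INSTALL. This is a sketch only: it relies on the UDUNITSROOT variable shown in the module file, falls back to the path used in this guide, and uses PKG_ARCHIVE as a hypothetical name for the source archive you downloaded.

```shell
# Build the configure arguments for udunits2 from the module's install prefix.
# UDUNITSROOT is set by the udunits module; the fallback is the path from this guide.
udunits_configure_args() {
    prefix="${UDUNITSROOT:-/p/system/packages/udunits/2.2.26}"
    printf '%s %s\n' \
        "--with-udunits2-lib=$prefix/lib" \
        "--with-udunits2-include=$prefix/include"
}

# Print (not run) the command-line equivalent of the install.packages() call.
# PKG_ARCHIVE is a hypothetical placeholder for the archive you downloaded.
PKG_ARCHIVE="${PKG_ARCHIVE:-udunits2.tar.gz}"
echo "R CMD INSTALL --configure-args='$(udunits_configure_args)' $PKG_ARCHIVE"
```

Deriving the paths from the module's variable means the command follows whichever udunits version you have loaded, instead of a hard-coded one.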
Example 3: Rmpi
Rmpi is an R interface to the MPI library for writing parallel applications. On the cluster, the supported MPI libraries are those bundled with the Intel Cluster tools, most recently the intel/2018.1 and intel/2018.3 modules.
Let's try a simple installation (install.packages("Rmpi")):
    * installing *source* package 'Rmpi' ...
    ** package 'Rmpi' successfully unpacked and MD5 sums checked
    checking for gcc... icc -std=c99
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether icc -std=c99 accepts -g... yes
    checking for icc -std=c99 option to accept ISO C89... none needed
    checking for pkg-config... /usr/bin/pkg-config
    checking if pkg-config knows about OpenMPI... no
    checking how to run the C preprocessor... icc -std=c99 -E
    checking for grep that handles long lines and -e... /usr/bin/grep
    checking for egrep... /usr/bin/grep -E
    checking for ANSI C header files... yes
    checking for sys/types.h... yes
    checking for sys/stat.h... yes
    checking for stdlib.h... yes
    checking for string.h... yes
    checking for memory.h... yes
    checking for strings.h... yes
    checking for inttypes.h... yes
    checking for stdint.h... yes
    checking for unistd.h... yes
    checking mpi.h usability... no
    checking mpi.h presence... no
    checking for mpi.h... no
    configure: error: "Cannot find mpi.h header file"
    ERROR: configuration failed for package 'Rmpi'
    * removing '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/Rmpi'
    * restoring previous '/home/linstead/R/x86_64-pc-linux-gnu-library/3.4/Rmpi'

    The downloaded source packages are in
        '/p/tmp/R_tmp/RtmpLx7H93/downloaded_packages'
    Warning message:
    In install.packages("Rmpi") :
      installation of package 'Rmpi' had non-zero exit status
As expected, it fails :)
The error:
    checking for mpi.h... no
    configure: error: "Cannot find mpi.h header file"
gives us a hint that the package installer can't find components of the MPI library.
But this installer isn't as helpful as udunits2 above, which gave us a suggestion about what to try next. Where do we go from here? A good starting point is always to head for the documentation for the package in question: Rmpi documentation
But sadly, not all documentation is complete or up to date! For this package (a fairly important R package in high-performance computing circles), the linked PDF has no installation instructions, and the linked README contains a URL which no longer exists.
So we must dig further.
Examining the source code of an R package.
The CRAN page for the package contains a link to the source code, so we'll fetch this, unpack it, and look at the options for configuring it.
    # fetch the code from the CRAN link "Package source":
    wget https://cran.r-project.org/src/contrib/Rmpi_0.6-7.tar.gz
    # unpack it:
    tar xvzf Rmpi_0.6-7.tar.gz
    cd Rmpi
    # look at the options for configuration:
    ./configure --help
    .
    .
    # some interesting looking parameters:
    Optional Packages:
      --with-Rmpi-include=INCLUDE_PATH   location of MPI header files
      --with-Rmpi-libpath=LIB_PATH       location of MPI library files
      --with-Rmpi-type=MPI_TYPE          the type of MPI: OPENMPI,LAM,MPICH,MPICH2, or CRAY
      --with-mpi=LIB_PATH                location of top-level MPI directory
Strangely, these options are listed under "Optional packages", though MPI seems to be mandatory (as one would expect for an interface to an MPI library!).
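The help output of a configure script can be long, so it is often quicker to filter it down to the --with-* options. A minimal sketch, assuming an unpacked source directory such as the Rmpi one above; the function name is illustrative.

```shell
# List only the --with-* options that a package's configure script accepts.
# $1 is the path to an unpacked source directory containing "configure".
list_with_options() {
    (cd "$1" && sh ./configure --help) | grep -E '^[[:space:]]*--with-'
}

# Rmpi is the directory created by unpacking the archive in this example.
src_dir="Rmpi"
if [ -f "$src_dir/configure" ]; then
    list_with_options "$src_dir" || echo "no --with-* options listed"
fi
```

Each option printed this way is a candidate for the configure.args parameter of install.packages().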
So, after looking into the module file for the Intel tools, and the directories it points to, we can construct the correct installer command.
install.packages("Rmpi", configure.args="--with-Rmpi-include=/p/system/packages/intel/parallel_studio_xe_2018_update1/compilers_and_libraries/linux/mpi/include64/ --with-Rmpi-libpath=/p/system/packages/intel/parallel_studio_xe_2018_update1/compilers_and_libraries/linux/mpi/lib64/ --with-Rmpi-type=OPENMPI")
Note that we set --with-Rmpi-type to OPENMPI, not INTEL. The configuration options don't list Intel MPI as a supported type, but OPENMPI is a close match.
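Since the installer's complaint was that it could not find mpi.h, it is worth confirming that the include directory you pass really contains that header before running the installation. A small sketch; the function name is illustrative and the path is the one used in the install.packages() call above.

```shell
# Succeed if directory $1 contains the header file $2.
has_header() {
    [ -f "$1/$2" ]
}

# Include path taken from the install.packages() call in this guide.
mpi_include="/p/system/packages/intel/parallel_studio_xe_2018_update1/compilers_and_libraries/linux/mpi/include64"
if has_header "$mpi_include" mpi.h; then
    echo "mpi.h found in $mpi_include"
else
    echo "mpi.h not found in $mpi_include: check the path in the module file"
fi
```

If the header is not there, module show on the Intel module is the place to look for the correct directory.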
An aside: testing MPI code on a login node
To load/test this package on a login node, you'll need to set:
export I_MPI_FABRICS=shm:shm
before starting R. (Do not set this for jobs submitted via SLURM, though.)
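If you keep this setting in a shell startup file, a guard helps to avoid applying it inside batch jobs. The sketch below assumes that SLURM_JOB_ID is set inside jobs submitted via SLURM and unset on login nodes; the function name is illustrative.

```shell
# Set the shared-memory fabric only outside SLURM jobs.
# Assumption: SLURM_JOB_ID is set inside SLURM jobs and unset on login nodes.
set_login_node_fabric() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        export I_MPI_FABRICS=shm:shm
    fi
}

set_login_node_fabric
echo "I_MPI_FABRICS=${I_MPI_FABRICS:-<unset>}"
```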
Example 4: rgdal
The following modules need to be loaded (with "module load").
- an R version (e.g. R/3.4.4 or R/3.3.2)
- GDAL (e.g. gdal/2.2.4)
- PROJ4 (e.g. proj4/5.0.1)
Then, in R:
install.packages("rgdal", configure.args=c('--with-proj-include=/p/system/packages/proj4/5.0.1/include','--with-proj-lib=/p/system/packages/proj4/5.0.1/lib'))
Note that the path to the PROJ4 library must match the version loaded. Use "module show" to find the correct path if you're not using version 5.0.1.
The paths to the GDAL libraries are found automatically at install time via the "gdal-config" tool, once a gdal module is loaded.
Other combinations of versions of R, PROJ4, and GDAL may also be possible, but the above versions are known to install correctly.
Conclusions.
Installing R packages can sometimes be quite tricky, and require special knowledge about underlying tools and libraries. It's rarely the case that a package cannot be installed at all.
Here are some general guidelines, based on our experience with this problem:
- Read the error messages carefully. Sometimes the error messages tell us directly what is missing, expected or what to try next.
- Ask a colleague. At PIK, many R packages are in common use among scientists, so you may find that a colleague has experience with the problem you're having.
- Read the package documentation. Though not always useful, it may give you a hint about configuration options, without having to dig into the sources.
- Dig into the sources. Have a look at what options the configure script in the source code accepts. These will often give you a hint about what to try, especially if you have an explicit error.
- Ask IT services (cluster-support@pik-potsdam.de). We have some experience with a few problematic packages. However, given the huge range of packages available for R, and since we are not professional R programmers ourselves, it may take us a long time to investigate a particular error message. We can certainly advise on which modules are available, and can install third-party libraries on request.
Document converted to ReStructured Text (.rst) with pandoc -f markdown -t rst README.md > README.rst