Once I was familiar with R, I realized it is great for statistics and has a huge repository of packages covering all sorts of statistical methods. However, one thing it is not good at is raw computation, particularly nested ‘for’ loops. A very easy way around this problem is to use Rcpp and RcppArmadillo. These tools can speed up loop computations by 50 to 80 times. They also provide the flexibility to use familiar data structures like matrices, arrays or lists (called fields in Armadillo terminology). If you are new to Rcpp, check out the Advanced R tutorial by Hadley Wickham to learn more about the motivations behind using such tools.
I always prefer ‘RcppArmadillo’ over ‘Rcpp’ for its ease of use and good documentation. Normally Rcpp(Armadillo) code can easily be imported into R using ‘sourceCpp’. However, if you wish to parallelize your code using ‘foreach’ or ‘OpenMP’, it is necessary to load the Rcpp(Armadillo) code as an R package. I struggled to figure this out, and this post is primarily intended for those who want to venture down a similar path. Often the goal in modeling is to tune multiple parameters that capture the innovation of the model, i.e. to find the optimum simulation settings. With modern computer infrastructure, having 4 or more cores on a personal computer has become very common. Given these requirements and resources, it is natural to combine the two by parallelizing Rcpp(Armadillo) code over several cores. Here I have compiled easy-to-follow steps to create an R package that imports Rcpp(Armadillo) code on Windows 10:
- Update to the latest versions of R and RStudio.
- Install Rtools. This is the most crucial step, and it is easy to get wrong given the general tendency to press ‘next’ while installing anything.
- While installing Rtools, a dialog box appears prompting you to change the default folder. Delete the default path “C:\RBuildTools\3.4” and enter “C:\Rtools” as the new path.
- When you reach the dialog box about the path during the installation, tick the checkbox that offers to edit the path.
- Go to Control Panel > System and Security > System > Advanced system settings > Environment Variables > Path > Edit.
- Add the path to the Rcmd program. If you installed R on a 64-bit machine with the default settings, this path should look like “C:\Program Files\R\R-3.3.2\bin\x64” (the version folder depends on the R version you installed).
- The variable may already contain several paths. Append this one after the last entry, separated by a semicolon.
- Now go back to R and install the Rcpp and RcppArmadillo packages. Load these packages and use RcppArmadillo.package.skeleton("XYZ") to create a skeleton package in a directory named “XYZ”. If you are not using Armadillo, you may just use Rcpp.package.skeleton("XYZ").
- Open the XYZ directory, go to src and modify the cpp file. You may add several functions in the cpp file or even add multiple cpp files.
- Go back to R and compile the attributes of your cpp code using compileAttributes("XYZ").
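- If you do not prefix your containers with arma:: or std::, you must add “using namespace arma;” and “using namespace std;” to all your cpp files and to the attribute file “RcppExports.cpp”. Note that rerunning compileAttributes regenerates “RcppExports.cpp”, so the edit to that file has to be repeated afterwards.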
- You may now build the package from the command line. Go to the directory containing “XYZ” and run R CMD build XYZ. This should create a tar.gz file. Install it using R CMD INSTALL XYZ_1.0.tar.gz.
- That’s it, you are done! You may now load the “XYZ” package inside some wrapper function and call that wrapper function inside your foreach loop.
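
To make the step about editing the cpp file in src a little more concrete, here is a minimal sketch of the kind of nested-loop function that could go there. The function name pairwise_distance_sum and the flat row-major layout are my own assumptions for illustration. It is written against std::vector only so that it compiles without any extra headers; in the package itself you would put #include <RcppArmadillo.h> at the top and would probably take an arma::mat instead.

```cpp
// Hypothetical contents of a file in XYZ/src/ (the function name is made up).
// In the real package you would also #include <RcppArmadillo.h> and could
// accept an arma::mat instead of a flat std::vector.
#include <cmath>
#include <vector>

// Sum of Euclidean distances between all pairs of rows of an n x d matrix,
// assumed to be stored row by row in a flat vector. The nested loops over
// (i, j, k) are the pattern that tends to be slow when written in plain R.
// [[Rcpp::export]]
double pairwise_distance_sum(const std::vector<double>& x, int n, int d) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            double sq = 0.0;
            for (int k = 0; k < d; ++k) {
                const double diff = x[i * d + k] - x[j * d + k];
                sq += diff * diff;
            }
            total += std::sqrt(sq);
        }
    }
    return total;
}
```

The “// [[Rcpp::export]]” line directly above the function is the attribute that compileAttributes looks for. Outside of Rcpp it is just an ordinary comment, which is why the sketch also compiles as plain C++. Once the package is built and installed, the function should be callable from R inside the wrapper that your foreach loop uses.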