Extending R with Intel MKL

ID 标签 688500
已更新 11/1/2017
版本 Latest
公共

author-image

作者

Introduction

The Using Intel MKL with R article discusses building R with the Intel® Math Kernel Library (Intel® MKL) BLAS and LAPACK to improve the performance of those parts of R that rely on matrix computations.  Since the BLAS and LAPACK routines are a long-standing set of functions with a standard API, the R build scripts have a built-in facility to use these. Setting R to use Intel MKL can provide significant performance improvement on parts of R, but there are other functions in Intel MKL that can provide great performance as well. This article will give you a quick tutorial on how to extend R by writing a wrapper for a function in Intel MKL and call it from your R script. 

The wrapper

Let's begin by selecting a simple function in Intel MKL that is not a part of the BLAS or LAPACK. For this exercise we'll choose the vdPow function which takes two arrays, x and y, and creates a new array z whose nth element is x[n] to the y[n] power.

 #include "R.h" #include "Rdefines.h" #include "Rinternals.h" #include "mkl.h" SEXP mkl_vdpow(SEXP N, SEXP X, SEXP Y) {   SEXP Z;   int *n;   int nn;   double *x, *y, *z;     /* Convert input arguments to C types */   PROTECT(N = AS_INTEGER(N));   PROTECT(X = AS_NUMERIC(X));   PROTECT(Y = AS_NUMERIC(Y));     n = INTEGER_POINTER(N);   x = NUMERIC_POINTER(X);   y = NUMERIC_POINTER(Y);     /* Allocate memory to store results */   nn = *n;   PROTECT(Z = NEW_NUMERIC(nn));   z = NUMERIC_POINTER(Z);     vdPow(nn, x, y, z);     UNPROTECT(4);     return Z; }

Building the wrapper

On Windows this is easily done by obtaining the Windows package and Rtools from CRAN. I downloaded R 2.15.2 and Rtools version 3.0.0.1927 and then created a file to set my environment as follows:

SET R_TOOLS_ROOT="C:\Program Files\R\Rtools"

	SET MINGW64_ROOT=%R_TOOLS_ROOT%\gcc-4.6.3

	SET MINGW64_CMPLR_ROOT=%MINGW64_ROOT%\i686-w64-mingw32

	SET PATH=%MINGW64_ROOT%\bin;%MINGW64_CMPLR_ROOT%\bin;%R_TOOLS_ROOT%\bin;%MINGW64_ROOT%\libexec\gcc\i686-w64-mingw32\4.6.3;%PATH%

	SET LIB=%MINGW64_CMPLR_ROOT%\lib64;%MINGW64_CMPLR_ROOT%\lib;%LIB%

 

I then build using the following commands:

SET R_HOME="C:\Program Files\R\R-2.15.2"

	SET MKL_ROOT="C:\Program Files (x86)\Intel\Composer XE 2013"

	c:\"program files"\r\rtools\gcc-4.6.3\bin\gcc -c -O2 -Wall  -std=gnu99 -I%R_HOME%\include -I%MKL_ROOT%\mkl\include pow_wrp.c -o pow_wrp.o

	c:\"program files"\r\rtools\gcc-4.6.3\bin\gcc -shared -Wl,-soname,pow_wrp.so -o pow_wrp.so pow_wrp.o %R_HOME%\bin\i386R.dll %MKL_ROOT%\mkl\lib\ia32\mkl_rt.lib %MKL_ROOT%\compiler\lib\intel64\libiomp5md.lib

 

Calling the function from R

Here then is a simple script that calls the function we've just built

 

dyn.load("c:/full_path_or_directory_on_path/pow_wrp.so")

mkl_pow <- function(n, x, y) .External("mkl_vdpow", n, x, y)

n <- 1000

	x <- runif(n, min=2, max=10)

	y <- runif(n, min=-2, max=-1)

z <- mkl_pow(n, x, y)

 

Performance

Let's see what sort of performance difference the use of Intel MKL can make by comparing this new power function with the simple means available within R. To do the similar operation in R we could use a loop to perform each scalar operation. The following is the result of running the interactively in the 32-bit RGui that comes with the R-2.15.2 pre-built Windows package on my laptop (Intel MKL 11.0 update 2, R-2.15.2, 2-core Intel® Core™ i5-540M processor (2.53GHz) 4GB RAM, Microsoft Windows 7):

 

> dyn.load("c:/a/collateral/kb/r/files/pow_wrp.so")

	> mkl_pow <- function(n, x, y) .Call("mkl_vdpow", n, x, y)

	> n <- 1000000

	> x <- runif(n, min=2, max=10)

	> y <- runif(n, min=-2, max=-1)

	> start <- proc.time()

	> z <- mkl_pow(n, x, y)

	> end1 <- proc.time() - start

	> end1

	user system elapsed

	0.03 0.02 0.05

	> i <- n

	> start <- proc.time()

	> repeat{ z[i] <- x[i]^y[i] i <- i - 1 if (i==0) break() }

	> end2 <- proc.time() - start

	> end2

	user system elapsed

	5.49 0.03 5.17 

 

For this very large, size the Intel MKL function makes a large difference. The loop approach takes 5.49 seconds while the Intel MKL function takes 0.03 seconds. For smaller problems the performance will be less, but for systems with higher core-counts, compute intensive functions can show even greater speed-up. 

Given the potential for large performance improvements, it may be worthwhile finding out the functions available in Intel MKL

 

"