Andrew Shin, Market Risk Manager of Koch Supply & Trading, achieves significant performance increases on option pricing algorithms using Jacket to accelerate his MATLAB® code with GPUs.

Andrew says, “My buddy and I are, at best, novice programmers and we couldn’t imagine having to figure out how to code all this in CUDA.” But he found Jacket to be straight-forward. With these results, he says he can see Jacket and GPUs populating Koch’s mark-to-futures cube, which contains its assets, simulations, and simulated asset prices.

Modern option pricing techniques are often considered among the most mathematically complex of all applied areas of finance. Andrew shared some exemplary code to demonstrate how much leverage you can get out of Jacket and GPUs for financial computing in MATLAB® and C/C++.

All option pricing algorithms in this post use a Monte Carlo simulation. The `stockpath B`

version is the option pricing algorithm with a barrier option, and the `stockpath A`

version is without the barrier.

The execution times and speedups for these algorithms are shown in the charts below:

For `stockpath A`

, Jacket and ArrayFire reach** 40.5x** and **51.8x** speedups respectively over the baseline CPU MATLAB® code. For `stockpath B`

, Jacket and ArrayFire achieve **38.4x** and **47.5x** speedups respectively over the baseline CPU MATLAB® code.

System specs:

A single workstation with a 2.66 GHz Intel Core 2 Quad CPU and an 6GB NVIDIA Quadro 6000 GPU, with MATLAB® R2010b.

Below is example Jacket code of option pricers:

```
function P=stockpath_A(N, K, steps, t, r, vol, cls)
payoff = zeros(N, 1, cls);
dt = t / (steps - 1);
s = K * ones(N, 1, cls);
randmat = randn(N, steps - 1, cls);
d = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
S = [s, d];
S = cumprod(S, 2);
payoff = max(S(:, steps) - K, 0.0);
P = double(mean(payoff) * exp(-r * t));
end
function P=stockpath_B(N, K, steps, t, r, vol, cls)
payoff = zeros(N, 1, cls);
B = 1.15 * K;
dt = t / (steps - 1);
s = K * ones(N, 1, cls);
randmat = randn(N, steps - 1, cls);
d = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
S = [s, d];
S = cumprod(S, 2);
S(:, steps) = S(:, steps) .* all(S < B, 2);
payoff = max(S(:, steps) - K, 0.0);
P = double(mean(payoff) * exp(-r * t));
end
```

Where `K`

is strike price, `t`

is maturity, `r`

is risk free rate, `B`

is barrier and `vol`

is volatility. The `cls`

, which is either `'double'`

or `gdouble`

, enables us to run same pieces of code `stockpath_A`

and `stockpath_B`

both on CPU and GPU without actually changing the code. The `cls`

serves as a flag that lets the code know whether it is run on a CPU or a GPU. When `cls`

is set as `gdouble`

, matrix initialization, assignments, calculations including `cumprod`

, `mean`

and `all`

, are all executed in parallel on GPU and therefore you will get a good speedup.

Below is example ArrayFire code of option pricers:

```
static double stockpath_A(int N)
{
dtype pres = f64;
int steps = 180;
array payoff = zeros(N, 1, pres);
double K = 100.0;
double t = 0.5;
double dt = t / (double)(steps - 1);
double vol = .30;
double r = .01;
array s = 100 * ones(N, 1, pres);
array randmat = randn(N, steps - 1, pres);
randmat = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
array S = join(1, s, randmat);
S = accum(S, 1, PROD_T);
payoff = max(0.0, S.col(steps - 1) - K);
double P = mean<double>(payoff) * exp(-r * t);
return P;
}
static double stockpath_B(int N)
{
dtype pres = f64;
int steps = 180;
array payoff = zeros(N, 1, pres);
double K = 100.0;
double B = 115.0;
double t = 0.5;
double dt = t / (double)(steps - 1);
double vol = .30;
double r = .01;
array s = 100 * ones(N, 1, pres);
array randmat = randn(N, steps - 1, pres);
randmat = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
array S = join(1, s, randmat);
S = accum(S, 1, PROD_T);
S(span, end) *= all(S < B, 1);
payoff = max(0.0, S.col(steps - 1) - K);
double P = mean<double>(payoff) * exp(-r * t);
return P;
}
```

Where `array`

is matrix, `f64`

denotes all elements of arrays should be doubles, `join(1, s, randmat)`

joins matrix `s`

and `randmat`

as `[s, d]`

does, and `accum(S, 1, PROD_T)`

calculates cumulative product as `cumprod(S, 2)`

does. ArrayFire functions are straightforward to understand as their names indicate and of course they will be run on GPUs. The ArrayFire code is faster than Jacket because it is C/C++ code and it does not suffer from overheads introduced by MATLAB®.

From the above code and charts, you can easily see the simplicity of achieving great speedups. Grab a free Jacket or ArrayFire Pro 15-day trial today!

## Comments 2

Pingback: Walking Randomly » A Month of Math Software – August 2012

regarding cumulative product, S = accum(S, 1, PROD_T); where is the definition for PROD_T?