Andrew Shin, Market Risk Manager at Koch Supply & Trading, achieves significant performance increases on option pricing algorithms using Jacket to accelerate his MATLAB® code with GPUs.
Andrew says, “My buddy and I are, at best, novice programmers and we couldn’t imagine having to figure out how to code all this in CUDA.” But he found Jacket to be straightforward. With these results, he says he can see Jacket and GPUs populating Koch’s mark-to-futures cube, which contains its assets, simulations, and simulated asset prices.
Modern option pricing techniques are often considered among the most mathematically complex of all applied areas of finance. Andrew shared some exemplary code to demonstrate how much leverage you can get out of Jacket and GPUs for financial computing in MATLAB® and C/C++.
All option pricing algorithms in this post use Monte Carlo simulation. The stockpath_B version prices a barrier option; the stockpath_A version prices the same option without the barrier.
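For background, both pricers simulate risk-neutral geometric Brownian motion: each path advances per time step by

S(t + dt) = S(t) · exp((r − vol²/2)·dt + vol·√dt·Z), with Z ~ N(0, 1),

and the option price is then estimated as the discounted average payoff, P ≈ exp(−r·t) · mean(max(S(T) − K, 0)). These two formulas correspond directly to the d = exp(...) and P = ... lines in the code below.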
The execution times and speedups for these algorithms are summarized below. For stockpath_A, Jacket and ArrayFire reach 40.5x and 51.8x speedups respectively over the baseline CPU MATLAB® code; for stockpath_B, they achieve 38.4x and 47.5x speedups respectively.
System specs:
A single workstation with a 2.66 GHz Intel Core 2 Quad CPU and a 6 GB NVIDIA Quadro 6000 GPU, running MATLAB® R2010b.
Below is example Jacket code for the two option pricers:
function P = stockpath_A(N, K, steps, t, r, vol, cls)
    % Price a vanilla European call by Monte Carlo over N simulated paths.
    % cls is 'double' (CPU) or 'gdouble' (GPU via Jacket).
    payoff = zeros(N, 1, cls);
    dt = t / (steps - 1);                 % time increment per step
    s = K * ones(N, 1, cls);              % every path starts at the strike price
    randmat = randn(N, steps - 1, cls);   % standard normal draws for every step
    % per-step growth factors under risk-neutral geometric Brownian motion
    d = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
    S = [s, d];
    S = cumprod(S, 2);                    % running product along each path
    payoff = max(S(:, steps) - K, 0.0);   % call payoff at maturity
    P = double(mean(payoff) * exp(-r * t));   % discounted expected payoff
end
function P = stockpath_B(N, K, steps, t, r, vol, cls)
    % Price an up-and-out barrier European call by Monte Carlo.
    payoff = zeros(N, 1, cls);
    B = 1.15 * K;                         % knock-out barrier at 115% of the strike
    dt = t / (steps - 1);
    s = K * ones(N, 1, cls);
    randmat = randn(N, steps - 1, cls);
    d = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
    S = [s, d];
    S = cumprod(S, 2);
    % zero out the terminal price of any path that ever touched the barrier
    S(:, steps) = S(:, steps) .* all(S < B, 2);
    payoff = max(S(:, steps) - K, 0.0);
    P = double(mean(payoff) * exp(-r * t));
end
Here K is the strike price, t is the maturity, r is the risk-free rate, B is the barrier, and vol is the volatility. The cls argument, which is either 'double' or 'gdouble', lets us run the same stockpath_A and stockpath_B code on both the CPU and the GPU without changing a line: cls serves as a flag that tells the code where to run. When cls is set to 'gdouble', the matrix initializations, assignments, and calculations, including cumprod, mean, and all, are all executed in parallel on the GPU, which is where the speedup comes from.
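As a quick sketch of how the flag works in practice (the parameter values below are illustrative, mirroring the constants in the ArrayFire version; the post does not state the N used for the benchmarks):

% Illustrative usage; N is hypothetical, other values mirror the C/C++ constants
N = 1e6; K = 100; steps = 180; t = 0.5; r = 0.01; vol = 0.30;
P_cpu = stockpath_A(N, K, steps, t, r, vol, 'double');  % plain MATLAB on the CPU
P_gpu = stockpath_A(N, K, steps, t, r, vol, 'gdouble'); % same code on the GPU via Jacket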
Below is the equivalent ArrayFire code for the option pricers:
static double stockpath_A(int N)
{
    // Price a vanilla European call on N Monte Carlo paths (GPU via ArrayFire).
    dtype pres = f64;                     // double precision
    int steps = 180;
    array payoff = zeros(N, 1, pres);
    double K = 100.0;                     // strike price
    double t = 0.5;                       // maturity in years
    double dt = t / (double)(steps - 1);  // time increment per step
    double vol = .30;                     // volatility
    double r = .01;                       // risk-free rate
    array s = 100 * ones(N, 1, pres);     // every path starts at the strike
    array randmat = randn(N, steps - 1, pres);
    // per-step growth factors under risk-neutral geometric Brownian motion
    randmat = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
    array S = join(1, s, randmat);
    S = accum(S, 1, PROD_T);              // cumulative product along each path
    payoff = max(0.0, S.col(steps - 1) - K);        // call payoff at maturity
    double P = mean<double>(payoff) * exp(-r * t);  // discounted expectation
    return P;
}
static double stockpath_B(int N)
{
    // Price an up-and-out barrier call on N Monte Carlo paths.
    dtype pres = f64;
    int steps = 180;
    array payoff = zeros(N, 1, pres);
    double K = 100.0;                     // strike price
    double B = 115.0;                     // knock-out barrier (115% of the strike)
    double t = 0.5;
    double dt = t / (double)(steps - 1);
    double vol = .30;
    double r = .01;
    array s = 100 * ones(N, 1, pres);
    array randmat = randn(N, steps - 1, pres);
    randmat = exp((r - (vol * vol * 0.5)) * dt + vol * sqrt(dt) * randmat);
    array S = join(1, s, randmat);
    S = accum(S, 1, PROD_T);              // cumulative product along each path
    // zero out the terminal price of any path that ever touched the barrier
    S(span, end) *= all(S < B, 1);
    payoff = max(0.0, S.col(steps - 1) - K);
    double P = mean<double>(payoff) * exp(-r * t);
    return P;
}
Here array is ArrayFire's matrix type, f64 specifies that all elements of an array are doubles, join(1, s, randmat) concatenates the matrices s and randmat just as [s, d] does in MATLAB®, and accum(S, 1, PROD_T) computes the cumulative product along each row just as cumprod(S, 2) does (the PROD_T argument selects the product operator for the scan). ArrayFire functions are straightforward to read because their names mirror their MATLAB® counterparts, and they all run on the GPU. The ArrayFire code is faster than the Jacket code because it is native C/C++ and avoids the overhead that MATLAB® introduces.
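A minimal driver for the ArrayFire pricers might look like the sketch below; it assumes stockpath_A and stockpath_B are defined in the same source file as above, and the path count N is illustrative:

#include <stdio.h>
#include <arrayfire.h>
using namespace af;

// stockpath_A and stockpath_B as defined above

int main()
{
    int N = 1000000;  // illustrative number of simulated paths
    try {
        printf("vanilla option price: %f\n", stockpath_A(N));
        printf("barrier option price: %f\n", stockpath_B(N));
    } catch (af::exception &e) {
        fprintf(stderr, "%s\n", e.what());
    }
    return 0;
}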
From the code and results above, you can see how simple it is to achieve these speedups. Grab a free Jacket or ArrayFire Pro 15-day trial today!