Antenna array design involves repeated simulation to tune the many parameters involved, and waiting around for simulations to finish is no fun. Offloading the optimization problem onto the GPU cuts that time down significantly. In their recent paper, Capozzoli, Curcio, and Liseno (pdf, citation) of the University of Naples Federico II demonstrated how a simple modification to their echo generator array simulation took advantage of the GPU to bring immediate speedups. Check out this figure from their paper, which shows CPU simulation time growing prohibitively while GPU time grows only slightly as more data is fed in. Their simulation is designed around optimizing an energy functional. Using fminunc to drive the optimization problem on the CPU, they simply modified their functional evaluation to take …
Jacket Lectures – Learn and Teach GPU computing
We are pleased to share six in-depth Jacket lectures, helpful both in learning and teaching Jacket. Download the lectures (PDF format) here: http://www.accelereyes.com/support/lectures Jacket is used in course instruction at many universities around the world. Professors and course instructors use Jacket to provide engineering students with GPU acceleration of MATLAB® algorithms and to bring HPC to MATLAB courses. The six lectures are entitled “Parallel High Performance Computing with Emphasis on Jacket Based GPU Computing” and cover the following topics: parallel computing introduction, Jacket introduction, basic programming with Jacket, advanced programming with Jacket, multiple GPU programming, and benchmarking. If you are looking at accelerating MATLAB code or parallel computing with MATLAB, you definitely will want to add these lectures to your arsenal of …
Laplace Transform Inversion on the GPU
The numerical inversion of the Laplace transform is a long-standing problem due to its implicit ill-posedness. Patrick Kano and Moysey Brio of Acunum Algorithms and Simulations, with their experience in computational methods and algorithm development, found a solution that not only works, but is very fast. Their code implements Weeks’ method for Numerical Laplace Inversion. Apart from casting CPU variables to GPU, etc., the major step involved in Jacketizing the code was as simple as converting a for loop to GFOR! Something like what’s given below:
    for nidx=1:Nprod
        Errorvec(nidx) = wfncpuErrorEst( … );
    end
    gfor nidx=1:Nprod
        Errorvec(nidx) = wfnjacErrorEst( … );
    gend
The loop in question calls a global minimization function that computes an absolute error estimate for each …
Speeding Up Compressed Sensing Algorithms
Are you looking for ways to speed up compressed sensing? If you work in the areas of medical image reconstruction, image acquisition or sensor networks, you probably are. This paper, Parallel Implementation of Compressed Sensing Algorithm on CUDA-GPU, compares CPUs running MATLAB and GPUs running Jacket using a Basis Pursuit Algorithm for compressed sensing. The authors compared an Intel Core 2 Duo T8100 (2.1 GHz, 3.0 GB memory) running MATLAB with an NVIDIA GeForce 8400M GS (256 MB DDR2 video memory, 64-bit bus width) using an older version of Jacket, Version 1.3. The CPU and GPU setups were used to run their Basis Pursuit Algorithm on six MRI images. These are some samples: The implementation using Jacket …
Our Point of View & Twitter Comedy
“Great businesses have a point of view, not just a product or service.” ~37 Signals At AccelerEyes, our point of view is that GPU software can and should deliver great results on real applications. With this point of view, we’ve kept our heads down, solely focused on delivering a great runtime system for GPUs. All our energy has been devoted to the task of emitting optimized low-level code from high-level matrix notation. These efforts are now paying off in a big way! Jacket is consistently delivering awesome results in real applications; read examples here and here. Alternative choices apparently have a different point of view. Yesterday’s Twitter stream contained a comical, but all-too-common indication of frustration with the recent GPU …
Digital Holograms Faster than Ever
REAL3D is a digital holography project funded by the EU that brings together nine participants from academia and industry under the FP7. As part of the project, Nitesh Pandey, Damien Kelly, Bryan Hennelly and Thomas Naughton from the National University of Ireland, Maynooth demonstrate that pre-computation and quantization of chirp matrices, combined with GPUs running Jacket from AccelerEyes, speeds up the reconstruction of digital holograms. Digital holography is a powerful imaging technique with many new applications, such as true 3D display. It allows the capture of both amplitude and phase information of the light reflected off the surface of 3D objects. Researchers at the National University of Ireland, Maynooth are developing techniques based on digital holography for 3D display applications. Reconstruction of …
Feature Learning Architectures with GPU-acceleration
Stanford researchers in Andrew Ng’s group used GPUs and Jacket to speed up their work on Feature Learning Architectures. They wanted to know why certain feature learning architectures with random, untrained weights perform so well on object recognition tasks. The complete write-up can be found in “On random weights and unsupervised feature learning” in ICML 2011. They decided to use GPUs and Jacket for this study because of “the need to quickly evaluate many architectures on thousands of images.” Jacket taps into the immense computing power of GPUs and speeds up research that involves many images. This is the architecture used in the study: They started by studying the basis of good performance for systems and found convolutional pooling …
A better way to time Jacket code
Whether you are a new Jacket programmer or a GPU maestro, you are bound to speed-test Jacket at some point. There are many factors to keep in mind while benchmarking Jacket code, and a simple tic-func()-toc won’t do. For example, this is some typical benchmarking code:
    % warm up
    x = rand(n,'single');
    x = grand(n,'single');
    geval(x);
    % CPU timing
    tic
    for r = 1:reps
        x = rand(n,'single');
    end
    cpu_time = toc;
    % GPU timing
    gsync, tic
    for r = 1:reps
        x = grand(n,'single');
        geval(x);
    end
    gsync, gpu_time = toc
With Jacket 1.7, this entire code chunk is now replaced by two lines:
    cpu_time = timeit(@() rand(n,'single'));
    gpu_time = timeit(@() grand(n,'single'));
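To illustrate what a timing helper of this kind has to take care of, here is a minimal sketch in plain MATLAB (it should also run in GNU Octave). It makes one warm-up call, times several repetitions individually, and reports the median. This is not how Jacket's timeit is implemented; the names f, reps, and t and the chosen sizes are assumptions made for illustration, and the sketch contains no GPU calls. For GPU code, the gsync calls from the benchmarking code above would still be needed around the timed region.

```matlab
% Sketch only: CPU timing in plain MATLAB/Octave, no Jacket functions used.
n    = 512;                   % matrix size (arbitrary choice)
reps = 20;                    % number of timed repetitions (arbitrary choice)
f    = @() rand(n, 'single'); % workload under test

f();                          % warm-up run, excluded from the measurements

t = zeros(reps, 1);           % one elapsed time per repetition
for r = 1:reps
    tic;
    y = f();                  % keep the result so the call does real work
    t(r) = toc;
end

cpu_time = median(t);         % the median is less sensitive to slow outliers
fprintf('median time: %.6f s over %d runs\n', cpu_time, reps);
```

Timing each repetition separately and taking the median, rather than dividing one total by reps, keeps a single slow run (for example, one interrupted by the operating system) from skewing the reported figure.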
Improved Fat/Water Reconstruction Algorithm with Jacket
Case Western Reserve University researchers turned to GPUs running Jacket to develop a fast and robust Iterative Decomposition of water and fat with an Echo Asymmetry and Least-squares (IDEAL) reconstruction algorithm. The complete article can be found here. The authors report that “GPU usage is critical for the future of high resolution, small animal and human imaging” and Jacket “enables GPU computations in MATLAB.” Their research was performed on a desktop system with 32 GB RAM, dual Intel Xeon X5450 3.0 GHz processors, an NVIDIA Quadro FX5800 (4 GB RAM, 240 cores, 400 MHz clock), and MATLAB R2009a 64-bit. Jacket v1.1, an older version, was used to produce these results. Reconstruction tests with images of different sizes were performed to evaluate computation times …
CUDA over Remote Desktop now available for Tesla GPUs
Update: Jacket over Remote Desktop is now available for Quadro devices too! Read this post. Jacket over Remote Connections is also documented extensively on the AccelerEyes Wiki. Over the past several years, many Jacket programmers have requested support for Remote Desktop in Windows. We are pleased to report that recent NVIDIA drivers now enable Jacket to run over Remote Desktop, for some system configurations. Specifically, the requirements to make this work include: Windows Vista, Windows 7, Windows HPC Server 2008, or Windows HPC Server 2008 R2; the latest NVIDIA driver (as required by Jacket); a Tesla GPU; and TCC mode enabled on at least one (Tesla) GPU. To enable TCC, the Tesla cannot be connected to a display. This means you need to …