Are you looking for ways to speed up compressed sensing? If you work in medical image reconstruction, image acquisition or sensor networks, you probably are. This paper, Parallel Implementation of Compressed Sensing Algorithm on CUDA-GPU, compares CPUs running MATLAB and GPUs running Jacket using a Basis Pursuit algorithm for compressed sensing. The authors compared an Intel Core 2 Duo T8100 (2.1 GHz and 3.0 GB memory) running MATLAB with an NVIDIA GeForce 8400M GS (256 MB DDR2 video memory and a 64-bit bus width) using an older version of Jacket, Version 1.3. The CPU and GPU setups were used to run their Basis Pursuit algorithm on six MRI images. These are some samples: The implementation using Jacket …
Digital Holograms Faster than Ever
REAL3D is a digital holography project funded by the EU that brings together nine participants from academia and industry under FP7. As part of the project, Nitesh Pandey, Damien Kelly, Bryan Hennelly and Thomas Naughton from the National University of Ireland, Maynooth demonstrate that pre-computation and quantization of chirp matrices, combined with GPUs running Jacket from AccelerEyes, speed up the reconstruction of digital holograms. Digital holography is a powerful imaging technique with many new applications, such as true 3D display. It allows the capture of both amplitude and phase information of the light reflected off the surface of 3D objects. Researchers at the National University of Ireland, Maynooth are developing techniques based on digital holography for 3D display applications. Reconstruction of …
Feature Learning Architectures with GPU-acceleration
Stanford researchers in Andrew Ng’s group used GPUs and Jacket to speed up their work on Feature Learning Architectures. They wanted to know why certain feature learning architectures with random, untrained weights perform so well on object recognition tasks. The complete write-up can be found in On random weights and unsupervised feature learning in ICML 2011. They decided to use GPUs and Jacket for this study because of “the need to quickly evaluate many architectures on thousands of images.” Jacket taps into the immense computing power of GPUs and speeds up research that involves many images. This is the architecture used in the study: They started by studying the basis of good performance for systems and found convolutional pooling …
Improved Fat/Water Reconstruction Algorithm with Jacket
Case Western Reserve University researchers turned to GPUs running Jacket to develop a fast and robust Iterative Decomposition of water and fat with an Echo Asymmetry and Least-squares (IDEAL) reconstruction algorithm. The complete article can be found here. The authors report that “GPU usage is critical for the future of high resolution, small animal and human imaging” and that Jacket “enables GPU computations in MATLAB.” Their research was performed on a desktop system with 32 GB RAM, dual Intel Xeon X5450 3.0 GHz processors, an NVIDIA Quadro FX5800 (4 GB RAM, 240 cores, 400 MHz clock), and 64-bit MATLAB R2009a. Jacket v1.1, an older version, was used to produce these results. Reconstruction tests with different sized images were performed to evaluate computation times …
Hybrid GPU & Multicore Processing for LU Decomposition
One of the hot areas in supercomputing is hybrid compute: balancing the computational load between one or more CPUs and GPUs. Along these lines, Nolan Davis and Daniel Redig at SAIC recently presented work on Hybrid GPU/Multicore Solutions for Large Linear Algebra Problems, where they developed a novel algorithm for LU decomposition, one of the most important routines in linear algebra. Here’s a snapshot view of their setup:
System Specs:
GPU: Nvidia® Tesla™ 2050; 448 processing cores; 3 GB dedicated memory
Multicore Host: 24 cores; 64 GB system memory; Red Hat® Enterprise Linux 5; two AMD Opteron™ 6172 12-core processors
Host-to-GPU Communications: PCIE 2.0; 16 channels at 500 MB/sec/lane; theoretical peak bandwidth of 8 GB/sec
Their initial results are very promising. For …
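For readers who have not met LU decomposition before, the following is a minimal pure-Python sketch of the textbook method (Gaussian elimination with partial pivoting). It is not the hybrid GPU/multicore algorithm presented by Davis and Redig, which is not described in this excerpt; the function name `lu_decompose` and the list-of-lists matrix layout are assumptions made only for illustration.

```python
def lu_decompose(a):
    """Textbook LU decomposition with partial pivoting (illustrative only).

    Returns (perm, lower, upper) such that row i of L*U equals row perm[i]
    of the input matrix `a`, given as a list of equal-length rows.
    """
    n = len(a)
    upper = [[float(v) for v in row] for row in a]  # working copy, becomes U
    lower = [[0.0] * n for _ in range(n)]           # becomes unit-lower L
    perm = list(range(n))                           # records row swaps

    for k in range(n):
        # Choose the row with the largest entry in column k as the pivot.
        pivot = max(range(k, n), key=lambda r: abs(upper[r][k]))
        if upper[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            upper[k], upper[pivot] = upper[pivot], upper[k]
            lower[k], lower[pivot] = lower[pivot], lower[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]

        lower[k][k] = 1.0
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]

    return perm, lower, upper
```

The elimination loop over the rows below the pivot is the part that does most of the arithmetic, which is why LU decomposition is a natural candidate for splitting across CPU cores and a GPU.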
Stanford GPU Benchmarks: Jacket vs PCT/GPU
Researchers in the Pervasive Parallelism Laboratory at Stanford University recently published work describing a novel framework for parallel computing in a paper entitled “A Domain-Specific Approach to Heterogeneous Parallelism.” As part of their research, they compared Jacket to the GPU support in the Parallel Computing Toolbox™. The results clearly show that Jacket’s optimizations make a big difference in performance. In this blog post, we highlight 4 algorithms included in their research:
NAME | DESCRIPTION | INPUT
Gaussian Discriminant Analysis (GDA) | Generative learning algorithm for modeling the probability distribution of a set of data as a multivariate Gaussian | 1,200×1,024 Matrix
Restricted Boltzmann Machine (RBM) | Stochastic recurrent neural network, without connections between hidden units | 2,000 Hidden Units, 2,000 Dimensions
Support Vector Machine (SVM) | Optimal …
GPU accelerated lattice Boltzmann model for shallow water flow and mass transport
Dr. Kevin Tubbs and Professor Tsai at Louisiana State University recently published an interesting paper using GPUs and Jacket to accelerate lattice Boltzmann models for shallow water flow and mass transport. More details about this work are provided in the full success story page on the website. Jacket makes GPU programming easy. “Very little recoding was needed to promote the LBM code to run on the GPU,” say the authors at one point in their paper. In this blog post, we share the highlights of this work. Using these methods, the authors are able to simulate shallow water flow and mass transport. For instance, check out these videos of a dam break: The authors completed this work with a relatively older …
Computer Vision Demos at SC’10 with 8-GPU Colfax CXT8000
We just returned from SC’10, the biggest supercomputing show of the year. At the show, we demoed Jacket driving computer vision demos on an 8-GPU Colfax CXT8000 system… pure eye candy! We had CPU and GPU versions of the demos running on 8 different monitors, one attached to each of the 8 Tesla C2050 GPUs in the system. Input data for the various demos was sourced from 3 webcams and 2 Blu-ray video inputs. Check out the demo details below:
Demo 1: Sobel edge detection with image dilation and interpolation overlaid on Blu-ray video in realtime.
Demo 2: Feature detection on a 4-level pyramid of 640×480 realtime webcam video.
Demo 3: Gradient descent feature tracking, a stripped down version of KLT, tracking …
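As a rough illustration of the Sobel edge detection step named in Demo 1, here is a minimal pure-Python sketch. It is not the demo code shown at SC’10 (which ran in Jacket on GPUs and also applied dilation and interpolation); the function name `sobel_magnitude` and the list-of-rows grayscale image format are assumptions made for this example.

```python
import math


def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a grayscale image.

    `image` is a list of equal-length rows of numbers. Border pixels,
    where the 3x3 window does not fit, are left at 0.0.
    """
    kernel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient
    kernel_y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]

    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            # Accumulate both kernels over the 3x3 neighbourhood.
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += kernel_x[dy][dx] * pixel
                    gy += kernel_y[dy][dx] * pixel
            out[y][x] = math.hypot(gx, gy)
    return out
```

Every output pixel depends only on its own 3x3 neighbourhood, so the two outer loops can run independently for each pixel, which is the kind of data parallelism a GPU exploits.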
Beam Propagation Methods – Jacket is 3.5X faster than the CPU and 2X faster than PCT
A couple of weeks ago, a GPU-enabled code appeared on MATLAB Central entitled “A CUDA accelerated Beam Propagation Method [BPM] Solver using the Parallel Computing Toolbox.” In this post, we share a video that compares Jacket with PCT at GPU computing by analyzing performance on this Beam Propagation Method code. To reproduce these results, download the source code here: CUDA_BPM_NOV_04_2010_AccelerEyes These benchmarks were run on an NVIDIA Tesla C2070 GPU versus a quad-core Intel CPU. MATLAB + PCT R2010b were used for the PCT-GPU experiments. MATLAB + Jacket 1.6 (prerelease) were used for the Jacket-GPU experiments. Take-Home Message: Due to Jacket’s extensive library of GPU functions and its optimized GPU runtime, it performs 3.5X faster than …
Jacket accelerating life science and defense applications
With IBM’s decision this week to integrate Tesla technology into its high performance computing line, there should be no doubt that GP-GPU computing is more than a fad. Organizations solving technical problems can now solve them more productively and efficiently than ever before with GPUs. AccelerEyes’ customers are experiencing this firsthand with the Jacket product family: they can quickly and easily implement new or existing algorithms for GPUs and meet their technical needs with substantial speed improvements. Case in point: this week, AccelerEyes released two case studies from customers that have used Jacket to move their applications to GPU computing with compelling results. System Planning Corporation has implemented two different radar processing …