Frameworks | Page 12

Jacket and GPUs show promise in Neuroscience with fMRI and SPM

John Melonakos | March 8, 2010 | Case Studies, CUDA

For those of you interested in neuroscience and neuroimaging, you have probably heard of SPM (Statistical Parametric Mapping), a software package developed by a group at University College London. A group at Georgia Tech has been working with Jacket and CUDA on SPM and has produced initial results that show promise. Speeding up the image analysis of functional MRI (fMRI) could benefit the medical community in a big way. AccelerEyes has been supporting these efforts at Georgia Tech, and with the permission of the authors we have produced an initial look at their work. Enjoy. http://www.accelereyes.com/resources/spm-fmri

Median Filtering: CUDA tips and tricks

ArrayFire | March 4, 2010 | CUDA, Events | 4 Comments

Last week we posted a video recording from NVIDIA’s GTC09 conference. In the video, I walked through median filtering, starting with the vanilla implementation and then applying progressively more aggressive CUDA optimizations. A comment on that post suggested trying some other compiler flags, which sparked a new series of experiments. In the original video, we started with a vanilla CPU implementation of 3×3 median filtering. We then ported it to the GPU for some immediate gains, and then began a string of optimizations to see how far we could drive up performance: switching to texture memory, switching to shared memory, changing how the pixels are sorted internally, etc. The conclusion: pay attention to the resource usage reported by nvcc (registers, …
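To make the starting point concrete, here is a minimal sketch of a vanilla 3×3 median filter on the CPU in C++. It is not the code from the talk: the function name `median3x3`, the 8-bit row-major grayscale layout, and the choice to copy border pixels unchanged are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Vanilla CPU 3x3 median filter (a sketch, not the code from the talk).
// Assumes an 8-bit grayscale image stored row-major as width*height bytes.
// Border pixels are copied unchanged; this border policy is an assumption.
std::vector<unsigned char> median3x3(const std::vector<unsigned char>& in,
                                     std::size_t width, std::size_t height) {
    assert(in.size() == width * height);
    std::vector<unsigned char> out(in);  // borders keep their input values
    if (width < 3 || height < 3) return out;

    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            // Gather the 3x3 neighborhood centered on (x, y).
            std::array<unsigned char, 9> window;
            std::size_t k = 0;
            for (std::size_t dy = 0; dy < 3; ++dy)
                for (std::size_t dx = 0; dx < 3; ++dx)
                    window[k++] = in[(y - 1 + dy) * width + (x - 1 + dx)];
            // Partial sort: afterwards window[4] holds the median of the 9 values.
            std::nth_element(window.begin(), window.begin() + 4, window.end());
            out[y * width + x] = window[4];
        }
    }
    return out;
}
```

Each output pixel depends only on its own 3×3 neighborhood, so the two outer loops are what a GPU port typically replaces with one thread per pixel. The nine neighbor reads and the per-pixel sort are where memory-placement and sorting optimizations like the ones listed above come into play.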

A case study in CUDA optimization

ArrayFire | February 20, 2010 | CUDA | 4 Comments

Jimi Malcolm, VP of Engineering and Co-founder of AccelerEyes, spends about 15 minutes sharing strategies for maximizing the performance of CUDA code. Watch the video below to find out what goes into planning CUDA development for performance. Jimi uses median filtering as the case study.

Streaming data to the GPU

ArrayFire | February 1, 2010 | CUDA

Learn how to stream data directly to the GPU using the Jacket SDK.

Developer SDK Upgrade

ArrayFire | June 16, 2009 | CUDA

In Jacket v1.1, an optional Developer SDK Upgrade is available. This upgrade lets you integrate custom CUDA code for use with MATLAB. With a few simple jkt functions (which mimic standard MEX API functions), you can integrate custom CUDA kernels into Jacket. The task is as simple as replacing the main function in your program with jktFunction, which is used in place of mexFunction to integrate CUDA code into MATLAB and Jacket. jktFunction serves as an entry point to Jacket’s runtime. Within a jktFunction, you have access to several jkt API functions for tasks such as getting input from MATLAB, allocating device memory, calling the CUDA kernels, and casting the kernel’s output to …

LAPACK Functions in Jacket (eig, inv, etc.)

John Melonakos | January 22, 2009 | CUDA | 2 Comments

One of the questions people commonly ask us is: When will Jacket support LAPACK features such as eigenvalue decomposition, matrix inverse, system solvers, etc.? This question is so popular because people recognize that these kinds of problems are well suited to the GPU and will give Jacket users large performance boosts. We are looking forward to delivering these functions in Jacket. Jacket is currently built on top of CUDA. For reasons why we like CUDA, see our previous blog post about OpenCL. While NVIDIA is busy building CUDA from the ground up, we are busy building Jacket from the top (MATLAB) down. NVIDIA is working hard to promote and develop LAPACK libraries directly into …

Data-parallelism vs Task-parallelism

John Melonakos | January 22, 2009 | CUDA, OpenCL | 1 Comment

In order to understand how Jacket works, it is important to understand the difference between data parallelism and task parallelism. There are many ways to define this, but simply put and in our context: Task parallelism is the simultaneous execution on multiple cores of many different functions across the same or different datasets. Data parallelism (aka SIMD) is the simultaneous execution on multiple cores of the same function across the elements of a dataset. Jacket focuses on exploiting data parallelism or SIMD computations. The vectorized MATLAB language is especially conducive to good SIMD operations (more so than a non-vectorized language such as C/C++). And if you’re going to need a vectorized notation to achieve SIMD computation, why not choose the …
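The distinction can be sketched on the CPU with C++ threads. This illustrates the two concepts only and is not how Jacket is implemented; the function names, the `Stats` struct, and the choice of operations (doubling, sum, max) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Data parallelism: the SAME function (here, doubling) runs concurrently on
// disjoint chunks of ONE dataset, one chunk per thread.
void data_parallel_double(std::vector<double>& v, unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t chunk = (v.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(v.size(), begin + chunk);
        if (begin >= end) break;  // more threads than chunks
        workers.emplace_back([&v, begin, end] {
            for (std::size_t i = begin; i < end; ++i) v[i] *= 2.0;
        });
    }
    for (std::thread& w : workers) w.join();
}

// Task parallelism: DIFFERENT functions (sum and max) run concurrently,
// here on the same dataset. Each thread writes its own member of Stats.
struct Stats {
    double sum;
    double max;
};

Stats task_parallel_stats(const std::vector<double>& v) {
    Stats s{0.0, 0.0};
    std::thread sum_task([&v, &s] { s.sum = std::accumulate(v.begin(), v.end(), 0.0); });
    std::thread max_task([&v, &s] {
        s.max = v.empty() ? 0.0 : *std::max_element(v.begin(), v.end());
    });
    sum_task.join();
    max_task.join();
    return s;
}
```

The first pattern is the one that scales with the size of the data: adding elements adds more identical work, which is what a vectorized MATLAB expression such as `A * 2` expresses and what Jacket aims to run on the GPU. The second pattern is limited by how many distinct tasks you have.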

The NVIDIA MEX-Plugin & Jacket

John Melonakos | January 7, 2009 | CUDA

One of the first questions people ask when considering Jacket for GPU MATLAB computing is the following: How is Jacket different from the MATLAB plugin on the NVIDIA website (found here: http://developer.nvidia.com/object/matlab_cuda.html)? The short answer is that the NVIDIA MEX-plugin requires you to write CUDA code, while Jacket does not. This has many implications and results in a lot of advantages for you as a MATLAB programmer. First, let’s describe the features of the MEX-plugin: You write CUDA code that solves your problem. You use the MEX configuration files provided by NVIDIA to compile your CUDA code into a MEX file that is callable by MATLAB. MATLAB calls your MEX file, moves data out to the …

OpenCL

John Melonakos | December 30, 2008 | CUDA, OpenCL | 4 Comments

We often get questions such as the one we just received via email: 1) Any idea if you will be supporting AMD/ATI cards in future? 2) Have you considered OpenCL as a potential pathway for the future? I can see an advantage there for you (if it takes off) in that you’re not tied to a single vendor any more and potentially you’d be able to take advantage of other accelerators that may support it. It’s very early days yet, but certainly from our point of view the current paradigm of coding to a single vendor’s card doesn’t seem sustainable. OpenCL is a community effort to create a standard for parallel computing, with early emphasis on GPGPU computing, …
