Blog | Page 29 | ArrayFire

Median Filtering: CUDA tips and tricks

ArrayFire | March 4, 2010 | CUDA, Events | 4 Comments

Last week we posted a video recording from NVIDIA’s GTC09 conference. In the video, I walked through median filtering, presenting the vanilla implementation and then a series of progressive CUDA optimizations. A comment on that post suggested trying some other compiler flags, which sparked a new series of experiments. In the original video, we started with a vanilla CPU implementation of 3×3 median filtering. We then ported this to the GPU to realize some immediate gains, and followed with a string of optimizations to see how far we could drive up performance: switching to texture memory, switching to shared memory, changing the internal sorting of pixels, etc. The conclusion: pay attention to the resource usage reported by nvcc (registers, …
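As a point of reference for the vanilla CPU starting point mentioned above, here is a minimal C++ sketch of a 3×3 median filter. It is not the code from the video: the function name median3x3, the 8-bit row-major image layout, and the choice to clamp coordinates at the image border are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): 3x3 median filter over a
// row-major 8-bit grayscale image. Border handling by clamping is an assumption.
std::vector<std::uint8_t> median3x3(const std::vector<std::uint8_t>& in,
                                    int width, int height) {
    assert(width > 0 && height > 0);
    assert(in.size() == static_cast<std::size_t>(width) * height);

    std::vector<std::uint8_t> out(in.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Gather the 9 neighbours, clamping coordinates at the edges.
            std::array<std::uint8_t, 9> window{};
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int sx = std::max(0, std::min(x + dx, width - 1));
                    const int sy = std::max(0, std::min(y + dy, height - 1));
                    window[n++] = in[static_cast<std::size_t>(sy) * width + sx];
                }
            }
            // Partial sort: afterwards the median (5th of 9 values) sits at index 4.
            std::nth_element(window.begin(), window.begin() + 4, window.end());
            out[static_cast<std::size_t>(y) * width + x] = window[4];
        }
    }
    return out;
}
```

Calling median3x3(img, 3, 3) on a uniform 3×3 image with a single bright outlier in the centre returns the uniform image, which is the salt-noise removal behaviour a median filter is used for. A GPU port would typically assign one thread per output pixel; the per-pixel sort is the part the optimizations above target.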

Accelerate Computer Vision Data Access Patterns with Jacket & GPUs

Gallagher Pryor | March 1, 2010 | ArrayFire

For computer vision, we’ve found that efficient implementations require a new data access pattern that MATLAB does not currently support. MATLAB and the M language are great for linear algebra, where blocks of matrices are the typical access pattern, but not for computer vision, where algorithms typically operate on patches of imagery. For instance, to pull out patches of imagery in M, one must write a doubly nested for loop,

    A = rand(100,100)
    for xs = -W:W
        for ys = -W:W
            patch(xs+W+1, ys+W+1) = A(xs+1+x, ys+1+y);
        end
    end

…with guards for boundary conditions, etc. It gets even more complicated with non-square patches. On top of that, these implementations don’t translate to the GPU’s memory hierarchy at all and are thus …
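To make the boundary guards mentioned above concrete, the patch loop can be sketched in C++ roughly as follows. This is an illustration only, not Jacket or MATLAB code: the function name extractPatch, the row-major float image, the zero-based centre (x, y), and the clamp-to-edge guard are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of any library named in the post): copy the
// (2W+1) x (2W+1) patch centred on zero-based (x, y) out of a row-major image.
// Out-of-range coordinates are clamped to the nearest edge pixel, which is
// only one of several possible boundary guards.
std::vector<float> extractPatch(const std::vector<float>& image,
                                int width, int height,
                                int x, int y, int W) {
    assert(width > 0 && height > 0 && W >= 0);
    assert(image.size() == static_cast<std::size_t>(width) * height);

    const int side = 2 * W + 1;
    std::vector<float> patch(static_cast<std::size_t>(side) * side);
    for (int ys = -W; ys <= W; ++ys) {
        for (int xs = -W; xs <= W; ++xs) {
            // Boundary guard: clamp the source coordinate into the image.
            const int sx = std::max(0, std::min(x + xs, width - 1));
            const int sy = std::max(0, std::min(y + ys, height - 1));
            patch[static_cast<std::size_t>(ys + W) * side + (xs + W)] =
                image[static_cast<std::size_t>(sy) * width + sx];
        }
    }
    return patch;
}
```

For a 4×4 image holding 0 through 15, extractPatch(image, 4, 4, 2, 2, 1) copies the interior 3×3 block unchanged, while a patch centred on (0, 0) repeats the edge pixels. Supporting non-square patches would mean separate half-widths for each axis, which is the extra complication the excerpt points to.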

A case study in CUDA optimization

ArrayFire | February 20, 2010 | CUDA | 4 Comments

Jimi Malcolm, VP of Engineering and Co-founder of AccelerEyes, takes about 15 minutes to share optimization strategies for maximizing the performance of CUDA code. Watch the video below to find out what needs to go into planning CUDA development for maximum performance. Jimi uses median filtering for this case study.

Using Parallel For Loops (parfor) with MATLAB® and Jacket

ArrayFire | February 10, 2010 | Benchmarks | 3 Comments

MATLAB® parallel for loops (parfor) allow the body of a for loop to be executed across multiple workers simultaneously, but with some pretty large restrictions. With Jacket MGL, Jacket can be used within parfor loops, with the same restrictions. However, it is important to note that Jacket MGL does not currently support co-distributed arrays.

Problem Size

Problem size might be the single most important consideration in parallelization using the Parallel Computing Toolbox (PCT) and Jacket MGL. When data is used by a worker in the MATLAB pool it must be copied from MATLAB to the worker, and must be copied back when the computation is complete. Additionally, when GPU data is used, it must then be copied by the worker …

How long does it take to get 98X performance improvement with GPUs?

John Melonakos | February 4, 2010 | Case Studies | 2 Comments

Well, here is a recent story about one of our customers who achieved a 98X speedup with Jacket in 16 days. Of the 16 days, 15 were spent sending emails back and forth about the problem, and less than a day was spent getting the customer code into Jacket and running some performance tests! Who would have imagined GPU programming with that kind of performance in 1 day? Happy reading. Day 1: Customer uses inverse radon (or iradon in MATLAB terms) extensively for their back projection algorithms. They would like to know when the iradon function will be available/supported in Jacket. AccelerEyes product management informs the customer that the inverse radon algorithm used in MATLAB is based on the filtered back …

Streaming data to the GPU

ArrayFire | February 1, 2010 | CUDA

Learn how to stream data directly to the GPU using the Jacket SDK.

Torben’s Corner

Gallagher Pryor | January 19, 2010 | Announcements

We work very closely with our customers, and we really appreciate the feedback and insight they provide. One Jacket programmer has started to post fantastic content on the Jacket Documentation Wiki under Torben’s Corner. This content is maintained by Torben Larsen’s team at AAU and focuses primarily on performance observations comparing GPUs and CPUs. This information is of great value not only to our technical team but also to the entire Jacket community. Thanks, Torben, for this great resource!

New Website Launch

John Melonakos | January 13, 2010 | Announcements

We are pleased to have released a new version of the AccelerEyes website today. This new website delivers a richer level of content and is the result of hard work by nearly everyone at AccelerEyes. And more is to come. In the near future, we will be uploading new screencasts and demos showing Jacket in action. We are also working on a comprehensive set of FAQ pages for product documentation. Finally, we are receiving great demos and codes from current Jacket customers and will make these stories and examples available to you on the website. If you have suggestions for information that you’d like to see presented on our website, please let us know. You can email these suggestions …

Developer SDK Upgrade

ArrayFire | June 16, 2009 | CUDA

In Jacket v1.1, an optional Developer SDK Upgrade is available. This upgrade lets you integrate custom CUDA code for use with MATLAB. With a few simple jkt functions (which mimic standard MEX API functions), you can integrate custom CUDA kernels into Jacket. This task is as simple as replacing the main function in your program with jktFunction, which is used in place of mexFunction for integrating CUDA code into MATLAB and Jacket. This serves as an entry point to Jacket’s runtime. Within a jktFunction, you have access to several jkt API functions for tasks such as getting input from MATLAB, allocating device memory, calling the CUDA kernels, and casting the kernel’s output to …
