Jimi Malcolm, VP of Engineering and co-founder of AccelerEyes, takes about 15 minutes to share strategies for getting the most performance out of CUDA code. Watch the video below to see what goes into planning CUDA development with performance in mind. Jimi uses median filtering as the case study.
Comments 4
Great tutorial! I wish it was there 1.5 years ago when I was evaluating motion estimation algorithms on CUDA.
One thing missing from the evaluation is the effect of forcing the nvcc compiler to use 9 registers, and preventing it from spilling registers to memory, while still using bubble sort. It would be interesting to see whether that improves or degrades performance.
Author
It’s easy to force nvcc to use at most 9 registers, but then it’ll just spill more into local memory (lmem). There’s no way to prevent it from spilling to lmem except changing the algorithm so that it demands fewer registers.
But I like your thinking, so I put together another set of experiments (some new, some old). I got a little carried away, so I made a new post.
Thanks for the suggestion!
-jm
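For readers following the thread, here is a minimal sketch of the kind of 9-value bubble sort being discussed. It is not the code from the video: `median9` is a hypothetical helper name, and it is written as plain C++ so it can be compiled and checked on the host. In a CUDA kernel the same function would typically be marked `__device__`. Both loops have fixed trip counts, which is an assumption made here so that a compiler can fully unroll them. Whether the 9 values then stay in registers or spill to lmem is up to the compiler.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not from the video): median of a 3x3 window
// given as 9 consecutive values.
inline float median9(const float* window) {
    // Private copy of the window so the caller's data is left untouched.
    float v[9];
    for (int i = 0; i < 9; ++i) {
        v[i] = window[i];
    }

    // Partial bubble sort. Each pass moves the largest remaining value to
    // the end of the unsorted range. After 5 passes the 5 largest values
    // sit in v[4..8] in order, so v[4] is the median of the 9.
    for (int pass = 0; pass < 5; ++pass) {
        for (int j = 0; j < 8 - pass; ++j) {
            if (v[j] > v[j + 1]) {
                std::swap(v[j], v[j + 1]);
            }
        }
    }
    return v[4];
}
```

To see what nvcc actually does with a kernel built around something like this, compile with `--ptxas-options=-v`, which reports the registers used and the bytes of lmem per kernel. The `--maxrregcount` option is the usual way to cap the register count. As noted above, lowering the cap does not remove the demand for registers, it only moves the excess into lmem.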
Pingback: GPU MATLAB Computing » Median Filtering: CUDA tips and tricks