Image Processing Benchmarks on NVIDIA Jetson TK1

Pradeep Garigipati
ArrayFire, Benchmarks, CUDA

In this post we look at benchmarks of the following ArrayFire image processing functions on an ARM device: erode, median filter, resize, histogram, bilateral filter and convolve.

We pitted the brand new compute capability 3.2 GPU on the NVIDIA Jetson TK1 against a mobile NVIDIA GPU. The closest match in our mobile card deck to the GPU on the Jetson board (referred to as TK1 from here on) is an NVIDIA GT 650M. The GPU device properties that have a critical effect on function performance are listed below.

Property                        Jetson TK1 (GK20A)   GT 650M
Compute capability              3.2                  3.0
Number of multiprocessors       1                    2
Cores                           192                  384
Base clock rate                 852 MHz              950 MHz
Total global memory             1746 MB              2048 MB
Total shared memory per block   48 KB                48 KB
Total constant memory           64 KB                64 KB
Memory clock rate               924 MHz              900 MHz
Memory bus width                64-bit               128-bit
Total registers per block       32768                65536
Warp size                       32                   32
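
The memory clock rate and bus width in the table together bound how fast each GPU can stream pixels. The sketch below turns those two numbers into a theoretical peak bandwidth. It assumes double data rate memory (two transfers per clock), which the post does not state, and the helper name `peakBandwidthGBps` is made up for this illustration.

```cpp
#include <cassert>

// Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).
// Assumption (not stated in the post): the memory is double data rate,
// so it performs two transfers per clock cycle.
double peakBandwidthGBps(double memClockMHz, int busWidthBits) {
    assert(memClockMHz > 0.0 && busWidthBits > 0);
    const double transfersPerSecond = memClockMHz * 1e6 * 2.0;
    const double bytesPerTransfer = busWidthBits / 8.0;
    return transfersPerSecond * bytesPerTransfer / 1e9;
}
```

Under that assumption, `peakBandwidthGBps(924, 64)` gives roughly 14.8 GB/s for the TK1 and `peakBandwidthGBps(900, 128)` gives 28.8 GB/s for the GT 650M, so the TK1 has about half the bandwidth to work with.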


Images with the following resolutions were used for the benchmarks.

  • 480p (720×480)
  • 720p and 1080p HD
  • 4K and 8K UHD

Note that the vertical axis (frames per second) in all of the plots is on a log scale. The higher the value along the vertical axis, the better that function performs on a given device.
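
For readers who want to reproduce a frames-per-second figure, here is a minimal timing sketch. It is not the harness used for the plots; `measureFps` and `fpsFromMillis` are hypothetical helpers, and with an asynchronous GPU library the device would have to be synchronized inside `fn` before the clock is read.

```cpp
#include <cassert>
#include <chrono>

// Runs `fn` `iterations` times and returns the average frames per second.
template <typename Fn>
double measureFps(Fn&& fn, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    return iterations / elapsed.count();
}

// Converts an average per-frame run time in milliseconds to frames per second.
double fpsFromMillis(double msPerFrame) {
    assert(msPerFrame > 0.0);
    return 1000.0 / msPerFrame;
}
```

For example, a function that needs 125 ms per frame runs at `fpsFromMillis(125.0)`, which is 8 frames per second.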


Erode

A 3×3 mask was used to benchmark the erode function. Dilate results will be similar to erode because both use the same algorithm with a different local neighborhood operator.
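
To make the "same algorithm, different operator" point concrete, here is a plain CPU sketch (not ArrayFire's GPU implementation) in which erode and dilate share one loop and differ only in taking the minimum or the maximum. The function names and the border handling (ignoring neighbors outside the image) are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, single channel

// Applies a 3x3 morphological operator. `pick` combines two pixel values:
// minimum for erode, maximum for dilate. Neighbors that fall outside the
// image are skipped (an assumption of this sketch).
template <typename Pick>
Image morph3x3(const Image& src, int width, int height, Pick pick) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    Image dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::uint8_t acc = src[y * width + x];
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= height || nx < 0 || nx >= width) continue;
                    acc = pick(acc, src[ny * width + nx]);
                }
            }
            dst[y * width + x] = acc;
        }
    }
    return dst;
}

Image erode3x3(const Image& src, int width, int height) {
    return morph3x3(src, width, height,
                    [](std::uint8_t a, std::uint8_t b) { return std::min(a, b); });
}

Image dilate3x3(const Image& src, int width, int height) {
    return morph3x3(src, width, height,
                    [](std::uint8_t a, std::uint8_t b) { return std::max(a, b); });
}
```

Because both functions do the same memory accesses and one comparison per neighbor, their run times should track each other closely.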

Median Filter

A 3×3 mask was used to benchmark the medfilt function as well.
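
As a reference for what a 3×3 median filter computes, the following CPU sketch gathers the nine neighbors of each pixel and keeps the middle value. It is only an illustration, not the GPU kernel that was benchmarked; the name `median3x3` and the replicate-border handling are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, single channel

// 3x3 median filter. Coordinates outside the image are clamped to the
// nearest edge pixel (replicate border, an assumption of this sketch).
Image median3x3(const Image& src, int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    Image dst(src.size());
    std::array<std::uint8_t, 9> window;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            std::size_t n = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int ny = std::min(std::max(y + dy, 0), height - 1);
                    const int nx = std::min(std::max(x + dx, 0), width - 1);
                    window[n++] = src[ny * width + nx];
                }
            }
            // Partial sort: the 5th smallest of 9 values is the median.
            std::nth_element(window.begin(), window.begin() + 4, window.end());
            dst[y * width + x] = window[4];
        }
    }
    return dst;
}
```

The per-pixel selection step is what makes the median filter more expensive than erode, which only needs a running minimum.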


Resize

We benchmarked resize by halving the image size.
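
The post does not say which interpolation mode was used. The sketch below assumes that "halving" means halving both width and height, and uses a 2×2 box average as one simple way to do it; `halveSize` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, single channel

// Downscales by 2 in each dimension by averaging each 2x2 block
// (box filter, an assumption; other interpolation modes are possible).
// An odd trailing row or column is dropped.
Image halveSize(const Image& src, int width, int height) {
    assert(width >= 2 && height >= 2);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int outW = width / 2, outH = height / 2;
    Image dst(static_cast<std::size_t>(outW) * outH);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            const int sy = 2 * y, sx = 2 * x;
            const int sum = src[sy * width + sx] + src[sy * width + sx + 1] +
                            src[(sy + 1) * width + sx] + src[(sy + 1) * width + sx + 1];
            dst[y * outW + x] = static_cast<std::uint8_t>((sum + 2) / 4);  // rounded mean
        }
    }
    return dst;
}
```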


Histogram

A standard 256-bin histogram was used.
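
For an 8-bit image, a 256-bin histogram has exactly one bin per intensity value. A minimal CPU sketch (the name `histogram256` is made up here, and this is not the benchmarked GPU code) looks like this:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major, single channel

// Counts how many pixels take each of the 256 possible 8-bit values.
std::array<std::uint32_t, 256> histogram256(const Image& src, int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    std::array<std::uint32_t, 256> bins{};  // zero-initialized
    for (std::uint8_t v : src) ++bins[v];
    return bins;
}
```

On a GPU the interesting part is that many threads increment the same bins at once, which a serial loop like this one does not have to deal with.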

Bilateral Filter

We used a spatial variance of 3.5 (7×7 window) and a chromatic variance of 50 to benchmark the bilateral function.
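
To show how the two parameters interact, here is a CPU sketch of a bilateral filter. It treats 3.5 and 50 as the standard deviations of the spatial and chromatic Gaussians and uses a radius of 3 for the 7×7 window; both readings are assumptions, as is the function name `bilateral`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using ImageF = std::vector<float>;  // row-major, single channel

// Bilateral filter: each neighbor is weighted by its spatial distance and
// by how much its intensity differs from the center pixel.
// Neighbors outside the image are skipped (an assumption of this sketch).
ImageF bilateral(const ImageF& src, int width, int height, int radius,
                 float spatialSigma, float chromaticSigma) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(spatialSigma > 0.0f && chromaticSigma > 0.0f);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const float s2 = 2.0f * spatialSigma * spatialSigma;
    const float c2 = 2.0f * chromaticSigma * chromaticSigma;
    ImageF dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float center = src[y * width + x];
            float sum = 0.0f, norm = 0.0f;
            for (int dy = -radius; dy <= radius; ++dy) {
                for (int dx = -radius; dx <= radius; ++dx) {
                    const int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= height || nx < 0 || nx >= width) continue;
                    const float v = src[ny * width + nx];
                    const float d2 = static_cast<float>(dx * dx + dy * dy);
                    const float r = v - center;
                    const float w = std::exp(-d2 / s2 - r * r / c2);
                    sum += w * v;
                    norm += w;
                }
            }
            dst[y * width + x] = sum / norm;  // norm >= 1: the center has weight 1
        }
    }
    return dst;
}
```

A call such as `bilateral(img, w, h, 3, 3.5f, 50.0f)` smooths flat regions while leaving strong edges almost untouched, because pixels across a large intensity step get a weight close to zero. The exponential per neighbor over a 7×7 window is also why this filter is the most expensive one in the set.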


Convolve

A 5×5 blur kernel was used to benchmark the convolve function.
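
The post does not specify the blur kernel's coefficients. The sketch below assumes a uniform 5×5 box kernel and zero padding at the borders, and shows a direct 2D convolution on the CPU; `makeBoxKernel` and `convolve2d` are hypothetical names, not ArrayFire API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ImageF = std::vector<float>;  // row-major, single channel

// Uniform blur kernel whose ksize*ksize coefficients sum to 1 (an assumption).
ImageF makeBoxKernel(int ksize) {
    assert(ksize > 0);
    const std::size_t n = static_cast<std::size_t>(ksize) * ksize;
    return ImageF(n, 1.0f / static_cast<float>(n));
}

// Direct 2D convolution with an odd-sized square kernel. The kernel is
// flipped (true convolution) and pixels outside the image count as zero.
ImageF convolve2d(const ImageF& src, int width, int height,
                  const ImageF& kernel, int ksize) {
    assert(width > 0 && height > 0);
    assert(ksize > 0 && ksize % 2 == 1);
    assert(kernel.size() == static_cast<std::size_t>(ksize) * ksize);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const int r = ksize / 2;
    ImageF dst(src.size(), 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < ksize; ++i) {
                for (int j = 0; j < ksize; ++j) {
                    const int sy = y + r - i, sx = x + r - j;
                    if (sy < 0 || sy >= height || sx < 0 || sx >= width) continue;
                    acc += kernel[i * ksize + j] * src[sy * width + sx];
                }
            }
            dst[y * width + x] = acc;
        }
    }
    return dst;
}
```

Calling `convolve2d(img, w, h, makeBoxKernel(5), 5)` replaces each interior pixel with the mean of its 5×5 neighborhood, which costs 25 multiply-adds per pixel.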


  • erode, resize and convolve run in real time at all resolutions except 8K UHD.
  • medfilt runs in real time up to 1080p HD resolution and falls to interactive rates at 4K UHD.
  • bilateral runs in real time up to 1080p HD resolution and falls to 8 fps at 4K UHD.
  • histogram is fast enough to generate image histograms in any photo editing software without noticeable lag.

There are plenty of other image processing functions available in ArrayFire. You can find the complete list of available functions in our documentation. The main takeaway from this post is that we can easily do image processing in real time on the Jetson TK1 at up to 4K UHD resolution.


If you want to try out ArrayFire on your Jetson TK1, please contact us at

We’ve released ArrayFire for Jetson TK1. You can now get access to the latest version from our download page.

Comments 7

    1. We are very glad that the post was helpful. ArrayFire functions for computer vision algorithms are currently in the testing phase. We very recently published a blog post on feature detection which can be found at . In that post, we shared benchmarks of the Harris corner detector and the FAST feature detector in comparison with OpenCV’s implementation. We shall keep posting new benchmarks as more functions pass through the testing phase.
