In this blog I will finalize the work that we completed on triangle counting in graphs on the GPU using CUDA. The two previous blogs can be found: Part 1 and Part 2. The first part introduces the significance of the problem and the second part explains the algorithms that we used in our solution. This work was presented in finer detail in the Workshop on Irregular Applications: Architectures and Algorithms which took place as part of Supercomputing 2014. The full report can be found here. In previous blogs, we discussed that the performance of the triangle counting is dependent on the algorithm and the CUDA kernel. Our implementation gives the data scientist control over several important parameters: Number of …