Every so often people come up to us and ask, “Aren’t compilers and compiler directives good enough for HPC applications?” or “Won’t a compiler accomplish that for us?” While compilers have made massive progress in the last two decades, they are still nowhere near the point of putting us and many other HPC programmers out of business. A compiler is still a “one-size-fits-all” solution that must be able to deal with any and all input, whereas an HPC programmer can be thought of as a designer-fitted solution. Application expertise brings a lot to the table that compilers cannot compete with:
- Our past experience helps us optimize applications that have irregular memory access patterns. Some applications, such as matrix computations, have regular and simple memory access patterns. Others, such as the graph algorithms used for social network analysis, have data-dependent, effectively random memory access patterns that a compiler cannot predict at compile time and therefore cannot optimize.
- Our familiarity with the data set at hand helps us optimize the application and find algorithmic bottlenecks. It helps us decide when or if loop unrolling is necessary, whether SIMD would be beneficial, how many threads should be used (which can be directly related to the memory footprint), and much more.
- Our knowledge of the work required by the algorithm(s) helps us with load balancing. APIs such as OpenMP offer directives that allow the application designer to decide what parallel granularity and which scheduler should be used. The static scheduler offers a straightforward and simple partitioning of the loop across the threads. This does not ensure load balancing, because different iterations of the loop may require different amounts of work. The dynamic scheduler, on the other hand, allows finer-grained partitioning of the work, but in practice it has limited scalability because multiple threads contend for the same work queue. A domain expert can decide which scheduler should be used or how to partition the work equally, and can even design an online load-balancing mechanism that chooses the partitioning scheme based on the input data.
- Our awareness of the increasing number of architectures helps us fit the right solution to the right system. Recall that each system requires a different set of optimizations.
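
To make the scheduling trade-off from the load-balancing point concrete, here is a minimal sketch in C++ with OpenMP. The workload is an assumption made up for illustration: `triangular_work` is a hypothetical kernel in which iteration `i` costs roughly `i` units of work, so an even split of the iteration range gives the last thread far more to do than the first. The function names and the `chunk` parameter are ours, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel (an assumption for illustration): iteration i performs
// about i additions, so later iterations are much more expensive than early ones.
long long triangular_work(int i) {
    long long acc = 0;
    for (int k = 0; k <= i; ++k) acc += k;
    return acc;
}

// Static schedule: the iteration range is split into equal contiguous blocks,
// one per thread. Cheap to set up, but with this workload the thread that
// receives the last block does most of the work.
std::vector<long long> run_static(int n) {
    assert(n >= 0);
    std::vector<long long> out(static_cast<std::size_t>(n));
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) out[i] = triangular_work(i);
    return out;
}

// Dynamic schedule: threads grab `chunk` iterations at a time from a shared
// queue. Balance improves, but a small chunk means more trips to the queue.
std::vector<long long> run_dynamic(int n, int chunk) {
    assert(n >= 0 && chunk > 0);
    std::vector<long long> out(static_cast<std::size_t>(n));
    #pragma omp parallel for schedule(dynamic, chunk)
    for (int i = 0; i < n; ++i) out[i] = triangular_work(i);
    return out;
}
```

Both functions compute the same result; only the distribution of iterations across threads differs. Built with OpenMP enabled (for example `-fopenmp` with GCC or Clang), timing the two on a large `n` would show the imbalance of the static split, while the best `chunk` value depends on the input and the machine. That choice is exactly the kind of decision a domain expert is positioned to make. Without OpenMP enabled, the pragmas are ignored and the loops simply run serially.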
By no means is the purpose of this blog to diminish the significance or impact that compilers have had on computing. Rather, this blog focuses on why compilers alone simply cannot offer the level of system utilization that domain expertise does! ArrayFire was designed by people with domain expertise. So if you are looking for a high-performing computational library, check out ArrayFire!