Bringing Together the GPU Computing Ecosystem for Python

John Melonakos Announcements, ArrayFire, Computing Trends, CUDA, Open Source, Python

To date, we have not done a lot for the Python ecosystem. A few months ago, we decided it was time to change that. As NVIDIA said in this post, the current slate of GPU tools available to Python developers is scattered. With some attention to community building, perhaps we can build something better -- together. NVIDIA spoke about its plans to help clean up the ecosystem. We're on board with that mentality and propose two ways to contribute: We're working on a survey paper that assesses the state of the ecosystem. What technical computing tasks can you do with each package? What benchmarks result from the packages on real Python user code? What plans does each group have ...

Cycling through SYCL

Umar Arshad C/C++, Computing Trends, Open Source, OpenCL

We recently gave an overview of the recent history of the technical computing hardware market. In it, we mentioned the energy at Intel right now. The weight of Intel is behind the SYCL standard through its new software approach, oneAPI. SYCL is a cross-platform API that targets heterogeneous hardware, similar to OpenCL and CUDA. The SYCL standard was first introduced by Codeplay and is now managed by the Khronos Group. It allows single-source compilation in C++ to target multiple devices on a system, rather than using C++ for the host and domain-specific kernel languages for the device. Furthermore, SYCL is fully C++17 standards compliant. There are no language extensions that would prevent any standards-compliant ...

The Roaring 20s in AI & Technical Computing

John Melonakos ArrayFire, Computing Trends, Open Source

Since ArrayFire was founded in 2007, there has been an explosion in software and its importance to our lives. Computers, connected to sensors and real-world outcomes, do really cool things that touch nearly every aspect of our lives. I believe these are exciting times for technical computing and for HPC, as evidenced by the things showcased this week at SC 2020. While ArrayFire focuses purely on software, our hardware partners turn our imaginative lines of code into real-world applications. AMD, NVIDIA, and Intel have each evolved tremendously since we started ArrayFire. Over a decade ago, NVIDIA and its CEO-founder Jensen saw the opportunity to teach the world a new heterogeneous model of computing that overwhelmingly convinces scientists, engineers, and analysts ...

Domain Expertise Vs. Compilers

Oded Computing Trends 2 Comments

Every so often people come up to us and ask, "Aren't compilers and compiler directives good enough for HPC applications?" or "Won't a compiler accomplish that for us?" While compilers have made massive progress in the last two decades, they are still nowhere near the point of putting us and many other HPC programmers out of business. Compilers are still a "one-size-fits-all" solution that needs to be able to deal with any and all input, whereas HPC programmers can be thought of as a designer-fitted solution. Application expertise brings a lot to the table that compilers cannot compete with: Our past experiences have helped us optimize applications that have irregular memory access patterns. While some applications such as matrix applications have regular and simple ...

How to Make GPU Hardware Decisions

Scott Computing Trends, CUDA, Hardware, OpenCL

We get questions all the time about how to make GPU hardware decisions. We've seen just about every scenario you can imagine, and so we always jump at the chance to help others through this decision process. Here's a recent question from a customer. "I've just found your post on Analytic Bridge and have taken a look at your website ... I'm replacing my two Tesla M1060 cards (computing capability too low) and I'm considering used Tesla M2070s or the new GTX 760 cards. Could you offer any insight? I believe the GTX 760 cards may well outperform the older 2070s and are much cheaper." And here's our response. "The GTX 760 will probably outperform the M2070 for single precision ...

APU 2013 – Day 3 Recap

John Melonakos Computing Trends, Events, OpenCL

"Big announcement here at #APU13! AMD CTO, Mark Papermaster, just announced 2 additions to the 2014 Mobile APU roadmap http://t.co/sWHMhb9AAe" (AMD, @AMD, November 13, 2013)

Today was the final day of AMD's APU 2013 conference. Today's sessions focused mostly on gaming, so they were less relevant to technical computing than yesterday's. However, the mobile product announcement from AMD in the tweet above was interesting. OpenCL is just as important in mobile computing as it is in HPC. Both ends of the spectrum have a need for speed and can achieve it through great data parallelism. AMD is looking to make better inroads into mobile computing with these APU announcements. Overall, APU 2013 was a fantastic ...

APU 2013 – Day 2 Recap

John Melonakos Computing Trends, Events, OpenCL 1 Comment

Today was the first full day of AMD's APU 2013 conference. It was a whirlwind of heterogeneous computing. From the morning keynotes, three particularly salient points stood out to us: Mike Muller, CTO at ARM, talked about heterogeneous computing. He said it nicely with, "Heterogeneous computing is the future. It has also been our past, but we didn't notice because a few shiny companies overshadowed everything else." That is a great way to describe it. The future of heterogeneous computing involves the rise in importance of non-x86 processors. Throwing a few more MHz onto a CPU can no longer satisfy computational demands. Nandini Ramani, VP at Oracle, talked about the importance of Java for heterogeneous computing. She pointed ...

Application Time vs Solver Time

John Melonakos ArrayFire, Computing Trends

Last week, HPCwire ran an interesting article entitled, "Where has HPC's math gone?" The article analyzes the increasing importance of math solvers to successful HPC outcomes. As the number of cores grows, the percentage of time HPC codes spend in solvers increases significantly. The following chart illustrates this trend nicely:

ArrayFire is ideally suited for HPC applications that need to accelerate the toughest math problems. ArrayFire contains hundreds of math functions across numerous domains. In general, if the HPC community really wants to solve this problem, it will begin to invest more in libraries than in compilers that have no chance at optimizing these tough math problems automatically. Rather, it is only through expertly tuned codes, such as those developed ...

ISC 2013 Keynote by Stephen Pawlowski of Intel

John Melonakos Computing Trends, Events

Stephen Pawlowski of Intel gave an interesting keynote today at ISC 2013. He continued the theme of yesterday's keynote to address challenges our market faces in getting to exascale computing. Here is a summary of the points he made during his talk:

- Getting to exascale by 2020 requires performance improvement of 2x every year
- Innovations anticipated include stacked chips and optical layers
- DRAM is not scaling with Moore's Law
- More power goes into transferring data than in computing
- Need to operate transistors near threshold
- New materials for DRAM needed. Resistive memory could replace DRAM.
- Need to explore both the big die and the small die paths as we approach 2020
- Big die path leads to 10 billion transistors on a ...
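
As a rough check on the first point in the summary above, doubling performance every year compounds quickly. The sketch below is only illustrative: the helper name `compound_growth` and the choice to count from 2013 (the year of the talk) to 2020 are our assumptions, not figures from the keynote.

```python
def compound_growth(annual_growth: float, years: int) -> float:
    """Total multiplier after growing by `annual_growth` (1.0 == +100%) each year."""
    return (1.0 + annual_growth) ** years


# Assumption: the clock starts in 2013 and the target year is 2020.
years = 2020 - 2013

# "2x every year" is the same as 100% annual growth.
total = compound_growth(1.0, years)

print(f"{total:.0f}x total improvement over {years} years")
```

Under those assumptions, seven consecutive doublings amount to a 128x improvement over the period.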

ISC 2013 Keynote by Bill Dally of NVIDIA

John Melonakos Computing Trends, Events

Bill Dally of NVIDIA gave a wonderful keynote today at ISC 2013. He focused on addressing the challenges facing our market in getting to exascale computing. He talked about how Moore’s law is alive and well because transistor counts continue to double at an astonishing rate. However, the additional transistors are not translating into the same big performance gains as they did in the 1990s. Whereas performance used to grow 50% per year, performance today is growing at a much slower pace. The biggest bottleneck to more performance is energy efficiency. Bill showed slides of chips and talked about the picojoules required to compute versus those required to move data and operands around the chip. The take-home message was that ...
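
To put the 50% per year figure in perspective, here is a small sketch of what that rate means when compounded. The 20% rate used for comparison is purely a hypothetical stand-in for "a much slower pace"; the talk summary above does not give a number, and the function names are ours.

```python
import math


def decade_multiplier(annual_growth: float) -> float:
    """Total speedup after ten years of compounding at `annual_growth` per year."""
    return (1.0 + annual_growth) ** 10


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for performance to double at a steady annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


historic = 0.50      # the 50% per year growth mentioned in the keynote summary
hypothetical = 0.20  # assumption for illustration only, not a figure from the talk

for label, rate in [("50%/yr", historic), ("20%/yr (hypothetical)", hypothetical)]:
    print(
        f"{label}: {decade_multiplier(rate):.1f}x per decade, "
        f"doubling every {doubling_time_years(rate):.1f} years"
    )
```

At 50% per year, performance doubles roughly every 1.7 years and grows by a factor of about 58 in a decade, which is why even a moderate slowdown in the annual rate is so noticeable.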