Heterogeneous Computing: A New Paradigm for the Exascale Era
December 08, 2015

Heterogeneous computing is making its greatest impact at the high end of the HPC market. GPUs first appeared on the TOP500 list of the world’s supercomputer sites (www.top500.org) in 2008. By June 2011, three of the top 10 systems on the list employed GPUs. And in October 2011, the U.S. Department of Energy’s Oak Ridge National Laboratory unveiled plans to upgrade the number one U.S. supercomputer to a successor system (“Titan”) with a planned peak performance of 20–30 petaflops by complementing more than 18,000 x86 CPUs with an equal number of GPUs. Following that, the Texas Advanced Computing Center revealed plans for a heterogeneous supercomputer (“Stampede”) that initially targeted 10 peak petaflops by combining two petaflops of x86 CPUs with eight petaflops of MIC accelerator processors.

The adoption of heterogeneous computing by these and other bellwether HPC sites indicates that GPUs are moving out of the experimental phase and into the phase where they will be increasingly entrusted with appropriate production-oriented, mission-critical work.

Definitions
Cluster: IDC defines a cluster as a set of independent computers combined into a unified system through systems software and networking technologies. Thus, clusters are not based on new architectural concepts so much as new systems integration strategies.

Heterogeneous processing: Heterogeneous processing and the synonymous term heterogeneous computing refer to the use of multiple types of processors, typically CPUs in combination with GPUs or other accelerators, within the same HPC system.

High-performance computing: IDC uses the term high-performance computing to refer to all technical computing servers and clusters used to solve problems that are computationally intensive or data intensive. The term also refers to the market for these systems and the activities within this market. It includes technical servers but excludes desktop computers used for technical computing.

Heterogeneous Computing Benefits for the Exascale Era
The benefits of the heterogeneous computing paradigm for HPC are interrelated and address some of the most important exascale challenges:

System costs. GPUs and related accelerators can provide a lot of peak and Linpack flops for the money. Especially for HPC sites seriously pursuing the upper ranges of the TOP500 supercomputers list, bolting on GPUs can provide a kind of flops warp drive, rocketing Linpack performance to where almost no one has gone before. Witness China’s Tianhe-1A supercomputer, which supplemented x86 processors with GPUs to seize the number one spot on the November 2010 TOP500 list. To achieve this feat, Tianhe-1A employed 14,336 x86 CPUs and 7,168 GPUs. NVIDIA suggested at the time that it would have taken “50,000 CPUs and twice as much floor space to deliver the same performance using CPUs alone.” In any case, by June 2011, three of the top five systems on the list employed GPUs. And in October 2011, as noted earlier, the U.S. Department of Energy’s Oak Ridge National Laboratory unveiled plans to upgrade the number one U.S. supercomputer to peak performance of 20–30 petaflops by complementing more than 18,000 x86 CPUs with an equal number of GPUs.

Speed. HPC users have reported excellent speedups on GPUs, frequently in the 3–10x range, especially for codes, or portions of codes, that exhibit strong data parallelism. GPUs are already enabling real-world advances in HPC domains including the life sciences, financial services, oil and gas, product design and manufacturing, and digital content creation and distribution. GPUs are a particularly promising fit for molecular dynamics simulations, which extend across multiple application domains.
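To make the data-parallelism point concrete, the sketch below is a deliberately simplified, purely illustrative CUDA example in the spirit of molecular dynamics (it is not drawn from any production package): each GPU thread computes the force on one particle by summing a toy pairwise interaction over all the others. Real MD codes use genuine potentials and neighbor lists, but they exploit the same one-thread-per-particle mapping.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Toy sketch for illustration only: each GPU thread computes the net
// force on one particle by summing a simplified pairwise term over all
// the others. The one-thread-per-particle mapping is what makes such
// codes strongly data parallel.

struct Vec3 { float x, y, z; };

__global__ void forces(int n, const Vec3 *pos, Vec3 *f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Vec3 fi = {0.0f, 0.0f, 0.0f};
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[j].x - pos[i].x;
        float dy = pos[j].y - pos[i].y;
        float dz = pos[j].z - pos[i].z;
        float r2 = dx * dx + dy * dy + dz * dz + 1e-6f; // softened r^2
        float inv = rsqrtf(r2);
        float s = inv * inv * inv;  // 1/r^3 factor: inverse-square force
        fi.x += s * dx;
        fi.y += s * dy;
        fi.z += s * dz;
    }
    f[i] = fi;
}

int main()
{
    const int n = 4096;
    Vec3 *pos, *f;
    cudaMallocManaged((void **)&pos, n * sizeof(Vec3)); // unified memory
    cudaMallocManaged((void **)&f, n * sizeof(Vec3));
    for (int i = 0; i < n; ++i) {      // particles on a 16x16x16 grid
        pos[i].x = (float)(i % 16);
        pos[i].y = (float)((i / 16) % 16);
        pos[i].z = (float)(i / 256);
    }

    forces<<<(n + 255) / 256, 256>>>(n, pos, f);
    cudaDeviceSynchronize();
    printf("f[0] = (%g, %g, %g)\n", f[0].x, f[0].y, f[0].z);

    cudaFree(pos);
    cudaFree(f);
    return 0;
}
```

Because every particle's force can be computed independently, the GPU can keep hundreds of cores busy at once; this is the pattern behind the reported 3–10x speedups.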

Space and compute density. At a time when many HPC datacenters are approaching the limits of their power and space envelopes, GPUs promise to deliver very high peak compute density. A contemporary GPU may contain as many as 512 compute cores, compared with 4 to 16 cores for a contemporary CPU. Keep in mind, however, that heterogeneous computing is heterogeneous for a reason: each processor type, CPU or accelerator, is best at tackling a different portion of the problem-solving spectrum.

Energy costs. The rapid growth in HPC system sizes has caused energy requirements to skyrocket. Today's largest HPC datacenters consume as much electricity as a small city, and exascale datacenters promise to devour even more: an estimated 120 megawatts or more if implemented with existing technologies. The Department of Energy's exascale goal is to bring that number down to no more than 20 megawatts for an exascale system deployment. This target is meant to avoid massive increases in energy costs, to ensure that local utilities can supply adequate power, and to keep datacenter spatial requirements within reason. GPUs can be valuable partners with CPUs in heterogeneous computing configurations by significantly improving energy efficiency on the substantial subset of codes (and portions of codes) that exhibit strong data parallelism.

Adoption Barriers
As a relatively new technology — at least as used for computing — GPUs have encountered adoption barriers that IDC expects to ease over time. HPC buyers report the following main barriers to more extensive GPU deployment:

Ease of programming. Despite the availability of useful tools such as CUDA, OpenCL, and The Portland Group's directive-based compiler, which transforms Fortran or C source code into GPU-accelerated code, HPC buyers and end users generally report that programming GPUs remains more challenging than the familiar approaches to programming standard x86 processors. This barrier will likely continue to ease as familiarity with GPU programming grows, through the more than 450 universities offering GPU curricula today, and as the programming methods themselves advance.
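To illustrate the learning curve, the following minimal sketch (a generic example, not drawn from any particular HPC code) shows what even a one-line CPU loop becomes in CUDA: the programmer must allocate device memory, copy data across the bus, size a grid of threads, launch a kernel, and copy the results back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// On a CPU, this computation is a single loop:
//     for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
// The CUDA version below shows the extra steps a newcomer must learn:
// device allocation, host-to-device copies, explicit grid/block sizing,
// a kernel launch, and a copy back to the host.

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the tail
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *hx = (float *)malloc(bytes);
    float *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc((void **)&dx, bytes);
    cudaMalloc((void **)&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected 4.0)\n", hy[0]);

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}
```

None of these steps is conceptually deep, but together they represent the unfamiliar bookkeeping that users cite when they call GPU programming harder than conventional x86 development; directive-based compilers aim to generate most of this automatically.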

Mediated communication. Another issue frequently cited by HPC users is that GPUs today are typically implemented as coprocessors that must communicate with x86 or other base processors over PCI Express channels that are comparatively slow, at least when weighed against implementing the CPU and GPU on the same die. This mediated communication affects some applications more than others, and it has not prevented HPC users from achieving impressive time-to-solution speedups on a growing number of application codes.
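One common way programmers soften this bottleneck, shown in the illustrative sketch below (a standard technique, not one attributed to the users cited above), is to split the work into chunks and use pinned host memory with CUDA streams, so that one chunk's PCI Express transfer can overlap another chunk's computation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(int n, float a, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] *= a;
}

int main()
{
    const int n = 1 << 22;
    const int chunks = 4;
    const int cn = n / chunks;                 // elements per chunk
    const size_t cbytes = cn * sizeof(float);

    float *h, *d;
    cudaMallocHost((void **)&h, n * sizeof(float)); // pinned host memory
    cudaMalloc((void **)&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[chunks];
    for (int c = 0; c < chunks; ++c) cudaStreamCreate(&streams[c]);

    // Each chunk's copy-in, kernel, and copy-out are queued on its own
    // stream, so one chunk's PCIe transfer can overlap another's compute.
    for (int c = 0; c < chunks; ++c) {
        const int off = c * cn;
        cudaMemcpyAsync(d + off, h + off, cbytes,
                        cudaMemcpyHostToDevice, streams[c]);
        scale<<<(cn + 255) / 256, 256, 0, streams[c]>>>(cn, 2.0f, d + off);
        cudaMemcpyAsync(h + off, d + off, cbytes,
                        cudaMemcpyDeviceToHost, streams[c]);
    }
    cudaDeviceSynchronize();                   // wait for all streams
    printf("h[0] = %f (expected 2.0)\n", h[0]);

    for (int c = 0; c < chunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

Overlapping transfers in this way hides, rather than eliminates, the PCI Express latency, which is why the bottleneck matters more for codes whose data must cross the bus frequently relative to the computation performed on it.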

Waiting for future CPU generations. Some HPC users believe that waiting to see what improvements future-generation x86 processors deliver is a risk worth taking, compared with the effort of learning how to program GPUs and adapting portions of their codes to run on GPUs. And because GPUs are still relatively new devices for high-performance computing, some users worry that the substantial effort to rewrite their codes could be wasted if GPU architectures evolve in a new direction or if GPUs are not an enduring phenomenon in the HPC market. This wait-and-see group has been declining as GPUs have increased their influence in the global HPC market and as directive-based programming of GPUs has become more prevalent.

Trends
Heterogeneous computing, which today typically couples x86 processors with GPUs implemented as coprocessors, is an important new paradigm that is increasingly taking its place alongside the existing paradigm of pure-play x86-based HPC systems.

An important sign of GPUs’ momentum is the spread of GPU-related academic offerings. NVIDIA, which supplies educational materials for parallel programming, reports that its CUDA parallel programming language is being taught at 478 universities in 57 countries. The list includes MIT, Harvard, Stanford, Cambridge, Oxford, the Indian Institutes of Technology, National Taiwan University, and the Chinese Academy of Sciences.

For reasons stated earlier (refer back to the Benefits section), heterogeneous computing is proving especially attractive to large HPC sites that are pushing up against the boundaries of computational science and engineering, as well as energy and spatial boundaries. Hence, heterogeneous computing looks particularly attractive as a new paradigm for the exascale computing era that will begin later in this decade. Heterogeneous computing involving GPUs is also making inroads into smaller research sites and industrial organizations.

Keep in mind, however, that x86 processor technology is not standing still and promises to remain the HPC revenue leader through 2015. In addition, accelerator technology will be available from an increasing number of vendors and in a growing array of “flavors,” giving users more options.

Conclusion
Heterogeneous computing, which today typically couples x86-based processors with GPUs implemented as coprocessors, is an important emerging paradigm in the worldwide HPC market, especially as a strategy to help meet the daunting challenges of the coming exascale computing era. IDC believes that heterogeneous computing will be indispensable for achieving exascale computing in this decade.

GPUs have rapidly emerged from the experimental phase and are used today for production-oriented tasks such as seismic processing, biochemistry simulations, weather and climate modeling, computational finance, computational fluid dynamics, and data analysis. Their worldwide footprint at HPC sites has tripled in the past two years alone, they have become nearly indispensable for attaining prominence on the closely watched TOP500 supercomputers list, and they have enabled a growing number and variety of real-world achievements.

The adoption of heterogeneous processing involving GPUs by some of the world’s leading HPC sites indicates that this paradigm is moving beyond the experimental phase, and GPUs are increasingly being entrusted with appropriate portions of production-oriented, mission-critical workloads.

As GPU hardware and software technologies advance, as more university students and others learn how to exploit GPUs, and as more GPUs become available to the world’s most creative scientific, engineering, and computational minds, IDC believes GPUs will play an increasingly important role in the global HPC market, complementing x86 processors within the HPC ecosystem.
