Heterogeneous Computing: A New Paradigm for the Exascale Era

By Steve Conway | December 08, 2015

Steve Conway, IDC Research Vice President, HPC

The worldwide high-performance computing (HPC) market is already more than seven years into the petascale era (June 2008–present) and is looking to make the thousandfold leap into the exascale era before the end of this decade. This pursuit is global in scope. IDC expects the United States, the European Union, Japan, China, and Russia to vie with each other to reap exascale computing’s anticipated substantial benefits for scientific advancement, industrial-economic competitiveness, and the quality of human life.

But as many HPC experts have noted, achieving reasonable exascale performance in this compressed time frame presents an array of daunting challenges that cannot be met solely through evolutionary extrapolation from existing technologies and approaches. These challenges include, but are not limited to, the following:

System costs (flops/dollar). Twenty years ago, the world’s leading HPC sites spent $25 million to $30 million for the most powerful supercomputers available. Today’s single-digit petaflops supercomputers often cost over $100 million. Early exaflop systems could cost $500 million to $1 billion each. This cost escalation will be difficult to sustain. Anything that can increase the flops/dollar ratio will be welcome.

Application performance (time/solution). This perennial challenge grows continually as HPC users seek to scale their applications to new, larger systems. With clock rates stalled, future performance gains must come almost entirely from increased parallelism, resulting in tremendous concurrency requirements for exascale computing. An exaflop is 10^18 operations per second, so a machine clocked at 1 GHz (10^9 ticks per second) would need to perform a billion independent operations every clock tick. Over time, many large science problems will be able to scale to this level. Other problems will lack the required concurrency for single runs but may make use of extreme-scale systems to run ensemble calculations. Automotive design engineers, for example, have greatly increased the number of parametric runs, along with the resolution of each run, that can occur in their allotted phase of the design cycle.

Space and compute density requirements (flops/square foot). A worldwide IDC study revealed that most HPC sites are struggling mightily with datacenter space limitations. Two-thirds of the sites were planning to expand or build new HPC datacenters. Half of the sites planned, or had already begun, to distribute their HPC resources to multiple locations.

Energy costs for computation and data movement (flops/watt, bytes/watt). Last but not least, power has become both a significant design constraint and a major contributor to cost of ownership. With voltage scaling slowing dramatically, power is no longer holding constant as we grow the transistor count with Moore’s law, resulting in processor designs that are power constrained today and becoming more so with each new IC generation. Performance in this era is determined largely by power efficiency, so the great challenge in system design is making processors and data movement more energy efficient without overly compromising performance. The rapid growth in HPC system sizes has elevated energy requirements. Today’s largest HPC datacenters consume as much electricity as a small city, and multi-petascale and exascale datacenters promise to devour even more. Energy prices have risen substantially above historic levels, although prices have moderated from their 2008 highs. Another element in this “perfect storm” is that HPC datacenter power and cooling developments are occurring at a time of growing sensitivity toward carbon footprints and global climate change. Finally, some of the biggest HPC datacenters worry that their local power companies may balk at fully supplying their future demands. One such site, already seeing the need for a 250-megawatt datacenter, may have to go off the power grid and build a small nuclear reactor.
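
The concurrency and flops/dollar figures in the list above can be checked with back-of-envelope arithmetic. The short Python sketch below is illustrative only: the function names are hypothetical, the exascale prices and the 1 GHz clock come from the text, and the petascale reference point (a 1-petaflops system at exactly $100 million) is an assumption chosen to stand in for "single-digit petaflops" systems costing "over $100 million."

```python
# Back-of-envelope arithmetic for the metrics in the list above.
# Inputs are the article's round numbers unless marked ASSUMPTION.

EXAFLOPS = 10**18   # operations per second at exascale
PETAFLOPS = 10**15  # operations per second at petascale


def ops_per_tick(ops_per_second, clock_hz):
    """Independent operations that must complete every clock cycle."""
    return ops_per_second // clock_hz


def flops_per_dollar(flops, cost_dollars):
    """Peak operations per second bought by each dollar of system cost."""
    return flops / cost_dollars


# Concurrency: an exaflop machine clocked at 1 GHz.
concurrency = ops_per_tick(EXAFLOPS, 10**9)

# Cost: early exaflop systems at $500 million to $1 billion.
exa_best = flops_per_dollar(EXAFLOPS, 500 * 10**6)
exa_worst = flops_per_dollar(EXAFLOPS, 10**9)

# ASSUMPTION: a hypothetical 1-petaflops system priced at exactly
# $100 million serves as the present-day reference point.
peta_ref = flops_per_dollar(PETAFLOPS, 100 * 10**6)

print(f"operations per tick at 1 GHz: {concurrency:,}")
print(f"exascale flops/dollar: {exa_worst:.1e} to {exa_best:.1e}")
print(f"petascale reference flops/dollar: {peta_ref:.1e}")
```

Under that assumed reference point, flops/dollar would still improve by roughly two orders of magnitude at exascale. The difficulty the text describes is the absolute price of each system, not the ratio.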

The Heterogeneous Computing Paradigm
During the past decade, clusters leveraging the economies of scale of x86 processors became the dominant species of HPC systems, doubling the size of the global HPC server market from about $5 billion in the early 2000s to $9.5 billion in 2010. The reigning paradigm has been to advance peak performance by deploying larger and larger clusters containing more and more standard x86 CPU cores.

But x86 processors were never designed to handle all HPC applications well, and x86 single-threaded performance started to hit a heat and power wall half a dozen years ago. It is becoming increasingly clear that although x86 processor road maps lay out substantial advances, the paradigm of sole dependency on x86 processors will not suffice to meet the challenges associated with achieving exascale computing in this decade.

In recent years, an alternative paradigm, “heterogeneous computing,” has gained market momentum for addressing these challenges. This emerging paradigm augments x86 CPUs with accelerators, primarily GPGPUs (henceforth to be called GPUs), so that each processor type can do what it does best. GPUs are particularly adept at handling the substantial number of codes, and portions of codes, that exhibit strong data or thread-level parallelism. That makes GPUs the heirs apparent to vector processors, except that GPUs benefit from far greater economies of scale and related competitive advantages. IDC research shows that the worldwide PC market for discrete graphics processing units alone was worth about $4 billion in 2010.

The heterogeneous computing paradigm is ramping up nicely across the HPC market as a whole. IDC’s 2008 worldwide study on HPC processors revealed that 9% of HPC sites were using some form of accelerator technology alongside CPUs in their installed systems. Fast-forward to the 2010 version of the same global study and the scene has changed considerably. Accelerator technology has gone forth and multiplied. By this time, 28% of the HPC sites were using accelerator technology — a threefold increase from two years earlier — and nearly all of these accelerators were GPUs. Although GPUs represent only about 5% of the processor counts in heterogeneous systems, their numbers are growing rapidly.
