Exclusive Interview: Steve Scott, CTO of Cray
March 10, 2016

The technologies that will help reach the Exascale era

In this exclusive interview, Steve Scott, CTO of supercomputer manufacturer Cray, shares his insight into which supercomputer subsystems need to improve for the HPC industry to reach the exascale era in the best possible conditions.

HPC Review: In your view, which areas need improvement to reach exascale computing?
Steve Scott: Current research focuses on several points. From a hierarchical standpoint, processors have been getting the most attention. Let's be clear: advances in semiconductors have slowed down because of the underlying physics of the transistors, and chips are increasingly constrained by power. The second, closely related issue is energy efficiency. So the challenge is to gain performance while keeping the wattage low.

HPCR: Which processor research directions are you exploring to gain performance?
SS: In terms of performance and efficiency, scalar processors are limited by their transistor count and power dissipation. There is room for improvement, however, in vector processing technologies, which were the big craze in the early days of supercomputing. Our original Cray-1 supercomputer was, in fact, built around vector processors.

Vector processors do more work with fewer transistors and at a lower power expenditure. Interestingly, vector instructions have found their way into Intel's Xeon processors through AVX (Advanced Vector Extensions), a set of vector instructions. The Xeon E5 line uses relatively simple execution units but relies on vector instructions to gain performance.
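
As an illustration of the idea (not Cray code), here is a minimal C sketch comparing a scalar loop with the same loop written with AVX intrinsics, which process eight single-precision elements per instruction; compile with a flag such as -mavx on GCC or Clang.

```c
/* Scalar vs. AVX vector addition -- illustrative sketch only. */
#include <immintrin.h>
#include <stdio.h>

#define N 1024

/* One add instruction per element. */
static void add_scalar(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* One 256-bit AVX add handles eight floats at a time. */
static void add_avx(const float *a, const float *b, float *c, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)          /* remainder loop */
        c[i] = a[i] + b[i];
}

int main(void) {
    static float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }
    add_scalar(a, b, c, N);
    add_avx(a, b, c, N);
    printf("c[10] = %.1f\n", c[10]);   /* expect 30.0 */
    return 0;
}
```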

However, serial threads are not getting faster, so further changes to processor architecture are needed. Cray builds compilers and tools to help developers take advantage of these architectural changes. The reason the landscape keeps evolving is that programmers have to deal with an ever-increasing number of threads. For instance, a clock rate of only 1 GHz already means a billion operations running in parallel on every clock tick, and that figure will roughly have to double to reach exascale.
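
To make the scale of that parallelism concrete, here is a minimal C sketch of the kind of hierarchical parallelism involved, assuming an MPI + OpenMP style of programming (common on large systems, though the interview does not name specific APIs): nodes via MPI, threads via OpenMP, and vectors via SIMD, all layered in a single toy reduction.

```c
/* Hierarchical parallelism sketch: nodes (MPI) x threads (OpenMP) x
 * vectors (SIMD). An exaflop (1e18 ops/s) at a 1 GHz clock implies
 * roughly a billion operations in flight on every clock tick, which
 * is why all three levels have to be used together. Illustrative only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000L

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Level 1: each node (MPI rank) owns a slice of the index space. */
    long lo = rank * N / size, hi = (rank + 1) * N / size;

    double local = 0.0;
    /* Levels 2 and 3: threads split the slice, and the inner loop is
     * kept simple enough for the compiler to issue vector instructions. */
    #pragma omp parallel for simd reduction(+:local)
    for (long i = lo; i < hi; i++)
        local += (double)i * 0.5;

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %.0f\n", total);

    MPI_Finalize();
    return 0;
}
```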

HPCR: How do you see memory architecture evolving?
SS: Another evolving trend is hierarchical parallelism, with vector parallelism, multiple threads and multiple nodes, which leaves programmers with a triple hierarchy to handle. And it isn't over yet, since we are adding additional layers to the hierarchy of memory.
In the past we added caches to the chip to gain performance; now we have multiple levels of cache memory. On top of that, two new levels are appearing. The fastest sits in the same package as the processor die itself. The second layer we foresee is a new kind of three-dimensional stacked memory, an emerging high-bandwidth memory standard that should become available this year.

These two new memory layers will provide an order of magnitude more memory bandwidth at an order of magnitude lower power consumption. Another layer that could enter the memory hierarchy is non-volatile memory, such as the 3D XPoint technology Intel has announced under the Optane brand. Other technologies are coming as well, such as ReRAM, which will offer larger and less expensive memory than existing DIMMs, although somewhat slower. This evolving memory hierarchy is extremely challenging for the system programmer. The layers of this vertical memory model differ in latency and bandwidth, and the question is how data moves between them. It is such a new area that there is not yet much experience with it, which makes it a big opportunity for programmers. The biggest challenge is deciding where and when to move data. Getting that right will have a major impact on performance, because the capacity of the fast layers is still limited, so developers will need to figure out how to slice their problems so that a sizable portion of the working set sits in fast memory.
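
To make the "slice the problem" idea concrete, here is a minimal C sketch, assuming the open-source memkind library's hbw_malloc interface for on-package high-bandwidth memory (not a Cray-specific API): a large array lives in ordinary DDR, while the bandwidth-critical working tile is staged into fast memory one block at a time.

```c
/* Staging a large DDR-resident array through a small high-bandwidth
 * memory (HBM) tile -- illustrative sketch using the memkind library. */
#include <hbwmalloc.h>   /* hbw_malloc / hbw_free; link with -lmemkind */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define N    (1L << 24)  /* large array in regular DDR      */
#define TILE (1L << 18)  /* working set sized to fit in HBM */

int main(void) {
    double *big  = malloc(N * sizeof *big);            /* capacity layer  */
    double *tile = hbw_check_available() == 0          /* bandwidth layer */
                 ? hbw_malloc(TILE * sizeof *tile)
                 : malloc(TILE * sizeof *tile);        /* fall back to DDR */
    if (!big || !tile) return 1;

    for (long i = 0; i < N; i++) big[i] = (double)i;

    double sum = 0.0;
    for (long base = 0; base < N; base += TILE) {
        long n = (N - base < TILE) ? N - base : TILE;
        /* Stage one slice into fast memory, then do the (here trivial)
         * bandwidth-hungry work out of that slice. */
        memcpy(tile, big + base, n * sizeof *tile);
        for (long i = 0; i < n; i++) sum += tile[i] * 2.0;
    }
    printf("sum = %.0f\n", sum);
    return 0;
}
```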

HPCR: Is there a performance difference between ARM and x86 architectures?
SS: There is no significant performance-efficiency advantage of ARM over the x86 architecture. ARM processors designed for mobile devices are inherently more power efficient, but ARM's server-class processors consume about the same as x86 CPUs.

HPCR: Do you see GPU use growing in the coming years?
SS: GPUs will definitely continue to play a role, and their performance and efficiency keep growing, but the majority of use cases still do not take advantage of them. Nvidia has made GPUs easier to use and program with CUDA, but it remains a different programming model, and that makes some programmers shy away. So in the coming years I believe GPUs will somewhat expand their presence in HPC, but they will not take over overnight. The main benefit is that specific parts of the code can be optimized to run on the GPU, as the sketch below illustrates. GPUs are not getting faster so much as they are getting simpler to use. Most of the processing will, however, continue to be done by the CPU, since Moore's law is still in operation. Our preliminary tests with Intel's forthcoming Xeon Phi, Knights Landing, are very positive on our internal systems, and we have already sold a couple of large systems based on this processor. It also integrates the interconnect on each node and pairs very high-bandwidth on-package memory with external DIMMs, whose bandwidth is not growing as fast.
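
As an illustration of offloading only a specific portion of the code to an accelerator, here is a minimal C sketch using OpenACC directives, a directive-based alternative to the CUDA model mentioned above; the code is an assumption for illustration, not Cray's.

```c
/* Offloading one hot loop to an accelerator with OpenACC directives.
 * Illustrative sketch only; compile with an OpenACC-capable compiler. */
#include <stdio.h>

#define N 1000000

int main(void) {
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 1.0f; }

    const float a = 2.5f;
    /* Only this loop runs on the GPU; the rest of the program
     * stays on the CPU. Data movement is described declaratively. */
    #pragma acc parallel loop copyin(x[0:N]) copy(y[0:N])
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[10] = %.1f\n", y[10]);   /* expect 26.0 */
    return 0;
}
```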

HPCR: How do you see interconnect technologies evolving?
SS: The interconnect between nodes is very important for a supercomputer to get the best performance out of its subsystems. Cray pioneered a network topology called Dragonfly, which lets any two nodes communicate in a small number of hops (a small network diameter). It is very tolerant of job placement and uses a mix of optical and electrical signaling. It is well suited to interconnecting hundreds of cabinets and their processors, and it makes the network both performant and cost effective. Intel's Omni-Path 2 network will also use a mix of optical and electrical cabling and will be similar in nature. Regarding the need for evolution, there will probably be one more interconnect iteration before we reach exascale capability. Cray is working with Argonne National Laboratory on the Aurora supercomputer, part of the CORAL procurement, which will use Omni-Path 2 in 2018.
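
For readers unfamiliar with the topology, the sketch below applies the sizing formula from the published Dragonfly design (routers with p terminal ports, a routers per group, h global links per router, and at most a*h + 1 groups); the parameter values are illustrative, not those of any particular Cray machine.

```c
/* Maximum size of a balanced Dragonfly network, following the published
 * sizing formula. Parameter values below are illustrative only. */
#include <stdio.h>

int main(void) {
    int p = 4;   /* terminal (node) ports per router      */
    int a = 8;   /* routers per group                     */
    int h = 4;   /* global (inter-group) links per router */

    int groups = a * h + 1;            /* maximum number of groups   */
    long nodes = (long)p * a * groups; /* total nodes in the system  */

    /* Minimal routes are local hop, global hop, local hop at most,
     * which is what keeps the network diameter small. */
    printf("groups = %d, nodes = %ld, diameter <= 3 router hops\n",
           groups, nodes);
    return 0;
}
```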

We believe the state of system architecture in 2018-2019 should make exascale reachable. However, before those technologies come out of the R&D labs, new subsystems will need to be introduced, and we may not reach exascale at a broader level before 2023, according to the US government. The main reason is that around 2020 exascale will still be very expensive and power hungry. By then, systems will use the next iteration of Xeon Phi, code-named Knights Hill, but we reasonably expect that two more iterations of this processor will be needed to reach the exascale level.

HPCR: Can you tell us more about the state of research regarding storage?
SS: Cray has been working extensively on storage technologies. Our storage systems use the Lustre file system, which is built for efficient scalability and scales very modularly and linearly in performance, delivering as we speak up to 1.8 terabytes per second of throughput.

However, the storage world is being reshaped by non-volatile memory and flash. Our DataWarp appliances are flash blades sitting inside the XC supercomputers, providing extremely high-bandwidth storage for throughput-intensive workloads. We are currently investigating new technologies based on object storage for cost-effective handling of large volumes of data, and we are also looking into integrating new phase-change memory.

The objective is to improve scalability and resilience on the way to exascale. Storage has a resilience problem that becomes more challenging as individual devices move to smaller transistors. On flash we have good techniques, such as very fast system checkpointing, which should scale up to exascale and sharply reduce the risk of the undetected hardware errors that can occur with traditional storage. This would be complemented by software self-checking techniques to make sure errors have not crept in. So software architecture for resilience at very large scale is a big focus.
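
As a toy illustration of that kind of software self-check (a sketch of the general technique, not Cray's implementation), the C program below writes a checkpoint together with a simple checksum and verifies the checksum when the checkpoint is read back, so silent corruption is detected rather than ignored.

```c
/* Checkpoint a buffer with an accompanying checksum and verify it on
 * restore -- a toy stand-in for software self-checking at scale. */
#include <stdio.h>
#include <stdint.h>

static uint64_t checksum(const double *buf, size_t n) {
    /* Simple additive/rolling checksum over the raw bytes (illustrative
     * only; a real system would use a stronger code such as CRC32C). */
    const uint8_t *p = (const uint8_t *)buf;
    uint64_t sum = 0;
    for (size_t i = 0; i < n * sizeof *buf; i++)
        sum = sum * 31 + p[i];
    return sum;
}

static int save(const char *path, const double *buf, size_t n) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    uint64_t sum = checksum(buf, n);
    fwrite(&sum, sizeof sum, 1, f);   /* header: checksum   */
    fwrite(buf, sizeof *buf, n, f);   /* payload: the state */
    fclose(f);
    return 0;
}

static int restore(const char *path, double *buf, size_t n) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    uint64_t stored = 0;
    if (fread(&stored, sizeof stored, 1, f) != 1 ||
        fread(buf, sizeof *buf, n, f) != n) { fclose(f); return -1; }
    fclose(f);
    /* Recompute and compare: a mismatch means silent corruption. */
    return checksum(buf, n) == stored ? 0 : -2;
}

int main(void) {
    enum { N = 1024 };
    static double state[N], back[N];
    for (int i = 0; i < N; i++) state[i] = i * 0.1;

    if (save("ckpt.bin", state, N) != 0) return 1;
    int rc = restore("ckpt.bin", back, N);
    printf(rc == 0 ? "checkpoint verified\n" : "checkpoint corrupted\n");
    return rc == 0 ? 0 : 1;
}
```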

HPCR: Are you pursuing any technologies integrating machine learning?
SS: Cray has pursued high-performance analytics and machine learning, both as a workload our customers run on Cray systems and for predictive analysis of the systems themselves. The objective is to diagnose performance and failure issues by proactively informing administrators. One thing we believe is that good HPC technology is good technology for data analytics.

We achieve that with robust I/O, so we can move data efficiently between CPUs, and by mixing traditional simulation and modeling with data-analytics techniques that reduce the amount of data to analyze. This data reduction relies on methods such as principal component analysis to retain only the important part of the data, which yields better efficiency. Several forms of machine learning are highly interesting, including unsupervised machine learning, which looks for patterns in data even when you don't know what to look for. Deep neural networks are another area Cray is currently investigating.
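
To make the principal component analysis point concrete, here is a minimal C sketch (illustrative data, not a Cray tool) that centers a tiny data set, builds its covariance matrix, and uses power iteration to extract the leading principal component, the direction that retains most of the variance.

```c
/* Leading principal component of a small data matrix via power
 * iteration -- illustrative sketch only. */
#include <stdio.h>
#include <math.h>

#define N 4   /* samples  */
#define D 3   /* features */

int main(void) {
    double x[N][D] = { {2.5, 2.4, 0.5}, {0.5, 0.7, 1.9},
                       {2.2, 2.9, 0.4}, {1.9, 2.2, 0.8} };
    double mean[D] = {0}, cov[D][D] = {{0}}, v[D] = {1, 1, 1};

    /* Center each feature. */
    for (int j = 0; j < D; j++) {
        for (int i = 0; i < N; i++) mean[j] += x[i][j] / N;
        for (int i = 0; i < N; i++) x[i][j] -= mean[j];
    }
    /* Sample covariance matrix. */
    for (int a = 0; a < D; a++)
        for (int b = 0; b < D; b++)
            for (int i = 0; i < N; i++)
                cov[a][b] += x[i][a] * x[i][b] / (N - 1);
    /* Power iteration converges to the dominant eigenvector of the
     * covariance matrix, i.e. the first principal component. */
    for (int it = 0; it < 100; it++) {
        double w[D] = {0}, norm = 0;
        for (int a = 0; a < D; a++)
            for (int b = 0; b < D; b++) w[a] += cov[a][b] * v[b];
        for (int a = 0; a < D; a++) norm += w[a] * w[a];
        norm = sqrt(norm);
        for (int a = 0; a < D; a++) v[a] = w[a] / norm;
    }
    printf("first principal component: %.3f %.3f %.3f\n", v[0], v[1], v[2]);
    return 0;
}
```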

HPCR: How do you see businesses' needs for HPC growing?
SS: Many of our large existing customers in the commercial and enterprise markets have data-analytics-heavy problems to solve, notably in the oil and gas and risk-analytics sectors. Cray works with UC Berkeley's AMPLab, the group that created the Spark technology, to solve scientific problems using its algorithms.

Cybersecurity is another area of tremendous use. Graph analytics is very difficult, but Cray has brought it together with hardware and software to improve the detection of cyber threats and insider threats. This area has become a priority because everyone is very worried about growing cyber threats, with major corporations and governments under constant attack. These technologies will also help monitor network activity and detect anomalies.

HPCR: Can you tell us more about enterprise usage of HPC?
SS: At the moment, 15% of Cray's revenue comes from the private enterprise sector; the rest comes from our historical markets, government and academia. The enterprise sectors that need HPC are energy, finance, manufacturing, life sciences and cybersecurity. HPC answers these enterprises' needs for scalability and productivity to achieve sustained performance, while helping capacity workloads move into the cloud. To that end, Cray systems have become more interoperable and support Docker containers. However, one should keep in mind that the workloads people run on Cray systems do NOT move to the cloud, because the cloud is not built to provide that level of performance. The biggest growth comes from big data and analytics workloads, but the largest systems are still doing simulation and modeling.

Other areas needing high-performance computing for large volumes of data include upstream exploration processing and production optimization, such as deep-water drilling with very large undersea sonar arrays that generate an incredible amount of complex data. Financial services rely on data analytics to improve fraud detection and perform risk analysis. In manufacturing, HPC is heavily used for computational fluid dynamics, as in aircraft and car aerodynamics, as well as crash testing and noise, vibration and harshness testing.
