2016 Predictions for Supercomputing: Many-Core Processors

By Dr Sebastian von Alfthan | February 10, 2016

Dr Sebastian von Alfthan is a senior HPC specialist at CSC with a talent for deep optimization of codes for extremely high performance and for exploring novel processor architectures. He has also worked at the Finnish Meteorological Institute developing the space plasma code Vlasiator.

Many-core processors making an impact

During the last twenty to thirty years the performance of processors has increased exponentially, first by increasing the clock frequency and the number of instructions per clock (IPC), and in the last ten years through increased parallelism. This trend has led to multicore processors with more than ten cores, each able to execute up to 16 double-precision floating-point operations per cycle, and to GPUs with thousands of very lightweight cores able to run tens of thousands of simultaneous threads.

Intel introduces a major new architecture in 2016: the latest-generation Many Integrated Core (MIC) Xeon Phi processor, Knights Landing (KNL). This processor is not an accelerator but an x86 CPU that is fully compatible with normal x86 processors. What sets it apart is that it is a thoroughbred HPC processor delivering a total of 3 Tflops of performance per socket. It achieves this high performance through a number of new technologies, which in the future will also become commonplace in normal processors:

A very high core count. KNL contains 72 compute cores connected by a 2D mesh interconnect. This is a true “cluster on a chip”.

New AVX-512 vector instructions that operate on 8 double-precision numbers at a time, enabling each core to execute 32 double-precision floating-point operations per cycle.

A new level in the memory hierarchy in the form of high-bandwidth memory sitting on the socket. This memory is 16 GB in size and will have 5x the bandwidth of normal DDR4 main memory, while its latency is comparable to that of normal main memory.

Integrated on-socket network interface for Omni-Path.

I predict that KNL in itself will become a successful processor architecture for HPC. I also predict that one should look at KNL as a proxy for what the future will bring. Tuning applications for KNL now, by implementing a hybrid MPI + OpenMP parallelization that scales to tens of threads per MPI process and by enabling the core loops to vectorize well, will pay off on any architecture. The new memory level is also something that algorithms should be able to exploit: the new flop-monsters need to be fed, and optimizing memory traffic will become ever more important.