K40, CUDA 6: NVIDIA focuses on performance

By The Editorial Team | December 27, 2013

It was SC 2013’s big announcement for the GPU acceleration specialist. Thanks to the continuous improvement of manufacturing processes, the Tesla K40 replaces the Tesla K20X at the top of the Kepler architecture with double the memory (12 GB of GDDR5), a number of active cores brought to 2880 (instead of 2688), a slightly increased frequency (745 MHz instead of 735) and a doubled PCIe gen3 connection – all in the same 235 W TDP. As a result, official peak SP / DP performance now reaches 4.29 / 1.43 Tflops. According to various benchmarks presented by the manufacturer, the measurable speedups lie between 20 and 40% – the highest results being obtained on the most memory-hungry applications (CFD, seismic…). NVIDIA also announced a new, developer-controllable “GPU boost” mode designed to temporarily increase the frequency of all cores simultaneously (up to 875 MHz) should the application really need it.

CUDA 6 is in da place

Engagement before the sacred union? The new version of CUDA, also announced at SC, unifies memory logically, to the delight of developers who will no longer have to explicitly manage data transfers between the two distinct CPU and GPU memories. Nothing to do, therefore, with the physical unification of both spaces, which would definitively remove ze bottleneck, but this Nirvana remains on the roadmaps. Beyond Unified Memory, CUDA 6 also provides drop-in libraries (automatic replacement of the CPU versions of BLAS and FFTW with their GPU equivalents) as well as the multi-GPU optimization of these libraries (up to 8 GPUs per node) and the support of workloads up to 512 GB. An important release in other words, to which we’ll devote an in-depth Programming feature (together with the HPC Labs) very soon.

More around this topic...