AMD has finally announced the launch of the FirePro W9100, its first graphics accelerator based on the new “Hawaii” architecture. In addition to momentarily transporting us to the warmth of the Pacific Islands, this new GPU has the merit of outperforming its competitors in many ways, to the point that it may be the most energy-efficient to date. With a TDP of 275 Watts, it delivers up to 5.24 Tflops in single precision and 2.62 Tflops in double precision, 1.5 times the capacity of the latest NVIDIA Tesla K40 and an even larger multiple of that of the most powerful Xeon Phi. On the memory side, AMD again positions itself at the top (together with the 7100 series of the Phi) with no less than 16 GB of GDDR5 and a bandwidth of 320 GB/s.
According to preliminary DGEMM performance results issued by AMD, the W9100 scored 2.35 Tflops (i.e. 90% of its double-precision peak), an impressive ratio that will have to be confirmed in the forthcoming official rankings. These numbers were achieved on an Intel Xeon E5-2620v2 workstation equipped with two FirePro W9100s, for a total consumption of 650 W. The size of the tested matrices has not been specified, but we are told the clBLAS library used was an alpha version still being optimized by AMD, especially with regard to large arrays (over 3000 elements). Nevertheless, these results are about twice those provided by NVIDIA. With a $3,999 retail price, Jean-Christophe Baratault, AMD Sr. Business Development Manager, believes “the market’s perception will be that the W9100 delivers twice the DGEMM performance of the Tesla K40 for half the price.” The K40, in its active version, is currently priced around $9,000, for example on the HP website.
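The ratios behind these claims are easy to check. The short sketch below uses only the figures quoted above; the function and variable names are our own, and the Tesla K40 price is the approximate list price mentioned in the text, not an official one. No K40 DGEMM figure is given here, so none is assumed.

```python
# Figures quoted in the article, per W9100 card.
PEAK_DP_TFLOPS = 2.62     # theoretical double-precision peak
DGEMM_TFLOPS = 2.35       # AMD's preliminary DGEMM result
W9100_PRICE_USD = 3999    # announced retail price
K40_PRICE_USD = 9000      # approximate price of the active version


def dgemm_efficiency(measured_tflops, peak_tflops):
    """Fraction of the theoretical peak actually reached."""
    return measured_tflops / peak_tflops


def usd_per_tflops(price_usd, tflops):
    """Purchase cost per Tflops of sustained performance."""
    return price_usd / tflops


efficiency = dgemm_efficiency(DGEMM_TFLOPS, PEAK_DP_TFLOPS)
cost = usd_per_tflops(W9100_PRICE_USD, DGEMM_TFLOPS)
price_ratio = W9100_PRICE_USD / K40_PRICE_USD

print(f"DGEMM efficiency: {efficiency:.1%}")
print(f"Cost per DGEMM Tflops: ${cost:,.0f}")
print(f"W9100 price relative to K40: {price_ratio:.2f}")
```

The efficiency rounds to the 90% quoted by AMD, and the price ratio comes out slightly under one half, which is consistent with the "half the price" part of the quote.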
With a conservative estimate of 65% efficiency on Linpack, the extrapolated result would be 3.40 Tflops, or more than 5 Gflops/Watt, an equally remarkable achievement for AMD. (These figures apparently refer to the two-GPU workstation described above: 2 × 2.62 Tflops × 65% ≈ 3.40 Tflops, for a total consumption of 650 W.) For the record, the TSUBAME supercomputer at the Tokyo Institute of Technology is currently number 1 in the Green500 with 4.5 Gflops/W. In order to be ranked in the Green list, a system must be at least at the level of the Top500’s 500th machine, or 117 Tflops as of last November. An AMD site would therefore need to assemble only 80 of these W9100s to achieve a rankable configuration within a reasonable budget envelope.
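The extrapolation can be reproduced in a few lines. This is only a sketch under stated assumptions: we assume the 3.40 Tflops figure refers to the two-card, 650 W workstation from the DGEMM test (the arithmetic matches), that the host CPU's contribution is ignored, and that performance scales linearly with the number of cards. The names used are ours.

```python
import math

# Figures quoted in the article.
PEAK_DP_TFLOPS = 2.62        # double-precision peak per W9100
LINPACK_EFFICIENCY = 0.65    # the article's conservative estimate
CARDS_IN_WORKSTATION = 2     # assumption: same system as the DGEMM test
WORKSTATION_WATTS = 650      # total consumption of that system
TOP500_ENTRY_TFLOPS = 117    # 500th machine, as of last November


def linpack_tflops(cards):
    """Extrapolated Linpack result, assuming linear scaling across cards."""
    return cards * PEAK_DP_TFLOPS * LINPACK_EFFICIENCY


workstation_tflops = linpack_tflops(CARDS_IN_WORKSTATION)
gflops_per_watt = workstation_tflops * 1000 / WORKSTATION_WATTS

# Smallest number of cards whose extrapolated result reaches the entry level.
min_cards = math.ceil(TOP500_ENTRY_TFLOPS / linpack_tflops(1))

print(f"Workstation Linpack estimate: {workstation_tflops:.2f} Tflops")
print(f"Energy efficiency: {gflops_per_watt:.2f} Gflops/W")
print(f"Cards needed for {TOP500_ENTRY_TFLOPS} Tflops: {min_cards}")
```

Under these assumptions the strict minimum is a little below the 80 cards cited, which leaves some headroom for scaling losses in a real multi-node system.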
The programming environment is currently based on OpenCL 1.2 but is already compatible with version 2.0, which includes, among other things, a shared virtual memory address space, nested parallelism and support for atomics.
While the actively cooled workstation version, which can drive up to six 4K displays simultaneously, is on sale now, the passively cooled server version is expected for ISC14. It is probably fair to anticipate new efficiency ratios by then.
© HPC Today 2021 - All rights reserved.