Meet the Green500 winning team
The top rank was awarded to the ASUS supercomputing cluster powering the GSI research facility in Darmstadt, Germany. Here is the story of this achievement and the people behind it. We interviewed Dr. David Rohr, who manages the cluster along with Prof. Dr. Lindenstruth, head of IT.
What does GSI do? In which fields does GSI work? What contributions does GSI bring to the public?
GSI is a research facility for heavy ion physics in Darmstadt, Germany. GSI performs fundamental research in several fields, which include the discovery and exploration of heavy elements, the operation of a particle accelerator, and high energy physics experiments. Recently, GSI also played an important role in the development of cancer therapy with ion beams. One method in cancer therapy is radiation therapy, which traditionally uses photons, that is, electromagnetic waves. This delivers a lot of radiation not only to the tumor but also to its surroundings, in particular to the tissue and skin in front of the tumor, thus destroying healthy tissue as well. By exploiting the so-called “Bragg peak”, you can shoot below the skin and hit the tumor directly, minimizing collateral damage. This was developed at GSI and is in operation and saving lives today.
How did you find working with the ASUS Sales/PM/R&D teams on such a project? Why did GSI choose the combination of AMD cards and ASUS servers?
AMD FirePro GPUs provide enormous compute performance and excellent power efficiency. As for ASUS servers, we have had good experiences with them: they are reliable and stable. Also, the ASUS R&D team provides feedback right away. With GSI located in Germany, AMD in Toronto, and ASUS in Taiwan, all in different time zones, coordination is difficult, but everyone did an excellent job and we were able to solve all problems quickly and in time.
Did you encounter any difficulties when building the supercomputer? How did you solve those problems? Did you find ASUS dedicated to solving issues, assuming any were encountered during the project?
Of course, it is difficult to build a supercomputer. We didn’t choose an integrated “turnkey” solution. Instead, we chose all the components and performed the assembly and commissioning ourselves. So it is normal to have issues when so many components such as CPUs, memory, and GPUs all come together. Both ASUS and AMD did a great job identifying and solving all issues. Given the sheer number of servers, processors, GPUs, and memory modules, it is natural that we see some broken hardware, and it is a lengthy process to find and fix every source of errors. It is thanks to the dedicated work of everyone involved that we achieved a stable system so fast.
Are you satisfied with ASUS’s technical support? Do they respond in good time to your problem-solving requests?
We are quite satisfied with the support we got. The ASUS R&D staff are friendly and usually provided feedback very quickly, within 24 hours. This was a great help during construction of the cluster.
What is your most impressive experience to date while running or stressing the systems?
This is probably the fact that everything just worked at the moment when we plugged all the pieces together. It was a very difficult situation at the beginning. It was a very long process; you fix one issue, then another, and another. We were really very far behind schedule. At the beginning we had issues with the BIOS, drivers, boot problems, network issues, and bugs in our Linpack implementation. In parallel, we were trying to tune the hardware via voltage and frequency adjustments. But then we put all the parts together into the 160-node system, and it was working within just one week.
What is your overall impression of the world’s number-one Green500 supercomputer? How does it work for and contribute to your organization?
It is good to have one common benchmark to compare all supercomputers. And the Green500 list is great because it raises awareness of the important fact that we, both vendors and users, have to improve the energy efficiency of our compute centers. Of course, we should build supercomputers for our real applications and not aim exclusively at the Green500 list. We have been very lucky that the requirements for our Lattice QCD application and the requirements for superior energy efficiency are quite similar. And we are delighted to have reached the number-one spot on the Green500 list, having constructed the most power-efficient supercomputer in the world.
What are your next plans for the supercomputer? Any plans after that?
We have some plans. With respect to Linpack, we have maxed out quad-GPU systems; there is almost nothing left to tune, maybe 2%. But the more GPUs we can put in a server, the more money we can save on CPUs. We can also become more power efficient. At the moment we are experimenting with an eight-GPU setup. Besides that, we have many other high performance computing projects. Recently, we installed a new cluster at CERN in Geneva, where we are responsible for the High Level Trigger of the ALICE experiment. And GSI is currently in the process of building a large new high energy physics facility called FAIR. FAIR will require enormous compute resources for simulation and for event reconstruction. Starting next year, we will certainly purchase a large number of servers, and also GPUs as accelerators, for FAIR.
Will you involve the ASUS Server division in your future HPC project plans or any datacenter project in your facility?
We are a public research institute, which means our purchasing works via public tenders. We define our requirements, evaluate different solutions on the market, and in the end the most cost-efficient solution wins the contract. We found that ASUS servers are quite competitive for the projects we had, and I see no reason why this should change in the future.
The Green500 winning team from ASUS, AMD and GSI after being awarded the top position.
Smiling and proud!
© HPC Today 2021 - All rights reserved.