Verbatim: Dan Stanzione, Acting Director – TACC
June 03, 2014

On the agenda:
• Growing from a local to a leadership supercomputing center: the history
• TACC’s Big Data ambition
• Lessons learned from Stampede’s Xeon Phi architecture
• A vision of the next scale machines

Interview by Stéphane Bihan and Stacie Loving

Mr Stanzione, in just about ten years, TACC has grown from a modest local computing center into a leading, world-class supercomputing center. How has this been possible? And more generally, what is the history of TACC?

TACC started in June 2001, almost 13 years ago. Since that time, we’ve obviously grown a lot. We have ten times more staff and about a million times more computing power. It’s been a long, hard process to get to this size, but we’ve been fortunate, especially since sponsoring agencies increased their funding for HPC in 2006. They decided to invest more money in larger systems.

For the most part, we are sponsored by the National Science Foundation (NSF), which is where most of the funding for university supercomputers comes from. The largest source of funding in the US is through the DoE (Department of Energy), via the National Labs, but on the academic side, it’s the NSF. In 2006, they initiated a program to advance extreme computing. We had a very ambitious plan for that program, and that is where we had been making progress up to then. But the point that really propelled us to the forefront was when we won funding for the Ranger system. The proposal was in 2006, we deployed the system in 2007, and it went into production in 2008. At that time, Ranger was the fourth fastest system in the world and the largest open science system. That really put us on the map as a leadership center. We had the drive and a very aggressive plan, and we found the right partners and vendors to work with, vendors who were in the process of deploying new technologies that would give us an advantage. Hard work and ambition combined with a bit of luck and opportunity, that’s the TACC story.

The most important thing is our ambition and willingness to come up with good and innovative ideas. We worked closely with the vendors to use that funding and to get the most we possibly could from it.

Why did NSF choose TACC?

It was a competitive proposal process. Our proposal was deemed the best for a number of reasons, one being that we found the right partners at the time, using AMD processors. This really gave us an advantage. Sun was also a decisive partner in providing High Performance Computing resources. We worked with Andy Bechtolsheim at Sun to create a new platform. We also decided to take a few risks, along with the vendor, to put together an aggressive plan. It was not only the system but also the support services that we could provide – i.e. the type of training and our track record in dealing with users – that would take us to the next level.

After the departure of Jay Boisseau last January, you took over as acting director of TACC. What are your plans for the center?

I had been the Deputy Director here for five years before taking over from Jay in January. Jay had a lot to do with all of that growth; he was really the driving force here during the twelve-plus years he ran the center. Fortunately, we’re in a very good position to move forward and there are several opportunities that we can take advantage of.

We have to establish our leadership in the High Performance Computing arena with the NSF. There is an opportunity to expand the range of what we do, and we’re already starting to enter into more data-driven science and data-intensive computing. Our next system, Wrangler, will be built with a primary focus on data-intensive applications. It’s scheduled for production in January 2015 and will support a wider range of scientific workflows.

We’re also looking to diversify in terms of working with other agencies. In particular, we are becoming increasingly involved in micro-sciences research and health care. We’ve been working hard to comply with all the various rules and regulations regarding personal health information, which require varying degrees of privacy, while also handling open scientific data with the DoE. So we will be pursuing opportunities with new agencies and new technologies around data.

TACC is mainly funded by NSF while ORNL, for instance, is mainly funded by the DoE. What are the main differences in terms of mission between centers like ORNL and TACC? Is it about different sciences?

It’s partially about different sciences and partially about different missions. The DoE is a very specific type of agency with several different mandates, but its primary mission is to secure and improve energy sources. There are actually two kinds of centers in the DoE: one is the NNSA (National Nuclear Security Administration), which runs Los Alamos and Sandia, in addition to the other energy labs. They mainly focus on new sources of energy. They work on a range of other sciences too, but their first priority is energy. On the other hand, the NSF has a broader goal to promote science, engineering, and technology in general. It funds infrastructure to support not only the NSF’s initiatives but also those of other agencies in the US for health, food production, etc. We focus mostly on the university user community and academics, while the DoE labs mainly focus on laboratory users and work within the scope of the DoE’s mission.

That being said, the DoE has made a significant commitment to High Performance Computing through the years. They have built some of the largest systems in the world but they aren’t traditionally “open”. The DoE has very large scale systems that support a few dozen user groups while we support a few thousand user groups around the country and, through partners, around the world. We take on Open Science projects that will be published for the general public just about anywhere.

Can you briefly describe the organization of TACC?

We have eight different areas or departments. The Advanced Systems group focuses on the actual operation and deployment of the hardware, and keeping our systems up and running. They’re supported by the High Performance Computing group, which concentrates on the application stack, scalable algorithms, and answering questions like “How do we get the code to run on 500,000 cores?” or “How do we get the code to run on the Xeon Phi?”. They’re mostly PhDs and computational scientists. We also have the Visualization area that focuses on the data we produce, how we visualize it, and how we interpret it in both scientific and informational visualizations. They also look at the human-computer interface side of how people interact with the data. Moving up the application stack, we have an Advanced Computing Interfaces group, which looks at web portals, programmer interfaces and APIs, specifically ways that allow people to move away from the command line when using high performance computing. These are our four technology areas.
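To give a flavor of the kind of porting question the High Performance Computing group deals with, here is a minimal sketch – not TACC’s code, with illustrative array names and sizes – of offloading an OpenMP loop to a Xeon Phi coprocessor using the Intel compiler’s offload pragma of the Stampede era:

    /* Minimal sketch: offload a parallel loop to a Xeon Phi (MIC) coprocessor.
       Compile with the Intel compiler of that era (e.g. icc -openmp) on a
       system with the MIC offload runtime; compilers that do not recognize
       the offload pragma simply ignore it and run the loop on the host. */
    #include <stdio.h>

    #define N 1000000   /* illustrative problem size */

    int main(void)
    {
        static double a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = (double)i; b[i] = 2.0 * i; }

        /* Copy a and b to the coprocessor, run the loop there in parallel,
           and copy c back. */
        #pragma offload target(mic) in(a, b) out(c)
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[%d] = %f\n", N - 1, c[N - 1]);
        return 0;
    }

The real work in such ports is deciding which loops are worth offloading and how data movement is amortized, rather than the pragma itself.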

Due to the increased focus of many new projects on life sciences and different modes of computing – an area that doesn’t have the same traditional C++ and FORTRAN users accustomed to writing scalable code – we created the Life Sciences Computing group, focused solely on these new users and methodologies. This group maintains an extensive set of applications to support phylogenetics, computational chemistry, genomics, genetics, and bioinformatics. We also have the User Services area that manages training, allocations, education, and project management. And finally there are our Operations and Administration centers, which basically cover everything else.
