An inside look at the UberCloud HPC Experiment
By Wolfgang Gentzsch and Burak Yenier  |  March 13, 2014

Team 46 – CAE simulation of water flow around a ship hull. The picture shows the pressure distribution on the hull surface.

5 – The UberCloud HPC Experiment accelerates HPC in the cloud

In theory, cloud computing and its enabling technologies (virtualization, web access platforms with integrated toolboxes, solution stacks accessible on demand, automatic cloud bursting and the like) allow researchers and industry to use additional computing resources in a flexible, affordable, on-demand way. The UberCloud HPC Experiment now provides a platform for researchers and engineers to explore, learn and understand the end-to-end process of accessing and using HPC clouds, to identify the issues and to resolve the roadblocks (more in [9]). The principle is that end-users, software providers, resource providers and HPC experts collaborate in teams to jointly solve the end-user’s application problems in the cloud.

Since July 2012, the UberCloud HPC Experiment has attracted more than 1,100 organizations from 66 countries (as of February 2014). To date, the organizers have built 125 teams in CFD, FEM, computational biology and other prominent HPC domains, and have published more than 60 articles about the UberCloud initiative, including numerous case studies on the different applications and the lessons learned. Recently, UberCloud TechTalk and a virtual Exhibition [10] have been added, along with a Compendium (sponsored by Intel) of 25 case studies of digital manufacturing in the cloud [11].

5.A – The UberCloud HPC Experiment History

Inspired by the results of the Magellan Report [12], the idea for the UberCloud HPC Experiment arose in May 2012, while Wolfgang Gentzsch and Burak Yenier were comparing the acceptance of cloud computing for typical enterprise applications with its acceptance in High Performance Computing. While the adoption of cloud computing in the enterprise market is growing rapidly (41.3% per year through 2016, according to Gartner [13]), awareness and adoption of cloud computing in HPC and digital manufacturing are still very slow. This is mainly due to obstacles such as inflexible software licensing, slow data transfer, concerns about the security of data and applications, and the lack of specific architectural features (resulting in reduced performance in the cloud). The aim of the UberCloud HPC Experiment was to find out more about the end-to-end process of bringing engineering applications to the cloud, and to learn about the real difficulties and how to overcome them. The Experiment started in July 2012.

5.B – The HPC Cloud Experiment: a practical approach

The technology components of HPC-as-a-Service, which enable remote access to centralized resources in a multi-tenant way and meter their use, are not unfamiliar to the HPC research and engineering community. However, as service-based delivery models have taken off, users have mostly stayed on the fence, observing and discussing the potential hurdles to adoption in HPC. What is fairly certain is that we now have the technology ingredients to make HPC in the cloud a reality. Let’s start by defining the role each stakeholder (industrial end-users, resource providers, software providers and HPC experts) has to play to make service-based HPC come together:

The industry end-user – A typical example is a small or medium size manufacturer in the process of designing, prototyping and developing its next-generation product. These users are prime candidates for HPC-as-a-Service when in-house computation on workstations becomes too lengthy and acquiring additional computing power in the form of an HPC server is too cumbersome or not in line with IT budgets. Plus, HPC is not likely to be the core expertise of this group.

The application software provider – This includes software owners of all shapes and sizes, including ISVs, public domain software organizations and individual developers. The UberCloud Experiment usually prefers rock-solid software, which has the potential to be used on a wider scale. For the purpose of this experiment, on-demand license usage is tracked in order to determine the feasibility of using the service model as a revenue stream.

The HPC resource provider – This pertains to anyone who owns HPC resources (such as compute and storage systems) networked to the outside world. A classic HPC center falls into this category, as does a standard datacenter used to handle batch jobs, or a cluster-owning commercial entity willing to provide cycles for non-competitive workloads during periods of low CPU utilization.

The HPC experts – This group includes individuals and companies with HPC expertise, especially in the areas of cluster management and software porting. It also encompasses PhD-level domain specialists with in-depth application knowledge. In the Experiment, these experts, acting as team leaders, work with end-users, computing centers and software providers to help the pieces of the puzzle fit together.

For example, suppose the end-user needs additional resources to increase the quality of a product design or to speed up a product design cycle, say by simulating more sophisticated geometries or physics, or by running large numbers of simulations for a higher-quality result. This suggests a certain software stack, domain expertise and even hardware configuration. The general idea is to look at the end-user’s tasks and software and to select the resources and expertise that match the specific requirements.
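The article does not describe any automated matchmaking tooling, so the following Python sketch is purely illustrative of the matching idea above; the class names, fields and catalog entries are hypothetical and not part of the UberCloud platform.

from dataclasses import dataclass

@dataclass
class EndUserTask:
    # Hypothetical end-user request: application plus rough resource needs.
    application: str     # e.g. "OpenFOAM"
    domain: str          # e.g. "CFD", "FEM", "Computational Biology"
    cores_needed: int    # rough size of the simulation campaign

@dataclass
class Provider:
    # Hypothetical catalog entry for a software or resource provider.
    name: str
    applications: tuple  # applications it can license or host on demand
    max_cores: int       # compute capacity it can offer

def match_providers(task, catalog):
    # Keep only providers whose software stack and capacity cover the task.
    return [p for p in catalog
            if task.application in p.applications
            and p.max_cores >= task.cores_needed]

# Usage: find candidate providers for a mid-size CFD job.
catalog = [
    Provider("cloud-center-A", ("OpenFOAM",), 512),
    Provider("cloud-center-B", ("ANSYS Fluent", "OpenFOAM"), 128),
]
task = EndUserTask("OpenFOAM", "CFD", 256)
print([p.name for p in match_providers(task, catalog)])  # -> ['cloud-center-A']

In practice the matching also weighs licensing terms, data transfer constraints and domain expertise, which is exactly the judgment the experiment organizers apply when assembling a team.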

Then, with modest guidance from the UberCloud Experiment team, the end-user, the resource provider and the HPC expert implement and run the task and deliver the results back to the end-user. The hardware and software providers measure resource usage; the HPC expert summarizes the steps of analysis and implementation; the end-user evaluates the quality of the process and of the results, as well as how user-friendly the process is. The experiment orchestrators then analyze the feedback received. Finally, the team gets together, extracts the lessons learned and formulates recommendations as input to its case study.
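To make this feedback loop concrete, here is a minimal Python sketch of a per-team record; all field names and values are invented for illustration and do not reflect how the UberCloud organizers actually collect or store their data.

from dataclasses import dataclass

@dataclass
class TeamReport:
    # Hypothetical per-team record mirroring the feedback loop above.
    team_id: int
    core_hours_used: float  # measured by the resource and software providers
    expert_summary: str     # the HPC expert's analysis and implementation notes
    end_user_rating: int    # end-user quality score, say 1 (poor) to 5 (excellent)
    lessons_learned: list   # the team's input for its case study

def case_study_line(report):
    # Condense one team's feedback into a one-line case-study summary.
    lessons = "; ".join(report.lessons_learned)
    return (f"Team {report.team_id}: {report.core_hours_used:.0f} core-hours, "
            f"rated {report.end_user_rating}/5. Lessons: {lessons}")

print(case_study_line(TeamReport(
    1, 2400.0, "Ported the CFD workflow to a remote cluster.", 4,
    ["negotiate on-demand licenses early", "plan input data transfers ahead"])))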

5.C – An experiment still in progress

For the 125 teams from 66 countries (with over 1,100 active and passive organizations involved), the end-to-end process of taking applications to the cloud, performing the computations and bringing the resulting data back to the end-user has been partitioned into 23 individual steps, which the teams closely follow in the Basecamp collaboration environment. An UberCloud University (we call it TechTalk) has been created, providing regular educational lectures for the community. And the one-stop UberCloud Exhibit [10] offers an HPC services catalog where community members can exhibit their cloud-related services or select the services they want to use for their team experiment or for their daily work. In addition, many UberCloud Experiment teams publish their results widely (for example, the article by Sam Zakrzewski and Wim Slagter from ANSYS entitled “On Cloud Nine” [14]). Finally, at the November 2013 Supercomputing Conference in Denver, The UberCloud received the HPCwire Readers’ Choice Award for the best HPC cloud implementation [15].

[References]

[10] The UberCloud Services Exhibit.

[11] The UberCloud HPC Experiment: Compendium of Case Studies. Intel, June 25, 2013.

[12] Katherine Yelick, Susan Coghlan, Brent Draney, Richard Shane Canon. The Magellan Report on Cloud Computing for Science, Dec. 2011.

[13] Gartner Predicts Infrastructure Services Will Accelerate Cloud Computing Growth. Forbes, Feb. 2013.

[14] Sam Zakrzewski and Wim Slagter. On Cloud Nine. Digital Manufacturing Report, April 22, 2013.

[15] The UberCloud Receives Top Honors in 2013 HPCwire Readers’ Choice Award. Nov. 27, 2013.
