An inside look at the UberCloud HPC Experiment

By Wolfgang Gentzsch and Burak Yenier | March 13, 2014

Once a matter of theoretical debate, high performance computing in the cloud is now becoming a reality. After reviewing (and demystifying) the issues traditionally associated with it – performance, cost, software licensing, security… – let’s take an insider look at the advances, challenges and a few application use cases resulting from the UberCloud HPC Experiment…

Wolfgang Gentzsch, Burak Yenier – Cofounders, The UberCloud.

Cost savings, shorter time to market, better quality, fewer product failures… The benefits that engineers and scientists could achieve from using HPC in their research, design and development processes can be huge. Nevertheless, according to two studies conducted by the US Council on Competitiveness (Reflect and Reveal) [1], only about 10% of manufacturers currently use HPC servers when designing and developing their products. The vast majority (over 90%) of companies still perform virtual prototyping or large-scale data modeling on workstations or laptops. It is therefore not surprising that many of them (57%) face application problems due to the inadequacy of their equipment; more precise geometry or physics, for instance, requires much more memory than a desktop can provide. Today, there are two realistic options for acquiring additional HPC capacity: buy a server or use a cloud solution.

Many HPC vendors have developed a complete set of HPC products, solutions and services, so buying an HPC server is no longer out of reach for an SME. Owning an HPC server, however, is not necessarily the best idea in terms of cost-efficiency, because the Total Cost of Ownership (TCO) is high, especially considering that maintaining such a system requires additional specialized manpower.

In addition to the high costs of expertise, equipment, maintenance, software and training, buying an HPC system often requires a long and painful internal procurement and approval process. The other option is to use a cloud solution that allows engineers and scientists to continue using their regular computer system for their daily design and development work and to “burst” the larger, more complex jobs into the HPC cloud as needed. In this way, users have access to virtually limitless HPC resources, which lets them run the larger and more detailed models that yield higher quality results. In management and financial terms, a cloud solution helps reduce capital expenditure (CAPEX). It also offers businesses greater agility: resources scale dynamically as needed and are paid for only when used.
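The trade-off between owning a server and paying per use can be made concrete with a simple break-even calculation. The sketch below is only an illustration: the function names and all cost figures are hypothetical and do not come from the studies cited above, and a real comparison would also have to account for licensing, data transfer and staff time.

```python
def breakeven_core_hours(annual_ownership_cost: float,
                         cloud_price_per_core_hour: float) -> float:
    """Core-hours per year at which renting costs as much as owning."""
    if cloud_price_per_core_hour <= 0:
        raise ValueError("cloud price must be positive")
    return annual_ownership_cost / cloud_price_per_core_hour


def cheaper_option(core_hours_per_year: float,
                   annual_ownership_cost: float,
                   cloud_price_per_core_hour: float) -> str:
    """Return which option costs less for a given yearly usage."""
    cloud_cost = core_hours_per_year * cloud_price_per_core_hour
    # A tie is reported as "own server"; the cloud wins only if strictly cheaper.
    return "cloud" if cloud_cost < annual_ownership_cost else "own server"


# Hypothetical figures: 50,000 per year to own (hardware amortization,
# power, administration) versus 0.25 per core-hour in the cloud.
OWNERSHIP_COST = 50_000.0
CLOUD_PRICE = 0.25

threshold = breakeven_core_hours(OWNERSHIP_COST, CLOUD_PRICE)
occasional_user = cheaper_option(20_000, OWNERSHIP_COST, CLOUD_PRICE)
heavy_user = cheaper_option(400_000, OWNERSHIP_COST, CLOUD_PRICE)
```

Under these assumed figures, a team that only occasionally bursts large jobs stays well below the break-even point, while a team that keeps a cluster busy all year is above it.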

1 – What is an HPC cloud?

According to the National Institute of Standards and Technology (NIST) [2], “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” NIST further explains that the cloud model is composed of five “essential characteristics” (on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service), three “service models” (Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS)) and four “deployment models” (private cloud, community cloud, public cloud and hybrid cloud).

Standard cloud services can address a certain portion of HPC needs, notably those that don’t require a lot of parallel processing, such as parameter studies with varying input and low I/O requirements. However, many HPC applications cannot be shoehorned into standard cloud solutions; they require hardware designed to run demanding HPC workloads from various science and engineering application areas efficiently.
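A parameter study fits standard cloud services because every run is independent: no task needs to exchange data with another while it computes. The following minimal sketch shows that pattern. The `simulate` function is a hypothetical stand-in for a real solver, and the local thread pool merely stands in for a set of separate cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(drag_coefficient: float) -> float:
    """Hypothetical stand-in for one independent simulation run.

    Returns the drag force per unit area (N/m^2) for an assumed
    air density and flow velocity.
    """
    air_density = 1.25  # kg/m^3, assumed
    velocity = 30.0     # m/s, assumed
    return 0.5 * air_density * velocity ** 2 * drag_coefficient


def run_parameter_study(inputs, workers: int = 4):
    """Run one simulation per input value; runs never talk to each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the inputs.
        return list(pool.map(simulate, inputs))


coefficients = [0.25, 0.5, 1.0, 2.0]
results = run_parameter_study(coefficients)
```

Tightly coupled applications differ precisely in that their tasks must communicate during the run, which is where interconnect performance starts to matter.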

Many distributed computing applications are developed and optimized for specific HPC systems and require intensive communication between parallel tasks. To perform efficiently in an HPC cloud, these applications require additional system features, such as:

• Large capacity and capability resources, application software choice and a physical or virtualized environment depending on performance needs. The use of high performance interconnects and dynamic provisioning can offer cloud features while maintaining HPC performance levels.

• High performance I/O is often necessary to ensure that many I/O-heavy HPC applications will run to their fullest potential. As an example, pNFS might provide a good plug-and-play interface for many of these applications. Back-end storage design, however, will be crucial in achieving acceptable performance.

• Fast network connection between the high performance cloud resources and the end-user’s desktop system. Scientific and engineering simulation results often range from many Gigabytes to a few Terabytes. Alternative solutions in this case are remote visualization, data compression, or even overnight express mailing a disk with the resulting data back to the end-user (which, by the way, is quite a secure solution).
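To see why mailing a disk can be a serious option for the result sizes mentioned in the last point, it helps to estimate the transfer time. The sketch below is a back-of-the-envelope calculation under stated assumptions: decimal units (1 GB = 10^9 bytes), a sustained link speed, and a hypothetical 24-hour courier delivery. None of these figures come from the experiment itself.

```python
def transfer_time_hours(size_gigabytes: float,
                        bandwidth_mbit_per_s: float,
                        efficiency: float = 1.0) -> float:
    """Hours needed to move a result set over a network link.

    efficiency is the assumed fraction of the nominal bandwidth
    that is actually sustained (1.0 = ideal link).
    """
    if bandwidth_mbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = size_gigabytes * 8e9  # decimal gigabytes to bits
    seconds = bits / (bandwidth_mbit_per_s * 1e6 * efficiency)
    return seconds / 3600.0


def faster_to_ship(size_gigabytes: float,
                   bandwidth_mbit_per_s: float,
                   courier_hours: float = 24.0) -> bool:
    """True if an (assumed) overnight courier beats the network transfer."""
    return transfer_time_hours(size_gigabytes, bandwidth_mbit_per_s) > courier_hours


# 1 TB over an ideal 100 Mbit/s link: 8e12 bits / 1e8 bit/s = 80,000 s.
one_terabyte_hours = transfer_time_hours(1000, 100)
ship_five_terabytes = faster_to_ship(5000, 100)
```

With these assumptions, one terabyte takes a little over 22 hours on a 100 Mbit/s link, so a result set of a few terabytes already arrives sooner by courier.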

Additionally, there may be other issues that need to be addressed before an HPC cloud can deliver low-cost and flexible HPC cycles; a careful analysis of application requirements is needed to determine how well a given HPC workload will actually perform on standard cloud offerings.

2 – Security in the cloud

Security is the major issue in all cloud deployments, and it concerns every organization, not just small and medium enterprises. In order to protect its “crown jewels”, any organization migrating to the cloud should primarily be concerned with the physical location of its data. It should also address a number of questions related to back-up and recovery, auditing, certification and, of course, security, in particular who on the service provider’s side will have access to the data. Achieving acceptable levels of security is not only a matter of technology, but also a matter of trust, visibility, control and compliance, all of which depend on effective management and operational practices. In other words, security is first and foremost a human issue, because when end-users send data and applications into the cloud, they give up full control over them.

However, security does not appear to be a major issue for cloud customers, as the market is currently growing by approximately 30% per year. Optimistically, there is no reason to believe that HPC in the cloud will not follow the same trend.

We could not agree more with Simon Aspinall [3] when he says “as with any other new and disruptive technology, there is still some hesitation and some assumptions about how businesses should run, that have until now inhibited the speed of adoption in the enterprise market. These arguments, especially around security, are oddly similar to those that were once voiced around the Internet, online retail and even the mobile phone. Businesses not doing so already would be well advised to take a page from history and adapt or risk getting left behind by competitors already benefiting from the efficiencies a Cloud solution delivers. As with any other major decision, the key is to educate ourselves on the available options, do diligence, define our “Cloud,” and adapt at a pace that is best for our own business.”

[References]

[1] Council on Competitiveness, ‘Make’, ‘Reflect’, ‘Reveal’ and ‘Compete’ studies 2010.

[2] NIST Cloud Definition.

[3] Simon Aspinall, Security in the Cloud. Strategic News Service, Volume 16, N°39, Oct. 28, 2013.