Over the past five years our HPC community has made good progress toward what some call the “democratization of HPC”, i.e. the trend toward making high performance computing (HPC) available to an ever larger group of engineers and scientists. One influencing factor is cloud computing, which enables those who can’t afford or don’t want to buy their own HPC server to access computing resources in the cloud, on demand, and pay only for what they use.
Since 2012, UberCloud and the engineering community have together performed 195 cloud experiments, running engineers’ technical computing applications on different HPC clouds, and published 80 case studies. In the early days, each experiment took three months on average and had a 50% failure rate. Today, our cloud experiments take just a few days and have a 0% failure rate. What has happened? Clearly, five years ago, HPC cloud computing was in its infancy, and users faced several severe roadblocks. Since then we have learned how to reduce them or even remove them completely. And while the roadblocks were real five years ago, many of them have turned into myths with the advent of new technologies, adequate business models, and the growing acceptance of cloud computing in general. Let’s have a closer look.
1. Clouds are not secure
This was the number one roadblock for many years, and it still sticks in the minds of many users. But over the years, cloud providers have hired the brightest security experts and integrated sophisticated levels of security to protect their customers’ data and applications. Virtual private networks guarantee an end-to-end secure link between user and cloud, and HPC workloads in particular often run on dedicated servers which are ‘owned’ by the user for as long as they are rented, avoiding potential multi-tenancy threats. For security reasons, application installations are only carried out by badged experts, and computing resources and storage are safeguarded like Fort Knox. Any cloud provider who caused a security breach would risk being out of business the next day.
2. I have no control over my assets in the cloud
In the early days of cloud computing, you handed over your application and data to the cloud provider (directly or through a user interface, the “cloud platform”) without knowing how they would handle it or what the status of your compute (batch) jobs was. Today, many cloud providers offer more transparency. And, with the advent of software container technology from Docker and UberCloud, additional functionality such as granular usage data collection, logs, monitoring, alerting, reporting, emailing, and interactivity is putting the user back in control.
3. Software licenses are not ready for the cloud
Unfortunately, this is still true for some Independent Software Vendors (ISVs), while others are now adding flexible cloud-based licensing models for short-term usage and increased ‘elasticity’, either as Bring-Your-Own-License (BYOL), consumption-based credits or tokens, or simply as pay per use. Still, there are often hurdles which can cause headaches for the user. For example, some ISVs don’t allow existing licenses (often limited by number of cores) to be upgraded to run on a larger number of cores in the cloud. But with increasing pressure from their existing customer base, from competitors, and from software and support already available in the cloud (e.g. OpenFOAM for fluid dynamics or Code Aster for material analysis), ISVs might become more open and ready to better serve their customers in this regard.
4. There is no portability among different clouds
In the early days, on-boarding a user and her application onto the cloud was painful. Once that was done, there was neither time nor resources to move to another cloud, even if you had bet on the wrong horse, e.g. because the cloud architecture was not right, your jobs didn’t scale across a large number of cores, and performance went down instead of up. Today, with a healthy competitive landscape of different cloud providers and apps in the cloud, migrating software and data from A to B is mostly straightforward, often with help from provider B. This is especially true for ‘containerized’ applications and workflows, which are fully portable among different Linux platforms.
5. Data transfer between the cloud and my desktop is slow
Many applications produce tens to hundreds of gigabytes of result data. Transferring that data from the cloud back to the end-user is often limited by the end-user’s last mile network. However, intermediate results in particular can often stay in the cloud. To check the quality and accuracy of the results, users often rely on remote visualization, which sends high-res graphics frames back to the user in real time (like NICE DCV), or perform the complete graphics postprocessing in the cloud (like Ceetron). For the final datasets, there are technologies available which filter, compress and encrypt the data, and stream it back to the user (like Vcollab). And if all this doesn’t help, e.g. in case of hundreds of gigabytes or even terabytes of data, overnight FedEx will always help. One comment on LinkedIn was: “FedEx is still the most secure and reliable network”.
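To get a feel for the last mile limit, here is a rough sketch in Python. The dataset sizes, link speeds, and the 80% efficiency factor are illustrative assumptions, not measurements from the experiments; `transfer_hours` is a hypothetical helper name.

```python
def transfer_hours(size_gb, link_mbit_s, efficiency=0.8):
    """Estimate the hours needed to move size_gb (decimal gigabytes)
    over a link of link_mbit_s megabits per second.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved (protocol overhead, shared line, and so on).
    """
    if size_gb < 0 or link_mbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, link speed, or efficiency")
    bits = size_gb * 8e9                      # gigabytes -> bits
    effective_bits_per_s = link_mbit_s * 1e6 * efficiency
    return bits / effective_bits_per_s / 3600


if __name__ == "__main__":
    # Assumed example values: result sets of 50 GB to 5 TB,
    # last mile links of 50 Mbit/s to 1 Gbit/s.
    for size_gb in (50, 500, 5000):
        for link in (50, 100, 1000):
            hours = transfer_hours(size_gb, link)
            print(f"{size_gb:>5} GB over {link:>4} Mbit/s: {hours:8.1f} h")
```

Under these assumptions, a few hundred gigabytes over a typical office line already takes many hours, which is why keeping intermediate results in the cloud and viewing them remotely is usually the more practical route.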
6. Cloud computing is more expensive than on-premise computing
Total Cost of Ownership (TCO) studies show that hardware accounts for between 20% and 50% of the cost of acquiring and running an HPC system over three years. The remainder is the high cost of expertise, system operation and maintenance, training, and electricity. To determine the individual TCO of a company, it is necessary to perform an in-depth analysis of the existing in-house IT situation, including procurement and acquisition cost, staff needed to operate and maintain the system, electricity, datacenter cost, software cost and maintenance, and more. For a 500 to 1,000 core system, TCO can therefore vary between $500K and $1.5 million over three years. A colleague from ANSYS rightly commented: “How can an organization accurately assess the – almost hidden – costs of longer queue wait times and delayed projects due to insufficient computing capacity with fixed infrastructure”, and how does this affect the engineers’ productivity? This colleague suggested rephrasing the question from “will cloud computing save us money?” to “what value does it create for our organization?” For some cost and TCO guidance see, for example, the HPC Pricing Guide from Advanced Clustering Technologies, or the TCO calculators from Amazon AWS and Microsoft Azure.
Dividing, for example, $1 million TCO by 3 years, 365 days, 24 hours, and 500 cores results in $0.08 per core per hour for a fully (i.e. 100%) utilized system. In reality, especially in small and medium enterprises, HPC servers are often utilized at less than 60% on average over three years. At 50% utilization, for instance, the resulting cost is on the order of $0.15 per core per hour, while HPC cloud cores today cost between $0.05 and $0.10 per core per hour. And while cloud providers refresh their systems every six months, users are stuck with their existing on-premise system (and technology) for at least three years.
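The arithmetic above can be reproduced with a minimal Python sketch. The 50% utilization figure is an assumed example consistent with “less than 60%”, the $0.10 cloud price is the upper end of the range quoted above, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def cost_per_core_hour(tco_usd, years, cores, utilization=1.0):
    """Effective cost in USD of one core-hour that is actually used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    available_core_hours = years * HOURS_PER_YEAR * cores
    return tco_usd / (available_core_hours * utilization)


def break_even_utilization(tco_usd, years, cores, cloud_price_usd):
    """Utilization at which the on-premise core-hour cost equals
    the given cloud price per core-hour."""
    return cost_per_core_hour(tco_usd, years, cores) / cloud_price_usd


# Figures from the text: $1 million TCO, 3 years, 500 cores.
full = cost_per_core_hour(1_000_000, 3, 500)
half = cost_per_core_hour(1_000_000, 3, 500, utilization=0.5)
break_even = break_even_utilization(1_000_000, 3, 500, cloud_price_usd=0.10)

print(f"100% utilization: ${full:.2f} per core-hour")
print(f" 50% utilization: ${half:.2f} per core-hour")
print(f"Break-even vs. $0.10 cloud cores: {break_even:.0%} utilization")
```

With these inputs, the on-premise system would have to stay busy roughly three quarters of the time just to match a $0.10 cloud core-hour. The sketch ignores everything the TCO figure itself leaves out, such as queue wait times and delayed projects.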
7. Cloud-enabling my software can take weeks or even months
This is certainly still true for complex HPC software and workflows developed in-house over many years by many people. But today many applications are already in the cloud, especially major commercial codes. Alternatively, you can set up your compute environment in the cloud yourself, install the binaries, and upload your data. The good news is that there is now an elegant way to remove this hurdle as well: software containers from Docker and UberCloud. The UberCloud containers are especially well suited for high-performance engineering and scientific workloads. They come with dozens of additional HPC layers for parallel computing (MPI and OpenMP), remote visualization, InfiniBand, secure communication, single-tenant ownership, license server, NFS, log monitoring, and more. All of this runs on any Linux system: package once, run anywhere, available at your fingertips within a second, in any private or public cloud.
Despite the continuous effort to lower and even remove these seven hurdles on our way to the cloud, we still haven’t reached the final goal: the availability of computing as a utility, similar to water, gas, electricity, and telephony. But a number of trends are making me optimistic: digital natives are entering the business world; new and open source software is on the rise; and there is a growing spectrum of affordable cloud-enabled software, on demand and pay per use. Over time there will be increasing pressure on conservative market forces and growing support for customer- and user-friendly business models for mainstream cloud computing.
Wolfgang Gentzsch is President and co-founder of UBERCLOUD. He was the Chairman of the Intl. ISC Cloud Conference Series, an Advisor to the EU projects DEISA and EUDAT, directed the German D-Grid Initiative, was a Director of the Open Grid Forum, Managing Director of the North Carolina Supercomputer Center (MCNC), and a member of the US President’s Council on Science & Technology (PCAST). In the 90’s Wolfgang founded several HPC companies, including Gridware (which developed the distributed resource management software Grid Engine), which was acquired by Sun, where he became Sun’s Senior Director of Grid Computing.
An earlier version of this article appeared on LinkedIn Pulse on May 25 and prompted a great deal of excellent comments and discussion, which I have now included in this updated version of the article.
© HPC Today 2023 - All rights reserved.