Containers, meet HPC

By Olli-Pekka Lehto | March 24, 2016

Olli-Pekka Lehto has worked in various roles involving supercomputers since 2002. These days he manages the Computing Platforms group at CSC.

In the last couple of years, container technologies have been rapidly adopted in the IT industry and the technology has all but become synonymous with Docker. Docker has taken the fairly mature concept of containers (Solaris Zones, IBM LPAR, LXC…) and rapidly developed it into a user-friendly product with a workflow that seamlessly fits the DevOps philosophy and modern microservices architectures.

A huge number of introductory articles have already been written about the technology itself, so I won’t go into it in detail. Instead, I’ll focus on how Docker containers could be utilised in an HPC context.

A quick summary for the uninitiated: Containers offer many of the benefits of virtualisation with much faster startup and lower overhead. Docker is an easy and efficient framework to build and manage these containers.

State of Containers in HPC
Containerised HPC is currently in its infancy, but momentum is clearly building, as evidenced by the following developments during 2015:

In the spring, IBM announced that its LSF suite supports Docker, but I at least have not heard of any real-world use cases (I’d really be interested to hear from anyone who has them).

There was an excellent and well-attended workshop on the subject at ISC15 in July. Hopefully there will also be a follow-up at ISC16.

More recently, at SC15 in November, Cray announced upcoming support for Docker. This seems to largely leverage the excellent work that NERSC has done with its “Shifter” User Defined Images project. I’m really looking forward to piloting it on our systems as well.

At CSC we are developing a service called Pouta Blueprints, which provides a nearly frictionless way to launch sandbox containers for web-enabled applications such as RStudio and Jupyter. The initial use case has been throwaway systems for courses, but there are many more use cases to explore.

The recently published draft of the APEX2020 RFP for the next-generation Department of Energy systems lists support for containers as a desirable feature. Big procurements by U.S. national labs like this one are typically a good place to glimpse what the future of HPC will likely hold: there is usually enough money on the table to affect the vendors’ technology roadmaps in a very concrete way.

What are containers good for?
The basic value proposition is that containers make it easy and efficient to manage and run applications with complex dependencies. This can be exploited in HPC in a variety of ways. The following cases assume a model where containers run as tasks under an HPC batch job queuing system such as SLURM.

“I need root!” Every now and then we come across “arrogant” applications that assume a dedicated cluster, or at least that whoever runs them has privileged access, for example to deploy RPMs, run a web server or even run their own dedicated batch job scheduler. These applications are a pain because they are difficult to shoehorn into a normal, shared HPC system. This is probably the most obvious case where containers are beneficial: we can simply grant the application all the privileges it wants inside a sandboxed environment.

Complex environments: Some application stacks can be very intricate, depend on a variety of exotic libraries and/or have build systems that were not designed with humans in mind. Containers make it possible to package these neatly, with all the dependencies rolled into one blob.

Custom distros: At CSC we largely use CentOS and RHEL, as they clearly have the best compatibility with the various ISV applications that we require. However, some homebrew applications have been developed and tested on Ubuntu or SLES, and distro-specific dependencies have slipped in. With containers, users can run the distro of their liking with relative ease.

Bringing your own application+stack: One of the strengths of Docker is how simply containers can be shared through the public Docker Hub or a private repository. This lets users deploy and run containers of their choice on different systems using simple command-line tools.

Preserving the stack for reproducibility: I’m often surprised how sparsely the underlying software stack is described in papers that rely on computational results. Typically the name of the main system and the application versions are included, but there is rarely detailed information on intermediate libraries (e.g. MPI and numerical libraries) and their versions. This can make reproducing the results extremely difficult, especially as HPC centers may clean up and retire old library versions with little notification. And of course, the systems themselves will eventually be retired.

Containerising the application makes it possible to capture and preserve the software stack. This does not guarantee perfect reproducibility (for example, hardware and kernel version may affect results), but it is a huge improvement on the current state of things.
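One inexpensive complement to archiving the container image is to record a manifest of the environment alongside each set of results. The sketch below is a hypothetical helper (the name capture_stack_manifest is illustrative, not an existing tool) using only the Python standard library. It can only see Python-visible packages, not system MPI or numerical libraries, but it does record the kernel release, which inside a container comes from the host and is therefore not captured by the image itself.

```python
import json
import platform
from importlib import metadata


def capture_stack_manifest() -> dict:
    """Return a JSON-serialisable snapshot of the visible software environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        # Inside a container the kernel belongs to the host, so it is
        # recorded separately from whatever the image contains.
        "kernel_release": platform.release(),
        "machine": platform.machine(),
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Print the manifest so it can be redirected into the job's output directory.
    print(json.dumps(capture_stack_manifest(), indent=2))
```

Running this at the start of a batch job and storing the output next to the results gives reviewers at least the version information that papers so often omit.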

Sharing validated HPC stacks with users: The ease of sharing also makes it possible for HPC providers to offer containerised versions of their HPC software stacks. Users can then run them on their own small clusters, workstations and laptops, for example for development and initial debugging.

Peer-to-peer sharing of stacks: This should also make it easier to share applications between facilities in federated grids (such as EGI), or even to balance workloads across different systems within a facility. At CSC, where we have both a Cray XC and a commodity cluster, the latter is a big selling point.

Enhancing cluster management and testing: A completely different use case, but one worth mentioning, is using containers in the management backend of HPC systems. For starters, one can quite easily simulate a complex cluster stack with a reasonable number of compute nodes on a laptop, something that’s not really feasible with VMs. Christian Kniep’s blog has a lot of excellent examples of this. The logical next step would be to run production cluster services in containers, though I don’t yet know of any site that has done this.

What about virtualisation?
Many of the aforementioned cases can already be addressed in a virtualised HPC cloud environment such as our Pouta services at CSC, but with some caveats:

Scheduling inefficiency: With containers, you can submit jobs through your batch job scheduler (SLURM, PBS etc.), which has a sophisticated queuing policy engine. With VMs, you are typically dealing with a cloud management system (such as OpenStack) whose scheduler is more basic and not really designed for dispatching and queuing in an HPC environment. Perhaps in the future advanced schedulers like Kubernetes and Mesos could accomplish this, but there is still quite a lot of work to do.

Launch overhead: Initialising a VM takes far longer than starting a container (>20 s vs. <0.1 s), so setting up an on-demand virtual cluster to run very short jobs carries a large overhead.
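To put those startup times in perspective, the share of wall-clock time lost to startup is simply startup / (startup + runtime). The small sketch below evaluates that ratio, using 0.1 s and 20 s as representative container and VM startup times taken from the figures above; the job runtimes are arbitrary illustrative values, and real startup times vary by system.

```python
def launch_overhead_fraction(startup_s: float, job_runtime_s: float) -> float:
    """Fraction of total wall-clock time spent starting the environment."""
    if startup_s < 0 or job_runtime_s <= 0:
        raise ValueError("startup must be >= 0 and runtime must be > 0")
    return startup_s / (startup_s + job_runtime_s)


# Representative startup times (assumed from the ranges quoted in the text).
CONTAINER_STARTUP_S = 0.1
VM_STARTUP_S = 20.0

if __name__ == "__main__":
    for runtime_s in (10, 60, 3600):  # illustrative job lengths in seconds
        c = launch_overhead_fraction(CONTAINER_STARTUP_S, runtime_s)
        v = launch_overhead_fraction(VM_STARTUP_S, runtime_s)
        print(f"{runtime_s:>5} s job: container {c:.2%}, VM {v:.2%}")
```

For a one-minute job, for instance, a 20 s VM boot accounts for a quarter of the total time, while a container’s startup is well under one percent.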

Resource overheads: VMs can carry a larger overhead both in the disk space their images consume and in their memory usage.

There are definitely still cases where virtualisation remains relevant: running Windows instances, providing long-running dedicated resources or offering a fully isolated environment for security purposes, to name a few.

Challenges
There are a lot of benefits, but some challenges and open questions remain:

Docker requires a fairly recent distro and privileged access. These issues are addressed to a large extent by the aforementioned Shifter. However, Shifter is still very early in its development cycle and I have not yet tested it in real life, so it remains to be seen how seamless it is.

A second interesting question is how to manage compatibility with the low-level drivers (GPUs, parallel filesystems, interconnects etc.) that are exposed to the container. I can imagine compatibility issues arising if, for example, a CUDA library inside the container is too new for the host’s kernel driver.
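One way a site might guard against this is a pre-launch check that refuses to start an image whose CUDA runtime is newer than the newest version the host driver supports. The sketch below implements only that simplified rule; the function names are hypothetical, the version strings are supplied by the caller, and real driver compatibility rules are more nuanced and vendor-specific.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '7.5' into a tuple like (7, 5)."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Zero-pad two version tuples to equal length so '8' compares equal to '8.0'."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def cuda_runtime_supported(container_cuda: str, driver_max_cuda: str) -> bool:
    """Simplified rule: the container's CUDA runtime must not be newer than
    the newest CUDA version the host's kernel driver supports."""
    image, host = _pad(parse_version(container_cuda), parse_version(driver_max_cuda))
    return image <= host


if __name__ == "__main__":
    # Hypothetical values; a real site would read them from the image
    # metadata and from the host's driver installation.
    for image_cuda, host_max in [("7.5", "8.0"), ("8.0", "7.5")]:
        ok = cuda_runtime_supported(image_cuda, host_max)
        verdict = "OK" if ok else "refuse to launch"
        print(f"image CUDA {image_cuda} on host supporting up to {host_max}: {verdict}")
```

Failing fast at submission time is likely friendlier to users than letting the job crash inside the container with an obscure driver error.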

It may be tempting to just build a container by hand, but to make things sustainable, the build recipe should still be captured in some form of configuration management framework and put under version control. Docker’s own Dockerfile format provides a fairly straightforward way to do this for simple applications.

However, for more complex cases with many dependencies, using something more sophisticated and HPC-oriented, such as EasyBuild, may be a good idea. An added benefit is that the same configuration can also target VMs and bare metal, and a wide variety of ready-made recipes for building applications is already available.

Containerising and sharing licensed software and libraries is potentially an interesting can of worms. I’m not even going to go there in this article.

Containerise Everything… or Not
I also foresee a risk that, because containers are so easy to use and deploy, people will see them as an “easy out” for all kinds of porting challenges, regardless of their complexity.

For example, instead of putting a bit of effort into adapting their application to the HPC center’s own (typically well-defined and highly tuned) bare-metal computing environment, people will just dump everything into a container in a quick-and-dirty way.

Initially this may save work, but in the long term it may cause users to lose touch with their application, turning it into a black box. This could, for example, lead to performance issues and make debugging difficult.

The “container as magic bullet for everything” philosophy could also lead to reduced investment in the standard bare-metal computing environment, as nearly everyone is deploying their own stacks, resulting in stagnation of the standard environment. The worst case could be a cycle that ends with every group maintaining its own disparate stack, with performance typically lagging behind the original, standard environment.

How to avoid this? Here are a few educated guesses:

Using containers judiciously, preferring “traditional” deployment into the bare-metal computing environment if there are no compelling reasons for containerising.

Educating users to create “sustainable” containers that can be rebuilt and updated from Dockerfiles, EasyBuild recipes or some other configuration management system of their choice.

Developing standard base containers for the HPC center that are tuned to its systems, clean, well maintained and compatible with the “bare metal” environment. Having everything in configuration management would help here as well.

Conclusion
Containers have a lot of potential to complement bare-metal computing environments and virtual machines, and they will almost certainly establish a strong foothold in HPC in the next couple of years. However, there is still a lot of work to do, and there are potential caveats. Interesting times ahead!