Building Containers for HPC

Written by Dominik Pantůček on 2026-06-04

docker

Scientists from other fields sometimes need help processing their data. And sometimes you just have to wonder why their pipelines are built the way they are...


For the kind of number crunching you encounter in, for example, bioinformatics, you typically need compiled programs tailored to the task at hand. However, as most high-performance computing these days happens on clusters running containerized workloads, you need to package your programs in a container. And most scientists are not programmers.

The result is that a typical HPC container with compiled programs also contains the complete development environment, so instead of 60MB it happily sports 6GB. That is something we had the chance to change, at least in one instance.

When using Docker for these containerized workflows, however, you can use a multi-stage build to include only the required binaries in the resulting image. Suppose we have a Rust program called program1.rs and we want to create a minimal image containing its compiled form. For the purpose of this article, we'll use the canonical Hello World program:

fn main() {
    println!("Hello World!");
}

A straightforward solution would be as follows:

# Dockerfile
FROM alpine:latest
RUN apk add --no-cache rust
COPY ./program1.rs /home/program/
RUN cd /home/program && rustc program1.rs
ENTRYPOINT ["/home/program/program1"]

The problem is that the resulting image, built with docker build -t program1 ., is unnecessarily large:

$ docker image ls
IMAGE               ID             DISK USAGE   CONTENT SIZE
program1:latest     073e73f355d5        611MB             0B

Building the binary in a temporary stage and then copying it into a fresh image yields much better results:

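# Dockerfile

# Stage 1: build the binary using the full Rust toolchain
FROM alpine:latest AS build
RUN apk add --no-cache rust
COPY ./program1.rs /home/build/
RUN cd /home/build && rustc program1.rs

# Stage 2: start from a clean Alpine image and copy in only what is needed at runtime
FROM alpine:latest
COPY --from=build /home/build/program1 /home/program/
# The binary is dynamically linked against libgcc_s (Rust uses it for stack unwinding)
COPY --from=build /usr/lib/libgcc_s.so.1 /usr/lib/
ENTRYPOINT ["/home/program/program1"]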

And the results:

$ docker image ls
IMAGE               ID             DISK USAGE   CONTENT SIZE
program1:latest     d78fe2d1a983       9.06MB             0B
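To put a number on the improvement, here is a small Rust sketch that parses the sizes as printed by docker image ls and computes the reduction factor. It assumes Docker reports sizes in decimal units (1 MB = 1,000,000 bytes); the helper name parse_docker_size is hypothetical and not part of any Docker tooling.

```rust
/// Parse a size string as printed by `docker image ls`, e.g. "611MB" or "9.06MB".
/// Assumption: Docker uses decimal (SI) units, so 1 MB = 1_000_000 bytes.
fn parse_docker_size(s: &str) -> Option<f64> {
    // Split into the numeric prefix and the unit suffix at the first letter.
    let split = s.find(|c: char| c.is_ascii_alphabetic())?;
    let (number, unit) = s.split_at(split);
    let value: f64 = number.trim().parse().ok()?;
    let multiplier = match unit {
        "B" => 1.0,
        "kB" => 1e3,
        "MB" => 1e6,
        "GB" => 1e9,
        "TB" => 1e12,
        _ => return None, // unknown unit
    };
    Some(value * multiplier)
}

fn main() {
    let single_stage = parse_docker_size("611MB").expect("valid size");
    let multi_stage = parse_docker_size("9.06MB").expect("valid size");
    // How many times smaller the multi-stage image is.
    let ratio = single_stage / multi_stage;
    println!("reduction factor: {:.1}x", ratio);
}
```

For the two images above it prints reduction factor: 67.4x. Since both sizes share a unit, the ratio does not depend on whether Docker counts in decimal or binary megabytes.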

Of course, in a real-world scenario you need to copy any and all shared libraries required by your program. But that should not be a problem.
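One way to collect those libraries is to run ldd on the compiled binary in the build stage and turn its output into COPY lines. The Rust sketch below does exactly that, assuming you have captured the ldd output as text. The helper names library_paths and copy_lines are hypothetical, and the sample output (including its load addresses) is illustrative, modelled on musl's ldd format.

```rust
use std::path::Path;

/// Illustrative `ldd` output for program1 on Alpine (addresses are made up).
const SAMPLE: &str = "\t/lib/ld-musl-x86_64.so.1 (0x7f0000000000)
\tlibgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7f0000100000)
\tlibc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f0000000000)";

/// Extract unique absolute library paths from `ldd` output, in order of appearance.
/// Entries without a path (e.g. linux-vdso.so.1) and "not found" entries are skipped.
fn library_paths(ldd_output: &str) -> Vec<String> {
    let mut paths: Vec<String> = Vec::new();
    for line in ldd_output.lines() {
        let line = line.trim();
        // Take the part after "=>" if present, otherwise the whole line.
        let candidate = match line.split_once("=>") {
            Some((_, rest)) => rest.trim(),
            None => line,
        };
        // Drop the trailing load address, e.g. " (0x7f...)".
        let path = candidate.split(" (").next().unwrap_or("").trim();
        if path.starts_with('/') && !paths.iter().any(|p| p.as_str() == path) {
            paths.push(path.to_string());
        }
    }
    paths
}

/// Turn each library path into a Dockerfile COPY line from the given build stage.
fn copy_lines(ldd_output: &str, stage: &str) -> Vec<String> {
    library_paths(ldd_output)
        .into_iter()
        .map(|p| {
            let dir = Path::new(&p)
                .parent()
                .map(|d| d.display().to_string())
                .unwrap_or_default();
            // Ensure exactly one trailing slash on the destination directory.
            format!("COPY --from={} {} {}/", stage, p, dir.trim_end_matches('/'))
        })
        .collect()
}

fn main() {
    for line in copy_lines(SAMPLE, "build") {
        println!("{}", line);
    }
}
```

For the sample it emits COPY lines for /lib/ld-musl-x86_64.so.1 and /usr/lib/libgcc_s.so.1, the latter matching the line in the Dockerfile above. Libraries the fresh base image already ships, such as the musl loader on Alpine, can be dropped from the list.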

Running the Docker image is as simple as docker run:

$ docker run program1
Hello World!

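Lately, Singularity has been gaining traction in scientific HPC, so it is a good idea to support it as well. With Singularity installed on your local machine, it is not hard at all: the docker-daemon:// prefix takes the image straight from the local Docker daemon, and -F overwrites any existing program1.sif.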

$ singularity build -F program1.sif docker-daemon://program1:latest
INFO:    Starting build...
INFO:    Fetching OCI image...
INFO:    Extracting OCI image...
INFO:    Inserting Singularity configuration...
INFO:    Creating SIF file...
INFO:    Build complete: program1.sif

And what about the size?

$ ls -lh program1.sif
-rwxr-xr-x 1 joe joe 3.9M Jun  7 19:48 program1.sif

And how do we run it?

$ ./program1.sif
Hello World!

Although this may seem trivial once you think about it for even a second, if more scientists built their containers this way, we could save a lot of computing resources in the long run.

Hope you liked this little side-track into the world of scientific high-performance computing, and as always, see ya next time!