Building Containers for HPC
Written by Dominik Pantůček on 2026-06-04
When you work with scientists from other fields, they sometimes need help with processing their data. And sometimes you just have to wonder why their pipelines are built the way they are...
For the kind of number crunching you encounter in, for example, bioinformatics, you typically need compiled programs tailored to the task at hand. However, as most high-performance computing these days happens on clusters running containerized images, you need to package your programs in a container. And most scientists are not programmers.
The result is that a typical HPC container with compiled programs also contains the complete development environment, and instead of 60MB it easily reaches 6GB. And that is something we had the chance to change in at least one instance.
However, when using Docker for these containerized workflows, it is possible to use a multi-stage build to include only the required binaries in the resulting image. Suppose we have a Rust program called program1.rs and we want to create a minimal image with its compiled form. For the purpose of this article, we'll consider the canonical Hello World program:
fn main() {
    println!("Hello World!");
}
A straightforward solution would be as follows:
# Dockerfile
FROM alpine:latest
RUN apk add --no-cache rust
COPY ./program1.rs /home/program/
RUN cd /home/program && rustc program1.rs
ENTRYPOINT ["/home/program/program1"]
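For completeness, building and listing the image might look like the sketch below. It assumes the Dockerfile sits next to program1.rs in the current directory; the helper name build_and_report is purely illustrative and not part of Docker.

```shell
# Hypothetical helper: build an image from a build context and list it.
# Usage: build_and_report TAG [CONTEXT_DIR]
build_and_report() {
    tag="$1"
    context="${2:-.}"   # default to the current directory
    # -t names the resulting image; "docker image ls TAG" limits the
    # listing to that repository.
    docker build -t "$tag" "$context" && docker image ls "$tag"
}
```

Calling build_and_report program1 from the directory holding the Dockerfile should give a listing similar to the ones shown in this article.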
The problem is the resulting image is unnecessarily large:
$ docker image ls
IMAGE             ID             DISK USAGE   CONTENT SIZE
program1:latest   073e73f355d5   611MB        0B
Switching to a temporary image for building the binary and then starting from a fresh one yields much better results:
# Dockerfile
FROM alpine:latest AS build
RUN apk add --no-cache rust
COPY ./program1.rs /home/build/
RUN cd /home/build && rustc program1.rs
FROM alpine:latest
COPY --from=build /home/build/program1 /home/program/
COPY --from=build /usr/lib/libgcc_s.so.1 /usr/lib/
ENTRYPOINT ["/home/program/program1"]
And the result is roughly 67 times smaller:
$ docker image ls
IMAGE             ID             DISK USAGE   CONTENT SIZE
program1:latest   d78fe2d1a983   9.06MB       0B
Of course, in a real-world scenario you need to copy every shared library your program requires, just as libgcc_s.so.1 is copied above. But that should not be a problem.
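A minimal sketch of how to find those libraries, assuming ldd is available in the build stage. The function name needed_libs is made up for illustration; it reads ldd-style output on standard input and prints the resolved library paths, one per line.

```shell
# Hypothetical helper: read ldd-style output on stdin and print the
# absolute paths of the resolved shared libraries, de-duplicated.
needed_libs() {
    # Lines of interest look like:
    #   libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }' | sort -u
}

# Intended use inside the build stage (not executed here):
#   ldd /home/build/program1 | needed_libs
```

Libraries that the fresh base image already ships, such as the C library itself, do not need to be copied again; only the remaining paths need a COPY --from=build line.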
Running the Docker image is as simple as docker run:
$ docker run program1
Hello World!
Lately, Singularity has been gaining some traction in scientific HPC, so it is a good idea to support it as well. With Singularity installed on your local machine, it is not hard at all:
$ singularity build -F program1.sif docker-daemon://program1:latest
INFO: Starting build...
INFO: Fetching OCI image...
INFO: Extracting OCI image...
INFO: Inserting Singularity configuration...
INFO: Creating SIF file...
INFO: Build complete: program1.sif
And what about the size?
$ ls -lh program1.sif
-rwxr-xr-x 1 joe joe 3.9M Jun 7 19:48 program1.sif
And how do we run it?
$ ./program1.sif
Hello World!
Although this seems trivial if you think about it for even a second, if more scientists built their containers this way, we could save a lot of computing resources in the long term.
Hope you liked this little side-track into the world of scientific high-performance computing and, as always, see ya next time!