Advanced Docker: Multistage parallel Docker build

Published 13 Oct 2020 - 8 min read

Docker is a tool we use every day in development, but how much time do you waste waiting for docker build to complete? And how do you deal with gigantic images?

What if I told you there’s a better way to build your containers?

Your next favorite tool is called Buildkit!

In this tutorial we’ll dive into advanced Docker usage to optimize your development process, both in build time and in the size of the image itself. We will do it using Buildkit parallel multistage builds.

Buildkit

Buildkit is a toolkit developed by the Moby project to enhance the build and the packaging of software using containers.

Main features

Among its features, Buildkit offers automatic garbage collection to clean up unneeded resources, concurrent dependency resolution and efficient instruction caching. Buildkit has shipped with docker build since Docker 18.06, where it was experimental; from Docker 18.09 it can be switched on with an environment variable.

How to enable Buildkit

To use the Buildkit-powered build engine for a single build, set the environment variable DOCKER_BUILDKIT=1 when invoking docker build, for example DOCKER_BUILDKIT=1 docker build.
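A small shell wrapper keeps the variable scoped to one invocation instead of exporting it for the whole session. This is only a sketch: the function name buildkit_build is an assumption made for this example and is not part of Docker.

```shell
# Run `docker build` with Buildkit enabled for this invocation only.
# buildkit_build is a hypothetical helper name, not a Docker command.
buildkit_build() {
    # The prefix assignment puts DOCKER_BUILDKIT=1 in the environment of
    # this single docker call; the rest of the shell session is untouched.
    DOCKER_BUILDKIT=1 docker build "$@"
}
```

It takes the same arguments as docker build, for example buildkit_build -t prometheus . -f Dockerfile.prometheus.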

It’s also possible to enable Buildkit by default:

  • Edit the daemon configuration in /etc/docker/daemon.json and add

    { "features": { "buildkit": true } }
  • Restart the daemon with

    sudo systemctl daemon-reload
    sudo systemctl restart docker
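If the machine has no daemon.json yet, the file can be generated rather than typed by hand. The sketch below assumes exactly that case: the helper name write_buildkit_config is hypothetical, and it deliberately refuses to touch an existing file, because an existing configuration has to be merged manually.

```shell
# Write a minimal daemon.json that enables Buildkit.
# write_buildkit_config is a hypothetical helper name for this sketch.
write_buildkit_config() {
    config_path="$1"
    # Never clobber an existing configuration: it may hold other settings.
    if [ -e "$config_path" ]; then
        echo "refusing to overwrite $config_path; merge the setting by hand" >&2
        return 1
    fi
    printf '{ "features": { "buildkit": true } }\n' > "$config_path"
}
```

One way to use it is to write to a scratch path first (write_buildkit_config ./daemon.json), review the result, and then copy it to /etc/docker/daemon.json with root privileges before restarting the daemon.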

Code example

For this tutorial we are going to prepare an image to deploy an instance of Prometheus in production. We will start from a standard Dockerfile and refactor it to improve build time and image size.

Legacy Dockerfile

We are going to build Prometheus from source. To do that we need a Docker image with all its build dependencies: golang, nodejs, yarn and make.

FROM ubuntu:bionic

ENV GOPATH=$HOME/go
ENV PATH=$PATH:/usr/local/go/bin:$GOPATH/bin

RUN apt-get update \
    && apt-get install -y curl git build-essential \
    && curl -sL https://deb.nodesource.com/setup_14.x | bash - \
    && apt-get install -y nodejs \
    && npm install -g yarn \
    && curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz \
    && tar -xvf go1.15.2.linux-amd64.tar.gz \
    && mv go /usr/local \
    && git clone https://github.com/prometheus/prometheus.git prometheus/ \
    && cd prometheus/ \
    && make build 
# RUN ./prometheus --config.file=your_config.yml

Let’s build it with:

$ time docker build --no-cache -t prometheus . -f Dockerfile.prometheus

...

Successfully built 54b5d99ef76a
Successfully tagged prometheus:latest

real    19m56,395s
user    0m0,506s
sys     0m0,334s

The image size is:

$ docker images
REPOSITORY                    TAG                 IMAGE ID            CREATED              SIZE
prometheus                    latest              54b5d99ef76a        25 minutes ago       2.38GB

Legacy build performance

Looking at the results, we needed almost 20 minutes to build a Prometheus image that weighs 2.38GB. This will be our starting point.

Multistage build

Now we have an image ready for production, so we are happy, right?

No, we definitely are not.

As you may have noticed, the image we’ve just created is huge. We can definitely do better with an advanced Docker feature called multistage builds. Multistage builds have been available since Docker 17.05 and are the go-to way to optimize image size. You use the FROM ... AS ... instruction to define a named build stage and the COPY --from instruction to copy artifacts from one stage into another.

Refactor legacy Dockerfile to use multistage build

Let’s apply these concepts to the old Dockerfile.

FROM ubuntu:bionic as base-builder

ENV GOPATH=$HOME/go
ENV PATH=$PATH:/usr/local/go/bin:$GOPATH/bin

RUN apt-get update \
    && apt-get install -y curl git build-essential \
    && curl -sL https://deb.nodesource.com/setup_14.x | bash - \
    && apt-get install -y nodejs \
    && npm install -g yarn \
    && curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz \
    && tar -xvf go1.15.2.linux-amd64.tar.gz \
    && mv go /usr/local \
    && git clone https://github.com/prometheus/prometheus.git prometheus/ \
    && cd prometheus/ \
    && make build
FROM ubuntu:bionic as final
COPY --from=base-builder prometheus/prometheus prometheus
# RUN ./prometheus --config.file=your_config.yml

The only change is a small final stage that contains just the Prometheus executable, which we copy from the previous stage with COPY --from. The build tools and the source tree stay behind in base-builder and never reach the final image.

It’s time to build the Docker image.

$ time docker build --no-cache -t prometheus-multistage . -f Dockerfile.prometheus-multistage
...

Successfully built ab2217626102
Successfully tagged prometheus-multistage:latest

real    19m19,570s
user    0m0,418s
sys     0m0,459s

The image size is:

$ docker images
REPOSITORY              TAG                 IMAGE ID            CREATED             SIZE
prometheus-multistage   latest              ab2217626102        31 seconds ago      151MB

Multistage build performance

Looking at the new results, the build still took 19 minutes, but the image shrank from 2.38GB to 151MB, a reduction of roughly 94%!

Parallel multistage build

So we were able to reduce the image size, but the build still takes too long. We can optimize that by exploiting the Buildkit build engine. The legacy Docker build engine builds the stages sequentially. Buildkit, on the other hand, computes the dependency graph of the stages and builds independent stages in parallel. With this in mind, we can refactor the Dockerfile to speed up the build.

Refactor Dockerfile to use parallel multistage build

Let’s see how this can be done.

FROM ubuntu:bionic as base-builder

ENV GOPATH=$HOME/go
ENV PATH=$PATH:/usr/local/go/bin:$GOPATH/bin

RUN apt-get update \
    && apt-get install -y curl git build-essential

FROM base-builder as base-builder-extended
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash - \
    && apt-get install -y nodejs \
    && npm install -g yarn

FROM base-builder as golang
RUN curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz \
    && tar -xvf go1.15.2.linux-amd64.tar.gz

FROM base-builder as source-code
RUN git clone https://github.com/prometheus/prometheus.git prometheus/

FROM base-builder-extended as builder
COPY --from=golang go /usr/local
COPY --from=source-code prometheus/ prometheus/
RUN cd prometheus/ && make build

FROM ubuntu:bionic as final
COPY --from=builder prometheus/prometheus prometheus
# RUN ./prometheus --config.file=your_config.yml

We create a first stage called base-builder that contains the basic tools and acts as a base for the following stages. Inheriting from base-builder we define:

  • golang, that contains go;
  • source-code, that we use to fetch Prometheus source code;
  • base-builder-extended that is an enhancement of base-builder that contains nodejs and yarn;

These three stages don’t depend on each other, so Buildkit can build them in parallel.

At this point we are ready to build the code, and we use the builder stage for that. In this stage, we use COPY --from to bring in the artifacts we need from the previous stages. Then, once again, we create a tiny final stage that contains only the Prometheus executable.
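To see which stages can run side by side, it helps to list the dependencies between them. The sketch below is a rough approximation, not how Buildkit itself resolves the graph: the helper name stage_deps is hypothetical, and it only understands plain FROM ... AS ... and COPY --from=... lines (no FROM flags such as --platform, no RUN --mount).

```shell
# Print one "stage dependency" pair per line for a Dockerfile read on stdin.
# stage_deps is a hypothetical helper name for this sketch.
stage_deps() {
    awk '
        toupper($1) == "FROM" {
            # "FROM <base> AS <name>": the stage depends on its base image or stage.
            stage = (toupper($3) == "AS") ? $4 : $2
            print stage, $2
        }
        toupper($1) == "COPY" {
            # "COPY --from=<stage> ...": the current stage depends on <stage>.
            for (i = 2; i <= NF; i++)
                if ($i ~ /^--from=/) { sub(/^--from=/, "", $i); print stage, $i }
        }
    '
}
```

Running stage_deps < Dockerfile.prometheus-parallel-multistage shows that golang, source-code and base-builder-extended each depend only on base-builder, while builder depends on all three of them. Stages that do not appear in each other's dependency chain are the ones that can be built concurrently.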

We can run the build now.

$ DOCKER_BUILDKIT=1 docker build --no-cache -t prometheus-parallel-multistage . -f Dockerfile.prometheus-parallel-multistage
[+] Building 734.4s (13/13) FINISHED                                                                                                  
 => [internal] load build definition from Dockerfile.prometheus-parallel-multistage                                              1.1s
 => => transferring dockerfile: 963B                                                                                             0.1s
 => [internal] load .dockerignore                                                                                                0.8s
 => => transferring context: 2B                                                                                                  0.1s
 => [internal] load metadata for docker.io/library/ubuntu:bionic                                                                 0.0s
 => CACHED [final 1/2] FROM docker.io/library/ubuntu:bionic                                                                      0.0s
 => [base-builder 2/2] RUN apt-get update     && apt-get install -y curl git build-essential                                   195.6s
 => [source-code 1/1] RUN git clone https://github.com/prometheus/prometheus.git prometheus/                                    77.6s 
 => [base-builder-extended 1/1] RUN curl -sL https://deb.nodesource.com/setup_14.x | bash -     && apt-get install -y nodejs   102.1s 
 => [golang 1/1] RUN curl -O https://storage.googleapis.com/golang/go1.15.2.linux-amd64.tar.gz     && tar -xvf go1.15.2.linux  149.8s 
 => [builder 1/3] COPY --from=golang go /usr/local                                                                              13.6s 
 => [builder 2/3] COPY --from=source-code prometheus/ prometheus/                                                                9.5s 
 => [builder 3/3] RUN cd prometheus/ && make build                                                                             338.6s 
 => [final 2/2] COPY --from=builder prometheus/prometheus prometheus                                                             2.6s 
 => exporting to image                                                                                                           1.9s 
 => => exporting layers                                                                                                          1.6s 
 => => writing image sha256:c0e59c47a790cb2a6b1229a5fec0014aa2b4540fc79c51531185c9466c9d5584                                     0.1s 
 => => naming to docker.io/library/prometheus-parallel-multistage                                                                0.1s

Let’s check the image size:

$ docker images
REPOSITORY                       TAG                 IMAGE ID            CREATED             SIZE
prometheus-parallel-multistage   latest              c0e59c47a790        About a minute ago  151MB
prometheus-multistage            latest              ab2217626102        9 minutes ago       151MB
prometheus                       latest              54b5d99ef76a        39 minutes ago      2.38GB

Parallel multistage build performance

Looking at the new results, the build took 734 seconds, a little over 12 minutes. That is roughly 37% less than the sequential multistage build, with the same image size.

Results recap

The table below summarizes the build time and the image size in the three different examples.

Dockerfile                       Build time   Image size
prometheus-parallel-multistage   12.2 m       151MB
prometheus-multistage            19.3 m       151MB
prometheus                       19.9 m       2.38GB
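The relative improvements can be re-derived from the raw measurements. The helper below is a minimal sketch (the name percent_reduction is an assumption for this example); it takes a "before" and an "after" value in the same unit and prints the reduction as a percentage.

```shell
# Print the percentage reduction from a "before" value to an "after" value.
# percent_reduction is a hypothetical helper name for this sketch.
percent_reduction() {
    # LC_ALL=C keeps the decimal separator a dot regardless of the locale.
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Image size in MB, legacy image vs multistage image: prints 93.7
percent_reduction 2380 151
# Build time in seconds, multistage (19m19.57s) vs parallel multistage: prints 36.7
percent_reduction 1159.57 734.4
```

The inputs are the values reported by docker images and by the build outputs earlier in the article, with 2.38GB written as 2380MB.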

As you can see, the improvement is substantial in both build time and image size. The multistage parallel build approach is especially useful in production, where a smaller Docker image can make a real difference. All you have to do is keep in mind how Buildkit works, think about what can be parallelized in your Dockerfile and structure its stages accordingly. You can easily integrate Buildkit in your Docker build/test/tag/push pipeline (read here for the test part).

This is it!

I hope this was useful for you. Now go and refactor your old Dockerfiles!

Reach me on Twitter @gasparevitta and let me know your performance improvements.

You can find the code snippets on GitHub.

I write about Continuous Integration, Continuous Deployment, testing, and other cool stuff.
Gaspare Vitta on Twitter