Reducing image size
In the previous example, our final image contained:

- our `hello` program
- its source code
- the compiler

Only the first one is strictly necessary.
We are going to see how to obtain an image without the superfluous components.
Can't we remove superfluous files with RUN?
What happens if we do one of the following commands?

- `RUN rm -rf ...`
- `RUN apt-get remove ...`
- `RUN make clean ...`

This adds a layer which removes a bunch of files.

But the previous layers (which added the files) still exist.
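For instance, here is a minimal demonstration (the file name and size are made up for illustration): we create a 100 MB file in one layer, then remove it in the next.

```dockerfile
FROM ubuntu
# This layer adds roughly 100 MB to the image.
RUN dd if=/dev/zero of=/bigfile bs=1M count=100
# This layer merely records the deletion; the 100 MB above are still shipped.
RUN rm /bigfile
```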
Removing files with an extra layer
When downloading an image, all the layers must be downloaded.
| Dockerfile instruction | Layer size | Image size |
|---|---|---|
| `FROM ubuntu` | Size of base image | Size of base image |
| `...` | ... | Sum of this layer + all previous ones |
| `RUN apt-get install somepackage` | Size of files added (e.g. a few MB) | Sum of this layer + all previous ones |
| `...` | ... | Sum of this layer + all previous ones |
| `RUN apt-get remove somepackage` | Almost zero (just metadata) | Same as previous one |
Therefore, `RUN rm` does not reduce the size of the image or free up disk space.
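We can verify this with `docker history`, which lists the size of each layer. Assuming the demonstration Dockerfile above was built and tagged as `rm-demo` (a name we made up):

```bash
docker build -t rm-demo .
docker history rm-demo
# The "dd" layer still weighs about 100 MB; the "rm" layer is almost 0 B.
```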
Removing unnecessary files
Various techniques are available to obtain smaller images:

- collapsing layers,
- adding binaries that are built outside of the Dockerfile,
- squashing the final image,
- multi-stage builds.

Let's review them quickly.
Collapsing layers
You will frequently see Dockerfiles like this:

```dockerfile
FROM ubuntu
RUN apt-get update && apt-get install xxx && ... && apt-get remove xxx && ...
```
Or the (more readable) variant:

```dockerfile
FROM ubuntu
RUN apt-get update \
 && apt-get install xxx \
 && ... \
 && apt-get remove xxx \
 && ...
```
This `RUN` command gives us a single layer.

The files that are added, then removed in the same layer, do not grow the layer size.
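Applied to our `hello` program, a collapsed-layer Dockerfile could look like the following sketch (the cleanup steps shown are one plausible combination, not the only one):

```dockerfile
FROM ubuntu
COPY hello.c /
# Install the compiler, build, and clean up, all in a single layer,
# so the compiler never contributes to the final image size.
RUN apt-get update \
 && apt-get install -y build-essential \
 && make hello \
 && apt-get remove -y build-essential \
 && apt-get autoremove -y \
 && rm -rf /var/lib/apt/lists/* /hello.c
CMD /hello
```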
Collapsing layers: pros and cons
Pros:

- works on all versions of Docker
- doesn't require extra tools

Cons:

- not very readable
- some unnecessary files might still remain if the cleanup is not thorough
- that layer is expensive (slow to build)
Building binaries outside of the Dockerfile
This results in a Dockerfile looking like this:

```dockerfile
FROM ubuntu
COPY xxx /usr/local/bin
```

Of course, this implies that the file `xxx` exists in the build context.

That file has to exist before you can run `docker build`.
For instance, it can:
- exist in the code repository,
- be created by another tool (script, Makefile…),
- be created by another container image and extracted from the image (see the sketch below).
See for instance the busybox official image or this older busybox image.
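To extract a file from an image without pushing it to a registry, one common pattern is `docker create` + `docker cp`. A sketch, with made-up image and path names:

```bash
# Create (but don't start) a container from the builder image,
# copy the binary out, then discard the container.
docker create --name extract some-builder-image
docker cp extract:/usr/local/bin/xxx ./xxx
docker rm extract
```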
Building binaries outside: pros and cons
Pros:

- final image can be very small

Cons:

- requires an extra build tool
- we're back in dependency hell and "works on my machine"

Cons, if the binary is added to the code repository:

- breaks portability across different platforms
- grows repository size a lot if the binary is updated frequently
Squashing the final image
The idea is to transform the final image into a single-layer image.
This can be done in (at least) two ways.
- Activate experimental features and squash the final image:

  ```bash
  docker image build --squash ...
  ```

- Export/import the final image:

  ```bash
  docker build -t temp-image .
  docker run --entrypoint true --name temp-container temp-image
  docker export temp-container | docker import - final-image
  docker rm temp-container
  docker rmi temp-image
  ```
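Note that `docker export` only captures the container's filesystem: image metadata such as `CMD` and `ENV` is lost. `docker import` can re-apply instructions with `--change`; for example, for our `hello` image:

```bash
docker export temp-container \
  | docker import --change 'CMD ["/hello"]' - final-image
```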
Squashing the image: pros and cons
Pros:

- single-layer images are smaller and faster to download
- removed files no longer take up storage and network resources

Cons:

- we still need to actively remove unnecessary files
- the squash operation can take a lot of time (on big images)
- the squash operation does not benefit from caching
  (even if we change just a tiny file, the whole image needs to be re-squashed)
Multi-stage builds
Multi-stage builds allow us to break a single build into multiple stages.
Each stage is a separate image, and can copy files from previous stages.
We’re going to see how they work in more detail.
Multi-stage builds
- At any point in our `Dockerfile`, we can add a new `FROM` line.
- This line starts a new stage of our build.
- Each stage can access the files of the previous stages with `COPY --from=...`.
- When a build is tagged (with `docker build -t ...`), the last stage is tagged.
- Previous stages are not discarded: they will be used for caching, and can be referenced.
Multi-stage builds in practice
- Each stage is numbered, starting at `0`.
- We can copy a file from a previous stage by indicating its number, e.g.:

  ```dockerfile
  COPY --from=0 /file/from/first/stage /location/in/current/stage
  ```

- We can also name stages, and reference these names:

  ```dockerfile
  FROM golang AS builder
  RUN ...
  FROM alpine
  COPY --from=builder /go/bin/mylittlebinary /usr/local/bin/
  ```
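As a side note, a named stage can also be built (and tagged) on its own with the `--target` flag, e.g. to inspect the build environment (the tag name here is our choice):

```bash
docker build --target builder -t my-build-env .
```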
Multi-stage builds for our C program
We will change our Dockerfile to:

- give a nickname to the first stage: `compiler`
- add a second stage using the same `ubuntu` base image
- add the `hello` binary to the second stage
- make sure that `CMD` is in the second stage
The resulting Dockerfile is on the next slide.
Multi-stage build Dockerfile
Here is the final Dockerfile:

```dockerfile
FROM ubuntu AS compiler
RUN apt-get update
RUN apt-get install -y build-essential
COPY hello.c /
RUN make hello

FROM ubuntu
COPY --from=compiler /hello /hello
CMD /hello
```

Let's build it, and check that it works correctly:

```bash
docker build -t hellomultistage .
docker run hellomultistage
```
Comparing single/multi-stage build image sizes
List our images with `docker images`, and check the size of:

- the `ubuntu` base image,
- the single-stage `hello` image,
- the multi-stage `hellomultistage` image.
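For instance, assuming the single-stage image was tagged `hello`:

```bash
docker images ubuntu
docker images hello
docker images hellomultistage
```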
We can achieve even smaller images if we use smaller base images.

However, if we use common base images (e.g. if we standardize on `ubuntu`), these common images will be pulled only once per node, so they are virtually "free."
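For the curious, here is one possible sketch going further: linking `hello` statically (an assumption about our build, passed via `LDFLAGS`) so that it can run on the empty `scratch` base image.

```dockerfile
FROM ubuntu AS compiler
RUN apt-get update
RUN apt-get install -y build-essential
COPY hello.c /
# Static linking: the binary won't need ubuntu's libc at run time.
RUN make hello LDFLAGS=-static

FROM scratch
COPY --from=compiler /hello /hello
# scratch has no shell, so CMD must use the exec (JSON) form.
CMD ["/hello"]
```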