Unit 1, Lesson 1

Container Run Layers

In Dockerfiles, fewer RUN commands are better. Why? For that, we need to talk about layers. Which is exactly what we’re going to do in this video!

Video transcript & code

Recently we wrote this very simple Dockerfile to define the development environment for our Rails app that's under construction. It's based on a published Ruby-oriented container image.

The only customization we've done so far is to install the Yarn package manager for JavaScript modules.

FROM ruby:2.7.2
RUN apt-get update
RUN apt-get install -y yarnpkg
RUN ln -s /usr/bin/yarnpkg /usr/local/bin/yarn

Our Yarn installation actually consists of three steps:

  • Running apt-get update to fetch the current list of packages
  • Installing the Debian package
  • And adding a symlink to invoke Yarn using its more conventional command name instead of the Debian-customized one.

While this Dockerfile is perfectly functional, it isn't written in the recommended style. To understand what's wrong with it, we need to understand layers.

Let's jump over to Photoshop for a minute. Bear with me here, I promise this is relevant.

If you've ever used Photoshop or a program like it, you probably know that images in these apps are built up using layers.

For instance, I have used my amazing art skills to render my impression of a scene from my beloved Smoky Mountains.

It starts with a layer for the sky.

Then a layer for the clouds.

Then one for the mountains,

And finally one for the foreground trees.

I know what you're thinking: I missed my calling when I chose to become a developer instead of an artist!

But let's stay on topic. The important thing to notice here is that each layer overlays the one below it. Where the paint from a higher layer crosses over the scene in a lower layer, the higher layer obscures, or takes precedence over, the lower one.

Docker images are also built up out of layers, implemented using a technology called a union filesystem. The FROM line in our Dockerfile establishes a "base layer". Then each RUN line implicitly creates a new layer on top of the last. Whatever filesystem changes the command in that layer makes are overlaid on the filesystem from the layer beneath.

FROM ruby:2.7.2
RUN apt-get update
RUN apt-get install -y yarnpkg
RUN ln -s /usr/bin/yarnpkg /usr/local/bin/yarn
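To make the overlay idea a bit more concrete, here is a toy model in plain shell. It is only a sketch: the layer1 and layer2 directories and the lookup function are made up for illustration, and Docker's real storage drivers do not work this way internally. What it does show is the rule from the Photoshop analogy: when two layers both contain a file, the higher one wins.

```shell
# Toy model of a union filesystem: each "layer" is a plain directory, and a
# lookup checks the layers from top to bottom. Illustration only; the names
# layer1, layer2 and lookup are hypothetical and not part of Docker.
workdir=$(mktemp -d)
mkdir -p "$workdir/layer1/etc" "$workdir/layer2/etc"

# The lower layer provides two files.
echo "from layer1" > "$workdir/layer1/etc/motd"
echo "from layer1" > "$workdir/layer1/etc/hostname"

# The upper layer replaces just one of them.
echo "from layer2" > "$workdir/layer2/etc/motd"

# lookup PATH: print the file's content from the highest layer that has it.
lookup() {
  for layer in layer2 layer1; do
    if [ -f "$workdir/$layer/$1" ]; then
      cat "$workdir/$layer/$1"
      return 0
    fi
  done
  return 1
}

lookup etc/motd       # prints "from layer2" (the upper layer wins)
lookup etc/hostname   # prints "from layer1" (falls through to the lower layer)
```

The temporary directory in $workdir can be deleted once you're done experimenting.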

We can see this a little more clearly if we run the docker image history command on our app development image.

$ docker image history sixmilebridge_app
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
7880e5f849f6   2 weeks ago   /bin/sh -c ln -s /usr/bin/yarnpkg /usr/local…   16B
7c951faa7e66   2 weeks ago   /bin/sh -c apt-get install -y yarnpkg           61.8MB
69f7cd276fd8   2 weeks ago   /bin/sh -c apt-get update                       17.5MB
7e58098089a4   4 weeks ago   /bin/sh -c #(nop)  CMD ["irb"]                  0B
<missing>      4 weeks ago   /bin/sh -c mkdir -p "$GEM_HOME" && chmod 777…   0B
<missing>      4 weeks ago   /bin/sh -c #(nop)  ENV PATH=/usr/local/bundl…   0B
<missing>      4 weeks ago   /bin/sh -c #(nop)  ENV BUNDLE_SILENCE_ROOT_W…   0B
<missing>      4 weeks ago   /bin/sh -c #(nop)  ENV GEM_HOME=/usr/local/b…   0B
<missing>      4 weeks ago   /bin/sh -c set -eux;   savedAptMark="$(apt-m…   38.6MB
<missing>      4 weeks ago   /bin/sh -c #(nop)  ENV RUBY_DOWNLOAD_SHA256=…   0B
<missing>      4 weeks ago   /bin/sh -c #(nop)  ENV RUBY_VERSION=2.7.2       0B
<missing>      4 weeks ago   /bin/sh -c #(nop)  ENV RUBY_MAJOR=2.7           0B
<missing>      4 weeks ago   /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0B
<missing>      4 weeks ago   /bin/sh -c set -eux;  mkdir -p /usr/local/et…   45B
<missing>      4 weeks ago   /bin/sh -c set -ex;  apt-get update;  apt-ge…   510MB
<missing>      4 weeks ago   /bin/sh -c apt-get update && apt-get install…   146MB
<missing>      4 weeks ago   /bin/sh -c set -ex;  if ! command -v gpg > /…   17.5MB
<missing>      4 weeks ago   /bin/sh -c set -eux;  apt-get update;  apt-g…   16.5MB
<missing>      5 weeks ago   /bin/sh -c #(nop)  CMD ["bash"]                 0B
<missing>      5 weeks ago   /bin/sh -c #(nop) ADD file:6014cd9d7466825f8…   114MB

Most of the lower layers in this history reflect the commands that were used to build up the base ruby image.

But towards the top we can see our RUN directives, listed newest first. Reading upward from the bottom of that group: first an apt-get update, then an apt-get install, and then an ln -s.

Along with each, we can see the total size of the files added to the image by this command. Particularly notable is the fact that the apt-get update command added 17.5 megabytes. That's all from the package listings that apt-get downloaded from various repositories. We needed that metadata in order to install yarn along with all of its dependencies.

Now, for our devcontainer, we're not terribly sensitive about file sizes. But if we were creating an image for production or for others to build on top of, the polite thing to do would be to minimize the size of our container. One way to slim it down is to remove the apt-get package metadata once we're done installing packages.

To do that, we might think to add a new line to our Dockerfile where we delete the Debian package listing directories.

FROM ruby:2.7.2
RUN apt-get update
RUN apt-get install -y yarnpkg
RUN ln -s /usr/bin/yarnpkg /usr/local/bin/yarn
RUN rm -rf /var/lib/apt/lists/*

But here's the problem with that: remember, every RUN command adds a new layer on top. It doesn't remove anything from the layers underneath. Adding a layer that deletes files does hide those files in the final container's filesystem. But it doesn't actually do anything to cut down on the total size of the image! All of the lower layers, including the 17.5 megabytes of package listings, are still there.
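Here is a rough way to picture that in plain shell. This is a toy model under loud assumptions: each directory stands in for one layer, a 100 KB file of zeros stands in for the package listings, and the marker file name is invented for this sketch. It is not how Docker stores layers on disk, but it captures the point that a deletion recorded in an upper layer leaves the lower layer's bytes in place.

```shell
# Toy model, not Docker itself: each directory stands in for one image layer.
workdir=$(mktemp -d)
mkdir -p "$workdir/layer1" "$workdir/layer2"

# Layer 1: a stand-in for "apt-get update" writing 100 KB of package lists.
dd if=/dev/zero of="$workdir/layer1/lists" bs=1024 count=100 2>/dev/null

# Layer 2: a stand-in for "rm -rf". The deletion is recorded as a small marker
# in the new layer (hypothetical file name); layer 1 is left untouched.
: > "$workdir/layer2/.deleted-lists"

# The merged view hides the file...
if [ -e "$workdir/layer2/.deleted-lists" ]; then visible=no; else visible=yes; fi

# ...but the bytes are still stored in the lower layer.
stored_bytes=$(wc -c < "$workdir/layer1/lists" | tr -d ' ')

echo "visible in final filesystem: $visible"         # prints "no"
echo "bytes still stored in layer 1: $stored_bytes"  # prints 102400
```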

But there's another possibility.

Instead of splitting this package installation across multiple RUN commands, we can collapse all of them into a single command using backslashes to join multiple lines, and the shell && operator to chain a series of subcommands together.

FROM ruby:2.7.2
RUN apt-get update \
  && apt-get install -y yarnpkg \
  && ln -s /usr/bin/yarnpkg /usr/local/bin/yarn \
  && rm -rf /var/lib/apt/lists/*

Using && instead of semicolons to join the commands ensures that the whole compound command will bail out early if something goes wrong in one of the subcommands.
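We can try the difference between the two operators in an ordinary shell, no Docker required. In this small sketch, the false command stands in for a step that fails, such as an apt-get install that can't find its package; the variable names are just for illustration.

```shell
# With semicolons, the shell runs every command regardless of earlier failures.
# The exit status is that of the last command, so the failure goes unnoticed.
semicolon_status=0
semicolon_output=$(sh -c 'false; echo "still ran"') || semicolon_status=$?

# With &&, the chain stops at the first failing command and reports failure.
and_status=0
and_output=$(sh -c 'false && echo "still ran"') || and_status=$?

echo "semicolons: output='$semicolon_output' status=$semicolon_status"
# prints: semicolons: output='still ran' status=0
echo "&&:         output='$and_output' status=$and_status"
# prints: &&:         output='' status=1
```

A RUN step that exits with a non-zero status fails the build, so the && version stops a broken install from quietly producing an image.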

Now let's tell docker-compose to rebuild our container:

$ docker-compose build

With that done, we can run docker image history again and see that all four commands now create just a single layer. The size of this layer, 61.8MB, is the same as the apt-get install layer from our earlier version: the 17.5 megabytes of package listings never make it into the image.

$ docker image history sixmilebridge_app
IMAGE          CREATED         CREATED BY                                      SIZE      COMMENT
21aa368b002c   5 minutes ago   /bin/sh -c apt-get update   && apt-get insta…   61.8MB    
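As a quick back-of-the-envelope check, we can add up the sizes reported in the two history listings. The numbers below are copied from those listings; the variable names are just for this sketch, and the 16-byte symlink layer is ignored because it is too small to show up at this precision.

```shell
# Layer sizes in MB, copied from the two docker image history listings.
# The 16-byte symlink layer is too small to matter and is left out.
before=$(awk 'BEGIN { printf "%.1f", 17.5 + 61.8 }')
after=61.8
saved=$(awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f", b - a }')

echo "three separate RUN layers: ${before}MB"   # 79.3MB
echo "one combined RUN layer:    ${after}MB"    # 61.8MB
echo "saved:                     ${saved}MB"    # 17.5MB
```

The saving matches the size of the old apt-get update layer, which is what we'd expect if the package listings were the only thing removed.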

Going back to our Dockerfile, let's talk about another advantage of this compound-run-command pattern: by collapsing all the commands related to installing yarn, we've made it very clear that they are all part of a single logical operation. If we later come back to re-organize the Dockerfile, we'll know to move these commands around as a unit.

Now, real talk, for the purposes of our development container we're not actually going to preserve this Dockerfile exactly as we've re-written it here. Total filesize is not the same level of concern in devcontainers as it is for deployment. And among other things, we're eventually going to want to install more than just the yarn package.

But you'll often see this pattern in other people's Dockerfiles, and so it's important to understand why it exists and what trade-offs are involved. Happy hacking!
