Dockerfile Best Practice
General Guidelines
Write .dockerignore
When building an image, Docker has to prepare
contextfirst - gather all files that may be used in a process.Default context contains all files in a Dockerfile directory. Usually we don’t want to include there
.gitdirectory, downloaded libraries, and compiled files.Similar to
.gitignore,.dockerignorecan look like followings:.dockerignore
.git/ node_modules/ dist/
Container should do one thing (Decouple applications)
Technically, you can start multiple processes (such as database, frontend, backend applications) inside Docker container. However, such a big container will bite you
- long build times (change in e.g. frontend will force the whole backend to rebuild)
- very large images
- hard logging from many applications (no more simple stdout)
- wasteful horizontal scaling
- problems with zombie processes - you have to remember about proper init process
The proper way is
- prepare separate Docker image for each component
- use Docker Compose to easily start multiple containers at the same time.
Minimize the number of layers
Docker is all about layers.
- Each command in Dockerfile creates so-called layer
- Layers are cached and reused
- Invalidating cache of a single layer invalidates all subsequent layers
- Invalidation occurs after command change, if copied files are different, or build variable is other than previously
- Layers are immutable, so if we add a file in one layer, and remove it in the next one, image STILL contains that file (it’s just not available in the container)!
Minimizing the number of steps in your image may improve build and pull performance. Therefore it’s a cool best practice to combine several steps into one line, so that they’ll create only one intermediary image.
RUN, COPY and ADD instructions create layers to improve build performance. Other instructions create temporary intermediate images, and do not increase the size of the build.Example
❌
FROM alpine:3.4
RUN apk update
RUN apk add curl
RUN apk add vim
RUN apk add git
✅
FROM alpine:3.4
RUN apk update && \
apk add curl && \
apk add vim && \
apk add git
After building this Dockerfile the usual way you’ll find that this time it has only taken 2 steps instead of 5.
Sort multi-line arguments
Whenever possible, ease later changes by sorting multi-line arguments alphanumerically.
This helps to avoid duplication of packages and make the list much easier to update. This also makes PRs a lot easier to read and review. Adding a space before a backslash (
\) helps as well.
Example
RUN apt-get update && apt-get install -y \
bzr \
cvs \
git \
mercurial \
subversion \
&& rm -rf /var/lib/apt/lists/*
Do not use ’latest’ base image tag
latesttag is a default one, used when no other tag is specified. E.g. our instructionFROM ubuntuin reality does exactly the same asFROM ubuntu:latestBut ’latest’ tag will point to a different image when a new version will be released, and your build may break. 😢
So, unless you are creating a generic Dockerfile that must stay up-to-date with the base image, provide specific tag!!!
Remove unneeded files after each RUN step
Let’s assume we updated apt-get sources, installed few packages required for compiling others, downloaded and extracted archives. We obviously don’t need them in our final images, so better let’s make a cleanup.
E.g. we can remove apt-get lists (created by apt-get update):
FROM ubuntu:16.04
RUN apt-get update \
&& apt-get install -y nodejs \
# added lines
&& rm -rf /var/lib/apt/lists/*
ADD . /app
RUN cd /app && npm install
CMD npm start
Use proper base image
Use specilaized image instead of general-purpose base image. For example, if we just want to run node application, instead of using ubuntu as our base image, we should use node (or even alpine version).
Set WORKDIR and CMD
WORKDIRcommand changes default directory, where we run ourRUN/CMD/ENTRYPOINTcommands.CMDis a default command run after creating container without other command specified. It’s usually the most frequently performed action.
Example
FROM node:7-alpine
WORKDIR /app
ADD . /app
RUN npm install
CMD ["npm", "start"]
Use ENTRYPOINT (optional)
Use “exec” inside entrypoint script
Prefer COPY over ADD
COPYis simpler.ADDhas some logic for downloading remote files and extracting archives (more see official documentation)- Just stick with
COPY!💪
Use multi-stage builds
- Multi-stage builds allow you to drastically reduce the size of your final image, without struggling to reduce the number of intermediate layers and files.
- Because an image is built during the final stage of the build process, you can minimize image layers by leveraging build cache.
- If your build contains several layers, you can order them from the less frequently changed (to ensure the build cache is reusable) to the more frequently changed:
- Install tools you need to build your application
- Install or update library dependencies
- Generate your application
Leverage build cache
- When building an image, Docker steps through the instructions in your
Dockerfile, executing each in the order specified. As each instruction is examined, Docker looks for an existing image in its cache that it can reuse, rather than creating a new (duplicate) image. - The basic rules that Docker follows are outlined below:
- Starting with a parent image that is already in the cache, the next instruction is compared against all child images derived from that base image to see if one of them was built using the exact same instruction. If not, the cache is invalidated.
- In most cases, simply comparing the instruction in the
Dockerfilewith one of the child images is sufficient. - For the
ADDandCOPYinstructions, the contents of the file(s) in the image are examined and a checksum is calculated for each file. The last-modified and last-accessed times of the file(s) are not considered in these checksums. During the cache lookup, the checksum is compared against the checksum in the existing images. If anything has changed in the file(s), such as the contents and metadata, then the cache is invalidated. - Aside from the
ADDandCOPYcommands, cache checking does not look at the files in the container to determine a cache match. - Once the cache is invalidated, all subsequent
Dockerfilecommands generate new images and the cache is not used.
Specify default environment variables, ports and volumes
It’s a good practice to set default values in Dockerfile, which can make our dockerfile more consistent and flexible.
Example
FROM node:7-alpine
# ENV variables required during build
ENV PROJECT_DIR=/app
WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install
COPY . $PROJECT_DIR
ENV MEDIA_DIR=/media \
NODE_ENV=production \
APP_PORT=3000
VOLUME $MEDIA_DIR
EXPOSE $APP_PORT
Add metadata to image using LABEL
Create ephemeral containers
The image defined by your Dockerfile should generate containers that are as ephemeral as possible. By “ephemeral”, we mean that the container can be stopped and destroyed, then rebuilt and replaced with an absolute minimum set up and configuration.
Don’t install unnecessary packages
To reduce complexity, dependencies, file sizes, and build times, avoid installing extra or unnecessary packages just because they might be “nice to have.”
Dockerfile instructions
FROM
- Whenever possible, use current official images as the basis for your images.
- We recommend the Alpine image as it is tightly controlled and small in size (currently under 5 MB), while still being a full Linux distribution.
LABEL
You can add labels to your image to help organize images by project, record licensing information, to aid in automation, or for other reasons. (More see: Understanding object labels)
Acceptable formats
One label per line
LABEL com.example.version="0.0.1-beta" LABEL com.example.release-date="2015-02-12"Multiple labels on one line
LABEL com.example.version="0.0.1-beta" com.example.release-date="2015-02-12"Set multiple labels at once, using line-continuation characters to break long lines
LABEL com.example.version="0.0.1-beta" \ com.example.release-date="2015-02-12"
RUN
Split long or complex RUN statements on multiple lines separated with backslashes to make your Dockerfile more readable, understandable, and maintainable.
apt-get
Avoid
RUN apt-get upgradeanddist-upgradeAlways combine
RUN apt-get updatewithapt-get installin the sameRUNstatement. This ensures your Dockerfile installs the latest package versions with no further coding or manual intervention. (“cache busting”)E.g.
RUN apt-get update && apt-get install -y \ package-bar \ package-baz \ package-foo \ && rm -rf /var/lib/apt/lists/*You can also achieve cache-busting by specifying a package version. (“cache busting”)
E.g.
RUN apt-get update && apt-get install -y \ package-bar \ package-baz \ package-foo=1.3.*
Below is a well-formed RUN instruction that demonstrates all the apt-get recommendations.
RUN apt-get update && apt-get install -y \
aufs-tools \
automake \
build-essential \
curl \
dpkg-sig \
libcap-dev \
libsqlite3-dev \
mercurial \
reprepro \
ruby1.9.1 \
ruby1.9.1-dev \
s3cmd=1.1.* \
&& rm -rf /var/lib/apt/lists/*
CMD
- The
CMDinstruction should be used to run the software contained in your image, along with any arguments. CMDshould almost always be used in the form ofCMD ["executable", "param1", "param2"…].CMDshould rarely be used in the manner ofCMD ["param", "param"]in conjunction withENTRYPOINT, unless you and your expected users are already quite familiar with howENTRYPOINTworks.
EXPOSE
- The
EXPOSEinstruction indicates the ports on which a container listens for connections. - You should use the common, traditional port for your application. E.g.
- an image containing the Apache web server would use
EXPOSE 80 - an image containing MongoDB would use
EXPOSE 27017
- an image containing the Apache web server would use
ENV
To make new software easier to run, you can use
ENVto update thePATHenvironment variable for the software your container installs.ENVinstruction is also useful for providing required environment variables specific to services you wish to containerizeENVcan also be used to set commonly used version numbers so that version bumps are easier to maintainE.g.
ENV PG_MAJOR=9.3 ENV PG_VERSION=9.3.4 RUN curl -SL https://example.com/postgres-$PG_VERSION.tar.xz | tar -xJC /usr/src/postgress && … ENV PATH=/usr/local/postgres-$PG_MAJOR/bin:$PATHEach
ENVline creates a new intermediate layer, just likeRUNcommands- This means that even if you unset the environment variable in a future layer, it still persists in this layer and its value can’t be dumped.
- You can separate your commands with
;or&&. Using\as a line continuation character for Linux Dockerfiles improves readability. - Or you could also put all of the commands into a shell script and have the
RUNcommand just run that shell script.
ADD or COPY
COPYis preferred.If you have multiple
Dockerfilesteps that use different files from your context,COPYthem individually, rather than all at once. This ensures that each step’s build cache is only invalidated (forcing the step to be re-run) if the specifically required files change.E.g.
COPY requirements.txt /tmp/ RUN pip install --requirement /tmp/requirements.txt COPY . /tmp/This results in fewer cache invalidations for the
RUNstep, than if you put theCOPY . /tmp/before it.Using
ADDto fetch packages from remote URLs is strongly discouraged 🙅♂️; you should usecurlorwgetinstead.That way you can delete the files you no longer need after they’ve been extracted and you don’t have to add another layer in your image.
E.g.
❌
ADD https://example.com/big.tar.xz /usr/src/things/ RUN tar -xJf /usr/src/things/big.tar.xz -C /usr/src/things RUN make -C /usr/src/things all✅
RUN mkdir -p /usr/src/things \ && curl -SL https://example.com/big.tar.xz \ | tar -xJC /usr/src/things \ && make -C /usr/src/things all
For other items (files, directories) that do not require
ADD’s tar auto-extraction capability, you should always useCOPY.
ENTRYPOINT
The best use for
ENTRYPOINTis to set the image’s main command, allowing that image to be run as though it was that command (and then useCMDas the default flags).Example: image for the command line tool
s3cmdENTRYPOINT ["s3cmd"] CMD ["--help"]Now the image can be run like this to show the command’s help:
docker run s3cmdor use the right parameters to execute a command:
docker run s3cmd ls s3://mybucket
The
ENTRYPOINTinstruction can also be used in combination with a helper script, allowing it to function in a similar way to the command above, even when starting the tool may require more than one step.
WORKDIR
- For clarity and reliability, you should always use absolute paths for your
WORKDIR. - You should use
WORKDIRinstead of proliferating instructions likeRUN cd … && do-something, which are hard to read, troubleshoot, and maintain.