Building Distroless Python Docker Images
When running containerized applications in production, security is often
the top priority. For applications written in compiled languages such as
Go, you may notice that the images often do not have a working shell, so
you cannot exec into the container to debug issues.
For example, if you have a Kubernetes cluster and try to run the command
kubectl exec -it -n kube-system coredns-<pod-id> -- sh
to exec into the coredns pod, you will get an error like this:
error: Internal error occurred: Internal error occurred: error executing
command in container: failed to exec in container: failed to start exec:
OCI runtime exec failed: exec failed: unable to start container process:
exec: "sh": executable file not found in $PATH: unknown
These shell-less, or distroless, images offer a few benefits:
- They are usually smaller, because they contain only the libraries and dependencies needed to run your application.
- They reduce the attack surface by removing unnecessary tools. Without a working shell, it is harder for attackers to exploit command injection.
Building a distroless image for a Python application is a bit more involved. In this guide, I will walk through the steps to build and run a distroless Python application.
If you just want to see the final working example, jump to this section or check out this repository.
Now let’s get started!
Project structure
Below is the project structure. In this example, we are going to spin up a simple FastAPI application in a Docker container. Adjust the structure for your specific use case.
Since this is a simple application, we use requirements.txt to manage
our dependencies. For more complex applications or libraries, you should
generally use a pyproject.toml file to package your application.
project-root/
├── app/
│ ├── data.py
│ ├── __init__.py
│ └── main.py
├── Dockerfile
└── requirements.txt
Application code
The application code is just a simple FastAPI application. To illustrate some issues that we might encounter with a distroless image, I created two special endpoints.
Here are the dependencies for the application in the requirements.txt file:
# requirements.txt
fastapi[standard]
pydantic
scikit-learn
Here is the FastAPI application code in app/main.py:
# app/main.py
from fastapi import FastAPI

from app.data import get_iris_data

app = FastAPI()


@app.get('/')
async def hello():
    return {'message': 'Hello World'}


@app.get('/touch-file/{file_name}')
async def touch_file(file_name: str):
    # Opening in append mode creates the file if it does not exist
    with open(file_name, 'a'):
        pass
    return {'message': f'File {file_name} touched'}


@app.get('/data')
async def fetch_data():
    return {'data': get_iris_data()}
Here is the data module in app/data.py:
# app/data.py
from sklearn.datasets import fetch_california_housing


def get_iris_data() -> list:
    # Despite its name, this helper returns rows from the
    # California housing dataset (first 200 rows only).
    housing = fetch_california_housing()
    return housing.data[:200].tolist()
The first endpoint, /touch-file/{file_name}, simulates the shell command
touch <file_name>, which creates an empty file with the given name if
the file doesn’t exist. This endpoint tests whether your application can
create files in the container’s filesystem.
The second endpoint, /data, fetches the California housing dataset using
the scikit-learn library. When fetching larger datasets, scikit-learn
downloads them to the user’s home directory by default.
By default, a distroless image does not have a usable home directory for our
application, so some additional work is needed to make this endpoint work
correctly.
Now let’s walk through the steps to build a distroless image for this application. The application code will not change, but we will build the Dockerfile step by step.
First attempt
To build a distroless image for a Python application, we need to use a
multi-stage build. Essentially, we first install all the dependencies in a
build image, then copy the necessary files to a distroless image. For the
build stage, we will use python:3.11-slim, and for the distroless stage, we
will use gcr.io/distroless/python3. We use python:3.11-slim because the
Python version in gcr.io/distroless/python3 is currently 3.11.
The Python versions must match to prevent incompatibility issues. For more
info about the distroless base images, see here.
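If you want to confirm the version match rather than assume it, a small check like the one below can be run with the interpreter of each stage. This is only a sketch: version_matches is a hypothetical helper, and the 3.11 value is simply taken from the text above.

```python
import sys

# Assumption: 3.11 comes from the text above; update it if the
# distroless base image moves to a newer Python.
EXPECTED_VERSION = (3, 11)


def version_matches(expected=EXPECTED_VERSION, actual=None):
    """Return True when the major.minor of `actual` equals `expected`.

    When `actual` is omitted, the running interpreter's version is used.
    """
    if actual is None:
        actual = sys.version_info[:2]
    return tuple(actual) == tuple(expected)
```

Calling version_matches() in both the builder and the distroless image should return True in each; a False in either one suggests the base images have drifted apart.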
Let’s try our naive approach first.
FROM python:3.11-slim AS builder
ENV VENV=/opt/venv
RUN python -m venv $VENV
COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
$VENV/bin/pip install --no-cache-dir -r requirements.txt
FROM gcr.io/distroless/python3
ENV VENV=/opt/venv
WORKDIR /app
COPY --from=builder $VENV $VENV
COPY app app
ENV PATH="$VENV/bin:$PATH"
ENTRYPOINT [ "fastapi", "run", "app/main.py" ]
Notice that we’ve set up a virtual environment in the builder stage and
copied it to the distroless image. We also prepend the venv’s bin directory
to the PATH environment variable so that the fastapi command resolves to the
venv’s executable, which runs with the venv’s Python and its installed
libraries.
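To see why the order of entries in PATH matters, here is a minimal sketch using the standard library's shutil.which on a POSIX system. The demo-tool name and both temporary directories are made up for illustration; they stand in for the venv's bin directory and a system bin directory.

```python
import os
import shutil
import tempfile


def resolve_command(name, search_path):
    """Return the first executable called `name` on `search_path`, or None."""
    return shutil.which(name, path=search_path)


def demo_path_precedence():
    """Create two directories that both contain `demo-tool` and resolve it."""
    with tempfile.TemporaryDirectory() as venv_bin, \
            tempfile.TemporaryDirectory() as system_bin:
        for directory in (venv_bin, system_bin):
            script = os.path.join(directory, "demo-tool")
            with open(script, "w") as handle:
                handle.write("#!/bin/sh\n")
            os.chmod(script, 0o755)  # mark the file as executable
        # The venv-like directory comes first, as in ENV PATH="$VENV/bin:$PATH"
        search_path = venv_bin + os.pathsep + system_bin
        found = resolve_command("demo-tool", search_path)
        return found == os.path.join(venv_bin, "demo-tool")
```

demo_path_precedence() returns True because the first matching directory wins, which is the same reason the venv's fastapi is picked up in the container.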
With this Dockerfile, let’s try to build and run the image:
docker build -t python-distroless-server .
docker run -p 8000:8000 python-distroless-server
You should get an error like this:
exec /opt/venv/bin/fastapi: no such file or directory
What happened here? You might think that we can open a shell in the container to see what went wrong, but if you try to run
docker run -it --rm --entrypoint sh python-distroless-server
You get:
docker: Error response from daemon: failed to create task for container: failed
to create shim task: OCI runtime create failed: runc create failed: unable to
start container process: error during container init: exec: "sh": executable
file not found in $PATH: unknown
How do we find out what went wrong? Fortunately, there are gcr.io/distroless
images that include a working shell for debugging purposes.
In the Dockerfile, add the debug tag to the distroless image:
...
FROM gcr.io/distroless/python3:debug
...
Now rebuild the image and run sh to see what went wrong:
docker build -t python-distroless-server .
docker run -it --rm --entrypoint sh python-distroless-server
From the above error it seems that the fastapi executable is not found.
So let’s check.
/app # ls /opt/venv/bin/fastapi
/opt/venv/bin/fastapi
The fastapi executable is indeed there, so what is going on? Executables
installed by pip into a venv are small Python scripts whose first line (the
shebang) points at the venv’s Python interpreter, so let’s check
the Python executable for the venv.
/app # ls -h /opt/venv/bin/python
ls: /opt/venv/bin/python: No such file or directory
/app # ls -l /opt/venv/bin/python
lrwxrwxrwx 1 root root 21 Jun 7 20:23 /opt/venv/bin/python -> /usr/local/bin/python
Looks like the python executable is a symbolic link to
/usr/local/bin/python, and if your terminal supports color, you will see that
the symlink is red, meaning that the target does not exist!
/app # which python
/usr/bin/python
This distroless image’s Python executable is located at /usr/bin/python! We
will need to update the symbolic link in the venv to point to that path.
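The same diagnosis can be scripted. The sketch below assumes a POSIX filesystem; read_shebang and is_dangling_symlink are hypothetical helpers, and the demo builds a throwaway directory that mimics the broken venv instead of touching a real one.

```python
import os
import tempfile


def read_shebang(script_path):
    """Return the interpreter path from a script's shebang line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def is_dangling_symlink(path):
    """True when `path` is a symlink whose target does not exist."""
    # os.path.exists follows symlinks, so it is False for a broken link
    return os.path.islink(path) and not os.path.exists(path)


def demo_broken_venv():
    """Mimic a venv whose python symlink points at a missing interpreter."""
    with tempfile.TemporaryDirectory() as bin_dir:
        python_link = os.path.join(bin_dir, "python")
        os.symlink(os.path.join(bin_dir, "missing-python"), python_link)
        script = os.path.join(bin_dir, "fastapi")
        with open(script, "w") as handle:
            handle.write("#!" + python_link + "\nprint('hello')\n")
        interpreter = read_shebang(script)
        return interpreter == python_link, is_dangling_symlink(interpreter)
```

In the demo, the script exists but its interpreter does not, which mirrors the misleading "no such file or directory" error we saw for /opt/venv/bin/fastapi.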
Let’s fix this. Here’s the updated Dockerfile:
FROM python:3.11-slim AS builder
ENV VENV=/opt/venv
RUN python -m venv $VENV
COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
$VENV/bin/pip install --no-cache-dir -r requirements.txt
RUN ln -sf /usr/bin/python $VENV/bin/python
FROM gcr.io/distroless/python3
ENV VENV=/opt/venv
WORKDIR /app
COPY --from=builder $VENV $VENV
COPY app app
ENV PATH="$VENV/bin:$PATH"
ENTRYPOINT [ "fastapi", "run", "app/main.py" ]
Notice that we correct the symbolic link for the python executable
in the build stage, since the ln command is not available in the distroless
image.
docker build -t python-distroless-server .
docker run -d -p 8000:8000 python-distroless-server
The app should now be up and running. You can go to
http://localhost:8000/docs to test the endpoints.
All the endpoints should be working.
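If you prefer a script over clicking through the docs page, a rough smoke test could look like the following. It is a sketch only: check_endpoints and build_urls are hypothetical helpers, example.txt is an arbitrary file name, and the base URL assumes the port mapping used above.

```python
import urllib.error
import urllib.request

# Paths taken from app/main.py; example.txt is an arbitrary file name
ENDPOINTS = ["/", "/touch-file/example.txt", "/data"]


def build_urls(base_url):
    """Join the base URL with every endpoint path."""
    return [base_url.rstrip("/") + path for path in ENDPOINTS]


def check_endpoints(base_url="http://localhost:8000"):
    """Return a mapping of URL to HTTP status code (or an error string)."""
    results = {}
    for url in build_urls(base_url):
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                results[url] = response.status
        except urllib.error.HTTPError as err:
            # The server answered, but with a 4xx or 5xx status
            results[url] = err.code
        except OSError as err:
            # Connection refused, timeout, DNS failure and similar
            results[url] = f"unreachable: {err}"
    return results
```

Running check_endpoints() while the container is up should report 200 for each URL; a 500 on /data is the symptom to look for in the later steps.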
Run as non-root user
The Docker image is now working, but it is still running as the root user. Let’s make it more secure by running the application as a non-root user.
To do this, when copying files from build stage to the distroless stage, we will change the ownership of the files to a non-root user. We will simply use UID 1000 and GID 1000. You can change these values for your specific use cases.
FROM python:3.11-slim AS builder
ENV VENV=/opt/venv
RUN python -m venv $VENV
COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
$VENV/bin/pip install --no-cache-dir -r requirements.txt
RUN ln -sf /usr/bin/python $VENV/bin/python
FROM gcr.io/distroless/python3
ENV VENV=/opt/venv
COPY --chown=1000:1000 . /app
COPY --from=builder --chown=1000:1000 $VENV $VENV
WORKDIR /app
USER 1000:1000
ENV PATH="$VENV/bin:$PATH"
ENTRYPOINT [ "fastapi", "run", "app/main.py" ]
Notice that we did not create the user or group in either stage. The
build stage is only used for installing dependencies, and users and groups
created there are not carried over to the distroless image. In the distroless
image, commands such as useradd and groupadd are not available, so we cannot
create a user or a group. However, we can still run the container with a
specific user ID and group ID using USER.
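From Python's point of view, a process can run under a numeric UID that has no account behind it. The sketch below (POSIX only; describe_current_user is a hypothetical helper) reports the current IDs and falls back to None when the UID has no passwd entry, which is what I would expect for 1000 inside this image.

```python
import os
import pwd


def describe_current_user():
    """Return the numeric IDs of this process and the account name, if any."""
    uid = os.getuid()
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this UID, as with a bare numeric USER
        name = None
    return {"uid": uid, "gid": os.getgid(), "name": name}
```

The process works fine either way; the missing name only matters for tools that insist on looking up the account.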
You may also notice that I changed the copying of the source code slightly.
Instead of copying just the app directory, I copied the entire content
of the project root directory to the /app directory in the distroless image.
In addition, I set the working directory to /app afterwards. This is because
WORKDIR creates the directory as root if it does not already exist, and we
cannot change the ownership of the working directory in the distroless image
after the WORKDIR line, as the chown command does not exist there either. If
the /app directory is owned by root, the touch file endpoint will fail because
the user 1000:1000 does not have permission to create files in that directory.
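A quick way to reason about this failure is to ask whether the current user can create a file in a given directory at all. Here is a minimal sketch; can_create_files is a hypothetical helper and is not part of the application.

```python
import os
import tempfile


def can_create_files(directory):
    """Try to create (and automatically delete) a temporary file."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        # Covers PermissionError, FileNotFoundError and similar failures
        return False
    return True
```

Inside the container, can_create_files("/app") would be expected to return False when /app is owned by root and the process runs as 1000:1000, and True once the ownership is set with --chown.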
docker build -t python-distroless-server .
docker run -d -p 8000:8000 python-distroless-server
The app should be up and running as before at http://localhost:8000/docs,
but now as a non-root user. However, if you test the fetch data
endpoint, you will get an internal server error, and the logs will show
something like this:
PermissionError: [Errno 13] Permission denied: '/scikit_learn_data'
This is because scikit-learn (like many other libraries) writes
files to the user’s home directory by default, and our distroless image
does not have a home directory for the user 1000:1000. We can fix this by
creating a home directory in the build stage. The updated Dockerfile is:
FROM python:3.11-slim AS builder
ENV HOME=/home/1000
ENV VENV=/opt/venv
RUN mkdir $HOME
RUN python -m venv $VENV
COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
$VENV/bin/pip install --no-cache-dir -r requirements.txt
RUN ln -sf /usr/bin/python $VENV/bin/python
FROM gcr.io/distroless/python3
ENV HOME=/home/1000
ENV VENV=/opt/venv
COPY --chown=1000:1000 . /app
COPY --from=builder --chown=1000:1000 $HOME $HOME
COPY --from=builder --chown=1000:1000 $VENV $VENV
WORKDIR /app
USER 1000:1000
ENV PATH="$VENV/bin:$PATH"
ENTRYPOINT [ "fastapi", "run", "app/main.py" ]
Here I call the home directory /home/1000, but you can name it anything
you want. What’s important is that the environment variable HOME is set
in the distroless stage so that libraries such as scikit-learn know
where to write files.
If we now rebuild the image and run it again, everything should be working
as expected. The fetch data endpoint will write the dataset to the
/home/1000/scikit_learn_data directory.
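To see how HOME changes the download location, the sketch below mimics the lookup using only the standard library. As far as I can tell, scikit-learn expands ~/scikit_learn_data unless SCIKIT_LEARN_DATA is set; data_home_for and temporary_home are hypothetical helpers, and treating HOME as / for a user without a home directory is an assumption that matches the /scikit_learn_data path in the error above.

```python
import os
import posixpath
from contextlib import contextmanager


@contextmanager
def temporary_home(value):
    """Temporarily set HOME, restoring the previous value afterwards."""
    previous = os.environ.get("HOME")
    os.environ["HOME"] = value
    try:
        yield
    finally:
        if previous is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = previous


def data_home_for(home):
    """Expand ~/scikit_learn_data as it would resolve with HOME=`home`."""
    with temporary_home(home):
        return posixpath.expanduser(posixpath.join("~", "scikit_learn_data"))
```

With HOME set to /, the path expands to /scikit_learn_data, which the non-root user cannot create; with HOME set to /home/1000, it expands to /home/1000/scikit_learn_data.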
Optional step
We can optionally remove Python’s packaging tools (pip, setuptools, and wheel) from our image so that packages cannot be installed or modified at runtime. This also prevents any library from installing extra dependencies at runtime.
To do this, we add the following to the Dockerfile:
# Build stage
...
# After installing the dependencies
RUN rm -rf $VENV/lib/python3.11/site-packages/pip* \
$VENV/lib/python3.11/site-packages/setuptools* \
$VENV/lib/python3.11/site-packages/wheel*
...
# Distroless stage
...
Note that if for any reason you have libraries whose names start with
pip, setuptools, or wheel, you will need to adjust the command.
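To confirm that the removal worked, you could check which of these modules are still importable from the venv's interpreter. This is a small sketch; missing_modules is a hypothetical helper.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

After the rm step, missing_modules(["pip", "setuptools", "wheel"]) run inside the image should list the packages that were removed, while your real dependencies should not appear in the result.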
Conclusion
That’s it! We have successfully built a distroless Python Docker image.
Removing unnecessary tools and libraries makes our application more secure and
much harder to exploit through command injection. Note, however, that since
Python is an interpreted language, it is the developer’s responsibility to
ensure that the code itself is secure and does not call unsafe functions such
as eval, exec, os.system, or subprocess.run with user input. Our
distroless image can protect you against os.system and subprocess.run most of
the time, because there is no shell and few binaries to invoke, but if you use
eval or exec with user input, it is still possible
for an attacker to execute arbitrary Python code.
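When the goal is only to parse a value such as a list or a number from user input, ast.literal_eval is a common alternative to eval because it accepts literals and refuses everything else. The sketch below is one possible wrapper; parse_user_literal is a hypothetical name, and it does not protect against very large or deeply nested inputs.

```python
import ast


def parse_user_literal(text):
    """Parse a Python literal from untrusted text without executing code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError) as err:
        # Function calls, attribute access, names and similar are rejected
        raise ValueError(f"not a plain literal: {text!r}") from err
```

A string like "[1, 2, 3]" is parsed into a list, while something like "__import__('os').system('id')" raises ValueError instead of being executed.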
In addition, even though we cannot exec into the container to run a shell,
it is still possible to exec into the container to run a Python interpreter.
Securing your cluster is still important.
Lastly, many Python libraries use command line tools (for example
pygraphviz). For these libraries, the distroless image will not
work properly. In such cases, consider modifying the build stage to
install the required tools and copy them to the distroless image.
Final Dockerfile
Here is the final Dockerfile. There are some small differences from the
above (for example I put the venv under the home directory), but the overall
structure is the same.
# --- Builder Stage ---
# The distroless image currently has python 3.11
# Make sure builder python version matches distroless version
FROM python:3.11-slim AS builder
# For distroless to run as 1000 with a home directory
# Change user name as needed
ENV HOME=/home/1000
ENV VENV=$HOME/venv
RUN mkdir $HOME
RUN python -m venv $VENV
# Install dependencies
COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
$VENV/bin/pip install --no-cache-dir -r requirements.txt
# Correct the link difference between distroless and slim
# Doing this here because the distroless image does not have ln
RUN ln -sf /usr/bin/python $VENV/bin/python
# Optional: Remove build tools
# Be careful if any of your dependencies have package names that start with
# pip, setuptools, or wheel. For example, pip-something. Change as needed.
# RUN rm -rf $VENV/lib/python3.11/site-packages/pip* \
# $VENV/lib/python3.11/site-packages/setuptools* \
# $VENV/lib/python3.11/site-packages/wheel*
# --- Distroless Stage ---
# Use tag debug if we need a shell for debugging
FROM gcr.io/distroless/python3
# Make sure these are the same as the builder stage
# since ENV variables are not passed to this stage
ENV HOME=/home/1000
ENV VENV=$HOME/venv
# Note: copy before setting workdir as distroless cannot chown
# Copy app code with the correct ownership
COPY --chown=1000:1000 . /app
COPY --from=builder --chown=1000:1000 $HOME $HOME
# Only run below if VENV is not under HOME
# COPY --from=builder --chown=1000:1000 $VENV $VENV
WORKDIR /app
# Use non-root user
USER 1000:1000
# Set entrypoint and use venv's python for libraries
ENV PATH="$VENV/bin:$PATH"
ENTRYPOINT [ "fastapi", "run", "app/main.py" ]