Building Distroless Python Docker Images

When running containerized applications in production, security is often the top priority. If you have worked with images for applications written in compiled languages such as Go, you may have noticed that they often have no working shell, so you cannot exec into the container to debug issues. For example, if you have a Kubernetes cluster and try to run the command

kubectl exec -it -n kube-system coredns-<pod-id> -- sh

to exec into the coredns pod, you will get an error like this:

error: Internal error occurred: Internal error occurred: error executing
command in container: failed to exec in container: failed to start exec:
OCI runtime exec failed: exec failed: unable to start container process:
exec: "sh": executable file not found in $PATH: unknown

These shell-less, or distroless, images offer a few benefits:

  1. They are usually smaller, because they contain only the libraries and dependencies needed to run your application.
  2. They reduce the attack surface by removing unnecessary tools. Without a working shell, it is harder for attackers to exploit command injection.

Building a distroless image for a Python application is a bit more involved. In this guide, I will walk through the steps to build and run a distroless Python application.

If you just want to see the final working example, jump to this section or check out this repository.

Now let’s get started!

Project structure

Below is the project structure. In this example, we are going to spin up a simple FastAPI application in a Docker container. Adjust the structure for your own use case.

Since this is a simple application, we use requirements.txt to manage our dependencies. For more complex applications or libraries, you should generally use a pyproject.toml file to package your application.

project-root/
├── app/
│   ├── data.py
│   ├── __init__.py
│   └── main.py
├── Dockerfile
└── requirements.txt

Application code

The application is a simple FastAPI app. To illustrate problems we might run into with a distroless image, I created two special endpoints.

Here are the application’s dependencies in the requirements.txt file:

# requirements.txt
fastapi[standard]
pydantic
scikit-learn

Here is the fastapi application code in app/main.py:

# app/main.py
from fastapi import FastAPI

from app.data import get_iris_data


app = FastAPI()


@app.get('/')
async def hello():
    return {'message': 'Hello World'}


@app.get('/touch-file/{file_name}')
async def touch_file(file_name: str):
    with open(file_name, 'a'):
        pass
    return {'message': f'File {file_name} touched'}


@app.get('/data')
async def fetch_data():
    return {'data': get_iris_data()}

Here is the data module in app/data.py:

# app/data.py
from sklearn.datasets import fetch_california_housing


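def get_iris_data() -> list:
    # Despite the name, this returns rows from the California housing dataset.
    # fetch_california_housing() downloads the data on first use and caches it
    # under the user's home directory by default.
    housing = fetch_california_housing()

    # Only return the first 200 rows to keep the response small.
    return housing.data[:200].tolist()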

The first endpoint, /touch-file/{file_name}, simulates the shell command touch <file_name>, which creates an empty file with the given name if the file doesn’t exist. This endpoint tests whether your application can create files in the container’s filesystem.

The second endpoint, /data, fetches the California housing dataset using scikit-learn. When fetching larger datasets, scikit-learn downloads them to the user’s home directory by default. A distroless image has no home directory for our user by default, so some additional effort is needed to make this endpoint work.

Now let’s walk through the steps to build a distroless image for this application. The application code will not change, but we will build the Dockerfile step by step.

First attempt

To build a distroless image for a Python application, we need a multi-stage build: we first install all the dependencies in a build image, then copy the necessary files into a distroless image. For the build stage we will use python:3.11-slim, and for the distroless stage we will use gcr.io/distroless/python3. We use python:3.11-slim because the Python version in gcr.io/distroless/python3 is currently 3.11, and the versions must match to prevent incompatibility issues. For more info about the distroless base images, see here.
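To make the version requirement concrete, here is a minimal sketch of the comparison that matters. Only the major and minor components need to agree, because the venv stores packages under lib/python3.11 and compiled extension modules are built for one major.minor release. The helper name same_minor_version is hypothetical and not part of any library.

```python
def same_minor_version(builder: str, runtime: str) -> bool:
    """Return True when two version strings share major.minor (e.g. 3.11)."""

    def major_minor(version: str) -> tuple[int, int]:
        # "3.11.9" -> (3, 11); any patch component is ignored.
        major, minor, *_ = version.split(".")
        return int(major), int(minor)

    return major_minor(builder) == major_minor(runtime)
```

You could feed it the version numbers reported by python --version in each image before committing to a builder tag.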

Let’s try our naive approach first.

FROM python:3.11-slim AS builder

ENV VENV=/opt/venv

RUN python -m venv $VENV

COPY requirements.txt .

RUN $VENV/bin/pip install --upgrade pip && \
    $VENV/bin/pip install --no-cache-dir -r requirements.txt

FROM gcr.io/distroless/python3

ENV VENV=/opt/venv

WORKDIR /app

COPY --from=builder $VENV $VENV

COPY app app

ENV PATH="$VENV/bin:$PATH"

ENTRYPOINT [ "fastapi", "run", "app/main.py" ]

Notice that we’ve set up a virtual environment in the builder stage and copied it to the distroless image. We also put the venv’s bin directory at the front of PATH so that the fastapi command and the venv’s Python, which knows about the installed libraries, are found first.
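The effect of PATH ordering can be sketched in Python with shutil.which, which walks a PATH-style string the same way a command lookup does. The helper names and the temporary directories below are illustrative assumptions; they only stand in for /opt/venv/bin and a system directory.

```python
import os
import shutil
import stat


def make_fake_executable(directory: str, name: str) -> str:
    """Create a small executable file so shutil.which can find it."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve_command(name: str, search_path: str):
    """Return the first match for name along a PATH-style string, or None."""
    return shutil.which(name, path=search_path)
```

If two directories both contain a fastapi executable, resolve_command returns the one whose directory appears first in the search path, which is why the Dockerfile prepends $VENV/bin rather than appending it.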

With this Dockerfile, let’s try to build and run the image:

docker build -t python-distroless-server .
docker run -p 8000:8000 python-distroless-server

You should get an error like this:

exec /opt/venv/bin/fastapi: no such file or directory

What happened here? You might think we could start the container with a shell to see what went wrong, but if you try to run

docker run -it --rm --entrypoint sh python-distroless-server

You get:

docker: Error response from daemon: failed to create task for container: failed
to create shim task: OCI runtime create failed: runc create failed: unable to
start container process: error during container init: exec: "sh": executable
file not found in $PATH: unknown

How do we find out what went wrong? Fortunately, there are gcr.io/distroless images that include a working shell for debugging purposes.

In the Dockerfile, add the debug tag to the distroless image:

...
FROM gcr.io/distroless/python3:debug
...

Now rebuild the image and run sh to see what went wrong:

docker build -t python-distroless-server .
docker run -it --rm --entrypoint sh python-distroless-server

From the above error it seems that the fastapi executable is not found. So let’s check.

/app # ls /opt/venv/bin/fastapi
/opt/venv/bin/fastapi

The fastapi executable is indeed there, so what is going on? Python entry-point scripts like this one are thin wrappers whose first line points at the venv’s Python interpreter, so let’s check the Python executable for the venv.

/app # ls -h /opt/venv/bin/python
ls: /opt/venv/bin/python: No such file or directory
/app # ls -l /opt/venv/bin/python
lrwxrwxrwx 1 root root  21 Jun  7 20:23 /opt/venv/bin/python -> /usr/local/bin/python

Looks like the python executable is a symbolic link to /usr/local/bin/python, and if your terminal supports color, you will see that the symlink is red, meaning that the target does not exist!

/app # which python
/usr/bin/python

The distroless image’s Python executable is located at /usr/bin/python! We need to update the symbolic link in the venv to point to that path.
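The same diagnosis can be sketched in Python. The "no such file or directory" error names the fastapi script, but the file that is actually missing is the interpreter named on the script's first line. The two helpers below are hypothetical names for illustration: one reads that first line, the other reports whether a path is a symlink with a missing target.

```python
import os


def read_shebang_interpreter(script_path: str):
    """Return the interpreter path from a script's #! line, or None."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def is_dangling_symlink(path: str) -> bool:
    """True when path is a symlink whose target does not exist."""
    # os.path.exists follows symlinks, so it is False for a broken link.
    return os.path.islink(path) and not os.path.exists(path)
```

Assuming the wrapper script starts with a line like #!/opt/venv/bin/python, read_shebang_interpreter returns that path, and is_dangling_symlink on it would be True in the image from the first attempt.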

Let’s fix this. Here’s the updated Dockerfile:

FROM python:3.11-slim AS builder

ENV VENV=/opt/venv

RUN python -m venv $VENV

COPY requirements.txt .

RUN $VENV/bin/pip install --upgrade pip && \
    $VENV/bin/pip install --no-cache-dir -r requirements.txt

RUN ln -sf /usr/bin/python $VENV/bin/python

FROM gcr.io/distroless/python3

ENV VENV=/opt/venv

WORKDIR /app

COPY --from=builder $VENV $VENV

COPY app app

ENV PATH="$VENV/bin:$PATH"

ENTRYPOINT [ "fastapi", "run", "app/main.py" ]

Notice that we correct the symbolic link for the python executable in the build stage, since the ln command is not available in the distroless image.

docker build -t python-distroless-server .
docker run -d -p 8000:8000 python-distroless-server

The app should now be up and running. You can go to http://localhost:8000/docs to test the endpoints.
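If you prefer a script over the browser, the following is a minimal smoke-test sketch using only the standard library. It assumes the container is listening on localhost:8000; the file name demo.txt and the helper names are made up for illustration.

```python
import urllib.error
import urllib.request

# The three routes defined in app/main.py; demo.txt is an arbitrary file name.
ENDPOINTS = ["/", "/touch-file/demo.txt", "/data"]


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code for a GET request, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; report the code instead.
        return err.code


def smoke_test(base_url: str, fetch=fetch_status) -> dict:
    """Map each endpoint path to the status code it returned."""
    return {path: fetch(base_url.rstrip("/") + path) for path in ENDPOINTS}
```

Calling smoke_test("http://localhost:8000") against the running container returns a dictionary of status codes, so a failing endpoint shows up as a 500 next to its path.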

All the endpoints should be working.

Run as non-root user

The docker image is now working, but it is still running as the root user. Let’s make it more secure by running the application as a non-root user.

To do this, when copying files from build stage to the distroless stage, we will change the ownership of the files to a non-root user. We will simply use UID 1000 and GID 1000. You can change these values for your specific use cases.

FROM python:3.11-slim AS builder

ENV VENV=/opt/venv

RUN python -m venv $VENV

COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
    $VENV/bin/pip install --no-cache-dir -r requirements.txt

RUN ln -sf /usr/bin/python $VENV/bin/python

FROM gcr.io/distroless/python3

ENV VENV=/opt/venv

COPY --chown=1000:1000 . /app

COPY --from=builder --chown=1000:1000 $VENV $VENV

WORKDIR /app

USER 1000:1000

ENV PATH="$VENV/bin:$PATH"

ENTRYPOINT [ "fastapi", "run", "app/main.py" ]

Notice that we did not create the user or group in either stage. The build stage only installs dependencies, and users and groups created there are not carried over to the distroless image. In the distroless image, commands such as useradd and groupadd are not available, so we cannot create a user or a group there. However, we can still run the container under a specific user ID and group ID with USER.
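To see what running under a bare numeric ID looks like from inside Python, here is a small sketch using the standard pwd module. The helper name lookup_user_name is hypothetical. It assumes, as described above, that the image has no account entry for UID 1000.

```python
import pwd


def lookup_user_name(uid: int):
    """Return the account name for uid, or None when no passwd entry exists."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The process can still run as this UID; it simply has no named account.
        return None
```

Under that assumption, lookup_user_name(os.getuid()) inside the container returns None, which is a hint that anything relying on the account's home directory needs extra care.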

You may also notice that I changed how the source code is copied. Instead of copying just the app directory, I copy the entire project root into /app in the distroless image, and I set the working directory to /app afterwards. This is because we cannot change the ownership of the working directory in the distroless image after the WORKDIR line, as the chown command does not exist there either. If the /app directory is owned by root, the touch file endpoint will fail because user 1000:1000 does not have permission to create files in that directory.
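A quick way to confirm that the ownership is right is to try creating a file, which is what the touch file endpoint does. The helper below is a hypothetical sketch, not part of the application code.

```python
import tempfile


def can_create_files(directory: str) -> bool:
    """Try to create (and automatically remove) a temporary file in directory."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError:
        # Covers PermissionError as well as a missing directory.
        return False
    return True
```

Running can_create_files("/app") as user 1000:1000 should return True with the Dockerfile above, and False if /app ended up owned by root.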

docker build -t python-distroless-server .
docker run -d -p 8000:8000 python-distroless-server

The app should be up and running as before at http://localhost:8000/docs, but now running as a non-root user. However, if you test the fetch data endpoint, you will get an internal server error. And the logs will show something like this:

PermissionError: [Errno 13] Permission denied: '/scikit_learn_data'

This is because scikit-learn (and many other libraries) writes files to the user’s home directory by default, and our distroless image has no home directory for user 1000:1000. We can fix this by creating a home directory in the build stage and copying it over. The updated Dockerfile is:

FROM python:3.11-slim AS builder

ENV HOME=/home/1000
ENV VENV=/opt/venv

RUN mkdir $HOME

RUN python -m venv $VENV

COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
    $VENV/bin/pip install --no-cache-dir -r requirements.txt

RUN ln -sf /usr/bin/python $VENV/bin/python

FROM gcr.io/distroless/python3

ENV HOME=/home/1000
ENV VENV=/opt/venv

COPY --chown=1000:1000 . /app

COPY --from=builder --chown=1000:1000 $HOME $HOME

COPY --from=builder --chown=1000:1000 $VENV $VENV

WORKDIR /app

USER 1000:1000

ENV PATH="$VENV/bin:$PATH"

ENTRYPOINT [ "fastapi", "run", "app/main.py" ]

Here I call the home directory /home/1000, but you can name it anything you want. What’s important is that the environment variable HOME is set in the distroless stage so that libraries such as Scikit-learn will know where to write files.

If we now rebuild the image and run it again, everything should be working as expected. The fetch data endpoint will write the dataset to the /home/1000/scikit_learn_data directory.
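The lookup behind this can be sketched with the standard library alone. To the best of my understanding, scikit-learn uses the SCIKIT_LEARN_DATA environment variable if it is set and otherwise falls back to ~/scikit_learn_data, expanding ~ from HOME. The function resolve_data_home below is a hypothetical stand-in that mimics this without importing scikit-learn. It also assumes that HOME was / in the failing container, which is what the error path suggests.

```python
import os
from unittest import mock


def resolve_data_home(env: dict) -> str:
    """Mimic the cache-directory lookup under a given set of env variables."""
    # Temporarily replace the process environment so expanduser sees only env.
    with mock.patch.dict(os.environ, env, clear=True):
        raw = os.environ.get(
            "SCIKIT_LEARN_DATA", os.path.join("~", "scikit_learn_data")
        )
        return os.path.expanduser(raw)
```

With {"HOME": "/"} this returns /scikit_learn_data, the path from the PermissionError, and with {"HOME": "/home/1000"} it returns /home/1000/scikit_learn_data. Setting SCIKIT_LEARN_DATA would be an alternative way to redirect the cache if you do not want to rely on HOME.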

Optional step

We can optionally remove Python’s packaging tools (pip, setuptools, and wheel) from the image so that the installed environment cannot be modified. This also prevents any library from installing extra dependencies at runtime.

To do this we add the following to the Dockerfile:

# Build stage
...
# After installing the dependencies
RUN rm -rf $VENV/lib/python3.11/site-packages/pip* \
    $VENV/lib/python3.11/site-packages/setuptools* \
    $VENV/lib/python3.11/site-packages/wheel*
...
# Distroless stage
...

Note that if any of your dependencies has a name that starts with pip, setuptools, or wheel, you will need to adjust the command so that it is not removed as well.
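Before running the rm command, you can preview what its patterns would match. The sketch below applies the same three prefixes to a site-packages directory; the function name preview_removals is hypothetical.

```python
import glob
import os

PREFIXES = ("pip", "setuptools", "wheel")


def preview_removals(site_packages: str, prefixes=PREFIXES) -> list[str]:
    """List entry names that the pip*, setuptools*, and wheel* globs would match."""
    matches = set()
    for prefix in prefixes:
        # glob.escape keeps special characters in the directory path literal.
        pattern = os.path.join(glob.escape(site_packages), prefix + "*")
        for path in glob.glob(pattern):
            matches.add(os.path.basename(path))
    return sorted(matches)
```

Run it in the build stage against $VENV/lib/python3.11/site-packages. Any entry in the result that belongs to one of your own dependencies, for example a package directory named pip_something, is a sign that the patterns need to be narrowed.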

Conclusion

That’s it! We have successfully built a distroless Python Docker image. Removing unnecessary tools and libraries makes our application more secure and much harder to exploit through command injection. Note, however, that because Python is an interpreted language, it is still the developer’s responsibility to ensure that the code itself is secure and does not pass user input to unsafe functions such as eval, exec, os.system, or subprocess.run. Our distroless image can protect you against os.system and subprocess.run most of the time, because there is no shell and almost no binaries to call, but if you use eval or exec with user input, an attacker can still execute arbitrary Python code.
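As one illustration of the eval point, here is a minimal sketch of a safer pattern for the common case where user input is expected to be a plain value such as a number or a list. It uses ast.literal_eval from the standard library, which accepts only literals and refuses function calls. The wrapper name parse_user_literal is hypothetical.

```python
import ast


def parse_user_literal(text: str):
    """Parse a Python literal from untrusted text without executing code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as err:
        raise ValueError(f"not a plain literal: {text!r}") from err
```

parse_user_literal("[1, 2, 3]") returns a list, while an input such as "__import__('os').system('id')" raises ValueError instead of running. This is only a sketch for literal values; it is not a general sandbox.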

In addition, even though we cannot exec into the container to run a shell, it is still possible to exec into the container to run a Python interpreter. Securing your cluster is still important.

Lastly, many Python libraries use command line tools (for example pygraphviz). For these libraries, the distroless image will not work properly. In such cases, consider modifying the build stage to install the required tools and copy them to the distroless image.
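One way to catch this early is to check for the required executables when the application starts, rather than failing on the first request that needs them. The sketch below uses shutil.which; the function name missing_tools and the tool names passed to it are assumptions for illustration.

```python
import shutil


def missing_tools(names, search_path=None):
    """Return the tool names that cannot be found on PATH (or search_path)."""
    return [name for name in names if shutil.which(name, path=search_path) is None]
```

For example, an application that shells out to Graphviz could call missing_tools(["dot"]) at startup and exit with a clear message if the list is not empty.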

Final Dockerfile

Here is the final Dockerfile. There are some small differences from the above (for example I put the venv under the home directory), but the overall structure is the same.

# --- Builder Stage ---
# The distroless image currently has python 3.11
# Make sure builder python version matches distroless version
FROM python:3.11-slim AS builder

# For distroless to run as 1000 with a home directory
# Change user name as needed
ENV HOME=/home/1000
ENV VENV=$HOME/venv

RUN mkdir $HOME

RUN python -m venv $VENV

# Install dependencies
COPY requirements.txt .
RUN $VENV/bin/pip install --upgrade pip && \
    $VENV/bin/pip install --no-cache-dir -r requirements.txt

# Correct the link difference between distroless and slim
# Doing this here because the distroless image does not have ln
RUN ln -sf /usr/bin/python $VENV/bin/python

# Optional: Remove build tools
# Be careful if any of your dependencies have package names that start with
# pip, setuptools, or wheel. For example, pip-something. Change as needed.
# RUN rm -rf $VENV/lib/python3.11/site-packages/pip* \
#     $VENV/lib/python3.11/site-packages/setuptools* \
#     $VENV/lib/python3.11/site-packages/wheel*

# --- Distroless Stage ---
# Use tag debug if we need a shell for debugging
FROM gcr.io/distroless/python3

# Make sure these are the same as the builder stage
# since ENV variables are not passed to this stage
ENV HOME=/home/1000
ENV VENV=$HOME/venv

# Note: copy before setting workdir as distroless cannot chown
# Copy app code with the correct ownership
COPY --chown=1000:1000 . /app

COPY --from=builder --chown=1000:1000 $HOME $HOME

# Only run below if VENV is not under HOME
# COPY --from=builder --chown=1000:1000 $VENV $VENV

WORKDIR /app

# Use non-root user
USER 1000:1000

# Set entrypoint and use venv's python for libraries
ENV PATH="$VENV/bin:$PATH"

ENTRYPOINT [ "fastapi", "run", "app/main.py" ]