My previous article recommended that one should reconsider using Python in production. However, there’s one category of use cases where Python is the dominant option for running production workloads: data analysis and machine learning.

Almost all bleeding-edge work in data analysis and machine learning, especially around LLMs, happens in Python.

So, here are some of my learnings on how to run Python in production.

Project quality

Package manager

Python has a fragmented ecosystem of package managers. The only ones I can recommend are poetry and uv. After learning about uv on Hacker News, I decided to give it a try. uv is blazingly fast and manages the Python binary as well. It even supports migrations from other package managers. The only downside is that uv is not on a stable release yet.

Bash
# uv is really fast for both fresh and incremental package updates
$ uv sync --all-groups
Resolved 193 packages in 9ms
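
Beyond dependency syncing, uv can also manage the interpreter itself. A small example (the version below is purely illustrative):

Bash
# Install and pin a Python version for the project
$ uv python install 3.12
$ uv python pin 3.12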

Linters

Since Python is a dynamically typed language, it is very easy to write code that is either outright broken or breaks along certain code paths.

Linters are the first line of defense against such code. There is a plethora of linters available for Python. None seems to be sufficient on its own. My current stack consists of autoflake, autopep8, flake8, ruff, isort, and pylint.

Makefile
format:
	uv run autoflake --in-place -r --remove-all-unused-imports --remove-unused-variables .
	uv run autopep8 --recursive --in-place --select W292,W293,W391,E121,E122,E123,E126,E128,E129,E131,E202,E225,E226,E241,E301,E302,E303,E704,E731 .
	uv run ruff check --config pyproject.toml --fix .
	# Same line length as Black
	uv run isort --line-length 88 .

lint:
	uv run autoflake --check-diff -r --quiet \
		--remove-all-unused-imports --remove-unused-variables --remove-duplicate-keys .
	# W503 has been deprecated in favor of W504 - https://www.flake8rules.com/rules/W503.html
	uv run flake8 . --extend-exclude venv --count --show-source --statistics --max-line-length=88 --ignore=E501,W503
	# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
	uv run flake8 . --extend-exclude venv --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
	# Config file is specified for brevity
	uv run ruff check --config pyproject.toml .
	# Same line length as Black
	uv run isort --check --diff --line-length 88 .
	uv run pylint --rcfile=../../.pylintrc --output-format=colorized .

Microsoft’s pyright might be good but, in my experience, produces too many false positives.

I haven’t yet found a good way to enforce type hints or type checking in Python.

Prevent secret leaks

Use gitguardian, gitleaks, or noseyparker to prevent secrets from being committed to the repository.

In my experience, GitGuardian is the best, but it is a closed-source tool, while Gitleaks and Noseyparker are open-source.

This advice isn’t specific to Python, but committing secrets is a mistake often made by engineers who have spent a lot of time writing non-production code in Python notebooks.
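
For example, a quick local or CI scan with gitleaks could look like this (flags as of gitleaks v8; check the docs for your version):

Bash
# Scan the repository, including its history, for committed secrets
$ gitleaks detect --source . --verbose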

Use git commit hook

Pre-commit ( https://pre-commit.com/ ) hooks are good for enforcing code quality. This is not specific to Python either, but it is a good practice that’s useful when you are working with data engineers and data scientists who excel at data analysis more than at writing production-ready code.
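
A minimal .pre-commit-config.yaml sketch (the rev values are placeholders; pin the versions you actually use):

Yaml
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.6.0   # placeholder version
  hooks:
  - id: trailing-whitespace
  - id: end-of-file-fixer
- repo: https://github.com/gitleaks/gitleaks
  rev: v8.18.4  # placeholder version
  hooks:
  - id: gitleaks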

Project maintainability

FastAPI

If you are writing a web service, then go for a combination of fastapi and gunicorn. In my benchmarking, everything else being equal, FastAPI+gunicorn has 3X the throughput of Flask+gunicorn.

Bash
$ h2load --h1 -n1000 -c40 <Flask+gunicorn>
...
                      min         max         mean         sd        +/- sd
time for request:      406us     83.37ms     12.56ms     22.46ms    86.70%
req/s           :      75.98       83.04       79.36        2.08    60.00%
Bash
$ h2load --h1 -n1000 -c40 <Fastapi+gunicorn>
...
                      min         max         mean         sd        +/- sd
time for request:      825us     29.78ms      4.07ms      6.13ms    91.70%
req/s           :     231.07      256.41      241.98        8.86    52.50%
...
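
For reference, a minimal sketch of the FastAPI service behind such a benchmark (module name and worker count are arbitrary):

Python
# app.py
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def root():
    return {"status": "ok"}

# Serve with gunicorn using uvicorn workers, e.g.:
#   gunicorn -w 4 -k uvicorn.workers.UvicornWorker app:app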

Data classes

Use dataclasses or the more advanced pydantic for holding data, and use helper classes to group pure functions that operate on those data classes. I was planning to write more, but then I came across this elaborate, recently written article on the topic.

Python
from typing import List
import uuid
import dataclasses


@dataclasses.dataclass
class Person:
    id: uuid.UUID
    name: str
    degrees: List[str]
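
For comparison, a rough pydantic equivalent that also validates the data at construction time:

Python
import uuid
from typing import List

from pydantic import BaseModel


class Person(BaseModel):
    id: uuid.UUID
    name: str
    degrees: List[str]


# Invalid data raises a ValidationError instead of silently passing through
person = Person(id=uuid.uuid4(), name="Ada", degrees=["BSc", "PhD"])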

Avoid async and multi-threading

Python’s GIL and async are a mess. Multi-threading in Python codebases is not well tested and is a source of bugs. It is best to avoid any concurrency in Python codebases. If you need performance, use multiple processes instead.
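
For example, a CPU-bound workload can be parallelized with a process pool from the standard library:

Python
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n: int) -> int:
    # Stand-in for a CPU-bound task
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Each task runs in its own process, so the GIL is not a bottleneck
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cpu_heavy, [10_000_000] * 4))
    print(results)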

Dependency management

pip-audit could be useful for catching dependencies with known vulnerabilities. It has never flagged anything for me, primarily because I use dependabot for automatic dependency updates.
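
Running an audit is a one-liner, for example:

Bash
# Check the current environment's packages against known vulnerability databases
$ pipx run pip-audit
...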

Yaml
# Sample dependabot config
version: 2
updates:
- package-ecosystem: "pip"
  directory: "/<path-to-directory-containing-requirements-or-pyproject.toml>"
  schedule:
    interval: "daily"
  open-pull-requests-limit: 1

Further, deptry is a useful tool for finding unused dependencies in Python projects. The results do contain false positives, but it is a good starting point for cleaning up unused dependencies.

Bash
$ pipx run deptry .  # or uv add --group dev deptry && uv run deptry .
...

Keep code legally compliant

Python has a lot of libraries with licenses that could be troublesome for your codebase, e.g., GPL-licensed libraries that could force the whole codebase to become GPL. To avoid this, run licensecheck on CI.

Bash
$ uv run licensecheck --format ansi \
  --only-licenses apache bsd isc mit mpl python unlicense \
  --fail-licenses gpl \
  --show-only-failing \
  --zero
...

Deployments

Docker

Use docker for deployments. Even if you are using GPU-enabled VMs, use Docker and expose the GPU to the container with the following parameter.

Bash
docker run --gpus all ...

Further, use multi-stage builds where you use poetry/uv to build the package and then copy the built package to a smaller base image on top of python:3.XX-slim, as sketched below.
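
A rough sketch of such a multi-stage build with uv (the uv installation method, Python version, and module name are assumptions; adapt them to your project):

Dockerfile
# Build stage: resolve and install dependencies into a project virtualenv
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY . .

# Final stage: copy only the application and its virtualenv
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
# "your_app" is a placeholder module name
ENTRYPOINT ["python", "-m", "your_app"]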

I have tried Python’s Alpine-based images (python:alpine), and for any non-trivial project they are very hard to use due to the glibc (Debian) vs. musl (Alpine) differences. So, I would recommend against using Alpine-based images for Python.

Note that while there have been attempts at making Python faster, like PyPy and Codon, they are really difficult to use for any non-trivial project. So, stick to the standard CPython interpreter.

Use CPU-only libraries for non-GPU deployments

PyTorch is huge. If you are going to be using pytorch in a non-GPU deployment, then use the CPU-only version. It is significantly smaller with no loss of accuracy.

You can configure this with multi-stage Docker builds, or follow uv’s detailed explanation of how to do this via pyproject.toml.

Bash
$ pip3 install torch --index-url https://download.pytorch.org/whl/cpu
...
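
With uv, the CPU-only index can be configured in pyproject.toml roughly as follows (based on uv’s PyTorch guide; check the current docs for the exact syntax):

Toml
# Declare the PyTorch CPU wheel index and pin torch to it
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cpu" }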

Compile code during build

Compile code during Docker builds. This ensures that the .pyc files exist. It is especially useful for faster boot times during container auto-scaling.

Dockerfile
RUN python -m compileall <code_dir>

Download external dependencies at build time

Many libraries, like spacy and transformers, download large chunks of data on first use. This not only slows down the container boot time but also makes the deployment non-hermetic. This was exposed during a HuggingFace outage last year.

Further, prevent downloads during execution with additional library-specific guards.

Dockerfile
# First, Download models
...
# And then disable access to HuggingFace completely
ENV TRANSFORMERS_OFFLINE=1
ENV HF_HUB_OFFLINE=1
ENTRYPOINT ...
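
For example, the download step for spacy and transformers models could look roughly like this (the model names are placeholders for whatever the service actually loads):

Dockerfile
# Bake model data into the image so nothing is fetched at runtime
RUN python -m spacy download en_core_web_sm
RUN python -c "from transformers import AutoModel; AutoModel.from_pretrained('<model-name>')"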

Alternatively, you can place these models in cloud/VM storage (a PVC on Kubernetes) and mount them as volumes at runtime. For larger models, this is usually the only choice, as building and deploying 5 GiB+ Docker images is noticeably slower.

Run Docker containers as a non-root user

The Python Docker images have a much larger attack surface than my favorite scratch image for Go deployments.

One should run Python-based containers as a non-root user to reduce the attack surface.

Dockerfile
# In the final build stage
RUN groupadd -r appuser && useradd -r -g appuser appuser
COPY --chown=appuser:appuser --from=previous-step /app /app
USER appuser
# Rest of the build steps
ENTRYPOINT ...