My previous article recommended that one should reconsider using Python in production. However, there’s one category of use cases where Python is the dominant option for running production workloads: data analysis and machine learning.

Almost all bleeding-edge work in data analysis and machine learning, especially around LLMs, happens in Python.

So, here are some of my learnings on how to run Python in production.

Project quality

Package manager

Python has a fragmented ecosystem of package managers. The only ones I can recommend are poetry and uv. After learning about uv on Hacker News, I decided to give it a try. uv is blazingly fast and manages the Python binary as well. It even supports migrations from other package managers. The only downside is that uv is not on a stable release yet.

Bash
# uv is really fast for both fresh and incremental package updates
$ uv sync --all-groups
Resolved 193 packages in 9ms
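
Beyond dependency syncing, uv can also manage the interpreter itself. A small example (the version below is purely illustrative):

Bash
# Install and pin a Python version for the project
$ uv python install 3.12
$ uv python pin 3.12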

Linters

Since Python is a dynamically typed language, it is very easy to write code that is either outright broken or breaks along certain code paths.

Linters are the first line of defense against such code. There is a plethora of linters available for Python. None seems to be sufficient on its own. My current stack consists of autoflake, autopep8, flake8, ruff, isort, and pylint.

Makefile
format:
	uv run autoflake --in-place -r --remove-all-unused-imports --remove-unused-variables .
	uv run autopep8 --recursive --in-place --select W292,W293,W391,E121,E122,E123,E126,E128,E129,E131,E202,E225,E226,E241,E301,E302,E303,E704,E731 .
	uv run ruff check --config pyproject.toml --fix .
	# Same line length as Black
	uv run isort --line-length 88 .

lint:
	uv run autoflake --check-diff -r --quiet \
		--remove-all-unused-imports --remove-unused-variables --remove-duplicate-keys .
	# W503 has been deprecated in favor of W504 - https://www.flake8rules.com/rules/W503.html
	uv run flake8 . --extend-exclude venv --count --show-source --statistics --max-line-length=88 --ignore=E501,W503
	# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
	uv run flake8 . --extend-exclude venv --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
	# Config file is specified for brevity
	uv run ruff check --config pyproject.toml .
	# Same line length as Black
	uv run isort --check --diff --line-length 88 .
	uv run pylint --rcfile=../../.pylintrc --output-format=colorized .

Microsoft’s pyright might be good but, in my experience, produces too many false positives.

I haven’t yet found a good way to enforce type hints or type checking in Python.

Prevent secret leaks

Use gitguardian, gitleaks, or noseyparker to prevent secrets from being committed to the repository.

In my experience, GitGuardian is the best, but it is a closed-source tool, while Gitleaks and Noseyparker are open-source.

This advice isn’t specific to Python, but committing secrets is a mistake often made by engineers who have spent a lot of time writing non-production code in Python notebooks.
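
For example, a quick local or CI scan with gitleaks could look like this (flags as of gitleaks v8; check the docs for your version):

Bash
# Scan the repository, including its history, for committed secrets
$ gitleaks detect --source . --verbose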

Use git commit hook

Pre-commit ( https://pre-commit.com/ ) hooks are good for enforcing code quality. This is not specific to Python either, but it is a good practice that’s useful when you are working with data engineers and data scientists who excel at data analysis more than at writing production-ready code.
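
A minimal .pre-commit-config.yaml sketch (the rev values are placeholders; pin the versions you actually use):

Yaml
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.6.0   # placeholder version
  hooks:
  - id: trailing-whitespace
  - id: end-of-file-fixer
- repo: https://github.com/gitleaks/gitleaks
  rev: v8.18.4  # placeholder version
  hooks:
  - id: gitleaks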

Project maintainability

FastAPI

If you are writing a web service, then go for a combination of fastapi and gunicorn. In my benchmarking, everything else being equal, FastAPI+gunicorn has 3X the throughput of Flask+gunicorn.

Bash
$ h2load --h1 -n1000 -c40 <Flask+gunicorn>
...
                      min         max         mean         sd        +/- sd
time for request:      406us     83.37ms     12.56ms     22.46ms    86.70%
req/s           :      75.98       83.04       79.36        2.08    60.00%
Bash
$ h2load --h1 -n1000 -c40 <Fastapi+gunicorn>
...
                      min         max         mean         sd        +/- sd
time for request:      825us     29.78ms      4.07ms      6.13ms    91.70%
req/s           :     231.07      256.41      241.98        8.86    52.50%
...
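
For reference, a minimal sketch of the FastAPI service behind such a benchmark (module name and worker count are arbitrary):

Python
# app.py
from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def root():
    return {"status": "ok"}

# Serve with gunicorn using uvicorn workers, e.g.:
#   gunicorn -w 4 -k uvicorn.workers.UvicornWorker app:app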

Data classes

Use dataclasses or the more advanced pydantic for holding data, and use helper classes to group pure functions that operate on those data classes. I was planning to write more, but then I came across this elaborate, recently written article on the topic.

Python
from typing import List
import uuid
import dataclasses


@dataclasses.dataclass
class Person:
    id: uuid.UUID
    name: str
    degrees: List[str]
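
For comparison, a rough pydantic equivalent that also validates the data at construction time:

Python
import uuid
from typing import List

from pydantic import BaseModel


class Person(BaseModel):
    id: uuid.UUID
    name: str
    degrees: List[str]


# Invalid data raises a ValidationError instead of silently passing through
person = Person(id=uuid.uuid4(), name="Ada", degrees=["BSc", "PhD"])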

Avoid async and multi-threading

Python’s GIL and async are a mess. Multi-threading in Python codebases is not well tested and is a source of bugs. It is best to avoid any concurrency in Python codebases. If you need performance, use multiple processes instead.
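
For example, a CPU-bound workload can be parallelized with a process pool from the standard library:

Python
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n: int) -> int:
    # Stand-in for a CPU-bound task
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Each task runs in its own process, so the GIL is not a bottleneck
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(cpu_heavy, [10_000_000] * 4))
    print(results)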

Dependency management

pip-audit could be useful for catching dependencies with known vulnerabilities. It has never flagged anything for me, primarily because I use dependabot for automatic dependency updates.
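
Running an audit is a one-liner, for example:

Bash
# Check the current environment's packages against known vulnerability databases
$ pipx run pip-audit
...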

Yaml
# Sample dependabot config
version: 2
updates:
- package-ecosystem: "pip"
  directory: "/<path-to-directory-containing-requirements-or-pyproject.toml>"
  schedule:
    interval: "daily"
  open-pull-requests-limit: 1

Further, deptry is a useful tool for finding unused dependencies in Python projects. The results do contain false positives, but it is a good starting point for cleaning up unused dependencies.

Bash
$ pipx run deptry .  # or uv add --group dev deptry && uv run deptry .
...

Keep code legally compliant

Python has a lot of libraries with licenses that could be troublesome for your codebase, e.g., GPL-licensed libraries that could force the whole codebase to become GPL. To avoid this, run licensecheck on CI.

Bash
$ uv run licensecheck --format ansi \
  --only-licenses apache bsd isc mit mpl python unlicense \
  --fail-licenses gpl \
  --show-only-failing \
  --zero
...

Deployments

Docker

Use docker for deployments. Even if you are using GPU-enabled VMs, use Docker and expose the GPU to the container with the following parameter.

Bash
docker run --gpus all ...

Further, use multi-stage builds where you use poetry/uv to build the package and then copy the built package to a smaller base image on top of python:3.XX-slim, as sketched below.
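
A rough sketch of such a multi-stage build with uv (the uv installation method, Python version, and module name are assumptions; adapt them to your project):

Dockerfile
# Build stage: resolve and install dependencies into a project virtualenv
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev
COPY . .

# Final stage: copy only the application and its virtualenv
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
# "your_app" is a placeholder module name
ENTRYPOINT ["python", "-m", "your_app"]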

I have tried Python’s Alpine-based images (python:alpine), and for any non-trivial project they are very hard to use due to the glibc (Debian) vs. musl (Alpine) differences. So, I would recommend against using Alpine-based images for Python.

Note that while there have been attempts at making Python faster, like PyPy and Codon, they are really difficult to use for any non-trivial project. So, stick to the standard CPython interpreter.

Use CPU-only libraries for non-GPU deployments

PyTorch is huge. If you are going to be using pytorch in a non-GPU deployment, then use the CPU-only version. It is significantly smaller with no loss of accuracy.

You can configure this with multi-stage Docker builds, or follow uv’s detailed explanation of how to do this via pyproject.toml.

Bash
$ pip3 install torch --index-url https://download.pytorch.org/whl/cpu
...
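
With uv, the CPU-only index can be configured in pyproject.toml roughly as follows (based on uv’s PyTorch guide; check the current docs for the exact syntax):

Toml
# Declare the PyTorch CPU wheel index and pin torch to it
[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cpu" }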

Compile code during build

Compile code during Docker builds. This ensures that the .pyc files exist. It is especially useful for faster boot times during container auto-scaling.

Dockerfile
RUN python -m compileall <code_dir>

Download external dependencies at build time

Many libraries, like spacy and transformers, download large chunks of data on first use. This not only slows down the container boot time but also makes the deployment non-hermetic. This was exposed during a HuggingFace outage last year.

Further, prevent downloads during execution with additional library-specific guards.

Dockerfile
# First, Download models
...
# And then disable access to HuggingFace completely
ENV TRANSFORMERS_OFFLINE=1
ENV HF_HUB_OFFLINE=1
ENTRYPOINT ...
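
For example, the download step for spacy and transformers models could look roughly like this (the model names are placeholders for whatever the service actually loads):

Dockerfile
# Bake model data into the image so nothing is fetched at runtime
RUN python -m spacy download en_core_web_sm
RUN python -c "from transformers import AutoModel; AutoModel.from_pretrained('<model-name>')"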

Alternatively, you can place these models in cloud/VM storage (a PVC on Kubernetes) and mount them as volumes at runtime. For larger models, this is usually the only choice, as building and deploying 5 GiB+ Docker images is noticeably slower.

Run Docker containers as a non-root user

The Python Docker images have a much larger attack surface than my favorite scratch image for Go deployments.

One should run Python-based containers as a non-root user to reduce the attack surface.

Dockerfile
# In the final build stage
RUN groupadd -r appuser && useradd -r -g appuser appuser
COPY --chown=appuser:appuser --from=previous-step /app /app
USER appuser
# Rest of the build steps
ENTRYPOINT ...