Hugging Face is the GitHub of machine learning models. Its on-the-fly model download scheme, however, is a problem from a DevOps perspective.

Here’s how to disable it.

One huge problem with downloading models on the fly is that, at best, your Docker images are no longer hermetic. At worst, your deployments start to fail when Hugging Face goes down, as it did yesterday for multiple hours.

Here’s how to make your Docker images hermetic.

Dockerfile
# During the Docker build, trigger the model downloads, e.g. by computing an embedding
...
# Now, disable online access completely
ENV TRANSFORMERS_OFFLINE=1
ENV HF_HUB_DISABLE_TELEMETRY=1
ENV HF_HUB_OFFLINE=1

ENTRYPOINT ...

Do verify this by running the images in CI before deployment: if you forgot to download a model during the build, loading it fails there instead of in production. The huge advantage of this process is that your deployed images are hermetic. They no longer depend on Hugging Face being up, and they can no longer silently pick up a different or newer model.
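The CI check can be as simple as starting the service and loading every model once. As a cheaper first pass, here is a minimal sketch in Python that only inspects the environment and the cache directory. It assumes the hub cache layout used by recent `huggingface_hub` versions (`models--<org>--<name>/snapshots/<revision>/` under `$HF_HOME/hub`, overridable via `HF_HUB_CACHE`); the function names are hypothetical and not part of any Hugging Face library.

```python
from pathlib import Path

# The offline flags set in the Dockerfile.
OFFLINE_FLAGS = ("TRANSFORMERS_OFFLINE", "HF_HUB_OFFLINE")


def unset_offline_flags(env):
    """Return the offline flags that are not set to "1" in env."""
    return [name for name in OFFLINE_FLAGS if env.get(name) != "1"]


def hub_cache_dir(env):
    """Resolve the hub cache directory from env.

    Assumption: HF_HUB_CACHE wins, else $HF_HOME/hub, else
    ~/.cache/huggingface/hub. XDG_CACHE_HOME is ignored for brevity.
    """
    if "HF_HUB_CACHE" in env:
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def missing_models(cache_dir, repo_ids):
    """Return the repo ids that have no non-empty snapshot under cache_dir."""
    missing = []
    for repo_id in repo_ids:
        folder = "models--" + repo_id.replace("/", "--")
        snapshots = Path(cache_dir) / folder / "snapshots"
        # A usable model has at least one revision directory with files in it.
        has_files = snapshots.is_dir() and any(
            any(rev.iterdir()) for rev in snapshots.iterdir() if rev.is_dir()
        )
        if not has_files:
            missing.append(repo_id)
    return missing


def hermeticity_problems(env, repo_ids):
    """Collect human-readable problems; an empty list means the image looks hermetic."""
    problems = [f"{name} is not set to 1" for name in unset_offline_flags(env)]
    cache_dir = hub_cache_dir(env)
    problems += [
        f"{repo_id} is not in {cache_dir}"
        for repo_id in missing_models(cache_dir, repo_ids)
    ]
    return problems
```

In CI, run this inside the built image with `hermeticity_problems(os.environ, [...])`, passing the list of models your service loads, and fail the job if the result is not empty. It does not replace actually loading the models offline, since a present snapshot directory can still be incomplete.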