Why does the model data need to be stored in the image? Download the model data ...

za_mike157 · 2026-03-09T19:09:01 1773083341

You are correct! From our tests, storing model weights in the image isn't the preferred approach for weights larger than ~1GB. We run a distributed, multi-layer cache system to combat this, and we can load roughly 6-7GB of files with a p99 of <2.5s.

jono_irwin · 2026-03-09T19:07:53 1773083273

hey cosmotic, we're not really advocating for storing model weights in the container image.

even the smaller nvidia images (like nvidia/cuda:13.1.1-cudnn-runtime-ubuntu24.04) are about 2GB before adding any python deps, and that is a problem.

if you split the image into chunks and pull on-demand, your container will start much faster.
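
to make the chunk idea concrete, here is a toy sketch in python. it is not how any particular runtime implements this: `split_into_chunks`, `LazyImage`, and the in-memory `remote_store` dict are hypothetical names made up for illustration, and the 4-byte chunk size is deliberately tiny. the point is only that a read touches the chunks covering the requested byte range, instead of waiting for the whole image.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def split_into_chunks(blob: bytes, chunk_size: int = CHUNK_SIZE):
    """Return (manifest, store): ordered chunk digests and a digest -> bytes map."""
    manifest, store = [], {}
    for offset in range(0, len(blob), chunk_size):
        chunk = blob[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        store[digest] = chunk  # identical chunks are stored only once
    return manifest, store


class LazyImage:
    """Serves byte ranges, fetching only the chunks that cover each range."""

    def __init__(self, manifest, remote_store, chunk_size=CHUNK_SIZE):
        self.manifest = manifest
        self.remote_store = remote_store  # hypothetical stand-in for a registry or cache
        self.chunk_size = chunk_size
        self.local_cache = {}
        self.fetch_count = 0  # number of chunks pulled from the remote store

    def _fetch(self, index):
        digest = self.manifest[index]
        if digest not in self.local_cache:
            self.local_cache[digest] = self.remote_store[digest]
            self.fetch_count += 1
        return self.local_cache[digest]

    def read(self, offset, length):
        if length <= 0:
            return b""
        first = offset // self.chunk_size
        last = min((offset + length - 1) // self.chunk_size, len(self.manifest) - 1)
        data = b"".join(self._fetch(i) for i in range(first, last + 1))
        start = offset - first * self.chunk_size
        return data[start:start + length]


blob = bytes(range(32))                    # pretend this is an image layer
manifest, store = split_into_chunks(blob)  # 8 chunks of 4 bytes
image = LazyImage(manifest, store)
image.read(5, 6)                           # touches chunks 1 and 2 only
```

in this toy run, reading 6 bytes out of a 32-byte blob pulls 2 of the 8 chunks, and a repeat read of the same range pulls nothing new because the chunks are already cached locally.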

fwip · 2026-03-09T20:09:11 1773086951

Just pre-install the NVIDIA layer on the filesystem instead of docker-pulling it for every single machine.