The Docker Model Runner plugin lets you:
- Pull models from Docker Hub
- Run AI models directly from the command line
- Manage local models (add, list, remove)
- Interact with models using a submitted prompt or in chat mode
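
For example, a typical workflow with the `docker model` CLI might look like the following sketch. The model name `ai/smollm2` is only an illustration; any model published under Docker Hub's `ai/` namespace works the same way.

```bash
# Pull a model from Docker Hub (cached locally after the first pull)
docker model pull ai/smollm2

# List the models available locally
docker model list

# Send a single prompt and print the response
docker model run ai/smollm2 "Explain multi-stage builds in one sentence."

# Start an interactive chat session with the model
docker model run ai/smollm2

# Remove a local model when you no longer need it
docker model rm ai/smollm2
```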
Models are pulled from Docker Hub the first time they're used and stored locally. They're loaded into memory only at runtime, when a request is made, and unloaded when not in use to free resources. Because models can be large, the initial pull may take some time; after that, they're cached locally for faster access. You can also interact with models programmatically through OpenAI-compatible APIs.
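
As a minimal sketch of the API access, the following `curl` call assumes host-side TCP access to Model Runner is enabled on its default port (12434); the model name and prompt are placeholders.

```bash
# Chat completion against the OpenAI-compatible endpoint
# (assumes TCP host access is enabled; 12434 is the default port)
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/smollm2",
        "messages": [
          {"role": "user", "content": "Write a one-line summary of Docker Model Runner."}
        ]
      }'
```

The request and response bodies follow the OpenAI chat completions format, so existing OpenAI client libraries can be pointed at this endpoint by changing their base URL.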