Persist AI Model Runner llama-server.exe for more than 5 minutes

cbruhy · August 5, 2025, 9:03pm

Running:
Windows 11 24H2, OS Build 26100.4652
Docker Desktop 4.43.2

The issue/feature:
The model I use is large and takes time to load in memory. If I don’t ask it a question before the end of 5 minutes, it unloads. It will happily reload after, but now I have to wait for it to reload. Is there a keepalive setting somewhere to keep the model alive in memory for an indeterminate amount of time?

Steps to replicate issue/feature:

Open Docker Desktop
Select [Models] tab
Select an existing model to open the chat window
Ask a question …
… the com.docker.llama-server.exe starts and loads the model into memory, answers the question
Let sit idle and after 5 minutes, com.docker.llama-server.exe unloads and the model is cleaned from memory

Tried with no success

Disabled Docker Desktop Settings → Resources → Resource Saver
Looked over docker *-options.json files for possible relevant parameters

Topic		Replies	Views
About the Model Runner category Model Runner	0	31	April 24, 2025
Introducing Docker Model Runner Announcements model-runner	0	199	April 29, 2025
How to monitor/diagnose LLM Model execution? Docker Desktop	4	58	August 5, 2025
When will the Docker Model Runner work on Docker Desktop for Linux Model Runner	1	52	July 29, 2025
Docker "fakes" a upload Docker Desktop docker	1	28	July 20, 2024

Persist AI Model Runner llama-server.exe for more than 5 minutes

Related topics