How to monitor/diagnose LLM model execution?

Good Day,

I am currently evaluating the new Docker Desktop feature for running LLM models with Docker. Is there any way to diagnose/monitor the execution of an LLM? Currently, after calling an LLM, quite some time passes (due to my PC) before any output comes back. It would be great to have some means of probing the current state the LLM is in / what it is doing, e.g. tool calls, MCP actions, …

Thank you and best regards,
Uli

You could try

docker model logs --follow

in another terminal. This is what I got from docker model run ai/llama3.2:latest up to sending a simple “Hi” message to the model.

[2025-08-03T18:01:04.780505000Z][inference.model-manager] Getting model by reference: ai/llama3.2:latest
[2025-08-03T18:01:04.781339000Z][inference.model-manager][E] Failed to get model: model not found reference: ai/llama3.2:latest
[2025-08-03T18:01:04.781964000Z][inference] Pulling model: ai/llama3.2:latest
[2025-08-03T18:01:04.782415000Z][inference.model-manager] Starting model pull: ai/llama3.2:latest
[2025-08-03T18:01:05.911036000Z][inference.model-manager] Remote model digest: sha256:436bb282b41968a83638482999980267ca8d7e8b5574604460efa9efff11cf59
[2025-08-03T18:01:05.911479000Z][inference.model-manager] Model not found in local store, pulling from remote: ai/llama3.2:latest
[2025-08-03T18:03:38.700910000Z][inference.model-manager] Getting model by reference: ai/llama3.2:latest
[2025-08-03T18:03:38.708529000Z] srv  params_from_: Chat format: Content-only
[2025-08-03T18:03:38.709646000Z] slot launch_slot_: id  0 | task 100 | processing task
[2025-08-03T18:03:38.709780000Z] slot update_slots: id  0 | task 100 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 36
[2025-08-03T18:03:38.709908000Z] slot update_slots: id  0 | task 100 | need to evaluate at least 1 token for each active slot, n_past = 36, n_prompt_tokens = 36
[2025-08-03T18:03:38.710120000Z] slot update_slots: id  0 | task 100 | kv cache rm [35, end)
[2025-08-03T18:03:38.710214000Z] slot update_slots: id  0 | task 100 | prompt processing progress, n_past = 36, n_tokens = 1, progress = 0.027778
[2025-08-03T18:03:38.710395000Z] slot update_slots: id  0 | task 100 | prompt done, n_past = 36, n_tokens = 1
[2025-08-03T18:03:39.614485000Z] slot      release: id  0 | task 100 | stop processing: n_past = 43, truncated = 0
[2025-08-03T18:03:39.614719000Z] slot print_timing: id  0 | task 100 |
[2025-08-03T18:03:39.614760000Z] prompt eval time =     593.08 ms /     1 tokens (  593.08 ms per token,     1.69 tokens per second)
[2025-08-03T18:03:39.614799000Z]        eval time =     307.84 ms /     8 tokens (   38.48 ms per token,    25.99 tokens per second)
[2025-08-03T18:03:39.614834000Z]       total time =     900.92 ms /     9 tokens
[2025-08-03T18:03:39.614937000Z] srv  update_slots: all slots are idle
[2025-08-03T18:03:39.615102000Z] srv  log_server_r: request: POST /v1/chat/completions  200
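
For anyone following along, this is the two-terminal workflow the output above came from (a minimal sketch using the same ai/llama3.2:latest reference; substitute your own model):

# terminal 1: pull the model if needed and start an interactive chat
docker model run ai/llama3.2:latest

# terminal 2: stream the inference engine's logs while you interact with it
docker model logs --follow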

Hello rimelek,
thank you very much for your reply!
I have already looked at the logs section of the models in Docker Desktop. Actually, I am looking for more detailed updates / more insight into the interactions currently going on.
Thank you and best regards,
Uli

I’m not aware of more detailed logs. There is a --debug flag for docker model run, but I didn’t notice any difference.
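
For completeness, a sketch of how that flag is typically passed (assuming it is given like other docker model run options; as noted, any extra output it produces should show up in the same logs):

docker model run --debug ai/llama3.2:latest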

If you find anything in the OpenAI API reference that would help you, you can enable “host-side TCP support” in Docker Desktop’s settings on the “Beta features” tab, but I’m sure you already know that, as it is also in the mentioned documentation.
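
As a hedged illustration of what host-side TCP support gives you: once it is enabled, the same /v1/chat/completions endpoint that appears in the log above can be called from the host, for example with curl. The port 12434 and the /engines/v1 prefix are assumptions based on the documented defaults; adjust them to whatever your “Beta features” tab shows.

# assumption: host-side TCP support enabled on the default port 12434
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/llama3.2:latest",
        "messages": [{"role": "user", "content": "Hi"}]
      }'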

If you have a specific feature you would like to see in the model runner, you can ask for it on the Roadmap.


Dear rimelek,
Thank you for your reply. Then I’ll do so; Docker will surely provide excellent solutions. :slight_smile:
