Okay, so the minute I responded here, I received an update from the Model Runner team on GitHub stating that the feature is available, and my ticket was closed, so I assume someone was watching this thread. That is good news! I have not tried the feature yet, but I will share the link here for your reference:
Issue opened 28 Jul 2025, closed 15 Aug 2025 (labelled enhancement):
Hello, I was directed here from the Moby community.
I am running the following … Docker (Google Gemma) model: ai/gemma3n (quantization: IQ2_XXS/Q4_K_M). This model is advertised as supporting multimodal inputs such as images, video, and audio.
I can successfully connect to the model, prompt with text, and receive a response.
However, when I post an image (base64-encoded in the JSON request body and resized to 512x512, as the OpenAI-compatible API expects), I receive the following error (at the end of this message), which suggests that the mmproj (the llama.cpp multimodal projector) is not enabled in Docker. Am I correct in saying that I cannot use the Docker OpenAI-compatible API for multimodal vision?
If I am incorrect and I can submit images, audio, and video (the multimodal options listed for the Docker/Gemma model), could you provide the steps to get it working?
```
{
  "errorMessage": "The service was not able to process your request",
  "errorDescription": "image input is not supported - hint: if this is unexpected, you may need to provide the mmproj",
  "errorDetails": {
    "rawErrorMessage": [
      "500 - \"{\\\"error\\\":{\\\"code\\\":500,\\\"message\\\":\\\"image input is not supported - hint: if this is unexpected, you may need to provide the mmproj\\\",\\\"type\\\":\\\"server_error\\\"}}\""
    ],
    "httpCode": "500"
  }
}
```
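For anyone hitting the same error, here is a minimal sketch of the kind of request the issue describes, using the OpenAI Python client against an OpenAI-compatible endpoint. The base URL, port, and image path are assumptions about a typical Docker Model Runner setup, not values taken from the issue, so adjust them to your environment.

```python
# Minimal sketch of posting a base64-encoded image to an OpenAI-compatible
# chat completions endpoint. The base_url and image path below are assumptions,
# not values confirmed by the issue or by Docker documentation.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed Model Runner endpoint
    api_key="not-needed",  # placeholder; a local runner typically ignores the key
)

# Read a local image and encode it as base64 for the data URL.
with open("example.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="ai/gemma3n",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

If the runner does not have image support enabled for the model, a request shaped like this is what produces the 500 error quoted above.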