"Connection Refused" Issue while running ollama in container with LLM Chat bot app in another docker Container

I have created a local chatbot in Python 3.12 that lets the user chat with an uploaded PDF by creating embeddings in a Qdrant vector database and then getting inference from Ollama (model llama3.2:3b).
In my source code, I am using the following dependencies:

streamlit 
langchain
langchain_community
langchain_core
python-dotenv
langchain-huggingface
langchain-qdrant
langchain-ollama
unstructured[pdf]
onnx==1.16.1
qdrant-client
torch
torchvision
torchaudio

Since I want to deploy the code on a server (where no dependencies are installed), I will be using Docker to run containers for Qdrant, the chatbot app, and Ollama. I have successfully pulled the latest Ollama and Qdrant images with Docker.

docker run -d -v D:\myollamamodels:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3.2:3b
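As a sanity check from the host, I verify that the published port answers and the model is actually pulled. A minimal sketch using only the Python standard library (the /api/tags endpoint lists the locally available models):

# check_ollama_host.py - minimal sketch: confirm the Ollama API answers on the published port
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # reachable from the host because of -p 11434:11434

with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
    tags = json.load(resp)

print([m["name"] for m in tags.get("models", [])])  # should include "llama3.2:3b"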

Both Ollama and its Docker container are running and accessible from within; I checked this using Docker Desktop as well. I have also bridged the chatbot app, Ollama, and Qdrant containers onto a single network using:

docker network connect my_network ollama
docker network connect my_network qdrant
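To separate a name-resolution problem from a plain "connection refused" (errno 111 means the address was reached but nothing accepted the connection on that port), I also run a small probe from inside the app container, e.g. with docker exec -it app_new python probe.py. A minimal sketch:

# probe.py - minimal sketch: run inside the app container to check the in-network service names
import socket

for host, port in [("ollama", 11434), ("qdrant", 6333)]:
    try:
        ip = socket.gethostbyname(host)              # does the service name resolve on my_network?
        with socket.create_connection((ip, port), timeout=3):
            print(f"{host}:{port} -> reachable ({ip})")
    except OSError as exc:
        print(f"{host}:{port} -> FAILED: {exc}")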

Now when I run the app, it opens and lets me upload the PDF and create the embeddings, and the embeddings are successfully stored in the vector DB (I have included relevant print statements, which are reflected in the app GUI). The issue comes when I want to chat with the document: when I enter a question, it waits and, instead of responding with the inference output, gives me the error: ":warning: An error occurred while processing your request: [Errno 111] Connection refused".

I have the docker compose file as below:

version: '3.8'

services:
  qdrant:
    image: qdrant/qdrant:v1.12.1  
    container_name: qdrant
    ports:
      - "6333:6333"  # Expose Qdrant on the default port
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - my_network  # Connect qdrant to my_network

  ollama:
    image: ollama/ollama:latest  
    container_name: ollama
    ports:
      - "11434:11434"  # Expose Ollama on the default port 
    environment:
      - OLLAMA_MODEL=llama3.2:3b  
    #command: ["--pull-model", "llama3.2:3b"]  # Ensure the model downloads on startup
    volumes:
      - /d/myollamamodels:/models  
    networks:
      - my_network

  app:
    build: .
    container_name: app_new
    ports:
      - "8501:8501"  # Streamlit default port
    environment:
      QDRANT_URL: http://qdrant:6333  # Use Qdrant service name from Docker Compose
      OLLAMA_URL: http://ollama:11434
      #OLLAMA_MODEL: http://host.docker.internal:11434/llama3.2:3b  # Point to Ollama on host
    depends_on:
      - qdrant
      - ollama
    volumes:
      - ./models:/models  # Mount the model directory for access
    networks:
      - my_network  # Connect app to my_network


volumes:
  qdrant_data:

networks:
  my_network:
    driver: bridge
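For completeness, the intention is that the app reads these URLs from the environment instead of assuming localhost. A minimal sketch of that wiring on the Streamlit side, using the ChatbotManager class shown below (the module name chatbot comes from the /app/chatbot.py path in the logs):

# minimal sketch: hand the compose-provided URLs to the chatbot, falling back to the
# in-network service names (QDRANT_URL and OLLAMA_URL are set in docker-compose.yml above)
import os
from chatbot import ChatbotManager

chatbot = ChatbotManager(
    qdrant_url=os.getenv("QDRANT_URL", "http://qdrant:6333"),
    ollama_url=os.getenv("OLLAMA_URL", "http://ollama:11434"),
)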

The Python program and class I have been using for the AI chatbot is as follows (the Streamlit app code and the vector-embedding code are in separate .py files):

class ChatbotManager:
    def __init__(
        self,
        model_name: str = "BAAI/bge-small-en",
        device: str = "cpu",
        encode_kwargs: dict = {"normalize_embeddings": True},
        llm_model: str = "llama3.2:3b",
        #llm_model: str = None,  # Set to None to use environment variable
        llm_temperature: float = 0.7,
        qdrant_url: str = "http://qdrant:6333",
        ollama_url: str = "http://ollama:11434",  # URL for Ollama inside Docker network
        collection_name: str = "vector_db",
    ):
        """
        Initializes the ChatbotManager with embedding models, LLM, and vector store.

        Args:
            model_name (str): The HuggingFace model name for embeddings.
            device (str): The device to run the model on ('cpu' or 'cuda').
            encode_kwargs (dict): Additional keyword arguments for encoding.
            llm_model (str): The local LLM model name for ChatOllama.
            llm_temperature (float): Temperature setting for the LLM.
            qdrant_url (str): The URL for the Qdrant instance.
            collection_name (str): The name of the Qdrant collection.
        """
        self.model_name = model_name
        self.device = device
        self.encode_kwargs = encode_kwargs
        #self.llm_model = llm_model
        # Get the LLM model name from the environment variable
        self.llm_model = os.getenv("OLLAMA_MODEL", llm_model)
        self.llm_temperature = llm_temperature
        self.qdrant_url = qdrant_url
        self.collection_name = collection_name
        self.ollama_url = ollama_url  # Initialize ollama_url

        # Initialize Embeddings
        self.embeddings = HuggingFaceBgeEmbeddings(
            model_name=self.model_name,
            model_kwargs={"device": self.device},
            encode_kwargs=self.encode_kwargs,
        )

        # Initialize Local LLM
        self.llm = ChatOllama(
            model=self.llm_model,
            temperature=self.llm_temperature,
            server_url=self.ollama_url
            # Add other parameters if needed
        )

        # Define the prompt template
        self.prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer. Answer must be detailed and well explained.
Helpful answer:
"""

        # Initialize Qdrant client
        self.client = QdrantClient(
            url=self.qdrant_url, prefer_grpc=False
        )

        # Initialize the Qdrant vector store
        self.db = Qdrant(
            client=self.client,
            embeddings=self.embeddings,
            collection_name=self.collection_name
        )

        # Initialize the prompt
        self.prompt = PromptTemplate(
            template=self.prompt_template,
            input_variables=['context', 'question']
        )

        # Initialize the retriever
        self.retriever = self.db.as_retriever(search_kwargs={"k": 1})

        # Define chain type kwargs
        self.chain_type_kwargs = {"prompt": self.prompt}

        # Initialize the RetrievalQA chain with return_source_documents=False
        self.qa = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.retriever,
            return_source_documents=False,  # Set to False to return only 'result'
            chain_type_kwargs=self.chain_type_kwargs,
            verbose=False
        )

    def get_response(self, query: str) -> str:
        """
        Processes the user's query and returns the chatbot's response.

        Args:
            query (str): The user's input question.

        Returns:
            str: The chatbot's response.
        """
        try:
            response = self.qa.run(query)
            return response  # 'response' is now a string containing only the 'result'
        except Exception as e:
            st.error(f"An error occurred while processing your request: {e}")
            return "Sorry, I couldn't process your request at the moment."

Logs of app container:

2024-10-30 16:47:13 2024-10-30 11:17:13.140 Examining the path of torch.classes raised: Tried to instantiate class 'path.path', but it does not exist! Ensure that it is registered via torch::class

2024-10-30 16:49:55 2024-10-30 11:19:55.974 Examining the path of torch.classes raised: Tried to instantiate class 'path.path', but it does not exist! Ensure that it is registered via torch::class

2024-10-30 16:50:44 /app/chatbot.py:119: LangChainDeprecationWarning: The method Chain.run was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:~invoke instead.

2024-10-30 16:50:44 response = self.qa.run(query)

I have looked into this many times and made changes based on ollama_url and other factors, such as checking Ollama service availability, the Ollama container status, and modifying the yml file, but none of it works and I am stuck at this error. The entire code works well in the development environment without Docker (with Ollama running as a service on the host), but I need to deploy it on a server as soon as possible to make it available on the network.

I have checked that the Ollama container service is working on port 11434 (I verified it via the URL and also via a docker command), and Qdrant is also working, since the embeddings are created and a success message is shown in the app UI. Somehow, though, the connection to Ollama is being refused, I guess.

Could someone please explain the issue and a solution for this problem? Thanks.

Check your browser's developer tools network tab to see which target URL the web app running in your browser wants to connect to. The Docker-network-internal service hostnames are probably not available within the browser, so you would need to set the correct target URL.

Hi, thanks for the suggestion. I tried using the browser's network tools. Here is a brief summary of the findings:
POST request (Headers tab):

Host: api.segment.io
Filename: /v1/t
Address: 35.81.90.104:443

I couldn't find anything specific to the Ollama service or its IP address.

Is this table quoted from the browser's network tab? It should show you much more. It is not clear to me what exactly produces the error message. Is it the client side or something on the server side? Even if the error happens on the server side, a request could return valid JSON containing the error message, in which case you would not see an error in the web browser.

Doesn’t it just look like a dependency is missing?

This is just a deprecation warning, indicating that the method you use is going to be removed in one of the next releases of the package. It is technical debt that needs to be fixed in the short term, as the application will break once the deprecated method is removed.
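For example, the call in get_response could be switched to something like this (a sketch; with return_source_documents=False the answer is returned under the "result" key):

# sketch: non-deprecated equivalent of self.qa.run(query)
response = self.qa.invoke({"query": query})
return response["result"]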