I have created a local chatbot in Python 3.12 that lets a user chat with an uploaded PDF by creating embeddings in a Qdrant vector database and then getting inference from Ollama (model llama3.2:3b).
In my source code, I am using the following dependencies:
streamlit
langchain
langchain_community
langchain_core
python-dotenv
langchain-huggingface
langchain-qdrant
langchain-ollama
unstructured[pdf]
onnx==1.16.1
qdrant-client
torch
torchvision
torchaudio
Since I want to deploy the code on a server (where no dependencies are installed), I will use Docker to run containers for Qdrant, the chatbot app, and Ollama. I have successfully pulled the latest Ollama image and the Qdrant image with Docker:
docker run -d -v D:\myollamamodels:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3.2:3b
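To confirm the Ollama API is reachable from the host before wiring up the app, I run a quick check like the one below (a minimal sketch using requests; /api/tags is the Ollama endpoint that lists locally pulled models):

import requests

# Quick sanity check from the host: the Ollama HTTP API should answer on the published port 11434.
# /api/tags lists the models that have been pulled into the container.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
print(resp.status_code)           # expect 200
print(resp.json().get("models"))  # llama3.2:3b should appear here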
Both the Ollama and Qdrant containers are running and accessible from within; I checked this in Docker Desktop as well. I have also bridged the chatbot app, Ollama, and Qdrant containers onto a single network using:
docker network connect my_network ollama
docker network connect my_network qdrant
Now when I run the app, it opens and allows me to upload the PDF and create the embeddings, and the embeddings are successfully stored in the vector DB (I have included relevant print statements, which show up in the app GUI). The issue comes when I want to chat with the document: when I enter a question, the app waits and then, instead of responding with the inference output, gives me the error: "An error occurred while processing your request: [Errno 111] Connection refused".
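To narrow down whether this is a name-resolution or port problem, I also run a small probe from inside the app container (a hypothetical diagnostic snippet; the host names ollama and qdrant are the service names from my compose file below):

import socket

# Run inside the app container: can we resolve the compose service names
# and open a TCP connection to their published ports?
for host, port in [("qdrant", 6333), ("ollama", 11434)]:
    try:
        ip = socket.gethostbyname(host)
        with socket.create_connection((ip, port), timeout=3):
            print(f"{host}:{port} -> reachable at {ip}")
    except OSError as e:
        print(f"{host}:{port} -> FAILED: {e}")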
I have the docker compose file as below:
version: '3.8'

services:
  qdrant:
    image: qdrant/qdrant:v1.12.1
    container_name: qdrant
    ports:
      - "6333:6333"            # Expose Qdrant on the default port
    volumes:
      - qdrant_data:/qdrant/storage
    networks:
      - my_network             # Connect qdrant to my_network

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"          # Expose Ollama on the default port
    environment:
      - OLLAMA_MODEL=llama3.2:3b
    #command: ["--pull-model", "llama3.2:3b"]   # Ensure the model downloads on startup
    volumes:
      - /d/myollamamodels:/models
    networks:
      - my_network

  app:
    build: .
    container_name: app_new
    ports:
      - "8501:8501"            # Streamlit default port
    environment:
      QDRANT_URL: http://qdrant:6333     # Use the Qdrant service name from Docker Compose
      OLLAMA_URL: http://ollama:11434
      #OLLAMA_MODEL: http://host.docker.internal:11434/llama3.2:3b   # Point to Ollama on host
    depends_on:
      - qdrant
      - ollama
    volumes:
      - ./models:/models       # Mount the model directory for access
    networks:
      - my_network             # Connect app to my_network

volumes:
  qdrant_data:

networks:
  my_network:
    driver: bridge
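The intent is for the app to pick up these service URLs from the environment, roughly like this (a sketch; in the class below the URLs are currently hardcoded as defaults and only OLLAMA_MODEL is read via os.getenv):

import os

# The compose file injects these variables; outside Docker they fall back to localhost.
QDRANT_URL = os.getenv("QDRANT_URL", "http://localhost:6333")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2:3b")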
The Python class I am using for the AI chatbot is below; the Streamlit app code and the vector-embedding code are in different .py files.
class ChatbotManager:
    def __init__(
        self,
        model_name: str = "BAAI/bge-small-en",
        device: str = "cpu",
        encode_kwargs: dict = {"normalize_embeddings": True},
        llm_model: str = "llama3.2:3b",
        #llm_model: str = None,  # Set to None to use environment variable
        llm_temperature: float = 0.7,
        qdrant_url: str = "http://qdrant:6333",
        ollama_url: str = "http://ollama:11434",  # URL for Ollama inside the Docker network
        collection_name: str = "vector_db",
    ):
        """
        Initializes the ChatbotManager with embedding models, LLM, and vector store.

        Args:
            model_name (str): The HuggingFace model name for embeddings.
            device (str): The device to run the model on ('cpu' or 'cuda').
            encode_kwargs (dict): Additional keyword arguments for encoding.
            llm_model (str): The local LLM model name for ChatOllama.
            llm_temperature (float): Temperature setting for the LLM.
            qdrant_url (str): The URL for the Qdrant instance.
            collection_name (str): The name of the Qdrant collection.
        """
        self.model_name = model_name
        self.device = device
        self.encode_kwargs = encode_kwargs
        #self.llm_model = llm_model
        # Get the LLM model name from the environment variable
        self.llm_model = os.getenv("OLLAMA_MODEL", llm_model)
        self.llm_temperature = llm_temperature
        self.qdrant_url = qdrant_url
        self.collection_name = collection_name
        self.ollama_url = ollama_url  # Initialize ollama_url

        # Initialize Embeddings
        self.embeddings = HuggingFaceBgeEmbeddings(
            model_name=self.model_name,
            model_kwargs={"device": self.device},
            encode_kwargs=self.encode_kwargs,
        )

        # Initialize Local LLM
        self.llm = ChatOllama(
            model=self.llm_model,
            temperature=self.llm_temperature,
            server_url=self.ollama_url
            # Add other parameters if needed
        )

        # Define the prompt template
        self.prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer. Answer must be detailed and well explained.
Helpful answer:
"""

        # Initialize Qdrant client
        self.client = QdrantClient(
            url=self.qdrant_url, prefer_grpc=False
        )

        # Initialize the Qdrant vector store
        self.db = Qdrant(
            client=self.client,
            embeddings=self.embeddings,
            collection_name=self.collection_name
        )

        # Initialize the prompt
        self.prompt = PromptTemplate(
            template=self.prompt_template,
            input_variables=['context', 'question']
        )

        # Initialize the retriever
        self.retriever = self.db.as_retriever(search_kwargs={"k": 1})

        # Define chain type kwargs
        self.chain_type_kwargs = {"prompt": self.prompt}

        # Initialize the RetrievalQA chain with return_source_documents=False
        self.qa = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.retriever,
            return_source_documents=False,  # Set to False to return only 'result'
            chain_type_kwargs=self.chain_type_kwargs,
            verbose=False
        )

    def get_response(self, query: str) -> str:
        """
        Processes the user's query and returns the chatbot's response.

        Args:
            query (str): The user's input question.

        Returns:
            str: The chatbot's response.
        """
        try:
            response = self.qa.run(query)
            return response  # 'response' is now a string containing only the 'result'
        except Exception as e:
            st.error(f"An error occurred while processing your request: {e}")
            return "Sorry, I couldn't process your request at the moment."
Logs of app container:
2024-10-30 16:47:13 2024-10-30 11:17:13.140 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
2024-10-30 16:49:55 2024-10-30 11:19:55.974 Examining the path of torch.classes raised: Tried to instantiate class '__path__._path', but it does not exist! Ensure that it is registered via torch::class_
2024-10-30 16:50:44 /app/chatbot.py:119: LangChainDeprecationWarning: The method Chain.run was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.
2024-10-30 16:50:44   response = self.qa.run(query)
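I assume the deprecation warning itself is unrelated to the connection error; for reference, the non-deprecated form of that call would look roughly like this:

# Newer-style call inside get_response(): RetrievalQA.invoke returns a dict with a 'result' key.
response = self.qa.invoke({"query": query})["result"]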
I have looked into this many times and made changes based on the ollama_url and other factors, such as checking Ollama service availability, checking the Ollama container status, and modifying the YAML file, but none of it works and I am stuck at this error. The entire code works well in the development environment without Docker (with Ollama running as a service on the host), but I need to deploy it on a server as soon as possible to make it available on the network.
I have checked that the Ollama container is serving on port 11434 (verified via the URL and via a docker command), and Qdrant is also working, since the embeddings are created and a success message is shown in the app UI. Yet somehow the connection to Ollama is being refused, I guess.
Could someone please explain the issue and its solution? Thanks.