I built a RAG solution that runs locally. I use two models, downloaded from Ollama (loaded roughly as in the sketch after this list):
- nomic-embed-text (embedding model)
- llama3.2:3b (LLM)
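
Roughly, this is how the two models are loaded (I'm sketching it with the langchain-ollama wrappers; treat the package choice as illustrative, the plain `ollama` client would look similar):

```python
# Sketch: loading both Ollama models through the langchain-ollama wrappers
# (illustrative only; not necessarily the exact packages I use).
from langchain_ollama import ChatOllama, OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # embedding model
llm = ChatOllama(model="llama3.2:3b", temperature=0)     # generation model
```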
For testing, I only have one PDF document of around 100 pages, which is chunked (see the sketch after this list) with:
- chunk_size=1000
- chunk_overlap=100
- embedding_dimension=348
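
The chunking step looks roughly like this (the loader and the file name are just placeholders):

```python
# Sketch: loading and chunking the PDF.
# PyPDFLoader and the file name are placeholders, not my exact setup.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFLoader("my_document.pdf").load()  # ~100-page test PDF
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)
```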
The embeddings are then stored locally using Chroma, along these lines (the persist directory and k value are just illustrative):
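```python
# Sketch: embedding the chunks and persisting them locally with Chroma
# (langchain-chroma wrapper assumed; path and k are illustrative values).
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```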
Each query takes around 3.5 minutes to complete, which I think is quite slow. The answers are also not very accurate. I am looking into:
- Making it more accurate - I have only done the bare minimum so far. I am looking into prompt engineering, re-ranking retrieved documents (see the sketch after this list), further document processing, etc.
- Making it faster - I don't have any ideas beyond upgrading the hardware (which I will do).
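
For the re-ranking idea, this is the kind of step I have in mind (the cross-encoder model name below is just an example, not something I've settled on):

```python
# Sketch of a re-ranking pass: score retrieved chunks against the query with a
# cross-encoder and keep only the best few before they go to the LLM.
# The model name is an example; any cross-encoder reranker could be swapped in.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    scores = reranker.predict([(query, c) for c in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]
```

My thinking is that passing fewer, better chunks to the LLM should also help with speed, since a smaller context means less work for a 3B model.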
Any other suggestions?