I built a RAG solution that runs locally. I use two models, downloaded from Ollama:

  • nomic-embed-text (embedding model)
  • llama3.2:3b (LLM)
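
For reference, this is roughly how I call the two models (a minimal sketch using the official `ollama` Python package; the prompt strings are just placeholders):

```python
import ollama

# Embed a piece of text with the embedding model
emb = ollama.embeddings(model="nomic-embed-text", prompt="hello world")
vector = emb["embedding"]  # nomic-embed-text returns a 768-dimensional vector

# Generate an answer with the chat model
resp = ollama.chat(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp["message"]["content"])
```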

For testing, I only have one PDF document of around 100 pages, which is chunked with:

  • chunk_size=1000
  • chunk_overlap=100
  • embedding_dimension=768
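
The chunking is done with LangChain's recursive splitter, roughly like this (a sketch; the PDF file name is illustrative):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

pages = PyPDFLoader("document.pdf").load()  # one Document per page, ~100 in total

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # max characters per chunk
    chunk_overlap=100,  # characters shared between neighbouring chunks
)
chunks = splitter.split_documents(pages)
print(f"{len(pages)} pages -> {len(chunks)} chunks")
```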

The vectors are then stored locally using Chroma.
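
The storage step looks roughly like the following (a sketch using the `chromadb` client directly; the path, collection name, and example texts are placeholders):

```python
import chromadb
import ollama

texts = ["example chunk one", "example chunk two"]  # stand-ins for the real chunks

client = chromadb.PersistentClient(path="./chroma_db")  # local, on-disk store
collection = client.get_or_create_collection("rag_chunks")

collection.add(
    ids=[f"chunk-{i}" for i in range(len(texts))],
    documents=texts,
    embeddings=[
        ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]
        for t in texts
    ],
)
```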

Each query takes around 3.5 minutes to complete, which I think is quite slow, and the answers are also not very accurate. I am looking into:

  • Making it more accurate - I think I have just done the bare minimum so far. I am looking into prompt engineering, ranking the retrieved documents, further document processing, etc. (my current query path, prompt included, is sketched after this list).
  • Making it faster - apart from ramping up the hardware (which I will), I haven't got any other ideas.
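
For context, my whole query path looks roughly like this (a sketch; the prompt wording and k=4 are things I am still experimenting with):

```python
import chromadb
import ollama

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rag_chunks")

def answer(question: str, k: int = 4) -> str:
    # Embed the question with the same model used for the chunks
    q_vec = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = collection.query(query_embeddings=[q_vec], n_results=k)
    context = "\n\n".join(hits["documents"][0])

    # Grounded prompt: ask the model to answer only from the retrieved context
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    resp = ollama.chat(
        model="llama3.2:3b",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["message"]["content"]
```

One thing I did find: `ollama.chat` accepts `stream=True`, so the answer at least starts appearing immediately instead of after the full 3.5 minutes.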

Any other suggestions?
