I built a RAG solution that runs locally. I use two models, downloaded from Ollama (loaded roughly as in the sketch after this list):
- nomic-embed-text (embedding model)
- llama3.2:3b (LLM)
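
Roughly, this is how the two models are loaded (I'm sketching it with the langchain-ollama wrappers; treat the package choice as illustrative, the plain `ollama` client would look similar):

```python
# Sketch: loading both Ollama models through the langchain-ollama wrappers
# (illustrative only; not necessarily the exact packages I use).
from langchain_ollama import ChatOllama, OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # embedding model
llm = ChatOllama(model="llama3.2:3b", temperature=0)     # generation model
```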
For testing, I only have one PDF document of around 100 pages, which is chunked (see the sketch after this list) with:
- chunk_size=1000
- chunk_overlap=100
- embedding_dimension=348
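
The chunking step looks roughly like this (the loader and the file name are just placeholders):

```python
# Sketch: loading and chunking the PDF.
# PyPDFLoader and the file name are placeholders, not my exact setup.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFLoader("my_document.pdf").load()  # ~100-page test PDF
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)
```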
The embeddings are then stored locally using Chroma, along these lines (the persist directory and k value are just illustrative):
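```python
# Sketch: embedding the chunks and persisting them locally with Chroma
# (langchain-chroma wrapper assumed; path and k are illustrative values).
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```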
Each query takes around 3.5 minutes to complete, which I think is quite slow. The answers are also not very accurate. I am looking into:
- Making it more accurate - I have only done the bare minimum so far. I am looking into prompt engineering, re-ranking retrieved documents (see the sketch after this list), further document processing, etc.
- Making it faster - I don't have any ideas beyond upgrading the hardware (which I will do).
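
For the re-ranking idea, this is the kind of step I have in mind (the cross-encoder model name below is just an example, not something I've settled on):

```python
# Sketch of a re-ranking pass: score retrieved chunks against the query with a
# cross-encoder and keep only the best few before they go to the LLM.
# The model name is an example; any cross-encoder reranker could be swapped in.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    scores = reranker.predict([(query, c) for c in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]
```

My thinking is that passing fewer, better chunks to the LLM should also help with speed, since a smaller context means less work for a 3B model.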
Any other suggestions?