
I am familiar with the standard precision and recall evaluation metrics, but recall@k and precision@k are different things: they are used to evaluate rankings, especially in recommendation systems.

I checked many sources and understood most of it, but there is one point I could not understand.

One more thing: every source calculates these metrics differently (1, 2, 3, 4).

Here is an example, so that you can explain it to me.

Let's say we have 5 users. We want to recommend a location for each user's next visit, based on an analysis of the users' historical check-in data.

User 1 visits: Museum1, Park1, Night Club, "?" (what is next?)

We are trying to predict the next location visited. Let's say the ground truth is "Restaurant".

How can I calculate precision@5 and recall@5?

Extra: this YouTube video explains it very well (go to 51:45 in the video).

What does it mean to have 5 or 6 relevant items? If we are making a recommendation, only 1 item should turn out to be relevant for the user. They are making movie recommendations, yet they say there are 5 relevant movies. What does that mean?

drorhun

2 Answers


A quick answer.

Note: Checking the references I could access fully, there are no discrepancies between the definitions as long as the terms are translated correctly.

Some definitions:

Relevant items: items that the user(s) have themselves established as relevant through their actions.

Recommended items: items that the recommendation system predicts will be relevant to the user(s).

Concerning your example:

Let's work through the definitions of recall@k and precision@k. Assume we provide $5$ recommendations in this order: 1 0 1 0 1, where 1 represents relevant and 0 irrelevant, and assume these $3$ relevant items are the only relevant items for the user. Precision@k is the fraction of the top $k$ recommendations that are relevant, so precision@3 is $2/3$, precision@4 is $2/4$, and precision@5 is $3/5$. Recall@k is the fraction of all relevant items that appear in the top $k$, so recall@3 is $2/3$, recall@4 is $2/3$, and recall@5 is $3/3$.
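The numbers above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `precision_recall_at_k` is made up for illustration, and it assumes the user has exactly $3$ relevant items in total, all of which appear in the list of $5$ recommendations.

```python
def precision_recall_at_k(relevance, k, total_relevant):
    """Return (precision@k, recall@k) for a ranked list of 0/1 relevance flags."""
    hits = sum(relevance[:k])          # relevant items among the top k
    precision = hits / k               # share of the top k that is relevant
    recall = hits / total_relevant     # share of all relevant items found in the top k
    return precision, recall


ranked = [1, 0, 1, 0, 1]               # 1 = relevant, 0 = irrelevant, in ranked order
total_relevant = sum(ranked)           # assumption: no relevant items outside this list

for k in (3, 4, 5):
    p, r = precision_recall_at_k(ranked, k, total_relevant)
    print(f"k={k}: precision@k={p:.3f}, recall@k={r:.3f}")
```

If the user had relevant items that were never recommended, `total_relevant` would be larger than `sum(ranked)` and every recall@k value would shrink accordingly, while precision@k would stay the same.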

Reference: Precision and recall at k for recommender systems

Nikos M.

I think the above answer is wrong, or at least fails to meaningfully answer the question (as I understand it). In an industry setting, $k$ is determined by the number of items you are showing to the user. In your case, do you show the user all of these options?

Museum1, Park1, Night Club

If yes, then $k=3$ since they saw all $3$. Your recommendation engine is very good if one of the $3$ is what the user really did.

If you show $5$ then your recommendation engine is good if one of the $5$ is what the user really did.

If you show $1000$ then your recommendation engine is good if one of the $1000$ is what the user really did.

Now you see where this is going, right? You cannot show that many items to the user. A very large $k$ makes the metric meaningless, because it becomes much easier for the recommendation engine to do well on it.
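As a rough sketch of this effect for a case like yours, where each user has a single ground-truth next location: the ranked list below is made-up illustrative data, and `hit_at_k` is a hypothetical helper name. With exactly one relevant item, recall@k is just "was the ground truth in the top $k$" ($0$ or $1$), and precision@k is that same hit divided by $k$.

```python
def hit_at_k(ranked_items, ground_truth, k):
    """Return 1 if the ground-truth item is among the top k recommendations, else 0."""
    return int(ground_truth in ranked_items[:k])


# Hypothetical ranked recommendations for User 1 (best guess first).
ranked = ["Cafe", "Museum2", "Restaurant", "Park2", "Cinema"]
ground_truth = "Restaurant"            # the single relevant item

for k in (1, 3, 5):
    hit = hit_at_k(ranked, ground_truth, k)
    recall = hit / 1                   # only one relevant item exists
    precision = hit / k                # at most 1/k when there is one relevant item
    print(f"k={k}: recall@k={recall:.2f}, precision@k={precision:.2f}")
```

In this toy list the recommender misses at $k=1$ but counts as a hit at $k=3$ and $k=5$, which is why the same model looks better on recall@k simply because more items are shown.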

In your case, I could conceivably argue that your $k$ is actually equal to $1$! Why? Because it sounds as if you want to use the recommendation engine to meet the person physically (hopefully not to harm him :P). In that case you can only be in one place at a time, so you would probably go wherever the recommendation engine gave the highest predicted value. If the user's actual next location is not that top-ranked item, your recommender system is wrong.

I deal with issues like this on my blog about evaluating recommender systems: https://franciscormendes.github.io/2024/11/08/consulting-ab-testing/