
I'm looking deeper into collaborative filtering. One really interesting paper is "A Comparative Study of Collaborative Filtering Algorithms" http://arxiv.org/pdf/1205.3193.pdf

To decide which CF algorithm to use, the paper relies on the density of the dataset. What it doesn't do is explain how you actually calculate the density of your dataset.

So, in the context of the paper above, can anyone explain how I would calculate the density of a dataset? The paper regularly refers to densities in the 1-5% range.

djones



It's actually defined on the first page:

... sparsity level (ratio of observed to total ratings) ...

In other words, density is the fraction of cells in the user-item rating matrix that are not empty. Remember that the core problem is that most user-item pairs have no rating, and we wish to estimate them.
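If you already hold the ratings as a full user-by-item matrix, the ratio of observed to total ratings is just a count of filled cells divided by the number of cells. The sketch below assumes missing ratings are stored as `None`; the function name `matrix_density` is hypothetical, chosen only for illustration.

```python
from typing import Optional, Sequence


def matrix_density(matrix: Sequence[Sequence[Optional[float]]]) -> float:
    """Fraction of cells in a user x item matrix that hold a rating.

    Assumption: an unrated cell is represented by None.
    """
    total = sum(len(row) for row in matrix)  # all possible ratings
    observed = sum(1 for row in matrix for cell in row if cell is not None)
    return observed / total if total else 0.0


# 3 users (rows) x 4 items (columns); 4 of the 12 cells are rated.
ratings = [
    [5, None, None, 3],
    [None, None, 4, None],
    [None, 1, None, None],
]
print(matrix_density(ratings))  # 4 / 12
```

For real datasets the matrix is usually too sparse to store densely, which is why counting from a list of ratings is more common in practice.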

Example:

Let there be three users and four products. The number of possible ratings is $3\times4 = 12$. If each user rates exactly one product (regardless of which one), the density is $3/12 = 25\%$.
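Most rating datasets come as a list of `(user, item, rating)` records rather than a matrix, so the same calculation is the number of distinct rated pairs divided by `n_users * n_items`. The sketch below reproduces the three-user, four-product example. It is a minimal illustration under stated assumptions: the function name `dataset_density` and the `n_users`/`n_items` parameters are hypothetical, and repeated ratings of the same pair are counted once.

```python
from typing import Hashable, Iterable, Optional, Tuple


def dataset_density(
    ratings: Iterable[Tuple[Hashable, Hashable, float]],
    n_users: Optional[int] = None,
    n_items: Optional[int] = None,
) -> float:
    """Observed (user, item) pairs divided by all possible pairs.

    If n_users / n_items are not given, they are inferred from the
    ratings themselves (an assumption: unrated users/items are then ignored).
    """
    pairs, users, items = set(), set(), set()
    for user, item, _rating in ratings:
        pairs.add((user, item))  # duplicate ratings of a pair count once
        users.add(user)
        items.add(item)
    n_users = n_users if n_users is not None else len(users)
    n_items = n_items if n_items is not None else len(items)
    possible = n_users * n_items
    return len(pairs) / possible if possible else 0.0


# Three users, each rating exactly one of four products.
ratings = [("u1", "p1", 4.0), ("u2", "p3", 2.0), ("u3", "p1", 5.0)]
print(dataset_density(ratings, n_users=3, n_items=4))  # 0.25
print(dataset_density(ratings))  # 0.5: only 2 distinct products appear
```

The second call shows why the denominator matters: if you infer the catalog size from the ratings alone, products nobody rated drop out and the density comes out higher. Use the true number of users and items when you know it.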

Emre