
I have a small user-item matrix (25k x 1.8k) describing how users liked or disliked some of the items. Users don't have any attributes but items have several features.

I would like to be able to predict, using this dataset, some of the hidden likes and dislikes.

I don't have experience in this field, so I am not sure which approaches I should use as baselines or which measures of effectiveness are appropriate.

Any advice on the following topics is much appreciated:

1). Since I have a small data set, should I use content-based filtering instead of collaborative filtering? If so, what approaches would you try first? What measures of similarity between items can I use?

2). What approach should I use for collaborative filtering? Can I simply factorize the matrix and multiply the factors to get an estimate?

3). For the test and validation sets, should I hold out a subset of rows (and predict the likes of unseen items), or a random subset of individual likes, with different items held out for different users?

4). What would be a good measure of effectiveness? RMSE?
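
To make questions 2) to 4) concrete, here is a minimal sketch of what I have in mind, not something I have validated. Everything in it is an assumption for illustration: the data is synthetic (+1 for a like, -1 for a dislike, unknown entries masked out), the sizes are much smaller than my real matrix, and the names `factorize` and `rmse` as well as the hyperparameters (`k`, `lr`, `reg`, `epochs`) are placeholders I picked arbitrarily. It holds out a random subset of known entries, fits two factor matrices by SGD on the remaining known entries only, and multiplies the factors to score the held-out entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption): +1 = like, -1 = dislike.
n_users, n_items, true_rank = 200, 50, 5
true_u = rng.normal(size=(n_users, true_rank))
true_v = rng.normal(size=(n_items, true_rank))
ratings = np.sign(true_u @ true_v.T)
observed = rng.random((n_users, n_items)) < 0.3  # mask of known entries

# Question 3, second option: hold out a random 20% of the known entries.
users, items = np.nonzero(observed)
values = ratings[users, items]
perm = rng.permutation(len(values))
n_test = len(perm) // 5
test_idx, train_idx = perm[:n_test], perm[n_test:]


def factorize(users, items, values, n_users, n_items,
              k=5, lr=0.02, reg=0.05, epochs=20, seed=0):
    """Fit values ~ P[u] . Q[i] by SGD over the given (known) entries only."""
    local_rng = np.random.default_rng(seed)
    P = 0.1 * local_rng.normal(size=(n_users, k))
    Q = 0.1 * local_rng.normal(size=(n_items, k))
    for _ in range(epochs):
        for j in local_rng.permutation(len(values)):
            u, i = users[j], items[j]
            err = values[j] - P[u] @ Q[i]
            p_old = P[u].copy()  # use the pre-update user factor for Q's step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def rmse(pred, target):
    """Root mean squared error between two 1-D arrays."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Question 2: factorize using the training entries, then multiply the factors.
P, Q = factorize(users[train_idx], items[train_idx], values[train_idx],
                 n_users, n_items)
pred = np.sum(P[users[test_idx]] * Q[items[test_idx]], axis=1)

# Question 4: RMSE on held-out entries, next to a constant-prediction baseline
# and the fraction of held-out likes/dislikes whose sign is predicted correctly.
test_rmse = rmse(pred, values[test_idx])
baseline_rmse = rmse(np.full(n_test, values[train_idx].mean()), values[test_idx])
accuracy = float(np.mean(np.sign(pred) == values[test_idx]))
print(test_rmse, baseline_rmse, accuracy)
```

Holding out whole rows instead would not work with this sketch as written, because a user with no training entries would keep the random initial factor vector; that is part of why I am asking which split makes sense.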

Thanks in advance!

maksay