I have a small user-item matrix (25k x 1.8k) describing how users liked or disliked some of the items. Users don't have any attributes, but items have several features.
I would like to be able to predict, using this dataset, some of the hidden likes and dislikes.
I have no experience in the field and am not sure which approaches I should use as baselines or how I should measure effectiveness.
Any advice on the following topics is much appreciated:
1). Since I have a small dataset, should I use content-based filtering instead of collaborative filtering? If so, which approaches would you try first? Which measures of similarity between items can I use?
2). What approach should I use for collaborative filtering? Can I simply factorize the matrix and multiply the factors to get an estimate?
3). For the test and validation sets, should I hold out a subset of rows (and predict the likes of unseen items) or a random subset of individual likes, with different items held out for different users?
4). What would be a good measure of effectiveness? RMSE?
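For concreteness, questions 2 to 4 could be sketched roughly as follows in Python with NumPy. Everything here is an assumption for illustration: the toy data is synthetic, the encoding (+1 like, -1 dislike, 0 unknown) is just one possible choice, and the helper names `random_entry_split`, `factorize_and_reconstruct` and `rmse` are hypothetical. The split shown is the second option from question 3 (a random subset of known entries), and the factorization is a plain truncated SVD of the zero-filled matrix, which treats unknown entries as zeros and is only a naive baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data standing in for the real 25k x 1.8k matrix:
# +1 = like, -1 = dislike, 0 = unknown.
n_users, n_items, rank = 200, 50, 5
true_scores = rng.normal(size=(n_users, rank)) @ rng.normal(size=(rank, n_items))
observed = rng.random((n_users, n_items)) < 0.3
ratings = np.where(observed, np.sign(true_scores), 0.0)


def random_entry_split(ratings, test_fraction, rng):
    """Hold out a random subset of known entries (different items per user)."""
    rows, cols = np.nonzero(ratings)
    n_test = int(len(rows) * test_fraction)
    picked = rng.permutation(len(rows))[:n_test]
    test_mask = np.zeros(ratings.shape, dtype=bool)
    test_mask[rows[picked], cols[picked]] = True
    # Held-out entries are hidden from training by resetting them to "unknown".
    train = np.where(test_mask, 0.0, ratings)
    return train, test_mask


def factorize_and_reconstruct(train, rank):
    """Truncated SVD of the zero-filled matrix, then multiply the factors back."""
    u, s, vt = np.linalg.svd(train, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def rmse(pred, truth, mask):
    """Root mean squared error, computed only on the masked (held-out) entries."""
    return float(np.sqrt(np.mean((pred[mask] - truth[mask]) ** 2)))


train, test_mask = random_entry_split(ratings, 0.2, rng)
pred = factorize_and_reconstruct(train, rank)

test_rmse = rmse(pred, ratings, test_mask)
# For binary like/dislike data, sign agreement may be easier to interpret than RMSE.
test_accuracy = float(np.mean(np.sign(pred[test_mask]) == ratings[test_mask]))
print(test_rmse, test_accuracy)
```

The sign-agreement number is included only because the data is binary; whether RMSE, accuracy, or a ranking metric is the right measure is exactly what question 4 asks.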
Thanks in advance!