9

I would like to create a content recommendation system based on binary click data that also takes views into account.

What content a user has been exposed to, and therefore has had the chance to click on, is currently biased by a rule-based system that is not always documented. I do have view data (whether a user saw the content on their screen, regardless of whether it was clicked), and am wondering how to take this into account with a traditional matrix factorization recommendation system such as this item-item approach, or whether there are other, better options.

Any suggestions for implementation in Python are a bonus!

elz

3 Answers

2

The simplest answer with the cleanest solution may be one that combines your binary click data and view data into another model feature that you can optimize against. What this looks like depends on your knowledge of your end-to-end system.

For instance, is it "good" that a user clicks on something with a high number of views, or is it better if a user clicks on something with a low number of views? You might end up with very different formulas depending on this question alone:

normalization_function((1 / views) * mean(clicks)) vs. perhaps normalization_function(views * mean(clicks))

Be sure to check the assumptions of your matrix factorization implementation, but you may be able to simply swap this new feature in for your click data.
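
As a rough sketch of the first formula, here is one possible reading: clicks divided by views per user-item pair, rescaled to [0, 1]. The toy matrices, the helper names `click_rate` and `min_max`, and the choice of min-max scaling as the normalization function are all assumptions made for illustration, not something taken from your system.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are items.
views = np.array([[10, 0, 2],
                  [4, 5, 0],
                  [0, 1, 8]], dtype=float)
clicks = np.array([[1, 0, 2],
                   [0, 5, 0],
                   [0, 1, 2]], dtype=float)


def click_rate(clicks, views):
    """Clicks per view; 0 where the user never saw the item."""
    rate = np.zeros_like(clicks, dtype=float)
    np.divide(clicks, views, out=rate, where=views > 0)
    return rate


def min_max(x):
    """One possible normalization function: rescale values to [0, 1]."""
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


# Combined feature that could stand in for the raw click matrix.
feature = min_max(click_rate(clicks, views))
```

The resulting `feature` matrix has the same shape as the click matrix, which is what makes it a candidate drop-in input for a factorization model that accepts real-valued confidences.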

As a side note, I don't know exactly what you are implementing, but clicks and views generally represent different things. As a rough guide, clicks = engagement and views = eyeballs, so combining them might not mean much.

ngopal
1

I would not model the view data. You stated that it is determined by another, rule-based system. If you try to model view data, you would be learning that rule-based system, not the preferences of the users. For example, knowing that two users are both likely to view an item would only tell you about the existing system.

I suggest using just the click data. The question then becomes: given that a user viewed and clicked on an item, what other items is that user likely to click on?

Python has an Implicit package that implements several different popular recommendation algorithms for this type of data.

Brian Spiering
1

Item-Item collaborative filtering can be applied to the unary data. This resource is good for learning item-item collaborative filtering on unary data.
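
A minimal sketch of the idea, assuming cosine similarity between item columns of a binary click matrix (the toy data and the helper names `item_similarity` and `recommend` are made up for illustration; the linked resource may use a different similarity measure):

```python
import numpy as np

# Hypothetical unary click matrix: rows are users, columns are items.
clicks = np.array([[1, 1, 0, 0],
                   [1, 1, 1, 0],
                   [0, 0, 1, 1],
                   [0, 1, 0, 0]], dtype=float)


def item_similarity(clicks):
    """Cosine similarity between item columns, with the diagonal zeroed."""
    norms = np.linalg.norm(clicks, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for never-clicked items
    normalized = clicks / norms
    sim = normalized.T @ normalized
    np.fill_diagonal(sim, 0.0)
    return sim


def recommend(user, clicks, sim, k=2):
    """Score items by summed similarity to the user's clicked items."""
    scores = clicks[user] @ sim
    scores[clicks[user] > 0] = -np.inf  # do not re-recommend clicked items
    return np.argsort(-scores)[:k]


sim = item_similarity(clicks)
top = recommend(0, clicks, sim, k=1)
```

For a real catalogue you would likely want sparse matrices and a top-N neighbour cutoff rather than the dense similarity matrix used here.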

In your case, you just have positives: clicks. From here, you can proceed in a few ways:

  1. Binary Classification: For binary classification, you need to define "negatives". Usually implicit feedback or unary data does not have true negatives. So, in order to define your negatives, you can do a couple of things:

    • Negative Sampling: For each positive, you can sample a negative at random.
    • A view and no click as a negative: If the content was shown to the user and the user chose not to click on it, that counts as a negative. However, this inherits the selection bias of the rule-based system that is already in place.
  2. Learning-to-rank

    Learning-to-rank approaches such as BPR-MF perform well on unary data. This library is well documented for BPR-MF and works with unary data alone.

  3. Learning from Multi-Channel Feedback

    If you want to learn from both views and clicks, this work comes to mind.
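
To make the two ways of defining negatives in option 1 concrete, here is a small sketch. The toy logs and the helper names `viewed_not_clicked` and `sample_negatives` are hypothetical, and the sampler assumes every user has at least one item they have not clicked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logs as (user, item) pairs.
views = {(0, 0), (0, 1), (0, 2), (1, 1), (1, 3), (2, 0), (2, 2)}
clicks = {(0, 1), (1, 3), (2, 2)}
n_items = 5


def viewed_not_clicked(views, clicks):
    """Negatives from exposure: pairs that were shown but not clicked."""
    return sorted(views - clicks)


def sample_negatives(clicks, n_items, rng):
    """One random negative per positive, drawn from the user's unclicked items."""
    clicked_by_user = {}
    for user, item in clicks:
        clicked_by_user.setdefault(user, set()).add(item)
    negatives = []
    for user, _ in sorted(clicks):
        candidates = [i for i in range(n_items)
                      if i not in clicked_by_user[user]]
        negatives.append((user, int(rng.choice(candidates))))
    return negatives


exposure_negatives = viewed_not_clicked(views, clicks)
random_negatives = sample_negatives(clicks, n_items, rng)
```

The exposure-based negatives only ever contain items the rule-based system chose to show, which is exactly the selection bias mentioned above; the random negatives avoid that but may include items the user would have liked had they seen them.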

learner