Recommendation Systems Walkthrough - Popularity Recommendations

This post discusses a different approach to recommending movies: ranking them by popularity.

In this post, we will use the average rating stored in the movies database. The goal is a more general recommendation widget driven by each movie’s popularity.

This widget recommends the same movies to all users; it is not personalized. The basic idea is that highly popular movies are the ones most likely to be liked by the average audience.

The approach is simple: we sort the movies by their pre-calculated average rating, which is collected from many users.

For the sake of demonstration, IMDb publishes a list of top-rated movies, rated by its users. How does IMDb actually calculate these ratings?

The formula for calculating the top rated 250 titles gives a true Bayesian estimate:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C where:

  • R = average for the movie (mean) = (Rating)
  • v = number of votes for the movie = (votes)
  • m = minimum votes required to be listed in the Top 250 (currently 25000)
  • C = the mean vote across the whole report (currently 7.0)
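
def compute_weighted_average_rate(r, v, m, c):
    # r: the movie's mean rating, v: its number of votes,
    # m: minimum votes required, c: mean vote across all movies
    wr = (v / (v + m)) * r + (m / (v + m)) * c
    return wr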
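
To get a feel for how the formula behaves, here is a small worked example. The two movies and their vote counts are made-up illustration values, not IMDb data; m and C are the figures quoted above. It is a sketch of the idea rather than IMDb's actual implementation.

```python
def compute_weighted_average_rate(r, v, m, c):
    """Blend the movie's own mean r with the global mean c, weighted by votes."""
    return (v / (v + m)) * r + (m / (v + m)) * c


M = 25000  # minimum votes required, as quoted above
C = 7.0    # mean vote across all movies, as quoted above

# Hypothetical movies (numbers invented for illustration only).
few_votes = compute_weighted_average_rate(r=9.0, v=100, m=M, c=C)
many_votes = compute_weighted_average_rate(r=8.5, v=500000, m=M, c=C)

print(round(few_votes, 2))   # 7.01: with only 100 votes the score stays near C
print(round(many_votes, 2))  # 8.43: with many votes the score approaches r
```

The movie with the higher raw average ends up with the lower weighted rating, because its 100 votes carry little weight against m.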

Based on this formula, we can build a suggestion widget that recommends movies to all users, using either IMDb’s popularity info or our own users’ average ratings, recomputed from time to time.

Of course, this is not the best way to make recommendations, but I think it can be used alongside other recommender algorithms.

Assume for a moment that we have the following table in our movies database, fetched using this SQL query:

SELECT name, rate, COALESCE(popularity, 0) AS popularity 
  FROM movies_app_movie 
 ORDER BY popularity DESC;

| Name | Rate | Popularity |
|------|------|------------|
| Minions | 6.4 | 547.4882980000001 |
| Wonder Woman | 7.2 | 294.337037 |
| Beauty and the Beast | 6.8 | 287.253654 |
| Baby Driver | 7.2 | 228.032744 |
| Big Hero 6 | 7.8 | 213.84990699999997 |
| Deadpool | 7.4 | 187.860492 |
| Guardians of the Galaxy Vol. 2 | 7.6 | 185.33099199999998 |
| Avatar | 7.2 | 185.070892 |
| John Wick | 7.0 | 183.870374 |
| Gone Girl | 7.9 | 154.80100900000002 |
| The Hunger Games: Mockingjay - Part 1 | 6.6 | 147.098006 |
| War for the Planet of the Apes | 6.7 | 146.161786 |
| Captain America: Civil War | 7.1 | 145.882135 |
| Pulp Fiction | 8.3 | 140.95023600000002 |
| Pirates of the Caribbean: Dead Men Tell No Tales | 6.6 | 133.82782 |
| The Dark Knight | 8.3 | 123.167259 |
| Blade Runner | 7.9 | 96.272374 |
| The Avengers | 7.4 | 89.887648 |
| Captain Underpants: The First Epic Movie | 6.5 | 88.561239 |
| The Circle | 5.4 | 88.439243 |
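
The same ordering can be sketched in plain Python. The first three rows below are taken from the table; the movie with a missing popularity value is hypothetical and only there to mirror what COALESCE(popularity, 0) does in the query. The function name top_by_popularity is my own, not part of the project.

```python
movies = [
    {"name": "Wonder Woman", "rate": 7.2, "popularity": 294.337037},
    {"name": "Minions", "rate": 6.4, "popularity": 547.4882980000001},
    # Hypothetical row with no popularity score yet.
    {"name": "Hypothetical Unscored Movie", "rate": 6.0, "popularity": None},
    {"name": "Baby Driver", "rate": 7.2, "popularity": 228.032744},
]


def top_by_popularity(rows, limit=10):
    """Sort rows by popularity, descending, treating a missing value as 0."""
    ranked = sorted(rows, key=lambda row: row["popularity"] or 0, reverse=True)
    return ranked[:limit]


for movie in top_by_popularity(movies):
    print(movie["name"], movie["popularity"])
```

Since the list is the same for every user, it only needs recomputing when the popularity values change.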

In the table above, movies are sorted in descending order of popularity. We will now go through how such a score can be computed for every film in our database.

Computing the popularity property is a bit tricky, since it may need several pieces of information that are stored in separate tables and joined later.

Let’s take a look at a separate table holding the movie ratings. Some aggregation is required to calculate the rating count and the average rating of every movie.

  SELECT movie_id, 
         COUNT(movie_id), 
         AVG(rate) AS rate_avg 
    FROM movies_app_rating 
GROUP BY movie_id 
ORDER BY rate_avg DESC
   LIMIT 20;

| Movie ID | Name | Count | Rate Avg |
|----------|------|-------|----------|
| 2284 | Mr. Magorium’s Wonder Emporium | 1 | 5 |
| 4459 | Night Without Sleep | 1 | 5 |
| 5473 | De Dominee | 1 | 5 |
| 2636 | The Specialist | 1 | 5 |
| 36931 | On the Edge | 1 | 5 |
| 64278 | Interceptor Force 2 | 1 | 5 |
| 183 | The Wizard | 1 | 5 |
| 845 | Strangers on a Train | 1 | 5 |
| 26791 | Brigham City | 1 | 5 |
| 43267 | 29th Street | 1 | 5 |
| 31413 | Innocence | 1 | 5 |
| 4201 | The Fifth Musketeer | 1 | 5 |
| 2984 | A Countess from Hong Kong | 1 | 5 |
| 4140 | Blindsight | 1 | 5 |
| 6107 | Murder in Three Acts | 1 | 5 |
| 1563 | Sunless | 1 | 5 |
| 65216 | Bloody Cartoons | 1 | 5 |
| 1933 | The Others | 1 | 5 |
| 8675 | Orgazmo | 1 | 5 |
| 2897 | Around the World in Eighty Days | 1 | 5 |
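
Every row in this result has a single vote, which shows why a plain average is a weak ranking signal: one 5-star vote is enough to top the list. A minimal sketch of combining the aggregation with the weighted rating formula from earlier follows. The rating rows, the threshold m=3, and the function names are all assumptions made for illustration; C is computed from the sample data itself.

```python
from collections import defaultdict

# Hypothetical (movie_id, rate) rows standing in for movies_app_rating.
ratings = [
    (1, 5.0),                                                    # one 5-star vote
    (2, 4.5), (2, 5.0), (2, 5.0), (2, 4.5), (2, 4.5), (2, 5.0),  # six high votes
    (3, 2.0), (3, 3.0),
]


def aggregate(rows):
    """Return {movie_id: (vote_count, mean_rate)}, like the GROUP BY query."""
    grouped = defaultdict(list)
    for movie_id, rate in rows:
        grouped[movie_id].append(rate)
    return {mid: (len(rates), sum(rates) / len(rates)) for mid, rates in grouped.items()}


def rank_weighted(rows, m):
    """Rank movie ids by WR = (v / (v + m)) * R + (m / (v + m)) * C."""
    stats = aggregate(rows)
    c = sum(rate for _, rate in rows) / len(rows)  # mean vote across all rows
    scored = {
        mid: (v / (v + m)) * r + (m / (v + m)) * c
        for mid, (v, r) in stats.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


stats = aggregate(ratings)
by_plain_average = sorted(stats, key=lambda mid: stats[mid][1], reverse=True)
by_weighted = rank_weighted(ratings, m=3)

print(by_plain_average)  # movie 1 leads on the strength of a single vote
print(by_weighted)       # movie 2 leads once vote counts are taken into account
```

A simpler alternative is to filter out movies below a minimum vote count before sorting, at the cost of hiding them entirely.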

There are many ways to determine the popularity of a movie, and there is no standard way of computing such a score. For example, we can take the following factors into consideration:

  • Number of votes for the day.
  • Number of views for the day.
  • Number of users who marked it as a “favorite” for the day.
  • Number of users who added it to their “watchlist” for the day.
  • Number of comments.
  • Number of ratings (negative vs. positive).
  • Number of total votes.
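
One simple way to combine such factors is a weighted sum. The sketch below is only an illustration: the signal names and the weights are hypothetical and would need tuning against real data, and it covers only a subset of the factors listed above.

```python
# Hypothetical weights: how much each signal contributes to the score.
WEIGHTS = {
    "votes_today": 1.0,
    "views_today": 0.1,
    "favorites_today": 3.0,
    "watchlist_today": 2.0,
    "comments": 0.5,
    "total_votes": 0.01,
}


def popularity_score(signals, weights=WEIGHTS):
    """Weighted sum of the signals; a signal missing from the dict counts as 0."""
    return sum(weight * signals.get(name, 0) for name, weight in weights.items())


# Made-up daily numbers for a single movie.
movie_signals = {"votes_today": 10, "views_today": 200, "favorites_today": 2}
score = popularity_score(movie_signals)
print(score)
```

Keeping the weights in one dictionary makes it easy to adjust how much each signal matters without touching the scoring function.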

Ahmed Nabil
