Recommendation Systems Walkthrough - Popularity Recommendations
This post discusses a different approach to recommending movies: recommending them based on their popularity.
Here, we will use the average rating stored in the movies database to build a generalized recommendation widget.
This widget recommends the same movies to every user rather than making personalized recommendations. The underlying idea is that popular movies are the ones most likely to be liked by the average viewer.
The approach is simple: we sort the movies by a pre-computed average rating collected from many users.
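Here is a minimal sketch of this naive ranking in Python. It uses a few average ratings from the movies table shown later in this post. The `Movie` type and the `sort_by_average_rate` helper are illustrative names, not part of any existing codebase.

```python
from typing import NamedTuple


class Movie(NamedTuple):
    """A movie with its pre-computed average rating."""
    name: str
    rate: float


def sort_by_average_rate(movies: list[Movie]) -> list[Movie]:
    """Return movies ordered from highest to lowest average rating.

    Python's sort is stable (also with reverse=True), so movies with
    equal ratings keep their original relative order.
    """
    return sorted(movies, key=lambda movie: movie.rate, reverse=True)


movies = [
    Movie("Minions", 6.4),
    Movie("Pulp Fiction", 8.3),
    Movie("Gone Girl", 7.9),
    Movie("The Dark Knight", 8.3),
]

for movie in sort_by_average_rate(movies):
    print(f"{movie.name}: {movie.rate}")
```

Ties such as Pulp Fiction and The Dark Knight keep their input order. Sorting on the raw average alone has a weakness: a movie with a single perfect rating would beat everything else.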
For example, IMDb publishes a list of top-rated movies based on ratings from its users. How does IMDb actually calculate these ratings?
The formula IMDb uses to calculate its Top Rated 250 titles gives a true Bayesian estimate:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C where:
- R = average for the movie (mean) = (Rating)
- v = number of votes for the movie = (votes)
- m = minimum votes required to be listed in the Top 250 (currently 25000)
- C = the mean vote across the whole report (currently 7.0)
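```python
def compute_weighted_average_rate(r, v, m, c):
    # Bayesian average: blend the movie's mean r with the global mean c,
    # weighting r by its share of the votes v relative to the prior m.
    wr = (v / (v + m)) * r + (m / (v + m)) * c
    return wr
```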
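Because m ÷ (v+m) = 1 - v ÷ (v+m), the same formula can be read as shrinkage toward the global mean: WR = C + (v ÷ (v+m)) × (R - C). The sketch below writes the weighted rating in that equivalent form. It shows that a single perfect vote barely moves a movie away from C, while a large number of votes lets the movie's own mean dominate. The function name `shrink_toward_mean` and both vote counts are hypothetical, for illustration only.

```python
def shrink_toward_mean(r: float, v: int, m: int, c: float) -> float:
    """Weighted rating rewritten as C plus a damped deviation (R - C).

    Algebraically identical to (v/(v+m))*R + (m/(v+m))*C,
    because m/(v+m) == 1 - v/(v+m).
    """
    confidence = v / (v + m)  # how far the score moves from C toward R
    return c + confidence * (r - c)


M_IMDB = 25000  # m, as quoted above
C_IMDB = 7.0    # C, as quoted above

# Hypothetical movies: one with a single perfect vote, one with 500,000 votes.
one_vote = shrink_toward_mean(r=10.0, v=1, m=M_IMDB, c=C_IMDB)
many_votes = shrink_toward_mean(r=8.3, v=500_000, m=M_IMDB, c=C_IMDB)
print(f"{one_vote:.4f}")    # 7.0001
print(f"{many_votes:.4f}")  # 8.2381
```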
With this weighted rating, we can build a suggestion widget that recommends the same movies to all users. It can use either IMDb's popularity data or our own users' average ratings, recomputed from time to time.
Of course, this is not the best way to make recommendations on its own, but I think it can be used alongside other recommender algorithms.
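The sketch below shows one way such a widget could work, assuming each movie's mean rating and vote count are already available. It keeps only movies with at least m votes, mirroring how m acts as IMDb's listing threshold, and ranks the rest by weighted rating. The `MovieStats` type, the `top_n` helper, and the catalog entries are hypothetical. `m=100` is an arbitrary threshold chosen for toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovieStats:
    name: str
    mean_rate: float  # R
    votes: int        # v


def weighted_rate(movie: MovieStats, m: int, c: float) -> float:
    """IMDb-style weighted rating for one movie."""
    v, r = movie.votes, movie.mean_rate
    return (v / (v + m)) * r + (m / (v + m)) * c


def top_n(movies: list[MovieStats], m: int, c: float, n: int = 10) -> list[str]:
    """Hypothetical widget helper: drop movies with fewer than m votes,
    then return the names of the n movies with the highest weighted rating."""
    eligible = [movie for movie in movies if movie.votes >= m]
    ranked = sorted(eligible, key=lambda movie: weighted_rate(movie, m, c), reverse=True)
    return [movie.name for movie in ranked[:n]]


# Hypothetical catalog: (name, mean rating R, vote count v).
catalog = [
    MovieStats("Movie A", 9.5, 40),
    MovieStats("Movie B", 8.1, 1200),
    MovieStats("Movie C", 7.4, 5000),
    MovieStats("Movie D", 8.6, 300),
]
print(top_n(catalog, m=100, c=7.0, n=3))
```

Movie D outranks Movie B despite having fewer votes, because its higher mean survives the shrinkage toward C. Movie A is excluded outright for falling below the threshold.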
Assume for a moment that our movies database contains the following table, fetched with this SQL query:
```sql
SELECT name, rate, COALESCE(popularity, 0) AS popularity
FROM movies_app_movie
ORDER BY popularity DESC;
```
Name | Rate | Popularity |
---|---|---|
Minions | 6.4 | 547.488298 |
Wonder Woman | 7.2 | 294.337037 |
Beauty and the Beast | 6.8 | 287.253654 |
Baby Driver | 7.2 | 228.032744 |
Big Hero 6 | 7.8 | 213.849907 |
Deadpool | 7.4 | 187.860492 |
Guardians of the Galaxy Vol. 2 | 7.6 | 185.330992 |
Avatar | 7.2 | 185.070892 |
John Wick | 7.0 | 183.870374 |
Gone Girl | 7.9 | 154.801009 |
The Hunger Games: Mockingjay - Part 1 | 6.6 | 147.098006 |
War for the Planet of the Apes | 6.7 | 146.161786 |
Captain America: Civil War | 7.1 | 145.882135 |
Pulp Fiction | 8.3 | 140.950236 |
Pirates of the Caribbean: Dead Men Tell No Tales | 6.6 | 133.82782 |
The Dark Knight | 8.3 | 123.167259 |
Blade Runner | 7.9 | 96.272374 |
The Avengers | 7.4 | 89.887648 |
Captain Underpants: The First Epic Movie | 6.5 | 88.561239 |
The Circle | 5.4 | 88.439243 |
As the table shows, the movies are sorted by popularity in descending order. Next, we will look at how to compute this popularity score for every movie in our database.
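The sketch below runs the same query against an in-memory SQLite database using Python's built-in `sqlite3` module. The schema holds only the three columns the query reads. The "Hypothetical Movie" row, which has no popularity value, is invented to show the effect of `COALESCE`: its NULL popularity is reported as 0 and ends up at the bottom. In databases such as PostgreSQL, where NULLs sort first under `DESC` by default, `COALESCE` also keeps unscored movies from jumping to the top.

```python
import sqlite3

# In-memory stand-in for movies_app_movie (assumed schema: only the
# three columns the query reads).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies_app_movie (name TEXT, rate REAL, popularity REAL)")
conn.executemany(
    "INSERT INTO movies_app_movie VALUES (?, ?, ?)",
    [
        ("Wonder Woman", 7.2, 294.337037),
        ("Minions", 6.4, 547.488298),
        ("Hypothetical Movie", 6.0, None),  # popularity not computed yet
        ("Gone Girl", 7.9, 154.801009),
    ],
)

rows = conn.execute(
    """
    SELECT name, rate, COALESCE(popularity, 0) AS popularity
    FROM movies_app_movie
    ORDER BY popularity DESC
    """
).fetchall()

for name, rate, popularity in rows:
    print(f"{name:<20} {rate:>4} {popularity:>12}")
conn.close()
```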
Computing the popularity property is a bit tricky, since it may depend on several of the signals discussed below. These would typically be stored in separate tables that we JOIN on later.
Let's start with a separate table that stores individual movie ratings. We need to aggregate it to get the rating count and average rating of every movie.
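```sql
-- Assumes movies_app_rating.movie_id references movies_app_movie.id.
SELECT r.movie_id,
       m.name,
       COUNT(r.movie_id) AS rating_count,
       AVG(r.rate) AS rate_avg
FROM movies_app_rating AS r
JOIN movies_app_movie AS m ON m.id = r.movie_id
GROUP BY r.movie_id, m.name
ORDER BY rate_avg DESC
LIMIT 20;
```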
Movie ID | Name | Count | Rate Avg |
---|---|---|---|
2284 | Mr. Magorium’s Wonder Emporium | 1 | 5 |
4459 | Night Without Sleep | 1 | 5 |
5473 | De Dominee | 1 | 5 |
2636 | The Specialist | 1 | 5 |
36931 | On the Edge | 1 | 5 |
64278 | Interceptor Force 2 | 1 | 5 |
183 | The Wizard | 1 | 5 |
845 | Strangers on a Train | 1 | 5 |
26791 | Brigham City | 1 | 5 |
43267 | 29th Street | 1 | 5 |
31413 | Innocence | 1 | 5 |
4201 | The Fifth Musketeer | 1 | 5 |
2984 | A Countess from Hong Kong | 1 | 5 |
4140 | Blindsight | 1 | 5 |
6107 | Murder in Three Acts | 1 | 5 |
1563 | Sunless | 1 | 5 |
65216 | Bloody Cartoons | 1 | 5 |
1933 | The Others | 1 | 5 |
8675 | Orgazmo | 1 | 5 |
2897 | Around the World in Eighty Days | 1 | 5 |
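Notice that every movie in this result has exactly one rating of 5, so ranking by the raw average pushes barely-rated movies to the top. The sketch below uses hypothetical ratings on an assumed 1-5 scale. It performs the same GROUP BY aggregation in Python, then re-ranks the movies with the weighted rating formula from earlier. For this toy data, m = 3 and C (taken as the mean of all individual ratings) are assumptions, not IMDb's constants.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw rows from a ratings table: (movie_id, rate), assumed 1-5 scale.
ratings = [
    (1, 5.0),                                # a single perfect rating
    (2, 4.0), (2, 5.0), (2, 4.5), (2, 4.5),  # several consistently good ratings
    (3, 3.0), (3, 2.5), (3, 3.5),
]


def aggregate(rows):
    """Python equivalent of GROUP BY movie_id with COUNT(...) and AVG(rate)."""
    by_movie = defaultdict(list)
    for movie_id, rate in rows:
        by_movie[movie_id].append(rate)
    return {movie_id: (len(rates), mean(rates)) for movie_id, rates in by_movie.items()}


def weighted_rate(r, v, m, c):
    """Same blend as compute_weighted_average_rate above."""
    return (v / (v + m)) * r + (m / (v + m)) * c


stats = aggregate(ratings)             # {movie_id: (count, rate_avg)}
c = mean(rate for _, rate in ratings)  # C: assumed mean of all individual ratings
m = 3                                  # m: hypothetical prior sized for toy data

naive = sorted(stats, key=lambda mid: stats[mid][1], reverse=True)
weighted = sorted(
    stats,
    key=lambda mid: weighted_rate(stats[mid][1], stats[mid][0], m, c),
    reverse=True,
)
print("by raw average:  ", naive)
print("by weighted rate:", weighted)
```

The single-rating movie 1 wins on the raw average but drops behind movie 2, whose four good ratings give it more weight.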
There are many ways to determine a movie's popularity, and there is no standard formula for such a score. For example, we could take the following factors into account:
- Number of votes for the day.
- Number of views for the day.
- Number of users who marked it as a “favorite” for the day.
- Number of users who added it to their “watchlist” for the day.
- Number of comments.
- Number of ratings (negative vs. positive).
- Total number of votes.
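The sketch below shows one possible way to combine these signals into a single score. The `DailyActivity` fields, the weights in `WEIGHTS`, and the use of `math.log1p` to damp large counts are all hypothetical choices, not a standard formula. Log damping keeps a single viral day of views from completely dwarfing the other signals. The negative vs. positive ratings factor is expressed as a net balance between -1 and 1.

```python
import math
from dataclasses import dataclass


@dataclass
class DailyActivity:
    """Hypothetical per-movie counters matching the factors listed above."""
    votes_today: int
    views_today: int
    favorites_today: int
    watchlist_adds_today: int
    comments: int
    positive_ratings: int
    negative_ratings: int
    total_votes: int


# Hypothetical weights; in practice they would need tuning.
WEIGHTS = {
    "votes_today": 1.0,
    "views_today": 0.2,
    "favorites_today": 2.0,
    "watchlist_adds_today": 1.5,
    "comments": 0.5,
    "rating_balance": 1.0,
    "total_votes": 0.1,
}


def popularity_score(a: DailyActivity) -> float:
    """Weighted sum of log-damped signals (one possible scheme, not a standard)."""
    signals = {
        "votes_today": math.log1p(a.votes_today),
        "views_today": math.log1p(a.views_today),
        "favorites_today": math.log1p(a.favorites_today),
        "watchlist_adds_today": math.log1p(a.watchlist_adds_today),
        "comments": math.log1p(a.comments),
        # Net sentiment in [-1, 1]; 0 when there are no ratings at all.
        "rating_balance": (a.positive_ratings - a.negative_ratings)
        / max(1, a.positive_ratings + a.negative_ratings),
        "total_votes": math.log1p(a.total_votes),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())


quiet = DailyActivity(0, 0, 0, 0, 0, 0, 0, 0)
busy = DailyActivity(
    votes_today=50, views_today=2000, favorites_today=10, watchlist_adds_today=25,
    comments=8, positive_ratings=30, negative_ratings=5, total_votes=12000,
)
print(round(popularity_score(quiet), 3), round(popularity_score(busy), 3))
```

A score like this could then be stored in the `popularity` column of `movies_app_movie` and recomputed periodically.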