Recommendation Systems Walkthrough - Popularity Recommendations

This post discusses a different approach to recommending movies: ranking them by popularity.

Unlocking Recommendations


The Foundational Power of Popularity-Based Systems

In this post, we will use the average rating available in the movies database to build a more generalized recommendation widget based on each movie’s popularity.

This document outlines the concept and implementation of popularity-based recommendation systems, a foundational approach in movie recommendation engines. These systems prioritize items based on their broad appeal, making them essential for generalized recommendations and addressing the cold start problem.

Core Concept: Why Popularity Matters


Popularity-based recommendations suggest items based on their broad appeal to the general audience, rather than personalized user preferences. The underlying principle is that items with higher popularity are more likely to be enjoyed by a larger user base.

This method is crucial for creating generalized recommendation widgets (e.g., “Top 10 Movies This Week”) and for addressing the cold start problem (recommending to new users or new items with limited interaction data).

Mechanism & Challenges: Beyond Simple Averages


A simple approach involves sorting items by a pre-calculated aggregate metric, such as an average rating. However, a simple arithmetic average can be misleading. For instance, a movie with one 5-star rating is not as statistically reliable as a movie with 10,000 ratings averaging 4.0 stars. This highlights the need for more sophisticated weighting.

The approach here is simple: we sort the movies by a pre-calculated average rating collected from different users.
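
To make the pitfall concrete, here is a minimal sketch of the naive approach. The movie titles and numbers are made up for illustration, and `rank_by_average` is a hypothetical helper, not something taken from the movies app.

```python
# Hypothetical sample data: (title, average rating, number of votes).
movies = [
    ("Obscure Film", 5.0, 1),
    ("Crowd Favorite", 4.0, 10_000),
    ("Solid Classic", 4.5, 2_500),
]


def rank_by_average(items):
    """Sort movies by raw average rating, highest first, ignoring vote counts."""
    return sorted(items, key=lambda item: item[1], reverse=True)


ranked = rank_by_average(movies)
for title, average, votes in ranked:
    print(f"{title}: {average:.1f} ({votes} votes)")
```

The single-vote movie lands on top of the list, which is exactly the behavior a weighted formula is meant to correct.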

IMDb’s Weighted Rating Formula (Bayesian Estimate)


IMDb uses a robust statistical approach for its “Top 250” lists, employing a Bayesian estimate for its weighted rating to stabilize ratings for items with fewer votes and prevent outliers from disproportionately influencing perceived popularity.

For the sake of demonstration, consider IMDb’s list of top-rated movies, which is built from ratings by many different users. How does IMDb actually calculate these ratings?

The formula for calculating the top rated 250 titles gives a true Bayesian estimate:

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C where:

  • R = average for the movie (mean) = (Rating)
  • v = number of votes for the movie = (votes)
  • m = minimum votes required to be listed in the Top 250 (currently 25000)
  • C = the mean vote across the whole report (currently 7.0)

Intuition: This formula blends a movie’s individual average rating (R) with the global average (C). When v (votes) is small relative to m (minimum votes), WR is pulled closer to C. At v = m the two averages are weighted equally, and as v grows far beyond m, WR approaches R.

Python Implementation Example


def compute_weighted_average_rate(r, v, m, c):
    """
    Calculates the IMDb-style weighted rating for an item.

    Args:
        r (float): The average rating for the item.
        v (int): The number of votes for the item.
        m (int): The minimum votes required for consideration (regularization constant).
        c (float): The mean vote across the entire dataset (prior mean).

    Returns:
        float: The calculated weighted rating.
    """
    if v == 0: # Handle cases with no votes to avoid division by zero or NaN
        return c # Default to global average if no votes
    wr = (v / (v + m) * r) + (m / (v + m) * c)
    return wr

This criterion forms a solid basis for a generalized recommendation widget, useful when individual user profiles are unavailable or insufficient. Scores derived from this formula are typically recomputed periodically to reflect changing preferences and new data.
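
As a quick numeric check of the intuition behind the formula, the sketch below evaluates it for one movie at three different vote counts. The values of m and C are the ones quoted earlier; the average rating of 9.0 and the helper name `weighted_rating` are assumptions made for the example, and the formula is restated compactly so the snippet runs on its own.

```python
def weighted_rating(r, v, m, c):
    """Compact restatement of WR = (v / (v + m)) * R + (m / (v + m)) * C."""
    return (v / (v + m)) * r + (m / (v + m)) * c


M = 25_000  # minimum votes, as quoted in the formula's description
C = 7.0     # mean vote across all movies, as quoted in the formula's description
R = 9.0     # hypothetical average rating for a single movie

for votes in (100, 25_000, 2_500_000):
    print(f"v={votes:>9,}: WR={weighted_rating(R, votes, M, C):.3f}")
```

With 100 votes the score stays near the global mean (about 7.008), with exactly m votes it sits halfway between C and R (8.000), and with 2.5 million votes it is close to the movie's own average (about 8.980).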

Data Representation & SQL Queries


Based on the previous criterion, we can build a suggestion widget that recommends movies to all users, using either the IMDb popularity info or the users’ average rating, which will be recomputed from time to time.

Of course, this is not the best way to make recommendations on its own, but I think it can be used alongside other recommender algorithms.

A movies_app_movie table might store movie names, rates, and a pre-calculated popularity score. Assume for a moment that we have the following table in our movies database, fetched using this SQL query:

SELECT name, rate, COALESCE(popularity, 0) AS popularity 
  FROM movies_app_movie 
 ORDER BY popularity DESC;
Name | Rate | Popularity
Minions | 6.4 | 547.4882980000001
Wonder Woman | 7.2 | 294.337037
Beauty and the Beast | 6.8 | 287.253654
Baby Driver | 7.2 | 228.032744
Big Hero 6 | 7.8 | 213.84990699999997
Deadpool | 7.4 | 187.860492
Guardians of the Galaxy Vol. 2 | 7.6 | 185.33099199999998
Avatar | 7.2 | 185.070892
John Wick | 7.0 | 183.870374
Gone Girl | 7.9 | 154.80100900000002
The Hunger Games: Mockingjay - Part 1 | 6.6 | 147.098006
War for the Planet of the Apes | 6.7 | 146.161786
Captain America: Civil War | 7.1 | 145.882135
Pulp Fiction | 8.3 | 140.95023600000002
Pirates of the Caribbean: Dead Men Tell No Tales | 6.6 | 133.82782
The Dark Knight | 8.3 | 123.167259
Blade Runner | 7.9 | 96.272374
The Avengers | 7.4 | 89.887648
Captain Underpants: The First Epic Movie | 6.5 | 88.561239
The Circle | 5.4 | 88.439243

COALESCE(popularity, 0) substitutes a default value of 0 when popularity is NULL, and ORDER BY popularity DESC sorts from highest to lowest popularity. Individual movie ratings are stored in a separate movies_app_rating table, which is aggregated per movie further below.
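
The effect of COALESCE can be tried out locally. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table and column names follow the query above, three rows are copied from the result table (with popularity rounded), and the row with a NULL popularity is made up for illustration. The real application may well run on a different database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies_app_movie (name TEXT, rate REAL, popularity REAL)")
conn.executemany(
    "INSERT INTO movies_app_movie (name, rate, popularity) VALUES (?, ?, ?)",
    [
        ("Wonder Woman", 7.2, 294.34),
        ("Minions", 6.4, 547.49),
        ("Unscored Movie", 6.0, None),  # hypothetical row with NULL popularity
        ("Baby Driver", 7.2, 228.03),
    ],
)

# Same query shape as above: NULL popularity is reported as 0.
rows = conn.execute(
    """
    SELECT name, rate, COALESCE(popularity, 0) AS popularity
      FROM movies_app_movie
     ORDER BY popularity DESC
    """
).fetchall()
conn.close()

for name, rate, popularity in rows:
    print(name, rate, popularity)
```

The movie without a popularity value is listed last with a popularity of 0 instead of NULL.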

Based on the previous table, we can see that movies are sorted in descending order of popularity. We will go through how to compute this value for every film in our database.

Computing the popularity property is a bit tricky, since it may need several pieces of information that are stored in separate tables and JOINed later.

Let’s take a look at a separate table holding the movie ratings. Some aggregation is required to calculate the rating count and the average rating of every movie:

  SELECT movie_id, 
         COUNT(movie_id), 
         AVG(rate) AS rate_avg 
    FROM movies_app_rating 
GROUP BY movie_id 
ORDER BY rate_avg DESC
   LIMIT 20;
Movie ID | Name | Count | Rate Avg
2284 | Mr. Magorium’s Wonder Emporium | 1 | 5
4459 | Night Without Sleep | 1 | 5
5473 | De Dominee | 1 | 5
2636 | The Specialist | 1 | 5
36931 | On the Edge | 1 | 5
64278 | Interceptor Force 2 | 1 | 5
183 | The Wizard | 1 | 5
845 | Strangers on a Train | 1 | 5
26791 | Brigham City | 1 | 5
43267 | 29th Street | 1 | 5
31413 | Innocence | 1 | 5
4201 | The Fifth Musketeer | 1 | 5
2984 | A Countess from Hong Kong | 1 | 5
4140 | Blindsight | 1 | 5
6107 | Murder in Three Acts | 1 | 5
1563 | Sunless | 1 | 5
65216 | Bloody Cartoons | 1 | 5
1933 | The Others | 1 | 5
8675 | Orgazmo | 1 | 5
2897 | Around the World in Eighty Days | 1 | 5

COUNT(movie_id) calculates v (the vote count) and AVG(rate) calculates R (the average rating). This aggregation highlights the issue of movies with high average ratings but very few votes (e.g., 5.0 with 1 vote), reinforcing the need for weighted formulas.
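
One way to combine this aggregation with the weighted rating formula is sketched below, again using an in-memory sqlite3 database. The movies_app_rating table name and its movie_id and rate columns come from the query above; the rating rows, the threshold `MIN_VOTES`, and the helper `weighted_rating` are assumptions chosen to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies_app_rating (movie_id INTEGER, rate REAL)")

# Hypothetical ratings: movie 1 has a single 5-star vote, movie 2 has
# fifty good votes, and movie 3 has twenty poor votes.
ratings = [(1, 5.0)] + [(2, 4.0)] * 40 + [(2, 5.0)] * 10 + [(3, 2.0)] * 20
conn.executemany("INSERT INTO movies_app_rating VALUES (?, ?)", ratings)

# Per-movie vote count (v) and average rating (R), as in the query above.
aggregates = conn.execute(
    """
    SELECT movie_id, COUNT(movie_id) AS votes, AVG(rate) AS rate_avg
      FROM movies_app_rating
     GROUP BY movie_id
    """
).fetchall()

# Global mean rating (C) across every vote in the table.
(global_mean,) = conn.execute("SELECT AVG(rate) FROM movies_app_rating").fetchone()
conn.close()

MIN_VOTES = 10  # assumed value of m for this tiny dataset


def weighted_rating(r, v, m, c):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C."""
    return (v / (v + m)) * r + (m / (v + m)) * c


scored = []
for movie_id, votes, rate_avg in aggregates:
    wr = weighted_rating(rate_avg, votes, MIN_VOTES, global_mean)
    scored.append((movie_id, votes, rate_avg, wr))
scored.sort(key=lambda row: row[3], reverse=True)

for movie_id, votes, rate_avg, wr in scored:
    print(f"movie {movie_id}: votes={votes}, avg={rate_avg:.2f}, weighted={wr:.2f}")
```

In this toy dataset the single-vote movie has the highest raw average, but after weighting it drops below the movie with fifty votes because its score is pulled toward the global mean.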

Factors for Computing Popularity Scores


A robust popularity score often uses a weighted sum of dynamic and static factors, providing a comprehensive view of an item’s current and long-term appeal. The specific blend and weights are often proprietary and fine-tuned for the platform.

There are many ways to determine the popularity of a movie, and there is no standard way of computing such a score. For example, we can take the following factors into consideration:

  • Number of votes for the day.
  • Number of views for the day.
  • Number of users who marked it as a “favorite” for the day.
  • Number of users who added it to their “watchlist” for the day.
  • Number of comments.
  • Number of ratings (negative vs. positive).
  • Number of total votes.
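
As a rough illustration of how such factors could be combined, the sketch below computes a plain weighted sum. The field names, the weights, and the `popularity_score` function are all hypothetical; a real platform would tune them against its own data.

```python
from dataclasses import dataclass


@dataclass
class DailyStats:
    """Hypothetical per-movie counters; field names are illustrative only."""
    votes_today: int
    views_today: int
    favorites_today: int
    watchlist_adds_today: int
    comments: int
    positive_ratings: int
    negative_ratings: int
    total_votes: int


# Assumed weights, picked arbitrarily for demonstration.
WEIGHTS = {
    "votes_today": 2.0,
    "views_today": 0.1,
    "favorites_today": 3.0,
    "watchlist_adds_today": 1.5,
    "comments": 0.5,
    "net_ratings": 1.0,  # positive minus negative ratings
    "total_votes": 0.01,
}


def popularity_score(stats: DailyStats) -> float:
    """Combine the counters into one score using a weighted sum."""
    features = {
        "votes_today": stats.votes_today,
        "views_today": stats.views_today,
        "favorites_today": stats.favorites_today,
        "watchlist_adds_today": stats.watchlist_adds_today,
        "comments": stats.comments,
        "net_ratings": stats.positive_ratings - stats.negative_ratings,
        "total_votes": stats.total_votes,
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


example = DailyStats(
    votes_today=10,
    views_today=200,
    favorites_today=4,
    watchlist_adds_today=6,
    comments=8,
    positive_ratings=30,
    negative_ratings=5,
    total_votes=1000,
)
score = popularity_score(example)
print(score)
```

Keeping the weights in one dictionary makes it easy to adjust how much each signal contributes without touching the scoring logic.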

The Strategic Role of Popularity in Recommender Systems Architecture


  1. Baseline and Fallback: They act as a robust baseline model and an excellent fallback mechanism for the cold-start problem for new users.
  2. Addressing Cold-Start for Items (Partially): The IMDb weighted rating formula partially mitigates the cold-start problem for new items with few ratings by pulling them towards the global average.
  3. Limitations and Trade-offs: Popularity-based systems inherently lack personalization, which can lead to a “filter bubble” and conflict with “beyond accuracy” metrics like diversity and novelty.
  4. Component in Hybrid Systems: Popularity recommendations can be used alongside other algorithms in hybrid recommendation systems, serving as a candidate generator or a feature.
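
The baseline-and-fallback role described in the list above can be sketched in a few lines. Everything here is hypothetical: `recommend`, the `personalized` callable, and the `min_history` threshold stand in for whatever a real system would use, and the popular titles are simply the first rows of the popularity table shown earlier.

```python
from typing import Callable, Optional, Sequence


def recommend(
    user_history: Sequence[str],
    popular_titles: Sequence[str],
    personalized: Optional[Callable[[Sequence[str]], list]] = None,
    k: int = 3,
    min_history: int = 5,
) -> list:
    """Return up to k titles, falling back to popularity for cold-start users."""
    seen = set(user_history)
    picks: list = []
    # Use the personalized model only when there is enough interaction data.
    if personalized is not None and len(user_history) >= min_history:
        picks = [title for title in personalized(user_history) if title not in seen]
    # Fill any remaining slots from the popularity ranking.
    for title in popular_titles:
        if len(picks) >= k:
            break
        if title not in seen and title not in picks:
            picks.append(title)
    return picks[:k]


popular = ["Minions", "Wonder Woman", "Beauty and the Beast", "Baby Driver"]
new_user = recommend([], popular)
print(new_user)
```

A brand-new user with no history simply receives the top of the popularity list, while a user with enough history gets personalized picks first and popular titles only as filler.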

Conclusion


The “Popularity Recommendations” method, though simple, embodies the data science principle of extracting meaningful signals from noisy data, especially with sparsity. The Bayesian estimation in the IMDb formula transforms raw averages into statistically sound popularity scores. While modern systems focus on personalization through techniques like collaborative filtering and deep learning, popularity-based approaches remain vital for providing strong baselines, effective cold-start solutions for new users, and crucial components in sophisticated hybrid architectures. Its enduring appeal stems from its ease of implementation and its robust, generalized insight into collective preference.

Ahmed Nabil

Agnostic Software Engineer, Independent thinker with a hunger for challenge and craft mastery