Movie and TV Show Recommendation Engine in R
Last Updated : 23 Jul, 2025
A movie recommendation system, powered by machine learning, can create a personalized viewing experience that keeps viewers satisfied and engaged. Building a good one matters because it directly affects user retention and platform popularity. It's a mix of technology and creativity, with techniques ranging from content-based filtering to collaborative filtering. This article walks through building one in the R programming language.
Understanding Recommendation Systems
So, let's talk about recommendation systems. These nifty systems use fancy algorithms and machine learning to predict what you might like and suggest stuff you'll be interested in. How do they do it? Well, they go through a bunch of steps like collecting, storing, analyzing, and filtering data to give you personalized recommendations.
Now, there are two main types of recommendation systems.
- **Collaborative Filtering:** Collaborative filtering uses data from user-item interactions to create suggestions. This approach splits into two main types: user-based and item-based. User-based filtering spots users with matching tastes and recommends items these similar users enjoyed. It uses measures like Pearson correlation or cosine similarity to figure out how alike users are. Item-based filtering, however, looks at how similar items are to each other. It suggests items that are close to what you've already rated highly. This method often scales better than its user-based counterpart.
- **Content-Based Filtering:** Content-based filtering proposes items that match what a user liked before. This approach banks on item traits, such as movie genres or actors. The system looks at these features and suggests new items that fit the user's tastes. For content-based filtering to work well, you need detailed attributes for each item. You also need a full picture of the user: their clicks, ratings, likes, and so on.
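To make the two similarity measures mentioned above concrete, here is a minimal sketch in base R. The users, movies, and ratings are made up for illustration, and `cosine` is a hypothetical helper written for this example, not a function from any package.

```r
# Toy ratings: rows are users, columns are movies, NA means "not rated" (made-up data)
ratings <- rbind(
  alice = c(5, 4, 1, NA),
  bob   = c(4, 5, 2, 1),
  carol = c(1, 2, 5, 4)
)
colnames(ratings) <- c("Movie A", "Movie B", "Movie C", "Movie D")

# Pearson correlation between users over the movies both have rated.
# cor() compares columns, so the matrix is transposed to put users in columns.
pearson <- cor(t(ratings), use = "pairwise.complete.obs")

# Cosine similarity between two rating vectors, using co-rated movies only
cosine <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

cos_alice_bob   <- cosine(ratings["alice", ], ratings["bob", ])
cos_alice_carol <- cosine(ratings["alice", ], ratings["carol", ])

round(pearson, 2)
c(alice_bob = cos_alice_bob, alice_carol = cos_alice_carol)
```

In this toy data, alice and bob rate the same movies highly, so both measures rank bob as more similar to alice than carol is. A user-based recommender would therefore lean on bob's ratings when suggesting movies to alice.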
Alright, now let's dive into how movie and TV show recommendation engines work. These engines are all about analyzing your behavior, preferences, and even your demographic info to suggest content that's right up your alley. They use collaborative filtering to see what you and other users have in common, and content-based filtering to focus on the specific attributes of the movies and shows.
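The content-based side can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the four movies, their genre flags, and the single liked movie are invented, and scoring by cosine similarity against an averaged genre profile is one common choice rather than the only one.

```r
# Hypothetical genre indicators: 1 means the movie belongs to the genre
genre_mat <- rbind(
  "Movie A" = c(Action = 1, Comedy = 0, Drama = 0),
  "Movie B" = c(Action = 1, Comedy = 0, Drama = 1),
  "Movie C" = c(Action = 0, Comedy = 1, Drama = 0),
  "Movie D" = c(Action = 0, Comedy = 1, Drama = 1)
)

# Movies the user liked (assumed for this example)
liked <- c("Movie A")

# User profile: the average genre vector of the liked movies
profile <- colMeans(genre_mat[liked, , drop = FALSE])

# Score each movie by cosine similarity between its genres and the profile
scores <- as.vector(genre_mat %*% profile) /
  (sqrt(rowSums(genre_mat^2)) * sqrt(sum(profile^2)))
names(scores) <- rownames(genre_mat)

# Recommend unseen movies, best match first
recommend <- sort(scores[setdiff(names(scores), liked)], decreasing = TRUE)
recommend
```

Because the user liked an action movie, the other movie tagged Action comes out on top, while the two comedies score zero. No ratings from other users are needed, which is the main contrast with collaborative filtering.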
Setting Up Your Environment in R
To begin setting up the environment in R for a movie recommendation system, one needs to install several key libraries.
- **recommenderlab:** R package for building recommendation systems.
- **ggplot2:** R package for data visualization with a grammar of graphics approach.
- **data.table:** R package for efficient data manipulation, especially for large datasets.
- **reshape2:** R package for data reshaping and aggregation, facilitating analysis and visualization.
- **dplyr:** R package for data manipulation using the %>% operator.
Step 1: Data Collection
To build our recommendation system, we use the MovieLens dataset. The movies.csv and ratings.csv files used in this project are available here:
**Dataset Link:** Movies Data, Rating Data
This data consists of 105,339 ratings applied to 10,329 movies.
Step 2: Install and Load the Required Libraries
First, we install (if needed) and load the required libraries, then read the two CSV files.
```r
# Install the packages once if they are not already available
# install.packages(c("recommenderlab", "ggplot2", "data.table", "reshape2", "dplyr"))

# Load necessary libraries
library(recommenderlab) # For building recommendation systems
library(ggplot2)        # For data visualization
library(data.table)     # For efficient data manipulation
library(reshape2)       # For reshaping data
library(dplyr)          # For data manipulation using the %>% operator

# Load movie and rating data
movie_data <- read.csv("movies.csv", stringsAsFactors = FALSE)
rating_data <- read.csv("ratings.csv")
head(movie_data)
head(rating_data)
```
**Output:**
userId movieId rating timestamp
1 1 16 4.0 1217897793
2 1 24 1.5 1217895807
3 1 32 4.0 1217896246
4 1 47 4.0 1217896556
5 1 50 4.0 1217896523
6 1 110 4.0 1217896150
Step 3: Data Preprocessing
Next, we preprocess the data.
```r
# Extract genres from movie_data into a data frame
movie_genre <- as.data.frame(movie_data$genres, stringsAsFactors = FALSE)

# Split genres into separate columns using the '|' delimiter
movie_genre2 <- as.data.frame(tstrsplit(movie_genre[, 1], '[|]', type.convert = TRUE),
                              stringsAsFactors = FALSE)

# Name the split columns 1 to 10 (assumes no movie lists more than 10 genres)
colnames(movie_genre2) <- c(1:10)

# Define list of genres
list_genre <- c("Action", "Adventure", "Animation", "Children", "Comedy", "Crime",
                "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
                "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western")

# Initialize genre matrix with zeros
genre_mat <- matrix(0, nrow(movie_data), length(list_genre))

# Assign column names to the genre matrix
colnames(genre_mat) <- list_genre

# Iterate through each movie and its genres
for (i in seq_len(nrow(movie_genre2))) {
  for (j in seq_len(ncol(movie_genre2))) {
    # Find the column index for the genre (empty when the cell is NA or unlisted)
    genre_col <- which(colnames(genre_mat) == movie_genre2[i, j])
    # Mark the corresponding genre as 1 in the genre matrix
    genre_mat[i, genre_col] <- 1
  }
}

# Convert genre matrix to data frame and ensure integer type
genre_mat <- as.data.frame(genre_mat, stringsAsFactors = FALSE)
genre_mat <- sapply(genre_mat, as.integer)

# Print structure of genre matrix
str(genre_mat)

# Combine movie IDs, titles, and genre indicators into SearchMatrix
SearchMatrix <- cbind(movie_data[, 1:2], genre_mat)

# Display the first few rows of SearchMatrix
head(SearchMatrix)
```
**Output:**
movieId title Action Adventure Animation Children Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
1 1 Toy Story (1995) 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0
2 2 Jumanji (1995) 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 3 Grumpier Old Men (1995) 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0
4 4 Waiting to Exhale (1995) 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0
5 5 Father of the Bride Part II (1995) 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
6 6 Heat (1995) 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
This step involves preprocessing the genre information from the movie dataset. We split the genre strings into individual genres using tstrsplit. We then create a binary matrix genre_mat where each row represents a movie and each column represents a genre, with 1 indicating the presence of a genre for a movie. This matrix is then combined with the movie data to form the SearchMatrix.
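As a compact illustration of the same idea, the sketch below builds a binary genre matrix from pipe-delimited strings in base R without an explicit loop. The three genre strings and the shortened genre list are made up; only the "|" format mirrors movies.csv.

```r
# Toy genre strings in the same pipe-delimited format as movies.csv (made-up rows)
genres <- c("Adventure|Animation|Comedy", "Action|Crime|Thriller", "Comedy")

# Shortened genre list for the example
list_genre <- c("Action", "Adventure", "Animation", "Comedy", "Crime", "Thriller")

# Split each string on the literal "|" character
split_genres <- strsplit(genres, "|", fixed = TRUE)

# For each movie, flag which of the known genres appear in its list
genre_mat <- t(sapply(split_genres, function(g) as.integer(list_genre %in% g)))
colnames(genre_mat) <- list_genre

genre_mat
```

Each row is a movie and each column a genre, with 1 marking membership, which is the same layout as the genre columns of SearchMatrix. Genres that are not in `list_genre` are simply ignored, just as in the loop-based version.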
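Step 4: Visualizing the Data
Now we visualize the data, starting with the distribution of ratings.
```r
# Create a histogram of rating distribution using ggplot2
ggplot(rating_data, aes(x = rating)) +
  # Histogram layer (without a geom nothing is drawn); the 0.5 bin width
  # is an assumption chosen to match half-star rating steps
  geom_histogram(binwidth = 0.5) +
  ggtitle("Rating Distribution") + # Add plot title
  xlab("Rating") +                 # Label for x-axis
  ylab("Count")                    # Label for y-axis
```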
**Output:**

[Figure: histogram titled "Rating Distribution"]
This plot shows the distribution of ratings in the dataset. It helps us understand the overall rating behavior of users.
Top Rated Movies
Now we visualize the top-rated movies.
```r
# Calculate average rating and count of ratings for each movieId
top_rated_movies <- rating_data %>%
  group_by(movieId) %>%
  summarize(avg_rating = mean(rating), count = n()) %>%
  # Filter movies with more than 50 ratings
  filter(count > 50) %>%
  # Arrange movies by average rating in descending order
  arrange(desc(avg_rating)) %>%
  # Select top 10 movies by average rating (ties can return more than 10 rows)
  top_n(10, wt = avg_rating)

# Merge top_rated_movies with movie_data to get movie titles
top_rated_movies <- merge(top_rated_movies, movie_data, by = "movieId")

# Create a bar plot of top rated movies
ggplot(top_rated_movies, aes(x = reorder(title, avg_rating), y = avg_rating)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "black") +
  coord_flip() +                   # Flip coordinates to make horizontal bar plot
  ggtitle("Top 10 Rated Movies") + # Add plot title
  xlab("Movie Title") +            # Label for x-axis
  ylab("Average Rating")           # Label for y-axis
```
**Output:**

[Figure: horizontal bar plot titled "Top 10 Rated Movies"]
This plot shows the top 10 movies based on average ratings, considering only movies with more than 50 ratings. It provides insights into the highest-rated movies in the dataset.
Step 5: Create Rating Matrix
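Now we create the rating matrix that the recommendation engine will use.
```r
# Reshape ratings into a wide user x movie table (NA where a user has not rated a movie)
ratingMatrix <- dcast(rating_data, userId ~ movieId, value.var = "rating", na.rm = FALSE)

# Drop the userId column and convert to a plain matrix
ratingMatrix <- as.matrix(ratingMatrix[, -1])

# Convert to recommenderlab's realRatingMatrix class
ratingMatrix <- as(ratingMatrix, "realRatingMatrix")
ratingMatrix
```
**Output:**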
668 x 10325 rating matrix of class ‘realRatingMatrix’ with 105339 ratings.
We transform the ratings data into a matrix format suitable for the recommendation engine. Using dcast, we create a user-item matrix where rows represent users and columns represent movies, with the values being the ratings. This matrix is then converted into a realRatingMatrix object, which is the required format for the recommenderlab package.
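To show what the long-to-wide reshaping does, here is a small base R sketch with made-up ratings. It is an illustration of the idea only and does not use dcast or recommenderlab.

```r
# Toy ratings in "long" format, one row per (user, movie) pair (made-up data)
ratings_long <- data.frame(
  userId  = c(1, 1, 2, 3, 3),
  movieId = c(10, 20, 10, 20, 30),
  rating  = c(4, 3.5, 5, 2, 4.5)
)

users  <- sort(unique(ratings_long$userId))
movies <- sort(unique(ratings_long$movieId))

# Start with an all-NA user x movie matrix
rating_mat <- matrix(NA_real_, nrow = length(users), ncol = length(movies),
                     dimnames = list(as.character(users), as.character(movies)))

# Fill in the known ratings by (row, column) position
idx <- cbind(match(ratings_long$userId, users),
             match(ratings_long$movieId, movies))
rating_mat[idx] <- ratings_long$rating

rating_mat

# Share of cells that actually hold a rating
density <- sum(!is.na(rating_mat)) / length(rating_mat)
density
```

Most cells stay NA because each user rates only a few movies. The same holds for the matrix in this article: 105339 ratings spread over 668 x 10325 cells means only about 1.5% of the entries are filled, which is why a sparse class such as realRatingMatrix is used.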
Step 6: Build Recommendation Engine
Now we build our recommendation engine in R.
```r
# Build Item-Based Collaborative Filtering (IBCF) model using ratingMatrix data
recommen_model <- Recommender(data = ratingMatrix, method = "IBCF", parameter = list(k = 30))

# Get model information
model_info <- getModel(recommen_model)

# Display heatmap of similarity matrix for the first 20 rows and columns
image(model_info$sim[1:20, 1:20], main = "Heatmap of the first rows and columns")
```
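**Output:**

The listing below is the catalogue of methods that recommenderlab registers for realRatingMatrix data, of the kind returned by recommenderRegistry$get_entries(dataType = "realRatingMatrix"). IBCF is the method used in this article.

'HYBRID_realRatingMatrix' 'ALS_realRatingMatrix' 'ALS_implicit_realRatingMatrix' 'IBCF_realRatingMatrix'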
$HYBRID_realRatingMatrix
'Hybrid recommender that aggegates several recommendation strategies using weighted averages.'
$ALS_realRatingMatrix
'Recommender for explicit ratings based on latent factors, calculated by alternating least squares algorithm.'
$ALS_implicit_realRatingMatrix
'Recommender for implicit data based on latent factors, calculated by alternating least squares algorithm.'
$IBCF_realRatingMatrix
'Recommender based on item-based collaborative filtering.'
$LIBMF_realRatingMatrix
'Matrix factorization with LIBMF via package recosystem (https://cran.r-project.org/web/
$POPULAR_realRatingMatrix
'Recommender based on item popularity.'
$RANDOM_realRatingMatrix
'Produce random recommendations (real ratings).'
$RERECOMMEND_realRatingMatrix
'Re-recommends highly rated items (real ratings).'
$SVD_realRatingMatrix
'Recommender based on SVD approximation with column-mean imputation.'
$SVDF_realRatingMatrix
'Recommender based on Funk SVD with gradient descend (https://sifter.org/~simon/journal/20061211.html).'
$UBCF_realRatingMatrix
'Recommender based on user-based collaborative filtering.'
We build the recommendation model using the Item-Based Collaborative Filtering (IBCF) method. The k parameter specifies the number of nearest neighbors. We visualize the similarity matrix using a heatmap to understand the similarity between the first 20 movies.
Step 7: Build the IBCF Model
IBCF's Inner Workings:
- **Similarity Measurement:** IBCF figures out how alike items are based on user ratings. It uses measures like cosine similarity or Pearson correlation to crunch these numbers.
- **Making Picks:** For each user, IBCF spots the items they've given a thumbs up to. Then it hunts down other items that are similar to those. These lookalikes end up as suggestions for the user.
- **Pros and Things to Ponder:** IBCF usually scales better than User-Based CF. Its similarity matrix often takes up less space and has fewer gaps than the user-item matrix. IBCF copes with the "cold start" issue reasonably well for new users, since a few ratings are enough to look up similar items. A brand-new item with no ratings is harder, because item similarities are themselves computed from what users rated. To get the best out of IBCF, you need to tune parameters like k (the number of neighbors). This can change how well it works and how good its suggestions are.
- **Fitting into the Big Picture:** IBCF doesn't work alone. It's part of a bigger system that cleans data, trains models, checks how good they are, and puts them to use. IBCF is just one tool in the box. Recommendation engines use it along with other methods (like different CF types, content-based filtering, and hybrid approaches) to give each user tailored suggestions based on what they do and what items are like.
```r
# Build IBCF model
recommen_model <- Recommender(data = ratingMatrix, method = "IBCF", parameter = list(k = 30))
recommen_model

# Inspect model
model_info <- getModel(recommen_model)
class(model_info$sim)
dim(model_info$sim)

# Heatmap of similarities
top_items <- 20
image(model_info$sim[1:top_items, 1:top_items], main = "Heatmap of the first rows and columns")
```
**Output:**
Recommender of type ‘IBCF’ for ‘realRatingMatrix’
learned using 668 users.
'dgCMatrix'
10325 10325

Heatmap of first rows and columns
We build the recommendation model using Item-Based Collaborative Filtering (IBCF) with 30 nearest neighbors. We inspect the similarity matrix and visualize the similarities between the first 20 items using a heatmap.
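The mechanics described in this step can be sketched by hand on a tiny matrix. This is a simplified illustration under stated assumptions, not recommenderlab's exact implementation: the ratings are made up, `cosine_sim` and `predict_rating` are hypothetical helpers, and details such as rating normalization are left out.

```r
# Toy user-item rating matrix (rows = users, columns = items); NA = not rated
ratings <- matrix(c(5,  4,  NA, 1,
                    4,  5,  1,  NA,
                    1,  NA, 5,  4,
                    NA, 1,  4,  5),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("u", 1:4), paste0("i", 1:4)))

# Cosine similarity between two item columns over users who rated both
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(0)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# Item-item similarity matrix (diagonal left at 0 so an item is not its own neighbor)
n_items <- ncol(ratings)
sim <- matrix(0, n_items, n_items,
              dimnames = list(colnames(ratings), colnames(ratings)))
for (i in seq_len(n_items)) {
  for (j in seq_len(n_items)) {
    if (i != j) sim[i, j] <- cosine_sim(ratings[, i], ratings[, j])
  }
}

# Predict a rating as the similarity-weighted mean of the user's ratings
# for the k items most similar to the target item
predict_rating <- function(user, item, k = 2) {
  rated <- names(which(!is.na(ratings[user, ])))
  neighbors <- head(rated[order(sim[item, rated], decreasing = TRUE)], k)
  sum(sim[item, neighbors] * ratings[user, neighbors]) / sum(sim[item, neighbors])
}

round(sim, 2)
pred <- predict_rating("u1", "i3")
pred
```

Item i3 is rated most like i4, which user u1 rated low, so the predicted rating for i3 is pulled toward the low end of the scale. The parameter k plays the same role as k = 30 in the Recommender call: it limits how many similar items contribute to each prediction.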
Step 8: Predict Recommendations
Now we predict recommendations.
```r
# Set seed for reproducibility
set.seed(123)

# Randomly assign each user to training (about 80%) or testing (about 20%)
sampled_data <- sample(x = c(TRUE, FALSE), size = nrow(ratingMatrix),
                       replace = TRUE, prob = c(0.8, 0.2))

# Split ratingMatrix into training and testing data
training_data <- ratingMatrix[sampled_data, ]
testing_data <- ratingMatrix[!sampled_data, ]

# Define the number of top recommendations to predict
top_recommendations <- 10

# Predict recommendations for testing data using the recommen_model.
# Note: recommen_model was fit on the full ratingMatrix in Step 7; for a strict
# hold-out, refit it on training_data before predicting.
predicted_recommendations <- predict(object = recommen_model, newdata = testing_data,
                                     n = top_recommendations)

# Extract recommendations for the first user in the testing set
user1_recommendations <- predicted_recommendations@items[[1]]
user1_movies <- predicted_recommendations@itemLabels[user1_recommendations]

# Retrieve movie titles for the recommended movies
user1_movie_titles <- sapply(user1_movies, function(x)
  as.character(subset(movie_data, movieId == x)$title))

# Print recommended movie titles for the first user
user1_movie_titles
```
**Output:**
"Now and Then (1995)"
72 "Kicking and Screaming (1995)"
84 "Last Summer in the Hamptons (1995)"
90 "Journey of August King, The (1995)"
131 "Frankie Starlight (1995)"
271 "Losing Isaiah (1995)"
279 "My Family (1995)"
309 "Red Firecracker, Green Firecracker (Pao Da Shuang Deng) (1994)"
330 "Tales from the Hood (1995)"
352 "Crooklyn (1994)"
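We split the users into training and testing sets (roughly 80% training, 20% testing). We then predict the top 10 movie recommendations for the users in the testing set. For a specific user (here, the first user in the testing set), we extract the recommended movie IDs and look up their titles. Note that the model used for prediction is the one built in Step 7 on the full rating matrix, so the testing users are not truly unseen in this step.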
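The top-N step itself is simple once predicted scores exist. The sketch below shows the idea in base R, assuming a made-up vector of predicted ratings for one user; the movie labels and scores are hypothetical and do not come from the model above.

```r
# Hypothetical predicted ratings for one user, named by movie label
predicted <- c(m1 = 3.2, m2 = 4.7, m3 = 2.1, m4 = 4.9, m5 = 3.9)

# Movies the user has already rated should not be recommended again
already_rated <- c("m4")

# Keep only unseen movies, then take the n highest predicted ratings
n <- 3
candidates <- predicted[!(names(predicted) %in% already_rated)]
top_n_movies <- names(sort(candidates, decreasing = TRUE))[seq_len(n)]

top_n_movies
```

Even though m4 has the highest predicted score, it is excluded because the user has already rated it, so the list starts with m2.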
Step 9: Evaluate the Recommendation Engine
Now we evaluate our recommendation engine.
```r
# Create an evaluation scheme: 80/20 user split, 15 ratings per test user
# given to the model, and ratings of 4 or more counted as "good"
scheme <- evaluationScheme(ratingMatrix, method = "split", train = 0.8,
                           given = 15, goodRating = 4)

# Train the model on the training portion of the scheme
model <- Recommender(getData(scheme, "train"), method = "IBCF", parameter = list(k = 30))

# Predict ratings for test users from their known (given) ratings
pred <- predict(model, getData(scheme, "known"), type = "ratings")

# Calculate prediction accuracy against the withheld (unknown) ratings
error <- calcPredictionAccuracy(pred, getData(scheme, "unknown"))

# Print prediction accuracy error
error
```
**Output:**
RMSE 1.49711559147115 MSE 2.24135509422602 MAE 1.14430992655367
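We evaluate the recommendation engine using a split scheme (80% of users for training, 20% for testing). For each test user, 15 ratings are given to the model and the rest are withheld. The ratings predicted for the withheld items are then compared with the actual ones using RMSE, MSE, and MAE. This measures how close the predicted ratings are to the actual ratings; lower values are better.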
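For reference, the three error measures can be computed by hand in base R. The actual and predicted ratings below are made up for illustration and are unrelated to the model output above.

```r
# Made-up actual and predicted ratings for five (user, movie) pairs
actual    <- c(4, 3.5, 5, 2, 4.5)
predicted <- c(3.5, 3, 4, 3, 4.5)

# Prediction errors
err <- predicted - actual

# Mean squared error, root mean squared error, mean absolute error
mse  <- mean(err^2)
rmse <- sqrt(mse)
mae  <- mean(abs(err))

c(RMSE = rmse, MSE = mse, MAE = mae)
```

RMSE is the square root of MSE, which is consistent with the values reported above (1.497 squared is about 2.241). Because errors are squared before averaging, RMSE penalizes large misses more heavily than MAE does.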
Conclusion
In this article, we started by getting the data ready. We created a rating matrix and extracted movie genres. Then, we used the recommenderlab package in R to train a recommendation model called item-based collaborative filtering (IBCF).
Once the model was trained, we evaluated its rating predictions on held-out data and generated top recommendations for users in a testing set. To understand how items are related in the recommendation system, we inspected the similarity matrix with a heatmap. Additionally, we visualized the distribution of ratings and the top-rated movies.