Maurizio Ferrari Dacrema | Politecnico di Milano
Articles by Maurizio Ferrari Dacrema
Proceedings of the 13th ACM Conference on Recommender Systems (RecSys 2019), 2019
Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difficult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and calls for improved scientific practices in this area. Generally, questions regarding the true progress that is achieved in such applied machine learning settings are not new, nor tied to research based on deep learning.
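The simple heuristic baselines the paper refers to can be as plain as an item-based nearest-neighbor model. A minimal sketch, not the authors' evaluated implementation (the toy matrix, cosine similarity, and neighborhood size are illustrative assumptions):

```python
import numpy as np

def item_knn_scores(urm, k=2):
    """Score items for each user with a cosine item-item nearest-neighbor model.

    urm: user-item interaction matrix (users x items), implicit 0/1.
    k: number of neighbors kept per item.
    """
    # Cosine similarity between item columns
    norms = np.linalg.norm(urm, axis=0)
    norms[norms == 0] = 1.0
    normalized = urm / norms
    sim = normalized.T @ normalized
    np.fill_diagonal(sim, 0.0)
    # Keep only the top-k neighbors per item row
    for i in range(sim.shape[0]):
        top_k = np.argsort(sim[i])[::-1][:k]
        mask = np.zeros(sim.shape[1], dtype=bool)
        mask[top_k] = True
        sim[i, ~mask] = 0.0
    # Predicted scores: weighted sum of the user's interactions
    return urm @ sim

urm = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 1, 1]], dtype=float)
scores = item_knn_scores(urm)
```

As the paper stresses, it is the careful tuning of such a baseline (neighborhood size, similarity, normalization) that makes it competitive, not the model's sophistication.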
In 27th Conference on User Modeling, Adaptation and Personalization (UMAP ’19), 2019
This paper focuses on recommender systems based on item-item collaborative filtering (CF). Although research on item-based methods is not new, the current literature does not provide any reliable insight on how to estimate the confidence of recommendations. The goal of this paper is to fill this gap by investigating the conditions under which item-based recommendations will succeed or fail for a specific user. We formalize the item-based CF problem as an eigenvalue problem, where estimated ratings are equivalent to the true (unknown) ratings multiplied by a user-specific eigenvalue of the similarity matrix. We show that the magnitude of the eigenvalue related to a user is proportional to the accuracy of recommendations for that user. We define a confidence parameter called the eigenvalue confidence index, analogous to the eigenvalue of the similarity matrix but simpler to compute. We also show how to extend the eigenvalue confidence index to matrix-factorization algorithms. A comprehensive set of experiments on five datasets shows that the eigenvalue confidence index is effective in predicting, for each user, the quality of recommendations. On average, our confidence index is three times more strongly correlated with MAP than previous confidence estimates.
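The user-specific "eigenvalue" can be sketched as a Rayleigh quotient: if a user's estimated ratings are r_u S for an item-item similarity matrix S, then lambda_u = (r_u S r_u^T) / (r_u r_u^T) measures how closely r_u aligns with an eigenvector of S. A hedged numerical sketch (the function name and toy matrix are illustrative, not the paper's code):

```python
import numpy as np

def user_confidence_index(r_u, S):
    """Rayleigh-quotient-style confidence proxy for one user.

    r_u: the user's (row) rating vector; S: item-item similarity matrix.
    Returns the scalar lambda_u such that r_u S is close to lambda_u * r_u
    when r_u is nearly an eigenvector of S.
    """
    return float(r_u @ S @ r_u) / float(r_u @ r_u)

# Toy symmetric similarity matrix and a user aligned with its top eigenvector
S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues in ascending order
aligned_user = eigvecs[:, -1]                 # profile along the dominant eigenvector
lam = user_confidence_index(aligned_user, S)  # equals the top eigenvalue here
```

For a user exactly on an eigenvector the quotient recovers the eigenvalue; for real users it interpolates, which is what makes it usable as a per-user confidence signal.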
Proceedings of KaRS 2018 Workshop on Knowledge-aware and Conversational Recommender Systems (KaRS @RecSys 2018), 2018
An item-based recommender system works by computing a similarity between items, which can exploit past user interactions (collaborative filtering) or item features (content-based filtering). Collaborative algorithms have been proven to achieve better recommendation quality than content-based algorithms in a variety of scenarios, being more effective in modeling user behaviour. However, they cannot be applied when items have no interactions at all, i.e., cold-start items. Content-based algorithms, which are applicable to cold-start items, often require a lot of feature engineering in order to generate useful recommendations. This issue is especially relevant as the content descriptors become large and heterogeneous. The focus of this paper is on how to use a collaborative model's domain-specific knowledge to build a wrapper feature-weighting method which embeds collaborative knowledge in a content-based algorithm. We present a comparative study of different state-of-the-art algorithms and propose a more general model. This machine learning approach to feature weighting shows promising results and high flexibility.
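The wrapper idea can be sketched as a least-squares fit: learn one weight per content feature so that the weighted content similarity approximates a collaborative similarity matrix. The function name and the plain least-squares solver below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def learn_feature_weights(F, S_cf):
    """Fit one non-negative weight per content feature so that the weighted
    content similarity sum_f w_f * F[i,f] * F[j,f] approximates the
    collaborative similarity S_cf (a least-squares sketch of the idea).

    F: item-feature matrix (items x features); S_cf: collaborative item-item similarity.
    """
    n_items, _ = F.shape
    # Each item pair (i, j), i < j, gives one linear equation in the weights
    rows, cols = np.triu_indices(n_items, k=1)
    A = F[rows] * F[cols]          # (pairs x features) regressors
    b = S_cf[rows, cols]           # target collaborative similarities
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(w, 0.0, None)   # keep weights non-negative

# Sanity demo: recover weights that generated a synthetic similarity matrix
F = np.array([[1., 0.], [1., 1.], [0., 1.], [1., 1.]])
w_true = np.array([2.0, 0.5])
S_cf = (F * w_true) @ F.T          # similarity implied by the "true" weights
w = learn_feature_weights(F, S_cf)
```

The learned weights can then be plugged into any content-based similarity, which is what makes this a wrapper method rather than a new recommender.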
User Modeling and User-Adapted Interaction (UMUAI) - The Journal of Personalization Research, 2019
As of today, most movie recommendation services base their recommendations on collaborative filtering (CF) and/or content-based filtering (CBF) models that use metadata (e.g., genre or cast). In most video-on-demand and streaming services, however, new movies and TV series are continuously added. CF models are unable to make predictions in such a scenario, since the newly added videos lack interactions – a problem technically known as new item cold start (CS). Currently, the most common approach to this problem is to switch to a purely CBF method, usually by exploiting textual metadata. This approach is known to have lower accuracy than CF because it ignores useful collaborative information and relies on human-generated textual metadata, which are expensive to collect and often prone to errors. User-generated content, such as tags, can also be rare or absent in CS situations. In this paper, we introduce a new movie recommender system that addresses the new item problem in the movie domain by (i) integrating state-of-the-art audio and visual descriptors, which can be automatically extracted from video content and constitute what we call the movie genome; (ii) exploiting an effective data fusion method named canonical correlation analysis (CCA), which was successfully tested in our previous works, to better exploit complementary information between different modalities; (iii) proposing a two-step hybrid approach which trains a CF model on warm items (items with interactions) and leverages the learned model on the movie genome to recommend cold items (items without interactions).
Experimental validation is carried out using a system-centric study on a large-scale, real-world movie recommendation dataset, both in an absolute cold-start setting and in a cold-to-warm transition, and a user-centric online experiment measuring different subjective aspects, such as satisfaction and diversity. Results show the benefits of this approach compared to existing approaches.
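The two-step hybrid of point (iii) can be sketched as follows, assuming (purely for illustration) a truncated SVD as the CF model and a linear map from genome features to item factors; the paper's actual models may differ:

```python
import numpy as np

def cold_item_scores(urm_warm, genome_warm, genome_cold, k=2):
    """Two-step hybrid sketch: (1) factorize warm interactions, (2) learn a
    linear map from genome features to latent item factors, (3) score cold items.

    urm_warm: users x warm-items interactions; genome_*: items x features.
    """
    # Step 1: latent factors from a truncated SVD of the warm interactions
    U, s, Vt = np.linalg.svd(urm_warm, full_matrices=False)
    user_factors = U[:, :k] * s[:k]        # users x k
    item_factors = Vt[:k].T                # warm items x k
    # Step 2: least-squares map from genome features to item factors
    W, *_ = np.linalg.lstsq(genome_warm, item_factors, rcond=None)
    # Step 3: project cold items into the latent space and score them
    cold_factors = genome_cold @ W         # cold items x k
    return user_factors @ cold_factors.T   # users x cold-items scores

urm_warm = np.array([[1., 1., 0.],
                     [0., 1., 1.],
                     [1., 0., 1.]])
genome_warm = np.array([[1., 0.], [0.5, 0.5], [0., 1.]])
genome_cold = np.array([[0.8, 0.2]])
scores = cold_item_scores(urm_warm, genome_warm, genome_cold, k=2)
```

The point of the design is that cold items never need interactions: only their automatically extracted genome features are required at recommendation time.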
Workshops by Maurizio Ferrari Dacrema
IntRS workshop held in conjunction with the 13th ACM Conference on Recommender Systems (RecSys), 2019
In order to improve the accuracy of recommendations, many recommender systems nowadays use side information beyond the user rating matrix, such as item content. These systems build user profiles as estimates of users' interest in content features (e.g., movie genre, director, or cast) and then evaluate the performance of the recommender system as a whole, e.g., by its ability to recommend relevant and novel items to the target user. The user profile modelling stage, which is a key stage in content-driven recommender systems, is rarely properly evaluated due to the lack of publicly available datasets that contain user preferences on content features of items. To raise awareness of this fact, we investigate differences between explicit user preferences and implicit user profiles. We create a dataset of explicit preferences towards content features of movies, which we release publicly. We then compare the collected explicit user feature preferences and implicit user profiles built via state-of-the-art user profiling models. Our results show a maximum average pairwise cosine similarity of 58.07% between the explicit feature preferences and the implicit user profiles modelled by the best investigated profiling method, considering movies' genres only. For actors and directors, this maximum similarity is only 9.13% and 17.24%, respectively. This low similarity between explicit and implicit preference models encourages a more in-depth study to investigate and improve this important user profile modelling step, which will eventually translate into better recommendations.
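The comparison metric is plain pairwise cosine similarity between an explicit preference vector and an implicit profile over the same feature space (e.g., one dimension per genre). A minimal sketch with an illustrative function name:

```python
import numpy as np

def profile_similarity(explicit, implicit):
    """Cosine similarity between an explicit feature-preference vector and an
    implicit user profile defined over the same content features."""
    num = float(np.dot(explicit, implicit))
    den = float(np.linalg.norm(explicit) * np.linalg.norm(implicit))
    return num / den if den > 0 else 0.0
```

A value near 1 means the implicit profile reproduces the user's stated preferences; the low values the paper reports for actors and directors indicate the two diverge badly outside genres.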
Papers by Maurizio Ferrari Dacrema
Lecture Notes in Computer Science, 2024
arXiv (Cornell University), Aug 2, 2023
In recent years, Variational Quantum Algorithms (VQAs) have emerged as a promising approach for solving optimization problems on quantum computers in the NISQ era. However, one limitation of VQAs is their reliance on fixed-structure circuits, which may not be tailored for specific problems or hardware configurations. A leading strategy to address this issue is Adaptive VQAs, which dynamically modify the circuit structure by adding and removing gates and optimize their parameters during training. Several Adaptive VQAs, based on heuristics such as circuit shallowness, entanglement capability, and hardware compatibility, have already been proposed in the literature, but there is still a lack of a systematic comparison between the different methods. In this paper, we aim to fill this gap by analyzing three Adaptive VQAs: the Evolutionary Variational Quantum Eigensolver (EVQE) and Variable Ansatz (VAns), already proposed in the literature, and Random Adapt-VQE (RA-VQE), a random approach we introduce as a baseline. In order to compare these algorithms to traditional VQAs, we also include the Quantum Approximate Optimization Algorithm (QAOA) in our analysis. We apply these algorithms to QUBO problems and study their performance by examining the quality of the solutions found and the computational times required. Additionally, we investigate how the choice of hyperparameters can impact the overall performance of the algorithms, highlighting the importance of selecting an appropriate methodology for hyperparameter tuning. Our analysis sets benchmarks for Adaptive VQAs designed for near-term quantum devices and provides valuable insights to guide future research in this area.
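For context, a QUBO instance assigns an energy x^T Q x to each binary vector x, and on small instances exhaustive enumeration gives the exact optimum against which heuristic or quantum solvers can be benchmarked. A sketch of that reference computation (not the paper's benchmarking code):

```python
import itertools
import numpy as np

def qubo_energy(Q, x):
    """Energy x^T Q x of a binary assignment for a QUBO matrix Q."""
    x = np.asarray(x, dtype=float)
    return float(x @ Q @ x)

def brute_force_qubo(Q):
    """Exact minimum over all 2^n binary assignments (feasible only for small n),
    useful as the reference against which approximate solvers are judged."""
    n = Q.shape[0]
    best_x, best_e = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        e = qubo_energy(Q, bits)
        if e < best_e:
            best_x, best_e = bits, e
    return best_x, best_e

Q = np.array([[-1.0, 2.0],
              [0.0, -1.0]])
best_x, best_e = brute_force_qubo(Q)
```

Solution quality in the paper's sense is then simply how close a solver's best energy comes to this exact minimum.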
Lecture Notes in Computer Science, 2024
Lecture Notes in Computer Science
Sixteenth ACM Conference on Recommender Systems, Sep 18, 2022
After decades of being mainly confined to theoretical research, Quantum Computing is now becoming a useful tool for solving realistic problems. This work aims to experimentally explore the feasibility of using currently available quantum computers, based on the Quantum Annealing paradigm, to build a recommender system exploiting community detection. Community detection, by partitioning users and items into densely connected clusters, can boost the accuracy of non-personalized recommendation by assuming that users within each community share similar tastes. However, community detection is a computationally expensive process. The recent availability of Quantum Annealers as cloud-based devices constitutes a new and promising direction to explore community detection, although effectively leveraging this new technology is a long-term path that still requires advancements in both hardware and algorithms. This work aims to begin this path by assessing the quality of community detection formulated as a Quadratic Unconstrained Binary Optimization problem on a real recommendation scenario. Results on several datasets show that the quantum solver is able to detect communities of comparable quality with respect to classical solvers, but with better speedup, and the non-personalized recommendation models built on top of these communities exhibit improved recommendation quality. The takeaway is that quantum computing, although in its early stages of maturity and applicability, shows promise in its ability to support new recommendation models and to bring improved scalability as technology evolves.
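Modularity-based community detection maps naturally onto QUBO. For a bipartition with spins s_i = 2x_i - 1, modularity is (1/4m) s^T B s with B_ij = A_ij - k_i k_j / 2m, and since B's rows sum to zero the binary form has no linear term. A sketch of this formulation (illustrative; the paper's actual encoding may handle more than two communities):

```python
import itertools
import numpy as np

def modularity_qubo(A):
    """Build a QUBO matrix whose minimum encodes a 2-community split
    maximizing modularity.

    A: symmetric adjacency matrix. With s_i = 2 x_i - 1 and modularity matrix
    B_ij = A_ij - k_i k_j / 2m, we have s^T B s = 4 x^T B x (linear and
    constant terms vanish because B's rows sum to zero), so minimizing
    x^T Q x with Q = -B / m maximizes modularity.
    """
    k = A.sum(axis=1)
    two_m = k.sum()
    B = A - np.outer(k, k) / two_m
    return -4.0 * B / (2.0 * two_m)

# Two triangles joined by one bridge: the modularity-optimal split is the triangles
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Q = modularity_qubo(A)
best = min(itertools.product([0, 1], repeat=6),
           key=lambda x: float(np.asarray(x, float) @ Q @ np.asarray(x, float)))
```

On an annealer, Q would be submitted directly instead of being minimized by enumeration as in this toy check.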
In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques and their latent factor models to neural approaches. However, given the proven power of latent factor models, some newer neural approaches incorporate them within more complex network architectures. One specific idea, recently put forward by several researchers, is to consider potential correlations between the latent factors, i.e., embeddings, by applying convolutions over the user-item interaction map. However, contrary to what is claimed in these articles, such interaction maps do not share the properties of images, where Convolutional Neural Networks (CNNs) are particularly useful. In this work, we show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers. Moreover, additional performance evaluations show that all of the examined recent CNN-based models are outperformed by existing non-neural machine learning techniques or traditional nearest-neighbor approaches. On a more general level, our work points to major methodological issues in recommender systems research.
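The "interaction map" at issue is simply the outer product of a user embedding and an item embedding. A minimal sketch (function name and dimensions are illustrative) that makes the paper's objection concrete:

```python
import numpy as np

def interaction_map(user_embedding, item_embedding):
    """Outer product of two d-dimensional embeddings: the d x d 'interaction
    map' that CNN-based models treat like an image. Unlike an image, entry
    (i, j) relates latent dimensions i and j, whose ordering is arbitrary,
    so translation-invariant convolutional filters have no natural meaning."""
    return np.outer(user_embedding, item_embedding)

m = interaction_map([1.0, 2.0], [3.0, 4.0])
```

Permuting the embedding dimensions permutes the map's rows and columns without changing the model's semantics, which is exactly the property images do not have.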
arXiv (Cornell University), Nov 5, 2022
The development of continuously improved machine learning algorithms for personalized item ranking lies at the core of today's research in the area of recommender systems. Over the years, the research community has developed widely-agreed best practices for comparing algorithms and demonstrating progress with offline experiments. Unfortunately, we find this accepted research practice can easily lead to phantom progress due to the following reasons: limited reproducibility, comparison with complex but weak and non-optimized baseline algorithms, and over-generalization from a small set of experimental configurations. To assess the extent of such problems, we analyzed 18 research papers published recently at top-ranked conferences. Only 7 were reproducible with reasonable effort, and 6 of them could often be outperformed by relatively simple heuristic methods, e.g., nearest neighbors. In this paper, we discuss these observations in detail, and reflect on the related fundamental problem of over-reliance on offline experiments in recommender systems research. * This work is an extended abstract based on the publication "Are we really making much progress? A worrying analysis of recent neural recommendation approaches", which received the Best Long Paper Award at the ACM Conference on Recommender Systems (RecSys) 2019 [Ferrari Dacrema et al., 2019b].
Entropy, Jul 28, 2021
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
This paper presents the solution designed by the team "Boston Team Party" for the ACM RecSys Challenge 2022. The competition was organized by Dressipi and focused on session-based fashion recommendation. Particularly, the task was to predict the purchased item at the end of each anonymous session. Our proposed two-stage solution is effective, lightweight, and scalable. First, it leverages the expertise of several strong recommendation models to produce a pool of candidate items. Then, a Gradient-Boosting Decision Tree model aggregates these candidates alongside several hand-crafted features to produce the final ranking. Our model achieved a score of 0.18800 on the public leaderboard. To aid in the reproducibility of our findings, we open-source our materials.
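The two-stage pattern can be sketched generically, with a linear scorer standing in for the Gradient-Boosting Decision Tree purely for illustration (all names are hypothetical; this is not the team's released code):

```python
import numpy as np

def two_stage_rank(candidate_pools, feature_fn, weights, top_n=5):
    """Two-stage sketch: (1) union the candidate pools produced by several
    first-stage recommenders, (2) re-rank the pooled candidates with a scorer
    over hand-crafted features (a linear stand-in for the paper's GBDT)."""
    pool = sorted(set().union(*candidate_pools))
    feats = np.array([feature_fn(item) for item in pool])  # candidates x features
    scores = feats @ np.asarray(weights)
    order = np.argsort(scores)[::-1][:top_n]
    return [pool[i] for i in order]

# Toy demo: two first-stage pools, one feature that simply favors higher item ids
ranking = two_stage_rank([[1, 2], [2, 3]],
                         lambda item: [item, 1.0],
                         [1.0, 0.0],
                         top_n=3)
```

The design rationale is that candidate generation keeps the second stage cheap: the expensive feature computation and ranking model only ever see the pooled candidates, not the full catalog.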
2022 IEEE International Conference on Quantum Computing and Engineering (QCE), Sep 1, 2022
Proceedings of the 13th ACM Conference on Recommender Systems (RecSys 2019), 2019
Deep learning techniques have become the method of choice for researchers working on algorithmic ... more Deep learning techniques have become the method of choice for researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in general, it has, as a result, become difcult to keep track of what represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications point out problems in today's research practice in applied machine learning, e.g., in terms of the reproducibility of the results or the choice of the baselines when proposing new models. In this work, we report the results of a systematic analysis of algo-rithmic proposals for top-n recommendation tasks. Specifcally, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable efort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today's machine learning scholarship and nearest-neighbor techniques. Generally, questions r egarding the calls for improved scientifc practices in this area. true progress that is achieved in such applied machine learning settings are not new, nor tied to research based on deep learning.
In 27th Conference on User Modeling, Adaptation and Personalization (UMAP ’19), 2019
This paper focuses on recommender systems based on item-item collaborative filtering (CF). Althou... more This paper focuses on recommender systems based on item-item collaborative filtering (CF). Although research on item-based methods is not new, current literature does not provide any reliable insight on how to estimate confidence of recommendations. The goal of this paper is to fill this gap, by investigating the conditions under which item-based recommendations will succeed or fail for a specific user. We formalize the item-based CF problem as an eigenvalue problem , where estimated ratings are equivalent to the true (unknown) ratings multiplied by a user-specific eigenvalue of the similarity matrix. We show that the magnitude of the eigenvalue related to a user is proportional to the accuracy of recommendations for that user. We define a confidence parameter called the eigenvalue confidence index, analogous to the eigenvalue of the similarity matrix, but simpler to be computed. We also show how to extend the eigen-value confidence index to matrix-factorization algorithms. A comprehensive set of experiments on five datasets show that the eigenvalue confidence index is effective in predicting, for each user, the quality of recommendations. On average, our confidence index is 3 times more correlated with MAP with respect to previous confidence estimates.
Proceedings of KaRS 2018 Workshop on Knowledge-aware and Conversational Recommender Systems (KaRS @RecSys 2018), 2018
An Item based recommender system works by computing a similarity between items, which can exploit... more An Item based recommender system works by computing a similarity between items, which can exploit past user interactions (collaborative filtering) or item features (content based filtering). Collaborative algorithms have been proven to achieve better recommendation quality then content based algorithms in a variety of scenarios, being more effective in modeling user behaviour. However, they can not be applied when items have no interactions at all, ie cold start items. Content based algorithms, which are applicable to cold start items, often require a lot of feature engineering in order to generate useful recommendations. This issue is specifically relevant as the content descriptors become large and heterogeneous. The focus of this paper is on how to use a collaborative models domain-specific knowledge to build a wrapper feature weighting method which embeds collaborative knowledge in a content based algorithm. We present a comparative study for different state of the art algorithms and present a more general model. This machine learning approach to feature weighting shows promising results and high flexibility.
User Modeling and User-Adapted Interaction (UMUAI) - The Journal of Personalization Research, 2019
As of today, most movie recommendation services base their recommendations on collaborative filte... more As of today, most movie recommendation services base their recommendations on collaborative filtering (CF) and/or content-based filtering (CBF) models that use metadata (e.g., genre or cast). In most video-on-demand and streaming services, however, new movies and TV series are continuously added. CF models are unable to make predictions in such a scenario, since the newly added videos lack interactions – a problem technically known as new item cold start (CS). Currently, the most common approach to this problem is to switch to a purely CBF method, usually by exploiting textual metadata. This approach is known to have lower accuracy than CF because it ignores useful collaborative information and relies on human-generated textual metadata, which are expensive to collect and often prone to errors. User-generated content, such as tags, can also be rare or absent in CS situations. In this paper, we introduce a new movie recommender system that addresses the new item problem in the movie domain by (i) integrating state-of-the-art audio and visual descriptors, which can be automatically extracted from video content and constitute what we call the movie genome; (ii) exploiting an effective data fusion method named canonical correlation analysis (CCA), which was successfully tested in our previous works, to better exploit complementary information between different modalities; (iii) proposing a two-step hybrid approach which trains a CF model on warm items (items with interactions) and leverages the learned model on the movie genome to recommend cold items (items without interactions). 
Experimental validation is carried out using a system-centric study on a large-scale, real-world movie recommendation dataset both in an absolute cold start and in a cold to warm transition; and a user-centric online experiment measuring different subjective aspects, such as satisfaction and diversity. Results show the benefits of this approach compared to existing approaches.
IntRS workshop held in conjunction with the 13th ACM Conference on Recommender Systems (RecSys), 2019
In order to improve the accuracy of recommendations, many rec-ommender systems nowadays use side ... more In order to improve the accuracy of recommendations, many rec-ommender systems nowadays use side information beyond the user rating matrix, such as item content. These systems build user profiles as estimates of users' interest on content (e.g., movie genre, director or cast) and then evaluate the performance of the rec-ommender system as a whole e.g., by their ability to recommend relevant and novel items to the target user. The user profile modelling stage, which is a key stage in content-driven RS is barely properly evaluated due to the lack of publicly available datasets that contain user preferences on content features of items. To raise awareness of this fact, we investigate differences between explicit user preferences and implicit user profiles. We create a dataset of explicit preferences towards content features of movies, which we release publicly. We then compare the collected explicit user feature preferences and implicit user profiles built via state-of-the-art user profiling models. Our results show a maximum average pairwise cosine similarity of 58.07% between the explicit feature preferences and the implicit user profiles modelled by the best investigated profiling method and considering movies' genres only. For actors and directors, this maximum similarity is only 9.13% and 17.24%, respectively. This low similarity between explicit and implicit preference models encourages a more in-depth study to investigate and improve this important user profile modelling step, which will eventually translate into better recommendations.
Lecture notes in computer science, 2024
arXiv (Cornell University), Aug 2, 2023
In recent years, Variational Quantum Algorithms (VQAs) have emerged as a promising approach for s... more In recent years, Variational Quantum Algorithms (VQAs) have emerged as a promising approach for solving optimization problems on quantum computers in the NISQ era. However, one limitation of VQAs is their reliance on fixed-structure circuits, which may not be taylored for specific problems or hardware configurations. A leading strategy to address this issue are Adaptative VQAs, which dynamically modify the circuit structure by adding and removing gates, and optimize their parameters during the training. Several Adaptative VQAs, based on heuristics such as circuit shallowness, entanglement capability and hardware compatibility, have already been proposed in the literature, but there is still lack of a systematic comparison between the different methods. In this paper, we aim to fill this gap by analyzing three Adaptative VQAs: Evolutionary Variational Quantum Eigensolver (EVQE), Variable Ansatz (VAns), already proposed in the literature, and Random Adapt-VQE (RA-VQE), a random approach we introduce as a baseline. In order to compare these algorithms to traditional VQAs, we also include the Quantum Approximate Optimization Algorithm (QAOA) in our analysis. We apply these algorithms to QUBO problems and study their performance by examining the quality of the solutions found and the computational times required. Additionally, we investigate how the choice of the hyperparameters can impact the overall performance of the algorithms, highlighting the importance of selecting an appropriate methodology for hyperparameter tuning. Our analysis sets benchmarks for Adaptative VQAs designed for near-term quantum devices and provides valuable insights to guide future research in this area.
Lecture notes in computer science, 2024
Lecture Notes in Computer Science
Sixteenth ACM Conference on Recommender Systems, Sep 18, 2022
After decades of being mainly confined to theoretical research, Quantum Computing is now becoming... more After decades of being mainly confined to theoretical research, Quantum Computing is now becoming a useful tool for solving realistic problems. This work aims to experimentally explore the feasibility of using currently available quantum computers, based on the Quantum Annealing paradigm, to build a recommender system exploiting community detection. Community detection, by partitioning users and items into densely connected clusters, can boost the accuracy of non-personalized recommendation by assuming that users within each community share similar tastes. However, community detection is a computationally expensive process. The recent availability of Quantum Annealers as cloud-based devices, constitutes a new and promising direction to explore community detection, although effectively leveraging this new technology is a long-term path that still requires advancements in both hardware and algorithms. This work aims to begin this path by assessing the quality of community detection formulated as a Quadratic Unconstrained Binary Optimization problem on a real recommendation scenario. Results on several datasets show that the quantum solver is able to detect communities of comparable quality with respect to classical solvers, but with better speedup, and the non-personalized recommendation models built on top of these communities exhibit improved recommendation quality. The takeaway is that quantum computing, although in its early stages of maturity and applicability, shows promise in its ability to support new recommendation models and to bring improved scalability as technology evolves.
In recent years, algorithm research in the area of recommender systems has shifted from matrix fa... more In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques and their latent factor models to neural approaches. However, given the proven power of latent factor models, some newer neural approaches incorporate them within more complex network architectures. One specific idea, recently put forward by several researchers, is to consider potential correlations between the latent factors, i.e., embeddings, by applying convolutions over the user-item interaction map. However, contrary to what is claimed in these articles, such interaction maps do not share the properties of images where Convolutional Neural Networks (CNNs) are particularly useful. In this work, we show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers. Moreover, additional performance evaluations show that all of the examined recent CNN-based models are outperformed by existing non-neural machine learning techniques or traditional nearest-neighbor approaches. On a more general level, our work points to major methodological issues in recommender systems research. CCS CONCEPTS • Information systems → Recommender systems; • Computing methodologies → Neural networks.
arXiv (Cornell University), Nov 5, 2022
The development of continuously improved machine learning algorithms for personalized item rankin... more The development of continuously improved machine learning algorithms for personalized item ranking lies at the core of today's research in the area of recommender systems. Over the years, the research community has developed widelyagreed best practices for comparing algorithms and demonstrating progress with offline experiments. Unfortunately, we find this accepted research practice can easily lead to phantom progress due to the following reasons: limited reproducibility, comparison with complex but weak and nonoptimized baseline algorithms, over-generalization from a small set of experimental configurations. To assess the extent of such problems, we analyzed 18 research papers published recently at top-ranked conferences. Only 7 were reproducible with reasonable effort, and 6 of them could often be outperformed by relatively simple heuristic methods, e.g., nearest neighbors. In this paper, we discuss these observations in detail, and reflect on the related fundamental problem of over-reliance on offline experiments in recommender systems research. * This work is an extended abstract based on the publication "Are we really making much progress? A worrying analysis of recent neural recommendation approaches" which received the Best Long Paper Award at the ACM Conference on Recommender Systems (RecSys) 2019 [Ferrari Dacrema et al., 2019b].
Entropy, Jul 28, 2021
This paper presents the solution designed by the team "Boston Team Party" for the ACM RecSys Challenge 2022. The competition was organized by Dressipi and was framed in the session-based fashion recommendation domain. In particular, the task was to predict the purchased item at the end of each anonymous session. Our proposed two-stage solution is effective, lightweight, and scalable. First, it leverages the expertise of several strong recommendation models to produce a pool of candidate items. Then, a Gradient-Boosting Decision Tree model aggregates these candidates alongside several hand-crafted features to produce the final ranking. Our model achieved a score of 0.18800 on the public leaderboard. To aid in the reproducibility of our findings, we open-source our materials. CCS Concepts: Information systems → Learning to rank; Theory of computation → Boosting.
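The two-stage design described above can be sketched in miniature; the two mock scorers and the fixed linear blend below are simplified stand-ins for the team's actual recommendation models and GBDT ranker:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 50

# Stage 1: each "strong recommendation model" contributes a scored candidate
# list; here two mock scorers produce item scores for one session.
scores_model_a = rng.random(n_items)
scores_model_b = rng.random(n_items)

top_k = 20
candidates = set(np.argsort(-scores_model_a)[:top_k]) | \
             set(np.argsort(-scores_model_b)[:top_k])

# Stage 2: a learned ranker (a GBDT in the paper) aggregates the candidate
# scores plus hand-crafted features. A fixed linear blend stands in for it.
features = {i: np.array([scores_model_a[i], scores_model_b[i]])
            for i in candidates}
weights = np.array([0.6, 0.4])  # in practice these are learned, not fixed
ranking = sorted(candidates, key=lambda i: -float(weights @ features[i]))
```

The key property this preserves is that the expensive ranker only scores the small candidate pool, not the full catalogue, which is what makes the approach lightweight and scalable.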
2022 IEEE International Conference on Quantum Computing and Engineering (QCE), Sep 1, 2022
arXiv (Cornell University), Aug 3, 2020
In this article, we introduce the ContentWise Impressions dataset, a collection of implicit interactions and impressions of movies and TV series from an Over-The-Top media service, which delivers its media content over the Internet. The dataset is distinguished from other available multimedia recommendation datasets by the availability of impressions, i.e., the recommendations shown to the user, by its size, and by being open-source. We describe the data collection process, the preprocessing applied, and the dataset's characteristics and statistics compared to other commonly used datasets. We also highlight several possible use cases and research questions that can benefit from the availability of user impressions in an open-source dataset. Furthermore, we release software tools to load and split the data, as well as examples of how to use both user interactions and impressions in several common recommendation algorithms.
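One use case that impressions enable can be sketched as follows; the field names and values are illustrative, not the dataset's actual schema:

```python
# Items that were recommended (impressed) but never interacted with can serve
# as weak negative signal for training or evaluation.
interactions = {("u1", "i1"), ("u1", "i3"), ("u2", "i2")}
impressions = {
    "u1": ["i1", "i2", "i3"],   # items shown to u1
    "u2": ["i2", "i3"],         # items shown to u2
}

# An item shown to a user but not interacted with is an "impressed negative".
impressed_negatives = {
    (user, item)
    for user, shown in impressions.items()
    for item in shown
    if (user, item) not in interactions
}
```

Without impressions, a recommender cannot distinguish "the user saw this and ignored it" from "the user never saw this", which is exactly the gap the dataset is meant to close.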
arXiv (Cornell University), Nov 5, 2018
An item-based recommender system works by computing a similarity between items, which can exploit past user interactions (collaborative filtering) or item features (content-based filtering). Collaborative algorithms have been proven to achieve better recommendation quality than content-based algorithms in a variety of scenarios, being more effective at modeling user behaviour. However, they cannot be applied when items have no interactions at all, i.e., cold-start items. Content-based algorithms, which are applicable to cold-start items, often require extensive feature engineering in order to generate useful recommendations; this issue is especially relevant as the content descriptors become large and heterogeneous. The focus of this paper is on how to use a collaborative model's domain-specific knowledge to build a wrapper feature weighting method which embeds collaborative knowledge in a content-based algorithm. We present a comparative study of different state-of-the-art algorithms and present a more general model. This machine learning approach to feature weighting shows promising results and high flexibility.
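The wrapper idea, learning feature weights so that a weighted content-based similarity reproduces a collaborative one, can be sketched as a least-squares fit; all matrices below are random stand-ins for real data, and the fit is a simplified stand-in for the paper's actual learning procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, n_features = 30, 10

F = rng.integers(0, 2, size=(n_items, n_features)).astype(float)  # item features
S_cf = rng.random((n_items, n_items))                             # collaborative sims
S_cf = (S_cf + S_cf.T) / 2

# Weighted content similarity: S_cb(i, j) = sum_f w_f * F[i, f] * F[j, f].
# For each item pair (i, j) this is linear in w, so w can be fit by least squares
# against the collaborative similarities.
pairs = [(i, j) for i in range(n_items) for j in range(i + 1, n_items)]
X = np.array([F[i] * F[j] for i, j in pairs])     # one row per item pair
y = np.array([S_cf[i, j] for i, j in pairs])      # collaborative target

w, *_ = np.linalg.lstsq(X, y, rcond=None)         # learned feature weights

# The learned weights can then score cold-start items from features alone.
S_cb = (F * w) @ F.T
```

Because `S_cb` depends only on item features, it applies to items with no interactions at all, while the weights carry over the collaborative knowledge.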
arXiv (Cornell University), Aug 31, 2018
Item-item collaborative filtering (CF) models are a well-known and studied family of recommender systems; however, the current literature does not provide any theoretical explanation of the conditions under which item-based recommendations will succeed or fail. We investigate the existence of an ideal item-based CF method able to make perfect recommendations. This CF model is formalized as an eigenvalue problem, where estimated ratings are equivalent to the true (unknown) ratings multiplied by a user-specific eigenvalue of the similarity matrix. Preliminary experiments show that the magnitude of the eigenvalue is proportional to the accuracy of recommendations for that user and can therefore provide a reliable measure of confidence.
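The eigenvalue view can be sketched numerically: item-based CF estimates a user's ratings as r_hat = r S, and if r were an eigenvector of S then r_hat = λ r, i.e., a perfect reconstruction up to scale. The Rayleigh quotient below is one natural stand-in for the user-specific eigenvalue; the matrices are random, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)
n_items = 20

S = rng.random((n_items, n_items))
S = (S + S.T) / 2                      # symmetric item-item similarity
np.fill_diagonal(S, 0.0)

r = rng.random(n_items)                # one user's (true) rating vector

# Rayleigh quotient: the user-specific "eigenvalue" whose magnitude the
# paper links to recommendation accuracy for that user.
lam = (r @ S @ r) / (r @ r)

# Reconstruction error of r_hat = r S against lam * r: a small error means
# the user behaves like an eigenvector of S, so item-based recommendations
# for that user should be accurate.
r_hat = r @ S
err = np.linalg.norm(r_hat - lam * r) / np.linalg.norm(r)
```

For a symmetric S the Rayleigh quotient always lies between the smallest and largest eigenvalues, so `lam` is a well-defined per-user quantity even when r is not an exact eigenvector.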
arXiv (Cornell University), Aug 31, 2018
Cold-start is a very common and still open problem in the recommender systems literature. Since cold-start items do not have any interactions, collaborative algorithms are not applicable. One of the main strategies is to use pure or hybrid content-based approaches, which usually yield lower recommendation quality than collaborative ones. Some techniques to optimize the performance of this type of approach have been studied in the recent past. One of them, called feature weighting, assigns to every feature a real value, called a weight, that estimates its importance. Statistical techniques for feature weighting commonly used in Information Retrieval, like TF-IDF, have been adapted for recommender systems, but they often do not provide sufficient quality improvements. More recent approaches [1, 4] estimate weights by leveraging collaborative information via machine learning, in order to learn the importance of a feature based on other users' opinions. This type of model has shown promising results compared to the classic statistical techniques cited previously. We propose a novel graph-based, feature-based machine learning model to face the cold-start item scenario, learning the relevance of features from the probabilities of item-based collaborative filtering algorithms.
It has long been known that quantum computing has the potential to revolutionize the way we find solutions to problems that are difficult to solve on classical computers. Only recently have small but functional quantum computers become available on the cloud, allowing their potential to be tested. In this paper we propose to leverage their capabilities to address an important task for recommender system providers: the optimal selection of recommendation carousels. In many video-on-demand and music streaming services, the user is presented with a homepage containing several recommendation lists, i.e., carousels, each built with a certain criterion (e.g., artist, mood, Action movies, etc.). Choosing which set of carousels to display is a difficult problem because it needs to account for how the different recommendation lists interact, e.g., avoiding duplicate recommendations, and how they help the user explore the catalogue. We focus in particular on the adiabatic computing paradigm and use the D-Wave quantum annealer, which is able to solve NP-hard optimization problems, can be programmed with classical operations research tools, and is freely available on the cloud. We propose a formulation of the carousel selection problem for black-box recommenders that can be solved effectively on a quantum annealer and has the advantage of being simple. We discuss its effectiveness, limitations, and possible future directions of development.
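The kind of formulation the abstract alludes to can be sketched as a toy QUBO (quadratic unconstrained binary optimization), the problem class a quantum annealer minimizes. All numbers below are made up, and brute force stands in for the annealer:

```python
import numpy as np
from itertools import product

# Binary variable x_c = 1 if carousel c is shown. Linear terms reward each
# carousel's standalone quality; quadratic terms penalize overlapping
# (duplicate) recommendations between carousels.
quality = np.array([0.9, 0.8, 0.7, 0.6])        # per-carousel quality
overlap = np.array([[0.0, 0.5, 0.2, 0.0],       # pairwise duplicate penalty
                    [0.5, 0.0, 0.4, 0.1],
                    [0.2, 0.4, 0.0, 0.2],
                    [0.0, 0.1, 0.2, 0.0]])
n, k = 4, 2                                      # choose k carousels out of n
penalty = 2.0                                    # enforces the cardinality constraint

def energy(x):
    """QUBO energy: lower is better. A quantum annealer minimizes this."""
    x = np.asarray(x, dtype=float)
    return (-quality @ x
            + x @ overlap @ x / 2
            + penalty * (x.sum() - k) ** 2)

# The annealer samples low-energy states; brute force stands in for it here.
best = min(product([0, 1], repeat=n), key=energy)
```

Note how the optimum skips the second-best carousel: its heavy overlap with the best one outweighs its standalone quality, which is precisely the interaction effect the carousel selection problem must capture.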