Multi-Modal Recommendation System with Auxiliary Information

Designing Multi-Modal Embedding Fusion-Based Recommender

Electronics

Recommendation systems have lately been popularised globally. However, often they need to be adapted to particular data and the use case. We have developed a machine learning-based recommendation system, which can be easily applied to almost any items and/or actions domain. Contrary to existing recommendation systems, our system supports multiple types of interaction data with various modalities of metadata through a multi-modal fusion of different data representations. We deployed the system into numerous e-commerce stores, e.g., food and beverages, shoes, fashion items, and telecom operators. We present our system and its main algorithms for data representations and multi-modal fusion. We show benchmark results on open datasets that outperform the state-of-the-art prior work. We also demonstrate use cases for different e-commerce sites.

Multi-modal Embedding Fusion-based Recommender

ArXiv, 2020

Recommendation systems have lately been popularized globally, with primary use cases in online interaction systems, with significant focus on e-commerce platforms. We have developed a machine learning-based recommendation platform, which can be easily applied to almost any items and/or actions domain. Contrary to existing recommendation systems, our platform supports multiple types of interaction data with multiple modalities of metadata natively. This is achieved through multi-modal fusion of various data representations. We deployed the platform into multiple e-commerce stores of different kinds, e.g. food and beverages, shoes, fashion items, telecom operators. Here, we present our system, its flexibility and performance. We also show benchmark results on open datasets, that significantly outperform state-of-the-art prior work.

A Context-Aware User-Item Representation Learning for Item Recommendation

ACM Transactions on Information Systems, 2019

Both reviews and user-item interactions (i.e., rating scores) have been widely adopted for user rating prediction. However, these existing techniques mainly extract the latent representations for users and items in an independent and static manner. That is, a single static feature vector is derived to encode user preference without considering the particular characteristics of each candidate item. We argue that this static encoding scheme is incapable of fully capturing users’ preferences, because users usually exhibit different preferences when interacting with different items. In this article, we propose a novel context-aware user-item representation learning model for rating prediction, named CARL. CARL derives a joint representation for a given user-item pair based on their individual latent features and latent feature interactions. Then, CARL adopts Factorization Machines to further model higher order feature interactions on the basis of the user-item pair for rating predi...
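The abstract above mentions Factorization Machines without giving their form. As a rough illustration only, the following is a minimal sketch of a standard second-order Factorization Machine score. The function name `fm_predict` and all parameter values are hypothetical, and nothing here is taken from the CARL implementation, which feeds a learned user-item pair representation into this kind of layer.

```python
from typing import List


def fm_predict(x: List[float], w0: float, w: List[float], v: List[List[float]]) -> float:
    """Score one feature vector x with a second-order Factorization Machine.

    w0 is the global bias, w the per-feature weights, and v[i] the latent
    factor vector of feature i (all hypothetical parameters for this sketch).
    """
    # Linear part: bias plus weighted features.
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))

    # Pairwise part: sum over i < j of <v_i, v_j> * x_i * x_j, computed with
    # the usual identity 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2).
    num_factors = len(v[0])
    pairwise = 0.0
    for f in range(num_factors):
        total = sum(v[i][f] * x[i] for i in range(len(x)))
        total_of_squares = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (total * total - total_of_squares)

    return linear + pairwise
```

The identity used for the pairwise term keeps the cost linear in the number of features, which is the main practical reason this formulation is popular.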

Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022

Recent advancements of sequential deep learning models such as Transformer and BERT have significantly facilitated the sequential recommendation. However, according to our study, the distribution of item embeddings generated by these models tends to degenerate into an anisotropic shape, which may result in high semantic similarities among embeddings. In this paper, both empirical and theoretical investigations of this representation degeneration problem are first provided, based on which a novel recommender model DuoRec is proposed to improve the item embeddings distribution. Specifically, in light of the uniformity property of contrastive learning, a contrastive regularization is designed for DuoRec to reshape the distribution of sequence representations. Given the convention that the recommendation task is performed by measuring the similarity between sequence representations and item embeddings in the same space via dot product, the regularization can be implicitly applied to the item embedding distribution. Existing contrastive learning methods mainly rely on data-level augmentation for user-item interaction sequences through item cropping, masking, or reordering and can hardly provide semantically consistent augmentation samples. In DuoRec, a model-level augmentation is proposed based on Dropout to enable better semantic preservation. Furthermore, a novel sampling strategy is developed, where sequences having the same target item are chosen as hard positive samples. Extensive experiments conducted on five datasets demonstrate the superior performance of the proposed DuoRec model compared with baseline methods. Visualization results of the learned representations validate that DuoRec can largely alleviate the representation degeneration problem. CCS CONCEPTS • Information systems → Recommender systems.
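The abstract does not state the exact form of DuoRec's contrastive regularization. To make the general idea concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss over dot-product similarities. The function name `info_nce`, the `temperature` parameter, and the toy vectors are assumptions for illustration, not the authors' formulation.

```python
import math
from typing import List, Sequence


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 1.0) -> float:
    """Generic InfoNCE loss for one anchor representation (illustrative sketch).

    The loss is -log(exp(s_pos) / (exp(s_pos) + sum_j exp(s_neg_j))), where
    each s is a dot-product similarity divided by the temperature.
    """
    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    pos_logit = dot(anchor, positive) / temperature
    logits = [pos_logit] + [dot(anchor, n) / temperature for n in negatives]

    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_denominator - pos_logit
```

In a setting like the one described, the anchor and positive could be two representations of the same sequence under different Dropout masks, or of two sequences sharing a target item, with the other sequences in the batch serving as negatives.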

Comparison of RNN and Embeddings Methods for Next-item and Last-basket Session-based Recommendations

Recurrent Neural Networks (RNNs) have been shown to perform very effectively in session-based recommendation settings, when compared to other commonly used methods that consider the last viewed item of the user and precomputed item-to-item similarities. However, there is little systematic study on how RNNs perform in comparison to methods that use embeddings for item representation for Collaborative Filtering. In this paper we use two industry datasets to compare RNNs with other sequential recommenders that use various embedding methods to represent items. The first dataset corresponds to a typical e-commerce session-based scenario demanding effective next-item recommendation. The second dataset represents a last-basket prediction setting. Results show that although the RNN greatly outperforms embedding methods in the next-item scenario, the dynamic nature of RNNs does not have the same positive impact in the last-basket prediction task. We also present and test a framework that enables the hybrid utilization of text content and item sequences using embeddings. Finally, we report on experiments with reranking methods that demonstrate the effectiveness of simple and practical methods, using item categories, to improve the results.

AdaMCT: Adaptive Mixture of CNN-Transformer for Sequential Recommendation

2022

Sequential recommendation (SR) aims to model users' dynamic preferences from their historical interactions. Recently, Transformers and convolutional neural networks (CNNs) have shown great success in learning representations for SR. Nevertheless, Transformers mainly focus on capturing content-based global interactions, while CNNs effectively exploit local features in practical recommendation scenarios. Thus, how to effectively aggregate CNNs and Transformers to model both local and global dependencies of historical item sequences still remains an open challenge and is rarely studied in SR. In this regard, we inject locality inductive bias into the Transformer by combining its global attention mechanism with a local convolutional filter, and adaptively determine the mixing importance on a personalized basis through module- and layer-aware adaptive mixture units, named AdaMCT. Moreover, considering that softmax-based attention may encourage unimodal activation, we introduce the Squeeze-Excitation Attention (with sigmoid activation) into sequential recommendation to capture multiple relevant items (keys) simultaneously. Extensive experiments on three widely used benchmark datasets demonstrate that AdaMCT significantly outperforms the previous Transformer and CNNs-based models by an average of 18.46% and 60.85% respectively in terms of NDCG@5 and achieves state-of-the-art performance. CCS CONCEPTS • Information systems → Recommender systems.
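The results above are reported in NDCG@5. For readers unfamiliar with the metric, the following is a minimal sketch of NDCG@k with linear gain, which matches the common exponential-gain variant when relevance is binary. The function names are hypothetical, and the paper's exact evaluation protocol (for example, how candidates are sampled) is not given in the abstract.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions (rank 1 has discount 1)."""
    return sum(rel / math.log2(position + 2)
               for position, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # No relevant item at all: define the score as zero.
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

With a single held-out target item, as is typical in next-item evaluation, the score reduces to 1 / log2(rank + 1) when the target appears within the top k and to 0 otherwise.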

Recommendation based on sequence Item2vec

Recommendation systems play an important role in e-commerce systems such as Amazon, eBay, and Zalando. In this project we focused on making recommendations based on what items the user has used in the past and on what other users have done, which are the most widely used functions in online services for presenting relevant items from a given item. We used an online retail dataset with 541,909 transactions and 2603 items, as well as a synthetically generated transaction history dataset with 5 million transactions and 24,010 unique items, and applied a distributed representation method to recommend the next possible item for a particular user. The key idea of our approach is treating items as words, and then training the Skip-gram model based on item and user information. The resulting item vectors from the model can be used to calculate the cosine similarity between items and to find similar items for a given list of items. The experiments show that our system achieved an accuracy of 44.46% using the real dataset and 61.03% using the synthetically generated one.
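The lookup step described above, finding similar items for a given list of items by cosine similarity, can be sketched as follows. The item vectors here are small hand-written placeholders rather than trained Skip-gram output, and averaging the vectors of the given items into one query vector is an assumption of this sketch, not something the abstract specifies.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 3-dimensional item vectors standing in for trained embeddings.
ITEM_VECTORS: Dict[str, List[float]] = {
    "mug": [0.9, 0.1, 0.0],
    "teapot": [0.8, 0.2, 0.1],
    "scarf": [0.0, 0.9, 0.3],
    "gloves": [0.1, 0.8, 0.4],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_items(item_vectors: Dict[str, List[float]],
                  basket: List[str],
                  top_n: int = 3) -> List[str]:
    """Rank items outside the basket by cosine similarity to the basket's mean vector."""
    dim = len(next(iter(item_vectors.values())))
    query = [sum(item_vectors[item][d] for item in basket) / len(basket)
             for d in range(dim)]
    scores = {item: cosine(query, vec)
              for item, vec in item_vectors.items() if item not in basket}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

For example, `similar_items(ITEM_VECTORS, ["mug"], top_n=2)` ranks the candidates closest to the mug vector first. In practice the vectors would come from a Skip-gram model trained on transaction sequences.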

Contextual Sequence Modeling for Recommendation with Recurrent Neural Networks

Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems - DLRS 2017, 2017

Recommendations can greatly benefit from good representations of the user state at recommendation time. Recent approaches that leverage Recurrent Neural Networks (RNNs) for session-based recommendations have shown that Deep Learning models can provide useful user representations for recommendation. However, current RNN modeling approaches summarize the user state by only taking into account the sequence of items that the user has interacted with in the past, without taking into account other essential types of context information such as the associated types of user-item interactions, the time gaps between events and the time of day for each interaction. To address this, we propose a new class of Contextual Recurrent Neural Networks for Recommendation (CRNNs) that can take the contextual information into account both in the input and output layers, modifying the behavior of the RNN by combining the context embedding with the item embedding, and, more explicitly, in the model dynamics, by parametrizing the hidden unit transitions as a function of context information. We compare our CRNNs approach with RNNs and non-sequential baselines and show good improvements on the next event prediction task. CCS CONCEPTS • Information systems → Data stream mining; • Computing methodologies → Neural networks;

Adaptive Multi-Modalities Fusion in Sequential Recommendation Systems

Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

In sequential recommendation, multi-modal information (e.g., text or image) can provide a more comprehensive view of an item's profile. The optimal stage (early or late) to fuse modality features into item representations is still debated. We propose a graph-based approach (named MMSR) to fuse modality features in an adaptive order, enabling each modality to prioritize either its inherent sequential nature or its interplay with other modalities. MMSR represents each user's history as a graph, where the modality features of each item in a user's history sequence are denoted by cross-linked nodes. The edges between homogeneous nodes represent intra-modality sequential relationships, and the ones between heterogeneous nodes represent inter-modality interdependence relationships. During graph propagation, MMSR incorporates dual attention, differentiating homogeneous and heterogeneous neighbors. To adaptively assign nodes with distinct fusion orders, MMSR allows each node's representation to be asynchronously updated through an update gate. In scenarios where modalities exhibit stronger sequential relationships, the update gate prioritizes updates among homogeneous nodes. Conversely, when the interdependent relationships between modalities are more pronounced, the update gate prioritizes updates among heterogeneous nodes. Consequently, MMSR establishes a fusion order that spans a spectrum from early to late modality fusion. In experiments across six datasets, MMSR consistently outperforms state-of-the-art models, and our graph propagation methods surpass other graph neural networks. Additionally, MMSR naturally manages missing modalities.

Specializing Joint Representations for the task of Product Recommendation

Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems - DLRS 2017, 2017

We propose a unified product embedded representation that is optimized for the task of retrieval-based product recommendation. To this end, we introduce a new way to fuse modality-specific product embeddings into a joint product embedding, in order to leverage both product content information, such as textual descriptions and images, and product collaborative filtering signal. By introducing the fusion step at the very end of our architecture, we are able to train each modality separately, allowing us to keep a modular architecture that is preferable in real-world recommendation deployments. We analyze our performance on normal and hard recommendation setups such as cold-start and cross-category recommendations and achieve good performance on a large product shopping dataset. CCS CONCEPTS • Computing methodologies → Machine learning; Neural networks;
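The abstract does not specify the fusion operator. As one possible reading of late fusion over separately trained modality embeddings, the sketch below L2-normalizes each modality vector, scales it by a weight, and concatenates the results. The function names, the weights, and the choice of weighted concatenation are all assumptions for illustration and should not be read as the paper's method.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(modalities: Dict[str, Sequence[float]],
         weights: Dict[str, float]) -> List[float]:
    """Late fusion by weighted concatenation of normalized modality embeddings.

    Modalities are processed in sorted name order so that every product's
    joint vector has the same layout. A missing weight defaults to 1.0.
    """
    fused: List[float] = []
    for name in sorted(modalities):
        weight = weights.get(name, 1.0)
        fused.extend(weight * x for x in l2_normalize(modalities[name]))
    return fused
```

Because each modality vector enters only at this final step, the text, image, and collaborative filtering encoders can be trained and replaced independently, which is the modularity benefit the abstract points to.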