BaseModel vs TIGER for sequential recommendations

In December 2023, the paper "Recommender Systems with Generative Retrieval" [1] was presented at NeurIPS 2023. It introduced a novel recommender system, referred to as 'TIGER,' which garnered significant attention for its innovative approach and state-of-the-art results on sequential recommendation tasks.

Overview of TIGER

The TIGER architecture incorporates several cutting-edge techniques and has a number of interesting properties, which we outline below.

Details of TIGER’s Architecture

The TIGER model leverages RQ-VAE (Residual-Quantized Variational Autoencoder) to generate sparse item representations. Because each suffix code depends on the prefix codes before it, this representation is well suited to autoregressive Transformer decoders. The model architecture includes a Bidirectional Transformer Encoder to aggregate historical interactions and an Autoregressive Transformer Decoder to predict the next item in a sequence.

Figure 1: RQ-VAE. The vector output by the DNN Encoder, say r0 (the blue bar), is fed to the quantizer, which works iteratively. First, the closest vector to r0 is found in the first-level codebook; call it e_c0 (the red bar). The residual error is then computed as r1 := r0 − e_c0. This is fed into the second level of the quantizer and the process is repeated: the closest vector to r1 is found in the second-level codebook, say e_c1 (the green bar), and the second-level residual error is computed as r2 := r1 − e_c1. The process is then repeated a third time on r2. (source: [1])
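To make the quantization step concrete, here is a minimal sketch of multi-level residual quantization in Python. The codebook sizes, embedding dimension, and random data are illustrative assumptions, not values from the paper:

```python
import numpy as np

def residual_quantize(r0, codebooks):
    """Quantize a vector into one code index per codebook level.

    At each level, pick the nearest codebook entry and carry the
    residual error forward to the next level.
    """
    codes = []
    residual = r0
    for codebook in codebooks:                       # one codebook per level
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))                  # nearest codebook entry e_ck
        codes.append(idx)
        residual = residual - codebook[idx]          # r_{k+1} := r_k - e_ck
    return codes

# Illustrative usage: 3 levels, 256 entries each, 32-dim embeddings (assumed sizes).
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 32)) for _ in range(3)]
item_embedding = rng.normal(size=32)                 # stand-in for the DNN Encoder output
print(residual_quantize(item_embedding, codebooks))  # three code indices, one per level
```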

The whole architecture looks as follows:

Figure 2: An overview of the modeling approach in TIGER. (source: [1])

The left-hand side of Figure 2 shows RQ-VAE item code generation. The right-hand side is the core neural architecture for autoregressive modeling: a Bidirectional Transformer Encoder that aggregates historical interactions, and an autoregressive Transformer Decoder that proposes the next item in the sequence as a sequence of residual codes.
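The following is a minimal, hypothetical PyTorch sketch of this encoder-decoder setup; the paper's actual model, layer counts, and vocabulary handling differ, and all sizes below are assumptions:

```python
import torch
import torch.nn as nn

class TinySemanticIDSeq2Seq(nn.Module):
    """Encoder ingests the flattened codes of past items; the decoder
    autoregressively emits the codes of the next item."""

    def __init__(self, vocab_size=1024, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, history_codes, target_codes):
        # history_codes: (B, H) codes of past items; the encoder attends
        # bidirectionally (no source mask).
        # target_codes: (B, T) codes of the next item generated so far;
        # the causal mask keeps the decoder autoregressive.
        src = self.embed(history_codes)
        tgt = self.embed(target_codes)
        causal = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=causal)
        return self.out(hidden)  # logits over the next code at each position
```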

TIGER’s Benchmark Results

TIGER was benchmarked on several Amazon datasets:

We use three public benchmarks from the Amazon Product Reviews dataset, containing user reviews and item metadata from May 1996 to July 2014. We use three categories of the Amazon Product Reviews dataset for the sequential recommendation task: “Beauty”, “Sports and Outdoors”, and “Toys and Games”. [A table in the paper summarizes the dataset statistics.] We use users’ review history to create item sequences sorted by timestamp and filter out users with less than 5 reviews. Following the standard evaluation protocol, we use the leave-one-out strategy for evaluation. For each item sequence, the last item is used for testing, the item before the last is used for validation, and the rest is used for training. During training, we limit the number of items in a user’s history to 20. (source: [1])
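Based on the protocol quoted above, a sketch of the preprocessing and leave-one-out split might look as follows; the data-loading details and field names are assumptions, and the paper's preprocessing may differ in minor details:

```python
from collections import defaultdict

def leave_one_out_split(reviews, min_reviews=5, max_history=20):
    """reviews: iterable of (user_id, item_id, timestamp) tuples."""
    per_user = defaultdict(list)
    for user, item, ts in reviews:
        per_user[user].append((ts, item))

    splits = {}
    for user, events in per_user.items():
        if len(events) < min_reviews:
            continue                                  # drop users with fewer than 5 reviews
        items = [item for _, item in sorted(events)]  # sort by timestamp
        splits[user] = {
            "train": items[:-2][-max_history:],       # cap training history at 20 items
            "valid": items[-2],                       # second-to-last item
            "test": items[-1],                        # last item
        }
    return splits
```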

TIGER outperformed strong prior baselines on these Amazon datasets. The results are summarized below:

Table: Performance comparison on sequential recommendation. The last row shows the % improvement of TIGER relative to the best baseline. Bold (underline) denotes the best (second-best) metric. (source: [1])

TIGER achieved improvements ranging from +0.15% to +29.04% over the prior state of the art, across all metrics on all datasets.

BaseModel vs TIGER Architecture

BaseModel shares some conceptual similarities with TIGER, such as autoregressive modeling and a sparse integer-tuple representation of items. However, there are significant differences in how the two are implemented.

BaseModel vs TIGER Performance

To evaluate BaseModel against TIGER, we replicated the exact data preparation, training, validation, and testing protocols described in the TIGER paper. Identical implementations of the Recall and NDCG metrics were used for consistency.
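For reference, under leave-one-out evaluation (a single ground-truth item per user), Recall@K and NDCG@K reduce to the definitions sketched below. This is a minimal sketch of the standard formulas, not TIGER's or BaseModel's actual code:

```python
import math

def recall_at_k(ranked_items, true_item, k):
    # With one held-out item, Recall@K is 1 if it appears in the top K, else 0.
    return 1.0 if true_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, true_item, k):
    # With one relevant item, IDCG = 1, so NDCG@K = 1 / log2(rank + 1)
    # if the item is ranked within the top K, else 0.
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == true_item:
            return 1.0 / math.log2(rank + 1)
    return 0.0

# Both metrics are then averaged over all test users.
```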

Comparing the models involved a few steps: preparing the datasets exactly as described in the paper, training BaseModel on the resulting sequences, and evaluating it under the same leave-one-out protocol with the same metrics.

The entire process took 3 hours from start to finish. The parameters of BaseModel were not tuned in any way, and the results are as follows:

Despite limited optimization of BaseModel’s parameters, the results are quite compelling. BaseModel achieved a further +24.26% to +61.57% improvement over TIGER’s results on the same datasets.
Given TIGER’s reliance on RQ-VAE, costly Transformer training, and beam-search inference, it is reasonable to infer that BaseModel’s training and inference processes are orders of magnitude faster.
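To see why beam-search inference is costly, consider a sketch of decoding a single recommendation: every item requires several decoding steps, each of which keeps many partial hypotheses alive. Here next_code_logprobs is a hypothetical stand-in for a decoder forward pass, and the default sizes are illustrative:

```python
def beam_search(next_code_logprobs, codes_per_item=4, beam_width=10):
    """next_code_logprobs(prefix) -> dict mapping code -> log-probability."""
    beams = [((), 0.0)]                              # (code prefix, cumulative log-prob)
    for _ in range(codes_per_item):
        candidates = []
        for prefix, score in beams:
            # Each step costs one decoder forward pass per live hypothesis.
            for code, logp in next_code_logprobs(prefix).items():
                candidates.append((prefix + (code,), score + logp))
        candidates.sort(key=lambda b: b[1], reverse=True)
        beams = candidates[:beam_width]              # keep only the top hypotheses
    return beams                                     # candidate semantic IDs with scores
```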

Conclusion

The comparison between BaseModel and TIGER reveals substantial differences in their architectural choices and performance. While TIGER represents a notable advancement in generative-retrieval recommender systems, BaseModel demonstrates superior efficiency and effectiveness on these sequential recommendation tasks. Further optimization and exploration of BaseModel could yield even greater gains.
These results encourage continued innovation and comparison against leading models in the field, pushing the boundaries of what behavioral models can achieve.

References

[1] Rajput, Shashank, et al. "Recommender systems with generative retrieval." Advances in Neural Information Processing Systems 36 (2023).