In this thesis we research and design neural models for session recommendation. We investigate three fundamental neural models for session recommendation, namely BERT4Rec, SASRec and GRU4Rec, and subsequently use our findings to design a simpler but performant neural model.
First, we address methodological errors made in the training and evaluation process of BERT4Rec, SASRec and GRU4Rec in order to create a fair comparison of the models’ performance. We analyze the effect of several design decisions other than the architecture, and subsequently standardize these design decisions for all neural models. In the process, we discover that we can improve the performance of the original implementations by up to 250% in NDCG@10.
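For readers unfamiliar with the metric, the following is a minimal sketch of NDCG@10 for next-item prediction. It assumes a single held-out target item per session, so the ideal DCG equals 1; the function name `ndcg_at_k` is hypothetical, and the sketch is not taken from the thesis implementation.

```python
import math


def ndcg_at_k(ranked_items, target, k=10):
    """NDCG@k when exactly one item (the held-out target) is relevant.

    Assumption: one relevant item per session, so the ideal DCG is 1
    and NDCG reduces to the discounted gain at the target's rank.
    """
    top_k = list(ranked_items)[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-indexed position in the ranking
    return 1.0 / math.log2(rank + 1)


# A target ranked first scores 1.0; lower ranks are discounted logarithmically.
score_first = ndcg_at_k(["a", "b", "c"], "a")
score_second = ndcg_at_k(["a", "b", "c"], "b")
score_missing = ndcg_at_k(["a", "b", "c"], "z")
```

Averaging this value over all test sessions gives the dataset-level NDCG@10 under the stated assumption.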
Having isolated the effect of the model architecture, we find that models are more similar in performance than previously reported. In addition, there is no consistently superior model. Moreover, we find that frequently-omitted non-neural baselines like SKNN and ItemKNN can be extremely competitive and outperform our neural models on 2 out of 4 datasets. The neural models significantly outperform the non-neural baselines on the remaining 2 datasets.
Furthermore, we compare the recommendation behaviour of the different neural models, and find that it is both similar across models and predictable depending on the dataset. We use our baseline models to explain this behaviour, and find that up to 80% of the top-ranked recommendations were also recommended by ItemKNN or a simple first-order Markov Decision Process (MDP).
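As an illustration of this kind of overlap analysis, the sketch below fits a simple first-order transition baseline and measures which fraction of one recommendation list also appears in another. It is a simplification under stated assumptions: the baseline here only counts item-to-next-item transitions, and the helper names `fit_first_order`, `recommend` and `overlap` are hypothetical, not the thesis code.

```python
from collections import Counter, defaultdict


def fit_first_order(sessions):
    """Count item-to-next-item transitions over all training sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions


def recommend(transitions, last_item, k=10):
    """Return up to k most frequent successors of the session's last item."""
    successors = transitions.get(last_item, Counter())
    return [item for item, _ in successors.most_common(k)]


def overlap(recs_a, recs_b):
    """Fraction of the items in recs_a that also appear in recs_b."""
    if not recs_a:
        return 0.0
    shared = set(recs_a) & set(recs_b)
    return len(shared) / len(recs_a)


# Toy sessions (made-up data, for illustration only).
sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
model = fit_first_order(sessions)
baseline_recs = recommend(model, "a", k=2)
shared_fraction = overlap(["b", "x"], baseline_recs)
```

In an actual analysis, `recs_a` would hold the top-ranked items of a neural model for a session, and the overlap would be averaged over all test sessions.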
Therefore, we explore how much performance we can maintain while simplifying the model architectures. We propose a simple embedding-based model, and find that it can match or even outperform the neural models by up to 48% in NDCG@10 on 3 out of 4 datasets while being significantly easier to understand, faster to train and easier to tune. Moreover, the results show that we can capture almost all performance with a small subset of the original architectural components.
Our optimized implementations are open-sourced as part of our publication on using a large language model to initialize item embeddings.