Recurrent Neural Networks (RNNs) have traditionally been used to predict the sequential dynamics of an environment. With recent advances and breakthroughs in Transformer models, Transformers have demonstrated improved performance and sample efficiency as world models. Prior work has focused on partially observable environments, where their capabilities can be utilised most fully. In this paper, we investigate the conditions under which Transformers outperform RNNs in a fully observable environment whose states obey the Markov property, providing insight into Transformers' generalisation and predictive capabilities. Specifically, our experiments explored the impact of model complexity and dataset size. We observed that Transformers did not outperform our baseline implementation when given up to 7000 episodes of trajectory data. We also observed that shorter sequence lengths had a negligible impact on model performance, leading us to recommend against using Transformers in such fully observable environments.
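To make the comparison concrete, the following is a minimal sketch (not the paper's implementation) of the two world-model variants being contrasted: an RNN and a causal Transformer, each trained to predict the next state from the history of states and actions. All dimensions, layer counts, and hyperparameters here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RNNWorldModel(nn.Module):
    """RNN world model: predicts s_{t+1} from (s_{<=t}, a_{<=t})."""
    def __init__(self, state_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRU(state_dim + action_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, state_dim)

    def forward(self, states, actions):
        # states: (B, T, state_dim), actions: (B, T, action_dim)
        h, _ = self.rnn(torch.cat([states, actions], dim=-1))
        return self.head(h)  # predicted next states, (B, T, state_dim)

class TransformerWorldModel(nn.Module):
    """Causal Transformer world model over the same (state, action) sequence."""
    def __init__(self, state_dim, action_dim, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(state_dim + action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, state_dim)

    def forward(self, states, actions):
        x = self.embed(torch.cat([states, actions], dim=-1))
        # Causal mask so position t only attends to timesteps <= t.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.head(self.encoder(x, mask=mask))

# Both models can be trained with the same next-state regression objective, e.g.
#   loss = F.mse_loss(model(states[:, :-1], actions[:, :-1]), states[:, 1:])
# In a fully observable (Markov) environment, s_t alone already determines the
# next-state distribution, which is why the Transformer's long-range attention
# may offer little advantage over the recurrent baseline.
```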