Transformer models have proven to be effective tools for assessing the readability of texts. Models built on pre-trained architectures such as BERT, RoBERTa, and BART, as well as ReadNet, a transformer model dedicated to readability assessment, have shown very promising results. However, little research has comprehensively analyzed these models' performance at a more granular level. Moreover, GPT-2, a member of the widely used GPT family of transformers, has never been adapted to or tested on readability assessment. This paper fills these knowledge gaps by analyzing the behavior of the five aforementioned models and reflecting on their performance across separate classes of text difficulty. Understanding how they perform on texts of varying complexity is vital to characterizing their behavior and limitations, which in turn clarifies the situations in which each readability tool achieves optimal results.