After surpassing human performance in the fields of Computer Vision, Speech Recognition and NLP, deep learning has been gaining scientific ground in IR. Despite the large number of publications that have proposed so-called neural IR approaches over the past decade, the field has not achieved the kind of progress seen in related fields. Over the past year or so, researchers have begun to address the issues that hinder the progress of neural approaches in IR. One of these issues is the lack of approaches for interpreting and analyzing neural IR models, which this thesis addresses. We propose a novel approach to diagnosing retrieval models that is rooted in the axiomatic approach to IR. Axioms encapsulate search heuristics that are expressed as constraints on retrieval functions. Existing axiomatic approaches have provided fruitful analyses of traditional IR models but are no longer viable for studying neural IR models. Building on these approaches, we propose an empirical method for analyzing retrieval functions that is suitable for neural models. Inspired by work in the NLP and Computer Vision communities, we use model-agnostic diagnostic datasets to determine what kinds of search heuristics models are able to learn. Since the creation of diagnostic datasets does not require a labeled dataset, we can apply the proposed pipeline to almost any dataset containing queries and documents. We have shown for four specific axioms how to extend and relax them in order to make them suitable for obtaining diagnostic datasets. We have applied our diagnostic dataset creation pipeline to the WikiPassageQA and MSMarco corpora and evaluated three traditional baselines and six neural models. Our experiments on the WikiPassageQA dataset show that the proposed approach can indeed diagnose strengths and weaknesses of neural models.
However, our experiments on the MSMarco dataset show that an axiomatic analysis based on the four axioms does not always identify the factors that determine retrieval effectiveness. An interesting direction for future work is therefore to include more axioms in the diagnostic approach. We also propose several further extensions of the work carried out in this thesis. These include reproducing the experiments with other neural toolkits, applying the methodology to different IR tasks, investigating the validity of the axioms, and adopting a specialized metric for axiomatic performance. We furthermore identified various opportunities to use diagnostic datasets beyond diagnosing neural models. In conclusion, we believe that the axiomatic approach to diagnosing neural IR models presented in this work is a step towards gaining valuable insights into the black boxes that deep models are generally considered to be. We hope our work may prove a fruitful resource for analysis in the field of neural IR on the road towards achieving superior performance without losing sight of a better fundamental understanding of IR.
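To make the idea of a diagnostic dataset concrete, the following is a minimal sketch and not the pipeline used in this thesis. It assumes a term-frequency style axiom (in the spirit of TFC1 from the axiomatic IR literature: for a single-term query, of two documents of comparable length, the one containing the query term more often should score higher), chosen here purely for illustration. The function names, the whitespace tokenization and the `max_len_diff` relaxation parameter are all hypothetical. The sketch extracts document pairs that satisfy the relaxed precondition, without any relevance labels, and measures how often a scoring function orders each pair as the axiom prescribes.

```python
from itertools import combinations
from typing import Callable, List, Tuple

# A retrieval model is treated as a black box: (query, document) -> score.
Scorer = Callable[[str, str], float]


def term_count(term: str, doc: str) -> int:
    """Number of occurrences of `term` in `doc` (naive whitespace tokens)."""
    return doc.lower().split().count(term.lower())


def tf_diagnostic_pairs(
    query: str, docs: List[str], max_len_diff: int = 1
) -> List[Tuple[str, str]]:
    """Collect (d_more, d_less) pairs for a single-term query.

    Assumed relaxation: document lengths may differ by at most
    `max_len_diff` tokens instead of being exactly equal.
    """
    pairs = []
    for a, b in combinations(docs, 2):
        if abs(len(a.split()) - len(b.split())) > max_len_diff:
            continue  # precondition on comparable length not met
        count_a, count_b = term_count(query, a), term_count(query, b)
        if count_a > count_b:
            pairs.append((a, b))
        elif count_b > count_a:
            pairs.append((b, a))
        # equal counts: the axiom makes no prediction, so skip the pair
    return pairs


def axiom_agreement(
    score: Scorer, query: str, pairs: List[Tuple[str, str]]
) -> float:
    """Fraction of pairs for which the model ranks d_more above d_less."""
    if not pairs:
        return float("nan")
    agree = sum(score(query, hi) > score(query, lo) for hi, lo in pairs)
    return agree / len(pairs)


def tf_scorer(query: str, doc: str) -> float:
    """Toy model that scores by raw term frequency."""
    return float(term_count(query, doc))


def length_scorer(query: str, doc: str) -> float:
    """Toy model that ignores the query and scores by document length."""
    return float(len(doc.split()))


if __name__ == "__main__":
    toy_docs = [
        "cat cat dog",
        "cat dog bird",
        "fish fish fish",
        "cat cat cat cat cat cat cat",
    ]
    toy_pairs = tf_diagnostic_pairs("cat", toy_docs)
    print(len(toy_pairs), axiom_agreement(tf_scorer, "cat", toy_pairs))
```

Because the scorer is passed in as an opaque function, the same set of pairs can be reused to compare traditional and neural models, which is the sense in which such a dataset is model-agnostic.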