Retrieval-Based Open-Domain Question Answering
Exploring the Impact on the Retrieval Component Across Datasets
Abstract
Open-domain question answering (QA) is an important task in Artificial Intelligence, and its ultimate goal is to build a QA system that can answer any question posed by humans. The majority of open-domain QA systems are retrieval-based: a retrieval component retrieves documents relevant to a question from a large-scale knowledge source, and an answer extraction component extracts the answer to this question from the retrieved documents. As Deep Learning techniques have progressed significantly, many researchers have applied neural reading comprehension (RC) models as the answer extraction component of open-domain QA systems. However, neural RC models perform considerably worse in open-domain QA than in RC-style QA. Therefore, many works have focused on the neural RC model to address this performance gap, whereas the retrieval component of the open-domain QA system has received far less attention. Some researchers have built neural-network-based information retrieval (IR) models, but it is currently still difficult for these neural IR models to retrieve documents directly from a large-scale knowledge source in open-domain QA. Hence, many works use neural IR models to re-rank documents retrieved by traditional but efficient IR models (e.g., TF-IDF, BM25). However, these works did not analyze how the questions of different QA datasets affect traditional IR models, and this research gap is the focus of this thesis. We conduct error analyses of the questions in different QA datasets to identify the error types of questions that negatively affect traditional IR models. From this error analysis, we learn that different QA datasets affect traditional IR models differently and vary in how difficult they are for these models to handle. Therefore, we propose hypotheses that might mitigate the negative impact of the error types of questions that are relatively harder for traditional IR models to handle. Furthermore, we perform experiments that implement these hypotheses in order to test their validity. In conclusion, we believe that our work is a step toward deeper insight into the retrieval component of open-domain QA systems and will contribute to the development of better retrieval components for open-domain QA. Moreover, our work can guide users in posing questions that an open-domain QA system can process more readily, yielding more accurate answers.
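For concreteness, the retrieve-then-rerank pipeline described above begins with a traditional lexical retriever such as BM25. The following is a minimal sketch of that first stage only, assuming the third-party rank_bm25 Python package and a toy in-memory document collection; in the actual setting the collection would be a large-scale knowledge source (e.g., Wikipedia), and the retrieved candidates would be handed to a neural re-ranker or RC model for answer extraction.

```python
# Minimal sketch of first-stage BM25 retrieval (toy collection, illustrative only).
from rank_bm25 import BM25Okapi

documents = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Paris is the capital and most populous city of France.",
    "The Statue of Liberty was a gift from France to the United States.",
]

# BM25 operates on tokenized text; whitespace splitting is sufficient for a sketch.
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

question = "Where is the Eiffel Tower located?"
tokenized_question = question.lower().split()

# Rank all documents against the question and keep the top candidates;
# in a full pipeline these would be passed to a neural re-ranker or RC model.
top_docs = bm25.get_top_n(tokenized_question, documents, n=2)
print(top_docs)
```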