In this thesis, we study the automatic generation of explanatory reports for anomalous incidents in a train control system (TCS) using Natural Language Generation (NLG). A TCS is a type of safety-critical software that allows train controllers to correctly set the tracks for a train to pass. The goal of this research is to process the majority of log files generated by such a system, detect the anomalies that have occurred, and represent that data in human-readable reports that explain the anomalous incidents. The reports are generated using NLG techniques, namely data-to-text generation. We design an NLG pipeline that incorporates novel graphical components. We perform all steps of the report generation process: data processing, anomaly detection, and the representation of the data in natural language. To process the data, we incorporate complex domain-specific rules, which required extensive work to accommodate five separate types of event log files. Analyzing the logs and detecting anomalous incidents proved complex due to the complicated structure of the logs and the ill-defined relationships between the different log file types. The log data explained by the NLG pipeline is more extensive than what we have seen in the literature, as it needs to be linked using complex domain-specific rules and includes temporal and geographical aspects of the railway system. Furthermore, the detected anomalies are very diverse, and thus it is unclear which aspects should be included in a report for each anomalous case. We investigate whether a purely textual presentation or a combination of modalities provides higher information quality. The NLG system produces three presentation versions of incident reports: Text, TimeText and RailText. Each version focuses on different important aspects of the data in order to explain anomalous incidents.
Due to the diverse properties of the detected anomalous cases, we implement a Case-Based Reasoning (CBR) system to predict the appropriate presentation for explaining each incident. We evaluate our work in two expert-based evaluation phases. The first phase evaluates the quality of the different report presentations. The second phase evaluates the feasibility of the CBR system and the quality of the reports chosen by CBR. The work in this thesis is significant because we manage to translate complex data into human-readable reports that are well received by experts and help them understand anomalous cases. We learn that the presentation format that provides the highest information quality differs depending on the anomalous case being explained. Therefore, we conclude that the properties of anomalous cases should be taken into account when generating incident reports in a TCS to ensure the highest perceived information quality of the reports. CBR is shown to perform that task well. Furthermore, we find that experts' familiarity with the data affects their preferences for report presentations.