The field of Natural Language Generation has advanced in generating human-readable reports for domain experts in various fields. Nevertheless, Natural Language Generation and anomaly detection techniques have not yet been used in the rail domain. Currently, data analysis and incident reporting for log files from the train control system are performed manually, which is a very time-consuming task that is prone to missing crucial information. The rail domain is a safety-critical domain in which detailed analysis of the train control system may prevent incidents and help improve the system's performance. This research designs, implements and evaluates a Natural Language Generation model that successfully translates anomalies detected in log files into human-readable reports.
This thesis presents the steps taken to develop a Natural Language Generation system in the rail domain. Additionally, we examine two representations of the train control system used for the Content Determination task of the Natural Language Generation system. Through a case study with domain experts, we evaluate the performance of, and the experts' preference between, the reports generated from the two representations of the train control system and the data retrieved from the log files. The goal is to find a representation that gives the user a solid understanding of the anomalies detected in the log files.
Based on the case study performed to evaluate the system, we find that, when developing a Natural Language Generation system for the rail domain, reports generated using a more detailed representation of the train control system (more precisely, one using both state names and state attributes, which specify the step-by-step process of setting a route for a train) were preferred over reports generated using a less detailed representation (state names only). The preference was based on measures of the readability, accuracy and understandability of the reports presented during the case study.