Comparison of AI Models and Methods for Infilling Hydrological Time Series Data
Abstract
Monitoring and modelling water systems and potential interventions help to manage water more effectively. However, sensor defects and operational failures leave gaps in the data, which reduces the reliability of both monitoring and modelling. Because traditional statistical methods are often ineffective at infilling large and multivariate gaps, several machine learning (ML) techniques have been explored. A comprehensive overview of ML techniques and their performance across different methods and hydrologic regimes is still missing. Therefore, this thesis addresses the question: which ML approach(es) are most suitable for infilling gaps in hydrological time series in the Netherlands?
This thesis evaluates ML models and their behaviour with different infilling methods and hydrologic regimes. The results help to understand the generalisability of the ML models and enable improvements in (urban) water management. First, a multicriteria analysis (MCA) was used to assess a wide range of ML models. Next, in two case studies, a subset of these models was applied to infill gaps in groundwater, sewage water and surface water levels using the intra-station and inter-station methods.
The MCA identified the Support Vector Regressor (SVR), Random Forest (RF), Gradient Boosting Trees (GBT), Multilayer Perceptron (MLP), Self-Organising Map (SOM) and Long Short-Term Memory (LSTM) models as sufficiently suitable.
In the first case study, applying these models with the intra-station approach led to mixed results for infilling small artificial gaps. Generally, acceptable mean squared error (MSE) scores were achieved, but poor Nash-Sutcliffe efficiency (NSE) and Kling-Gupta efficiency (KGE) scores implied limited scalability.
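To illustrate how an acceptable MSE can coexist with a poor NSE, the sketch below implements the three metrics in their commonly used textbook forms. The thesis's exact formulations are not given in this abstract, so the KGE variant (the original 2009 form) and the sample water levels are assumptions made purely for illustration.

```python
import numpy as np


def mse(obs, sim):
    """Mean squared error between observed and simulated values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.mean((obs - sim) ** 2))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    residual = np.sum((obs - sim) ** 2)
    variance = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - residual / variance)


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form): 1 is perfect."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Hypothetical water levels (metres) inside a short gap with little variation.
obs = [1.00, 1.02, 1.01, 1.03, 1.02]
sim = [1.03, 1.00, 1.03, 1.01, 1.04]

print("MSE:", mse(obs, sim))
print("NSE:", nse(obs, sim))
print("KGE:", kge(obs, sim))
```

In this made-up example the errors are only a few centimetres, so the MSE is small (about 0.0005 m²), yet the NSE is negative because the observed values vary even less than the errors do. Short gaps in a nearly flat series can therefore score well on MSE and poorly on NSE.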
The second case study showed more promising results with the inter-station method on an artificial gap of seven months. This method proved to be more scalable, as all metrics indicated acceptable performance.
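The idea of an artificial gap infilled with the inter-station method can be sketched as follows, assuming that "inter-station" means predicting the target station from simultaneous readings at neighbouring stations. Everything here is hypothetical: the data are synthetic, the series are assumed to be daily, and ordinary least squares is used as a simple stand-in for the ML models of the thesis (such as RF or GBT), not as the method actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 1000
t = np.arange(n_days)

# Synthetic seasonal signal shared by all stations (assumption for illustration).
shared = np.sin(2 * np.pi * t / 365)

# Two hypothetical neighbouring stations and the target station.
neighbour_a = shared + 0.05 * rng.standard_normal(n_days)
neighbour_b = 0.8 * shared + 0.3 + 0.05 * rng.standard_normal(n_days)
target = 1.2 * shared - 0.1 + 0.05 * rng.standard_normal(n_days)

# Artificial gap of roughly seven months of daily values in the target series.
gap = np.zeros(n_days, dtype=bool)
gap[600:810] = True

# Inter-station features: neighbour readings plus an intercept column.
features = np.column_stack([neighbour_a, neighbour_b, np.ones(n_days)])

# Fit on the days where the target is available (stand-in for an ML model).
coef, *_ = np.linalg.lstsq(features[~gap], target[~gap], rcond=None)

# Infill the gap and score it against the withheld "true" values.
infilled = features[gap] @ coef
rmse = float(np.sqrt(np.mean((infilled - target[gap]) ** 2)))
print("RMSE inside the artificial gap:", rmse)
```

Because the gap is created artificially, the withheld observations remain available for scoring, which is what makes metrics such as MSE, NSE and KGE computable for the infilled period.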
In conclusion, the RF and GBT models performed most robustly. The MLP and LSTM models showed great potential but were inconsistent, potentially because of too little training data. The inter-station method proved more scalable than the intra-station approach. Furthermore, success depends on the conditions of the hydrologic regime, such as human intervention.