Exploring Probabilistic Short-Term Water Demand Forecasts at District Level Using Neural Networks
Abstract
In a world with accelerating climate change, rapid population growth, and urbanization, urban water systems are under growing stress. Precise short- and medium-term water demand forecasts are therefore needed to optimize water supply operations. Water demand is shaped by human behavior and industrial activity, which introduce uncertainty, making probabilistic methods well suited to forecasting it. This thesis provides an overview of probabilistic methods for predicting water demand 24 hours ahead, highlighting their advantages, their disadvantages, and the accuracy of their interval and point forecasts. The case study uses the dataset from the Battle of Water Demand 2024, covering two years and two months of data across 10 districts (DMAs, District Metered Areas) in Ferrara, Italy, including residential, hospital, countryside, city-center, and industrial districts. Three commonly used probabilistic extensions of neural networks were applied: QR (Quantile Regression), MDN (Gaussian Mixture Density Network), and an adapted CQR (Conformal Quantile Regression) method with online updating.
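As background to the QR extension named above: quantile regression models are typically trained with the pinball (quantile) loss. A minimal NumPy sketch, where the 0.025/0.975 quantile levels for a central 0.95 interval are assumed for illustration:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for a single quantile level tau.

    Under-prediction is penalized with weight tau and over-prediction
    with weight (1 - tau), so minimizing it yields the tau-quantile.
    """
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

# For a central 0.95 prediction interval, a QR model would predict the
# 0.025 and 0.975 quantiles; the data here is purely illustrative.
y = np.array([10.0, 12.0, 9.0])
q_hi = np.array([11.0, 11.0, 11.0])
print(pinball_loss(y, q_hi, 0.975))
```

Minimizing this loss for both quantile levels gives the lower and upper interval bounds directly.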
First, forecast models were developed to serve as the basis for the probabilistic predictions. Three neural network architectures were investigated: a linear model, an MLP (multi-layer perceptron), and an LSTM (Long Short-Term Memory) model, with a seasonal moving average as a benchmark. These models were trained both per district and jointly across districts. When trained per district, the linear model was the most accurate. When trained jointly, the MLP performed best, but the linear model generalized best overall, with the MLP second. The LSTM performed worst. In districts with less heteroscedastic demand patterns, the benchmark performed on par with the neural networks at the end of the forecasting horizon, indicating that complex models are not always necessary. A categorical variable identifying the DMA did not improve the point forecasts.
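The seasonal moving average benchmark can be sketched as follows; the weekly cycle length of 168 hours and the number of averaged cycles are illustrative assumptions, not the thesis configuration:

```python
import numpy as np

def seasonal_moving_average(series, season=168, n_seasons=4, horizon=24):
    """Forecast the next `horizon` steps by averaging the values at the
    same position in the last `n_seasons` seasonal cycles.

    season=168 assumes hourly data with weekly seasonality; both the
    cycle length and the number of averaged cycles are assumptions.
    """
    series = np.asarray(series, dtype=float)
    forecast = np.empty(horizon)
    for h in range(horizon):
        # Indices of the same hour-of-week in the previous cycles.
        idx = [len(series) + h - k * season for k in range(1, n_seasons + 1)]
        forecast[h] = series[idx].mean()
    return forecast
```

For a perfectly weekly-periodic series, this benchmark reproduces the pattern exactly, which is why it is hard to beat in districts with regular, homoscedastic demand.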
Because the MLP with solely lagged features ultimately gave the best point forecasts, this model was used for the probabilistic extensions, which estimate the 0.95 prediction interval. The probabilistic models were assessed in terms of reliability, the empirical probability that observations fall within the 0.95 prediction interval, and sharpness, which measures how wide the interval is. Finally, the Winkler Score, which captures the trade-off between the two, was used.
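The three evaluation quantities can be sketched in NumPy as follows; the function name is hypothetical and the exact formulations in the thesis may differ in detail:

```python
import numpy as np

def interval_metrics(y, lower, upper, alpha=0.05):
    """Reliability, sharpness, and Winkler score of a (1 - alpha) interval.

    Reliability: empirical fraction of observations inside [lower, upper]
    (target here is 0.95). Sharpness: mean interval width. The Winkler
    score adds to the width a penalty of 2/alpha times the distance by
    which an observation falls outside the interval.
    """
    y, lower, upper = map(np.asarray, (y, lower, upper))
    covered = (y >= lower) & (y <= upper)
    reliability = covered.mean()
    width = upper - lower
    sharpness = width.mean()
    penalty = (2.0 / alpha) * (np.maximum(lower - y, 0.0) + np.maximum(y - upper, 0.0))
    winkler = (width + penalty).mean()
    return reliability, sharpness, winkler
```

A lower Winkler score is better: it rewards narrow intervals but heavily penalizes observations that escape them, which is exactly the reliability-sharpness trade-off described above.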
Both models that fully learn the prediction interval (QR and MDN) were more difficult to calibrate, and further research is needed to calibrate them properly. When these models were trained jointly on the 10 DMAs, the coverage varied per DMA. This could potentially be solved by training one model per DMA or by applying per-DMA regularization to push the model toward similar coverage across DMAs.
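As background to the calibration difficulty: an MDN is trained by minimizing the negative log-likelihood of a predicted Gaussian mixture, and the prediction interval is then read off the mixture's quantiles. A minimal sketch, where the array shapes and the log-sum-exp formulation are illustrative assumptions rather than the thesis implementation:

```python
import numpy as np

def mdn_nll(y, weights, means, sigmas):
    """Negative log-likelihood of targets under a Gaussian mixture.

    weights, means, sigmas have shape (n_samples, n_components); an
    MDN's output layer predicts them per input. Miscalibration arises
    when the learned mixture's 0.025/0.975 quantiles do not match the
    empirical ones.
    """
    y = np.asarray(y, dtype=float)[:, None]
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi)
        - np.log(sigmas)
        - 0.5 * ((y - means) / sigmas) ** 2
    )
    # Log-sum-exp over components for numerical stability.
    m = log_comp.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return -log_mix.mean()
```

Because the loss only rewards overall likelihood, nothing forces the implied interval to hit 0.95 coverage on every DMA, which is consistent with the per-DMA coverage variation reported above.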
The MCD model had difficulty adapting the prediction interval over the forecasting horizon, causing the coverage to decrease. The QR and MDN models also show trade-off imbalances between reliability and sharpness over the forecasting horizon on the testing set, with more erratic patterns. Interestingly, the Conformal Prediction algorithm maintains its coverage best over the forecasting horizon while increasing sharpness, thanks to its online updating procedure. When small decreases in coverage of up to 0.02 are allowed on the testing set, the CQR model performs best according to the Winkler Score; the MDN performs best when larger drops in coverage are allowed.
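The online updating that keeps CQR's coverage stable can be sketched as a conformity-score adjustment combined with an adaptive-conformal-style rule that nudges the working miscoverage level after each observation. The step size gamma and the exact update rule are assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def cqr_adjust(scores, alpha_t):
    """CQR adjustment: the (1 - alpha_t) empirical quantile of past
    conformity scores s_i = max(lower_i - y_i, y_i - upper_i) is added
    to the raw quantile bounds, widening or shrinking the interval."""
    return np.quantile(scores, 1.0 - alpha_t)

def aci_step(alpha_t, y, lower, upper, alpha=0.05, gamma=0.01):
    """One online update of the working miscoverage level alpha_t:
    widen the interval (lower alpha_t) after a miss, narrow it after
    a hit, so coverage tracks the 1 - alpha target over time.
    gamma is an assumed step size."""
    covered = lower <= y <= upper
    err = 0.0 if covered else 1.0
    return alpha_t + gamma * (alpha - err)
```

Repeating `aci_step` as new observations arrive is what lets the interval react to distribution shifts over the horizon, matching the stable coverage observed for the conformal method.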
Analyzing the rolling coverage over time shows that improvements are still possible: there is under-coverage especially in late spring and summer, and coverage still differs between weekdays and weekends. This indicates that epistemic uncertainty, the reducible uncertainty in the data and model parameters, remains. Adding features such as categorical calendar variables and future weather data is recommended to reduce it; for non-industrial DMAs, this can be investigated by assuming a perfect weather forecast. A larger dataset may also help to obtain better-performing models.