Precipitation has high spatial and temporal uncertainty, which makes it challenging to predict. We focus specifically on extreme amounts of precipitation. The Royal Netherlands Meteorological Institute (KNMI) uses a numerical model, which approximates the solutions to partial differential equations, to forecast precipitation and other weather variables. These forecasts contain systematic errors, due to the model’s high sensitivity to input parameters. Such errors can be corrected with statistical methods that exploit the relation between the predicted and the actual precipitation. We use a non-parametric regression set-up to estimate the conditional expectation of the observed weather given the forecasts of the KNMI’s numerical weather prediction model. Specifically, we focus on predicting the maximum precipitation in a three-by-three-kilometer area in the Netherlands. Several methods exist for solving non-parametric regression problems; in this thesis we focus on k-nearest neighbors and random forests. A simulation study shows, however, that neither of these methods is capable of dealing with more complex regression problems, such as forecasting extreme precipitation. We therefore propose a newly developed method, called k-nearest forest neighbors, which is a generalization of the random forests approach. This new method performs significantly better on the simulated data than k-nearest neighbors and random forests. When the methods are applied to a precipitation data set obtained from the KNMI, the method we developed also turns out to have more predictive power than the numerical weather model and the existing non-parametric regression approaches.
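To make the regression set-up concrete, the following is a minimal sketch of plain k-nearest neighbors regression, which estimates the conditional expectation of a response given a forecast by averaging the responses of the k closest training points. It is an illustration only and not the k-nearest forest neighbors method of this thesis. The function name `knn_regress`, the synthetic data and the choice of k are all assumptions made for the example; no KNMI data are used.

```python
import numpy as np


def knn_regress(x_train, y_train, x_query, k=5):
    """Estimate E[Y | X = x] as the mean response of the k nearest training points.

    x_train: array of shape (n_train, n_features)
    y_train: array of shape (n_train,)
    x_query: array of shape (n_query, n_features)
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_2d(np.asarray(x_query, dtype=float))

    # Euclidean distance between every query point and every training point,
    # giving an array of shape (n_query, n_train).
    dists = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)

    # Indices of the k closest training points for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Local average of the responses.
    return y_train[nearest].mean(axis=1)


# Synthetic stand-in data (hypothetical): a skewed "forecast" and a noisy,
# systematically biased "observation" of the same quantity.
rng = np.random.default_rng(0)
forecast = rng.gamma(shape=2.0, scale=3.0, size=(500, 1))
observed = 0.8 * forecast[:, 0] + rng.normal(scale=1.0, size=500)

# Statistically corrected prediction for a forecast value of 10.
corrected = knn_regress(forecast, observed, [[10.0]], k=25)
```

Because the estimate is a local average, it becomes unreliable where training points are sparse, which is typically the case for extreme values.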