Climatic conditions influence peak discharges in rivers and change sea levels; therefore, attention to the safety of dikes is of ever-growing importance. Macro instability is one of the dike failure mechanisms that can inundate the hinterland. Soil heterogeneity plays an important role in assessing dike safety, especially for slope stability, because it is a major source of uncertainty. Assessing the safety of a dike network with numerical simulations in a full probabilistic analysis can be computationally expensive. Therefore, this study investigates how to build a state-of-the-art data-driven framework from a numerical model to predict the safety margins for the macro stability of dikes. The inputs and outputs of tens of thousands of D-Stability simulations were used to create a training dataset. The most relevant features were selected based on a global sensitivity analysis and on the representation of soil heterogeneity in the framework. A smart sampling strategy for the input parameters was employed to generate the training dataset while maximising Shannon's information entropy. The sampling strategy combines a Latin-hypercube-optimised, uncorrelated, uniformly distributed dataset with a correlated dataset for optimal training efficiency. The uncertainty due to soil heterogeneity is represented by a Gaussian random field with a trend. This trend is commonly determined from a geotechnical cone penetration test (CPT). With a CPT, it is also possible to find the vertical scale of fluctuation, which is parametrised by the correlation length of the fluctuations in soil strength. The second-order Markov correlation function is used to represent the correlation of the random fields. The Gaussian random field is then mapped onto 16 stacked horizontal layers to model the heterogeneous soil properties. The surrogate model consists of an ensemble of thirteen machine learning models. The most important model is a multi-layer perceptron feed-forward artificial neural network.
The other models are histogram-based gradient boosting regression trees. Random search and Bayesian optimisation are used as hyperparameter tuning techniques to optimise the prediction capability of the individual ML algorithms. The weight of each model in the ensemble is determined by optimisation to minimise the prediction error. The surrogate predicts the factor of safety (FOS) as well as the coordinates of the slope failure circles and the line of depth from the Uplift-Van method. The FOS predicted by the surrogate model ensemble is quite accurate with respect to the numerical FOS of D-Stability, whereas the prediction of the failure plane is slightly worse. A case study was used to demonstrate the performance of the framework. Despite the uncertainty of the subsoil due to soil heterogeneity, the surrogate was able to accurately predict the failure probability. However, the prediction of the far-end circle coordinates showed lower performance due to propagating errors. In conclusion, the framework can be applied to dike reinforcement optimisation, risk-based dike safety assessment, the length effect, and efficient Monte Carlo simulations.
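
The random-field representation of soil heterogeneity summarised above can be illustrated with a minimal sketch. It is not the implementation used in the study. It assumes one common parametrisation of the second-order Markov correlation function in terms of the vertical scale of fluctuation theta, rho(tau) = (1 + 4|tau|/theta) exp(-4|tau|/theta), a linear trend with depth, and a mapping onto 16 layers by simple arithmetic averaging. The function names and all numerical values (profile depth, trend, standard deviation, theta) are hypothetical and chosen only for illustration.

```python
import numpy as np


def second_order_markov(tau, theta):
    """Second-order Markov correlation for lag tau and scale of fluctuation theta.

    Assumed parametrisation: rho(tau) = (1 + 4|tau|/theta) * exp(-4|tau|/theta).
    """
    a = 4.0 * np.abs(tau) / theta
    return (1.0 + a) * np.exp(-a)


def sample_layered_field(depth=16.0, n_points=160, n_layers=16,
                         trend_top=10.0, trend_slope=2.0,
                         sigma=3.0, theta=1.0, seed=0):
    """Sample a 1D Gaussian random field with a linear trend and average it per layer.

    All default values are illustrative assumptions, not values from the study.
    """
    if n_points % n_layers != 0:
        raise ValueError("n_points must be a multiple of n_layers")
    rng = np.random.default_rng(seed)

    # Depth of the midpoint of each grid cell.
    dz = depth / n_points
    z = (np.arange(n_points) + 0.5) * dz

    # Correlation matrix from all pairwise vertical lags.
    corr = second_order_markov(z[:, None] - z[None, :], theta)

    # Cholesky factor turns independent standard normals into correlated ones;
    # a small diagonal jitter guards against round-off.
    chol = np.linalg.cholesky(corr + 1e-8 * np.eye(n_points))
    fluctuation = sigma * (chol @ rng.standard_normal(n_points))

    # Trend (for example fitted to a CPT profile) plus correlated fluctuation.
    field = trend_top + trend_slope * z + fluctuation

    # Map the fine field onto stacked horizontal layers by averaging.
    layers = field.reshape(n_layers, -1).mean(axis=1)
    return z, field, layers


if __name__ == "__main__":
    z, field, layers = sample_layered_field()
    for i, value in enumerate(layers, start=1):
        print(f"layer {i:2d}: {value:.2f}")
```

Each call with a different seed gives one realisation of the 16 layer values, which could then serve as one input sample for a stability calculation or a surrogate model. Averaging within a layer reduces the variance relative to the point field, so a real application would need to decide how the layer thickness relates to the scale of fluctuation.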