Machine learning for prediction of undrained shear strength from cone penetration test data

Abstract

This study compares the performance of several machine learning algorithms, namely the artificial neural network, support vector machine, Gaussian process regression, random forest, and XGBoost, in predicting undrained shear strength from cone penetration test data. The aim is to assess how far machine learning can reduce the need for laboratory test data. The training dataset comprises 526 data points from 12 regions, and the testing dataset consists of 20 data points from a polder near Leiden in the Netherlands. Both k-fold and group k-fold cross-validation are used to validate the models. The poor performance of the models under group k-fold cross-validation suggests that, while machine learning techniques can perform well when site-specific data are included in training, they struggle to generalize without them. This highlights the difficulty of capturing soil heterogeneity and suggests that machine learning models should either be trained on sites for which some data are already available or be given much larger training datasets.
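
The difference between the two validation strategies can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names, the toy region labels, and the choice of four folds are assumptions made for illustration only, and the splitting is written in plain Python (scikit-learn provides comparable `KFold` and `GroupKFold` splitters). Plain k-fold distributes individual samples across folds, so a region can contribute to both training and testing. Group k-fold keeps every sample of a region in a single fold, so each test fold contains only regions the model has not seen.

```python
import random
from collections import defaultdict


def k_fold_splits(n_samples, n_splits, seed=0):
    """Plain k-fold: shuffle sample indices, then deal them into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for test in folds:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def group_k_fold_splits(groups, n_splits):
    """Group k-fold: all samples of a group (here, a region) share one fold."""
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), g)):
        min(folds, key=len).extend(members[group])
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(groups)) if i not in held_out]
        yield train, test


def shared_groups(groups, train, test):
    """Regions that appear on both sides of a train/test split."""
    return {groups[i] for i in train} & {groups[i] for i in test}


# Hypothetical toy labels: 12 regions with 5 samples each.
# These are NOT the 526 data points compiled in the study.
groups = [f"region_{r}" for r in range(12) for _ in range(5)]

kfold_leaks = [
    shared_groups(groups, train, test)
    for train, test in k_fold_splits(len(groups), n_splits=4)
]
group_leaks = [
    shared_groups(groups, train, test)
    for train, test in group_k_fold_splits(groups, n_splits=4)
]

print("k-fold, regions on both sides per fold:", [len(s) for s in kfold_leaks])
print("group k-fold, regions on both sides per fold:", [len(s) for s in group_leaks])
```

In this sketch the group k-fold splits never share a region between training and testing, which is why that strategy is the more relevant check when the goal is to predict at a site with no prior data.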