This research evaluates the applicability of Multivariate Imputation by Chained Equations (MICE) for estimating missing well-log data across different sedimentary basins. Using several machine learning techniques, namely XGBoost (XGB), Random Forest (RF), K-Nearest Neighbors (KNR), and Bayesian Ridge (BR), the study tested the performance of MICE on three data sets from distinct geological contexts and preprocessing conditions, with minimal user input.
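To make the chained-equations idea concrete, the following is a minimal, deterministic sketch of a single MICE-style chain in plain Python. It is illustrative only and is not the implementation used in this study: the function names (`mice_impute`, `fit_ridge`, and the helpers) are hypothetical, and a lightly ridge-regularized linear regression stands in for the XGB, RF, KNR, and BR learners named above. Full MICE typically also draws several imputations rather than one. Library routines such as scikit-learn's `IterativeImputer` offer a comparable procedure that accepts an arbitrary regressor.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(xs, ys, lam=1e-6):
    """Least-squares fit with an intercept and a small ridge penalty (assumed learner)."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    """Evaluate the fitted linear model on one feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def others(row, j):
    """All values in a row except column j (the predictors for column j)."""
    return [v for k, v in enumerate(row) if k != j]


def mice_impute(data, n_iter=10):
    """Fill gaps (None) in a table of well-log samples; returns a completed copy."""
    n_cols = len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [row[:] for row in data]

    # Step 1: start every gap at the mean of the observed values in its column.
    for j in range(n_cols):
        obs = [row[j] for row in data if row[j] is not None]
        mean = sum(obs) / len(obs)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = mean

    # Step 2: cycle through the columns, regressing each on all the others
    # and overwriting only the originally missing entries.
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(flags[j] for flags in missing):
                continue
            train = [(others(r, j), r[j])
                     for i, r in enumerate(filled) if not missing[i][j]]
            coef = fit_ridge([x for x, _ in train], [y for _, y in train])
            for i, r in enumerate(filled):
                if missing[i][j]:
                    r[j] = predict(coef, others(r, j))
    return filled
```

Each row is one depth sample and each column one log, with `None` marking a gap. Calling `mice_impute(table, n_iter=5)` returns a new table and leaves the input unchanged; `n_iter` corresponds to the number of MICE iterations discussed in this work.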
The main results indicate that the performance of MICE varied across data sets and well-logs, highlighting the complexity of imputing missing data in heterogeneous sedimentary basins. The number of iterations in MICE did not significantly impact model performance, while data quality, preprocessing, and geological complexity played crucial roles. The Force-200 data set, which underwent extensive preprocessing, showed better imputation performance than the Montney and Beetaloo data sets. Additionally, XGB often outperformed the other algorithms in predicting missing values across different numbers of iterations.
The main conclusions drawn from this study emphasize the need for more research to minimize user input and to develop more robust and flexible approaches to imputing missing data in well-logs. The study highlights the challenge of determining a single set of hyperparameters that is optimal for all well-logs, suggesting the need for more adaptable models or more advanced techniques such as deep learning. The research also points to the importance of refining preprocessing techniques, exploring further combinations of well-logs, and developing cross-validation approaches that effectively replicate real-world scenarios, in order to advance the application and reliability of MICE for imputing subsurface data with missing values.
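One possible reading of a cross-validation scheme that replicates real-world scenarios is to hide contiguous depth intervals of a log rather than randomly scattered samples, on the assumption that logs are often missing over whole intervals. The sketch below illustrates that idea only; it is not the validation procedure of this study. All names (`mask_block`, `block_cv_scores`, `mean_impute`) are hypothetical, and the mean imputer is a trivial baseline standing in for MICE.

```python
import math


def mask_block(rows, col, start, length):
    """Hide a contiguous run of samples in one column.

    Returns a masked copy of the table and a dict {row index: hidden value}.
    """
    masked = [row[:] for row in rows]
    truth = {}
    for i in range(start, min(start + length, len(rows))):
        if masked[i][col] is not None:
            truth[i] = masked[i][col]
            masked[i][col] = None
    return masked, truth


def rmse(imputed, truth, col):
    """Root-mean-square error over the held-out samples only."""
    errors = [(imputed[i][col] - v) ** 2 for i, v in truth.items()]
    return math.sqrt(sum(errors) / len(errors))


def mean_impute(rows):
    """Baseline imputer (assumed for illustration): fill gaps with column means."""
    filled = [row[:] for row in rows]
    for j in range(len(rows[0])):
        obs = [r[j] for r in rows if r[j] is not None]
        mean = sum(obs) / len(obs)
        for r in filled:
            if r[j] is None:
                r[j] = mean
    return filled


def block_cv_scores(rows, col, impute, n_folds=5):
    """Score an imputer by hiding one contiguous depth block per fold."""
    block = len(rows) // n_folds
    scores = []
    for k in range(n_folds):
        masked, truth = mask_block(rows, col, k * block, block)
        if truth:  # skip folds where the block was already entirely missing
            scores.append(rmse(impute(masked), truth, col))
    return scores
```

Any function that maps a table with `None` gaps to a completed table can be passed as `impute`, so a MICE-based imputer could be compared against the mean baseline fold by fold.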