This thesis investigates and compares deep learning models for automated assessment of tracheomalacia (TM) severity in neonates with esophageal atresia (EA). TM is a condition characterized by weakening of the tracheal cartilage, leading to airway collapse. It is especially common in neonates with EA and requires accurate severity evaluation for diagnosis and treatment. Currently, severity classification relies on the bronchoscopist's interpretation, making it susceptible to inter-observer variability. The first aim of this thesis was to develop and validate automated image segmentation techniques. The second was to quantify airway dimensions based on the resulting segmentations.
The study included data from 14 neonates who underwent bronchoscopy, yielding 127 bronchoscopy images for analysis. Various pre-processing techniques, including normalization and histogram equalization, were applied to improve the quality of the input data, and the dataset was expanded using data augmentation techniques.
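The pre-processing steps named above (intensity normalization and histogram equalization) could be sketched as follows; this is a minimal illustration using NumPy only, and the function names and the synthetic low-contrast frame are illustrative, not taken from the thesis:

```python
import numpy as np

def normalize(img):
    """Scale pixel intensities linearly to the range [0, 1]."""
    img = img.astype(np.float32)
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

def equalize_histogram(img):
    """Histogram equalization for an 8-bit grayscale image.

    Builds a lookup table from the normalized cumulative histogram,
    spreading the used intensity range over the full 0-255 scale.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize CDF to [0, 1]
    lut = (cdf * 255).astype(np.uint8)
    return lut[img]

# Illustrative low-contrast frame (intensities clustered around 100)
frame = np.clip(np.random.default_rng(0).normal(100, 10, (64, 64)), 0, 255).astype(np.uint8)
eq = equalize_histogram(frame)   # contrast stretched over full range
norm = normalize(eq)             # values in [0, 1] for the network input
```

In practice a library routine (e.g. OpenCV's `cv2.equalizeHist`) would typically be used; the sketch only shows what the operation does to the intensity distribution.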
Four deep learning models were evaluated: 1) the standard U-Net model, 2) the Depth-Anything model with processing steps to create segmentations, 3) the U-Net model with Depth-Anything images as input, and 4) the U-Net model using both the Depth-Anything and the original images as input (see Figure 1). Performance was primarily evaluated using the Dice score, which measures the overlap between predicted segmentations and the ground truth. The standard U-Net model achieved the best results, with mean Dice scores of 0.79 on the training data and 0.75 on the test data, indicating effective segmentation performance. A study comparing clinician assessments of ground-truth and model-predicted airway segmentations found no significant difference in accuracy (63% vs. 56%, p = 0.616), although notably, 37% of the ground-truth segmentations were judged incorrect by the clinicians. For the standard U-Net model, linear prediction models for the parameter 'roundness' and the principal component achieved correlations of 0.81 and 0.84, with mean absolute errors of 14.40% and 12.73%, respectively.
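The Dice score used as the primary evaluation metric is the standard overlap measure 2|A∩B| / (|A| + |B|) between two binary masks. A minimal sketch (the mask values below are illustrative examples, not data from the study):

```python
import numpy as np

def dice_score(pred, truth, eps=1e-8):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection) / (pred.sum() + truth.sum() + eps)

# Illustrative masks: 4-pixel ground truth vs. 6-pixel prediction, 4 pixels overlap
a = np.zeros((4, 4), dtype=np.uint8); a[1:3, 1:3] = 1
b = np.zeros((4, 4), dtype=np.uint8); b[1:3, 1:4] = 1
# Dice = 2 * 4 / (4 + 6) = 0.8
```

A score of 1 indicates perfect overlap and 0 indicates none, so the reported mean of 0.75 on test data corresponds to a substantial but imperfect agreement with the annotated airway lumen.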
In conclusion, this research presents multiple deep learning-based models for segmenting airway collapse in bronchoscopy images, with a focus on improving accuracy and clinical relevance. The application of U-Net and Depth-Anything models yielded substantial progress in automatic segmentation, providing clinicians with a useful tool for evaluating the extent of airway collapse.
Looking ahead, there is significant potential to integrate the standard U-Net model into clinical settings, where it could support early diagnosis and enable personalized treatment planning. Several approaches could enhance the model's clinical applicability, including dataset expansion, improved model generalizability, and advanced post-processing techniques. Ultimately, these improvements could yield a model that not only quantifies airway collapse accurately but also serves as a reliable tool for predicting clinical outcomes and guiding treatment planning.