Osteoarthritis (OA) is a chronic musculoskeletal joint disease that leads to disability. Osteophytes, bone spurs that contribute to joint pain and reduced mobility, are a hallmark of OA in the knee. This study explores the application of deep learning (DL) techniques to the automatic detection and grading of osteophytes on magnetic resonance (MR) images of the knee. It uses the DenseNet-121 and ResNet-50 DL architectures from the Medical Open Network for Artificial Intelligence (MONAI) framework and a dataset of 1782 double echo steady-state (DESS) MR images from the Osteoarthritis Initiative (OAI), with the aim of enhancing diagnostic accuracy and efficiency in medical imaging analysis. The dataset was split 8:2 into training and validation sets. Through a series of numerical experiments, the research evaluates binary classification, region of interest (ROI)-based detection, and multi-class classification models, and finds that DenseNet-121 generally outperforms ResNet-50. The five-fold cross-validated binary DenseNet-121 model, trained on resampled whole knee images, achieved an area under the receiver operating characteristic curve (ROC AUC) of 0.90 and a balanced accuracy of 0.82 (95% confidence interval (CI) 0.81-0.83). The cross-validated ROI detection models for the patella inferior, patella superior, and tibia lateral subregions achieved balanced accuracy scores of 0.89 (0.88-0.90 CI), 0.86 (0.85-0.87 CI), and 0.85 (0.84-0.86 CI), respectively. The multi-class DenseNet-121 model performed worse, with a balanced accuracy of 0.73 (0.71-0.75 CI), reflecting the greater difficulty of multi-class classification in this context. No hyperparameter optimization was performed and many settings were kept at their default values, so further tuning may improve these results.
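Balanced accuracy is the headline metric above, and a small example may help show why it is preferred over plain accuracy when classes are imbalanced. The sketch below is illustrative only: the helper `balanced_accuracy` and the toy labels are assumptions made for this example, not the study's evaluation code or data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy labels: 8 knees without osteophytes (0), 2 with (1).
y_true = [0] * 8 + [1] * 2
# A classifier that misses one of the two positive cases.
y_pred = [0] * 8 + [0, 1]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {score:.2f}")
```

In this toy case plain accuracy is 0.90 while balanced accuracy is 0.75, because the missed positive case costs half of the minority-class recall.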
The cross-validated models were then evaluated on an external test set from the Erasmus Medical Centre, comprising FSPGR-FS images from a significantly younger patient cohort with a notable class imbalance. The models’ performance on this dataset was significantly lower than their validation results, revealing their limited ability to generalize to different age demographics and class distributions. This result points to the need for more robust models that maintain high performance across diverse datasets and clinical settings. Key contributions of this study include the use of weighted categorical cross-entropy (WCCE) loss functions and the analysis of the knee’s subregions to improve detection accuracy. The findings provide a foundation for further research; future work should focus on advanced optimization techniques, mixed imaging sequences in the training dataset, and comparative studies with other established computer vision models.
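The weighted categorical cross-entropy mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the inverse-frequency weighting scheme, the normalization by the sum of weights, and the function names `inverse_frequency_weights` and `weighted_cross_entropy` are choices made for this example, since the text does not specify how the study set its class weights.

```python
import math


def inverse_frequency_weights(labels, num_classes):
    """Assumed scheme: weight_c = n / (num_classes * count_c), so rare classes weigh more."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[y], with each sample weighted by its class weight."""
    numerator = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    denominator = sum(weights[y] for y in labels)
    return numerator / denominator


# Hypothetical toy batch: three samples of class 0, one of class 1.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)

# Predicted class probabilities per sample (each row sums to 1).
probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
loss = weighted_cross_entropy(probs, labels, weights)

print("class weights:", weights)
print("weighted loss:", round(loss, 4))
```

With these weights, the single minority-class sample contributes as much to the loss as the three majority-class samples combined, which is the intended effect of weighting under class imbalance.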