Finding biological markers for Parkinson's disease

Using machine learning to analyse metagenomic data


Abstract

Parkinson's disease (PD) is a neurodegenerative disorder characterized by motor function loss and potential mental and behavioral changes. The identification of biomarkers in the gut microbiota of PD patients can significantly aid in fast and accurate diagnosis. This study investigates the application of machine learning (ML) models, including Logistic Regression (LR), Random Forest (RF), and Support Vector Machines (SVM), to discover biomarkers in the gut metagenomic data of PD patients. The ML models were optimized using various feature selection techniques, and a comparative analysis of the most influential species in sample discrimination was conducted to verify potential PD-associated biomarkers.
The results show that all three ML models achieve only moderate performance, indicating limited discriminatory power. However, the species ranked as most significant overlap substantially across the classifiers and include PD-associated species that align with findings in the existing literature. These outcomes provide promising evidence that LR, RF, and SVM classifiers can effectively identify biomarkers for PD. At the same time, a confounding analysis on a small subset of the dataset failed to identify meaningful PD-associated species. Caution is therefore advised when interpreting the findings of ML models, taking into account classifier performance, dataset limitations, potential biases, the influence of feature selection methods, and inherent differences between models.
We validate the potential usefulness of ML approaches for biomarker discovery and highlight areas for further investigation into building a sufficiently accurate ML model for PD diagnosis.
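
The cross-classifier comparison described above can be illustrated with a minimal sketch. It assumes each trained model yields one importance score per species (for example, coefficients for LR or impurity-based importances for RF) and measures how much the top-ranked species overlap between models. The taxon names, the scores, the value of `k`, and the helper functions `top_k` and `jaccard` are all hypothetical placeholders, not values or methods taken from the study.

```python
from itertools import combinations


def top_k(scores, k):
    """Return the set of the k features with the largest absolute score."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard index of two sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical importance scores per classifier (made-up numbers and
# placeholder taxon names, for illustration only).
importances = {
    "LR": {"taxon_A": 0.9, "taxon_B": -0.7, "taxon_C": 0.1, "taxon_D": 0.05},
    "RF": {"taxon_A": 0.4, "taxon_B": 0.3, "taxon_C": 0.2, "taxon_D": 0.1},
    "SVM": {"taxon_A": 0.8, "taxon_B": 0.1, "taxon_C": -0.6, "taxon_D": 0.2},
}

k = 2  # assumed cut-off for "most influential" species

# Top-k species per model, ranked by absolute importance.
selected = {model: top_k(scores, k) for model, scores in importances.items()}

# Pairwise overlap between the models' top-k sets.
pairwise = {
    (m1, m2): jaccard(selected[m1], selected[m2])
    for m1, m2 in combinations(selected, 2)
}

# Species selected by every model: candidate biomarkers for follow-up.
consensus = set.intersection(*selected.values())

for pair, score in pairwise.items():
    print(pair, round(score, 2))
print("consensus:", sorted(consensus))
```

Ranking by absolute value treats strongly negative LR or linear-SVM coefficients as influential too. Importance scores from different model families are not on a common scale, so the sketch compares rank-based sets rather than raw values.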
