Key Fragmentomics Features for Cancer Detection

An Analytical Approach to Identifying Essential Characteristics for Cancer Detection and Classification Using DNA Fragments from Blood Samples

Abstract

Cancer remains a major challenge in medicine, and early detection is essential for improving treatment outcomes. Fragmentomics has emerged as a promising basis for efficient, non-invasive cancer diagnosis tools. By analysing the differences between cell-free DNA (cfDNA) fragments from blood samples of healthy individuals and patients with cancer, this study aims to determine the most important fragmentomics features for cancer detection. The methods presented in this work involve extracting features from the cfDNA fragments available in the experimental dataset, applying a pipeline of feature selection techniques that removes redundant features, training and evaluating logistic regression and random forest classifiers to differentiate between healthy and diseased samples, and finally extracting the feature weights from the trained models to understand which features contributed most to the classification task. Filter-based variance thresholding and Correlation-based Feature Selection (CFS) are employed to refine the dataset. The independent t-test and the Mann-Whitney U test are used to assess the differences between the cancer and healthy samples, and the Pearson correlation coefficient is used to measure the correlation between each pair of features. The classification performance of the two models is assessed using a train/test split and nested cross-validation. The evaluation reveals that logistic regression consistently outperforms the random forest and that removing redundant features increases the performance of both classifiers. Certain genomic bins, mostly on chromosomes 1, 7 and 8, contain significant features for the classification task. These findings suggest that understanding the importance of fragmentomics features can lead to improved diagnostic tools, such as cancer detection based on blood tests.
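
The two filter steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a simple greedy pairwise Pearson filter stands in for CFS proper, and the function names, the thresholds (min_variance, max_abs_corr) and the toy feature values are all assumptions made for illustration.

```python
import math

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column is treated as uncorrelated
    return cov / (sx * sy)

def select_features(columns, min_variance=1e-3, max_abs_corr=0.95):
    """Return the names of features kept after both filters.

    columns: dict mapping feature name -> list of per-sample values.
    Thresholds are illustrative assumptions, not values from the study.
    """
    # Step 1: variance thresholding drops near-constant features.
    kept = [name for name, vals in columns.items()
            if variance(vals) > min_variance]
    # Step 2: greedily drop a feature that is highly correlated
    # with a feature that has already been selected.
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) <= max_abs_corr
               for s in selected):
            selected.append(name)
    return selected

# Hypothetical per-bin feature values for five samples.
toy = {
    "bin_a": [0.10, 0.20, 0.15, 0.40, 0.35],
    "bin_a_scaled": [0.20, 0.40, 0.30, 0.80, 0.70],  # 2 * bin_a, redundant
    "bin_b": [0.50, 0.50, 0.50, 0.50, 0.50],          # constant
    "bin_c": [0.30, 0.10, 0.40, 0.20, 0.25],
}
selected = select_features(toy)
print(selected)
```

On this toy input the constant feature is removed by the variance filter and the rescaled duplicate by the correlation filter. The surviving features would then be passed to the classifiers, whose learned weights or importances indicate which features drive the prediction.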
