Host-Microbiome Omics Integration for Cancer Analysis and Diagnostics

Investigating the added value of integrating microbial and host omics information for cancer diagnostics using prediction models


Abstract

Cancer is one of the leading causes of death worldwide. Many studies have investigated the development and progression of cancer in human tissues using either host omics data or microbial data, but little research combines both types of data, even though both modalities have been shown to affect cancer morphology and aetiology. Studies that do combine them often use simple methods or do not consider the relation between the two modalities and disease phenotypes. An integrated approach could offer additional insights and lead to the discovery of new disease biomarkers and to better treatment strategies and therapies.

In this paper, we investigated whether such a holo-omic approach offers additional information compared to using the modalities separately. We did so by comparing the performance of prediction models built on the individual and on the integrated modalities for various prediction endpoints. For the host omics modality we used gene expression data from The Cancer Genome Atlas (TCGA); for the microbiome modality we used bacterial genus abundance data from The Cancer Microbiome Atlas (TCMA), which was mined from TCGA.
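The comparison described above, which scores a model on each modality alone and on the combined modalities, can be sketched as follows. This is a minimal illustration only, assuming scikit-learn, synthetic data, early integration by simple feature concatenation, logistic regression as the model, a log1p transform of the abundance counts, and cross-validated ROC AUC as the metric. None of these choices is taken from this work; the function name `compare_modalities` and all data below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_modalities(X_host, X_microbe, y, n_splits=5, seed=0):
    """Hypothetical helper: mean cross-validated ROC AUC for host-only,
    microbe-only and integrated (concatenated) feature sets."""
    feature_sets = {
        "host": X_host,
        "microbe": X_microbe,
        # Assumed early integration: stack the two modalities column-wise.
        "integrated": np.hstack([X_host, X_microbe]),
    }
    # The same stratified folds are reused for every feature set,
    # so the three scores are directly comparable.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, X in feature_sets.items():
        # The scaler sits inside the pipeline, so it is fit on each
        # training fold only (no information leaks from the test fold).
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        scores[name] = float(auc.mean())
    return scores


# Synthetic stand-in data (not TCGA/TCMA): 200 samples, binary endpoint.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
X_host = rng.normal(size=(n, 50))
X_host[:, 0] += 1.5 * y  # one informative synthetic "gene"
# Uninformative synthetic genus counts, log-transformed (an assumed preprocessing step).
X_microbe = np.log1p(rng.poisson(5, size=(n, 20)).astype(float))

scores = compare_modalities(X_host, X_microbe, y)
print(scores)
```

Concatenation is only the simplest integration strategy; intermediate or late integration (for example, combining per-modality model outputs) would slot into the same comparison loop by replacing the `"integrated"` entry.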

We found no improvement when integrating host gene expression with microbial abundance information compared to using the gene expression data alone, and the microbial data provided the least diagnostic information. This is likely due to the information density of the gene expression data, the high variability of the microbiome, and issues with the quantity, specificity and validation of the TCMA data. These results suggest that the holo-omic approach may not provide additional utility in certain contexts and that additional considerations are needed when choosing microbial and host omic datasets for holo-omic integration. They also offer insight into the usability of the TCMA dataset.