The Microbial World Magnified Through the Bright Lens of Comparative Genomics


Abstract

We are witnessing an era of rapid technological advancement that has led to an explosion in the amount of genomic data collected. In parallel, the field of comparative genomics is expanding at an unrelenting rate. Comparative genomics explores the similarities and differences among the genomes of various organisms, species, or strains, and it is one of our most useful tools today for unraveling the complexities of microbial biology. However, despite growing interest in microbial genomics, there remains a significant gap in our understanding of microbial diversity and function. The microbial dark matter remains elusive, and much remains to be uncovered.
This dissertation aims to leverage comparative genomics and to develop novel algorithms tailored to microbial genomes, in order to enhance our understanding of microbial biology and address existing knowledge gaps. More specifically, it focuses on the representation of microbial diversity and the functional annotation of poorly characterized taxa. By harnessing large-scale genomic datasets, I design novel approaches and algorithms to uncover hidden traits in microorganisms.
We begin our journey at the smallest scale, with viruses: our study of SARS-CoV-2 genomes in the Netherlands during the COVID-19 pandemic showcases the power of genomic data for understanding disease dynamics. The remainder of the dissertation concerns bacteria. We explore pangenome graphs as a way to represent bacterial populations. After discussing the limitations of current methods, I propose an ensemble approach that exploits graph representations for structural variant calling. This work sets the stage for future developments in pangenome graphs as a powerful framework for modeling bacterial populations and analyzing their genetic makeup.
Following recent developments in algorithms for eukaryotes, I draw inspiration from natural language processing to predict gene functions in bacteria. I present SAFPred, a novel tool that integrates bacterial synteny into the predictive model, and demonstrate its use in identifying variants of toxin genes in Enterococcus. The novelty of my approach lies partly in how I incorporated bacterial synteny into the function prediction algorithm. Accordingly, I also release our synteny database, SAFPredDB, which can facilitate a variety of comparative genomic analyses in the future. Our journey ends with a study of the Enterococcus genus based on the largest collection of Enterococcus genome assemblies. Here, I once again emphasize the importance of understanding microbial diversity and antibiotic resistance mechanisms, and highlight the power of large-scale genomic analyses.
Overall, my main goal with this dissertation is to showcase the potential of comparative genomics for unraveling the mysteries of microbial life and addressing pressing global challenges in health, agriculture, and biotechnology. Through innovative methods and large-scale data analysis, my work, first and foremost, offers valuable insights into microbial biology and evolution, paving the way for future research in the field. I also hope it encourages further exploration and appreciation of the mighty world of microbes.