Characterizing bacterial genetic diversity

In species' pangenomes and microbial communities

Abstract

Bacteria are everywhere and play essential roles in Earth's diverse ecosystems and human health. For example, humans harbor a complex and essential gut microbial community comprising thousands of bacterial species (in addition to numerous viruses, fungi, and microbial eukaryotes). This community helps break down and synthesize nutrients, trains the immune system, and keeps pathogens at bay. However, imbalances in this community are associated with several diseases, including obesity, inflammatory bowel disease, and recurrent urinary tract infections. Moreover, bacteria can cause deadly infections, and many are developing resistance to our most potent antibiotics. Studying bacteria is thus essential for identifying differences between pathogens and harmless commensals, countering antimicrobial resistance, and understanding their impacts on human health.

To study bacteria, we typically characterize and compare their genomes. The genome comprises all of an organism's hereditary information, and its genes encode the molecular machines necessary for cell function, providing an overview of an organism's capabilities. The hereditary information in genomes additionally enables inferring the organism's evolutionary history, which helps in understanding why specific traits evolved and aids in inferring transmission links during an outbreak.

A challenge with comparing large sets of bacterial genomes is the extensive variation in genome content among many species. For example, two Escherichia coli strains can share as little as 50% of their genes. Current computational tools offer biased or incomplete views of genetic variation among strains. This hinders the identification of genotype-phenotype associations, prevents tracking mobile genetic elements, and limits our understanding of the microbial communities these strains are part of.

The central question of this thesis is how to design computational tools that enable accurate characterization of genetic variation among diverse bacterial genomes. This thesis introduces new algorithms to identify and represent genetic variation using graph data structures. It additionally presents a tool that characterizes strain-specific genetic variation in microbial communities, even in the presence of same-species strain mixtures. Finally, this thesis uses these tools to investigate the role of the gut microbiome in women with recurrent urinary tract infections, offering novel insights into the gut and bladder dynamics of E. coli.

Collectively, we expect this work to contribute to an improved mechanistic understanding of bacteria's role in human health, help track and counter the spread of antimicrobial resistance, and inform the development of microbiome-mediated therapeutics.