Bernhard Haubold

January 04, 2024

For further information on the project, please first read the publications listed below, then refer to https://www.evolbio.mpg.de/16023/group_bioinformatics or contact Bernhard Haubold <haubold@evolbio.mpg.de>.

If you wish to apply for the position, please email Bernhard Haubold a short motivational statement, the names of two referees, and a short CV (biosketch).

Quantifying Pangenomes

The genomes of bacteria consist of a constant core of housekeeping genes and a variable set of accessory genes. Together, the constant and variable parts of a species’ genome make up its pangenome (Domingo-Sananes, 2021). The size of a pangenome grows as we add successive genomes. For example, Figure 1 shows the size of the pangenome of the human pathogen Streptococcus pneumoniae as a function of the number of genomes investigated. Initially, the pangenome contains one full genome, roughly 2.2 Mb. From there it grows to just under 4 Mb. Growth is rapid at first, then slows down, but never reaches a plateau. So the pangenome of S. pneumoniae appears to be open, as opposed to closed: each additional genome still contributes sequence not seen before.
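The growth curve described above can be illustrated with a minimal Python sketch. It assumes that each genome has already been reduced to a set of gene identifiers, which is a simplification: Figure 1 measures the pangenome in base pairs, and the tool envisaged in this project need not work on gene sets at all. The function names and the toy data are hypothetical and serve only to show how the core and the cumulative pangenome size behave as genomes are added.

```python
from typing import List, Set


def pangenome_growth(genomes: List[Set[str]]) -> List[int]:
    """Return the cumulative pangenome size after adding each genome in turn."""
    seen: Set[str] = set()
    sizes: List[int] = []
    for genes in genomes:
        seen |= genes            # the union accumulates every gene seen so far
        sizes.append(len(seen))
    return sizes


def core_genome(genomes: List[Set[str]]) -> Set[str]:
    """Return the genes present in every genome (the constant core)."""
    if not genomes:
        return set()
    return set.intersection(*genomes)


# Toy data (hypothetical): four genomes sharing the core genes "a" and "b".
toy_genomes = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "f", "g"},
]

if __name__ == "__main__":
    print(pangenome_growth(toy_genomes))
    print(sorted(core_genome(toy_genomes)))
```

In this toy example every genome adds at least one new gene, so the curve keeps rising, which is the pattern of an open pangenome. In a closed pangenome the sizes would level off after the first few genomes. Note that the shape of the curve depends on the order in which genomes are added, so in practice one would typically average over several random orderings.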

The first aim of this project is to program a tool for rapidly calculating pangenomes from large sets of bacterial genomes. One possible version of this tool is based on our software for finding unique genomic regions (Haubold et al., 2021), which was used for calculating Figure 1. Once written, the pangenome program will be used to survey the pangenome of any narrowly defined bacterial taxon with at least 100 complete genome sequences.

There is currently a lot of interest in the computation and visualization of pangenomes for humans (Liao et al., 2023). In a second phase of this project we will also investigate the pangenomes of eukaryotes, starting with small genomes like those of yeast and moving up to humans. Here, the recent availability of telomere-to-telomere assemblies makes human genomes, for the first time, as complete as bacterial genomes have been since the 1990s.

This project is aimed at a bioinformatician or a computer scientist interested in genomics.

References

R. M. Domingo-Sananes. Mechanisms that shape microbial pangenomes. Trends in Microbiology, 29:493–503, 2021.

B. Haubold, F. Klötzl, L. Hellberg, D. Thompson, and M. Cavalar. Fur: Find Unique genomic Regions for diagnostic PCR. Bioinformatics, 37:2081–2087, 2021.

W.-W. Liao et al. A draft human pangenome reference. Nature, 617:312–324, 2023.