Bernhard Haubold
Please read the publications listed below first; then refer to https://www.evolbio.mpg.de/16023/group_bioinformatics or contact Bernhard Haubold <haubold@evolbio.mpg.de> for further information on the projects.
To apply for the position(s), please send an email to <imprs-application@evolbio.mpg.de> for the attention of Ms. Ellen Karl of the MPI personnel department. Your application documents must be compiled into a single PDF containing a short motivational statement, a short CV (biosketch), your bachelor's and master's degree certificates and transcripts of records, and contact information for two academic references. Name the PDF as follows: Lastname_Firstname_Lastnamesupervisor_numberofproject.pdf. If you are applying for two projects, send two separate PDFs.
By submitting your application, you accept the processing of your applicant data in accordance with data-protection law. For further information on the legal basis and use of your data, please refer to the MPG privacy policy at https://www.evolbio.mpg.de/3246466/privacy-policy
1. Fast detection of horizontal gene transfer among microbial genomes
Horizontal gene transfer is an important mechanism in the evolution of microbial genomes. It can be detected as local changes in the closest relative along a given strain’s genome. In 2011 we published a program, Alfy, for quickly discovering such changes in local homology among viral and bacterial genomes [1]. Alfy takes as input a query sequence and a set of two or more subject sequences, and returns the query sequence annotated with its closest subjects. The closest subjects are identified from the lengths of the longest matches between the query and the subjects. These match lengths are looked up in an index constructed from all subject sequences simultaneously. The index consumes memory in proportion to the size of the subject set, which scales poorly to the potentially thousands of genomes now available for a bacterial pathogen.
We developed a solution to this problem in the context of a different program, Fur, which finds unique regions in bacterial genomes for constructing diagnostic markers [2]. In Fur, the lengths of longest matches are looked up iteratively, one subject at a time, rather than simultaneously for all subjects. The aim of the proposed PhD project is to construct an iterative, and hence more memory-efficient, version of Alfy and to use it to characterize horizontal gene transfer among the genomes of a wide range of bacterial taxa.
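The idea of looking up match lengths one subject at a time can be illustrated with a toy sketch. This is not the implementation of Alfy or Fur: match lengths are computed here by naive substring search instead of from an index, and the function names `match_lengths` and `closest_subjects` are hypothetical. The point is that only one subject is handled at any moment, while a running maximum per query position records which subject(s) are closest.

```python
def match_lengths(query: str, subject: str) -> list[int]:
    """For each query position i, return the length of the longest prefix of
    query[i:] that occurs anywhere in subject (naive search, toy-sized inputs only)."""
    lengths = []
    for i in range(len(query)):
        k = 0
        # Extend the match while query[i:i+k+1] still occurs in the subject.
        while i + k < len(query) and query[i:i + k + 1] in subject:
            k += 1
        lengths.append(k)
    return lengths


def closest_subjects(query: str, subjects: dict[str, str]):
    """Process subjects one at a time, keeping for each query position the
    longest match length seen so far and the subject(s) that attain it."""
    best_len = [0] * len(query)
    best_ids = [set() for _ in query]
    for sid, seq in subjects.items():
        # Only the current subject is examined here; earlier ones are summarized
        # by best_len/best_ids, so memory does not grow with the subject set.
        for i, k in enumerate(match_lengths(query, seq)):
            if k > best_len[i]:
                best_len[i] = k
                best_ids[i] = {sid}
            elif k == best_len[i]:
                best_ids[i].add(sid)
    return best_len, best_ids


best_len, best_ids = closest_subjects("ACGTTTGCA", {"s1": "ACGTAAAA", "s2": "GGTTTGCA"})
print(best_len)                        # [4, 3, 7, 6, 5, 4, 3, 2, 1]
print([sorted(s) for s in best_ids])   # s1 closest at the start, then s2
```

In this made-up example the closest subject switches from `s1` to `s2` after the second position, which is the kind of local change in the closest relative that would point to a possible transfer event.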
This project is suitable for bioinformaticians, computer scientists with a strong interest in biology, and biologists with a strong interest in computing.
References
[1] M. Domazet-Lošo and B. Haubold. Alignment-free detection of local similarity among viral and bacterial genomes. Bioinformatics, 27:1466–1472, 2011.
[2] B. Vieira Mourato, I. Tsers, S. Denker, F. Klötzl, and B. Haubold. Marker discovery in the large. Bioinformatics Advances, 4:vbae113, 2024.
2. Phytax—phylogenetically correct taxonomy of genome sequences
Every entry in GenBank is associated with a taxon in the GenBank taxonomy. With its 2.7 million taxa, the GenBank taxonomy is thus the taxonomy of all sequenced life. The most comprehensive entries in GenBank are whole-genome assemblies, of which there are currently 3 million. Many researchers wish to use these genome sequences to construct diagnostic markers like those used in the PCR tests for COVID-19. Diagnostic markers are best discovered by comparing the genomes of the target organism to those of its closest relatives. This works as long as the taxonomy of the target matches its phylogeny. However, we recently found that in a sample of 114 diverse bacterial taxa, 39 had mixed taxonomy when compared to their closest relatives [1]. This required us to reclassify genomes before proceeding with marker discovery. The aim of this project is to reclassify all bacterial genomes into phylogenetically correct taxa at the species level and below, to facilitate the construction of diagnostic markers from genome sequences.
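What it means for a taxon's labels to disagree with phylogeny can be illustrated with a deliberately simple check. This is a hypothetical criterion chosen for illustration, not the method used in [1] or planned for this project: a taxon is treated as coherent only if the largest distance between two of its genomes is smaller than the smallest distance from any of its genomes to a genome outside it. The genome names, labels, and distances below are invented.

```python
from itertools import combinations


def is_coherent(taxon, labels, dist):
    """Hypothetical check: True if all genomes labeled `taxon` are closer to
    each other than any of them is to a genome with a different label."""
    members = [g for g, t in labels.items() if t == taxon]
    others = [g for g, t in labels.items() if t != taxon]
    # Largest pairwise distance inside the taxon.
    within = max((dist[frozenset(p)] for p in combinations(members, 2)), default=0.0)
    # Smallest distance from a member to any genome outside the taxon.
    between = min((dist[frozenset((a, b))] for a in members for b in others),
                  default=float("inf"))
    return within < between


# Invented toy data: g3 is labeled B but lies much closer to the A genomes.
labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
dist = {frozenset(k): v for k, v in [
    (("g1", "g2"), 0.01), (("g3", "g4"), 0.30),
    (("g1", "g3"), 0.02), (("g1", "g4"), 0.40),
    (("g2", "g3"), 0.02), (("g2", "g4"), 0.40),
]}
print(is_coherent("A", labels, dist))  # True
print(is_coherent("B", labels, dist))  # False: taxon B is mixed
```

A criterion this strict ignores tree structure and would flag many taxa in real data; it only shows why markers built from the labeled genomes of taxon B would compare the wrong sets of sequences.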
The results of this reclassification will be made available through a public web server, Phytax (“phylogenetically correct taxonomy”). Given a taxon ID, Phytax returns the phylogenetically correct genome accessions associated with it. This allows anyone interested in marker discovery, or in other genomics work on closely related taxa, to quickly retrieve accurate sets of genome accessions.
This project is suitable for a bioinformatician, a computer scientist with a strong interest in biology, or a biologist with a strong interest in computing.
References
[1] B. Vieira Mourato, I. Tsers, S. Denker, F. Klötzl, and B. Haubold. Marker discovery in the large. Bioinformatics Advances, 4:vbae113, 2024.