Please refer to https://www.evolbio.mpg.de/5814/group_bioinformatics or contact Bernhard for further information on the project: firstname.lastname@example.org
Fast Multiple Sequence Alignment
During the past two decades, sequence production in genomics has threatened to outstrip the capacity of sequence analysis tools. This has led to the development of a new generation of computer programs, to which we recently contributed phylonium, a program for fast and accurate distance computation between all members of large genome samples (Klötzl and Haubold, 2020).
The distances can then be used to reconstruct the sample’s phylogeny. The algorithm underlying phylonium is based on fast string indexing using suffix trees and their simpler cousins, suffix arrays. Based on such an index, phylonium computes an approximate multiple sequence alignment and then all pairwise distances as the number of mutations per site. However, the underlying multiple sequence alignment can be used for other types of analyses besides distance computation.
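To make the step from alignment to distances concrete, here is a minimal Python sketch. It is not phylonium's code. It assumes the alignment is already given as equal-length strings with "-" marking gaps, and the function names are hypothetical. The Jukes-Cantor correction used to turn the mismatch fraction into an estimate of mutations per site is an assumption made for this illustration.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatches among columns where neither sequence has a gap."""
    compared = 0
    mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site (assumed model)."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """All pairwise distances for a dict mapping name -> aligned sequence."""
    names = list(alignment)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = jukes_cantor(p_distance(alignment[n], alignment[m]))
        dist[n][m] = dist[m][n] = d
    return dist


# Toy example with made-up sequences.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGT-CGA"}
matrix = distance_matrix(toy)
```

The resulting matrix is symmetric with zeros on the diagonal and could be passed to a distance-based tree method such as neighbor joining.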
The aim of this project is to explore and implement some of these novel applications, including bootstrap for phylogeny reconstruction and population genetic analyses such as the detection of population structure and selection. The project is suitable for a computer scientist interested in genomics.
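As an illustration of the bootstrap application, the following Python sketch resamples the columns of an alignment with replacement, which is the standard nonparametric bootstrap for phylogenies. It assumes the alignment is held as a dict of equal-length strings. The function name is hypothetical, and how this would be done on phylonium's approximate alignment is exactly what the project would need to work out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Toy example with made-up sequences and a fixed seed for reproducibility.
aln = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAACGA"}
rng = random.Random(1)
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
```

Each replicate would then be turned into a distance matrix and a tree, and the support for a branch taken as the fraction of replicate trees that contain it.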
Fabian Klötzl and Bernhard Haubold. Phylonium: fast estimation of evolutionary distances from large samples of similar genomes. Bioinformatics, 36:2040–46, 2020.