ALISCORE - Masking of Multiple Sequence Alignments

Quick facts

Project title:

ALISCORE - Masking of Multiple Sequence Alignments

ZFMK Project lead:

Prof. Dr. Bernhard Misof

Unit:

Directorate, Chair Systematic Zoology, Algorithmic Development, Directorate Staff, Biodiversity Informatics

Description

Random similarity of sequences or sequence sections can impede phylogenetic analyses and the identification of gene homologies. In addition, randomly similar sequences or ambiguously aligned sequence sections can interfere with the estimation of substitution model parameters. Phylogenomic studies have shown that biases in model estimation and tree reconstruction do not disappear even with large datasets. In fact, these biases can become more pronounced with more data. It is therefore important to identify possible random similarity within sequence alignments before model estimation and tree reconstruction.

Different approaches have already been suggested to identify and treat problematic alignment sections, such as GBLOCKS or noisy. We propose an alternative method that identifies random similarity within multiple sequence alignments based on Monte Carlo resampling within a sliding window. The method infers similarity profiles from pairwise sequence comparisons and subsequently calculates a consensus profile. Consensus profiles thus identify the dominating pattern, either non-random similarity or randomness, within sections of a multiple sequence alignment. The method therefore appears to be a powerful tool for identifying possible biases in tree reconstruction or gene identification.

The approach has been extended to amino acid and nucleotide data. Together with Dr. Patrick Kück and Sandra Meid (both ZFMK), it is currently being developed further to visualize total randomness among the sequences of a multiple sequence alignment.
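The general idea of a sliding-window Monte Carlo test followed by a consensus profile can be illustrated with a minimal Python sketch. This is not the ALISCORE implementation or its scoring scheme: the function names (`pair_profile`, `consensus_profile`), the simple match-count score, the window size, the number of resamples, the 95% cutoff, and the use of the median as consensus are all assumptions chosen for illustration.

```python
import random
from itertools import combinations
from statistics import median


def window_matches(a, b):
    """Count identical positions between two equally long sequence windows."""
    return sum(x == y for x, y in zip(a, b))


def pair_profile(seq_a, seq_b, window=6, n_resamples=200, quantile=0.95, rng=None):
    """Score each window of one aligned pair as +1 (non-random) or -1 (random-like).

    Assumption: the null distribution is built by drawing residues at random
    (with replacement) from each sequence's own composition.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    pool_a, pool_b = list(seq_a), list(seq_b)

    # Monte Carlo null distribution of match counts for random windows.
    null = sorted(
        window_matches(rng.choices(pool_a, k=window), rng.choices(pool_b, k=window))
        for _ in range(n_resamples)
    )
    threshold = null[min(int(quantile * n_resamples), n_resamples - 1)]

    # A window counts as non-random only if it beats the null threshold.
    return [
        1 if window_matches(seq_a[i:i + window], seq_b[i:i + window]) > threshold else -1
        for i in range(len(seq_a) - window + 1)
    ]


def consensus_profile(alignment, **kwargs):
    """Combine all pairwise profiles into one profile (here: per-window median)."""
    profiles = [pair_profile(a, b, **kwargs) for a, b in combinations(alignment, 2)]
    return [median(column) for column in zip(*profiles)]


if __name__ == "__main__":
    toy_alignment = ["ACGT" * 6, "ACGT" * 6, "CGTA" * 6]  # made-up sequences
    print(consensus_profile(toy_alignment))
```

In this sketch, windows with a negative consensus score would be candidates for masking, while positive scores indicate that most pairwise comparisons show more similarity than expected by chance. Real data would additionally require handling of gaps and ambiguity codes, which the sketch ignores.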