The Leibniz Institute for the Analysis of Biodiversity Change is a research museum of the Leibniz Association




Quick facts

Project title: Selection of Optimal Subsets from Concatenated Supermatrices - MARE software
ZFMK Project lead: 


In phylogenomics, character matrices with extensive missing data are frequently used. These missing data can have detrimental effects on the accuracy and robustness of tree inference.

Therefore, many investigators select taxa and genes with high data coverage. The drawback of such selections is that they rely exclusively on data coverage and do not consider the actual signal in the data. Simply selecting taxa and genes with high data coverage might thus not deliver data matrices with optimal signal. As an alternative, we have developed a heuristic that

(1) assesses the information content of genes in supermatrices using a measure of tree-likeness combined with data coverage, and

(2) reduces supermatrices with a simple hill-climbing procedure to matrices with high total information content.
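
The two steps above can be illustrated with a minimal sketch. It is not the MARE implementation: the objective function, the weighting parameter `lam`, and the names `score` and `hill_climb` are assumptions made for illustration. Each cell of the input matrix is assumed to already hold a per-taxon, per-gene information value between 0 and 1 (0 for missing data), standing in for the tree-likeness and coverage measure of step (1).

```python
def score(matrix, rows, cols, lam=0.5):
    """Hypothetical objective: mean information of the retained cells,
    weighted by the retained fraction of the matrix so that the search
    does not shrink the matrix to a single cell."""
    if not rows or not cols:
        return 0.0
    total = sum(matrix[r][c] for r in rows for c in cols)
    n_kept = len(rows) * len(cols)
    n_all = len(matrix) * len(matrix[0])
    return (total / n_kept) * (n_kept / n_all) ** lam


def hill_climb(matrix, lam=0.5):
    """Greedy reduction: at each step drop the single taxon (row) or gene
    (column) whose removal raises the score most; stop when nothing helps."""
    rows = set(range(len(matrix)))
    cols = set(range(len(matrix[0])))
    best = score(matrix, rows, cols, lam)
    while True:
        candidates = [(score(matrix, rows - {r}, cols, lam), "row", r)
                      for r in sorted(rows)]
        candidates += [(score(matrix, rows, cols - {c}, lam), "col", c)
                       for c in sorted(cols)]
        new_best, kind, index = max(candidates)
        if new_best <= best:
            break  # local optimum reached
        best = new_best
        if kind == "row":
            rows.discard(index)
        else:
            cols.discard(index)
    return sorted(rows), sorted(cols), best


# Toy matrix (made-up values): rows are taxa, columns are genes.
info = [
    [0.9, 0.8, 0.0],
    [0.7, 0.9, 0.0],
    [0.0, 0.1, 0.0],
]
kept_taxa, kept_genes, best = hill_climb(info)
print(kept_taxa, kept_genes, round(best, 3))
```

On this toy matrix the search first drops the uninformative third gene, then the poorly covered third taxon, and stops with taxa 0 and 1 and genes 0 and 1. Like any hill-climbing procedure, it finds a local optimum only, and the result depends on how information content and matrix size are traded off in the objective.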

Selecting a data subset with the proposed approach increased the chance of recovering correct partial trees more than 10-fold.

Our simulations and analyses of empirical data demonstrate that formal approaches can select better data subsets than simply choosing taxa and genes with high data coverage. We are further developing this approach into a hypothesis-driven selection of an optimal concatenated supermatrix.


Contact person

Chair "Systematic Zoology"
+49 228 9122-200
+49 228 9122-202
b.misof [at]