Constructing Phylogenies under the Rich Data Hypothesis

Bonnie Kirkpatrick, University of California, Berkeley

Determining phylogenetic relationships is of central importance to biology. Not only are the phylogenies themselves interesting, but these methods are useful for predicting functional and structural similarities between genes, aligning multiple sequences, and determining the haplotypes compatible with a given genotype. Characters of phylogenetic trees are functions that map the species at the leaves of the tree to some measurable character state, usually nucleotides or morphological features. In many cases, the data for the characters state are incomplete because some of the taxa have unobserved characters. Particularly in the case of morphological and haplotype data, biologists observe characters whose values are not measured across all taxa. Many problems in phylogeny are NP-complete; these include the problems of determining character compatibility with four taxa, partial character compatibility of binary characters, quartet compatibility, and maximum character compatibility [2]. In restricted cases, fast algorithms exist. For full binary characters and quartets that all share a root, polynomial time algorithms exist. We discuss extensions to the Rich Data Hypothesis discovered by Halperin and Karp. Under this hypothesis, they found a near-linear time algorithm for inferring a perfect phylogeny from partial binary character data with a specific structure [1]. Here we consider extending the Rich Data Hypothesis to non-binary characters.


[1] E. Halperin and R. Karp. Perfect phylogeny and haplotype assignment. In RECOMB ’04: Proceedings of the eighth annual international conference on Computational molecular biology, pages 10–19, New York, NY, USA, 2004. ACM Press.


[2] M. Steel. The complexity of reconstructing trees from qualitative characters and subtrees. Journal of Classification, 1992:91–116, 1992.

Abstract Author(s): Bonnie Kirkpatrick and Richard Karp