
Biology Direct 2012
Cubic time algorithms of amalgamating gene trees and building evolutionary scenariosKeywords: Phylogenetics , Fast algorithms , Tree inference , Species tree , Tree amalgamation , Tree reconciliation , Supertree , Evolutionary events , Gene duplication , Gene loss , Horizontal gene transfer , Gene gain , Time slices Abstract: Background A long recognized problem is the inference of the supertree S that amalgamates a given set {Gj} of trees Gj, with leaves in each Gj being assigned homologous elements. We ground on an approach to find the tree S by minimizing the total cost of mappings αj of individual gene trees Gj into S. Traditionally, this cost is defined basically as a sum of duplications and gaps in each αj. The classical problem is to minimize the total cost, where S runs over the set of all trees that contain an exhaustive nonredundant set of species from all input Gj. Results We suggest a reformulation of the classical NPhard problem of building a supertree in terms of the global minimization of the same cost functional but only over species trees S that consist of clades belonging to a fixed set P (e.g., an exhaustive set of clades in all Gj). We developed a deterministic solving algorithm with a low degree polynomial (typically cubic) time complexity with respect to the size of input data. We define an extensive set of elementary evolutionary events and suggest an original definition of mapping β of tree G into tree S. We introduce the cost functional c(G, S, f ) and define the mapping β as the global minimum of this functional with respect to the variable f, in which sense it is a generalization of classical mapping α. We suggest a reformulation of the classical NPhard mapping (reconciliation) problem by introducing time slices into the species tree S and present a cubic time solving algorithm to compute the mapping β. We introduce two novel definitions of the evolutionary scenario based on mapping β or a random process of gene evolution along a species tree. Conclusions Developed algorithms are mathematically proved, which justifies the following statements. The supertree building algorithm finds exactly the global minimum of the total cost if only gene duplications and losses are allowed and the given sets of gene trees satisfies a certain condition. The mapping algorithm finds exactly the minimal mapping β, the minimal total cost and the evolutionary scenario as a minimum over all possible distributions of elementary evolutionary events along the edges of tree S. The algorithms and their effective software implementations provide useful tools in many biological studies. They facilitate processing of voluminous tree data in acceptable time still largely avoiding heuristics. Performance of the tools is tested with artificial and prokaryotic tree data. Reviewers This article was reviewed by Prof. Anthony Almudevar, Prof. Alexander Bolshoy (nominated by
