Abstract:
The subtree prune-and-regraft (SPR) distance metric is a fundamental way of comparing evolutionary trees. It has wide-ranging applications, such as to study lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the rooted version of SPR distance can be computed relatively efficiently between rooted trees using fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF formulation is known for the unrooted case. Correspondingly, previous algorithms are unable to compute unrooted SPR distances larger than 7. In this paper, we substantially advance understanding of and computational algorithms for the unrooted SPR distance. First we identify four properties of minimal SPR paths, each of which suggests that no MAF formulation exists in the unrooted case. We then prove the 2008 conjecture of Hickey et al. that chain reduction preserves the unrooted SPR distance. This reduces the problem to a linear size problem kernel, substantially improving on the previous best quadratic size kernel. Then we introduce a new lower bound on the unrooted SPR distance called the replug distance that is amenable to MAF methods, and give an efficient fixed-parameter algorithm for calculating it. Finally, we develop a "progressive A*" search algorithm using multiple heuristics, including the TBR and replug distances, to exactly compute the unrooted SPR distance. Our algorithm is nearly two orders of magnitude faster than previous methods on small trees, and allows computation of unrooted SPR distances as large as 14 on trees with 50 leaves.

Abstract:
In this note, we present a counterexample to Hill et al's conjecture and subsequently show that a modified version of their conjecture holds.While Hill et al's conjecture may result in an overestimate of the rooted subtree prune and regraft distance, a slightly more restricted version of their approach gives the desired outcome and can be applied to speed up the exact calculation of this distance between two phylogenies.In recent years, one of the main research foci in the development of theoretical frameworks that aim at approaching questions in evolutionary biology turns from the reconstruction of phylogenetic trees towards the reconstruction of phylogenetic networks. This has partly been triggered by the exponentially growing amount of available sequence data arising from whole genome sequencing projects and a successive detection of genes whose sequences are chimeras of distinct ancestral gene sequences, and hence, are likely to be the result of reticulation (e.g. horizontal gene transfer or hybridization). Although evolutionary biologists are now mostly acknowledging the existence of species arising from reticulation within certain groups of organisms, the extent to which such events have influenced the evolutionary history for a set of present-day species remains controversially discussed until today. To shed light on this question, Hill et al. [1] recently published a study that is centered around the identification and quantification of horizontal gene transfer. The authors have implemented a new software package--called SPRIT--consisting of a heuristic as well as an exact algorithm, applied it to several data sets of variable size, and compared their results and running times with those obtained from other algorithms that have previously been developed to analyze reticulate evolution.Algorithmically, SPRIT draws on ideas that are borrowed from work that has been done in the context of the graph-theoretic operation of rooted subtree prune and regraft (rSPR)

Abstract:
We use Dirichlet form methods to construct and analyze a reversible Markov process, the stationary distribution of which is the Brownian continuum random tree. This process is inspired by the subtree prune and regraft (SPR) Markov chains that appear in phylogenetic analysis. A key technical ingredient in this work is the use of a novel Gromov--Hausdorff type distance to metrize the space whose elements are compact real trees equipped with a probability measure. Also, the investigation of the Dirichlet form hinges on a new path decomposition of the Brownian excursion.

Abstract:
Statistical phylogenetic inference methods use tree rearrangement operations to perform either hill-climbing local search or Markov chain Monte Carlo across tree topologies. The canonical class of such moves are the subtree-prune-regraft (SPR) moves that remove a subtree and reattach it somewhere else via the cut edge of the subtree. Phylogenetic trees and such moves naturally form the vertices and edges of a graph, such that tree search algorithms perform a (potentially stochastic) traversal of this SPR graph. Despite the centrality of such graphs in phylogenetic inference, rather little is known about their large-scale properties. In this paper we learn about the rooted-tree version of the graph, known as the rSPR graph, by calculating the Ricci-Ollivier curvature for pairs of vertices in the rSPR graph with respect to two simple random walks on the rSPR graph. By proving theorems and direct calculation with novel algorithms, we find a remarkable diversity of different curvatures on the rSPR graph for pairs of vertices separated by the same distance. We confirm using simulation that degree and curvature have the expected impact on mean access time distributions, demonstrating relevance of these curvature results to stochastic tree search. This indicates significant structure of the rSPR graph beyond that which was previously understood in terms of pairwise distances and vertex degrees; a greater understanding of curvature could ultimately lead to improved strategies for tree search.

Abstract:
Within the field of phylogenetics there is great interest in distance measures to quantify the dissimilarity of two trees. Here, based on an idea of Bruen and Bryant, we propose and analyze a new distance measure: the Maximum Parsimony (MP) distance. This is based on the difference of the parsimony scores of a single character on both trees under consideration, and the goal is to find the character which maximizes this difference. In this article we show that this new distance is a metric and provides a lower bound to the well-known Subtree Prune and Regraft (SPR) distance. We also show that to compute the MP distance it is sufficient to consider only characters that are convex on one of the trees, and prove several additional structural properties of the distance. On the complexity side, we prove that calculating the MP distance is in general NP-hard, and identify an interesting island of tractability in which the distance can be calculated in polynomial time.

Abstract:
The subtree prune and regraft distance (dSPR) between phylogenetic trees is important both as a general means of comparing phylogenetic tree topologies as well as a measure of lateral gene transfer (LGT). Although there has been extensive study on the computation of dSPR and similar metrics between rooted trees, much less is known about SPR distances for unrooted trees, which often arise in practice when the root is unresolved. We show that unrooted SPR distance computation is NP-Hard and verify which techniques from related work can and cannot be applied. We then present an efficient heuristic algorithm for this problem and benchmark it on a variety of synthetic datasets. Our algorithm computes the exact SPR distance between unrooted tree, and the heuristic element is only with respect to the algorithm’s computation time. Our method is a heuristic version of a fixed parameter tractability (FPT) approach and our experiments indicate that the running time behaves similar to FPT algorithms. For real data sets, our algorithm was able to quickly compute dSPR for the majority of trees that were part of a study of LGT in 144 prokaryotic genomes. Our analysis of its performance, especially with respect to searching and reduction rules, is applicable to computing many related distance measures.

Abstract:
Three standard subtree transfer operations for binary trees, used in particular for phylogenetic trees, are: tree bisection and reconnection ($TBR$), subtree prune and regraft ($SPR$) and rooted subtree prune and regraft ($rSPR$). For a pair of leaf-labelled binary trees with $n$ leaves, the maximum number of such moves required to transform one into the other is $n-\Theta(\sqrt{n})$, extending a result of Ding, Grunewald and Humphries. We show that if the pair is chosen uniformly at random, then the expected number of moves required to transfer one into the other is $n-\Theta(n^{2/3})$. These results may be phrased in terms of agreement forests: we also give extensions for more than two binary trees.

Abstract:
We present new and improved fixed-parameter algorithms for computing maximum agreement forests (MAFs) of pairs of rooted binary phylogenetic trees. The size of such a forest for two trees corresponds to their subtree prune-and-regraft distance and, if the agreement forest is acyclic, to their hybridization number. These distance measures are essential tools for understanding reticulate evolution. Our algorithm for computing maximum acyclic agreement forests is the first depth-bounded search algorithm for this problem. Our algorithms substantially outperform the best previous algorithms for these problems.

Abstract:
We present efficient algorithms for computing a maximum agreement forest (MAF) of a pair of multifurcating (nonbinary) rooted trees. Our algorithms match the running times of the currently best algorithms for the binary case. The size of an MAF corresponds to the subtree prune-and-regraft (SPR) distance of the two trees and is intimately connected to their hybridization number. These distance measures are essential tools for understanding reticulate evolution, such as lateral gene transfer, recombination, and hybridization. Multifurcating trees arise naturally as a result of statistical uncertainty in current tree construction methods.

Abstract:
We give a 2-approximation algorithm for the Maximum Agreement Forest problem on two rooted binary trees. This NP-hard problem has been studied extensively in the past two decades, since it can be used to compute the Subtree Prune-and-Regraft (SPR) distance between two phylogenetic trees. Our result improves on the very recent 2.5-approximation algorithm due to Shi, Feng, You and Wang (2015). Our algorithm is the first approximation algorithm for this problem that uses LP duality in its analysis.