Abstract:
Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Although they often work well in practice, existing supertree approaches use optimality criteria that do not reflect underlying processes, have known biases and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and > 50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of simulated benchmark datasets, we show that SPR supertrees are more similar to correct species histories under plausible rates of LGT than supertrees based on parsimony or Robinson-Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera; a small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT.
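The SPR supertree criterion scores a candidate supertree against every input tree. Below is a minimal sketch of that scoring loop. It assumes trees are nested Python tuples with string leaf labels, and the helpers `leaves`, `restrict`, `clusters`, `rooted_rf` and `supertree_score` are hypothetical names. Gene trees can cover different taxa, so the supertree is first restricted to each gene tree's taxon set. The `distance` argument is where an rSPR computation would go. Here a cluster-based rooted Robinson-Foulds count is plugged in only as a cheap stand-in, since it is one of the criteria the abstract compares against.

```python
def leaves(t):
    """Set of leaf labels in a nested-tuple tree."""
    if isinstance(t, str):
        return {t}
    return set().union(*(leaves(c) for c in t))


def restrict(t, keep):
    """Restrict t to the taxa in keep, suppressing nodes left with one child."""
    if isinstance(t, str):
        return t if t in keep else None
    kids = [k for k in (restrict(c, keep) for c in t) if k is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return tuple(kids)


def clusters(t):
    """Leaf sets below every internal node (the clusters of a rooted tree)."""
    if isinstance(t, str):
        return set()
    out = {frozenset(leaves(t))}
    for c in t:
        out |= clusters(c)
    return out


def rooted_rf(a, b):
    """Stand-in distance: size of the symmetric difference of cluster sets."""
    return len(clusters(a) ^ clusters(b))


def supertree_score(supertree, gene_trees, distance):
    """Sum of distances between each gene tree and the supertree restricted
    to that gene tree's taxa (lower is better)."""
    return sum(distance(restrict(supertree, leaves(g)), g) for g in gene_trees)
```

Swapping `rooted_rf` for an rSPR distance function gives the SPR criterion. The iterative search over candidate supertrees is not shown.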

Abstract:
We present new and improved fixed-parameter algorithms for computing maximum agreement forests (MAFs) of pairs of rooted binary phylogenetic trees. The number of components in such a forest for two trees, minus one, equals their rooted subtree prune-and-regraft distance and, if the agreement forest is acyclic, their hybridization number. These distance measures are essential tools for understanding reticulate evolution. Our algorithm for computing maximum acyclic agreement forests is the first depth-bounded search algorithm for this problem. Our algorithms substantially outperform the best previous algorithms for these problems.
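To make the quantity these algorithms compute concrete, here is a brute-force reference. It is a minimal sketch that finds the rooted SPR distance by breadth-first search over single rSPR moves. This is not the MAF-based fixed-parameter algorithm described above; the state space grows exponentially, so it is only usable on a handful of leaves. Trees are assumed to be nested 2-tuples with string leaves, and all function names are hypothetical.

```python
from collections import deque


def canon(t):
    """Canonical form so that (a, b) and (b, a) compare equal."""
    if isinstance(t, str):
        return t
    a, b = canon(t[0]), canon(t[1])
    return (a, b) if repr(a) <= repr(b) else (b, a)


def nodes(t, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, t
    if isinstance(t, tuple):
        for i, child in enumerate(t):
            yield from nodes(child, path + (i,))


def prune(t, path):
    """Remove the subtree at a non-empty path and suppress its parent."""
    i = path[0]
    if len(path) == 1:
        return t[1 - i]  # the sibling takes the parent's place
    kids = list(t)
    kids[i] = prune(t[i], path[1:])
    return tuple(kids)


def replace(t, path, new):
    """Return a copy of t with the node at path replaced by new."""
    if not path:
        return new
    kids = list(t)
    kids[path[0]] = replace(t[path[0]], path[1:], new)
    return tuple(kids)


def spr_neighbors(t):
    """Trees one rSPR move away: prune any non-root subtree and regraft it
    onto any edge of the remainder, including the edge above its root."""
    out = set()
    for path, sub in nodes(t):
        if not path:
            continue
        rest = prune(t, path)
        for target, node in nodes(rest):
            out.add(canon(replace(rest, target, (node, sub))))
    out.discard(canon(t))
    return out


def rspr_distance(t1, t2):
    """Breadth-first search over rSPR moves (exponential; tiny trees only)."""
    start, goal = canon(t1), canon(t2)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        if t == goal:
            return dist[t]
        for n in spr_neighbors(t):
            if n not in dist:
                dist[n] = dist[t] + 1
                queue.append(n)
    raise ValueError("trees must have the same leaf set")
```

A brute-force baseline like this can serve as a correctness check for a faster MAF implementation on small random trees.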

Abstract:
We present efficient algorithms for computing a maximum agreement forest (MAF) of a pair of multifurcating (nonbinary) rooted trees. Our algorithms match the running times of the currently best algorithms for the binary case. The size of an MAF corresponds to the subtree prune-and-regraft (SPR) distance of the two trees and is intimately connected to their hybridization number. These distance measures are essential tools for understanding reticulate evolution, such as lateral gene transfer, recombination, and hybridization. Multifurcating trees arise naturally as a result of statistical uncertainty in current tree construction methods.
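Multifurcations of this kind are typically produced by collapsing poorly supported edges. Here is a minimal sketch of that preprocessing step. It assumes an internal node is a `(support, children)` pair, a leaf is a string, and a user-chosen support threshold applies. The representation, the threshold and the name `collapse` are all hypothetical and not taken from the abstract.

```python
def collapse(node, threshold):
    """Contract every non-root internal edge whose support is below threshold,
    splicing the child's children directly into its parent."""
    if isinstance(node, str):
        return node
    support, children = node
    new_children = []
    for child in children:
        c = collapse(child, threshold)  # collapse bottom-up first
        if not isinstance(c, str) and c[0] < threshold:
            new_children.extend(c[1])  # weak edge: promote grandchildren
        else:
            new_children.append(c)
    return (support, tuple(new_children))
```

The root's own support value is never tested, because it has no parent edge to contract.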

Abstract:
The subtree prune-and-regraft (SPR) distance metric is a fundamental way of comparing evolutionary trees. It has wide-ranging applications, such as studying lateral genetic transfer, viral recombination, and Markov chain Monte Carlo phylogenetic inference. Although the rooted version of SPR distance can be computed relatively efficiently between rooted trees using fixed-parameter-tractable maximum agreement forest (MAF) algorithms, no MAF formulation is known for the unrooted case. Correspondingly, previous algorithms are unable to compute unrooted SPR distances larger than 7. In this paper, we substantially advance understanding of and computational algorithms for the unrooted SPR distance. First, we identify four properties of minimal SPR paths, each of which suggests that no MAF formulation exists in the unrooted case. We then prove the 2008 conjecture of Hickey et al. that chain reduction preserves the unrooted SPR distance. This reduces the problem to a linear-size problem kernel, substantially improving on the previous best quadratic-size kernel. Then we introduce a new lower bound on the unrooted SPR distance, called the replug distance, that is amenable to MAF methods, and give an efficient fixed-parameter algorithm for calculating it. Finally, we develop a "progressive A*" search algorithm using multiple heuristics, including the TBR and replug distances, to exactly compute the unrooted SPR distance. Our algorithm is nearly two orders of magnitude faster than previous methods on small trees, and allows computation of unrooted SPR distances as large as 14 on trees with 50 leaves.
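The key idea of combining several lower bounds in an A* search can be shown with a plain A* over unit-cost moves. Its heuristic is the maximum of several admissible lower bounds, and a maximum of admissible bounds is itself admissible. This is a minimal sketch of standard A*, not the paper's "progressive" variant. The callables `neighbors` and `heuristics` are hypothetical placeholders for an SPR-move generator and for bounds such as the TBR or replug distance.

```python
import heapq
import itertools


def a_star(start, is_goal, neighbors, heuristics):
    """Length of a shortest unit-cost path from start to a goal state, or None.

    Each function in `heuristics` must be an admissible lower bound on the
    remaining distance; their maximum is then also admissible."""
    def h(s):
        return max((f(s) for f in heuristics), default=0)

    tie = itertools.count()  # tie-breaker so states never need to be compared
    frontier = [(h(start), next(tie), 0, start)]
    best = {start: 0}
    while frontier:
        _, _, g, s = heapq.heappop(frontier)
        if g > best.get(s, float("inf")):
            continue  # stale queue entry
        if is_goal(s):
            return g
        for n in neighbors(s):
            ng = g + 1
            if ng < best.get(n, float("inf")):
                best[n] = ng
                heapq.heappush(frontier, (ng + h(n), next(tie), ng, n))
    return None
```

A tighter combined bound prunes more of the search, which is why several heuristics are evaluated per state.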

Abstract:
Statistical phylogenetic inference methods use tree rearrangement operations to perform either hill-climbing local search or Markov chain Monte Carlo across tree topologies. The canonical class of such moves consists of the subtree prune-and-regraft (SPR) moves, which remove a subtree and reattach it elsewhere via the subtree's cut edge. Phylogenetic trees and such moves naturally form the vertices and edges of a graph, so that tree search algorithms perform a (potentially stochastic) traversal of this SPR graph. Despite the centrality of such graphs in phylogenetic inference, rather little is known about their large-scale properties. In this paper, we study the rooted-tree version of the graph, known as the rSPR graph, by calculating the Ricci-Ollivier curvature for pairs of vertices in the rSPR graph with respect to two simple random walks on it. Through proven theorems and direct calculation with novel algorithms, we find a remarkable diversity of curvatures on the rSPR graph for pairs of vertices separated by the same distance. Using simulation, we confirm that degree and curvature have the expected impact on mean access time distributions, demonstrating the relevance of these curvature results to stochastic tree search. This indicates significant structure in the rSPR graph beyond what was previously understood in terms of pairwise distances and vertex degrees; a greater understanding of curvature could ultimately lead to improved tree search strategies.
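For reference, the Ollivier-Ricci curvature of two distinct vertices $x$ and $y$ is usually defined as shown below. Here $d$ is the graph distance (for the rSPR graph, the rSPR distance), $m_x$ is the one-step distribution of a random walk started at $x$, and $\Pi(m_x, m_y)$ is the set of couplings, i.e. joint distributions with marginals $m_x$ and $m_y$. The uniform-neighbor walk shown is one common choice. The two walks used in the paper may differ, for example by a lazy variant that keeps some mass at $x$.

```latex
\kappa(x, y) = 1 - \frac{W_1(m_x, m_y)}{d(x, y)},
\qquad
W_1(m_x, m_y) = \min_{\pi \in \Pi(m_x, m_y)} \sum_{u, v} \pi(u, v)\, d(u, v),
\qquad
m_x(z) =
\begin{cases}
  1 / \deg(x) & \text{if } z \text{ is adjacent to } x, \\
  0 & \text{otherwise.}
\end{cases}
```

A positive $\kappa(x, y)$ means the walks' neighborhoods of $x$ and $y$ are, in transport cost, closer together than $x$ and $y$ themselves; a negative value means they spread apart.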

Abstract:
Phylogenetic networks are leaf-labelled directed acyclic graphs that are used to describe non-treelike evolutionary histories and are thus a generalization of phylogenetic trees. The hybridization number of a phylogenetic network is the sum of all indegrees minus the number of nodes plus one. The Hybridization Number problem takes as input a collection of phylogenetic trees and asks for a phylogenetic network that contains an embedding of each of the input trees and has the smallest possible hybridization number. We present an algorithm for the Hybridization Number problem on three binary trees on $n$ leaves that runs in time $O(c^k \mathrm{poly}(n))$, where $k$ is the hybridization number of an optimal network and $c$ is a constant. For two trees, an algorithm with running time $O(3.18^k n)$ was proposed before, whereas, prior to this article, an algorithm with running time $O(c^k \mathrm{poly}(n))$ for more than two trees had remained elusive. The algorithm for two trees uses the close connection to acyclic agreement forests to achieve a linear exponent in the running time, while previous algorithms for more than two trees (explicitly or implicitly) relied on a brute-force search through all possible underlying network topologies, leading to running times that are not $O(c^k \mathrm{poly}(n))$ for any $c$. The connection to acyclic agreement forests is much weaker for more than two trees, so even given the right agreement forest, reconstructing the network poses major challenges. We prove novel structural results that allow us to reconstruct a network without having to guess the underlying topology. Our techniques generalize to more than three input trees, with the exception of one key lemma that maps nodes in the network to tree nodes and thus minimizes the amount of guessing involved in constructing the network. The main open problem, therefore, is to establish a similar mapping for more than three trees.
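The hybridization number definition above can be evaluated directly from an edge list. This is a minimal sketch assuming the network is given as `(parent, child)` pairs with hypothetical node names. The second function uses the equivalent per-node form $\sum_v (\mathrm{indeg}(v) - 1)$ over nodes with indegree above one. That form agrees with the first whenever the network has a single root, because then the sum of indegrees equals the number of edges and every non-root node has indegree at least one.

```python
from collections import Counter


def hybridization_number(edges):
    """(sum of all indegrees) - (number of nodes) + 1, from (parent, child) edges."""
    indeg = Counter()
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        indeg[v] += 1
    return sum(indeg.values()) - len(nodes) + 1


def excess_indegree(edges):
    """Sum of (indegree - 1) over reticulation nodes (indegree > 1)."""
    indeg = Counter(v for _, v in edges)
    return sum(d - 1 for d in indeg.values() if d > 1)
```

For example, a network in which leaf `b` hangs below a single reticulation `h` with two parents has hybridization number 1, while any rooted tree has hybridization number 0.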

Abstract:
Two types of multipurpose essential oil blends, blend^{11} containing eleven different essential oils and blend^{12} containing twelve, were tested against the bacterial strains Pseudomonas aeruginosa ATCC 9027, Serratia marcescens ATCC 13880 and Staphylococcus aureus ATCC 6538 and against the fungi Candida albicans ATCC 10231, Aspergillus fumigatus ATCC 10894 and Fusarium solani ATCC 36031 to determine the spectrum of in vitro antimicrobial activity using aromatograms (paper disc diffusion assays). Multipurpose blend^{11} and blend^{12} decreased microbial growth in a similar manner. The saline control disc did not inhibit microbial growth, whereas both blends produced significant zones of inhibition against all three bacteria and all three fungi. The greatest antibacterial activity of blend^{11} and blend^{12} was against P. aeruginosa and S. marcescens, followed by S. aureus. Among the fungi, activity was highest against C. albicans and lower against F. solani, followed by A. fumigatus. It is clearly evident from previously published studies that no single essential oil will effectively inhibit the growth of all of the organisms in our study. However, our results demonstrate that blend^{11} and blend^{12} have a broad range of inhibitory activity affecting all of the microorganisms tested.

Abstract:
In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the posterior distribution. In this paper we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree prune-and-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We develop a novel graph-based approach to analyze tree posteriors and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so we show conclusively that topological peaks do occur in Bayesian phylogenetic posteriors from real data sets as sampled with standard MCMC approaches, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade distribution (CCD) can have systematic problems when there are multiple peaks.
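As a rough illustration of a graph-based view of a sampled posterior, the sketch below tallies the empirical frequency of each sampled topology. It then joins topologies whose distance is at most one move and reports the total posterior mass of each connected group. This is a hypothetical simplification, not the paper's method. The `distance` callable stands in for an SPR distance, and the name `posterior_components` is invented for illustration.

```python
from collections import Counter
from fractions import Fraction


def posterior_components(samples, distance):
    """Empirical posterior mass of each group of topologies connected by
    distance-1 moves, largest first."""
    counts = Counter(samples)
    topologies = list(counts)
    parent = {t: t for t in topologies}

    def find(t):
        # union-find root lookup with path halving
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(topologies):
        for b in topologies[i + 1:]:
            if distance(a, b) <= 1:
                parent[find(a)] = find(b)

    mass = Counter()
    for t, c in counts.items():
        mass[find(t)] += Fraction(c, len(samples))
    return sorted(mass.values(), reverse=True)
```

Several well-separated groups with substantial mass would suggest multiple topological peaks; the pairwise loop is quadratic in the number of distinct sampled topologies.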

The objective of this paper is to find the stationary distribution of a certain class of Markov chains arising in a biological population involved in a specific type of evolutionary conflict, known as Parker’s model. In a population of such players, the result of repeated, infrequent, attempted invasions using strategies from $\{0, 1, 2, \dots, m-1\}$ is a Markov chain. The stationary distributions of this class of chains, for $m \in \{3, 4, \dots, \infty\}$, are derived in terms of previously known integer sequences. The asymptotic distribution (for $m \to \infty$) is derived.

Abstract:
High rates of overlapping sexual relationships (concurrency) are believed to be important in the generation of generalized HIV epidemics in sub-Saharan Africa. Different authors favor socioeconomic, gender-equity or cultural explanations for the high concurrency rates in this region. We performed linear regression to analyze the association between the point-prevalence of concurrency in males aged 15 to 49 years and various indicators of socioeconomic status and gender equity, using data from 11 countries surveyed in 1989/1990. We found no meaningful association between concurrency and the various markers of socioeconomic status and gender equity. This analysis supports the findings of other studies that high concurrency rates in sub-Saharan Africa could be reduced without having to address socioeconomic and gender-equity factors.