Abstract:
In phylogenetics, distances are often used to measure the incongruence between a pair of phylogenetic trees that are reconstructed by different methods or using different regions of genome. Motivated by the maximum parsimony principle in tree inference, we recently introduced the maximum parsimony (MP) distance, which enjoys various attractive properties due to its connection with several other well-known tree distances, such as TBR and SPR. Here we show that computing the MP distance between two trees, a NP-hard problem in general, is fixed parameter tractable in terms of the TBR distance between the tree pair. Our approach is based on two reduction rules--the chain reduction and the subtree reduction--that are widely used in computing TBR and SPR distances. More precisely, we show that reducing chains to length 4 (but not shorter) preserves the MP distance. In addition, we describe a generalization of the subtree reduction which allows the pendant subtrees to be rooted in different places, and show that this still preserves the MP distance. We conclude with an extended discussion in which we focus on similarities and differences between MP distance and TBR distance, and present a number of open problems.

Abstract:
A conjecture of Bandelt and Dress states that the maximum quartet distance between any two phylogenetic trees on $n$ leaves is at most $(\frac 23 +o(1))\binom{n}{4}$. Using the machinery of flag algebras we improve the currently known bounds regarding this conjecture, in particular we show that the maximum is at most $(0.69 +o(1))\binom{n}{4}$. We also give further evidence that the conjecture is true by proving that the maximum distance between caterpillar trees is at most $(\frac 23 +o(1))\binom{n}{4}$.

Abstract:
Given two phylogenetic trees on the same set of taxa X, the maximum parsimony distance d_MP is defined as the maximum, ranging over all characters c on X, of the absolute difference in parsimony score induced by c on the two trees. In this note we prove that for binary trees there exists a character achieving this maximum that is convex on one of the trees (i.e. the parsimony score induced on that tree is equal to the number of states in the character minus 1) and such that the number of states in the character is at most 7d_MP - 5. This is the first non-trivial bound on the number of states required by optimal characters, convex or otherwise. The result potentially has algorithmic significance because, unlike general characters, convex characters with a bounded number of states can be enumerated in polynomial time.

Abstract:
Within the field of phylogenetics there is great interest in distance measures to quantify the dissimilarity of two trees. Recently, a new distance measure has been proposed: the Maximum Parsimony (MP) distance. This is based on the difference of the parsimony scores of a single character on both trees under consideration, and the goal is to find the character which maximizes this difference. Here we show that computation of MP distance on two \emph{binary} phylogenetic trees is NP-hard. This is a highly nontrivial extension of an earlier NP-hardness proof for two multifurcating phylogenetic trees, and it is particularly relevant given the prominence of binary trees in the phylogenetics literature. As a corollary to the main hardness result we show that computation of MP distance is also hard on binary trees if the number of states available is bounded. In fact, via a different reduction we show that it is hard even if only two states are available. Finally, as a first response to this hardness we give a simple Integer Linear Program (ILP) formulation which is capable of computing the MP distance exactly for small trees (and for larger trees when only a small number of character states are available) and which is used to computationally verify several auxiliary results required by the hardness proofs.

Abstract:
Phylogenetic inference aims at reconstructing the evolutionary relationships of different species given some data (e.g. DNA, RNA or proteins). Traditionally, the relationships between species were assumed to be treelike, so the most frequently used phylogenetic inference methods like Maximum Parsimony were originally introduced to reconstruct phylogenetic trees. However, it has been well-known that some evolutionary events like hybridization or horizontal gene transfer cannot be represented by a tree but rather require a phylogenetic network. Therefore, current research seeks to adapt tree inference methods to networks. In the present paper, we analyze Maximum Parsimony on networks for various network definitions which have recently been introduced. For trees, there is a famous result by Tuffley and Steel which states that under a certain model of evolution, Maximum Parsimony always coincides with Maximum Likelihood. We now show that the various definitions Maximum Parsimony on networks can also be proven to be equivalent to certain functions which are similar to the likelihood concept, and we discuss their biological meaningfulness. We also present some complexity results of finding a most parsimonious network.

Abstract:
In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network; and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as Sankoff and Fitch algorithms extend naturally for networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched with the exhaustively determined optimum parsimony scores.The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.Phylogenetic trees, or evolutionary trees, are the basic structures necessary to examine th

Abstract:
In this paper, we investigate a conjecture by von Haeseler concerning the Maximum Parsimony method for phylogenetic estimation, which was published by the Newton Institute in Cambridge on a list of open phylogenetic problems in 2007. This conjecture deals with the question whether Maximum Parsimony trees are hereditary. The conjecture suggests that a Maximum Parsimony tree for a particular (DNA) alignment necessarily has subtrees of all possible sizes which are most parsimonious for the corresponding subalignments. We answer the conjecture affirmatively for binary alignments on five taxa but also show how to construct examples for which Maximum Parsimony trees are not hereditary. Apart from showing that a most parsimonious tree cannot generally be reduced to a most parsimonious tree on fewer taxa, we also show that compatible most parsimonious quartets do not have to provide a most parsimonious supertree. Last, we show that our results can be generalized to Maximum Likelihood for certain nucleotide substitution models.

Abstract:
Construction of phylogenetic trees is a key means in molecular evolutionary studies. The methods of constructing phylogenetic trees include the distance-based methods, parsimony, maximum likelihood, and Bayesian inference methods. To resolve a special problem about phylogeny, several notices are necessary: first, to select the reasonable data at less bias as possible; second, to choose the proper method to reconstruct phylogenetic tree; third, to evaluate the conclusions and explain them on the field of evolution. The present paper provides a brief introduction of the principles of data selection and tree-construction methods, and discusses about their advantage and disadvantage points.

Abstract:
Phylogenetic networks are used to display the relationship of different species whose evolution is not treelike, which is the case, for instance, in the presence of hybridization events or horizontal gene transfers. Tree inference methods such as Maximum Parsimony need to be modified in order to be applicable to networks. In this paper, we discuss two different definitions of Maximum Parsimony on networks, "hardwired" and "softwired", and examine the complexity of computing them given a network topology and a character. By exploiting a link with the problem Multicut, we show that computing the hardwired parsimony score for 2-state characters is polynomial-time solvable, while for characters with more states this problem becomes NP-hard but is still approximable and fixed parameter tractable in the parsimony score. On the other hand we show that, for the softwired definition, obtaining even weak approximation guarantees is already difficult for binary characters and restricted network topologies, and fixed-parameter tractable algorithms in the parsimony score are unlikely. On the positive side we show that computing the softwired parsimony score is fixed-parameter tractable in the level of the network, a natural parameter describing how tangled reticulate activity is in the network. Finally, we show that both the hardwired and softwired parsimony score can be computed efficiently using Integer Linear Programming. The software has been made freely available.

Abstract:
This paper introduces constNJ, the first algorithm for phylogenetic reconstruction of sets of trees with constrained pairwise rooted subtree-prune regraft (rSPR) distance. We are motivated by the problem of constructing sets of trees which must fit into a recombination, hybridization, or similar network. Rather than first finding a set of trees which are optimal according to a phylogenetic criterion (e.g. likelihood or parsimony) and then attempting to fit them into a network, constNJ estimates the trees while enforcing specified rSPR distance constraints. The primary input for constNJ is a collection of distance matrices derived from sequence blocks which are assumed to have evolved in a tree-like manner, such as blocks of an alignment which do not contain any recombination breakpoints. The other input is a set of rSPR constraints for any set of pairs of trees. ConstNJ is consistent and a strict generalization of the neighbor-joining algorithm; it uses the new notion of "maximum agreement partitions" to assure that the resulting trees satisfy the given rSPR distance constraints.