%0 Journal Article
%T The Interaction between Base Compositional Heterogeneity and Among-Site Rate Variation in Models of Molecular Evolution
%A Nathan C. Sheffield
%J ISRN Evolutionary Biology
%D 2013
%R 10.5402/2013/391561
%X Many commonly used models of molecular evolution assume homogeneous nucleotide frequencies. A deviation from this assumption has been shown to cause problems for phylogenetic inference. However, some claim that only extreme heterogeneity affects phylogenetic accuracy and suggest that violations of other model assumptions, such as variable rates among sites, are more problematic. To explore the interaction between compositional heterogeneity and variable rates among sites, I reanalyzed three real heterogeneous datasets using several models. My Bayesian inference recovers accurate topologies under variable rates-among-sites models, but fails under some models that account for compositional heterogeneity. I also ran simulations and found that accounting for rates among sites improves topology accuracy in compositionally heterogeneous data. This indicates that in some cases, models accounting for among-site rate variation can improve outcomes for data that violate the assumption of compositional homogeneity.
1. Introduction
Recent phylogenetic studies have explored the effect of compositional heterogeneity on phylogenetic methods. Compositional heterogeneity can arise in a dataset as a result of nonstationary evolution (when the substitution pattern is not uniform across an evolutionary tree). If two nonsister subtrees have similar substitution bias, this can lead to a convergence in nucleotide composition (CNC). The taxa may then look similar due to convergent evolution rather than common ancestry, which can mislead phylogenetic analysis. There are several methods to detect and quantify the level of compositional heterogeneity in a dataset, including chi-squared tests (e.g., [1]), Disparity Index [2], and relative-rates tests [3]. When found, the presence of compositional heterogeneity is often assumed to cause problems for both parametric and nonparametric phylogenetic methods [4].
However, this assumption has been challenged; Conant and Lewis [5] claimed that “extreme amounts of heterogeneity must be present before it can mislead phylogenetics” and Rosenberg and Kumar [6] “did not find a significant interaction between phylogenetic accuracy and substitution pattern heterogeneity among lineages.” Another commonly studied modeling question is the variation of substitution rates among sites. It has been established that accounting for among-site rate variation is important in phylogenetics [7]. This is most commonly done by assuming the substitution rates among sites vary according to a discrete gamma distribution with a fixed number of categories.
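The discrete gamma approximation mentioned above can be illustrated with a short sketch. It is not taken from the article. It assumes the common construction in which a gamma distribution with mean 1 (shape and rate both equal to alpha) is cut into k equal-probability categories, each represented by its mean rate. The function names (`reg_lower_gamma`, `gamma_quantile`, `discrete_gamma_rates`) and the default of four categories are hypothetical choices made for illustration only.

```python
import math


def reg_lower_gamma(a, x, tol=1e-12, max_terms=10000):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > tol * abs(total) and n < max_terms:
        n += 1
        term *= x / (a + n)
        total += term
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return min(1.0, total * math.exp(log_prefactor))


def gamma_quantile(p, shape, rate):
    """Quantile of Gamma(shape, rate) found by bisection on the CDF (0 < p < 1)."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, rate * hi) < p:
        hi *= 2.0  # widen the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, rate * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability categories (assumed construction).

    The gamma distribution has shape = rate = alpha, so the rates average to 1.
    """
    # Category boundaries: 0, then the i/k quantiles; the last upper bound is infinity.
    cuts = [0.0] + [gamma_quantile(i / k, alpha, alpha) for i in range(1, k)]
    # The partial mean up to a boundary c equals P(alpha + 1, alpha * c) when the mean is 1.
    cum = [reg_lower_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    # Each category has probability 1/k, so its mean is k times its share of the total mean.
    return [k * (cum[i + 1] - cum[i]) for i in range(k)]


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 5.0):
        rates = discrete_gamma_rates(alpha, k=4)
        print(alpha, [round(r, 4) for r in rates])
```

Under these assumptions, a small alpha gives widely spread category rates (strong among-site rate variation), whereas a large alpha pulls all categories toward 1 (nearly equal rates across sites).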
%U http://www.hindawi.com/journals/isrn.evolutionary.biology/2013/391561/