PLOS ONE 2012

Long Branch Effects Distort Maximum Likelihood Phylogenies in Simulations Despite Selection of the Correct Model

DOI: 10.1371/journal.pone.0036593


Abstract:

The aim of our study was to test the robustness and efficiency of maximum likelihood with respect to different long branch effects on multiple-taxon trees. We simulated data of different alignment lengths under two different 11-taxon trees and a broad range of branch length conditions. The data were analyzed with the true model parameters as well as with estimated and incorrect assumptions about among-site rate variation. If length differences between connected branches strongly increase, tree inference with the correct likelihood model assumptions can fail. We found that incorporating invariant sites together with Γ-distributed site rates in the tree reconstruction (Γ+I) increases the robustness of maximum likelihood in comparison with models using only Γ. The results show that, for some topologies and branch lengths, the reconstruction success of maximum likelihood under the correct model is still low for alignments of 100,000 base positions. Altogether, the high confidence placed in maximum likelihood trees is not always justified for certain tree shapes, even when alignments reach 100,000 base positions.
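The abstract describes simulating alignments on trees with unequal branch lengths, with among-site rate variation modelled either by gamma-distributed rates alone or by gamma rates plus a proportion of invariable sites (Γ+I). As a rough illustration only, the Python sketch below simulates Jukes-Cantor (JC69) sequences with Γ+I site rates on a hypothetical four-taxon tree ((A,B),(C,D)), not on the study's 11-taxon trees. The substitution model, branch lengths, parameter values and all function names are assumptions chosen for illustration. The final site-pattern tally is a crude parsimony-style count, not the maximum-likelihood analysis used in the study.

```python
"""Hypothetical sketch: JC69 + Gamma+I simulation on the quartet ((A,B),(C,D))."""
import math
import random

BASES = "ACGT"


def sample_site_rates(n_sites, alpha, p_inv, rng):
    """Draw one relative rate per site under an assumed Gamma+I model.

    A fraction p_inv of sites is invariable (rate 0). Other sites draw a
    rate from Gamma(shape=alpha, scale=1/alpha), whose mean is 1, rescaled
    by 1/(1 - p_inv) so the mean rate over all sites stays 1 and branch
    lengths remain expected substitutions per site.
    """
    rates = []
    for _ in range(n_sites):
        if rng.random() < p_inv:
            rates.append(0.0)
        else:
            rates.append(rng.gammavariate(alpha, 1.0 / alpha) / (1.0 - p_inv))
    return rates


def evolve(base, t, rate, rng):
    """Evolve one base along a branch of length t (scaled by rate) under JC69."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * rate * t / 3.0)
    if rng.random() < p_same:
        return base
    # Each of the three other bases is equally likely under JC69.
    return rng.choice([b for b in BASES if b != base])


def simulate_quartet(n_sites, branch, alpha, p_inv, seed=0):
    """Simulate an alignment on the unrooted quartet ((A,B),(C,D)).

    branch maps "A", "B", "C", "D" (terminal) and "int" (internal) to lengths.
    The tree is rooted at the node joining A and B; JC69 is reversible, so
    the root position does not change the site-pattern distribution.
    """
    rng = random.Random(seed)
    rates = sample_site_rates(n_sites, alpha, p_inv, rng)
    seqs = {name: [] for name in "ABCD"}
    for r in rates:
        x = rng.choice(BASES)                   # stationary root state, node (A,B)
        y = evolve(x, branch["int"], r, rng)    # state at node (C,D)
        seqs["A"].append(evolve(x, branch["A"], r, rng))
        seqs["B"].append(evolve(x, branch["B"], r, rng))
        seqs["C"].append(evolve(y, branch["C"], r, rng))
        seqs["D"].append(evolve(y, branch["D"], r, rng))
    return {name: "".join(s) for name, s in seqs.items()}


def split_support(seqs):
    """Count site patterns xxyy, xyxy, xyyx (x != y) supporting each split."""
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for a, b, c, d in zip(seqs["A"], seqs["B"], seqs["C"], seqs["D"]):
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return counts


if __name__ == "__main__":
    # Assumed long-branch setting: A and C long and non-sister, all else short.
    lengths = {"A": 1.0, "B": 0.05, "C": 1.0, "D": 0.05, "int": 0.05}
    aln = simulate_quartet(10_000, lengths, alpha=0.5, p_inv=0.3, seed=1)
    print(split_support(aln))
```

Running the `__main__` block shows whether xyxy patterns, produced by convergent changes on the long non-sister branches A and C, outnumber the xxyy patterns that support the true split AB|CD; the balance depends on the assumed branch lengths, alpha and p_inv.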

