BMC Research Notes 2012
Whole genome phylogenies for multiple Drosophila species

Keywords: Singular value decomposition, Phylogenomics, Comparative genomics, Drosophila phylogeny

Abstract: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions, except for the generally accepted but difficult-to-discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values, or when resolution was reduced by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed.

These results indicate that, using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimates of relatedness between Drosophila species. Furthermore, protein filtering can be effectively applied to reduce incongruence in the dataset as well as to generate alternative phylogenies.

Methods that determine phylogenies based on a restricted number of genes can be negatively affected by horizontal gene transfer, incomplete lineage sorting, introgression, and the unrecognized comparison of paralogous genes. The recent explosive increase in the number of completely sequenced genomes allows us to consider inferring gene and/or organismal relationships using complete sequence data. Several methods for generating phylogenies based on whole genome information have been explored, and many of these have been applied to re-examine the phylogeny of Drosophila. These include methods based primarily or exclusively on gene content [1], gene order [2], and detailed comparisons of operationally defined orthologs [3].
However, these methods often fail to provide detailed and unbiased comparisons of a high fraction of sequences and instead produce phylogenies based on greatly filtered, preselected datasets. We