
HaMStR: Profile hidden Markov model based search for orthologs in ESTs

DOI: 10.1186/1471-2148-9-157



We present a novel approach (HaMStR) to mine EST data for the presence of orthologs to a curated set of genes. HaMStR combines a profile Hidden Markov Model search and a subsequent BLAST search to extend existing ortholog clusters with sequences from further taxa. We show that the HaMStR results are consistent with those obtained with existing orthology prediction methods that require completely sequenced genomes. A case study on the phylogeny of 35 fungal taxa illustrates that HaMStR is well suited to compile informative data sets for phylogenomic studies from ESTs and protein sequence data.

HaMStR extends a pre-defined set of orthologs with ESTs from further taxa in a standardized manner. In the same fashion, HaMStR can be applied to protein sequence data, and thus provides a comprehensive approach to compile ortholog clusters from any protein-coding data. The resulting orthology predictions serve as the data basis for a variety of evolutionary studies. Here, we have demonstrated the application of HaMStR in a molecular systematics study. However, we envision that studies tracing the evolutionary fate of individual genes or functional complexes of genes will greatly benefit from HaMStR orthology predictions as well.

The number of protein-coding DNA sequences in the public databases is steadily increasing. This data is mainly generated by the sequencing and annotation of entire genomes and by numerous EST sequencing projects. Approaches to resolve the evolutionary relationships of eukaryotes on a molecular basis (frequently referred to as molecular systematics) particularly benefit from this data. Recent studies on the evolution of metazoans and fungi present trees with 40 to 77 taxa, reconstructed from more than 140 genes [1-6]. Still, these studies consider only a small fraction of the data available. For example, as of May 2008 dbEST contains 714 eukaryotic taxa with more than 2,000 ESTs each, and 394 taxa have more than 10,000 ESTs.
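The two-step search described in the abstract (a profile Hidden Markov Model search followed by a BLAST search) can be illustrated with a minimal Python sketch. This is not the HaMStR implementation. The function name `extend_cluster`, its parameters, the score threshold, and the toy data are all hypothetical. The sketch also assumes that the BLAST step acts as a reciprocity check, meaning a candidate is kept only if its best hit in a reference proteome is the cluster's own reference protein; the abstract itself states only that a BLAST search follows the HMM search. Both searches are passed in as plain callables so that no real tool interface is implied.

```python
from typing import Callable, Iterable, List


def extend_cluster(
    candidates: Iterable[str],
    hmm_score: Callable[[str], float],
    best_reference_hit: Callable[[str], str],
    reference_member: str,
    min_hmm_score: float,
) -> List[str]:
    """Return the candidate IDs accepted for one ortholog cluster.

    All names and the threshold are illustrative assumptions, not HaMStR's API.
    """
    accepted = []
    for seq_id in candidates:
        # Step 1: keep only candidates that score well against the
        # cluster's profile HMM.
        if hmm_score(seq_id) < min_hmm_score:
            continue
        # Step 2 (assumed reciprocity check): the candidate's best hit in
        # the reference proteome must be the cluster's reference protein.
        if best_reference_hit(seq_id) == reference_member:
            accepted.append(seq_id)
    return accepted


# Toy stand-ins for real search results (fabricated purely for illustration).
toy_hmm_scores = {"est1": 50.0, "est2": 5.0, "est3": 60.0}
toy_best_hits = {"est1": "refA", "est2": "refA", "est3": "refB"}

accepted = extend_cluster(
    candidates=["est1", "est2", "est3"],
    hmm_score=toy_hmm_scores.__getitem__,
    best_reference_hit=toy_best_hits.__getitem__,
    reference_member="refA",
    min_hmm_score=20.0,
)
print(accepted)  # ['est1']
```

In this toy run, `est2` is rejected by the HMM step and `est3` by the assumed reciprocity step, so only `est1` would be added to the cluster.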

