|
BMC Bioinformatics 2008
Algorithm of OMA for large-scale orthology inference

Abstract: The algorithm of OMA improves upon the standard bidirectional best-hit approach in several respects: it uses evolutionary distances instead of scores, considers distance inference uncertainty, includes many-to-many orthologous relations, and accounts for differential gene losses. Herein, we describe in detail the algorithm for inference of orthology and provide the rationale for parameter selection through multiple tests. OMA contains several novel improvements for orthology inference and provides a unique dataset of large-scale orthology assignments.

The classification of genes according to evolutionary relations is essential for many aspects of comparative and functional genomics. Evolutionary relations are often described as pairwise relations. Two genes that share a common ancestor are defined as homologs, while genes that are similar in sequence without a common origin are termed analogs. Homologs can be divided into several classes [1]: orthologs, which originate from a speciation event; paralogs, which originate from gene duplication; and xenologs, which originate from horizontal gene transfer. Orthologs are valuable in numerous analyses, including reconstruction of species phylogenies, protein function inference, database annotation, and genomic context analysis.

Evolutionary relations can also be defined with respect to a third gene. Paralogs are classified as out-paralogs or in-paralogs [2]. In-paralogs are genes that diverged by a duplication that occurred after a reference speciation event. The term co-orthologs is occasionally used to describe the same scenario from the perspective of a third gene that is orthologous to both genes.
In contrast, out-paralogs are paralogs that diverged before a particular reference speciation event.

To address the need for reliable sources of orthologs, several initiatives have been created to improve ortholog prediction. Prediction methods commonly fall into two classes: phylogeny-based methods, which compare gene
|