|
BMC Bioinformatics 2007
TBMap: a taxonomic perspective on the phylogenetic database TreeBASE

Abstract: Taxonomic names in TreeBASE were mapped onto names in the external taxonomic databases IPNI, ITIS, NCBI, and uBio, and a graph G of these mappings was constructed. Additional edges representing taxonomic synonymies were added to G, and all connected components of G were then extracted. These components correspond to "name clusters", which group together names in TreeBASE that are inferred to refer to the same taxon. The mapping to NCBI enables hierarchical queries to be performed, which can improve TreeBASE information retrieval by an order of magnitude. The TBMap database provides a mapping of the bulk of the names in TreeBASE to names in external taxonomic databases, and a clustering of those mappings into sets of names that can be regarded as equivalent. This mapping enables queries and visualisations that cannot otherwise be constructed. A simple query interface to the mapping and name clusters is available at http://linnaeus.zoology.gla.ac.uk/~rpage/tbmap.

TreeBASE [1,2] is a database of published phylogenetic trees and associated data matrices (such as sequence alignments). It differs from other phylogenetic databases, such as PANDIT [3] and TreeFam [4], in being primarily a collection of evolutionary trees for organisms, rather than for gene families. Although it contains only a small fraction of the evolutionary trees published to date, the database is continually growing, in part because a number of journals either require or encourage authors to submit their data sets and trees to TreeBASE.
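
The name-clustering step summarised in the abstract (build a graph of name mappings, add synonymy edges, extract components) can be illustrated with a minimal sketch. This is not the TBMap implementation: the function `name_clusters`, the `source:name` labelling scheme, and all taxon names below are hypothetical placeholders chosen for illustration only.

```python
from collections import defaultdict


def name_clusters(edges):
    """Return the connected components of an undirected graph.

    `edges` is an iterable of (node, node) pairs. Each component is
    returned as a set of node labels.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(component)
    return clusters


# Hypothetical mappings from TreeBASE names to external database names.
mapping_edges = [
    ("treebase:Aus bus", "ncbi:Aus bus"),
    ("treebase:Aus bus Smith", "ncbi:Aus bus"),
    ("treebase:Cus dus", "itis:Cus dus"),
    ("treebase:Xus bus", "ipni:Xus bus"),
]

# Hypothetical synonymy edge linking two external names.
synonym_edges = [("ncbi:Aus bus", "ipni:Xus bus")]

clusters = name_clusters(mapping_edges + synonym_edges)

# Keep only the TreeBASE names in each cluster, in a stable order.
treebase_clusters = sorted(
    sorted(n for n in c if n.startswith("treebase:")) for c in clusters
)
print(treebase_clusters)
```

In this toy input, the synonymy edge joins two otherwise separate components, so three TreeBASE name strings end up in one cluster while the fourth remains on its own.
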
In addition to supporting simple text searches to retrieve data, TreeBASE has tools for searching based on tree similarity [5] and for constructing supertrees [6].

The phylogenies stored in TreeBASE provide a wealth of information on organismal phylogeny, as well as a resource for studies on the relative merits of different sources of data [7], the shape of evolutionary trees [8,9], and methods for querying trees [5,10,11]. However, research that relies on aggregating results from diff
|