|
BMC Bioinformatics 2011
ProPhylo: partial phylogenetic profiling to guide protein family construction and assignment of biological process

Abstract: Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works by optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies that creates individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries.

ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction. The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/.

Phylogenetic profiling is an established and widely known technique for inferring biological roles for unknown proteins [1]. The technique capitalizes on the propensity of proteins that work together in the same cellular system to have traveled together in evolutionary processes of speciation, gene loss, and lateral transfer. Such systems may consist of biochemical pathways, multi-subunit protein complexes, protein-modifying enzymes with their targets, etc. For a given protein, the evolutionary history of its family is reflected in its present-day taxonomic distribution [2]. Therefore, examining protein family co-occurrence across large numbers of genomes may reveal evidence linking one protein to others that cooperate in the same system.

A phylogenetic profile in its simplest form is a series of binary char
|