PLOS ONE 2012

Protein Sequence Alignment Analysis by Local Covariation: Coevolution Statistics Detect Benchmark Alignment Errors

DOI: 10.1371/journal.pone.0037645


Abstract:

The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High-quality alignments are difficult to build, and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criterion: sequence conservation. Local covariation, which is independent of conservation, identifies systematic misalignments. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation can identify alignment errors, because misalignment reduces the independence of positions within the misaligned region. We highlight three alignments from the benchmark database BAliBASE 3 that contain regions of high local covariation, and we investigate what causes the elevated covariation in each. Two of these alignments contain sequential and structural shifts that cause elevated local covariation. Realigning these misaligned segments reduces local covariation, and the alternative alignments are supported by structural evidence. We also show that local covariation identifies active-site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files/
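The central claim above is that a misaligned block makes neighboring columns less independent, which appears as elevated covariation between nearby positions. The following is a minimal sketch of that idea under stated assumptions, not LoCo's implementation: it scores each column by its mean plain mutual information (MI) with every column at most `window` positions away, treats the gap character `-` as an ordinary symbol, and uses hypothetical function names (`mutual_information`, `local_covariation`) and a hypothetical default window of 3. The abstract does not specify LoCo's statistic or window size, and practical tools usually apply a bias correction (for example the average-product-corrected MIp) because raw MI is inflated in small alignments.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Plain mutual information (natural log, no bias correction) between two columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_ab = count / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def local_covariation(alignment, window=3):
    """Score each column by its mean MI with columns at most `window` positions away.

    Hypothetical helper: `alignment` is a list of equal-length strings and
    '-' is treated as an ordinary symbol (an assumption, not LoCo's rule).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    scores = []
    for i in range(length):
        lo = max(0, i - window)
        hi = min(length, i + window + 1)
        neighbors = [j for j in range(lo, hi) if j != i]
        if not neighbors:
            scores.append(0.0)  # single-column alignment: nothing to compare against
            continue
        total = sum(mutual_information(columns[i], columns[j]) for j in neighbors)
        scores.append(total / len(neighbors))
    return scores


if __name__ == "__main__":
    # Toy alignment: three variable columns that are pairwise independent, plus one conserved column.
    correct = ["ADFH", "AEGH", "CDGH", "CEFH"]
    # Same sequences, but the last two are shifted right by one (leading gap, final residue truncated).
    shifted = ["ADFH", "AEGH", "-CDG", "-CEF"]
    print(local_covariation(correct))  # → [0.0, 0.0, 0.0, 0.0]
    print([round(s, 3) for s in local_covariation(shifted)])  # → [0.693, 0.809, 0.924, 0.809]
```

In the toy `correct` alignment every pair of columns is statistically independent even though three of the four columns vary, so every local score is zero. Shifting two sequences by one position imposes the same two-group split on consecutive columns, and every local score rises; this loss of positional independence is the signal described in the abstract.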

