PLOS ONE  2009 

A Comprehensive Analysis of the Structure-Function Relationship in Proteins Based on Local Structure Similarity

DOI: 10.1371/journal.pone.0006266


Abstract:

Background: Sequence similarity to characterized proteins provides testable functional hypotheses for less than 50% of the proteins identified by genome sequencing projects. With structural genomics it is believed that structural similarities may give functional hypotheses for many of the remaining proteins.

Methodology/Principal Findings: We provide a systematic analysis of the structure-function relationship in proteins using the novel concept of local descriptors of protein structure. A local descriptor is a small substructure of a protein which includes both short- and long-range interactions. We employ a library of commonly reoccurring local descriptors general enough to assemble most existing protein structures. We then model the relationship between these local shapes and Gene Ontology using rule-based learning. Our IF-THEN rule model offers legible, high resolution descriptions that combine local substructures and is able to discriminate functions even for functionally versatile folds such as the frequently occurring TIM barrel and Rossmann fold. By evaluating the predictive performance of the model, we provide a comprehensive quantification of the structure-function relationship based only on local structure similarity. Our findings are, among others, that conserved structure is a stronger prerequisite for enzymatic activity than for binding specificity, and that structure-based predictions complement sequence-based predictions. The model is capable of generating correct hypotheses, as confirmed by a literature study, even when no significant sequence similarity to characterized proteins exists.

Conclusions/Significance: Our approach offers a new and complete description and quantification of the structure-function relationship in proteins. By demonstrating how our predictions offer higher sensitivity than using global structure, and complement the use of sequence, we show that the presented ideas could advance the development of meta-servers in function prediction.
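
To make the idea of an IF-THEN rule model over local descriptors more concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation or their rule-learning procedure: the rules here are written by hand rather than learned, and the descriptor identifiers (`d1`, `d2`, ...), the labels (`function_A`, `function_B`), and the names `Rule` and `predict` are all hypothetical placeholders. It assumes only that a protein can be represented as the set of local descriptors it matches, and that a rule fires when every descriptor in its IF-part is present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF all `descriptors` are present THEN predict `go_term` (hypothetical)."""
    descriptors: frozenset  # placeholder local-descriptor IDs in the IF-part
    go_term: str            # placeholder function label in the THEN-part

    def matches(self, protein_descriptors):
        # The rule fires only if its IF-part is a subset of the protein's descriptors.
        return self.descriptors <= protein_descriptors


def predict(rules, protein_descriptors):
    """Return the sorted labels of every rule whose IF-part matches."""
    return sorted({r.go_term for r in rules if r.matches(protein_descriptors)})


# Hand-written toy rules; a real model would learn these from annotated structures.
RULES = [
    Rule(frozenset({"d1", "d2"}), "function_A"),
    Rule(frozenset({"d1", "d3"}), "function_B"),
    Rule(frozenset({"d4"}), "function_B"),
]

if __name__ == "__main__":
    # A toy protein matching descriptors d1, d2 and d5.
    print(predict(RULES, {"d1", "d2", "d5"}))
```

Because each rule is a conjunction of substructures, two proteins that share one descriptor (for example, a common fold element) can still receive different predictions when their remaining descriptors differ, which is the kind of discrimination within versatile folds that the abstract describes.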
