
Inferring Stabilizing Mutations from Protein Phylogenies: Application to Influenza Hemagglutinin

DOI: 10.1371/journal.pcbi.1000349


One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution.
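For orientation, the sequence-based consensus approach that serves as one of the comparison baselines can be sketched in a few lines. This is only an illustration of the general consensus idea (a residue that is more frequent in an alignment column is guessed to be more stabilizing); it is not the phylogenetic inference method described above, and the function name, the pseudocount, and the example column are assumptions made for the sketch. Note that it treats the aligned sequences as independent samples and ignores their phylogeny.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def consensus_score(column, wildtype, mutant, pseudocount=1.0):
    """Log ratio of mutant to wild-type residue frequency in one alignment column.

    A positive score means the mutant residue is more common among homologs
    than the wild-type residue, which the consensus idea reads as a hint
    that the mutation may be stabilizing. Hypothetical helper, for illustration.
    """
    # Count only standard amino acids; gaps and unknown symbols are skipped.
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    # The pseudocount keeps frequencies nonzero for unobserved residues.
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    f_wt = (counts[wildtype] + pseudocount) / total
    f_mut = (counts[mutant] + pseudocount) / total
    return math.log(f_mut / f_wt)

# Hypothetical alignment column: one character per homologous sequence.
column = "AAAAAAGGSA-"
score = consensus_score(column, wildtype="G", mutant="A")
print(f"G->A consensus score: {score:.3f}")
```

Because the score is a simple log frequency ratio, reversing the mutation flips its sign. Closely related sequences are counted as if they were independent observations, which is the kind of bias a phylogeny-aware method is meant to avoid.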



