Lattice models are a common abstraction used in the study of protein structure, folding, and refinement. They are advantageous because the discretisation of space can make extensive protein evaluations computationally feasible. Various approaches to the protein chain lattice fitting problem have been suggested, but currently only a single backbone-only tool is available. We introduce LatFit, a new tool to produce high-accuracy lattice protein models. It generates both backbone-only and backbone-side-chain models in any user-defined lattice. LatFit implements a new distance RMSD-optimisation fitting procedure in addition to the known coordinate RMSD method. We tested LatFit's accuracy and speed using a large nonredundant set of high-resolution proteins (SCOP database) on three commonly used lattices: 3D cubic, face-centred cubic (FCC), and knight's walk. Fitting speed compared favourably to other methods, and both backbone-only and backbone-side-chain models show low deviation from the original data (~1.5 Å RMSD in the FCC lattice). To our knowledge, this represents the first comprehensive study of lattice quality for on-lattice protein models including side chains, and LatFit is the only available tool for such models.
1. Introduction
It is not always computationally feasible to undertake protein structure studies using full-atom representations. The challenge is to reduce complexity while maintaining detail [1–3]. Lattice protein models are often used to achieve this, but in general only the protein backbone or the amino acid centre of mass is represented [4–12]. A wide variety of lattices and energy functions have previously been developed and applied [4, 13, 14]. In order to evaluate the applicability of different lattices and to enable the transformation of real protein structures into lattice models, a representative lattice protein structure has to be calculated.
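To make the lattices and the distance RMSD measure named above concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not LatFit's implementation: the helper names `neighbourhood`, `LATTICES`, and `drmsd` are hypothetical, the lattices are given by their usual unit neighbourhood vectors (cubic: permutations of (±1, 0, 0); FCC: (±1, ±1, 0); knight's walk: (±2, ±1, 0)), and the distance RMSD is taken as the root mean squared difference over all pairwise intra-structure distances.

```python
import itertools
import math


def neighbourhood(base):
    """Return all signed permutations of a base vector (hypothetical helper).

    For example, base (1, 0, 0) yields the six unit moves of the 3D cubic lattice.
    """
    vectors = set()
    for perm in itertools.permutations(base):
        for signs in itertools.product((1, -1), repeat=3):
            vectors.add(tuple(s * c for s, c in zip(signs, perm)))
    return sorted(vectors)


# Assumed standard definitions of the three lattices via their neighbour vectors.
LATTICES = {
    "cubic": neighbourhood((1, 0, 0)),
    "fcc": neighbourhood((1, 1, 0)),
    "knights_walk": neighbourhood((2, 1, 0)),
}


def drmsd(a, b):
    """Distance RMSD between two equally long lists of 3D points.

    Compares all pairwise distances within each structure, so no
    superposition of the two structures is needed.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two structures of equal length >= 2")
    pairs = list(itertools.combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))


if __name__ == "__main__":
    # The number of neighbour vectors is the lattice coordination number.
    for name, vectors in LATTICES.items():
        print(name, len(vectors))
    bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
    straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
    print(drmsd(bent, straight))
```

Because the distance RMSD depends only on internal distances, it is unchanged when either structure is translated or rotated, whereas a coordinate RMSD requires an optimal superposition first. In a real fit the unit vectors would additionally be scaled to a physical bond length; that step is omitted here.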
Maňuch and Gaur have shown the NP-completeness of this problem for backbone-only models in the 3D cubic lattice and named it the protein chain lattice fitting (PCLF) problem [15]. The PCLF problem has been widely studied for backbone-only models [13, 16–24]. The most important aspects in producing lattice protein models with a low root mean squared deviation (RMSD) are the lattice coordination number and the neighbourhood vector angles [18, 23]. Lattices with intermediate coordination numbers, such as the face-centred cubic (FCC) lattice, can produce high-resolution backbone models [18] and have been used in many protein structure studies (e.g., [3, 25, 26]). However, the use of backbone models is
References
[1]
L. Mirny and E. Shakhnovich, “Protein folding theory: from lattice to all-atom models,” Annual Review of Biophysics and Biomolecular Structure, vol. 30, pp. 361–396, 2001.
[2]
K. A. Dill, S. B. Ozkan, M. S. Shell, and T. R. Weikl, “The protein folding problem,” Annual Review of Biophysics, vol. 37, pp. 289–316, 2008.
[3]
S. Istrail and F. Lam, “Combinatorial algorithms for protein folding in lattice models: a survey of mathematical results,” Communications in Information and Systems, vol. 9, no. 4, pp. 303–346, 2009.
[4]
K. A. Dill, “Theory for the folding and stability of globular proteins,” Biochemistry, vol. 24, no. 6, pp. 1501–1509, 1985.
[5]
A. Renner and E. Bornberg-Bauer, “Exploring the fitness landscapes of lattice proteins,” Pacific Symposium on Biocomputing, pp. 361–372, 1997.
[6]
J. Miao, J. Klein-Seetharaman, and H. Meirovitch, “The optimal fraction of hydrophobic residues required to ensure protein collapse,” Journal of Molecular Biology, vol. 344, no. 3, pp. 797–811, 2004.
[7]
R. Backofen and S. Will, “A constraint-based approach to fast and exact structure prediction in three-dimensional protein models,” Constraints, vol. 11, no. 1, pp. 5–30, 2006.
[8]
F. P. E. Huard, C. M. Deane, and G. R. Wood, “Modelling sequential protein folding under kinetic control,” Bioinformatics, vol. 22, no. 14, pp. e203–e210, 2006.
[9]
C. M. Deane, M. Dong, F. P. E. Huard, B. K. Lance, and G. R. Wood, “Cotranslational protein folding—fact or fiction?” Bioinformatics, vol. 23, no. 13, pp. i142–i148, 2007.
[10]
M. Mann, S. Will, and R. Backofen, “CPSP-tools—exact and complete algorithms for high-throughput 3D lattice protein studies,” BMC Bioinformatics, vol. 9, article 230, 2008.
[11]
M. Mann, D. Maticzka, R. Saunders, and R. Backofen, “Classifying proteinlike sequences in arbitrary lattice protein models using LatPack,” HFSP Journal, vol. 2, no. 6, pp. 396–404, 2008.
[12]
R. Saunders, M. Mann, and C. M. Deane, “Signatures of co-translational folding,” Biotechnology Journal, vol. 6, no. 6, pp. 742–751, 2011.
[13]
A. Godzik, A. Kolinski, and J. Skolnick, “Lattice representations of globular proteins: how good are they?” Journal of Computational Chemistry, vol. 14, no. 10, pp. 1194–1202, 1993.
[14]
B. A. Reva, M. F. Sanner, A. J. Olson, and A. V. Finkelstein, “Lattice modeling: accuracy of energy calculations,” Journal of Computational Chemistry, vol. 17, no. 8, pp. 1025–1032, 1996.
[15]
J. Maňuch and D. R. Gaur, “Fitting protein chains to cubic lattice is NP-complete,” Journal of Bioinformatics and Computational Biology, vol. 6, no. 1, pp. 93–106, 2008.
[16]
D. G. Covell and R. L. Jernigan, “Conformations of folded proteins in restricted spaces,” Biochemistry, vol. 29, no. 13, pp. 3287–3294, 1990.
[17]
D. A. Hinds and M. Levitt, “A lattice model for protein structure prediction at low resolution,” Proceedings of the National Academy of Sciences of the United States of America, vol. 89, no. 7, pp. 2536–2540, 1992.
[18]
B. H. Park and M. Levitt, “The complexity and accuracy of discrete state models of protein structure,” Journal of Molecular Biology, vol. 249, no. 2, pp. 493–507, 1995.
[19]
D. S. Rykunov, B. A. Reva, and A. V. Finkelstein, “Accurate general method for lattice approximation of three-dimensional structure of a chain molecule,” Proteins, vol. 22, no. 2, pp. 100–109, 1995.
[20]
B. A. Reva, D. S. Rykunov, A. V. Finkelstein, and J. Skolnick, “Optimization of protein structure on lattices using a self-consistent field approach,” Journal of Computational Biology, vol. 5, no. 3, pp. 531–538, 1998.
[21]
P. Koehl and M. Delarue, “Building protein lattice models using self-consistent mean field theory,” Journal of Chemical Physics, vol. 108, no. 22, pp. 9540–9549, 1998.
[22]
Y. Ponty, R. Istrate, E. Porcelli, and P. Clote, “LocalMove: computing on-lattice fits for biopolymers,” Nucleic Acids Research, vol. 36, pp. W216–W222, 2008.
[23]
C. L. Pierri, A. De Grassi, and A. Turi, “Lattices for ab initio protein structure prediction,” Proteins, vol. 73, no. 2, pp. 351–361, 2008.
[24]
M. Mann and A. Dal Palu, “Lattice model refinement of protein structures,” in Proceedings of the Workshop on Constraint Based Methods for Bioinformatics (WCB '10), p. 7, 2010.
[25]
E. Jacob and R. Unger, “A tale of two tails: why are terminal residues of proteins exposed?” Bioinformatics, vol. 23, no. 2, pp. e225–e230, 2007.
[26]
A. D. Ullah, L. Kapsokalivas, M. Mann, and K. Steinhöfel, “Protein folding simulation by two-stage optimization,” in Proceedings of the International Symposium on Intelligence Computation and Applications (ISICA '09), vol. 51 of Communications in Computer and Information Science, pp. 138–145, 2009.
[27]
S. Sun, “Reduced representation model of protein structure prediction: statistical potential and genetic algorithms,” Protein Science, vol. 2, no. 5, pp. 762–785, 1993.
[28]
S. Bromberg and K. A. Dill, “Side-chain entropy and packing in proteins,” Protein Science, vol. 3, no. 7, pp. 997–1009, 1994.
[29]
W. E. Hart, “Lattice and off-lattice side chain models of protein folding: linear time structure prediction better than 86% of optimal,” Journal of Computational Biology, vol. 4, no. 3, pp. 241–259, 1997.
[30]
V. Heun, “Approximate protein folding in the HP side chain model on extended cubic lattices,” Discrete Applied Mathematics, vol. 127, no. 1, pp. 163–177, 2003.
[31]
A. Kolinski and J. Skolnick, “Reduced models of proteins and their applications,” Polymer, vol. 45, no. 2, pp. 511–524, 2004.
[32]
B. A. Reva, D. S. Rykunov, A. J. Olson, and A. V. Finkelstein, “Constructing lattice models of protein chains with side groups,” Journal of Computational Biology, vol. 2, no. 4, pp. 527–535, 1995.
[33]
Y. Zhang, A. K. Arakaki, and J. Skolnick, “TASSER: an automated method for the prediction of protein tertiary structures in CASP6,” Proteins, vol. 61, no. 7, pp. 91–98, 2005.
[34]
A. Kolinski, “Protein modeling and structure prediction with a reduced representation,” Acta Biochimica Polonica, vol. 51, no. 2, pp. 349–371, 2004.
[35]
V. A. Eyrich, D. M. Standley, and R. A. Friesner, “Prediction of protein tertiary structure to low resolution: performance for a large and structurally diverse test set,” Journal of Molecular Biology, vol. 288, no. 4, pp. 725–742, 1999.
[36]
M. Feig, P. Rotkiewicz, A. Kolinski, J. Skolnick, and C. L. Brooks III, “Accurate reconstruction of all-atom protein representations from side-chain-based low-resolution models,” Proteins, vol. 41, no. 1, pp. 86–97, 2000.
[37]
M. T. Wolfinger, S. Will, I. L. Hofacker, R. Backofen, and P. F. Stadler, “Exploring the lower part of discrete polymer model energy landscapes,” Europhysics Letters, vol. 74, no. 4, pp. 726–732, 2006.
[38]
M. Mann, M. Abou Hamra, K. Steinhöfel, and R. Backofen, “Constraint-based local move definitions for lattice protein models including side chains,” in Proceedings of the Fifth Workshop on Constraint Based Methods for Bioinformatics (WCB '09), 2009.
[39]
H. M. Berman, J. Westbrook, Z. Feng et al., “The protein data bank,” Nucleic Acids Research, vol. 28, no. 1, pp. 235–242, 2000.
[40]
W. Kabsch, “A discussion of the solution for the best rotation to relate two sets of vectors,” Acta Crystallographica, vol. A34, pp. 827–828, 1978.
[41]
Y. Choi and C. M. Deane, “FREAD revisited: accurate loop structure prediction using a database search algorithm,” Proteins, vol. 78, no. 6, pp. 1431–1440, 2010.
[42]
M. Mann, C. Smith, M. Rabbath, M. Edwards, S. Will, and R. Backofen, “CPSP-web-tools: a server for 3D lattice protein studies,” Bioinformatics, vol. 25, no. 5, pp. 676–677, 2009.
[43]
A. Herráez, “Biomolecules in the computer: Jmol to the rescue,” Biochemistry and Molecular Biology Education, vol. 34, no. 4, pp. 256–261, 2006.
[44]
G. Wang and R. L. Dunbrack, “PISCES: recent improvements to a PDB sequence culling server,” Nucleic Acids Research, vol. 33, no. 2, pp. W94–W98, 2005.