The long-range interactions required for accurate prediction of the tertiary structures of β-sheet-containing proteins remain difficult to simulate. To remedy this problem and facilitate β-sheet structure prediction, many computational efforts have been made. However, existing work on β-sheets has mainly focused on interresidue contacts or amino acid pairing partners. In this study, we go one step further and examine β-sheets at the strand level, performing a statistical analysis of the terminal extensions of paired β-strands. In most cases, two paired β-strands differ in length, so terminal extensions exist; the terminal extensions are the parts of the paired strands that extend beyond their common paired part. Even so, we found that the best pairing required terminal alignment and that β-strands tend to pair so as to form larger common parts. As a result, 96.97% of β-strand pairs reach a ratio of 25% between the paired common part and the whole length, and 94.26% and 95.98% of β-strand pairs reach a ratio of 40% between the paired common part and the lengths of the two individual β-strands, respectively. Methods that predict interstrand register by searching several alternative offsets between interacting β-strands should comply with this rule, which reduces the computational search space and can thereby improve algorithm performance.
1. Introduction
Protein structure prediction remains an extremely challenging problem in bioinformatics [1, 2]. For protein sequences with no detectable homology to a protein of known structure, structural information can usually be obtained by predicting the arrangement of their secondary structure elements [3]. The two predominant protein secondary structures are α-helices and β-sheets. The early availability of suitable α-helical model systems, combined with sustained research, has produced a detailed understanding of α-helices, whereas comparatively little is known about β-sheets [4].
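To make the strand-level quantities described in the abstract concrete, the following Python sketch models two paired β-strands as intervals on a shared residue axis, computes the paired common part, the terminal extensions, and the three ratios, and then uses the quoted thresholds (25% and 40%) to prune candidate register offsets. It is an illustration under stated assumptions, not the authors' method: the interval model, the reading of "whole length" as the span covered by both strands together (common part plus both terminal extensions), the use of all three thresholds as simultaneous hard cut-offs, and the names `Pairing`, `passes_rule`, and `candidate_offsets` are all hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

# Thresholds quoted in the abstract; applying them as hard cut-offs is an assumption.
MIN_COMMON_TO_WHOLE = Fraction(1, 4)   # 25%
MIN_COMMON_TO_STRAND = Fraction(2, 5)  # 40%


@dataclass(frozen=True)
class Pairing:
    """Two beta-strands laid on a shared residue axis (hypothetical model).

    Strand A covers positions [0, len_a); strand B is shifted by `offset`
    and covers [offset, offset + len_b). The same interval model is used
    for parallel and antiparallel pairs.
    """
    len_a: int
    len_b: int
    offset: int

    @property
    def common(self) -> int:
        # Paired common part: overlap of the two intervals.
        start = max(0, self.offset)
        end = min(self.len_a, self.offset + self.len_b)
        return max(0, end - start)

    @property
    def whole(self) -> int:
        # Assumed "whole length": span of the union of both strands,
        # i.e. the common part plus both terminal extensions.
        start = min(0, self.offset)
        end = max(self.len_a, self.offset + self.len_b)
        return end - start

    @property
    def extensions(self) -> tuple[int, int]:
        # Unpaired overhang before and after the common part
        # (meaningful when the strands overlap).
        return abs(self.offset), abs(self.len_a - (self.offset + self.len_b))


def passes_rule(p: Pairing) -> bool:
    """Hypothetical pruning rule combining the three ratios (exact arithmetic)."""
    return (Fraction(p.common, p.whole) >= MIN_COMMON_TO_WHOLE
            and Fraction(p.common, p.len_a) >= MIN_COMMON_TO_STRAND
            and Fraction(p.common, p.len_b) >= MIN_COMMON_TO_STRAND)


def candidate_offsets(len_a: int, len_b: int) -> list[int]:
    """Offsets with at least one paired residue that pass the rule."""
    offsets = range(-(len_b - 1), len_a)
    return [k for k in offsets if passes_rule(Pairing(len_a, len_b, k))]


if __name__ == "__main__":
    p = Pairing(len_a=6, len_b=4, offset=-1)
    print(p.common, p.whole, p.extensions)  # 3 7 (1, 3)
    print(candidate_offsets(6, 4))          # [-1, 0, 1, 2, 3]
```

For a 6-residue and a 4-residue strand, the sketch keeps offsets -1 to 3 (5 of the 9 overlapping offsets), including the two offsets that align one pair of strand termini. Because the reported percentages are below 100%, a hard cut-off of this kind would discard a small fraction of genuine pairings; treating the ratios as a penalty term in a scoring function is one possible alternative.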
Tertiary structures of β-sheet-containing proteins are especially difficult to simulate [3, 5]. Unlike α-helices, β-sheets are more complex, because they are formed by a combination of two or more disjoint peptide segments, called β-strands. Therefore, β-sheet topology is very useful for elucidating protein folding pathways [6, 7], for predicting tertiary structures [3, 8–11], and even for designing new proteins [12–14]. As fundamental structural components, β-sheets are abundant in protein domains. In a β-sheet, multiple β-strands are held together by hydrogen bonds, and they can be classified as parallel or antiparallel according to their relative direction. Adjacent
References
[1]
H. M. Fooks, A. C. R. Martin, D. N. Woolfson, R. B. Sessions, and E. G. Hutchinson, “Amino acid pairing preferences in parallel β-sheets in proteins,” Journal of Molecular Biology, vol. 356, no. 1, pp. 32–44, 2006.
[2]
M. Dorn and O. N. de Souza, “A3N: an artificial neural network n-gram-based method to approximate 3-D polypeptides structure prediction,” Expert Systems with Applications, vol. 37, no. 12, pp. 7497–7508, 2010.
[3]
R. E. Steward and J. M. Thornton, “Prediction of strand pairing in antiparallel and parallel β-sheets using information theory,” Proteins, vol. 48, no. 2, pp. 178–191, 2002.
[4]
M. Jäger, M. Dendle, A. A. Fuller, and J. W. Kelly, “A cross-strand Trp-Trp pair stabilizes the hPin1 WW domain at the expense of function,” Protein Science, vol. 16, no. 10, pp. 2306–2313, 2007.
[5]
M. Kuhn, J. Meiler, and D. Baker, “Strand-loop-strand motifs: prediction of hairpins and diverging turns in proteins,” Proteins, vol. 54, no. 2, pp. 282–288, 2004.
[6]
J. S. Merkel and L. Regan, “Modulating protein folding rates in vivo and in vitro by side-chain interactions between the parallel strands of green fluorescent protein,” Journal of Biological Chemistry, vol. 275, no. 38, pp. 29200–29206, 2000.
[7]
Y. Mandel-Gutfreund, S. M. Zaremba, and L. M. Gregoret, “Contributions of residue pairing to β-sheet formation: conservation and covariation of amino acid residue pairs on antiparallel β-strands,” Journal of Molecular Biology, vol. 305, no. 5, pp. 1145–1159, 2001.
[8]
S. M. Zaremba and L. M. Gregoret, “Context-dependence of amino acid residue pairing in antiparallel β-sheets,” Journal of Molecular Biology, vol. 291, no. 2, pp. 463–479, 1999.
[9]
I. Ruczinski, C. Kooperberg, R. Bonneau, and D. Baker, “Distributions of beta sheets in proteins with application to structure prediction,” Proteins, vol. 48, no. 1, pp. 85–97, 2002.
[10]
B. Rost, J. Liu, D. Przybylski et al., “Prediction of protein structure through evolution,” in Handbook of Chemoinformatics: From Data to Knowledge, J. Gasteiger and T. Engel, Eds., pp. 1789–1811, John Wiley & Sons, New York, NY, USA, 2003.
[11]
J. Cheng and P. Baldi, “Three-stage prediction of protein β-sheets by neural networks, alignments and graph algorithms,” Bioinformatics, vol. 21, supplement 1, pp. i75–i84, 2005.
[12]
C. K. Smith and L. Regan, “Construction and design of β-sheets,” Accounts of Chemical Research, vol. 30, no. 4, pp. 153–161, 1997.
[13]
T. Kortemme, M. Ramirez-Alvarado, and L. Serrano, “Design of a 20-amino acid, three-stranded β-sheet protein,” Science, vol. 281, no. 5374, pp. 253–256, 1998.
[14]
B. Kuhlman, G. Dantas, G. C. Ireton, G. Varani, B. L. Stoddard, and D. Baker, “Design of a novel globular protein fold with atomic-level accuracy,” Science, vol. 302, no. 5649, pp. 1364–1368, 2003.
[15]
N. Zhang, J. Ruan, G. Duan, S. Gao, and T. Zhang, “The interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of β-strands,” Biochemical and Biophysical Research Communications, vol. 386, no. 3, pp. 537–543, 2009.
[16]
N. Zhang, G. Duan, S. Gao, J. Ruan, and T. Zhang, “Prediction of the parallel/antiparallel orientation of beta-strands using amino acid pairing preferences and support vector machines,” Journal of Theoretical Biology, vol. 263, no. 3, pp. 360–368, 2010.
[17]
E. G. Hutchinson, R. B. Sessions, J. M. Thornton, and D. N. Woolfson, “Determinants of strand register in antiparallel β-sheets of proteins,” Protein Science, vol. 7, no. 11, pp. 2287–2300, 1998.
[18]
J. S. Nowick, “Exploring β-sheet structure and interactions with chemical model systems,” Accounts of Chemical Research, vol. 41, no. 10, pp. 1319–1330, 2008.
[19]
A. G. Cochran, R. T. Tong, M. A. Starovasnik et al., “A minimal peptide scaffold for β-turn display: optimizing a strand position in disulfide-cyclized β-hairpins,” Journal of the American Chemical Society, vol. 123, no. 4, pp. 625–632, 2001.
[20]
S. J. Russell and A. G. Cochran, “Designing stable β-hairpins: energetic contributions from cross-strand residues,” Journal of the American Chemical Society, vol. 122, no. 50, pp. 12600–12601, 2000.
[21]
Y. Dou, P. F. Baisnée, G. Pollastri, Y. Pécout, J. Nowick, and P. Baldi, “ICBS: a database of interactions between protein chains mediated by β-sheet formation,” Bioinformatics, vol. 20, no. 16, pp. 2767–2777, 2004.
[22]
N. Zhang, J. Ruan, J. Wu, and T. Zhang, “Sheetspair: a database of amino acid pairs in protein sheet structures,” Data Science Journal, vol. 6, no. 15, pp. S589–S595, 2007.
[23]
Q. Zhang, S. Yoon, and W. J. Welsh, “Improved method for predicting β-turn using support vector machine,” Bioinformatics, vol. 21, no. 10, pp. 2370–2374, 2005.
[24]
J. Cheng and P. Baldi, “Improved residue contact prediction using support vector machines and a large feature set,” BMC Bioinformatics, vol. 8, article 113, 2007.
[25]
P. Baldi, G. Pollastri, C. A. Andersen, and S. Brunak, “Matching protein beta-sheet partners by feedforward and recurrent neural networks,” in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '00), vol. 8, pp. 25–36, 2000.
[26]
O. Grana, D. Baker, R. M. MacCallum et al., “CASP6 assessment of contact prediction,” Proteins, vol. 61, no. 7, pp. 214–224, 2005.
[27]
I. Halperin, H. Wolfson, and R. Nussinov, “Correlated mutations: advances and limitations. A study on fusion proteins and on the Cohesin-Dockerin families,” Proteins, vol. 63, no. 4, pp. 832–845, 2006.
[28]
P. J. Kundrotas and E. G. Alexov, “Predicting residue contacts using pragmatic correlated mutations method: reducing the false positives,” BMC Bioinformatics, vol. 7, article 503, 2006.
[29]
G. Z. Zhang, D. S. Huang, and Z. H. Quan, “Combining a binary input encoding scheme with RBFNN for globulin protein inter-residue contact map prediction,” Pattern Recognition Letters, vol. 26, no. 10, pp. 1543–1553, 2005.
[30]
J. Cheng and P. Baldi, “Improved residue contact prediction using support vector machines and a large feature set,” BMC Bioinformatics, vol. 8, article 113, 2007.
[31]
C. A. Rohl, C. E. M. Strauss, K. M. S. Misura, and D. Baker, “Protein structure prediction using rosetta,” Methods in Enzymology, vol. 383, pp. 66–93, 2004.
[32]
J. Lee, S. Y. Kim, and J. Lee, “Protein structure prediction based on fragment assembly and parameter optimization,” Biophysical Chemistry, vol. 115, no. 2-3, pp. 209–214, 2005.
[33]
G. Wang and R. L. Dunbrack, “PISCES: a protein sequence culling server,” Bioinformatics, vol. 19, no. 12, pp. 1589–1591, 2003.
[34]
G. Wang and R. L. Dunbrack, “PISCES: recent improvements to a PDB sequence culling server,” Nucleic Acids Research, vol. 33, no. 2, pp. W94–W98, 2005.
[35]
F. Ferron, S. Longhi, B. Canard, and D. Karlin, “A practical overview of protein disorder prediction methods,” Proteins, vol. 65, no. 1, pp. 1–14, 2006.
[36]
R. Linding, L. J. Jensen, F. Diella, P. Bork, T. J. Gibson, and R. B. Russell, “Protein disorder prediction: implications for structural proteomics,” Structure, vol. 11, no. 11, pp. 1453–1459, 2003.
[37]
B. Liu, L. Lin, X. Wang, X. Wang, and Y. Shen, “Protein long disordered region prediction based on profile-level disorder propensities and position-specific scoring matrixes,” in Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM '09), pp. 66–69, November 2009.
[38]
M. Parisien and F. Major, “Ranking the factors that contribute to protein β-sheet folding,” Proteins, vol. 68, no. 4, pp. 824–829, 2007.
[39]
M. S. Searle and B. Ciani, “Design of β-sheet systems for understanding the thermodynamics and kinetics of protein folding,” Current Opinion in Structural Biology, vol. 14, no. 4, pp. 458–464, 2004.
[40]
K. S. Rotondi and L. M. Gierasch, “Local sequence information in cellular retinoic acid-binding protein I: specific residue roles in β-turns,” Biopolymers, vol. 71, no. 6, pp. 638–651, 2003.
[41]
J. Kim, S. R. Brych, J. Lee, T. M. Logan, and M. Blaber, “Identification of a key structural element for protein folding within β-hairpin turns,” Journal of Molecular Biology, vol. 328, no. 4, pp. 951–961, 2003.
[42]
J. Karanicolas and C. L. Brooks, “The structural basis for biphasic kinetics in the folding of the WW domain from a formin-binding protein: lessons for protein design?” Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 7, pp. 3954–3959, 2003.
[43]
K. S. Rotondi, L. F. Rotondi, and L. M. Gierasch, “Native structural propensity in cellular retinoic acid-binding protein I 64–88: the role of locally encoded structure in the folding of a β-barrel protein,” Biophysical Chemistry, vol. 100, no. 1-3, pp. 421–436, 2003.
[44]
Y. Kato, T. Akutsu, and H. Seki, “Dynamic programming algorithms and grammatical modeling for protein beta-sheet prediction,” Journal of Computational Biology, vol. 16, no. 7, pp. 945–957, 2009.
[45]
B. Wathen and Z. Jia, “Protein β-sheet nucleation is driven by local modular formation,” Journal of Biological Chemistry, vol. 285, no. 24, pp. 18376–18384, 2010.