%0 Journal Article %T On the Variability of Neural Network Classification Measures in the Protein Secondary Structure Prediction Problem %A Eric Sakk %A Ayanna Alexander %J Applied Computational Intelligence and Soft Computing %D 2013 %I Hindawi Publishing Corporation %R 10.1155/2013/794350 %X We revisit the protein secondary structure prediction problem using linear and backpropagation neural network architectures commonly applied in the literature. In this context, neural network mappings are constructed between protein training set sequences and their assigned structure classes in order to analyze the class membership of test data and associated measures of significance. We present numerical results demonstrating that classifier performance measures can vary significantly depending upon the classifier architecture and the structure class encoding technique. Furthermore, an analytic formulation is introduced in order to substantiate the observed numerical data. Finally, we analyze and discuss the ability of the neural network to accurately model fundamental attributes of protein secondary structure. 1. Introduction The protein secondary structure prediction problem can be phrased as a supervised pattern recognition problem [1–5] for which training data is readily available from reliable databases such as the Protein Data Bank (PDB) or CB513 [6]. Based upon training examples, subsequences derived from primary sequences are encoded according to a discrete set of classes. For instance, three-class encodings are commonly applied in the literature in order to numerically represent the secondary structure set (alpha helix, beta sheet, coil) [7–11]. By applying a pattern recognition approach, subsequences of unknown classification can then be tested to determine the structure class to which they belong.
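The encoding step described above can be sketched as follows. This is a minimal illustration, not the paper's actual scheme: the window length, the one-hot residue encoding, and the function names are assumptions chosen for clarity, and the three classes are labeled H (helix), E (sheet), and C (coil) as is common in the literature.

```python
# Illustrative sketch of window-based encoding for three-class secondary
# structure prediction; names and parameters are assumptions, not from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
CLASSES = {"H": 0, "E": 1, "C": 2}     # alpha helix, beta sheet, coil

def one_hot_window(window: str) -> list[int]:
    """Concatenate a 20-dimensional one-hot vector for each residue."""
    vec = []
    for aa in window:
        col = [0] * len(AMINO_ACIDS)
        col[AMINO_ACIDS.index(aa)] = 1
        vec.extend(col)
    return vec

def windows(sequence: str, size: int = 13):
    """Yield contiguous subsequences of odd length `size` (no terminal padding)."""
    for i in range(len(sequence) - size + 1):
        yield sequence[i:i + size]
```

Each window is thus mapped to a fixed-length numeric vector, and the class of the central residue supplies the training target.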
Phrased in this way, backpropagation neural networks [7, 12–14] and variations on the neural network theme [8, 10, 11, 15–18] have been applied to the secondary structure prediction problem with varied success. Furthermore, many tools currently applying hybrid methodologies, such as PredictProtein [19, 20], JPRED [8, 17, 21], SCRATCH [22, 23], and PSIPRED [24, 25], rely on the neural network paradigm as part of their prediction scheme. One of the main reasons for applying the neural network approach in the first place is that such networks tend to be good universal approximators [26–30] and, theoretically, have the potential to create secondary structure models. In other words, after a given network architecture has been chosen and presented with a robust set of examples, the optimal parameters associated with the trained network, in principle, define an explicit function that can map a given protein sequence to its associated secondary structure. If the structure predicted by the network function is generally correct and consistent for an arbitrary input %U http://www.hindawi.com/journals/acisc/2013/794350/
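The idea of a trained network defining an explicit classification function can be illustrated with the simplest of the architectures the abstract mentions, the linear one. The sketch below is a hedged toy example, not the authors' implementation: it trains a softmax (multinomial logistic) classifier by gradient descent on encoded windows, and all names, learning rates, and epoch counts are illustrative assumptions.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train(samples, labels, n_in, n_out=3, lr=0.5, epochs=200):
    """Gradient descent on cross-entropy loss; returns weights and biases.

    samples: list of feature vectors (e.g., one-hot encoded windows)
    labels:  integer class indices in {0, ..., n_out - 1}
    """
    random.seed(0)
    W = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k]
                         for k in range(n_out)])
            for k in range(n_out):
                g = p[k] - (1.0 if k == y else 0.0)  # dL/dz_k for cross-entropy
                b[k] -= lr * g
                for j in range(n_in):
                    W[k][j] -= lr * g * x[j]
    return W, b

def predict(W, b, x):
    """The trained parameters define an explicit input-to-class function."""
    z = [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(W))]
    return max(range(len(z)), key=z.__getitem__)
```

After training, `predict` is exactly the kind of explicit mapping the passage describes: a fixed function, determined by the learned parameters, from an encoded sequence window to one of the three structure classes.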