Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Protein secondary structure prediction for a single-sequence using hidden semi-Markov models
Zafer Aydin, Yucel Altunbasak, Mark Borodovsky
BMC Bioinformatics, 2006, DOI: 10.1186/1471-2105-7-178
Abstract: In this paper, we further refine and extend the hidden semi-Markov model (HSMM) initially considered in the BSPSS algorithm. We introduce an improved residue dependency model by considering the patterns of statistically significant amino acid correlation at structural segment borders. We also derive models that specialize on different sections of the dependency structure and incorporate them into HSMM. In addition, we implement an iterative training method to refine estimates of HSMM parameters. The three-state-per-residue accuracy and other accuracy measures of the new method, IPSSP, are shown to be comparable to or better than those for BSPSS as well as for PSIPRED, tested under the single-sequence condition. We have shown that new dependency models and training methods bring further improvements to single-sequence protein secondary structure prediction. The results are obtained under cross-validation conditions using a dataset with no pair of sequences having significant sequence similarity. As new sequences are added to the database it is possible to augment the dependency structure and obtain even higher accuracy. Current and future advances should contribute to the improvement of function prediction for orphan proteins inscrutable to current similarity search methods. Accurate prediction of the regular elements of protein 3D structure is important for precise prediction of the whole 3D structure. A protein secondary structure prediction algorithm assigns to each amino acid a structural state from a 3-letter alphabet {H, E, L} representing the α-helix, β-strand and loop, respectively.
Prediction of function via sequence similarity search for new proteins (function annotation transfer) should be facilitated by a more accurate prediction of secondary structure since structure is more conserved than sequence. Algorithms of protein secondary structure prediction frequently employ neural networks [1-7], support vector machines [8-13] and hidden Markov models [14-16]. Param
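The three-state-per-residue accuracy mentioned in the abstract above is commonly computed as the fraction of residues whose predicted state agrees with the reference assignment. The following is a minimal sketch under that assumption, using the {H, E, L} alphabet from the abstract; the function name `q3_accuracy` is illustrative and is not taken from the paper.

```python
def q3_accuracy(predicted: str, reference: str) -> float:
    """Fraction of residues whose predicted state (H, E or L) matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    allowed = set("HEL")
    if not (set(predicted) <= allowed and set(reference) <= allowed):
        raise ValueError("states must come from the alphabet {H, E, L}")
    # Count positions where the two state strings agree.
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Seven of the eight residues agree in this toy example.
example_q3 = q3_accuracy("HHHEELLL", "HHHEELLH")
```
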
Understanding Protein Dynamics with L1-Regularized Reversible Hidden Markov Models
Robert T. McGibbon, Bharath Ramsundar, Mohammad M. Sultan, Gert Kiss, Vijay S. Pande
Quantitative Biology, 2014
Abstract: We present a machine learning framework for modeling protein dynamics. Our approach uses L1-regularized, reversible hidden Markov models to understand large protein datasets generated via molecular dynamics simulations. Our model is motivated by three design principles: (1) the requirement of massive scalability; (2) the need to adhere to relevant physical law; and (3) the necessity of providing accessible interpretations, critical for both cellular biology and rational drug design. We present an EM algorithm for learning and introduce a model selection criterion based on the physical notion of convergence in relaxation timescales. We contrast our model with standard methods in biophysics and demonstrate improved robustness. We implement our algorithm on GPUs and apply the method to two large protein simulation datasets generated respectively on the NCSA Bluewaters supercomputer and the Folding@Home distributed computing network. Our analysis identifies the conformational dynamics of the ubiquitin protein critical to cellular signaling, and elucidates the stepwise activation mechanism of the c-Src kinase protein.
Evaluating eukaryotic secreted protein prediction
Eric W Klee, Lynda BM Ellis
BMC Bioinformatics, 2005, DOI: 10.1186/1471-2105-6-256
Abstract: Prediction accuracies were evaluated using 372 unbiased, eukaryotic, SwissProt protein sequences. TargetP, SignalP 3.0 maximum S-score and SignalP 3.0 D-score were the most accurate single scores (90–91% accurate). The combination of a positive TargetP prediction, SignalP 2.0 maximum Y-score, and SignalP 3.0 maximum S-score increased accuracy by six percent. Single predictive scores could be highly accurate, but almost all accuracies were slightly less than those reported by program authors. Predictive accuracy could be substantially improved by combining scores from multiple methods into a single composite prediction. Predicting secreted proteins from primary sequence is a major component of automated protein annotation and is critical to a wide range of studies. Embryology, tumor marker detection, and agricultural animal performance are investigated using eukaryotic secreted proteins and their role in cell-to-cell communication, cellular differentiation, morphological development, and cellular response to disease. Many software tools have been developed for ab initio cellular localization prediction, using machine learning techniques such as neural networks, hidden Markov models and support vector machines. Identifying the program best suited for a researcher's needs requires familiarity with several different programs. Prediction accuracy depends on the methods employed by a program and the integrity of the data used to develop the program. Additionally, unbiased comparison using an independent protein sequence set is needed to compare programs, as system characteristics reported by program authors are often inflated [1]. The ambiguity of terminology used to describe and label secreted proteins often results in confusion on just what type of protein is being predicted or discussed. To eliminate this confusion, biologically concrete labels will be used in lieu of the term "secreted protein" or "secretory protein", here.
Proteins possessing an N-terminal signal sequenc
SVM-Based Prediction of Propeptide Cleavage Sites in Spider Toxins Identifies Toxin Innovation in an Australian Tarantula
Emily S. W. Wong, Margaret C. Hardy, David Wood, Timothy Bailey, Glenn F. King
PLOS ONE, 2013, DOI: 10.1371/journal.pone.0066279
Abstract: Spider neurotoxins are commonly used as pharmacological tools and are a popular source of novel compounds with therapeutic and agrochemical potential. Since venom peptides are inherently toxic, the host spider must employ strategies to avoid adverse effects prior to venom use. It is partly for this reason that most spider toxins encode a protective proregion that upon enzymatic cleavage is excised from the mature peptide. In order to identify the mature toxin sequence directly from toxin transcripts, without resorting to protein sequencing, the propeptide cleavage site in the toxin precursor must be predicted bioinformatically. We evaluated different machine learning strategies (support vector machines, hidden Markov model and decision tree) and developed an algorithm (SpiderP) for prediction of propeptide cleavage sites in spider toxins. Our strategy uses a support vector machine (SVM) framework that combines both local and global sequence information. Our method is superior or comparable to current tools for prediction of propeptide sequences in spider toxins. Evaluation of the SVM method on an independent test set of known toxin sequences yielded 96% sensitivity and 100% specificity. Furthermore, we sequenced five novel peptides (not used to train the final predictor) from the venom of the Australian tarantula Selenotypus plumipes to test the accuracy of the predictor and found 80% sensitivity and 99.6% 8-mer specificity. Finally, we used the predictor together with homology information to predict and characterize seven groups of novel toxins from the deeply sequenced venom gland transcriptome of S. plumipes, which revealed structural complexity and innovations in the evolution of the toxins. The precursor prediction tool (SpiderP) is freely available on ArachnoServer (http://www.arachnoserver.org/spiderP.htm?l), a web portal to a comprehensive relational database of spider toxins. 
All training data, test data, and scripts used are available from the SpiderP website.
A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes
Anne-Kathrin Schultz, Ming Zhang, Thomas Leitner, Carla Kuiken, Bette Korber, Burkhard Morgenstern, Mario Stanke
BMC Bioinformatics, 2006, DOI: 10.1186/1471-2105-7-265
Abstract: We developed a jumping profile Hidden Markov Model (jpHMM), a probabilistic generalization of the jumping-alignment approach. Given a partition of the aligned input sequence family into known sequence subtypes, our model can jump between states corresponding to these different subtypes, depending on which subtype is locally most similar to a database sequence. Jumps between different subtypes are indicative of intersubtype recombinations. We applied our method to a large set of genome sequences from human immunodeficiency virus (HIV) and hepatitis C virus (HCV) as well as to simulated recombined genome sequences. Our results demonstrate that jumps in our jumping profile HMM often correspond to recombination breakpoints; our approach can therefore be used to detect recombinations in genomic sequences. The recombination breakpoints identified by jpHMM were found to be significantly more accurate than breakpoints defined by traditional methods based on comparing single representative sequences. Profile Hidden Markov Models [1] are a popular way of modelling nucleic-acid or protein sequence families for database searching; see [2] for a review. Like other Hidden Markov Models (HMMs), profile HMMs consist of so-called states that can emit symbols of the underlying alphabet, i.e. nucleotides or amino acids [3]. Transitions are possible between these states, and a DNA or protein sequence is thought to be generated by a path Q through the model beginning with a special begin state and ending with an end state. There are probabilities (a) for possible transitions from one state to another and (b) for the emission of symbols at a given state. The states together with the possible transitions between them are called the topology of the model while the corresponding transition and emission probabilities are called its parameters. A sequence S is generated by the model with a certain probability P(S).
In general, a sequence S can be generated by more than one path Q through the mo
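Because a sequence S can arise from more than one path Q, P(S) is a sum over paths, and the forward algorithm is the standard way to compute that sum without enumerating them. The sketch below illustrates it on a deliberately simplified HMM: it uses an initial state distribution in place of the explicit begin and end states described above, and the two states and all probabilities (`START`, `TRANS`, `EMIT`) are invented for illustration and do not come from the jpHMM paper.

```python
# Toy two-state HMM over the DNA alphabet; all numbers are illustrative.
START = {"s1": 0.5, "s2": 0.5}
TRANS = {
    "s1": {"s1": 0.9, "s2": 0.1},
    "s2": {"s1": 0.2, "s2": 0.8},
}
EMIT = {
    "s1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "s2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def forward_probability(seq, start, trans, emit):
    """Return P(S): the probability of seq summed over all state paths."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    states = list(start)
    # alpha[s] = probability of emitting the prefix seen so far and ending in s.
    alpha = {s: start[s] * emit[s][seq[0]] for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: emit[t][symbol] * sum(alpha[s] * trans[s][t] for s in states)
            for t in states
        }
    return sum(alpha.values())


p_acgt = forward_probability("ACGT", START, TRANS, EMIT)
```

The running time is proportional to the sequence length times the square of the number of states, whereas the number of distinct paths grows exponentially with the sequence length.
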
Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy
Zheng Yang
BMC Bioinformatics, 2009, DOI: 10.1186/1471-2105-10-361
Abstract: A new approach has been developed for predicting sulfotyrosine sites using the random forest algorithm after a careful evaluation of seven machine learning algorithms. Peptides are formed by consecutive residues symmetrically flanking tyrosine sites. They are then encoded using an amino acid hydrophobicity scale. This new approach has increased the sensitivity by 22%, the specificity by 3%, and the total prediction accuracy by 10% compared with the previous predictor using the same blind data. Meanwhile, both negative and positive predictive powers have been increased by 9%. In addition, the random forest model has an excellent feature for ranking the residues flanking tyrosine sites, hence providing more information for further investigating the tyrosine sulfation mechanism. A web tool has been implemented at http://ecsb.ex.ac.uk/sulfotyrosine for public use. The random forest algorithm is able to deliver a better model compared with the Hidden Markov Model, the support vector machine, artificial neural networks, and others for predicting sulfotyrosine sites. The success shows that the random forest algorithm together with an amino acid hydrophobicity scale encoding can be a good candidate for peptide classification. Tyrosine sulfation is a posttranslational modification (PTM), which introduces a sulfate group to a tyrosine residue in a protein [1-3]. During the modification process, sulfation is catalysed by tyrosylprotein sulfotransferase [4]. A targeted tyrosine for sulfation is normally required to be exposed on a protein surface [5]. Previous studies have indicated that sulfation is an important anticipator for extracellular protein-protein interactions [6,7]. Studies have shown that sulfation is related to various diseases when a malfunction of a cellular activity occurs.
For instance, sulfotyrosine can alter the affinity in some chemokine receptors leading to a downstream signalling cascade which affects the cells involved in acute and chronic events o
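The encoding step described in the abstract above (peptides of residues symmetrically flanking each tyrosine, mapped through a hydrophobicity scale) can be sketched as follows. The abstract does not say which scale, window size, or edge handling the author used, so this sketch assumes the Kyte-Doolittle scale, a hypothetical `flank` of 4 residues per side, and zero padding beyond the sequence ends; the function name is illustrative as well.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_tyrosine_windows(sequence, flank=4, pad=0.0):
    """Return (position, feature_vector) for every tyrosine in the sequence.

    Each vector holds 2 * flank + 1 hydrophobicity values centred on the
    tyrosine; positions beyond the sequence ends are filled with `pad`.
    """
    windows = []
    for pos, residue in enumerate(sequence):
        if residue != "Y":
            continue
        vector = [
            KYTE_DOOLITTLE[sequence[k]] if 0 <= k < len(sequence) else pad
            for k in range(pos - flank, pos + flank + 1)
        ]
        windows.append((pos, vector))
    return windows


# One tyrosine at index 1 with a single flanking residue on each side.
example_windows = encode_tyrosine_windows("AYG", flank=1)
```

Vectors of this kind could then be passed to any classifier, for example a random forest, with each vector labelled by whether its central tyrosine is sulfated.
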
A conformation ensemble approach to protein residue-residue contact
Jesse Eickholt, Zheng Wang, Jianlin Cheng
BMC Structural Biology, 2011, DOI: 10.1186/1472-6807-11-38
Abstract: We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method is able to achieve accuracies of .48 for the top L/5 medium range contacts and .36 for the top L/5 long range contacts for CASP8 targets (L being the target domain length). When applied to targets from CASP9, the accuracies of the top L/5 medium and long range contact predictions were .34 and .30, respectively. When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means to identify medium and long range residue-residue contacts. An immediate benefit of the method is that when tied with a scoring scheme, it can be used to successfully rank models. Even after many years of intense attention and development, de novo protein structure prediction remains a difficult and open problem. In part, this is due to the inadequacy of current de novo sampling techniques which are incapable of guiding the folding process through such a vast conformational space [1-3]. To address this issue, several have proposed the use of long range contacts to reduce the size of the conformational search space. Studies have shown that with as few as L/8 long-range contacts (L being the sequence length) proteins can be folded and moderate resolution models generated [4,5]. Additional uses of protein residue-residue contacts include applications such as model evaluation, model selection and ranking [6-8], and drug design [9]. Given the importance and applicability of protein contacts, considerable effort has been put forth to develop methods which can predict protein residue-residue contacts. The majority of these methods can be categorized into three groups based on machine learning, templates or correlated mutations.
Machine learning approaches make predictions by employing techniques such as neural networks, support vector machines or hidden Markov models trained on contacts from experimental structure
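The "top L/5" accuracies quoted in the abstract above can be illustrated with a short sketch: rank the predicted contacts in a given sequence-separation range by score, keep the best L/5, and report the fraction that are true contacts. The separation thresholds are left as parameters because the abstract does not state them (a common CASP convention, assumed here only in the docstring, is 12 to 23 residues for medium range and 24 or more for long range). The function name and the choice to divide by the number of contacts actually selected are also assumptions.

```python
def top_l5_accuracy(predictions, true_contacts, length, min_sep, max_sep=None):
    """Accuracy of the top L/5 predicted contacts within a separation range.

    predictions   : iterable of (i, j, score) tuples
    true_contacts : iterable of (i, j) residue pairs observed in the structure
    length        : L, the target (domain) length
    min_sep       : minimum |i - j| (e.g. 12 for medium, 24 for long range)
    max_sep       : optional maximum |i - j| (e.g. 23 for medium range)
    """
    in_range = [
        (i, j, score)
        for i, j, score in predictions
        if abs(i - j) >= min_sep and (max_sep is None or abs(i - j) <= max_sep)
    ]
    # Highest-scoring predictions first, then keep the top L/5 (at least one).
    in_range.sort(key=lambda p: p[2], reverse=True)
    top = in_range[: max(1, length // 5)]
    if not top:
        raise ValueError("no predictions fall in the requested range")
    # Treat contacts as unordered pairs so (i, j) and (j, i) are the same.
    truth = {frozenset(pair) for pair in true_contacts}
    hits = sum(frozenset((i, j)) in truth for i, j, _ in top)
    return hits / len(top)


# Tiny illustration with L = 10, so the top L/5 = 2 predictions are scored.
toy_predictions = [(1, 9, 0.9), (2, 8, 0.8), (1, 6, 0.1), (4, 5, 0.95)]
toy_truth = [(9, 1), (1, 6)]
toy_accuracy = top_l5_accuracy(toy_predictions, toy_truth, length=10, min_sep=3)
```
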
Hidden Markov model for the prediction of transmembrane proteins using MATLAB
Navaneet Chaturvedi, Vinay Kumar Singh, Sudhanshu Shanker, Dhiraj Sinha
Bioinformation, 2011
Abstract: Because membrane proteins play a key role in drug targeting, transmembrane protein prediction is an active and challenging area of the biological sciences. Location-based prediction of transmembrane proteins is significant for the functional annotation of protein sequences. Hidden Markov model based methods have been widely applied to transmembrane topology prediction. Here we present a revised model that is easier to understand than an existing one for transmembrane protein prediction. MATLAB scripts were written and compiled to estimate the model parameters, and the model was applied to amino acid sequences to identify transmembrane regions and their adjacent locations. The estimated model of transmembrane topology is based on the TMHMM model architecture. Only 7 super states are defined in the given dataset, which were converted to 96 states on the basis of their length in sequence. The prediction accuracy of the model was about 74%, which is good enough for transmembrane topology prediction. We therefore conclude that the hidden Markov model plays a crucial role in transmembrane helix prediction on the MATLAB platform and could also be useful for drug discovery strategies.
A Dirichlet process mixture of hidden Markov models for protein structure prediction
Kristin P. Lennox, David B. Dahl, Marina Vannucci, Ryan Day, Jerry W. Tsai
Statistics, 2010, DOI: 10.1214/09-AOAS296
Abstract: By providing new insights into the distribution of a protein's torsion angles, recent statistical models for this data have pointed the way to more efficient methods for protein structure prediction. Most current approaches have concentrated on bivariate models at a single sequence position. There is, however, considerable value in simultaneously modeling angle pairs at multiple sequence positions in a protein. One area of application for such models is in structure prediction for the highly variable loop and turn regions. Such modeling is difficult due to the fact that the number of known protein structures available to estimate these torsion angle distributions is typically small. Furthermore, the data is "sparse" in that not all proteins have angle pairs at each sequence position. We propose a new semiparametric model for the joint distributions of angle pairs at multiple sequence positions. Our model accommodates sparse data by leveraging known information about the behavior of protein secondary structure. We demonstrate our technique by predicting the torsion angles in a loop from the globin fold family. Our results show that a template-based approach can now be successfully extended to modeling the notoriously difficult loop and turn regions.
Machine learning approach to predict protein phosphorylation sites by incorporating evolutionary information
Ashis Biswas, Nasimul Noman, Abdur Sikder
BMC Bioinformatics, 2010, DOI: 10.1186/1471-2105-11-273
Abstract: Experimental results based on cross validations and an independent benchmark reveal the significance of using the evolutionary information alone to classify phosphorylation sites from protein sequences. The prediction performance of the proposed system is better than those of the existing prediction systems that also do not incorporate kinase information. The system is also comparable to systems that incorporate kinase information in predicting such sites. The approach presented in this paper provides an efficient way to identify phosphorylation sites in a given protein primary sequence, which would be valuable information for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing generalized prediction systems for post-translational modifications like phosphorylation or glycosylation. PPRED is publicly available at the URL http://www.cse.univdhaka.edu/~ashis/ppred/index.php. One of the most critical cellular phenomena is phosphorylation of proteins, as it is involved in signal transduction of various processes including cell cycle, proliferation and apoptosis [1-3]. This phenomenon is catalyzed by protein kinases affecting certain acceptor residues (Serine, Threonine and Tyrosine) in substrate sequences. A study on 2D-gel electrophoresis showed that 30-50% of the proteins in a eukaryotic cell had undergone phosphorylation [4]. So, accurate prediction of the phosphorylation sites of eukaryotic proteins may help in understanding the overall intracellular activities. Both experimental and computational methods have been developed to investigate the phosphorylation sites. But in vivo and in vitro methods are often time-consuming, expensive and have very limited scope due to some restrictions for many enzymatic reactions. On the other hand, in silico prediction of phosphorylation sites from computational approaches may provide fast and automatic annotations for candidate phosphorylation sites.
Besides, there are
