Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources
Mario Stanke, Oliver Schöffmann, Burkhard Morgenstern, Stephan Waack
BMC Bioinformatics, 2006, DOI: 10.1186/1471-2105-7-62
Abstract: We present a fairly general method for integration of external information. Our method is based on the evaluation of hints to potentially protein-coding regions by means of a Generalized Hidden Markov Model (GHMM) that takes both intrinsic and extrinsic information into account. We used this method to extend the ab initio gene prediction program AUGUSTUS to a versatile tool that we call AUGUSTUS+. In this study, we focus on hints derived from matches to an EST or protein database, but our approach can be used to include arbitrary user-defined hints. Our method is only moderately affected by the length of a database match. Further, it exploits the information that can be derived from the absence of such matches. As a special case, AUGUSTUS+ can predict genes under user-defined constraints, e.g. if the positions of certain exons are known. With hints from EST and protein databases, our new approach was able to predict 89% of the exons in human chromosome 22 correctly. Sensitive probabilistic modeling of extrinsic evidence such as sequence database matches can increase gene prediction accuracy. When a match of a sequence interval to an EST or protein sequence is used, it should be treated as compound information rather than as information about individual positions. Finding protein-coding genes in eukaryotic genomic sequences with in-silico methods remains an important challenge in computational genomics, despite many years of intensive research work. Existing methods fall into two groups with respect to the data they utilize. The first group consists of ab initio programs which use only the query genomic sequence as input. Examples are the programs GENSCAN [1], AUGUSTUS [2] and HMMGene [3], which are HMM-based, and GENEID [4]. The second group of gene-finding methods, extrinsic methods, comprises all programs which use data other than the query genomic sequence. Some extrinsic methods use genomic sequences from other species.
A cross-species comparison of genomic sequences
Vertebrate gene finding from multiple-species alignments using a two-level strategy
David Carter, Richard Durbin
Genome Biology, 2006, DOI: 10.1186/gb-2006-7-s1-s6
Abstract: We describe DOGFISH, a vertebrate gene finder consisting of a cleanly separated site classifier and structure predictor. The classifier scores potential splice sites and other features, using sequence alignments between multiple vertebrate species, while the structure predictor hypothesizes coding transcripts by combining these scores using a simple model of gene structure. This also identifies and assigns confidence scores to possible additional exons. Performance is assessed on the ENCODE regions. We predict transcripts and exons across the whole human genome, and identify over 10,000 high confidence new coding exons not in the Ensembl gene set. We present a practical multiple species gene prediction method. Accuracy improves as additional species, up to at least eight, are introduced. The novel predictions of the whole-genome scan should support efficient experimental verification. Gene finding can usefully be viewed as a two-level task. At the lower or local level there is a classification task: one of assigning probability estimates to potential features such as splice sites and coding start and stop sites on the basis of sequence information associated with each potential feature. At the higher or global level, on the other hand, we have a structure-building task: finding the most probable way(s) to combine potential features into exons, transcripts and genes. Classification and structure building are very different tasks, and although a gene finder can be based on a single formalism, such as hidden Markov models (HMMs) [1,2], there is no reason to assume that the same technique will be optimal for both tasks.
Although HMMs seem to offer a good basis for structure building, they impose independence assumptions that are not particularly well suited to feature classification; formalisms such as neural networks [3,4], maximum entropy modeling [5], Bayesian networks [6-8], support vector machines [9-11] and relevance vector machines (RVMs) [12-14] provide alternativ
Detection and characterization of regulatory elements using probabilistic conditional random field and hidden Markov models
Hongyan Wang, Xiaobo Zhou
Chinese Journal of Cancer, 2013, DOI: 10.5732/cjc.012.10112
Abstract: By altering the electrostatic charge of histones or providing binding sites for protein recognition molecules, chromatin marks have been proposed to regulate gene expression, a property that has motivated researchers to link these marks to cis-regulatory elements. With the help of next-generation sequencing technologies, we can now correlate one specific chromatin mark with regulatory elements (e.g. enhancers or promoters) and also build tools, such as hidden Markov models, to gain insight into mark combinations. However, hidden Markov models are limited by their generative nature and by the assumption that the current observation depends only on the current hidden state in the chain. Here, we employed two graphical probabilistic models, namely the linear conditional random field model and the multivariate hidden Markov model, to mark gene regions with different states based on the recurrent and spatially coherent character of these eight marks. Both models revealed chromatin states that may correspond to enhancers and promoters, transcribed regions, transcriptional elongation, and low-signal regions. We also found that the linear conditional random field model was more effective than the hidden Markov model in recognizing regulatory elements, such as promoter-, enhancer-, and transcriptional elongation-associated regions, making it the better choice.
Efficient decoding algorithms for generalized hidden Markov model gene finders
William H Majoros, Mihaela Pertea, Arthur L Delcher, Steven L Salzberg
BMC Bioinformatics, 2005, DOI: 10.1186/1471-2105-6-16
Abstract: As a first step toward addressing the implementation challenges of these next-generation systems, we describe in detail two software architectures for GHMM-based gene finders, one comprising the common array-based approach, and the other a highly optimized algorithm which requires significantly less memory while achieving virtually identical speed. We then show how both of these architectures can be accelerated by a factor of two by optimizing their content sensors. We finish with a brief illustration of the impact these optimizations have had on the feasibility of our new homology-based gene finder, TWAIN. In describing a number of optimizations for GHMM-based gene finders and making available two complete open-source software systems embodying these methods, it is our hope that others will be more enabled to explore promising extensions to the GHMM framework, thereby improving the state-of-the-art in gene prediction techniques. Generalized Hidden Markov Models have seen wide use in recent years in the field of computational gene prediction. A number of ab initio gene-finding programs are now available which utilize this mathematical framework internally for the modeling and evaluation of gene structure [1-6], and newer systems are now emerging which expand this framework by simultaneously modeling two genomes at once, in order to harness the mutually informative signals present in homologous gene structures from recently diverged species. As greater numbers of such genomes become available, it is tempting to consider the possibility of integrating all this information into increasingly complex models of gene structure and evolution. Notwithstanding our eagerness to utilize this expected flood of genomic data, methods have yet to be demonstrated which can perform such large-scale parallel analyses without requiring inordinate computational resources.
In the case of Generalized Pair HMMs (GPHMMs), for example, the only systems in existence of which we are familiar make
A Metastate HMM with Application to Gene Structure Identification in Eukaryotes
Stephen Winters-Hilt, Carl Baribault
EURASIP Journal on Advances in Signal Processing, 2010
Abstract: We introduce a generalized-clique hidden Markov model (HMM) and apply it to gene finding in eukaryotes (C. elegans). We demonstrate an HMM structure identification platform that is novel and robustly performing in a number of ways. The generalized-clique HMM begins by enlarging the primitive hidden states associated with the individual base labels (as exon, intron, or junk) to substrings of primitive hidden states, or footprint states, having a minimal length greater than the footprint state length. The emissions are likewise expanded to higher order in the fundamental joint probability that is the basis of the generalized-clique, or "metastate", HMM. We then consider application to eukaryotic gene finding and show how such a metastate HMM improves the strength of coding/noncoding-transition contributions to gene-structure identification. We describe situations where the coding/noncoding-transition modeling can effectively recapture the exon and intron heavy-tail distribution modeling capability as well as manage the exon-start needle-in-the-haystack problem. In analysis of the C. elegans genome we show that the sensitivity and specificity (SN, SP) results for both the individual-state and full-exon predictions are greatly enhanced over the standard HMM when using the generalized-clique HMM.
Prediction of State of Wireless Network Using Markov and Hidden Markov Model
MD. Osman Gani, Hasan Sarwar, Chowdhury Mofizur Rahman
Journal of Networks, 2009, DOI: 10.4304/jnw.4.10.976-984
Abstract: Optimal resource allocation and high quality of service are much-needed requirements in wireless networks. To improve these factors, intelligent prediction of network behavior plays a very important role. The Markov Model (MM) and Hidden Markov Model (HMM) are proven prediction techniques used in many fields. In this paper, we have used Markov and Hidden Markov prediction tools to predict the number of wireless devices that are connected to a specific Access Point (AP) at a specific instant of time. Prediction has been performed in two stages. In the first stage, we found the state sequence of wireless access points (APs) in a wireless network by observing the traffic load sequence over time. We found that a particular choice of data may lead to 91% accuracy in predicting the real scenario. In the second stage, we used a Markov Model to predict the future state sequence from the sequence found in the first stage. The prediction of the next state of an AP performed by the Markov tool shows 88.71% accuracy. We found that the Markov Model can predict with an accuracy of 95.55% if the initial transition matrix is calculated directly. We have also shown that an O(1) Markov Model gives slightly better accuracy than an O(2) MM when predicting the far future.
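The second stage described in this abstract, predicting the next state of an AP from a transition matrix, can be illustrated with a minimal sketch. It assumes a first-order Markov chain whose transition probabilities are estimated by counting adjacent state pairs; the state labels, the sample sequence, and the function names `fit_transition_matrix` and `predict_next` are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def fit_transition_matrix(states):
    """Estimate first-order transition probabilities by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def predict_next(transitions, current):
    """Return the most probable successor of `current`, or None if it was never seen."""
    row = transitions.get(current)
    if not row:
        return None
    return max(row, key=row.get)


# Hypothetical load levels of one access point over consecutive time slots.
observed = ["low", "low", "high", "high", "high", "low", "low", "high"]
transitions = fit_transition_matrix(observed)
next_state = predict_next(transitions, observed[-1])
```

A second-order model would condition on the last two states instead of one; the sketch above covers only the first-order case.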
Tuberculosis Surveillance Using a Hidden Markov Model
A Rafei, E Pasha, R Jamshidi Orak
Iranian Journal of Public Health, 2012
Abstract: Background: Routinely collected data from tuberculosis surveillance system can be used to investigate and monitor the irregularities and abrupt changes of the disease incidence. We aimed at using a Hidden Markov Model in order to detect the abnormal states of pulmonary tuberculosis in Iran. Methods: Data for this study were the weekly number of newly diagnosed cases with sputum smear-positive pulmonary tuberculosis reported between April 2005 and March 2011 throughout Iran. In order to detect the unusual states of the disease, two Hidden Markov Models were applied to the data with and without seasonal trends as baselines. Consequently, the best model was selected and compared with the results of Serfling epidemic threshold which is typically used in the surveillance of infectious diseases. Results: Both adjusted R-squared and Bayesian Information Criterion (BIC) reflected better goodness-of-fit for the model with seasonal trends (0.72 and -1336.66, respectively) than the model without seasonality (0.56 and -1386.75). Moreover, according to the Serfling epidemic threshold, higher values of sensitivity and specificity suggest a higher validity for the seasonal model (0.87 and 0.94, respectively) than model without seasonality (0.73 and 0.68, respectively). Conclusion: A two-state Hidden Markov Model along with a seasonal trend as a function of the model parameters provides an effective warning system for the surveillance of tuberculosis.
Robotic Behavior Prediction Using Hidden Markov Models
Alan J. Hamlet, Carl D. Crane
Computer Science, 2014
Abstract: There are many situations in which it would be beneficial for a robot to have predictive abilities similar to those of rational humans. Some of these situations include collaborative robots, robots in adversarial situations, and dynamic obstacle avoidance. This paper presents an approach to modeling behaviors of dynamic agents in order to empower robots with the ability to predict the agent's actions and identify the behavior the agent is executing in real time. The method of behavior modeling implemented uses hidden Markov models (HMMs) to model the unobservable states of the dynamic agents. The background and theory of the behavior modeling is presented. Experimental results of realistic simulations of a robot predicting the behaviors and actions of a dynamic agent in a static environment are presented.
GuoQing Yin, Dietmar Bruckner
International Journal of Electronic Commerce Studies, 2012
Abstract: In an Ambient Assisted Living (AAL) project, the activities of the user are analyzed. The raw data come from a motion detector. Through data processing, the large amount of dynamic raw data was translated into state data. A hidden Markov model and the forward algorithm were then used to analyze these state data and build a daily activity model of the user. Next, by comparing the model with observed activity sequences and finding the similarities between them, the best-adapted routine in the model was identified. Furthermore, an activity routine net was built and compared with the hidden Markov model.
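The forward algorithm mentioned in this abstract computes the probability of an observation sequence under a hidden Markov model. The following is a minimal sketch, not the authors' implementation: the two states, the motion-detector symbols, and every probability below are illustrative assumptions.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations | model) using the forward algorithm."""
    # alpha[s] = probability of the observations seen so far, ending in state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state activity model; all numbers are illustrative.
states = ("resting", "active")
start_p = {"resting": 0.6, "active": 0.4}
trans_p = {
    "resting": {"resting": 0.7, "active": 0.3},
    "active": {"resting": 0.4, "active": 0.6},
}
emit_p = {
    "resting": {"no_motion": 0.9, "motion": 0.1},
    "active": {"no_motion": 0.2, "motion": 0.8},
}

likelihood = forward_likelihood(
    ["no_motion", "motion", "motion"], states, start_p, trans_p, emit_p
)
```

Under these assumptions, a sequence that scores a higher likelihood is closer to the modeled routine, which is one plausible way to compare a model with observed activity sequences.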
Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments
Brian J Haas, Steven L Salzberg, Wei Zhu, Mihaela Pertea, Jonathan E Allen, Joshua Orvis, Owen White, C Robin Buell, Jennifer R Wortman
Genome Biology, 2008, DOI: 10.1186/gb-2008-9-1-r7
Abstract: Accurate and comprehensive gene discovery in eukaryotic genome sequences requires multiple independent and complementary analysis methods including, at the very least, the application of ab initio gene prediction software and sequence alignment tools. The problem is technically challenging, and despite many years of research no single method has yet been able to solve it, although numerous tools have been developed to target specialized and diverse variations on the gene finding problem (for review [1,2]). Conventional gene finding software employs probabilistic techniques such as hidden Markov models (HMMs). These models are employed to find the most likely partitioning of a nucleotide sequence into introns, exons, and intergenic states according to a prior set of probabilities for the states in the model. Such gene finding programs, including GENSCAN [3], GlimmerHMM [4], Fgenesh [5], and GeneMark.hmm [6], are effective at identifying individual exons and regions that correspond to protein-coding genes, but nevertheless they are far from perfect at correctly predicting complete gene structures, differing from correct gene structures in exon content or position [7-10]. The correct gene structures, or individual components including introns and exons, are often apparent from spliced alignments of homologous transcript or protein sequences. Many software tools are available that perform these alignment tasks. Tools used to align expressed sequence tags (ESTs) and full-length cDNAs (FL-cDNAs) to genomic sequence include EST_GENOME [11], AAT [12], sim4 [13], geneseqer [14], BLAT [15], and GMAP [16], among numerous others. The list of programs that perform spliced alignments of protein sequences to DNA is much shorter, including the multifunctional AAT, exonerate [17], and PMAP (derived from GMAP).
An extension of spliced protein alignment that includes a probabilistic model of eukaryotic gene structure is implemented in GeneWise [18], a popular homology-based gene predict
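The abstract above describes HMM gene finders as searching for the most likely partitioning of a nucleotide sequence into exon, intron, and intergenic states. A standard way to compute such a partitioning for a plain HMM is Viterbi decoding, sketched below under loud assumptions: the three-state model, its transition and emission probabilities, and the example sequence are invented for illustration and do not reproduce any of the named programs, which use richer models (for example explicit length distributions and splice-site sensors).

```python
import math


def viterbi(sequence, states, start_p, trans_p, emit_p):
    """Return the most probable state path for `sequence` (log-space Viterbi)."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any path that ends in state s.
    score = {s: log(start_p[s]) + log(emit_p[s][sequence[0]]) for s in states}
    backpointers = []
    for symbol in sequence[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + log(trans_p[p][s]))
            pointers[s] = best_prev
            new_score[s] = (
                score[best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][symbol])
            )
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state to recover the full path.
    path = [max(states, key=score.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: exons are GC-rich, introns AT-rich, intergenic uniform.
states = ("intergenic", "exon", "intron")
start_p = {"intergenic": 0.8, "exon": 0.1, "intron": 0.1}
trans_p = {
    "intergenic": {"intergenic": 0.9, "exon": 0.1, "intron": 0.0},
    "exon": {"intergenic": 0.1, "exon": 0.8, "intron": 0.1},
    "intron": {"intergenic": 0.0, "exon": 0.1, "intron": 0.9},
}
emit_p = {
    "intergenic": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "exon": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "intron": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}

dna = "ATGCGCGCTTAT"
path = viterbi(dna, states, start_p, trans_p, emit_p)  # one state label per base
```

Working in log space avoids numerical underflow on long sequences, and zero-probability transitions are mapped to negative infinity so that they are never chosen.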