%0 Journal Article %T Error statistics of hidden Markov model and hidden Boltzmann model results %A Lee A Newberg %J BMC Bioinformatics %D 2009 %I BioMed Central %R 10.1186/1471-2105-10-212 %X Here we present a novel general approach to estimating these false positive and true positive rates that is significantly more efficient than are existing general approaches. We validate the technique via an implementation within the HMMER 3.0 package, which scans DNA or protein sequence databases for patterns of interest, using a profile-HMM. The new approach is faster than general naïve sampling approaches, and more general than other current approaches. It provides an efficient mechanism by which to estimate error statistics for hidden Markov model and hidden Boltzmann model results. Hidden Markov models are employed in a wide variety of fields, including speech recognition, econometrics, computer vision, signal processing, cryptanalysis, and computational biology. In speech recognition, hidden Markov models can be used to distinguish one word from another based upon the time series of certain qualities of a sound [1]. In finance, the models can be used to simulate the unknown transitions between low, medium, and high debt default regimes in time [2]. In computer vision they can be used to decode American Sign Language (ASL) [3]. Hidden Markov models are used in computational biology to find similarity between sequences of nucleotides (DNA or RNA) or polypeptides (proteins) [4,5] and to predict protein structure [6]. Hidden Markov models permit the facile description and implementation of powerful statistical models and algorithms that are used for calculation of the probability of sequential data. Furthermore, the algorithms used to manipulate hidden Markov models are easily applied more generally. 
Frequently these dynamic programming algorithms are instead employed in the calculation of an odds ratio, which is the probability of the sequential data under a foreground model (signal) divided by the probability of the sequential data under a background model (noise). In other applications, the algorithms are used to compute other scores, frequently empl %U http://www.biomedcentral.com/1471-2105/10/212