oalib
Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
HaMStR: Profile hidden markov model based search for orthologs in ESTs
Ingo Ebersberger, Sascha Strauss, Arndt von Haeseler
BMC Evolutionary Biology , 2009, DOI: 10.1186/1471-2148-9-157
Abstract: We present a novel approach (HaMStR) to mine EST data for the presence of orthologs to a curated set of genes. HaMStR combines a profile hidden Markov model search with a subsequent BLAST search to extend existing ortholog clusters with sequences from further taxa. We show that the HaMStR results are consistent with those obtained with existing orthology prediction methods that require completely sequenced genomes. A case study on the phylogeny of 35 fungal taxa illustrates that HaMStR is well suited to compiling informative data sets for phylogenomic studies from ESTs and protein sequence data. HaMStR extends a pre-defined set of orthologs with ESTs from further taxa in a standardized manner. In the same fashion, HaMStR can be applied to protein sequence data, and thus provides a comprehensive approach to compiling ortholog clusters from any protein-coding data. The resulting orthology predictions serve as the data basis for a variety of evolutionary studies. Here, we have demonstrated the application of HaMStR in a molecular systematics study. However, we envision that studies tracing the evolutionary fate of individual genes or functional complexes of genes will greatly benefit from HaMStR orthology predictions as well. The amount of protein-coding DNA sequence in the public databases is steadily increasing. This data is mainly generated by the sequencing and annotation of entire genomes and by numerous EST sequencing projects. Approaches to resolve the evolutionary relationships of eukaryotes on a molecular basis, frequently referred to as molecular systematics, particularly benefit from this data. Recent studies on the evolution of metazoans and fungi present trees with 40 to 77 taxa, reconstructed from more than 140 genes [1-6]. Still, these studies consider only a small fraction of the available data.
For example, as of May 2008, dbEST contained 714 eukaryotic taxa with more than 2,000 ESTs each, and 394 taxa with more than 10,000 ESTs (http://www.ncbi.nlm.nih.gov/dbEST).
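The two-stage search can be sketched as a filter: a profile HMM built from a core ortholog group proposes candidate sequences, and a candidate is kept only if a reverse BLAST search of it against a reference proteome points back to the core ortholog's own reference protein. Treating the "subsequent BLAST search" as this reciprocity check is our reading, not a description of the HaMStR code; the function name, score threshold and input dictionaries are hypothetical, and the actual HMM and BLAST runs are assumed to have been done beforehand.

```python
from typing import Dict, List, Tuple


def hamstr_like_filter(
    hmm_hits: List[Tuple[str, float]],
    reverse_best_hit: Dict[str, str],
    core_reference_id: str,
    min_hmm_score: float = 0.0,
) -> List[str]:
    """Hypothetical sketch: keep profile-HMM hits that pass a reverse BLAST check.

    hmm_hits: (candidate_id, hmm_score) pairs from a profile HMM search.
    reverse_best_hit: candidate_id -> best BLAST hit in the reference proteome.
    core_reference_id: reference-taxon protein of the core ortholog group.
    """
    accepted: List[str] = []
    # Visit candidates from the best HMM score downwards.
    for candidate, score in sorted(hmm_hits, key=lambda h: h[1], reverse=True):
        if score < min_hmm_score:
            continue
        # Reverse search: the candidate must point back to the core ortholog.
        if reverse_best_hit.get(candidate) == core_reference_id:
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    hits = [("est1", 120.0), ("est2", 95.5), ("est3", 60.0)]
    rbh = {"est1": "refA", "est2": "refB", "est3": "refA"}
    print(hamstr_like_filter(hits, rbh, "refA"))  # ['est1', 'est3']
```

Here `est2` scores well against the profile HMM but its reverse best hit is a different reference protein, so it is rejected as a likely paralog; this is the kind of false positive a second, reciprocal search is meant to remove.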
FitNets: Hints for Thin Deep Nets  [PDF]
Adriana Romero,Nicolas Ballas,Samira Ebrahimi Kahou,Antoine Chassang,Carlo Gatta,Yoshua Bengio
Computer Science , 2014,
Abstract: While depth tends to improve network performance, it also makes gradient-based training more difficult, since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach aims to obtain small, fast-to-execute models, and it has shown that a student network can imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student's intermediate hidden layer will generally be smaller than the teacher's intermediate hidden layer, additional parameters are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This allows one to train deeper students that can generalize better or run faster, a trade-off that is controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with almost 10.4 times fewer parameters outperforms a larger, state-of-the-art teacher network.
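The "additional parameters" can be pictured as a regressor that maps the student's thinner hidden layer up to the teacher's hint dimension, trained with a squared-distance loss; to our understanding the paper uses one half of the squared L2 norm of that difference. The sketch below assumes a plain linear regressor `w_r` on flattened feature vectors for simplicity (for convolutional layers the paper, as we understand it, uses a convolutional regressor instead); the function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: (teacher_dim, student_dim)


def regress(student_hidden: Vector, w_r: Matrix) -> Vector:
    """Map the student's guided-layer activations to the teacher's hint dimension."""
    return [sum(w * h for w, h in zip(row, student_hidden)) for row in w_r]


def hint_loss(teacher_hint: Vector, student_hidden: Vector, w_r: Matrix) -> float:
    """0.5 * ||teacher_hint - r(student_hidden)||^2 for a single example."""
    prediction = regress(student_hidden, w_r)
    if len(prediction) != len(teacher_hint):
        raise ValueError("regressor output must match the teacher hint dimension")
    return 0.5 * sum((t - p) ** 2 for t, p in zip(teacher_hint, prediction))


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]  # wider teacher hint layer (3 units)
    student = [1.0, 1.0]       # thinner student guided layer (2 units)
    w_r = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    print(hint_loss(teacher, student, w_r))  # 0.5
```

As we understand the method, this loss is minimised over the student's lower layers and the regressor first, and the full student is then trained with the usual distillation objective on the teacher's soft outputs.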
Hidden Semi Markov Models for Multiple Observation Sequences: The mhsmm Package for R  [PDF]
Jared O'Connell,Søren Højsgaard
Journal of Statistical Software , 2011,
Abstract: This paper describes the R package mhsmm which implements estimation and prediction methods for hidden Markov and semi-Markov models for multiple observation sequences. Such techniques are of interest when observed data is thought to be dependent on some unobserved (or hidden) state. Hidden Markov models only allow a geometrically distributed sojourn time in a given state, while hidden semi-Markov models extend this by allowing an arbitrary sojourn distribution. We demonstrate the software with simulation examples and an application involving the modelling of the ovarian cycle of dairy cows.
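The geometric restriction follows from the self-transition: if a hidden state persists with probability a at each step, its sojourn D satisfies P(D = d) = a^(d-1) (1 - a) for d >= 1, with mean 1/(1 - a). A small Python check of this (the function names are ours and are not part of mhsmm, which is an R package):

```python
def hmm_sojourn_pmf(a: float, d: int) -> float:
    """P(D = d) for an HMM state with self-transition probability a (d >= 1)."""
    if not 0.0 <= a < 1.0 or d < 1:
        raise ValueError("need 0 <= a < 1 and d >= 1")
    return a ** (d - 1) * (1.0 - a)


def mean_sojourn(a: float) -> float:
    """Closed-form mean of the geometric sojourn: 1 / (1 - a)."""
    return 1.0 / (1.0 - a)


if __name__ == "__main__":
    a = 0.9
    # Truncated numerical mean; the tail beyond d = 2000 is negligible for a = 0.9.
    numeric = sum(d * hmm_sojourn_pmf(a, d) for d in range(1, 2001))
    print(round(numeric, 6), mean_sojourn(a))
```

The pmf is always largest at d = 1 and decays monotonically, so an HMM cannot express, say, a state that typically lasts about 20 steps but rarely fewer than 10. A semi-Markov model replaces this pmf with an arbitrary sojourn distribution per state, which is the extension the abstract describes.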
Evolution of DNA ligases of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes: a case of hidden complexity
Natalya Yutin, Eugene V Koonin
Biology Direct , 2009, DOI: 10.1186/1745-6150-4-51
Abstract: Phylogenetic analysis of the ATP-dependent and NAD-dependent DNA ligases encoded by the NCLDV reveals an unexpectedly complex evolutionary history. The NAD-dependent ligases are encoded by only a minority of NCLDV (including mimiviruses, some iridoviruses and entomopoxviruses), but phylogenetic analysis clearly indicated that all viral NAD-dependent ligases are monophyletic. Combined with the topology of the NCLDV tree derived by consensus of trees for universally conserved genes, this finding suggests that this enzyme was represented in the ancestral NCLDV. Phylogenetic analysis of the ATP-dependent ligases encoded by chordopoxviruses, most of the phycodnaviruses and Marseillevirus failed to demonstrate monophyly and instead revealed an unexpectedly complex evolutionary trajectory. The ligases of the majority of phycodnaviruses and Marseillevirus seem to have evolved from bacteriophage or bacterial homologs; the ligase of one phycodnavirus, Emiliania huxleyi virus, belongs to the eukaryotic DNA ligase I branch; and the ligases of chordopoxviruses unequivocally cluster with eukaryotic DNA ligase III. Examination of phyletic patterns and phylogenetic analysis of DNA ligases of the NCLDV suggest that the common ancestor of the extant NCLDV encoded an NAD-dependent ligase that was most likely acquired from a bacteriophage at the early stages of eukaryotic evolution. By contrast, ATP-dependent ligases from different prokaryotic and eukaryotic sources displaced the ancestral NAD-dependent ligase at different stages of subsequent evolution. These findings emphasize complex routes of viral evolution that become apparent through detailed phylogenomic analysis but not necessarily in reconstructions based on phyletic patterns of genes. This article was reviewed by: Patrick Forterre, George V. Shpakovski, and Igor B. Zhulin. Viruses are ubiquitous parasites of all cellular life forms.
In recent years, extensive genome sequencing and comparative analysis of both viral and host genomes yielded
A Metastate HMM with Application to Gene Structure Identification in Eukaryotes  [cached]
Stephen Winters-Hilt,Carl Baribault
EURASIP Journal on Advances in Signal Processing , 2010,
Abstract: We introduce a generalized-clique hidden Markov model (HMM) and apply it to gene finding in eukaryotes (C. elegans). We demonstrate an HMM structure identification platform that is novel and performs robustly in a number of ways. The generalized-clique HMM begins by enlarging the primitive hidden states associated with the individual base labels (exon, intron, or junk) to substrings of primitive hidden states, or footprint states, having a minimal length greater than the footprint state length. The emissions are likewise expanded to higher order in the fundamental joint probability that is the basis of the generalized-clique, or "metastate", HMM. We then consider application to eukaryotic gene finding and show how such a metastate HMM improves the strength of coding/noncoding-transition contributions to gene-structure identification. We describe situations where the coding/noncoding-transition modeling can effectively recapture the ability to model heavy-tailed exon and intron distributions, as well as manage the exon-start needle-in-the-haystack problem. In analysis of the C. elegans genome, we show that the sensitivity and specificity (SN, SP) results for both the individual-state and full-exon predictions are greatly enhanced over the standard HMM when using the generalized-clique HMM.
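In gene finding, sensitivity and specificity are often computed per base with the convention SN = TP/(TP+FN) and SP = TP/(TP+FP), so "specificity" there corresponds to what other fields call precision. Whether the paper uses exactly this convention is an assumption. The sketch below scores a hypothetical per-base label string using the abstract's labels (e = exon, i = intron, j = junk), treating exon as the positive class; the function name is ours.

```python
from typing import Tuple


def base_level_sn_sp(truth: str, predicted: str, positive: str = "e") -> Tuple[float, float]:
    """Per-base SN = TP/(TP+FN) and SP = TP/(TP+FP), gene-finding convention."""
    if len(truth) != len(predicted):
        raise ValueError("label strings must have equal length")
    tp = fn = fp = 0
    for t, p in zip(truth, predicted):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1  # coding base missed by the prediction
        elif p == positive:
            fp += 1  # non-coding base called coding
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


if __name__ == "__main__":
    print(base_level_sn_sp("eeeiiieejj", "eeiiieeejj"))  # (0.8, 0.8)
```

Full-exon SN/SP follow the same pattern, except that a true positive is an entire exon whose start and end are both predicted exactly.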
Prediction of State of Wireless Network Using Markov and Hidden Markov Model  [cached]
MD. Osman Gani,Hasan Sarwar,Chowdhury Mofizur Rahman
Journal of Networks , 2009, DOI: 10.4304/jnw.4.10.976-984
Abstract: Optimal resource allocation and higher quality of service are much-needed requirements for wireless networks. Intelligent prediction of network behavior plays a very important role in improving both. Markov models (MM) and hidden Markov models (HMM) are proven prediction techniques used in many fields. In this paper, we use Markov and hidden Markov prediction tools to predict the number of wireless devices connected to a specific access point (AP) at a specific instant of time. Prediction is performed in two stages. In the first stage, we find the state sequence of wireless access points in a wireless network by observing the traffic load sequence over time. We find that a particular choice of data may lead to 91% accuracy in predicting the real scenario. In the second stage, we use a Markov model to find the future state sequence of the sequence found in the first stage. The prediction of the next state of an AP by the Markov tool shows 88.71% accuracy. The Markov model can predict with an accuracy of 95.55% if the initial transition matrix is calculated directly. We also show that a first-order (O(1)) Markov model gives slightly better accuracy than a second-order (O(2)) model when predicting far into the future.
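The second stage, a first-order Markov predictor, can be sketched as follows, assuming the access-point states have already been discretised into labels (here the hypothetical levels "low" and "high") and that the transition matrix is estimated by counting consecutive state pairs. The abstract does not say how the "directly calculated" transition matrix was obtained, so the counting estimate and all function names are our assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional

Transitions = Dict[str, Dict[str, float]]


def estimate_transitions(states: List[str]) -> Transitions:
    """First-order transition probabilities estimated by counting consecutive pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


def predict_next(P: Transitions, current: str) -> Optional[str]:
    """Most probable next state; None if `current` was never seen with a successor."""
    row = P.get(current)
    if not row:
        return None
    return max(row, key=row.get)


def one_step_accuracy(states: List[str], P: Transitions) -> float:
    """Fraction of consecutive pairs whose second state matches the prediction."""
    pairs = list(zip(states, states[1:]))
    if not pairs:
        return 0.0
    hits = sum(predict_next(P, a) == b for a, b in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    load = ["low", "low", "high", "high", "high", "low", "low"]
    P = estimate_transitions(load)
    print(P["low"], predict_next(P, "low"), one_step_accuracy(load, P))
```

The accuracy here is measured on the same sequence used for estimation, so it is optimistic; a fair evaluation would estimate `P` on one stretch of the trace and score predictions on a later one. A second-order model would key the counts on the last two states instead of one.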
Hidden markov model for the prediction of transmembrane proteins using MATLAB  [cached]
Navaneet Chaturvedi,Vinay Kumar Singh,Sudhanshu Shanker,Dhiraj Sinha
Bioinformation , 2011,
Abstract: Because membrane proteins play a key role in drug targeting, transmembrane protein prediction is an active and challenging area of the biological sciences. Location-based prediction of transmembrane proteins is significant for the functional annotation of protein sequences. Hidden Markov model based methods have been widely applied to transmembrane topology prediction. Here we present a revised model that offers a better understanding than an existing one for transmembrane protein prediction. MATLAB scripts were written and compiled to estimate the model parameters, and the model was applied to amino acid sequences to locate transmembrane segments and their adjacent regions. The estimated model of transmembrane topology was based on the TMHMM model architecture. Only 7 super states are defined in the given dataset; these were converted to 96 states on the basis of their length in the sequence. The prediction accuracy of the model was about 74%, which is reasonably good for transmembrane topology prediction. We therefore conclude that the hidden Markov model plays a crucial role in transmembrane helix prediction on the MATLAB platform and could also be useful for drug discovery strategies.
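Applying such a model to a sequence usually means Viterbi decoding: finding the single most probable state path. The sketch below is not the paper's 96-state model; it is a hypothetical three-state toy (inside loop `i`, membrane `M`, outside loop `o`) over a two-letter alphabet (`H` hydrophobic, `P` polar) with made-up probabilities, meant only to show how a hydrophobic stretch gets labelled as membrane. It is written in Python rather than MATLAB.

```python
import math
from typing import Dict, List, Sequence

Probs = Dict[str, Dict[str, float]]


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq: str, states: Sequence[str], start: Dict[str, float],
            trans: Probs, emit: Probs) -> List[str]:
    """Most probable state path for `seq`, computed in log space."""
    scores = [{s: _log(start[s]) + _log(emit[s][seq[0]]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(seq)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor r for state s at position t.
            prev = max(states, key=lambda r: scores[t - 1][r] + _log(trans[r].get(s, 0.0)))
            scores[t][s] = (scores[t - 1][prev] + _log(trans[prev].get(s, 0.0))
                            + _log(emit[s][seq[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical toy topology model (not the paper's parameters).
STATES = ("i", "M", "o")
START = {"i": 0.5, "M": 0.0, "o": 0.5}
TRANS = {
    "i": {"i": 0.8, "M": 0.2},
    "M": {"M": 0.8, "i": 0.1, "o": 0.1},
    "o": {"o": 0.8, "M": 0.2},
}
EMIT = {
    "i": {"H": 0.2, "P": 0.8},
    "M": {"H": 0.95, "P": 0.05},
    "o": {"H": 0.2, "P": 0.8},
}

if __name__ == "__main__":
    print("".join(viterbi("PPPHHHHHHPPP", STATES, START, TRANS, EMIT)))
```

Working in log space avoids numerical underflow on long protein sequences; forbidden transitions (such as `i` to `o` without crossing the membrane) simply score minus infinity.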
Two electrons in an external oscillator potential: hidden algebraic structure  [PDF]
Alexander Turbiner
Physics , 1994, DOI: 10.1103/PhysRevA.50.5335
Abstract: It is shown that the Coulomb correlation problem for a system of two electrons (two charged particles) in an external oscillator potential possesses a hidden $sl_2$-algebraic structure, making it one of the recently discovered quasi-exactly-solvable problems. The origin of the existing exact solutions to this problem, recently found by several authors, is explained. A degeneracy of energies in the electron-electron and electron-positron correlation problems is found. This marks the first appearance of a hidden $sl_2$-algebraic structure in atomic physics.
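For orientation, a sketch of the setting, assuming the standard form of the problem in atomic units with oscillator frequency $\omega$ (a symbol introduced here): the Hamiltonian separates into centre-of-mass and relative motion, and only the relative part carries the Coulomb term. The second line gives the standard first-order differential-operator realization of $sl_2$ used in quasi-exactly-solvable problems; it preserves the space of polynomials of degree at most $n$ (for example, $J^+ t^n = n t^{n+1} - n t^{n+1} = 0$). How the paper maps the relative-motion equation onto these generators is not reproduced here.

```latex
H = -\tfrac{1}{2}\nabla_1^2 - \tfrac{1}{2}\nabla_2^2 + \tfrac{\omega^2}{2}\left(r_1^2 + r_2^2\right) + \frac{1}{|\mathbf{r}_1 - \mathbf{r}_2|}
  = \underbrace{-\tfrac{1}{4}\nabla_R^2 + \omega^2 R^2}_{\text{centre of mass}}
  + \underbrace{-\nabla_r^2 + \tfrac{\omega^2}{4} r^2 + \frac{1}{r}}_{\text{relative motion}},
\qquad \mathbf{R} = \tfrac{1}{2}(\mathbf{r}_1 + \mathbf{r}_2), \quad \mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2

J^+ = t^2 \frac{d}{dt} - n\,t, \qquad J^0 = t \frac{d}{dt} - \frac{n}{2}, \qquad J^- = \frac{d}{dt}
```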
Prediction of protein binding sites in protein structures using hidden Markov support vector machine
Bin Liu, Xiaolong Wang, Lei Lin, Buzhou Tang, Qiwen Dong, Xuan Wang
BMC Bioinformatics , 2009, DOI: 10.1186/1471-2105-10-381
Abstract: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine shows better performance than some state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. The improved prediction performance and computational efficiency of the method based on the hidden Markov support vector machine can be attributed to three factors. Firstly, the relation between labels of neighbouring residues is useful for protein binding site prediction. Secondly, the kernel trick is very advantageous in this field. Thirdly, by using the cutting-plane algorithm, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples. Identification of protein binding sites has a significant impact on understanding protein function. Development of fast and accurate computational methods for protein binding site prediction helps identify functionally important amino acid residues and facilitates experimental efforts to catalogue protein interactions. It also plays a key role in enhancing computational docking studies, drug design and functional annotation of structurally determined proteins with unknown function [1]. Protein binding site prediction has been studied for decades [2-4]. Several machine learning methods have been applied in this field.
These methods can be split into two groups: classificati
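Treating binding site prediction as sequential labelling under a maximum-margin criterion can be illustrated with the structured hinge loss that max-margin sequence labellers commonly minimise: the true labelling should outscore every other labelling by at least its Hamming distance. The sketch assumes a linear-chain score built from hypothetical per-residue label scores (`emit`) and neighbouring-label scores (`trans`), with labels 0 = non-binding and 1 = binding. Brute-force enumeration stands in for the Viterbi-style loss-augmented inference a real implementation would use; the function names are ours.

```python
from itertools import product
from typing import List, Sequence

Scores = List[List[float]]  # emit[t][label] or trans[prev_label][label]


def sequence_score(labels: Sequence[int], emit: Scores, trans: Scores) -> float:
    """Linear-chain score: per-residue label scores plus neighbouring-label scores."""
    score = sum(emit[t][y] for t, y in enumerate(labels))
    score += sum(trans[a][b] for a, b in zip(labels, labels[1:]))
    return score


def structured_hinge(y_true: Sequence[int], emit: Scores, trans: Scores,
                     n_labels: int = 2) -> float:
    """max_y [Hamming(y_true, y) + score(y)] - score(y_true), by brute force."""
    true_score = sequence_score(y_true, emit, trans)
    best = float("-inf")
    for y in product(range(n_labels), repeat=len(y_true)):
        hamming = sum(a != b for a, b in zip(y_true, y))
        best = max(best, hamming + sequence_score(y, emit, trans))
    return best - true_score


if __name__ == "__main__":
    emit = [[5.0, 0.0], [0.0, 5.0], [0.0, 5.0]]  # residue 0 non-binding, 1-2 binding
    trans = [[0.0, 0.0], [0.0, 0.0]]
    print(structured_hinge((0, 1, 1), emit, trans))  # 0.0: margin already satisfied
```

The `trans` table is where "the relation between labels of neighbouring residues" enters: a positive `trans[1][1]`, for instance, rewards contiguous runs of binding residues.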
Nonparametric Bayesian Approaches to Non-homogeneous Hidden Markov Models  [PDF]
Abhra Sarkar,Anindya Bhadra,Bani K. Mallick
Statistics , 2012,
Abstract: In this article, a flexible Bayesian nonparametric model is proposed for non-homogeneous hidden Markov models. The model is developed by combining the ideas of hidden Markov models and predictor-dependent stick-breaking processes. Computation is carried out using an auxiliary variable representation of the model, which enables exact MCMC sampling from the posterior. Furthermore, the model is extended to the situation in which the predictors can simultaneously influence the transition dynamics of the hidden states as well as the emission distribution. Estimates of few-step-ahead conditional predictive distributions of the response are used as performance diagnostics for these models. The proposed methodology is illustrated through simulation experiments as well as the analysis of a real data set concerned with the prediction of rainfall-induced malaria epidemics.
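Predictor-dependent stick-breaking can be sketched with a logistic link and a finite truncation: each component takes a covariate-dependent fraction of the stick left over by the components before it. The abstract does not give the link function or truncation, so the logistic choice, `alphas`, `betas` and the function names are our assumptions. In a non-homogeneous HMM, one such weight vector per current hidden state could serve as that state's row of the transition matrix at covariate value x.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stick_breaking_weights(x: float, alphas: Sequence[float],
                           betas: Sequence[float]) -> List[float]:
    """Covariate-dependent stick-breaking weights with a logistic link.

    Component k takes the fraction v_k(x) = logistic(alphas[k] + betas[k] * x)
    of the stick left over by components 0..k-1; the last weight is the remainder.
    """
    weights: List[float] = []
    remaining = 1.0
    for a, b in zip(alphas, betas):
        v = logistic(a + b * x)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: final component absorbs what is left
    return weights


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, [round(w, 3) for w in stick_breaking_weights(x, [0.0, 0.5], [1.5, -1.0])])
```

Because each weight depends on x, the mixture (or transition) probabilities shift smoothly with the predictor, for example rainfall, while still summing to one.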
Copyright © 2008-2017 Open Access Library. All rights reserved.