%0 Journal Article
%T NLStradamus: a simple Hidden Markov Model for nuclear localization signal prediction
%A Alex N Nguyen Ba
%A Anastassia Pogoutse
%A Nicholas Provart
%A Alan M Moses
%J BMC Bioinformatics
%D 2009
%I BioMed Central
%R 10.1186/1471-2105-10-202
%X In this paper, we present an analysis of characterized NLSs in yeast, and find, despite the large number of nuclear import pathways, that NLSs seem to show similar patterns of amino acid residues. We test current prediction methods and observe a low true positive rate. We therefore suggest an approach using hidden Markov models (HMMs) to predict novel NLSs in proteins. We show that our method is able to consistently find 37% of the NLSs with a low false positive rate and that our method retains its true positive rate outside of the yeast data set used for the training parameters. Our implementation of this model, NLStradamus, is made available at: http://www.moseslab.csb.utoronto.ca/NLStradamus/ Eukaryotic cells are defined by the presence of their nucleus. The nuclear membrane enclosing the genetic material of the cell is selective in its import of material through its nuclear pores and this translocation is mediated by cellular mechanisms [1,2]. Proteins entering the nucleus must do so through proteins forming the nuclear pores: the nuclear pore complex [3,4]. The pores allow the passive diffusion of small proteins, but bigger proteins entering the nucleus are usually bound by karyopherin complexes on their nuclear localization signal [5]. Although there are many nuclear import pathways in eukaryotic cells, most of these have not been characterized in detail. The best understood is the classical NLS pathway. The recognition of classical NLSs on nuclear proteins is done by the importin-α subunit, which in turn is recognized by the importin-β subunit. This trimer (cargo, importin-α and importin-β) is then imported to the nucleus after a series of enzymatic steps [1,6].
Other families of NLSs are independent of importin-α, and may bind directly to one of the members of the importin-β superfamily [1]. Classical NLSs show characteristic patterns of basic residues loosely matching two consensus sequences, K(K/R)X(K/R) and KRX10–12KRXK, termed the 'monopartite' and 'bip
%U http://www.biomedcentral.com/1471-2105/10/202