High-throughput sequencing technologies have made it possible to study bacteria by analyzing their genome sequences. For instance, comparative genome sequence analyses can reveal phenomena such as gene loss, gene gain, and gene exchange in a genome. Analyses of pathogenic bacterial genomes show that the pathogenic genomic regions of many pathogenic bacteria were horizontally transferred from other bacteria; these regions are known as pathogenicity islands (PAIs). PAIs have detectable properties, such as genomic signatures that differ from those of the rest of the host genome and mobility genes that allow them to be integrated into the host genome. In this review, we discuss various pathogenicity island-associated features and current computational approaches for the identification of PAIs. We also discuss existing pathogenicity island databases and related computational resources, which researchers may find useful for studies of bacterial evolution and pathogenicity mechanisms.