In clinical settings it is often important to know not just the identity of a microorganism, but also the danger posed by that particular strain. For instance, Escherichia coli can range from a harmless commensal to a very dangerous enterohemorrhagic (EHEC) strain. Determining pathogenic phenotypes can be both time-consuming and expensive. Here we propose a simple, rapid, and inexpensive method of predicting pathogenic phenotypes on the basis of the presence or absence of short homologous DNA segments in an isolate. Our method compares completely sequenced genomes, without requiring genome alignments, to determine which segments are present or absent in each genome; the result is an automatically aligned binary string that describes each genome. Analysis of this segment alignment identifies those segments whose presence strongly predicts a phenotype. Clinical application of the method requires nothing more than PCR amplification of each of the predictive segments. Here we apply the method to identifying EHEC strains of E. coli and to distinguishing E. coli from Shigella. We show in silico that, with as few as 8 predictive sequences, amplification of even three of them gives a probability >0.99 that the isolate is EHEC or Shigella. The method is thus very robust to occasional amplification failures for spurious reasons. Experimentally, we apply the method to a set of 98 isolates to distinguish E. coli from Shigella, and EHEC from non-EHEC E. coli strains, and show that all isolates are correctly identified.
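The idea of describing each genome as a binary presence/absence string, and then looking for segments whose presence predicts a phenotype, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes exact substring matching as a stand-in for homology detection, and it ranks segments by the fraction of carrying genomes that show the phenotype, which is a criterion chosen here for illustration. All function names, sequences, and labels are hypothetical toy values.

```python
def segment_profile(genome, segments):
    """Binary string for one genome: 1 if the segment occurs, else 0.
    Exact substring matching is a simplification of homology search."""
    return [int(seg in genome) for seg in segments]


def positive_predictive_value(profiles, has_phenotype, index):
    """Fraction of genomes carrying segment `index` that show the phenotype.
    Returns None when no genome carries the segment."""
    carriers = [flag for profile, flag in zip(profiles, has_phenotype)
                if profile[index]]
    if not carriers:
        return None
    return sum(carriers) / len(carriers)


def predictive_segments(profiles, has_phenotype, threshold=0.99):
    """Indices of segments whose presence predicts the phenotype
    at or above `threshold` (an illustrative cutoff)."""
    chosen = []
    for i in range(len(profiles[0])):
        ppv = positive_predictive_value(profiles, has_phenotype, i)
        if ppv is not None and ppv >= threshold:
            chosen.append(i)
    return chosen


# Hypothetical toy data: three candidate segments, four toy genomes.
segments = ["AAAC", "GGGT", "CCTA"]
genomes = [
    "TTAAACTTGGGTTT",  # toy pathogen 1
    "AAACGCGCCCTA",    # toy pathogen 2
    "TTTTCCTATTTT",    # toy commensal 1
    "GCGCGCGC",        # toy commensal 2
]
has_phenotype = [True, True, False, False]

# Every profile has the same length and column order, so the profiles
# are aligned without any genome alignment.
profiles = [segment_profile(g, segments) for g in genomes]
selected = predictive_segments(profiles, has_phenotype)
print(profiles)
print(selected)
```

In this toy set the third segment also occurs in a commensal genome, so only the first two are retained. In a clinical workflow, the retained segments would be the PCR amplification targets.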