|
BMC Bioinformatics 2007
Computer-aided identification of polymorphism sets diagnostic for groups of bacterial and viral genetic variants
Abstract: The Not-N algorithm has been incorporated into the "Minimum SNPs" computer program and used to derive genetic markers diagnostic for multilocus sequence typing-defined clonal complexes, hepatitis C virus (HCV) subtypes, and phylogenetic clades defined by comparative genome hybridization (CGH) data for Campylobacter jejuni, Yersinia enterocolitica and Clostridium difficile. Not-N analysis is effective for identifying small sets of genetic markers diagnostic for microbial sub-groups. The best results to date have been obtained with CGH data from several bacterial species, and HCV sequence data.
The last two decades have seen an exponential increase in the generation of comparative genetic data from within bacterial and viral species. Many of the bacterial data sets are derived from electrophoresis-based genotyping methods, such as pulsed-field gel electrophoresis, which has been used to develop the inter-laboratory PulseNet system for real-time monitoring of foodborne bacterial pathogens [1]. More recently, databases of defined genetic polymorphisms have become available. Conspicuous examples are multilocus sequence typing (MLST) databases [2,3], the results of comparative genome hybridization (CGH) studies on bacteria [4-7], and whole-genome sequence databases for bacteria and viruses [8-12].
The extensive knowledge base of comparative genetic information can be exploited to develop rationally-designed genotyping methods for examining epidemiology, or inferring virulence potential, vaccine susceptibility or antimicrobial-antiviral resistance. One approach to discriminating known genotypes within a species is to interrogate every known genetic polymorphism. However, this approach is inefficient due to linkage of alleles, and may also provide more resolving power than is required [13].
Despite considerable improvements in nucleic acid analysis technology in recent years, there remains a need for cost-effective and rapid genotyping methods that interrogate small sets of po
|