%0 Journal Article
%T Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis
%A John P. Jakupciak
%A Jeffrey M. Wells
%A Richard J. Karalus
%A David R. Pawlowski
%A Jeffrey S. Lin
%A Andrew B. Feldman
%J Journal of Nucleic Acids
%D 2013
%I Hindawi Publishing Corporation
%R 10.1155/2013/801505
%X Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. 1. 
Introduction Genome sequencing data of mixtures can function as biomarkers for identification of genetic content of samples and to establish a sample's genome profile, inclusive of major and minor genome components, drill down to identify SNPs and mutation events, compare relatedness of genetic content between samples, profile-to-profile, and provide a probabilistic or statistical scoring confidence for sample attribution. While high-throughput, automated sequencing has been used for years, analysis of sequencing information has focused on consensus sequencing [1–5]. In addition, sequencing has been used to infer microbial relationships [6–8]. Due to the ease of generating large volumes of sequence data, there has been pressure to develop computational tools [9]. Novel approaches, based on probabilistic analysis of sequencing information for mixtures and metagenomic samples, enable a broad capture of sequence data from a single run to characterize multiple genomes in a sample, even in isolates that are considered pure [10, 11]. When identifying genomes and determining the distribution of related organisms, knowing the populations of genomes in a sample is critical to accurate biomarker detection
%U http://www.hindawi.com/journals/jna/2013/801505/