Genome Medicine 2012
Improving bioinformatic pipelines for exome variant calling
DOI: 10.1186/gm306
Keywords: Next-generation sequencing, exomes, variant calling, single nucleotide variation, insertion, deletions
Abstract: See research article http://www.biomedcentral.com/1471-2105/13/8.

Next-generation DNA sequencing (NGS) has revolutionized genetics by enabling researchers to routinely sequence genomes, either in their entirety or as specific subsets [1-3]. For example, exome resequencing, in which researchers enrich for all annotated and putative exons and then sequence the genomic targets, has been widely adopted. Exome sequencing has become popular for three reasons: commercial exome enrichment assays are available, it generally costs less than whole-genome sequencing, and it focuses on coding regions and their variants, which directly affect coding sequence and thus gene function. As a result, a large number of studies are using human exome resequencing to study the genetic diversity of human populations. Furthermore, exome resequencing is frequently used in the study of human diseases, including Mendelian disorders and cancer. Given the accessibility of the technology, many groups are working towards potential clinical diagnostic applications in personalized medicine.

Analysis has become one of the primary challenges for NGS users, as a direct result of the sheer volume of sequencing data currently being generated. Exome sequence analysis can generally be summarized as a two-step process: alignment of the data to a human genome reference, followed by genetic variant calling from the post-alignment data or, more simply, the identification of specific sequence alterations that are polymorphisms, rare variants or mutations. Exome-targeted resequencing analysis is particularly useful for the discovery of single nucleotide variants (SNVs) and insertions or deletions (indels).
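To make the second step concrete, the following is a minimal Python sketch of SNV calling from reads that have already been aligned. It is a toy illustration only, not the method of any published caller: the function name `call_snvs`, the `(start, sequence)` read representation, and the `min_depth` and `min_alt_fraction` thresholds are all assumptions introduced here. It ignores base quality, mapping quality, indels and diploid genotype models, all of which real pipelines must handle.

```python
from collections import Counter


def call_snvs(reference, aligned_reads, min_depth=3, min_alt_fraction=0.3):
    """Toy SNV caller over ungapped alignments.

    reference      -- reference sequence as a string
    aligned_reads  -- list of (start, sequence) pairs; start is the 0-based
                      reference position of the first base (hypothetical format)
    Returns a list of (position, ref_base, alt_base, depth, alt_count).
    """
    # Step 1: build a per-position pileup of observed bases.
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    # Step 2: report positions where a non-reference base is well supported.
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # every read agrees with the reference
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_n / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, depth, alt_n))
    return calls


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = [(0, "ACGTAC"), (2, "GTTCGT"), (1, "CGTTCG"), (3, "TTCG")]
    for call in call_snvs(reference, reads):
        print(call)
```

Even in this simplified form, the result depends on arbitrary depth and allele-fraction cutoffs, which hints at why variant calling is harder to get right than placing reads on the reference.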
Although a variety of robust and now widely adopted sequence alignment tools are available, the challenge of variant calling from aligned data remains. Although alignment algorithms can be used to accurately determine the location of any sequence, it is more problematic to dete