|
Genome Biology 2010
Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly
DOI: 10.1186/gb-2010-11-12-r128
Abstract: Next-generation sequencing technologies promise a revolution in our ability to understand the architecture of the human genome, and to decipher how this architecture contributes to disease [1]. This understanding depends on our ability to accurately detect differences between individuals on a genome-wide scale. Although nucleotide-level variants such as SNPs and insertions/deletions (indels) are numerous, large structural variants, such as deletions, duplications and inversions, affect more sequence, and as much as 15% of the human genome falls into copy number variable regions [1]. Many of the software packages currently available to detect structural variants (SVs) employ algorithms that use data derived from the mapping of paired-end sequence reads, using anomalously mapped read pairs as a means of detecting and cataloguing these variants. Deletions, for example, are detected when the distance between mapped paired-end reads is significantly larger than expected from the insert size distribution of other mapped read pairs from the same sequencing library. Similarly, inversions may be identified when both reads of a pair are mapped to the same strand of the reference genome. Examples of software using this approach include BreakDancer [2] and VariationHunter [3]. Other software packages such as Pindel [4] apply a split-mapping approach, in which one end of a pair of sequence reads is mapped uniquely to the genome and acts as an anchor, while the other end is mapped so as to detect the SV breakpoint. A third approach used to detect SVs involves ascertaining changes in read depth coverage, which reflect gains and losses in sequence copy number. Calling variants in this way will report regions of the reference genome that appear to be duplicated or deleted.
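The read-depth idea can be illustrated with a minimal sketch. It is not the method of any package named in the text: the function names (`call_depth_changes`, `merge_calls`), the fixed window size, and the ratio thresholds of 0.5 and 1.5 are all assumptions chosen for illustration, and real callers use statistical models rather than hard cut-offs.

```python
from statistics import median


def call_depth_changes(window_depths, loss_ratio=0.5, gain_ratio=1.5):
    """Label each fixed-size window as 'loss', 'gain' or 'normal'.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    calls = []
    for depth in window_depths:
        ratio = depth / baseline  # depth relative to the typical window
        if ratio <= loss_ratio:
            calls.append("loss")
        elif ratio >= gain_ratio:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


def merge_calls(calls, window_size):
    """Merge runs of adjacent identical non-normal windows.

    Returns (start, end, label) tuples in reference coordinates,
    with 0-based start and exclusive end.
    """
    regions = []
    start = None
    current = "normal"
    # A trailing 'normal' sentinel flushes a run that reaches the last window.
    for i, label in enumerate(calls + ["normal"]):
        if label != current:
            if current != "normal":
                regions.append((start * window_size, i * window_size, current))
            start, current = i, label
    return regions


if __name__ == "__main__":
    depths = [30, 14, 12, 31, 62, 29]  # hypothetical mean depth per 1 kb window
    print(merge_calls(call_depth_changes(depths), window_size=1000))
```

The reported intervals are positions on the reference where coverage departs from the baseline.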
This analysis, however, will not report the precise location of the duplicated sequence. Several algorithms have been developed for calling copy number variants in this way, including cnD, which applies a hidden Markov
|