|
BMC Bioinformatics 2010
An algorithm for automated closure during assembly

Abstract: A procedure called the bounding read algorithm was developed for assembly of shotgun reads plus finishing reads and their constraints, targeting repeat regions. The algorithm was implemented within the Celera Assembler software and its pyrosequencing-specific variant, CABOG. The implementation was tested on Sanger and pyrosequencing data from six genomes. The bounding read assemblies were compared to assemblies from two other methods on the same data. The algorithm generates improved assemblies of repeat regions, closing and tiling some gaps while degrading none.

The algorithm is useful for small-genome automated finishing projects. Our implementation is available as open source from http://wgs-assembler.sourceforge.net under the GNU Public License.

The shotgun method generates reads randomly and in high volumes on Sanger and next-generation sequencing platforms. Whole-genome shotgun assembly (WGA) is the process of constructing a draft assembly of a genome from whole-genome shotgun reads (WGS). WGA software constructs a read layout by inference from shared sequence between reads and from constraints between pairs of reads from the same DNA fragment (paired-ends). The randomness of WGS can be exploited in software by adopting uniformity of read coverage as an objective function to be maximized by the assembly. For instance, the Celera Assembler software [1] invokes the A-stat coverage statistic to assign lower confidence to higher-coverage mini-assemblies. The Velvet software [2] uses low coverage as a criterion to trim branches of its de Bruijn graph.

Finishing is the process of improving the quality and utility of a draft genome sequence. Finishing aims to fill gaps between contigs, enlarge contigs, or provide deeper coverage for the contigs in the draft. Some finishing is accomplished without sequencing by manually editing an automatically generated draft.
Most finishing requires additional sequence referred to as finishing reads. Finishing reads derive from PCR, primer walking,
|