%0 Journal Article %T SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler %A Ruibang Luo %A Binghang Liu %A Yinlong Xie %A Zhenyu Li %A Weihua Huang %A Jianying Yuan %A Guangzhu He %A Yanxiang Chen %A Qi Pan %A Yunjie Liu %A Jingbo Tang %A Gengxiong Wu %A Hao Zhang %A Yujian Shi %A Yong Liu %A Chang Yu %A Bo Wang %A Yao Lu %A Changlei Han %A David W Cheung %A Siu-Ming Yiu %A Shaoliang Peng %A Zhu Xiaoqian %A Guangming Liu %A Xiangke Liao %A Yingrui Li %A Huanming Yang %A Jian Wang %A Tak-Wah Lam %A Jun Wang %J GigaScience %D 2012 %I BioMed Central %R 10.1186/2047-217x-1-18 %X To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes. Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption. The increased use of next generation sequencing (NGS) has resulted in growth in the number of de novo genome assemblies being carried out using short reads. Although there are several de novo assemblers available, there remains room for improvement, as shown in recent assembly evaluation projects such as Assemblathon 1 [1] and GAGE [2].
Since the publication of the first version of SOAPdenovo [3], it has been used to assemble many large eukaryotic genomes, but reports have indicated areas that would benefit from updates, including assembly coverage and length [4,5]. SOAPdenovo2, as with SOAPdenovo, is made up of six modules that handle read error correction, de Bruijn graph (DBG) construction, contig assembly, paired-end (PE) read mapping, scaffold construction, and gap closure. The major improvements we have made in SOAPdenovo2 are: 1) enhancing the error correction algorithm, 2) reducing memory consumption in DBG construction, 3) resolving longer repeat regions in contig assembly, 4) increasing assembly length and cover %K Genome %K Assembly %K Contig %K Scaffold %K Error correction %K Gap-filling %U http://www.gigasciencejournal.com/content/1/1/18