%0 Journal Article %T State of the art de novo assembly of human genomes from massively parallel sequencing data %A Yingrui Li %A Yujie Hu %A Lars Bolund %A Jun Wang %J Human Genomics %D 2010 %I BioMed Central %R 10.1186/1479-7364-4-4-271 %X One of the important goals of bioinformatics is to decipher the genomic DNA sequence of a species. The genome serves as the digital basis of any life science. Access to a reference genome sequence for a species significantly facilitates biological studies, as proven by all the genomics-guided research in the wake of the Human Genome Project [1]. It is conventionally believed that when a reference genome is available, any following studies will take a mapping-based 're-sequencing' approach aiming for variation detection, as seen in many projects of human genomics [2,3]. Recent studies, however, suggest that assembly-based approaches have greater potential to detect a more complete set of genetic variations, especially novel sequences [4] and structural variations [5], even in relatively well-studied human genomes. Thus, assembly of individual genomes has again been brought to the frontier of bioinformatics. With multiple assembled individual genomes available, it would be very interesting to see how rearrangements of different length scales and individual-specific sequences are distributed in populations. Because of the size of the human genome, the cost of conventional Sanger sequencing has constrained the assembly of individual human genomes. Second-generation sequencing technology produces large amounts of data more affordably, but its intrinsically high throughput and short read length present considerable challenges to bioinformatics because of the difficulties in handling the data structure and in applying an appropriate assembly algorithm. Although many short-read de novo assemblers have been developed [6], only two of them, ABySS [7] and SOAPdenovo [8], are said to be capable of assembling human genomes de novo. 
This paper presents a review of the two software packages and discusses the technical aspects of human genome short-read de novo assembly. To be able to use short-read-length data to meet the general minimum requirement of overlap between reads in order accurately to assemble a h %K de novo assembly %K de Bruijn graph %K massively parallel sequencing %U http://www.humgenomics.com/content/4/4/271