Genome Biology 2010
Out of the sequencer and into the wiki as we face new challenges in genome informatics
DOI: 10.1186/gb-2010-11-10-308
Abstract: Next-generation sequencing (NGS) analysis, open-source software, cloud computing and wiki-style genomics were among the hot topics and discussions at the recent Genome Informatics meeting at the Wellcome Trust Genome Campus, Cambridge, UK. Here we summarize some highlights of the meeting.

Comparison of related genomes can generate a wealth of knowledge about genome evolution and function. Recent advances in NGS technologies have greatly increased the scale and scope with which we can interrogate novel genomes and uncover genetic variation. However, variation detection and statistical analysis are prone to false-positive errors for several reasons, notably incomplete reference genomes, read-mapping errors or limitations, and sequencing-induced artifacts. Benjamin Dickins (Penn State University, University Park, USA) discussed an approach to estimating the accuracy of polymorphism calls from NGS data by deeply sequencing a small plasmid genome and comparing the results with Sanger sequencing.

Elliott Margulies (National Human Genome Research Institute, Bethesda, USA) gave an enticing presentation on this topic entitled 'Analysis of identical twins' genomes reveals sources of false-positive variation detection'. With 55× and 50× read coverage of the two twins' samples, his group initially identified 83,538 discordant genotype calls across 97.6% of the human reference genome. By inspecting a random set of discordantly genotyped positions, he found that the majority occurred in regions with poorly aligned reads. Margulies noted that he would be highly suspicious of genotype calls in regions with high coverage but low mapping scores. When these events were filtered out, the number dropped to 13,140, a reduction of 84%.
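The coverage and mapping-score filter Margulies described can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in the twin study: the DiscordantCall record, the is_suspicious and filter_calls helpers, and the thresholds (depth above twice the mean coverage, mean mapping quality below 20) are all hypothetical choices. The percent_reduction helper reproduces the arithmetic behind the quoted 84% figure.

```python
from dataclasses import dataclass


@dataclass
class DiscordantCall:
    """Hypothetical record for a position where the twins' genotype calls disagree."""
    chrom: str
    pos: int
    depth: int          # total read depth at the position
    mean_mapq: float    # mean mapping quality of the reads covering the position


def is_suspicious(call: DiscordantCall, mean_depth: float,
                  depth_factor: float = 2.0, min_mapq: float = 20.0) -> bool:
    """Flag calls with unusually high coverage AND low mapping scores.

    depth_factor and min_mapq are illustrative placeholders, not values from the talk.
    """
    return call.depth > depth_factor * mean_depth and call.mean_mapq < min_mapq


def filter_calls(calls: list[DiscordantCall], mean_depth: float) -> list[DiscordantCall]:
    """Keep only the discordant calls that are not flagged as suspicious."""
    return [c for c in calls if not is_suspicious(c, mean_depth)]


def percent_reduction(before: int, after: int) -> float:
    """Percentage of calls removed by a filtering step."""
    return 100.0 * (before - after) / before
```

Calling percent_reduction(83538, 13140) gives roughly 84.3, consistent with the reported 84% reduction. Requiring both conditions (high depth and low mapping quality) mirrors the combination Margulies singled out; a call with low mapping quality at ordinary depth is kept by this sketch.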
After further filters were introduced, such as filters for incorrect alignments of short reads across indels, for Q20 (99% confidence) evidence in the other twin and for a 10% allele frequency, Margulies' final number of discordant genotype calls was only in the r