Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data

DOI: 10.1186/1751-0473-3-5

A set of tools is provided for searching a large data set of pyrophosphate sequence reads. The tools run under a "live" CD version of Linux on a standard PC, so they can be used by anyone without prior knowledge of Linux and without having to install Linux on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set, and gathering counts of sequences in the reads.

Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension.

The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.

The introduction of micro-fabricated high-density picolitre reactor pyrophosphate sequencing [1,2] by the company 454 Life Sciences (454 Life Sciences Corp., 20 Commercial Street, Branford, Connecticut 06405, USA; hereafter referred to as 454 sequencing) makes large quantities of sequence data available at reasonable cost for the first time. The continual reduction in sequencing cost will encourage sequencing by small groups or individual researchers with modest computer resources and limited experience of bioinformatics tools.

The nature of the data from pyrophosphate sequencing is, however, both qualitatively and quantitatively different from that generated by Sanger sequencing [3] using fluorescent chain-terminating nucleotide analogues [4]. Instead of receiving a single consensus sequence with an associated chromatogram (scf) file, this form of pyrophosphate sequencing generated for us 300 000 short sequence reads, around 100 bases long [5], assembled into several hundred contigs. However the normal system of ch


