%0 Journal Article
%T PAVE: Program for assembling and viewing ESTs
%A Carol Soderlund
%A Eric Johnson
%A Matthew Bomhoff
%A Anne Descour
%J BMC Genomics
%D 2009
%I BioMed Central
%R 10.1186/1471-2164-10-400
%X The PAVE (Program for Assembling and Viewing ESTs) assembler takes advantage of the 5' and 3' mate-pair information by requiring that the mate-pairs be assembled into the same contig and joined by n's if the two sub-contigs do not overlap. It handles the depth of 454 data sets by "burying" similar ESTs during assembly, which retains the expression level information while circumventing time and space problems. PAVE uses MegaBLAST for the clustering step and CAP3 for assembly; however, it assembles incrementally to enforce the mate-pair constraint, bury ESTs, and reduce incorrect joins and splits. The PAVE data management system uses a MySQL database to store multiple libraries of ESTs along with their metadata; the management system allows multiple assemblies with variations on libraries and parameters. Analysis routines provide standard annotation for the contigs, including a measure of differentially expressed genes across the libraries. A Java viewer program is provided for display and analysis of the results. Our results clearly show the benefit of using the PAVE assembler to explicitly use mate-pair information and bury ESTs for large contigs. The PAVE assembler provides a software package for assembling Sanger and/or 454 ESTs. The assembly software, data management software, Java viewer and user's guide are freely available. ESTs have been prevalent in genomic research since the first large-scale EST project in 1991 [1]. There are many EST projects that study the gene content of genome-, tissue-, or condition-specific transcripts (e.g. see Additional file 1: List of EST papers, section 4). In October 2005, 454 Life Sciences released the GS 20 pyrosequencer, which generates over 100,000 reads per run with an average length of 110 bases [2-4]. In January 2007, they released the GS FLX, which generates over 200,000 reads with lengths between 200 and 300 bases. Table 1 shows the growth of the number of ESTs in GenBank in relation to their length. 
Many of the short sequences released af
%U http://www.biomedcentral.com/1471-2164/10/400