%0 Journal Article
%T EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration
%A Javier Forment
%A Francisco Gilabert
%A Antonio Robles
%A Vicente Conejero
%A Fernando Nuez
%A Jose M Blanca
%J BMC Bioinformatics
%D 2008
%I BioMed Central
%R 10.1186/1471-2105-9-5
%X We have created EST2uni, an integrated, highly configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools, and the software has a modular design that makes it easy to add and configure new analytical methods. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. The pipeline can run in parallel on a PC cluster to reduce analysis time. It also creates a website linked to the database that shows collection statistics and offers complex query capabilities and tools for data mining and retrieval. The software package presented here provides an efficient and complete bioinformatics tool for managing EST collections that is easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at http://bioinf.comav.upv.es/est2uni. This site also provides detailed instructions for installing and configuring the software package. The code is under active development to incorporate new analyses, methods, and algorithms as the bioinformatics community releases them. Recent advances in high-throughput sequencing technology have made it possible to gain genomic insight into species without a complete genome sequence by generating collections of expressed sequence tags (ESTs) [1]. ESTs are single-pass, partial sequences obtained from randomly selected complementary DNA (cDNA) clones; they must be processed and annotated to provide a biologically relevant data set.
They include low-quality and vector regions that must be identified and removed to obtain high-quality, clean sequences suitable for further analysis.
%U http://www.biomedcentral.com/1471-2105/9/5