%0 Journal Article %T OligoSpawn: a software tool for the design of overgo probes from large unigene datasets %A Jie Zheng %A Jan T Svensson %A Kavitha Madishetty %A Timothy J Close %A Tao Jiang %A Stefano Lonardi %J BMC Bioinformatics %D 2006 %I BioMed Central %R 10.1186/1471-2105-7-7 %X OLIGOSPAWN is a suite of software tools that offers two complementary services, namely (1) the selection of "unique" oligos each of which appears in one unigene but does not occur (exactly or approximately) in any other and (2) the selection of "popular" oligos each of which occurs (exactly or approximately) in as many unigenes as possible. In this paper, we describe the functionalities of OLIGOSPAWN and the computational methods it employs, and we report on experimental results for the overgo probes designed with it. The algorithms we designed are highly efficient and capable of processing unigene datasets of sizes on the order of several tens of Mb in a few hours on a regular PC. The software has been used to design overgo probes employed to screen a barley BAC library (Hordeum vulgare). OLIGOSPAWN is freely available at http://oligospawn.ucr.edu/. For most organisms, expressed sequence tag (EST) datasets represent the largest collection of genetic sequences available. As of June 2005 more than forty organisms have more than 100,000 ESTs in GenBank dbEST [1], including barley (Hordeum vulgare) with over 395,000 ESTs. Most ESTs contain only part of the transcribed sequence of a gene, generally 200–800 bases from one end of a cDNA clone. In order to obtain extended, and in many cases complete, transcript sequences, raw EST data is processed through several steps to produce a "unigene" dataset that represents the full complexity of the initial EST collection. Processing steps include removal of vector and low quality sequences followed by clustering into assemblies, from which consensus sequences are referred to as unigenes.
In the case of barley, as of February 2005 the collection has over 53,000 unigenes comprising a total of more than 40 megabases. Unigene datasets for numerous organisms can be obtained from GenBank [2], TIGR [3] and various organism-specific sources (e.g., HARVEST [4]). Given a collection of unigenes, OLIGOSPAWN [5] serves two complementary %U http://www.biomedcentral.com/1471-2105/7/7