%0 Journal Article %T Drosophila genome sequence %A Alan Shirras %J Genome Biology %D 2000 %I BioMed Central %R 10.1186/gb-2000-1-1-reports038 %X Whole-genome sequencing was carried out by cloning size-selected, randomly sheared Drosophila genomic DNA into plasmid vectors and determining approximately 500 base pairs of sequence from either end. Overlapping stretches of sequence were then assembled into contiguous lengths. A crucial feature of this process was the use of 'mate pairs' - stretches of sequence from either end of each clone - to minimize the problems of placing sequences containing repetitive DNA. The overall structure of the assembly was confirmed by linking the data to end-sequence information from bacterial artificial chromosomes (BACs) generated by the Berkeley Drosophila Genome Project (BDGP). This yielded 114.8 megabases (Mb) of sequence that could be unambiguously placed onto chromosomes. Clone-based sequence from the BDGP and the European Drosophila Genome Project (EDGP) allowed a further 1.4 Mb to be placed on chromosome arms. Roughly 3.8 Mb of sequence, probably representing islands of unique sequence within heterochromatin, could not be placed accurately on the map. By comparison with regions of high-quality sequence already determined by other methods, the whole-genome sequencing was found to be 99.99% accurate in non-repetitive regions. As a measure of the completeness of the sequence, 97.5% of sequenced Drosophila genes are found in the assembled sequence. Gene prediction by computational analysis followed by human curation has identified 13,601 genes. Of these, 23% do not match sequences from other organisms or from Drosophila expressed sequence tags (ESTs) and are therefore potentially novel genes. Comparison of gene sequences with those of other species generally reveals a high degree of conservation, although there are some exceptions; for example, several proteins involved in DNA repair are missing from Drosophila.
A large number of transcription factors have been identified, suggesting complex networks of gene regulation, and solute transporters are also notable for their abundance. %U http://genomebiology.com/2000/1/1/reports/038