%0 Journal Article %T Gene prediction: the end of the beginning %A Colin Semple %J Genome Biology %D 2000 %I BioMed Central %R 10.1186/gb-2000-1-2-reports4012 %X Conference website: http://industry.ebi.ac.uk/gp2000/ The draft sequence of the human genome will become available later this year. For some time now it has been accepted that this will mark a beginning rather than an end. A vast amount of work will remain to be done, from detailing sequence polymorphisms to discovering the complexities of the transcriptome - the totality of sequences transcribed - and, ultimately, the proteome - all the proteins encoded by the genome. All of this work will, to a greater or lesser extent, depend on all the genes having been correctly identified. It will be necessary to document not only the coding exons of each gene but also non-coding exonic sequence and regulatory sequences. As this conference made clear, however, the production of genomic sequence has outstripped our ability to reliably predict such features computationally. Traditionally, gene prediction programs that rely only on the statistical qualities of exons have been referred to as performing ab initio predictions (from the Latin: from the beginning). Ab initio prediction of coding sequences is an undeniable success by the standards of the machine-learning algorithm field, and most of the widely used gene prediction programs belong to this class of algorithms. It is impressive that the statistical analysis of raw genomic sequence can detect around 77-98% of the genes present, which was the range of sensitivity reported at the conference. This is, however, little consolation to the bench biologist, who wants the complete sequences of all genes present, with some certainty about the accuracy of the predictions involved. 
As Ewan Birney (European Bioinformatics Institute, UK) put it, what looks impressive to the computer scientist is often simply wrong to the biologist. All ab initio gene prediction programs have to balance sensitivity against accuracy. It is often only possible to detect all the real exons present in a sequence at the expense of detecting many false on %U http://genomebiology.com/2000/1/2/reports/4012