|
BMC Bioinformatics 2005
Automated generation of heuristics for biological sequence comparisonAbstract: The speed and accuracy of this approach compares favourably with existing methods. Examples of its use in the context of genome annotation are given.This system allows rapid implementation of heuristics approximating to many complex alignment models, and has been incorporated into the freely available sequence alignment program, exonerate.George Box [1] once remarked that,"All models are wrong, but some are useful."This statement bears much relevance to biological sequence alignment, where there is no guarantee that the alignment model will accurately represent the evolutionary (or other) processes which separated the two sequences. All that is certain is that exhaustive dynamic programming (DP) algorithms (such as Smith-Waterman [2]) will yield the optimally scoring path in terms of the given model. Heuristics for sequence comparison (such as BLAST [3]) generate alignments which are valid paths through the underlying alignment model, but are not guaranteed to be optimal. Alignments generated by these heuristics can be calculated much more rapidly, and often closely match alignments which would be generated by exhaustive methods. Furthermore, many problems in the context of genome analysis consist simply of alignment of gene products (cDNA or protein) back onto the gene from which they were produced, and consequently do not require very sensitive alignment. Hence the aim here is not to attempt to develop models which are correct, but to facilitate development of models which are more useful in the context of large scale analyses.Transformation between a finite state automata describing an alignment model and the recurrence relations used in DP is a powerful technique [4,5] as it allows modification of the alignment algorithm by direct manipulation of the alignment model. The Dynamite compiler [6] allows automated implementation of alignment algorithms from a description of the alignment model. This allows development of complex models which can be used to generate a
|