|
BMC Bioinformatics 2008
SeqAn An efficient, generic C++ library for sequence analysisAbstract: To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.Biological sequence analysis is the heart of computational biology. Many successful algorithms (e.g., Myers' bit-vector search algorithm [2], BLAST [3]) and data structures (e.g., suffix arrays [4], q-gram based string indices, sequence profiles) have been developed over the last twenty years. The assemblies of large eucaryotic genomes like Drosophila melanogaster [5], human [1], and mouse [6] are prime examples where algorithm research was successfully applied to a biological problem. However, with entire genomes at hand, large scale analysis algorithms that require considerable computing resources are becoming increasingly important (e.g., Lagan [7], MUMmer [8], MGA [9], Mauve [10]). Although these tools use slightly different algorithms, nearly all of them require some basic algorithmic components, like suffix arrays, string searches, alignments, or the chaining of fragments. This is illustrated i
|