|
BMC Bioinformatics 2005
Genomic multiple sequence alignments: refinement using a genetic algorithm
Abstract: We tested the GenAlignRefine algorithm by running it on a Linux cluster, both to refine sequences from a simulation and to refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that had initially been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly; significantly, however, the overall alignment length decreased at the same time, through the removal of approximately 200 gapped regions representing roughly 1,300 gaps.
We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.
One of the primary goals in analyzing complete genomes is to identify all of the functional regions in the sequences, including genes and regulatory regions. However, this interpretive work is not keeping pace with the avalanche of raw sequence data. This disparity is due in part to the fact that algorithm development for genomic annotation has been relatively slow, and annotation of completely sequenced genomes inevitably depends on human expert knowledge. The most effective method for understanding genomic content is to compare multiple genomes at various phylogenetic distances. The coding regions of a large set of common genes can be identified by comparing genomic sequences that are distantly related phylogenetically.
In addition, comparing the genomic sequences of divergent non-coding regions that show some degree of conservation can yield important information related to regulation of gene expression, structural organization of the genome, and possibly other yet unknown functions [1]. Finally, functional and evolutiona
|