|
BMC Bioinformatics 2008
A new protein linear motif benchmark for multiple sequence alignment software

Abstract: We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align linear motifs (LMs). The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments; however, little correlation was observed between LM and overall alignment quality for individual alignment test cases.

We have shown that none of the programs currently available is capable of reliably aligning LMs in distantly related sequences and we have highlighted a number of specific problems. The results of the tests suggest possible ways to improve program accuracy for difficult, divergent sequences.

Many eukaryotic proteins have highly modular architectures. Multidomain proteins are common among transmembrane receptors, signalling proteins, cytoskeletal proteins, chromatin proteins, transcription factors and so forth. As a consequence, many programs have been developed for the detection and alignment of protein domains. Online resources can now provide a good overview of the globular domain architecture of a polypeptide sequence and the functions these domains are likely to perform, e.g. Pfam [1], SMART [2], InterPro [3].
However, less research has been directed towards the analysis of the large segments of multidomain proteins that are non-globular, intrinsically lacking the capability
|