%0 Journal Article
%T A Motif Finding Algorithm Based on Color Coding Technology
一种基于彩色编码技术的基序发现算法
%A WANG Jian-Xin
%A HUANG Yuan-Nan
%A CHEN Jian-Er
%A
王建新
%A 黄元南
%A 陈建二
%J 软件学报
%D 2007
%I
%X Finding common pattern, motifs or signals, in a set of DNA sequences is an important problem in computational biology. Recently, some biologists extremely focus on the (l,d)-(K-k) motif finding problem when the number of sequences K is 20 and the number of sequences with instances k is 16, (l,d)-(20-16) problem for short. For solving this problem, this paper introduces a novel sample-driven algorithm (SDA), called color coding motif finding algorithm, CCMF for short. It uses color coding technology to converse a (l,d)-(20-16) problem to some (l,d)-(16-16) problems, then uses divide-and-conquer and branch-and-bound approaches to solve this (l,d)-(16-16) problem. Using the conversion process can reduce 4 845 combinations to 403 colorings, while increasing the running rate enormously. The experimental results on synthetic and real datasets show that the CCMF algorithm can accurately and efficiently find all (l,d)-(20-16) patterns and instances. Its comprehensive performances in finding motifs are superior to those of other existing algorithms. It is applicable for real biological purpose. The color coding technology can also be used to improve the performances of other similar (l,d)-(K-k) problems when k is less than K.
%K color coding technique
%K motif finding
%K coloring
%K (l
%K d)-(K-k) problem
%K algorithm optimization
彩色编码技术
%K 基序发现
%K 着色
%K (l
%K d)-(K-k)问题
%K 算法优化
%U http://www.alljournals.cn/get_abstract_url.aspx?pcid=5B3AB970F71A803DEACDC0559115BFCF0A068CD97DD29835&cid=8240383F08CE46C8B05036380D75B607&jid=7735F413D429542E610B3D6AC0D5EC59&aid=A590E47249B5BA93&yid=A732AF04DDA03BB3&vid=13553B2D12F347E8&iid=B31275AF3241DB2D&sid=61548B9F608D3CE3&eid=F225282E8F5F1CBB&journal_id=1000-9825&journal_name=软件学报&referenced_num=2&reference_num=23