全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...

带任意长度通配符的模式匹配

DOI: 10.3724/SP.J.1004.2014.02499, PP. 2499-2511

Keywords: 通配符,模式匹配,位并行,基因序列

Full-Text   Cite this paper   Add to My Lib

Abstract:

?基因序列中,许多病毒并不是简单的直接复制自己,而是相邻字符间插入或者删除序列片段,如何从序列数据中检索这些病毒具有重要的研究价值.提出了一个更普遍的问题,带任意长度通配符的模式匹配问题(Patternmatchingwitharbitrary-lengthwildcards,PMAW),这里模式中不仅可以有多个通配符约束,而且每个通配符的约束可以是两个整数,也可以从整数到无穷大.给定序列S和带通配符的模式P,目标是从S中检索P的所有出现和每一次出现的匹配位置,并且要求任意两次出现不能共享序列中同一位置.为了有效地解决该问题,设计了两个基于位并行的匹配算法MOTW(Methodofocurrencethenwindow)算法和MWTO(Methodofwindowthenocurrence)算法.同时,MWTO算法进行细微改动就可以满足全局长度约束.实验结果既验证了算法求解问题的正确性,又验证了比相关的模式匹配算法具有更好的时间性能.

References

[1]  Pisanti N, Crochemore M, Grossi R, Sagot M F. Bases of motifs for generating repeated patterns with wildcards. IEEE/ACM Transactions on Computational Biology Bioinformatics, 2005, 2(1): 40-50
[2]  Ding B L, Lo D, Han J W, Khoo S C. Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of the 25th International Conference on Data Engineering. Shanghai, China: IEEE, 2009. 1024-1035
[3]  Wang Bao-Xun, Liu Bing-Quan, Sun Cheng-Jie, Wang Xiao-Long, Sun Lin. Thread segmentation based answer detection in Chinese online forums. Acta Automatica Sinica, 2009, 39(1): 11-20(王宝勋, 刘秉权, 孙承杰, 王晓龙, 孙林. 基于论坛话题段落划分的答案识别. 自动化学报, 2013, 39(1): 11-20)
[4]  Tanbeer S K, Ahmed C F, Jeong B S, Lee Y K. Efficient frequent pattern mining over data streams. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management. California, USA: ACM, 2008. 1447-1448
[5]  Fischer M J, Paterson M S. String-matching and other products. Proceeding of the 7th SIAM AMS Complexity of Computation. Cambridge, USA, 1974. 113-125
[6]  Kucherov G, Rusinowitch M. Matching a set of strings with variable length don't cares. Theoretical Computer Science, 1997, 78(1-2): 129-154
[7]  Chen G, Wu X D, Zhu X Q, Arslan A N, He Y. Efficient string matching with wildcards and length constraints. Knowledge and Information Systems, 2006, 10(4): 399-419
[8]  Wu You-Xi, Wu Xin-Dong, Jiang He, Min Fan. A heuristic algorithm for MPMGOOC. Chinese Journal of Computers, 2011, 34(8): 1452-1462(武优西, 吴信东, 江贺, 闵帆. 一种求解MPMGOOC问题的启发式算法. 计算机学报, 2011, 34(8): 1452-1462)
[9]  Wu Xin-Dong, Xie Fei, Huang Yong-Ming, Hu Xue-Gang, Gao Jun. Mining sequential patterns with wildcards and the one-off condition. Journal of Software, 2013, 24(8): 1804-1815(吴信东, 谢飞, 黄咏明, 胡学钢, 高隽. 带通配符和One-Off条件的序列模式挖掘. 软件学报, 2013, 24(8): 1804-1815)
[10]  Bille P, Thorup M. Languages and Programming. Berlin: Springer Heidelberg, 2009. 171-182
[11]  Indyk P. Faster algorithms for string matching problems: matching the convolution bound. In: Proceedings of the 39th Symposium on Foundations of Computer Science. Washington, DC, USA: IEEE, 1998. 166-173
[12]  Navarro G, Raffinot M. Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. Journal of Computational Biology, 2003, 10(6): 903-923
[13]  Ferreira P G, Azevedo P J. Protein sequence pattern mining with constraints. In: Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases. Berlin Heidelberg: Springer, 2005. 96-107
[14]  Guo D, Hong X L, Hu X G, Gao J, Liu Y L, Wu G Q, Wu X D. A bit-parallel algorithm for sequential pattern matching with wildcards. Cybernetics and Systems, 2011, 42(6): 382-401
[15]  Zhang M, Zhang Y, Hu L. A faster algorithm for matching a set of patterns with variable length don't cares. Information Processing Letters, 2010, 110(6): 216-220
[16]  NCBI National Center for Biotechnology Information [Online], available: http://www.ncbi.nlm.nih.gov, November 20, 2013
[17]  Yue Feng, Sun Liang, Wang Kuan-Quan, Wang Yong-Ji, Zuo Wang-Meng. State-of-the-art of cluster analysis of gene expression data. Acta Automatica Sinica, 2008, 34(2): 113-120(岳峰, 孙亮, 王宽全, 王永吉, 左旺孟. 基因表达数据的聚类分析研究进展. 自动化学报, 2008, 34(2): 113-120)
[18]  Rahman M S, Iliopoulos C S, Lee I, Mohamed M, Smyth W F. Finding patterns with variable length gaps or don't cares. Computing and Combinatorics. Berlin, Heidelberg: Springer, 2006. 146-155
[19]  Wu X D, Zhu X Q, He Y, Arslan A N. PMBC: Pattern mining from biological sequences with wildcard constraints. Computers in Biology and Medicine, 2013, 43(5): 481-492
[20]  Pan Yun-He, Wang Jin-Long, Xu Cong-Fu. State-of-the-art on frequent pattern mining in data streams. Acta Automatica Sinica, 2006, 32(4): 594-602(潘云鹤, 王金龙, 徐从富. 数据流频繁模式挖掘研究进展. 自动化学报, 2006, 32(4): 594-602)
[21]  Altschul S F, Gish W, Miller W, Myers E W, Lipman D J. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215(3): 403-410
[22]  Clifford P, Clifford R. Simple deterministic wildcard matching. Knowledge and Information Systems, 2007, 101(2): 53-54
[23]  Cole R, Gottlieb L A, Lewenstein M. Dictionary matching and indexing with errors and don't cares. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing. New York, USA: ACM, 2004. 91-100
[24]  Wu You-Xi, Liu Ya-Wei, Guo Lei, Wu Xin-Dong. Subnettrees for strict pattern matching with general gaps and length constraints. Journal of Software, 2013, 24(5): 915-932(武优西, 刘亚伟, 郭磊, 吴信东. 子网树求解一般间隙和长度约束严格模式匹配. 软件学报, 2013, 24(5): 915-932)
[25]  Bille P, G?rtz I L, Vildhój H W, Wind D K. String matching with variable length gaps. Theoretical Computer Science, 2012, 443(1): 25-34
[26]  Ji X N, Bailey J, Dong G Z. Mining minimal distinguishing subsequence patterns with gap constraints. Knowledge and Information Systems, 2007, 11(3): 259-286
[27]  Navarro G, Raffinot M. New techniques for regular expression searching. Algorithmica, 2005, 41(2): 89-116
[28]  Kalai A. Efficient pattern-matching with don't cares. In: Proceedings of the 13th ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA, USA: ACM, 2002. 655-656
[29]  Manber U, Baeza-Yates R. An algorithm for string matching with a sequence of don't cares. Information Processing Letters, 1991, 37(3): 133-136
[30]  Min F, Wu X D, Lu Z Y. Pattern matching with independent wildcard gaps. In: Proceedings of the 8th IEEE International Conference on Dependable, Autonomic and Secure Computing. Chengdu, China: IEEE, 2009. 194-199
[31]  Aho A V, Corasick M J. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 1975, 18(6): 333-340
[32]  Blumer A, Blumer J, Haussler D, Ehrenfeucht A, Chen M T, Seiferas J. The smallest automation recognizing the subwords of a text. Theoretical Computer Science, 1985, 40(1): 31-55
[33]  Haapasalo T, Silvasti P, Sippu S, Soisalon-Soininen E. Online dictionary matching with variable-length gaps. In: Proceedings of the 10th International Symposium, SEA Kolimpari, Chania, Crete, Greece. Berlin Heidelberg: Springer, 2011. 76-87

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133