全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...
Algorithms  2011 

An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

DOI: 10.3390/a4040285

Keywords: pattern matching, analysis of algorithms, finite automaton, minimization, deterministic arithmetic automaton, probabilistic arithmetic automaton

Full-Text   Cite this paper   Add to My Lib

Abstract:

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm’s running time cost (such as the number of text character accesses) for any given pattern in a random text model. Text models range from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). Furthermore, we provide an algorithm to compute the exact distribution of differences in running time cost of two pattern matching algorithms. Methodologically, we use extensions of finite automata which we call deterministic arithmetic automata (DAAs) and probabilistic arithmetic automata (PAAs) [1]. Given an algorithm, a pattern, and a text model, a PAA is constructed from which the sought distributions can be derived using dynamic programming. To our knowledge, this is the first time that substring- or suffix-based pattern matching algorithms are analyzed exactly by computing the whole distribution of running time cost. Experimentally, we compare Horspool’s algorithm, Backward DAWG Matching, and Backward Oracle Matching on prototypical patterns of short length and provide statistics on the size of minimal DAAs for these computations.

References

[1]  Marschall, T.; Rahmann, S. Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics. Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching, CPM '08, Pisa, Italy, 18–20 June 2008; Ferragina, P., Landau, G.M., Eds.; Springer: Berlin, Germany, 2008. Volume 5029; pp. 95–106.
[2]  Knuth, D.E.; Morris, J.; Pratt, V.R. Fast pattern matching in strings. SIAM J. Comput. 1977, 6, 323–350.
[3]  Boyer, R.S.; Moore, J.S. A fast string searching algorithm. Commun. ACM 1977, 20, 762–772.
[4]  Horspool, R.N. Practical fast searching in strings. Softw.-Pract. Exp. 1980, 10, 501–506.
[5]  Sunday, D.M. A very fast substring search algorithm. Commun. ACM 1990, 33, 132–142.
[6]  Crochemore, M.; Czumaj, A.; Gasieniec, L.; Jarominek, S.; Lecroq, T.; Plandowski, W.; Rytter, W. Speeding up two string-matching algorithms. Algorithmica 1994, 12, 247–267.
[7]  Allauzen, C.; Crochemore, M.; Raffinot, M. Efficient Experimental String Matching by Weak Factor Recognition. Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching, CPM '01, Jerusalem, Israel, 1–4 July 2001; Goos, G., Hartmanis, J., van Leeuwen, J., Eds.;. Volume 2089pp. 51–72.
[8]  Navarro, G.; Raffinot, M. Flexible Pattern Matching in Strings; Cambridge University Press: Cambridge, UK, 2002.
[9]  Baeza-Yates, R.A.; Gonnet, G.H.; Régnier, M. Analysis of Boyer-Moore-Type String Searching Algorithms. Proceedings of the 1st Annual ACM- SIAM Symposium on Discrete Algorithms, SODA '90, San Francisco, CA, USA, 22–24 January 1990; pp. 328–343.
[10]  Baeza-Yates, R.A.; Régnier, M. Average running time of the boyer-moore-horspool algorithm. Theor. Comput. Sci. 1992, 92, 19–31.
[11]  Mahmoud, H.M.; Smythe, R.T.; Régnier, M. Analysis of Boyer-Moore-Horspool string-matching heuristic. Random Struct. Algorithms 1997, 10, 169–186.
[12]  Smythe, R.T. The Boyer-Moore-Horspool heuristic with Markovian input. Random Struct. Algorithms 2001, 18, 153–163.
[13]  Tsai, T. Average case analysis of the Boyer-Moore algorithm. Random Struct. Algorithms 2006, 28, 481–498.
[14]  Nicodème, P.; Salvy, B.; Flajolet, P. Motif statistics. Theor. Comput. Sci. 2002, 287, 593–617.
[15]  Nicodème, P. Regexpcount, a symbolic package for counting problems on regular expressions and words. Fundam. Inform. 2002, 56, 71–88.
[16]  Nuel, G. Pattern Markov chains: Optimal Markov chain embedding through deterministic finite automata. J. Appl. Probab. 2008, 45, 226–243.
[17]  Lladser, M.; Betterton, M.D.; Knight, R. Multiple pattern matching: A Markov chain approach. J. Math. Biol. 2008, 56, 51–92.
[18]  Marschall, T.; Rahmann, S. Exact Analysis of Horspool's and Sunday's Pattern Matching Algorithms with Probabilistic Arithmetic Automata. Proceedings of the 4th International Conference on Language and Automata Theory and Applications, LATA '10, Trier, Germany, 24–28 May 2010; Dediu, A.H., Fernau, H., Martín-Vide, C., Eds.;. Volume 6031pp. 439–450.
[19]  Marschall, T.; Rahmann, S. Efficient exact motif discovery. Bioinformatics 2009, 25, i356–i364.
[20]  Herms, I.; Rahmann, S. Computing Alignment Seed Sensitivity with Probabilistic Arithmetic Automata. Proceedings of the 8th International Workshop Algorithms in Bioinformatics, WABI '08, Karlsruhe, Germany, 15–19 September 2008; Crandall, K., Lagergren, J., Eds.; Springer: Berlin, Germany, 2008. Volume 5251; pp. 318–329.
[21]  Hopcroft, J. An n log n Algorithm for Minimizing the States in a Finite Automaton. In The Theory of Machines and Computations; Kohavi, Z., Paz, A., Eds.; Academic Press: New York, NY, USA, 1971; pp. 189–196.
[22]  Knuutila, T. Re-describing an algorithm by Hopcroft. Theor. Comput. Sci. 2001, 250, 333–363.
[23]  Kucherov, G.; Noé, L.; Roytberg, M. A unifying framework for seed sensitivity and its application to subset seeds. J. Bioinform. Comput. Biol. 2006, 4, 553–569.
[24]  Schulz, M.; Weese, D.; Rausch, T.; D?ring, A.; Reinert, K.; Vingron, M. Fast and Adaptive Variable Order Markov Chain Construction. Proceedings of the 8th International Workshop Algorithms in Bioinformatics, WABI '08, Karlsruhe, Germany, 15–19 September 2008; Crandall, K.A., Lagergren, J., Eds.; Springer: Berlin, Germany, 2008. Volume 5251; pp. 306–317.
[25]  Wu, S.; Manber, U. A Fast Algorithm for Multi-Pattern Searching. Technical report; University of Arizona: Tucson, AZ, USA, 1994.

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133