
- 2017

Phylogeny reconstruction based on the length distribution of k-mismatch common substrings

DOI: 10.1186/s13015-017-0118-8

Burkhard Morgenstern, Svenja Schöbel

Keywords: Alignment-free, Phylogeny, Kmacs, Average common substring, Pattern matching


Abstract:

k-mismatch common substrings with k = 2. For position i = 5 in S1, kmacs searches for the longest substring of S1 starting at i that exactly matches a substring of S2. This is the substring starting at i' = 2 in S2 (matching substrings shown in red). It then extends this match without gaps until the (k + 1)-th mismatch is reached. In this example, the k-mismatch common substring consists of the red, blue and green substrings and has length 12. In the paper, the lengths of these k-mismatch common substrings are modelled by the random variables $X_i^{(k)}$, defined in (1). The original version of kmacs uses the average length of these k-mismatch common substrings to assign a distance value to a pair of sequences. In our modified implementation of kmacs, we consider the k-mismatch extension of the longest common substring at i. That is, the program returns the length of the k-mismatch substring match that starts after the first mismatch following the longest common substring. In our example, for i = 5, this is the substring match starting with ‘T’ at position 11 in S1 and at position 8 in S2, consisting of the blue, green and orange matches; the length of this k-mismatch substring extension is 9. The lengths of these k-mismatch extensions are modelled by the random variables $\hat{X}_i^{(k)}$, defined in (16
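
The two lengths described above can be illustrated with a minimal brute-force sketch. It is not the kmacs implementation: the function names are hypothetical, the search is a naive quadratic scan rather than an index-based lookup, positions are 0-based (the text counts from 1), and mismatch positions are assumed to count towards the reported length. The test sequences are made up for illustration and are not the ones from the figure.

```python
def longest_exact_match(s1, i, s2):
    """Return (length, j): the longest substring of s1 starting at i that
    exactly matches a substring of s2, and its start j in s2 (first hit on ties)."""
    best_len, best_j = 0, -1
    for j in range(len(s2)):
        length = 0
        while (i + length < len(s1) and j + length < len(s2)
               and s1[i + length] == s2[j + length]):
            length += 1
        if length > best_len:
            best_len, best_j = length, j
    return best_len, best_j


def extend_with_mismatches(s1, p, s2, q, k):
    """Length of the gap-free match starting at s1[p] / s2[q], extended until
    the (k + 1)-th mismatch or the end of either sequence is reached."""
    length = 0
    mismatches = 0
    while p + length < len(s1) and q + length < len(s2):
        if s1[p + length] != s2[q + length]:
            if mismatches == k:
                break  # this would be the (k + 1)-th mismatch
            mismatches += 1
        length += 1
    return length


def kmismatch_lengths(s1, i, s2, k):
    """Return (full, ext) for position i of s1.

    full: length of the k-mismatch common substring starting at i.
    ext:  length of the k-mismatch extension, i.e. the match that starts
          right after the first mismatch following the longest exact match.
    """
    exact_len, j = longest_exact_match(s1, i, s2)
    if j < 0:
        return 0, 0  # no character of s2 matches s1[i]
    full = extend_with_mismatches(s1, i, s2, j, k)
    # Skip the exact match and the single mismatching position after it.
    ext = extend_with_mismatches(s1, i + exact_len + 1, s2, j + exact_len + 1, k)
    return full, ext


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the paper.
    print(kmismatch_lengths("ACGTAAGCATC", 0, "TTACGTCAGGATC", 1))  # (7, 6)
```

In the toy call, the longest exact match is `ACGT` (length 4). With k = 1 the full k-mismatch common substring has length 7, while the extension that begins after the first mismatch has length 6.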
