%0 Journal Article
%T Phylogeny reconstruction based on the length distribution of k-mismatch common substrings
%A Burkhard Morgenstern
%A Svenja Schöbel
%J Algorithms for Molecular Biology : AMB
%D 2017
%R 10.1186/s13015-017-0118-8
%X k-mismatch common substrings with k = 2. For position i = 5 in S1, kmacs searches for the longest substring of S1 starting at i that exactly matches a substring of S2. This is the substring starting at i′ = 2 in S2 (matching substrings shown in red). It then extends this match without gaps until the (k + 1)-th mismatch is reached. In this example, the k-mismatch common substring consists of the red, blue and green substrings and has length 12. In the paper, the lengths of these k-mismatch common substrings are modelled by the random variables $X_i^{(k)}$, defined in (1). The original version of kmacs uses the average length of these k-mismatch common substrings to assign a distance value to a pair of sequences. In our modified implementation of kmacs, we consider the k-mismatch extension of the longest common substring at i. That is, the program returns the length of the k-mismatch substring match that starts after the first mismatch following the longest common substring. In our example, for i = 5, this is the substring match starting with 'T' at position 11 in S1 and at position 8 in S2, consisting of the blue, green and orange matches; the length of this k-mismatch substring extension is 9. The lengths of these k-mismatch extensions are modelled by the random variables $\hat{X}_i^{(k)}$, defined in (16
%K Alignment-free
%K Phylogeny
%K Kmacs
%K Average common substring
%K Pattern matching
%U https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5724348/
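
The two quantities described in the abstract can be illustrated with a minimal Python sketch. This is not the kmacs implementation. It uses a naive quadratic scan instead of the program's actual search, and it assumes that mismatch positions count towards the reported length. The function names, the 0-based indexing and the sequences `S1` and `S2` are hypothetical; they are not the sequences from the figure.

```python
def longest_match_at(s1, s2, i):
    """Return (length, j): the longest substring of s1 starting at i that
    exactly matches a substring of s2, and its start j in s2 (naive scan)."""
    best_len, best_j = 0, -1
    for j in range(len(s2)):
        n = 0
        while i + n < len(s1) and j + n < len(s2) and s1[i + n] == s2[j + n]:
            n += 1
        if n > best_len:  # strict '>' keeps the leftmost j on ties
            best_len, best_j = n, j
    return best_len, best_j


def extend_with_mismatches(s1, s2, i, j, k):
    """Length of the gap-free alignment of s1[i:] and s2[j:] that stops just
    before the (k + 1)-th mismatch, or at the end of either sequence.
    Assumption: the k tolerated mismatch positions are counted in the length."""
    length = 0
    mismatches = 0
    while i + length < len(s1) and j + length < len(s2):
        if s1[i + length] != s2[j + length]:
            if mismatches == k:
                break
            mismatches += 1
        length += 1
    return length


def kmismatch_length(s1, s2, i, k):
    """Length of the k-mismatch common substring at i: the longest exact
    match at i, extended without gaps up to the (k + 1)-th mismatch."""
    n, j = longest_match_at(s1, s2, i)
    if n == 0:
        return 0
    return extend_with_mismatches(s1, s2, i, j, k)


def kmismatch_extension_length(s1, s2, i, k):
    """Length of the k-mismatch extension: the match that starts right after
    the first mismatch following the longest exact match at i."""
    n, j = longest_match_at(s1, s2, i)
    if n == 0:
        return 0
    return extend_with_mismatches(s1, s2, i + n + 1, j + n + 1, k)


# Hypothetical toy sequences (0-based positions).
S1 = "ACGTATTGCAGGTTA"
S2 = "TTACGTACTGAAGCTTC"

print(longest_match_at(S1, S2, 0))               # (5, 2)
print(kmismatch_length(S1, S2, 0, 2))            # 11
print(kmismatch_extension_length(S1, S2, 0, 2))  # 8
```

In this toy example the exact match at position 0 of `S1` has length 5 and starts at position 2 of `S2`, so the extension starts at positions 6 and 8. This follows the same offset rule as the abstract's example, where a match at i = 5 and i′ = 2 gives an extension starting at positions 11 and 8.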