%0 Journal Article
%T Phylogeny reconstruction based on the length distribution of k-mismatch common substrings
%A Burkhard Morgenstern
%A Svenja Schöbel
%J Algorithms for Molecular Biology
%D 2017
%R 10.1186/s13015-017-0118-8
%X k-mismatch common substrings with k = 2. For position i = 5 in S1, kmacs searches for the longest substring of S1 starting at i that exactly matches a substring of S2. This is the substring starting at i' = 2 in S2 (matching substrings shown in red). It then extends this match without gaps until the (k + 1)-th mismatch is reached. In this example, the k-mismatch common substring consists of the red, blue and green substrings and has length 12. In the paper, the lengths of these k-mismatch common substrings are modelled by the random variables $X_i^{(k)}$, defined in (1). The original version of kmacs uses the average length of these k-mismatch common substrings to assign a distance value to a pair of sequences. In our modified implementation of kmacs, we consider the k-mismatch extension of the longest common substring at i. That is, the program returns the length of the k-mismatch substring match that starts after the first mismatch following the longest common substring. In our example, for i = 5, this is the substring match starting with ‘T’ at position 11 in S1 and at position 8 in S2, consisting of the blue, green and orange matches; the length of this k-mismatch substring extension is 9. The lengths of these k-mismatch extensions are modelled by the random variable $\hat{X}_i^{(k)}$, defined in (16
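The two quantities described above can be illustrated with a minimal Python sketch. It is not the kmacs implementation: the exact-match step is a naive quadratic scan, all function names are hypothetical, and the sequences S1 and S2 are invented toy data chosen only so that the positions and lengths (5 and 2, 11 and 8, 12 and 9) agree with the example in the caption. Indices in the code are 0-based, so position 5 in S1 is index 4.

```python
def longest_exact_match(s1, i, s2):
    """Return (length, j): the longest prefix of s1[i:] that occurs in s2,
    and the start j of its first occurrence (naive scan, for illustration)."""
    best_len, best_j = 0, -1
    for j in range(len(s2)):
        n = 0
        while i + n < len(s1) and j + n < len(s2) and s1[i + n] == s2[j + n]:
            n += 1
        if n > best_len:
            best_len, best_j = n, j
    return best_len, best_j


def extend_with_mismatches(s1, i, s2, j, k):
    """Length of the gap-free match starting at (i, j) that stops just
    before mismatch number k + 1 (or at the end of either sequence)."""
    length = mismatches = 0
    while i + length < len(s1) and j + length < len(s2):
        if s1[i + length] != s2[j + length]:
            mismatches += 1
            if mismatches > k:
                break
        length += 1
    return length


def kmismatch_length(s1, i, s2, k):
    """k-mismatch common substring at i: the longest exact match at i,
    extended without gaps through up to k mismatches."""
    n, j = longest_exact_match(s1, i, s2)
    return extend_with_mismatches(s1, i, s2, j, k) if n else 0


def kmismatch_extension_length(s1, i, s2, k):
    """Modified variant: skip the longest exact match and the first
    mismatch after it, then measure a k-mismatch match from there."""
    n, j = longest_exact_match(s1, i, s2)
    if n == 0:
        return 0
    return extend_with_mismatches(s1, i + n + 1, s2, j + n + 1, k)


# Hypothetical toy sequences (not taken from the paper's figure).
S1 = "TTGCGACCAATCGCATGCAT"
S2 = "AGACCAGTCGTATCCAG"

print(longest_exact_match(S1, 4, S2))            # (5, 1)
print(kmismatch_length(S1, 4, S2, 2))            # 12
print(kmismatch_extension_length(S1, 4, S2, 2))  # 9
```

In this toy case the extension starts at index 10 in S1 and index 7 in S2, that is, at positions 11 and 8 in 1-based numbering, and the character there is ‘T’.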
%K Alignment-free
%K Phylogeny
%K Kmacs
%K Average common substring
%K Pattern matching
%U https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5724348/