String Algorithms :The KNUTH-MORRIS-PRATT Algorithm

0
1079

Knuth Morris Pratt Algorithm

The Knuth-Morris-Pratt algorithm (commonly referred to as KMP) is an improvement for the naïve string matching approach and is based on the following idea: when a mismatch is detected, we have already iterated through some positions in the big string. We can use this information to set the new index to a position where we know the small string can be found, rather than going back to characters we have already discovered as part of the search, like in the naïve approach.

 

Diversity in WorkPlace - Japanese,Korea,ChineseLet’s consider the example presented above again. When we look for “aab” in “aabaaaaabbaabba”and we check the position in bold, we already know that the following two letters cannot represent the start of “aab” because we have already iterated through them once. We then set resume searching at the position in red (the next ‘a’ character). The complexity of KMP is reduced to O(n + m).

The steps of the KMP algorithm are as follows:

  1. Compute the function array prefix[m] for the string we are looking for (the smaller string). prefix[i] has the following meaning: the length of the longest prefix matching a suffix of the substring that starts at 0 and ends at i.
    1. For example, let’s compute this array for “ababab”:
      1. prefix[0] = 0 (the substring is “a”, so the length of the longest prefix matching a suffix is 0).
      2. prefix[1] = 0 (the substring is “ab”, there is still no prefix that matches a suffix)
  • prefix[2] = 1 (the substring is “aba”, the longest prefix matching a suffix is “a”, which has the length 1).
  1. prefix[3] = 2 (the substring is “abab”, the longest prefix matching a suffix is “ab”, which has the length 2).
  2. prefix[4] = 3 (the substring is “ababa”, the prefix matching a suffix is “aba”).
  3. prefix[5] = 4 (the substring is “ababab”, the prefix matching a suffix is “abab”).
  1. Place the smaller string (the one we are looking for) under the bigger one (we will call it this one a template). Start matching until a mismatch occurs. In case of a mismatch there are two situations:
    1. If the mismatch occurs at the first position in the smaller string then just shift it one position to the right;
    2. Otherwise, if the mismatch occurs at position j (j > 0), shift it j – prefix[j – 1] positions to the right.
  2. If there is a match, just shift the smaller string one position to the right and continue matching.

Assumptions at HiringLibrary-Com

Suppose we want to find “ababab” in “abaabababa”.

a b a a b a b a b a
a b a b a b        

 

Missmatch at HiringLibrary-comThe first mismatch occurs in the column marked with red, at position 3 (remember, we are indexing from 0). prefix[2] = 1, so we shift “ababab” to the right 3 -1 = 2 positions, as follows:

a b a a b a b a b a
    a b a b a b    

 

Missmatch at HiringLibrary-comThis time a mismatch occurs at character 1 in the smaller string, so we shift it 1 – prefix[0] = 1 position.

a b a a b a b a b a
      a b a b a b  

 

Success as match occurs - HiringLibrary-ComA match occurs. We note this and then shift the smaller string one position to the right.

a b a a b a b a b a
        a b a b a b

 

Diversity in WorkPlace - Japanese,Korea,ChineseThere is a mismatch at the first character, but the algorithm ends because the length of the template has now been reached.

SHARE
Previous articleStrings : Parse Regular Expressions
Next articleString Algorithms :The RABIN-KARP Algorithm
Since last 15 years in different geographical locations, Sumit prepared hiring format for several hiring managers/teams to hire the balanced talents and interviewed talents on the various stages of their selection process. He also interviewed by hundreds of companies in different geographical locations.His best conclusion for hiring teams and candidate is to prepare in advance. Here ‘advance’ means keep your interview book ready and continue to update it even you are not going to interview candidates or applying for any job in next six months.