**The Knuth-Morris-Pratt algorithm** (commonly referred to as KMP) is an improvement for the naïve string matching approach and is based on the following idea: when a mismatch is detected, we have already iterated through some positions in the big string. We can use this information to set the new index to a position where we know the small string can be found, rather than going back to characters we have already discovered as part of the search, like in the naïve approach.

Let’s consider the example presented above again. When we look for “aab” in “aabaaaaabbaabba”and we check the position in bold, we already know that the following two letters cannot represent the start of “aab” because we have already iterated through them once. We then set resume searching at the position in red (the next ‘a’ character). The complexity of KMP is reduced to O(n + m).

The steps of the KMP algorithm are as follows:

- Compute the function array prefix[m] for the string we are looking for (the smaller string). prefix[i] has the following meaning: the length of the longest prefix matching a suffix of the substring that starts at 0 and ends at i.
- For example, let’s compute this array for “ababab”:
- prefix[0] = 0 (the substring is “a”, so the length of the longest prefix matching a suffix is 0).
- prefix[1] = 0 (the substring is “ab”, there is still no prefix that matches a suffix)

- For example, let’s compute this array for “ababab”:

- prefix[2] = 1 (the substring is “aba”, the longest prefix matching a suffix is “a”, which has the length 1).

- prefix[3] = 2 (the substring is “abab”, the longest prefix matching a suffix is “ab”, which has the length 2).
- prefix[4] = 3 (the substring is “ababa”, the prefix matching a suffix is “aba”).
- prefix[5] = 4 (the substring is “ababab”, the prefix matching a suffix is “abab”).

- Place the smaller string (the one we are looking for) under the bigger one (we will call it this one a template). Start matching until a mismatch occurs. In case of a mismatch there are two situations:
- If the mismatch occurs at the first position in the smaller string then just shift it one position to the right;
- Otherwise, if the mismatch occurs at position j (j > 0), shift it j – prefix[j – 1] positions to the right.

- If there is a match, just shift the smaller string one position to the right and continue matching.

Suppose we want to find “ababab” in “abaabababa”.

a | b | a | a |
b | a | b | a | b | a |

a | b | a | b |
a | b |

The first mismatch occurs in the column marked with red, at position 3 (remember, we are indexing from 0). prefix[2] = 1, so we shift “ababab” to the right 3 -1 = 2 positions, as follows:

a | b | a | a |
b | a | b | a | b | a |

a | b |
a | b | a | b |

This time a mismatch occurs at character 1 in the smaller string, so we shift it 1 – prefix[0] = 1 position.

a | b | a | a | b | a | b | a | b | a |

a | b | a | b | a | b |

A match occurs. We note this and then shift the smaller string one position to the right.

a | b | a | a | b |
a | b | a | b | a |

a |
b | a | b | a | b |

There is a mismatch at the first character, but the algorithm ends because the length of the template has now been reached.