The Rabin-Karp algorithm is a string matching algorithm that uses hashing techniques. When searching for a string of length m in a string of length n the best case complexity of this algorithm is O(n). On the other hand, the worst case complexity is still O(n*m), just as the naïve approach, which is the main reason why the algorithm is not widely used.
The first thing you need to know about the Rabin-Karp technique is that uses a hash function to convert any string into a numerical value, exploiting the property according to which any two equal strings have the same hash value. One of the most common hash functions used by Rabin-Karp is the one that uses the ASCII value of the characters in a string to convert the string into a number in base q.
For example, if the string to be converted is “abc” and q = 101, then “abc” = 97 * 1012 + 98 * 101 + 99 (97 is the ASCII value for ‘a’, while 98 corresponds to ‘b’ and 99 corresponds to ‘c’ in the same system).
Suppose we are looking for “abc” in “ababcdab”. The steps are as follows:
- Choose a reasonable value for base q, as described above. This value is usually a prime number and for the purpose of this example we will use 13. A prime number is used for the same reason it is used in most hashing functions, mainly because it provides an even distribution across the hash space.
- Compute the value of the smaller string, as described above. In our example, abc = 97 * 132 + 98* 13 + 99.
- Take each substring of length m in the bigger string and compute its value. Try to match only at the positions that produce the same value as our template. For example we want to skip the first position in the larger string because “aba” = 97 * 132 + 98 * 13 + 97, which is different from what we obtained for “abc”.
Note that, as the values can get quite large when converting from strings to integers, it is a common practice to choose a number m and perform a %m operation on all the values computed using the algorithm.