I created the most difficult problem for May edition of Code.cpp on HackerRank. The full problem is available here. The problem got great attention, because many contestants were surprised how it is possible to "compare" two strings that fast.
The statement is quite straightforward:
We define two strings and as almost equal if and only if they have the same length and they differ on at most one position.
You are given a string of length and an integer . You have to perform queries on . Each query consists of four integers, , , , and . For each such query, your task is to decide if the substrings and are almost equal.
While the problem is formulated very simple, the solution is not obvious, especially because there are many queries to handle - up to and the input string can have at most letters. I will sketch two completely different approaches here.
First of all, we construct suffix array for string . While algorithm is fine here, we can use a simple and famous linear one by Kärkkäinen and Sanders known as Skew algorithm
After that, we compute longest common prefix array abbreviated as array, where is the longest common prefix of suffixes and in the suffix array.
Using our array and some range queries, we are able to compute the length of the longest common prefix for arbitrary suffixes of , the sketch of this method is available here.
Being able to do that, let's consider any query and two substrings and . Let assume that they have equal length , because the other case is trivial. In order to answer, if these substrings are almost equal, first we compute the length of of suffixes of starting in and . Let this length to be . If , then we know that our two substrings are exactly equal. In the other case, we know that they differ on position , so we compute the length of the of suffixes starting just after this different character and the answer to the query is positive if these suffixes have equal at least characters, otherwise it is negative.
If we use Skew algorithm and the fastest method of computing of arbitrary strings, we can achieve , but any algorithm running in is also fine.
The idea is to provide a method for deciding if in constant time for any and
For now, let's assume that we can do that.
Then in order to decide if is almost equal to , first we check if they have equal length. If they have, then we check if if it is, they are exact the same strings. If not, we find the first letter on which they are different. We can do this using binary search and our method for equality of substrings on prefixes of both strings. After we find this first letter, let say it has index inside these strings, we just need to compare with and we assumed that we can do this in time.
In order to do our magic comparison in constant time, first we define
which is called polynomial hash of a string and we can compute for all in using Horner's method i.e:
All these computations can be done modulo some big prime, but it is faster to use unsigned long long type here, especially for c++ which is nice in this contest, and we can avoid modulo computation.
The next step is to be able to check if and we can do this by multiplications and subtractions of some values from table, which can be done in constant time.
Of course this method is a probabilistic one and it can return false results, but we can either assume that it works for one function, or compute more than one hashing function to increase its probability. It is worth to mention that using two functions rather that one squares the probability of an error. An example of using two functions can be computing both as function given above, but for each one computing it modulo a different big prime number.
Computing table takes and binary search for one query takes , so the overall complexity is .