Hackerrank Weekly Challenge #6 // Problem 1

Problem statement

The problem statement is really simple. You are given a set S of n binary strings each of length m. The task is to return the maximum number of 1's in binary representation of a \vee b, where a, b \in S and \vee is the logical OR operator. In addition, you have to return the number of pairs for which that number is maximal.

For example, if S = \{1100, 0011, 1010\} the result is 4 and 1, because 1100 \vee 0011 has 4 ones and no other pair of strings produces 4 ones.

Constraints update

Initially the constraints were 2 \leq n \leq 3000, 1 \leq m \leq 1000, but in the middle of the contests hackerrank decides to change it to n, m \leq 500, for which a simple brute force solution works fine i.e. for any pair of strings a, b count the number of 1's in a \vee b and return the maximum. The time complexity of this approach is O(n^2 \cdot m).

Many coders complained about changing constraints, because it happened in the middle of the contest, which is not fair at all, especially because the time of submission matters.

Solution for initial constraints

I'll give here my solution for the initial constraints that works in O(n \cdot m + n \cdot n / 2 \cdot 32), which is \backsimeq 10^8 in the worst case.

The idea is to divide each string into chunks of size 32, because in that case we can store these chunks as 32-bit integers and there is a very clever and fast way of computing number of ones of 32-bit integer. We know that m \leq 1000, so we need 32 chunks, because 32^2 > 1000.

Let s[i] be the i-th input string and s[i][j] be the j-th character of that string.

Let bm[i][k] be the value of the k-th chunk of the i-th string.

We can compute bm table using that code:

for(int i = 0; i < n; ++i)
{
   for(int k = 0; k < 32; ++k) bm[i][k] = 0;
}   
for(int i = 0; i < n; ++i)
{
   int k = 0;
   for(int j = 0; j < m; ++j)
   {
      if(j > 0 && j % 32 == 0) ++k;
      bm[i][k] *= 2;
      bm[i][k] += (int)(a[i][j] - '0');
   }
}

Now we can iterate over each pair of input strings. Let's assume that we want to compute the number of 1's in binary representation of s[i] \vee s[j]. In order to do that, we can compute the number of 1's in bm[i][k] \vee bm[j][k] for any k separately and add these partial results at the end. But if we want to achieve a speedup over the brute force solution, we have to do it very fast for a single chunk. You can find many ways to do it quickly. I used the below, which I found on stackoverflow. We can assume that it works in O(1):

int BitCount(unsigned int u)
{
   unsigned int uCount;

   uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
   return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}

Here is my main loop based on bm table and \texttt{BitCount} function:

int res = 0;
int res_cnt = 0;
for(int i = 0; i < n; ++i)
{
   for(int j = i + 1; j < n; ++j)
   {
      int local = 0;
      for(int k = 0; k < 32; ++k)       
      {          
         int tmp = bm[i][k] | bm[j][k];          
         int ones = BitCount(tmp);          
         local += ones;       
      }       
      if(local > res)
      {
         res = local;
         res_cnt = 1;
      }
      else if(local == res)
      {
         res_cnt++;
      }
   }
}

Summary

I generated a few big test files and tested them on hackerrank online judge as custom tests before constraints were changed and there was no time limit exceeded error, so my program works in the desire time limit. I wonder, how many submissions would pass final test cases if constraints weren't changed.

  • Shivam Goel

    it should be an unsigned int right?? ;)
    Great logic btw! :)

    • http://chasethered.com pkacprzak

      Yes, but only BitCount function needs an unsigned int as an argument. When BitCount is called, the conversion from int to unsigned int is made automatically. That conversion is guaranteed by the c++ standard as valid.