Unofficial Editorial: Big Queries from Codechef October Long 2016

Codechef October Long has just finished and I wanted to share my unusual approach to one of the problems from the contest. It barely fits the time limit even after optimizations, so I expect that this is not the model solution (I expect some kind of segment tree to be the intended solution). However, knowing the technique I used can be really helpful when trying to solve similar problems with queries.

Problem link:
Contest: https://www.codechef.com/OCT16/problems/BGQRS/
Practice: https://www.codechef.com/problems/BGQRS/


For a given array A of N integers, the task is to perform Q queries, each of one of the following types:

- \texttt{mul}(L, R, X) := multiply elements of A with indices in range [L, R] by X

- \texttt{assign}(L, R, Y) := assign new values to elements of A in range [L, R] in such a way that A_L = Y, A_{L+1} = 2 \cdot Y, A_{L+2} = 3 \cdot Y, \ldots

- \texttt{query}(L, R) := returns the number of trailing zeros in the product of elements of A with indices in range [L, R]

As the final answer we need to return the sum of results of all query operations.

Here, I’m going to provide a solution which is possibly slower than the one intended by the author. When I implemented it without much care at the first time, it just passed only one of test cases in the last subtask. I wanted to know if there is a change that it will pass all of them, so I submitted it handling only 10% of queries and it barely passed in time something like 0.95/1s in the worst case. So I knew that I need to optimize it at least around 10 times. I started throwing out dynamically allocated memory and exchanged it with static arrays and improved all others bottlenecks. Finally, the solution works in around 0.89 in the worst case, so it barely fits time limit. I think that authors didn’t want such a solutions to pass, and this was probably the reason that output in the problem is reduced just to one number in order to avoid some fast I/O tricks.


First observation that we should make is that the number of trailing zeros in a product of some integers is equal to the highest power of 10 which divides this product. Let’s say that the product is equal P, moreover, let e_2 be the highest integer such that P is divisible by 2^{e_2}. Similarly, let e_5 be the highest integer such that P is divisible by 5^{e_5}. Then the highest power of 10 that divides P is equal to \min(e_2, e_5). Thus, in order to answer query(L, R), we just need to know how many times, in total, numbers from range [L, R] in A are divisible by 2 and how many times, in total, they are divisible by 5. We will show how to maintain that information across all the updates and how to exactly answer the queries.

The method I’m going to explain is a nice trick to have in mind while approaching this kind of problems. Personally, I call the method “sqrt-blocks”, but maybe a better name can be suggested. The general idea is to initially divide the input array into M blocks, each of size K in such a way that the first blocks stores information about A_1, A_2, \ldots A_K, the second one about A_{K+1}, A_{K+2}, \ldots, A_{2 \cdot K} and so on. What information do we need to store in these blocks? Well, we should store some indices denoting the starting and ending positions of the block, its length, and most importantly, the total number of times that 2 divides elements of A corresponding to the block and the total number of times that 5 divides elements of A corresponding to the block. If we can maintain these information across all queries in all blocks, we will be able to handle all of them.

In my implementation I used one more invariant: before each update and query operation on range [L, R] there is a block starting with A_L and there is a block ending in A_R.

This invariant makes the code more clear, because any update or query will operate only on full blocks. In order to fulfill the condition from the invariant, we can split blocks just before update/query operation if needed - notice that at most 2 blocks need to be split.

Now, how do we handle each operation? Well let’s take a look:

- \texttt{mul}(L, R, X) - We want to multiply all elements in range [L,R] by X. First, we want to compute the number of times that 2 divides X and the number of times that 5 divides X. Having these values, we will update information stored in all blocks falling into range [L, R]

- \texttt{assign}(L, R, Y) - This operation is the most tricky one. Notice that it corresponds to first assigning values 1, 2, \ldots to A_L, A_{L+1}, \ldots A_R and then multiplying each element in [L, R] by Y. This last multiplication is the same as \texttt{mul}(L, R, Y) and we already know how to do it, so let’s take a look how to assign values 1, 2, \ldots to corresponding elements. The method we are going to use is to assign to each block its starting and ending elements of this sequence and these indices can be easily computed while iterating over the blocks from left to right. Then, based on this values, we have to update the information stored in a single block. Notice than we need to overwrite almost all of them because all values have changed in [L, R]. We want to compute the total number of times that 2 and 5 divides elements covered by a single block and we can do it quickly if before handling any queries we precompute prefix sums of these values for all i = 1,\cdots,N. Then we can use indices stored in blocks denoting first and last elements of the sequence written in the block and these prefix sums to compute the total number of times that 2 and 5 divides elements covered by the block

- \texttt{query}(L, R) - This operation is the easiest one. The only think we have to do is to count the total number of times that 2 and 5 divides elements in [L, R] and this corresponds to summing up information stored in all blocks falling into [L, R]. Finally, we return minimum of these two values, because it is equal to the number of trailing zeros in the product of elements in [L, R] as we noticed at the beginning

So far so good, but what are the advantages of the blocks? Well, if there are always less than c \cdot M blocks, each of size at most K, then each single update or query operation will take O(M + K) time, because there will be at most O(M) blocks to iterate over, and splitting them costs O(K), because we split at most 2 blocks before each update/query and each one has size at most K. However, the number of blocks grows when we split them, but on the other hand we want to keep c \cdot M small enough so the complexity is not too high. How to handle it? This is the crucial step: as soon as the number of blocks exceeded c \cdot M, we rebuild the whole array A and construct completely new M blocks as we did in the beginning of the computation. This reconstruction is possible using the information stored in the blocks and one such reconstruction takes O(N) time, because we want to compute the exact value of each element in the array and construct new blocks from the scratch. For more details please refer to implementation section below.

Now it’s time to wisely choose values for M, K and c in such a way that the total complexity is minimized. The intuition is that when M is big, then a single update or query may take longer, but we will need less rebuilding. It turns out, that choosing M closely to \sqrt N and c = 2 works very well. Notice that if M = \sqrt N then K = \sqrt N as well (of course we want them to be integers so actually they might be slightly different). Using these values of the parameters, we have the total time complexity of O(Q \cdot \sqrt N + \big(Q \big/ \sqrt(N)\big) \cdot N). The second value, \big(Q \big/ \sqrt(N)\big) \cdot N, denotes the total time needed for all rebuilding - each rebuild takes O(N) and there will be at most O(Q \big/ \sqrt(N)) of them, because O(M) = O(\sqrt N) queries have to be performed in order to get the number of blocks greater than c \cdot M.

Implementation details

In my implementation I don’t store exact elements of array A. Instead, for each element A_i, I store two information: the number of times 2 divides A_i and the number of times 5 divides A_i. This is the only information we need and it makes the implementation more clear. For all the details please refer to my submission below, especially to see how rebuilding is done and what information exactly I store in blocks to make the code clear enough:

You can see my solution here: https://www.codechef.com/viewsolution/11864811


In the past, I used the same technique to solve the hardest problem from HackerRank Weekly Challenges 7:

Problem https://www.hackerrank.com/contests/w7/challenges/helix

My solution link https://www.hackerrank.com/contests/w7/challenges/helix/submissions/code/1786390

Some discussion about my approach https://www.hackerrank.com/contests/w7/challenges/helix/forum/comments/23253

I hope you enjoyed the description and the method itself.

  • Dinesh Agarwal

    awesome is the word (Y)

  • Dinesh Agarwal

    I implemented the code , However Codechef is showing me segmentation fault
    I tried 10 -15 test cases but cant device a test case for Segmentation fault . .

    • Dinesh Agarwal

      can you help me in solving this bug : )

      • http://chasethered.com pkacprzak

        It depends on the implementation. Try to check if you're trying to not allocated memory addresses and other related errors. Did you see my submission?

        • Dinesh Agarwal

          i solved that error but still wrong answers :(

          • http://chasethered.com pkacprzak

            in all test cases? What you can try to do is to write a brute force solution and a random small test generator. Then you can generate a test using this generator, run both solutions on this test and compare results. If they differ, then take a closer look at the test and the solutions. If not, generate next test until they differ

          • Dinesh Agarwal

            not in all , for subtask 1 -> 3 AC but 2 WA
            in subtask 2 WA at 0.00
            in subtask 3 WA at around 0.8 sec

          • Dinesh Agarwal

            And for checking my solution i am running your solution side by side . Btw I wrote my code taking yours as a reference , since this was my first sqrt decomposition

          • Dinesh Agarwal

            finally getting WA at only one test case . #3rd subtask's last one :(

  • p k

    i am a beginner i show the code but cant understand .....can someone plz show me how it is wrking by taking a sample case..
    thanks in advance :)