Codechef Long // July Challenge - GERALD09 the tiebreaker problem

The June challenge is over. I am very happy, because I finished 19th overall, which is my personal best:


While editorials for all problems are available on Codechef forum, almost every solution to the tie breaker problem is unique, so I want to share mine, because it is completely different from the one posted in the official editorial.

Problem statement

The full problem statement is here. You are given a matrix T with n \leq 15 rows and m \leq 15 colums. In addition, you are given an integer k, where 1 \leq k \leq n^2 \cdot m^2. Your task is to fill the matrix with letters from 4-letters alphabet \{A, C, G, T\} in such a way, that the number of different submatrices of matrix T is as close to k as possible - the less the difference is the greater score you have.


I used the concept of 2 dimensional variant of polynomial hashing in order to count the number of different submatrices in time O(n^2 \cdot m^2). The concept allows to compare any two submatrices in O(1) time.

Lets make some observations:

  1. It is impossible to create a matrix with less than n \cdot m different submatrices, because every 2 submatrices of different size are different. It follows that,if k \leq n \cdot m, the best you can do is to fill the matrix with only one letter, lets cay A.
  2. It is impossible to create a matrix with more than M = n \cdot (n + 1) / 2 \cdot m \cdot (m + 1) / 2 different submatrices, because this is the number of all possible submatrices. It follows that if k \geq M, the best you can do is to return a matrix with n rows and m colums with the maximal number of different submatrices.
    My approach is to precompute for every of 225 pairs (n, m) the matrix with the maximum number of different submatrices. I did that by repeatedly exchanging some random letter in a matrix and if the resulting matrix has a greater number of different submatrices than it had before, then it becomes the resulting matrix. Of course it might not return the best matrix, but if you repeat the exchange many times it should be at least quite close to the optimal matrix. It is worth to mention, that such a a process can stuck in a local maximum, so it is better to choose the result from many different runs.
  3. Since the maximum allowed source file limit on Codechef is 50kb, you can store a lot precomputed information in it. Let R := \{s : n \cdot m < s < M\}. Based on two above cases, we know that k \in R. For every pair of (n, m), I assigned to mem[n][m][s] a matrix of size n \times m with exactly s different submatrices, where s is equally distributed over R. For example, let n, m = 10, then 100 < s < 3025 and we chose to memorize matrices which differ in number of different submatrices by 500, then we compute mem[10][10][s] for s \in \{600, 1100, 1600, 2100, 2600\}. It helps us to chose the initial matrix for the beginning of the on-line computation. For example, if n, m = 10, k = 1220 and we computed mem[10][10] as above, we select mem[10][10][1100] as the initial matrix. I computed mem using the same random method as in Step 2.
  4. Lets fill the matrix with the letter A only and consider, how many different submatrices we can generate by placing letter C in the i-th row and j-th column. If we do that, then every submatrix with the upper left corner in (i_1, j_1) where i_1 \leq i and j_2 \leq j and the lower right corner in (i_2, j_2), where i_2 \geq i and j_2 \geq j is unique, because any other matrix of the same size has either no letter C or has letter C at different relative position. It follows that using that method, we can create a matrix with up to about n^2 / 4 \cdot m^2 / 4 + n \cdot m different submatrices. The natural extension is to insert letters G and T at some positions different than (i, j) in our matrix, which helps us to create a matrix with up to about 3/16 \cdot n^2 \cdot m^2 + n \cdot m different submatrices.

We are ready now for the on-line computation. We choose as the initial matrix the one, which can be generated in any of the above steps with the closest number of different submatrices to k.

My online computation consists of only one technique. For any pair of (n, m), I set the number of computation steps to c \cdot 225 / (n^2 \cdot m^2) for the largest possible c which doesn't exceed the time limit, in which I try to exchange some random elements in the matrix and hope to generate a better one.

At the beginning, I count the number of different submatrices in my initial matrix. Then, for every computations step, I exchange exactly \sqrt{n \cdot m} random elements, again count the number of different submatrices and if it generated a better matrix, I store it as a current matrix.


While the obvious method is to try to exchange on-line some elements in order to achieve a better solution, the crucial step is to set up the initial matrix in the best possible way. I am pretty satisfied with my solution. It results in 20th best score in that problem and gave me 19th place in the contest.