Tag Archives: Graph

My article for HackerEarth Codemonk

HackerEarth just published my first article prepared in purpose of Codemonk series.

This article is a comprehensive tutorial to the most basic graph algorithms, especially useful if you want an overview of the shortest paths problem. In addition, I explained topological sorting also, which is a great property of directed acyclic graphs.

The article is available here: Codemonk: Graph Theory Part II

Long Narrow City // My hard problem in Counter Code by HackerRank

It was great preparing a hard problem for Counter Code hosted by HackerRank. Competition was great, and during 24 hours 26 participants manage to solved it. This is the most detailed editorial that I have written so far.

The problem is available here: Long Narrow City


Problem statement is quite clear, we are given a graph for which we have to count the number of its spanning trees, containing at most k distincted roads. k can be any number in range [0, 200] and there are at most 1000 distincted roads given in the input.

However, the graph can be really huge! It can have up to 3 \cdot 10^{12} vertices - definitely way too big to store them in the memory.

The good news is that is has a very specific structure. As the name of the problem suggests, it is very long, but fortunately a narrow one :)

Specifically, it has 3 layers: top, middle and bottom, each consisting of N vertices. Layers along with graph edges are presented in the below drawing. Notice, that there are N edges between consecutive layers and each layer contains N - 1 internal edges:


Now we know what we are dealing with, so let's start solving it. What I like often to do, is to start with some simpler problem, derive a solution to it, and then adapt it further to the original problem. Let's give it a try here!

Simplified problem

First, let's assume that there are no distincted edges.

Simplified graph

Moreover, let's assume, that the graph has only 2 layers instead of 3. It is even more narrow, isn't it? From now we are referring only to simplified graphs unless we call the graph original explicitly.

Our simplified graph looks like this:


Since we assumed that there are no distincted edges, our task is to find the number of all spanning trees of it.

First, let's define what a segment it our narrow graph is. We call a segment a set of 3 vertices with numbers i, i + N, i + 2N for original graph, and a set of 2 vertices with numbers i, i + N for a simplified one.

An example segments, for original graph and simplified graph, are presented it the below drawing. Green vertices belong to the current segments, while red ones belong to the previous ones. Edges connecting these segments are also presented:









Let tail be the last segment of the graph.

Now we are ready to count the number of spanning trees of simplified graph. A great way to do that, is to try to extend a solution for a smaller graph by adding a new segment to it as a tail.

Let G_{N} be a simplified graph with 3 \cdot N vertices. We can extend G_{N} to G_{N + 1} by adding a new segment to it and connect it to the tail of G_N.

This is the incremental method to build our graphs.

In order to count the number of spanning trees of G_{n + 1}, let's assume that we know the number of spanning trees of G_{N}. How does it help? Well, let T_{N} be any spanning tree of G_N. We can extend T_N to a spanning tree of G_{N+1} by adding some edges of the tail of G_{N+1} to T_N.

The crucial observation is that we can count spanning trees incrementally as well.

But be aware here. Does any spanning tree of G_{N+1} can be produced from a spanning tree of G_{N}? Let's consider the below drawing presenting a valid spanning tree of G_3.


If you look closely, you will notice that this spanning tree cannot be produced from any spanning tree of G_4, because vertices 2 and 5 are connected in it through the tail of G_5.

However, the good news is that it can be producted from a spanning forest of G_4 with two connected components in which the top layer is the first connected component and the bottom layer is the second connected component. What is more important, the following fact is true:

Every spanning tree of G_{N+1} can be produced by extending a spanning tree of G_N or a spanning forest of G_N, in which top and bottom layers are different connected components, by adding a new layer and connecting it with some edges to the previous structure.

This is true, because in any valid spanning tree of G_{N+1}, vertices N and 2N have to be connected by exactly one path. They can be connected by a path which does not use vertices N + 1 and 2N + 1 or by a path which uses them. In the first case, the T_{N+1} is formed from T_N, while is the second case, it is formed from a forest of G_N in which the top and the bottom layers are distinct connected components.

This is a good news, because we can count the number of spanning trees of G_{N+1} by just knowing the number of spanning trees of G_N and the number of specific spanning forests of G_N.

Let T(N) be the number of spanning trees of G_N and F_N be the number of spanning forests of G_N in which top and bottom layers are distinct connected components. In order to get the exact equation for G_N, we have to examine below cases and count the number of possible extensions in each case.

  1. Any T_N can be extended to T_{N+1} in 3 different ways, because we can add any 2 of 3 edges which are adjacent to vertices from the last segment of G_{N+1}
  2. Any F_N can be extended to T_{N+1} in just one way, because we have to add all 3 edges adjacent to vertices from the last segment to G_{N+1}.
  3.  Any T_N can be extended to F_{N+1} in 2 ways, because we have to add just one horizontal edge adjacent to vertices from the last segment of G_{N+1} to T_N and there are two ways to do that.
  4. Any F_N can be extended to F_{N+1} in just one way, because we have to add 2 horizontal edges adjacent to vertices from the last segment of G_{N+1} to F_N .

Let T(N) be the number of valid graphs of type T_N and F(N) be the number of valid graphs of type F_N.

Clearly, T(1) = 1 and F(1) = 1. From the above analysis, we know that:

T(N + 1) = 3 \cdot G(N) + F(N)

F(N + 1) = 2 \cdot G(N) + F(N)

These two equations helps us to compute the number of spanning trees of simplified graphs.

Original graph

That is great, but what does it have to do with our original graph with 3 layers of vertices? Well, the same analysis works, but we have more cases to consider. From now we are referring to original, 3 layer graphs, unless we call the graph explicitly simplified.

First some definition, which will helps us to refer to some things.

Let i, i + N, i + 2N be vertices in the last layer of G_N. We define the following graphs:

Let F^1_N be a spanning forest of G_N, in which vertex i belongs to one connected component, vertices i + N and i + 2N belong to the other one, and there are only two connected components in this forest.

Let F^2 be a spanning forest of G_N, in which vertex i + N belongs to one connected component, vertices i and i + 2N belong to the other one, and there are only two connected components in this forest.

Let F^3_N be a spanning forest of G_N, in which vertex i + 2N belongs to one connected component, vertices i and i + N belong to the other one, and there are only two connected components in this forest.

Let F^0_N be a spanning forest of G_N, in which vertices i, i + N, i + 2N belong to 3 different connected components, and there are 3 connected components in this forest

T_N is defined as before, i.e. it is a spanning tree of G_N, so all vertices i, i + N, i + 2N belong to the only connected component.

All 5 above graphs are presented below - vertices belonging to the same connected component have the same color and there are as many connected components as colors of vertices in the last segments (notice that I did not draw any edges between these vertices just for generality - we don't assume by what path they are connected, we just indicate that they are connected - might be by direct edges and might be by longer paths):

definition_f1 definition_f2 definition_f3 definition_f0 definition_t
Let also F_1(N), F_2(N), F_3(N), F_0(N), G(N) be the number of graphs of corresponding types defined just above.

Based on these values, we can compute G(N + 1) in a similar way that we did it in for simplified graphs. There are just more cases to consider and each of the above 5 functions has its own recursive relation. The exact relations are presented below. You can compute them either by hand or write a small program to do it for you. I decided to do it by hand first and then double check it by a program:

T(N + 1) = 8 \cdot T(N) + 3 \cdot F_1(N) + 4 \cdot F_2(N) + 3 \cdot F_3(N) + 1 \cdot F_0(N)
F_1(N + 1) = 4 \cdot T(N) + 3 \cdot F_1(N) + 2 \cdot F_2(N) + 2 \cdot F_3(N) + 1 \cdot F_0(N)
F_2(N + 1) = 1 \cdot T(N) + 0 \cdot F_1(N) + 1 \cdot F_2(N) + 0 \cdot F_3(N) + 0 \cdot F_0(N)
F_3(N + 1) = 4 \cdot T(N) + 2 \cdot F_1(N) + 2 \cdot F_2(N) + 3 \cdot F_3(N) + 1 \cdot F_0(N)
F_0(N + 1) = 3 \cdot T(N) + 2 \cdot F_1(N) + 2 \cdot F_2(N) + 2 \cdot F_3(N) + 1 \cdot F_0(N)

You have to derive values to T(0), F_1(0), F_2(0), F_3(0), F_0(0) also.

Having these relations, we are able to compute the number of spanning trees of our graphs of any reasonable size. Even if N is as big as 10^{12}, we can still use matrix exponentiation to do that quickly.

So far so good, but remember that we simplified the problem to not having any distincted edges. Let's now get back to our original problem.

Original problem

In the original problem, we have at most 1000 distincted edges and we want to know the answer to how many spanning trees of G_N are there, such that they have at most k distincted edges, where k is at most 200.

How do deal with that problem? Well, do you remember how we extend, let's say F^1_N to T_{N+1}? From the recursive equations derived before, we know that there are 3 ways to do that i.e. there are 3 subsets of edges forming a segment such that you can connect them to the tail of any F^1_N to form a valid T_{N+1}. Let's call the st of these subsets S.

Let's also define F^1_{N,a} as F^1_N with exactly a distincted edges. Similarly we define T_{N,a} and so on. Let's pick any subset of edges in S, call it s, and assume that s has d distincted edges.

The crucial observation here is that we can form a valid T_{N, a + d} from any F^1_{N, a} by adding edges from s to F^1_{N, a}.

Based on these observations we can implement a dynamic programming solution to the the problem.

Let \text{dp}_{T,k,N} be the number of spanning trees of G_N with exactly k distincted edges.

Similarly we define \text{dp}_{F_1,k,N}, \text{dp}_{F_2,k,N}, \text{dp}_{F_3,k,N} and \text{dp}_{F_0,k,N}

Then we can compute any of above tables iterating over each segment in the graph, then iterating over each possible subset of edges in this segment and extending the solution to the larger graphs by examining how many distincted edges this subset has and to what kind of graph this subset extends our current graph (if it extends to any). You can precompute the extension table, i.e. precompute if a graph of type, let's say F_1 can be extended to, let's say F_3, using a particular subset of edges.

However, since N is up to 10^{12} we are not allowed to iterate over all segments in the graph. Think for a second, do we really need to do this? Of course we don't. Notice that there are at most 1000 distincted edges in the input, so there are also at most 1000 segments with any distincted edge, which is like nothing comparing it to the 10^{12} possible segments. What we can do instead, is to assign distincted edges to segments to which they belong, sort this segments in increasing manner and process the graph from left to right. We have two cases to consider then:

  1. The current segment has distincted edges.
    In this case, we can use our dynamic programming method to extend the solution to the graph with one more segment.
  2. The current segment is free of distincted edges.
    In this case, we can see how far to the right is the next segment with distincted edges. Then we can use the recursive equation and matrix exponentiation, which are described in the simplifier problem, to extend the current solution to the segment just before the next segment with distincted edges. Since this method is pretty fast, we are able to skip over huge number of segments without dangerous edges very fast.

Notice that we are not allowed to store the whole \text{dp} table anymore, because there are just too many segments. The good news is that we do not have to do this, because we just need to know the entry corresponding to the last examined segment. This reduces the space complexity a lot and is a common trick in these type of problems.

Since in the statement we are asked to answer questions on how many spanning trees with at most k dangerous edges are for some values of k, we can now easily answer them based on our \text{dp} tables, just by computing:

\text{answer}_{N,k} = \sum_{m=0}^k \text{dp}_{T,N,m}

where \text{answer}_{N,k} is the number of spanning trees of the input graph with exactly k distincted edges.

Time complexity

First, you have to precompute the table of extensions, which stores the information if a graph of one type can be extended to a graph of another type using some particular subset of edges. Since there are 5 types of graphs, 32 subsets of edges, and you can compute the resulting graph for a given graph and subset of edges using \text{dfs} this takes so small amount of time that we do not have to worry about it.

During the computations, sometimes you use skips by matrix exponentiation and since the matrix size if 5 \times 5 and the longest skip has length 10^{12}, one skip takes O(5^3 \cdot \log(10^{12})).

At most 1000 times we are forced to compute the extension of a current graph by dynamic programming, and doing this for one segment takes O(5 \cdot 5 \cdot K \cdot 32), where $K$ is maximum number of distincted roads in a query. This is true, because in \text{dp} table an entry for a graph of one type is based on values of every of 5 other types of smaller graph and we compute the extension for any possible number of distincted edges. Computing one such entry takes 32 subsets of edges to consider. However this could be reduced by just examining the valid subsets of edges for a particular type of graph i.e. these ones which extends it to some other graph.

My problem for Hackerrank Weekly Challenge #12 // Favorite sequence

You can find and solve the problem here: https://www.hackerrank.com/contests/w12/challenges/favourite-sequence

Problem statement

In this problem you are given N sequences of distinct numbers. The task is to construct the shortest sequence S, such that all N sequences are (not necessarily continuous) subsequences of S. If there are many such sequences, then you have to construct the lexicographically smallest one. It is guaranteed that such a sequence exists.

Sequence A[1\ldots n] is lexicographically less than sequence B[1\ldots n] if and only if there exists 1 \le i \le n such that for all 1 \le j < i, A[j] = B[j] \ \ and\ \ A[i] < B[i].


Let's consider the following example first, there are 3 sequences:

1 3
2 3 4
6 5

From the first one, you know that in the resulting sequence 1 has to be before 3, while from the second one you know that 2 has to be before 3 and 3 has to be before 4 and. From the third one you know that 6 has to be before 5.

Based on these facts we can construct a graph of precedence. Let's call this graph G. Numbers presented in sequences are vertices in G and there is a directed edge (u, v) if and only if there is a sequence where v is a direct successor o u.

Out example graph G looks like this:





It is easy to see that the shortest valid sequence will be a topological sorting of vertices of G. A topological order is a linear ordering of vertices such that for every directed edge (u, v), u comes before v in the ordering. The reason for this is that any topological order forms a valid sequence, because it preserves all precedence relations from all input sequences and it is the shortest one, because any number is presented only once. We reduced the problem to finding the lexicographically smallest one topological order of G. If you want to read more about topological order, you can check this article.

A graph can have many topological orders, for example any of below orders is a valid topological order of our graph G, but there are even more valid ones:

2 1 3 4 6 5
1 2 3 4 6 5
6 1 2 5 3 4

The problem statement says that we have to return the lexicographically smallest one and in our case it is: 1 2 3 4 6 5.

While there are several methods of finding a topological order of a graph, we will use the following, because it is easy to modify it to construct the lexicographically smallest one.

Since there exists a topological order of vertices, there exists a vertex with in-degree 0. It is easy to see that if any vertex has in-degree greater than 0, topological order does not exist because is a directed cycle in the graph. We construct the desire order in a main loop:

While there exists a vertex in the graph:
    select a vertex v with in-degree 0, add it to the resulting order and remove it with all outgoing edges from the graph

You can prove that the above method returns a topological order. If we want the lexicographically smallest one, inside the loop we have to pick a vertex represented by a smallest number from vertices with in-degree 0.

In order to implement the above method, we can use a priority queue where vertices are ordered by in-degree and if two vertices have the same in-degree, then a number on these vertices decides (a vertex with smallest number on it has higher priority). While we are in the main loop, we pick the first vertex u from the queue, delete it from it, put in into the resulting order and decrease in-degree of every vertex v such that there is an edge (u, v) in a graph.

Time complexity and implementation details

The time complexity of the above method depends on the queue implementation. We can achieve O((V + E) \cdot \log V) where v is the number of vertices in the graph and E is the number of edges in it, if we implement the queue as a heap with pointers or a balanced binary tree. This is because we V times select the minimum from the queue and E times we decrease in-degrees of vertices in the queue which has at most V elements in it and each of these operations takes logarithmic time on a heap with pointers (you have to have pointers to elements in order to achieve this performance for a decrease operation, because you have to be able to find any element quickly) or a balanced binary tree. We can notice that V and E are at most N \cdot K.

My Editorial to TopCoder SRM 638 Div 2

SRM 638 took place in the middle of the night in my time zone, so I didn't participate.

While I haven't read problems from my division (Div 1) yet, I decided to write an editorial for a whole Div 2 problem set.

One interesting thing is that first two problems can be solve very very fast in python, so having an ability to code in python can be a great advantage during live contest.

1000 problem was solved only by 3 users during the contest so I will give you a really simple method to implement it.

250 Problem // NamingConvention

Problem statement

This is really simple. You are given a string consisting of lowercase letters a-z and underscores '_', where underscores divide consecutive words and there is at least one letter between two underscores. The task is to convert the given string into camel case notation, i.e. first word is written in lower case and every other word starts with a capital letter while other letters are lower case.

The full problem statement is available here: http://community.topcoder.com/stat?c=problem_statement&pm=13521&rd=16081


You can split the string by underscores and capitalize the first letter of each word starting from the second one. Print resulting words one by one. In python this is so simple:

class NamingConvention:
    def toCamelCase(self, variableName):
        s = variableName.split('_')
        return "".join([s[0]] + map(lambda x: x.title(), s[1:]))

The title() method of the class string class is very useful here. The solution has of course linear time complexity.

500 Problem // NarrowPassage2Easy

Problem statement

You are given an array A of size at most 6, where every element is a positive integer not greater than 1000. You are also given a positive integer K not greater than 1000.

The task is to count the number of different orders of A (like permutations, but two elements with same value are considered different) for which there are no elements a_i, a_j such that a_i + a_j > +K and a_j was before a_i in the original array.

The full problem statement is available here: http://community.topcoder.com/stat?c=problem_statement&pm=13295&rd=16081


Since the array has at most 6 elements, we can consider every possible ordering and for each one check if there is a pair of elements, for which the condition from the problem statement is invalid. If there is no such pair, then we count this ordering as a valid and at the end we return the number of valid orderings. Again python makes it so easy to implement:

class NarrowPassage2Easy:
    def count(self, A, K):
        from itertools import permutations
        cnt = 0
        n = len(A)
        for p in permutations(enumerate(A)):
            fail = 0
            for i in range(n):
                for j in range(i + 1, n):
                    if p[i][0] &gt; p[j][0] and p[i][1] + p[j][1] &gt; K:
                        fail = 1
            if not fail:
                cnt += 1                    
        return cnt

If you haven't heard about enumerate() function this is the time to read about it. Time complexity of this method is O(n^2 \cdot n!) where n is the size of the input array and since n \leq 6 it is enough here.

1000 Problem // CandleTimerEasy

Problem statement

You are given a tree consisting of n nodes. Each edge in the tree has a positive length of at most 1000 units.

You can ignite any subset of leaves in the tree. When a node v is ignited, all edges adjacent to it starts burning, and if the fire meets another node u, it becomes ignited and the process continues. The whole process lasts until all edges are burned.

Edges burn at a uniform rate. This means that if you ignite an edge of length L at one end, it will burn in L units of time, but if you place fire at its both ends, it will burn quicker (for example, if you place it in the same time, it will burn after L/2 units of time).

The tree is burned if all its edges are burned.

The task is count the number of different times in which the tree can be burned completely. These times may differ if you ignite different subsets of leaves at the beginning.

The full problem statement is available here: http://community.topcoder.com/stat?c=problem_statement&pm=13519&rd=16081


Since n is at most 20 here, we can try to ignite all subsets of leaves and for each one count the time in which the tree will burn completely.

At first I thought about simulating the process of burning, but it can be quite complex to implement, so I developed a solution very easy to code.

Let d[v][u] be the time in which v is ignited when you ignite only uat the beginning. In other words, d[v][u] is a distance between v and u where u is a leaf in the tree. You can compute d table using dfs for example.

Next we iterate over all subset of ignited leaves. I will show how to count the burning time for a given subset S of size k.

Let f[v] be the time when a node v is ignited. We ignite at the beginning all leaves from S. You can thing of this process as of k different fire flames spreading on the tree. Node v will be ignited by one of these flames. Which one? The one which comes to v first and since we have our d table computed, we can compute f[v] = \min\limits_{u \in S} d[v][u].

Now we know f[v] for any node in the tree. Since the tree is burned when all its edges are burned, we iterate over all edges and for each edge (v, u) we find the time after it becomes completely burned based on values f[v] and f[u].

Time complexity of this method is O(2^L \cdot L \cdot n) where L is the number of leaves in the tree and n is the number of all nodes in it.

I attached my C++ code below:

const int MAX_N = 20;
vi g[MAX_N];
int w[MAX_N][MAX_N];
vi leaves;
int fire1[MAX_N];
int fire2[MAX_N][MAX_N];
void dfs(int v, int p, int len, int root)
    fire2[v][root] = len;
    FOR(i, SZ(g[v]))
        int u = g[v][i];
        if(u == p) continue;
        dfs(u, v, len + w[v][u], root);
class CandleTimerEasy {
    int differentTime(vector A, vector B, vector len) {
        int n = SZ(A) + 1;
        FOR(i, n) g[i].clear();
        FOR(i, n - 1)
            w[B[i]][A[i]] = len[i];
            w[A[i]][B[i]] = len[i];
        FOR(i, n) if(SZ(g[i]) == 1) leaves.pb(i);
        set ans;
        FOR(i, n) dfs(i, i, 0, i);
        for(int b = 1; b &lt; (1 &lt;&lt; SZ(leaves)); ++b)
            int k = b;
            int c = 0;            
            vector vs;
            while(k &gt; 0)
                int v = leaves[c];
                if(k &amp; 1) vs.pb(v);
                k /= 2;
            FOR(i, n) fire1[i] = INF;
            FOR(v, n)
                FOR(i, SZ(vs))
                    int r = vs[i];                    
                    REMIN(fire1[v], fire2[v][r]);
            double res = 0;
            FOR(v, n)
                FOR(i, SZ(g[v]))
                    int u = g[v][i];
                    double burned = min(fire1[v], fire1[u]) + w[v][u];
                    int dif = abs(fire1[v] - fire1[u]);
                    REMIN(burned, min(fire1[v], fire1[u]) + dif + (w[v][u] - dif) / 2.0);
                    REMAX(res, burned);
        return SZ(ans);

TopCoder / SRM 622 div1 250

SRM 622 took place 10 hours ago, but I didn't participate because it was in the middle of the night in Poland. Here's a quick post on the 250 problem.

250 Problem

You're given an integer T and a full weighted directed graph of at most N \leq 50 vertices, where w_{i,j} is the weight of the edge connecting vertex i with vertex j. The task is to count a sum of weights of overloaded edges. An edge e is overloaded if and only if there are at least T pairs of vertices (v, u) such that e lies on any shortest path between v and u. The full problem statement is here SRM 622 div1 250.

The simplest method that came to my mind is to first compute the lengths of shortest paths between all pairs of vertices. Let d_{v, u} denote the distance between v and u. During the contest, the crucial thing here is to use the Floyd-Warshall algorithm rather than for example Dijkstra, not because it's faster, but because it's significantly easier to code.

It remains to decide if an edge e = (v, u) lies on any shortest path between given vertices s and t. Of course we cannot generate all shortest paths from s to t, because the number of such paths can be \Omega(2^n). The most clever method to check it is to observe that:

e = (v, u) lies on any shortest path from s to t if and only if

d_{s, t} = d_{s, v} + w_{v, u} + d_{u, t}

This can be checked in O(1) time because we've just computed distances for all pairs of vertices.

The time complexity of my algorithm is dominated by the cost of doing the above check for the whole graph. For each pair of verices, we iterate over all edges and do a constant time work which results in O(n^4).

Here's my code:

const int MAX_N = 55;

int d[MAX_N][MAX_N];
int cnt[MAX_N][MAX_N];
class BuildingRoutes {
    int build(vector w, int T) {
        int n = w.size();
        FOR(i, n) FOR(j, n) d[i][j] = w[i][j] - '0';
        FOR(k, n)
            FOR(i, n)
                FOR(j, n)
                    REMIN(d[i][j], d[i][k] + d[k][j]);
        FOR(s, n)
            FOR(t, n)
                FOR(i, n)
                    FOR(j, n)
                        if(d[s][t] == d[s][i] + (w[i][j] - '0') + d[j][t]) 
        int res = 0;
        FOR(i, n)
            FOR(j, n)
                if(cnt[i][j] >= T) res += w[i][j] - '0';
        return res;