The Fake Geek's blog: Problems using distributed system

Find the sum of two n-bit binary numbers

Assuming each number is a power of 2, so we can partition each number to two parts, the first one the most significant n / 2 bits and the second one the least significant n / 2 bits. For example, 111,111 can be partitioned to 111 << 3 and 111, now if 010 adds 111, we are good, there is no carry. If 110 adds 111, we will have carries. When the carries is in the least significant parts, we need to modified the most significant parts by the carry. If we keep dividing, in the end, we can solve the problem by just adding 2 bits. Familiar with the approach? Yep, that's divide - and - conquer.

In reality, we cannot divide two numbers all the way to two bits, so here comes the question my friend likes to ask me a lot about it: Given m machines, how many iterations do we need to solve the problem given each machine has x memory? Now if mx > 2 * n, we are good, we can solve the problem by one iteration. However, what if we are given two super large numbers that, say, around 1GB each, and each machine has, 1 kb memory? So since each time we need to put the bits of two numbers into the machine, so each number can only acquire x / 2 memory in the machine. In total we can handle mx / 2 memory (in bits) of each number in each iteration. If the number has n bits, then we will need n / (mx / 2) iterations to solve the problem.

The prefix problem

The prefix product of a given sequence x0, x1, x2, ..., xn is y0, y1, y2, ..., yn where:
y0 = x0
y1 = x0 · x1
y2 = x0 · x1 · x2
...
yn = x0 · x1... · xn
where · can be any binary operations that satisfies:

x · (y · z) = (x · y) · z

The problem requires us to compute the products x0 · x1 · ... · xk where 1 <= k <= n, assuming n is a power of 2.

This problem also follows a divide - and - conquer approach. For k = 2, it is easy, just binary operation. For k = n, if we divide the problem into two parts, and we calculate the product for each part, which is represented as:

P(1, n / 2) = x0 · x1 · ... · xn/2
P(n/2+1, n) = xn/2+1 · xn2/+2 · ... · xn

then P(1, k) for k > n/2 would be P(1, n/ 2) · P(n/2+1, k).
So now we only need to keep dividing the the two subsets until we can calculate the binary product.

If we have enough processors, we can divide the array to n/2 processors, then for the combining part, since for each k > n / 2, we calculate P(1, n / 2) · P(n/2 + 1, k), so we can also divide each operation to multiple processors, so overall we will have O(logn) complexity.

Alternative approach

Initialize P(1, k) = xk for (k = 1, n), let's call x[k]
Let k = 2i, calculate x[2i] <- x[2i - 1] · x[2i]

So if we have n = 8,
x[2] <- x[1] · x[2]
x[4] <- x[3] · x[4]
x[6] <- x[5] · x[6]
x[8] <- x[7] · x[8]

Then we use induction and calculate 2k
x[4] <- x[2] · x[4]
x[8] <- x[6] · x[8]

But we miss the 2i + 1 part:
x[3] <- x[2] · x[3]
x[5] <- x[4] · x[5]
x[7] <- x[6] · x[7]

Do it iteratively, we can solve the problem with O(logn) complexity.

Finding kth smallest element
Given a interconnection network with is a complete binary tree with n = 2^(h - 1) nodes (Each node represents a processor). Find the kth smallest element. This tree-machine structure has been used in imaging processing applications, where the leaves correspond to inputs (e.g., pixels) and the algorithms for manipulating them are hierarchical.

We know that if given an array, we can use Quick select to find the solve the problem. Here, we can still apply a similar approach:

1. choosing a random element as pivot,
2. computing its rank
3. eliminating.

Choosing a random element can be achieved by a tournament. Each leaf competes with its sibling, if it wins, it sends its number to the parent. The root will get an overall winner, and that is our pivot.
The pivot is then broadcast down the tree. All leaves compare with it, and send:

1: if their number is smaller than or equal to the pivot or
0: otherwise

to their parent. The rank of the pivot is the number of 1s the root gets.

Then based on the rank of the pivot, each leave node can determine if its number is eliminated.

In each iteration, we will have four waves of communication:
1. up the tree to choose a pivot;
2. down the tree to broadcast it;
3. up the tree to compute its rank;
4. down the tree to broadcast the rank and eliminate the element

Now after the iteration, there is highly chance that we will need another one. But the problem is, the tournament is no longer fair. The node with a sibling node's number being eliminated will be automatically promoted to the next level. If, in an extreme case, all nodes in half of the tree are eliminated except for one, then this node will be promoted to the root to compete with another half of the tree with total probability 1 / 2. The other half of the tree, on the other hand, will have to killing each other to get promoted, which leads to a much lower total probability to win the game. So how to keep the fairness?
Alternatively, we can set up a counter, which is initialized as 1. This counter indicates the real opponents (the numbers that haven't been eliminated) that participated in the part of the tournament involving this element. When an element wins a game at some node, it's promoted upward, and the losing element's counter is added to the wining one's. Every game now is played with a biased coin according to the counter of the competing elements.
For example, if x wins its first game against y, x will have a counter of 2, while z advances by competing nobody (counter = 1). If x now plays with z, x will have probability 2/3 to win the game while z will only have 1 / 3. Is it fair? Indeed it is. Since when x plays against y, both nodes have a chance of 2 / 3 * 1/ 2 of winning, so overall x (or y) has 1/ 3 chance of wining, while z has overall 1 / 3 chance of wining, so we still preserve uniform randomness.

Since in each iteration, we will need to communicate from root to leaves 4 times, so we need O(logn) for communications. On average we will need to do O(logn) iterations (hopefully we are not too unlucky), so average running time is O((logn)^2).

Note that most of the time, the root and leaves are waiting: the root waits for the children to send up the pivot; the leaves waits for their parents to send down the pivot. So to improve efficiency, we can do this in a pipeline fashion. After the first layer of winners are selected and proceeds to their parents, the leaves start the second tournament again. And we can keep eliminating elements based on the pivots that was sent down in previous tournaments. This at anytime, we will have h pivots, so we can cut the elements to 1 / (h + 1) of its original size. The regular algorithm requires O(log n ) steps. Since now we can generate h = log n pivots at about the same time, we can save a factor of O(log(logn)). So the running time is reduced to O((logn)^2 / (log(logn))).

The Fake Geek's blog

AdSense

Thursday, April 2, 2015

Problems using distributed system

No comments:

Post a Comment