Tuesday, March 24, 2015

Apply Function to multiple machines

Part 1: You are given a computer #1 with array Foo, a computer #2 with array Bar and a spare computer #3. You need to apply a function F to corresponding/matching elements of the two arrays. How would you do that?
Part 2: Once you scale up, how would you balance the number of machines sorting with the machines applying the function?
Part 3: What if the master (which is distributing the work) dies and never recovers?

I would try to use my limited knowledge to approach this problem. The answer may or may not be correct. For the original problem, see here.


Part1:
From Part2, we know that we need to sort the two arrays. However, the problem is not clear on "matching elements", so I would say, sort them separately in two machines and merge the result in computer #3. We can also assign some elements of both arrays to computer #3 and let it do some of the work, then merge all 3 partially sorted arrays. However, we need to think about the band width problem when we try to distribute some data to the third machine. Any O(nlogn) sorting algorithm would be good.

So the next step is to find all needed elements, distribute them evenly  (to all three) and let them apply the function F.


Part2:
I think the problem may be, while sorting the array, if we find some elements that we need to apply to F, we can just send another job for applying F. The answer provided by the interviewee is quite nice:

Either run a small subset first to get an idea if it is linear distribution and then divide statically according to that or try to adapt during processing depending on the load of the machines.


Part3:
I would say to always keep a copy of the data to the other two machines (assuming we only have 3), and keep communicating with each other for data update. When one fails, use another one as the master.

No comments:

Post a Comment