Tuesday, March 24, 2015

Breadth-first search using a distributed system


How would you design the data structures for a very large social network (Facebook, LinkedIn, etc)? Describe how you would design an algorithm to show the connection, or path, between two people (e.g., Me -> Bob -> Susan -> Jason -> You).

This problem is from Cracking the Coding Interview. However, I saw a video on YouTube about how to use a distributed system (MapReduce) to do breadth-first search, so I guess that would be a good answer.


How to store the graph: each node stores a list of its adjacent nodes (this is enough if all edge weights are 1).
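
As a rough illustration of that storage scheme, here is a minimal Python sketch. The helper name `build_adjacency` and the edge list are my own assumptions for illustration; the edges are a toy graph chosen to match the examples used in this post, not a real data set.

```python
def build_adjacency(edges):
    """Turn a list of directed (source, target) edges into adjacency lists."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        # Make sure nodes with no outgoing edges still get an (empty) row.
        adjacency.setdefault(dst, [])
    return adjacency


# Hypothetical toy edges, consistent with the examples in this post.
edges = [(1, 2), (1, 3), (1, 5), (1, 11), (2, 4), (2, 7), (11, 12), (12, 3)]
graph = build_adjacency(edges)
print(graph[1])  # the nodes that 1 points to
```

For a real social network the rows would be spread over many machines, keyed by user ID, rather than held in one dictionary.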

At each iteration, we start from the original node and grow the frontier by one level. The distance from the start node to itself is 0 (DistanceTo(startNode) = 0). For every node n directly reachable from startNode, DistanceTo(n) = 1.

Using the above graph, if startNode = 1, then DistanceTo(2) = 1, DistanceTo(11) = 1, DistanceTo(5) = 1, and so on.
For any other node n reachable from some set of nodes S, DistanceTo(n) = 1 + min(DistanceTo(m), m in S).
So if 4 and 7 are reachable from 2, DistanceTo(4) = 2 and DistanceTo(7) = 2.
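
Before distributing anything, the same rule can be checked on a single machine. This is only a sketch: the function name `bfs_distances` and the toy graph are assumptions made for illustration, with edges picked to agree with the examples above.

```python
def bfs_distances(graph, start):
    """Grow the frontier one level at a time and record DistanceTo for each node."""
    distance_to = {start: 0}
    frontier = [start]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        for m in frontier:
            for n in graph.get(m, []):
                # The first time we see n is along a shortest path, because
                # every node in the current frontier has the minimum distance.
                if n not in distance_to:
                    distance_to[n] = level
                    next_frontier.append(n)
        frontier = next_frontier
    return distance_to


# Hypothetical toy graph, consistent with the examples in this post.
graph = {1: [2, 3, 5, 11], 2: [4, 7], 3: [], 4: [], 5: [], 7: [], 11: [12], 12: [3]}
distances = bfs_distances(graph, 1)
print(distances[4], distances[7])
```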

We do not send the entire adjacency matrix (a sparse matrix, stored as adjacency lists) to each mapper. Each mapper receives a single row, describing which nodes can be reached from a node we already know about.
Key: the node n being processed
Value: DistanceTo(n) and the adjacency list of n (the nodes n points to).

So for 1:
Key:1
Value: 0,  (2, 3, 5, 11)

Then, for each node it can reach, the mapper emits that node as a key with DistanceTo = D + 1, where D is the distance of the node being processed. These pairs then go through the shuffle & sort phase.
So output from mapper:
Key 2, Value 1
Key 3, Value 1
Key 5, Value 1
Key 11, Value 1
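
The map step above might look roughly like this in Python. It is a sketch, not a Hadoop job: the name `bfs_map`, the tuple layout of the value, and the use of `math.inf` for "not reached yet" are all assumptions of mine.

```python
import math


def bfs_map(node, value):
    """Emit (neighbor, D + 1) for every node that `node` points to."""
    distance, neighbors = value
    if distance == math.inf:
        # Assumption: unreached nodes carry an infinite distance and emit nothing.
        return
    for neighbor in neighbors:
        yield neighbor, distance + 1


# The row for node 1 from the example: Key 1, Value 0, (2, 3, 5, 11).
mapper_output = list(bfs_map(1, (0, [2, 3, 5, 11])))
print(mapper_output)
```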

The reducer then receives all the values for a key and selects the minimum as the new distance. So if 3 can be reached by:
1 -> 3 (1)
1 -> 11 -> 12 -> 3 (3)
The reducer will select 1.
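
A sketch of that reduce step, assuming the framework has already grouped all candidate distances for one key (the name `bfs_reduce` is hypothetical):

```python
def bfs_reduce(node, candidate_distances):
    """Keep the smallest distance proposed for `node`."""
    return node, min(candidate_distances)


# Node 3 is proposed with distance 1 (via 1 -> 3) and 3 (via 1 -> 11 -> 12 -> 3).
result = bfs_reduce(3, [1, 3])
print(result)
```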

Then we take those output keys and move to the next iteration: a non-MapReduce component feeds the output of this step back into the MapReduce task for another iteration.
The mapper also emits the node itself along with its points-to list. So 1 will be sent to the mapper again, and because its current distance is passed along with it, its shortest DistanceTo will not change.

Eventually every DistanceTo converges to its shortest distance, so the algorithm stops when an iteration finds no shorter distance.
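
Putting the pieces together, the whole loop can be simulated in memory. This is only a sketch under my own assumptions: the function names, the "NODE"/"DIST" tags, the use of `math.inf` for unreached nodes, and the toy graph are all hypothetical, and a dictionary stands in for the shuffle & sort phase.

```python
import math
from collections import defaultdict


def bfs_map(node, record):
    distance, neighbors = record
    # Re-emit the node itself so its distance and points-to list survive the round.
    yield node, ("NODE", distance, neighbors)
    if distance != math.inf:
        for neighbor in neighbors:
            yield neighbor, ("DIST", distance + 1)


def bfs_reduce(node, values):
    best, neighbors = math.inf, []
    for value in values:
        if value[0] == "NODE":
            neighbors = value[2]
        # Both record kinds carry a distance in position 1.
        best = min(best, value[1])
    return node, (best, neighbors)


def run_iteration(state):
    grouped = defaultdict(list)  # stands in for shuffle & sort
    for node, record in state.items():
        for key, value in bfs_map(node, record):
            grouped[key].append(value)
    return dict(bfs_reduce(node, values) for node, values in grouped.items())


def shortest_distances(graph, start):
    # The driver below plays the role of the non-MapReduce component.
    state = {n: (0 if n == start else math.inf, adj) for n, adj in graph.items()}
    while True:
        new_state = run_iteration(state)
        if new_state == state:  # no shorter distance found: converged
            return {n: d for n, (d, _) in new_state.items()}
        state = new_state


# Hypothetical toy graph, consistent with the examples in this post.
graph = {1: [2, 3, 5, 11], 2: [4, 7], 3: [], 4: [], 5: [], 7: [], 11: [12], 12: [3]}
distances = shortest_distances(graph, 1)
print(distances)
```

Note that this runs one extra iteration just to detect that nothing changed; a real job would more likely use a counter to signal whether any distance was updated.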

For a weighted graph, store the edge weight with each entry in the adjacency list; the candidate distance then becomes DistanceTo(n) = DistanceTo(m) + weight(m, n).
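
For the weighted case, only the map step needs to change. A minimal sketch, assuming each adjacency entry is a (neighbor, weight) pair and unreached nodes carry `math.inf` (the name `weighted_map` and the sample weights are made up for illustration):

```python
import math


def weighted_map(node, record):
    """Emit (n, DistanceTo(m) + weight(m, n)) for every edge m -> n."""
    distance, weighted_neighbors = record
    if distance == math.inf:
        return  # nothing useful to propose from an unreached node
    for neighbor, weight in weighted_neighbors:
        yield neighbor, distance + weight


# Hypothetical weights on two of node 1's edges.
weighted_output = list(weighted_map(1, (0, [(2, 4), (3, 1)])))
print(weighted_output)
```

The reducer still takes the minimum, but with unequal weights a node's distance can keep improving over several iterations, so convergence may take more rounds than the number of levels in the graph.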
