The Fake Geek's blog: Disjoint Set Union/Find ADT

Saturday, January 17, 2015

Disjoint Set Union/Find ADT

Definition

Set U = {a1, a2, ..., an}
Maintain a partition of U, a set of subsets of U {S1, S2, ..., Sk} such that:

each pair of subsets Si and Sj are disjoint;
together, the subsets cover U;
each subset has a unique name.

Union(a, b) creates a new subset which is the union of a and b;
Find(a) returns the representative member of a set: similar to find parent of a TreeNode in a tree;
Thus Disjoint Set can be represented by an up-tree;
Disjoint set equivalence property: every element of a DS U/F structure belongs to exactly one set;
Dynamic equivalence property: the set of an element can change after execution of a union.

Up-Tree Union-Find Data Structure

Image source: http://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture18/sld015.htm

Each subset is an up-tree with its root as its representative member;
All members of a given set are nodes in that set's up-tree;
Not necessarily binary;
Find(a): traverse from the leaf to the root;
Union(a, b): union the root with the other;

Path Compression

Image source: http://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture18/sld034.htm

Points everything along the path of a find to the root;
Thus reduces the height of the entire access path to 1.

Union-by-rank

Use an extra array to store the rank of a given node. Rank[a] is the depth of the tree rooted at a.

Image source: http://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture18/sld029.htm

Union(a, b): pick the shallower tree and point it at the deeper one.

Image source: http://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture18/sld031.htm

This can cut down on height of the new tree;
Find(a) complexity: a tree of height h must have at least 2 ^h nodes. Find(a) takes O(max height) time. Thus Find(a) takes O(log(n)) time.

Complexity of Union-by-rank path compression

Tarjan proved that the find operation on a set of n elements in a union with m union-by-rank subsets has worst complexity O(m * alpha(m,n)).

Implementation
This implementation uses an array.
The test data set: 16 integers range from 0 to 15.
After union, each set should contain 4 integers.

public class DisjointSets {
 //array[x] stores the root of the tree (representative of the set)
 //if array[x] is less than 0, that means x is the root of the tree (representative of the set)
 private int[] array;
 
 //initial number of elements;
 //also the initial number of disjoint sets
 //since every element is initially in its own set
 public DisjointSets(int numElements) {
  array = new int [numElements];
  //since every element is the root of its tree,
  //set each element in the array to negative
  Arrays.fill(array, -1);
 }

Every element is the root of the tree it belongs

Initial Array: every element is negative

public void union(int root1, int root2) {
  if (array[root2] < array[root1]) 
   array[root1] = root2; // root2 is taller; make root2 new root
  else {
   if (array[root1] == array[root2]) 
    array[root1]--;
   array[root2] = root1; // root1 equal or taller; make root1 new root
  }
 }

After first union

Array after first union

After second union

Array after second union

public int find(int x) {
  if (array[x] < 0)
   return x; //x is the root of the tree; return it
  else {
   //find the root of the tree
   array[x] = find(array[x]);
   return array[x];
  }
 }

Test code and result:

public static void main(String[] args) {
  int NumElements = 16;
  int NumInSameSet = 4;
  DisjointSets s = new DisjointSets(NumElements);
  int set1, set2;
  for (int k = 1; k < NumInSameSet; k *= 2) {
   for (int j = 0; j + k < NumElements; j += 2 * k) {
    set1 = s.find(j);
    System.out.println("set1: " + set1);
    set2 = s.find(j + k);
    System.out.println("set2: " + set2);
    s.union(set1, set2);
   }
  }
  for (int i = 0; i < NumElements; i++) {
   System.out.print(s.find(i) + "*");
   if (i % NumInSameSet == NumInSameSet - 1)
    System.out.println();
  }
  System.out.println(); 
 }

The Fake Geek's blog

AdSense

Saturday, January 17, 2015

Disjoint Set Union/Find ADT

No comments:

Post a Comment