Thursday, February 5, 2015

Running Median

"There is a stream of numbers, design an effective datastructre to store the numbers and to return the median at any point of time."


Since the number is coming from a stream, the size of the list is dynamic, so as the median. It is possible to store the numbers in a sorted list and find the middle. However, this will take O(n) to insert an element and O(1) to get the median. So the question is, can we do better?

If the total numbers are divided into two parts, the numbers in the left part are always smaller than those in the right part. If we divide the total number in such a way that the size of the left and right part will differ no larger than 1, then the median will be the smallest number in the right part if the right part has more elements, or the largest number in the left part if the left part has more elements, or the average of the minimum of the left part and maximum of the right part if two parts have equal elements. So how can we find the maximum and minimum?

Use two heaps. The left part will be a max heap and the right part will be a min heap. Though it is quite flexible, I set the rule that if two parts have equal size, then the element will be added to the right part, otherwise it will be inserted to the left part.

Here comes one more question, what if the new number is smaller (or larger) than any number in the left part when it is supposed to be added to the right (left) part? In that case, we can add the number into the left (right) part, and then poll the head, which is the maximum in the left (minimum in the right), and add that number to the right (left) part.

 
package runningMedian;
/**
 * There is a stream of numbers, design an effective data structre to store 
 * the numbers and to return the median at any point of time. 
 * @author shirleyyoung
 *
 */
import java.util.*;
public class RunningMedian {
 //max heap
 PriorityQueue leftQueue;
 //min heap
 PriorityQueue rightQueue;
 
 public RunningMedian() {
  leftQueue = new PriorityQueue (
    new Comparator () {
   public int compare(Integer a, Integer b) {
    return b - a;
   }
  }
  );
  rightQueue = new PriorityQueue ();
 }
 /**
  * the number will be added to the rightQueue if the total numbers are even
  * however, in order for all numbers in the leftQueue to be smaller than those
  * in the rightQueue, we add n to the leftQueue first, then poll out the head 
  * of the leftQueue, which is the maximum in the queue, then add it to the rightQueue
  * if the total numbers are odd, the number will be added to the leftQueue
  * @param n
  */
 public void add(int n) {
  if (leftQueue.size() == rightQueue.size()) {
   if (rightQueue.size() == 0) {
    rightQueue.add(n);
   }
   else {
    leftQueue.add(n);
    int leftMax = leftQueue.poll();
    rightQueue.add(leftMax);
   }
  }
  else {
   if (rightQueue.peek() < n) {
    rightQueue.add(n);
    int rightMin = rightQueue.poll();
    leftQueue.add(rightMin);
   }
   else {
    leftQueue.add(n);
   }
  }
 }
 
 public double getMedian() {
  if (leftQueue.size() == 0 && rightQueue.size() == 0)
   throw new IllegalArgumentException("No elements!");
  if (leftQueue.size() == rightQueue.size()) {
   return (double)(leftQueue.peek() + rightQueue.peek()) / 2.0;
  }
  return (double)rightQueue.peek();
 }
}

No comments:

Post a Comment