Saturday, April 4, 2015

Determine if an array is almost sorted, if not, how to sort it efficiently?

Given an array A of distinct elements with length n, determine if the array is 90% sorted.

1. Considering the following algorithm:

  • Choose an element with index i independently and uniformly at random from 0 < i < n - 1;
  • Compare the element with A[i - 1], output false if they are not sorted correctly;
  • Compare A[i] with A[i + 1], output false if they are not sorted correctly.


Prove that the algorithm will return false with probability at least 2 / 3 if A is not 90% sorted only the algorithm is repeated k = Ω(n). 

Given the following counter example:
A[n/2 + 1, ..., n, 1, 2, 3, ... n / 2].
It is obvious that A is not 90 % sorted. So if we want to use the above algorithm to prove that A is not 90% sorted, each iteration we need to choose the first element to be 1, and then compare it with n, or the first element must be n, which will return false when compare with the next element.
Either case, the probability is 2 / n. Now this leads the probability of not in either of the above case 1 - 2/n.
Moreover, we know that the algorithm will be terminated once either the above case is determined. So:

(1 - 2/n)^k <= 1/3

We know that ( 1 - 1 / x)^x < 1 / e

so performing a little bit math leads to k >= (ln3)/2 *n = Ω(n)

2. Consider performing binary search on an unsorted array. Given key1 and key2 in A, it will return index i and j. We know that if key1 < key2, then i < j (why?). Now consider the following randomized algorithm:
Randomly pick up index i from A, performing binary search with key A[i], which will return index j. If i != j, return false. 
Show that the algorithm is correct and will return false with probability at least 2/3 if the list is not 90% sorted if k is sufficiently large constant. 

First, it is easy to understand that if the array is sorted, then BST will always return true. Now we want to know that if the array is not 90 % sorted, then it will return false with probability 2 / 3 with a proper k. Alternatively we can translate the problem into a typical probability problem:

Given a bag of balls, we know that at least 10% balls are blue, and the others are red, how many balls do we have to draw from the bag to get a blue ball with the probability at least 2/3? 

Similar to the first question we can get:

(9/10)^k <= 1/3

k >= (ln3)/(ln(10/9))

So if we iterate the algorithm greater than (ln3)/(ln(10/9)), we will find an unsorted element with probability 2/3.

3. So what if we want to check if the array is not (1 - ε) sorted? 

The same:

(1- ε)^k <= 1/3

k >= (ln3)/ε

4. Prove that if the given array is partially sorted in k slots, e.g., 

A={7, 8, 9, 4, 5, 6, 1, 2, 3}, then k = 3

Using insertion sort will lead to O(nk) complexity. 

We know that since that each element is in the right order within its slot, so elements in the first slot need not move, the second slot will move number of elements in the first slot, and so on. Given n elements and k slots, each slot will contain n / k elements. So we will have the following equation:

(n/k) * 0 + (n/k) * 1 + ... + (n/k) * (k - 1) = n(k-1)/2 = O(nk)

5. Find an algorithm that sort this partially sorted array in O(nlog k) times, where k is number of slots. 

This is just merge k sorted array algorithm. We use a priority queue, insert the first element in each slot into the queue, since the size of the queue is always k, inserting an element takes O(logk) time, we have n elements, thus merge the whole array takes O(nlogk) time.

No comments:

Post a Comment