Tuesday, March 24, 2015

Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file.Assume you have 1 GB of memory.
FOLLOW UPWhat if you have only 10 MB of memory?

This is a very interesting problem. The problem requires knowledge on both memory limits or bit operations, and consider I am not good at either, it becomes even more interesting.

So the first question, we have 1 GB memory
1GB memory is 2^30 bits, so if we use each bit to represent a number, we can put at most 2 ^ 30 numbers, which can fit 4 billion numbers. So the way we do it is to create a byte array with each element represent 1 byte, which can hold 8 numbers. We go through all the numbers and set the corresponding bit to 1. For example, 11 would be in the 2nd byte and the 3rd bit. Then we go through the byte array again, if a bit is not set to one, then that one is the missing one we are looking for.

public static void findMissing() throws FileNotFoundException{
  byte[] bitField = new byte[0xfffffff / 8];
  Scanner in = new Scanner(new FileReader("numbers"));
  while(in.hasNextInt()){
   int n = in.nextInt();
   bitField[n / 8] |= 1 << (n % 8);
  }
  boolean found = false;
  for(int i = 0; i < bitField.length && !found; i++){
   for(int j = 0; j < 8; j++){
    if((bitField[i] & (1 << j)) == 0){
     System.out.println(i * 8 + j);
     found = true;
     break;
    }
   }
  }
 }


Second, we have 10 MB memory
10 MB is around 2^23 + 2 bits, so we cannot put all numbers this time. Naturally we will think of separate the number so that we can fit each portion into the memory. So consider we have a bucket that can put 2^20 numbers, for 2^32 numbers, we will need 2^12 buckets (These numbers are from the solution provided by the book, I don't think the numbers are accurate here, but the idea is the same.  4 billion integers is around 2^30, so I think we only need 2 ^ 10 buckets, anyway).
We use two arrays, the bucket, which has the length of numBucket, and the bitField as the previous question. Now we go through all the numbers, if the number should be in the range of that bucket, we increase the count of that bucket. So if there is a bucket that has less numbers than it is supposed to have (2^20), we know there is at least one missing number in that bucket.
Now we go through the numbers again, and put all numbers in that range into the bitField, as we did in the last problem, and then we find the missing one.


public static void findMissing2() throws FileNotFoundException{
  int bitSize = 1048576; //2^20, number of numbers we need to put in one bucket
  int numBucket = 4096; // (2^32 / 2^20), number of buckets we need
  byte[] bitField = new byte[bitSize / 8];
  int[] blocks = new int[numBucket];
  Scanner in = new Scanner(new FileReader("numbers"));
  //count how many numbers are in each bucket
  while(in.hasNextInt()){
   int n = in.nextInt();
   blocks[n / bitSize]++;
  }
  int startNumber = 0;
  for(int i = 0; i < numBucket; i++){
   if(blocks[i] < bitSize){
    startNumber = i * bitSize;
    break;
   }
  }
  in = new Scanner(new FileReader("numbers"));
  while(in.hasNextInt()){
   int n = in.nextInt();
   if(n >= startNumber && n < startNumber + bitSize){
    bitField[(n - startNumber) / 8] |= (1 << (n - startNumber) % 8);
   }
  }
  boolean found = false;
  for(int i = 0; i < bitField.length && ! found; i++){
   for(int j = 0; j < 8; j++){
    if(((bitField[i] << j) & 1) == 0){
     System.out.println(startNumber + i * 8 + j);
     found = true;
     break;
    }
   }
  }
 }


No comments:

Post a Comment