The Fake Geek's blog: September 2016

Friday, September 30, 2016

K sum

Given n distinct positive integers, integer k (k <= n) and a numbertarget.

Find k numbers where sum is target. Calculate how many solutions there are?

Have you met this question in a real interview?

Yes

Example

Given [1,2,3,4], k = 2, target = 5.

There are 2 solutions: [1,4] and [2,3].

Return 2.

After we have 2 sum, 3 sum and 4 sum, finally we are embracing k sum. It's easy to think of using backtracking approach... if the question is to find all solutions. However, in the problem, we are asked to give the number of possible solutions. Of course you can still use backtracking, yet the complexity will be way to high for solving the problem.

Usually when we see find the number of solutions, we can guess it may relate to DP. So yes, this problem also uses DP approach. Here we have a 3-D matrix, the first dimension represents array length (number of candidate numbers), the second one represents how many numbers allowed to find the solution, and the last one is the target we want to find.

So initial condition:

matrix[i][0][0] = 1

This means we want to find 0 numbers in i numbers with sum 0.

The recursive formula:

matrix[i][j][t] = matrix[i - 1][j][t] + (t >= A[i - 1]) ? matrix[i - 1][j - 1][t - A[i - 1]] : 0

The first part means the number of solutions in matrix[i][j][t] is the same as the number of solutions of finding j numbers in i - 1 numbers with sum t. The second part indicates if current A[i -1] is smaller than t, the number of solutions should also add finding j - 1 numbers in i - 1 numbers with target t - A[i - 1], because adding A[i] will get us target t.

public int kSum(int A[], int k, int target) {
        
        int[][][] matrix = new int[A.length + 1][k + 1][target + 1];
        
        for (int i = 0; i <= A.length; i++) {
            matrix[i][0][0] = 1;
        }
        
        for (int i = 1; i <= A.length; i++) {
            for (int j = 1; j <= k && j <= i; j++) {
                for (int t = 1; t <= target; t++) {
                    matrix[i][j][t] = matrix[i - 1][j][t] + (t >= A[i - 1] ? matrix[i - 1][j - 1][t - A[i - 1]] : 0);
                }
            }
        }
        return matrix[A.length][k][target];
}

Monday, September 26, 2016

Design a tinyurl system

What is it?

Tinyurl system is a url shortening service, which is a web service for redirecting to long urls to short alias. For example, http://www.this-is-a-long-url.com/page1/page2/page3/ can be replaced by alias http://tinyurl.com/abcd123. If you click the alias of a long url, it will redirect you to the actual url.

First thought

A long url and its alias forms a key-value pair. Since each url and its alias is a one-to-one mapping, simply we would think of having a large map. Now there are few things that need to be considered:

a good hash function
Performance
Scale

Hash function

In the above example http://tinyurl.com/abcd123, abcd123 is a hash id that is created by the hash function. When user types this tiny url, it will look for the actual url by the key abcd123 and redirect the user there. One of the most popular url shortening services simply take the ID in the database of the url and then convert it to Base 62[a-zA-Z0-9], i.e., in record <57, "abcd123", "http://www.this-is-a-long-url.com/page1/page2/page3/ ">, "abcd123" is created by hashing id = 57, which is the index of the record in the db. A simple code snippet is shown in the following:

        private static final String ALPHABET_MAP = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789" ;
 private static final int BASE = ALPHABET_MAP.length() ;

 public static String encode ( int IndexNum ) {
  StringBuilder sb = new StringBuilder() ;
  
  while ( IndexNum > 0 ) {
   sb.append ( ALPHABET_MAP.charAt ( IndexNum % BASE ) ) ;
   IndexNum /= BASE ;
  }
  return sb.reverse().toString() ;
 }

 public static int decode ( String str ) {
  int Num = 0 ;

  for ( int i = 0, len = str.length(); i < len; i++ ) {
   Num = Num * BASE + ALPHABET_MAP.indexOf ( str.charAt(i) ) ;
  }
  return Num ;
 }

Performance

Using GUID vs Incrementing ID

Using GUID may slow down writing process especially when the data set is very large since we have to go through the whole data set to find the correct spot for the id. On the other hand, using incrementing ID means we only need to go to the bottom of the dataset and insert the data.

However, using incrementing ID may loss some flexibility, especially when the service allows user to create custom short url. Using GUID, we can just "reverse the hash" and get the id.

Data store

Using MySQL is always easy, it's stable, and everybody knows how to do it. Yet as we mentioned above, it takes pain when writing to DB. Alternatively, we can use some NoSQL DB for faster writing and reading.

Scale

This is always an exciting topic, your money and your data never aligns on the same scale, but your boss just wants the data to return faster.

How much data needed?

Assuming each record stores <id, hash, url>, and the max length of url is 2083 chars. So approximately 8 (long) + 7 * 4 + 2083 * 4 ~8.4kb. (Not much?)

Data store...again

There are two solutions for the question, and which one to choose depends on your DB.

#1. I like "My"-SQL

Ok, so you want to keep using the beast. What we can do is to do a master-slave replication (write to master, read with proper data sharding on each slave). For master server, we can add more and more RAM to keep it running. For weeks, your boss will be so pissed by the hardware bill and you realize it becomes much harder for you to migrate your DB.

#2. Try something else?

Alternatively, when your dataset is rather small, you probably can try some easier to scale NoSQL database. Now a followup question would be: How to add a new machine?

If you are familiar with Distributed hash table, you know what I'm talking about. If not, you probably want to be patient and read the following explanation.

The basic difference between a normal hash and a distributed hash is ... literately, it's distributed. It provides the same functionality as normal hash, yet the data is distributed over a set of connecting nodes. A normal hash usually takes two variables, key "x" and number of buckets "n". Usually hash function do some calculations and then mod the result by n, i.e., hash(x) % n, which can lead to which bucket "x" belongs. Now if we change number of buckets (by adding one more node), the address of "x" also changes, and that will cause some disaster.

On the contrary, distributed hash defines a range "r", which can be any large number. The address of the key is calculated by hash(x) / r, since the address is independent of number of buckets "n", the number of remapping is dramatically decreased.

For example, consider a key space range r and a logic ring of N nodes, each is responsible of r / N key spaces. When you add one more node between two existing nodes, the new node takes care of some of one or both of siblings' data. All other nodes has no effect from the new node and only one or two nodes need to redistribute keys.

Locating a key in a ring requires searching round the ring one node at a time so it takes O(n) time (O(n)-hop-routing). Other structure (e.g., augmented ring) takes O(log(n)) time, some claims O(1) look-up with more maintenance.

Other things

Now we are talking about distributed system, things like replication, partition and concurrency need to be considered.

Sunday, September 25, 2016

Subsets

Given a set of distinct integers, nums, return all possible subsets.

Note: The solution set must not contain duplicate subsets.

For example,
If nums = [1,2,3], a solution is:

[
  [3],
  [1],
  [2],
  [1,2,3],
  [1,3],
  [2,3],
  [1,2],
  []
]

A very old question, basically we use backtracking to solve it. However, besides normal recursive solution, we can use iteration to solve the problem. The idea behinds it is that for every number in the array, we add it to every existing subset and add the new subset back to result. It takes the same O(2^n) complexity.

public List<List<Integer>> subsets(int[] nums) {
        List<List<Integer>> rst = new ArrayList<>();
        rst.add(new ArrayList<Integer>());
        for (int i = 0; i < nums.length; i++) {
            int size = rst.size();
            for (int j = 0; j < size; j++) {
                List<Integer> newSet = new ArrayList<>(rst.get(j));
                newSet.add(nums[i]);
                rst.add(newSet);
            }
        }
        return rst;
    }

Monday, September 19, 2016

Paint House II

There are a row of n houses, each house can be painted with one of the k colors. The cost of painting each house with a certain color is different. You have to paint all the houses such that no two adjacent houses have the same color.

The cost of painting each house with a certain color is represented by a n x k cost matrix. For example, costs0 is the cost of painting house 0 with color 0; costs1 is the cost of painting house 1 with color 2, and so on... Find the minimum cost to paint all houses.

Note: All costs are positive integers.

Follow up: Could you solve it in O(nk) runtime?

Still DP problem. The deduction is:
costs[i][j] = unitCost[i][j] + Math.min(costs[i - 1][0, ..., k]) if j != the color that gets previous min.
costs[i][j] = unitCost[i][j] + secondMin(costs[i - 1][0, ..., k]) if j == the color that gets previous min.

Track the min and second min for each iteration on j.

public int getCost(int[][] unitCost) {
        int n = unitCost.length;
        int k = unitCost[0].length;
        int[][] costs = new int[n][k];
        int prevMin = 0, prevSecond = 0;
        for (int i = 0; i < n; i++) {
            int currMin = Integer.MAX_VALUE, currSec = Integer.MAX_VALUE;
            for (int j = 0; j < k; j++) {
                if (i == 0) {
                    costs[i][j] = unitCost[i][j];
                } else {
                    costs[i][j] = prevMin == costs[i - 1][j] ? prevSecond : prevMin;
                }
                if (costs[i][j] < currMin) {
                    currSec = currMin;
                    currMin = costs[i][j];
                } else if (costs[i][j] < currSec) {
                    currSec = costs[i][j];
                }
            }
            prevMin = currMin;
            prevSecond = currSec;
        }
        return prevMin;
    }

Thursday, September 15, 2016

Binary Tree Vertical Order Traversal

Given a binary tree, return the vertical order traversal of its nodes' values. (ie, from top to bottom, column by column).

If two nodes are in the same row and column, the order should be from left to right.

Examples:
Given binary tree [3,9,20,null,null,15,7],

return its vertical order traversal as:

[
  [9],
  [3,15],
  [20],
  [7]
]

Given binary tree [3,9,20,4,5,2,7],

    _3_
   /   \
  9    20
 / \   / \
4   5 2   7