The Fake Geek's blog: Strings

Showing posts with label Strings. Show all posts

Tuesday, April 7, 2015

[An overflow post] Longest valid subsequence

Given a string s, and a function isValid(String str), write a function to check the longest subsequence in s that is valid. For example, a subsequence in "whreat" can be "rat", "eat", "what" or "wheat". Please don't speculate the implementation of isValid(String str) function.

I was asked this question yesterday. It bothered some much that the person who asked me this question and I could not agree on the solution. Plus I had a rough day&night last night, so I decide to break my rule to write this overflow post (256).

At first I thought it should be a DP problem, however, since we cannot assume anything about the isValid function, we can not break down the problem to smaller problem, i.e., "wh" is true will not indicate "what" is also true. So the only bloody brutal solution I can think of is backtracking.

public class LongestValidSubSequence {
 private static boolean isValid(String str){
  if(str.equals("what") || str.equals("wheat") || str.equals("eat") || str.equals("rat"))
   return true;
  return false;
 }
 static String max = "";
 public static String longestValidSubsequence(String str){
  getSubsequence(str, new StringBuilder(), 0);
  return max;
 }
 
 private static void getSubsequence(String str, StringBuilder sb, int start){
  if(isValid(sb.toString())){
   String tmp = sb.toString();
   if(max.length() < tmp.length()){
    max = tmp;
   }
  }
  for(int i = start; i < str.length(); i++){
   sb.append(str.charAt(i));
   getSubsequence(str, sb, i + 1);
   sb.deleteCharAt(sb.length() - 1);
  }
 }
 public static void main(String[] args) {
  String s = "whreat";
  System.out.println(longestValidSubsequence(s));

 }

}

Sunday, March 29, 2015

Valid words

You are given a string 's' and you are given a dictionary of english words. You goal is to write an algorithm that returns all words from the dictionary the can be formed by characters from that string 's'.
Example:
s = "ogeg"
following words can be formed from 's': go egg ego . . .
Further it is given that string 's' always consists of four lower case characters. Lets say that the dictionary is contained in a file and can be fitted in the memory. It is up to you which data structure you use to represent the dictionary.
How would you design an efficient algorithm? Follow up: What if the dictionary file can not be fitted in the memory?

This is a very interesting question. The problem statement is kind confusing.
Here is a question, why do we need to put the whole dictionary into the memory? The goal is to output all words that can be formed from s. From the example, we know that the order doesn't matter. So we only need to construct a hash map and store all four characters of s in it. And then read the dict from the stream and compare it with the characters in the hash map. I used another hash map. Since there are at most four characters for both strings, we don't need much extra space, do we?

If the question is, how to store the output words in an efficient way, I guess Trie would be a good one.

public static void validWords(String s, String dict) throws FileNotFoundException, IOException{
  BufferedReader br = new BufferedReader(new FileReader(dict));
  String currLine;
  Map sMap = new HashMap();
  for(int i = 0; i < s.length(); i++){
   char c = s.charAt(i);
   if(!sMap.containsKey(c))
    sMap.put(c, 1);
   else
    sMap.put(c, sMap.get(c) + 1);
  }
  Map word = new HashMap();
  while((currLine = br.readLine()) != null){
   if(currLine.length() > 4)
    continue;
   word.clear();
   boolean valid = true;
   for(int i = 0; i < currLine.length(); i++){
    char c = currLine.charAt(i);
    if(!sMap.containsKey(c)){
     valid = false;
     break;
    }
    if(!word.containsKey(c))
     word.put(c, 1);
    else
     word.put(c, word.get(c) + 1);
    if(word.get(c) > sMap.get(c)){
     valid = false;
     break;
    }
   }
   if(valid)
    System.out.println(currLine);
  }
  br.close();
 }

Thursday, March 26, 2015

Markov String Generator

Update 2015 - 03 - 31:

Ok, I optimized to code and removed those cumbersome extra space. Now my code reads the strings from the file and generate the table directly.

public class StringGenerator2 {
 private class Frequency{
  String s;//hashcode of the string
  int count;//occurrence
  public Frequency(String s, int c){
   this.s = s;
   this.count = c;
  }
 }
 Map> table;
 Random r;
 public StringGenerator2(String fileName) throws FileNotFoundException, IOException{
  table = new HashMap> ();
  r = new Random();
  generateTable(fileName);
  assert(isUniqueList());
 }
 public void generateTable(String fileName) throws FileNotFoundException, IOException{
  BufferedReader reader = new BufferedReader(new FileReader(fileName));
  String line;
  String last = null;
  while((line = reader.readLine()) != null){
   String[] row = line.split(" ");
   for(String s : row){
    String puntua = null;
    if(!Character.isLetter(s.charAt(s.length() - 1))){
     puntua = s.substring(s.length() - 1);
     s = s.substring(0, s.length() - 1);
    }
    if(!table.containsKey(s))
     table.put(s, new ArrayList ());
    if(last != null)
     add(table.get(last), s);
    if(puntua != null)
     add(table.get(s), puntua);
    last = s;
   }
  }
  reader.close();
  mapFrequency();
 }
 private void add(List next, String s){
  int index = -1;
  for(int i = 0; i < next.size(); i++){
   if(next.get(i).s.equals(s)){
    index = i;
    break;
   }
  }
  if(index != -1)
   next.get(index).count++;
  else
   next.add(new Frequency(s, 1));
 }
 private void mapFrequency(){
  for(Entry> e : table.entrySet()){
   List next = e.getValue();
   for(int i = 1; i < next.size(); i++){
    next.get(i).count += next.get(i - 1).count;
   }
  }
 }
 private boolean isUniqueList(){
  Set words = new HashSet ();
  for(List next : table.values()){
   words.clear();
   for(Frequency f : next){
    if(!words.add(f.s))
     return false;
   }
  }
  return true;
 }
 public void generator(String outputName, int length) throws IOException{
  if(table.size() == 0){
   System.out.println("No training set found!");
   return;
  }
  FileWriter writer = new FileWriter(outputName);
  int index = 0;
  String last = null;
  int countWord = 0;//number of words in one line
  while(index < length){
   String s = null;
   if(last == null || !table.containsKey(last)){
    //generate a random string from the key set
    Object[] keys = table.keySet().toArray();
    s = (String) keys[r.nextInt(keys.length)];
   } else
    s = getNext(table.get(last));
   writer.append(s).append(" ");
   countWord++;
   if(countWord == 15){
    writer.append("\n");
    countWord = 0;
   }
   last = s;
   index++;
  }
  writer.append(".\n");
  writer.flush();
  writer.close();
 }
 private String getNext(List next){
  int size = next.size();
  int nextIndex = r.nextInt(next.get(size - 1).count) + 1;
  for(Frequency f : next){
   if(nextIndex <= f.count)
    return f.s;
  }
  return next.get(r.nextInt(size)).s;
 }
  
}

Too much fun for this morning. I saw this problem posted on Career Cup yesterday. I couldn't understand what the problem really meant (see here), especially after I took a look at Wikipedia's couldn't-be-more-confusing explanation on Markov chain. But Google never let me down. I found this blog, which explained it in a clearer way. In short, using Markov chain to generate a new String is like using Bayesian to predict when the first snow in next year will happen: based on the probability of when the first snow in last couple years happened.

In short, given a training set of strings, we create a table of the probability of all the next strings after a given string. Then when we need to generate a new string, we randomly select the first string from our training set, then pick the next string based on the probability of each "next" string in the table given the last generated string.

To get the next string based on the pre-calculated probability, I used this method. Basically when you get the probability distribution. You generate a random number, and based on the corresponding range in the PDF, you select the string.

I have to say that my implementation is not a good one. First, I used a list to store all strings into the memory, it definitely will cause a problem if there are too many strings. However, I don't know how to check if the current string is already in the table if I don't put everything into the memory.

Next, I calculate the frequency, to do that, I used another map, which is used to count the occurrence of each "next" string, then use two arrays to sum up, and finally use a list of structure (string, accumulated frequency) to store all "next" strings, given one string key in the table. See how much extra space I have used, this is definitely not good.

Moreover, when we hit a punctuation, I added the punctuation into the "next" list but didn't include it as a key.

To actually generate the string is easy, as I mentioned before, just keep randomly generate the next string in the "next" string list based on the frequency.

There are lots of ways to optimize this implementation. But I think this is enough for interview/learning purpose.

package markovChain;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.*;
import java.util.Map.Entry;
public class StringGenerator {
 List trainingSet;
 String file;
 public StringGenerator(String file) throws FileNotFoundException, IOException{
  this.file = file;
  trainingSet = new ArrayList ();
  read();
 }
 /**
  * read training set and buffered it into a list
  * @throws FileNotFoundException
  * @throws IOException
  */
 public void read() throws FileNotFoundException, IOException{
  BufferedReader reader = new BufferedReader(new FileReader(file));
  String line;
  while((line = reader.readLine()) != null){
   String[] row = line.split(" ");
   for(String s : row)
    trainingSet.add(s);
  }
  reader.close();  
 }
 /**
  * Generate frequency table
  * @param output
  * @param length
  * @throws IOException
  */
 private Map> generateTable(){
  Map> frequencyTable = new HashMap> ();
  frequencyTable.put(trainingSet.get(0), new ArrayList ());
  for(int i = 1; i < trainingSet.size(); i++){
   String s = trainingSet.get(i);
   String p = null;
   if(!Character.isLetter(s.charAt(s.length() - 1))){
    p = s.substring(s.length() - 1, s.length());
    s = s.substring(0, s.length() - 1);
   }
   if(!frequencyTable.containsKey(s)){
    frequencyTable.put(s, new ArrayList ());
    if(p != null)
     frequencyTable.get(s).add(p);
   }
   String last = trainingSet.get(i - 1);
   if(!Character.isLetter(last.charAt(last.length() - 1)))
    last = last.substring(0, last.length() - 1);
   frequencyTable.get(last).add(s); 
  }
  Map> nextFrequency = getFrequency(frequencyTable);
  return nextFrequency;
 }
 private Map> getFrequency(Map> frequencyTable){
  Map> countFre = new HashMap>();
  for(Entry> e : frequencyTable.entrySet()){
   String key = e.getKey();
   List next = e.getValue();
   Map f = new HashMap();
   for(String s : next){
    if(!f.containsKey(s))
     f.put(s, 1);
    else
     f.put(s, f.get(s) + 1);
   }
   List nextFrequency = mapFrequency(f);
   countFre.put(key, nextFrequency); 
  }
  return countFre;
 }
 private List mapFrequency(Map f){
  String[] array = new String[f.size()];
  int[] c = new int[f.size()];
  int index = 0;
  for(Entry ec : f.entrySet()){
   array[index] = ec.getKey();
   c[index] = ec.getValue();
   index++;
  }
  for(int i = 1; i < c.length; i++){
   c[i] += c[i - 1];
  }
  List rst = new ArrayList ();
  for(int i = 0; i < c.length; i++)
   rst.add(new Frequency(array[i], c[i]));
  return rst;
 }
 
 
 /**
  * generate the string
  * @param output
  * @param length
  * @throws IOException
  */
 public void Generator(String output, int length) throws IOException{
  if(trainingSet.size() == 0){
   System.out.println("No training set found!");
   return;
  }
  Map> nextFrequency = generateTable();
  Random r = new Random();
  FileWriter writer = new FileWriter(output);
  int i = 0;
  String last = null;
  int countWord = 0;
  while(i < length){
   String s = null;
   if(last == null || !nextFrequency.containsKey(last))
    s = trainingSet.get(r.nextInt(trainingSet.size()));
   else
    s = getNext(nextFrequency.get(last));
   writer.append(s).append(" ");
   countWord++;
   if(countWord == 15){
    writer.append("\n");
    countWord = 0;
   }
   last = s;
   i++;
  }
  writer.append(".\n");
  writer.flush();
  writer.close();
 }
 private String getNext(List nextFre){
  int size = nextFre.size();
  Random r = new Random();
  int next = r.nextInt(nextFre.get(size - 1).count) + 1;
  for(Frequency f : nextFre){
   if(next <= f.count)
    return f.s;
  }
  return nextFre.get(r.nextInt(size)).s;
 }
 /**
  * the structure that contains the string and its count
  * @author shirleyyoung
  *
  */
 private class Frequency{
  String s;
  int count;
  public Frequency(String s, int c){
   this.s = s;
   count = c;
  }
 } 
}

Test
In order to show my averseness to the research that I am working on. I decide to use some paragraphs from the papers I am reading as the test case.

"The dynamics of flexible polymers in shear is of great practical interest because this type of flow occurs whenever a fluid flows past a surface.

Macroscopic, non-Newtonian rheological properties of polymer solutions, such

as flow-dependent viscosities and normal

stresses, result from microscopic stresses that

arise when polymeric molecules are stretched

and affect the solvent motion. Thus, much

effort has been directed at predicting the molecular

dynamics of polymers in shear flows. However, it has been difficult to rigorously

test these predictions because the dynamics

of a polymer molecule in shear have

not been observed directly. Experimental efforts

have mainly focused on measuring bulk

rheological properties or on measuring the

scattering of light or neutrons by polymer

solutions. Here we describe how single-molecule imaging techniques can be

used to study the configurations of polymers

in shear flow so that the detailed molecular predictions of theories can be tested.

In short, Shirley doesn't care about how polymer tumbles in shear flow!

Polymer dynamics are of central importance in materials science,

mechanical engineering, biology and medicine. The dynamics of

macromolecular solutions and melts in shear flow are typically

studied using bulk experimental methods such as light and

neutron scattering and birefringence. But the effect of shear

on the conformation and dynamics of individual polymers is

still not well understood. Here we describe observations of the

real-time dynamics of individual, flexible polymers fluorescently

labelled DNA molecules under a shear flow. The sheared

polymers exhibit many types of extended conformation with an

overall orientation ranging from parallel to perpendicular with

respect to the flow direction. For shear rates much smaller than

the inverse of the relaxation time of the molecule, the relative

populations of these two main types of conformation are

controlled by the rate of the shear flow. These results question

the adequacy of assumptions made in standard models of polymer

dynamics."

Here is the 300 strings generated from the input.

"mainly focused on measuring bulk rheological properties or on measuring the rate of extended conformation

and dynamics of individual flexible polymers in shear is still not been difficult to rigorously

test these predictions of shear on measuring bulk rheological properties or on measuring bulk rheological

properties of assumptions made in shear is still not been difficult to rigorously test these

two main types of polymer solutions Here we describe how polymer solutions , the detailed

molecular dynamics of the molecular predictions of light and normal stresses result from microscopic stresses

that the shear on measuring the molecular predictions of polymers exhibit many types of individual

flexible polymers fluorescently labelled DNA molecules are controlled by the shear is of extended conformation

with an overall orientation ranging from microscopic stresses result from microscopic stresses that arise when

polymeric molecules are of polymers in shear flow Polymer dynamics of flow occurs whenever a

shear flows However , by the molecule in shear rates much effort has been observed

directly Experimental efforts have mainly focused on the effect of conformation with an overall orientation

ranging from microscopic stresses that arise when polymeric molecules under a surface . flexible polymers

exhibit many types of light and normal stresses , rheological properties of the adequacy of

a shear flow so that the detailed molecular dynamics of conformation are of polymers in

shear flow occurs whenever a surface . under a shear flow Polymer dynamics of assumptions

made in shear flow are controlled by the dynamics of extended conformation and neutron scattering

and affect the molecule the effect of great practical interest because the scattering and affect

the scattering of polymers is still not been difficult to study the solvent motion Thus

, much smaller than the inverse of shear flow Polymer dynamics of conformation and medicine."

Well, not that bad.

Wednesday, March 4, 2015

Zigzag conversion

The string "PAYPALISHIRING" is written in a zigzag pattern on a given number of rows like this: (you may want to display this pattern in a fixed font for better legibility)

P   A   H   N
A P L S I I G
Y   I   R

And then read line by line: "PAHNAPLSIIGYIR"

Write the code that will take a string and make this conversion given a number of rows:

string convert(string text, int nRows);

convert("PAYPALISHIRING", 3) should return "PAHNAPLSIIGYIR".

I don't like this problem, the only thing this problem is testing is your patience, to find weird and unnecessary patterns.

So the approach. From the problem, we know that for any string, we should get a table like the following (e.g., nRows = 4):

So in the end, we want a string in the following order:

Apparently, we need to find some pattern. First, the steps of each all-filled-column, which is:

2 * (nRows - 1)

Next thing, the interval in between. There are several rules of the interval:

The interval should be greater than zero (of course...) and no larger than the step;
The index of the interval should be less than the length of the string;
No interval if we reach the end of the string;

So if you observe carefully, you will find that the interval follows the following equation:

steps - 2 * i

where i is the row index.

public String convert(String s, int nRows) {
        if(s == null || nRows == 1 || s.length() <= nRows)
            return s;
        int step = 2 * (nRows - 1);
        int count = 0;
        int l = s.length();
        char[] sArray = new char[l];
        for (int i = 0; i < nRows; i++){
            int interval = step - 2 * i;
            for (int j = i; j < l; j += step){
                sArray[count++] = s.charAt(j);
                if(interval > 0 && interval < step && interval + j < l && count < l)
                    sArray[count++] = s.charAt(j + interval);
            }
        }
        return new String(sArray);
    }

Saturday, February 28, 2015

Sort anagram lists

Write a method to sort an array of strings so that all the anagrams are next to each other.

The basic idea is to write a function and rearrange the string to its lexicographic order, and use String comparator to compare two strings. Some concepts about String comparator from Java doc:

Compares two strings lexicographically. The comparison is based on the Unicode value of each character in the strings. The character sequence represented by this String object is compared lexicographically to the character sequence represented by the argument string.

The result is a negative integer if this String object lexicographically precedes the argument string. The result is a positive integer if this String object lexicographically follows the argument string. The result is zero if the strings are equal;compareTo returns 0 exactly when the equals(Object) method would return true.

If two strings are different, then either they have different characters at some index that is a valid index for both strings, or their lengths are different, or both. If they have different characters at one or more index positions, let k be the smallest such index; then the string whose character at position k has the smaller value, as determined by using the < operator, lexicographically precedes the other string.

private static class AnagramComparator implements Comparator {
  public static String sortChars(String s) {
   char[] content = s.toCharArray();
   Arrays.sort(content);
   return new String(content);
  }
  public int compare(String s1, String s2) {
   return sortChars(s1).compareTo(sortChars(s2));
  }
 }
 public static void sortStrings(List list) {
  Collections.sort(list, new AnagramComparator());
 }

Thursday, January 22, 2015

Determine if two strings are match

Two texts are considered to "match" if they have a common substring of at least length n. Describe an algorithm to determine if two strings are matches.

This is another FB interview question. The reason I am writing this blog is that lots of people are talking about using DP, i.e., find the longest common substring. However, I find it unnecessary and more expensive. I implemented both methods and did some performance test.

//hashset method
 public static boolean isMatch(String s1, String s2, int n) {
  if (s1 == null || s2 == null || s1.length() < n || s2.length() < n)
   return false;
  Set substrings = new HashSet ();
  for (int i = 0; i <= s1.length() - n; i++) {
   substrings.add(s1.substring(i, i + n));
  }
  for (int i = 0; i <= s2.length() - n; i++) {
   if(substrings.contains(s2.substring(i, i + n)))
    return true;
  }
  return false;
 }
 //DP method
 public static boolean isMatch2(String s1, String s2, int n) {
  if (s1 == null || s2 == null || s1.length() < n || s2.length() < n)
   return false;
  int lcs = longestCommonSubstring(s1, s2);
  return lcs >= n;
 }
 public static int longestCommonSubstring(String s1, String s2) {
  if (s1 == null || s2 == null)
   throw new NullPointerException("Null string(s)!");
  int[][] lcs = new int[s1.length() + 1][s2.length() + 1];
  int max = 0;
  for (int i = 1; i <= s1.length(); i++) {
   for (int j = 1; j <= s2.length(); j++) {
    if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
     lcs[i][j] = lcs[i - 1][j - 1] + 1;
    }
    max = Math.max(lcs[i][j], max);
   }
  }
  return max;
 }

The hashset method takes O(m + n) time but DP takes O(mn) time. And the memory usage is almost same.

Thursday, January 15, 2015

Largest Number

Given a list of non negative integers, arrange them such that they form the largest number.

For example, given [3, 30, 34, 5, 9], the largest formed number is 9534330.

Note: The result may be very large, so you need to return a string instead of an integer.

I was in the approach when I tried to sort the array. In fact, we do need to sort the array, but not in the traditional way.

If we look at the example, we will quickly figure out that we need to compare the most significant digit in the array, then probably the next, and so on. But the question is, how to do it in an acceptable time manner? Besides, why do we need to put 3 in front of 30?

Thinking the problem in another way: Randomly pick two numbers XYZ and ABC from the array, why do we want to put XYZ (ABC) in advance of ABC(XYC)? Because if we combine two numbers, XYZABC (ABCXYZ) will be larger than ABCXYZ (XYZABC). For example, 3 and 30, 330 is larger than 303, so we want to put 3 in front of 30.

So the only thing we need to do is to write a comparator to compare the concatenations of two numbers, and sort the entire array!

Easier than you thought, right?

Note on string.concat() and +
String.concat() will throw a NullPointerException if either of the string is null, but a + b will return, very interesting:

It treats b as b = "null". Another way to look at + can be like this:

public static String plus(String a, String b) {
  StringBuilder sb = new StringBuilder();
  return sb.append(a).append(b).toString();
 }

The concat() method, from Java src is like this:

public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        char buf[] = new char[count + otherLen];
        getChars(0, count, buf, 0);
        str.getChars(0, otherLen, buf, count);
        return new String(0, count + otherLen, buf);
    }

Update: 2015 - 03 - 01

Update comparator method, note the number may be very large and overflow.

private class StringComparator implements Comparator{
        public int compare(String s1, String s2){
            return (int)(Long.parseLong(s1 + s2) - Long.parseLong(s2 + s1));
        }
    }

public String largestNumber(int[] num) {
        if (num == null || num.length == 0)
            return "";
        String[] s = new String[num.length];
        for (int i = 0; i < num.length; i++) {
            s[i] = String.valueOf(num[i]);
        }
        Arrays.sort(s, new StringComparator ());
        String rst = "";
        for (int i = s.length - 1; i >= 0; i--) {
            rst += s[i];
        }
        int i = 0;
        while (i < rst.length() && rst.charAt(i) == '0')
            i++;
        if (i == rst.length())
            return "0";
        return rst.substring(i);
        
    }
    private class StringComparator implements Comparator {
        public int compare (String a, String b) {
            return Integer.parseInt(a.concat(b)) - Integer.parseInt(b.concat(a));
        }
    }

Monday, January 12, 2015

Implement strStr()

Implement strStr().

Returns the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

If you are interested in a more advanced algorithm, KMP algorithm, look this. This post will implement this problem using a hash function. It is based on the rolling hash method. Basically, we calculate a hash function for the pattern string, or the needle, and compare it with the text string, or the haystack. The hash function is calculated as the following way:

The constant is chosen as 29 in this case. When comparing with the haystack, we remove the first part of the hash function and add the part for the new character. It's like a sliding window.

public int strStr(String haystack, String needle) {
        if (haystack == null || needle == null) 
            return -1;
        if (haystack.length() == 0)
         return needle.length() == 0 ? 0 : -1;
        if (needle.length() == 0)
            return 0;
        if (haystack.length() < needle.length())
         return -1;
        int base = 29;
        int n = needle.length();
        long tmpBase = 1;
        long needleHash = 0;
        
        for (int i = n - 1; i >= 0; i--) {
         needleHash += (long)needle.charAt(i) * tmpBase;
            tmpBase *= base;
        }
        tmpBase = 1;
        long haystackHash = 0;
        for (int i = n - 1; i >= 0; i--) {
            haystackHash += (long)haystack.charAt(i) * tmpBase;
            tmpBase *= base;
        }
        if (haystackHash == needleHash)
            return 0;
        tmpBase /= base;
        for (int i = n; i < haystack.length(); i++) {
            haystackHash = (haystackHash - (long)haystack.charAt(i - n) * tmpBase) * base + (long)haystack.charAt(i);
            if (haystackHash == needleHash)
                return i - n + 1;
        }
        return -1;
    }

Update: 2015 - 01 - 19
As usual, my curiosity drove me to do the following performance test. It looks like KMP algorithm does hit the lower bound of the complexity:

P.S.: The test strings, if you are interested, are:

String haystack = "mississippimississippimississipippimissisippimissisippimissispimississippimississippimississippimississippimississippiabcabdabc"

String needle = "abcabdabc"

P.P.S: The KMP code:

public int strStr(String haystack, String needle) {
        if (needle == null || needle.length() == 0)
            return 0;
        if (haystack == null || haystack.length() == 0)
            return -1;
        if (haystack.length() < needle.length())
            return -1;
        int n = needle.length();
        int h = haystack.length();
        int[] PMT = getPMT(needle);
        int index_h = 0;
        int index_n = 0;
        while (index_h < h) {
            while(index_n >= 0 && haystack.charAt(index_h) != needle.charAt(index_n))
                index_n = PMT[index_n];
            index_h++;
            index_n++;
            if (index_n == n)
                return index_h - n;
        }
        return -1;
    }
    
    private int[] getPMT(String needle) {
        int[] PMT = new int[needle.length() + 1];
        PMT[0] = -1;
        PMT[1] = 0;
        for (int i = 2; i <= needle.length(); i++) {
            PMT[i] = (needle.charAt(i - 1) == needle.charAt(PMT[i - 1])) ? PMT[i - 1] + 1 : 0;
        }
        return PMT;
    }

Knuth - Morris - Pratt Algorithm: Let's give it a fancy name, pattern match

The initiative of studying this algorithm is because of this problem from LeetCode:

Implement strStr().

Returns the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

It is an easy level problem, yet I spend my whole night figuring out this algorithm. And yes, I missed the Golden Globe (Congratulations to Eddie Redmayne, who portrayed one of my favorite physicists Stephen Hawking).

This algorithm is indeed a very interesting one. At first glance, it is very hard to understand, and require O(m) extra space, but if you really take some effort and try to understand it, you will realize how neat it can make the whole method.

Partial Match Table
Ok, start from Partial Match Table / Failure Function (PMT). There are lots of resources to explain it, some of them are very fancy and complicated, I put the Wikipedia link here if you want some official explanation. Mine will be simpler. The partial match table matches the longest prefix and suffix that are equal of all substrings in a string. The prefix and suffix do not include the substring.
For example, consider string ="abcabdabc" will form a PMT as shown in the following table.

For an empty substring, the length is not defined, so we choose -1. Another reason to choose -1 is that when we use this table later in the pattern match, it will be easier for index incrementation. For a substring with length equals to 1, there is no prefix or suffix that doesn't include the whole substring, so the length = 0. For substring with length equal to 2 and 3, no equal prefix and suffix is found, so lengths still equal to 0. And we move on....

So how to calculate the PMT? Take a look at the table again. For "ab", since the length of equal prefix & suffix = 0 for the previous substring "a", we compare the first character and the last character, since they are not equal, the length is still 0. Same for "abc", now it comes to "abca", since the the first and last character are equal, the length is 1. Or it is 0 + 1. Then it comes to "abcab", we already know from the last substring that the first character equals the last character (of the last substring), now we add one more character, we compare the second character with the last character (of this substring), the length is 2, or 1 + 1, You see the trend here? Every time, we compare the next character after the longest prefix (that equals to the suffix) with the last character, if the longest prefix = 0, we compare the first character with the last one, if the longest prefix = 1, we compare the second character, which has the index 1, with the last character. However, if the characters compared are not equal, the longest prefix = 0, i.e., :

PMT[i] = (ptrn.charAt(i - 1) == ptrn.charAt(PMT[i - 1])) ? (PMT[i - 1] + 1) : 0;

Yeah, as you know, it's DP.

What is the table used for?

Considering a text string = "abcabcabdabcabcabdabdabc" (I am making it larger to look clearer), how to find the previous pattern string we showed above? You can say we compare each character in the pattern with the text string, if any character doesn't match, we slide the pattern string down to the next character in the text string that matches the first character in the pattern string. Yes, we can do that way, but it will take O(mn) time where m is the length of the pattern string and n is the length of the text string. So, can we do better?

Look at the above figure, now we are at index 5 and the characters don't match. Since we already know that "abcab" has the longest prefix and suffix = 2, that means"ab" matches with "ab", now if we compare the character at index 5 in the text string with the character at index 2 (from the PMT table), they match, so next time we start from the characters at index 6 in the text string and index 3 at pattern string. If they don't match, say the following string, then from the table, the longest prefix of substring "ab" is 0, so unfortunately, we need to start over.

Using this algorithm allows us to trim off lots of unnecessary comparisons, the complexity is O(m + n) compared to the brutal force implementation. However, when the length of the string increases and mismatch increases, the complexity also increases.

public class KMP {
 private int[] getPMT(String ptrn) {
  int ptrnLen = ptrn.length();
  //partial match table
  int[] PMT = new int[ptrnLen + 1];
  PMT[0] = -1;
  PMT[1] = 0;
  for (int i = 2; i <= ptrnLen; i++) {
   PMT[i] = (ptrn.charAt(i - 1) == ptrn.charAt(PMT[i - 1])) ? (PMT[i - 1] + 1) : 0;
   System.out.println(i + ": " + PMT[i]);
  }
  return PMT;
  
     
    
 }
 public List searchSubString(String text, String ptrn) {
  if (text == null || ptrn == null)
   throw new NullPointerException("Null String(s)!");
  List rst = new ArrayList ();
  if (ptrn.length() == 0) {
   rst.add(0);
   return rst;
  }
  if (text.length() == 0 || text.length() < ptrn.length()) {
    return rst;
  }
  
  int indexT = 0;
  int indexP = 0;
  int ptrnLen = ptrn.length();
  int txtLen = text.length();
  int[] PMT = getPMT(ptrn);
  while (indexT < txtLen) {
   while (indexP >= 0 && text.charAt(indexT) != ptrn.charAt(indexP)) {
    indexP = PMT[indexP];
   }
   indexP++;
   indexT++;
   if (indexP == ptrnLen) {
    rst.add(indexT - ptrnLen);
    indexP = PMT[indexP];
   }
  }
  return rst;
 }

Source code can be found here: https://github.com/shirleyyoung0812/Knuth-Morris-Pratt-Algorithm.git

Thursday, January 8, 2015

Multiply Strings

Given two numbers represented as strings, return multiplication of the numbers as a string.

Note: The numbers can be arbitrarily large and are non-negative.

Yeah, here comes the problem that always causes overflow. I used a long variable at first, but it couldn't solve the problem. In the end, I choose to use an array.

public String multiply(String num1, String num2) {
        if (num1 == null || num2 == null)
            return null;
        if (num1.length() == 0 || num2.length() == 0)
            return "0";
        while (num1.length() > 0 && num1.charAt(0) == '0') {
            num1 = num1.substring(1);
        }
        if (num1.length() == 0)
            return "0";
        while (num2.length() > 0 && num2.charAt(0) == '0') {
            num2 = num2.substring(1);
        }
        if (num2.length() == 0)
            return "0";
        
        int l1 = num1.length();
        int l2 = num2.length();
        int[] num3 = new int[l1 + l2];
        for (int i = num2.length() - 1; i >= 0; i--) {
            int p2 = Character.getNumericValue(num2.charAt(i));
            int carry = 0;
            for (int j = num1.length() - 1; j >= 0; j--) {
                int p1 = Character.getNumericValue(num1.charAt(j));
                int product = (p1 * p2 + carry + num3[i + j + 1]);
                carry = product / 10;
                num3[i + j + 1] = product % 10;
            }
            if (carry > 0) {
                num3[i] = carry;
            }
        }
        String rst = "";
        for (int n : num3) {
            rst += String.valueOf(n);
        }
        while (rst.charAt(0) == '0') {
            rst = rst.substring(1);
        }
        return rst;
    }

Friday, January 2, 2015

Substring with Concatenation of All Words

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.

For example, given:
S: "barfoothefoobarman"
L: ["foo", "bar"]

You should return the indices: [0,9].
(order does not matter).

Two maps are used to store the words in the list and those in the string. I used two hash sets at first because I didn't realize that duplicates are allowed in the list. For example, "aaa" and ["a", "a"].

Another mistake I made was that the increment of index in S cannot be the word length in the list, since those that are not in the list can be in any length. For example, "afoobar" and ["foo", "bar"].

The inner loop is used to check if the substring is a concatenation of all words in the list. Thus, index i becomes the starting index of the substring, which will be added into the result list if all conditions are satisfied.

If any condition is not satisfied, e.g., the substring cannot be found in the list, or the word occurs more than once, we break the inner loop and start from the next index.

The map is cleared at the beginning of every outer loop. If an index is found, or if a concatenation cannot be found from the last index, everything inside the map is useless. We need to start all over again. :)

Update: 2015 - 01 - 15
Walk through:
1. Store all strings in the list to a map.
2. Since all strings in the list needs to be included, the loop will exit at S.length() - L.length * L[0].length().
3. Use another loop to check for the concatenation of the words in the list. Use count to check if all strings in the list are included.
1. If the substring that equals to the length of the words is not in the list map, break;
2. If the number of any string in the list exceeds the number of that string in the list, break;
4. Add the start index if count == L.length.
5. Increment index, clear map.

public class SubstringWithConcatenation {
    public List findSubstring(String S, String[] L) {
        if (S == null || L == null)
            throw new NullPointerException("Null string or list!");
        List rst = new ArrayList ();
        if (S.length() == 0 || L.length == 0)
            return rst;
        int len = L[0].length();
        if (len > S.length())
            return rst;
        HashMap L_map = new HashMap ();
        for (int i = 0; i < L.length; i++) {
            if (!L_map.containsKey(L[i]))
                L_map.put(L[i], 1);
            else
                L_map.put(L[i], L_map.get(L[i]) + 1);
        }
        HashMap hm = new HashMap ();
        for (int i = 0; i <= S.length() - L.length * len; i++) {
            hm.clear();
            int count = 0;
            for (count = 0; count < L.length; count++) {
                int pos = i + count * len;
                String tmp = S.substring(pos, pos + len);
                if (!L_map.containsKey(tmp)) 
                    break;
                if (!hm.containsKey(tmp)) {
                        hm.put(tmp, 1);
                }
                else {
                    hm.put(tmp, hm.get(tmp) + 1);
                    if (hm.get(tmp) > L_map.get(tmp))
                        break;
                }
            }
            if (count == L.length)
                rst.add(i);
        }
        return rst;
    }
}

Tuesday, December 30, 2014

Isomorphic Strings

Given two (dictionary) words as Strings, determine if they are isomorphic. Two words are called isomorphic
if the letters in one word can be remapped to get the second word. Remapping a letter means replacing all
occurrences of it with another letter while the ordering of the letters remains unchanged. No two letters
may map to the same letter, but a letter may map to itself.

Example:
given "foo", "app"; returns true
we can map 'f' -> 'a' and 'o' -> 'p'
given "bar", "foo"; returns false
we can't map both 'a' and 'r' to 'o'

given "turtle", "tletur"; returns true
we can map 't' -> 't', 'u' -> 'l', 'r' -> 'e', 'l' -> 'u', 'e' -'r'

given "ab", "ca"; returns true
we can map 'a' -> 'c', 'b'

Click here for the original problem.
I implement two methods, the first one map the character to the first occurred index for both strings, and check if the encodings are the same for both strings. The second one is more traditional, map characters in the first string to the same index one in the second string. However, both methods require two maps.

import java.util.*;
public class IsomorphicString {
 //map index
 public boolean isIsomorphic(String a, String b) {
  if (a.length() != b.length()) 
   return false;
  a = a.toLowerCase();
  b = b.toLowerCase();
  
  Map amap = new HashMap ();
  Map bmap = new HashMap ();
  
  for (int i = 0; i < a.length(); i++) {
   if (!amap.containsKey(a.charAt(i)))
    amap.put(a.charAt(i), i);
   if (!bmap.containsKey(b.charAt(i))) 
    bmap.put(b.charAt(i), i);
  }
  for (int i = 0; i < a.length(); i++) {
   if (amap.get(a.charAt(i)) != bmap.get(b.charAt(i)))
    return false;
  }
  return true;
 }
 //map character
 public boolean isIsomorphic2 (String a, String b) {
  if (a.length() != b.length()) 
   return false;
  a = a.toLowerCase();
  b = b.toLowerCase();
  
  Map amap = new HashMap ();
  Map bmap = new HashMap ();
  
  for (int i = 0; i < a.length(); i++) {
   if (amap.containsKey(a.charAt(i))) {
    if(amap.get(a.charAt(i)) != b.charAt(i))
     return false; 
   }
   if (bmap.containsKey(b.charAt(i))) {
    if (bmap.get(b.charAt(i)) != a.charAt(i))
     return false;
   }
   amap.put(a.charAt(i), b.charAt(i));
   bmap.put(b.charAt(i), a.charAt(i));
  }
  return true;
 }

}

Monday, December 29, 2014

Find all the repeating sub-string sequence

Find all the repeating sub-string sequence of specified length in a large string sequence. The sequences returned i.e. the output must be sorted alphabetically.

For e.g.

Input String: "ABCACBABC"
repeated sub-string length: 3

Output: ABC

Input String: "ABCABCA"
repeated sub-string length: 2

Output: AB, BC, CA

Click here for the original problem.
Ha, very interesting problem. I was wondering why all the add() methods returns boolean type. Now I know, since Set doesn't allow duplicate values, if add() returns false, it means the set fails to add the value, which probably imply there already exists the value. And in this problem, we are utilizing this feature.

import java.util.*;
public class RepeatingSubstring {
 public ArrayList repeatSubstring (String s, int length) {
  if (s == null)
   throw new NullPointerException("Null array");
  ArrayList rst = new ArrayList ();
  if (s.length() == 0)
   return rst;
  HashSet nonRepeating = new HashSet ();
  TreeSet repeating = new TreeSet ();
  for (int i = 0; i + length <= s.length(); i++) {
   if (!nonRepeating.add(s.substring(i, i + length)))
    repeating.add(s.substring(i, i + length));
  }
  rst = new ArrayList (repeating);
  return rst;
 }

}

Sunday, December 28, 2014

Word wrap / Text Justification

Word Wrap / String Justification algorithm.
Given a set of words and a length.
You are required to print the words such that the words on each line end almost on the same column and the number of trailing spaces at the end is minimized.

Given aaa bb cc ddddd and length is 5 print the following output.

aaa
bb cc
ddddd

Click here for the original problem.

This one is similar to LeetCode's Text Justification. Apparently the question is not clear, the length of each line should be 5, but there is one space after "bb" and one space after "cc", which makes the total length of that line 6. I am not sure if there should be no space between "bb" and "cc" or no space after "cc". The common sense tells me there should be one space between two strings and based on "the number of trailing spaces at the end is minimized", I go with the second way.

In case the guy who posted this problem remembers the question wrong and this should be the same as the Text Justification. I include that code as well.

Text Justification

public class Solution {
    public ArrayList fullJustify(String[] words, int L) {
        if (words == null)
            throw new NullPointerException("Null string array!");
        ArrayList rst = new ArrayList ();
        if (words.length == 0)
            return rst;
        int maxLength = words[0].length();
        for (int i = 0; i < words.length; i++) {
            maxLength = Math.max(maxLength, words[i].length()); 
        }
        if (maxLength > L)
            return rst;
            
        int prev_start = 0;
        int currSum = 0;
        int countWord = 0;
        for (int i = 0; i <= words.length; i++) {
            if (i == words.length || (currSum + words[i].length() + countWord > L)) {
                int totalSpace = L - currSum;
                String tmp = "";
                if (i == words.length || countWord == 1) {
                    for (int j = prev_start; j < i; j++) {
                        tmp += words[j];
                        if (j != i - 1)
                            tmp += " ";
                    }
                    tmp = appendSpace(tmp, L - tmp.length());
                }
                else {
                    int spaceEachWord = totalSpace / (countWord - 1);
                    int extraSpace = totalSpace % (countWord - 1);
                    for (int j = prev_start; j < i - 1; j++) {
                        tmp += words[j];
                        if (j != i - 1) {
                            tmp = appendSpace(tmp, spaceEachWord);
                            }
                        
                        if (extraSpace > 0) {
                            tmp += " ";
                            extraSpace--;
                        }
                    }
                    tmp += words[i - 1];
                }
                rst.add(tmp);
                if (i == words.length)
                    break;
                prev_start = i;
                currSum = words[i].length();
                countWord = 1;
            }
            else {
            currSum += words[i].length();
            countWord++;
            }
        }
        return rst;
        
    }
    private String appendSpace(String s, int space) {
        String rst = s;
        for (int i = 0; i < space; i++)
            rst += " ";
        return rst;
    }
}

Word Wrap

import java.util.ArrayList;


public class TextJustification {
 public ArrayList fullJustify(String[] words, int L) {
        if (words == null)
            throw new NullPointerException("Null string array!");
        ArrayList rst = new ArrayList ();
        if (words.length == 0)
            return rst;
        int maxLength = words[0].length();
        for (int i = 0; i < words.length; i++) {
            maxLength = Math.max(maxLength, words[i].length()); 
        }
        if (maxLength > L)
            return rst;
            
        int prev_start = 0;
        int currSum = 0;
        int countWord = 0;
        for (int i = 0; i <= words.length; i++) {
            if (i == words.length || (currSum + words[i].length() + countWord > L)) {
                int totalSpace = L - currSum;
                String tmp = "";
                if (i == words.length || countWord == 1) {
                    for (int j = prev_start; j < i; j++) {
                        tmp += words[j];
                        if (j != i - 1)
                            tmp += " ";
                    }
                    tmp = appendSpace(tmp, L - tmp.length());
                }
                else {
                    int spaceEachWord = totalSpace / (countWord - 1);
                    int extraSpace = totalSpace % (countWord - 1);
                    for (int j = prev_start; j < i - 1; j++) {
                        tmp += words[j];
                        if (j != i - 1) {
                            tmp = appendSpace(tmp, spaceEachWord);
                            }
                        
                        if (extraSpace > 0) {
                            tmp += " ";
                            extraSpace--;
                        }
                    }
                    tmp += words[i - 1];
                }
                rst.add(tmp);
                if (i == words.length)
                    break;
                prev_start = i;
                currSum = words[i].length();
                countWord = 1;
            }
            else {
            currSum += words[i].length();
            countWord++;
            }
        }
        return rst;
        
    }
    private String appendSpace(String s, int space) {
        String rst = s;
        for (int i = 0; i < space; i++)
            rst += " ";
        return rst;
    }

}

Tuesday, December 16, 2014

Palindrome Partitioning I & II

Given a string s, partition s such that every substring of the partition is a palindrome.

Return all possible palindrome partitioning of s.

For example, given s = "aab",
Return

  [
    ["aa","b"],
    ["a","a","b"]
  ]

The first problem is a typical backtracking problem. Start from index 0. We check each substring, and recursively check the next substring, until we reach the end of the string. If we successfully add all substrings into the partition list, we can add this partition to the rst list.

For example: s = "aab"

start = 0

check substring(0, 1) -> (1, 2) -> (2, 3) -> add

remove (2, 3)

(0, 1) -> (1, 3) not palindrome

(0, 2) -> (2, 3) -> add

public class PalindromePartition {
    public ArrayList> partition(String s) {
        if (s == null)
            throw new NullPointerException("Null string!");
        ArrayList> rst = new ArrayList> ();
        if (s.length() == 0)
            return rst;
        partitionString(s, rst, new ArrayList (), 0);
        return rst;
       
    }
    private void partitionString (String s, ArrayList> rst, ArrayList partition, int start) {
        if (start == s.length()) {
            rst.add(new ArrayList (partition));
            return;
        }
        for (int i = start + 1; i <= s.length(); i++) {
            String tmp = s.substring(start, i);
            if (!isPalindrome(tmp))
                continue;
            partition.add(tmp);
            partitionString(s, rst, partition, i);
            partition.remove(partition.size() - 1);
        }
    }
    private boolean isPalindrome(String s) {
        int start = 0;
        int end = s.length() - 1;
        while (start < end) {
            if (s.charAt(start) != s.charAt(end))
                return false;
            start++;
            end--;
        }
        return true;
    }
    
}

Given a string s, partition s such that every substring of the partition is a palindrome.

Return the minimum cuts needed for a palindrome partitioning of s.

For example, given s = "aab",
Return 1 since the palindrome partitioning ["aa","b"] could be produced using 1 cut.

I am definitely not good at DP, especially when I need to it.. twice...

This problem is obviously a typical DP (when it comes to find the maximum or minimum, you probably want to start your approach of DP first).

I like to show things using examples, mainly because I am not at describing things.

s = "abbab"
s.charAt(i) 0 1 2 3 4
s a b b a b
possible places to cut: 0 1 2 3 4 5

So I define an array cut[s.length() + 1]. cut[i] is aimed to store the minimum cut needed for s.substring(0, i).
cut[0] = 0, obviously, empty string.
cut[1] = 0, because s.substring(0,1) is a palindrome (single character).
start from cut[2]:
1. if s.substring(0, i) is a palindrome, cut[i] = 0;
2. otherwise, if s.substring(start,i) (start = 1, ... i - 1) is a palindrome, cut[i] = cut[start] + 1;
return cut[s.length], which is cut[5] in this case.

The next part is, how to check if s.substring(start, end) is a palindrome. Of course we can do it iteratively, but is it more convenient if we draw a table of all palindrome substrings of s and just check the table every time?
Yep, that's pretty much the solution.

1. palindrome[i][i] is obviously a palindrome since it only contains one character.
2. palindrome[i][i + 1] is a palindrome if s.charAt(i) == s.charAt(j)
3. palindrome[i][j] is a palindrome if palindrome[i+1][j-1] && s.charAt(i) ==s.charAt(j)

start \ end 0 1 2 3 4
0 T F F T F
1 T T T F
2 T F T
3 T F
4 T

public class PalindromePartition {
    public int minCut(String s) {
        if (s == null || s.length() == 0)
            return 0;
        //**************************************
        // I write this single loop to check if the string itself is a plindrome, 
        //alternatively you can include it into the matrix or write a boolean checkPalindrome (s) method
        int startp = 0;
        int endp = s.length() - 1;
        while (startp < endp)
        {
         if(s.charAt(startp) != s.charAt(endp))
          break;
         startp++;
         endp--;
        }
        if (endp <= startp)
         return 0;
        //****************************************
        int[] cut = new int[s.length() + 1];
        boolean[][] palindromeMatrix = isPalindrome(s);
        cut[0] = 0;
        cut[1] = 0;
        for (int end = 2; end < cut.length; end++)
        {
            cut[end] = Integer.MAX_VALUE;
            if (palindromeMatrix[0][end - 1])
            {
                cut[end] = 0;
                continue;
            }
            for (int start = 1; start < end; start++)
            {
                if (!palindromeMatrix[start][end - 1])
                    continue;
                cut[end] = Math.min(cut[end], cut[start] + 1);
            }
        }
        return cut[s.length()];
    }
    private boolean[][] isPalindrome(String s)
    {
        boolean[][] palindromeMatrix = new boolean[s.length()][s.length()];
        for (int i = 0; i < s.length(); i++)
            palindromeMatrix[i][i] = true;
        for (int i = 0; i < s.length() - 1; i++)
            palindromeMatrix[i][i + 1] = (s.charAt(i) == s.charAt(i + 1));
        for (int length = 3; length < s.length(); length++)
        {
            for (int start = 0; start <= s.length() - length; start++)
                palindromeMatrix[start][start + length - 1] = (palindromeMatrix[start + 1][start + length - 2] && (s.charAt(start) == s.charAt(start + length - 1)));
        }
        return palindromeMatrix;
    }
}

Saturday, December 13, 2014

Word Break ... ||

When you have one, you must have two! Instead, this one uses a recursion. HashMap is always an amazing data structure. This one uses the map to store temporary strings and corresponding arrays, and thus reduces number of recursions.

Given a string s and a dictionary of words dict, add spaces in s to construct a sentence where each word is a valid dictionary word.

Return all such possible sentences.

For example, given
s = "catsanddog",
dict = ["cat", "cats", "and", "sand", "dog"].

A solution is ["cats and dog", "cat sand dog"].

Take the "catsanddog" as the example.
When the iteration detects "cat", it splits the string to prefix ="cat" and suffix = "sanddog". Take the suffix as the input, the method checks for the next prefix ("sand"). When it detects "dog", the input is in the dictionary, so we simply add the input into the result array (rst) and return.

"catsanddog" -> len == 3, "cat" + "sanddog" -> "sand" + "dog" -> "dog", add to the map and rst, return rst -> "sand" + " " + "dog" (from tmp), add to the map and rst, return rst -> "cat" + " " + "sand dog" = "cat sand dog", add to the map and rst;
"catsanddog" -> len == 4, "cats" + "anddog" -> "and" + "dog" -> "dog", get from map, return rst -> "and" + " " + "dog" -> "cats" + " " + "and dog" = "cats and dog", add to the map and rst, return rst;

public class Solution {
    public ArrayList wordBreak(String s, Set dict) {
        ArrayList rst = new ArrayList ();
        if (s == null || dict.isEmpty())
            return rst;
        HashMap> map = new HashMap>();
        rst = wordBreakHelper(s, dict, map);
        return rst;
        
    }
    private ArrayList wordBreakHelper(String s, Set dict, HashMap> map)
    {
        if (map.containsKey(s))
            return map.get(s);
        ArrayList rst = new ArrayList();
        int length = s.length();
        if (length <= 0)
            return rst;
        for (int len = 1; len <= length; len++)
        {
            String prefix = s.substring(0, len);
            if (dict.contains(prefix))
            {
                if (len == length)
                    rst.add(prefix);
                else
                {
                    String suffix = s.substring(len);
                    ArrayList tmp = wordBreakHelper(suffix, dict, map);
                    for (String items : tmp)
                    {
                        String words = prefix + " " + items;
                        rst.add(words);
                    }
                }
            }
        }
        map.put(s, rst);
        return rst;
    }
}

Scramble String

A very interesting problem. I like this recursion solution, very neat. Remember to check every boundary cases in order to reduce recursions

Given a string s1, we may represent it as a binary tree by partitioning it to two non-empty substrings recursively.

Below is one possible representation of s1 = "great":

    great
   /    \
  gr    eat
 / \    /  \
g   r  e   at
           / \
          a   t

To scramble the string, we may choose any non-leaf node and swap its two children.

For example, if we choose the node "gr" and swap its two children, it produces a scrambled string "rgeat".

    rgeat
   /    \
  rg    eat
 / \    /  \
r   g  e   at
           / \
          a   t

We say that "rgeat" is a scrambled string of "great".

Similarly, if we continue to swap the children of nodes "eat" and "at", it produces a scrambled string "rgtae".

    rgtae
   /    \
  rg    tae
 / \    /  \
r   g  ta  e
       / \
      t   a

We say that "rgtae" is a scrambled string of "great".

Given two strings s1 and s2 of the same length, determine if s2 is a scrambled string of s1.

Update a DP solution. Consider a simple case "gre". The string may be split to "g" and "re", or "gr" and "e". So a scrambled string of "gre" can be "ger", "rge", "reg", "egr", "erg" or itself ("gre"). The DP solution uses an 3D matrix, scramble[k][i][j], the first dimension indicates the length of the substring, and the second and third dimension indicate the start index of first and second string, respectively (s1.substring(i, i + k) and s2.substring(j, j + k)). So similar to the recursion solution, we check every substring of s1 and s2 and construct the matrix.

public boolean isScramble(String s1, String s2) {
        if (s1.length() != s2.length())
            return false;
        if (s1.length() == 0 || s1.equals(s2))
            return true;
        
        int length = s1.length();
        boolean[][][] scramble = new boolean[length][length][length];
        
        for (int i = 0; i < length; i++) {
            for (int j = 0; j < length; j++) {
                scramble[0][i][j] = s1.charAt(i) == s2.charAt(j) ? true : false;
            }
        }
        
        for (int len = 2; len <= length; len++) { //the length of substring
            for (int i = 0; i <= length - len; i++) {
                for (int j = 0; j <= length - len; j++) {
                    boolean r = false;
                    for (int k = 1; k < len && r == false; k++) {
                        r = (scramble[k - 1][i][j] && scramble[len - k - 1][i + k][j + k])
                        || (scramble[k - 1][i][j + len - k] && scramble[len - k - 1][i + k][j]);
                    }
                    scramble[len - 1][i][j] = r;
                }
            }
        }
        return scramble[length - 1][0][0];
    }

public class ScrambleString {
    public boolean isScramble(String s1, String s2) {
        if ((s1 == null && s2 != null) || (s1 != null && s2 == null))
            return false;
        if (s1.length() != s2.length())
            return false;
        if (s1.equals(s2))
            return true;
        int length = s1.length();
        int chars1 = 0;
        int chars2 = 0;
        
        // check if two strings are consisted by same characters
        for (int i = 0; i < length; i++)
        {
            chars1 += Character.getNumericValue(s1.charAt(i));
            chars2 += Character.getNumericValue(s2.charAt(i));
        }
        if (chars1 != chars2)
            return false;
            
        if (s1.length() == 1)
            return true;
        //the string must be swapped at certain position assume s2 is a scramble string
        //iterate through all positions and recursively check 
        for (int i = 1; i < length; i++)
        {
            if (isScramble(s1.substring(0,i), s2.substring(0,i)) && isScramble(s1.substring(i),s2.substring(i)))
                return true;
            // if the string is reversed, the reversed one is also a scramble string
            if (isScramble(s1.substring(0,i), s2.substring(length - i)) && isScramble(s1.substring(i), s2.substring(0, length - i)))
                return true;
        }
        return false;
    }
}

Update: 2015 - 01 -18
I did a performance test on "great" and "rgtae" for both methods:

Yeah, recursion is preferred. :)

AdSense