AdSense

Thursday, March 19, 2015

Suffix tree

A suffix tree, as its name, is a tree that contains all the suffixes of a given string. Implementation of a suffix allows linear time to search for the substring of a given string.

The implementation of suffix tree can be various, I use the version provided on Cracking the coding interview. Since everyone likes "banana", to illustrate, I am also using "banana" as an example.



There are two data structures used in TreeNode: a list that tracks the indices of the first character of all substrings of the given string, and a map of all children of a given tree node. The key is the the current character and the value is a child tree node. A child node can only have one unique value, that is a character, but the index list contains all indexes the character occurs in the tree.

Insertion: 
Insert the index of the current character, if there exists a child node with the same value (character), get the child and recursively add the index and the substring of the current string into the child node.
If not, initialize a new child node and recursively add the index and the substring to the child node.

Search: 
If the string contains the input substring, it will return a list that contains all starting indices of the substring, otherwise it will return null. To implement this, we recursively search if all characters in the input string are children of the root node, and return the list if we parse to the end of the string.

Construct the tree:
Simply insert all substrings of the string to the root node.

public class SuffixTree {
 TreeNode root = new TreeNode();
 public SuffixTree(String s){
  for(int i = 0; i < s.length(); i++)
   root.insert(s.substring(i), i);
 }
 public List getIndexes(String s){
  return root.getIndexes(s);
 }

}
public class TreeNode {
 Map children;
 char value;
 List indexes;
 public TreeNode(){
  children = new HashMap();
  //track the start index of the substring
  indexes = new ArrayList();
 }
 public void insert(String s, int index){
  indexes.add(index);
  if(s != null && s.length() > 0){
   value = s.charAt(0);
   TreeNode child = null;
   if(children.containsKey(value))
    child = children.get(value);
   else{
    child = new TreeNode();
    children.put(value, child);
   }
   child.insert(s.substring(1), index);
   System.out.println(information());
  }
 }
 public List getIndexes(String s){
  if(s == null || s.length() == 0){
   return indexes;
  }
  else{
   char first = s.charAt(0);
   if(children.containsKey(first)){
    return children.get(first).getIndexes(s.substring(1));
   }
  }
  return null;
 }
}

No comments:

Post a Comment