Sunday, February 22, 2015

Auto Complete method


Write a class that receives in the constructor a vector of strings representing a browser history (URLs), and a method for "auto-complete":
The method that receives a string representing a partial URL typed by the user and returns a vector of all possible completions of this partial URL (prefix).

Starts from some concepts:

Trie 
a trie/digital tree/radix tree/prefix tree, is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings.  No node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.
A radix tree/radix trie/compact prefix tree is a space-optimized trie data structure where each node with only one child is merged with its parent. The result is that every internal node has up to the number of children of the radix r of the radix trie, where r is a positive integer and a power x of 2, having x >= 1. Unlike in regular tries, edges can be labeled with sequences of elements as well as single elements. This makes them much more efficient for small sets (especially if the strings are long) and for sets of strings that share long prefixes. 

Just to go through how this thing works. When we initialize a new trie object, we create an empty trie node. Every trie has an empty root, i.e., the root doesn't hold any string values. The trie uses a map to store all its children, i.e., the suffixes generated from this node.

When we insert a string, the trie first tries to find if any of its children contains the first character of the string (the prefix), if it can find one, it will recursively try to find the next character until it cannot find one, that's when it will add a new node and put it into the children map.

In order to find if a string is in the trie, similarly, it will recursively try to find if every character in the string is in the children map, if any of the character doesn't in a children map, return false, when it reaches to the end of the string, check if the current node's value equals the string.

The autocomplete method, this method is in fact ask us to return all suffixes given a prefix. Follow the same approach as the previous to methods, the trie first finds the path of the prefix, say, s-h-, then it will add all children of the node "sh" and all children's children into the list, and return the list. For example, if "sh"  has children "she", "shirt", "shirley", it will return all 3. Note that "shir" is a path that has two branches: "shirt" and "shirley". However, since "shir" itself is not a string that has been inserted to the trie, it will not be returned.


import java.util.*;
import java.util.Map.Entry;
public class AutoCompleteTrie {
 protected final Map children;
 protected String value;
 protected boolean isWord = false;
 /**
  * constructor, initialize a new ACT with empty string as the root
  * @param value
  */
 private AutoCompleteTrie(String value) {
  this.value = value;
  children = new HashMap ();
 }
 public AutoCompleteTrie() {
  this("");
 }
 /**
  * insert a word, this will construct a root to leave path
  * with the key of each node a character in the word, 
  * value the prefix of the word till the key character
  * @param word
  */
 public void insert (String word) {
  if (word == null)
   throw new IllegalArgumentException("Null string!");
  //this is a reference to the current object: 
  //the object whose method or constructor is being called
  AutoCompleteTrie node = this;
  for (char c : word.toCharArray()) {
   //System.out.println(c + ": " + node.value);
   if (!node.children.containsKey(c))
    node.add(c);
   node = node.children.get(c);
   
  }
  //System.out.println();
  node.isWord = true;
 }
 /**
  * add a child of the node
  * the child will have the key the input character
  * and the value the value of the node + key character
  * @param c
  */
 private void add(char c) {
  String val; 
  val = this.value + c;
  children.put(c, new AutoCompleteTrie(val));
 }
 
 /**
  * search if a word exists in the trie
  * @param word
  * @return
  */
 public boolean find(String word) {
  AutoCompleteTrie node = this;
  for (char c : word.toCharArray()) {
   if (!node.children.containsKey(c)) 
    return false;
   node = node.children.get(c);
  }
  return node.value.equals(word);
 }
 
 /**
  * input prefix return all possible strings 
  * with the same prefix in the trie
  * find the node of the prefix in the trie
  * call allChildren method and return the list
  * @param prefix
  * @return
  */
 public Collection autoComplete(String prefix) {
  AutoCompleteTrie node = this;
  for (char c : prefix.toCharArray()) {
   if (!node.children.containsKey(c))
    return Collections.emptyList();
   node = node.children.get(c);
  }
  return node.allChildren();
 }
 /**
  * generate the list of all children of the current node/object
  * @return
  */
 private Collection allChildren() {
  List rst = new ArrayList ();
  //if the prefix is also a previous input word, add it to the list
  if (this.isWord) {
   rst.add(this.value);
  }
  //add all children
  for (Entry entry : children.entrySet()) {
   AutoCompleteTrie child = entry.getValue();
   Collection childPrefixes = child.allChildren();
   rst.addAll(childPrefixes);
  }
  return rst;
 }
}

No comments:

Post a Comment