Monday, January 12, 2015

Implement strStr()

Implement strStr().
Returns the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

If you are interested in a more advanced algorithm, KMP algorithm, look this. This post will implement this problem using a hash function. It is based on the rolling hash method. Basically, we calculate a hash function for the pattern string, or the needle, and compare it with the text string, or the haystack. The hash function is calculated as the following way:


The constant is chosen as 29 in this case. When comparing with the haystack, we remove the first part of the hash function and add the part for the new character. It's like a sliding window. 


public int strStr(String haystack, String needle) {
        if (haystack == null || needle == null) 
            return -1;
        if (haystack.length() == 0)
         return needle.length() == 0 ? 0 : -1;
        if (needle.length() == 0)
            return 0;
        if (haystack.length() < needle.length())
         return -1;
        int base = 29;
        int n = needle.length();
        long tmpBase = 1;
        long needleHash = 0;
        
        for (int i = n - 1; i >= 0; i--) {
         needleHash += (long)needle.charAt(i) * tmpBase;
            tmpBase *= base;
        }
        tmpBase = 1;
        long haystackHash = 0;
        for (int i = n - 1; i >= 0; i--) {
            haystackHash += (long)haystack.charAt(i) * tmpBase;
            tmpBase *= base;
        }
        if (haystackHash == needleHash)
            return 0;
        tmpBase /= base;
        for (int i = n; i < haystack.length(); i++) {
            haystackHash = (haystackHash - (long)haystack.charAt(i - n) * tmpBase) * base + (long)haystack.charAt(i);
            if (haystackHash == needleHash)
                return i - n + 1;
        }
        return -1;
    }

Update: 2015 - 01 - 19
As usual, my curiosity drove me to do the following performance test. It looks like KMP algorithm does hit the lower bound of the complexity:

P.S.: The test strings, if you are interested, are:
String haystack = "mississippimississippimississipippimissisippimissisippimissispimississippimississippimississippimississippimississippiabcabdabc"
String needle = "abcabdabc"


P.P.S: The KMP code:

public int strStr(String haystack, String needle) {
        if (needle == null || needle.length() == 0)
            return 0;
        if (haystack == null || haystack.length() == 0)
            return -1;
        if (haystack.length() < needle.length())
            return -1;
        int n = needle.length();
        int h = haystack.length();
        int[] PMT = getPMT(needle);
        int index_h = 0;
        int index_n = 0;
        while (index_h < h) {
            while(index_n >= 0 && haystack.charAt(index_h) != needle.charAt(index_n))
                index_n = PMT[index_n];
            index_h++;
            index_n++;
            if (index_n == n)
                return index_h - n;
        }
        return -1;
    }
    
    private int[] getPMT(String needle) {
        int[] PMT = new int[needle.length() + 1];
        PMT[0] = -1;
        PMT[1] = 0;
        for (int i = 2; i <= needle.length(); i++) {
            PMT[i] = (needle.charAt(i - 1) == needle.charAt(PMT[i - 1])) ? PMT[i - 1] + 1 : 0;
        }
        return PMT;
    }

No comments:

Post a Comment