public class RepeatedDNA { private static final Mapdna = new HashMap (); static { dna.put('A', 0); dna.put('C', 1); dna.put('G', 2); dna.put('T', 3); } private final int base = 29; public List findRepeatedDnaSequences(String s) { if (s == null) throw new NullPointerException("Null String!"); //avoid duplicate sequence List rst = new ArrayList (); if (s.length() == 0) return rst; Set sequence = new HashSet (); long hashS = 0; for (int i = 0; i < s.length(); i++) { if (i > 9) { hashS -= (long)Math.pow(base, 9) * dna.get(s.charAt(i - 10)); } hashS = hashS * base + dna.get(s.charAt(i)); if (i > 8 && !sequence.add(hashS)) { if (!rst.contains(s.substring(i - 9, i + 1))) rst.add(s.substring(i - 9, i + 1)); } } return rst; } }
Monday, February 9, 2015
Repeated DNA Sequences
The tricky part of this problem is that we cannot simply use a set to store or substrings: it will exceed memory limit. A smarter way is to use integers instead of strings. I use the rolling hash method, which is also used in the implement strStr() problem (see that problem for detail).
No comments:
Post a Comment