AdSense

Saturday, October 10, 2015

Scala note 5: Sentence Anagrams

Finally I made it work! It takes much longer than I thought and I am too exhausted to explain everything. The original problem set can be found here.

My friend said overall the class is easy. Not to me. 


package forcomp

import common._

object Anagrams {

  /** A word is simply a `String`. */
  type Word = String

  /** A sentence is a `List` of words. */
  type Sentence = List[Word]

  /** `Occurrences` is a `List` of pairs of characters and positive integers saying
   *  how often the character appears.
   *  This list is sorted alphabetically w.r.t. to the character in each pair.
   *  All characters in the occurrence list are lowercase.
   *  
   *  Any list of pairs of lowercase characters and their frequency which is not sorted
   *  is **not** an occurrence list.
   *  
   *  Note: If the frequency of some character is zero, then that character should not be
   *  in the list.
   */
  type Occurrences = List[(Char, Int)]

  /** The dictionary is simply a sequence of words.
   *  It is predefined and obtained as a sequence using the utility method `loadDictionary`.
   */
  val dictionary: List[Word] = loadDictionary

  /** Converts the word into its character occurence list.
   *  
   *  Note: the uppercase and lowercase version of the character are treated as the
   *  same character, and are represented as a lowercase character in the occurrence list.
   */
  def wordOccurrences(w: Word): Occurrences = 
    w.toLowerCase.groupBy(c => c).mapValues(c => c.length).toList.sorted
    //w.toLowerCase.groupBy(c => c).toList.map(c => (c._1, c._2.length)).sorted

  /** Converts a sentence into its character occurrence list. */
  def sentenceOccurrences(s: Sentence): Occurrences = 
    wordOccurrences((s foldLeft "")(_+_))
    //s.flatten.mkString

  /** The `dictionaryByOccurrences` is a `Map` from different occurrences to a sequence of all
   *  the words that have that occurrence count.
   *  This map serves as an easy way to obtain all the anagrams of a word given its occurrence list.
   *  
   *  For example, the word "eat" has the following character occurrence list:
   *
   *     `List(('a', 1), ('e', 1), ('t', 1))`
   *
   *  Incidentally, so do the words "ate" and "tea".
   *
   *  This means that the `dictionaryByOccurrences` map will contain an entry:
   *
   *    List(('a', 1), ('e', 1), ('t', 1)) -> Seq("ate", "eat", "tea")
   *
   */
  lazy val dictionaryByOccurrences: Map[Occurrences, List[Word]] = 
    dictionary.groupBy(word => wordOccurrences(word))

  /** Returns all the anagrams of a given word. */
  def wordAnagrams(word: Word): List[Word] = 
    dictionaryByOccurrences(wordOccurrences(word))

  /** Returns the list of all subsets of the occurrence list.
   *  This includes the occurrence itself, i.e. `List(('k', 1), ('o', 1))`
   *  is a subset of `List(('k', 1), ('o', 1))`.
   *  It also include the empty subset `List()`.
   * 
   *  Example: the subsets of the occurrence list `List(('a', 2), ('b', 2))` are:
   *
   *    List(
   *      List(),
   *      List(('a', 1)),
   *      List(('a', 2)),
   *      List(('b', 1)),
   *      List(('a', 1), ('b', 1)),
   *      List(('a', 2), ('b', 1)),
   *      List(('b', 2)),
   *      List(('a', 1), ('b', 2)),
   *      List(('a', 2), ('b', 2))
   *    )
   *
   *  Note that the order of the occurrence list subsets does not matter -- the subsets
   *  in the example above could have been displayed in some other order.
   */

  def combinations(occurrences: Occurrences): List[Occurrences] = {
    val ocs: List[Occurrences] = (occurrences.map( occ => (1 to occ._2).map( o => (occ._1, o) ).toList ))
    ocs.foldLeft(List[Occurrences](Nil))( (l1, l2) =>
      l1 ::: (for (elem1 <- data-blogger-escaped---="" data-blogger-escaped--="" data-blogger-escaped-1="" data-blogger-escaped-3="" data-blogger-escaped-:::="" data-blogger-escaped-a="" data-blogger-escaped-an="" data-blogger-escaped-and="" data-blogger-escaped-any="" data-blogger-escaped-appear="" data-blogger-escaped-appearing="" data-blogger-escaped-be="" data-blogger-escaped-cannot="" data-blogger-escaped-character="" data-blogger-escaped-diff="" data-blogger-escaped-elem1="" data-blogger-escaped-elem2="" data-blogger-escaped-equal="" data-blogger-escaped-frequency="" data-blogger-escaped-from="" data-blogger-escaped-has="" data-blogger-escaped-in="" data-blogger-escaped-is="" data-blogger-escaped-it="" data-blogger-escaped-its="" data-blogger-escaped-l1="" data-blogger-escaped-l2="" data-blogger-escaped-list="" data-blogger-escaped-meaning="" data-blogger-escaped-must="" data-blogger-escaped-no="" data-blogger-escaped-note:="" data-blogger-escaped-occurrence="" data-blogger-escaped-of="" data-blogger-escaped-or="" data-blogger-escaped-precondition="" data-blogger-escaped-resulting="" data-blogger-escaped-smaller="" data-blogger-escaped-sorted="" data-blogger-escaped-subset="" data-blogger-escaped-subtract="" data-blogger-escaped-subtracts="" data-blogger-escaped-than="" data-blogger-escaped-that="" data-blogger-escaped-the="" data-blogger-escaped-use="" data-blogger-escaped-value="" data-blogger-escaped-x="" data-blogger-escaped-y="" data-blogger-escaped-yield="" data-blogger-escaped-zero-entries.=""> List(('a', 2))
  //x diff y => List(('a', 3))
 def subtract(x: Occurrences, y: Occurrences): Occurrences = 
    (x /: y)((newx, elemy) => 
      (for (elemx <- data-blogger-escaped--="" data-blogger-escaped-elemx._1="" data-blogger-escaped-elemx._2="" data-blogger-escaped-elemx="" data-blogger-escaped-elemy._1="" data-blogger-escaped-elemy._2="" data-blogger-escaped-else="" data-blogger-escaped-filter="" data-blogger-escaped-if="" data-blogger-escaped-newx="" data-blogger-escaped-yield=""> 0).sorted
 
  /** Returns a list of all anagram sentences of the given sentence.
   *  
   *  An anagram of a sentence is formed by taking the occurrences of all the characters of
   *  all the words in the sentence, and producing all possible combinations of words with those characters,
   *  such that the words have to be from the dictionary.
   *
   *  The number of words in the sentence and its anagrams does not have to correspond.
   *  For example, the sentence `List("I", "love", "you")` is an anagram of the sentence `List("You", "olive")`.
   *
   *  Also, two sentences with the same words but in a different order are considered two different anagrams.
   *  For example, sentences `List("You", "olive")` and `List("olive", "you")` are different anagrams of
   *  `List("I", "love", "you")`.
   *  
   *  Here is a full example of a sentence `List("Yes", "man")` and its anagrams for our dictionary:
   *
   *    List(
   *      List(en, as, my),
   *      List(en, my, as),
   *      List(man, yes),
   *      List(men, say),
   *      List(as, en, my),
   *      List(as, my, en),
   *      List(sane, my),
   *      List(Sean, my),
   *      List(my, en, as),
   *      List(my, as, en),
   *      List(my, sane),
   *      List(my, Sean),
   *      List(say, men),
   *      List(yes, man)
   *    )
   *
   *  The different sentences do not have to be output in the order shown above - any order is fine as long as
   *  all the anagrams are there. Every returned word has to exist in the dictionary.
   *  
   *  Note: in case that the words of the sentence are in the dictionary, then the sentence is the anagram of itself,
   *  so it has to be returned in this list.
   *
   *  Note: There is only one anagram of an empty sentence.
   */
  
  def sentenceAnagrams(sentence: Sentence): List[Sentence] = 
    sentenceAnagramsHelper(sentenceOccurrences(sentence))

   def sentenceAnagramsHelper(occurrences: Occurrences): List[Sentence] =  occurrences match {
    case Nil => List(Nil)
    case occurrences => {
      val combs = combinations(occurrences)
      //Set(elem) return a boolean value true if elem exists and false if doesn't
      for {i <- combs if dictionarybyoccurrences.keyset (i)
           j <- dictionarybyoccurrences (i) 
           s <- sentenceanagramshelper (subtract (occurrences, i)))
           } yield (j :: s)

Code is also on git.

References (and special thanks):
[1]. Codatlas;
[2]. chancila (gitHub).

1 comment:


  1. The development of artificial intelligence (AI) has propelled more programming architects, information scientists, and different experts to investigate the plausibility of a vocation in machine learning. Notwithstanding, a few newcomers will in general spotlight a lot on hypothesis and insufficient on commonsense application. IEEE final year projects on machine learning In case you will succeed, you have to begin building machine learning projects in the near future.

    Projects assist you with improving your applied ML skills rapidly while allowing you to investigate an intriguing point. Furthermore, you can include projects into your portfolio, making it simpler to get a vocation, discover cool profession openings, and Final Year Project Centers in Chennai even arrange a more significant compensation.


    Data analytics is the study of dissecting crude data so as to make decisions about that data. Data analytics advances and procedures are generally utilized in business ventures to empower associations to settle on progressively Python Training in Chennai educated business choices. In the present worldwide commercial center, it isn't sufficient to assemble data and do the math; you should realize how to apply that data to genuine situations such that will affect conduct. In the program you will initially gain proficiency with the specialized skills, including R and Python dialects most usually utilized in data analytics programming and usage; Python Training in Chennai at that point center around the commonsense application, in view of genuine business issues in a scope of industry segments, for example, wellbeing, promoting and account.

    ReplyDelete