The Fake Geek's blog: Scala

Sunday, October 18, 2015

Scala note 6: Bloxorz

Notes: val, lazy val and def:

def expr = {
    val x = { print("x"); 1}
    lazy val y = { print("y"); 2}
    def z = { print ("z"); 3}
    z + y + x + z + y + x
}
expr

The above program prints "xzyz" while evaluating expr. A def is re-evaluated every time it is called; a lazy val is evaluated the first time it is called and the value is then preserved; a val is evaluated when it is defined and the value is also preserved. Thus, when expr is evaluated, x is defined and evaluated immediately (printing "x"), while y and z are defined but not evaluated. Then z and y are called and evaluated (printing "z" and "y"), but x is only read. Then z is evaluated again (printing "z"), followed by reading the cached y and x. Every evaluation prints, thus we get the result "xzyz".

Lift: defined in PartialFunction. It turns the partial function into a plain function that takes an argument x to Some(this(x)) if this is defined for x, and to None otherwise. Sequences are partial functions from index to element, so lift gives safe indexing:

scala> val list = List("Shirley", "Dora", "Jeimee")
list: List[String] = List(Shirley, Dora, Jeimee)

scala> list.lift(0)
res1: Option[String] = Some(Shirley)

scala> list.lift(3)
res2: Option[String] = None

scala> list.lift(0).map ("I love " + _) getOrElse "you"
res3: String = I love Shirley

scala> list.lift(3).map ("I love " + _) getOrElse "you"
res4: String = you
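
The same method works on any PartialFunction, not only on sequences. Here is a minimal sketch; the names LiftSketch, reciprocal and safeReciprocal are made up for illustration:

```scala
object LiftSketch {
  // A partial function that is only defined for non-zero input.
  val reciprocal: PartialFunction[Int, Double] = {
    case n if n != 0 => 1.0 / n
  }

  // lift wraps it into a total function that returns an Option.
  val safeReciprocal: Int => Option[Double] = reciprocal.lift
}
```

LiftSketch.safeReciprocal(4) gives Some(0.25), while LiftSketch.safeReciprocal(0) gives None instead of throwing a MatchError.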

This week's problem focuses on the game Bloxorz.

Attention: You are allowed to submit a maximum of 5 times for grading purposes. Once you have submitted your solution, you should see your grade and feedback about your code on the Coursera website within 10 minutes. If you want to improve your grade, just submit an improved solution. The best of your first 5 submissions will count as the final grade. You can still submit after the 5th time to get feedback on your improved solutions; however, these are for research purposes only and will not be counted towards your final grade.

Download the streams.zip handout archive file and extract it somewhere on your machine.

In this assignment you will implement a solver for a simplified version of a Flash game named “Bloxorz” using streams and lazy evaluation.

As in the previous assignments, you are encouraged to look at the Scala API documentation while solving this exercise, which can be found here:

http://www.scala-lang.org/api/current/index.html

Bloxorz

Bloxorz is a game in Flash, which you can access here. As a first step for this assignment, play it for a few levels.

The objective of Bloxorz is simple; you must navigate your rectangular block to the hole at the end of the board, by rolling it, in the fewest number of moves possible. A block can be moved in 4 possible directions, left, right, up, down, using the appropriate keys on the keyboard.

You will quickly notice that for many levels, you are, in your head, trying to walk through different configurations/positions of where the block can be in order to get it to the goal position. Equipped with some new programming skills, you can now let your computer do the work!

The idea of this assignment is to code a solver for a simplified version of this game, with no orange tiles, circles or crosses on the terrain. The goal of your program, given a terrain configuration with a start position and a goal position, is to return the exact sequence of keys to type in order to reach the goal position. Naturally, we will be interested in getting the shortest path as well.

State-space Exploration

The theory behind coding a solver for this game is in fact applicable to many different problems. The general problem we are trying to solve is the following:

We start at some initial state S, and we are trying to reach an end state T.
From every state, there are possible transitions to other states, some of which are out of bounds.
We explore the states, starting from S, by exploring its neighbors and following the chain, until we reach T.
There are different ways of exploring the state space. On the two ends of the spectrum are the following techniques:
- depth-first search: when we see a new state, we immediately explore its direct neighbors, and we do this all the way down, until we reach a roadblock. Then we backtrack until the first non-explored neighbor, and continue in the same vein.
- breadth-first search: here, we proceed more cautiously. When we find the neighbors of our current state, we explore each of them for each step. The respective neighbors of these states are then stored to be explored at a later time.
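
To make the difference concrete, here is a small sketch of both strategies on a toy graph. The graph and the names SearchSketch, dfsOrder and bfsOrder are assumptions for illustration only; they are not part of the assignment. The only difference between the two functions is where newly found neighbors are put: in front of the pending states (depth-first) or behind them (breadth-first).

```scala
object SearchSketch {
  // Hypothetical toy graph: each state maps to its neighbors.
  val graph: Map[String, List[String]] = Map(
    "S" -> List("A", "B"),
    "A" -> List("C"),
    "B" -> List("T"),
    "C" -> List("T"),
    "T" -> Nil
  )

  // Depth-first: new neighbors go to the FRONT of the pending list,
  // so we follow one chain all the way down before backtracking.
  def dfsOrder(start: String): List[String] = {
    def loop(pending: List[String], seen: Set[String], acc: List[String]): List[String] =
      pending match {
        case Nil                  => acc.reverse
        case s :: rest if seen(s) => loop(rest, seen, acc)
        case s :: rest            => loop(graph(s) ::: rest, seen + s, s :: acc)
      }
    loop(List(start), Set.empty, Nil)
  }

  // Breadth-first: new neighbors go to the BACK of the pending list,
  // so every state at distance n is visited before any at distance n + 1.
  // (A List is used as a queue to keep the sketch short.)
  def bfsOrder(start: String): List[String] = {
    def loop(pending: List[String], seen: Set[String], acc: List[String]): List[String] =
      pending match {
        case Nil                  => acc.reverse
        case s :: rest if seen(s) => loop(rest, seen, acc)
        case s :: rest            => loop(rest ::: graph(s), seen + s, s :: acc)
      }
    loop(List(start), Set.empty, Nil)
  }
}
```

On this graph, dfsOrder("S") visits S, A, C, T, B and reaches T through the three-step chain via A and C, while bfsOrder("S") visits S, A, B, C, T and first sees T as a neighbor of B, two steps from S. That is why breadth-first search is the one that yields shortest paths.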

Game Setup

Let us start by setting up our platform. The trait GameDef will contain all the logic regarding how the terrain is set up, how the blocks are represented and how they move.

Positions

A position on the game board is represented using the case class Pos(x:Int, y:Int), where x and y represent its coordinates. The scaladoc comment on class Pos explains how to interpret the coordinates:

The x coordinate denotes the position on the vertical axis
The y coordinate is used for the horizontal axis
The coordinates increase when moving down and right

Illustration:

  0 1 2 3   <- y axis
0 o o o o
1 o o o o
2 o # o o    # is at position Pos(2, 1)
3 o o o o

^
|

x axis

The Terrain

We represent our terrain as a function from positions to booleans:

type Terrain = Pos => Boolean

The function returns true for every position that is inside the terrain. Terrains can be created easily from a string representation using the methods in the file StringParserTerrain.scala.

Your first task is to implement two methods in trait StringParserTerrain that are used to parse the terrain and the start / end positions. The Scaladoc comments give precise instructions on how they should be implemented.

def terrainFunction(levelVector: Vector[Vector[Char]]): Pos => Boolean = ???
def findChar(c: Char, levelVector: Vector[Vector[Char]]): Pos = ???

// A position is on the terrain if it is in bounds on both axes
// (checked against the actual row, not row 0) and the cell is not '-'.
def terrainFunction(levelVector: Vector[Vector[Char]]): Pos => Boolean =
  (pos: Pos) => levelVector.lift(pos.x).flatMap(_.lift(pos.y)).exists(_ != '-')

// Assumes that c occurs somewhere in the level.
def findChar(c: Char, levelVector: Vector[Vector[Char]]): Pos = {
  val row = levelVector.indexWhere(_.contains(c))
  val col = levelVector(row).indexOf(c)
  Pos(row, col)
}
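
To see the two parsing methods in action, here is a self-contained sketch on a tiny made-up level. The object name TerrainSketch and the level itself are assumptions for illustration; in the assignment the level comes from the level string and Pos is defined in GameDef.

```scala
object TerrainSketch {
  case class Pos(x: Int, y: Int)

  // In bounds on both axes and not a '-' cell.
  def terrainFunction(levelVector: Vector[Vector[Char]]): Pos => Boolean =
    pos => levelVector.lift(pos.x).flatMap(_.lift(pos.y)).exists(_ != '-')

  // Assumes that c occurs somewhere in the level.
  def findChar(c: Char, levelVector: Vector[Vector[Char]]): Pos = {
    val row = levelVector.indexWhere(_.contains(c))
    Pos(row, levelVector(row).indexOf(c))
  }

  // Hypothetical level: 'S' start, 'T' goal, 'o' floor, '-' void.
  val level: Vector[Vector[Char]] = Vector("So-", "ooT").map(_.toVector)

  val terrain: Pos => Boolean = terrainFunction(level)
}
```

Here terrain(Pos(0, 0)) is true, terrain(Pos(0, 2)) is false because of the '-', and positions outside the vector such as Pos(2, 0) or Pos(-1, 0) are false as well. findChar('T', level) returns Pos(1, 2): row first, then column.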

Blocks

Back in the file GameDef.scala, a block is a 2 x 1 x 1 cuboid. We represent it as a case class which contains two fields, the 2d position of both the cubes which make up the block.

A Block is therefore a case class Block(b1: Pos, b2: Pos), and can move in four different directions, each time yielding a new block. To this effect, the methods left, right, up and down are provided.

Given this, you can now define a method isStanding which tells us whether the Block is standing or not:

def isStanding: Boolean = ???

Next, implement a method isLegal on Block which tells us whether a block is on the terrain or off it:

def isLegal: Boolean = ???

def isStanding: Boolean = b1 == b2

/**
 * Returns `true` if the block is entirely inside the terrain.
 */
def isLegal: Boolean = terrain(b1) && terrain(b2)
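
The move methods come with the handout, so I did not have to write them. To get a feeling for how rolling changes the two positions, here is my own rough sketch of right and down on a made-up 3 x 3 terrain. The helper names dx and dy and the object BlockSketch are assumptions, and the handout code may well look different.

```scala
object BlockSketch {
  case class Pos(x: Int, y: Int) {
    def dx(d: Int): Pos = copy(x = x + d)
    def dy(d: Int): Pos = copy(y = y + d)
  }

  // Hypothetical 3 x 3 terrain; the real one is parsed from the level string.
  val terrain: Pos => Boolean =
    p => p.x >= 0 && p.x < 3 && p.y >= 0 && p.y < 3

  // b1 is assumed to be the upper / left cube when the block is lying down.
  case class Block(b1: Pos, b2: Pos) {
    // Shift each cube by its own offset along one axis.
    private def dx(d1: Int, d2: Int) = Block(b1.dx(d1), b2.dx(d2))
    private def dy(d1: Int, d2: Int) = Block(b1.dy(d1), b2.dy(d2))

    def isStanding: Boolean = b1 == b2
    def isLegal: Boolean = terrain(b1) && terrain(b2)

    def right: Block =
      if (isStanding) dy(1, 2)         // falls flat over the next two cells
      else if (b1.x == b2.x) dy(2, 1)  // lying along the row: stands up again
      else dy(1, 1)                    // lying along the column: rolls sideways

    def down: Block =
      if (isStanding) dx(1, 2)
      else if (b1.x == b2.x) dx(1, 1)
      else dx(2, 1)
  }
}
```

Starting from a standing block at Pos(0, 0), right gives Block(Pos(0, 1), Pos(0, 2)), and a second right stands the block up at Pos(0, 3), which is outside this small terrain, so isLegal is false there.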

Finally, we need to implement a method that constructs the initial block for our simulation, the block located at the start position:

def startBlock: Block = ???

def startBlock: Block = Block(startPos, startPos)

Moves and Neighbors

To record which moves we make when navigating the block, we represent the four possible moves as case objects:

sealed abstract class Move
case object Left  extends Move
case object Right extends Move
case object Up    extends Move
case object Down  extends Move

You can now implement the functions neighbors and legalNeighbors on Block, which return a list of tuples: the neighboring blocks, as well as the move to get there.

def neighbors: List[(Block,Move)] = ???
def legalNeighbors: List[(Block,Move)] = ???

def neighbors: List[(Block, Move)] =
  List((left, Left), (right, Right), (up, Up), (down, Down))

/**
 * Returns the list of positions reachable from the current block
 * which are inside the terrain.
 */
def legalNeighbors: List[(Block, Move)] =
  neighbors.filter(_._1.isLegal)

Solving the Game

Now that everything is set up, we can concentrate on actually coding our solver which is defined in the file Solver.scala.

We could represent a path to a solution as a Stream[Block]. We however also need to make sure we keep the history on our way to the solution. Therefore, a path is represented as a Stream[(Block, List[Move])], where the second part of the pair records the history of moves so far. Unless otherwise noted, the last move is the head element of the List[Move].

First, implement a function done which determines when we have reached the goal:

def done(b: Block): Boolean = ???

def done(b: Block): Boolean = b.b1 == goal && b.b2 == goal

Finding Neighbors

Then, implement a function neighborsWithHistory, which, given a block, and its history, returns a stream of neighboring blocks with the corresponding moves.

def neighborsWithHistory(b: Block, history: List[Move]): Stream[(Block, List[Move])] = ???

As mentioned above, the history is ordered so that the most recent move is the head of the list. If you consider Level 1 as defined in Bloxorz.scala, then

neighborsWithHistory(Block(Pos(1,1),Pos(1,1)), List(Left,Up))

results in a stream with the following elements (given as a set):

Set(
  (Block(Pos(1,2),Pos(1,3)), List(Right,Left,Up)),
  (Block(Pos(2,1),Pos(3,1)), List(Down,Left,Up))
)

You should implement the above example as a test case in the test suite BloxorzSuite.

def neighborsWithHistory(b: Block, history: List[Move]): Stream[(Block, List[Move])] = 
    b.legalNeighbors.map {pair=> (pair._1, pair._2 +: history)}.toStream

Avoiding Circles

While exploring a path, we will also track all the blocks we have seen so far, so as to not get lost in circles of movements (such as sequences of left-right-left-right). Implement a function newNeighborsOnly to this effect:

def newNeighborsOnly(neighbors: Stream[(Block, List[Move])],
                     explored: Set[Block]): Stream[(Block, List[Move])] = ???

Example usage:

newNeighborsOnly(
  Set(
    (Block(Pos(1,2),Pos(1,3)), List(Right,Left,Up)),
    (Block(Pos(2,1),Pos(3,1)), List(Down,Left,Up))
  ).toStream,

  Set(Block(Pos(1,2),Pos(1,3)), Block(Pos(1,1),Pos(1,1)))
)

returns

  Set(
    (Block(Pos(2,1),Pos(3,1)), List(Down,Left,Up))
  ).toStream

Again, you should convert this example into a test case.

def newNeighborsOnly(neighbors: Stream[(Block, List[Move])],
                     explored: Set[Block]): Stream[(Block, List[Move])] =
  neighbors.filter(n => !explored.contains(n._1))
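
The example usage can be turned into a check like the one below. This is a standalone sketch with its own minimal Pos, Block and Move definitions (the object name NewNeighborsSketch is made up); in the real test suite these come from GameDef instead. Stream is what the course uses; in Scala 2.13 and later it is deprecated in favor of LazyList.

```scala
object NewNeighborsSketch {
  case class Pos(x: Int, y: Int)
  case class Block(b1: Pos, b2: Pos)

  sealed abstract class Move
  case object Left  extends Move
  case object Right extends Move
  case object Up    extends Move
  case object Down  extends Move

  // Keep only the neighbors whose block has not been explored yet.
  def newNeighborsOnly(neighbors: Stream[(Block, List[Move])],
                       explored: Set[Block]): Stream[(Block, List[Move])] =
    neighbors.filter { case (block, _) => !explored.contains(block) }

  // The example from the assignment text.
  val result: Stream[(Block, List[Move])] = newNeighborsOnly(
    Stream(
      (Block(Pos(1, 2), Pos(1, 3)), List(Right, Left, Up)),
      (Block(Pos(2, 1), Pos(3, 1)), List(Down, Left, Up))
    ),
    Set(Block(Pos(1, 2), Pos(1, 3)), Block(Pos(1, 1), Pos(1, 1)))
  )
}
```

Only the second neighbor survives, because Block(Pos(1, 2), Pos(1, 3)) is already in the explored set.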

Finding Solutions

Now to the crux of the solver. Implement a function from, which, given an initial stream and a set of explored blocks, creates a stream containing the possible paths starting from the head of the initial stream:

def from(initial: Stream[(Block, List[Move])],
         explored: Set[Block]): Stream[(Block, List[Move])] = ???

Note: pay attention to how the path is constructed: as discussed in the introduction, the key to getting the shortest path for the problem is to explore the space in a breadth-first manner.

Hint: The case study lecture about the water pouring problem (7.5) might help you.

def from(initial: Stream[(Block, List[Move])],
         explored: Set[Block]): Stream[(Block, List[Move])] = {
  if (initial.isEmpty) Stream.empty
  else {
    val more = for {
      pair <- initial
      next <- newNeighborsOnly(neighborsWithHistory(pair._1, pair._2), explored)
    } yield next
    initial #::: from(more, explored ++ more.map(_._1))
  }
}
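
The level-by-level pattern is easier to see on a smaller problem. Below is a sketch with the same shape as from, applied to a made-up puzzle: starting from 1, reach 10 using only "+1" and "*2". Everything here (StreamBfsSketch, Path, the puzzle itself) is an assumption for illustration and has nothing to do with the handout code.

```scala
object StreamBfsSketch {
  // (state, moves so far, most recent move first)
  type Path = (Int, List[String])

  // Hypothetical toy problem: from a number we may add 1 or double it.
  def neighbors(state: Int, history: List[String]): Stream[Path] =
    Stream((state + 1, "+1" :: history), (state * 2, "*2" :: history))

  // Emit the current level, then lazily continue with the next level.
  // Because #::: takes its right-hand side by name, the recursion only
  // runs when somebody asks for elements beyond the current level.
  def from(initial: Stream[Path], explored: Set[Int]): Stream[Path] =
    if (initial.isEmpty) Stream.empty
    else {
      val more = for {
        (state, history) <- initial
        next <- neighbors(state, history) if !explored.contains(next._1)
      } yield next
      initial #::: from(more, explored ++ more.map(_._1))
    }

  val paths: Stream[Path] = from(Stream((1, Nil)), Set(1))

  // First path that reaches 10, with the moves in the order they are played.
  val toTen: List[String] =
    paths.find(_._1 == 10).map(_._2.reverse).getOrElse(Nil)
}
```

The state space here is infinite, yet paths can be defined without looping forever, because only the levels that find actually needs are computed. toTen comes out as List("+1", "*2", "+1", "*2"), i.e. 1, 2, 4, 5, 10, and since the levels are produced in order of increasing length, the first hit is a shortest one.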

Putting Things together

Finally we can define a lazy val pathsFromStart which is a stream of all the paths that begin at the starting block:

lazy val pathsFromStart: Stream[(Block, List[Move])] = ???

// The start block is marked as explored from the beginning, so it is never revisited.
lazy val pathsFromStart: Stream[(Block, List[Move])] =
  from(Stream((startBlock, List[Move]())), Set(startBlock))

We can also define pathsToGoal which is a stream of all possible pairs of goal blocks along with their history. Indeed, there can be more than one road to Rome!

lazy val pathsToGoal: Stream[(Block, List[Move])] = ???

lazy val pathsToGoal: Stream[(Block, List[Move])] = pathsFromStart.filter(pair => done(pair._1))

To finish it off, we define solution to contain the (or one of the) shortest list(s) of moves that lead(s) to the goal.

Note: the head element of the returned List[Move] should represent the first move that the player should perform from the starting position.

lazy val solution: List[Move] = ???

lazy val solution: List[Move] = pathsToGoal.headOption match {
  case None => List.empty
  case Some(pair) => pair._2.reverse
}

The problem is solved using a BFS approach.

Check the code on git.

References (and special thanks):
[1]. codatlas
[2]. Scala partial functions
[3]. My friend's homework

Saturday, October 10, 2015

Scala note 5: Sentence Anagrams

Finally I made it work! It took much longer than I thought and I am too exhausted to explain everything. The original problem set can be found here.

My friend said overall the class is easy. Not to me.

package forcomp

import common._

object Anagrams {

  /** A word is simply a `String`. */
  type Word = String

  /** A sentence is a `List` of words. */
  type Sentence = List[Word]

  /** `Occurrences` is a `List` of pairs of characters and positive integers saying
   *  how often the character appears.
   *  This list is sorted alphabetically w.r.t. the character in each pair.
   *  All characters in the occurrence list are lowercase.
   *  
   *  Any list of pairs of lowercase characters and their frequency which is not sorted
   *  is **not** an occurrence list.
   *  
   *  Note: If the frequency of some character is zero, then that character should not be
   *  in the list.
   */
  type Occurrences = List[(Char, Int)]

  /** The dictionary is simply a sequence of words.
   *  It is predefined and obtained as a sequence using the utility method `loadDictionary`.
   */
  val dictionary: List[Word] = loadDictionary

  /** Converts the word into its character occurrence list.
   *  
   *  Note: the uppercase and lowercase version of the character are treated as the
   *  same character, and are represented as a lowercase character in the occurrence list.
   */
  def wordOccurrences(w: Word): Occurrences = 
    w.toLowerCase.groupBy(c => c).mapValues(c => c.length).toList.sorted
    //w.toLowerCase.groupBy(c => c).toList.map(c => (c._1, c._2.length)).sorted

  /** Converts a sentence into its character occurrence list. */
  def sentenceOccurrences(s: Sentence): Occurrences = 
    wordOccurrences((s foldLeft "")(_+_))
    //s.flatten.mkString

  /** The `dictionaryByOccurrences` is a `Map` from different occurrences to a sequence of all
   *  the words that have that occurrence count.
   *  This map serves as an easy way to obtain all the anagrams of a word given its occurrence list.
   *  
   *  For example, the word "eat" has the following character occurrence list:
   *
   *     `List(('a', 1), ('e', 1), ('t', 1))`
   *
   *  Incidentally, so do the words "ate" and "tea".
   *
   *  This means that the `dictionaryByOccurrences` map will contain an entry:
   *
   *    List(('a', 1), ('e', 1), ('t', 1)) -> Seq("ate", "eat", "tea")
   *
   */
  lazy val dictionaryByOccurrences: Map[Occurrences, List[Word]] = 
    dictionary.groupBy(word => wordOccurrences(word))

  /** Returns all the anagrams of a given word. */
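  def wordAnagrams(word: Word): List[Word] = 
    // getOrElse avoids a NoSuchElementException for words with no entry in the dictionary
    dictionaryByOccurrences.getOrElse(wordOccurrences(word), Nil)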

  /** Returns the list of all subsets of the occurrence list.
   *  This includes the occurrence itself, i.e. `List(('k', 1), ('o', 1))`
   *  is a subset of `List(('k', 1), ('o', 1))`.
   *  It also include the empty subset `List()`.
   * 
   *  Example: the subsets of the occurrence list `List(('a', 2), ('b', 2))` are:
   *
   *    List(
   *      List(),
   *      List(('a', 1)),
   *      List(('a', 2)),
   *      List(('b', 1)),
   *      List(('a', 1), ('b', 1)),
   *      List(('a', 2), ('b', 1)),
   *      List(('b', 2)),
   *      List(('a', 1), ('b', 2)),
   *      List(('a', 2), ('b', 2))
   *    )
   *
   *  Note that the order of the occurrence list subsets does not matter -- the subsets
   *  in the example above could have been displayed in some other order.
   */

  def combinations(occurrences: Occurrences): List[Occurrences] = {
    val ocs: List[Occurrences] = (occurrences.map( occ => (1 to occ._2).map( o => (occ._1, o) ).toList ))
    ocs.foldLeft(List[Occurrences](Nil))( (l1, l2) =>
      l1 ::: (for (elem1 <- l1; elem2 <- l2) yield elem1 ::: List(elem2)))
  }

  /** Subtracts occurrence list `y` from occurrence list `x`.
   *
   *  The precondition is that the occurrence list `y` is a subset of
   *  the occurrence list `x` -- any character appearing in `y` must
   *  appear in `x`, and its frequency in `y` must be smaller or equal
   *  than its frequency in `x`.
   *
   *  Note: the resulting value is an occurrence list, meaning it is sorted
   *  and has no zero-entries.
   */
  // Cannot use x diff y: for x = List(('a', 3)) and y = List(('a', 1)),
  // subtract(x, y) => List(('a', 2))
  // x diff y => List(('a', 3))
  def subtract(x: Occurrences, y: Occurrences): Occurrences = 
    (x /: y)((newx, elemy) => 
      (for (elemx <- newx) yield
        if (elemx._1 == elemy._1) (elemx._1, elemx._2 - elemy._2) else elemx
      ).filter(_._2 > 0)).sorted
 
  /** Returns a list of all anagram sentences of the given sentence.
   *  
   *  An anagram of a sentence is formed by taking the occurrences of all the characters of
   *  all the words in the sentence, and producing all possible combinations of words with those characters,
   *  such that the words have to be from the dictionary.
   *
   *  The number of words in the sentence and its anagrams does not have to correspond.
   *  For example, the sentence `List("I", "love", "you")` is an anagram of the sentence `List("You", "olive")`.
   *
   *  Also, two sentences with the same words but in a different order are considered two different anagrams.
   *  For example, sentences `List("You", "olive")` and `List("olive", "you")` are different anagrams of
   *  `List("I", "love", "you")`.
   *  
   *  Here is a full example of a sentence `List("Yes", "man")` and its anagrams for our dictionary:
   *
   *    List(
   *      List(en, as, my),
   *      List(en, my, as),
   *      List(man, yes),
   *      List(men, say),
   *      List(as, en, my),
   *      List(as, my, en),
   *      List(sane, my),
   *      List(Sean, my),
   *      List(my, en, as),
   *      List(my, as, en),
   *      List(my, sane),
   *      List(my, Sean),
   *      List(say, men),
   *      List(yes, man)
   *    )
   *
   *  The different sentences do not have to be output in the order shown above - any order is fine as long as
   *  all the anagrams are there. Every returned word has to exist in the dictionary.
   *  
   *  Note: in case that the words of the sentence are in the dictionary, then the sentence is the anagram of itself,
   *  so it has to be returned in this list.
   *
   *  Note: There is only one anagram of an empty sentence.
   */
  
  def sentenceAnagrams(sentence: Sentence): List[Sentence] = 
    sentenceAnagramsHelper(sentenceOccurrences(sentence))

  def sentenceAnagramsHelper(occurrences: Occurrences): List[Sentence] = occurrences match {
    case Nil => List(Nil)
    case occurrences => {
      val combs = combinations(occurrences)
      // A Set applied to an element returns true if the element is in the set, false otherwise
      for {
        i <- combs if dictionaryByOccurrences.keySet(i)
        j <- dictionaryByOccurrences(i)
        s <- sentenceAnagramsHelper(subtract(occurrences, i))
      } yield (j :: s)
    }
  }
}
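
Since I was too tired to explain the two helper functions, here is at least a small standalone sketch that can be pasted into a worksheet. Note that this is a different formulation from my solution: combinations is written with foldRight and picks a count from 0 to the frequency for each character, and subtract goes through a Map. The object name OccurrencesSketch is made up.

```scala
object OccurrencesSketch {
  type Occurrences = List[(Char, Int)]

  def wordOccurrences(w: String): Occurrences =
    w.toLowerCase.groupBy(c => c).map { case (c, cs) => (c, cs.length) }.toList.sorted

  // All subsets: for each character pick a count between 0 and its frequency.
  // Working from the right keeps every subset sorted by character.
  def combinations(occurrences: Occurrences): List[Occurrences] =
    occurrences.foldRight(List[Occurrences](Nil)) { case ((char, count), acc) =>
      for {
        rest <- acc
        n <- 0 to count
      } yield if (n == 0) rest else (char, n) :: rest
    }

  // Assumes y is a subset of x; characters whose count drops to zero are removed.
  def subtract(x: Occurrences, y: Occurrences): Occurrences = {
    val toRemove = y.toMap
    x.map { case (c, n) => (c, n - toRemove.getOrElse(c, 0)) }.filter(_._2 > 0)
  }
}
```

For List(('a', 2), ('b', 2)) combinations returns the nine subsets listed in the comment above (in some order), and subtract(List(('a', 3)), List(('a', 1))) returns List(('a', 2)).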

Code is also on git.

References (and special thanks):
[1]. Codatlas;
[2]. chancila (gitHub).

Sunday, October 4, 2015

Scala note 4: Huffman Coding

This post is based on Coursera's Scala course homework for week 4 and week 5. The whole problem can be found here.

Some notes:

Case classes: they are regular classes which export their constructor parameters and which provide a recursive decomposition mechanism via pattern matching.
Pattern matching: allows matching on any sort of data with a first-match policy.

List.groupBy: Partitions this traversable collection into a map of traversable collections according to some function.

/: or foldLeft: applies a binary operator to a start value and all elements of this sequence, going left to right. (z /: xs)(op) is the same as xs.foldLeft(z)(op).

Map.getOrElse(key, default): Returns the value associated with a key, or a default value if the key is not contained in the map.

List.sortBy: sorts the sequence according to the Ordering which results from transforming an implicitly given Ordering with a transform function.

List.flatMap: Builds a new collection by applying a function to all elements of this list and flattening the resulting collections into one list.

List.map: Builds a new collection by applying a function to all elements of this list.

See here for the difference between flatMap and map.
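
A quick sketch of these methods on a small list; the values and the object name CollectionNotesSketch are made up for illustration:

```scala
object CollectionNotesSketch {
  val words = List("tea", "eat", "tree")

  // groupBy: partition the list by a key function.
  val byLength: Map[Int, List[String]] = words.groupBy(_.length)

  // foldLeft: combine a start value with each element, left to right.
  val totalLength: Int = words.foldLeft(0)((acc, w) => acc + w.length)

  // Map.getOrElse: fall back to a default when the key is missing.
  val ofLengthFive: List[String] = byLength.getOrElse(5, Nil)

  // sortBy: order by a derived key (here: length first, then the word itself).
  val sorted: List[String] = words.sortBy(w => (w.length, w))

  // map keeps one result per element; flatMap flattens the nested results.
  val nested: List[List[Char]] = words.map(_.toList)
  val flat: List[Char] = words.flatMap(_.toList)
}
```

byLength is Map(3 -> List(tea, eat), 4 -> List(tree)), totalLength is 10, and sorted is List(eat, tea, tree). nested has three inner lists, while flat is a single list of ten characters.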

Documentation on List can be found here.

object Huffman {

  /**
   * A huffman code is represented by a binary tree.
   *
   * Every `Leaf` node of the tree represents one character of the alphabet that the tree can encode.
   * The weight of a `Leaf` is the frequency of appearance of the character.
   *
   * The branches of the huffman tree, the `Fork` nodes, represent a set containing all the characters
   * present in the leaves below it. The weight of a `Fork` node is the sum of the weights of these
   * leaves.
   */
  abstract class CodeTree
  case class Fork(left: CodeTree, right: CodeTree, chars: List[Char], weight: Int) extends CodeTree
  case class Leaf(char: Char, weight: Int) extends CodeTree



  // Part 1: Basics

  def weight(tree: CodeTree): Int = tree match {
    case Fork(left, right, chars, w) =>  w
    case Leaf(char, weight) => weight
  }
  def chars(tree: CodeTree): List[Char] = tree match {
    case Fork(left, right, chars, weight) => chars
    case Leaf(char, weight) => List(char)
  }

  def makeCodeTree(left: CodeTree, right: CodeTree) =
    Fork(left, right, chars(left) ::: chars(right), weight(left) + weight(right))



  // Part 2: Generating Huffman trees

  /**
   * In this assignment, we are working with lists of characters. This function allows
   * you to easily create a character list from a given string.
   */
  def string2Chars(str: String): List[Char] = str.toList

  /**
   * This function computes for each unique character in the list `chars` the number of
   * times it occurs. For example, the invocation
   *
   *   times(List('a', 'b', 'a'))
   *
   * should return the following (the order of the resulting list is not important):
   *
   *   List(('a', 2), ('b', 1))
   *
   * The type `List[(Char, Int)]` denotes a list of pairs, where each pair consists of a
   * character and an integer. Pairs can be constructed easily using parentheses:
   *
   *   val pair: (Char, Int) = ('c', 1)
   *
   * In order to access the two elements of a pair, you can use the accessors `_1` and `_2`:
   *
   *   val theChar = pair._1
   *   val theInt  = pair._2
   *
   * Another way to deconstruct a pair is using pattern matching:
   *
   *   pair match {
   *     case (theChar, theInt) =>
   *       println("character is: "+ theChar)
   *       println("integer is  : "+ theInt)
   *   }
   */
  def times(chars: List[Char]): List[(Char, Int)] = {
    /* groupBy: partitions the traversable collection into a map of traversable collections
     * according to some function.
     * chars.groupBy(x => x) returns a map from each char to the list of its occurrences:
     * Map(char -> List(char, char, ...))
     */
    chars.groupBy(x => x).map(t => (t._1, t._2.length)).iterator.toList
  }
  /**
   * using map
   */
  def times2(chars: List[Char]): List[(Char, Int)] = {
    def iterate(map: Map[Char, Int], c: Char) = {
      val count = (map get c).getOrElse(0) + 1
      map + ((c, count))
    }
    // /: alternative of chars foldLeft Map[Char, Int]()
    (Map[Char, Int]() /: chars)(iterate).iterator.toList
  }
  


  /**
   * Returns a list of `Leaf` nodes for a given frequency table `freqs`.
   *
   * The returned list should be ordered by ascending weights (i.e. the
   * head of the list should have the smallest weight), where the weight
   * of a leaf is the frequency of the character.
   */
  def makeOrderedLeafList(freqs: List[(Char, Int)]): List[Leaf] = 
    freqs.sortBy(t => (t._2, t._1)).map(leaf => Leaf(leaf._1,leaf._2 ))
    

  /**
   * Checks whether the list `trees` contains only one single code tree.
   */
  def singleton(trees: List[CodeTree]): Boolean = 
    trees.length == 1

  /**
   * The parameter `trees` of this function is a list of code trees ordered
   * by ascending weights.
   *
   * This function takes the first two elements of the list `trees` and combines
   * them into a single `Fork` node. This node is then added back into the
   * remaining elements of `trees` at a position such that the ordering by weights
   * is preserved.
   *
   * If `trees` is a list of less than two elements, that list should be returned
   * unchanged.
   */
  def combine(trees: List[CodeTree]): List[CodeTree] = trees match {
    case left :: right :: rest => (makeCodeTree(left, right) :: rest)
      .sortBy(t => weight(t))
    case _ => trees
    }

  /**
   * This function will be called in the following way:
   *
   *   until(singleton, combine)(trees)
   *
   * where `trees` is of type `List[CodeTree]`, `singleton` and `combine` refer to
   * the two functions defined above.
   *
   * In such an invocation, `until` should call the two functions until the list of
   * code trees contains only one single tree, and then return that singleton list.
   *
   * Hint: before writing the implementation,
   *  - start by defining the parameter types such that the above example invocation
   *    is valid. The parameter types of `until` should match the argument types of
   *    the example invocation. Also define the return type of the `until` function.
   *  - try to find sensible parameter names for `xxx`, `yyy` and `zzz`.
   */
  def until(singleton: List[CodeTree] => Boolean, combine: List[CodeTree] => List[CodeTree])
  (trees: List[CodeTree]): List[CodeTree] = {
    if(singleton(trees)) trees
    else until(singleton, combine)(combine(trees))
  }

  /**
   * This function creates a code tree which is optimal to encode the text `chars`.
   *
   * The parameter `chars` is an arbitrary text. This function extracts the character
   * frequencies from that text and creates a code tree based on them.
   */
  def createCodeTree(chars: List[Char]): CodeTree = 
    until(singleton, combine)(makeOrderedLeafList(times(chars))).head
 



  // Part 3: Decoding

  type Bit = Int

  /**
   * This function decodes the bit sequence `bits` using the code tree `tree` and returns
   * the resulting list of characters.
   */
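  def decode(tree: CodeTree, bits: List[Bit]): List[Char] = {
    // Walk down from the root; after emitting a character, restart at the root
    // with the remaining bits (recursing on the leaf itself would never terminate).
    def traverse(node: CodeTree, remaining: List[Bit]): List[Char] = node match {
      case Leaf(c, _) if remaining.isEmpty => List(c)
      case Leaf(c, _) => c :: traverse(tree, remaining)
      case Fork(left, right, _, _) =>
        if (remaining.head == 0) traverse(left, remaining.tail)
        else traverse(right, remaining.tail)
    }
    traverse(tree, bits)
  }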

  /**
   * A Huffman coding tree for the French language.
   * Generated from the data given at
   *   http://fr.wikipedia.org/wiki/Fr%C3%A9quence_d%27apparition_des_lettres_en_fran%C3%A7ais
   */
  val frenchCode: CodeTree = Fork(Fork(Fork(Leaf('s',121895),Fork(Leaf('d',56269),Fork(Fork(Fork(Leaf('x',5928),Leaf('j',8351),List('x','j'),14279),Leaf('f',16351),List('x','j','f'),30630),Fork(Fork(Fork(Fork(Leaf('z',2093),Fork(Leaf('k',745),Leaf('w',1747),List('k','w'),2492),List('z','k','w'),4585),Leaf('y',4725),List('z','k','w','y'),9310),Leaf('h',11298),List('z','k','w','y','h'),20608),Leaf('q',20889),List('z','k','w','y','h','q'),41497),List('x','j','f','z','k','w','y','h','q'),72127),List('d','x','j','f','z','k','w','y','h','q'),128396),List('s','d','x','j','f','z','k','w','y','h','q'),250291),Fork(Fork(Leaf('o',82762),Leaf('l',83668),List('o','l'),166430),Fork(Fork(Leaf('m',45521),Leaf('p',46335),List('m','p'),91856),Leaf('u',96785),List('m','p','u'),188641),List('o','l','m','p','u'),355071),List('s','d','x','j','f','z','k','w','y','h','q','o','l','m','p','u'),605362),Fork(Fork(Fork(Leaf('r',100500),Fork(Leaf('c',50003),Fork(Leaf('v',24975),Fork(Leaf('g',13288),Leaf('b',13822),List('g','b'),27110),List('v','g','b'),52085),List('c','v','g','b'),102088),List('r','c','v','g','b'),202588),Fork(Leaf('n',108812),Leaf('t',111103),List('n','t'),219915),List('r','c','v','g','b','n','t'),422503),Fork(Leaf('e',225947),Fork(Leaf('i',115465),Leaf('a',117110),List('i','a'),232575),List('e','i','a'),458522),List('r','c','v','g','b','n','t','e','i','a'),881025),List('s','d','x','j','f','z','k','w','y','h','q','o','l','m','p','u','r','c','v','g','b','n','t','e','i','a'),1486387)

  /**
   * What does the secret message say? Can you decode it?
   * For the decoding use the `frenchCode' Huffman tree defined above.
   */
  val secret: List[Bit] = List(0,0,1,1,1,0,1,0,1,1,1,0,0,1,1,0,1,0,0,1,1,0,1,0,1,1,0,0,1,1,1,1,1,0,1,0,1,1,0,0,0,0,1,0,1,1,1,0,0,1,0,0,1,0,0,0,1,0,0,0,1,0,1)

  /**
   * Write a function that returns the decoded secret
   */
  def decodedSecret: List[Char] = 
    decode(frenchCode, secret)



  // Part 4a: Encoding using Huffman tree

  /**
   * This function encodes `text` using the code tree `tree`
   * into a sequence of bits.
   */
  def encode(tree: CodeTree)(text: List[Char]): List[Bit] = {
    def encodeChar(tree: CodeTree)(char: Char): List[Bit] = tree match {
      case Leaf(_, _) => List()
      // test the character being encoded, not the head of the whole text
      case Fork(left, right, _, _) => if (chars(left).contains(char)) 0 :: encodeChar(left)(char)
                                      else 1 :: encodeChar(right)(char)
    }
    text flatMap(encodeChar(tree))
  }


  // Part 4b: Encoding using code table

  type CodeTable = List[(Char, List[Bit])]

  /**
   * This function returns the bit sequence that represents the character `char` in
   * the code table `table`.
   */
  def codeBits(table: CodeTable)(char: Char): List[Bit] = 
    table(table.indexWhere(x => x._1 == char))._2

  /**
   * Given a code tree, create a code table which contains, for every character in the
   * code tree, the sequence of bits representing that character.
   *
   * Hint: think of a recursive solution: every sub-tree of the code tree `tree` is itself
   * a valid code tree that can be represented as a code table. Using the code tables of the
   * sub-trees, think of how to build the code table for the entire tree.
   */
  def convert(tree: CodeTree): CodeTable = tree match {
    case Leaf(char, _) => List((char, List()))
    case Fork(left, right, chars, _) => mergeCodeTables(convert(left), convert(right))
  }

  /**
   * This function takes two code tables and merges them into one. Depending on how you
   * use it in the `convert` method above, this merge method might also do some transformations
   * on the two parameter code tables.
   */
  def mergeCodeTables(a: CodeTable, b: CodeTable): CodeTable = 
    a.map(code => (code._1, 0 :: code._2)) ::: b.map(code => (code._1, 1 :: code._2))

  /**
   * This function encodes `text` according to the code tree `tree`.
   *
   * To speed up the encoding process, it first converts the code tree to a code table
   * and then uses it to perform the actual encoding.
   */
  def quickEncode(tree: CodeTree)(text: List[Char]): List[Bit] = 
    text flatMap (codeBits(convert(tree)))
}
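
To sanity-check the encoding functions above, here is a small self-contained sketch. It rebuilds a minimal version of the types on a made-up three-letter tree and checks that the tree-based and table-based encoders agree and that decoding inverts them. The names `MiniHuffman` and `sample`, the weights, and the use of a `Map` for the table are my assumptions for illustration, not part of the assignment.

```scala
object MiniHuffman {
  type Bit = Int
  type CodeTable = List[(Char, List[Bit])]

  sealed abstract class CodeTree
  case class Fork(left: CodeTree, right: CodeTree, chars: List[Char], weight: Int) extends CodeTree
  case class Leaf(char: Char, weight: Int) extends CodeTree

  def chars(tree: CodeTree): List[Char] = tree match {
    case Leaf(c, _)        => List(c)
    case Fork(_, _, cs, _) => cs
  }

  // Walk the tree once per character: 0 = go left, 1 = go right.
  def encode(tree: CodeTree)(text: List[Char]): List[Bit] = {
    def encodeChar(t: CodeTree)(c: Char): List[Bit] = t match {
      case Leaf(_, _) => Nil
      case Fork(l, r, _, _) =>
        if (chars(l).contains(c)) 0 :: encodeChar(l)(c)
        else 1 :: encodeChar(r)(c)
    }
    text.flatMap(encodeChar(tree))
  }

  // Follow the bits from the root; emit a character at each leaf, then restart.
  // Assumes the root of `tree` is a Fork.
  def decode(tree: CodeTree, bits: List[Bit]): List[Char] = {
    def loop(t: CodeTree, rest: List[Bit]): List[Char] = t match {
      case Leaf(c, _) => if (rest.isEmpty) List(c) else c :: loop(tree, rest)
      case Fork(l, r, _, _) => rest match {
        case Nil     => Nil
        case b :: bs => loop(if (b == 0) l else r, bs)
      }
    }
    loop(tree, bits)
  }

  // Prefix every code of the left subtree with 0 and of the right subtree with 1.
  def convert(tree: CodeTree): CodeTable = tree match {
    case Leaf(c, _) => List((c, Nil))
    case Fork(l, r, _, _) =>
      convert(l).map { case (c, bits) => (c, 0 :: bits) } :::
        convert(r).map { case (c, bits) => (c, 1 :: bits) }
  }

  // Build the table once, then encode by lookup only.
  def quickEncode(tree: CodeTree)(text: List[Char]): List[Bit] = {
    val table = convert(tree).toMap
    text.flatMap(c => table(c))
  }

  // Hypothetical tree giving the codes a = 0, b = 10, c = 11.
  val sample: CodeTree =
    Fork(Leaf('a', 3), Fork(Leaf('b', 1), Leaf('c', 1), List('b', 'c'), 2), List('a', 'b', 'c'), 5)
}
```

With this tree, "abca" should encode to 0 10 11 0, and decoding those bits should give the text back.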

Check code on git: Hoffman.scala.

People help the people

The video has nothing to do with this post. It's my all-time favorite, and I was listening to it while writing this post. "If I had a brain, I'd be cold as a stone and rich as a fool, that turned all those good hearts away."

Saturday, September 19, 2015

Scala note 3: OOP (TweetSet)

The week 3 class focuses on classes and objects in Scala. The OOP concepts are similar to those in Java. However, where Java uses interfaces, Scala uses traits to serve a similar purpose.

One feature in Scala that I have never met in Java is lazy val. The difference between val and lazy val is that a val is evaluated when it is defined, while the latter is evaluated only the first time it is accessed. A lazy val is evaluated once and only once: every later access returns the same cached value. Details can be found here and here.

Moreover, def defines a method whose body is re-evaluated every time it is called. val defines a value: its right-hand side is evaluated once, at the point of definition, and the result is stored. See this post for more details.
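
The three evaluation strategies can be compared by counting how often each right-hand side actually runs. This is only a minimal sketch: `EvalDemo` and its counters are names I made up for the illustration.

```scala
object EvalDemo {
  // Count how many times each right-hand side is actually executed.
  var valRuns, lazyRuns, defRuns = 0

  val eager: Int = { valRuns += 1; 1 }           // runs once, when EvalDemo is initialized
  lazy val onDemand: Int = { lazyRuns += 1; 2 }  // runs once, on first access
  def everyTime: Int = { defRuns += 1; 3 }       // runs on every call

  // Touches each of the three definitions twice.
  def sumTwice: Int =
    (eager + onDemand + everyTime) + (eager + onDemand + everyTime)
}
```

Before `sumTwice` is called, only the val has run. After one call, the val and the lazy val have each run once and the def has run twice.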

The homework asks us to implement an abstract class called TweetSet. It is implemented as a binary search tree and has two concrete subclasses, Empty and NonEmpty. I am not going to put the whole assignment here due to its length, but you can always go here for the problem.

package objsets

import common._
import TweetReader._

/**
 * A class to represent tweets.
 */
class Tweet(val user: String, val text: String, val retweets: Int) {
  def moreRetweets(that: Tweet) : Tweet =
    if(this.retweets >= that.retweets) this
    else that
  override def toString: String =
    "User: " + user + "\n" +
    "Text: " + text + " [" + retweets + "]"
}

/**
 * This represents a set of objects of type `Tweet` in the form of a binary search
 * tree. Every branch in the tree has two children (two `TweetSet`s). There is an
 * invariant which always holds: for every branch `b`, all elements in the left
 * subtree are smaller than the tweet at `b`. The elements in the right subtree are
 * larger.
 *
 * Note that the above structure requires us to be able to compare two tweets (we
 * need to be able to say which of two tweets is larger, or if they are equal). In
 * this implementation, the equality / order of tweets is based on the tweet's text
 * (see `def incl`). Hence, a `TweetSet` could not contain two tweets with the same
 * text from different users.
 *
 *
 * The advantage of representing sets as binary search trees is that the elements
 * of the set can be found quickly. If you want to learn more you can take a look
 * at the Wikipedia page [1], but this is not necessary in order to solve this
 * assignment.
 *
 * [1] http://en.wikipedia.org/wiki/Binary_search_tree
 */
abstract class TweetSet {

  /**
   * This method takes a predicate and returns a subset of all the elements
   * in the original set for which the predicate is true.
   *
   * Question: Can we implement this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
  def filter(p: Tweet => Boolean): TweetSet = filterAcc(p, new Empty)

  /**
   * This is a helper method for `filter` that propagates the accumulated tweets.
   */
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet

  /**
   * Returns a new `TweetSet` that is the union of `TweetSet`s `this` and `that`.
   *
   * Question: Should we implement this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
  def union(that: TweetSet): TweetSet

  /**
   * Returns the tweet from this set which has the greatest retweet count.
   *
   * Calling `mostRetweeted` on an empty set should throw an exception of
   * type `java.util.NoSuchElementException`.
   *
   * Question: Should we implement this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
  def mostRetweeted: Tweet

  /**
   * Returns a list containing all tweets of this set, sorted by retweet count
   * in descending order. In other words, the head of the resulting list should
   * have the highest retweet count.
   *
   * Hint: the method `remove` on TweetSet will be very useful.
   * Question: Should we implement this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
  def descendingByRetweet: TweetList


  /**
   * The following methods are already implemented
   */

  /**
   * Returns a new `TweetSet` which contains all elements of this set, and
   * the new element `tweet` in case it does not already exist in this set.
   *
   * If `this.contains(tweet)`, the current set is returned.
   */
  def incl(tweet: Tweet): TweetSet

  /**
   * Returns a new `TweetSet` which excludes `tweet`.
   */
  def remove(tweet: Tweet): TweetSet

  /**
   * Tests if `tweet` exists in this `TweetSet`.
   */
  def contains(tweet: Tweet): Boolean

  /**
   * This method takes a function and applies it to every element in the set.
   */
  def foreach(f: Tweet => Unit): Unit
  
  /**
   * Tests if this set is empty.
   */
  def isEmpty: Boolean
}

class Empty extends TweetSet {

  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = acc
  
  def union(that: TweetSet): TweetSet = that
  
  def mostRetweeted: Tweet = throw new NoSuchElementException("Empty TweetSet")
 
  def isEmpty: Boolean = true
  
  def descendingByRetweet: TweetList = Nil

  /**
   * The following methods are already implemented
   */

  def contains(tweet: Tweet): Boolean = false

  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)

  def remove(tweet: Tweet): TweetSet = this

  def foreach(f: Tweet => Unit): Unit = ()
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {

  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
    if (p(this.elem)) 
      left.filterAcc(p, right.filterAcc(p, acc.incl(this.elem)))
    else
      left.filterAcc(p, right.filterAcc(p, acc))
  }
  def union(that: TweetSet): TweetSet =
    (left.union(right)).union(that).incl(elem)
  
  def isEmpty: Boolean = false
   
  def mostRetweeted: Tweet = 
    if (left.isEmpty && right.isEmpty) elem
    else if (left.isEmpty) elem.moreRetweets(right.mostRetweeted)
    else if (right.isEmpty) elem.moreRetweets(left.mostRetweeted)
    else elem.moreRetweets(left.mostRetweeted).moreRetweets(right.mostRetweeted)
    
  def descendingByRetweet: TweetList = {
    // Find the maximum once instead of traversing the tree twice per step.
    val top = mostRetweeted
    new Cons(top, remove(top).descendingByRetweet)
  }

  /**
   * The following methods are already implemented
   */

  def contains(x: Tweet): Boolean =
    if (x.text < elem.text) left.contains(x)
    else if (elem.text < x.text) right.contains(x)
    else true

  def incl(x: Tweet): TweetSet = {
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this
  }

  def remove(tw: Tweet): TweetSet =
    if (tw.text < elem.text) new NonEmpty(elem, left.remove(tw), right)
    else if (elem.text < tw.text) new NonEmpty(elem, left, right.remove(tw))
    else left.union(right)

  def foreach(f: Tweet => Unit): Unit = {
    f(elem)
    left.foreach(f)
    right.foreach(f)
  }
}

trait TweetList {
  def head: Tweet
  def tail: TweetList
  def isEmpty: Boolean
  def foreach(f: Tweet => Unit): Unit =
    if (!isEmpty) {
      f(head)
      tail.foreach(f)
    }
}

object Nil extends TweetList {
  def head = throw new java.util.NoSuchElementException("head of EmptyList")
  def tail = throw new java.util.NoSuchElementException("tail of EmptyList")
  def isEmpty = true
}

class Cons(val head: Tweet, val tail: TweetList) extends TweetList {
  def isEmpty = false
}


object GoogleVsApple {
  val google = List("android", "Android", "galaxy", "Galaxy", "nexus", "Nexus")
  val apple = List("ios", "iOS", "iphone", "iPhone", "ipad", "iPad")

  //TweetReader.allTweets foreach println
  lazy val googleTweets: TweetSet = TweetReader.allTweets.filter(tweet => google.exists(keyWord => tweet.text.contains(keyWord)))
  lazy val appleTweets: TweetSet = TweetReader.allTweets.filter(tweet => apple.exists(keyWord => tweet.text.contains(keyWord)))


  /**
   * A list of all tweets mentioning a keyword from either apple or google,
   * sorted by the number of retweets.
   */
  lazy val trending: TweetList = googleTweets.union(appleTweets).descendingByRetweet
}

object Main extends App {
  // Print the trending tweets
  GoogleVsApple.trending foreach println
}

I only include the part that we need to implement here. Thanks to its functional nature, it is fairly easy to understand. However, I do realize that because of the heavy recursion the program is slow, like really slow, if we run it on all the tweets included in the test program. Or is it probably just my bad implementation?
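
One likely culprit is `union`: `(left.union(right)).union(that).incl(elem)` first builds a merged tree out of the two subtrees and then recurses over that whole merged tree again. A commonly suggested alternative is to push `elem` into `that` and fold the subtrees in one after another, so the receiver shrinks on every call. Below is a sketch of that idea on a stand-in tree keyed on `Int` instead of tweet text; `UnionSketch`, `IntSet`, `toList` and `fromList` are my own hypothetical names, and I have not timed this against the assignment's test data.

```scala
object UnionSketch {
  // Stand-in for TweetSet, keyed on Int instead of tweet text (an assumption for brevity).
  sealed abstract class IntSet {
    def incl(x: Int): IntSet
    def contains(x: Int): Boolean
    def union(that: IntSet): IntSet
    def toList: List[Int]
  }

  case object Empty extends IntSet {
    def incl(x: Int): IntSet = NonEmpty(x, Empty, Empty)
    def contains(x: Int): Boolean = false
    def union(that: IntSet): IntSet = that
    def toList: List[Int] = Nil
  }

  case class NonEmpty(elem: Int, left: IntSet, right: IntSet) extends IntSet {
    def incl(x: Int): IntSet =
      if (x < elem) NonEmpty(elem, left.incl(x), right)
      else if (elem < x) NonEmpty(elem, left, right.incl(x))
      else this

    def contains(x: Int): Boolean =
      if (x < elem) left.contains(x)
      else if (elem < x) right.contains(x)
      else true

    // Add elem to `that`, then fold in the right subtree, then the left one.
    // No intermediate left-union-right tree is built.
    def union(that: IntSet): IntSet =
      left.union(right.union(that.incl(elem)))

    // In-order traversal, so the result is sorted.
    def toList: List[Int] = left.toList ::: elem :: right.toList
  }

  def fromList(xs: List[Int]): IntSet = xs.foldLeft(Empty: IntSet)(_ incl _)
}
```

In the TweetSet code the same shape would read `left.union(right.union(that.incl(elem)))` inside `NonEmpty`, with `Empty.union` unchanged.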

The whole assignment can be found on git.

Monday, September 7, 2015

Scala note 2: Functional Sets

Mathematically, we call the function which takes an integer as argument and which returns a boolean indicating whether the given integer belongs to a set, the characteristic function of the set. For example, we can characterize the set of negative integers by the characteristic function (x: Int) => x < 0.

Therefore, we choose to represent a set by its characteristic function and define a type alias for this representation:

type Set = Int => Boolean

Using this representation, we define a function that tests for the presence of a value in a set:

def contains(s: Set, elem: Int): Boolean = s(elem)

2.1 Basic Functions on Sets

Let’s start by implementing basic functions on sets.

Define a function which creates a singleton set from one integer value: the set represents the set of the one given element. Its signature is as follows:
```
def singletonSet(elem: Int): Set
```
Now that we have a way to create singleton sets, we want to define functions that allow us to build bigger sets from smaller ones.
Define the functions union, intersect, and diff, which take two sets and return, respectively, their union, intersection and difference. diff(s, t) returns a set which contains all the elements of the set s that are not in the set t. These functions have the following signatures:
```
def union(s: Set, t: Set): Set
def intersect(s: Set, t: Set): Set
def diff(s: Set, t: Set): Set
```
Define the function filter which selects only the elements of a set that are accepted by a given predicate p. The filtered elements are returned as a new set. The signature of filter is as follows:
```
def filter(s: Set, p: Int => Boolean): Set
```

2.2 Queries and Transformations on Sets

In this part, we are interested in functions used to make requests on elements of a set. The first function tests whether a given predicate is true for all elements of the set. This forall function has the following signature:

def forall(s: Set, p: Int => Boolean): Boolean

Note that there is no direct way to find which elements are in a set. contains only tells us whether a given element is included. Thus, if we wish to do something to all elements of a set, we have to iterate over all integers, test each one for membership, and act on it if it is included. Here, we assume that an integer x satisfies -1000 <= x <= 1000 in order to limit the search space.

Implement forall using linear recursion. For this, use a helper function nested in forall. Its structure is as follows (replace the ???):

def forall(s: Set, p: Int => Boolean): Boolean = {
 def iter(a: Int): Boolean = {
   if (???) ???
   else if (???) ???
   else iter(???)
 }
 iter(???)
}

Using forall, implement a function exists which tests whether a set contains at least one element for which the given predicate is true. Note that the functions forall and exists behave like the universal and existential quantifiers of first-order logic.
```
def exists(s: Set, p: Int => Boolean): Boolean
```
Finally, write a function map which transforms a given set into another one by applying to each of its elements the given function. map has the following signature:
```
def map(s: Set, f: Int => Int): Set
```

Coursera Scala course week 2 homework. The original question can be found here.

The question asks us to define a type Set that is represented by its characteristic function, which maps an integer to a boolean.

1. define Singleton
This question requires us to create a set containing a single integer: given an integer elem, return the characteristic function that is true only for elem:

def singletonSet(elem: Int): Set = (x: Int) => x == elem

This maps an integer x to the boolean x == elem. In other words, the set contains x exactly when x equals elem, which defines the singleton set that contains only elem.

2. define Union, Intersect and Difference

  def union(s: Set, t: Set): Set = (x: Int) => s(x) || t(x)
  def intersect(s: Set, t: Set): Set = (x: Int) => s(x) && t(x)
  def diff(s: Set, t: Set): Set = (x: Int) => s(x) && !t(x)

s(x) means that s contains x. Thus, x is in the union if either s or t contains x. For intersect, both s and t must contain x. For diff, s must contain x but t must not.

3. define filter

def filter(s: Set, p: Int => Boolean): Set = (x: Int) => s(x) && p(x)

filter returns the set of every x such that s contains x and x satisfies the predicate p.
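
To see these definitions at work, here is a short sketch that builds {1, 2, 3} and {3, 4} from singletons and lists the members of each derived set. The wrapper object `FunSetsDemo`, the sample sets and the helper `elems` (which only scans -10 to 10) are my own additions for inspection, not part of the assignment.

```scala
object FunSetsDemo {
  type Set = Int => Boolean

  def contains(s: Set, elem: Int): Boolean = s(elem)
  def singletonSet(elem: Int): Set = (x: Int) => x == elem
  def union(s: Set, t: Set): Set = (x: Int) => s(x) || t(x)
  def intersect(s: Set, t: Set): Set = (x: Int) => s(x) && t(x)
  def diff(s: Set, t: Set): Set = (x: Int) => s(x) && !t(x)
  def filter(s: Set, p: Int => Boolean): Set = (x: Int) => s(x) && p(x)

  // {1, 2, 3} and {3, 4}, built only from singletons and union.
  val s123: Set = union(union(singletonSet(1), singletonSet(2)), singletonSet(3))
  val s34: Set = union(singletonSet(3), singletonSet(4))

  // Hypothetical helper: list the members of s that fall in -10..10.
  def elems(s: Set): List[Int] = (-10 to 10).filter(s).toList
}
```

For example, `elems(diff(s123, s34))` lists the members of {1, 2, 3} that are not in {3, 4}, and `elems(filter(s123, _ % 2 == 1))` keeps only the odd ones.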

4. forall (∀ )

val bound = 1000

  def forall(s: Set, p: Int => Boolean): Boolean = {
    def iter(a: Int): Boolean = {
      if (a > bound) true
      else if (s(a) && !p(a)) false
      else iter(a+1)
    }
    iter(-bound)
  }

This is nothing special. Starting from -bound, return false as soon as some a in s fails p. If the iteration gets past bound without finding one, return true.

5. exists(∃ )

def exists(s: Set, p: Int => Boolean): Boolean = !forall(s, x => !p(x))

This one is tricky. The solution says that not every element in s satisfies !p, which in turn means that at least one element in s satisfies p.
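
The duality is easier to trust after trying it on a couple of sets, including the empty one, where forall is vacuously true and exists is false. A minimal sketch, assuming the same bound of 1000; the object `QuantifierDemo`, the sample sets and the `tailrec` annotation are my additions:

```scala
object QuantifierDemo {
  type Set = Int => Boolean
  val bound = 1000

  def forall(s: Set, p: Int => Boolean): Boolean = {
    @annotation.tailrec
    def iter(a: Int): Boolean =
      if (a > bound) true               // checked the whole range, no counterexample
      else if (s(a) && !p(a)) false     // a is in s but fails p
      else iter(a + 1)
    iter(-bound)
  }

  // "Some element satisfies p" is the same as "not every element satisfies !p".
  def exists(s: Set, p: Int => Boolean): Boolean = !forall(s, x => !p(x))

  val evens: Set = x => x % 2 == 0
  val empty: Set = _ => false
}
```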

6. map

def map(s: Set, f: Int => Int): Set = (y: Int) => exists(s, x => f(x) == y)

This is similar to how we defined the singleton set: for any y, if there exists an element x in s such that f(x) equals y, then y is in the new set.
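
A quick check of map on a small set, as a sketch only. To keep it short, `exists` is written here as a direct range scan, which I believe is equivalent to the forall-based version over the same bound; `MapDemo`, `small` and `doubled` are names made up for the example.

```scala
object MapDemo {
  type Set = Int => Boolean
  val bound = 1000

  // Shortcut for this sketch: scan the bounded range directly.
  def exists(s: Set, p: Int => Boolean): Boolean =
    (-bound to bound).exists(x => s(x) && p(x))

  // y is in the mapped set if some x in s is sent to y by f.
  def map(s: Set, f: Int => Int): Set = (y: Int) => exists(s, x => f(x) == y)

  val small: Set = x => x >= 1 && x <= 3    // {1, 2, 3}
  val doubled: Set = map(small, _ * 2)      // expected to behave like {2, 4, 6}
}
```

Note that every membership test on a mapped set rescans the whole range, so nesting several calls to map gets expensive quickly.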

Source on git.
