AdSense

Saturday, September 19, 2015

Scala note 3: OOP (TweetSet)

Week 3 class focuses on classes and objects in Scala. The OOP concepts are similar to that in Java. However, Scala uses traits instead of interface in Java to serve similar functions.

One feature in Scala that I have never met in Java is lazy val. The difference between val and lazy val is that the later will only be executed at the first time it is accessed. Lazy val is executed once and only once. This means next time the lazy val is called, it will return the same value. Details can be found here and here.

Moreover, def defines a function and every time the function is called, it creates new function. val defines a value, it returns a value that is evaluated from a function. See this post for more details.

The homework asks us to implement an abstract class called TweetSet. It should be implemented as a binary tree structure and contains two concrete subclass Empty and NonEmpty. I am not going to put the whole assignment here due to the length, but you can always go here for the problem.

package objsets

import common._
import TweetReader._

/**
 * A class to represent tweets.
 */
class Tweet(val user: String, val text: String, val retweets: Int) {
  def moreRetweets(that: Tweet) : Tweet =
    if(this.retweets >= that.retweets) this
    else that
  override def toString: String =
    "User: " + user + "\n" +
    "Text: " + text + " [" + retweets + "]"
}

/**
 * This represents a set of objects of type `Tweet` in the form of a binary search
 * tree. Every branch in the tree has two children (two `TweetSet`s). There is an
 * invariant which always holds: for every branch `b`, all elements in the left
 * subtree are smaller than the tweet at `b`. The elements in the right subtree are
 * larger.
 *
 * Note that the above structure requires us to be able to compare two tweets (we
 * need to be able to say which of two tweets is larger, or if they are equal). In
 * this implementation, the equality / order of tweets is based on the tweet's text
 * (see `def incl`). Hence, a `TweetSet` could not contain two tweets with the same
 * text from different users.
 *
 *
 * The advantage of representing sets as binary search trees is that the elements
 * of the set can be found quickly. If you want to learn more you can take a look
 * at the Wikipedia page [1], but this is not necessary in order to solve this
 * assignment.
 *
 * [1] http://en.wikipedia.org/wiki/Binary_search_tree
 */
abstract class TweetSet {

  /**
   * This method takes a predicate and returns a subset of all the elements
   * in the original set for which the predicate is true.
   *
   * Question: Can we implment this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
  def filter(p: Tweet => Boolean): TweetSet = filterAcc(p, new Empty)

  /**
   * This is a helper method for `filter` that propagetes the accumulated tweets.
   */
  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet

  /**
   * Returns a new `TweetSet` that is the union of `TweetSet`s `this` and `that`.
   *
   * Question: Should we implment this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
   def union(that: TweetSet): TweetSet 

  /**
   * Returns the tweet from this set which has the greatest retweet count.
   *
   * Calling `mostRetweeted` on an empty set should throw an exception of
   * type `java.util.NoSuchElementException`.
   *
   * Question: Should we implment this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
  def mostRetweeted: Tweet

  /**
   * Returns a list containing all tweets of this set, sorted by retweet count
   * in descending order. In other words, the head of the resulting list should
   * have the highest retweet count.
   *
   * Hint: the method `remove` on TweetSet will be very useful.
   * Question: Should we implment this method here, or should it remain abstract
   * and be implemented in the subclasses?
   */
  def descendingByRetweet: TweetList


  /**
   * The following methods are already implemented
   */

  /**
   * Returns a new `TweetSet` which contains all elements of this set, and the
   * the new element `tweet` in case it does not already exist in this set.
   *
   * If `this.contains(tweet)`, the current set is returned.
   */
  def incl(tweet: Tweet): TweetSet

  /**
   * Returns a new `TweetSet` which excludes `tweet`.
   */
  def remove(tweet: Tweet): TweetSet

  /**
   * Tests if `tweet` exists in this `TweetSet`.
   */
  def contains(tweet: Tweet): Boolean

  /**
   * This method takes a function and applies it to every element in the set.
   */
  def foreach(f: Tweet => Unit): Unit
  
  /**
   * if the set is empty
   */
  def isEmpty: Boolean
}

class Empty extends TweetSet {

  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = acc
  
  def union(that: TweetSet): TweetSet = that
  
  def mostRetweeted: Tweet = throw new NoSuchElementException("Empty TweetSet")
 
  def isEmpty: Boolean = true
  
  def descendingByRetweet: TweetList = Nil

  /**
   * The following methods are already implemented
   */

  def contains(tweet: Tweet): Boolean = false

  def incl(tweet: Tweet): TweetSet = new NonEmpty(tweet, new Empty, new Empty)

  def remove(tweet: Tweet): TweetSet = this

  def foreach(f: Tweet => Unit): Unit = ()
}

class NonEmpty(elem: Tweet, left: TweetSet, right: TweetSet) extends TweetSet {

  def filterAcc(p: Tweet => Boolean, acc: TweetSet): TweetSet = {
    if (p(this.elem)) 
      left.filterAcc(p, right.filterAcc(p, acc.incl(this.elem)))
    else
     left.filterAcc(p, right.filterAcc(p, acc))  
  }
  def union(that: TweetSet): TweetSet =
    (left.union(right)).union(that).incl(elem)
  
  def isEmpty: Boolean = false
   
  def mostRetweeted: Tweet = 
    if (left.isEmpty && right.isEmpty) elem
    else if (left.isEmpty) elem.moreRetweets(right.mostRetweeted)
    else if (right.isEmpty) elem.moreRetweets(left.mostRetweeted)
    else elem.moreRetweets(left.mostRetweeted).moreRetweets(right.mostRetweeted)
    
 def descendingByRetweet: TweetList = 
   new Cons(mostRetweeted, remove(mostRetweeted).descendingByRetweet)

  /**
   * The following methods are already implemented
   */

  def contains(x: Tweet): Boolean =
    if (x.text < elem.text) left.contains(x)
    else if (elem.text < x.text) right.contains(x)
    else true

  def incl(x: Tweet): TweetSet = {
    if (x.text < elem.text) new NonEmpty(elem, left.incl(x), right)
    else if (elem.text < x.text) new NonEmpty(elem, left, right.incl(x))
    else this
  }

  def remove(tw: Tweet): TweetSet =
    if (tw.text < elem.text) new NonEmpty(elem, left.remove(tw), right)
    else if (elem.text < tw.text) new NonEmpty(elem, left, right.remove(tw))
    else left.union(right)

  def foreach(f: Tweet => Unit): Unit = {
    f(elem)
    left.foreach(f)
    right.foreach(f)
  }
}

trait TweetList {
  def head: Tweet
  def tail: TweetList
  def isEmpty: Boolean
  def foreach(f: Tweet => Unit): Unit =
    if (!isEmpty) {
      f(head)
      tail.foreach(f)
    }
}

object Nil extends TweetList {
  def head = throw new java.util.NoSuchElementException("head of EmptyList")
  def tail = throw new java.util.NoSuchElementException("tail of EmptyList")
  def isEmpty = true
}

class Cons(val head: Tweet, val tail: TweetList) extends TweetList {
  def isEmpty = false
}


object GoogleVsApple {
  val google = List("android", "Android", "galaxy", "Galaxy", "nexus", "Nexus")
  val apple = List("ios", "iOS", "iphone", "iPhone", "ipad", "iPad")

  //TweetReader.allTweets foreach println
  lazy val googleTweets: TweetSet = TweetReader.allTweets.filter(tweet => google.exists(keyWord => tweet.text.contains(keyWord)))
  lazy val appleTweets: TweetSet = TweetReader.allTweets.filter(tweet => apple.exists(keyWord => tweet.text.contains(keyWord)))


  /**
   * A list of all tweets mentioning a keyword from either apple or google,
   * sorted by the number of retweets.
   */
  lazy val trending: TweetList = googleTweets.union(appleTweets).descendingByRetweet
}

object Main extends App {
  // Print the trending tweets
 GoogleVsApple.trending foreach println
}

I only include the part that we need to implement here. Due to the functional nature, it is fairly easy to understand. However, I do realize that due to intensive recursion, the program is slow, like really slow if we test for all tweets included in the test program. Or probably just my bad implementation?

The whole assignment can be found on git.

4 comments:

  1. Changing the order of the recursion in the union function definition has a deep impact on performance (complexity goes from O(n^2 * log(n)^2) to O(n * log(n))). Try:

    def union(other: IntSet): IntSet = (left union (right union other)) incl elem

    The same computation finishes much faster.

    Filipe Posteral

    ReplyDelete
    Replies
    1. I've tested your solution and this is much faster but could you give a more detailed explanation on the complexity ?
      Thx

      Delete
  2. The development of artificial intelligence (AI) has propelled more programming architects, information scientists, and different experts to investigate the plausibility of a vocation in machine learning. Notwithstanding, a few newcomers will in general spotlight a lot on hypothesis and insufficient on commonsense application. IEEE final year projects on machine learning In case you will succeed, you have to begin building machine learning projects in the near future.

    Projects assist you with improving your applied ML skills rapidly while allowing you to investigate an intriguing point. Furthermore, you can include projects into your portfolio, making it simpler to get a vocation, discover cool profession openings, and Final Year Project Centers in Chennai even arrange a more significant compensation.


    Data analytics is the study of dissecting crude data so as to make decisions about that data. Data analytics advances and procedures are generally utilized in business ventures to empower associations to settle on progressively Python Training in Chennai educated business choices. In the present worldwide commercial center, it isn't sufficient to assemble data and do the math; you should realize how to apply that data to genuine situations such that will affect conduct. In the program you will initially gain proficiency with the specialized skills, including R and Python dialects most usually utilized in data analytics programming and usage; Python Training in Chennai at that point center around the commonsense application, in view of genuine business issues in a scope of industry segments, for example, wellbeing, promoting and account.

    ReplyDelete
  3. Hey, thanks for the blog article.Really looking forward to read more. Cool.
    hadoop training
    hadoop online training

    ReplyDelete