The Fake Geek's blog: April 2015

Wednesday, April 29, 2015

One headlight

I have this evil idea to reorganize the lyrics of this song I am recently addicted. Honestly, I don't even know why I love this song, probably because of the current mess (thesis, defense, moving...) I am dealing with. I used my old Markov Chain code and it's indeed a beautiful song: no matter how I try to destroy it, it still means something.

As in the original lyric:
"This place is always such a mess
Sometimes I think I'd like to watch it burn"

Here is my favorite one:
"same

But me and Cinderella

We can drive it all together

We put it all together

We put it home With one headlight

Well it home With one

headlight Well it home With one headlight

She ran until she's out of cheap wine cigarettes

This place is forever

There's got to be something better than In the middle But me

Hey come on try a little

Nothing is always seemed such a little

Nothing is old

It feels just like a little Nothing is dead

We'll run until

there's got to be an opening

Somewhere here in front of cheap wine cigarettes This place

Hey , Hey come on try a little

Nothing is always such a little

Nothing is forever

There's got to be an opening

Somewhere here in between the end

it's cold

It feels just her window ledge

Hey , I'd like to watch it

must be something better than In the county line ."

It's hard to be in spotlight

It's also hard to be a wallflower.

Friday, April 17, 2015

Silicon Valley S2E1: What does Erlich Bachman's binary T-shirt mean?

I saw this Quora question today. It's interesting. Of course as shown in the answer, we can use a binaryToText converter, but what's the fun part if I don't write my own. I am learning Python, so here is the Python and Java version:

Python:

import math
def binaryToText(binary):
 l = len(binary)
 c = 0
 i = 0
 for ch in binary:
  c += math.pow(2, l - i - 1) * (ord(ch) - ord('0'))
  i += 1
 return chr(int(c))

def main():
 Tshirt = ['01000010','01101001','01110100','01100011','01101111','01101001','01101110']
 for t in Tshirt:
  print(binaryToText(t), end = '')
 print('\n')

main()

Java:

public class BinaryToText {
 public static char binaryToText(String binary){
  int c = 0;
  int l = binary.length();
  for(int i = 0; i < l; i++)
   c += (int)Math.pow(2, l - i - 1) * (int)(binary.charAt(i) - '0');
  System.out.println(c);
  return (char)c;
 }

 public static void main(String[] args) {
  String[] Tshirt = {"01000010",
       "01101001",
       "01110100",
       "01100011",
       "01101111",
       "01101001",
       "01101110"
  };
  for(String t : Tshirt)
   System.out.print(binaryToText(t));
  System.out.println();
 }
}

Probably I still haven't got rid of the whole Java class concept, I don't find many advantages of using Python. But, just for fun.

Wednesday, April 15, 2015

Data Structures in Python compared with Java: Sets

A set is an unordered collection with no duplicate elements. Set models the mathematical set abstraction. Both Python and Java use hashtable as the underlying data structure of set.

Creating a set
Also allows different types of element

#empty set
basket = set()
basket = {'shirley',2014}

Python also allows for creating a character set using the following method:

basket = set('abaabab')

Duplicate elements will be removed.

In Java:

Set basket = new HashSet ()

Of course only one type is allowed.
Java also provide constructor to create a hash set from another collection, if created from a collection that contains duplicate elements, duplicate ones will be removed.

List listA = new ArrayList ();
  listA.add(1);
  listA.add(2);
  listA.add(1);
  Set t = new HashSet (listA);

Union
In Python, it is the same way as you do binary operation:

>>> a = {'shirley',2015}
>>> b = {'dora', 2014}
>>> a | b
{'shirley', 'dora', 2014, 2015}

In Java, we use addAll(Collection<? extends E> c) method:

a.addAll(b);

Complements
Python:

>>> a = set('aabbccabc')
>>> b = set('abdd')
>>> a - b
{'c'}

It will return the elements in a that is not in b.

In Java, we use removeAll(Collection<? extends E> c) method:

a.remveAll(b);

Intersection
Python:

>>> a
{'a', 'b', 'c'}
>>> b
{'a', 'b', 'd'}
>>> a&b
{'a', 'b'}
>>>

Java: retainAll(Collection<? extends E> c) method:

a.retainAll(b);

XOR (?)
Actually, it is elements in a or b but not both

>>> a
{'a', 'b', 'c'}
>>> b
{'a', 'b', 'd'}
>>> a^b
{'d', 'c'}

I couldn't find any built in methods to do this in Java, however, we can always do a little bit coding to acquire the desired result:

Set a = new HashSet ();
  a.add(1);
  a.add(2);
  Set b = new HashSet ();
  b.add(2);
  b.add(6);
  Set c = new HashSet(a);
  c.removeAll(b);
  Set d = new HashSet(b);
  d.removeAll(a);
  c.addAll(d);

Sorted sets:
Java provides TreeSet data structure. It is based on red - black tree implementation (See here for implementations). In Python, I think this needs to be acquired by using lambda function(?). I am not quite familiar with the lambda function, but I will take a look later.

References:
[1]. Not-very-familiar Python doc
[2]. www.grepcode.com
[3]. Beloved Java doc
[4]. www.codatlas.com

Sunday, April 12, 2015

Data Structures in Python compared with Java: Array, List, and ArrayList

At the hardware level, most computer architectures provide a mechanism for creating and using 1-D arrays. A one-dimensional array, is composed of multiple sequential elements stored in contiguous bytes of memory. The entire contents of an array are identified by a single name. It holds a fixed number of values of a single type. The length of an array is established and fixed and the array is created.

Python doesn't have a native array data structure. but it has a more general list structure. The list can grow and shrink during execution as elements are added or removed while the size of an array cannot be changed. However, the tradeoff here is that it needs more space, which allows for quick and easy expansion as new items are added to the list.

Another difference is that while the array only allows single type, the list in Python allows multiple types:

list1 = ['shirley','2014']
list2 = ['dora','shine']
list3 = [2011, 2015]

Note that the list is also implemented using an array structure to store the items. When the list() constructor is called, an array structure is created. The array is initially created bigger than needed, leaving the capacity for future expansion. If the number of elements added to the list exceeds the capacity (which is larger than the initial size) of the list, a new array with a larger capacity is created and an array copy is followed.

In Java, the best analogy should be ArrayList. A slight difference is that the ArrayList in Java only allows single type and initialize an array with very small capacity (10) if not predefined.

Creating an empty list:

list = []
#create a list of None with size 256
list = [None]*256
list = list()

In Java, everything is instantiated with the new operator:

int[] array = new int[10];
List arraylist = new ArrayList();

Extending a list:
Python provides built in method to append one list to another:

list2.extend(list3)

In ArrayList in Java, that is:

arraylist.addAll(arraylistB);

Inserting items:
In Java, neither Array or ArrayList allows this operation:

list1.insert(2, "Jesse")

However, since the list is still array based, this operation requires shift the elements from index 2 to the right then insert "Jesse" to index 2.

Slicing:
Slicing is an operation that creates a new list consisting a contiguous subset of elements from the original list. References to the corresponding elements are copied and stored in the new list:

list4 = list1[startIndex(inclusive) : endIndex(exclusive)]

JDK 8 provides a similar method subList(int fromIndex(inclusive), int toIndex(exclusive)) (here for src):

List sublist = arrayList(1, 3);

Creating 2-D array/list:
In Python, this involves creating a list containing m lists initialized to 0:

matrix = [[] for x in range(rows of the matrix)]

Similarly, you have to at least initialize 1 dimension when creating a 2-D array.

Alternatively, you can use library numpy:

import numpy
numpy.zeros((m, n))

Well, in Java, this is much easier:

int[][] matrix = new int[m][n];

References:
[1]. Data structures and algorithms using Python. Wiley Publishing, 2010.
[2]. www.grepcode.com
[3]. Beloved Java doc
[4]. www.codatlas.com

Tuesday, April 7, 2015

Note on Python

Interpreted
A program written in a compiled language like C or C++ is converted from the source language i.e. C or C++ into a language that is spoken by your computer (binary code) using a compiler with various flags and options. When you run the program, the linker/loader software copies the program from hard disk to memory and starts running it.
Python, on the other hand, does NOT need compilation to binary. You just run the program directly from the source code. Internally, Python converts the source code into an intermediate form called bytecodes and then translates this into the native language of your computer and then runs it. All this, actually, makes using Python much easier since you don't have to worry about compiling the program, making sure that the proper libraries are linked and loaded, etc. This also makes your Python programs much more portable, since you can just copy your Python program onto another computer and it just works.

String

'What\'s your name?'

"What's your name"

Always use raw strings when dealing with regular expressions. Otherwise, a lot of backwhacking may be required.

self:
The reason you need to use self is because Python do methods in a way that makes the instance to which the method belongs to passed automatically, but not received automatically: the first parameter of methods is the instance the method is called on. That makes methods entirely the same as functions, and leaves the actual name to use up to you. self is not special to the code, it's just another object.

pass:
This statement is used when a statement is required syntactically but you do not want any command or code to execute.
The pass statement is a null operation. nothing happens when it executes. The pass is also useful in places where your code will eventually go, but has not been written yet.

JSON (JavaScript object Notation)
A lightweight data-interchange format. It is easy for machines to parse and generate. It is based on a subset of Javascript. JSON is a text format that is completely language independent but uses conventions that are familiar to programers of the C-family languages.

JSON is built on two structures:

A collection of name/value pairs. In various languages, this is realized as an object, record, struct, dictionary, hash table, keyed list or associative array.
An ordered list of values. In most languages, the is realized as an array, vector, list, or sequence.

inspect module:
get the doc source file of a module

import inspect
inspect.getsourcefile(array)

chr() and ord()
chr() returns a string of one character whose ASCII code is the integer i:

>>> print(chr(97))
a

ord(): Given a string of length 1, return an integer representing the Unicode code point of the character when the argument is a unicode object.

>>> ord('a')
97

leading and trailing underscores:

single leading underscore (_function): weak "internal use" indicator. e.g., "from M import X" does not import objects whose name starts with an underscore
single trailing underscore (function_): used by convention to avoid conflicts with Python keyword. e.g., class_='ClassName'
double leading underscore(__function): when naming a class attribute, invokes name mangling.

name mangling:

If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python's name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.Python mangles these names with the class name: if class Foo has an attribute named __a , it cannot be accessed by Foo.__a . (An insistent user could still gain access by calling Foo._Foo__a .) Generally, double leading underscores should be used only to avoid name conflicts with attributes in classes designed to be subclassed.

double leading and trailing underscore: "magic" objects or attributes that live in user - controlled namespaces. e.g., __init__, __file__. Never invent such names, only use them as documented.

__all__

It's a list of public objects of that module, it overrides the default of hiding everything that begins with an underscore.

[An overflow post] Longest valid subsequence

Given a string s, and a function isValid(String str), write a function to check the longest subsequence in s that is valid. For example, a subsequence in "whreat" can be "rat", "eat", "what" or "wheat". Please don't speculate the implementation of isValid(String str) function.

I was asked this question yesterday. It bothered some much that the person who asked me this question and I could not agree on the solution. Plus I had a rough day&night last night, so I decide to break my rule to write this overflow post (256).

At first I thought it should be a DP problem, however, since we cannot assume anything about the isValid function, we can not break down the problem to smaller problem, i.e., "wh" is true will not indicate "what" is also true. So the only bloody brutal solution I can think of is backtracking.

public class LongestValidSubSequence {
 private static boolean isValid(String str){
  if(str.equals("what") || str.equals("wheat") || str.equals("eat") || str.equals("rat"))
   return true;
  return false;
 }
 static String max = "";
 public static String longestValidSubsequence(String str){
  getSubsequence(str, new StringBuilder(), 0);
  return max;
 }
 
 private static void getSubsequence(String str, StringBuilder sb, int start){
  if(isValid(sb.toString())){
   String tmp = sb.toString();
   if(max.length() < tmp.length()){
    max = tmp;
   }
  }
  for(int i = start; i < str.length(); i++){
   sb.append(str.charAt(i));
   getSubsequence(str, sb, i + 1);
   sb.deleteCharAt(sb.length() - 1);
  }
 }
 public static void main(String[] args) {
  String s = "whreat";
  System.out.println(longestValidSubsequence(s));

 }

}

Saturday, April 4, 2015

Determine if an array is almost sorted, if not, how to sort it efficiently?

Given an array A of distinct elements with length n, determine if the array is 90% sorted.

1. Considering the following algorithm:

Choose an element with index i independently and uniformly at random from 0 < i < n - 1;
Compare the element with A[i - 1], output false if they are not sorted correctly;
Compare A[i] with A[i + 1], output false if they are not sorted correctly.

Prove that the algorithm will return false with probability at least 2 / 3 if A is not 90% sorted only the algorithm is repeated k = Ω(n).

Given the following counter example:
A[n/2 + 1, ..., n, 1, 2, 3, ... n / 2].
It is obvious that A is not 90 % sorted. So if we want to use the above algorithm to prove that A is not 90% sorted, each iteration we need to choose the first element to be 1, and then compare it with n, or the first element must be n, which will return false when compare with the next element.
Either case, the probability is 2 / n. Now this leads the probability of not in either of the above case 1 - 2/n.
Moreover, we know that the algorithm will be terminated once either the above case is determined. So:

(1 - 2/n)^k <= 1/3

We know that ( 1 - 1 / x)^x < 1 / e

so performing a little bit math leads to k >= (ln3)/2 *n = Ω(n).

2. Consider performing binary search on an unsorted array. Given key1 and key2 in A, it will return index i and j. We know that if key1 < key2, then i < j (why?). Now consider the following randomized algorithm:
Randomly pick up index i from A, performing binary search with key A[i], which will return index j. If i != j, return false.
Show that the algorithm is correct and will return false with probability at least 2/3 if the list is not 90% sorted if k is sufficiently large constant.

First, it is easy to understand that if the array is sorted, then BST will always return true. Now we want to know that if the array is not 90 % sorted, then it will return false with probability 2 / 3 with a proper k. Alternatively we can translate the problem into a typical probability problem:

Given a bag of balls, we know that at least 10% balls are blue, and the others are red, how many balls do we have to draw from the bag to get a blue ball with the probability at least 2/3?

Similar to the first question we can get:

(9/10)^k <= 1/3

k >= (ln3)/(ln(10/9))

So if we iterate the algorithm greater than (ln3)/(ln(10/9)), we will find an unsorted element with probability 2/3.

3. So what if we want to check if the array is not (1 - ε) sorted?

The same:

(1- ε)^k <= 1/3

k >= (ln3)/ε

4. Prove that if the given array is partially sorted in k slots, e.g.,

A={7, 8, 9, 4, 5, 6, 1, 2, 3}, then k = 3

Using insertion sort will lead to O(nk) complexity.

We know that since that each element is in the right order within its slot, so elements in the first slot need not move, the second slot will move number of elements in the first slot, and so on. Given n elements and k slots, each slot will contain n / k elements. So we will have the following equation:

(n/k) * 0 + (n/k) * 1 + ... + (n/k) * (k - 1) = n(k-1)/2 = O(nk)

5. Find an algorithm that sort this partially sorted array in O(nlog k) times, where k is number of slots.

This is just merge k sorted array algorithm. We use a priority queue, insert the first element in each slot into the queue, since the size of the queue is always k, inserting an element takes O(logk) time, we have n elements, thus merge the whole array takes O(nlogk) time.

Friday, April 3, 2015

Generate all sequences of n bits with all k unique bits

The problem came from Cracking the code interview:

Consider sequences of 36 bits. Each such sequence has 32 5 - bit sequences consisting of adjacent bits. For example, the sequence 1101011… contains the 5 - bit sequences 11010,10101,01011,…. Write a program that prints all 36 - bit sequences with the two properties:

1. The first 5 bits of the sequence are 00000.

2. No two 5 - bit subsequences are the same.

I generalized it to print all sequence of n (n <= 64) bits with: 1. the first k bits of the sequence are 0s and 2. no two k - bit subsequences are the same.

There probably are smarter ways. I used the brutal force search. Start from kth bit, get all permutations of the sequence with the first k bits zero, then for every sequence, check if all bits are unique, and print out those that contain unique k - bit sequence and haven't been printed.

I am looking for a better approach, here is my open question of the problem on Stackoverflow.

import java.util.*;
public class UniquekBitSequence {

 public static void uniqueSequence(int n, int k){
  long rst = 0;
  uniqueSequence(n, k, rst, k - 1, new HashSet ());
 }
 private static void uniqueSequence(int n, int k, long rst, int start, Set printed){
  if(isDistinct(rst, n, k) && printed.add(rst)){
   System.out.println(Long.toBinaryString(rst));
  }
  for(int i = start + 1; i < n; i++){
   rst |= (1 << i);
   uniqueSequence(n, k, rst, i, printed);
   rst = setZero(rst, i);
   uniqueSequence(n, k, rst,  i, printed);
  }
 }
 public static int getInt(long rst, int right, int left){
  int mask = (1 << (left + 1)) - 1 - ((1 << (right)) - 1);
  return ((int)rst & mask) >> right;
 }
 public static long setZero(long rst, int start){
  int mask = ~(1 << start);
  return rst & mask;
 }
 private static boolean isDistinct(long rst, int n, int k){
  Set bits = new HashSet ();
  for(int i = k - 1; i < n; i++){
   if(!bits.add(getInt(rst, i - k + 1, i)))
    return false;
  }
  return true;
 }
 public static void main(String[] args){
  uniqueSequence(10, 5);
 }
}

Thursday, April 2, 2015

Problems using distributed system

Find the sum of two n-bit binary numbers

Assuming each number is a power of 2, so we can partition each number to two parts, the first one the most significant n / 2 bits and the second one the least significant n / 2 bits. For example, 111,111 can be partitioned to 111 << 3 and 111, now if 010 adds 111, we are good, there is no carry. If 110 adds 111, we will have carries. When the carries is in the least significant parts, we need to modified the most significant parts by the carry. If we keep dividing, in the end, we can solve the problem by just adding 2 bits. Familiar with the approach? Yep, that's divide - and - conquer.

In reality, we cannot divide two numbers all the way to two bits, so here comes the question my friend likes to ask me a lot about it: Given m machines, how many iterations do we need to solve the problem given each machine has x memory? Now if mx > 2 * n, we are good, we can solve the problem by one iteration. However, what if we are given two super large numbers that, say, around 1GB each, and each machine has, 1 kb memory? So since each time we need to put the bits of two numbers into the machine, so each number can only acquire x / 2 memory in the machine. In total we can handle mx / 2 memory (in bits) of each number in each iteration. If the number has n bits, then we will need n / (mx / 2) iterations to solve the problem.

The prefix problem

The prefix product of a given sequence x0, x1, x2, ..., xn is y0, y1, y2, ..., yn where:
y0 = x0
y1 = x0 · x1
y2 = x0 · x1 · x2
...
yn = x0 · x1... · xn
where · can be any binary operations that satisfies:

x · (y · z) = (x · y) · z

The problem requires us to compute the products x0 · x1 · ... · xk where 1 <= k <= n, assuming n is a power of 2.

This problem also follows a divide - and - conquer approach. For k = 2, it is easy, just binary operation. For k = n, if we divide the problem into two parts, and we calculate the product for each part, which is represented as:

P(1, n / 2) = x0 · x1 · ... · xn/2
P(n/2+1, n) = xn/2+1 · xn2/+2 · ... · xn

then P(1, k) for k > n/2 would be P(1, n/ 2) · P(n/2+1, k).
So now we only need to keep dividing the the two subsets until we can calculate the binary product.

If we have enough processors, we can divide the array to n/2 processors, then for the combining part, since for each k > n / 2, we calculate P(1, n / 2) · P(n/2 + 1, k), so we can also divide each operation to multiple processors, so overall we will have O(logn) complexity.

Alternative approach

Initialize P(1, k) = xk for (k = 1, n), let's call x[k]
Let k = 2i, calculate x[2i] <- x[2i - 1] · x[2i]

So if we have n = 8,
x[2] <- x[1] · x[2]
x[4] <- x[3] · x[4]
x[6] <- x[5] · x[6]
x[8] <- x[7] · x[8]

Then we use induction and calculate 2k
x[4] <- x[2] · x[4]
x[8] <- x[6] · x[8]

But we miss the 2i + 1 part:
x[3] <- x[2] · x[3]
x[5] <- x[4] · x[5]
x[7] <- x[6] · x[7]

Do it iteratively, we can solve the problem with O(logn) complexity.

Finding kth smallest element
Given a interconnection network with is a complete binary tree with n = 2^(h - 1) nodes (Each node represents a processor). Find the kth smallest element. This tree-machine structure has been used in imaging processing applications, where the leaves correspond to inputs (e.g., pixels) and the algorithms for manipulating them are hierarchical.

We know that if given an array, we can use Quick select to find the solve the problem. Here, we can still apply a similar approach:

1. choosing a random element as pivot,
2. computing its rank
3. eliminating.

Choosing a random element can be achieved by a tournament. Each leaf competes with its sibling, if it wins, it sends its number to the parent. The root will get an overall winner, and that is our pivot.
The pivot is then broadcast down the tree. All leaves compare with it, and send:

1: if their number is smaller than or equal to the pivot or
0: otherwise

to their parent. The rank of the pivot is the number of 1s the root gets.

Then based on the rank of the pivot, each leave node can determine if its number is eliminated.

In each iteration, we will have four waves of communication:
1. up the tree to choose a pivot;
2. down the tree to broadcast it;
3. up the tree to compute its rank;
4. down the tree to broadcast the rank and eliminate the element

Now after the iteration, there is highly chance that we will need another one. But the problem is, the tournament is no longer fair. The node with a sibling node's number being eliminated will be automatically promoted to the next level. If, in an extreme case, all nodes in half of the tree are eliminated except for one, then this node will be promoted to the root to compete with another half of the tree with total probability 1 / 2. The other half of the tree, on the other hand, will have to killing each other to get promoted, which leads to a much lower total probability to win the game. So how to keep the fairness?
Alternatively, we can set up a counter, which is initialized as 1. This counter indicates the real opponents (the numbers that haven't been eliminated) that participated in the part of the tournament involving this element. When an element wins a game at some node, it's promoted upward, and the losing element's counter is added to the wining one's. Every game now is played with a biased coin according to the counter of the competing elements.
For example, if x wins its first game against y, x will have a counter of 2, while z advances by competing nobody (counter = 1). If x now plays with z, x will have probability 2/3 to win the game while z will only have 1 / 3. Is it fair? Indeed it is. Since when x plays against y, both nodes have a chance of 2 / 3 * 1/ 2 of winning, so overall x (or y) has 1/ 3 chance of wining, while z has overall 1 / 3 chance of wining, so we still preserve uniform randomness.

Since in each iteration, we will need to communicate from root to leaves 4 times, so we need O(logn) for communications. On average we will need to do O(logn) iterations (hopefully we are not too unlucky), so average running time is O((logn)^2).

Note that most of the time, the root and leaves are waiting: the root waits for the children to send up the pivot; the leaves waits for their parents to send down the pivot. So to improve efficiency, we can do this in a pipeline fashion. After the first layer of winners are selected and proceeds to their parents, the leaves start the second tournament again. And we can keep eliminating elements based on the pivots that was sent down in previous tournaments. This at anytime, we will have h pivots, so we can cut the elements to 1 / (h + 1) of its original size. The regular algorithm requires O(log n ) steps. Since now we can generate h = log n pivots at about the same time, we can save a factor of O(log(logn)). So the running time is reduced to O((logn)^2 / (log(logn))).

Wednesday, April 1, 2015

Reentrant lock & field in interface

I looked back to my last week's unsolved problem in this blog:

Given the interface below, implement a task scheduler.

interface Task {
    void Run();
    Set<Task> GetDependencies();
}

Additionally, the task scheduler should follow two rules.
1. Each task may only be executed once.
2. The dependencies of a task should be executed before the task itself.

Reentrant lock
So the question is, what if we run it in parallel. Naturally I thought about concurrency. However, when I tried to write a synchronized method for the problem, I realized that there is a recursion in my method. So I wondered: will it cause a deadlock?

The answer is no.

Recall that a thread cannot acquire a lock owned by another thread. But a thread can acquire a lock that it already owns. Allowing a thread to acquire the same lock more than once enables reentrant synchronization. This describes a situation where synchronized code, directly or indirectly, invokes a method that also contains synchronized code, and both sets of code use the same lock. Without reentrant synchronization, synchronized code would have to take many additional precautions to avoid having a thread cause itself to block.

In short, if I have a synchronized recursion method, when the method invokes itself again, it will transfer the lock to the recursive method, executing that first, then the lock is returned to the current method. Java does provide an explicit ReentrantLock class, if you are interested, take a look at the source code.

The fields in an interface
Since I need to implement the Task interface provided by the problem statement for test, I wrote a Trsk class. When I tried to print the label I defined in the Trsk class of a instance of the Task interface, I got an error. So I thought, ok, maybe I should add the field label into the Task. However, for an interface, all fields are automatically public, static and final, and all methods are public.
There are two solutions to solve the problem:
1. Declare a Trsk object in my TaskSechduler class;
2. Cast Task to Trsk object when I need to access the Trsk fields and methods.

I choose the second one.

Back to the problem
Here is the simple dependency graph of my test code. a -> b indicates a is dependent on b.

So this problem, I wrote a synchronized method, and another class ExecuteTask and created two Threads to run. And here is the result:

Yep, Thread 1 never gets a chance to executing any Tasks. Since in my random test graph, everyone is sort of dependent on everyone, Task 6 (which doesn't depend on anyone) is executed first. And since it is in Thread 2, which leads Thread 2 always acquires the lock and executes tasks, until all tasks are executed. And when it comes to Thread 1's turn, it realizes that there is nothing left (Thanks Thread 2 for being very hard working!).

In order to let Thread 1 to get some job, I removed "synchronized" keyword:

First run

Second run

Third run

Now it's all about who is more aggressive. However, do notice that this will cause some order problems since we have no idea who is accessing the recourse.

The intuition here is that when you need to run parallel, be careful if you have some rules on the order.

The code

public class TaskSchedulerParallel {
 Set executed;
 Set inProcess;
 public TaskSchedulerParallel(){
  executed = new HashSet();
  //executed = Collections.synchronizedSet(new HashSet());
  inProcess = new HashSet();
  //inProcess = Collections.synchronizedSet(new HashSet());
 }
 public void schedule(Task t){
 //public synchronized void schedule(Task t) {
  inProcess.add(t);
  /*for(Task ta : executed)
   System.out.print(((Trsk)ta).label + " ");
  System.out.println();*/
  for(Task dep : t.GetDependencies()){
   if(!executed.contains(dep) && !inProcess.contains(dep))
    schedule(dep);
  }
  if(executed.add(t)){
   t.Run();
   inProcess.remove(t);
  }
 }
}
public class ExecuteTask implements Runnable{
 private TaskSchedulerParallel tsp;
 private Set toExecute;
 private int label;
 public ExecuteTask(int label, TaskSchedulerParallel tsp, Set tasks){
  this.tsp = tsp;
  toExecute = tasks;
  this.label = label;
 }
 @Override
 public void run() {
  System.out.println("Thread: " + label);
  for(Task t : toExecute){
   tsp.schedule(t);
  }
  System.out.println("Thread " + label + " finishes!");
 }
}
public class Trsk implements Task{
 int label;
 Set dependencies = new HashSet();
 public Trsk(int label){
  this.label = label;
 }
 @Override
 public void Run() {
  System.out.println(label + " is running!");
 }
 public boolean addDependency(Task t){
  return dependencies.add(t);
 }
 @Override
 public Set GetDependencies() {
  return dependencies;
 }
 

}
public class TaskTester {

 public static void main(String[] args) {
  TaskSchedulerParallel tsp = new TaskSchedulerParallel();
  Task t1 = new Trsk(1);
  Task t2 = new Trsk(2);
  Task t3 = new Trsk(3);
  Task t4 = new Trsk(4);
  Task t5 = new Trsk(5);
  Task t6 = new Trsk(6);
  t1.addDependency(t2);
  t1.addDependency(t4);
  t2.addDependency(t3);
  t3.addDependency(t4);
  t3.addDependency(t5);
  t4.addDependency(t6);
  t5.addDependency(t1);
  Set s1 = new HashSet();
  s1.add(t1);
  s1.add(t2);
  s1.add(t3);
  Set s2 = new HashSet();
  s2.add(t4);
  s2.add(t5);
  s2.add(t6);
  (new Thread(new ExecuteTask(1, tsp, s1))).start();
  (new Thread(new ExecuteTask(2, tsp, s2))).start();

 }

}

AdSense