Friday, November 18, 2016

Design chat server

Explain how you would design a chat server. In particular, provide details about the various backend components, classes, and methods. What would be the hardest problems to solve?

1. Understand the requirements

When given a problem like this, start by discussing the requirements for the chat server with the interviewer. Some possible requirements:


  • Personal messaging
  • Group messaging
  • Sign on/sign off
  • Add friend requests


2. Figure out what data we need to store

User relation: a user and all the friends he/she can send messages to
    user_id, list of friends
Message:
    sender, receiver, time, message_id, content, group_id
    * If a message is sent to a group, receiver should be empty or set to a universal id that stands for all groups.
Device:
    A user may have multiple devices, and we need to make sure all of them receive messages at the same time.
    device_id, type, status (enum: online, offline, etc.)
***Notification queue:
    To make sure every receiver is notified of every message, we need to store the messages that are still waiting to be delivered, in case something happens to the servers.
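
As a rough illustration, the entities above could be written as plain Java records (this needs Java 16 or later). The field types, the `userId` field added to `Device`, the `NotificationRequest` shape, and the choice of a null `receiverId` for group messages are all assumptions made for this sketch, not part of the design above.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical data model for the entities listed above; all types are assumptions.
enum DeviceStatus { ONLINE, OFFLINE }

// A user and the ids of the friends he/she can send messages to.
record UserRelation(long userId, List<Long> friendIds) {}

// Exactly one of receiverId / groupId must be set: receiverId is null for group messages.
record Message(long messageId, long senderId, Long receiverId, Long groupId,
               Instant time, String content) {
    Message {
        if ((receiverId == null) == (groupId == null)) {
            throw new IllegalArgumentException("set exactly one of receiverId or groupId");
        }
    }

    boolean isGroupMessage() { return groupId != null; }
}

// userId is an added assumption so that a device can be tied to its owner.
record Device(long deviceId, long userId, String type, DeviceStatus status) {}

// One pending notification per (message, device) pair in the notification queue.
record NotificationRequest(long messageId, long deviceId) {}
```

Rejecting a message that has both or neither of `receiverId` and `groupId` is one way to keep the "empty receiver for groups" rule from being broken by accident.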

3. Figure out your DB

Always start with a relational DB. It makes the relations between different objects easier to understand and to maintain. For this problem, the relational schema follows directly from the data types above.

4. Think about your API

Now comes the fun part: how to design the whole thing so that our chat server works.
Let's start with a new user registering with our chat server. We need a User interface with a method called createUser(some parameters). When an existing user logs in, we need to fetch his/her chat history, profile, etc., so the interface needs another method, called getUser().
Now if the user wants to send a message, there should be two methods: send(Sender sender, Receiver receiver, Content content) and send(Sender sender, Group group, Content content). These two methods can be put in another interface, called MessageSendingService (or something similar). Sending a message creates a new Message object.
Now let's think about the process of sending a message. First, the user sends a POST request with the new message content. This request is translated into an API call to our MessageSendingService, which calls the send method to create the message and store it in our DB. At this point the message has been successfully sent from the sender's side, so we can respond to the sender that the message was sent. The next thing is to notify the receiver. For this we can have another service called MessageNotificationService, with a method called createNotificationRequest(Receiver receiver) that creates a list of notification requests, one for each "active" device the user has. The "active" status can be obtained by checking the status field of the user's devices. The interface can have another method called pushNotification(Message message), which pushes each message to the receiver.
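
The flow above can be sketched with in-memory collections standing in for the DB and the notification queue. Only the names MessageSendingService, MessageNotificationService, send, createNotificationRequest and pushNotification come from the text; the simplified signatures (plain ids and strings instead of Sender/Receiver/Content objects), the `ChatStore` class, and the return values are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the DB tables (hypothetical; a real server would use a database).
class ChatStore {
    final Map<Long, String> messages = new HashMap<>();          // message_id -> content
    final Map<Long, List<Long>> activeDevices = new HashMap<>(); // user_id -> online device ids
    final List<long[]> notificationQueue = new ArrayList<>();    // pending {messageId, deviceId}
    long nextMessageId = 1;
}

class MessageNotificationService {
    private final ChatStore store;

    MessageNotificationService(ChatStore store) { this.store = store; }

    // Queue one notification request per active device of the receiver.
    void createNotificationRequest(long receiverId, long messageId) {
        for (long deviceId : store.activeDevices.getOrDefault(receiverId, List.of())) {
            store.notificationQueue.add(new long[] {messageId, deviceId});
        }
    }

    // Drain the queued requests for one message; returns the device ids that were pushed to.
    List<Long> pushNotification(long messageId) {
        List<Long> pushed = new ArrayList<>();
        store.notificationQueue.removeIf(req -> {
            if (req[0] != messageId) return false;
            pushed.add(req[1]); // a real server would write to the device's connection here
            return true;
        });
        return pushed;
    }
}

class MessageSendingService {
    private final ChatStore store;
    private final MessageNotificationService notifier;

    MessageSendingService(ChatStore store, MessageNotificationService notifier) {
        this.store = store;
        this.notifier = notifier;
    }

    // Store first, then queue notifications; the returned id serves as the sender's acknowledgement.
    long send(long senderId, long receiverId, String content) {
        long id = store.nextMessageId++;
        store.messages.put(id, content);
        notifier.createNotificationRequest(receiverId, id);
        return id;
    }
}
```

Because the message is stored and queued before anything is pushed, a crash between send and pushNotification leaves the pending requests in the queue rather than losing them, provided the queue itself is persisted (it is not in this sketch).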
There is one more interface we need, which can be called UserRelationManager. It can have two methods: createRelation(User requestUser, User responseUser), with which one user sends a "friend" request to another user, and createGroup(User requestUser, List<User> group), which adds all the users to a Group object and saves it to the DB.
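
A minimal in-memory sketch of UserRelationManager might look like the following. Users are represented by plain ids rather than User objects, and the `acceptRelation`, `areFriends` and `groupMembers` methods are hypothetical additions: the text only names createRelation and createGroup, and does not say how a request gets accepted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UserRelationManager {
    private final Map<Long, Set<Long>> pendingRequests = new HashMap<>(); // responder -> requesters
    private final Map<Long, Set<Long>> friends = new HashMap<>();         // user -> friend ids
    private final Map<Long, List<Long>> groups = new HashMap<>();         // group id -> member ids
    private long nextGroupId = 1;

    // Record a "friend" request from requestUser to responseUser.
    void createRelation(long requestUser, long responseUser) {
        pendingRequests.computeIfAbsent(responseUser, k -> new HashSet<>()).add(requestUser);
    }

    // Hypothetical extra step: the friendship only exists once the request is accepted.
    boolean acceptRelation(long responseUser, long requestUser) {
        Set<Long> pending = pendingRequests.get(responseUser);
        if (pending == null || !pending.remove(requestUser)) return false;
        friends.computeIfAbsent(responseUser, k -> new HashSet<>()).add(requestUser);
        friends.computeIfAbsent(requestUser, k -> new HashSet<>()).add(responseUser);
        return true;
    }

    boolean areFriends(long a, long b) {
        return friends.getOrDefault(a, Set.of()).contains(b);
    }

    // Create a group with the requester first, followed by the listed members (duplicates dropped).
    long createGroup(long requestUser, List<Long> members) {
        List<Long> all = new ArrayList<>();
        all.add(requestUser);
        for (Long member : members) {
            if (!all.contains(member)) all.add(member);
        }
        long id = nextGroupId++;
        groups.put(id, all);
        return id;
    }

    List<Long> groupMembers(long groupId) {
        return groups.getOrDefault(groupId, List.of());
    }
}
```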

These are some of the interfaces and methods we may need for our API; there are certainly other, and probably better, designs.

5. Now think about scalability and reliability

* Think about using NoSQL: how would you denormalize the data?
* Separate the front end, back end and DB. Front end servers only care about transforming client requests and calling the back end servers. Back end servers make API calls, read from and write to the DB, and send responses to the front end servers. This makes things easier later if we want to expand our app to mobile or other platforms.
* Use heartbeats to check that servers and clients are still alive
* Use load balancers, replication, caching, batch processing...
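
The heartbeat check in the list above can be sketched as a small monitor that remembers when each node last reported in. The class name `HeartbeatMonitor`, the string node ids and the timeout rule are assumptions; the current time is passed in explicitly so the logic is easy to test.

```java
import java.util.HashMap;
import java.util.Map;

// Tracks the last heartbeat per node; a node counts as dead after timeoutMillis of silence.
class HeartbeatMonitor {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitor(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Called whenever a node (a server or a client device) reports in.
    void heartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    // A node that never sent a heartbeat is treated as not alive.
    boolean isAlive(String nodeId, long nowMillis) {
        Long seen = lastSeen.get(nodeId);
        return seen != null && nowMillis - seen <= timeoutMillis;
    }
}
```

The same check could drive the device status field: a device whose heartbeats stop would be marked offline and skipped when notification requests are created.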

6. Think about the hardest problem

* How to guarantee exactly-once delivery?
   Retry if message delivery is not successful.
   On the client side, use a hash to identify messages that were already delivered.
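
The two ideas above combine into "retry on the server, deduplicate on the client". Below is a minimal sketch, assuming the message id is used as the deduplication key (a content hash would work the same way). The class names and the `transport` hook are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Client side: remembers which message ids were already delivered and ignores repeats.
class DedupingClient {
    private final Set<Long> seen = new HashSet<>();
    final List<String> inbox = new ArrayList<>();

    // Always acknowledges, but only the first copy of a message reaches the inbox.
    boolean receive(long messageId, String content) {
        if (seen.add(messageId)) inbox.add(content);
        return true;
    }
}

// Server side: retry until an acknowledgement comes back or the attempts run out.
class RetryingSender {
    // `transport` is a hypothetical hook that returns true when an ack was received.
    static boolean deliver(long messageId, String content, int maxAttempts,
                           BiPredicate<Long, String> transport) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (transport.test(messageId, content)) return true;
        }
        return false;
    }
}
```

If an acknowledgement is lost, the server resends a message the client already has; the `seen` set is what turns that at-least-once delivery into an effectively exactly-once inbox.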


7. The reality

The actual Facebook chat architecture can be found in this presentation. I wrote this post before I saw this presentation, and I'm happy that the actual implementation doesn't deviate from what I proposed. :)

The challenge in reality is that the "status" field, which in reality is called "Presence", is hard to scale. In FB's implementation, a dedicated set of servers handles presence. Presence aggregates online info in memory, and the list of online friends is fetched by periodic AJAX polling.

Conversations are stored in log format.

The notification queue is maintained on a per-user basis, i.e., each user has a channel for all his/her devices, and long-polling is used for delivering messages. Briefly, long-polling means that when the client sends a request for a message, the connection between client and server stays open until a new message comes in or until it times out. When the client receives the response, the connection closes and the client issues a new request.
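
The long-polling behaviour can be imitated in a few lines with a blocking queue. This is only a sketch of the idea, not of Facebook's implementation: the `UserChannel` class is hypothetical, the HTTP layer is left out, and a single queue serves one consumer, so fan-out to several devices is not covered.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// One channel per user; each poll blocks until a message arrives or the timeout expires.
class UserChannel {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // Server side: called when a new message for this user has been stored.
    void publish(String message) {
        pending.add(message);
    }

    // Handles one long-poll request: returns the next message, or null on timeout.
    String poll(long timeoutMillis) {
        try {
            return pending.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }
}
```

A client would call poll in a loop: each call corresponds to one held-open request, and a null result simply means the request timed out and a new one should be issued.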

