Wednesday, March 25, 2015

Common sense (?) about YouTube

The Web Servers

Source: https://www.youtube.com/watch?v=w5WVu624fY8
NetScaler
load balancing: marks machines up/down
caches static content: Apache is not very good at serving very large static content

mod_fastcgi: requests that look dynamic (i.e., match certain URL rules) are passed through to the Python application servers

Runs Linux
can usually scale by adding machines
Python is fast enough
Web code speed is usually not the bottleneck
Time is more often spent waiting on databases, caches, and RPC (remote procedure call) calls
Development speed is more important

<< 100 ms page service times
psyco: a dynamic Python -> C compiler, used to speed up Python
selectively rewrite hot spots, e.g., as C extensions
pre-generated cached HTML (a minimal sketch follows)
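A minimal sketch of the pre-generated cached HTML idea, assuming a simple in-process dict cache (the function names and TTL are hypothetical; the talk does not describe the actual implementation):

```python
import time

# Hypothetical in-process cache: page key -> (rendered HTML, expiry timestamp).
_html_cache = {}
CACHE_TTL = 600  # seconds; an assumed value

def render_watch_page(video_id):
    # Stand-in for the real, expensive template rendering.
    return "<html><body>video %s</body></html>" % video_id

def get_watch_page(video_id):
    # Serve pre-generated HTML when possible; re-render only on miss or expiry.
    now = time.time()
    cached = _html_cache.get(video_id)
    if cached and cached[1] > now:
        return cached[0]                       # fast path: no rendering at all
    html = render_watch_page(video_id)         # slow path
    _html_cache[video_id] = (html, now + CACHE_TTL)
    return html
```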

Serving Video
Issues: bandwidth, hardware, and power: a data center has limited power, so the hardware must not consume too much of it
Each video is hosted by a "mini-cluster"
mini-cluster: a small number of machines that serve exactly the same set of videos, giving a bit of redundancy
Why a couple of machines per cluster (a server-picking sketch follows the list):
1. scalability: more disks serving each piece of content
2. headroom: if one machine goes down, the other machines can pick up the slack
3. online backups of everything
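A minimal sketch of how a video might be mapped to its mini-cluster and a healthy machine picked within it (the hash-based mapping and host names are hypothetical; the talk does not specify the mapping):

```python
import hashlib
import random

# Hypothetical layout: each mini-cluster is a few machines with identical content.
MINI_CLUSTERS = [
    ["video-1a", "video-1b", "video-1c"],
    ["video-2a", "video-2b", "video-2c"],
]

def pick_server(video_id, is_healthy=lambda host: True):
    # Deterministically map the video to one mini-cluster...
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    cluster = MINI_CLUSTERS[int(digest, 16) % len(MINI_CLUSTERS)]
    # ...then spread load across whichever replicas are currently up.
    candidates = [host for host in cluster if is_healthy(host)]
    return random.choice(candidates) if candidates else None
```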

from Apache -> lighttpd
single process -> multi-process

Serving Video
Put the most popular content onto the CDNs
The moderately played and rarely played videos go to YouTube's own servers.
The traffic profile on YouTube's servers: an individual video may get only 10 - 20 views/day, but aggregated over all such videos that is still a lot of traffic. Rate control and caching become more important, and the amount of memory per machine has to be tuned: not too small, but not too large either, because when you expand the number of machines, extra memory costs a lot of money without much benefit.

Keep it simple
simple network path: not too many network devices in the path between the client and the video servers
commodity hardware
simple common tools: what is bundled with Linux plus layers built on top, which can be modified if needed
OS page cache size is critical
handle random seeks (SATA tweaks, etc.)

Serving Thumbnails
Thumbnail images correspond to each video: around 100 x 90 pixels, so ~5 KB each. There are a lot of them; each video has multiple thumbnails, so there are about 4x as many thumbnails as videos. Videos are spread across thousands of machines, but thumbnails are concentrated on a small number of machines. Need to deal with lots of small objects: inode caches, dentry (directory entry) caches, page caches, all at the OS level.
High requests/second: any given web page, e.g., a keyword search result, shows lots of thumbnails, while a watch page has only one video.
Started with Apache: a unified pool (the same machines serving web traffic also served static images (thumbnails)) -> a separate pool, which under high load gave low performance
-> squid (reverse proxy): far more CPU-efficient, but as load increased, its performance degraded over time
-> lighttpd (modified)
Main thread: epoll serves content that is already in memory
disk reads are assigned to separate worker threads
As long as all the fast work stays on the main thread, it serves lots of requests per second; the slow work can complete a little later (a minimal sketch follows)
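A minimal sketch of that fast-path/slow-path split (the real change was inside lighttpd, in C; this Python fragment only illustrates the division of labor, and the names are hypothetical):

```python
import queue
import threading

memory_cache = {}           # hot content already in memory (hypothetical)
disk_queue = queue.Queue()  # hand-off point between fast and slow paths

def disk_worker():
    # Slow path: blocking disk reads happen off the main thread.
    while True:
        path, reply = disk_queue.get()
        with open(path, "rb") as f:
            data = f.read()
        memory_cache[path] = data  # warm the cache for next time
        reply(data)

def handle_request(path, reply):
    # Fast path: stay on the main (event-loop) thread if data is in memory.
    data = memory_cache.get(path)
    if data is not None:
        reply(data)
    else:
        disk_queue.put((path, reply))  # defer to a worker thread

threading.Thread(target=disk_worker, daemon=True).start()
```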

-> BTFE
Google's BTFE
based on Bigtable (Google's distributed data store)
avoids the small-file problem
setting up new machines becomes faster and easier
fast, fault-tolerant
various forms of caching: caches at multiple levels so that the latency between server and user stays low

Databases
MySQL
only stores metadata (users, video descriptions, titles, tags, etc.)
In the beginning: one main DB and one backup
The main database was swapping even though it was not using more memory than the system had. In Linux 2.4 kernels, the page cache is considered more important than applications: with a 1.6 GB page cache and a 3.5 GB application on a machine with only 4 GB of physical memory, rather than reduce the size of the page cache the kernel swapped out MySQL. But MySQL needs to be in memory, so it kept coming back, and the machine was swapping in and out very frequently.

DB Optimizations
Query optimizations
Batch jobs whenever queries are too expensive to do online
memcache: instead of going to the database all the time, use memcached, a hash table on a TCP socket that lets you get and set data by a string key; using memory is better than using the disk
App server caching: caching exists at many levels (database caches, the database machine's OS caches, memcache, and the app server process's own memory space). For data accessed very often (hundreds of times/second), sometimes precalculate it and store it in the app server's memory (a memcached cache-aside sketch follows).
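A minimal cache-aside sketch, assuming the python-memcached client library (the key scheme and TTL are hypothetical):

```python
import memcache  # python-memcached client library (assumed)

mc = memcache.Client(["127.0.0.1:11211"])

def get_video_metadata(video_id, db_lookup):
    # Check memcached first; fall back to the database and cache the result.
    key = "video_meta:%s" % video_id   # hypothetical key scheme
    meta = mc.get(key)
    if meta is None:
        meta = db_lookup(video_id)     # slow path: hit the database
        mc.set(key, meta, time=300)    # assumed 5-minute TTL
    return meta
```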

Replication
Spread the read load across replica databases as traffic grows; the replicas replay the transactions that happen on the master database. All writes go to the master and then propagate to each replica attached to it, as an asynchronous process.


Every write happens in multiple places, and in some situations the writes alone can exhaust the databases;
if there are too many replicas, there are management issues.

Replica Lag (downside)
Replication is asynchronous
If the machines have different CPU speeds, each machine shows slightly different lag;
on the master database, updates come from multiple threads, each executing a single update;
on the replica database, MySQL serializes everything onto one thread, which can become a bottleneck: update 3 cannot execute until update 2 is finished, even if update 2 is a long update.


Unhealthy Replication Thread
Consider 5 updates where 3 touch pages that are not in memory and so cause cache misses. Cache misses stall the replication thread, and at a certain point the replica can no longer keep up with the master. The master can use all of its CPUs and disks, but the replication thread can effectively use only 1 CPU and 1 disk, which caps the replication speed.

DB updates involve:
1. reading the affected DB pages
2. applying the changes

MySQL uses one thread to buffer the SQL coming from the master database and another to execute it.
The solution: since replication runs behind, a cache primer thread can look at the not-yet-replayed SQL and pre-fetch the affected rows, so the pages are already in memory when the replication thread applies the change (a minimal sketch follows).
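A minimal sketch of the cache-primer idea (the UPDATE-to-SELECT rewrite and the accessor for pending statements are simplified and hypothetical; the real primer works against MySQL's relay log):

```python
import re

def prime_query(update_sql):
    # Rewrite a pending UPDATE into a SELECT over the same rows, so the
    # affected pages are pulled into memory before replication applies it.
    m = re.match(r"UPDATE\s+(\w+)\s+SET\s+.+?\s+WHERE\s+(.+)", update_sql,
                 re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    table, where = m.groups()
    return "SELECT 1 FROM %s WHERE %s" % (table, where)

def primer_loop(pending_updates, run_readonly_query):
    # pending_updates: not-yet-replayed statements (hypothetical accessor).
    for sql in pending_updates:
        select = prime_query(sql)
        if select:
            run_readonly_query(select)  # warms the cache; result is discarded
```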

Replica Pools
Since the replicas could not handle the entire database load, video watching was prioritized above everything else. Split the replica databases into two pools: one dedicated to video-watch queries and a general pool for everything else (a routing sketch follows).
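A minimal sketch of routing reads by purpose (pool names and contents are hypothetical):

```python
import random

# Hypothetical replica pools: watch-page reads get dedicated machines.
WATCH_POOL = ["replica-watch-1", "replica-watch-2"]
GENERAL_POOL = ["replica-gen-1", "replica-gen-2", "replica-gen-3"]

def pick_replica(query_kind):
    # Route video-watch reads to the dedicated pool; everything else elsewhere.
    pool = WATCH_POOL if query_kind == "watch" else GENERAL_POOL
    return random.choice(pool)
```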

Database Partitions
A gigantic database: split it into different pieces, each independent
partition by user
spreads writes AND reads
much better cache locality -> less I/O
30% hardware reduction with the same amount of traffic (a shard-routing sketch follows)
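A minimal sketch of partitioning by user (shard count and DSNs are hypothetical; the talk only says data is partitioned by user):

```python
NUM_SHARDS = 8  # assumed shard count
SHARD_DSNS = ["mysql://shard%d.example.com/yt" % i for i in range(NUM_SHARDS)]

def shard_for_user(user_id):
    # All rows for a given user live on one shard, so both reads and writes
    # for that user hit a single, smaller database with better cache locality.
    # Note: a real system needs a hash that is stable across processes;
    # Python's built-in hash() is randomized for strings by default.
    return SHARD_DSNS[hash(user_id) % NUM_SHARDS]
```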

Scalable systems
simple
solve the problem at hand
product of evolution: the complexity will come over time

Need the flexibility

Scalable techniques
divide and conquer: partition things, distribute things
approximate correctness: the state of the system is whatever it is reported to be; if users can't tell that the system is skewed or inconsistent, then it effectively isn't
e.g., if you write a comment while another person on the other side of the world is loading the same watch page, he or she may not care about your comment yet, but you will want to see it; the system only needs to make sure the writer sees the comment immediately, and accepting inconsistency for everyone else saves cost (a minimal sketch follows)
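A minimal read-your-own-writes sketch (the session pinning and the 30-second window are hypothetical illustrations of the idea, not YouTube's implementation):

```python
import time

RECENT_WRITE_WINDOW = 30  # seconds; assumed
_last_write = {}          # user_id -> timestamp of that user's last write

def record_comment_write(user_id):
    _last_write[user_id] = time.time()

def read_comments(user_id, read_master, read_replica):
    # Writers read from the master so they see their own comment at once;
    # everyone else reads from possibly-stale replicas.
    if time.time() - _last_write.get(user_id, 0) < RECENT_WRITE_WINDOW:
        return read_master()
    return read_replica()
```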
expert knob twiddling: consistency, durability
jitter: introduces randomness, because things have a tendency to stack up
e.g., cache expirations: a popular video gets cached, and while the usual cache timeout might be 10 minutes or an hour, yesterday's most popular video might be cached for 24 hours. If everything expired at one given moment, every machine would observe the expiration at exactly the same time and go back and try to recompute it, which is awful. By jittering, the desired expiration stays about 24 hours, but it is randomly spread between roughly 18 and 30 hours, which prevents things from stacking up, because systems have a tendency to synchronize and line up (a minimal sketch follows).
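A minimal jitter sketch (the +/-25% spread is an assumed way to get the 18 - 30 hour range mentioned above):

```python
import random

def jittered_ttl(base_hours=24, spread=0.25):
    # Spread expirations around the target TTL (+/-25% of 24 hours is
    # roughly 18-30 hours) so cached items don't all expire at once.
    return base_hours * random.uniform(1 - spread, 1 + spread)
```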
cheating: when you have a monotonically increasing counter like a view count, you could do a full transaction every time it updates, or you can do a real transaction only once in a while and update the stored value by the accumulated amount; as long as the change looks plausible, people will believe it is real (a minimal sketch follows).
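A minimal sketch of the batched-counter cheat (the batch size is hypothetical):

```python
class CheatingCounter(object):
    # Commit a real database transaction only every `batch` increments;
    # in between, the stored value lags the true count by up to `batch`.
    def __init__(self, committed=0, batch=100):
        self.committed = committed  # last durably stored value
        self.pending = 0            # increments not yet written through
        self.batch = batch          # assumed batch size

    def increment(self, commit_txn):
        self.pending += 1
        if self.pending >= self.batch:
            commit_txn(self.committed + self.pending)  # one real transaction
            self.committed += self.pending
            self.pending = 0
```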

Scalable components
well-defined inputs
well-defined dependencies
frequently end up behind an RPC (remote procedure call)
leverage local optimizations

Efficiency
uncorrelated with scalability
the most efficient design would cram everything into one process, but that cannot scale
focus on algorithms: at the macro level, the way the individual components break out matters less
libraries:
wiseguy
pycurl
spitfire
serialization formats

tools
Apache
Linux
MySQL
Vitess
ZooKeeper







