HDFS (Hadoop Distributed File System)
Master-slave architecture
Master:
- Name Node:
- controls all the Data Nodes; it is the controller that handles all file system operations (e.g., creating or deleting a file goes through the Name Node); it manages the file system namespace, a snapshot of what the file system looks like
- also handles block mapping (when a file is put into the file system, the file is broken into blocks and spread over the Data Nodes)
- single point of failure: it holds the filesystem metadata, so once the Name Node fails, the whole cluster fails
- Holds in memory a map of the entire cluster
- use secondary name node for a copy
- replication value/factor: how many copies of each block are kept across the cluster
- store everything in memory
- mapping of block_ids to Data Nodes
- journal/edit log: a record of edits and changes; whatever change a client requests, that change is written to the edit log on disk, so it is not only in memory
- getting the information from the edit log back into memory happens on a reboot of the Name Node, which forces a checkpoint process (see the checkpoint sketch after the Master section)
- a reboot forces a checkpoint: the Name Node takes what is in memory, writes it to disk, and merges the edit log with that file written to disk (the fsImage)
- after the reboot, it reloads what is in the fsImage back into memory
- if we do not reboot the Name Node very often, the edit log keeps growing, and if we lose the Name Node we lose all the changes on it
- monitors health
- clients communicate with it; it acts as the controller
- never initiates communication with other nodes (they talk to it)
- Secondary Name Node (checkpoint node): takes a snapshot of the Name Node every now and then; a system restore point for the Name Node
- takes over the responsibility of merging the edit log with the filesystem image
- housekeeping node for the Name Node
- Metadata backup: periodically pulls the edit log and fsImage from the Name Node, merges them, and uploads the merged fsImage back to the Name Node
- not a high-availability server (it is not a hot standby)
- Job Tracker (in MapReduce): controller for all Task Trackers
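The checkpoint flow described above (in-memory namespace, on-disk edit log, periodic merge into the fsImage, with the Secondary Name Node doing the merge) can be summarized in a minimal sketch. This is a hypothetical, simplified model, not the real Hadoop code; every class, field, and method name here is made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the Name Node's metadata bookkeeping:
// the live namespace lives in memory, every mutation is appended to an
// edit log, and a checkpoint folds the pending edits into a persisted fsImage.
public class NameNodeCheckpointSketch {

    // In-memory namespace: path -> list of block ids (the "map of the cluster").
    private final Map<String, List<Long>> namespace = new HashMap<>();

    // Append-only edit log of changes not yet folded into the fsImage.
    private final List<String> editLog = new ArrayList<>();

    // Last persisted snapshot of the namespace (stands in for the on-disk fsImage).
    private Map<String, List<Long>> fsImage = new HashMap<>();

    public void createFile(String path) {
        namespace.put(path, new ArrayList<>());
        editLog.add("CREATE " + path);        // the change is journaled, not only kept in memory
    }

    public void addBlock(String path, long blockId) {
        namespace.get(path).add(blockId);
        editLog.add("ADD_BLOCK " + path + " " + blockId);
    }

    // Checkpoint: persist the current namespace as the new fsImage and truncate the log.
    // In a real cluster the Secondary Name Node performs this merge and
    // ships the resulting fsImage back to the Name Node.
    public void checkpoint() {
        fsImage = new HashMap<>(namespace);   // namespace already reflects all logged edits
        editLog.clear();                      // those edits are now captured in the fsImage
    }

    public static void main(String[] args) {
        NameNodeCheckpointSketch nn = new NameNodeCheckpointSketch();
        nn.createFile("/data/a.txt");
        nn.addBlock("/data/a.txt", 1001L);
        nn.checkpoint();                      // edit log is empty again, fsImage is current
        System.out.println("fsImage: " + nn.fsImage);
    }
}
```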
Slave:
- Data Nodes:
- Workhorse of the system: performs all block operations and receives instructions from the Name Node. E.g., a client asks the Name Node for a list of where the desired file's blocks are stored, then communicates directly with the Data Nodes holding those blocks to read the file (see the read sketch after this section).
- Writing and retrieving blocks to/from disk
- Communicate directly with clients or the Name Node:
- a Data Node sends a heartbeat signal to the Name Node every 3 seconds so the Name Node knows it is online
- if a Data Node has sent no signal for 10 minutes, the Name Node considers that node dead and re-replicates all blocks that were on it across the other nodes
- also perform replication
- Task trackers
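The read path described under Data Nodes (ask the Name Node where the blocks are, then stream them straight from the Data Nodes) is what the standard Hadoop FileSystem client does behind a single open() call. A minimal sketch using that client API; the Name Node address and file path are placeholders.

```java
import java.io.InputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder Name Node address; in practice this comes from core-site.xml (fs.defaultFS).
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // open() asks the Name Node for the block locations;
        // the returned stream then reads the blocks directly from the Data Nodes.
        try (InputStream in = fs.open(new Path("/data/example.txt"))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) > 0) {
                System.out.write(buffer, 0, n);
            }
        }
        fs.close();
    }
}
```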
File Blocks:
- By default, data is saved on disk in 64 MB blocks;
- block size affects the metadata: with a small number of large files, the metadata footprint is small; on the other hand, if files are tiny (e.g., 8 KB each), there is far too much metadata (the block-metadata sketch after this section shows how to inspect block size, replication, and locations)
- Block placement: to limit inter-rack communication, the Name Node calculates the distance (network bandwidth) between Data Nodes and places each file's blocks as close to each other as possible, preferably within a rack
- Rebalancing
- balancer tool: rebalances, i.e., redistributes blocks among all the nodes
- Replication management
- every 10th heartbeat from a Data Node is actually a block report; from it the Name Node can figure out whether a block is under- or over-replicated. If it is over-replicated, the Name Node removes a copy. If it is under-replicated, the block goes onto a priority queue, and the blocks with the fewest copies in the cluster are highest on the list.
- Every Data Node has a block scanner that checks the integrity of all its blocks, and this directly feeds the block report: if a node happens to have a corrupt block, the Name Node recognizes it from the report, removes the corrupt copy, and re-replicates a good copy to that node.
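Block size, replication factor, and the block-to-Data-Node mapping that the Name Node maintains can all be inspected from a client through the same FileSystem API. A minimal sketch, again with a placeholder Name Node address and file path:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
        Path file = new Path("/data/example.txt");

        FileStatus status = fs.getFileStatus(file);
        System.out.println("block size:  " + status.getBlockSize());    // e.g. 64 MB (older default) or 128 MB
        System.out.println("replication: " + status.getReplication());  // copies kept of each block

        // Ask the Name Node which Data Nodes hold each block of the file.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + " length " + block.getLength()
                    + " hosts " + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```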
Source: https://www.youtube.com/watch?v=Ll7tVuQhRWI
Multiple rack cluster
the roles above are split across separate racks
- Challenges:
- Data loss prevention: use replication, i.e., keep multiple copies of the data
- Rack awareness: understanding the network topology. This is a manual step: the network topology is described to the Name Node. Once the Name Node understands the topology and is rack aware, it ensures that copies of the data sit on multiple racks, so that if a rack is lost the Name Node can re-replicate the data from copies on another rack (see the placement sketch after this list).
- Network performance:
- assumption: intra-rack communication has much higher bandwidth and lower latency than cross-rack communication
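The rack-awareness goal (no block should have all of its copies in one rack, while most traffic stays intra-rack) is easiest to see in the default placement heuristic HDFS uses for a replication factor of 3: first replica on the writer's node, second on a node in a different rack, third on another node in that second rack. The sketch below is a hypothetical illustration of that heuristic in plain Java; it does not use any real Hadoop classes, and the node/rack names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the default HDFS placement heuristic for 3 replicas:
// replica 1 on the writer's node, replica 2 on a node in a different rack,
// replica 3 on another node in the same rack as replica 2.
public class RackPlacementSketch {

    // nodeToRack maps Data Node -> rack id, e.g. "dn1" -> "/rack1"
    // (normally derived from the topology description given to the Name Node).
    static List<String> chooseTargets(Map<String, String> nodeToRack, String writerNode) {
        String writerRack = nodeToRack.get(writerNode);
        List<String> targets = new ArrayList<>();
        targets.add(writerNode);                                   // replica 1: writer's node

        String remoteRack = null;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!e.getValue().equals(writerRack)) {                // replica 2: different rack
                targets.add(e.getKey());
                remoteRack = e.getValue();
                break;
            }
        }
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (e.getValue().equals(remoteRack) && !targets.contains(e.getKey())) {
                targets.add(e.getKey());                           // replica 3: same remote rack
                break;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, String> topology = new LinkedHashMap<>();
        topology.put("dn1", "/rack1");
        topology.put("dn2", "/rack1");
        topology.put("dn3", "/rack2");
        topology.put("dn4", "/rack2");
        // Losing /rack1 still leaves two copies on /rack2.
        System.out.println(chooseTargets(topology, "dn1"));        // [dn1, dn3, dn4]
    }
}
```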
HDFS Features: