Sunday, December 20, 2015

Functionalities of Hadoop daemons

Name Node, Secondary Name Node and Job Tracker are known as master daemons; Task Tracker and Data Node are known as slave daemons. Whenever a client submits a job, the actual processing takes place on the Task Trackers.
The Name Node (NN) holds all of the metadata, such as which blocks make up a file and which racks and Data Nodes hold them (r1, r2, b1, b2). Data Nodes report to the NN through heartbeats every 3 seconds, and the NN's image file (fsimage) is periodically handed over to the Secondary Name Node for checkpointing. The Name Node also has a special feature, "Rack Awareness": it knows which nodes sit in which racks of the Hadoop cluster environment (see the configuration sketch after the list below).
  • Secondary Name Node (SNN) - keeps a recent copy of the NN's image through periodic checkpoints; if the NN goes down, this image is what allows the Name Node's functionality to be brought back.
  • Job Tracker (JT) - handles job scheduling and maintenance. It assigns tasks to the Task Trackers and monitors them.
  • Task Tracker (TT) - where the actual computational processing happens. It stays in contact with the JT through a heartbeat mechanism.
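A minimal sketch of how the heartbeat interval and the rack-awareness script are typically wired up, assuming Hadoop 1.x property names (dfs.heartbeat.interval, topology.script.file.name) and a hypothetical script path - illustrative values, not settings taken from this post:

    import org.apache.hadoop.conf.Configuration;

    public class ClusterTuningSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Data Nodes report to the Name Node at this interval (seconds)
            conf.setInt("dfs.heartbeat.interval", 3);

            // Rack awareness: an admin-supplied script maps each node to its rack id
            // (hypothetical path; any executable that prints e.g. /rack1 will do)
            conf.set("topology.script.file.name", "/etc/hadoop/rack-topology.sh");

            System.out.println("heartbeat interval = " + conf.get("dfs.heartbeat.interval") + " s");
            System.out.println("topology script    = " + conf.get("topology.script.file.name"));
        }
    }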
A Hadoop setup consists of a single master node and multiple worker nodes. In a single-node setup, the one machine runs the Job Tracker, Task Tracker, Name Node (NN) and Data Node (DN) together; in a larger setup each slave (worker) node acts as both a Data Node and a Task Tracker. For high-level application development a single-node setup offers only limited resources (memory, storage capacity), so it is not well suited beyond learning and testing.
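For a single-node setup, a client simply points its default file system at the local Name Node. A minimal sketch, assuming the Name Node listens on hdfs://localhost:9000 (the port is an assumption, not from the post):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class SingleNodeClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hadoop 1.x property name; newer releases use fs.defaultFS
            conf.set("fs.default.name", "hdfs://localhost:9000");

            FileSystem fs = FileSystem.get(conf);
            System.out.println("Default FS: " + fs.getUri());
            fs.close();
        }
    }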

In a multi-node Hadoop setup (cluster setup), HDFS has a Name Node server and a Secondary Name Node that captures snapshots of the Name Node's metadata at regular intervals. If the Name Node shuts down, this latest snapshot is what allows the Name Node's metadata, and its instructions to the Job Tracker, to be restored.
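How often those snapshots are taken is configurable. A sketch, assuming the Hadoop 1.x property names fs.checkpoint.period and fs.checkpoint.size; one hour and 64 MB are the commonly cited defaults, not values from this post:

    import org.apache.hadoop.conf.Configuration;

    public class CheckpointSettingsSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Checkpoint the Name Node metadata every hour (seconds) ...
            conf.setLong("fs.checkpoint.period", 3600);
            // ... or sooner, once the edit log reaches this size (bytes)
            conf.setLong("fs.checkpoint.size", 64L * 1024 * 1024);

            System.out.println("checkpoint period = " + conf.get("fs.checkpoint.period") + " s");
            System.out.println("checkpoint size   = " + conf.get("fs.checkpoint.size") + " bytes");
        }
    }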

Writing data into HDFS

As shown in figure 2, when a client wants to write a file into HDFS, it first asks the Name Node for block locations, i.e. the Data Nodes on which the blocks should be stored. The Name Node replies with block locations on Data Nodes that currently have free space. The client then contacts those Data Nodes directly and writes the data into the blocks it was given. As a result, the NN holds only the metadata of the data that is actually stored on the Data Nodes.
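A short sketch of this write path using the Hadoop FileSystem API; the Name Node address and file path are assumptions for illustration. The create() call is what talks to the Name Node, while the returned stream writes directly to the chosen Data Nodes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000"); // assumed NN address

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/hello.txt");        // hypothetical path

            // create() asks the Name Node where to place the blocks;
            // the stream then sends the bytes straight to the Data Nodes
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("Hello HDFS");
            }
            fs.close();
        }
    }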

Reading data from HDFS

The client sends a request to the NN, asking where the file is stored in HDFS. The NN replies with the Data Nodes that hold the file's blocks. The client then contacts those Data Nodes directly and retrieves the file. The NN itself always holds only the metadata, such as blocks, racks and nodes.
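The same read path in code: the block-location lookup goes to the Name Node, and the open()/read calls pull the data from the Data Nodes. Again the Name Node address and file path are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://namenode:9000"); // assumed NN address

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/hello.txt");        // hypothetical path

            // Ask the Name Node which Data Nodes hold each block of the file
            FileStatus status = fs.getFileStatus(file);
            for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("block on hosts: " + String.join(",", loc.getHosts()));
            }

            // Read the data directly from the Data Nodes
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
            fs.close();
        }
    }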
