Tuesday, December 15, 2015

High Level Architecture of Hadoop

HDFS

HDFS is the Hadoop Distributed File System. As its name implies, it provides a distributed environment for storage, and its file system is designed to run on commodity hardware. Several vendors in the market, such as Cloudera and Hortonworks, ship Hadoop distributions built around HDFS, each with its own special advantages.
HDFS provides a high degree of fault tolerance and runs on cheap commodity hardware. By adding nodes to the cluster using cheap commodity hardware, Apache Hadoop can process large data sets and give the client better results than existing distributed systems. Each block of data is replicated to 3 nodes (the default replication factor), so even if one node goes down there is no loss of data, because the remaining replicas serve as the backup and recovery mechanism.
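As a concrete illustration of this replication, the sketch below uses the HDFS Java client API to list a file's blocks and the DataNodes holding each replica. The NameNode URI hdfs://namenode:8020 and the path /user/demo/sample.txt are placeholders, and the sketch assumes a reachable HDFS cluster with the Hadoop client libraries on the classpath.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReplicaInspector {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; replace with your cluster's fs.defaultFS.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path file = new Path("/user/demo/sample.txt");   // placeholder path
        FileStatus status = fs.getFileStatus(file);

        // Each BlockLocation lists every DataNode that holds a replica of that block,
        // so losing one node still leaves the other replicas available.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("offset=" + block.getOffset()
                    + " length=" + block.getLength()
                    + " hosts=" + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}

With the default replication factor of 3, each printed block should list three hosts; if one of those DataNodes fails, reads continue from the other two.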
So HDFS provides concepts like the Replication Factor and a large block size (64 MB or 128 MB by default, depending on the version), and it can scale out to several thousand nodes.
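To make the replication factor and block size concrete, here is a minimal sketch (again with a placeholder cluster URI and path) that writes a file with an explicit replication factor of 3 and a 128 MB block size, using the FileSystem.create overload that accepts both values.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicatedFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf); // placeholder URI

        Path file = new Path("/user/demo/replicated.txt"); // placeholder path
        short replication = 3;                  // replication factor: 3 copies of every block
        long blockSize = 128L * 1024 * 1024;    // 128 MB block size
        int bufferSize = conf.getInt("io.file.buffer.size", 4096);

        // create(path, overwrite, bufferSize, replication, blockSize)
        FSDataOutputStream out = fs.create(file, true, bufferSize, replication, blockSize);
        out.writeUTF("hello hdfs");
        out.close();

        // Confirm the values recorded by the NameNode for this file.
        System.out.println("replication=" + fs.getFileStatus(file).getReplication()
                + " blockSize=" + fs.getFileStatus(file).getBlockSize());
        fs.close();
    }
}

The same defaults can also be set cluster-wide through the dfs.replication and dfs.blocksize properties in hdfs-site.xml instead of per file.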
