Monday, December 21, 2015

Job Execution and its Workflow

Let’s say we submit a job, written in Java, that has to process 1000 MB of data. Once the client submits the job, it contacts the NameNode (NN) to find out which resources are available to execute the job. The NN provides DataNode information to the JobTracker (JT), and depending on the availability of resources, the JT splits the job into tasks and assigns them to TaskTrackers.
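
To make that flow concrete, here is a minimal sketch of a client submitting a MapReduce job with the Hadoop Java API. The job name and the input/output paths are placeholders for this example, and the base Mapper and Reducer classes stand in for real application logic; a real job would subclass them.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobSubmitter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The client builds and submits the job; the framework then consults the
        // NameNode for block locations and hands tasks out to TaskTrackers.
        Job job = Job.getInstance(conf, "sample-job");
        job.setJarByClass(JobSubmitter.class);

        // Identity Mapper/Reducer used here so the sketch compiles on its own.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Illustrative HDFS paths, e.g. a ~1000 MB input directory.
        FileInputFormat.addInputPath(job, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job, new Path("/data/output"));

        // Blocks until the job finishes; exit code reflects success or failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```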

Suppose the job input is 1000 MB (about 1 GB), and assume the JT splits the work into 10 tasks, allocating roughly 100 MB to each. In practice, how much work each TaskTracker handles depends on the input split size, which normally matches the HDFS block size (64 MB or 128 MB).
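
As a rough back-of-the-envelope check (assuming the default of one map task per input split, with the split size equal to the HDFS block size), a 1000 MB input does not necessarily produce exactly 10 tasks; the count follows from the block size:

```java
public class SplitCountEstimate {
    // Rough estimate: one map task per input split, split size ~= HDFS block size.
    static long estimateSplits(long inputSizeMb, long blockSizeMb) {
        return (inputSizeMb + blockSizeMb - 1) / blockSizeMb; // ceiling division
    }

    public static void main(String[] args) {
        long inputMb = 1000;
        System.out.println("64 MB blocks  -> " + estimateSplits(inputMb, 64) + " map tasks");  // 16
        System.out.println("128 MB blocks -> " + estimateSplits(inputMb, 128) + " map tasks"); // 8
    }
}
```

So with 64 MB blocks the same 1000 MB input yields about 16 map tasks, and with 128 MB blocks about 8.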
Coming to the limitations of open-source Apache Hadoop: Hadoop 1.0 scales to about 5,000 nodes per cluster and around 40,000 concurrent tasks, which works out to roughly 8 concurrent tasks per node. Hadoop 2.2 (with YARN) overcomes these limitations of Hadoop 1. Enterprise Hadoop editions add features such as distributing storage in the cloud and extensibility to upgrade nodes.

An enterprise edition provides options according to client requirements. Clients choose among these Hadoop editions based on factors such as the company's data usage and data storage needs. Enterprise editions like Cloudera, Hortonworks, and IBM BigInsights are all built on top of Apache Hadoop.
Conclusion
As a whole, the Hadoop architecture provides both storage and processing of jobs as a distributed framework. Compared with existing methods of storing and processing large data sets, Hadoop offers additional advantages in scalability and cost. Complete enterprise editions such as Cloudera and Hortonworks provide a full environment for Hadoop along with cluster maintenance and support.
