Thursday, December 24, 2015

Job Execution Inside Hive

Hive query processing life cycle

HiveServer is a service that exposes an API allowing clients (for example, over JDBC) to execute queries against the Hive data warehouse and retrieve the results. Within the Hive services layer, the Driver, Compiler, and Execution Engine interact with one another to process each query.
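As a concrete illustration, here is a minimal sketch of a JDBC client talking to HiveServer2. The host, port, database, and the "employees" table are assumptions made for the example, not values from this post.

// Minimal HiveServer2 JDBC client sketch.
// Assumes HiveServer2 listens on localhost:10000 and a table named
// "employees" exists in the "default" database (both hypothetical).
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcClient {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (ships with hive-jdbc).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // The query is handed to the Driver, compiled, and executed.
             ResultSet rs = stmt.executeQuery(
                 "SELECT name, salary FROM employees LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}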
The client submits the query through a UI or client application. The Driver receives the query first; it creates a session handle for the statement and exposes execute and fetch APIs modeled on interfaces such as JDBC and ODBC. The Compiler then builds the plan for the job to be executed. To do this, the Compiler contacts the Metastore and retrieves the metadata (table and partition definitions) it needs to type-check the query and produce the execution plan.
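To see the plan the Compiler produces, you can run EXPLAIN on a query. The sketch below (same hypothetical connection and table as above) prints the stage graph that the Driver will hand to the Execution Engine.

// Print the compiled plan for a query using EXPLAIN.
// Connection details and the "employees" table are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveExplainPlan {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // EXPLAIN returns the stage DAG (map-reduce stages, fetch
             // stage, and so on) instead of query results.
             ResultSet rs = stmt.executeQuery(
                 "EXPLAIN SELECT dept, COUNT(*) FROM employees GROUP BY dept")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));  // one plan line per row
            }
        }
    }
}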

The Execution Engine (EE) is the key component that executes the query, communicating directly with the JobTracker, the NameNode, and the DataNodes. As discussed earlier, a Hive query is translated at the backend into a series of MapReduce (MR) jobs. In this sense, the Execution Engine acts as a bridge between Hive and Hadoop while processing the query. For DFS operations, the EE contacts the NameNode.
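As an illustration of that "series of MR jobs": a query that both aggregates and globally sorts typically compiles into two chained MapReduce stages, one to group and aggregate and a second to perform the global sort. The sketch below (same hypothetical setup as before) runs such a query; on a MapReduce backend, the EE submits each stage to the cluster in turn.

// A query with both GROUP BY and ORDER BY typically compiles into two
// chained MR jobs: one to aggregate, one to perform the global sort.
// Connection details and the "employees" table remain hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveTwoStageQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT dept, AVG(salary) AS avg_sal " +
                 "FROM employees GROUP BY dept ORDER BY avg_sal DESC")) {
            // The EE submits each MR stage to the cluster, then fetches
            // the final results back for the client.
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}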
At the end, the EE fetches the desired results from the DataNodes, and it maintains bi-directional communication with the Metastore. In Hive, SerDe (Serializer/Deserializer) is the framework that serializes and de-serializes input and output data, converting between the raw bytes stored in HDFS and the row objects Hive operates on.
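To make the SerDe role concrete, here is a sketch that creates a table backed by Hive's built-in OpenCSVSerde, so Hive can deserialize CSV files in HDFS into rows. The table name and connection details are hypothetical, and note that OpenCSVSerde treats every column as a string.

// Create a table whose rows are (de)serialized by the built-in CSV SerDe.
// Table name and connection details are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveSerDeExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS employees_csv (" +
                "  name STRING, dept STRING, salary STRING) " +
                "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' " +
                "STORED AS TEXTFILE");
            // On read, the SerDe turns raw bytes from HDFS into typed rows;
            // on write, it does the reverse.
        }
    }
}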
The Metastore holds all of Hive's metadata, and backup services exist to back up the Metastore information. In the default embedded mode, the Metastore service runs in the same JVM as the Hive service itself. It stores the structural information of tables, their columns and column types, and likewise the partition structure information.
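The metadata held by the Metastore is easy to inspect from a client. The sketch below prints the schema and partition details Hive records for a table, again with a hypothetical connection and a hypothetical partitioned table named "sales".

// Inspect table metadata that Hive keeps in the Metastore.
// Connection details and the "sales" table are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveMetastoreInspect {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Columns, types, SerDe, storage location, and more.
            try (ResultSet rs = stmt.executeQuery("DESCRIBE FORMATTED sales")) {
                while (rs.next()) {
                    System.out.printf("%s %s %s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
                }
            }
            // Partition structure recorded in the Metastore.
            try (ResultSet rs = stmt.executeQuery("SHOW PARTITIONS sales")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}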
