Tuesday, December 29, 2015

Hadoop Pig

Pig is developed on top of Hadoop. It provides the data flowing environment for processing large sets of data. The pig provides a high-level language. It is an alternative abstraction on top of Map Reduce (MR). Pig program supports parallelization mechanism. For scripting of the Pig, it provides Pig Latin language.
Pig takes Pig Latin scripts and turns into a series of MR jobs. Pig scripting having its own advantages of running the applications on Hadoop from the client side. The nature of programming is easy, compared to low level languages such as Java. It provides simple parallel execution, the user can write and use its own custom defined functions to perform unique processing.
Pig Latin provides several operators like LOAD, FILTER, SORT, JOIN, GROUP, FOREACH and STORE for performing relational data operations. The operators implement data transformation tasks with simple lines of codes. Compared to MR code, the Pig Latin codes are very less in lines and gives better flexibility in some IT industrial use cases.

No comments: