Sunday, January 3, 2016

Pig Use Case in Telecom Industry

The telecom industry generates huge amounts of data (call details). To process this data, we use Pig to de-identify users' call information.

The first step is to store the data in HDFS. Pig scripts are then applied to the loaded data to refine the user call data and extract important call information such as time rate, repetition rate, and selected log details. Once the de-identified information is produced, the result is stored back in HDFS.
In this way, the huge volume of data arriving at the system servers is stored in HDFS and processed with scripts. During processing, the scripts filter the data, iterate over it, and produce the results.
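The flow above (load, filter, iterate, group, store) can be sketched in a few lines. The example below is plain Python rather than Pig Latin, and it only mirrors the shape of such a script. The sample records, the field layout, the `SALT` value, and the helper names `de_identify` and `process` are all assumptions made for illustration, and the salted hash is a simple stand-in for whatever masking a real pipeline would apply. "Repetition rate" is interpreted here as the number of calls per caller and callee pair, which is also an assumption.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample records: (caller, callee, start_time, duration_seconds).
RAW_CALLS = [
    ("555-0100", "555-0199", "2016-01-03T10:00:00", 120),
    ("555-0100", "555-0199", "2016-01-03T11:30:00", 45),
    ("555-0142", "555-0100", "2016-01-03T12:15:00", 300),
    ("555-0142", "", "2016-01-03T12:20:00", 0),  # malformed: no callee
]

SALT = b"example-salt"  # assumption: a secret value kept outside the data set


def de_identify(number: str) -> str:
    """Replace a phone number with a truncated, salted SHA-256 digest."""
    return hashlib.sha256(SALT + number.encode("utf-8")).hexdigest()[:16]


def process(calls):
    # Filter step: drop records with a missing number or a non-positive duration.
    valid = [c for c in calls if c[0] and c[1] and c[3] > 0]

    # Iterate step: replace the identifying fields with hashed tokens.
    masked = [(de_identify(a), de_identify(b), t, d) for a, b, t, d in valid]

    # Group step: per caller/callee pair, count calls and sum the duration.
    stats = defaultdict(lambda: {"calls": 0, "total_seconds": 0})
    for caller, callee, _, duration in masked:
        entry = stats[(caller, callee)]
        entry["calls"] += 1
        entry["total_seconds"] += duration
    return dict(stats)


if __name__ == "__main__":
    # In a real job the result would be written back to HDFS; here it is printed.
    for pair, summary in process(RAW_CALLS).items():
        print(pair, summary)
```

In an actual Pig script, the three commented steps would roughly correspond to the FILTER, FOREACH ... GENERATE, and GROUP operators, with LOAD and STORE reading from and writing to HDFS.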
IT companies that use Pig to process their data include Yahoo, Twitter, LinkedIn, and eBay. They use Pig to run most of their MapReduce (MR) jobs. Pig is mainly used for web log processing, typical data mining tasks, and image processing.

Conclusion
Pig is popular because it provides a data flow model and a parallel execution mechanism that runs jobs across a cluster. In terms of flexibility, its high-level scripting language gives programmers an easy interface for processing data and getting results efficiently. Pig also applies optimization techniques so that data flows smoothly across the cluster.
Built-in filtering, grouping, and iteration operators reduce the complexity of the code and let it run effectively. Last but not least, Pig as a whole addresses the key Big Data characteristics of volume, velocity, and variety through its high-level data flow language.
