The telecom industry generates huge amounts of data in the form of call detail records (CDRs). To process this data, we use Pig to de-identify the users' call information.
The first step is to store the data in HDFS. Pig scripts are then applied to the loaded data to refine the user call records and extract the important call information, such as the time rate, the repetition rate, and some important log information. Once the de-identified information has been produced, the result is stored back in HDFS.
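Inside a Pig script, the de-identification itself would typically be written as a user-defined function (UDF). The Python sketch below only illustrates the idea outside of Pig. It replaces each phone number with a keyed SHA-256 digest (HMAC), so calls from the same user can still be grouped, but the number cannot be recovered from the token without the key. The record layout (`CallRecord` with caller, callee, start time and duration) and `SECRET_KEY` are hypothetical assumptions, not part of the original pipeline.

```python
import hashlib
import hmac
from typing import NamedTuple

# Hypothetical secret key; in a real job it would be kept out of the script.
SECRET_KEY = b"example-secret-key"


class CallRecord(NamedTuple):
    """Hypothetical call detail record layout (assumption, not a real schema)."""
    caller: str
    callee: str
    start_time: str   # e.g. "2023-01-05 10:15:00"
    duration_sec: int


def pseudonymize(number: str, key: bytes = SECRET_KEY) -> str:
    """Replace a phone number with a keyed SHA-256 digest (HMAC).

    The same number always maps to the same 64-character hex token, so
    records can still be grouped per user without exposing the number.
    """
    return hmac.new(key, number.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: CallRecord) -> CallRecord:
    """Return a copy of the record with both phone numbers pseudonymized."""
    return record._replace(
        caller=pseudonymize(record.caller),
        callee=pseudonymize(record.callee),
    )


if __name__ == "__main__":
    raw = [
        CallRecord("+15550001111", "+15550002222", "2023-01-05 10:15:00", 120),
        CallRecord("+15550001111", "+15550003333", "2023-01-05 11:02:00", 45),
    ]
    for rec in map(deidentify, raw):
        # Print shortened tokens; the non-identifying fields are unchanged.
        print(rec.caller[:12], rec.callee[:12], rec.duration_sec)
```

A keyed hash is used rather than a plain hash because phone numbers are short and predictable, so an unkeyed digest could be reversed by hashing every possible number.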
In this way, huge amounts of data arrive at the system servers, are stored in HDFS, and are processed using Pig scripts. During this processing, the scripts filter the data, iterate over it, and produce the results.
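This filter, iterate and produce flow corresponds roughly to the Pig Latin operators FILTER, FOREACH ... GENERATE and GROUP. The sketch below is a plain Python analogue, not Pig itself: it drops malformed records, projects the needed fields, and groups calls per (already de-identified) caller. Because the text does not define "time rate" or "repetition rate", the metrics here are hypothetical stand-ins: total call time per caller, and the share of a caller's calls that go to a callee they have already called.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (caller_token, callee_token, duration_sec) tuples,
# e.g. the output of a de-identification step.
Call = Tuple[str, str, int]


def summarize(calls: Iterable[Call]) -> Dict[str, dict]:
    # FILTER: drop records with a missing caller or a non-positive duration.
    valid = (c for c in calls if c[0] and c[2] > 0)

    # GROUP: collect the (callee, duration) pairs of each caller.
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for caller, callee, duration in valid:  # FOREACH ... GENERATE: project fields
        grouped[caller].append((callee, duration))

    # Produce one summary row per caller (every group has at least one call).
    summary = {}
    for caller, rows in grouped.items():
        total_calls = len(rows)
        distinct_callees = len({callee for callee, _ in rows})
        summary[caller] = {
            "calls": total_calls,
            "total_duration_sec": sum(d for _, d in rows),
            # Hypothetical "repetition rate": share of calls that go to a
            # callee this caller has already called.
            "repetition_rate": (total_calls - distinct_callees) / total_calls,
        }
    return summary


if __name__ == "__main__":
    sample = [
        ("u1", "v1", 60),
        ("u1", "v1", 30),
        ("u1", "v2", 90),
        ("u2", "v3", 0),   # filtered out: zero duration
        ("", "v4", 40),    # filtered out: missing caller
    ]
    for caller, row in summarize(sample).items():
        print(caller, row)
```

In the sample, only `u1` survives the filter: three calls totalling 180 seconds, one of which repeats an earlier callee.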
IT companies that use Pig to process their data include Yahoo, Twitter, LinkedIn, and eBay; they use Pig to run most of their MapReduce (MR) jobs. Pig is mainly used for web log processing, typical data mining tasks, and image processing.