W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À...

10
High level data flow language for exploring very large datasets. PIG Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy[2] and then call directly from the language. Apache Pig was originally developed at Yahoo Research around 2006 for researchers to have an ad-hoc way of creating and executing MapReduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation.

Transcript of W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À...

Page 1: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u

High level data flow language for exploring very large datasets.

PIG

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoopjobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for RDBMSs. Pig Latin can be extended using User Defined Functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy[2] and then call directly from the language.

Apache Pig was originally developed at Yahoo Research around 2006 for researchers to have an ad-hoc way of creating and executing MapReduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation.

Page 2: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u

Why use Pig

Page 3: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u

Pig is a powerful tool for querying data in a Hadoop cluster. It's so powerful that Yahoo! estimates that between 40% and 60% of its Hadoop workloads are generated from Pig Latin scripts. With 100,000 CPUs at Yahoo! and roughly 50% running Hadoop, that's a lot of pigs running around.

Pig Users

running Hadoop, that's a lot of pigs running around.

But Yahoo! isn't the only organization taking advantage of Pig. You'll find Pig at Twitter (processing logs, mining tweet data); t AOL and MapQuest (for analytics and batch data processing); and at LinkedIn, where Pig is used to discover people you might know.

Ebay is reportedly using Pig for search optimization, and adyard uses Pig in about half of its recommender system.

Page 4: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u

Where not to Use Pig

Page 5: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u

Apache Pig - Architecture

Page 6: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u
Page 7: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u
Page 8: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u
Page 9: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u
Page 10: W/' · ,mkl pizip hexe jps[ perkyeki jsv i\tpsvmrk ziv] pevki hexewixw w/' z w ] p ] z ] p z r o À o o ( } u ( } ] v p } p u