An introduction to Pig with exercises

An introduction to Pig, by Zoltan C. Toth (https://www.linkedin.com/in/zoltanctoth)

Page 1: An introduction to Pig with exercises

An introduction to Pig

Zoltan C. Toth

https://www.linkedin.com/in/zoltanctoth

Page 3: An introduction to Pig with exercises

What is Pig?

Pig is a platform that makes big data analytics on unstructured data easier.

Page 4: An introduction to Pig with exercises

Java MapReduce, the native language of big data, is hard.

How many page views did we have on our website yesterday?

In Java MapReduce, this takes 59 lines of code.

Page 5: An introduction to Pig with exercises

Pig is much easier for doing big data computations.

In Pig, this is 4 lines of code.

Page 6: An introduction to Pig with exercises

What type of data does Pig operate on?

In an ideal world you have structured data, but in reality you often have unstructured data.

Page 7: An introduction to Pig with exercises

The two basic concepts

Page 18: An introduction to Pig with exercises

aliases and transformations

define structure

filter for payments

select countries

save or show

aliases

transformations
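
The transcript does not preserve the Pig code shown on the slides, so the following is only a rough sketch of the four steps. It is written in Python, with a roughly equivalent Pig Latin statement in the comment above each step. The sample rows, the field names (`user`, `event`, `country`, `amount`) and the file name `events.tsv` are assumptions made for illustration; they are not taken from the original demo.

```python
import csv
import io

# Hypothetical tab-separated input standing in for a file such as 'events.tsv'.
RAW = "u1\tpayment\tHU\t10.0\nu2\tpageview\tDE\t0.0\nu3\tpayment\tDE\t25.5\n"

# Step 1: define structure
# Pig Latin (assumed schema):
#   events = LOAD 'events.tsv' AS (user:chararray, event:chararray,
#                                  country:chararray, amount:double);
FIELDS = ["user", "event", "country", "amount"]
events = [dict(zip(FIELDS, row))
          for row in csv.reader(io.StringIO(RAW), delimiter="\t")]

# Step 2: filter for payments
#   payments = FILTER events BY event == 'payment';
payments = [e for e in events if e["event"] == "payment"]

# Step 3: select countries
#   countries = FOREACH payments GENERATE country;
countries = [p["country"] for p in payments]

# Step 4: save or show
#   DUMP countries;
#   STORE countries INTO 'output_dir';
print(countries)
```

In this sketch, `events`, `payments` and `countries` play the role of aliases: each one names the result of the transformation on its right-hand side, and each later step builds on an earlier alias.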

Page 19: An introduction to Pig with exercises

DEMO

Which are the two commands that help you debug a Pig script?

Page 20: An introduction to Pig with exercises

Demo

Which countries did we receive payments from?

Page 24: An introduction to Pig with exercises

DEMO

Page 25: An introduction to Pig with exercises

Wrapping it up

• Pig works best when you have unstructured big data.

• It uses the concepts of aliases and transformations, which help you define your data pipeline.

• You can debug interactively in the Grunt shell using the describe and illustrate commands.
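
To make the two debugging commands more concrete, here is a much simplified Python analogue. In Pig, `DESCRIBE alias;` prints the schema of an alias and `ILLUSTRATE alias;` shows sample rows at each step leading up to it. The helper functions, the `PIPELINE` table and the sample rows below are hypothetical, and the real `ILLUSTRATE` chooses its sample rows more carefully than simply taking the first few.

```python
# Hypothetical aliases from a small pipeline.
events = [("u1", "payment", "HU"), ("u2", "pageview", "DE"), ("u3", "payment", "DE")]
payments = [e for e in events if e[1] == "payment"]
countries = [(e[2],) for e in payments]

# (alias, schema, rows), in the order in which the aliases were defined.
PIPELINE = [
    ("events", ("user", "event", "country"), events),
    ("payments", ("user", "event", "country"), payments),
    ("countries", ("country",), countries),
]


def describe(alias):
    """Rough analogue of `DESCRIBE alias;`: return the alias's schema as text."""
    for name, schema, _rows in PIPELINE:
        if name == alias:
            return f"{name}: ({', '.join(schema)})"
    raise KeyError(alias)


def illustrate(alias, sample_size=2):
    """Rough analogue of `ILLUSTRATE alias;`: sample rows for every step up to alias."""
    trace = []
    for name, _schema, rows in PIPELINE:
        trace.append((name, rows[:sample_size]))
        if name == alias:
            return trace
    raise KeyError(alias)


print(describe("countries"))
for name, sample in illustrate("payments"):
    print(name, sample)
```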

Page 26: An introduction to Pig with exercises

Exercise

List the different countries we received payments from

Page 27: An introduction to Pig with exercises

Exercise

How many payments are there by country, and what is the sum of payments by country?
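
One possible way to think about this exercise is as a group-then-aggregate step. The sketch below shows the idea in Python, with a roughly equivalent Pig Latin statement in the comments. The sample rows and the field names (`country`, `amount`) are assumptions for illustration, not the data used in the original session.

```python
from collections import defaultdict

# Hypothetical payment rows: (country, amount).
payments = [("HU", 10.0), ("DE", 25.5), ("HU", 5.0)]

# Pig Latin (one possible approach, assumed field names):
#   by_country = GROUP payments BY country;
#   stats = FOREACH by_country GENERATE group AS country,
#               COUNT(payments) AS n, SUM(payments.amount) AS total;
groups = defaultdict(list)
for country, amount in payments:
    groups[country].append(amount)

# For each country: (number of payments, sum of payment amounts).
stats = {country: (len(amounts), sum(amounts))
         for country, amounts in groups.items()}
print(stats)
```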

Page 28: An introduction to Pig with exercises

Assignment

Which are the happy countries?

Page 29: An introduction to Pig with exercises

Follow-up questions? [email protected]

Thanks!