An introduction to Pig with exercises

Post on 11-Apr-2017

An introduction to Pig

Zoltan C. Toth

https://www.linkedin.com/in/zoltanctoth

What is Pig?

Pig is a platform that makes big data analytics on unstructured data easier.

Java MapReduce, the native language of big data, is hard.

How many page views did we have on our website yesterday?

In Java MapReduce, answering this takes 59 lines.

Pig is much easier for doing big data computations.

In Pig, this is 4 lines of code
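
The script shown on the slide is not part of this transcript. As a rough sketch of what such a short Pig script could look like, the example below counts yesterday's page views in four statements plus a DUMP. The file name `pageviews.tsv`, its two-column tab-separated layout, and the date literal are all assumptions made for illustration.

```pig
-- Hypothetical input: one tab-separated line per page view (day, url).
views     = LOAD 'pageviews.tsv' AS (day:chararray, url:chararray);
-- Keep only yesterday's views; the date literal is a placeholder.
yesterday = FILTER views BY day == '2017-04-10';
-- Put all remaining rows into a single group so they can be counted.
grouped   = GROUP yesterday ALL;
-- COUNT skips tuples whose first field is null.
total     = FOREACH grouped GENERATE COUNT(yesterday);
DUMP total;
```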

What type of data does Pig operate on?

In an ideal world you have structured data,

but in reality you often have unstructured data.

The two basic concepts

aliases and transformations

define structure

filter for payments

select countries

save or show

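
The four steps (define structure, filter for payments, select countries, save or show) could be sketched in Pig Latin as follows. Each name on the left of `=` is an alias, and each operator on the right is a transformation. The input file `events.tsv`, its field names, and the `'payment'` event label are hypothetical, chosen only to make the pipeline concrete.

```pig
-- define structure: hypothetical tab-separated event log with four fields
events    = LOAD 'events.tsv'
            AS (ts:chararray, event_type:chararray, country:chararray, amount:double);
-- filter for payments
payments  = FILTER events BY event_type == 'payment';
-- select countries
countries = FOREACH payments GENERATE country;
-- save or show: DUMP prints to the console, STORE writes to a directory
DUMP countries;
-- STORE countries INTO 'payment_countries';
```

Nothing is executed until a DUMP or STORE is reached; the earlier statements only describe the pipeline.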

DEMO

Which are the two commands that help you debug a Pig script?

Demo

Which countries did we receive payments from?

DEMO

Wrapping it up

• Pig works best when you have unstructured big data.

• It uses the concepts of aliases and transformations, which help you define your data pipeline.

• You can debug interactively in the grunt shell using the describe and illustrate commands.
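
As an illustration of the two debugging commands, the sketch below could be typed into the grunt shell. The `events.tsv` file and its schema are assumptions, not data from the talk.

```pig
-- Hypothetical event log; the field names are illustrative only.
events   = LOAD 'events.tsv'
           AS (ts:chararray, event_type:chararray, country:chararray, amount:double);
payments = FILTER events BY event_type == 'payment';

-- DESCRIBE prints the schema of an alias without running a job.
DESCRIBE payments;

-- ILLUSTRATE runs the pipeline on a small sample and shows example rows at each step.
ILLUSTRATE payments;
```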

Exercise

List the different countries we received payments from

Exercise

How many payments are there by country, and what is the sum of payments per country?
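
One possible starting point for this exercise, assuming a hypothetical `events.tsv` with an `amount` field, is to group the payments by country and aggregate each group. Treat it as a sketch to adapt to the actual exercise data rather than the reference solution.

```pig
-- Hypothetical event log; adjust the path and schema to the exercise data.
events     = LOAD 'events.tsv'
             AS (ts:chararray, event_type:chararray, country:chararray, amount:double);
payments   = FILTER events BY event_type == 'payment';
-- One group per country; each group holds a bag named after the alias "payments".
by_country = GROUP payments BY country;
stats      = FOREACH by_country GENERATE
                 group                AS country,
                 COUNT(payments)      AS num_payments,
                 SUM(payments.amount) AS total_amount;
DUMP stats;
```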

Assignment

Which are the happy countries?

Follow-up questions? zoltanctoth@gmail.com

Thanks!