Best Weather for Goals?files.meetup.com/19103884/2_lissabon_meetup.pdf · Thinking in Spark Don't...

Best Weather for Best Weather for Goals?Goals?

Willem HendriksData Scientist IBM

willem.hendriks@nl.ibm.comhttps://github.com/willemhendriks

“More data usually beats better algorithms”Anand Rajaraman (when teaching at Stanford)

http://anand.typepad.com/datawocky/2008/03/more-data-usual.html

Old Skool StatisticsOld Skool Statistics Big Data!Big Data!

Nice read!

In The News: No more Silent Movies!

Creative Jobs are at risk, not engineers!

Scary email this week...

Apache Spark™ is a fast and general engine for large-scale data processing.

Google Trends of “Apache Spark”

What is Spark?

Spark 1.6.1 Released (9 March 2016)

Anybody here did any Big-Data projects with Hadoop?

Any computation on clusters?

Why People love Spark

1 – Nice short snappy code (Python / Scala / R)

2 – Universal Connectors (Databases / Hadoop / Files / Messaging Systems)

3 – Interactive Shell

4 – Beautiful Library(Linear Algebra & ML)

Why Lissabon loves Spark?

1 – Nice short snappy code (Python / Scala / R)

2 – Universal Connectors (Databases / Hadoop / Files / Messaging Systems)

3 – Interactive Shell

4 – Beautiful Library(Linear Algebra & ML)

Madrid

Dresden

Amsterdam

Lisbon

What can we do now,We could not do some years ago?

Why would a person/company invest in Spark Technology?

Spark & IBM

Key Components Spark

Spark Context

worker worker workerworker worker worker workerworker worker worker worker

It doesn't matter how you access Spark,Command line,Spark-submitOr notebook

With Apache Spark, Hadoop will be useless (soon)

True -or- False ?

Different Data-Sources (MySQL / Hadoop / Files / Cassandra)Requires Different Spark Programming Approaches

True -or- False ?

Scala is the Best Way to Spark,Python is for hipsters....R is for pirates....Java for old folks....

True -or- False ?

More than 1 way to Spark

RDDs DataFrames

little slower little faster (language independent)

MLlib sparkML

chain of RDDs pipelines!Nice chain of transformations & fitters

Freedom (map/reduce) structured

Programmers SQL'er / R lovers

Thinking in SparkDon't think as Variables & Functions, but in

Transformations & Actions

A RDD is a pointer to your data, not the data itself

(Hadoop / File / Database)

A transformation on a RDD will create a new RDD, by creating a

new pointer (again no data)

It adds tubes & flasks, but still no data/fluid is flowing.

Not until an action is called upon an RDD, spark will not flow data through your chain of RDD's... (lazy evaluation)

After an action,

Spark lets the

fluid/data flow.

How can we 'hold' some fluid/data inside a flask, to later

quickly use it?

rdd 2rdd 3

rdd 4rdd 5

How can we 'hold' some fluid/data inside a flask, to later

quickly use it?

rdd 2rdd 3

rdd 4rdd 5

.persist() or

.cache()

Actions & Transformations

Transformations: Actions:

Contains Elements

Transformations & Actions

.persist()

Mllib to create models (Machine Learning)

Key/Value, a special case, needed for joins

Freedom (as a opposite to Data-Frames)

Double, Float 1.3

List [12, “Blue”, 3.03]

String “23;”AEFED”

Numpy Array Array(1, 4, 3, 3, 5, ...

Spark's Labelled Points

LabelledPoint( “GOOD”, “Chablis”, 12.3 )

Any (basic) python …...

(Key, Value) Special CaseKey = everythingValue = everything

[2,4,6]

[5,4,3]

[7,6,5]

[8,7,8]

[3,4,5]

[2,4,6]

[5,4,3]

[7,1,3]

[1,1,3]

[3,4,3]

[1,4,6]

[5,7,3]

[7,6,5]

[8,8,8]

[3,8,5]

Partition APartition A

Partition B

Partition C

Let's do some basic Spark'in together

UEFA Demo

Let's take some Data!

What is are the best weather conditions for Portuguese teams?

Does the weather have influence on the number of goals in UEFA Champions League Matches?

Best Weather for Goals?files.meetup.com/19103884/2_lissabon_meetup.pdf · Thinking in Spark Don't...

Documents

Transcript of Best Weather for Goals?files.meetup.com/19103884/2_lissabon_meetup.pdf · Thinking in Spark Don't...

Rupture Disc Device (RDD)

Bway Rdd Zcm26 Report v1.0

Anatomy of RDD : Deep dive into Spark RDD abstraction

Spark Programming with RDD

Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.

Basics of pointer, pointer expressions, pointer to pointer and pointer in functions

Best Weather for Goals? - files.meetup.com Learning Intro.pdfThinking in Spark Don't think as Variables & Functions, but in Transformations & Actions A RDD is a pointer to your data,

Spark RDD : Transformations & Actions

RDD Classic Club Home

HEE “VV” W WOOR RDD

POINTER. Outline Pointer dan Struktur Pointer dan Array Pointer dan Function.

RDD Method Paper

Hrs Rdd Slides f

Pointer II POINTER AND ARRAY POINTER AND STRUCTURE.

Rupture Disc Device (RDD) - opwglobal.com

R41890 RDD

US Army: acwa rdd permit2

Spark rdd part 2

Stack Pointer Frame Pointer Difference

RDD Learning Academy...RDD Learning Academy. 2 RDD ASSOCIATES Perishables Expertly Merchandised SPEED TO SHELF. 3 ... SILK - Plnt Bsd - COFFEE CREAMERS 95,192 29.6% 29.19 1.58 COFFEE