
Big Data Processing with Spark and AWS EMR @ glomex 17.10.2016 Michael Ludwig


Page 1: Big Data Processing with Spark and AWS EMR @ glomex, 17.10.2016, Michael Ludwig

Page 2: Our Architecture

Page 3: (no text on this slide)

Page 4: Our Use Cases

Billing Pre-Aggregations

Interactive Big Data

Page 5: Spark components

Spark 1.6, PySpark, spark-submit, DataFrames, SparkSQL, UDFs, Accumulators

Page 6: Example: SparkSQL

Page 7: EMR Cluster Startup

AWS Web Console
AWS CLI
AWS SDKs (Python, Java, JS etc.)

Page 8: Startup parameters

Page 9: Spot prices
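The slides on startup parameters and spot prices survive only as titles. As a rough illustration of how the two fit together when a cluster is started through the Python SDK, here is a minimal sketch that builds the keyword arguments for boto3's EMR `run_job_flow` call. The function name, cluster name, instance types, release label, and bid price are all assumptions made for this example; none of them come from the talk.

```python
def build_cluster_request(
    name,
    release_label="emr-4.7.2",    # assumption: choose a release that ships the Spark version you need
    master_type="m3.xlarge",      # assumption: instance types are illustrative only
    core_type="m3.xlarge",
    core_count=2,
    spot_bid_usd=None,            # None means on-demand core nodes
    keep_alive=False,             # False: cluster terminates once its steps have finished
    log_uri=None,
):
    """Return keyword arguments for boto3's EMR client `run_job_flow` call."""
    master_group = {
        "Name": "master",
        "InstanceRole": "MASTER",
        "InstanceType": master_type,
        "InstanceCount": 1,
        "Market": "ON_DEMAND",
    }
    core_group = {
        "Name": "core",
        "InstanceRole": "CORE",
        "InstanceType": core_type,
        "InstanceCount": core_count,
        "Market": "ON_DEMAND",
    }
    if spot_bid_usd is not None:
        # Spot instances take a maximum bid, passed as a string in USD.
        core_group["Market"] = "SPOT"
        core_group["BidPrice"] = "{:.3f}".format(spot_bid_usd)

    request = {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Spark"}, {"Name": "Ganglia"}],
        "Instances": {
            "InstanceGroups": [master_group, core_group],
            "KeepJobFlowAliveWhenNoSteps": keep_alive,
            "TerminationProtected": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # default EMR roles, assumed to exist in the account
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }
    if log_uri is not None:
        request["LogUri"] = log_uri
    return request


# Hypothetical usage; starting a real cluster needs boto3 and AWS credentials:
#   import boto3
#   request = build_cluster_request("nightly-aggregation", spot_bid_usd=0.10)
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as a plain dictionary makes it easy to review or log the parameters before anything is launched.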

Page 10: Cluster Interaction
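The content of this slide is not preserved. One common way to interact with a running EMR cluster is to submit a PySpark script as a step that calls `spark-submit` through `command-runner.jar`; the sketch below only builds the step definition for boto3's `add_job_flow_steps`. The function name, step name, S3 path, and script arguments are hypothetical, and the talk may have shown a different approach.

```python
def build_spark_step(name, script_uri, script_args=(), deploy_mode="cluster"):
    """Return one step definition that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",   # leave the cluster running if the step fails
        "HadoopJarStep": {
            # command-runner.jar executes the given command on the master node
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", deploy_mode, script_uri]
                    + list(script_args),
        },
    }


# Hypothetical usage; needs boto3, AWS credentials, and a real cluster id:
#   import boto3
#   step = build_spark_step("aggregate", "s3://example-bucket/jobs/aggregate.py")
#   boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXX", Steps=[step])
```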

Page 11: YARN Manager

Page 12: Monitoring: Spark UI

Page 13: Monitoring: Ganglia on EMR

Page 14: Error Troubleshooting

Page 15: Summary

§ EMR
  § Easy cluster startup and configuration
  § Throw-away, isolated clusters
  § No big upfront investments needed

§ Spark
  § Best framework to get started with big data
  § Big community & fast development
  § Local development easy

Page 16: Backup

§ TODO

Page 17: EMR Access URLs

Page 18: RDD, DataFrame and DataSet

Page 19: Spark Cluster

Page 20: In-Memory Computation

Page 21: Operations

§ placeholder

Page 22: Sample Transformations

Page 23: RDD Lineage

Page 24: RDD DAG
