Design Patterns Leveraging Spark in PDI - PentahoWorld … · 2017-11-06

Design Patterns Leveraging Spark in PDI
Chris Skirde, Pentaho Director of Sales Engineering, Hitachi Vantara
Rakesh Saha, Pentaho Senior Product Manager, Hitachi Vantara

Quiz Time!

• What is Spark?
  A. A good way to start a fire.
  B. Necessary for a well-running internal combustion engine.
  C. Fast, general-purpose engine for large-scale data processing.
  D. All of the above.

• True or false: Pentaho supports Spark.
• Who is using Spark today (with or without Pentaho)?

Agenda

• Introduction to Spark
• Common design patterns
• How to leverage Spark with Pentaho

Introduction to Spark

• Why are we interested?
• What is it, really?
• What’s been done?

Spark Application Architecture

[Figure: Spark application architecture diagram; recoverable labels: PDI/Server, Daemon]

What Do Those Applications Have in Common?

Common Design Patterns

• Filter/Organize
• Join
• Sum
• Transform/Enrich
• Query
• Machine Learning/Data Science

Filter/Organize
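
A minimal plain-Python sketch of what the filter/organize pattern computes, assuming hypothetical order rows with region and amount fields. It keeps rows at or above a threshold and then sorts them; in PDI this would typically be a Filter Rows step followed by a Sort rows step, and in PySpark roughly df.filter(...).orderBy(...). Plain Python is used only so the sketch runs without a cluster; the function name and sample data are illustrative, not part of Pentaho or Spark.

```python
# Hypothetical sample rows standing in for a PDI row stream or a Spark DataFrame.
ORDERS = [
    {"order_id": 1, "region": "EMEA", "amount": 250.0},
    {"order_id": 2, "region": "APAC", "amount": 40.0},
    {"order_id": 3, "region": "EMEA", "amount": 90.0},
    {"order_id": 4, "region": "AMER", "amount": 310.0},
]


def filter_and_organize(rows, min_amount):
    """Keep rows with amount >= min_amount, ordered by region, then by amount descending.

    Rough PySpark analogue (not executed here):
        df.filter(df.amount >= min_amount).orderBy("region", df.amount.desc())
    """
    kept = [row for row in rows if row["amount"] >= min_amount]  # filter
    return sorted(kept, key=lambda row: (row["region"], -row["amount"]))  # organize


if __name__ == "__main__":
    for row in filter_and_organize(ORDERS, min_amount=50.0):
        print(row)
```

With min_amount=50.0 the APAC order is dropped and the remaining rows come back as orders 4, 1, 3 (AMER first, then EMEA from largest to smallest amount).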

Join
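
A sketch of the join pattern as a classic hash join in plain Python: index the smaller (dimension) side by key, then stream the larger side past that index. The order/customer data and the hash_join helper are hypothetical. Spark chooses its own physical strategy (for example a broadcast hash join when one side is small, or a sort-merge join otherwise), so treat this as an illustration of join semantics rather than of Spark internals.

```python
# Hypothetical sample data: an order stream (large side) and a customer dimension (small side).
ORDERS = [
    {"order_id": 1, "customer_id": "c1", "amount": 250.0},
    {"order_id": 2, "customer_id": "c2", "amount": 40.0},
    {"order_id": 3, "customer_id": "c9", "amount": 90.0},  # no matching customer
]
CUSTOMERS = [
    {"customer_id": "c1", "name": "Acme"},
    {"customer_id": "c2", "name": "Globex"},
]


def hash_join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key` using an in-memory hash index of `right`.

    how="inner" drops left rows with no match; how="left" keeps them with None values.
    Rough PySpark analogue (not executed here): orders_df.join(customers_df, on=key, how=how)
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how!r}")

    # Build phase: index the (usually smaller) right side by join key.
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    right_cols = sorted({col for row in right for col in row if col != key})

    # Probe phase: stream the left side past the index.
    joined = []
    for left_row in left:
        matches = index.get(left_row[key], [])
        for right_row in matches:
            joined.append({**left_row, **{c: right_row.get(c) for c in right_cols}})
        if not matches and how == "left":
            joined.append({**left_row, **{c: None for c in right_cols}})
    return joined


if __name__ == "__main__":
    for row in hash_join(ORDERS, CUSTOMERS, "customer_id", how="left"):
        print(row)
```

With how="left", the orphaned order c9 survives with name set to None, which is handy when auditing keys that have no match in the dimension.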

Sum (and Other Aggregations)
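
Aggregations such as sum, count, min, and max scale well on a cluster because they can be computed per partition and then merged. The plain-Python sketch below, built on hypothetical order rows and helper names, makes that explicit: partial_aggregate runs over each "partition", merge_partials combines the partial results, and finalize derives the average from the merged sum and count. In PySpark the user-facing call would be something like df.groupBy("region").agg(F.sum("amount"), F.count("amount"), ...), with Spark handling the partial/merge steps itself.

```python
# Hypothetical sample rows; slicing them below mimics data spread across partitions.
ORDERS = [
    {"region": "EMEA", "amount": 250.0},
    {"region": "APAC", "amount": 40.0},
    {"region": "EMEA", "amount": 90.0},
    {"region": "AMER", "amount": 310.0},
    {"region": "APAC", "amount": 60.0},
]


def partial_aggregate(rows, group_key, value_key):
    """Per-partition step: running sum, count, min, and max for each group."""
    acc = {}
    for row in rows:
        group, value = row[group_key], row[value_key]
        if group not in acc:
            acc[group] = {"sum": 0.0, "count": 0, "min": value, "max": value}
        stats = acc[group]
        stats["sum"] += value
        stats["count"] += 1
        stats["min"] = min(stats["min"], value)
        stats["max"] = max(stats["max"], value)
    return acc


def merge_partials(a, b):
    """Combine two partial results; sum, count, min, and max can all be merged piecewise."""
    merged = {group: dict(stats) for group, stats in a.items()}
    for group, stats in b.items():
        if group not in merged:
            merged[group] = dict(stats)
            continue
        target = merged[group]
        target["sum"] += stats["sum"]
        target["count"] += stats["count"]
        target["min"] = min(target["min"], stats["min"])
        target["max"] = max(target["max"], stats["max"])
    return merged


def finalize(acc):
    """Derive avg last: it is not mergeable itself, but merged sum / merged count is exact."""
    return {group: {**stats, "avg": stats["sum"] / stats["count"]} for group, stats in acc.items()}


if __name__ == "__main__":
    p1 = partial_aggregate(ORDERS[:3], "region", "amount")
    p2 = partial_aggregate(ORDERS[3:], "region", "amount")
    print(finalize(merge_partials(p1, p2)))
```

Splitting the rows and merging yields the same result as aggregating them all at once, which is the property that lets the work run in parallel.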

Transform/Enrich

• Any step you like!
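
The slide's point is that the transform/enrich stage can use whatever step fits. As a hedged illustration, the sketch below applies a typical per-row clean/derive/lookup transformation in plain Python; the field names and the COUNTRY_TO_REGION lookup table are hypothetical. Because each row is handled independently, this kind of logic parallelizes naturally across Spark partitions (in PySpark it would usually be a chain of withColumn calls, with the lookup done as a join against a small reference table).

```python
from datetime import date

# Hypothetical raw rows and reference data; field names are illustrative only.
RAW_ORDERS = [
    {"order_id": 1, "country": " de ", "amount": 250.0, "order_date": "2017-10-25"},
    {"order_id": 2, "country": "JP", "amount": 40.0, "order_date": "2017-11-06"},
    {"order_id": 3, "country": "br", "amount": 120.0, "order_date": "2017-03-14"},
]
COUNTRY_TO_REGION = {"DE": "EMEA", "JP": "APAC", "US": "AMER"}


def transform_row(row, large_threshold=100.0):
    """Clean, derive, and enrich one row; returns a new dict and leaves the input unchanged."""
    country = row["country"].strip().upper()                  # clean
    order_date = date.fromisoformat(row["order_date"])        # parse (Python 3.7+)
    return {
        **row,
        "country": country,
        "region": COUNTRY_TO_REGION.get(country, "UNKNOWN"),  # enrich via lookup
        "order_year": order_date.year,                        # derive
        "order_quarter": (order_date.month - 1) // 3 + 1,
        "is_large": row["amount"] >= large_threshold,
    }


if __name__ == "__main__":
    for enriched in map(transform_row, RAW_ORDERS):
        print(enriched)
```

Codes missing from the lookup table fall back to "UNKNOWN" rather than failing, so unmatched rows can be routed or reported downstream.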

Query: Easy!

• Cloudera: use Hive-on-Spark with Hive2
• Hortonworks: use Spark SQL via Simba
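
From PDI's side, the query pattern is ordinary SQL sent over a database connection (Hive2 for Hive-on-Spark on Cloudera, or Spark SQL via the Simba driver on Hortonworks). The sketch below only illustrates the shape of such a query: it runs against an in-memory SQLite table so that it is self-contained, and the orders table, its columns, and the helper names are hypothetical. HiveQL and Spark SQL differ from SQLite in places, so check any real statement against your cluster's SQL dialect.

```python
import sqlite3

# An aggregate query written in plain SQL; the orders table is hypothetical.
TOTALS_BY_REGION_SQL = """
SELECT region, SUM(amount) AS total, COUNT(*) AS n_orders
FROM orders
WHERE amount >= 50
GROUP BY region
ORDER BY total DESC
"""


def demo_connection():
    """In-memory SQLite stand-in for a Hive2 or Spark SQL connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EMEA", 250.0), (2, "APAC", 40.0), (3, "EMEA", 90.0),
         (4, "AMER", 310.0), (5, "APAC", 60.0)],
    )
    return conn


def run_query(conn, sql):
    """Execute a statement through the DB-API cursor interface and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    for region, total, n_orders in run_query(demo_connection(), TOTALS_BY_REGION_SQL):
        print(region, total, n_orders)
```

Only the connection changes when the same statement targets a cluster; the SQL text itself is what the engine parallelizes.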

Machine Learning/Data Science

Recap

What we covered today:

• Reviewed what Spark is and why organizations are adopting it
• Discussed several common data integration design patterns
• Linked those design patterns to Pentaho features for you to try

Questions?

Next Steps

Want to learn more?

• “Meet the Experts”: Matt Casters and Mark Hall!
• Adaptive Execution Layer: http://www.pentaho.com/blog/introducing-adaptive-execution-layer-spark-architecture
• SQL on Spark: http://www.pentaho.com/blog/operationalize-spark-big-data-newest-enhancements
