Design Patterns Leveraging Spark in PDI - PentahoWorld … · 2017-11-06 · Design Patterns...
Design Patterns Leveraging Spark in PDI
Chris Skirde, Pentaho Director of Sales Engineering, Hitachi Vantara
Rakesh Saha, Pentaho Senior Product Manager, Hitachi Vantara
Quiz Time!
• What is Spark?
  A. A good way to start a fire.
  B. Necessary for a well-running internal combustion engine.
  C. Fast and general-purpose engine for large-scale data processing.
  D. All of the above.
• True or false: Pentaho supports Spark?
• Who is using Spark today (with or without Pentaho)?
Agenda
• Introduction to Spark
• Common design patterns
• How to leverage Spark with Pentaho
Introduction to Spark
• Why are we interested?
• What is it really?
• What's been done?
Spark Application Architecture
[Diagram; surviving labels: Daemon, PDI/Server]
What Do Those Applications Have in Common?
Common Design Patterns
• Filter/Organize
• Join
• Sum
• Transform/Enrich
• Query
• Machine Learning/Data Science
Filter/Organize
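The slide names the pattern without showing its logic, so here is a minimal plain-Python sketch of what filter/organize amounts to: drop rows that fail a predicate, then sort what remains. It is not Spark or PDI code; the `orders` rows, the field names, and the `filter_and_organize` helper are all hypothetical and exist only for illustration.

```python
from operator import itemgetter

# Hypothetical sample rows; field names are illustrative only.
orders = [
    {"order_id": 3, "region": "EMEA", "amount": 120.0},
    {"order_id": 1, "region": "APAC", "amount": 15.5},
    {"order_id": 2, "region": "EMEA", "amount": 0.0},
    {"order_id": 4, "region": "AMER", "amount": 80.0},
]


def filter_and_organize(rows, min_amount):
    """Keep rows with amount >= min_amount, then sort by region and order_id."""
    kept = [row for row in rows if row["amount"] >= min_amount]
    return sorted(kept, key=itemgetter("region", "order_id"))


result = filter_and_organize(orders, min_amount=10.0)
for row in result:
    print(row["region"], row["order_id"], row["amount"])
```

On a cluster the same two operations would run per partition, with the sort requiring data to be redistributed between workers; the sketch only shows the row-level logic.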
Join
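As an illustration of the join pattern, the following plain-Python sketch performs an inner join by building a hash index on one input and probing it with the other. It is an assumption-laden toy, not Spark or PDI code: the `customers` and `orders` rows and the `hash_inner_join` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample inputs; names and fields are illustrative only.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 50.0},
    {"order_id": 11, "customer_id": 2, "amount": 20.0},
    {"order_id": 12, "customer_id": 1, "amount": 5.0},
    {"order_id": 13, "customer_id": 9, "amount": 7.0},  # no matching customer
]


def hash_inner_join(left, right, key):
    """Inner-join two lists of dicts on `key` using a hash index built on `left`."""
    index = defaultdict(list)
    for left_row in left:
        index[left_row[key]].append(left_row)

    joined = []
    for right_row in right:
        # Rows on the right with no match on the left are dropped (inner join).
        for left_row in index.get(right_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined = hash_inner_join(customers, orders, "customer_id")
for row in joined:
    print(row["order_id"], row["name"], row["amount"])
```

Indexing the smaller input keeps the lookup table small, which is the same reasoning behind broadcasting a small table to every worker in a distributed join.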
Sum (and Other Aggregations)
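A grouped sum can be sketched in plain Python as a running total per key. This is only an illustration of the aggregation logic, not Spark or PDI code; the `sales` data and the `sum_by_key` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical (region, amount) pairs; values are illustrative only.
sales = [("EMEA", 120.0), ("APAC", 15.5), ("EMEA", 30.0), ("AMER", 80.0)]


def sum_by_key(pairs):
    """Return a dict mapping each key to the sum of its values."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


totals = sum_by_key(sales)
for region in sorted(totals):
    print(region, totals[region])
```

Because addition is associative, partial totals can be computed independently on each chunk of data and merged afterwards, which is what makes this kind of aggregation a good fit for a distributed engine.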
Transform/Enrich
• Any step you like!
Query – Easy!
• Cloudera: use Hive-on-Spark with Hive2
• Hortonworks: use Spark SQL via Simba
Machine Learning/Data Science
Recap
What we covered today:
• Reviewed what Spark is and why organizations are adopting it
• Discussed several common data integration design patterns
• Linked those design patterns to Pentaho features for you to try
Questions?
Next Steps
Want to learn more?
• “Meet the Experts” Matt Casters and Mark Hall!
• Adaptive Execution Layer: http://www.pentaho.com/blog/introducing-adaptive-execution-layer-spark-architecture
• SQL on Spark: http://www.pentaho.com/blog/operationalize-spark-big-data-newest-enhancements