Design Patterns Leveraging Spark in PDI - PentahoWorld … · 2017-11-06 · Design Patterns...

Post on 23-Jun-2018

Design Patterns Leveraging Spark in PDI
Chris Skirde, Pentaho Director of Sales Engineering, Hitachi Vantara
Rakesh Saha, Pentaho Senior Product Manager, Hitachi Vantara

Quiz Time!

• What is Spark?
  A. A good way to start a fire.
  B. Necessary for a well running internal combustion engine.
  C. Fast and general purpose engine for large-scale data processing.
  D. All of the above.

• True or False, Pentaho supports Spark?
• Who is using Spark today (with or without Pentaho)?

Agenda

• Introduction to Spark
• Common design patterns

• How to leverage Spark with Pentaho

Introduction to Spark

• Why are we interested?

• What is it really?

• What’s been done?

Spark Application Architecture

Daemon

PDI/Server

What Do Those Applications Have in Common?

Common Design Patterns

• Filter/Organize
• Join
• Sum

• Transform/Enrich

• Query
• Machine Learning/Data Science
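
The first three patterns in the list above (filter, join, sum) can be sketched in a few lines of plain Python. This is only an illustration of the dataflow, not Spark or PDI code and not taken from the slides: the sample records, field names, and helper functions (`filter_rows`, `join_region`, `sum_by`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data; none of these records come from the slides.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 20.0, "status": "shipped"},
    {"order_id": 2, "customer_id": "c2", "amount": 35.5, "status": "cancelled"},
    {"order_id": 3, "customer_id": "c1", "amount": 12.5, "status": "shipped"},
    {"order_id": 4, "customer_id": "c3", "amount": 50.0, "status": "shipped"},
]
customers = {"c1": "North", "c2": "South", "c3": "North"}  # customer_id -> region


def filter_rows(rows, predicate):
    """Filter pattern: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def join_region(rows, lookup):
    """Join pattern: inner join each row to the lookup on customer_id."""
    return [
        {**row, "region": lookup[row["customer_id"]]}
        for row in rows
        if row["customer_id"] in lookup
    ]


def sum_by(rows, key, value):
    """Sum pattern: group rows by `key` and total the `value` field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


shipped = filter_rows(orders, lambda row: row["status"] == "shipped")
enriched = join_region(shipped, customers)
totals = sum_by(enriched, "region", "amount")
print(totals)
```

In an engine such as Spark the same three stages would run in parallel across partitions of the data; the sketch runs them sequentially on in-memory lists only to make the shape of each pattern visible.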

Filter/Organize

Join

Sum (and Other Aggregations)

Transform/Enrich

• Any step you like!

Query – Easy!

• Cloudera: use Hive-on-Spark with Hive2
• Hortonworks: use Spark SQL via Simba

Machine Learning/Data Science

Recap

What we covered today:

• Reviewed what Spark is and why organizations are adopting it
• Discussed several common data integration design patterns

• Linked those design patterns to Pentaho features for you to try

Questions?

Next Steps

Want to learn more?

• “Meet the Experts” Matt Casters and Mark Hall!

• Adaptive Execution Layer http://www.pentaho.com/blog/introducing-adaptive-execution-layer-spark-architecture

• SQL on Spark http://www.pentaho.com/blog/operationalize-spark-big-data-newest-enhancements