Introduction to HDF 3.0

1 © Hortonworks Inc. 2011 – 2017 All Rights Reserved Timothy Spann 2017 Future of Data – Princeton Meetup June 20, 2017 Hosted by TRAC Intermodal Introduction to HDF 3.0

Transcript of Introduction to HDF 3.0

Page 1: Introduction to HDF 3.0

Timothy Spann, 2017
Future of Data – Princeton Meetup, June 20, 2017
Hosted by TRAC Intermodal

Introduction to HDF 3.0

Page 2: Introduction to HDF 3.0

• Schema Registry – Milind Pandit
• HDF Streaming Updates – Tim Spann
• EDW Optimization with Hadoop and HDF – Gregory C Keys, PhD.

Page 3: Introduction to HDF 3.0

Ambari Integration

Page 4: Introduction to HDF 3.0

Format and Schema Aware Efficient Flow Management

→ Provide processors for schema-aware record structure for common processing patterns
– Split, Enrich, Partition, Convert, Query (SQL queries powered by Apache Calcite)
– Put/Get records between NiFi and Kafka, Elasticsearch, RDBMS (more soon)
– Easy bridging to/from columnar data formats like ORC or Parquet

→ Separate format/schema-specific logic into extensible record readers and writers
– Developers can write new readers/writers
– Users can create new readers/writers with scripting, live in production!

→ So what?
– Format- and schema-aware processing *with* generic reusable components
– Maintains full provenance/lineage trail
– Dramatic speed/efficiency increase per node
– Integration with Hortonworks Schema Registry and extensible for others
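
The reader/writer separation described on this slide can be illustrated with a minimal Python sketch. This is not NiFi code: the function names (csv_reader, json_lines_writer, convert) and the sample data are assumptions made for illustration only. The point it shows is that the generic step never touches format-specific logic.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]


def csv_reader(text: str) -> Iterator[Record]:
    """Format-specific reader: parse CSV text with a header row into records."""
    yield from csv.DictReader(io.StringIO(text))


def json_lines_writer(records: Iterable[Record]) -> str:
    """Format-specific writer: serialize records as one JSON object per line."""
    return "\n".join(json.dumps(dict(r), sort_keys=True) for r in records)


def convert(
    text: str,
    reader: Callable[[str], Iterator[Record]],
    writer: Callable[[Iterable[Record]], str],
    keep: Callable[[Record], bool] = lambda r: True,
) -> str:
    """Generic, reusable step: it knows nothing about CSV or JSON."""
    return writer(r for r in reader(text) if keep(r))


# Hypothetical sample data.
csv_text = "driverId,speed\n11,62\n12,91\n"
output = convert(
    csv_text,
    csv_reader,
    json_lines_writer,
    keep=lambda r: int(r["speed"]) > 80,
)
print(output)
```

Swapping in a different reader or writer changes the input or output format without changing convert, which is the design choice the slide argues for.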

Page 5: Introduction to HDF 3.0

Record Reader CS

Page 6: Introduction to HDF 3.0

Record Writer CS

Page 7: Introduction to HDF 3.0

‘QueryRecord’ Processor – Treat streaming records as tables
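
The idea of treating a batch of records as a table can be sketched in Python. This is an illustration only: it uses the standard-library sqlite3 module as a stand-in for the Calcite-powered SQL engine, and the records, column names and helper function query_records are assumptions. The table is named FLOWFILE, which is the name QueryRecord queries refer to.

```python
import sqlite3

# Hypothetical batch of records, as a record reader might produce them.
records = [
    {"driverId": 11, "speed": 62},
    {"driverId": 12, "speed": 91},
    {"driverId": 12, "speed": 95},
]


def query_records(records, sql):
    """Load the records into an in-memory table and run one SQL query on it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE FLOWFILE (driverId INTEGER, speed INTEGER)")
    conn.executemany("INSERT INTO FLOWFILE VALUES (:driverId, :speed)", records)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


speeding = query_records(
    records,
    "SELECT driverId, COUNT(*) FROM FLOWFILE WHERE speed > 80 GROUP BY driverId",
)
print(speeding)
```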

Page 8: Introduction to HDF 3.0

Component Versioning

Page 9: Introduction to HDF 3.0

Stream Processing – Introducing Streaming Analytics Manager (SAM)

Streaming Analytics Manager

A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop user experience

Page 10: Introduction to HDF 3.0

SAM - Write Complex Streaming Applications With No Code

Streaming Analytics Manager

→ A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop paradigm
– Build streaming analytics applications that do event correlation, context enrichment, complex pattern matching, analytical aggregations and creation of alerts/notifications when insights are discovered.
– Give the coders the power to add key functions and extend the platform (add custom sinks, processors, spouts, etc.)

Page 11: Introduction to HDF 3.0

SAM’s Value Proposition

→ Build and deploy complex stream applications without writing any code

→ Only open source tool in the market with a graphical programming paradigm

→ Speed time-to-market to build complex streaming analytics applications

→ Build streaming analytics applications without specialized skill sets.

→ Decouple data format from the streaming application itself while being schema aware

→ Support multiple underlying streaming engines

Page 12: Introduction to HDF 3.0

Stream Builder Module for App Developers

→ Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming applications

→ Drag and drop to build a working streaming application without writing a single line of code

→ 4 Types of Components: Sources, Processors, Sinks and Custom
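
As a rough mental model of how source, processor and sink components chain together, here is a minimal Python sketch using generators. It is not SAM code, since SAM wires these components visually. The function names, the event fields and the speed limit are all assumptions.

```python
def source(events):
    """Source: emits events into the stream (here, from an in-memory list)."""
    yield from events


def speeding_filter(stream, limit=80):
    """Processor: passes through only events above a hypothetical speed limit."""
    for event in stream:
        if event["speed"] > limit:
            yield event


def list_sink(stream, out):
    """Sink: writes every event it receives to a destination (here, a list)."""
    for event in stream:
        out.append(event)
    return out


# Hypothetical events.
events = [
    {"driverId": 11, "speed": 62},
    {"driverId": 12, "speed": 91},
]
delivered = list_sink(speeding_filter(source(events)), [])
print(delivered)
```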

Page 13: Introduction to HDF 3.0

Stream Insight Module for Business Analysts

→ A tool to create real-time analytics dashboards, charts and graphs

→ 30+ visualization charts out of the box with customization capability

→ Druid is the Analytics Engine that powers the Stream Insight Module.

Page 14: Introduction to HDF 3.0

Stream Ops Module for IT Operations

→ Create and manage different environments in which individual streaming applications will be built

→ Environments consist of services such as HDFS, Kafka and Storm from different service pools

→ Save time and reduce operational overhead with the same drag-and-drop paradigm as the Stream Builder module

Page 15: Introduction to HDF 3.0

Stream Builder Module for App Developers

→ Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming apps.

→ Drag and drop to build a working streaming application without writing a single line of code.

→ 4 Types of Components: Sources, Processors, Sinks and Custom

Page 16: Introduction to HDF 3.0

SAM is All about Doing Real-Time Analytics on the Stream

Real-Time Analytics

– Real-Time Prescriptive Analytics: What should we do right now?
– Real-Time Predictive Analytics: What could happen now/soon?
– Real-Time Descriptive Analytics: What is happening right now?

Page 17: Introduction to HDF 3.0

Real-Time Prescriptive Analytics

→ Question: What should we do right now?

→ Context: It is rainy, the driver has been on the road for 12 hours and he has 30 high speeding alerts over a 3 minute window in the last 2 hours.

→ Answer: Dispatch a radio call to the driver to slow down
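
The question/context/answer example above amounts to a rule over the current context. A minimal Python sketch of such a rule follows. The class DriverContext, the function recommend_action and the exact thresholds are assumptions taken from the slide's example figures, not from any SAM rule definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DriverContext:
    raining: bool
    hours_on_road: float
    speeding_alerts: int  # alerts counted over the recent window


def recommend_action(
    ctx: DriverContext, max_hours: float = 12, max_alerts: int = 30
) -> Optional[str]:
    """Return an action when all three risk conditions hold, else None."""
    if (
        ctx.raining
        and ctx.hours_on_road >= max_hours
        and ctx.speeding_alerts >= max_alerts
    ):
        return "Dispatch a radio call to the driver to slow down"
    return None


print(recommend_action(DriverContext(raining=True, hours_on_road=12, speeding_alerts=30)))
```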

Page 18: Introduction to HDF 3.0

Real-Time Predictive Analytics

→ Question: No violation events, but what might happen that I need to be worried about?

→ My data science team has a model that can predict that based on
– Weather
– Roads
– Driver HR info like driver certification status, wage plan
– Driver timesheet info like hours and miles logged over the last week

Page 19: Introduction to HDF 3.0

Building the Predictive Model on HDP

1. Explore a small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations”

2. Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data

3. Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format

4. Train a logistic classification Spark model on YARN, with above events as training input, and iterate to fine tune generated model
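
To make step 4 concrete, here is a small sketch of logistic classification trained by gradient descent. It is plain Python, not Spark MLlib, and it runs locally rather than on YARN. The two features (foggy flag, scaled hours driven), the six training rows and the hyperparameters are made-up assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of a violation for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the average log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical labeled events: [foggy (0 or 1), hours driven / 10] -> violation?
rows = [[1, 1.2], [1, 0.9], [0, 1.1], [0, 0.4], [1, 0.3], [0, 0.2]]
labels = [1, 1, 0, 0, 1, 0]

weights, bias = train_logistic(rows, labels)
p_foggy = predict(weights, bias, [1, 1.0])
p_clear = predict(weights, bias, [0, 1.0])
print(p_foggy > p_clear)
```

In this toy data the foggy flag separates the classes, so the trained model assigns a higher violation probability to the foggy event, mirroring the hypothesis in step 1.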

Page 20: Introduction to HDF 3.0

Logistic Regression Model

Page 21: Introduction to HDF 3.0

Scoring the Predictive Model on HDF

5. Model Registry: Export the Spark MLlib model and import it into HDF’s Model Registry

6. Enrich with Features: Use SAM’s enrich/custom processors to enrich the event with the features required for the model

7. Transform/Normalize: Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model

8. Score Model: Use SAM’s PMML processor to score the model for each stream event with its required features

9. Alert/Notify/Action: Use SAM’s rule and notification processors to alert, notify and take action using the results of the model
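
Steps 6 to 9 form a per-event pipeline: enrich, normalize, score, alert. The Python sketch below illustrates that flow only. The lookup table, the model weights and the 0.5 alert threshold are invented for the example; a real deployment would score a PMML model from the registry instead of the hard-coded MODEL dictionary.

```python
import math

# Hypothetical enrichment source: hours driven last week per driver.
DRIVER_HOURS = {11: 70.0, 12: 35.0}

# Hypothetical parameters standing in for an imported PMML model.
MODEL = {"weights": {"foggy": 2.0, "hours_norm": 3.0}, "bias": -3.5}


def enrich(event):
    """Step 6: add the features the model needs."""
    return {**event, "hours_last_week": DRIVER_HOURS.get(event["driverId"], 0.0)}


def normalize(event):
    """Step 7: turn raw fields into numeric model inputs."""
    return {
        **event,
        "foggy": 1.0 if event["weather"] == "foggy" else 0.0,
        "hours_norm": min(event["hours_last_week"] / 70.0, 1.0),
    }


def score(event):
    """Step 8: apply the logistic model to the normalized features."""
    z = MODEL["bias"] + sum(w * event[name] for name, w in MODEL["weights"].items())
    return {**event, "violation_probability": 1.0 / (1.0 + math.exp(-z))}


def alert(event, threshold=0.5):
    """Step 9: flag events whose score crosses the threshold."""
    return {**event, "alert": event["violation_probability"] >= threshold}


def run_pipeline(event):
    for step in (enrich, normalize, score, alert):
        event = step(event)
    return event


risky = run_pipeline({"driverId": 11, "weather": "foggy"})
safe = run_pipeline({"driverId": 12, "weather": "clear"})
print(risky["alert"], safe["alert"])
```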

Page 22: Introduction to HDF 3.0

SAM’s Model Registry and PMML Processor

→ Model Registry
– SAM has a repository to store and manage PMML-based predictive models
– First-class features like versioning, evolution policies, etc., will be added in a future release

→ PMML Processor
– Processor that can use a model from the registry and score it against the input stream of events coming in

Page 23: Introduction to HDF 3.0

SAM Extensibility: Custom Processors, UDF, UDAFs

→ Custom Components
– Most users will want to build custom components to meet certain requirements.
– SAM provides the ability to build custom components using the SAM SDK
– The jars can then be uploaded in SAM via the User Interface

→ 3 Types of Custom Components
– Custom Processors
– Custom UDF
• User defined functions that are used by the Projection processor
– Custom UDAFs
• User defined aggregate functions that are used by the Aggregate processor.
– SDK can be used to create custom UDF functions for windowed aggregations
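
The difference between a UDF and a UDAF can be sketched in a few lines of Python. This is a conceptual illustration, not the SAM SDK (which is Java-based and whose interfaces are not shown in these slides). The names mph_to_kmh, AverageSpeed and aggregate_window are assumptions.

```python
def mph_to_kmh(speed_mph):
    """UDF-style function: one input value in, one output value out."""
    return speed_mph * 1.609344


class AverageSpeed:
    """UDAF-style aggregate: start a state, fold in each value, emit a result."""

    def init(self):
        return (0.0, 0)  # (running total, count)

    def add(self, state, value):
        total, count = state
        return (total + value, count + 1)

    def result(self, state):
        total, count = state
        return total / count if count else None


def aggregate_window(udaf, values):
    """Apply an aggregate to all values that fell into one window."""
    state = udaf.init()
    for value in values:
        state = udaf.add(state, value)
    return udaf.result(state)


window_average = aggregate_window(AverageSpeed(), [60, 80, 100])
print(window_average)
```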

Page 24: Introduction to HDF 3.0

Streaming Split Join Pattern

→ 3 enrichments have to be performed on the event stream to feed into the model:
– From Lat, Long and time, query weather conditions
– From driverId, look up information about the driver’s certification and wage plan
– From driverId, look up information about how many miles and hours the driver was on the road last week

→ Streaming Split Join Pattern
– Complex pattern that allows parallel processing to decrease latency (used by Apache Metron extensively)
1. Create a splitJoin key
2. Split the stream into n, where n is the number of different enrichments you want to do
3. Join the n streams based on the splitJoin key

Complex pattern to implement that SAM allows the user to do simply with no code!
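
The three steps of the split-join pattern can be sketched in Python with a thread pool. This only illustrates the pattern and is not how SAM or Metron implement it. The three lookup functions return hard-coded, hypothetical values in place of real weather, HR and timesheet services.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


# Hypothetical enrichment lookups; real ones would call external services.
def weather_lookup(event):
    return {"weather": "foggy"}


def hr_lookup(event):
    return {"certified": True, "wage_plan": "miles"}


def timesheet_lookup(event):
    return {"hours_last_week": 70, "miles_last_week": 3200}


ENRICHMENTS = [weather_lookup, hr_lookup, timesheet_lookup]


def split_join(event):
    # 1. Create a split-join key that identifies this event in every branch.
    key = str(uuid.uuid4())
    keyed = {**event, "splitJoinKey": key}

    # 2. Split into n branches, one per enrichment, and run them in parallel.
    def branch(lookup):
        return key, lookup(keyed)

    with ThreadPoolExecutor(max_workers=len(ENRICHMENTS)) as pool:
        results = list(pool.map(branch, ENRICHMENTS))

    # 3. Join the n branch results back onto the event by the split-join key.
    joined = dict(keyed)
    for branch_key, fields in results:
        if branch_key == key:
            joined.update(fields)
    return joined


enriched = split_join({"driverId": 11, "lat": 40.35, "long": -74.66})
print(sorted(enriched))
```

Because the branches run concurrently, total latency is bounded by the slowest lookup rather than the sum of all three, which is the benefit the slide points to.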

Page 25: Introduction to HDF 3.0

Stream Insight Module for Business Analysts

→ A tool to create time-series and real-time analytics dashboards, charts and graphs

→ 30+ visualization charts out of the box with customization capability

→ Druid is the Analytics Engine that powers the Stream Insight Module.

Page 26: Introduction to HDF 3.0

Streaming Analytics Manager

Page 27: Introduction to HDF 3.0

Set Up An Environment for SAM

Page 28: Introduction to HDF 3.0

Hortonworks SAM Canvas to build the Streaming Analytics App without writing a line of code

Page 29: Introduction to HDF 3.0

Hortonworks SAM App Dashboard

Page 30: Introduction to HDF 3.0

Schema Registry Dashboard and Details of One Schema

Page 31: Introduction to HDF 3.0

Contact:

[email protected]/futureofdata-princeton

community.hortonworks.com/users/9304/tspann.html

Page 32: Introduction to HDF 3.0

Hortonworks Community Connection

Read access for everyone, join to participate and be recognized

• Full Q&A Platform (like StackOverflow)

• Knowledge Base Articles

• Code Samples and Repositories

Page 33: Introduction to HDF 3.0

Community Engagement

Participate now at: community.hortonworks.com

4,000+ Registered Users

10,000+ Answers

15,000+ Technical Assets

One Website!