Dataflow with Apache NiFi - Crash Course - HS16SJ
Transcript of Dataflow with Apache NiFi - Crash Course - HS16SJ
Dataflow with Apache NiFi
Aldrin Piri - @aldrinpiri
Apache NiFi Crash Course
Hadoop Summit 2016 – San Jose
29 June 2016
© Hortonworks Inc. 2011–2016. All Rights Reserved

Key: 'Apache NiFi' Value: 'PMC Member'
Key: 'Work' Value: 'Sr. Member of Technical Staff @ Hortonworks'
Key: 'Working with NiFi Since' Value: '2010'

Agenda
What is dataflow and what are the challenges?
Apache NiFi
Architecture
Live Demo
Community

Let’s Connect A to B
Producers A.K.A. Things
Anything AND Everything
Internet!
Consumers
• User
• Storage
• System
• … More Things

Moving data effectively is hard
Standards: http://xkcd.com/927/

Why is moving data effectively hard?
→ Standards
→ Formats
→ “Exactly Once” Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ “That [person|team|group]”
→ Network
→ “Exactly Once” Delivery

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Let’s consider the needs of a courier service
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Core Data Center at HQ
Server Cluster
On Delivery Routes
Trucks
Deliverers
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark / Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks
Deliverers

Why is moving data effectively hard when scoped internally?
→ Standards
→ Formats
→ “Exactly Once” Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ “That [person|team|group]”
→ Network
→ “Exactly Once” Delivery

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs
Oh, that courier service is global

Why is moving data effectively hard when scoped globally?
→ Standards
→ Formats
→ “Exactly Once” Delivery
→ Protocols
→ Veracity of Information
→ Validity of Information
→ Ensuring Security
→ Overcoming Security
→ Compliance
→ Schemas
→ Consumers Change
→ Credential Management
→ “That [person|team|group]”
→ Network
→ “Exactly Once” Delivery

The Unassuming Line: A Case Study
We’ve seen a few lines show up in the wild thus far
Internet!
Inter- & intra-connections in our global courier enterprise
Spotlight: Arthur Lacôte, https://thenounproject.com/turo/

Dataflow Line Anatomy 101
Let’s dissect what this line typically represents
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Script or Application
Script or Application
Data
Data
Disparate Transport Mechanisms

Dataflow Line Anatomy 201
Sometimes that transport is just more lines
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Script or Application
Script or Application
Line Inception
Data
Data

Dataflow Line Anatomy 301
But those lines could also have components…
Fig 1. Lineus Worldwidewebus. Common Name: Internet!
Fig 2. Good Recursion Joke
NoSuchJokeException
footage not found

Apache NiFi Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery / recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable / multi-role security
• Designed for extension
• Clustering
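
Two of the features above, backpressure and prioritized queuing, can be illustrated with a toy bounded queue. This is only a sketch in Python, not NiFi's implementation: the `Connection` class, the `backpressure_threshold` parameter, and the numeric priorities are all hypothetical names chosen for the illustration.

```python
import heapq
import itertools


class Connection:
    """A bounded, prioritized queue between two steps (illustrative only)."""

    def __init__(self, backpressure_threshold):
        # Hypothetical setting: maximum number of queued items.
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._order = itertools.count()

    def is_full(self):
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Enqueue an item; return False when back pressure is applied."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
accepted = [
    conn.offer("routine scan", priority=5),
    conn.offer("damaged parcel alert", priority=1),
    conn.offer("another scan", priority=5),  # queue is full, so refused
]
first_out = conn.poll()  # the alert leaves first despite arriving second
```

The producer sees the refusal and can slow down or hold its data, which is the essence of back pressure; the urgent item jumping the queue is the essence of prioritization.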

Apache NiFi Subproject: MiNiFi
→ Let me get the key parts of NiFi close to where data begins and provide bidirectional communication
→ NiFi lives in the data center. Give it an enterprise server or a cluster of them.
→ MiNiFi lives as close as possible to where data is born and is a guest on that device or system

Let’s revisit our courier service from the perspective of NiFi
Physical Store
Gateway Server
Mobile Devices
Registers
Server Cluster
Distribution Center
Kafka
Core Data Center at HQ
Server Cluster
Others
Storm / Spark / Flink / Apex
Kafka
Storm / Spark / Flink / Apex
On Delivery Routes
Trucks
Deliverers
Client Libraries
Client Libraries
MiNiFi
MiNiFi
NiFi NiFi NiFi NiFi NiFi NiFi
Client Libraries

Apache NiFi Managed Dataflow
SOURCES
REGIONAL INFRASTRUCTURE
CORE INFRASTRUCTURE

NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new components simply by composition of its components.
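
To make the FBP vocabulary concrete, here is a minimal Python sketch of the terms: a FlowFile as an information packet, processors as black boxes, connections as queues, and a toy scheduler. None of this is NiFi's API; `FlowFile`, `uppercase`, `tag_size`, and `run_flow` are hypothetical names, and a real flow controller schedules threads rather than draining queues in order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: content plus a map of attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def uppercase(flowfile):
    """A 'black box' processor that transforms content."""
    return FlowFile(flowfile.content.upper(), dict(flowfile.attributes))


def tag_size(flowfile):
    """A processor that only adds an attribute, leaving content untouched."""
    attrs = dict(flowfile.attributes, fileSize=str(len(flowfile.content)))
    return FlowFile(flowfile.content, attrs)


def run_flow(processors, inputs):
    """Toy scheduler: one queue (connection) before and after each processor."""
    connections = [deque(inputs)] + [deque() for _ in processors]
    for i, processor in enumerate(processors):
        while connections[i]:
            connections[i + 1].append(processor(connections[i].popleft()))
    return list(connections[-1])


results = run_flow([uppercase, tag_size], [FlowFile(b"hello"), FlowFile(b"nifi!")])
```

Wrapping `run_flow([uppercase, tag_size], ...)` in a single function would play the role of a process group: a new component built purely by composing existing ones.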

FlowFiles & Data Agnosticism
→ NiFi is data agnostic!
→ But, NiFi was designed understanding that users can care about specifics and provides tooling to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle
“Be conservative in what you do, be liberal in what you accept from others”

FlowFiles are like HTTP data

HTTP Data
Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html
Content:
Hello world!

FlowFile
Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'
Content:
Binary Content*
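
The analogy can be sketched by splitting an HTTP response into the same two parts a FlowFile has: a key/value map and opaque bytes. This is a rough Python illustration, not how NiFi ingests HTTP; the function name `http_to_flowfile` and the `http.status.line` key are made up for the example.

```python
def http_to_flowfile(raw):
    """Split a raw HTTP response into (attribute map, content bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Hypothetical attribute name for the status line.
    attributes = {"http.status.line": lines[0]}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip()] = value.strip()
    # The body is never inspected: the content stays opaque bytes.
    return attributes, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 13\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"Hello world!\n"
)
attributes, content = http_to_flowfile(raw)
```

Routing and filtering decisions can then be made on `attributes` alone, which is what lets a data-agnostic system move content it never parses.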

Extension / Integration Points
NiFi Term | Description
FlowFile Processor | Push/Pull behavior. Custom UI
Reporting Task | Used to push data from NiFi to some external service (metrics, provenance, etc.)
Controller Service | Used to enable reusable components / shared services throughout the flow
REST API | Allows clients to connect to pull information, change behavior, etc.

Architecture*
Master: NiFi Cluster Manager (NCM)
  OS/Host
  JVM
  NiFi Cluster Manager – Request Replicator
  Web Server
Slaves: NiFi Nodes (the diagram shows three identical nodes)
  OS/Host
  JVM
  Flow Controller
  Web Server
  Processor 1, Extension N
  FlowFile Repository
  Content Repository
  Provenance Repository
  Local Storage

NiFi Architecture – Repositories – Pass by reference
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
FlowFile: F1 → C1
Content: C1
Provenance: P1 → F1
AFTER
FlowFile: F2 → C1, F1 → C1
Content: C1
Provenance: P3 → F2 – Clone (F1); P2 → F1 – Route; P1 → F1 – Create
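
The BEFORE/AFTER states above can be replayed with three plain Python containers standing in for the repositories. This is a sketch under simplifying assumptions: the dictionaries, the event tuples, and the `create`, `route`, and `clone` helpers are hypothetical and say nothing about NiFi's on-disk formats.

```python
content_repo = {}     # content claim id -> bytes
flowfile_repo = {}    # FlowFile id -> content claim id (a reference, not a copy)
provenance_repo = []  # ordered (event type, FlowFile id) records


def create(flowfile_id, claim_id, data):
    content_repo[claim_id] = data
    flowfile_repo[flowfile_id] = claim_id
    provenance_repo.append(("CREATE", flowfile_id))


def route(flowfile_id):
    # Routing changes neither content nor the reference; it is only recorded.
    provenance_repo.append(("ROUTE", flowfile_id))


def clone(source_id, new_id):
    # The clone points at the same content claim, so no bytes are duplicated.
    flowfile_repo[new_id] = flowfile_repo[source_id]
    provenance_repo.append(("CLONE", new_id))


create("F1", "C1", b"payload")
route("F1")
clone("F1", "F2")
```

After these three steps there are two FlowFiles, three provenance events, and still a single stored copy of the content.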

NiFi Architecture – Repositories – Copy on Write
Excerpt of demo flow… What’s happening inside the repositories…
BEFORE
FlowFile: F1 → C1
Content: C1
Provenance: P1 → F1 – CREATE
AFTER
FlowFile: F1 → C1, F1.1 → C2
Content: C2 (encrypted), C1 (plaintext)
Provenance: P2 → F1.1 – MODIFY; P1 → F1 – CREATE
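
Copy on write can be sketched the same way: a modification writes new content under a new claim and leaves the original untouched. The Python below is illustrative only. The `modify` helper and the repository containers are hypothetical, and `toy_cipher` is a deliberately trivial XOR stand-in for the encryption step on the slide, not real cryptography.

```python
content_repo = {"C1": b"plaintext"}   # content claim id -> bytes
flowfile_repo = {"F1": "C1"}          # FlowFile id -> content claim id
provenance_repo = [("CREATE", "F1")]  # ordered (event type, FlowFile id) records


def toy_cipher(data):
    """Stand-in for encryption (XOR with a constant). NOT real cryptography."""
    return bytes(b ^ 0x5A for b in data)


def modify(source_id, new_id, new_claim_id, transform):
    original = content_repo[flowfile_repo[source_id]]
    # Write the transformed bytes to a NEW claim; the old claim is not touched.
    content_repo[new_claim_id] = transform(original)
    flowfile_repo[new_id] = new_claim_id
    provenance_repo.append(("MODIFY", new_id))


modify("F1", "F1.1", "C2", toy_cipher)
```

Because C1 survives the modification, the earlier state of the data remains available for inspection or replay alongside the new version.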

Learn, Share at Birds of a Feather
Streaming, DataFlow & Cybersecurity
Thursday June 30, 6:30 pm, Ballroom C

Why NiFi?
→ Moving data is multifaceted in its challenges and these are present in different contexts at varying scopes
  – Think of our courier example and organizations like it: inter vs intra, domestically, internationally
→ Provide common tooling and extensions that are commonly needed but be flexible for extension
  – Leverage existing libraries and expansive Java ecosystem for functionality
  – Allow organizations to integrate with their existing infrastructure
→ Empower folks managing your infrastructure to make changes and reason about issues that are occurring
  – Data Provenance to show context and data’s journey
  – User Interface/Experience a key component

Learn more and join us!
Apache NiFi site
http://nifi.apache.org
Subproject MiNiFi site
http://nifi.apache.org/minifi/
Subscribe to and collaborate
[email protected]@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi

Our Lab for Today
→ We will be exploring some examples to work through creating a dataflow with Apache NiFi
→ Use Case: An urban planning board is evaluating the need for a new highway, dependent on current traffic patterns, particularly as other roadwork initiatives are under way. Integrating live data poses a problem because traffic analysis has traditionally been done using historical, aggregated traffic counts. To improve traffic analysis, the city planner wants to leverage real-time data to get a deeper understanding of traffic patterns. NiFi was selected for this real-time data integration.
→ Labs are available at http://tinyurl.com/nificrashcourse

Thank You