Introduction to Apache NiFi - Seattle Scalability Meetup
Introducing #ApacheNiFi
Saptak Sen [@saptak], Technical Product Manager, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
#seascale
Agenda
• New Data Sources and the Rise of the Internet of Anything
• Introducing: Hortonworks DataFlow powered by Apache NiFi
• Key concepts, architecture, and use cases
• Demo
• Q&A
IoAT Data Grows Faster Than We Consume It
Much of the new data exists in-flight, between systems and devices, as part of the Internet of Anything.
[Chart labels: NEW, TRADITIONAL, Ability to consume data]
The Opportunity: Unlock transformational business value from a full fidelity of data and analytics for all data.
Geolocation
Server logs
Files & emails
ERP, CRM, SCM
Traditional Data Sources
Internet of Anything
Sensors and machines
Clickstream
Social media
Interconnectedness Demands User Centricity
Changes Organizations into Data Companies
Hortonworks Data Platform: for rich historical insights from data-at-rest
NEW Hortonworks DataFlow: for securely collecting, conducting, and curating data-in-motion while ALSO driving value for data-at-rest analytics and use cases
Source: Gartner - Architecture Options for Big Data Analytics on Hadoop, July 2015
Simplistic View of IoAT & Data Flow
The Data Flow Thing
Process and Analyze Data
Acquire Data
Store Data
Realistic View of IoAT and Data Flow
Global interactions with customers, business partners, and things spanning different volume, velocity, bandwidth, and latency needs
Meeting IoAT Edge Requirements
GATHER
DELIVER
PRIORITIZE
Track from the edge through to the datacenter
Small Footprints: operate with very little power
Limited Bandwidth: can create high latency
Data Availability: exceeds transmission bandwidth
Data Must Be Secured: throughout its journey
Hortonworks Acquires Onyara
Turn Internet of Anything Data Into Actionable Insights
• Onyara is the creator of and key contributor to Apache NiFi, an open source solution for processing and distributing data.
• Over the past 8 years, Onyara engineers developed the U.S. government software project called “NiagaraFiles”, the precursor to Apache NiFi.
• Apache NiFi was made available as an Apache Incubator project through the NSA Technology Transfer Program in the Fall of 2014.
NEW Hortonworks DataFlow offering will securely and easily collect, conduct and curate any data, from anything, anywhere.
The IoAT Data Flow
[Diagram labels: Internet of Anything; Hortonworks DataFlow powered by Apache NiFi; Hortonworks Data Platform powered by Apache Hadoop; Enrich Context; Store Data and Metadata; Perishable Insights; Historical Insights]
Introducing Hortonworks DataFlow powered by Apache NiFi
Hortonworks DataFlow and the Hortonworks Data Platform deliver the industry’s most complete solution for management of Big Data.
Apache NiFi: Three key concepts
• Manage the flow of information
• Data Provenance
• Secure the control plane and data plane
Apache NiFi – Key Features
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
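To make the buffering and queuing features above more concrete, here is a minimal Python sketch of a bounded, prioritized queue that refuses new items once a threshold is reached. It is only an illustration of the general idea, not NiFi's implementation: the `Connection` class, its `backpressure_threshold` parameter, and the `offer`/`poll` method names are all hypothetical, and pressure release (ageing off old data) is not modeled.

```python
import heapq
import itertools


class Connection:
    """Hypothetical bounded, prioritized queue between two processors."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Monotonic counter used as a tie-breaker so that items with the
        # same priority leave the queue in arrival (FIFO) order.
        self._order = itertools.count()

    def is_full(self):
        # Backpressure signal: the upstream side should pause while True.
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, item, priority):
        """Enqueue an item; return False if backpressure is being applied."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Dequeue the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=3)
accepted = [
    conn.offer(name, prio)
    for name, prio in [("debug log", 5), ("alarm", 0), ("metric", 2), ("extra", 1)]
]
drained = [conn.poll() for _ in range(3)]
print(accepted)  # the fourth offer is rejected because the queue is full
print(drained)   # items come out by priority, not arrival order
```

In this sketch the producer is expected to check the return value of `offer` and retry later, which is one simple way to express backpressure; a blocking queue would be another.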
Common Apache NiFi Use Cases
Predictive Analytics: Ensure the highest value data is captured and available for analysis
Compliance: Gain full transparency into provenance and flow of data
IoT Optimization: Secure, Prioritize, Enrich and Trace data at the edge
Fraud Detection: Move sales transaction data in real time to analyze on demand
Big Data Ingest: Easily and efficiently ingest data into Hadoop
Value Resources: Gain visibility into how data sources are used to determine value
Architecture
[Diagram: NiFi node, repeated once per node]
OS/Host
JVM
Flow Controller
Web Server
Processor 1, Extension N
FlowFile Repository
Content Repository
Provenance Repository
Local Storage
[Diagram: NiFi cluster]
Master: NiFi Cluster Manager (NCM)
OS/Host
JVM
NiFi Cluster Manager – Request Replicator
Web Server
Slaves: NiFi Nodes
High Availability: Control plane vs Data plane…
HDF – Powered by Apache NiFi
Add processor for data intake
1 Drag and drop processor icon from the top menu
Choose the specific processor
2 Choose one of the processors – currently 90 available – designed for extension
Example: Pick Twitter Processor
Configure the processor
3 Select processor and choose option to Configure
4 Adjust parameters as required
Another processor for data output
5 Drag and drop processor icon from the top menu
6 Example: choose PutHDFS processor
Configure second processor
7 Configure 2nd processor
8 Connect processors, configure connection
9 Click Start to begin processing
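The demo steps so far (add an intake processor, add an output processor, connect them, press Start) can be pictured with a small Python sketch. This is an assumption-laden toy, not NiFi code: `get_tweets` only stands in for the Twitter processor, `ListSink` stands in for PutHDFS by collecting records in memory, and `run_flow` is a hypothetical helper that plays the role of the Start button.

```python
from collections import deque


def get_tweets():
    """Hypothetical source processor: yields a few fake records."""
    for i in range(3):
        yield {"id": i, "text": f"tweet {i}"}


class ListSink:
    """Hypothetical sink processor: keeps records in memory instead of writing to HDFS."""

    def __init__(self):
        self.written = []

    def process(self, record):
        self.written.append(record)


def run_flow(source, connection, sink):
    """Move every record from the source, through the connection queue, to the sink."""
    for record in source:
        connection.append(record)           # source -> connection
    while connection:
        sink.process(connection.popleft())  # connection -> sink


sink = ListSink()
run_flow(get_tweets(), deque(), sink)
print(len(sink.written))
```

The point of the sketch is only the shape of the flow: two independent components that never call each other directly and communicate solely through the queue between them.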
See processors update with real time changes
10 As data flows, the GUI updates in real time.
Dynamically adjust and tune data flow as needed
11 Dynamically adjust and tune dataflow as needed, in real time. Can also replicate data for testing and comparison.
Understand the data path with Data Provenance
14 Select Data Provenance
Trace lineage of a particular piece of data
15 Icon for Data Lineage
Every change to data is tracked: processing, views
16 Provenance event is tracked
Updates as changes happen
17 Updates as data flows
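As a rough illustration of what the provenance slides describe (every change to a piece of data is recorded as an event, and the events for one item can be read back as its lineage), here is a minimal Python sketch. It is not how NiFi stores provenance: `ProvenanceEvent`, `ProvenanceLog`, and the event-type strings are hypothetical names chosen for the example.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class ProvenanceEvent:
    flowfile_id: str
    event_type: str   # illustrative labels such as "RECEIVE" or "SEND"
    component: str    # the processor that produced the event
    timestamp: float = field(default_factory=time.time)


class ProvenanceLog:
    """Hypothetical append-only log of provenance events."""

    def __init__(self):
        self._events = []

    def record(self, flowfile_id, event_type, component):
        self._events.append(ProvenanceEvent(flowfile_id, event_type, component))

    def lineage(self, flowfile_id):
        """Return the (event_type, component) trail for one piece of data, in order."""
        return [
            (e.event_type, e.component)
            for e in self._events
            if e.flowfile_id == flowfile_id
        ]


log = ProvenanceLog()
first = str(uuid.uuid4())
second = str(uuid.uuid4())
log.record(first, "RECEIVE", "Twitter processor")
log.record(second, "RECEIVE", "Twitter processor")
log.record(first, "SEND", "PutHDFS")
trail = log.lineage(first)
print(trail)
```

Because the log is append-only and keyed by an identifier per item, tracing one item never requires touching the data itself, only filtering the recorded events.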
Easily access and trace changes to dataflow
Audit trail of Hortonworks DataFlow User Actions
Operations: Planned
Q & A