Design a Dataflow in 7 minutes with Apache NiFi/HDF

Posted on 16-Apr-2017

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Create a live dataflow in minutes. How would that change your business?

Add a processor for data intake. Time: 1 minute

1. Drag and drop a processor from the top menu.

Choose the specific processor

2. Choose one of the processors (170+ were available when this deck was published).

Example: Pick the Twitter processor (GetTwitter)

Configure the processor. Time: 2 minutes

3. Select the processor and choose the Configure option.

4. Adjust parameters as required.
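
A processor whose required properties are left empty is flagged as invalid in the canvas and cannot be started, which is why step 4 matters. The sketch below is a toy Python model of that check, not NiFi code; the class ProcessorConfig and its methods are hypothetical, and the property names are only loosely modeled on the Twitter processor's settings (consult that processor's documentation for the real list).

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorConfig:
    """Toy model of a processor's property sheet (not the NiFi API)."""
    processor_type: str
    required: set[str]
    properties: dict[str, str] = field(default_factory=dict)

    def validation_errors(self) -> list[str]:
        # A required property counts as missing if absent or blank.
        return sorted(
            f"'{name}' is required"
            for name in self.required
            if not self.properties.get(name, "").strip()
        )

    def is_valid(self) -> bool:
        # Only a processor with no validation errors may be started.
        return not self.validation_errors()


# Hypothetical settings, loosely modeled on a Twitter intake processor.
twitter = ProcessorConfig(
    processor_type="GetTwitter",
    required={"Twitter Endpoint", "Consumer Key", "Consumer Secret",
              "Access Token", "Access Token Secret"},
    properties={"Twitter Endpoint": "Filter Endpoint", "Consumer Key": "abc"},
)
print(twitter.is_valid())  # False
print(twitter.validation_errors())
```

Filling in the three missing credentials would make `is_valid()` return True, mirroring how the warning indicator clears once configuration is complete.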

Add another processor for data output. Time: 1 minute

5. Drag and drop a processor from the top menu.

6. Filter for and select a “Put” processor.

Configure the second processor. Time: 1 minute

7. Configure the second processor.
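
A "Put" processor such as PutFile writes each piece of data to a destination; its configuration includes a target directory and a conflict resolution strategy (replace, ignore, or fail) for when a file with the same name already exists. The sketch below is a hypothetical plain-Python stand-in for that behavior, not PutFile's implementation; the function put_file and its "success"/"failure" return values are assumptions chosen to echo how processors route data to named relationships.

```python
import tempfile
from pathlib import Path


def put_file(directory: Path, filename: str, content: bytes,
             conflict: str = "fail") -> str:
    """Toy "Put" step: write content to directory/filename.

    Returns "success" or "failure" (hypothetical outcome names).
    """
    if conflict not in {"replace", "ignore", "fail"}:
        raise ValueError(f"unknown conflict strategy: {conflict}")
    directory.mkdir(parents=True, exist_ok=True)  # create missing dirs
    target = directory / filename
    if target.exists():
        if conflict == "fail":
            return "failure"
        if conflict == "ignore":
            # In this sketch, "ignore" keeps the existing file untouched.
            return "success"
    target.write_bytes(content)
    return "success"


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "tweets"
    print(put_file(out, "t1.json", b'{"text": "hello"}'))             # success
    print(put_file(out, "t1.json", b'{"text": "again"}'))             # failure
    print(put_file(out, "t1.json", b'{"text": "again"}', "replace"))  # success
```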

Connect the processors and configure the connection. Time: 2 minutes

8. Configure the connection.

Note: The sample flow shown here differs from the earlier PutHDFS example; this dataflow uses PutFile. The same concepts apply.
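
One connection setting is FlowFile expiration: data older than the configured period is dropped from the queue instead of being delivered, and a value of 0 means data never expires. NiFi measures that age from when the data entered the flow, not from when it entered this particular queue. The toy class ExpiringQueue below is a hypothetical illustration of that rule, not NiFi's implementation; the injectable clock exists only so the example is deterministic.

```python
import time
from collections import deque
from typing import Callable, Optional


class ExpiringQueue:
    """Toy connection queue with a FlowFile-expiration-style setting."""

    def __init__(self, expiration_s: float = 0.0,
                 clock: Callable[[], float] = time.monotonic):
        self.expiration_s = expiration_s  # 0 means "never expire"
        self.clock = clock
        self._items: deque = deque()      # (entered_flow_at, payload)
        self.expired = 0

    def offer(self, payload: bytes,
              entered_flow_at: Optional[float] = None) -> None:
        # Age is tracked from when the data entered the flow.
        t = self.clock() if entered_flow_at is None else entered_flow_at
        self._items.append((t, payload))

    def poll(self) -> Optional[bytes]:
        while self._items:
            entered, payload = self._items.popleft()
            age = self.clock() - entered
            if self.expiration_s and age > self.expiration_s:
                self.expired += 1  # dropped instead of delivered
                continue
            return payload
        return None


now = [0.0]
q = ExpiringQueue(expiration_s=60, clock=lambda: now[0])
q.offer(b"old", entered_flow_at=0.0)
now[0] = 30.0
q.offer(b"new")
now[0] = 90.0     # "old" is 90 s old, "new" is 60 s old
print(q.poll())   # b'new'
print(q.expired)  # 1
```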

Click Start to begin processing. Total time: 7 minutes

9. Click Start (the “play” button) to begin processing. The flow runs continuously until you click Stop.

See Processors Update with Real-Time Changes

10. As data flows, the GUI updates in real time.

11. If the destination is stopped or unable to receive data, the queue builds up.
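
Step 11 describes a queue building up while the destination is stopped. NiFi connections can also set back-pressure thresholds (an object count and a data size); once one is reached, the upstream processor is not scheduled until the queue drains. The tick-based simulation below is a toy model of that interaction, not NiFi's scheduler; the function simulate and its parameters are hypothetical.

```python
from collections import deque


def simulate(ticks: int, threshold: int, destination_running: bool) -> list[int]:
    """Toy tick loop: a source emits one item per tick into a queue.

    The destination removes one item per tick only while running. Once the
    queue reaches `threshold` (a stand-in for a back-pressure object
    threshold) the source is not scheduled, so the queue stops growing.
    Returns the queue length after each tick.
    """
    queue: deque = deque()
    history = []
    for t in range(ticks):
        if len(queue) < threshold:      # source runs only below threshold
            queue.append(f"item-{t}")
        if destination_running and queue:
            queue.popleft()             # destination consumes one item
        history.append(len(queue))
    return history


print(simulate(6, threshold=3, destination_running=True))   # [0, 0, 0, 0, 0, 0]
print(simulate(6, threshold=3, destination_running=False))  # [1, 2, 3, 3, 3, 3]
```

With the destination stopped, the queue grows until the threshold and then holds steady, which is the behavior the GUI makes visible in real time.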

Dynamically adjust and tune the dataflow as needed

12. Dynamically configure, start, stop, tune, reroute, change, or pause dataflows as needed.

Powerful Tools to Quickly Replicate, Group, Repurpose, Tune and Test in Real Time

13. Group multiple processors together to create a process group.

14. Create a new template.
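
A process group bundles processors and their connections, and a template captures that structure so it can be stamped out again with fresh component IDs. NiFi stores templates as XML; the toy below uses JSON purely for brevity, and the ProcessGroup class with its to_template/from_template methods is hypothetical, not NiFi's API.

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class ProcessGroup:
    """Toy process group: processors plus connections between them."""
    name: str
    processors: dict[str, str] = field(default_factory=dict)          # id -> type
    connections: list[tuple[str, str]] = field(default_factory=list)  # (src, dst)

    def add(self, processor_type: str) -> str:
        pid = str(uuid.uuid4())
        self.processors[pid] = processor_type
        return pid

    def to_template(self) -> str:
        # Record structure by position, not by id, so copies get new ids.
        ids = list(self.processors)
        return json.dumps({
            "name": self.name,
            "processors": [self.processors[i] for i in ids],
            "connections": [[ids.index(s), ids.index(d)]
                            for s, d in self.connections],
        })

    @classmethod
    def from_template(cls, template: str, name: str) -> "ProcessGroup":
        data = json.loads(template)
        group = cls(name)
        new_ids = [group.add(ptype) for ptype in data["processors"]]
        group.connections = [(new_ids[s], new_ids[d])
                             for s, d in data["connections"]]
        return group


ingest = ProcessGroup("twitter-to-disk")
src = ingest.add("GetTwitter")
dst = ingest.add("PutFile")
ingest.connections.append((src, dst))

copy = ProcessGroup.from_template(ingest.to_template(), "twitter-to-disk-2")
print(sorted(copy.processors.values()))                    # ['GetTwitter', 'PutFile']
print(set(copy.processors).isdisjoint(ingest.processors))  # True
```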

Provenance Means Real-Time Traceability of:

Data Flow
Data Content
Data Context
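
Conceptually, each provenance event ties together the three items above: where the data went (flow), what it contained (content), and the metadata around it (context). The record below is a simplified, hypothetical shape for such an event, not NiFi's actual provenance schema; the event type names RECEIVE and SEND do correspond to real NiFi provenance event types, and storing a content hash rather than the bytes is a choice made for this sketch.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEvent:
    """Simplified provenance record (hypothetical shape, not NiFi's schema)."""
    event_type: str       # e.g. "RECEIVE", "CONTENT_MODIFIED", "SEND"
    flowfile_id: str      # which piece of data
    component: str        # flow: which processor handled it
    attributes: dict      # context: key/value metadata at that moment
    content_sha256: str   # content: fingerprint of the bytes
    timestamp: datetime


def record(event_type: str, flowfile_id: str, component: str,
           attributes: dict, content: bytes) -> ProvenanceEvent:
    return ProvenanceEvent(
        event_type, flowfile_id, component, dict(attributes),
        hashlib.sha256(content).hexdigest(),
        datetime.now(timezone.utc),
    )


log = [
    record("RECEIVE", "ff-1", "GetTwitter",
           {"mime.type": "application/json"}, b'{"text":"hi"}'),
    record("SEND", "ff-1", "PutFile", {"filename": "t1.json"}, b'{"text":"hi"}'),
]
for e in log:
    print(e.event_type, e.component, e.content_sha256[:12])
```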

Watch the Real-Time Flow of Data: Data Provenance

15. Select Data Provenance.

Trace Lineage of a Particular Piece of Data

16. Click the Data Lineage icon.
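
Tracing lineage amounts to walking parent links backward from one piece of data to everything it was derived from. The sketch below does that with a breadth-first search over a hypothetical provenance log; the tuple layout (event type, ID, parent IDs) and the functions lineage and history are assumptions for illustration, and NiFi records parent/child links in its own, richer format.

```python
from collections import deque

# Hypothetical provenance log: (event_type, flowfile_id, parent_ids).
# FORK-style events link a child to the data it came from.
EVENTS = [
    ("RECEIVE", "ff-1", []),
    ("FORK", "ff-2", ["ff-1"]),
    ("FORK", "ff-3", ["ff-1"]),
    ("CONTENT_MODIFIED", "ff-3", []),
    ("SEND", "ff-3", []),
]


def lineage(flowfile_id: str, events=EVENTS) -> list[str]:
    """Return flowfile_id and all its ancestors, nearest first (BFS)."""
    parents: dict[str, set[str]] = {}
    for _, ff, ps in events:
        parents.setdefault(ff, set()).update(ps)
    seen, order, todo = {flowfile_id}, [], deque([flowfile_id])
    while todo:
        current = todo.popleft()
        order.append(current)
        for p in sorted(parents.get(current, ())):
            if p not in seen:
                seen.add(p)
                todo.append(p)
    return order


def history(flowfile_id: str, events=EVENTS) -> list[str]:
    """Event types recorded for one piece of data, in log order."""
    return [etype for etype, ff, _ in events if ff == flowfile_id]


print(lineage("ff-3"))  # ['ff-3', 'ff-1']
print(history("ff-3"))  # ['FORK', 'CONTENT_MODIFIED', 'SEND']
```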

Every Change to Data Is Tracked in Real Time: Processing, Views

17. Every event is traceable.

Real-Time Updates of Dataflow: Traceable Context & Content

18. Know both context and content immediately.

Easily access and trace changes to dataflow

Audit trail of Hortonworks DataFlow User Actions
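
An audit trail records who changed what in the flow and when; NiFi surfaces this in its Flow Configuration History view. The toy below models such a trail as an append-only list with a per-component filter. The AuditTrail and UserAction classes and the action names are hypothetical, not NiFi's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class UserAction:
    """One entry in a toy, append-only audit trail of flow changes."""
    when: datetime
    user: str
    action: str      # e.g. "Add", "Configure", "Start", "Stop"
    component: str


class AuditTrail:
    def __init__(self) -> None:
        self._entries: list[UserAction] = []

    def log(self, user: str, action: str, component: str) -> None:
        # Entries are only ever appended, never edited or removed.
        self._entries.append(
            UserAction(datetime.now(timezone.utc), user, action, component))

    def for_component(self, component: str) -> list[UserAction]:
        return [e for e in self._entries if e.component == component]


trail = AuditTrail()
trail.log("alice", "Add", "GetTwitter")
trail.log("alice", "Configure", "GetTwitter")
trail.log("bob", "Start", "GetTwitter")
trail.log("bob", "Add", "PutFile")
for e in trail.for_component("GetTwitter"):
    print(e.user, e.action)
```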

Questions?

Hortonworks Community Connection: Data Ingestion and Streaming
https://community.hortonworks.com/