Transcript of Hadoop summit 2016

Page 1: Hadoop summit 2016

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Deep Learning using Spark and DL4J for fun and profit

Adam Gibson and Dhruv Kumar

2015, Version 1.0

Page 2: Hadoop summit 2016


Who are we?

Adam Gibson - Co-founder of Skymind; wrote DeepLearning4J, ND4J

Dhruv Kumar - Sr. Solutions Architect, Hortonworks (HWX); MS, UMass; Mahout, ASF

Page 3: Hadoop summit 2016


In this talk

- What’s Deep Learning?
- Architectures
- Implementation and Libraries in Real Life
- Demo!

Page 4: Hadoop summit 2016


Deep Learning

• One of the many pattern recognition techniques in Data Science

• Excels at rich media applications:
  • Image recognition
  • Speech translation
  • Voice recognition

• Loosely inspired by human brain models
• Synonymous with Artificial Neural Networks, Multi-Layer Networks

Page 5: Hadoop summit 2016


Enterprise use cases

Page 6: Hadoop summit 2016


Doing this in real life for enterprise

Page 7: Hadoop summit 2016


Modern Data Applications in Enterprise: Connected, Fast, Intelligent

[Diagram: data from the Internet of Anything flows through HDF for data in motion (perishable insights) and HDP for data at rest (historical insights), which together produce the actionable intelligence behind modern data apps.]

Page 8: Hadoop summit 2016

How do we realize MDA (Modern Data Apps) in a Hadoop-centric world?

[Architecture diagram: raw network streams, network metadata streams, syslog, raw application logs, and other streaming telemetry are ingested by HDF into a Hadoop cluster running HDFS, YARN, HBase, Hive, SOLR, Storm, and Spark, alongside existing data stores, a SIEM, and service management / workflow tooling.]

Page 9: Hadoop summit 2016

[Deployment diagram: Sources 1–N feed two NiFi nodes (the HDF tier) running on edge nodes; the HDP cluster comprises three Kafka and three Storm nodes, five master nodes, client nodes, and worker nodes hosting DataNodes 1–32, the first eight co-located with HBase 1–8. The HDF tier sits in the outside world; the HDP cluster runs on Azure.]

Page 10: Hadoop summit 2016


Detailed Reference Architecture

[Reference architecture diagram: source data (server logs, application logs, firewall logs, CRM/ERP, sensor data) enters through high-speed ingest with HDF, Flume, Kafka, and Sqoop; streams are forwarded to Storm / Spark Streaming for real-time processing, including event enrichment against HBase/Phoenix for real-time storage, JMS alerts, and bolts sinking transformed data to HDFS; batch and interactive workloads use HDFS, Hive, and Pig; machine learning models are built iteratively with Spark and Spark-ML and exposed via the Spark Thrift server; results feed dashboards (Silk), an interactive UI framework, and reporting / BI tools through HiveServer.]

Page 11: Hadoop summit 2016


For Model Building: Typical Workflow

1. Ingest training data and store it
2. Split the data set into training, testing, and validation sets (a split sketch follows this list)
3. Vectorize and extract features to feed into the next step
4. Architect the multi-layer network and initialize it
5. Feed data and train
6. Test and validate
7. Repeat steps 4 and 5 until results are as desired
8. Store the model
9. Put the model in an app and start generalizing on real data
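As a rough illustration of step 2, the split can be done in memory with DL4J's DataSet API. This is a minimal sketch, assuming the data already fits in memory; the 65/35 ratio is an arbitrary choice, and a validation set could be carved out of the training portion the same way.

    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.dataset.SplitTestAndTrain;

    public class SplitExample {
        // Shuffle the full data set, then hold out 35% of it for testing.
        public static SplitTestAndTrain split(DataSet allData) {
            allData.shuffle();                       // randomize row order first
            return allData.splitTestAndTrain(0.65);  // 65% train, 35% test
        }
    }

Calling getTrain() and getTest() on the returned object then supplies the data for the training and evaluation steps below.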

Page 12: Hadoop summit 2016


So what do you get?

1. Ingest training data and store it using NiFi or other ingest tools
2. Split the data set into training, testing, and validation sets
3. Vectorize and extract features to feed into the next step
4. Architect the multi-layer network and initialize it
5. Feed data and train
6. Test and validate
7. Repeat steps 4 and 5 until results are as desired
8. Store the model
9. Put the model in an app and start generalizing on real data

Steps 2, 3, 4, and 5: use libraries such as Deeplearning4j (a network-building and training sketch follows).
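A minimal sketch of steps 4 to 6 with Deeplearning4j, assuming a small feed-forward classifier over already-vectorized data; the layer sizes, 4 inputs, and 3 output classes are illustrative only, and the string-based activation() calls and some builder methods follow the DL4J API of that era, which changed in later releases.

    import org.deeplearning4j.eval.Evaluation;
    import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.lossfunctions.LossFunctions;

    public class TrainAndEvaluate {
        public static void run(DataSet trainingData, DataSet testData) {
            // Step 4: architect a small multi-layer network and initialize it.
            MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                    .seed(42)
                    .learningRate(0.01)
                    .list()
                    .layer(0, new DenseLayer.Builder().nIn(4).nOut(10)
                            .activation("relu").build())
                    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                            .nIn(10).nOut(3)
                            .activation("softmax").build())
                    .build();
            MultiLayerNetwork model = new MultiLayerNetwork(conf);
            model.init();

            // Step 5: feed data and train.
            model.fit(trainingData);

            // Step 6: test and validate against held-out data.
            Evaluation eval = new Evaluation(3);
            INDArray output = model.output(testData.getFeatureMatrix());
            eval.eval(testData.getLabels(), output);
            System.out.println(eval.stats());
        }
    }

The same MultiLayerConfiguration can also be trained on a cluster through DL4J's Spark integration, which is where the Spark half of the talk's title comes in.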

Page 13: Hadoop summit 2016


Deeplearning4j Architecture

Page 14: Hadoop summit 2016


DL4J: Canova for Vectorization and Ingest

• Canova uses an input/output format system (similar to how Hadoop uses MapReduce)

• Supports all major types of input data (text, CSV, audio, image and video)

• Can be extended for specialized input formats
• Connects to Kafka
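To make the record-reader idea above concrete, here is a minimal sketch of reading a CSV file with Canova and wrapping it as a DL4J DataSetIterator. The file name, label column, class count, and batch size are assumptions for illustration, and the package names shifted when Canova was later folded into DataVec.

    import java.io.File;

    import org.canova.api.records.reader.RecordReader;
    import org.canova.api.records.reader.impl.CSVRecordReader;
    import org.canova.api.split.FileSplit;
    import org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator;
    import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

    public class CanovaIngest {
        public static DataSetIterator csvIterator() throws Exception {
            RecordReader reader = new CSVRecordReader(0, ",");       // skip 0 header lines, comma-delimited
            reader.initialize(new FileSplit(new File("iris.csv")));  // hypothetical local file
            int labelIndex = 4;   // column holding the class label
            int numClasses = 3;   // distinct label values
            int batchSize = 50;   // rows per DataSet returned by the iterator
            return new RecordReaderDataSetIterator(reader, batchSize, labelIndex, numClasses);
        }
    }

The resulting iterator can feed MultiLayerNetwork.fit() directly, or its batches can be collected and split as in the earlier sketch.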

Page 15: Hadoop summit 2016


ND4J:

• N-dimensional vector library
• Scientific computing for the JVM
• DL4J uses it to do linear algebra for backpropagation
• Supports GPUs via CUDA and native CPUs via JBlas
• Deploys on Android
• DL4J code remains unchanged whether using GPU or CPU
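A small sketch of the kind of linear algebra ND4J exposes; the shapes and the sigmoid transform are arbitrary choices for illustration, and the same code runs on CPU or GPU depending on which ND4J backend is on the classpath.

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.ops.transforms.Transforms;

    public class Nd4jBasics {
        public static void main(String[] args) {
            INDArray weights = Nd4j.rand(2, 3);   // 2x3 matrix of random values
            INDArray input = Nd4j.create(new double[]{1.0, 2.0}, new int[]{1, 2}); // 1x2 row vector
            INDArray z = input.mmul(weights);           // matrix multiply -> 1x3
            INDArray activated = Transforms.sigmoid(z); // element-wise activation
            System.out.println(activated);
        }
    }

Switching between the CUDA and native (JBlas/CPU) backends is a dependency change only; the code above is unchanged.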

Page 16: Hadoop summit 2016


How to choose a Neural Net in DL4J core?

Page 17: Hadoop summit 2016


Demo!

Page 18: Hadoop summit 2016


Thank You
hortonworks.com