Big Data Science at the Digital Catapult


description

A talk on Big Data and why the digital economy needs it, centred on the Digital Catapult's challenge areas.

Transcript of Big Data Science at the Digital Catapult

Page 1: Big Data Science at the Digital Catapult

BIG DATA SCIENCE

“The price of light is far less than the cost of darkness”

Chandan Rajah [ @ChandanRajah ]

Page 2: Big Data Science at the Digital Catapult

BENEFITS OF BIG DATA

COST SPEED

AGILITY CAPABILITY

Page 3: Big Data Science at the Digital Catapult

Steps to the EPIPHANY

WHERE

WHAT WHY

DEMO

Page 4: Big Data Science at the Digital Catapult

What is Big Data ?

Big Data ≠ Data Volume

Big Data = Crude Oil

Think of data like ‘Crude Oil’

Big Data is about extracting ‘crude oil’, transporting it in ‘pipelines’ and storing it in ‘mega tanks’

Page 5: Big Data Science at the Digital Catapult

What is Data Science ?

Data Science ≠ Statistical Analysis

Data Science = Oil Refinery

Data Science is about ‘treating’ the data, applying ‘science’ to it, refining the ‘results’ and combining them to form ‘insight’

Page 6: Big Data Science at the Digital Catapult

Knowns, Unknowns & DIKUW FTW!

known knowns: we know we know

known unknowns: we know we don’t know

unknown unknowns: we don’t know we don’t know

D: DATA (raw): numbers, letters, symbols, signals

I: INFORMATION (what): description, context, relationship, reports

K: KNOWLEDGE (how to): experience, tested, instruction, programs

U: UNDERSTANDING (why): cause & effect, proven, models

W: WISDOM (when): prediction, what’s best

PAST → FUTURE

Data Engineer, Data Analyst, Data Miner, Data Scientist

known knowns, known unknowns, unknown unknowns

Page 7: Big Data Science at the Digital Catapult

Data Analytics to Data Discovery ?

data you know / data you don’t know

questions you’re asking / questions you’re not asking

Data Analyst: Data Analytics

Data Scientist: Data Discovery

DATA MODELLING: Y = F(X, random noise, parameters)

ALGORITHMIC MODELLING: Y ← [ BLACK BOX ] ← X
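The contrast between data modelling and algorithmic modelling can be sketched in a few lines. This is only an illustration under stated assumptions: the straight-line form of F, the nearest-neighbour "black box", the synthetic data and the names `fit_line` and `knn_predict` are all hypothetical choices, not taken from the talk.

```python
import random


def fit_line(xs, ys):
    """Data modelling: assume Y = a*X + b + noise, estimate a and b by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def knn_predict(xs, ys, x_new, k=3):
    """Algorithmic modelling: assume no form for F, average the k nearest observed Ys."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


# Synthetic data (an assumption for the demo): Y = 2*X + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

a, b = fit_line(xs, ys)                    # interpretable parameters
line_prediction = a * 5.0 + b              # prediction from the assumed model
knn_prediction = knn_predict(xs, ys, 5.0)  # prediction from the black box
```

Both approaches predict a value close to 11 at X = 5, but only the first yields parameters that can be read and questioned; the second trades that interpretability for freedom from an assumed form.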

Page 8: Big Data Science at the Digital Catapult

What is the Big Idea ?

DIVIDE (SCATTER): Split Data in Blocks; Replicate and Store; Petabytes of Resilience

CONQUER (EXPLORE): 1000s of Parallel Threads; Explore Every Path; Machine Learning

INSIGHT (GATHER): Real Time Action; Periodic Dashboards; Iterative Evolution
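The scatter, explore and gather steps can be sketched on a single machine as follows. This is a minimal illustration, assuming a thread pool stands in for a cluster; the function names, block size and sample readings are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(data, block_size):
    """DIVIDE: split the data into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def explore(block):
    """CONQUER: each worker summarises its own block independently."""
    return sum(block), len(block)


def gather(partials):
    """INSIGHT: combine the partial results into one answer."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


readings = list(range(1, 101))
blocks = scatter(readings, block_size=10)

# A small thread pool stands in for the 1000s of parallel threads on the slide.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(explore, blocks))

mean_reading = gather(partials)
```

Because each block is summarised without reference to the others, the same three functions would work unchanged whether the blocks sit in one process or on many machines.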

Page 9: Big Data Science at the Digital Catapult

Divide = HDFS

WRITE: Client 1. Create Metadata (Name Node); 2. Put Blocks (Data Nodes)

READ: Client 1. Get Metadata (Name Node); 2. Fetch Blocks (Data Nodes)

Name Node and Data Nodes: Control / Monitoring

[Diagram: numbered blocks replicated across the Data Nodes]
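The write and read paths above can be mimicked with a toy in-memory model. This is a sketch only and not the HDFS API: the `ToyCluster` class, its round-robin placement, the 4-byte block size and the node names are all assumptions made for illustration.

```python
from itertools import cycle


class ToyCluster:
    """Toy stand-in for a name node plus data nodes (hypothetical, not HDFS)."""

    def __init__(self, node_names, block_size=4, replication=2):
        self.block_size = block_size
        self.replication = replication  # must not exceed the number of nodes
        self.metadata = {}  # name node: filename -> [(block_id, [holders])]
        self.data_nodes = {name: {} for name in node_names}  # node -> {block_id: bytes}
        self._next_node = cycle(node_names)

    def write(self, filename, payload):
        # 1. Create metadata on the name node.
        self.metadata[filename] = []
        for offset in range(0, len(payload), self.block_size):
            block_id = f"{filename}#{offset // self.block_size}"
            block = payload[offset:offset + self.block_size]
            # 2. Put each block on `replication` consecutive data nodes.
            holders = [next(self._next_node) for _ in range(self.replication)]
            for node in holders:
                self.data_nodes[node][block_id] = block
            self.metadata[filename].append((block_id, holders))

    def read(self, filename, failed=()):
        # 1. Get metadata, then 2. fetch each block from a replica that is still up.
        parts = []
        for block_id, holders in self.metadata[filename]:
            node = next(n for n in holders if n not in failed)
            parts.append(self.data_nodes[node][block_id])
        return b"".join(parts)


cluster = ToyCluster(["dn1", "dn2", "dn3"])
cluster.write("log.txt", b"the price of light")
restored = cluster.read("log.txt", failed={"dn2"})
```

Reading still succeeds with `dn2` marked as failed because every block has a second copy on another node, which is the resilience that replication is meant to buy.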

Page 10: Big Data Science at the Digital Catapult

Conquer = MapReduce
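As a reminder of what the MapReduce pattern looks like, here is a single-process word count split into explicit map, shuffle and reduce phases. It is a sketch of the pattern only, not Hadoop code; the function names and sample lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


lines = ["big data science", "big data is not data volume"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is what lets a real framework run them in parallel across many machines.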

Page 11: Big Data Science at the Digital Catapult

Insight = Functional Paradigm
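One way to read this slide is that insight is produced by chaining small pure functions over immutable data. The sketch below illustrates that reading only; the `compose` helper, the sensor events and the validity threshold are hypothetical.

```python
from functools import reduce


def compose(*functions):
    """Chain single-argument functions, applied left to right."""
    return lambda value: reduce(lambda acc, fn: fn(acc), functions, value)


def drop_invalid(rows):
    """Keep only readings above a sentinel threshold (assumed to mark bad data)."""
    return tuple(row for row in rows if row[1] > -100)


def to_readings(rows):
    """Project each (sensor, value) row onto its value."""
    return tuple(value for _, value in rows)


def average(values):
    """Reduce the readings to a single number."""
    return sum(values) / len(values)


# Immutable input: a tuple of (sensor, reading) pairs.
events = (("sensor-a", 21.5), ("sensor-b", -999.0), ("sensor-a", 22.5), ("sensor-b", 19.0))

insight = compose(drop_invalid, to_readings, average)
result = insight(events)
```

None of the steps mutates `events`, so the same pipeline can be re-run, reordered or distributed without side effects.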

Page 12: Big Data Science at the Digital Catapult

Steps to the EPIPHANY

WHERE

WHAT WHY

DEMO

Page 13: Big Data Science at the Digital Catapult

Why is Big Data needed ?

VOLUME: Exponential growth; 2x in 2 yrs. PB (1000 TB) is now common.

VELOCITY: Event streams; never at rest. 640k GB per internet minute.

VARIETY: 100s of data sources. 85% not in a table.
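To put the volume and velocity figures in perspective, the arithmetic can be worked through directly. This is a back-of-the-envelope sketch that takes the slide's numbers at face value and assumes decimal units (1 PB = 1000 TB = 1,000,000 GB); the ten-year horizon is an arbitrary example.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Growth multiple after `years` if volume doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


GB_PER_MINUTE = 640_000  # "640k GB per internet minute"
GB_PER_PB = 1_000_000    # decimal units: 1 PB = 1000 TB = 1,000,000 GB

# Velocity: convert the per-minute figure into petabytes per day.
pb_per_day = GB_PER_MINUTE * 60 * 24 / GB_PER_PB

# Volume: doubling every 2 years compounds to 2**5 over 10 years.
ten_year_growth = growth_factor(10)
```

Under these assumptions the stream amounts to roughly 920 PB per day, and a data set that doubles every two years is 32 times larger after a decade.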

Page 14: Big Data Science at the Digital Catapult

Where in the Value Chain ?

Generation Transport Knowledge Output Value

BIG DATA SCIENCE

Straddles all four Challenge Areas

Page 15: Big Data Science at the Digital Catapult

Steps to the EPIPHANY

WHERE

WHAT WHY

DEMO

Page 16: Big Data Science at the Digital Catapult

Big Data Heat Map – Gartner 2012

Page 17: Big Data Science at the Digital Catapult

Big Data Potential by Sector – McKinsey for USBLS, 2011

Page 18: Big Data Science at the Digital Catapult

Big Data Investment by Industry – Gartner, 2012

Page 19: Big Data Science at the Digital Catapult

Top Big Data Challenges – Gartner, 2012

Page 20: Big Data Science at the Digital Catapult

Survey on Big Data Investments – IDG Survey, 2013

Page 21: Big Data Science at the Digital Catapult

Survey on Main Drivers to Invest – IDG Survey, 2014

Page 22: Big Data Science at the Digital Catapult

Steps to the EPIPHANY

WHERE

WHAT WHY

DEMO

Page 23: Big Data Science at the Digital Catapult

DEMO

Page 24: Big Data Science at the Digital Catapult

RECAP OF BENEFITS

COST SPEED

AGILITY CAPABILITY

Page 25: Big Data Science at the Digital Catapult

LAST WORDS OF WISDOM

NOT ALL ROADS LEAD TO ROME

TIME VALUE OF DATA

KNOWLEDGE IS POWER

I AM AN INDIVIDUAL

Page 26: Big Data Science at the Digital Catapult

“The price of light is far less than the cost of darkness”