Demystify Big Data, Data Science & Signal Extraction Deep Dive


Transcript of Demystify Big Data, Data Science & Signal Extraction Deep Dive

Page 1: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Demystify

Page 2: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What we will cover in the 60 mins

1. Technology Basics
2. Big Data Overview & Snapshot
3. Big Data Architecture: Deep Dive
4. Hadoop Overview
5. Clear Understanding of Data Science
6. Big Data Career Opportunities
7. Q & A

Page 3: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Apart from that, we will also cover:

• An overview of the shift to Data Science Platforms

• The 3 critical components of a Data Science platform

• Industries that are most likely to get disrupted and shift to Data Science

• Characteristics of firms that get left behind by the Data Science wave

• Factors that push an industry towards Data Science

• A brief overview of aspects of platform architecture beyond technology

Page 4: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Who am I?

• Mahesh Kumar CV is a Big Data entrepreneur

• Mahesh has about 14 years of experience in architecting and developing distributed and real-time data-driven systems.

• Specialties: translating big data into action, Big Data trainings, product engineering services, and building Big Data CoEs & Big Data incubators

• Has written more than 60 blog posts on Big Data & SAP Analytics

• Has previously worked with IBM, Mindtree, CSC and Rolta

• Has conducted a couple of boot camps & workshops at different companies

Page 5: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Data vs. Information

• Data is a collection of numbers and characters, and is a relative term

• Data is raw facts, figures, etc.

• Information is processed data
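A tiny illustration of the distinction, assuming hypothetical store transactions: the raw records are data; the per-store totals computed from them are information.

```python
from collections import defaultdict

# Raw data: individual transaction records (hypothetical store IDs and amounts).
transactions = [
    ("store_1", 120.0),
    ("store_2", 75.5),
    ("store_1", 30.0),
    ("store_3", 210.0),
    ("store_2", 24.5),
]

def revenue_by_store(records):
    """Process raw records into information: total revenue per store."""
    totals = defaultdict(float)
    for store, amount in records:
        totals[store] += amount
    return dict(totals)

print(revenue_by_store(transactions))
# {'store_1': 150.0, 'store_2': 100.0, 'store_3': 210.0}
```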

Page 6: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Structured Data vs. Unstructured Data

Page 7: Demystify Big Data, Data Science & Signal Extraction Deep Dive
Page 8: Demystify Big Data, Data Science & Signal Extraction Deep Dive

So where is all this data being generated?

Social Networking and Media:

700 million Facebook users, 250 million Twitter users

175+ million public blogs

Each Facebook update, tweet, blog post and comment creates multiple new data points: structured, semi-structured and unstructured

Mobile Devices:

5 billion mobile phones in use worldwide

Each call, text and instant message is logged as data

Mobile devices, particularly smartphones and tablets, also make it easier to use social media

Internet Transactions:

Billions of online purchases, stock trades and other transactions happen every day, including countless automated transactions

Each creates a number of data points collected by retailers, banks, credit cards, credit agencies and others

Networked Devices and Sensors:

Electronic devices of all sorts, including servers and other IT hardware, smart energy meters and temperature sensors, create semi-structured log data that records every action
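Semi-structured log data has some structure (here, key=value pairs) but no fixed schema. A minimal sketch that turns one such line into a structured record; the log format, device name and field names are hypothetical.

```python
import re

# Hypothetical smart-meter log format: "<timestamp> <device-id> key=value key=value ..."
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<device>\S+) (?P<fields>.*)$")

def parse_log_line(line):
    """Turn one semi-structured log line into a structured record (dict), or None if it doesn't match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = {"timestamp": match["ts"], "device": match["device"]}
    # The free-form tail holds an arbitrary set of key=value pairs.
    for pair in match["fields"].split():
        key, _, value = pair.partition("=")
        record[key] = value
    return record

line = "2015-03-01T10:15:00Z meter-42 kwh=3.7 voltage=229 status=OK"
print(parse_log_line(line))
```

Every value stays a string here; a real pipeline would also convert types (e.g. `kwh` to a float) once the schema is known.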

Page 9: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Build vs. Buy

Human driven: email, web logs, documents, social

Machine driven: satellite images, bioinformatics, M2M log files, sensors, video, audio

Business driven: OLTP

All data types

Scale: 1X, 10X, 100X, from Big Data today to Big Data tomorrow

Page 10: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Defining Big Data

“Any amount of data that's too BIG to be handled by one computer.”

John Rauser

Page 11: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Why Big Data

12 TB of tweets in a day

80% of the world’s data is unstructured

30 billion pieces of content shared on Facebook every month

Expected data volume in 2020: 35 ZB

5 million trade events per second

2.267 billion Internet users

4.7 billion searches on Google per day

5 billion people tweet, text, call and browse on mobile phones daily

Walmart handles 1 million transactions per hour

255 million websites

Page 12: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Enterprise Data Landscapes

Operational: Oracle, DB2, SQL, other sources

ETL (extract, transform, load)

Warehouses: BW, Teradata, Netezza

Marts and dimensional (OLAP) layer

Semantic: IQ, Universe

Information: queries, ad-hoc analysis, dashboards, applications, reports

Unstructured data

Page 13: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data Reference Architecture

Structured data sources: legacy applications, ERP and CRM applications

Unstructured/semi-structured data sources: web logs, application/network logs, social, chat transcripts, emails

External feeds

Instrumentation data: sensors, RFID, telematics, time and location data

Data integration (batch / near real-time): data extraction; data cleaning and transformation; real-time streaming/integration; change data capture (CDC), including CDC for structured data

Data repositories: MDM, ODS, data warehouse, DW appliances, data marts, MOLAP cube, in-memory databases, columnar databases, unstructured/semi-structured data

End user analytics: reports/dashboards, scorecards and metrics, events and alerts, data mining and exploration, predictive analytics, text analytics, visual exploration, mobile BI

Page 14: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data Reference Architecture: SAP

Layers: the same data sources, data integration, data repositories and end user analytics as the reference architecture above, plus a Hadoop platform.

SAP products shown: SAP BO Data Services, SAP HANA, SAP BW, Sybase IQ, RDS / Rapid Marts, SAP MDM, BO CMS, BO WebI / Crystal Reports, BO Dashboards, BO Mobile, SAP Lumira, SAP Predictive Analysis, and 3rd-party components

Page 15: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data Reference Architecture: IBM

Layers: the same data sources, data integration, data repositories and end user analytics as the reference architecture above, plus CDC for unstructured data, a Hadoop platform, an analytics sandbox and content analytics.

IBM products shown: InfoSphere Information Server, InfoSphere CDC, InfoSphere Streams, InfoSphere MDM, InfoSphere Data Explorer, PureData (Netezza, InfoSphere Warehouse, ISAS), BigInsights (with Streams, NoSQL and HBase), Cognos Business Intelligence Enterprise, Cognos TM1, Cognos Mobile, SPSS Premium, SPSS Content Analytics

Page 16: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data Reference Architecture: Oracle

Layers: the same data sources, data integration, data repositories and end user analytics as the reference architecture above, plus CDC for unstructured data, a Hadoop platform, an analytics sandbox and real-time decision management.

Oracle products shown: Oracle Data Integrator, Oracle GoldenGate, Oracle Database / Exadata (including Exadata EHCC), Oracle Big Data Appliance, Hadoop / HBase, Exalytics, Essbase / Hyperion, Oracle MDM, Silver Creek, OBI Foundation Suite, OBI Scorecard, OBI Mobile, BI Publisher, Endeca, Oracle R Enterprise, Real-Time Decisions

Page 17: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data Reference Architecture: Informatica + EMC + SAS

Layers: the same data sources, data integration, data repositories and end user analytics as the reference architecture above, plus CDC for unstructured data, a Hadoop platform and an analytics sandbox.

Products shown: Informatica PowerCenter & Data Quality, Informatica PowerCenter Real-time Edition, Informatica hParser / Hadoop PWX, Informatica MDM, EMC Greenplum Database, EMC Greenplum HD, HBase, EMC Greenplum UAP (analytics sandbox), SAS BI, SAS Visual Analytics, SAS Visual BI, SAS OLAP Server, SAS Enterprise Miner, SAS Text Miner, SAS Strategy Management, JMP Pro

Page 18: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data Reference Architecture: Open Source Technologies

Layers: the same data sources, data integration, data repositories and end user analytics as the reference architecture above, plus CDC for unstructured data, a Hadoop platform and an analytics sandbox.

Products shown: Apache MapReduce, Pig, Talend Data Integration & Data Quality, Apache Flume, Apache Hadoop, Apache HDFS + R (analytics sandbox), HBase / NoSQL, MySQL, Apache Hive, Apache Derby, Talend MDM, R, Apache Mahout, Pentaho Business Analytics / BI, Pentaho Mobile BI; SAS OLAP Server, SAS Text Miner and other commercial products also appear in some slots

Page 19: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What is Hadoop

• It’s a framework for large-scale data processing

• Inspired by Google’s architecture: MapReduce and the Google File System (GFS)

• A top-level Apache project: Hadoop is open source

• Written in Java, plus a few shell scripts

• An open-source software framework that supports data-intensive distributed applications

• Abstracts and facilitates the storage and processing of large and rapidly growing data sets

• Handles structured and unstructured data

• Uses simple programming models

Page 20: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Two key components of core Hadoop: HDFS and MapReduce

Page 21: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Hadoop is used everywhere

• Yahoo!: More than 100,000 CPUs in ~20,000 computers running Hadoop; biggest cluster: 2,000 nodes (2 × 4-CPU boxes with 4 TB disk each); used to support research for ad systems and web search

• AOL: Used for a variety of things ranging from statistics generation to running advanced algorithms for behavioral analysis and targeting; cluster size is 50 machines (Intel Xeon, dual processors, dual core), each with 16 GB RAM and 800 GB hard disk, giving a total of 37 TB HDFS capacity

• Facebook: To store copies of internal log and dimension data sources and use them as a source for reporting/analytics and machine learning; 320-machine cluster with 2,560 cores and about 1.3 PB raw storage

• FOX Interactive Media: 3 × 20-machine clusters (8 cores/machine, 2 TB/machine storage); 10-machine cluster (8 cores/machine, 1 TB/machine storage); used for log analysis, data mining and machine learning

• NetSeer: Up to 1,000 instances on Amazon EC2; data storage in Amazon S3; used for crawling, processing, serving and log analysis

• Powerset / Microsoft: Natural language search; up to 400 instances on Amazon EC2; data storage in Amazon S3

Page 22: Demystify Big Data, Data Science & Signal Extraction Deep Dive

HDFS : High level architecture

• HDFS follows a master-slave architecture

• Two major daemons in HDFS: NameNode and DataNode

• Master: NameNode, responsible for the namespace and metadata
   • Namespace: file hierarchy
   • Metadata: ownership, permissions, block locations, etc.

• Slave: DataNode, responsible for storing the actual data blocks
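A toy model of the split in responsibilities: the NameNode records which blocks make up a file and which DataNodes hold each replica, while the DataNodes store the bytes themselves. The 128 MB block size and replication factor of 3 are the HDFS defaults in Hadoop 2.x (Hadoop 1.x defaulted to 64 MB blocks); the round-robin placement and the node names are simplifying assumptions, since real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # bytes; HDFS default in Hadoop 2.x (Hadoop 1.x used 64 MB)
REPLICATION = 3                 # HDFS default replication factor

def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Toy NameNode: split a file into fixed-size blocks and choose DataNodes for each replica.

    Returns the kind of metadata a NameNode keeps (block id, length, locations);
    the DataNodes would hold the actual bytes.
    """
    replication = min(replication, len(datanodes))
    n_blocks = -(-file_size // block_size)  # ceiling division
    metadata = []
    for block_id in range(n_blocks):
        # Simple round-robin placement; real HDFS is rack-aware instead.
        start = block_id % len(datanodes)
        locations = [datanodes[(start + r) % len(datanodes)] for r in range(replication)]
        # The last block only holds whatever is left of the file.
        length = min(block_size, file_size - block_id * block_size)
        metadata.append({"block": block_id, "length": length, "locations": locations})
    return metadata

MB = 1024 * 1024
nodes = ["dn1", "dn2", "dn3", "dn4"]
for entry in plan_blocks(300 * MB, nodes):
    print(entry["block"], entry["length"] // MB, entry["locations"])
# 0 128 ['dn1', 'dn2', 'dn3']
# 1 128 ['dn2', 'dn3', 'dn4']
# 2 44 ['dn3', 'dn4', 'dn1']
```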

Page 23: Demystify Big Data, Data Science & Signal Extraction Deep Dive

MapReduce : High Level Architecture

• MapReduce has a master-slave architecture too

• Two daemon processes:

• Master: JobTracker, responsible for dividing, scheduling and monitoring work

• Slave: TaskTracker, responsible for the actual processing
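To make the division of work concrete, here is a minimal single-process sketch of the classic word-count job split into map, shuffle and reduce phases. It only simulates what the JobTracker would schedule across TaskTrackers; it does not use Hadoop's actual Java API, and the input splits are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(mapped_pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

def word_count(splits):
    # On a cluster, each split would go to a map task on a slave node;
    # here the "tasks" simply run one after another in this process.
    mapped = (pair for split in splits for pair in map_phase(split))
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))

splits = ["big data is big", "data science uses big data"]
print(word_count(splits))
# {'big': 3, 'data': 3, 'is': 1, 'science': 1, 'uses': 1}
```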

Page 24: Demystify Big Data, Data Science & Signal Extraction Deep Dive

High Level View

Page 25: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Apache Hadoop Ecosystem

Page 26: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Disruptions

Page 27: Demystify Big Data, Data Science & Signal Extraction Deep Dive

1. Japanese dating app

Page 28: Demystify Big Data, Data Science & Signal Extraction Deep Dive

2. Heart implants

Page 29: Demystify Big Data, Data Science & Signal Extraction Deep Dive

3. MOOC

Page 30: Demystify Big Data, Data Science & Signal Extraction Deep Dive

4. Sensor-equipped cows in the Netherlands

Page 31: Demystify Big Data, Data Science & Signal Extraction Deep Dive

5. Google's autonomous car

Page 32: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What's common to the following game-changing solutions?

1. Japanese dating app
2. Heart implants
3. MOOC
4. Sensor-equipped cows in the Netherlands
5. Google's autonomous car

Page 33: Demystify Big Data, Data Science & Signal Extraction Deep Dive

At the core, there is a deeply embedded DATA PRODUCT!

Page 34: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Created by DATA SCIENCE!

Conquer the world! Become a Data Scientist

Page 35: Demystify Big Data, Data Science & Signal Extraction Deep Dive

• How is our health cared for?

• How do we learn?

• How do we fall in love?

• How do we farm?

• How do we drive?

The world around us is changing. Data products now surround us and have become an intimate part of the fabric of our lives.

Page 36: Demystify Big Data, Data Science & Signal Extraction Deep Dive

• Amazon defeated Borders (books)

• Netflix defeated Blockbuster (video)

• iTunes defeated Tower Records (music)

• Google defeated Yahoo (search) with the PageRank algorithm

How did the following players disrupt the marketplace?
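Since the slide credits Google's win in search to PageRank, here is a minimal textbook sketch of PageRank computed by power iteration. The four-page link graph and the iteration count are illustrative assumptions, and 0.85 is the damping factor suggested in the original PageRank paper. This is not Google's production ranking system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - d) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c
```

Page `c` ends up on top because three of the four pages link to it. Page `d`, which nothing links to, keeps only the random-jump share.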

Page 37: Demystify Big Data, Data Science & Signal Extraction Deep Dive

If Data Science is not integral to your business, you are no longer in the game

Page 38: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Demystifying Data Science (in simple, plain, everyday English)

Page 39: Demystify Big Data, Data Science & Signal Extraction Deep Dive

In a Nutshell

• Data Science is the extraction of knowledge from data

• Data Science is the art of turning data into actions

• It is the ability to take data and to understand it, process it, extract value from it, visualize it and communicate it

• Data Science seeks to:
   • Extract meaning from data
   • Create “data products”
   • Use all available data to tell a valuable story to non-practitioners

The future belongs to the companies and people that turn data into products

Page 40: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Data Science is everywhere

Page 41: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Data Science vs. BI

Known Unknowns (BI)

Unknown Unknowns (Data Science): lots of $-impacting patterns, unnoticed and waiting to be discovered!

Page 42: Demystify Big Data, Data Science & Signal Extraction Deep Dive

“As is” state in most organizations

Data (Sales, Finance)

Reports (BO, Cognos, MSAS)

Page 43: Demystify Big Data, Data Science & Signal Extraction Deep Dive

“As is” state with leading game changers

Data repository

Analytics cell + modeling processes (segment, score, text mine)

Insights

Move from reports to insightful actions that have impact

Page 44: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What are the 4 core differences between Data Science & Dashboards?

Dashboards: Data repository → Dashboards

Data Science: Data repository (purchase habits) →
1. Signal (similar people discovery)
2. ML process (collaborative filtering)
3. Actions (recommend a product)
4. Outcomes (improve cross-sell)

ML + Signals + Actions = Game Changing Outcomes
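The slide's example chain (purchase habits, similar-people discovery, collaborative filtering, recommendation, cross-sell) can be sketched as user-based collaborative filtering with cosine similarity. This is a minimal illustration under assumed data: the customer names and purchase counts are hypothetical, and real systems add normalisation, neighbourhood limits and far more data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse purchase vectors (dicts item -> quantity)."""
    common = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(target, purchases, top_n=1):
    """User-based collaborative filtering over purchase counts."""
    # Signal: how similar is every other customer to the target?
    sims = {other: cosine(purchases[target], items)
            for other, items in purchases.items() if other != target}
    # ML process: score products the target hasn't bought by similarity-weighted purchases.
    scores = {}
    for other, sim in sims.items():
        for item, qty in purchases[other].items():
            if item not in purchases[target]:
                scores[item] = scores.get(item, 0.0) + sim * qty
    # Action: recommend the highest-scoring products (ties broken alphabetically).
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

purchases = {
    "asha":  {"laptop": 1, "mouse": 1},
    "ravi":  {"laptop": 1, "mouse": 1, "laptop_bag": 1},
    "meena": {"phone": 1, "phone_case": 1},
}
print(recommend("asha", purchases))  # ['laptop_bag']
```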

Page 45: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What exactly is a model?

• A mathematical definition of a real-world phenomenon

• A representation of the real world

• For example, a cross-sell model

Page 46: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What 3 things do predictive models and caricatures have in common?

• It's an approximation, not perfection

• It's better than not having anything

• It gets the job done

Real world vs. analytical model

Page 47: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What's the Goal of Data Science?

Use data to discover signals (patterns) that cause changes that impact $.

Page 48: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Data Science Reference Architecture: Key Components

Data ingestion pipeline

Hadoop, Hive, HANA, Infobright

Clustering, text mining

Mobile, digital

Page 49: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Machine Learning Reference Architecture

1. STORE (Hadoop, Hive, HANA, Cloudera, Splunk, Hortonworks)

2. SENSE (signal extraction: text mining, scoring models)

3. RESPOND (front-line actions through the website, call centre)

Page 50: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Snapshot of Machine Learning Techniques

1. Segmentation
   • Customer behavior segmentation
   • Defect segmentation
   • Employee segmentation model
   • Supplier segmentation model
   • “Chunking” groups
   • Discovered by algorithm

2. Text mining
   • Convert messy unstructured text into actionable signals
   • Keyword frequencies
   • Sentiment ratios
   • Blogs
   • Call center transcripts
   • Emails
   • Multi-channel sentiment analysis

3. Forecasting
   • Predict CLTV
   • Predict sales at a neighborhood outlet
   • Predict salary based on experience, qualification, rating and market demand
   • Identify drivers of behavior
   • Weights processing

4. Visual Analytics
   • Beyond line, bar and pie charts
   • Geospatial modeling to see geo correlation
   • Spread analysis
   • Outlier detection

5. Scoring models
   • Churn propensity
   • Cross-sell
   • Attrition modeling in HR
   • Risk scoring models in banking
   • Logistic regression
   • Neural networks
   • Decision trees
   • Support vector machines

6. Optimisation
   • Constraint modeling
   • Maximize an outcome
   • Maximize sales without cannibalizing sister brands
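The snapshot lists outlier detection under visual analytics. Below is a minimal sketch using z-scores, which is one common rule of thumb and not necessarily the technique the author had in mind. The sales figures and the threshold of 2 are illustrative assumptions.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds `threshold` population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

daily_sales = [102, 98, 101, 99, 100, 103, 97, 100, 250]
print(zscore_outliers(daily_sales))  # [250]
```

The threshold is 2 rather than the often-quoted 3 because, with a population standard deviation, no z-score can exceed √(n − 1). For these 9 points that bound is about 2.83, so a threshold of 3 could never fire.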

Page 51: Demystify Big Data, Data Science & Signal Extraction Deep Dive

It's all about DETECTING PATTERNS!

Page 52: Demystify Big Data, Data Science & Signal Extraction Deep Dive

1. Segmentation
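Segmentation groups similar entities together. One widely used algorithm for this is k-means. Below is a minimal pure-Python version. The customer features (annual spend in $k, store visits per month) and the choice of two segments are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the segment of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 9.0), (8.5, 9.5), (9.0, 8.7)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

Segment numbers are arbitrary. The first three customers always share one label and the last three share the other, but which group is "0" depends on the random start.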

Page 53: Demystify Big Data, Data Science & Signal Extraction Deep Dive

2. Unstructured Text Mining

Page 54: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Real-world unstructured text mining in health care

Sources: doctors' transcripts, lab diagnostics, nurses' observations

Step-1 (SPLIT): split sentences into words/tokens

Step-2 (FILTER): filter out “noise” words, e.g. I, the, is, was

Step-3 (STEMMING): ‘Pulmonary’ = ‘pulmonar’; ‘Insomnia’ = ‘Sleep’ = ‘Sleeplessness’

Step-4 (THEME EXTRACTION): keyword extraction & theme generation

Step-5 (THEME / KEYWORD ANALYSIS): cardiac, oncology, pulmonary, diabetic and schizophrenia watch lists
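The five steps can be sketched in Python as below. Everything here is an assumption for illustration: the stop-word list, the lookup table standing in for stemming, the theme keywords and the sample note. A real pipeline would typically use a stemming algorithm such as Porter's plus a medical synonym dictionary.

```python
import re
from collections import Counter

# Hypothetical noise words (Step-2).
STOP_WORDS = {"i", "the", "is", "was", "a", "and", "of", "with", "has", "patient", "reports"}
# Hypothetical lookup standing in for Step-3: crude stems plus synonym mapping, as on the slide.
NORMALISE = {"pulmonary": "pulmonar", "insomnia": "sleep", "sleeplessness": "sleep"}
# Hypothetical theme dictionary mapping normalised keywords to watch lists (Steps 4-5).
THEMES = {
    "cardiac": {"chest", "arrhythmia", "cardiac"},
    "pulmonary": {"pulmonar", "breathless", "cough"},
    "diabetic": {"glucose", "insulin"},
}

def split(text):
    """Step-1: lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def filter_noise(tokens):
    """Step-2: drop noise words."""
    return [t for t in tokens if t not in STOP_WORDS]

def stem(tokens):
    """Step-3: map each token to its normalised form."""
    return [NORMALISE.get(t, t) for t in tokens]

def extract_themes(tokens):
    """Steps 4-5: count keyword hits per theme; keep only themes that occur."""
    counts = Counter(tokens)
    themes = {}
    for theme, keywords in THEMES.items():
        hits = sum(counts[k] for k in keywords)
        if hits:
            themes[theme] = hits
    return themes

note = "Patient reports insomnia and a persistent cough; pulmonary function is reduced. Cough was worse at night."
tokens = stem(filter_noise(split(note)))
print(extract_themes(tokens))  # {'pulmonary': 3}
```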

Page 55: Demystify Big Data, Data Science & Signal Extraction Deep Dive

3. Scoring Models
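Scoring models give each customer a propensity score, for example a churn score. Logistic regression, listed on the snapshot slide, is one standard choice. Below is a minimal sketch trained with plain batch gradient descent. The features, the six training rows and the learning-rate and epoch settings are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights and bias with batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: predicted probability minus actual label.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def churn_score(w, b, x):
    """Propensity score in (0, 1) for one customer."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features: [support calls last month, months since last purchase]
X = [[0, 1], [1, 0], [0, 2], [5, 6], [4, 8], [6, 5]]
y = [0, 0, 0, 1, 1, 1]  # 1 = customer churned
w, b = train_logistic(X, y)
print(round(churn_score(w, b, [5, 7]), 2), round(churn_score(w, b, [0, 1]), 2))
```

A business would typically rank customers by this score and target the riskiest ones with retention actions.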

Page 56: Demystify Big Data, Data Science & Signal Extraction Deep Dive

4. Forecasting!
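The snapshot slide lists salary prediction from experience as a forecasting example. Below is a minimal sketch of ordinary least-squares regression with one predictor. The data is hypothetical and deliberately perfectly linear so the result is easy to check. Real data would scatter around the fitted line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: years of experience vs. salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]
intercept, slope = fit_line(experience, salary)
print(intercept + slope * 6)  # 55.0
```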

Page 57: Demystify Big Data, Data Science & Signal Extraction Deep Dive

5. Recommenders
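One very simple recommender approach counts which products appear together in the same baskets ("customers who bought X also bought Y"). Below is a minimal sketch under that assumption. The baskets are hypothetical, and production recommenders usually normalise these counts, for example by item popularity.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(baskets):
    """Count, for every ordered pair of items, how many baskets contain both."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def also_bought(item, counts, top_n=2):
    """Items most often bought together with `item` (ties broken alphabetically)."""
    scores = {b: c for (a, b), c in counts.items() if a == item}
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]

baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "milk"],
]
counts = co_occurrence(baskets)
print(also_bought("bread", counts))  # ['butter', 'milk']
```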

Page 58: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Industries disrupted by Data Science

• Infrastructure optimisation, Network security Telecom

• Customer sentiment, Multi channel analysis Banking

• Consumer engagement, Recommendation engines Digital channel

• Autonomous cards, Fords OnStar Automotive

• Wearables Health care

• Operations optimisation Oil n Gas

• Digitisation Retail

Page 59: Demystify Big Data, Data Science & Signal Extraction Deep Dive

What factors are driving companies towards data science ?

• Competitive advantage in the market place ( get ahead fast using unique insights )

• Existential threat ( others are moving ahead fast and I need to catch up )

• Revenue enhancement ( Cross sell models, recommenders )

• Cost optimisation ( Operational efficiency )

Page 60: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Technology behind Data Science

Algorithms

Machine learning

Predictive analytics

R

Page 61: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Why is Big Data HOT?

Page 62: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data jobs are Exploding!

Page 63: Demystify Big Data, Data Science & Signal Extraction Deep Dive
Page 64: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Data Science jobs are Exploding!

Page 65: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Data Science jobs are exploding in India too!

Page 66: Demystify Big Data, Data Science & Signal Extraction Deep Dive


Page 67: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Transform yourself with 21st-century skills

Page 68: Demystify Big Data, Data Science & Signal Extraction Deep Dive

The 6 Most Desired Skills in 2015

Page 69: Demystify Big Data, Data Science & Signal Extraction Deep Dive


To summarize 3 key takeaways …

Page 70: Demystify Big Data, Data Science & Signal Extraction Deep Dive

FAQ

Page 71: Demystify Big Data, Data Science & Signal Extraction Deep Dive

FAQ-1: “I am confused between Hadoop and Data Science… What's the difference between Hadoop and Data Science?”

• Hadoop = data infrastructure layer

• Data Science = sensing patterns from data to impact business outcomes

Page 72: Demystify Big Data, Data Science & Signal Extraction Deep Dive

FAQ-2: “I have worked on SAP, Oracle, etc. How do I transition to becoming a Data Scientist?”

• Execute your first Data Science pilot:
   • Step-1: Learn R
   • Step-2: Zero in on a business problem to solve
   • Step-3: Set up a connector between R and your technology to get access to its data
   • Step-4: Apply an analytical construct (VEDA ML)
   • Step-5: Discover the pattern that impacts the outcome
   • Step-6: Present the final results to the executive business team

• Explore setting up a Data Science project within your existing organisation

• Attend meetups to explore the outside world

Page 73: Demystify Big Data, Data Science & Signal Extraction Deep Dive

FAQ-3: “Should I know probability and advanced statistics?”

• Not really

• We are focused on APPLICATION, not the THEORY underpinning it

• We will teach you:
   • The business problem to solve
   • How to execute the command on a platform
   • What to look for in the output

• What happens inside the black box can be explored later

Page 74: Demystify Big Data, Data Science & Signal Extraction Deep Dive

FAQ-4: “This is a big shift for me… In your experience, how long does it take to make the transition from IT to Data Science?”

• We have seen people make the transition in anywhere from 4 weeks to about 6 months

• It depends on the time, passion and drive you bring

Page 75: Demystify Big Data, Data Science & Signal Extraction Deep Dive

FAQ-5: “How are we going to prepare you for the data science job market?”

1. Mock preparatory sessions

2. Worksheets + modelling checklists + Data Science playbooks

3. Live projects on clustering and scoring that you can put on your resume

4. Our strategic tie-ups with organisations looking for data science skills

5. Top 30 practitioner-generated Data Science questions

Page 76: Demystify Big Data, Data Science & Signal Extraction Deep Dive

FAQ-6: “I am not an IT professional but a domain person. How can I get started ?”

1. Option-1: Focus on industry use cases

2. Option-2: Take a basic introduction to data science

Page 77: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Your Turn : Happy to Answer your Questions

Page 78: Demystify Big Data, Data Science & Signal Extraction Deep Dive

Big Data Resources

• datasciencecentral.com

• bigdatauniversity.com

• coursera.org

• Big Data Architecture

• Spotting Signals in Big Data

• Signal Extraction Methodology

• Advanced Visualization in Big Data

• Exploratory Data Analysis (EDA) : Quick Deep Dive

• Best practices in designing dashboards and scorecards

• Exploring Big Data Using Bivariate Analysis

• Where to start looking in Big Data using Univariate Analysis

• Big Data Platform & Applications

• Statistics Role in Data Science

• Applied Mathematics Role in Data Science

• Data-Scientist-playbook

• 5-disruption-data-products By Data Science

Page 79: Demystify Big Data, Data Science & Signal Extraction Deep Dive

All the best! Happy Hadooping & dating with Data Science!

Conquer the world! Become a Data Scientist

Page 80: Demystify Big Data, Data Science & Signal Extraction Deep Dive