Demystifying big data

Post on 12-Apr-2017

Demystifying Big Data

Brown Bag

Everything starts small

Traditional Approach

Simple Process

Result

What’s next? The unanswered question of a lifetime.

Unquenchable thirst for improvement

❏ How to sell more?

❏ How to optimize inventory?

❏ How to engage customers more?

❏ What do my customers like?

❏ How to reduce operating costs?

"Torture the data, and it will confess to anything."
Ronald Coase

How to get data? Humans...

Ever-Growing Data

❏ Historical data plays an important role.

❏ Data explodes while processing.

❏ More data beats better algorithms.

So what is Big Data?

Data that tends to grow beyond what one machine can process.

Getting Right Tool

Data Parallel Processing

❏ Distribute the data (with replication)

❏ Move Computation close to Data

❏ Process each section of Data separately

❏ Aggregate the results.
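The four steps above can be sketched on a single machine. This is only an illustration under stated assumptions: worker threads stand in for separate machines, replication is left out, and the function names (partition, process_partition, aggregate) are made up for this example rather than taken from any framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Distribute the data: split records into n_parts chunks."""
    return [records[i::n_parts] for i in range(n_parts)]


def process_partition(chunk):
    """Process one section of the data on its own (count items in this chunk only)."""
    return Counter(chunk)


def aggregate(partials):
    """Aggregate the partial results into one final answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def data_parallel_count(records, n_parts=4):
    chunks = partition(records, n_parts)
    # Assumption: each thread plays the role of a machine holding one chunk.
    # The function is handed to the worker that owns the chunk, which mimics
    # "move computation close to data".
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(process_partition, chunks))
    return aggregate(partials)


sales = ["shoes", "hat", "shoes", "bag", "hat", "shoes"]
totals = data_parallel_count(sales, n_parts=3)
print(totals)
```

No chunk needs to see another chunk's data, which is why the work can be spread across as many workers as there are partitions.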

Advantages of Data Parallel Model

❏ No hardware restrictions, e.g. memory, CPU.

❏ No Scalability Issue

❏ Cost effectiveness.

❏ No Single point of failure.

That’s nice, so the problem is solved. But the presentation says Hadoop, Spark?

Challenges of Data Parallelism

❏ Data partitioning, distribution and accumulation

❏ Fault Tolerance.

❏ Distributed Coordination and management.

❏ Abstracting away the distributed complexity.
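To make the fault-tolerance challenge concrete, here is a small sketch of failing over to a replica when a node is down. Everything in it is hypothetical: the node names, the placement table, and the run_on_node helper are invented for illustration, and real systems detect failures through heartbeats and timeouts rather than a hard-coded set.

```python
class NodeDown(Exception):
    """Raised when a (simulated) node cannot run the task."""


def run_on_node(node, chunk, failed_nodes):
    # Assumption: a node listed in failed_nodes simply refuses work.
    if node in failed_nodes:
        raise NodeDown(node)
    return sum(chunk)


def run_with_failover(chunk, replicas, failed_nodes):
    """Try each node that holds a copy of the chunk until one succeeds."""
    for node in replicas:
        try:
            return run_on_node(node, chunk, failed_nodes), node
        except NodeDown:
            continue  # this copy is unreachable, try the next replica
    raise RuntimeError("all replicas failed")


# Hypothetical placement: partition name -> (data, nodes holding a replica)
placement = {
    "part-0": ([1, 2, 3], ["node-a", "node-b"]),
    "part-1": ([4, 5], ["node-b", "node-c"]),
}
failed = {"node-a"}

results = {
    name: run_with_failover(chunk, replicas, failed)
    for name, (chunk, replicas) in placement.items()
}
total = sum(value for value, _node in results.values())
print(results, total)
```

Because each partition lives on two nodes, losing node-a does not lose part-0; the task simply runs on node-b instead.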

Big Data Ecosystem

❏ Distributed Data Storage System:
    ❏ Data distribution.
    ❏ Data replication.
    ❏ High throughput with no single point of failure.

❏ Distributed Data Processing System:
    ❏ Distributing code close to data.
    ❏ Abstracting distributed complexity from the programmer.
    ❏ Fault tolerance and handling computation failure.
    ❏ Aggregating results.

❏ Distributed Coordination and Resource Management:
    ❏ Resource allocation.
    ❏ Distributed configuration management.

Distributed Data Storage System

Distributed Data Processing System

Distributed Coordination and Resource management.

Lambda Architecture

How to sell more? Recommendation.

Speed Layer

1. Web Log

2. Product Views

3. Similar Product

4. Update user product recommendation
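The speed-layer flow above (web log, product views, similar products, updated recommendations) can be sketched as an in-memory, event-at-a-time update. This is an illustration only: the log format "<user> VIEW <product>", the class name, and the choice of co-view counts as the similarity measure are all assumptions, not something the slides specify.

```python
from collections import Counter, defaultdict


class SpeedLayerRecommender:
    """Keeps recommendations fresh by updating state on every log line."""

    def __init__(self):
        self.views_by_user = defaultdict(set)   # step 2: product views
        self.co_views = defaultdict(Counter)    # step 3: similar products

    def on_log_line(self, line):
        # Step 1: parse a web log line (hypothetical format).
        parts = line.split()
        if len(parts) != 3 or parts[1] != "VIEW":
            return
        user, _action, product = parts
        seen = self.views_by_user[user]
        if product in seen:
            return  # repeat views do not change the co-view counts
        # Products viewed by the same user are treated as similar.
        for other in seen:
            self.co_views[product][other] += 1
            self.co_views[other][product] += 1
        seen.add(product)

    def recommend(self, user, k=3):
        # Step 4: score unseen products by how often they were co-viewed
        # with products this user has already looked at.
        seen = self.views_by_user[user]
        scores = Counter()
        for product in seen:
            for other, count in self.co_views[product].items():
                if other not in seen:
                    scores[other] += count
        return [product for product, _ in scores.most_common(k)]


rec = SpeedLayerRecommender()
for log_line in [
    "u1 VIEW shoes", "u1 VIEW socks",
    "u2 VIEW shoes", "u2 VIEW laces",
    "u4 VIEW shoes", "u4 VIEW socks",
    "u3 VIEW shoes",
]:
    rec.on_log_line(log_line)

print(rec.recommend("u3"))
```

Each log line touches only a few counters, so the recommendation for a user can change within moments of a new view arriving.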

How to optimize inventory? Prediction.

Batch Layer

1. User Data

2. Location Cluster per item

3. Location Cluster per item Data

3. Current Warehouse inventory

4. Inventory transfer.
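A batch job for this flow might look like the sketch below. It is a simplification under loud assumptions: instead of real location clustering it just groups past orders by a location label, it treats past demand as the prediction, and the data shapes and function names are invented for illustration.

```python
from collections import Counter


def demand_per_location(orders):
    """orders: iterable of (item, location) pairs taken from user data."""
    return Counter(orders)


def plan_transfers(demand, inventory):
    """Suggest (item, from_location, to_location, quantity) moves.

    demand and inventory both map (item, location) -> units.
    """
    transfers = []
    keys = set(demand) | set(inventory)
    for item in sorted({i for i, _ in keys}):
        locations = sorted({loc for i, loc in keys if i == item})
        # Positive balance = spare stock, negative = expected shortfall.
        balance = {
            loc: inventory.get((item, loc), 0) - demand.get((item, loc), 0)
            for loc in locations
        }
        surplus = [[loc, b] for loc, b in balance.items() if b > 0]
        deficit = [[loc, -b] for loc, b in balance.items() if b < 0]
        for need in deficit:
            for spare in surplus:
                qty = min(spare[1], need[1])
                if qty > 0:
                    transfers.append((item, spare[0], need[0], qty))
                    spare[1] -= qty
                    need[1] -= qty
    return transfers


orders = [("shoes", "north")] * 5 + [("shoes", "south")]
stock = {("shoes", "north"): 2, ("shoes", "south"): 6}

plan = plan_transfers(demand_per_location(orders), stock)
print(plan)
```

Because the job reads the full order history, it suits the batch layer: it can run on a schedule over all the data rather than reacting to each event.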