Big Data Architecture - پرشین اسکریپت

Big Data Architecture & Business Value
Tahereh Saheb
PhD in STS from RPI, NY
Assistant Professor at Tarbiat Modares University



Page 2

Big data is an enterprise asset and needs to be managed as an integrated element of your current Enterprise Architecture.

Page 3

Big data: discovering patterns, predicting the future, and providing prescriptive analytics

Page 7

Big data adds depth to our analysis of events by providing broader and deeper data.



Page 14

1- Envision
• Current state
• Envisioning the future

2- Determine Business Drivers & Problems
• Advanced analytical inquiries
• Data type
• Data characteristics

3- Draw Business Information Map or Data Flow Diagram
• Where to get data?
• Business process

4- Design Big Data Architecture

Page 15

Before designing a big data architecture

Page 16

To answer these questions, use a structured approach for evaluating the viability of a big data solution according to the dimensions shown in the following figure:

• Business value from the insight that might be gained from analyzing the data
• Governance considerations for the new sources of data and how the data will be used
• People: availability of staff with relevant skills, and commitment of sponsors
• Volume of the data being captured
• Variety of data sources, data types, and data formats
• Velocity at which the data is generated, the speed with which it needs to be acted upon, or the rate at which it is changing
• Veracity of the data, that is, the uncertainty or trustworthiness of the data
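The evaluation dimensions above can be tracked as a simple checklist that reports which dimensions a proposed solution has not yet been assessed against. The following is a minimal sketch. The `ViabilityAssessment` class and its field names are hypothetical and exist only for illustration. The slides do not prescribe any scoring scheme, so each dimension is recorded only as evaluated or not evaluated.

```python
from dataclasses import dataclass, fields


@dataclass
class ViabilityAssessment:
    """Hypothetical checklist: True means the dimension has been evaluated."""
    business_value: bool = False
    governance: bool = False
    people: bool = False
    volume: bool = False
    variety: bool = False
    velocity: bool = False
    veracity: bool = False

    def open_items(self) -> list[str]:
        # Names of dimensions not yet evaluated, in declaration order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


assessment = ViabilityAssessment(business_value=True, volume=True, variety=True)
print(assessment.open_items())  # ['governance', 'people', 'velocity', 'veracity']
```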

Page 17

Phase 1: Vision

When we paint a vision of a future architecture, we start with a basic understanding of our current state and begin to speculate on how it might evolve.

A challenge in many organizations is that, at inception, the technical vision may not be aligned with the business vision. In fact, business visionaries and potential sponsors must drive the technical vision. So we must understand how the business currently uses our information architecture and how that could change in the future.

The vision phase is mostly about gathering requirements and exploring the art of the possible.

Page 18

Phase 2: Determine Business Drivers & Problems

The next phase of the methodology takes a much deeper look at two things:

1- The business drivers and problems

2- Mapping each business problem to its big data type

Page 19

Business Problem → Big Data Type → Big Data Characteristics

Page 20

Big Data Business Drivers

Some of the key business drivers:
• Operational effectiveness
• Delivering better customer service
• Improved innovation
• Improved company growth

Page 21

Business Problem: Predicting power consumption

Page 22

Mapping the business problem to its big data type

Business problem: Utilities: predict power consumption

Big data type: Machine-generated data

Description:
• Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that need to be analyzed.
• Utilities also run big, expensive, and complicated systems to generate power. Each grid includes sophisticated sensors that monitor voltage, frequency, and other important operating characteristics.
• A big data solution can analyze power generation (supply) and power consumption (demand) data using smart meters.

Smart meters and consumption measurement

Sensors to measure voltage and other operating characteristics

Business driver: Operating efficiency
Business problem: Predicting power consumption
Big data type: Data delivered by the sensors
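The first analytical step on smart-meter interval data is usually to aggregate it. The following is a minimal sketch of that step: it rolls hourly readings up into daily totals and then produces a naive forecast as the mean of recent days. The readings are invented. The functions `daily_totals` and `naive_forecast` are hypothetical. The moving average is a deliberately simple stand-in for a real prediction model, which the slides do not specify.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_totals(readings):
    """Sum interval readings given as (timestamp, kWh) pairs into per-day totals."""
    totals = defaultdict(float)
    for ts, kwh in readings:
        totals[ts.date()] += kwh
    return dict(sorted(totals.items()))


def naive_forecast(totals, window=7):
    """Forecast the next day as the mean of the last `window` daily totals."""
    recent = list(totals.values())[-window:]
    return mean(recent)


# Hypothetical hourly readings for one meter over two days.
readings = [(datetime(2024, 1, 1, h), 1.0) for h in range(24)] + [
    (datetime(2024, 1, 2, h), 1.5) for h in range(24)
]
totals = daily_totals(readings)
print(naive_forecast(totals))  # 30.0
```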

Page 23

Customer Churn!

Page 24

Telecommunications

Business problem: Customer churn

Big data type: Web and social data; transaction data

Description:
• Telecommunications operators need to build detailed customer churn models that include social media and transaction data.
• The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behavior of customers.
• Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.

Building customer churn models based on social network and transaction data

The value of churn models depends on the quality of customer attributes (master data such as date of birth, gender, and income) and the social behavior of customers.

Predictive analytics to manage and predict churn by analyzing customers' calling patterns
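One of the churn signals named above is calling patterns. The following is a minimal sketch of turning those patterns into a churn-risk flag: it marks subscribers whose recent call volume has fallen sharply against their own baseline. This is a hypothetical rule-based stand-in, not the predictive model the slide refers to. The function `churn_risk`, the 50% threshold, and the sample counts are all assumptions made for illustration.

```python
def churn_risk(baseline_calls, recent_calls, drop_threshold=0.5):
    """Return subscriber IDs whose recent calls fell by more than drop_threshold
    (as a fraction of their baseline). Subscribers missing from recent_calls
    are treated as having made zero calls."""
    flagged = []
    for sub_id, base in baseline_calls.items():
        recent = recent_calls.get(sub_id, 0)
        if base > 0 and (base - recent) / base > drop_threshold:
            flagged.append(sub_id)
    return flagged


# Hypothetical weekly call counts per subscriber.
baseline = {"A": 40, "B": 30, "C": 10}
recent = {"A": 35, "B": 5}
print(churn_risk(baseline, recent))  # ['B', 'C']
```

In practice such a flag would be one feature among many, alongside the master data and social-behavior attributes listed above.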

Page 25

Fraud Detection

Page 26

Financial Services & Health Care

Business problem: Fraud detection

Big data type: Machine-generated data; transaction data; human-generated data

Description: Solutions are typically designed to detect and prevent myriad fraud and risk types across multiple industries, including:
• Credit and debit payment card fraud
• Deposit account fraud
• Technical fraud
• Bad debt
• Healthcare fraud
• Medicaid and Medicare fraud

Page 27

Business Problem → Big Data Type → Big Data Characteristics

Page 29

• Analysis type: whether the data is analyzed in real time or batched for later analysis. A mix of both types may be required by the use case:
  - Fraud detection: analysis must be done in real time or near real time.
  - Trend analysis for strategic business decisions: analysis can be in batch mode.

• Processing methodology: the type of technique to be applied for processing data (e.g., predictive, analytical, ad hoc query, and reporting).
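The following is a minimal sketch of the two analysis types described above. A per-transaction check runs as each payment arrives, in real time or near real time, while a trend computation runs over a completed period, in batch. The threshold rule (5 times the customer's average amount) and the function names `realtime_check` and `batch_trend` are hypothetical. Production fraud systems use far richer models.

```python
from statistics import mean


def realtime_check(amount, history, factor=5.0):
    """Near-real-time rule: flag a transaction far above the customer's average."""
    if not history:
        return False
    return amount > factor * mean(history)


def batch_trend(daily_volumes):
    """Batch analysis: average day-over-day change across a completed period."""
    changes = [b - a for a, b in zip(daily_volumes, daily_volumes[1:])]
    return mean(changes) if changes else 0.0


history = [20.0, 35.0, 25.0]  # hypothetical past transaction amounts
print(realtime_check(400.0, history))              # True
print(batch_trend([100.0, 120.0, 130.0, 160.0]))   # 20.0
```

The real-time path must answer within the lifetime of a single transaction. The batch path can wait until the whole period's data has landed.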

Page 30

Data Frequency and Size

How much data is expected, and at what frequency does it arrive? Knowing frequency and size helps determine the storage mechanism, storage format, and the necessary preprocessing tools.

Data frequency and size depend on data sources:
• On demand, as with social media data
• Continuous feed, real time (weather data, transactional data)
• Time series (time-based data)
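Frequency and size combine into a raw storage estimate by simple multiplication. The following is a back-of-the-envelope sketch for a continuous feed. All figures are hypothetical: one million smart meters, one reading every 15 minutes, and about 100 bytes per reading. The estimate ignores replication, compression, and indexing overhead.

```python
def daily_volume_bytes(sources, readings_per_hour, bytes_per_reading):
    """Estimate raw bytes arriving per day from a continuous feed."""
    return sources * readings_per_hour * 24 * bytes_per_reading


# Hypothetical figures: 1,000,000 meters, 4 readings per hour, 100 bytes each.
per_day = daily_volume_bytes(1_000_000, 4, 100)
print(per_day / 1e9, "GB per day")          # 9.6 GB per day
print(per_day * 365 / 1e12, "TB per year")  # 3.504 TB per year
```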

Page 31

Data Type & Content Format

• Data type: the type of data to be processed (transactional, historical, master data, and others). Knowing the data type helps segregate the data in storage.

• Content format: the format of incoming data, which may be structured (CRM, for example), unstructured (audio, video, and images, for example), or semi-structured. Format determines how the incoming data needs to be processed and is key to choosing tools and techniques and defining a solution from a business perspective.
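Because content format determines how incoming data is processed, an ingestion layer often starts by classifying each arriving file. The following is a minimal sketch that classifies files by extension. The extension-to-format mapping and the function `content_format` are hypothetical. Real pipelines usually also inspect MIME types or file contents rather than trusting extensions alone.

```python
from pathlib import Path

# Hypothetical mapping from file extension to content format.
FORMAT_BY_EXTENSION = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".mp3": "unstructured",
    ".mp4": "unstructured",
    ".jpg": "unstructured",
    ".png": "unstructured",
}


def content_format(filename: str) -> str:
    """Classify an incoming file by extension; unknown types are left for review."""
    return FORMAT_BY_EXTENSION.get(Path(filename).suffix.lower(), "unknown")


for name in ["crm_export.csv", "tweets.json", "store_cam.MP4", "notes.docx"]:
    print(name, "->", content_format(name))
```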

Page 32

Data Sources

• Data source: the sources of data (where the data is generated), such as web and social media, machine-generated, human-generated, etc.

Identifying all the data sources helps determine the scope from a business perspective. The figure shows the most widely used data sources.

Page 33

Data Consumers & Hardware

• Data consumers: a list of all of the possible consumers of the processed data:
  - Business processes
  - Business users
  - Enterprise applications
  - Individual people in various business roles
  - Part of the process flows
  - Other data repositories or enterprise applications

• Hardware: the type of hardware on which the big data solution will be implemented, either commodity hardware or state of the art. Understanding the limitations of the hardware helps inform the choice of big data solution.

Page 34

Phase 3: Business Information Maps and Data Flow

Once we understand what data our business analysts need, we must figure out where we should get the data and which business processes need what kind of data.

Page 35

Example: Lux Motor Cars (LMC), a fictitious manufacturer of luxury cars

For LMC Lease vehicles in the United States, the following information is uncovered:

• The variance between when a car is due for scheduled service and when it is actually serviced is quite high. In other words, drivers don’t seem to adhere to the service schedule as closely as other LMC owners do.

• The number of miles driven after a “Check Engine” light came on was dramatically higher for LMC Lease vehicles. It appears that lessees weren’t taking the “Check Engine” light very seriously.
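The first finding compares scheduled and actual service dates across two groups of vehicles. The following is a minimal sketch of that comparison: it computes each vehicle's service delay in days and summarizes the delays per group. The records are invented and `service_delays` is a hypothetical helper. The population standard deviation (`statistics.pstdev`) is used here as one way to express how widely delays vary.

```python
from datetime import date
from statistics import mean, pstdev


def service_delays(records):
    """Days between scheduled and actual service for each (scheduled, actual) pair."""
    return [(actual - scheduled).days for scheduled, actual in records]


# Hypothetical records: (scheduled service date, actual service date)
lease = [(date(2024, 3, 1), date(2024, 4, 15)), (date(2024, 5, 1), date(2024, 5, 3))]
owned = [(date(2024, 3, 1), date(2024, 3, 5)), (date(2024, 5, 1), date(2024, 5, 8))]

for label, recs in [("lease", lease), ("owned", owned)]:
    delays = service_delays(recs)
    print(label, "mean delay:", mean(delays), "days; spread:", pstdev(delays))
```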

Page 36

LMC is currently not able to put such a program into place. Much of the data it needs to run a program of this type is not currently available, though this is not fully understood in the business.

So we need to build a business information map describing the current state, so that everyone can better understand how this part of the business operates with the data it has today.

The good news is that LMC is ready to make a telematics investment.

Big data initiative: deploying a telematics system

Page 37

First Step: Data Sources

Page 38

Second Step: Data Flow

Page 39

Deploying a telematics system

• Stakeholders will receive from the M&W system key performance indicators (KPIs) that were previously unavailable.

• Sensors will exchange data with the M&W system, including interactive diagnostics that cover both driver alerts and on-demand logs (interactive diagnostics).

• Lessees and/or drivers will receive near real-time (RT) driver alerts and exchange messages to handle service scheduling (near RT driver alerts and scheduling).
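The near real-time driver alerts described above can be thought of as rules applied to each incoming sensor event. The following is a minimal sketch of two such rules. The `TelematicsEvent` fields, the function `driver_alerts`, and both thresholds (50 miles with the Check Engine light on, and service due within 500 miles) are hypothetical. They do not describe the actual M&W system.

```python
from dataclasses import dataclass


@dataclass
class TelematicsEvent:
    vehicle_id: str
    check_engine_on: bool
    miles_since_light: float
    miles_to_service: float


def driver_alerts(event, miles_limit=50.0, service_window=500.0):
    """Hypothetical rules that turn one sensor event into driver alerts."""
    alerts = []
    if event.check_engine_on and event.miles_since_light > miles_limit:
        alerts.append(f"{event.vehicle_id}: Check Engine light on for "
                      f"{event.miles_since_light:.0f} miles, please book service")
    if event.miles_to_service <= service_window:
        alerts.append(f"{event.vehicle_id}: scheduled service due within "
                      f"{event.miles_to_service:.0f} miles")
    return alerts


for alert in driver_alerts(TelematicsEvent("LMC-001", True, 120.0, 300.0)):
    print(alert)
```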

Page 40

Phase 4: Drawing the Future-State Big Data Architecture

We are ready to design our future-state technical architecture, and the IT architecture team will engage extensively in this phase.

However, before a more detailed design is started, we might want to first understand the skills we have in our organization and the impact those skills (or their absence) might have on the architecture.

We’ll also want to clearly understand the strengths and weaknesses of our current-state architecture and how we might extend it through the introduction of new software components and systems.

Page 41

Current State of Information Architecture

In this example, the Enterprise Data Warehouse (EDW) provides the historic database of record. Data is extracted from multiple OLTP systems (the ERP and CRM systems are pictured).

Only structured data!

Data marts surround the EDW. Business analysts access the marts and/or the EDW using reporting, ad hoc query, and analysis tools.

Page 42

In the current state, the sources of data are limited to structured data! But the business needs a deeper analysis of consumers and therefore needs a new set of external streaming, unstructured, and semi-structured data.

A business problem: to better understand the success of promotions and sales efforts.

Page 43

Data is captured and gathered in Hadoop when shoppers 1) buy items, 2) browse the website, and 3) enter the brick-and-mortar stores. In addition, 4) sentiment data expressing the shoppers’ opinion of doing business with the company is gathered in Hadoop from social media.

The streaming data lands in clusters of NoSQL databases that can easily be scaled for high-ingestion demands, and is then loaded into Hadoop for analysis.
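The pattern described above absorbs a high-rate stream in a scalable landing tier and then moves it into Hadoop in bulk. It can be illustrated with an in-memory micro-batch buffer. The following is a minimal sketch only. The `LandingZone` class is hypothetical, the Python list stands in for the NoSQL cluster, and the `sink` callback stands in for a bulk load into Hadoop. No real NoSQL or Hadoop API is used.

```python
class LandingZone:
    """Hypothetical stand-in for the NoSQL landing tier: buffers events and
    hands them to a sink (standing in for a Hadoop bulk load) in batches."""

    def __init__(self, batch_size, sink):
        self.batch_size = batch_size
        self.sink = sink
        self.buffer = []

    def ingest(self, event):
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Hand over a copy of the buffered events, then start a fresh batch.
        if self.buffer:
            self.sink(list(self.buffer))
            self.buffer.clear()


batches = []
zone = LandingZone(batch_size=3, sink=batches.append)
for i in range(7):
    zone.ingest({"event_id": i, "type": "click"})
zone.flush()  # push the final partial batch
print([len(b) for b in batches])  # [3, 3, 1]
```

Batching decouples the ingestion rate from the load rate. This is the role the slide assigns to the NoSQL tier.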

Page 44

We will next determine whether there is a need to query and analyze data residing in our traditional data warehouse information architecture and in the Hadoop cluster at the same time.
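A typical reason to query both stores at once is to combine structured customer records from the warehouse with aggregates computed in Hadoop. The following is a minimal sketch of such a combined query as an inner join on customer ID. Both data sets are invented, and plain dictionaries stand in for the EDW and the Hadoop cluster. `join_customer_sentiment` is a hypothetical helper and does not represent any particular federated-query product.

```python
# Hypothetical rows from the EDW (structured customer master data).
edw_customers = {
    101: {"name": "Ana", "segment": "premium"},
    102: {"name": "Ben", "segment": "standard"},
}
# Hypothetical sentiment scores computed in Hadoop from social media data.
hadoop_sentiment = {101: -0.6, 103: 0.4}


def join_customer_sentiment(customers, sentiment):
    """Inner join on customer ID across the two stores."""
    return [
        {"customer_id": cid, **row, "sentiment": sentiment[cid]}
        for cid, row in customers.items()
        if cid in sentiment
    ]


print(join_customer_sentiment(edw_customers, hadoop_sentiment))
```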

Page 46

Do you want to be an on-demand business?

Add a real-time recommendation engine and an event processing engine to guide shoppers by presenting specific products in the web store while they are engaged.
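One simple way a recommendation engine can pick "specific products" is item co-occurrence: suggest what other shoppers most often bought together with the current item. The following is a minimal sketch of that technique. The baskets and the functions `build_cooccurrence` and `recommend` are hypothetical. A real-time engine would keep these counts precomputed and serve lookups while the shopper browses.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of products appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[(a, b)] += 1
    return counts


def recommend(product, counts, k=2):
    """Top-k products most often bought with `product` (ties broken by name)."""
    scored = [(b, n) for (a, b), n in counts.items() if a == product]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [b for b, _ in scored[:k]]


baskets = [["shoes", "socks"], ["shoes", "socks", "laces"], ["shoes", "hat"]]
counts = build_cooccurrence(baskets)
print(recommend("shoes", counts))  # ['socks', 'hat']
```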

Page 47

Example of event processing

If sensors in the brick-and-mortar store begin to detect delays in reaching cashiers, and dissatisfied customers abandoning the items they wanted to buy, predefined rules might trigger devices to signal cashiers who are engaged in other activities to open additional cash registers and alleviate the backup.
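The predefined rule in this example can be expressed as a condition evaluated over each new snapshot of store sensor data. The following is a minimal sketch. The `StoreSnapshot` fields, the function `checkout_rule`, and the thresholds (a 5-minute wait and 3 abandoned carts in 10 minutes) are hypothetical. As in the example, the rule fires only when both delays and abandonment are detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreSnapshot:
    avg_wait_minutes: float
    abandoned_carts_last_10_min: int
    open_registers: int
    total_registers: int


def checkout_rule(s: StoreSnapshot, max_wait=5.0, max_abandoned=3) -> Optional[str]:
    """Hypothetical predefined rule: signal staff to open a register when
    queues back up AND customers are abandoning items, if a register is free."""
    backed_up = (s.avg_wait_minutes > max_wait
                 and s.abandoned_carts_last_10_min >= max_abandoned)
    if backed_up and s.open_registers < s.total_registers:
        return "open_register"
    return None


print(checkout_rule(StoreSnapshot(7.5, 4, 3, 6)))  # open_register
print(checkout_rule(StoreSnapshot(2.0, 0, 3, 6)))  # None
```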

Page 49

Big Data Architecture