Big Data Architecture - Persian Script
Big Data Architecture & Business Value
TAHEREH SAHEB
PHD IN STS FROM RPI, NY
ASSISTANT PROFESSOR AT TARBIAT MODARES UNIVERSITY
Big data is an enterprise asset and needs to be managed as an integrated element of your current Enterprise Architecture.
Big data: discovering patterns, predicting the future, and providing prescriptive analytics
Big data adds depth to our analysis of events by providing broader and deeper data.
•Current State
•Envisioning the future
•Advanced Analytical Inquiries
•Data Type
•Data Characteristics
•Where to get data?
•Business Process
1- Envision
2- Determine Business Drivers & Problems
3- Draw Business Information Map or Data Flow Diagram
4- Design Big Data Architecture
Before designing a big data architecture
To answer these questions, use a structured approach for evaluating the viability of a big data solution according to the dimensions shown in the following figure.
• Business value from the insight that might be gained from analyzing the data
• Governance considerations for the new sources of data and how the data will be used
• People with relevant skills available and commitment of sponsors
• Volume of the data being captured
• Variety of data sources, data types, and data formats
• Velocity at which the data is generated, the speed with which it needs to be acted upon, or the rate at which it is changing
• Veracity of the data, or rather, the uncertainty or trustworthiness of the data
Phase 1: Vision
When we paint a vision of a future architecture, we start with a
basic understanding of our current state and we begin to
speculate on how it might evolve.
A challenge in many organizations is that, at inception, the technical vision may not be aligned with the business vision.
In fact, business visionaries and potential sponsors must drive the technical vision.
So we must understand how the business currently uses our information architecture and how that could change in the future.
The vision phase is mostly about gathering requirements and
exploring the art of the possible.
Phase 2: Determine Business Drivers & Problems
The next phase of the methodology takes a much deeper look at the business in two steps:
1- Identify the business drivers and problems.
2- Map each business problem to its big data type.
Business Problem
Big Data Type
Big Data Characteristics
Big Data Business Drivers
Some of the key business drivers:
• Operational effectiveness
• Delivering a better customer service
• Improved innovation
• Improved company growth
Business Problem: Predicting power consumption
Mapping the business problem to its big data type
Business Problem: Utilities: Predict power consumption
Big Data Type: Machine-generated data
Description:
• Utility companies have rolled out smart meters to measure the consumption of water, gas, and electricity at regular intervals of one hour or less. These smart meters generate huge volumes of interval data that need to be analyzed.
• Utilities also run big, expensive, and complicated systems to generate power. Each grid includes sophisticated sensors that monitor voltage, frequency, and other important operating characteristics.
• A big data solution can analyze power generation (supply) and power consumption (demand) data using smart meters.
Smart meters and consumption measurement
Sensors for measuring voltage and other operating characteristics
Business Driver: Operating efficiency
Business Problem: Predicting power consumption
Big Data Type: data delivered by the sensors
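Before demand can be compared with supply, the interval readings have to be rolled up. The sketch below assumes hypothetical 15-minute readings stored as `(meter_id, timestamp, kWh)` tuples and a hypothetical hourly supply plan; it shows only the aggregation and the supply/demand comparison, not a forecasting model.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical interval readings: (meter_id, timestamp, kWh consumed in the interval)
readings = [
    ("m1", datetime(2024, 1, 1, 0, 0), 0.4),
    ("m1", datetime(2024, 1, 1, 0, 15), 0.5),
    ("m1", datetime(2024, 1, 1, 0, 30), 0.3),
    ("m1", datetime(2024, 1, 1, 0, 45), 0.6),
    ("m2", datetime(2024, 1, 1, 0, 0), 1.0),
    ("m2", datetime(2024, 1, 1, 1, 0), 1.2),
]


def hourly_demand(rows):
    """Sum the readings of all meters into total kWh per clock hour."""
    totals = defaultdict(float)
    for _meter_id, ts, kwh in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[hour] += kwh
    return dict(totals)


def supply_minus_demand(demand, supply):
    """Positive values: planned generation exceeds measured consumption in that hour."""
    return {hour: supply.get(hour, 0.0) - kwh for hour, kwh in demand.items()}


demand = hourly_demand(readings)
supply = {datetime(2024, 1, 1, 0): 3.0, datetime(2024, 1, 1, 1): 1.0}  # hypothetical plan
print(supply_minus_demand(demand, supply))
```

At real smart-meter volumes the same group-by-hour step would run on a distributed engine; the logic per record stays the same.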
Customer Churn
Telecommunications
Business Problem: Customer churn
Big Data Type: Web and social data; transaction data
Description:
• Telecommunications operators need to build detailed customer churn models that include social media and transaction data.
• The value of the churn models depends on the quality of customer attributes (customer master data such as date of birth, gender, location, and income) and the social behavior of customers.
• Telecommunications providers who implement a predictive analytics strategy can manage and predict churn by analyzing the calling patterns of subscribers.
Churn models built on social network and transaction data
Model value depends on the quality of customer attributes (master data such as date of birth, gender, and income) and on customers' social behavior
Predictive analytics to manage and predict churn by analyzing customers' calling patterns
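A predictive churn model would normally be trained on many features. As a much simpler stand-in, the sketch below flags subscribers whose calling volume drops sharply compared with their own history. The rule, the `churn_risk` function, the 50% threshold and the sample data are all hypothetical; a real model would also use the master data and social behavior mentioned above.

```python
def churn_risk(monthly_minutes, drop_threshold=0.5):
    """Flag subscribers whose latest month fell below (1 - drop_threshold) of their earlier average."""
    at_risk = []
    for subscriber, minutes in monthly_minutes.items():
        if len(minutes) < 2:
            continue  # not enough history to compare against
        *history, latest = minutes
        baseline = sum(history) / len(history)
        if baseline > 0 and latest < baseline * (1 - drop_threshold):
            at_risk.append(subscriber)
    return at_risk


# Hypothetical call minutes per month, oldest first
usage = {
    "sub-001": [320, 300, 310, 90],   # sharp drop in the latest month
    "sub-002": [150, 160, 155, 158],  # stable
    "sub-003": [40],                  # new subscriber, no history yet
}
print(churn_risk(usage))  # ['sub-001']
```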
Fraud Detection
Financial Services & Health Care
Business Problem: Fraud detection
Big Data Type: Machine-generated data; transaction data; human-generated data
Description: Solutions are typically designed to detect and prevent myriad fraud and risk types across multiple industries, including:
• Credit and debit payment card fraud
• Deposit account fraud
• Technical fraud
• Bad debt
• Healthcare fraud
• Medicaid and Medicare fraud
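The three mappings above can be recorded in a small catalog so that each business problem stays linked to the data types it depends on. This is an illustrative sketch only; the `ProblemMapping` class and the `problems_using` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemMapping:
    business_problem: str
    industry: str
    data_types: tuple  # big data types the problem relies on


CATALOG = [
    ProblemMapping("Predict power consumption", "Utilities", ("machine-generated",)),
    ProblemMapping("Customer churn", "Telecommunications", ("web and social", "transaction")),
    ProblemMapping(
        "Fraud detection",
        "Financial services & health care",
        ("machine-generated", "transaction", "human-generated"),
    ),
]


def problems_using(data_type):
    """List the business problems that depend on the given big data type."""
    return [m.business_problem for m in CATALOG if data_type in m.data_types]


print(problems_using("transaction"))  # ['Customer churn', 'Fraud detection']
```

Querying the catalog by data type shows which business problems would benefit from onboarding a given source.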
Business Problem
Big Data Type
Big Data Characteristics
•Analysis type — Whether the data is analyzed in
real time or batched for later analysis.
A mix of both types may be required by the use case. For example:
Fraud detection: analysis must be done in real time or near real time.
Trend analysis for strategic business decisions: analysis can be done in batch mode.
• Processing methodology — The type of technique
to be applied for processing data (e.g.,
predictive, analytical, ad-hoc query, and
reporting).
Analysis Type
Processing methodology
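The difference between the two analysis types can be seen in code: a real-time check runs on each event as it arrives, while a batch job runs later over the accumulated data. In the sketch below, the fraud rule (flag any amount more than five times the running average) and the field names are hypothetical, chosen only to make the contrast visible.

```python
from statistics import mean


def is_suspicious(txn, recent_amounts, factor=5.0):
    """Real-time check applied to each transaction as it arrives (hypothetical rule)."""
    if not recent_amounts:
        return False
    return txn["amount"] > factor * mean(recent_amounts)


def daily_totals(transactions):
    """Batch job: aggregate accumulated transactions for later trend analysis."""
    totals = {}
    for t in transactions:
        totals[t["day"]] = totals.get(t["day"], 0.0) + t["amount"]
    return totals


stream = [
    {"day": "2024-03-01", "amount": 40.0},
    {"day": "2024-03-01", "amount": 55.0},
    {"day": "2024-03-02", "amount": 900.0},
]
history = []
flags = []
for txn in stream:  # processed one event at a time, as in a stream
    flags.append(is_suspicious(txn, history))
    history.append(txn["amount"])
print(flags)                 # [False, False, True]
print(daily_totals(stream))  # run later, over the whole batch
```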
Data Frequency and Size
How much data is expected, and at what frequency it arrives.
Knowing frequency and size helps
determine the storage mechanism,
storage format, and the necessary
preprocessing tools.
Data frequency and size depend on
data sources:
•On demand, as with social media
data
•Continuous feed, real-time (weather
data, transactional data)
•Time series (time-based data)
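Frequency and size translate directly into a storage estimate. The worked example below assumes, purely for illustration, one million smart meters reporting every 15 minutes with 100-byte records; none of these figures come from the source.

```python
def estimate_storage_bytes(sources, interval_minutes, bytes_per_record, days, replication=1):
    """Raw storage needed for fixed-interval data (no compression, optional replicas)."""
    records_per_day = (24 * 60) // interval_minutes
    return sources * records_per_day * bytes_per_record * days * replication


# Hypothetical figures: 1,000,000 meters, a reading every 15 minutes, 100 bytes each
per_day = estimate_storage_bytes(1_000_000, 15, 100, days=1)
per_year = estimate_storage_bytes(1_000_000, 15, 100, days=365)
print(per_day / 10**9, "GB per day")     # 9.6 GB per day
print(per_year / 10**12, "TB per year")  # 3.504 TB per year
```

Under those assumptions each meter sends 96 readings a day, giving 9.6 GB per day and about 3.5 TB per year before replication or compression; the `replication` parameter multiplies the total if the storage layer keeps several copies.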
Data Type
Content Format
•Data type — Type of data to be processed —
transactional, historical, master data, and others.
Knowing the data type helps segregate the data in
storage.
• Content format — Format of incoming data —
structured (CRM, for example), unstructured
(audio, video, and images, for example), or semi-
structured.
Format determines how the incoming data needs to be processed and is key to choosing tools and techniques and defining a solution from a business perspective.
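Because format determines how incoming data is processed, an ingestion layer often starts by classifying each payload. The sketch below classifies by MIME type and maps each format to a storage target; the MIME-to-format rules and the target names in `ROUTES` are hypothetical choices for illustration.

```python
# Hypothetical storage targets, one per content format
ROUTES = {
    "structured": "relational data warehouse",
    "semi-structured": "NoSQL document store",
    "unstructured": "distributed file system",
}

STRUCTURED_TYPES = {"text/csv", "text/tab-separated-values"}
SEMI_STRUCTURED_TYPES = {"application/json", "application/xml", "text/xml"}


def content_format(mime_type):
    """Classify a payload as structured, semi-structured, or unstructured."""
    mime_type = mime_type.lower()
    if mime_type in STRUCTURED_TYPES:
        return "structured"
    if mime_type in SEMI_STRUCTURED_TYPES:
        return "semi-structured"
    # Audio, video, images, free text and anything unrecognised
    return "unstructured"


def route(mime_type):
    """Pick the storage target for a payload based on its content format."""
    return ROUTES[content_format(mime_type)]


for mime in ("text/csv", "application/json", "video/mp4"):
    print(mime, "->", route(mime))
```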
Data Sources
•Data source — Sources of data (where the data is
generated) — web and social media, machine-
generated, human-generated, etc.
Identifying all the data sources helps determine the
scope from a business perspective. The figure shows the most widely used data sources.
Data Consumers
Hardware
•Data consumers — A list of all of the possible
consumers of the processed data:
• Business processes
• Business users
• Enterprise applications
• Individual people in various business roles
• Part of the process flows
• Other data repositories or enterprise
applications
•Hardware: The type of hardware on which the big data solution will be implemented, either commodity hardware or state-of-the-art hardware. Understanding the limitations of hardware helps inform the choice of big data solution.
Phase 3: Business Information Maps (Data Flow)
Once we understand what data our business analysts need, we must figure out where we should get the data and which business processes need what kind of data.
Example: Lux Motor Cars (LMC), a fictitious manufacturer of luxury cars. When analyzing LMC Lease vehicles in the United States, the following information is uncovered:
• The variance between when a car is due for scheduled
service and when it is actually serviced is quite high. In other
words, drivers don’t seem to adhere to the service schedule as
closely as other
LMC owners do.
• The number of miles driven after a “Check Engine” light
came on was dramatically higher for LMC Lease vehicles. It
appears that lessees weren’t taking the “Check Engine” light very seriously.
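Both findings are simple aggregates once the data exists. The sketch below computes them from hypothetical service and odometer records; the record layout, the vehicle IDs and the numbers are invented for illustration and are not LMC data.

```python
from datetime import date
from statistics import mean

# Hypothetical service records: (vehicle_id, is_lease, service_due, serviced_on)
service_records = [
    ("V1", True, date(2024, 1, 10), date(2024, 3, 1)),
    ("V2", True, date(2024, 2, 1), date(2024, 2, 20)),
    ("V3", False, date(2024, 1, 15), date(2024, 1, 18)),
    ("V4", False, date(2024, 2, 5), date(2024, 2, 5)),
]

# Hypothetical miles driven after the "Check Engine" light came on
check_engine_miles = [("V1", True, 1450), ("V2", True, 900), ("V3", False, 60), ("V4", False, 25)]


def avg_service_delay(records, lease):
    """Average days between the scheduled service date and the actual service."""
    return mean((served - due).days for _vid, is_lease, due, served in records if is_lease == lease)


def avg_miles_after_light(rows, lease):
    """Average miles driven after the Check Engine light came on."""
    return mean(miles for _vid, is_lease, miles in rows if is_lease == lease)


print("Lease vs. owned service delay (days):",
      avg_service_delay(service_records, lease=True),
      avg_service_delay(service_records, lease=False))
print("Lease vs. owned miles after light:",
      avg_miles_after_light(check_engine_miles, lease=True),
      avg_miles_after_light(check_engine_miles, lease=False))
```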
LMC is currently not able to put such a program into place. Much of the
data they need to run a program of this type is not currently available,
though this is not fully understood in the business.
So, we need to build a business information map describing the current
state so that all can better understand how this part of the business
operates with the data that it has today.
The good news is that LMC is ready to make a telematics investment.
Big Data Initiative: deploying a telematics system
First Step: Data Sources
Second Step: Data Flow
• Stakeholders will receive from the M&W system key performance indicators (KPIs) that have heretofore been unavailable.
• Sensors will exchange data with the M&W system, including interactive diagnostics that cover both driver alerts and on-demand logs (interactive diagnostics).
• Lessees and/or drivers will receive near real-time (RT) driver alerts and exchange messages to handle service scheduling (near RT driver alerts and scheduling).
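The data flow above can be sketched as a small event handler. `MWSystem`, `SensorEvent` and the event codes are hypothetical stand-ins for the M&W system and the sensor messages in the diagram; the sketch only shows how one sensor event can produce a driver alert and feed a stakeholder KPI.

```python
from dataclasses import dataclass, field


@dataclass
class SensorEvent:
    vehicle_id: str
    code: str      # hypothetical codes: "CHECK_ENGINE", "SERVICE_DUE"
    odometer: int  # miles


@dataclass
class MWSystem:
    """Hypothetical stand-in for the M&W system in the data flow diagram."""
    alerts: list = field(default_factory=list)  # near-RT messages to drivers
    first_alert_odometer: dict = field(default_factory=dict)

    def ingest(self, event: SensorEvent) -> None:
        if event.code == "CHECK_ENGINE":
            # Remember where the light first came on; later events keep the first value
            self.first_alert_odometer.setdefault(event.vehicle_id, event.odometer)
            self.alerts.append((event.vehicle_id, "Check Engine light is on: please schedule service"))
        elif event.code == "SERVICE_DUE":
            self.alerts.append((event.vehicle_id, "Scheduled service is due"))

    def miles_since_first_alert(self, vehicle_id: str, odometer: int) -> int:
        """Stakeholder KPI: miles driven since the Check Engine light first came on."""
        return odometer - self.first_alert_odometer.get(vehicle_id, odometer)


mw = MWSystem()
mw.ingest(SensorEvent("V1", "CHECK_ENGINE", 42_000))
mw.ingest(SensorEvent("V1", "CHECK_ENGINE", 42_300))
mw.ingest(SensorEvent("V2", "SERVICE_DUE", 15_000))
print(len(mw.alerts), mw.miles_since_first_alert("V1", 42_800))  # 3 800
```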
Phase 4: Drawing the Future-State Big Data Architecture
We are ready to design our future state technical architecture and
the IT architecture team will engage extensively in this phase.
However, before a more detailed design is started, we might want to first understand the skills we have in our organization and the impact those skills (or lack thereof) might have on the architecture.
We'll also want to clearly understand the strengths and weaknesses of our current state architecture and how we might extend it through the introduction of new software components and systems.
Current state of Information Architecture
In this example, the Enterprise Data Warehouse (EDW) provides the historical database of record.
Data is extracted from multiple OLTP systems (the ERP and CRM systems are pictured).
Only Structured Data!
Data marts surround the EDW.
Business analysts access
the marts and / or EDW using reporting and
ad hoc query and analysis tools.
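The current-state flow (OLTP sources, then the EDW, then data marts) can be mimicked in a few SQL statements. The sketch below uses an in-memory SQLite database with hypothetical `erp_orders` and `crm_customers` tables; in a real deployment the sources, the EDW and the marts would be separate systems, and only structured data flows through this path.

```python
import sqlite3

# One in-memory database stands in for all three layers (an assumption for brevity)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP sources (hypothetical schemas for the ERP and CRM systems)
cur.execute("CREATE TABLE erp_orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE crm_customers (customer_id INTEGER, region TEXT)")
cur.executemany("INSERT INTO erp_orders VALUES (?, ?, ?)", [(1, 10, 250.0), (2, 11, 80.0), (3, 10, 120.0)])
cur.executemany("INSERT INTO crm_customers VALUES (?, ?)", [(10, "East"), (11, "West")])

# Extract, transform and load into the EDW: one integrated, structured table
cur.execute(
    """
    CREATE TABLE edw_sales AS
    SELECT o.order_id, o.amount, c.region
    FROM erp_orders AS o JOIN crm_customers AS c USING (customer_id)
    """
)

# A data mart surrounding the EDW: a subject-specific summary for analysts
cur.execute(
    """
    CREATE VIEW mart_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM edw_sales GROUP BY region
    """
)

rows = cur.execute("SELECT region, total FROM mart_sales_by_region ORDER BY region").fetchall()
print(rows)  # [('East', 370.0), ('West', 80.0)]
conn.close()
```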
In the current state, the data sources are limited to structured data. But the business needs a deeper analysis of consumers and therefore needs a new set of external streaming, unstructured, and semi-structured data.
A Business Problem: to better understand the success of promotions and sales
efforts.
Data is gathered in Hadoop after being captured when shoppers 1) buy items, 2) browse the website, and 3) enter the brick-and-mortar stores.
4) Sentiment data that expresses the shoppers’ opinion of doing business with the company is gathered in Hadoop from social media.
The streaming data lands in clusters of NoSQL databases that can easily be scaled for high-ingestion demands and is then loaded into Hadoop for analysis.
We will next determine if there is a need to query and
analyze data residing in our traditional data warehouse
information architecture and the Hadoop cluster at the same time.
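Querying both stores at the same time usually means joining structured warehouse results with data kept in Hadoop. The sketch below simulates that join in plain Python: `warehouse_sales` stands in for rows returned by the warehouse and `hadoop_lines` for JSON sentiment records read from the cluster. Both datasets and the `promotion_report` function are hypothetical; in practice the join would more likely be done by a query engine that can read both stores.

```python
import json
from collections import defaultdict

# Rows as they might be returned by the data warehouse: (promotion_id, revenue)
warehouse_sales = [("PROMO-1", 12000.0), ("PROMO-2", 4500.0), ("PROMO-3", 800.0)]

# JSON sentiment records as they might be read from files in the Hadoop cluster
hadoop_lines = [
    '{"promo": "PROMO-1", "sentiment": 0.8}',
    '{"promo": "PROMO-1", "sentiment": 0.6}',
    '{"promo": "PROMO-2", "sentiment": -0.4}',
]


def promotion_report(sales, sentiment_lines):
    """Join warehouse revenue with the average social-media sentiment per promotion."""
    scores = defaultdict(list)
    for line in sentiment_lines:
        record = json.loads(line)
        scores[record["promo"]].append(record["sentiment"])
    report = {}
    for promo, revenue in sales:
        values = scores.get(promo)
        avg = sum(values) / len(values) if values else None  # None: no sentiment captured
        report[promo] = {"revenue": revenue, "avg_sentiment": avg}
    return report


report = promotion_report(warehouse_sales, hadoop_lines)
for promo, row in report.items():
    print(promo, row)
```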
Want to be an on-demand business?
Add a real-time recommendation engine and an event processing engine to guide shoppers by presenting specific products in the web store while they are engaged.
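A real-time recommendation engine can be as simple as counting which products are bought together and scoring them against what the shopper is viewing right now. The co-occurrence approach, the sample baskets and the `recommend` function below are assumptions for illustration; the source does not specify an algorithm.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase history: each set is one shopper's basket
past_baskets = [
    {"scarf", "gloves"},
    {"scarf", "gloves", "hat"},
    {"scarf", "coat"},
    {"boots", "socks"},
]


def build_cooccurrence(baskets):
    """For every item, count how often each other item appeared in the same basket."""
    co = {}
    for basket in baskets:
        for a, b in permutations(basket, 2):
            co.setdefault(a, Counter())[b] += 1
    return co


def recommend(viewed, co, k=2):
    """Score candidates by co-occurrence with the items in the current session."""
    scores = Counter()
    for item in viewed:
        scores.update(co.get(item, Counter()))
    for item in viewed:
        scores.pop(item, None)  # don't recommend what the shopper is already viewing
    return [item for item, _count in scores.most_common(k)]


co = build_cooccurrence(past_baskets)
print(recommend(["scarf", "gloves"], co))  # ['hat', 'coat']
```

The co-occurrence table is built offline, so the per-request work is only a few dictionary lookups, which suits the "while they are engaged" latency requirement.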
Example of event processing
For example, if sensors in the brick-and-mortar store begin to detect delays in reaching cashiers and dissatisfied customers abandoning the items they wanted to buy, predefined rules might trigger devices to signal cashiers who are engaged in other activities to open up additional cash registers and alleviate the backup.
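The checkout example is a classic event-processing rule. In the sketch below, `CheckoutMonitor`, the five-event sliding window, the 300-second wait limit and the abandonment threshold are hypothetical values; the `notify` callback stands in for whatever device signals the cashiers.

```python
from collections import deque


class CheckoutMonitor:
    """Hypothetical event-processing rule over a sliding window of checkout events."""

    def __init__(self, window=5, max_avg_wait=300.0, max_abandoned=2, notify=print):
        self.events = deque(maxlen=window)  # only the most recent `window` events are kept
        self.max_avg_wait = max_avg_wait    # seconds
        self.max_abandoned = max_abandoned
        self.notify = notify                # stands in for the device that signals cashiers

    def on_event(self, wait_seconds, abandoned):
        """Record one sensor event and fire the rule if either threshold is crossed."""
        self.events.append((wait_seconds, abandoned))
        avg_wait = sum(w for w, _ in self.events) / len(self.events)
        abandoned_count = sum(1 for _, a in self.events if a)
        if avg_wait > self.max_avg_wait or abandoned_count >= self.max_abandoned:
            self.notify("Open additional cash registers")
            return True
        return False


sent = []
monitor = CheckoutMonitor(notify=sent.append)
fired = [monitor.on_event(w, a) for w, a in [(120, False), (200, False), (400, True), (500, True)]]
print(fired, sent)
```

Using a bounded window means old events drop out automatically, so the rule reacts to the current queue rather than to the whole day.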