Big Data & the Cloud

Big Data and The Cloud. Robert J. Abate, CBIP, CDMP, Independent Consultant. Webinar: March 20th, 2012, 2PM EST / 11AM PST

Transcript of Big Data & the Cloud

Page 1: Big Data & the Cloud

“Big Data” and “The Cloud”

Robert J. Abate, CBIP, CDMP

Independent Consultant

Webinar: March 20th, 2012

2PM EST / 11AM PST

Page 2: Big Data & the Cloud

© 2012 – Dataversity & Robert J. Abate. Big Data & The Cloud – March 20th, 2012

“Big Data” And “The Cloud” - Agenda

The Industry Is A Buzz…

The Challenges Of Big Data

Architectural Solutions & The Cloud

It’s A Brave New World

Case Studies

Questions & Answers

Page 3: Big Data & the Cloud

The Industry Is A Buzz…

“Despite the hype, most firms find the technology useful to operate on data they already have”

Source: Forrester, June 2011

Page 4: Big Data & the Cloud

Everyone Is Talking About Big Data…

“Big data will represent a hugely disruptive force during the next five years – enabling levels of insight – that are currently unachievable through any other means”

“Big Data: Huge Management Implications with Enormous Returns”

“Big data is still in mostly uncharted territory, but a surprising number are actually doing something with it”

“61% of respondents feel big data will fundamentally change the way their business works”

“Most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies, potential application areas, and why ‘big data BI’ contrasts with traditional BI tools. It differs dramatically from traditional BI in terms of both capabilities and in the technologies used to achieve those capability breakthroughs”

Gartner: May 2011

Gartner: January 2012

IDC: March 2011

Forrester: June 2011

CIO/Insight: November 2010

Page 5: Big Data & the Cloud

What Are The Drivers For Big Data/Cloud

We Are In The Information Age

• Every corporation today is in the “Data Business”

We Are Inundated In Data

• Types
• Sources
• Varieties

Data Is Growing Exponentially

• So are the challenges

Data Complexity Is Increasing

• Causing insight to be lost

Page 6: Big Data & the Cloud

Pictorial Representation Of Information

Page 7: Big Data & the Cloud

Big Data Is More Than Just Volume

Consider: Master Data, Fidelity, Complexity, Validity, Perishability, Linking Data

Structured Data: POS transactions, call detail records, credit card transactions, shipping updates, purchase orders, payments, shipments, account transactions

Unstructured Data: Web logs, newsfeeds, social media, geo-location, mobile, consumer comments, claims, doctor’s notes, clinical studies, images, video, audio

Device-generated Data: RFID sensors, smart meters, smart grids, GPS spatial, micro-payments

[Figure: the four dimensions Variety, Complexity, Velocity and Volume, surrounded by data types: Smart Grid, Images, Audio, Video, Transactional Data, Text, Documents, Industry-specific, Web traffic, Sensor/location-based, Social]

Page 8: Big Data & the Cloud

Big Data’s Potential Is Limitless

TODAY

Less than 10% of the enterprise’s information

“Rear-view” mirror reporting, dashboards and analysis

• Days, weeks, months, or even quarters old

Incomplete, inaccurate, and disjointed data

Architectures and methods that take 6 to 18 months to exploit

TOMORROW

Vast majority of available sources and external data

Forward-looking or “Windshield-view” predictions with recommendations

• Real-time or near real-time

Correlated, high confidence, governed data

Vastly accelerated time to market

Page 9: Big Data & the Cloud

Time Really Is Money!

[Figure: Data Lifecycle, “THE TIME VALUE CURVE”. © 2007 - Dr. Richard Hackathorn, Bolder Technology, Inc., All Rights Reserved. Used with Permission. Axes: Value against Time. Value declines from the Business Event to the Action Taken. The Action Time between them is made up of Capture Latency (until Data Ready For Analysis), Analysis Latency (until Information Delivered) and Decision Latency (until Action Taken). The decline in value over that interval is labeled Value Lost.]
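The curve's labels suggest that action time is the sum of the three latencies. A minimal sketch of that arithmetic follows. The class name, the hour units and every number in it are hypothetical illustrations, not figures from the presentation, and the shape of the value decay itself is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Hypothetical latency components, in hours, between event and action."""
    capture_hours: float   # business event -> data ready for analysis
    analysis_hours: float  # data ready for analysis -> information delivered
    decision_hours: float  # information delivered -> action taken

    def action_time(self) -> float:
        # Action time is the sum of the three latencies on the curve.
        return self.capture_hours + self.analysis_hours + self.decision_hours


# Illustrative values only: a nightly-batch setup versus a near real-time one.
batch = Latencies(capture_hours=24.0, analysis_hours=48.0, decision_hours=24.0)
near_real_time = Latencies(capture_hours=0.5, analysis_hours=1.0, decision_hours=24.0)

print(batch.action_time())           # 96.0
print(near_real_time.action_time())  # 25.5
```

In this made-up case, shrinking capture and analysis latency leaves decision latency as the dominant term, which is one reason the slide treats all three separately.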

Page 10: Big Data & the Cloud

Data Is Coming At Us Faster

In A Recent TDWI Survey Of 450 CIOs

• 17% have a real-time data warehouse
• 90% plan on having a real-time warehouse
• 75% will replace to get to a real-time solution

Big Data Projects Are Enterprise-Scale

• When asked: “What Is The Scope Of Your Big Data Initiative?”

Other: 5%
Regional: 5%
Project-based: 8%
Departmental: 8%
Line of business: 8%
Enterprise: 65%

Source: Forrester® June 2011 Global Big Data Online Survey

Page 11: Big Data & the Cloud

Data Is Coming From All Directions…

Data is now commonly entering the enterprise from external sources

• Government (Census, Revenues, …)
• Nielsen, NPD Group (Sales)
• Bloomberg, NYSE (Financial Position)
• Experian, TransUnion, Equifax (Credit Reporting)
• Google Maps, MapInfo (Geospatial, …)
• Radian6, Biz360, … (Client Trend Data)
• Etc.

Page 12: Big Data & the Cloud

Need For “Trust In Data”

Compliance with laws

• Sarbanes-Oxley [SOX], Basel II, HIPAA, etc.

Lack of confidence in the data

• Reports utilizing the same data do not report the same totals or computations

Data not defined and readily available

• Multiple sources of data have to be rationalized at each project start-up, thereby wasting valuable time & $ on every project

Data timeliness

• Manual process to collect, analyze and provide results

Data integrity

• Unknown filters, varying calculations/computations, fields used for data not indicative of field names, data passed along from one person to another to another to another…

Page 13: Big Data & the Cloud

Summation Of Industry “Buzz”

Business mandate to obtain more value out of the data (get answers)

Variety of sources, amounts, types and granularity of data that customers want to integrate is growing exponentially

Need to shrink the latency between the business event and the data availability for analysis and decision-making

Advancing agility of information is key

Need for Data trust and Compliance with regulations

Page 14: Big Data & the Cloud

The Challenges Of Big Data

“If It Was That Easy, Everyone Would Be Doing It”

Source: Unknown

Page 15: Big Data & the Cloud

The Information Issue Is?

Too many organizations are not using information to its full advantage!

• 1 in 3 business leaders frequently make critical decisions without the information they need
• 1 in 2 business leaders do not have access to the information across their organization needed to do their jobs
• 3 in 4 business leaders say more predictive information would drive better decisions

Source: IBM Institute for Business Value, March 2009

Page 16: Big Data & the Cloud

Business Alignment & Trust

A Recent CIO:INSIGHT Poll of CIOs Found

• 56% of respondents say they feel overwhelmed by the amount of data their enterprise manages
• 33% of respondents want even more sources of data, despite their feelings of being overwhelmed by it
• 62% of respondents say they’re frequently interrupted by irrelevant incoming data
• 43% of respondents say they’re dissatisfied with the current tools they use to filter out irrelevant data
• 46% of respondents say they’ve made inaccurate business decisions as a result of bad or outdated data
• One in three report that they “can’t find the right people with the right data”

Source: “The Big Data Conundrum”, http://www.cioinsight.com/c/a/Storage/The-Big-Data-Conundrum-568229/

Page 17: Big Data & the Cloud

Viewed Another Way…

If a football team had these players on the field:

• Only 4 of the 11 players on the field would know which goal is theirs
• Only 6 of the 11 would care
• Only 3 of the 11 would know what position they play and what they are supposed to do
• 9 players out of 11 would, in some way, be competing against their own team rather than the opponent

Page 18: Big Data & the Cloud

BI Perception Is Complicated & Slow

BI/DW is perceived as not “enabling” the business

• Inhibitor to corporate progress: IT systems cannot be changed fast enough to meet market demands, seize opportunity or comply with a new requirement.
• Weak alignment between IT and business strategy: Marked by an intractable language barrier.
• Business not always sure what information or dimensions they want or need: To answer questions about what to do next
• BI/DW has not been known as a source of innovations

The complexity of systems has caused BI/DW to be reactive rather than proactive

• Silo’d solutions, db’s and applications with trapped business rules
• Multiple sources of information and no single “truth”
• No “Architectural Blueprints” to the enterprise…

Page 19: Big Data & the Cloud

BI & D/W – The “Old Way”

PROCESSES: Data Discovery | DQ / Data Governance | Data Integration | BI & Data Mining

TOOLS: Profiling | Metadata / MDM | Data Modeling & ETL | BI / DW / OLAP

STAGES: Data Chaos | Defined Data | Master Data | Integrated Information | Business Intelligence | D/W KPI’s & Dashboards

Data Chaos
• Same type data is different in diverse systems
• EG: AT&T is the same as AT&T Inc

Defined Data
• Defined common meanings
• EG: Determine the sources, types, and properties of grouped (i.e.: customer) records

Master Data
• Publish and subscribe to master data
• EG: Single view of customer across all information systems

Integrated Information
• Bring metadata together with modeled information for reporting (BI) and warehousing (drilling and hierarchies)

Business Intelligence
• Analyzing the data by looking into history
• Viewing graphs of historical information

D/W KPI’s & Dashboards
• Drilling into information to find and analyze trends
• KPI’s and metrics that offer a glimpse into historical performance
• Exception reporting and alerts

Page 20: Big Data & the Cloud

The “Intelligence” Maturity Model

Page 21: Big Data & the Cloud

Advancing The Maturity Of BI

Page 22: Big Data & the Cloud

The Big Data Method

PROCESSES: Data Discovery, DQ / Data Governance, Analytics Utilizing Data Scientists

TOOLS: Profiling & Matching / DQ, Query Federation, “R”,

STAGES: Data Chaos | Data Analysis | Data Matching | Integrated Information | Data Analytics | Business Performance Optimization

Data Chaos
• Same type data is different in diverse systems
• EG: AT&T is the same as AT&T Inc

Defined Data
• Defined common meanings
• EG: Determine the sources, types, and properties of grouped (i.e.: customer) records

Data Matching
• Profiling of information to determine quality
• Automated analysis to match information

Integrated Information
• Bring metadata together from matching into data stores and sharing with analysis toolsets
• Organizing information for rapid retrieval

Data Analytics
• Using Data Scientists, evaluate data utilizing mathematical algorithms and visualization toolsets

Performance Optimization
• Using analytics, changes to business models are made
• Analysis of models improves the business and optimizes business performance
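The “AT&T is the same as AT&T Inc” example under Data Chaos and the automated matching step can be illustrated with a small sketch. This is only one simple way to do it, assuming that matching means normalizing names and grouping on the result. The suffix list and the function names are hypothetical, and real matching tools use far richer rules.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd"}


def match_key(name: str) -> str:
    """Normalize a company name into a key used for matching."""
    # Lowercase, then turn every run of characters other than letters,
    # digits and '&' into a single space.
    tokens = re.sub(r"[^a-z0-9&]+", " ", name.lower()).split()
    # Drop trailing legal-form suffixes such as "Inc".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_matches(names):
    """Group raw names that share the same match key."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return dict(groups)


records = ["AT&T", "AT&T Inc", "AT&T, Inc.", "Verizon Communications"]
groups = group_matches(records)
print(groups["at&t"])  # ['AT&T', 'AT&T Inc', 'AT&T, Inc.']
```

Grouping on a normalized key is cheap and deterministic, but it will miss misspellings and abbreviations, which is where profiling and fuzzier matching come in.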

Page 23: Big Data & the Cloud

Architectural Solutions & The Cloud

“You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”

Richard Buckminster Fuller

Page 24: Big Data & the Cloud

Big Data Required A Big Change

Consider that 100 GB would store the US Census DB “basic” information set for every living human being on the planet:

• Age, Sex, Income, Ethnicity, Language, Religion, Housing Status, Location packed into a 128-bit set
• That equates to about 6.75 billion rows of about 10 columns

Consider the Large Hadron Collider within the CERN Laboratories

• Expected to produce 150,000 times as much raw data each year

What makes data sets large is repeated observations over time and/or space (temporal or spatial dimensions)

• A web log has Millions [M] of visits over a handful of pages
• A retailer has 100K products and M customers, but Billions of transactions
• Hi-Res scientific data like fMRI: 1K-GB per view

Cardinalities (distinct observations) were usually small relative to the total number of observations

• This was starting to change with the advent of device-supplied information, sensors and other semi-structured and unstructured data sources
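The 100 GB figure can be checked with simple arithmetic. The sketch below assumes one 128-bit row per person and a world population of roughly 6.75 billion. The population figure is an assumption, an approximate value for the late 2000s, and is not stated on the slide.

```python
# Assumptions (not from the slide): one 128-bit row per person and a
# world population of roughly 6.75 billion people.
ROW_BITS = 128
WORLD_POPULATION = 6_750_000_000

row_bytes = ROW_BITS // 8                   # 128 bits = 16 bytes per row
total_bytes = WORLD_POPULATION * row_bytes  # bytes needed for all rows
total_gb = total_bytes / 10**9              # decimal gigabytes

print(row_bytes)  # 16
print(total_gb)   # 108.0
```

Under these assumptions the whole table comes to about 108 GB, which is consistent with the slide's round figure of 100 GB.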

Page 25: Big Data & the Cloud

A Change In Technology Was Needed

Consider that Relational technologies were invented to get data in and organized, not designed nor organized to get it out

• RDBMS’s were designed for efficient transaction processing on large data sets
  • Adding, Updating
  • Searching for & retrieving small amounts of data

Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09

Page 26: Big Data & the Cloud

Data Warehousing Was A “Fix”

DW was classically designed as a “copy of transaction data specifically structured for query and analysis”

• General approach was bulk ETL into a DB designed for queries

Big data caused this “Fix” to break

• “Traditional RDBMS-based dimensional modeling and cube-based OLAP turns out to be too slow or too limited to support asking the really interesting questions of warehoused data”

“To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model”

Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09

Page 27: Big Data & the Cloud

Then Change Came In Technologies…

The advent of cloud and storage costs

• Infrastructure utilization increased dramatically
• Low TCO, and the cost of storage and memory dropped significantly, spawning powerful computing paradigms and appliances

The advent of commodity-based processing in a grid or MPP config

• Usage of existing hardware in a grid paradigm supporting queries across entire datasets
• “Hadoop” & MPP Shared Nothing Architectures
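The shared-nothing idea behind grid and MPP processing can be sketched in a few lines: each node works only on its own partition, and the partial results are merged afterwards. The example below is a single-process illustration of that map-and-reduce pattern using only the Python standard library. It does not use Hadoop or any vendor API, and the partitions and page names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical web-log partitions, one list of requested pages per "node".
partitions = [
    ["/home", "/products", "/home"],
    ["/home", "/checkout"],
    ["/products", "/home", "/products"],
]


def map_partition(pages):
    """Map step: count page visits using only this partition's data."""
    return Counter(pages)


def merge(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


# On a real grid each map call would run on a separate machine; here they
# run one after another in a single process for illustration.
partials = [map_partition(p) for p in partitions]
totals = reduce(merge, partials, Counter())

print(totals["/home"])      # 4
print(totals["/products"])  # 3
print(totals["/checkout"])  # 1
```

Because no map step needs another partition's data, adding nodes adds capacity without shared state, which is what lets a query run across an entire dataset.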

Page 28: Big Data & the Cloud

Technology Solutions Appeared

Massively Parallel Processing

• Teradata, Greenplum, etc.

Grid

• Hadoop, MapReduce, Cassandra, etc.

Columnar

• ParAccel, Vertica, Sybase, Sand Technologies, etc.

Hardware Appliances

• DATAllegro, Netezza, Oracle Exadata, etc.

[Figure: A visualization of a network of Facebook connections, from previous related research by Mucha and others. Credit: Amanda L. Traud, Christina Frost, UNC-Chapel Hill. Source: http://www.physorg.com/news192985912.html]

Page 29: Big Data & the Cloud

Virtualization & The Cloud

Page 30: Big Data & the Cloud

Data Virtualization In The Cloud

Page 31: Big Data & the Cloud

Advances Provided Answers To Silos

“What Areas Do Your Big Data Initiatives Address?”

Source: Forrester® June 2011 Global Big Data Online Survey

Page 32: Big Data & the Cloud

It’s A Brave New World…

“Who Owns Or Drives Your Big Data Initiatives?”

Source: Forrester, June 2011

Other: 2%
Don’t know: 2%
Mostly IT-driven, with minimal business involvement: 12%
Mostly business-driven, with minimal IT involvement: 15%
Business/IT collaboration: 70%

Page 33: Big Data & the Cloud

From The Old Stack To A New Ecosystem

Data integration without pre-processing

• Ability to locate and to query federated sources of data and content without costly data modeling and ETL transformation

Variety of sources (Mergers & Acquisitions, Growth, Services)

• Inability to rapidly add new data sources because of tightly coupled business rules

Need for flexible data structures

• Current structures are rigid and are views of the sources or the business requirements

Incorporation of unstructured data including social media

• Need tools to integrate and analyze unstructured sources that are not currently used

Need to incorporate and utilize metadata

• Metadata is disjointed, confined and incompatible; need a uniform, agile approach

Dynamic information with views for a reason

• Need creation and structuring of views that support dynamic information for purpose

Information management and governance in a regulated world

• Security and entitlement checking integrated with query processing
• Information grants handled through XACML obligations

Page 34: Big Data & the Cloud

The New “Data Fabric” Transformation

Coordinates ingestion of information no matter what the source

Micro-batch takes the place of batch

Tagging replaces transformation

Federated query replaces ETL

Query direction removes the need for optimization of data stores

Purposeful view is the new master data repository
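A rough sketch of how “tagging replaces transformation” and “federated query replaces ETL” might fit together: each source keeps its own field names, a tag map relates those fields to shared terms, and a query visits the sources at request time to assemble a purposeful view. All source names, field names and tags below are hypothetical, and this is an illustration of the idea rather than any particular product's design.

```python
from typing import Dict, List

Record = Dict[str, object]

# Two hypothetical sources that keep their own structures and field names.
crm_source: List[Record] = [
    {"cust": "C1", "name": "Acme", "region": "East"},
    {"cust": "C2", "name": "Globex", "region": "West"},
]
billing_source: List[Record] = [
    {"customer_id": "C1", "amount_due": 120.0},
    {"customer_id": "C2", "amount_due": 0.0},
]

SOURCES = {"crm": crm_source, "billing": billing_source}

# Tags map each source's field names to shared terms. The stored data is
# never rewritten; only the tags are maintained.
TAGS = {
    "crm": {"cust": "customer_id", "name": "customer_name"},
    "billing": {"customer_id": "customer_id", "amount_due": "balance"},
}


def federated_query(customer_id: str) -> Record:
    """Build a view of one customer by querying every source in place."""
    view: Record = {"customer_id": customer_id}
    for source_name, rows in SOURCES.items():
        tags = TAGS[source_name]
        # Find which local field carries the shared customer identifier.
        id_field = next(f for f, term in tags.items() if term == "customer_id")
        for row in rows:
            if row[id_field] == customer_id:
                # Copy only tagged fields, renamed to their shared terms.
                for field, term in tags.items():
                    view[term] = row[field]
    return view


print(federated_query("C1"))
```

Untagged fields such as `region` stay in the source and out of the view, so adding a new source here means adding a tag map rather than building a new load job.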

Page 35: Big Data & the Cloud

Newest Trends In Big Data & The Cloud

Compelling Analytics Provide Extreme ROI

• Data Visualization Technologies
  • Heat, Clouds, Clusters, Flows
• Mixing Structured, Semi and Unstructured Sources
• Self-service analytics - Build your own sandbox!

Big Data Cloud Encircled Warehouses

• Data Virtualization
  • Abstracting the data from the systems
  • Complements existing data warehouses
• Many times the size of structured warehouse
• Provides for rapid analytic iterations

Data visualization is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information"

Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization

Page 36: Big Data & the Cloud

Data Visualization In Practice

Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization

[Figure: “WorldWideWeb Around Wikipedia”, Wikipedia as part of the world wide web. Created by Chris 73, 18 Jul 2004, using TouchGraph GoogleBrowser V1.01]

Page 37: Big Data & the Cloud

A Picture Is Worth A Thousand Words

Source: Greenplum, An EMC Corporation

Page 38: Big Data & the Cloud

Mixing Structured, Semi & Unstructured Sources…

Source: Information Builders

Page 39: Big Data & the Cloud

Big Data Cloud Encircled Warehouses

Source: EMC Corporation

Page 40: Big Data & the Cloud

Case Studies

In the real world, we find out the reasons why Murphy’s Law is so prevalent…

Page 41: Big Data & the Cloud

Telecomm Provider Finds Answers…

Before investing tens of millions in infrastructure, a telecomm firm learned where to invest its money…

Challenge

• 100TB Traditional EDW, Single Source Of Truth
• Operational Reporting & Financial Consolidation
• Heavy Governance And Control
• Unable To Support Critical Business Initiatives
• Customer Loyalty And Churn The #1 Business Initiative From The CEO

Enterprise Big Data Cloud Surrounded Warehouse

• Extracted Data From EDW & Other Sources
• Generated Social Graph From Call Detail And Subscriber Data
• Within 2 Weeks Found “Connected” Subscribers 7X More Likely To Churn Than Average Users
• Now Deploying 1PB Production

Source: Greenplum, an EMC Corporation
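The social-graph step in this case study can be pictured with a toy example. The sketch assumes that call detail records reduce to (caller, callee) pairs and that a “connected” subscriber is one who has exchanged calls with someone who already churned. Both are assumptions for illustration. The records and the churned set are made up, and the sketch does not reproduce the 7X finding.

```python
from collections import defaultdict

# Hypothetical call detail records reduced to (caller, callee) pairs.
calls = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F"), ("F", "G"), ("G", "H")]

# Hypothetical set of subscribers who have already churned.
churned = {"A", "C"}


def build_graph(call_records):
    """Build an undirected social graph: subscriber -> set of contacts."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def connected_to_churner(graph, churned_subscribers):
    """Return active subscribers with at least one churned contact."""
    return {
        subscriber
        for subscriber, contacts in graph.items()
        if subscriber not in churned_subscribers and contacts & churned_subscribers
    }


graph = build_graph(calls)
at_risk = connected_to_churner(graph, churned)
print(sorted(at_risk))  # ['B', 'D']
```

Comparing the later churn rate of such a connected group with that of all other subscribers is the kind of analysis that would produce a ratio like the one quoted on the slide.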

Page 42: Big Data & the Cloud

Questions & Answers

Open Exchange Of Ideas

Speaker Contact Information:

Robert J. Abate

[email protected]

(201) 745-7680

Page 43: Big Data & the Cloud

Curriculum Vitae Of Presenter

Robert J. Abate, CBIP, CDMP

As a hands-on, accomplished Information Technology professional, Mr. Abate offers 30 years of experience in Architectures, Applications, Business Intelligence & Analytics, Infrastructure, and IT strategy. He is credited as one of the first to publish on Services Oriented Architectures (1996), and is a respected IT thought leader within the field. He holds a Bachelor of Science in Electrical Engineering, and is a Certified Business Intelligence Professional and a Certified Data Management Professional in four disciplines. Mr. Abate both chairs and presents at global conferences, is a member of the board of DAMA, and is a respected author. He can frequently be heard giving talks on topics such as “The Convergence Of SOA & BI,” “Best Practices In Enterprise Information Management,” “Making Big Data Analytics Actionable,” and “Data Services & Virtualization.”