Post on 12-May-2015
“Big Data” and “The Cloud”
Robert J. Abate, CBIP, CDMP
Independent Consultant
Webinar: March 20th, 2012
2PM EST / 11AM PST
© 2012 – Dataversity & Robert J. Abate. Big Data & The Cloud – March 20th, 2012
“Big Data” And “The Cloud” - Agenda
The Industry Is A Buzz…
The Challenges Of Big Data
Architectural Solutions & The Cloud
It’s A Brave New World
Case Studies
Questions & Answers
The Industry Is A Buzz…
“Despite the hype, most firms find the technology useful to operate on data they already have”
Source: Forrester, June 2011
Everyone Is Talking About Big Data…
“Big data will represent a hugely disruptive force during the next five years – enabling levels of insight – that are currently unachievable through any other means”
“Big Data: Huge Management Implications with Enormous Returns”
“Big data is still in mostly uncharted territory, but a surprising number are actually doing something with it”
“61% of respondents feel big data will fundamentally change the way their business works”
“Most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies, potential application areas, and why ‘big data BI’ contrasts with traditional BI tools. It differs dramatically from traditional BI in terms of both capabilities and in the technologies used to achieve those capability breakthroughs”
Gartner: May 2011
Gartner: January 2012
IDC: March 2011
Forrester: June 2011
CIO/Insight: November 2010
What Are The Drivers For Big Data/Cloud
We Are In The Information Age
• Every corporation today is in the “Data Business”
We Are Inundated With Data
• Types
• Sources
• Varieties
Data Is Growing Exponentially
• So are the challenges
Data Complexity Is Increasing
• Causing insight to be lost
Pictorial Representation Of Information
Big Data Is More Than Just Volume
Consider: Master Data, Fidelity, Complexity, Validity, Perishability, Linking Data
Structured Data: POS transactions, call detail records, credit card transactions, shipping updates, purchase orders, payments, shipments, account transactions
Unstructured Data: Web logs, newsfeeds, social media, geo-location, mobile, consumer comments, claims, doctor’s notes, clinical studies, images, video, audio
Device-generated Data: RFID sensors, smart meters, smart grids, GPS spatial, micro-payments
[Figure: four dimensions of big data (Variety, Complexity, Velocity, Volume) shown with example data types: smart grid, images, audio, video, transactional data, text, documents, industry-specific, web traffic, sensor/location-based, social]
Big Data’s Potential Is Limitless
TODAY
Less than 10% of enterprise information
“Rear-view mirror” reporting, dashboards and analysis
• Days, weeks, months, or even quarters old
Incomplete, inaccurate, and disjointed data
Architectures and methods that take 6 to 18 months to exploit
TOMORROW
Vast majority of available sources and external data
Forward-looking or “windshield-view” predictions with recommendations
• Real-time or near-real-time
Correlated, high-confidence, governed data
Vastly accelerated time to market
Time Really Is Money!
“THE TIME VALUE CURVE” © 2007 - Dr. Richard Hackathorn, Bolder Technology, Inc., All Rights Reserved. Used with Permission.
[Figure: data lifecycle curve of value against time. Value is lost as time passes between the business event and the action taken. The action time spans three latencies: capture latency (business event to data ready for analysis), analysis latency (data ready for analysis to information delivered), and decision latency (information delivered to action taken).]
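The latency breakdown on this slide can be worked through numerically. The sketch below is illustrative only: the slide names the three latencies but gives no formula, so the exponential half-life decay, the hour values, and the names `Latencies` and `remaining_value` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Delays between a business event and the action taken, in hours."""
    capture: float   # business event -> data ready for analysis
    analysis: float  # data ready for analysis -> information delivered
    decision: float  # information delivered -> action taken

    @property
    def action_time(self) -> float:
        # Action time is the sum of the three latencies on the curve.
        return self.capture + self.analysis + self.decision


def remaining_value(initial_value: float, hours: float, half_life_hours: float) -> float:
    """Assumed decay model (not from the slide): value halves every half_life_hours."""
    return initial_value * 0.5 ** (hours / half_life_hours)


# Hypothetical latency profiles for a batch pipeline and a near-real-time one.
batch = Latencies(capture=24.0, analysis=48.0, decision=24.0)
near_real_time = Latencies(capture=1.0, analysis=2.0, decision=1.0)

for name, lat in (("batch", batch), ("near real-time", near_real_time)):
    value = remaining_value(100.0, lat.action_time, half_life_hours=48.0)
    print(f"{name}: action time {lat.action_time:.0f} h, value remaining {value:.1f}")
```

Whatever decay shape is assumed, the point carries over: shrinking any one of the three latencies shortens the action time and preserves more of the value of the event.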
Data Is Coming At Us Faster
In A Recent TDWI Survey Of 450 CIOs
• 17% have a real-time data warehouse
• 90% plan on having a real-time warehouse
• 75% will replace their warehouse to get to a real-time solution
Big Data Projects Are Enterprise-Scale
• When asked: “What Is The Scope Of Your Big Data Initiative?”
Other: 5%
Regional: 5%
Project-based: 8%
Departmental: 8%
Line of business: 8%
Enterprise: 65%
Source: Forrester® June 2011 Global Big Data Online Survey
Data Is Coming From All Directions…
Data is now commonly entering the enterprise from external sources
• Government (Census, Revenues, …)
• Nielsen, NPD Group (Sales)
• Bloomberg, NYSE (Financial Position)
• Experian, TransUnion, Equifax (Credit Reporting)
• Google Maps, MapInfo (Geospatial, …)
• Radian6, Biz360, … (Client Trend Data)
• Etc.
Need For “Trust In Data”
Compliance with laws
• Sarbanes-Oxley [SOX], Basel II, HIPAA, etc.
Lack of confidence in the data
• Reports utilizing the same data do not report the same totals or computations
Data not defined and readily available
• Multiple sources of data have to be rationalized at each project start-up, thereby wasting valuable time & $ on every project
Data timeliness
• Manual process to collect, analyze and provide results
Data integrity
• Unknown filters, varying calculations/computations, fields holding data that their names do not indicate, data passed along from one person to another to another…
Summation Of Industry “Buzz”
Business mandate to obtain more value out of the data (get answers)
Variety of sources, amounts, types and granularity of data that customers want to integrate is growing exponentially
Need to shrink the latency between the business event and the data availability for analysis and decision-making
Advancing agility of information is key
Need for data trust and compliance with regulations
The Challenges Of Big Data
“If It Was That Easy, Everyone Would Be Doing It”
Source: Unknown
The Information Issue Is?
Too many organizations are not using information to its full advantage!
• 1 in 3 business leaders frequently make critical decisions without the information they need
• 1 in 2 business leaders do not have access to the information across their organization needed to do their jobs
• 3 in 4 business leaders say more predictive information would drive better decisions
Source: IBM Institute for Business Value, March 2009
Business Alignment & Trust
A Recent CIO Insight Poll Of CIOs Found
• 56% of respondents say they feel overwhelmed by the amount of data their enterprise manages
• 33% of respondents want even more sources of data, despite their feelings of being overwhelmed by it
• 62% of respondents say they’re frequently interrupted by irrelevant incoming data
• 43% of respondents say they’re dissatisfied with the current tools they use to filter out irrelevant data
• 46% of respondents say they’ve made inaccurate business decisions as a result of bad or outdated data
• One in three report that they “can’t find the right people with the right data”
Source: “The Big Data Conundrum”, http://www.cioinsight.com/c/a/Storage/The-Big-Data-Conundrum-568229/
Viewed Another Way…
If a football team had these players on the field:
• Only 4 of the 11 players on the field would know which goal is theirs
• Only 6 of the 11 would care
• Only 3 of the 11 would know what position they play and what they are supposed to do
• 9 players out of 11 would, in some way, be competing against their own team rather than the opponent
BI Perception Is Complicated & Slow
BI/DW is perceived as not “enabling” the business
• Inhibitor to corporate progress: IT systems cannot be changed fast enough to meet market demands, seize opportunity or comply with a new requirement
• Weak alignment between IT and business strategy: marked by an intractable language barrier
• Business not always sure what information or dimensions they want or need to answer questions about what to do next
• BI/DW has not been known as a source of innovation
The complexity of systems has caused BI/DW to be reactive rather than proactive
• Siloed solutions, databases and applications with trapped business rules
• Multiple sources of information and no single “truth”
• No “Architectural Blueprints” to the enterprise…
BI & D/W – The “Old Way”
PROCESSES: Data Discovery | DQ / Data Governance | Data Integration | BI & Data Mining
TOOLS: Profiling | Metadata / MDM | Data Modeling & ETL | BI / DW / OLAP
STAGES: Data Chaos | Defined Data | Master Data | Integrated Information | Business Intelligence | D/W KPIs & Dashboards
Data Chaos
• Same type of data is different in diverse systems
• E.g.: AT&T is the same as AT&T Inc
Defined Data
• Defined common meanings
• E.g.: Determine the sources, types, and properties of grouped (e.g., customer) records
Master Data
• Publish and subscribe to master data
• E.g.: Single view of customer across all information systems
Integrated Information
• Bring metadata together with modeled information for reporting (BI) and warehousing (drilling and hierarchies)
Business Intelligence
• Analyzing the data by looking into history
• Viewing graphs of historical information
D/W KPIs & Dashboards
• Drilling into information to find and analyze trends
• KPIs and metrics that offer a glimpse into historical performance
• Exception reporting and alerts
The “Intelligence” Maturity Model
Advancing The Maturity Of BI
The Big Data Method
PROCESSES: Data Discovery | DQ / Data Governance | Analytics Utilizing Data Scientists
TOOLS: Profiling & Matching / DQ | Query Federation | “R”,
STAGES: Data Chaos | Data Analysis | Data Matching | Integrated Information | Data Analytics | Business Performance Optimization
Data Chaos
• Same type of data is different in diverse systems
• E.g.: AT&T is the same as AT&T Inc
Defined Data
• Defined common meanings
• E.g.: Determine the sources, types, and properties of grouped (e.g., customer) records
Data Matching
• Profiling of information to determine quality
• Automated analysis to match information
Integrated Information
• Bring metadata together from matching into data stores and share it with analysis toolsets
• Organizing information for rapid retrieval
Data Analytics
• Using Data Scientists, evaluate data utilizing mathematical algorithms and visualization toolsets
Performance Optimization
• Using analytics, changes to business models are made
• Analysis of models improves the business and optimizes business performance
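The "Data Matching" step, where automated analysis decides that records such as AT&T and AT&T Inc refer to the same entity, can be sketched in a few lines. This is a minimal illustration using Python's standard `difflib`, not the method or tooling the presenter used. The suffix list, the 0.9 threshold, and the function names `normalize` and `is_match` are assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "llc", "ltd"}


def normalize(name: str) -> str:
    """Lower-case, strip punctuation (keeping '&'), and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9& ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def is_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two names as the same entity when their normalized forms are similar."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


print(is_match("AT&T", "AT&T Inc"))   # both normalize to "at&t"
print(is_match("AT&T", "Verizon"))    # no characters in common after normalizing
```

Production matching typically adds address, identifier, and phonetic comparisons, but the same normalize-then-score shape applies.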
Architectural Solutions & The Cloud
“You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”
Richard Buckminster Fuller
Big Data Required A Big Change
Consider: 100 GB would store the US Census “basic” information set for every living human being on the planet
• Age, Sex, Income, Ethnicity, Language, Religion, Housing Status, and Location packed into a 128-bit set
• That equates to about 6.75 billion rows of about 10 columns
Consider the Large Hadron Collider at the CERN laboratories
• Expected to produce 150,000 times as much raw data each year
What makes data sets large is repeated observations over time and space (temporal or spatial dimensions)
• A web log has millions [M] of visits over a handful of pages
• A retailer has 100K products and millions of customers, but billions of transactions
• Hi-res scientific data such as fMRI: 1K-GB per view
Cardinalities (distinct observations) were usually small relative to the total number of observations
• This was starting to change with the advent of device-supplied information, sensors, and other semi-structured and unstructured data sources
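The storage figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes one 128-bit row per living person and a world population of roughly 6.75 billion. The population figure and the use of decimal units (1 GB = 10^9 bytes) are assumptions for the example.

```python
# Assumption: one row per living person, world population of about 6.75 billion.
WORLD_POPULATION = 6_750_000_000
BITS_PER_PERSON = 128  # from the slide: the basic attributes packed into 128 bits

bytes_per_person = BITS_PER_PERSON // 8            # 16 bytes per row
census_bytes = WORLD_POPULATION * bytes_per_person
census_gb = census_bytes / 10**9                   # decimal gigabytes

# From the slide: the LHC was expected to produce 150,000 times as much per year.
LHC_FACTOR = 150_000
lhc_pb_per_year = census_bytes * LHC_FACTOR / 10**15  # decimal petabytes

print(f"Census-style table: {census_gb:.0f} GB")
print(f"LHC raw data per year: {lhc_pb_per_year:.1f} PB")
```

Under these assumptions the table comes to about 108 GB, consistent with the slide's "100 GB", and the collider figure lands in the range of tens of petabytes per year.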
A Change In Technology Was Needed
Consider that relational technologies were invented to get data in and organized, not designed nor organized to get it out
• RDBMSs were designed for efficient transaction processing on large data sets
  • Adding, updating
  • Searching for & retrieving small amounts of data
Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
Data Warehousing Was A “Fix”
DW was classically designed as a “copy of transaction data specifically structured for query and analysis”
• General approach was bulk ETL into a DB designed for queries
Big data caused this “Fix” to break
• “Traditional RDBMS-based dimensional modeling and cube-based OLAP turns out to be too slow or too limited to support asking the really interesting questions of warehoused data”
• “To achieve acceptable performance for highly order-dependent queries on truly large data, one must be willing to consider abandoning the purely relational database model”
Source: ACM Website “The Pathologies of Big Data”, Adam Jacobs, 7/6/09
Then Change Came In Technologies…
The advent of cloud and falling storage costs
• Infrastructure utilization increased dramatically
• Low TCO: the cost of storage and memory dropped significantly, spawning powerful computing paradigms and appliances
The advent of commodity-based processing in a grid or MPP configuration
• Usage of existing hardware in a grid paradigm supporting queries across entire datasets
• “Hadoop” & MPP shared-nothing architectures
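The shared-nothing idea behind grid and MPP processing can be illustrated without any cluster software. The sketch below is plain Python, not the Hadoop API: each "node" owns its own slice of the data, computes a partial answer independently, and only the small partial results are merged. The function names and the sample web-log records are hypothetical, and the map phase runs sequentially here where a real grid would run it in parallel.

```python
from collections import Counter
from functools import reduce


def partition(records, n_nodes):
    """Split records round-robin so each 'node' owns its own slice (shares nothing)."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def map_phase(node_records):
    """Runs independently on each node: count visits per page in the local slice."""
    return Counter(page for page, _visitor in node_records)


def reduce_phase(partials):
    """Merge the per-node partial counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical web-log records: (page, visitor).
log = [("/home", "u1"), ("/cart", "u2"), ("/home", "u3"), ("/home", "u2"), ("/cart", "u1")]

partials = [map_phase(p) for p in partition(log, n_nodes=3)]
totals = reduce_phase(partials)
print(totals)
```

Because no node needs another node's data to do its share of the work, adding commodity machines scales a query across the entire dataset.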
Technology Solutions Appeared
Massively Parallel Processing
• Teradata, Greenplum, etc.
Grid
• Hadoop, MapReduce, Cassandra, etc.
Columnar
• ParAccel, Vertica, Sybase, Sand Technologies, etc.
Hardware Appliances
• DATAllegro, Netezza, Oracle Exadata, etc.
[Image: A visualization of a network of Facebook connections, from previous related research by Mucha and others. Credit: Amanda L. Traud, Christina Frost, UNC-Chapel Hill. Source: http://www.physorg.com/news192985912.html]
Virtualization & The Cloud
Data Virtualization In The Cloud
Advances Provided Answers To Silos
“What Areas Do Your Big Data Initiatives Address?”
Source: Forrester® June 2011 Global Big Data Online Survey
It’s A Brave New World…
“Who Owns Or Drives Your Big Data Initiatives?”
Other: 2%
Don’t know: 2%
Mostly IT-driven, with minimal business involvement: 12%
Mostly business-driven, with minimal IT involvement: 15%
Business/IT collaboration: 70%
Source: Forrester, June 2011
From The Old Stack To A New Ecosystem
Data integration without pre-processing
• Ability to locate and to query federated sources of data and content without costly data modeling and ETL transformation
Variety of sources (Mergers & Acquisitions, Growth, Services)
• Inability to rapidly add new data sources because of tightly coupled business rules
Need for flexible data structures
• Current structures are rigid and are views of the sources or the business requirements
Incorporation of unstructured data including social media
• Need tools to integrate and analyze unstructured sources that are not currently used
Need to incorporate and utilize metadata
• Metadata is disjointed, confined and incompatible; a uniform, agile approach is needed
Dynamic information with views for a reason
• Need creation and structuring of views that support dynamic information for a purpose
Information management and governance in a regulated world
• Security and entitlement checking integrated with query processing
• Information grants handled through XACML obligations
The New “Data Fabric” Transformation
Coordinates ingestion of information no matter what the source
Micro-batch takes the place of batch
Tagging replaces transformation
Federated query replaces ETL
Query direction removes the need for optimization of data stores
Purposeful view is the new master data repository
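"Federated query replaces ETL" means the data stays in its source systems and is combined only when a question is asked. The sketch below illustrates that idea with two independent in-memory SQLite databases standing in for source systems. The table names, sample rows, and the helper functions `make_source` and `revenue_by_customer` are hypothetical, and a real data fabric would push this work into a federation or virtualization layer instead of application code.

```python
import sqlite3


def make_source(rows, table, columns):
    """Create an independent in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


# Two hypothetical sources that stay where they are; nothing is copied into a warehouse.
crm = make_source([(1, "Acme"), (2, "Globex")], "customers", ["id", "name"])
billing = make_source(
    [(1, 120.0), (1, 80.0), (2, 50.0)], "invoices", ["customer_id", "amount"]
)


def revenue_by_customer():
    """Federated query: ask each source for its part, then join at query time."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"
    )
    return {names[customer_id]: total for customer_id, total in totals}


print(revenue_by_customer())
```

The function returned here plays the role of a "purposeful view": it is defined for one question and assembled on demand, with no upfront modeling of a shared store.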
Newest Trends In Big Data & The Cloud
Compelling Analytics Provide Extreme ROI
• Data Visualization Technologies
  • Heat, Clouds, Clusters, Flows
• Mixing Structured, Semi-structured and Unstructured Sources
• Self-service analytics - Build your own sandbox!
Big Data Cloud Encircled Warehouses
• Data Virtualization
  • Abstracting the data from the systems
  • Complements existing data warehouses
• Many times the size of structured warehouse
• Provides for rapid analytic iterations
Data visualization is the study of the visual representation of data, meaning "information that has been abstracted in some schematic form, including attributes or variables for the units of information"
Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization
Data Visualization In Practice
Source: Wikipedia - http://en.wikipedia.org/wiki/Data_visualization
[Image: WorldWideWebAroundWikipedia - Wikipedia as part of the World Wide Web. Created by Chris 73, 09:56, 18 Jul 2004 (UTC), using TouchGraph GoogleBrowser V1.01]
A Picture Is Worth A Thousand Words
Source: Greenplum, An EMC Corporation
Mixing Structured, Semi & Unstructured Sources…
Source: Information Builders
Big Data Cloud Encircled Warehouses
Source: EMC Corporation
Case Studies
In the real world, we find out the reasons why Murphy’s Law is so prevalent…
Telecom Provider Finds Answers…
Before investing tens of millions in infrastructure, a telecom firm learned where to invest its money…
Challenge
• 100TB Traditional EDW, Single Source Of Truth
• Operational Reporting & Financial Consolidation
• Heavy Governance And Control
• Unable To Support Critical Business Initiatives
• Customer Loyalty And Churn The #1 Business Initiative From The CEO
Enterprise Big Data Cloud Surrounded Warehouse
• Extracted Data From EDW & Other Sources
• Generated Social Graph From Call Detail And Subscriber Data
• Within 2 Weeks Found “Connected” Subscribers 7X More Likely To Churn Than Average Users
• Now Deploying 1PB Production
Source: Greenplum, an EMC Corporation
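The social-graph step in this case study can be sketched on toy data. The example below is not Greenplum's or the telecom firm's actual method. It only shows the general shape of building an undirected graph from call detail records and flagging active subscribers who are connected to someone who has already churned. The records, the churned set, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical call detail records (caller, callee) and already-churned subscribers.
calls = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f"), ("g", "h"), ("a", "c")]
churned = {"b", "d"}


def build_graph(call_records):
    """Build an undirected social graph: an edge links any two subscribers who called."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def at_risk_subscribers(graph, churned_set):
    """Active subscribers with at least one churned neighbour in the graph."""
    return {
        subscriber
        for subscriber, neighbours in graph.items()
        if subscriber not in churned_set and neighbours & churned_set
    }


graph = build_graph(calls)
print(sorted(at_risk_subscribers(graph, churned)))
```

On real call volumes the graph would be built and traversed inside the parallel platform, and the flagged group's churn rate would then be compared against the average subscriber's.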
Questions & Answers
Open Exchange Of Ideas
Speaker Contact Information:
Robert J. Abate
r.j.abate@att.net
(201) 745-7680
Curriculum Vitae Of Presenter
Robert J. Abate, CBIP, CDMP
As a hands-on, accomplished Information Technology professional, Mr. Abate offers 30 years of experience in Architectures, Applications, Business Intelligence & Analytics, Infrastructure, and IT strategy. He is credited as one of the first to publish on Services Oriented Architectures (1996) and is a respected IT thought leader within the field. He holds a Bachelor of Science in Electrical Engineering, and is a Certified Business Intelligence Professional and a Certified Data Management Professional in four disciplines. Mr. Abate both chairs and presents at global conferences, is a member of the board of DAMA, and is a respected author. He can frequently be heard giving talks on topics such as “The Convergence Of SOA & BI,” “Best Practices In Enterprise Information Management,” “Making Big Data Analytics Actionable,” and “Data Services & Virtualization.”