© 2011 by The 451 Group. All rights reserved
Matthew Aslett Senior analyst, enterprise software [email protected]
Variety, Velocity and Volume Meeting the performance challenges of Big Data in the enterprise
© 2011 by The 451 Group. All rights reserved
Big Data, Total Data
2
What is it?
Current data management trends
What technologies are involved?
When to use them
The drivers behind emerging technology choices
© 2011 by The 451 Group. All rights reserved © 2011 by The 451 Group. All rights reserved
451 Research is focused on the business of enterprise IT innovation. The company’s analysts provide critical and timely insight into the competitive dynamics of innovation in emerging technology segments.
The 451 Group
Tier1 Research is a single-source research and advisory firm covering the multi-tenant datacenter, hosting, IT and cloud-computing sectors, blending the best of industry and financial research.
The Uptime Institute is ‘The Global Data Center Authority’ and a pioneer in the creation and facilitation of end-user knowledge communities to improve reliability and uninterruptible availability in datacenter facilities.
TheInfoPro is a leading IT advisory and research firm that provides real-world perspectives on the customer and market dynamics of the enterprise information technology landscape, harnessing the collective knowledge and insight of leading IT organizations worldwide.
ChangeWave Research is a research firm that identifies and quantifies ‘change’ in consumer spending behavior, corporate purchasing, and industry, company and technology trends.
© 2011 by The 451 Group. All rights reserved
Coverage areas
Matthew Aslett Senior analyst, enterprise software
With The 451 Group since 2007
www.twitter.com/maslett
Commercial Adoption of Open Source (CAOS) Adoption by enterprises
Adoption by vendors
Information Management Database
Data warehousing
Data caching
4
© 2011 by The 451 Group. All rights reserved
Data Management Trends
© 2011 by The 451 Group. All rights reserved
Current data management trends
6
The volume, variety and velocity of data is growing rapidly
Data processing capabilities have never been better
The value of data has never been better understood
The data deluge problem is also a big data opportunity
RISK OPPORTUNITY
© 2011 by The 451 Group. All rights reserved
What is Big Data?
7
More than just rising data volumes
Big Data ≠ Volume
© 2011 by The 451 Group. All rights reserved
What is Big Data?
8
Also variety of data types/sources and velocity of data updates
Big Data = Volume ± Variety ± Velocity
© 2011 by The 451 Group. All rights reserved
Current data management trends
9
The volume, variety and velocity of data is growing rapidly
Data processing capabilities have never been better
The value of data has never been better understood
‘Big Data’ covers a diverse set of products that can be applied to different problems
‘Big Data’ highlights the problem – volume/variety/velocity,
and promises a solution – value,
but doesn’t provide a path in between
RISK OPPORTUNITY
© 2011 by The 451 Group. All rights reserved
Total Data
© 2011 by The 451 Group. All rights reserved
What is Total Data?
11
Not just another name for Big Data
A concept defined by The 451 Group to describe new approaches to data management – beyond restrictive silos
Reflects the changing data management landscape as pragmatic choices are being made about data storage and analysis techniques
Inspired by ‘Total Football’
© 2011 by The 451 Group. All rights reserved
What is Total Data?
12
Also the desire of the user to store and process all their data
Value = (Volume ± Variety ± Velocity) x Totality
© 2011 by The 451 Group. All rights reserved
What is Total Data?
13
Within tolerable time frames
Value = (Volume ± Variety ± Velocity) x Totality
Time
Data
volume/
variety/
velocity
Rate of query
Value of data
© 2011 by The 451 Group. All rights reserved
What is Total Data?
14
Within tolerable time frames
Value = (Volume ± Variety ± Velocity) x Totality
Time
Total Data is making the most efficient use of existing and new data management resources to deliver value from data
The technologies deployed depend on which factor is most significant to the problem and the nature of the query
© 2011 by The 451 Group. All rights reserved
Technology choices
© 2011 by The 451 Group. All rights reserved
Application stack
Hardware
Database
Users
Application
© 2011 by The 451 Group. All rights reserved
Traditional scalability
Database
Users
Application
Users
Application
Users
Application
Hardware
© 2011 by The 451 Group. All rights reserved
Commodity hardware
Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware
Database
Application Application Application
Users Users Users
© 2011 by The 451 Group. All rights reserved
User explosion
Users Users Users Users Users Users Users Users
Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware
Database
Application Application Application
© 2011 by The 451 Group. All rights reserved
Application scalability
Users Users Users Users Users Users Users Users
Application Application Application Application Application Application
Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware
Database
© 2011 by The 451 Group. All rights reserved
Data management use cases
Database
Operational
Analytic
© 2011 by The 451 Group. All rights reserved
Data management use cases
Data management
real-time transaction
and data ingestion
large scale data storage and analysis
© 2011 by The 451 Group. All rights reserved
Data management requirements
real-time transaction
and data ingestion
large scale data storage and analysis
Data ingestion/analysis • random reads and writes • real-time • low, predictable latency • high performance Data storage/analysis • lower-cost storage • large-scale analytics • read heavy • batch processing
© 2011 by The 451 Group. All rights reserved
Data management requirements
real-time transaction
and data ingestion
large scale data storage and analysis
Data ingestion/analysis • MySQL • Data caching • NoSQL, NewSQL databases • Stream processing Data storage/analysis • Data warehouse/marts • In-database analytics • Online repository • Hadoop
© 2011 by The 451 Group. All rights reserved
Emerging scalability
Users Users Users Users Users Users Users Users
Application Application Application Application Application Application
Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware
Data ingestion/
analysis
Data
storage/analysis
Data ingestion/
analysis
Data storage/a
nalysis
Data ingestion/
analysis
Data storage/a
nalysis
Data ingestion/
analysis
Data
storage/analysis
Data ingestion/
analysis
Data storage/a
nalysis
Data ingestion/
analysis
Data storage/a
nalysis
Data storage/a
nalysis
Data storage/a
nalysis
© 2011 by The 451 Group. All rights reserved
Database SPRAIN
26
Scalability Hardware economics
NoSQL, NewSQL databases, Hadoop, cloud computing
Performance Database limitations
Data caching, in-memory, virtual machine, stream/event processing
Relaxed consistency
CAP Theorem NoSQL databases, cloud computing
Agility Polyglot persistence
Agile development, schema-free, non-relational, in-memory
Intricacy Big data, total data
Non-relational database, NoSQL, Hadoop, in-database analytics, memory storage
Necessity Open source The failure of incumbent vendors to address emerging requirements
© 2011 by The 451 Group. All rights reserved
Emerging scalability
Users Users Users Users Users Users Users Users
Application Application Application Application Application Application
Hardware Hardware Hardware Hardware Hardware Hardware Hardware Hardware
Data ingestion/
analysis
Data
storage/analysis
Data ingestion/
analysis
Data storage/a
nalysis
Data ingestion/
analysis
Data storage/a
nalysis
Data ingestion/
analysis
Data
storage/analysis
Data ingestion/
analysis
Data storage/a
nalysis
Data ingestion/
analysis
Data storage/a
nalysis
Data storage/a
nalysis
Data storage/a
nalysis
Virtual machine
Virtual machine
© 2011 by The 451 Group. All rights reserved
Relevant reports
Total Data
Explaining the total data management approach to dealing with the impact of big data on the data management landscape
Coming late 2011
Including the growing Hadoop ecosystem and real-time
COMING LATE 2011
Gil Tene
CTO, Azul Systems
Enterprise Big Data
Java, Building Blocks and Performance Impacts
©2011 Azul Systems, Inc. 2 2
What Azul does for the Big Data space
• We provide Java runtimes that scale consistently
• We eliminate the Garbage Collection problem
• Our runtimes are elastic in nature
• Our runtimes are an ideal building block for Big Data
©2011 Azul Systems, Inc. 3 3
Many technologies support Big Data
• Operational databases / data warehouses/marts
• Data integration / data virtualization
• Business intelligence tools
• Hadoop / equivalent / alternative technology
• Relational / non-relational databases
• In-memory databases
• Data caching
• Stream / event processing
• In-database analytics
• Disk / memory storage
©2011 Azul Systems, Inc. | Azul Company Confidential 4
Value and infrastructure
©2011 Azul Systems, Inc. 5 5
Value inflection points, Data
Data
Value
Volume, Velocity,
Variety
Value of data
Value = (Volume ± Variety ± Velocity) x Totality
Time
©2011 Azul Systems, Inc. 6 6
Volume / Velocity/ Variety
translation to underlying capacity metrics
• Higher Volume = larger data set sizes, amounts of state
• Higher Velocity = higher processing rate, throughput
• Higher Variety = more metadata, indexes, cross-linking
©2011 Azul Systems, Inc. | Azul Company Confidential 7
Timeliness
©2011 Azul Systems, Inc. 8 8
Inflection points – Timeliness
Data
Value
1/t
Value of data
Value = (Volume ± Variety ± Velocity) x Totality
Time
©2011 Azul Systems, Inc. 9 9
Real world example of timeliness:
Affecting shopping decisions
I went [online] shopping for a camping trip last week
• Spent multi-$100: tent, sleeping bags, other items.
• Researched and shopped around for hours, across
multiple sites
• Relevant ads that showed up within my decision
window affected my shopping
• Ads that showed up a day later did not…
©2011 Azul Systems, Inc. 10 10
Big Data’s value
real-time
transaction
and data
ingestion
large scale
data storage
and analysis
Interactive application
• random reads and writes
• real-time
• low, predictable latency
• high performance
Data storage/analysis
• lower-cost storage
• large-scale analytics
• read heavy
• batch processing
©2011 Azul Systems, Inc. 11 11
Big Data’s value: enhanced by timeliness
real-time
transaction
and data
ingestion
large scale
data storage
and analysis
Interactive application
• random reads and writes
• real-time
• low, predictable latency
• high performance
Data storage/analysis
• lower-cost storage
• large-scale analytics
• read heavy
• batch processing?
Increased
Value
5 seconds
5 minutes
5 hours
1 day
1 week
©2011 Azul Systems, Inc. | Azul Company Confidential 12
Building blocks
©2011 Azul Systems, Inc. 13 13
Choice of Building blocks size
Building block granularity
©2011 Azul Systems, Inc. 14 14
Inflection points – building block size
Data
Value
Building
Block Size
Value of data
Value = (Volume ± Variety ± Velocity) x Totality
Time
©2011 Azul Systems, Inc. 15 15
Building blocks
• There are value inflection points to building block size
─ At some size levels, orders-of-magnitude improvement occur
─ E.g. when entire index fits in each replica/memory space
─ E.g. when partition sizes are big enough
• Typical physical building block
─ 12-24 core, dual-socket servers
─ 24-96 GB DRAM
• Typical process level building block
─ A Java JVM
─ 1 - 4GB of memory, 1-2 cores
• Why the mismatch?
©2011 Azul Systems, Inc. 16 16
Productivity with a scale challenge
• Variety of languages and technologies in Big Data
• Java/JVM is the enterprise default, highly productive
• But…
• Building blocks are practically limited in size
• Main limitation reason: Garbage Collection
• Main Garbage Collection issue: GC pause times
• Imposes practical limits due to stability/responsiveness
• Physical building blocks broken into 10s of tiny pieces
©2011 Azul Systems, Inc. | Azul Company Confidential 17
Critical infrastructure units
©2011 Azul Systems, Inc. 18 18
Critical unit example - HDFS
Source: Apache Hadoop documentation
©2011 Azul Systems, Inc. 19 19
Critical Infrastructure units
Size of critical infrastructure units drive cluster scale
• Central metadata nodes
• Central in-memory index nodes
• Central authoritative data nodes
• Graph-DB and dense relationship nodes
©2011 Azul Systems, Inc. 20 20
Summary: Azul’s Impact on Big Data
Break the building block size barrier
• Each instance can elastically fill an entire physical unit
• Expose/leverage value inflection points
• Remove/Expand cluster scale limitations
• Reduce/Consolidate instance counts
• Address tuning challenges
• Ensure predictable performance
For More Information: Azul Systems Web site: www.azulsystems.com
Technical resources: ./resources
Zing trial: ./trial
451 Group: www.the451group.com
blogs.the451group.com/information_management/
Webinar replay:
www.azulsystems.com/resources/webinars
Top Related