Trends and Directions for Database Technology
Curt Cotner, IBM Fellow
Session Code: G018
10/05/2012
Please Note:
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Acknowledgements and Disclaimers:
© Copyright IBM Corporation 2011. All rights reserved.
– U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo, ibm.com, Infosphere Warehouse and SAS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Other company, product, or service names may be trademarks or service marks of others.
Availability. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates.
The workshops, sessions and materials have been prepared by IBM or the session speakers and reflect their own views. They are provided for informational purposes only, and are neither intended to, nor shall have the effect of being, legal or other guidance or advice to any participant. While efforts were made to verify the completeness and accuracy of the information contained in this presentation, it is provided AS-IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this presentation or any other materials. Nothing contained in this presentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
DBMS is the Bedrock of Modern Business
• Mature
• Performance
• Available
• Reliable
• Consistent
• Durable
New Technology Emerges
• XML Databases
• In-memory Databases
• Object Databases
• NoSQL Databases
Timeline: 1990s, 2000s, 2010s
NoSQL Datastores
Current NoSQL Landscape
• Document Stores: 12+
• Graph Stores: 11+
• Key Value Stores: 23+
• Tabular Stores: 6+
• XML Stores: 9+
• Object Stores: 12+
• Others
Currently there are more than 100 NoSQL systems. It is not clear which of these will survive.
Why did NoSQL Datastores Arise?
• Some applications want extremely rapid development iterations
– so rapid that they cannot afford to negotiate schema changes with a DBA
• Some application groups aren’t comfortable with SQL and don’t want to get involved in learning relational database technology
• Need for a simple, low-latency, low-overhead API to access data that scales to 1000’s of Web servers (e.g. cached Web data)
• Need to scale out on cheap commodity nodes with locally attached SATA disks
• Increasing use of distributed analytics
NoSQL Datastores: Two Worlds
Transactional
• Custom high-end OLTP for financial applications
• Scaleout datastores for Cloud/Web 2.0
• Examples
– MemcacheDB, Cassandra, Dynamo, Voldemort, SimpleDB, GigaSpaces, WebSphere eXtreme Scale
Analytics
• Managing updates
• Support for random access and indexing
• Scaleout content store
• Examples
– Bigtable, HBase, Hypertable
Focus on
• Commodity servers, networking, disks
• Easy elasticity and scalability to multiple racks (10s to 100s of servers)
• Fault-tolerance and high availability
Give up
• Relational data model
• SQL APIs
• Complex queries (joins, secondary indexes, ACID transactions)
DB2 is already providing XML support
Applications include creating business reports, SOA, Web services, forms, etc.
DB2 is making investments to support Key Value
Data Store
Operations:
• Get(Key), which returns the Value
• Put(Key, Value)
• Remove(Key)
Often used to cache data and objects for Web 2.0 applications
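The three operations on this slide can be sketched in a few lines. This is a minimal in-memory illustration in Python, not a DB2 interface: the class name `KeyValueStore`, its method names, and the sample key are all assumptions made for the example.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value data store."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Insert a new value or overwrite the existing one for this key.
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Return the stored value, or None when the key is absent.
        return self._data.get(key)

    def remove(self, key: str) -> None:
        # Removing a missing key is treated as a no-op.
        self._data.pop(key, None)


# Typical cache usage: store a session object, read it back, then evict it.
cache = KeyValueStore()
cache.put("session:42", {"user": "alice"})
session = cache.get("session:42")
cache.remove("session:42")
```

The API has no query language and no schema, which is what makes it attractive as a low-overhead cache in front of Web applications.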
NoSQL Graph Store
• Easy database design
• Schema not pre-defined
• Easy adaptation as needs evolve
Example (subject, predicate, object):
Curt Cotner, ownsCar, 1995 Ford
Curt Cotner, ownsHouse, 123 Maple Ave, Chicago
Curt Cotner, ownsBoat, 2001 Thunderjet
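The ownership facts on this slide read naturally as subject-predicate-object triples. The sketch below is a plain-Python illustration of that idea, not Jena or DB2 code; the `match` helper and the `ownsBike` fact are hypothetical and exist only to show that a new kind of fact needs no schema change.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The three facts from the slide.
triples: List[Triple] = [
    ("Curt Cotner", "ownsCar", "1995 Ford"),
    ("Curt Cotner", "ownsHouse", "123 Maple Ave, Chicago"),
    ("Curt Cotner", "ownsBoat", "2001 Thunderjet"),
]


def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# A new kind of relationship is just another triple (hypothetical fact).
triples.append(("Curt Cotner", "ownsBike", "road bike"))

boats = match(predicate="ownsBoat")
everything_owned = match(subject="Curt Cotner")
```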
NoSQL Graph Store based on DB2
• Open source Jena code ported to use DB2
• Uses DB2 logging, indexing, compression, etc.
• Makes use of DB2 high availability features
• Can scale out with DB2 pureScale or Parallel Sysplex
• Used in measurements with Rational Jazz
• DB2 graph store outperforms open source 4:1
• DB2 eliminates the scalability and availability limitations found with the open source solution
• Will be provided to all DB2 and Informix customers at no added charge in 1H2012
Cloud Computing: Hottest Topic in the Industry…
IBM Workload Deployer – What is it?
• Secure, self-service cloud management hardware appliance for management of shared application deployments
• Pre-optimized for high performance and scalability with pre-configured workload patterns for ease of use
• Unmatched IBM Middleware management (apply maintenance, federate cells, etc. - not black box)
– Can also manage black-box images to support other products
• Enables consistent & repeatable deployment of application environments based on patterns (Virtual Systems and Virtual Applications)
• Dispenses hardened middleware patterns into a pool/cloud of virtualized hardware running a supported hypervisor, e.g. VMware ESX, z/VM, or PowerVM
– “Bring your own cloud”
• Integrates with existing infrastructure management tools through programmable REST APIs
• License management provides ability to set license thresholds per product to maintain cloud-wide compliance
• Elasticity of application environments through support for addition of virtual images to dispensed patterns
• Fine grained control of deployments with IP address mapping and naming details with deployed patterns
Figure labels: X, P, Z
DB Multi-tenancy: Sharing the Costs of Hardware Resources and Maintenance
• Multitenancy can further reduce hardware costs and maintenance costs of a database in the cloud
• Multitenancy: multiple companies or users using the same software with a level of isolation
– Tenants are companies or users that would have historically installed and used a single instance of software solely for their own use
– Multitenancy allows companies/users to use the same software with a level of isolation
• Analogous to users running various applications on the same operating system
– The point is to share the management and hardware costs among a number of “tenants”
– Tenants, like the distinct users on an operating system, require a level of isolation
Figure: size of tenants plotted against number of tenants, showing large tenants, medium tenants, and a long tail of small tenants.
• Large tenants: isolation by databases; shared: NA/hardware; MT application or non-MT application
• Medium tenants: isolation by tables; shared: database; MT application
• Small tenants: isolation by rows; shared: tables; MT application
Database Multi-Tenancy Models
Figure labels: Tenant A, Tenant B, App Server, Multi-tenant App
Axis labels: higher multi-tenancy, better resource utilization; higher query optimization/runtime complexity, higher security worries
Separate Instances/DBs
Separate Schemas/Tenants
• Tenants w/ different schemas (e.g. customization)
• Larger catalog footprint
• Table statistics
• Tenant-specific backup/restore
Shared Tables
• Tenants w/ same schema
• Smallest metadata
• Difficult tenant-specific optimizations and tooling
• FGAC/Row permissions for security
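As a rough illustration of the shared-tables model, the sketch below keeps all tenants in one table and isolates them by a tenant key on every row. It uses Python's built-in `sqlite3` module purely for demonstration; the `orders` table, its columns, and the `orders_for` helper are assumptions, and the application-side `WHERE` filter only imitates what database-enforced row permissions would do.

```python
import sqlite3

# One shared table for all tenants; tenant_id marks the owner of each row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (tenant_id TEXT NOT NULL, order_id INTEGER, item TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("A", 1, "widget"), ("A", 2, "gadget"), ("B", 1, "sprocket")],
)


def orders_for(tenant_id: str):
    """Return only the rows that belong to the given tenant."""
    cursor = conn.execute(
        "SELECT order_id, item FROM orders WHERE tenant_id = ? ORDER BY order_id",
        (tenant_id,),
    )
    return cursor.fetchall()


tenant_a_orders = orders_for("A")
tenant_b_orders = orders_for("B")
```

Because every query must carry the tenant filter, a mistake in application code can leak rows across tenants, which is one reason the slide pairs this model with row permissions enforced by the database.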
Shared Development Scenario
Shared objects and/or private objects
• Most objects could be shared across multiple tenants.
• Tenants w/ different schemas (e.g. customization) can have private objects.
• Greatly reduces cases where you need to create a new instance/subsystem to keep changes to shared objects from impacting other users.
Figure labels: Tenant A, Tenant B, Multi-tenant App, App Server, Shared Objects
Shared Data Test Scenario
Figure labels: Tenant A updates, Tenant B updates, App Server, Multi-tenant App, Shared Tables, Shared rows for all tenants
• Tenants with same schema
• Most rows are shared across tenants
• Updates are held off to the side in tenant-keyed rows visible only to the tenant.
• Huge disk savings for SAP customers, since they often have 6-15 copies of production used for various types of development, testing, and training.
• Should be easy/fast to destroy tenant updates for repetitive testing scenarios.
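One way to picture this scenario is as an overlay: reads check the tenant's private rows first and fall back to the shared rows, and discarding a tenant's changes is just dropping its overlay. The Python sketch below is an assumption-laden illustration of that idea, not how DB2 implements it; `shared_rows`, `tenant_rows`, and the three helper functions are hypothetical names with made-up sample data.

```python
from typing import Dict, Optional, Tuple

# Rows visible to every tenant (hypothetical sample data).
shared_rows: Dict[int, str] = {1: "Alice", 2: "Bob"}

# Tenant-keyed overlay: (tenant, row key) -> that tenant's private version.
tenant_rows: Dict[Tuple[str, int], str] = {}


def update(tenant: str, key: int, value: str) -> None:
    # Updates never touch the shared rows; they are held off to the side.
    tenant_rows[(tenant, key)] = value


def read(tenant: str, key: int) -> Optional[str]:
    # A tenant sees its own version if one exists, else the shared row.
    if (tenant, key) in tenant_rows:
        return tenant_rows[(tenant, key)]
    return shared_rows.get(key)


def reset(tenant: str) -> None:
    # Destroying a tenant's updates restores the shared view for that tenant.
    for entry in [k for k in tenant_rows if k[0] == tenant]:
        del tenant_rows[entry]


update("A", 1, "Alicia")
print(read("A", 1))  # Alicia (tenant A's private version)
print(read("B", 1))  # Alice (tenant B still sees the shared row)
reset("A")
print(read("A", 1))  # Alice (tenant A's update has been discarded)
```

Only changed rows consume extra space per tenant, which is where the disk savings for multiple test copies would come from.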
Traditional Systems Landscape
Figure: Applications on top of a chain of systems linked by ETL steps: OLTP, Staging Area, ODS, EDW, Data Marts
Historical reasons:
• Different access patterns
– impact on performance
• EDW as the data integration hub
– again, impact on performance
• Different life-cycle characteristics
– and again, impact on performance
• Different Service Level Agreements (SLA)
– Lack of broadly available workload management capabilities
– Choice of lower cost-of-acquisition offerings
Negative ramifications:
• Complexity
– both in systems management and in applications
• Difficulties in supporting real time analytics
• Inability to match ever more demanding SLA requirements
• High total cost of ownership
Visionary Systems Landscape
Figure: Applications on top of OLTP, Staging Area, ODS, EDW, and Data Marts, linked by ELT steps
Benefits
• Consolidating all the components into a single system
• Uniform access to any data
• Efficient data movement within the system (ideally, no network)
• Opportunity to remove, i.e. consolidate, some of the layers
Challenges
• Mixed workload management capabilities
• Ensuring continuous availability, security and reliability
• Providing universal processing capabilities to deliver best performance for both transactional and analytical workloads without the need for excessive tuning
Approaches
• Columnar stores
• In-memory databases
• Hardware acceleration, special purpose processors
• Appliances
Columnar Data Store Model
• Transactional database engines typically use a row-oriented data store model
• Query engines which are optimized for analytical queries sometimes use a column-oriented approach.
• In a columnar store, the data of a specific column is stored sequentially
• If attributes are not required for a specific query execution, they can simply be skipped, causing no I/O or decompression overhead.
Figure: each column (Col1, Col2, Col3, …, ColN) holds its values for row1 through rowN in sequence. Multiple storage blocks store data exclusively for this column.
Advantages:
• Scan only the columns required. Large reduction in I/O
• High compression rates
• Good CPU cache locality while processing column data
Challenges:
• Can be expensive to combine the qualifying columns into answer set (projection).
• Random I/O created during projections can eliminate benefits of I/O reduction
• Additional storage required so that values across vertical slices can be merged
• Multiple I/Os per record for all write operations (INSERT, UPDATE, DELETE).
Approaches:
• In-memory
• SSD
• Multi-core friendly scan patterns
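The trade-off described on this slide can be seen in a toy comparison of the two layouts. The Python sketch below is only an illustration with made-up data; the column names and the `project` and `insert` helpers are assumptions, and in-memory lists stand in for storage blocks.

```python
from typing import Dict, List, Tuple

# Row-oriented layout: each record is stored together.
rows: List[Tuple[int, str, float]] = [
    (1, "east", 100.0),
    (2, "west", 250.0),
    (3, "east", 75.0),
]

# Column-oriented layout: all values of one column are stored together.
columns: Dict[str, list] = {
    "id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Analytical query: the column store touches only the "amount" values,
# while the row store has to walk every full record.
total_from_rows = sum(record[2] for record in rows)
total_from_columns = sum(columns["amount"])


def project(position: int) -> Tuple[int, str, float]:
    """Rebuild one record by visiting every column (the projection cost)."""
    return (
        columns["id"][position],
        columns["region"][position],
        columns["amount"][position],
    )


def insert(record: Tuple[int, str, float]) -> None:
    """A single-record write touches every column, one append per column."""
    for name, value in zip(("id", "region", "amount"), record):
        columns[name].append(value)


insert((4, "north", 20.0))
rebuilt = project(3)
```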
Memory Hierarchy
Access cost in cycles, with a distance analogy in miles:
• CPU: This room
• L1: 1 - 2 (Palace Park)
• L2/3: 6 - 20 (Berget)
• DRAM: 100 - 400 (Stuttgart)
• SSD: 5000 (Chicago)
• HDD: 1000000 (2 times to the Moon and back)
Challenges:
• Disk storage is realistically unavoidable for a fast and reliable recovery
– Logging
– Backup
• Database growth and capacity planning challenges
– Non-deterministic compression rates
– Cost
Approaches:
• Enhancing disk-based DBMS with in-memory capabilities
– Optimizer and run-time awareness
– Storage Class Memory
Storage Class Memory
• Need to close the gap between DRAM and HDD
• HDD growth focus has always been areal density
• Access latency improved relatively modestly
– 10% vs. 45% CAGR for chip performance
• Goal: create compact, robust storage (and memory) systems with greatly improved cost/performance ratios
• Defining characteristics
– nonvolatility
– solid-state implementation (no moving parts)
– very low latencies (tens to hundreds of ns)
– low cost per bit
– physical durability during practical use
Projected 2020 characteristics of SCM devices
Thumb Drive
Capacity: 1 TB
Read or write access time: 100 ns
Data rate: > 1 GB/s
Sustained I/O rate: 238K SIO/s
Sustained bandwidth: 975 MB/s
Write endurance: 10^12 writes
Recommended reading: IBM Journal Research & Development, Vol 52 No 4/5
IBM DB2 Analytics Accelerator: Deep DB2 Integration
Figure: Applications reach DB2 through application interfaces (standard SQL dialects); DBA tools, z/OS console, ... reach it through operation interfaces (e.g. DB2 commands). Components shown: Data Manager, Buffer Manager, IRLM, Log Manager, IDAA, ...
• DB2 (z/OS on System z, 10's of processors, 100's GB of memory): superior availability, reliability, security, workload management, OLTP performance ...
• Netezza: industry leading DW performance, ease of use
IDAA: Query Execution
Figure components:
• Application, Application Interface
• DB2 for z/OS: Optimizer; IDAA DRDA Requestor; query execution run-time for queries that cannot be or should not be off-loaded to IDAA
• IDAA: SMP Host and four SPUs, each with CPU, FPGA, and Memory
IDAA: Beta Program Results
Figure: Query Acceleration. Chart of Sec(s) (axis 0 to 10,000) for Query 1 through Query 9.
Query    Total Rows Reviewed  Total Rows Returned  DB2 Only Hours  DB2 Only Sec(s)  DB2 with IDAA Hours  DB2 with IDAA Sec(s)  Times Faster
Query 1  2,813,571            853,320              2:39            9,540            0.0                  5                     1,908
Query 2  2,813,571            585,780              2:16            8,220            0.0                  5                     1,644
Query 3  8,260,214            274                  1:16            4,560            0.0                  6                     760
Query 4  2,813,571            601,197              1:08            4,080            0.0                  5                     816
Query 5  3,422,765            508                  0:57            4,080            0.0                  70                    58
Query 6  4,290,648            165                  0:53            3,180            0.0                  6                     530
Query 7  361,521              58,236               0:51            3,120            0.0                  4                     780
Query 8  3,425.29             724                  0:44            2,640            0.0                  2                     1,320
Query 9  4,130,107            137                  0:42            2,520            0.1                  193                   13
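The "Times Faster" figures appear to be the DB2-only elapsed seconds divided by the seconds with IDAA, rounded to a whole number. The short Python check below recomputes them from the Sec(s) values as printed on the slide; the `timings` dictionary and `times_faster` function are names chosen for this example.

```python
# (DB2 only seconds, DB2 with IDAA seconds), as printed on the slide.
timings = {
    "Query 1": (9540, 5),
    "Query 2": (8220, 5),
    "Query 3": (4560, 6),
    "Query 4": (4080, 5),
    "Query 5": (4080, 70),
    "Query 6": (3180, 6),
    "Query 7": (3120, 4),
    "Query 8": (2640, 2),
    "Query 9": (2520, 193),
}


def times_faster(db2_seconds: int, idaa_seconds: int) -> int:
    """Speedup factor: DB2-only time over accelerated time, rounded."""
    return round(db2_seconds / idaa_seconds)


speedups = {query: times_faster(*secs) for query, secs in timings.items()}

for query, factor in speedups.items():
    print(f"{query}: {factor}x")
```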
Curt Cotner
Session: Trends and Directions for Database Technology