Distributed Architecture of Oracle Database In-Memory
![Page 1: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/1.jpg)
Niloy Mukherjee, Shasank Chavan, Maria Colgan, Dinesh Das, Mike Gleeson, Sanket Hase, Allison Holloway, Hui Jin,
Jesse Kamp, Kartik Kulkarni, Tirthankar Lahiri, Juan Loaiza, Neil Macnaughton, Vineet Marwah, Atrayee Mullick,
Andy Witkowski, Jiaqi Yan, Mohamed Zait
Distributed Architecture of Oracle Database In-memory
![Page 2: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/2.jpg)
Overview
1. Motivation
  – Trends and current solutions
2. Solution
  – Real Application Clusters
  – Oracle Database In-Memory
3. Preliminary evaluation
  – Some test results
4. Conclusion
![Page 3: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/3.jpg)
Motivation
Why Do We Need This?
![Page 4: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/4.jpg)
Data Trends
● Deluge of data
● Ad-hoc real-time analysis
![Page 5: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/5.jpg)
Typical Solution
● ETL – Extract, transform, load
● Analyze data in dedicated system
OLTP Application
OLAP Application
● Complexity and manageability overhead!
● No real-time analytics
![Page 6: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/6.jpg)
Data Format
● Columnar format
  – Great for OLAP
  – Fast scans of single column
● Row format
  – Great for OLTP
  – Handle entire rows
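The trade-off between the two formats can be illustrated with a minimal Python sketch. The toy table, its column names, and the helper functions are made up for illustration and do not come from the slides.

```python
# Hypothetical toy table; names and values are illustrative only.
rows = [
    {"id": 1, "customer": "alice", "amount": 30},
    {"id": 2, "customer": "bob", "amount": 45},
    {"id": 3, "customer": "carol", "amount": 25},
]


def fetch_row(table, row_id):
    """Row format: a whole record is stored together, so reading or
    updating one entire row (typical OLTP access) touches one entry."""
    return next(r for r in table if r["id"] == row_id)


# Column format: every column is stored contiguously, so an aggregate
# over one column (typical OLAP access) reads only that column.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_column(cols, name):
    """Scan a single column without touching the others."""
    return sum(cols[name])
```

Here `fetch_row(rows, 2)` returns the complete record for one customer, while `sum_column(columns, "amount")` never reads the `customer` values at all.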
![Page 7: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/7.jpg)
Hardware Trends
● More cores, processors
● Cheaper memory
● Requires distributed applications
![Page 8: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/8.jpg)
In-Memory Databases
● Memory resident
  – Oracle TimesTen (mid 1990s)
  – Both row and column based
  – Main memory now conceived as primary storage!
● Disk resident
  – Persistent
![Page 9: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/9.jpg)
Scaling Out
● Aggregate power and memory
● DB may not fit in single machine
● Less contention for resources
● Elastic!
![Page 10: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/10.jpg)
Scaling Up
● Majority of workloads are quite small
● Median @ Microsoft & Yahoo: less than 14 GB
● 90% @ Facebook under 100 GB
● Commodity server: 100s of GB and 32 cores
● Oracle Sun SuperCluster: 32 TB and 1024 cores
![Page 11: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/11.jpg)
But...
● Scaling out offers
  – High availability
  – Fast recovery
![Page 12: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/12.jpg)
Then the Question Is...
How can mixed OLTAP be provided seamlessly, transparently, AND be distributed?
![Page 13: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/13.jpg)
Solution
How was it solved?
![Page 14: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/14.jpg)
Real Application Clusters
● Real Application Clusters (RAC) abstracts away cluster details
![Page 15: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/15.jpg)
DBIM
● Oracle Database In-Memory (2014)
● Real Application Clusters
● Dual format
● Both disk and memory
● Both OLAP and OLTP (mixed OLTAP)
![Page 16: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/16.jpg)
Dual Format
● Row format
  – Buffer cache (in memory)
  – Traditional logging
● Column format
  – In-memory
  – Fast scans
![Page 17: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/17.jpg)
DBIM Instance Architecture
![Page 18: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/18.jpg)
Shared Buffer Cache
![Page 19: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/19.jpg)
Shared Buffer Cache
● Shared collective cache of data blocks
● Global Cache Service manages
  – Location
  – Access
  – Handles all OLTP DML operations
● ACID
![Page 20: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/20.jpg)
In-Memory Column Store
![Page 21: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/21.jpg)
In-Memory Compression Unit (IMCU)
● Construction:
  – Convert row → column
  – Apply «intelligent data transformation» and compression
● Unit of distribution and scan
● Contiguous
● Each column becomes a Compression Unit (CU)
  – User-selectable compression: capacity vs performance
![Page 22: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/22.jpg)
Scanning IMCUs
● SIMD instructions
● In-memory Storage Indexes
  – Automatically created
  – Pruning based on filter predicates
  – E.g. max and min for each CU
● Low scan cost enables
  – Bloom filter joins
  – Vector Group By
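The min/max pruning idea behind storage indexes can be sketched in a few lines of Python. This is an illustrative model only: `ColumnUnit` and `scan_less_equal` are hypothetical names, and the real storage index format is not described on the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnUnit:
    """Hypothetical stand-in for one CU: a run of values for one column."""
    values: List[int]

    def __post_init__(self):
        # The "storage index" in this sketch is just the min and max of the unit.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_less_equal(units, bound):
    """Return values <= bound plus the number of units actually scanned."""
    matches = []
    scanned = 0
    for cu in units:
        if cu.min_value > bound:
            # No value in this unit can satisfy the predicate: skip it entirely.
            continue
        scanned += 1
        matches.extend(v for v in cu.values if v <= bound)
    return matches, scanned
```

With units holding `[1, 5, 9]`, `[20, 25, 30]` and `[8, 12, 40]`, a scan for values `<= 10` skips the second unit without reading any of its values.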
![Page 23: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/23.jpg)
In-Memory Column Store
● Container for in-memory segments
  – Each segment contiguous and contains several IMCUs
● NUMA enabled – distributes equally
● Home location index
  – Look up segment from data block
![Page 24: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/24.jpg)
Distribution Manager
![Page 25: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/25.jpg)
Distribution Manager
● Wanted qualities
  – Scale out: extremely scalable distribution
  – High availability
    ● In-memory fault tolerance
    ● Efficient recovery
  – Scale up: distribution across NUMA nodes
  – Seamless interaction with Oracle's SQL execution engine
![Page 26: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/26.jpg)
Distribution Schemes
● By partition
● By sub-partition
● By rowid/block range
● Automatic
![Page 27: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/27.jpg)
Distribution Mechanism
● Why not centralized?
  – Non-trivial consistency communication by the coordinating instance
● Why not decentralized?
  – Lack of consensus → inconsistency
● Best of both worlds!?
  – Two-phase distribution
![Page 28: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/28.jpg)
Phase 1: Consensus
● Multiple instances may trigger (re)distribution
  – Need leader selection
![Page 29: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/29.jpg)
Phase 1: Consensus
Broadcast Acknowledge Leader downgrade
![Page 30: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/30.jpg)
Phase 2: Population
● Calculate block ranges
  – Use SCN broadcast in phase 1
● Determine home location
  – Rendezvous hashing
● NUMA is static
  – Use modulo-based distribution
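A rough Python sketch of the block-range and NUMA steps follows. The slides do not say what the modulo is applied to or how large a range is, so the fixed `blocks_per_imcu` parameter and the use of the range index as the modulo input are assumptions made purely for illustration.

```python
def block_ranges(first_block, last_block, blocks_per_imcu):
    """Split [first_block, last_block] into consecutive inclusive ranges.

    blocks_per_imcu is a hypothetical fixed size; real ranges need not be uniform.
    """
    ranges = []
    start = first_block
    while start <= last_block:
        end = min(start + blocks_per_imcu - 1, last_block)
        ranges.append((start, end))
        start = end + 1
    return ranges


def numa_node(range_index, numa_nodes):
    """Assumed modulo scheme: the node count is static, so plain modulo suffices."""
    return range_index % numa_nodes
```

Plain modulo is acceptable here because the number of NUMA nodes in a machine does not change at runtime, whereas the set of cluster instances can, which is why instances are chosen with rendezvous hashing instead.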
![Page 31: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/31.jpg)
Side Note: Rendezvous Hashing
● Given a hash function h and an object O, select the instance S where h(S, O) takes on the highest value.
● Alternatively: Lowest value
● Desirable property: Minimal disruption
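Rendezvous (highest random weight) hashing as defined above can be sketched in Python as follows. The use of SHA-256 and the instance and object names are illustrative assumptions; the slides do not specify which hash function is used.

```python
import hashlib


def score(instance: str, obj: str) -> int:
    """h(S, O): hash the (instance, object) pair to an integer weight."""
    digest = hashlib.sha256(f"{instance}|{obj}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def home_instance(instances, obj):
    """Select the instance S for which h(S, O) takes on the highest value."""
    return max(instances, key=lambda inst: score(inst, obj))


def moved_objects(objects, before, after):
    """Objects whose home changes when the instance list changes."""
    return [o for o in objects
            if home_instance(before, o) != home_instance(after, o)]
```

Minimal disruption shows up when an instance leaves: only the objects that were homed on the departed instance get a new home, because the relative ranking of the surviving instances for every other object is unchanged.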
![Page 32: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/32.jpg)
Phase 2: Population
● Generate IMCUs
● Update home location index
● Release locks
At the end: All home location indexes consistent
![Page 33: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/33.jpg)
Home Location Indexes
![Page 34: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/34.jpg)
Redistribution
● On cluster topology change
● Same as distribution
● Reuse SCN!
![Page 35: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/35.jpg)
IM Transaction Manager
![Page 36: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/36.jpg)
IM Transaction Manager
● Maintains transactional consistency
● Uses a system change number (SCN)
● Snapshot management unit (SMU)
  – Fills in the gap between the IMCU's SCN and query SCN
IMCU+SCN SMU
Note: Requires regular repopulation
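One way to picture how an SMU could fill the gap between the IMCU's SCN and the query SCN is the Python sketch below. The data structures are assumptions for illustration: the SMU is modelled as a map from row id to the SCN of the first change after population, and `read_row_store` is a hypothetical callback standing in for a consistent read through the buffer cache.

```python
def consistent_scan(imcu_values, imcu_scn, smu_changes, query_scn, read_row_store):
    """Return row_id -> value as of query_scn.

    imcu_values:    row_id -> value as of imcu_scn (the columnar snapshot)
    smu_changes:    row_id -> SCN of the first change made after population
    read_row_store: callable (row_id, scn) -> value as of that SCN
    """
    if query_scn < imcu_scn:
        raise ValueError("query SCN is older than the IMCU snapshot")
    result = {}
    for row_id, value in imcu_values.items():
        changed_at = smu_changes.get(row_id)
        if changed_at is not None and changed_at <= query_scn:
            # The columnar copy is stale for this query: use the row format.
            result[row_id] = read_row_store(row_id, query_scn)
        else:
            # Unchanged as of the query SCN: the fast columnar copy is valid.
            result[row_id] = value
    return result
```

In this model the need for regular repopulation is easy to see: the more rows the SMU marks as changed, the more of each scan falls back to the row format, until a fresh IMCU is built at a newer SCN.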
![Page 37: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/37.jpg)
Distributed SQL Execution
![Page 38: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/38.jpg)
Distributed SQL Execution
● Index vs scan
  – Extrapolate cost from home location index
● Scan:
  – Determine degree of parallelism
  – Allocate nodes
  – For 1-safe, select first or secondary
![Page 39: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/39.jpg)
Distributed SQL Execution
● Hierarchy of (sub)distributors
● Distribute work based on home location index
● Align to IMCUs and NUMA boundaries
  – All block ranges within same memory
Instance
Query
NUMA node
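The two-level hand-out of work (query to instance, instance to NUMA node) can be sketched in Python. The shape of the home location index used here, a map from block range to an (instance, NUMA node) pair, is an assumption for illustration and not the actual structure.

```python
from collections import defaultdict


def plan_scan(home_location_index):
    """Group block ranges by instance, then by NUMA node.

    home_location_index: (start_block, end_block) -> (instance, numa_node)
    Returns: instance -> numa_node -> list of block ranges, so that each
    worker only scans ranges that live in its own local memory.
    """
    plan = defaultdict(lambda: defaultdict(list))
    for block_range, (instance, node) in sorted(home_location_index.items()):
        plan[instance][node].append(block_range)
    # Convert to plain dicts for easy inspection.
    return {inst: dict(nodes) for inst, nodes in plan.items()}
```

The top-level distributor would hand each instance its part of the plan, and a sub-distributor on that instance would hand each NUMA node its list of block ranges.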
![Page 40: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/40.jpg)
Home Location Aware Scanning
![Page 41: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/41.jpg)
Uniqueness of Architecture
● SAP HANA
  – More centralized
  – Poor load balancing
  – No redundancy
● No-SQL
  – Focus on performance
  – Not ACID
● IBM DB2 + BLU
  – Per-node in-memory column DB
  – No in-memory redundancy
![Page 42: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/42.jpg)
Preliminary Evaluation
Did it work?
With data from TPC-H
![Page 43: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/43.jpg)
Distribution
● Non-partitioned table
  – «atomics» table
  – Constant size
● Composite-partitioned table
  – «lineitem» table
  – Increasing size
    ● 84-way partitioned, each subpartitioned 256 ways (hash)
Speedup seems to be linear
![Page 44: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/44.jpg)
Query Execution
● 4 query sets:
![Page 45: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/45.jpg)
Query Set 1
● Selects counts
● Where clauses with increasing complexity
![Page 46: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/46.jpg)
Query Set 2
● Select max
● Increasing complexity in select clause
![Page 47: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/47.jpg)
Query Set 3
● Different like predicates
![Page 48: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/48.jpg)
Query Set 4
● Simple '<=' predicate
● Increasing selectivity
![Page 49: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/49.jpg)
In-Memory Distribution Awareness
● Auto distributed, no redundancy
![Page 50: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/50.jpg)
NUMA Aware Query Execution
● Scale up
![Page 51: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/51.jpg)
In-Memory Fault Tolerance
● 1-safe redundancy
● First on 8 instances, then after killing one
● (availability)
![Page 52: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/52.jpg)
Conclusion
Or more like a summary?
![Page 53: Distributed Architecture of Oracle Database In-Memory · Distributed Architecture of Oracle Database In-memory. Overview ... SAP HANA – More centralized ... – Application transparent](https://reader030.fdocuments.in/reader030/viewer/2022021504/5afc19da7f8b9a32348fde62/html5/thumbnails/53.jpg)
Conclusion
● Seamless real-time analytics on huge data volumes with redundancy → mixed OLTAP
● Oracle DBIM should solve this
  – Application transparent
  – In-memory
  – Distributed
  – Uses Oracle's SQL execution framework (consistent interface)