Parallel Database System: The Future of High Performance Database Systems
Presented by: Suresh Babu L
Outline
- Why Parallel Databases?
- Scale-up and Speedup
- Parallel DB Architectures
- Parallel Data Flow
- Data Partitioning
- Parallelism with Relational Operators
- The State of the Art
Why Parallel Databases?
Edgar F. Codd
Parallel Access to Data
- Sequential: 1 Terabyte at 10 MB/s takes 1.2 days to scan.
- 1,000 x parallel: 1 Terabyte at 10 GB/s aggregate bandwidth takes a 100 second SCAN.
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
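The scan times on this slide follow from dividing the data size by the available bandwidth. A minimal sketch of that arithmetic, assuming decimal units (1 Terabyte = 10^12 bytes) and a hypothetical helper named `scan_seconds`:

```python
# Back-of-the-envelope scan times for the slide's figures (decimal units assumed).
TERABYTE = 10**12  # bytes
MB = 10**6         # bytes


def scan_seconds(data_bytes: int, bandwidth_bytes_per_s: int, ways: int = 1) -> float:
    """Time to scan data_bytes when `ways` devices each deliver the given bandwidth."""
    return data_bytes / (bandwidth_bytes_per_s * ways)


sequential = scan_seconds(TERABYTE, 10 * MB)           # one 10 MB/s device
parallel = scan_seconds(TERABYTE, 10 * MB, ways=1000)  # 1,000 devices, 10 GB/s in aggregate

print(f"sequential: {sequential:.0f} s = {sequential / 86400:.2f} days")
print(f"1,000 x parallel: {parallel:.0f} s")
```

The sequential figure comes out at 100,000 seconds, a little over one day, which the slide rounds to 1.2 days.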
Parallel DBMS: Intro
- Pipeline: Any Sequential Program -> Any Sequential Program
- Partition: outputs split N ways, inputs merge M ways, around copies of Any Sequential Program
Pipelined and Partitioned Parallelism
Both are natural in DBMS!
- Pipeline parallelism: Source Data -> Scan -> Sort
- Partitioned parallelism: partitioned data allows partitioned parallelism
  [Figure: four Source Data partitions, each with its own Scan and Sort, feeding one Merge]
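The Scan, Sort and Merge boxes in the figure can be sketched in a few lines. This is an illustration only, not a real DBMS executor: the function names are hypothetical, the partitions are in-memory lists, and Python threads stand in for separate processors.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scan(source):
    """Pipeline stage: stream records out of one source partition."""
    for record in source:
        yield record


def scan_and_sort(source):
    """Scan feeds Sort; Sort has to consume all of its input before emitting."""
    return sorted(scan(source))


def partitioned_sort(partitions):
    """Run Scan -> Sort on every partition concurrently, then Merge the sorted runs."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_runs = list(pool.map(scan_and_sort, partitions))
    return list(heapq.merge(*sorted_runs))


partitions = [[7, 3], [9, 1], [4, 8], [2, 6]]  # four "Source Data" partitions
result = partitioned_sort(partitions)
print(result)
```

Sort is a blocking operator, so the pipeline between Scan and Sort overlaps only the reading and the buffering; the larger gain in this sketch comes from running the four Scan/Sort pairs side by side.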
Scale-Up and Speed-Up
- Speedup: 100GB -> 100GB (the same job on a larger system)
- Scale-up: 100GB -> 1TB (a larger job on a larger system)
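Both metrics are ratios of elapsed times. A minimal sketch using the usual definitions; the function names and the timings below are made up for illustration and are not measurements from the slides:

```python
def speedup(small_system_elapsed: float, big_system_elapsed: float) -> float:
    """Same job on both systems. Linear speedup on an N-times-larger system is N."""
    return small_system_elapsed / big_system_elapsed


def scaleup(small_job_on_small_system: float, big_job_on_big_system: float) -> float:
    """Job and system grow by the same factor. Linear scale-up is 1.0."""
    return small_job_on_small_system / big_job_on_big_system


# Hypothetical timings, in seconds:
# a 100GB job takes 1000 s on 1 node and 125 s on 10 nodes;
# a 1TB job takes 1250 s on the same 10 nodes.
print(speedup(1000, 125))   # below the linear value of 10
print(scaleup(1000, 1250))  # below the linear value of 1.0
```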
Barriers to Achieving Linear Speedup and Scaleup
[Figure: "A Bad Speedup Curve, 3-Factors"; horizontal axis: Processors & Discs]
- Startup
- Interference
- Skew
Architectures for Parallel DBs
- Shared memory: IBM/370, Sequent, SGI, Sun
- Shared disks: VMScluster, Sysplex
[Figures: CLIENTS, Processors, Memory]
Architectures for Parallel DBs (contd.)
- Shared nothing: Tandem, Teradata, SP2
[Figure: CLIENTS]
Architectures (contd.)
Shared Nothing
- Teradata: 400 nodes, 80x12 nodes
- Tandem: 110 nodes
- IBM / SP2 / DB2: 128 nodes
- Informix/SP2: 100 nodes
- ATT & Sybase: 8x14 nodes
Shared Disk
- Oracle: 170 nodes
- Rdb: 24 nodes
Shared Memory
- Informix: 9 nodes
- RedBrick: ? nodes
Parallel Data Flow and Relational Systems
[Figure: four Source Data partitions, each with its own Scan and Sort, feeding one Merge]
Data Partitioning
Three main techniques:
- Round Robin
- Hash Partitioning
- Range Partitioning
Round Robin Partitioning
[Figure: tuples dealt in turn to partitions P1, P2, ..., Pn]
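Round robin deals tuples to the partitions in turn, like cards. A minimal sketch, assuming in-memory lists stand in for the disks P1..Pn and using a hypothetical function name:

```python
def round_robin_partition(tuples, n):
    """Send the i-th tuple to partition i mod n."""
    partitions = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        partitions[i % n].append(t)
    return partitions


parts = round_robin_partition(list(range(10)), 3)
print(parts)
```

The partitions differ in size by at most one tuple, which suits full scans. Locating one particular tuple still means searching every partition, because placement ignores the tuple's contents.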
Hash Partitioning
[Figure: a hash of each tuple's key selects one of the partitions P1, P2, ..., Pn]
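Hash partitioning places each tuple by hashing a chosen attribute. A minimal sketch, assuming the key can be rendered as a string; `zlib.crc32` is used here only because it gives the same value on every run, not because any particular system uses it:

```python
import zlib


def hash_partition(tuples, key, n):
    """Send each tuple to partition hash(key(tuple)) mod n."""
    partitions = [[] for _ in range(n)]
    for t in tuples:
        h = zlib.crc32(str(key(t)).encode("utf-8"))
        partitions[h % n].append(t)
    return partitions


rows = [("ann", 1), ("bob", 2), ("ann", 3), ("cy", 4), ("bob", 5)]
parts = hash_partition(rows, key=lambda r: r[0], n=3)
print(parts)
```

All tuples with the same key land in the same partition, so an exact-match lookup or an equi-join on that key only needs to visit one partition per key value.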
Range Partitioning
[Figure: key ranges map to partitions: a...c -> P1, d...g -> P2, ..., w...z -> Pn]
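Range partitioning assigns each tuple to the partition whose key range contains it. A minimal sketch keyed on the first letter; the a..c, d..g and w..z ranges come from the slide, while the middle range h..v is an assumption filled in to make the example complete:

```python
import bisect

# Inclusive upper bound of every range except the last one.
# P1: a..c, P2: d..g, P3: h..v (assumed), P4: w..z
UPPER_BOUNDS = ["c", "g", "v"]


def range_partition(keys, upper_bounds):
    """Place each key in the first range whose upper bound is >= its first letter."""
    partitions = [[] for _ in range(len(upper_bounds) + 1)]
    for k in keys:
        idx = bisect.bisect_left(upper_bounds, k[0].lower())
        partitions[idx].append(k)
    return partitions


names = ["adams", "zhou", "gray", "codd", "dewitt", "wong"]
parts = range_partition(names, UPPER_BOUNDS)
print(parts)
```

Neighbouring keys stay together, which helps range queries, but a poor choice of boundaries leaves some partitions much fuller than others (the empty P3 here is an example of that).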
Parallelism with Relational Operators
Two basic operations:
- Merge
- Split
Merge Operation
Split Operation
Split: used to partition or replicate the stream produced by a relational operator.
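The two operations can be sketched as plain functions over lists. The names, the `route` parameter and the in-memory lists are assumptions made for illustration; a real system would connect operators through queues or network ports.

```python
from itertools import chain


def split(stream, n, route=None):
    """Partition the stream (route picks one output) or replicate it (route is None)."""
    outputs = [[] for _ in range(n)]
    for t in stream:
        targets = range(n) if route is None else [route(t) % n]
        for i in targets:
            outputs[i].append(t)
    return outputs


def merge(streams):
    """Combine several input streams into one output stream."""
    return list(chain.from_iterable(streams))


partitioned = split([1, 2, 3, 4], 2, route=lambda t: t)  # one output per tuple
replicated = split([1, 2, 3, 4], 2)                      # every output gets every tuple
print(partitioned, replicated, merge(partitioned))
```

Because the operator between a merge and a split only ever sees one input stream and one output stream, an existing sequential operator can be reused unchanged in a parallel plan.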
Example of Parallelizing Relational Operators
[Figure: SCAN of A and SCAN of B feed a JOIN, whose result is INSERTed into C]
Example (contd.)
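The SCAN, JOIN and INSERT labels on the example slide suggest a plan that joins relations A and B and inserts the result into C. One common way to parallelize such a plan is to split both inputs on the join key so that each partition can be joined independently. The sketch below assumes that strategy, uses integer join keys in the first column, and runs the partitions in a simple loop; all function names are hypothetical.

```python
from collections import defaultdict


def split_by_key(rows, n):
    """Split a relation n ways on its join key (column 0)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def local_join(a_part, b_part):
    """Join one partition of A with the matching partition of B."""
    index = defaultdict(list)
    for a in a_part:
        index[a[0]].append(a)
    return [a + b[1:] for b in b_part for a in index.get(b[0], [])]


def parallel_join(A, B, n=2):
    a_parts = split_by_key(A, n)  # SCAN A, then split
    b_parts = split_by_key(B, n)  # SCAN B, then split
    C = []
    for a_part, b_part in zip(a_parts, b_parts):  # each pair could run on its own node
        C.extend(local_join(a_part, b_part))      # merge the results and INSERT into C
    return C


A = [(1, "a1"), (2, "a2"), (3, "a3")]
B = [(2, "b2"), (3, "b3"), (3, "b3x"), (4, "b4")]
C = parallel_join(A, B)
print(C)
```

Matching keys always hash to the same partition number in both relations, so no partition ever needs rows held by another one.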
The State of the Art
- Teradata
- Tandem NonStop SQL
- Gamma
- The Super Database Computer
- Bubba
Specialized Parallel Relational Operators
Algorithms for the traditional relational operators are written to improve their parallel execution and to better handle data and execution skew.
Look at join:
- Sort merge
- Hash join
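For reference, the two join algorithms named on this slide can be sketched in their sequential form. These are textbook versions over lists of (key, value) pairs, not the skew-aware parallel variants the slide refers to; the function names are illustrative.

```python
def sort_merge_join(left, right):
    """Equi-join by sorting both inputs on the key and merging them in one pass."""
    left, right = sorted(left), sorted(right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1  # find the end of the matching run on the right
            while i < len(left) and left[i][0] == key:
                out.extend((key, left[i][1], right[k][1]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


def hash_join(left, right):
    """Equi-join by building a hash table on one input and probing with the other."""
    table = {}
    for key, value in left:
        table.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]


L = [(3, "x"), (1, "y"), (3, "z")]
R = [(3, "p"), (2, "q"), (3, "r")]
print(sorted(sort_merge_join(L, R)) == sorted(hash_join(L, R)))
```

Both produce the same set of result tuples. In a parallel setting each node would run one of these on its own partition, which is where a heavily repeated key value (data skew) can overload a single node.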
CONCLUSION
THANK YOU
QUESTIONS?