Parallel Database System: The Future of High Performance Database Systems


description

Parallel Database System: The Future of High Performance Database Systems. Presented by: Suresh Babu L. Outline: Why Parallel Databases? Scale-Up and Speedup; Parallel DB Architectures; Parallel Data Flow; Data Partitioning; Parallelism with Relational Operators; The State of the Art.

Transcript of Parallel Database System: The Future of High Performance Database Systems

Page 1: Parallel Database System: The Future of High Performance Database Systems

Parallel Database System: The Future of High Performance Database Systems

Presented by: Suresh Babu L

Page 2: Parallel Database System: The Future of High Performance Database Systems

Outline

Why Parallel Databases?
Scale-Up and Speedup
Parallel DB Architectures
Parallel Data Flow
Data Partitioning
Parallelism with Relational Operators
The State of the Art

Page 3: Parallel Database System: The Future of High Performance Database Systems

Why Parallel Databases?

Edgar F. Codd

Page 4: Parallel Database System: The Future of High Performance Database Systems

Parallel Access to Data

1 Terabyte at 10 MB/s: 1.2 days to scan.

1 Terabyte, 1,000 x parallel (10 GB/s bandwidth): 100 second scan.

Parallelism: divide a big problem into many smaller ones to be solved in parallel.
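The scan times on this slide follow from dividing data size by aggregate bandwidth. A small sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and a hypothetical helper name `scan_seconds`:

```python
TERABYTE = 10**12  # bytes; decimal units are an assumption
MB = 10**6         # bytes

def scan_seconds(data_bytes, bytes_per_second_per_disk, disks=1):
    """Time to scan data spread evenly over `disks` devices read in parallel."""
    return data_bytes / (bytes_per_second_per_disk * disks)

serial = scan_seconds(TERABYTE, 10 * MB)                # one 10 MB/s stream
parallel = scan_seconds(TERABYTE, 10 * MB, disks=1000)  # 1,000 streams, 10 GB/s total

print(f"serial: {serial / 86400:.1f} days")           # serial: 1.2 days
print(f"1,000 x parallel: {parallel:.0f} seconds")    # 1,000 x parallel: 100 seconds
```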

Page 5: Parallel Database System: The Future of High Performance Database Systems

Parallel DBMS: Intro

Pipeline parallelism: any sequential program feeds its output to any other sequential program.

Partitioned parallelism: any sequential program runs on each partition of the data; outputs split N ways, inputs merge M ways.
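A minimal sketch of the pipeline idea, assuming Python generators stand in for the "sequential programs" on this slide. The operator names and sample rows are hypothetical. Generators only show the dataflow shape (each stage consumes rows as the previous stage produces them); they do not run on separate processors.

```python
def scan(rows):
    """First sequential program: produce rows one at a time."""
    for row in rows:
        yield row

def select_adults(rows):
    """Second sequential program: filter rows as they arrive."""
    for row in rows:
        if row["age"] >= 18:
            yield row

def project_names(rows):
    """Third sequential program: keep only the name column."""
    for row in rows:
        yield row["name"]

# Hypothetical sample data.
people = [
    {"name": "Ann", "age": 34},
    {"name": "Bob", "age": 12},
    {"name": "Cy", "age": 51},
]

# Each stage is unchanged sequential code; only the wiring makes it a pipeline.
names = list(project_names(select_adults(scan(people))))
print(names)  # ['Ann', 'Cy']
```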

Page 6: Parallel Database System: The Future of High Performance Database Systems

Pipelined and Partitioned Parallelism

Both are natural in DBMS!

Pipeline parallelism

Partitioned data allows partitioned parallelism

[Figure: pipeline: Source Data, Scan, Sort. Partitioned: four Source Data partitions, each with its own Scan and Sort, feeding one Merge.]
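The partitioned Scan, Sort, Merge figure can be sketched as follows. This is an illustration under assumptions, not the presenter's code: partitions are plain lists, `scan_and_sort` and `parallel_sort` are hypothetical names, and a thread pool stands in for separate processors (CPython threads will not speed up CPU-bound sorting).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def scan_and_sort(partition):
    """Scan one partition of the source data and sort it locally."""
    return sorted(partition)

def parallel_sort(partitions):
    # One worker per partition, mirroring the four Scan/Sort pairs in the figure.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        sorted_runs = list(pool.map(scan_and_sort, partitions))
    # Merge step: combine the sorted runs into one ordered stream.
    return list(heapq.merge(*sorted_runs))

source_partitions = [[9, 3, 7], [4, 8, 1], [6, 2, 5], [12, 10, 11]]
result = parallel_sort(source_partitions)
print(result)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```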

Page 7: Parallel Database System: The Future of High Performance Database Systems

Scale-Up and Speed-Up

Speedup: [Figure: 100GB problem on the small system, 100GB problem on the larger system]

Scale-up: [Figure: 100GB problem on the small system, 1TB problem on the larger system]
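The slide shows only the problem sizes. The sketch below uses the commonly cited definitions of the two metrics (ratios of elapsed times); these definitions and all timings are assumptions added for illustration, not figures from the slides.

```python
def speedup(small_system_time, big_system_time):
    """Same problem on both systems. Linear speedup on N-times hardware is N."""
    return small_system_time / big_system_time

def scaleup(small_system_small_problem_time, big_system_big_problem_time):
    """Problem grows with the system. Linear scaleup is 1.0."""
    return small_system_small_problem_time / big_system_big_problem_time

# Hypothetical elapsed times in seconds.
print(speedup(1000.0, 100.0))   # 100GB on 1 node vs. 100GB on 10 nodes -> 10.0
print(scaleup(1000.0, 1000.0))  # 100GB on 1 node vs. 1TB on 10 nodes   -> 1.0
```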

Page 8: Parallel Database System: The Future of High Performance Database Systems

Barriers to Achieving Linear Speedup and Scaleup

[Figure: A Bad Speedup Curve, 3 factors; x-axis: Processors & Discs]

Startup
Interference
Skew
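A toy model can show how the three factors bend the speedup curve. Everything here is an assumption made for illustration: the formula, the constants, and the function names are invented, not measurements or a model from the slides.

```python
def elapsed(n, work=1000.0, startup=0.5, interference=0.002, skew=1.2):
    """Toy elapsed-time model for n processors (all constants are made up)."""
    ideal = work / n                            # perfectly divided work
    slowest = ideal * (skew if n > 1 else 1.0)  # the slowest partition dominates
    contention = 1.0 + interference * (n - 1)   # each extra process slows the rest
    return startup * n + slowest * contention   # startup cost grows with n

def speedup(n):
    return elapsed(1) / elapsed(n)

for n in (1, 10, 100, 1000):
    print(n, round(speedup(n), 1))
```

In this toy model speedup first rises and then falls as processors are added, because startup cost eventually outweighs the shrinking per-processor work.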

Page 9: Parallel Database System: The Future of High Performance Database Systems

Architectures for Parallel DBs

Shared memory: IBM/370, Sequent, SGI, Sun

Shared disks: VMScluster, Sysplex

[Figures: CLIENTS connected to Processors and Memory]

Page 10: Parallel Database System: The Future of High Performance Database Systems

Architectures for Parallel DBs (contd.)

Shared nothing: Tandem, Teradata, SP2

[Figure: CLIENTS]

Page 11: Parallel Database System: The Future of High Performance Database Systems

Architectures (contd.)

Shared Nothing
Teradata: 400 nodes, 80x12 nodes
Tandem: 110 nodes
IBM / SP2 / DB2: 128 nodes
Informix/SP2: 100 nodes
ATT & Sybase: 8x14 nodes

Shared Disk
Oracle: 170 nodes
Rdb: 24 nodes

Shared Memory
Informix: 9 nodes
RedBrick: ? nodes

Page 12: Parallel Database System: The Future of High Performance Database Systems

Parallel Data Flow and Relational Systems

[Figure: four Source Data partitions, each with its own Scan and Sort, feeding one Merge]

Page 13: Parallel Database System: The Future of High Performance Database Systems

Data Partitioning

Three main techniques:
Round Robin
Hash Partitioning
Range Partitioning

Page 14: Parallel Database System: The Future of High Performance Database Systems

Round Robin Partitioning

[Figure: tuples distributed across partitions P1, P2, ..., Pn]
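A minimal sketch of round-robin partitioning, assuming tuples arrive as a Python list and the i-th tuple goes to partition i mod n. The function name is hypothetical.

```python
def round_robin_partition(tuples, n):
    """Deal tuples to n partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        partitions[i % n].append(t)
    return partitions

parts = round_robin_partition(list(range(10)), 3)
print(parts)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Partition sizes differ by at most one tuple, but locating a tuple by value requires looking in every partition.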

Page 15: Parallel Database System: The Future of High Performance Database Systems

Hash Partitioning

[Figure: tuples distributed across partitions P1, P2, ..., Pn]
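A minimal sketch of hash partitioning, assuming rows are dictionaries and the partition is chosen by hashing one attribute. `zlib.crc32` is used only because it gives the same value on every run; the function names and sample rows are hypothetical.

```python
import zlib

def partition_of(value, n):
    """Map an attribute value to a partition number in [0, n)."""
    return zlib.crc32(str(value).encode("utf-8")) % n

def hash_partition(rows, key, n):
    """Place each row in the partition chosen by hashing row[key]."""
    partitions = [[] for _ in range(n)]
    for row in rows:
        partitions[partition_of(row[key], n)].append(row)
    return partitions

# Hypothetical sample data.
rows = [{"id": i, "dept": d} for i, d in enumerate(["a", "b", "a", "c", "b", "a"])]
parts = hash_partition(rows, "dept", 4)

# All rows with the same key value land in the same partition.
print(partition_of("a", 4), [r["id"] for r in parts[partition_of("a", 4)] if r["dept"] == "a"])
```

An exact-match lookup on the partitioning attribute only needs to visit one partition.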

Page 16: Parallel Database System: The Future of High Performance Database Systems

Range Partitioning

[Figure: key ranges assigned to partitions: a...c to P1, d...g to P2, ..., w...z to Pn]
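A minimal sketch of range partitioning, assuming each partition stores keys up to an inclusive upper bound and the last partition takes everything above the final bound. The function name, sample names, and the bounds `"c"` and `"g"` are illustrative choices that echo the figure.

```python
import bisect

def range_partition(tuples, key_of, upper_bounds):
    """upper_bounds[i] is the largest key kept in partition i; the last partition takes the rest."""
    partitions = [[] for _ in range(len(upper_bounds) + 1)]
    for t in tuples:
        # bisect_left finds the first bound >= key, which is the partition index.
        partitions[bisect.bisect_left(upper_bounds, key_of(t))].append(t)
    return partitions

names = ["adams", "codd", "dewitt", "gray", "wong", "zhou"]
parts = range_partition(names, lambda s: s[0], ["c", "g"])
print(parts)  # [['adams', 'codd'], ['dewitt', 'gray'], ['wong', 'zhou']]
```

Tuples with nearby keys are clustered together, which helps range queries but can leave partitions uneven when key values are skewed.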

Page 17: Parallel Database System: The Future of High Performance Database Systems

Parallelism with Relational Operators

Two basic operations:
Merge
Split

Page 18: Parallel Database System: The Future of High Performance Database Systems

Merge Operation
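The slide gives only the title. As commonly described, a merge combines several parallel input streams into a single sequential stream. The sketch below assumes that reading; the queue-and-threads design and all names are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of one input stream

def merge(producers):
    """Run each producer in its own thread and yield their items as one stream."""
    q = queue.Queue()

    def run(produce):
        for item in produce():
            q.put(item)
        q.put(_DONE)

    threads = [threading.Thread(target=run, args=(p,)) for p in producers]
    for th in threads:
        th.start()
    remaining = len(threads)
    while remaining:
        item = q.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item
    for th in threads:
        th.join()

streams = [lambda: iter([1, 2, 3]), lambda: iter([10, 20]), lambda: iter([100])]
merged = list(merge(streams))
print(sorted(merged))  # arrival order varies from run to run, so sort for display
```

The consumer sees one ordinary sequential stream and does not need to know how many producers feed it.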

Page 19: Parallel Database System: The Future of High Performance Database Systems

Split Operation

Split: used to partition or replicate the stream produced by a relational operator.
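Both uses of split named on this slide can be sketched in a few lines. The function names and the routing function are hypothetical; output streams are plain lists for simplicity.

```python
def split_partition(stream, n, route):
    """Partition: send each tuple to exactly one of n outputs, chosen by route(t)."""
    outputs = [[] for _ in range(n)]
    for t in stream:
        outputs[route(t) % n].append(t)
    return outputs

def split_replicate(stream, n):
    """Replicate: send every tuple to all n outputs."""
    outputs = [[] for _ in range(n)]
    for t in stream:
        for out in outputs:
            out.append(t)
    return outputs

tuples = [3, 8, 5, 12, 7]
partitioned = split_partition(tuples, 2, route=lambda t: t)
replicated = split_replicate(tuples, 2)
print(partitioned)  # [[8, 12], [3, 5, 7]]
print(replicated)   # [[3, 8, 5, 12, 7], [3, 8, 5, 12, 7]]
```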

Page 20: Parallel Database System: The Future of High Performance Database Systems

Example of Parallelizing Relational Operators

[Figure: relations A and B are each read by a SCAN; the scans feed a JOIN; the join output feeds an INSERT into C]
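One way the SCAN, JOIN, INSERT graph might be parallelized is to split both scanned inputs on the join key, join each pair of partitions at its own site, and insert the results into C. This is a sketch under assumptions: relations are lists of (key, value) pairs with integer keys, all names are hypothetical, and threads stand in for join sites.

```python
from concurrent.futures import ThreadPoolExecutor

def split_by_key(rows, n):
    """Split: route each (key, value) row to one of n join sites by its key."""
    outs = [[] for _ in range(n)]
    for key, value in rows:
        outs[hash(key) % n].append((key, value))
    return outs

def join_site(a_rows, b_rows):
    """Join one partition of A with the matching partition of B."""
    table = {}
    for k, v in a_rows:
        table.setdefault(k, []).append(v)
    return [(k, av, bv) for k, bv in b_rows for av in table.get(k, [])]

def parallel_join(a, b, n=2):
    a_parts, b_parts = split_by_key(a, n), split_by_key(b, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = pool.map(join_site, a_parts, b_parts)
    c = []  # INSERT: collect every site's output into C
    for part in results:
        c.extend(part)
    return c

A = [(1, "a1"), (2, "a2"), (3, "a3")]
B = [(1, "b1"), (3, "b3"), (3, "b3x"), (4, "b4")]
C = parallel_join(A, B)
print(sorted(C))  # [(1, 'a1', 'b1'), (3, 'a3', 'b3'), (3, 'a3', 'b3x')]
```

Because both inputs are split with the same function, rows that can match always meet at the same site, so no site needs data from another.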

Page 21: Parallel Database System: The Future of High Performance Database Systems

Example (contd.)

Page 22: Parallel Database System: The Future of High Performance Database Systems

The State of the Art

Teradata
Tandem NonStop SQL
Gamma
The Super Database Computer
Bubba

Page 23: Parallel Database System: The Future of High Performance Database Systems

Specialized Parallel Relational Operators

Algorithms for traditional relational operators written to improve their parallel execution, to better handle data and execution skew.

Look at join:
Sort merge
Hash join
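For reference, a minimal sequential sketch of the sort-merge join named on this slide, assuming relations are lists of (key, value) pairs. It shows the basic algorithm only and does not include any of the skew handling the slide refers to. The function name and sample data are hypothetical.

```python
def sort_merge_join(a, b):
    """Join two lists of (key, value) pairs on key by sorting and merging."""
    a = sorted(a, key=lambda r: r[0])
    b = sorted(b, key=lambda r: r[0])
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        ka, kb = a[i][0], b[j][0]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of rows with this key on each side.
            i_end = i
            while i_end < len(a) and a[i_end][0] == ka:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][0] == ka:
                j_end += 1
            # Emit every pairing of the two runs.
            for _, av in a[i:i_end]:
                for _, bv in b[j:j_end]:
                    out.append((ka, av, bv))
            i, j = i_end, j_end
    return out

A = [(3, "a3"), (1, "a1"), (2, "a2")]
B = [(3, "b3"), (4, "b4"), (1, "b1"), (3, "b3x")]
joined = sort_merge_join(A, B)
print(joined)  # [(1, 'a1', 'b1'), (3, 'a3', 'b3'), (3, 'a3', 'b3x')]
```

A hash join replaces the two sorts with a hash table built on one input and probed by the other.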

Page 24: Parallel Database System: The Future of High Performance Database Systems

CONCLUSION

Page 25: Parallel Database System: The Future of High Performance Database Systems

THANK YOU

QUESTIONS?