Scalable Distributed Database System: SD-SQL Server
description
Transcript of Scalable Distributed Database System: SD-SQL Server
1
Scalable Distributed Database System: SD-SQL Server
Soror SAHRICeria, Paris-Dauphine University
Journées Académiques MicrosoftParis, April 2006
2
Overview
Objective SD-SQL Server Architecture Application Interface Implementation Performance Analysis Conclusion
3
Objective
Most of DBSs have distributed/parallel versions SQL Server, Oracle, DB2…
DBSs do not provide dynamic partitioning of relational tables All require manual (distributed) repartitioning when tables scale-up
4
Objective
http://ceria.dauphine.fr/CERIA-publications.html Research Report, December 2005
[Oracle Database 10g]
5
Objective
Scalable Distributed Partitioning of Relational Tables
Scalable Distributed Database System (SD-DBS)
Litwin, Rich & Schwartz “Architecture for a scalable Distributed DBSs”
2nd Intl. Workshop on Cooperative Internet Computing, 2002
6
SD-SQL Server
SD-SQL Server is the first prototype SD-DBSGeneralizes SQL Server
Shared Nothing Architecture Embeds SQL Server 2000 (at present) Up to 250 nodes (at present)
7
SD-SQL Server Architecture
LinkedSQL
Servers
D1 D2 Di Di+1
S S PC D1_T
_D1_T _D1_T
User/Application
T
sd_create_table
User/Application
sd_insert
_D1_TSplit
NDBs
SD-SQLserver
SD-SQLserver
SD-SQLclient
SD-SQL Server
Managers
SD-SQLpeer
8
Nodes, NDBs and SDBs
DB1 SDB
Node1 Node2 Node3 Nodei
DB1
……
DB1 DB1
DB2 SDB
DB2 DB2
MDB
9
NDB Types
An SD-SQL Server NDB can be: Client NDB
Carries only images Application interfaces
Server NDB Carries only the segments
Peer NDB Both functions
Primary NDB First created for an SDB
10
Scalable Tables
A scalable (distributed) table is a collection of segments Segments are SQL tables
A scalable table has, initially, only one primary segment At some server or peer NDB
The number of segments in a scalable table is variable Overflowing segments split
11
Splitting
A split occurs when an insert overflows the segment capacity
Splits produce other segments for a scalable table. Each is located at a different NDB
Within the SDB
If there is not enough NDBs, splits dynamically append new ones
12
SDB DB1SDB DB1SDB DB1
Splitting
Scalable Table T
sd_insert
N1 N2 N4N3
NDBDB1
NDBDB1
NDBDB1
sd_insert
NDBDB1
Ni
sd_create_node
sd_insert
N3
NDBDB1
sd_create_node_database
NDBDB1
…….
sd_create_node_database
13
Splitting
S
b+1
S S1
pb+1-p
p=INT(b/2)
C( S)= { c: c < h = c (b+1-p)}
C( S1)={c: c > = c (b+1-p)}
Check Constraint?
b
SELECT TOP Pi * INTO Ni.Si FROM S ORDER BY C ASCSELECT TOP Pi * WITH TIES INTO Ni.S1 FROM S ORDER BY C ASC
14
Images
Images hide the scalable table segments An image is a distributed updateable partitioned
view of a scalable table Union-all view with check constraints
An image presents the scalable table partitioning It do not address any new segments resulted from a
split
15
Images
Primary image Resides at the creation node Has the name of the scalable table
Secondary images Reside at other client or peer NDBs of the SDB Have a specific name, other than that of the
table To avoid name conflict
Scalable Views Carry images at several levels
16
Images
DB1 SDB
N1.DB1 N2.DB1 N3.DB1
T Scalable Table
PrimaryImage
CREATE VIEW T AS SELECT * FROM N1.DB1._Ni_TCREATE VIEW T AS SELECT * FROM N1.DB1._Ni_T UNION ALL SELECT * FROM N2.DB1._Ni_T UNION ALL SELECT * FROM N3.DB1._Ni_T
17
Application Interface
Applications manipulate SD-SQL Server through its command interface.
SD-SQL Server commands start with ‘sd_’ To distinguish from SQL Server commands for static
tables.
INSERT sd_insert
CREATE TABLE sd_create_table
18
Application Interface:The SkyServer Benchmark
Use of the SkyServer DB as benchmark http://research.microsoft.com/~gray/SDSS Use of the PhotoObj table as a scalable table. PhotoObj has 158,426 tuples (about 260 MB)
19
Application Interface: Node Management
Node Creation sd_create_node ‘Dell1’ /* Server by default */ sd_create_node ‘Ceria’, ‘client’
Node Alteration sd_alter_node ‘Ceria’, ‘ADD server’ /* Becomes peer*/
Node Removal sd_drop_node ‘Ceria’
20
Application Interface: NDBs &SDBs Management
SDB Creation sd_create_scalable_database
‘SkyServer’, ‘Dell1’, ‘Server’, 2/* Creates the primary SkyServer NDB as well at Dell1*/
NDB Creation sd_create_node_database
‘SkyServer’, ‘Ceria’, ‘Client’ SDB Removal
sd_drop_scalable_database ‘SkyServer’
21
Application Interface: Scalable Tables Management
Scalable Table Creation sd_create_table ‘PhotoObj (objid BIGINT PRIMARY
KEY…)’, 10000 Scalable Table Alteration
sd_alter_table ‘PhotoObj ADD t INT’, 1000 sd_create_index ‘run_index ON Photoobj (run)’ sd_drop_index ‘PhotoObj.run_index’
Scalable Table Removal sd_drop_table ‘PhotoObj’
22
Application Interface: Scalable Tables Management
Insert sd_insert ‘INTO PhotoObj SELECT * FROM
Ceria5.Skyserver-S.PhotoObj’ Search
sd_select ‘* FROM PhotoObj’ sd_select ‘TOP 5000 * INTO PhotoObj1 FROM
PhotoObj’, 500
23
Application Interface
See “Architecture and Interface of Scalable Distributed Database System SD-SQL Server”, IASTED-DBA 2006
24
Implementation
Commands are distributed stored procedures Processed as distributed transactions at Repeatable
Read isolation level See
“ Prototyping a Scalable Distributed Database System SD-SQL Server”, WDAS 2006
25
Performance Analysis (1)
0
500
1000
1500
2000
39500 79000 158000
Segment Capacity (tuples)
Exe
cuti
on
Tim
e (m
s)
SD-SQL Server Peer With IA SD-SQL Server Peer
SQL Server Peer SD-SQL Server Client With IA
SD-SQL Server Client SQL Server Client
Query (Q1) execution time
(Q1) sd_select ‘COUNT (*) FROM PhotoObj’
26
Performance Analysis (2)
9862.4
11578.6610898
13036.5 14071.66
11712
0
2000
4000
6000
8000
10000
12000
14000
16000
2 3 4
Number of Segments
Execu
tio
n T
ime (
ms)
IA IC
Query (Q2) with image checking only (IC) and with image adjustment (IA)
(Q2) sd_select ‘TOP 10000 x.objid FROM photoobj x, photoobj y WHERE x.obj=y.obj AND x.objid>y.objid’
27
Performance Analysis (3)
sd_select ‘COUNT (*) FROM PhotoObj’
326250
220156
93
343256
226164
106
436
356
93
203283
16 76123
203 220
0100200300400500
1 2 3 4 5
Number of Segments
Exe
cutio
n T
ime
(ms)
SQL Server-Distr SD-SQL Server
SQL Server-Centr. SD-SQL Server LSV
Comparison between SD-SQL Server and SQL Server
28
Conclusion
Scalable distributed partitioning of relational tables is now reality with SD-SQL Server No manual repartitioning
Unlike in any other DBS we know about
Performance analysis proves Efficiency of our design Immediate utility of SD-SQL Server
29
Future Works
Management of replication, data failures.. Non-availability of a server….
Management of the data merging
Application on other DBMSs Oracle, DB2….
30
Work partly supported by
CEE Project EGov MS Research CEE Project ICONS IBM Almaden Res. Cntr., CA HP Laboratories, CA
31
Thank You
Questions?
Soror SAHRICeria, Paris-Dauphine University
Journées Académiques MicrosoftParis, April 2006