DB Korth 6th Edition
-
Upload
surajit153 -
Category
Documents
-
view
239 -
download
1
Transcript of DB Korth 6th Edition
8/19/2019 DB Korth 6th Edition
http://slidepdf.com/reader/full/db-korth-6th-edition 1/6
CH PTER
8
arallel Databases
his chapter is
suitable
for an advaru:Ed course but can also be used for inde
pendent
study proje
cts
by students of a first course. The chapter covers se
veral
aspects
of the de
sign of parallel
database
systems
-
partitioning
of data,. par
allelization
of indhidual
relational operations, and parallelization of relational
expressions.
The
chapter also
br i
e
fly coven some systems
issues, such
as
cache
coherency
and
failure resiliency.
The most important
applications
of parallel
databases
today re for v..•are-
housing and analyzing large amountsof
data.
Therefore partitioning of d
ata a.. 'ld
parallelquery proc
essing
are co .e:red in significant detail. Query optimiz.ation is
also
of
importance, for
the
same reason.
Ho
•
ever
, parallel query optim.iz.ation
is
still
not a
fully solved problem
; exhaustive search,. as is
used for
sequential
query
optimization is too
expensive in a
parallel
system, forcing
the
use of heuristics..
The
description
of parallel query processing algorithms is based O.. \
the
shared-nothing
mode
l.
Students
may be
asked
to
study
h0\'1'
the
algorithms
can
be: improved if
shared-memory
machines are
used
instead.
13.9 For each
of
the three partitioning techniques, name
ly
roW'ld-robin,. hash
partitioning,
and range
partitioning, gh:e
anexample of
a
query
for
-'hich
that
partitioning
technique , . ould provide
the
fastest response.
Am;wu
R
ound
robin partitioning:
\<\1hen relations
a.re
large and queries
read
entire relations, round
robin
gives
good speed-up
and fast response time.
Hash partitioning
For point
queries
on
the partitioning
attributes
,
this
gives
the
fastest
response, as
each
disk can process a diffemt query simultaneously.
I f
he hash
partitioning
is uniform, entire relation scans can be per
formed
efficiently.
8/19/2019 DB Korth 6th Edition
http://slidepdf.com/reader/full/db-korth-6th-edition 2/6
Ran
ge
partitioning
For
rmge
queries
on the
p
artitioning
a
ttribut
es
,.
, ..hkh
acca1
1 fC \\• tuple1, nnge: partitioning gh·es
the
fastest
re-
•
ponM
.
U . U
\.Vhat faeton
could
raul
t
i n ' '
"
Y
lh n
a relation s parti
ti
oned on
o.. rte
of
it&
attrlbut.ab
y:
. . Huh
partit <mlng?
b. R.angc partit <mlng?
In each ease, ""hat
can be
done to red.Utt the sk.n-?
A
H C L
. . ~ a r t l d o n i n g
Too
man
y rtt0rdl ith
tht u . m t
,.alu.e for thehashing attribute
,.
ora
poo
d r choocn huh function without the properties of randomness
one unlfonnlty, can ...Wt in
• s bw
ed
partition.
To impron the
oitua
dcn,, w e
ohoWd c:xpaiment
with
better
has Ung functions
for
th.at
relation.
b. IW>s«-p&rt doning:
Non-unifonn diltribution of 1"&lun for
the
partitioning a.ttribute
(including dupliate
1"
&lua for the partitioning attribute) ,,+uch
ar
e
not taken
into
account by 1 Nd
partiti.on ng
\.-ector is the main
rcuon for
llccwed
partitions.
Sorting the
relatio.
l
an he
partitioning
attribute and
then
dhi.dirlg i t into n rang e:s i th
equal
number of
tupla per range " 1ll gi1•e 1 g ood putitioning 1·ec
to
r 1vith \ a )
lO'\\
·
&biv.
U.U. Give: an ex.ample of 1 join
that
i1
not
a simple equi-join
for
""hich pa.rti
tione
d
puAilclll.n'I. can
be used.
\t\'Nt
1.ttributes sho
ul
d be used for parti
tioning?
A.n.wa:
\o\'e gh·t: tv.·o eampl1:1 of
such
joins.
a...
r t . A ) - ( t t. ) l
Here: l \ I: N.vc &n cqui1oincond iti
on \\•
hi
ch can
be execu
ted
first, a.nd
th
e
extr
1 conditions
can
be checked
in
d
ependentl
y
on
each
tu
ple
in
the join raul t. Pa.rtitioned pa-rallelism is useful
to
execute
the
equi
Joln,
b. r °'(r..<l:(,"H/»,J-'O)*""«><·IJ'OJ)• l)•»l s
Thi.I
i i a
query
in 1..-hich an r tuple and an s
tuple
join v.i th each
otMr 1f
they
fall
into
the 1arn1 range of \•alues. Hence
partition
ed
puallclltm
applies natuB.lly
to
this
sc
enario, e\"en
though
th
e join
is
l10t an cqul·joln.
F
or
both the
qucrln
, r ahould
be
partitioned on
attrib
u te
A a.."ld
s
on
a ttribute
o
For
the
accond
query,
the putitioning ; should actually be
do
r>« an (l; .8/
20J) •
20
.
8/19/2019 DB Korth 6th Edition
http://slidepdf.com/reader/full/db-korth-6th-edition 3/6
- u
1.8.12 Describe a good ·ay to pa.ral.lehz.e each of the follo·wing:
a. The difference o
peration
b.
.4.ggrega tion
by
the countoperation
c. .4.gg:rega tion
by
the countdistiod:o
pera
tion
d . A
ggre
ga tion
by
the a.vgopera tion
e.
left
outer join.. i the join
conditi
on in
volv
es only equality
f. Left
outer join,. i he join
conditi
on involves
comparisons
other than
equali
ty
g. Full out
er jo
in,.
i h
e join condition
in
v
ol
v
es
comparisons
other
than
equality
Amiwu:
a. \.Ve
can
pa...allelize the difference operation
by parti
tioning the rela
tions
on
all the attributes . an
d
then co
mputing differences loc
ally
at
each
processor .
•
5 in aggregation,
th
e cost o f transferring
tu
pl
es
dur
ing
parti
ti
oning
can
be
red
u
ced
by
pa
rtially
computin
g differences
at
each
processor
,. befor
e p
arti
ti
onin
g.
b.
Let
us refer to the group-by a
ttrib
ute as attribute
A, an
d the attribu te
on ' ·
hich
the
ag
gregation functi
on opera
t
es
; as a ttri
bu
te G.
co
u:ntis
performed
just
like sum. mentioned in he book) except that, a
count
of
the
number of values of a ttribute B for each value o f attribute A
is transferred to the correct destina
ti
on
pro
cessor,
ins
t
ead
of
a
sum.
Aft
er
partiti
onin
g,
th
e
partia
l c
ounts
from
all
the
pr
oce
ssors are
added up
locally at
each
processor to
ge
t the final re s
ul
t.
c. For this . partial counts cannot e
comp
ut
ed
locally before p
arti
tion
ing. Each processor ins.tead transfers all
uni
que Ov
alues
for each A
value to the correct destination processor.
After
partitio
nin
g, each
processor locally counts the number of un ique tuples for each value
o f
A and
then ou tpu ts the final
tt:S-
ult.
d .
This
can
again
be
implemented
lik
e
i ;um
exce
pt
tha
t
for each
v
alu
e
o f A a
ISUDl
of the 9
,-
alues as ,.,·ell as a
coont
of he n
umber of tu
pl
es
in the gro
up
. is transferred
during
partitio.-tlng. Then each processor
o u tpu ts i ts local result .
by
dh..-iding the total sum
by
total numb
er
of
tuples for each A value assigned to its partition.
e . This c
an
be performed just like partition
ed
natural jo
in
.
par
titio
nin
g, each processor
compu
tes the left ou ter jo
in
locally using
any of the strategies of Chapter 1
2.
f. The left outer join can be c
om
puted using a. \ exte. \Sion of the
a g m e n t ~ d e a t e
scheme
to compute non equi-jo ins. Con
sider r s The relations. a.re partition
ed
, and r
bd
s
is
c
om
puted at
8/19/2019 DB Korth 6th Edition
http://slidepdf.com/reader/full/db-korth-6th-edition 4/6
each site.
\>\i
e also collect tuples from r that did not match any tuples
from
s;
c
all
the
set of these danglin
g tu
ples at
s
it
e
i
as
11• • dter
the
abo
ve
step
is
d
on
e
at each
si
te, for
each
fr
agment
of
r
\V
e take the
intersecti
on
of the 1//s from every
pr
ocessor in
\Vhich
the &
agment
of r \-\'as replicated. The intersections give the real set of
dang
ling
tuples
; these tupl
es
a.re
padded ,.,.-}th
nulls an
d added
to the resull
The intersections themseh·es, followed by additi
on
ofpadded tupl
es
to the result,
can
be done
in par
allel by parti
ti
oning.
g. The algorithm is basically the same as abo, e, except that '' hen com
bining results,
th
e processing of dangling tuples must
done
for both
relations.
18.ts escribe
the benefits
and dr
a
v.-b
acks of pipelined parallelism.
An 5wa:
•
Bendits:..'\i
o
need
to
\\'lite in
termediate
relations
to
disk onl
y to
read
them back immediately.
•
Dr awbada;:
a.
Canno
t
take
ad
v
antage
of high d egrees
of
p
ara
llelism,
as
queries do not have large number of operations.
b. Not possible to pipeline o
pera
to
rs
v.·hi
ch
ne
ed
to look
at
all the
inpu
t before producing any ou tput.
c. Since each op
eration
ex
ecu
t
es on
a single
r o c e ~ o r
the
most -
pens
iY
e
ones take
a lo
ng tim
e to
finish.Thus
speed-up v.ill
be lo\V
despite the use of parallelism.
13.14
Suppose
yo
u
\ \ 1
sh
to
han
dl
e a \V
orkloa
d consisting
of
a large
number
of
small r a n . . ~ c t i o n s by using shared-nothing parallelism.
a.. Is in
traqu ery parallelism
required in
such a
si
tu
ation
?
I f
not,
''hy,
and
\Vhat form of
parall
elism is
appr
opriate?
b.
What fo
rm of
skel. · \vould be
of si
gnificance
\\rith
su
ch
a \\'orl<load?
c. Suppose
most
transac
ti
ons accessed
on
e
ua vunl
record, ,
.hi.ch in
-
cl
udes u a v u ~ 1 t y p . . ~ a t t r i b u t e andanassociatednavunl typeJJJtr.ilt.r
record,
'·hich pr
°
-i
des
info
rmation about the acco
unt
ty
pe
. Ho ''
,
.
ould you parti
tion
and/ or replic
ate
data to speed up transac
tions
?
You may
assume
that the
11rro1111 iypeJJu1slt:r
relation is rarely u
p
da ted.
An 5wa:
a..
Intra.query
par
allelism
is probably
not
appr
opriate for
this si
tu
at i
on.
Since each indh-idual transaction is small, the overhead of paral-
lelizing each query may exceed the
po
tential benefits. Interquery
parallelism ·ould be a be
tter
choice, allo,.,.-ing many transactions to
run
in parallel
8/19/2019 DB Korth 6th Edition
http://slidepdf.com/reader/full/db-korth-6th-edition 5/6
b.
Partiti
on
s
ke\.\• can
be a
performance issue
in
th i
s type of syst
em
,.
especially \.\-ith
th
e use ofshared-nothing parallelism.. A load imbal
ance
amongst
the
processors
of h
e
dis
tr i
bu
t
ed
syst
em
can
si
gnificant
reduce
the
speedup gained by parallel execution. F
or
exam ple
,.
if
all
transactio
ns
happen to
inv
olve o
nly the
data in a
sin
gle partiti
on
,.
the
processors not associated
\\ith tha
t partiti
on
.\rill
no
t be used at
a
ll
.
c. Since n ro
l f
I yp .1111J /1.r is rare
ly
updated,.
i t can be
replica ted
in
entirety across all nod
es
. I f he nlrvlf11lrela
tion is
updated
frequently
an
d accesses are , .. ell
-disbibu
t
ed,.
t
sh
ould be partitioned across
n
odes
.
1.&.15 The attribu te on \.\·
hich
a relation is
partition
ed can
ha
ve a significant
impact on the cost o f a query.
a. Givena \\'orkload of
SQ
L
queries on
a single relatio. l. \\•hata
ttri
but
es
\.\•ould be candidates for partitio ning?
b. Ho
v
\vould you choose behv
een th
e altem.ati:ve partitioning tech
ni
qu
es
,
based
on th
e
l\ or
ldoad
?
c. Is t possible
to parti
tion a relation on
more than
one a ttribu te?
Expl
ain
your ans
\\·er
.
a. The candidate a
ttrib
utes ' o
ul
d be
i
Attri
butes on
vhich
o
ne
or more queries
has
a selection
con
d ition.
The
correspon
ding
selecti
on
con
diti
on
can
then
be
ev
al
ua ted at a single processor, instead of
be
ing e .aluated at all
processors..
ii Attri
bu
t
es
involved
in
join conditions.
If
such an a
ttri
bu
te is
used for partitioning, it is
possib
le to perform
th
e join l\-ithou t
repartitioning
th
e relation. This effect is particularly beneficial
for , ·e ry
large
relatio. '\S, for ' '•hich repa.rtitio. '\ing can be
\ ery
expensive.
i i i.
Attri
bu
t
es
inv
oh·
ed
n
r o u
cl
auses
;
similar
to
joins
, it
is
possible
to
perf
orm
aggregation l \-ithout repartitioning
th
e cor
responding relation.
b. A.cos
t-based approach \\
o
des best
n choosing
beh veen altemat
h·
es.
In this
ap
proach,
candida
te
pa
rtitioning choices are genera ted
,.
and
for eac..li.
candidate
the
cos
t of
executin
g
all
the
queries
/ updates in a
l\
orldoad is estimated. The cho ice leading
to
the least cost is picked.
One
is
sue is
tha
t the number of candidate choices is generally ery
large. Al
gori
thms and heuristics d esigned to limit
th
e number
of
ca.. \did
ates
for
' 'hi.ch
costs need
to
be es
timated
are \\•id
el...-
used
in
practice. ·
8/19/2019 DB Korth 6th Edition
http://slidepdf.com/reader/full/db-korth-6th-edition 6/6
Another Luuc
Lt
that
the
,.,,.orldoad
may
have a very large number
of
qucria/ updata. Techniqun to reduce
this number include
the
o ~ i n g
(a) combining repeated occurrences
of
a
query
that
o Uy
differ ln
conatantt
, rtpl.acing
themby
one parametrized
query
along
' -ith
a count ol nw:nber o occurrences and (b) dropping queries
'
·hkh arc
V'U). cheap
ln) ·ay, or
not lilc ely
to be affected
by the
p.utitionlng choke.
c. lt
i pou.U>lc
to
putition a reli ion on more th.an one attribute,. n
two •Y ·
One
Lt to im'Oh•c
multiple
ilttnDutes in a si. lgle composite
putidoningby
.
The
other
1\ ay
is to k.eepmore
tha- l
one copyof
the
. . . .
moder\, partitioned ;,,different
w•ys.
The
latter 4
pproach
s
a.a..
update cools, but
a n
speed up somequems signl.fiGL'llly.