8/12/2019 09 Parallel II 11 02 Ann
Chapter 9: Parallel Algorithms
Algorithm Theory, WS 2013/14
Fabian Kuhn
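Parallel Computations
$T_p$: time to perform the computation with $p$ processors
$T_1$: work (total number of operations)
  Time when doing the computation sequentially
$T_\infty$: critical path / span
  Time when parallelizing as much as possible
Lower bounds: $T_p \ge T_1 / p$, $T_p \ge T_\infty$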
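Brent's Theorem
Brent's Theorem: On $p$ processors, a parallel computation can be performed in time
  $T_p \le T_1 / p + T_\infty$.
Corollary: Greedy is a 2-approximation algorithm for scheduling (both $T_1 / p$ and $T_\infty$ are lower bounds on the optimal time).
Corollary: As long as the number of processors $p = O(T_1 / T_\infty)$, it is possible to achieve a linear speed-up.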
PRAM
Back to the PRAM:
  Shared random access memory, synchronous computation steps
The PRAM model comes in variants:
EREW (exclusive read, exclusive write):
  Concurrent memory access by multiple processors is not allowed
  If two or more processors try to read from or write to the same memory cell concurrently, the behavior is not specified
CREW (concurrent read, exclusive write):
  Reading the same memory cell concurrently is OK
  Two concurrent writes to the same cell lead to unspecified behavior
  This is the first variant that was considered (already in the 70s)
PRAM
The PRAM model comes in variants:
CRCW (concurrent read, concurrent write):
  Concurrent reads and writes are both OK
  Behavior of concurrent writes has to be specified
    Weak CRCW: concurrent write only OK if all processors write 0
    Common-mode CRCW: all processors need to write the same value
    Arbitrary-winner CRCW: adversary picks one of the values
    Priority CRCW: value of processor with highest ID is written
    Strong CRCW: largest (or smallest) value is written
The given models are ordered in strength:
  weak $\le$ common-mode $\le$ arbitrary-winner $\le$ priority $\le$ strong
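To make the five write rules concrete, here is a small sketch that decides what ends up in one memory cell after a single step. The function name, the mode strings, and the (processor ID, value) representation are assumptions made for illustration.

```python
def resolve_write(mode, writes):
    """Value stored after several processors write one cell in the same step.

    writes: non-empty list of (processor_id, value) pairs for that cell.
    Raises ValueError if the step is not allowed in the given model.
    """
    values = [v for _, v in writes]
    if mode == "weak":
        if len(writes) > 1 and any(v != 0 for v in values):
            raise ValueError("weak CRCW: concurrent writers must all write 0")
        return values[0]
    if mode == "common":
        if len(set(values)) > 1:
            raise ValueError("common-mode CRCW: written values differ")
        return values[0]
    if mode == "arbitrary":
        # Any written value is a legal outcome; an adversary may pick
        # a different one, so algorithms must not rely on this choice.
        return values[0]
    if mode == "priority":
        return max(writes, key=lambda w: w[0])[1]  # highest processor ID
    if mode == "strong":
        return max(values)
    raise ValueError(f"unknown mode: {mode}")
```

For example, with writes `[(0, 5), (1, 9), (2, 7)]` the priority rule stores 7 and the strong rule stores 9, while the weak and common-mode rules reject the step.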
Some Relations Between PRAM Models
Theorem: A parallel computation that can be performed in time $t$, using $p$ processors on a strong CRCW machine, can also be performed in time $O(t \log p)$ using $p$ processors on an EREW machine.
  Each (parallel) step on the CRCW machine can be simulated by $O(\log p)$ steps on an EREW machine
Theorem: A parallel computation that can be performed in time $t$, using $p$ probabilistic processors on a strong CRCW machine, can also be performed in expected time $O(t \log p)$ using $O(p / \log p)$ processors on an arbitrary-winner CRCW machine.
  The same simulation turns out to be more efficient in this case
Some Relations Between PRAM Models
Theorem: A computation that can be performed in time $t$, using $p$ processors on a strong CRCW machine, can also be performed in time $O(t)$ using $O(p^2)$ processors on a weak CRCW machine.
Proof: Strong: largest value wins; weak: only concurrently writing 0 is OK
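The proof hint can be turned into a short simulation. This is a sketch of one standard way to do it, not necessarily the construction used in the lecture: one virtual processor per ordered pair (i, j) knocks out processor i if processor j writes a larger value to the same cell. The function name and the request format are assumptions.

```python
def strong_write_on_weak(requests):
    """Simulate one strong-CRCW write step with weak-CRCW writes only.

    requests: list of (address, value), one entry per processor.
    Returns a dict address -> stored value (largest value wins).
    """
    p = len(requests)
    wins = [1] * p  # one flag cell per processor, set in parallel
    # Step 1: p*p virtual processors, one per ordered pair (i, j).
    for i in range(p):
        for j in range(p):
            ai, vi = requests[i]
            aj, vj = requests[j]
            # j beats i on the same cell; ties are broken by processor index.
            if ai == aj and (vj, j) > (vi, i):
                wins[i] = 0  # concurrent writes, but every writer writes 0
    # Step 2: exactly one winner per address remains, so writes are exclusive.
    memory = {}
    for i in range(p):
        if wins[i]:
            assert requests[i][0] not in memory
            memory[requests[i][0]] = requests[i][1]
    return memory
```

Both steps take constant time on the weak machine, which matches the $O(t)$ time and quadratic processor count in the theorem.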
Computing the Maximum
Theorem: If each value can be represented using $O(\log n)$ bits, the maximum of $n$ (integer) values can be computed in time $O(1)$ using $O(n)$ processors on a weak CRCW machine.
Proof:
  First look at the $\frac{\log_2 n}{2}$ highest order bits
  The maximum value also has the maximum among those bits
  There are only $\sqrt{n}$ possibilities for these bits
  The max. of the $\frac{\log_2 n}{2}$ highest order bits can be computed in $O(1)$ time
  For those with the largest $\frac{\log_2 n}{2}$ highest order bits, continue with the next block of $\frac{\log_2 n}{2}$ bits, ...
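The block-by-block idea of the proof can be sketched sequentially as follows. The function name and parameters are assumptions; the call to `max(blocks)` stands in for the constant-time parallel step, which is possible because a block of width w bits has only 2**w possible values.

```python
def max_by_bit_blocks(values, total_bits, block_bits):
    """Maximum of a non-empty list of integers in [0, 2**total_bits).

    The bits are examined in blocks of block_bits, highest block first.
    """
    candidates = list(values)
    shift = total_bits
    while shift > 0:
        width = min(block_bits, shift)
        shift -= width
        mask = (1 << width) - 1
        # Current block of bits for every remaining candidate.
        blocks = [(v >> shift) & mask for v in candidates]
        best = max(blocks)  # few possible values, so O(1) on the PRAM
        # Only values whose block equals the maximum can still be the maximum.
        candidates = [v for v, b in zip(candidates, blocks) if b == best]
    return candidates[0]
```

The loop runs ceil(total_bits / block_bits) times, which is a constant when the values have $O(\log n)$ bits and each block has about half of $\log_2 n$ bits.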
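Prefix Sums
The following works for any associative binary operator $\oplus$:
  associativity: $(a \oplus b) \oplus c = a \oplus (b \oplus c)$
All-Prefix-Sums: Given a sequence of $n$ values $a_1, \dots, a_n$, the all-prefix-sums operation w.r.t. $\oplus$ returns the sequence of prefix sums:
  $s_1, s_2, \dots, s_n = a_1,\ a_1 \oplus a_2,\ a_1 \oplus a_2 \oplus a_3,\ \dots,\ a_1 \oplus \dots \oplus a_n$
Can be computed efficiently in parallel and turns out to be an important building block for designing parallel algorithms
Example: Operator: $+$, input: $a_1, \dots, a_8 = 3, 1, 7, 0, 4, 1, 6, 3$
  $s_1, \dots, s_8 = 3, 4, 11, 11, 15, 16, 22, 25$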
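As a sequential reference for the definition, Python's `itertools.accumulate` computes exactly the all-prefix-sums sequence for a given binary operator. The second call uses `max` as an example of another associative operator; that choice is ours, not the slide's.

```python
from itertools import accumulate
import operator

a = [3, 1, 7, 0, 4, 1, 6, 3]

# Operator +: the prefix sums of the example input.
prefix_plus = list(accumulate(a, operator.add))
print(prefix_plus)   # [3, 4, 11, 11, 15, 16, 22, 25]

# Any associative operator works, for example max (running maximum).
prefix_max = list(accumulate(a, max))
print(prefix_max)    # [3, 3, 7, 7, 7, 7, 7, 7]
```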
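Computing the Sum
Let's first look at $s_n = a_1 \oplus a_2 \oplus \dots \oplus a_n$
Parallelize using a binary tree: the leaves hold $a_1, \dots, a_n$ and each inner node combines the results of its two children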
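Computing the Sum
Lemma: The sum $s_n = a_1 \oplus a_2 \oplus \dots \oplus a_n$ can be computed in time $O(\log n)$ on an EREW PRAM. The total number of operations (total work) is $O(n)$.
Proof (sketch): The binary tree has depth $\lceil \log_2 n \rceil$ and $n - 1$ inner nodes. All nodes of one level are evaluated in one parallel step, and no memory cell is accessed by two processors in the same step.
Corollary: The sum $s_n$ can be computed in time $O(\log n)$ using $O(n / \log n)$ processors on an EREW PRAM.
Proof: Follows from Brent's theorem ($T_1 = O(n)$, $T_\infty = O(\log n)$)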
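The tree-based sum can be simulated round by round to count the parallel time and the total work. This is a minimal sketch; the function name and the returned triple are assumptions for illustration.

```python
def tree_sum(values, op=lambda x, y: x + y):
    """Combine a non-empty list pairwise, level by level.

    Returns (result, rounds, operations): rounds is the parallel time,
    operations is the total work.
    """
    level = list(values)
    rounds = ops = 0
    while len(level) > 1:
        # One parallel step: neighbours (0,1), (2,3), ... are combined.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        ops += len(nxt)
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired last element moves up unchanged
        level = nxt
        rounds += 1
    return level[0], rounds, ops

print(tree_sum([3, 1, 7, 0, 4, 1, 6, 3]))   # (25, 3, 7)
```

For 8 values this takes 3 rounds and 7 operations, in line with $O(\log n)$ time and $O(n)$ work. Neighbours are always combined in their original order, so the sketch also works for associative operators that are not commutative.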
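Computing The Prefix Sums
For each node $v$ of the binary tree, define $r(v)$ as follows:
  $r(v)$ is the sum of the values $a_i$ at the leaves in all the left sub-trees of ancestors $u$ of $v$ such that $v$ is in the right sub-tree of $u$.
For a leaf node $v$ holding value $a_i$: $s_i = r(v) + a_i$
For the root node: $r(\mathrm{root}) = 0$
For all other nodes $v$:
  $v$ is the left child of $u$: $r(v) = r(u)$
  $v$ is the right child of $u$: $r(v) = r(u) + S$
    ($u$ has left child $w$; $S$: sum of values in sub-tree of $w$)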
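Computing The Prefix Sums
Leaf node $v$ holding value $a_i$: $s_i = r(v) + a_i$
Root node: $r(\mathrm{root}) = 0$
Node $v$ is the left child of $u$: $r(v) = r(u)$
Node $v$ is the right child of $u$: $r(v) = r(u) + S$
  Where: $S$ = sum of values in left sub-tree of $u$
Algorithm to compute values $r(v)$:
1. Compute sum of values in each sub-tree (bottom-up)
   Can be done in parallel time $O(\log n)$ with $O(n)$ total work
2. Compute values $r(v)$ top-down from root to leaves:
   To compute the value $r(v)$, only $r(u)$ of the parent $u$ and the sum of the left sibling (if $v$ is a right child) are needed
   Can be done in parallel time $O(\log n)$ with $O(n)$ total work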
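The two passes can be sketched sequentially as follows, assuming the input length is a power of two and the tree is stored in an array where node k has children 2k and 2k+1. The array names S (sub-tree sums) and r (sum of everything to the left of a node's sub-tree), the `identity` parameter, and the heap-style layout are assumptions made for this illustration.

```python
def prefix_sums_tree(a, op=lambda x, y: x + y, identity=0):
    """All prefix sums via a bottom-up and a top-down pass over a binary tree.

    Assumes len(a) is a power of two. The leaves are nodes n .. 2n-1.
    """
    n = len(a)
    S = [identity] * (2 * n)       # S[k]: sum of the leaves below node k
    S[n:] = a
    for k in range(n - 1, 0, -1):  # bottom-up; one tree level = one parallel step
        S[k] = op(S[2 * k], S[2 * k + 1])
    r = [identity] * (2 * n)       # r[1] = identity at the root
    for k in range(2, 2 * n):      # top-down; one tree level = one parallel step
        parent = k // 2
        if k % 2 == 0:
            r[k] = r[parent]                 # left child: same offset as parent
        else:
            r[k] = op(r[parent], S[k - 1])   # right child: add left sibling's sum
    return [op(r[n + i], a[i]) for i in range(n)]

print(prefix_sums_tree([3, 1, 7, 0, 4, 1, 6, 3]))
# [3, 4, 11, 11, 15, 16, 22, 25]
```

Each node does a constant amount of work in each pass, so the total work is linear, and the nodes of one level are independent of each other.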
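Computing Prefix Sums
Theorem: Given a sequence $a_1, \dots, a_n$ of $n$ values, all prefix sums $s_i = a_1 \oplus \dots \oplus a_i$ (for $1 \le i \le n$) can be computed in time $O(\log n)$ using $O(n / \log n)$ processors on an EREW PRAM.
Proof:
  Computing the sums of all sub-trees can be done in parallel in time $O(\log n)$ using $O(n)$ total operations.
  The same is true for the top-down step to compute the per-node values $r(v)$.
  The theorem then follows from Brent's theorem: $T_1 = O(n)$, $T_\infty = O(\log n)$
Remark: This can be adapted to other parallel models and to different ways of storing the value (e.g., array or list)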
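Spelled out, the last step of the proof is the following substitution into Brent's bound, written here in the form $T_p \le T_1 / p + T_\infty$:

```latex
\begin{align*}
T_p &\le \frac{T_1}{p} + T_\infty
      = O\!\left(\frac{n}{p} + \log n\right), \\
p = \frac{n}{\log n}
    &\;\Longrightarrow\;
T_p = O\!\left(\frac{n}{n / \log n} + \log n\right) = O(\log n).
\end{align*}
```

Using more than $n / \log n$ processors does not help asymptotically, because the $\log n$ term from the span then dominates.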
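Applying to Quicksort
Theorem: On an EREW PRAM, using $p$ processors, randomized quicksort can be executed in time $T_p$ (in expectation and with high probability), where
  $T_p = O\left(\frac{n \log n}{p} + \log^2 n\right)$.
Proof (sketch): With high probability the recursion has depth $O(\log n)$; on each level, all partitions together can be computed with prefix sums in time $O(n / p + \log n)$.
Remark: We get optimal (linear) speed-up w.r.t. the sequential algorithm for all $p = O(n / \log n)$.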
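The slides' proof did not survive extraction, so the following only illustrates the usual building block: partitioning around a pivot by computing prefix sums over 0/1 flags, which gives every element its target position without any coordination between elements. The function name and the three-way split are assumptions, and `accumulate` stands in for the parallel prefix-sums step.

```python
from itertools import accumulate

def partition_with_prefix_sums(a, pivot):
    """Stable split of a into (< pivot, == pivot, > pivot) via prefix sums.

    Returns (out, n_less, n_equal).
    """
    less = [1 if x < pivot else 0 for x in a]      # one parallel step each
    equal = [1 if x == pivot else 0 for x in a]
    greater = [1 if x > pivot else 0 for x in a]
    # Inclusive prefix sums of the flags: the O(log n) step on the PRAM.
    pl, pe, pg = (list(accumulate(f)) for f in (less, equal, greater))
    n_less = pl[-1] if a else 0
    n_equal = pe[-1] if a else 0
    out = [None] * len(a)
    for i, x in enumerate(a):   # every element writes to its own distinct cell
        if less[i]:
            out[pl[i] - 1] = x
        elif equal[i]:
            out[n_less + pe[i] - 1] = x
        else:
            out[n_less + n_equal + pg[i] - 1] = x
    return out, n_less, n_equal

print(partition_with_prefix_sums([3, 1, 7, 0, 4, 1, 6, 3], 3))
# ([1, 0, 1, 3, 3, 7, 4, 6], 3, 2)
```

Quicksort then recurses on the first `n_less` entries and on the entries after the pivot block.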
Other Applications of Prefix Sums
Prefix sums are a very powerful primitive for designing parallel algorithms.
  Particularly also by using operators other than +
Example Applications:
  Lexical comparison of strings
  Add multi-precision numbers
  Evaluate polynomials
  Solve recurrences
  Radix sort / quick sort
  Search for regular expressions
  Implement some tree operations
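As one example of an operator other than +, a first-order linear recurrence $x_i = a_i x_{i-1} + b_i$ can be solved with prefix sums over affine maps, because composing such maps is associative. This is a sketch under our own naming; `accumulate` is the sequential stand-in for a parallel prefix-sums routine.

```python
from itertools import accumulate

def compose(f, g):
    """Affine maps x -> a*x + b stored as (a, b): apply f first, then g."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)

def solve_recurrence(coeffs, x0):
    """Values x_1..x_n of x_i = a_i * x_(i-1) + b_i for coeffs = [(a_i, b_i)]."""
    prefixes = accumulate(coeffs, compose)  # prefix "sums" under composition
    return [a * x0 + b for a, b in prefixes]

print(solve_recurrence([(2, 1), (3, 0), (1, 5)], x0=1))   # [3, 9, 14]
```

The i-th prefix is the single affine map that takes $x_0$ directly to $x_i$, so all $x_i$ can be evaluated independently once the prefixes are known.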