
Transcript of: Laboratoire Informatique de Grenoble, "Parallel algorithms and scheduling: adaptive parallel programming and applications".

Page 1: Laboratoire Informatique de Grenoble. Parallel algorithms and scheduling: adaptive parallel programming and applications. Bruno Raffin, Jean-Louis Roch, Denis Trystram.

Laboratoire Informatique de Grenoble

Parallel algorithms and scheduling: adaptive parallel programming and applications

Bruno Raffin, Jean-Louis Roch, Denis Trystram Projet MOAIS [ INRIA / CNRS / INPG / UJF ]

moais.imag.fr

Page 2:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation [Bruno/Luciano]
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
VIII. Adaptation to support fault tolerance by work-stealing
Conclusion

Page 3:

Why adaptive algorithms and how?

(Illustration: a 3×3 triangular matrix as example input
  ⎡ 7 3 6 ⎤
  ⎢ 0 1 8 ⎥
  ⎣ 0 0 5 ⎦ )

Input data vary. Resource availability is versatile.

Adaptation to improve performance

Scheduling: partitioning, load-balancing, work-stealing

Measures on resources. Measures on data.

Calibration: tuning parameters (block size w.r.t. cache, choice of instructions, …), priority management

Choices in the algorithm: sequential / parallel(s), approximated / exact, in memory / out of core, …

An algorithm is « hybrid » iff there is a high-level choice between at least two algorithms, each of which could solve the same problem.

Page 4:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation [Bruno/Luciano]
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
VIII. Adaptation to support fault tolerance by work-stealing
Conclusion

Page 5:

Modeling a hybrid algorithm

• Several algorithms solve the same problem f:
  – e.g. algo_f1, algo_f2(block size), …, algo_fk
  – each algo_fi possibly recursive

Adaptation: choose algo_fj for each call to f

algo_fi(n, …) {
  …
  f(n - 1, …);
  …
  f(n / 2, …);
  …
}

E.g. « practical » hybrids:
• Atlas, Goto, FFPack
• FFTW
• cache-oblivious B-trees
• any parallel program with scheduling support: Cilk, Athapascan/Kaapi, Nesl, TLib, …
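A minimal sketch of such a hybrid (illustrative names, not from the talk): two algorithms solve the same problem, summing n values, and a high-level choice between them is made at each recursive call. The block-size threshold is the kind of parameter a tuned hybrid such as Atlas calibrates offline.

```cpp
#include <cstddef>

// algo_f1: plain iterative summation.
long sum_iterative(const long* a, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// algo_f2: recursive splitting, with a high-level choice at each call,
// mirroring the algo_fi(n, ...) scheme above.
long sum_recursive(const long* a, std::size_t n, std::size_t block) {
    if (n <= block)                        // choice: switch algorithm
        return sum_iterative(a, n);
    std::size_t half = n / 2;
    return sum_recursive(a, half, block)
         + sum_recursive(a + half, n - half, block);
}
```

The choice point (the `n <= block` test) is exactly where a tuned hybrid plugs in its calibrated parameter, and where an adaptive hybrid would instead consult run-time information.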

Page 6:

• How to manage the overhead due to choices?
• Classification 1/2:
  – Simple hybrid iff O(1) choices [e.g. block size in Atlas, …]
  – Baroque hybrid iff an unbounded number of choices [e.g. recursive splitting factors in FFTW]
• Choices are either dynamic or pre-computed based on input properties.

Page 7:

• Choices may or may not be based on architecture parameters.
• Classification 2/2: a hybrid is
  – Oblivious: control flow depends neither on static properties of the resources nor on the input
    [e.g. cache-oblivious algorithms [Bender]]
  – Tuned: strategic choices are based on static parameters [e.g. block size w.r.t. cache, granularity, …]
    • Engineered-tuned or self-tuned [e.g. ATLAS and GOTO libraries, FFTW, …] [e.g. LinBox/FFLAS [Saunders et al.]]
  – Adaptive: dynamic self-configuration of the algorithm
    • based on input properties or resource circumstances discovered at run-time
      [e.g. idle processors, data properties, …] [e.g. TLib [Rauber & Rünger]]

Page 8:

Examples

• BLAS libraries
  – Atlas: simple tuned (self-tuned)
  – Goto: simple engineered (engineered-tuned)
  – LinBox / FFLAS: simple self-tuned, adaptive [Saunders et al.]
• FFTW
  – Halving factor: baroque tuned
  – Stopping criterion: simple tuned

Page 9:

Dynamic architectures: non-fixed number of resources, variable speeds. E.g. grids, … but not only: an SMP server in multi-user mode.

Adaptation in parallel algorithms. Problem: compute f(a).

Candidate algorithms: sequential, parallel with P=2, parallel with P=100, parallel with P=max, …
Target platforms: multi-user SMP server, grid, heterogeneous network.
Which algorithm to choose?

Page 10:

Dynamic architectures: non-fixed number of resources, variable speeds. E.g. grids, … but not only: an SMP server in multi-user mode.

=> motivates « processor-oblivious » parallel algorithms that:
+ are independent from the underlying architecture: no reference to p, nor to Πi(t) = speed of processor i at time t, nor …
+ on a given architecture, have performance guarantees: behave as well as an optimal (off-line, non-oblivious) algorithm

Processor-oblivious algorithms

Page 11:

How to adapt the application?

• By minimizing communications
  • e.g. amortizing synchronizations in the simulation [Beaumont, Daoudi, Maillard, Manneback, Roch - PMAA 2004] => adaptive granularity
• By controlling latency (interactivity constraints)
  • FlowVR [Allard, Menier, Raffin] => overhead
• By managing node failures and resilience [checkpoint/restart] [checkers]
  • FlowCert [Jafar, Krings, Leprevost, Roch, Varrette]
• By adapting granularity
  • malleable tasks [Trystram, Mounié]
  • dataflow cactus-stack: Athapascan/Kaapi [Gautier]
  • recursive parallelism by « work-stealing » [Blumofe-Leiserson 98, Cilk, Athapascan, …]
• Self-adaptive grain algorithms
  • dynamic extraction of parallelism [Daoudi, Gautier, Revire, Roch - J. TSI 2004]

Page 12:

Parallelism and efficiency

« Work » = sequential time: T1 = #operations
« Depth » = parallel time on ∞ resources: T∞ = #ops on a critical path

Greedy scheduling [Graham69]: Tp ≤ T1/p + T∞

Scheduling with an efficient policy (close to optimal): difficult in general (coarse grain), but easy if T∞ is small (fine grain).
Control of the policy (realization): expensive in general (fine grain), but small overhead at coarse grain.

Problem: how to adapt the potential parallelism to the resources?
=> have T∞ small, with coarse-grain control
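The work/depth quantities can be made concrete on a balanced recursive reduction. The helper below (an illustrative sketch, not from the slides) counts T1 (total combining operations) and T∞ (operations on a critical path) for combining n values.

```cpp
#include <algorithm>
#include <cstddef>

// Work/depth accounting for a balanced binary reduction of n values:
// T1 = total combining operations, T_inf = ops on a critical path.
struct Cost { long work; long depth; };

Cost reduce_cost(std::size_t n) {
    if (n <= 1) return {0, 0};            // a single value: nothing to do
    Cost left  = reduce_cost(n / 2);
    Cost right = reduce_cost(n - n / 2);
    // one combining operation; the two halves may run in parallel
    return { left.work + right.work + 1,
             std::max(left.depth, right.depth) + 1 };
}
```

For n = 8 this gives work 7 and depth 3, i.e. T1 = n - 1 and T∞ = log2 n, which is the favorable fine-grain case for the Graham bound above.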

Page 13:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation [Bruno/Luciano]
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
VIII. Adaptation to support fault tolerance by work-stealing
Conclusion

Page 14:

Parallel algorithms and scheduling: adaptive parallel programming and applications
Bruno Raffin, Jean-Louis Roch, Denis Trystram
INRIA-CNRS Moais team - LIG Grenoble, France

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
Conclusion

Page 15:

TO BE COMPLETED [Denis]

Page 16:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation [Bruno/Luciano]
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
VIII. Adaptation to support fault tolerance by work-stealing
Conclusion

Page 17:

Parallel algorithms and scheduling: adaptive parallel programming and applications
Bruno Raffin, Jean-Louis Roch, Denis Trystram
INRIA-CNRS Moais team - LIG Grenoble, France

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
Conclusion

Page 18:

Work-stealing (1/2)

« Work »: W1 = #total operations performed
« Depth »: W∞ = #ops on a critical path (parallel time on ∞ resources)

• Work-stealing = « greedy » schedule, but distributed and randomized
• Each processor locally manages the tasks it creates
• When idle, a processor steals the oldest ready task from a remote, non-idle victim processor (randomly chosen)

Page 19:

Work-stealing (2/2)

« Work »: W1 = #total operations performed
« Depth »: W∞ = #ops on a critical path (parallel time on ∞ resources)

• Interests:
  -> suited to heterogeneous architectures, with a slight modification [Bender-Rabin02]
  -> if W∞ is small enough: near-optimal processor-oblivious schedule, with good probability, on p processors with average speed Πave
  NB: #succeeded steals = #task migrations < p·W∞ [Blumofe 98, Narlikar 01, Bender 02]
• Implementation: work-first principle [Cilk series-parallel, Kaapi dataflow]
  -> move the scheduling overhead onto the steal operations (the infrequent case)
  -> general case: « local parallelism » implemented by a sequential function call
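The owner/thief discipline described above can be sketched with a double-ended queue (illustrative code, not the actual Cilk or Kaapi implementation, and without the real lock-free synchronization): the owner works depth-first at one end, mirroring the sequential execution order, while a thief steals the oldest ready task at the other end.

```cpp
#include <deque>

// Sketch of a work-stealing deque: the owner pushes/pops at the back
// (LIFO, depth-first), a thief steals from the front (the oldest ready
// task), as stated in the scheduling rule above.
struct WorkDeque {
    std::deque<int> tasks;                    // task ids, oldest at front

    void push(int t) { tasks.push_back(t); }  // owner: task creation

    bool pop(int& t) {                        // owner: local, depth-first
        if (tasks.empty()) return false;
        t = tasks.back(); tasks.pop_back();
        return true;
    }

    bool steal(int& t) {                      // thief: oldest ready task
        if (tasks.empty()) return false;
        t = tasks.front(); tasks.pop_front();
        return true;
    }
};
```

Stealing the oldest task tends to hand the thief the largest remaining subcomputation, which is why the number of steals stays bounded by p·W∞.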

Page 20:

Work-stealing scheduling of a parallel recursive fine-grain algorithm

• Work-stealing scheduling
  • an idle processor steals the oldest ready task
• Interests:
  => #succeeded steals < p·T∞ [Blumofe 98, Narlikar 01, …]
  => suited to heterogeneous architectures [Bender-Rabin 03, …]
• Hypotheses for efficient parallel executions:
  • the parallel algorithm is « work-optimal »
  • T∞ is very small (recursive parallelism)
  • a « sequential » execution of the parallel algorithm is valid
    • e.g. search trees, Branch&Bound, …
• Implementation: work-first principle [Multilisp, Cilk, …]
  • overhead of task creation only upon a steal request: sequential degeneration of the parallel algorithm
  • cactus-stack management

Page 21:

Implementation of work-stealing

• Advantage: « static » fine grain, but dynamic control
• Drawback: possible overhead of the parallel algorithm [e.g. prefix computation]

f1() {
  …
  fork f2;
  …
}

Illustration: processor P executes f1 and pushes the forked task f2 on its stack; an idle processor P' steals f2.
+ non-preemptive execution of ready tasks

Hypothesis: a sequential schedule is valid.

Page 22:
Page 23:

Cost model: with high probability, on p identical processors:
- execution time = W1/p + O(W∞)
- number of steal requests = O(p·W∞)

Page 24:
Page 25:

Experimentation: knary benchmark

SMP architecture: Origin 3800 (32 procs), Cilk / Athapascan
Distributed architecture: iCluster, Athapascan

#procs   Speed-up
8        7.83
16       15.6
32       30.9
64       59.2
100      90.1

Ts = 2397 s, T1 = 2435 s

Page 26:

How to obtain an efficient fine-grain algorithm?

• Hypotheses for efficiency of work-stealing:
  • the parallel algorithm is « work-optimal »
  • T∞ is very small (recursive parallelism)
• Problem:
  • fine-grain (T∞ small) parallel algorithms may involve a large overhead with respect to an efficient sequential algorithm:
    • overhead due to parallelism creation and synchronization
    • but also arithmetic overhead

Page 27:

Work-stealing and adaptability

• Work-stealing allocates processors to tasks transparently to the application, with provable performance
  • support for the addition of new resources
  • support for resilience of resources and fault tolerance (crash faults, network, …)
    • checkpoint/restart mechanisms with provable performance [Porch, Kaapi, …]
• « Baroque hybrid » adaptation: there is an implicit dynamic choice between two algorithms
  • a sequential (local) algorithm: depth-first (default choice)
  • a parallel algorithm: breadth-first
  • the choice is performed at runtime, depending on resource idleness
• Well suited to applications where a fine-grain parallel algorithm is also a good sequential algorithm [Cilk]:
  • parallel divide & conquer computations
  • tree searching, Branch&X, …
  -> suited when both the sequential and the parallel algorithm perform (almost) the same number of operations

Page 28:

Self-adaptive grain algorithm

• Principle: save parallelism overhead by favoring a sequential algorithm:
  => use the parallel algorithm only if a processor becomes idle, by extracting parallelism from the sequential computation
• Hypothesis: two algorithms:
  - 1 sequential: SeqCompute
  - 1 parallel: LastPartComputation
  => at any time, it is possible to extract parallelism from the remaining computations of the sequential algorithm

Diagram: SeqCompute runs; Extract_par splits off LastPartComputation; SeqCompute continues on the rest.
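The extraction step can be sketched on a simple interval of iterations (hypothetical names; the real Kaapi interface differs): the sequential loop consumes the front of the remaining interval, and when a processor becomes idle it extracts the second half of what remains as the LastPartComputation.

```cpp
#include <cstddef>
#include <utility>

// Half-open interval [first, last) of remaining iterations.
struct Interval { std::size_t first, last; };

// Extract parallelism from the *remaining* sequential work: the owner
// keeps the front half, the thief takes the last part. Illustrative
// sketch, not the Kaapi API.
std::pair<bool, Interval> extract_par(Interval& w) {
    if (w.last - w.first < 2)
        return {false, Interval{0, 0}};     // too little work to split
    std::size_t mid = w.first + (w.last - w.first) / 2;
    Interval stolen = {mid, w.last};        // LastPartComputation
    w.last = mid;                           // SeqCompute keeps the front
    return {true, stolen};
}
```

Calling this repeatedly on whatever remains is what makes the grain self-adaptive: no split happens unless some processor is actually idle.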

Page 29:

Generic self-adaptive grain algorithm

Page 30:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
VIII. Adaptation for fault tolerance
Conclusion

Page 31:

• General approach: mix both
  • a sequential algorithm with optimal work W1
  • and a fine-grain parallel algorithm with minimal critical time W∞
• Folk technique: parallel, then sequential
  • use the parallel algorithm down to a certain « grain », then the sequential one
  • drawback: W∞ increases ;o) … and so does the number of steals
• Work-preserving speed-up technique [Bini-Pan94]: sequential, then parallel
  Cascading [Jaja92]: careful interplay of both algorithms to build one with both W∞ small and W1 = O(Wseq)
  • use the work-optimal sequential algorithm to reduce the size
  • then use the time-optimal parallel algorithm to decrease the time
  • drawback: sequential at coarse grain and parallel at fine grain ;o(

How to get both optimal work W1 and small W∞?

Page 32:

Illustration: f(i), i = 1..100
SeqComp(w) on CPU A: f(1) done; LastPart(w), w = 2..100

Page 33:

Illustration: f(i), i = 1..100
SeqComp(w) on CPU A: f(1); f(2) done; LastPart(w), w = 3..100

Page 34:

Illustration: f(i), i = 1..100
SeqComp(w) on CPU A: f(1); f(2) done; LastPart(w) stolen by CPU B, w = 3..100

Page 35:

Illustration: f(i), i = 1..100
SeqComp(w) on CPU A: f(1); f(2) done, w = 3..51
The stolen part w' = 52..100 is itself split into SeqComp(w') and LastPart(w')

Page 36:

Illustration: f(i), i = 1..100
SeqComp(w) on CPU A: f(1); f(2) done, w = 3..51
SeqComp(w') and LastPart(w'), w' = 52..100

Page 37:

Illustration: f(i), i = 1..100
SeqComp(w) on CPU A: f(1); f(2) done, w = 3..51
SeqComp(w') on CPU B: f(52) done; LastPart(w'), w' = 53..100

Page 38:

Iterated product: sequential, parallel, adaptive [Davide Vernizzi]

• Sequential:
  • input: array of n values
  • output: ∑_{i=1}^{n} f(x_i)
  • C/C++ code:
      for (i = 0; i < n; i++)
          res += atoi(x[i]);
• Parallel algorithm:
  • recursive computation by blocks (binary tree with merge)
  • block size = pagesize
  • Kaapi code: Athapascan API

Experimentation: parallel <=> adaptive

Page 39:

Variant: sum of pages

• Input: a set of n pages; each page is an array of values
• Output: one page where each element is the sum of the same-index elements of the input pages: res[j] = ∑_{i=0}^{n-1} f(page[i][j])
• C/C++ code:
      for (i = 0; i < n; i++)
          for (j = 0; j < pageSize; j++)
              res[j] += f(pages[i][j]);

Experimentation:
- the parallel algorithm costs about 2× the sequential one
- the adaptive algorithm has an efficiency close to 1

Page 40:

Demonstration on ensibull

Script:
[vernizzd@ensibull demo]$ more go-tout.sh
#!/bin/sh
./spg /tmp/data &
./ppg /tmp/data 1 --a1 -thread.poolsize 3 &
./apg /tmp/data 1 --a1 -thread.poolsize 3 &

Result:
[vernizzd@ensibull demo]$ ./go-tout.sh
Page size: 4096
ADAPTIVE (3 procs):   res = -2.048e+07, time = 0.408178 s, threads created: 54
PARALLEL (3 procs):   res = -2.048e+07, time = 0.964014 s, #fork = 7497
SEQUENTIAL (1 proc):  res = -2.048e+07, time = 1.15204 s

Page 41:

Adaptive algorithm (1/3)

• Hypothesis: non-preemptive, work-stealing scheduling
• Adaptive sequential coupling:

void Adaptative (a1::Shared_w<Page> *resLocal, DescWork dw) {
  // cout << "Adaptative" << endl;
  a1::Shared<Page> resLPC;
  a1::Fork<LPC>() (resLPC, dw);
  Page resSeq (pageSize);
  AdaptSeq (dw, &resSeq);
  a1::Fork<Merge>() (resLPC, *resLocal, resSeq);
}

Page 42:

Adaptive algorithm (2/3)

• Sequential side:

void AdaptSeq (DescWork dw, Page *resSeq) {
  DescLocalWork w;
  Page resLoc (pageSize);
  double k;
  while (!dw.desc->extractSeq(&w)) {
    for (int i = 0; i < pageSize; i++) {
      k = resLoc.get(i) + (double) buff[w*pageSize + i];
      resLoc.put(i, k);
    }
  }
  *resSeq = resLoc;
}

Page 43:

Adaptive algorithm (3/3)

• Extraction side = parallel algorithm:

struct LPC {
  void operator () (a1::Shared_w<Page> resLPC, DescWork dw) {
    DescWork dw2;
    dw2.Allocate();
    dw2.desc->l.initialize();
    if (dw.desc->extractPar(&dw2)) {
      a1::Shared<Page> res2;
      a1::Fork<AdaptativeMain>() (res2, dw2.desc->i, dw2.desc->j);
      a1::Shared<Page> resLPCold;
      a1::Fork<LPC>() (resLPCold, dw);
      a1::Fork<MergeLPC>() (resLPCold, res2, resLPC);
    }
  }
};

Page 44:

Application: gzip parallelization

• Gzip:
  • widely used (web) and costly, although of linear complexity
  • source code: 10,000 lines of C, complex data structures
  • principle: LZ77 + Huffman tree
• Why gzip?
  • a P-complete problem, but practical parallelization is possible, similar to the iterated product
  • drawback: any known parallelization induces an overhead -> loss of compression ratio

Page 45:

How to parallelize gzip?

Sequential algorithm: input file -> on-the-fly compression -> compressed file.
Parallelization: static (or dynamic) partition into blocks -> parallel compression -> compressed blocks.
« Easy » parallelization, 100% compatible with gzip/gunzip.
Problems: loss of compression ratio, grain depends on the machine, overhead.

Page 46:

Adaptive-grain gzip parallelization

Input file -> on-the-fly compression (SeqComp), coupled with LastPartComputation -> parallel compression with a dynamic partition into blocks -> output compressed blocks, concatenated (cat) into the output compressed file.

Page 47:

Performance on SMP

Search time in large local DNA files (31,924,544 and 63,849,088 bytes); series: sequential vs parallel (12 threads); time in seconds (0-12); Pentium 4×200 MHz.

Page 48:

Distributed performance

Search time across two non-local disks (directory sizes 674,021 and 1,228,427 bytes); time in seconds (0-1200); series: sequential, 1 node with 4 threads, 2 nodes with 4 threads.
Sequential: Pentium 4×200 MHz. SMP: Pentium 4×200 MHz. Distributed architecture: Myrinet, Pentium 4×200 MHz + 2×333 MHz.
Distributed search in 2 directories of the same size, each on a remote disk (NFS).

Page 49:

Compressed-file size overhead:

Size      Gzip       Adaptive 2 procs   Adaptive 8 procs   Adaptive 16 procs
0.86 MB   272,573 B  275,692 B          280,660 B          280,660 B
5.2 MB    1.023 MB   1.027 MB           1.05 MB            1.08 MB
9.4 MB    6.60 MB    6.62 MB            6.73 MB            6.79 MB
10 MB     1.12 MB    1.13 MB            1.14 MB            1.17 MB

Time gain (adaptive, 2 / 8 / 16 procs, per the slide layout):

5.2 MB    3.35 s     0.96 s             0.55 s
9.4 MB    7.67 s     6.73 s             6.79 s
10 MB     6.79 s     1.71 s             0.88 s

Page 50:

Performance

4-processor computer (Pentium 4×200 MHz): sequential gzip vs Athapascan gzip; file sizes from 1,106 to 21,914 KB; time in seconds (0-90).

Page 51:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation
VII. Bi-criteria latency/bandwidth [Bruno/Jean-Denis]
VIII. Adaptation for fault tolerance
Conclusion

Page 52:

• Solution: mix a sequential and a parallel algorithm
• Basic technique:
  • use the parallel algorithm down to a certain « grain », then the sequential one
  • problem: W∞ increases, and so do the number of migrations … and the inefficiency ;o(
• Work-preserving speed-up [Bini-Pan 94] = cascading [Jaja92]:
  careful interplay of both algorithms to build one with both W∞ small and W1 = O(Wseq)
  • divide the sequential algorithm into blocks
  • each block is computed with the (non-optimal) parallel algorithm
  • drawback: sequential at coarse grain and parallel at fine grain ;o(
• Adaptive granularity, the dual approach:
  • parallelism is extracted at run-time from any sequential task

But parallelism often has a cost!

Page 53:

Prefix computation

• Prefix problem:
  • input: a0, a1, …, an
  • output: π0, π1, …, πn with πi = a0 * a1 * … * ai
• Sequential algorithm:
      π[0] = a[0];
      for (i = 1; i <= n; i++)
          π[i] = π[i-1] * a[i];
  performs W1 = W∞ = n operations
• Fine-grain optimal parallel algorithm [Ladner-Fischer]:
  multiply adjacent pairs a0*a1, a2*a3, …, recursively compute the prefix of the n/2 pair products (yielding π1, π3, …), then one extra multiplication per remaining even position (π2, π4, …).
  Critical time W∞ = 2·log n, but performs W1 = 2n ops: twice as expensive as the sequential algorithm …
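The Ladner-Fischer scheme just described can be written as a compact recursion (an illustrative sketch; the slide's circuit works in place, this version allocates for clarity):

```cpp
#include <cstddef>
#include <vector>

// Recursive (Ladner-Fischer style) prefix products: multiply adjacent
// pairs, recurse on the n/2 pair products, then one extra multiplication
// per even position. Work ~ 2n, depth ~ 2 log n.
std::vector<long> prefix(const std::vector<long>& a) {
    std::size_t n = a.size();
    if (n <= 1) return a;
    std::vector<long> pairs;                   // a0*a1, a2*a3, ...
    for (std::size_t i = 0; i + 1 < n; i += 2)
        pairs.push_back(a[i] * a[i + 1]);
    std::vector<long> sub = prefix(pairs);     // gives pi1, pi3, ...
    std::vector<long> out(n);
    out[0] = a[0];
    for (std::size_t i = 1; i < n; ++i)
        out[i] = (i % 2 == 1) ? sub[i / 2]              // odd: pair prefix
                              : sub[i / 2 - 1] * a[i];  // even: one more op
    return out;
}
```

Both the pair-products loop and the fill-in loop are fully parallel; only the recursion depth is sequential, which is where the 2·log n critical path comes from.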

Page 54:

Prefix computation: an example where parallelism always costs

• Any parallel algorithm with critical time W∞ runs on p processors in time ≥ … : the strict lower bound is achieved by a block algorithm + pipeline [Nicolau & al. 1996]
• Question: how to design a generic parallel algorithm, independent from the architecture, that achieves optimal performance on any given architecture?
  -> design a malleable algorithm where the scheduling suits the number of operations performed to the architecture

Page 55:

Architecture model

- Heterogeneous processors with changing speeds [Bender-Rabin02]
  => Πi(t) = instantaneous speed of processor i at time t, in #operations per second
- Average speed per processor for a computation of duration T: Πave = …
- Lower bound for the time of prefix computation: …

Page 56:

Alternative: concurrently sequential and parallel

Based on the work-first principle: always execute a sequential algorithm, to reduce parallelism overhead; use the parallel algorithm only if a processor becomes idle (i.e. steals), by extracting parallelism from the sequential computation.

Hypothesis: two algorithms:
- 1 sequential: SeqCompute
- 1 parallel: LastPartComputation: at any time, it is possible to extract parallelism from the remaining computations of the sequential algorithm

- Self-adaptive granularity based on work-stealing: SeqCompute runs; Extract_par splits off LastPartComputation; SeqCompute continues on the rest.

Page 57:

Adaptive Prefix on 3 processors

Input a1 … a12. The main process runs the sequential prefix (π1 computed); work-stealers 1 and 2 are idle and send steal requests.

Page 58:

Adaptive Prefix on 3 processors

[Figure: animation, step 2] MainSeq has computed π1 π2 π3 on a1 … a4; work-stealer 1 has stolen a5 … a8 and computes local products πi = a5 * … * ai (up to π6); work-stealer 2 issues a steal request on a9 … a12.

Page 59:

Adaptive Prefix on 3 processors

[Figure: animation, step 3] MainSeq has computed π1 … π4; work-stealer 1 has finished its local products πi = a5 * … * ai up to a8; work-stealer 2 has stolen a9 … a12 and computes local products πi = a9 * … * ai (up to π10); MainSeq preempts work-stealer 1 and merges its results, obtaining π8.

Page 60:

Adaptive Prefix on 3 processors

[Figure: animation, step 4] Using work-stealer 1's local products, MainSeq obtains π5 … π8 and continues the prefix (π9); work-stealer 2's local products reach a11; MainSeq preempts work-stealer 2 at π11.

Page 61:

Adaptive Prefix on 3 processors

[Figure: animation, step 5] MainSeq merges work-stealer 2's local products, yielding π9 … π11; only a12 remains, and the complete prefix π1 … π12 is obtained.

Page 62:

Adaptive Prefix on 3 processors

[Figure: final animation frame, identical to the previous step]

Implicit critical path on the sequential process

Page 63:

Analysis of the algorithm

• Execution time: asymptotically optimal, matching the lower bound 2n / ((p+1) · Π_ave) up to a lower-order additive term.

• Sketch of the proof:
– dynamic coupling of two algorithms that complete simultaneously:
  - sequential: (optimal) number of operations S on one processor
  - parallel: minimal time, but performs X operations on the other processors
– dynamic splitting is always possible down to the finest grain, yet execution remains locally sequential:
  - the critical path is small (e.g. log X)
  - every non-constant-time task can potentially be split (variable speeds)
– the algorithmic scheme ensures Ts = Tp + O(log X)
  => this bounds the total number X of operations performed, and hence the overhead of parallelism = (S + X) - #ops_optimal
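The key relation of the proof sketch can be written out (a reconstruction consistent with the slide; constants are omitted):

```latex
% Both coupled algorithms complete simultaneously at time T_p:
% the sequential part performs S operations (optimal on one processor),
% the parallel part performs X extra operations with critical path O(\log X).
% The scheme guarantees
T_s \;=\; T_p + O(\log X),
% which bounds the total number of operations S + X, and hence the
% parallelism overhead
\mathrm{overhead} \;=\; (S + X) - \#\mathrm{ops}_{\mathrm{optimal}}.
```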

Page 64:

Adaptive prefix: experiments 1

Single-user context: processor-oblivious prefix achieves near-optimal performance:
- close to the lower bound, both on 1 processor and on p processors
- less sensitive to system overhead: even better than the theoretically "optimal" off-line parallel algorithm on p processors

[Figure: time (s) vs. #processors for the prefix sum of 8.10^6 doubles on an 8-processor SMP (IA64 1.5 GHz / Linux); curves: pure sequential, optimal off-line on p procs, oblivious]

Page 65:

Adaptive prefix: experiments 2

Multi-user context, with additional external load: (9-p) external dummy processes are executed concurrently. The processor-oblivious prefix computation is always the fastest, with a 15% benefit over a parallel algorithm for p processors with an off-line schedule.

[Figure: time (s) vs. #processors for the prefix sum of 8.10^6 doubles on an 8-processor SMP (IA64 1.5 GHz / Linux), under external load (9-p external processes); curves: off-line parallel algorithm for p processors, oblivious]

Page 66:

The Prefix race: sequential/parallel, fixed/adaptive

Race between 9 algorithms (44 processes) on an octo-SMP: sequential, parallel on 2 to 8 processors, and adaptive on 8 processors.

[Figure: bar chart of execution times (seconds, 0-25) for the 9 algorithms]

On each of the 10 executions, the adaptive algorithm completes first.

Page 67:

Conclusion

Coupling an on-line parallel algorithm with a work-stealing schedule is useful for the design of processor-oblivious algorithms.

Application to prefix computation:
- theoretically, reaches the lower bound on heterogeneous processors with changing speeds
- practically, achieves near-optimal performance on multi-user SMPs

A generic adaptive scheme to implement parallel algorithms with provable performance.
- Work in progress: parallel 3D reconstruction [oct-tree scheme with deadline constraint]

Page 68:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation [Bruno/Luciano]
VII. Bi-criteria latency/bandwidth [Bruno / Jean-Denis]
VIII. Adaptation to support fault tolerance by work-stealing
Conclusion

Page 69:

TO BE COMPLETED: Bruno / Luciano

Page 70:


Page 71:

TO BE COMPLETED: Bruno / Jean-Denis

Page 72:


Page 73:

The dataflow graph is a portable representation of the distributed execution: it can be used to adapt to resource failures (resilience) by computing a coherent global state.

Page 74:
Page 75:
Page 76:
Page 77:
Page 78:

SEL: a protocol based on logging the dataflow graph

Page 79:
Page 80:
Page 81:
Page 82:
Page 83:

Conclusion on fault tolerance
• Kaapi tolerates the addition and the failure of machines
– TIC protocol: period to be tuned
– Failure detector: error signals + heartbeat
– http://kaapi.gforge.inria.fr

[Table: Athapascan/Kaapi recovery protocols, comparing Stack [TIC] and FullDag [SEL] (dataflow graph); residual entries: yes / not needed / local or global / yes]

Page 84:

Parallel algorithms and scheduling: adaptive parallel programming and applications

Contents
I. Motivations for adaptation and examples
II. Off-line scheduling and adaptation: moldable/malleable [Denis?]
III. On-line work-stealing scheduling and parallel adaptation
IV. A first example: iterative product; application to gzip
V. Processor-oblivious parallel prefix computation
VI. Adaptation to time constraints: oct-tree computation
VII. Bi-criteria latency/bandwidth [Bruno / Jean-Denis]
VIII. Adaptation for fault tolerance
Conclusion

Page 85:

Thank you !


Interactive Distributed Simulation [B. Raffin & E. Boyer]

- 5 cameras, 6 PCs
- 3D reconstruction + simulation + rendering

-> adaptive scheme to maximize the 3D-reconstruction precision within a fixed timestamp

Page 86:

http://moais.imag.fr

Laboratoire Informatique de Grenoble

Multi-Programming and Scheduling Design for Applications of Interactive Simulation

LIG evaluation committee - 23/01/2006

Louvre, Musée de l'Homme. Sculpture (head). Artist: anonymous. Origin: Rapa Nui [Easter Island]. Date: between the 11th and the 15th century. Dimensions: 1.70 m high.

Page 87:

Who are the Moais today ?• Permanent staff (7) : INRIA (2) + INPG (3) + UJF (2)

• Visiting professor (1) + PostDoc (1)

• ITA: Administration (1.25) + Engineer (1)

• PhD students (14) : 6 contracts + 2 joined (co-tutelle)

Vincent Danjean [MdC UJF], Thierry Gautier [CR INRIA], Guillaume Huard [MdC UJF], Grégory Mounié [MdC INPG], Bruno Raffin [CR INRIA], Jean-Louis Roch [MdC INPG], Denis Trystram [Prof INPG]

Axel Krings [CNRS/RAGTIME, Univ Idaho, 1/09/2004 -> 31/08/05], Luciano Suares [PostDoc INRIA, 2006]

Admin.: Barbara Amouroux [INRIA, 50%], Annie-Claude Vial-Dallais [INPG, 50%], Evelyne Feres [UJF, 50%]. Engineer: Joelle Prévost [INPG, 50%], IE [CNRS, 50%]

Julien Bernard (BDI ST), Florent Blanchot (Cifre ST), Guillaume Dunoyer (DCN, MOAIS/POPART/E-MOTION), Lionel Eyraud (MESR), Feryal-Kamila Moulai (Bull), Samir Jafar (ATER), Clément Menier (MESR, MOAIS/MOVI), Jonathan Pecero-Sanchez (Egide Mexique), Laurent Pigeon (Cifre IFP), Krzysztof Rzadca (U Warsaw, Poland), Daouda Traore (Egide/Mali), Sébastien Varrette (U Luxembourg), Thomas Arcila (Cifre Bull), Eric Saule (MESR)

Jérémie Allard (MESR, 12/2005), Luiz-Angelo Estefanel (Egide/Brasil, 11/2005), Hamid-Reza Hamidi (Egide/Iran, 10/2005), Jaroslaw Zola (U Czestochowa, Poland, 12/2005) +4

Page 88:

Objective
• Programming, on virtual networks (clusters and lightweight grids), applications whose multi-criteria performance depends on the number of resources, with provable performance
– Adaptability:
  • static, to the platform: the devices evolve gradually
  • dynamic, to the execution context: data and resource usage
• Target applications: interactive and distributed simulations
– virtual reality, compute-intensive applications (process engineering, optimization, bio-computing)

[Figure: computation / visualization, e.g. the Grimage platform]

Page 89:

MOAIS approach
• Algorithm + scheduling + programming
– large degree of parallelism w.r.t. the architecture
– both local preemptive (system) and non-preemptive (application) scheduling, to obtain global provable performance
• dataflow, distributed, recursive/multithreaded
• Research themes
– Scheduling: off-line, on-line, multi-criteria
– Adaptive (poly-)algorithms: various levels of malleability (parallelism, precision, refresh rate, …)
– Coupling: efficient description of synchronizations (KAAPI software: fine-grain dynamic scheduling)
– Interactivity (FlowVR software: coarse-grain scheduling)

Page 90:

Scientific challenges
• To schedule with provable performance, for multi-criteria objectives, on complex architectures
  e.g. time/memory, makespan/min-sum, time/work (approximation, game theory, …)
• To design distributed adaptive algorithms
  Efficiency/scalability: local decisions, but global performance
  • dynamic local choices: good performance with high probability
• To implement on emerging platforms, with challenging applications (partners):
  Grimage (MOVI); H.264/MPEG-4 encoding on mpScNet [ST]; QAP/Nugent on Grid5000 [PRISM, GILCO, LIFL]

Page 91:

Objective
Programming of applications where performance is a matter of resources: take benefit of more, and adapt to less
– e.g. a global computing platform (P2P)
Application code: "independent" from resources, and adaptive
Target applications: interactive simulation
– "virtual observatory"

[Figure: MOAIS couples interaction, simulation and rendering]

Performance is related to #resources:
- simulation: precision = size/order -> #procs, memory space
- rendering: image walls, sounds, ... -> #video-projectors, #speakers, ...
- interaction: acquisition peripherals -> #cameras, #haptic sensors, …

Page 92:

GRIMAGE platform

• 2003: 11 PCs, 8 projectors and 4 cameras (first demo: 12/03)
• 2005: 30 PCs, 16 projectors and 20 cameras
– A display wall:
  • surface: 2 x 2.70 m
  • resolution: 4106 x 3172 pixels
  • very bright: works in daylight

Commodity components [B. Raffin]

Page 93:

Video [J Allard, C Menier]
