Frédéric Gava


Page 1: Frédéric Gava

F. Gava, HLPP 2005

Frédéric Gava

A Modular Implementation of Parallel Data Structures in Bulk-Synchronous Parallel ML

Page 2: Frédéric Gava

Outline

Introduction;

The BSML language;

Implementation of parallel data structures in BSML:

Dictionaries;

Sets;

Load-Balancing.

Application;

Conclusion and future work.

Page 3: Frédéric Gava

Introduction

Parallel computing for speed;

Too complex for many non-computer scientists;

Need for models/tools of parallelism.

[Diagram labels: Automatic Parallelization; Concurrent Programming; Structured Parallelism; Algorithmic Skeletons; BSP; Data Structure Skeletons]

Page 4: Frédéric Gava

Introduction (bis)

Observations:

Data structures are as important as algorithms;

Symbolic computations make heavy use of these data structures.

Suggested solution: parallel implementations of data structures, with:

Interfaces as close as possible to the sequential ones;

Modular implementations, for straightforward maintenance;

Load-balancing of the data.

Page 5: Frédéric Gava

BSML

Outline:

Introduction;

BSML;

Parallel Data Structures in BSML;

Application;

Conclusion and future work.

Page 6: Frédéric Gava

Advantages of the BSP model:

Portability;

Scalability, deadlock-free;

Simple cost model for performance prediction.

Advantages of functional programming:

High-level features (higher-order functions, pattern-matching, concrete types, etc.);

Safety of the environment;

Program proofs (proofs of BSML programs using Coq).

Bulk-Synchronous Parallelism + Functional Programming = BSML
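The "simple cost model" listed among the BSP advantages is usually written as below. The notation is the standard BSP one and is an assumption here, since the slides do not give the formula: p processors, S supersteps, w_i the local work of processor i, h_i^+ and h_i^- the number of words it sends and receives, g the communication gap, L the synchronization latency.

```latex
T \;=\; \sum_{s=1}^{S} \Bigl( \max_{0 \le i < p} w_i^{(s)} \;+\; h^{(s)} \cdot g \;+\; L \Bigr),
\qquad
h^{(s)} \;=\; \max_{0 \le i < p} \max\bigl( h_i^{+(s)},\; h_i^{-(s)} \bigr)
```

Each superstep costs the slowest local computation, plus the largest communication volume scaled by g, plus one synchronization barrier. Performance prediction then amounts to measuring g and L once for a given machine.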

Page 7: Frédéric Gava

The BSML Language

Confluent language: deterministic algorithms;

Library for the « Objective Caml » language (called BSMLlib);

Operations to access the BSP parameters;

5 primitives on a parallel data structure called parallel vector:

mkpar: create a parallel vector;

apply: parallel point-wise application;

put: send values within a vector;

proj: parallel projection;

super: BSP divide-and-conquer.
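To make the primitives above concrete, here is a minimal sequential model in plain OCaml. It is not BSMLlib: a parallel vector is simulated by an array with one cell per processor, the processor count `p` is a hypothetical constant, and the signatures are assumed from the usual presentation of BSML. `super` is not modelled.

```ocaml
(* Sequential model of BSML parallel vectors. NOT the real BSMLlib:
   an array cell stands for the value held by one processor. *)
let p = 4                      (* hypothetical number of processors *)
type 'a par = 'a array

(* mkpar f: processor i holds (f i). *)
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* apply: point-wise application, no communication. *)
let apply (fs : ('a -> 'b) par) (xs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) xs.(i))

(* put: on input, (fs.(src) dst) is the message src sends to dst;
   on output, (result.(dst) src) is the message dst received from src. *)
let put (fs : (int -> 'a) par) : (int -> 'a) par =
  Array.init p (fun dst -> fun src -> fs.(src) dst)

(* proj: read back the component of any processor. *)
let proj (v : 'a par) : int -> 'a = fun i -> v.(i)

(* Derived helpers, as commonly defined on top of the primitives. *)
let replicate x = mkpar (fun _ -> x)
let parfun f v = apply (replicate f) v

(* Example: every processor squares its id, broadcasts the result,
   and sums everything it received. *)
let pids = mkpar (fun i -> i)
let squares = parfun (fun i -> i * i) pids
let outgoing = parfun (fun x -> fun (_dst : int) -> x) squares
let incoming = put outgoing
let sums =
  parfun (fun recv -> List.fold_left (+) 0 (List.init p recv)) incoming
```

With `p = 4`, every component of `sums` is 0 + 1 + 4 + 9 = 14. In real BSML, `put` and `proj` are the operations that end a superstep with communication and synchronization, while `mkpar` and `apply` are purely local.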

Page 8: Frédéric Gava

A BSML Program

[Diagram: a sequential part and a parallel part; the parallel part shows two parallel vectors with components f0, f1, …, fp-1 and g0, g1, …, gp-1]

Page 9: Frédéric Gava

Superthreads in BSML

[Figure: executions of E1, E2 and super E1 E2 on processors 0, 1, 2, …, drawn as alternating Communications and Synchronization phases]

Page 10: Frédéric Gava


Parallel Data Structures in BSML

Outline:

Introduction;

BSML;

Parallel Data Structures in BSML;

Application;

Conclusion and future work.

Page 11: Frédéric Gava

General Points

5 modules: Set, Map, Stack, Queue, Hashtable;

Interfaces:

Same as O’Caml ones;

With some specific parallel functions (skeletons) such as parallel reduction;

Pure functional implementations (for functional data);

Manual or Automatic load-balancing.

Page 12: Frédéric Gava

Modules in O’Caml

Interface:

    module type Compare = sig
      type elt
      val compare : elt -> elt -> int
    end

Implementation:

    module CompareInt = struct
      type elt = int
      let tools = ...
      let compare = ...
    end
    module AbstractCompareInt = (CompareInt : Compare)

Functor:

    module Make (Ord : Compare) = struct
      type elt = Ord.elt
      type t = Empty | Node of t * elt * t * int
      let mem e s = ...
    end
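The interface / implementation / functor pattern above can be completed into a small runnable example. This is a sketch only: the bodies elided with `...` in the slide are not known, so `add` and `mem` below are ordinary unbalanced binary-search-tree code written for illustration, and the extra `int` field of `Node` (presumably a height used for balancing) is left out.

```ocaml
(* Interface: anything with a total order. *)
module type Compare = sig
  type elt
  val compare : elt -> elt -> int
end

(* Implementation for integers (the standard polymorphic compare). *)
module CompareInt = struct
  type elt = int
  let compare (a : int) (b : int) = compare a b
end

(* Functor: builds a set module from any Compare implementation.
   Illustrative unbalanced tree, not the slide's elided code. *)
module Make (Ord : Compare) = struct
  type elt = Ord.elt
  type t = Empty | Node of t * elt * t

  let empty = Empty

  let rec add e = function
    | Empty -> Node (Empty, e, Empty)
    | Node (l, v, r) as s ->
      let c = Ord.compare e v in
      if c = 0 then s
      else if c < 0 then Node (add e l, v, r)
      else Node (l, v, add e r)

  let rec mem e = function
    | Empty -> false
    | Node (l, v, r) ->
      let c = Ord.compare e v in
      c = 0 || mem e (if c < 0 then l else r)
end

module IntSet = Make (CompareInt)

let s = IntSet.(add 3 (add 1 (add 2 empty)))
let has_one = IntSet.mem 1 s    (* true *)
let has_five = IntSet.mem 5 s   (* false *)
```

Sealing with a signature, as in `AbstractCompareInt = (CompareInt : Compare)`, would hide the fact that `elt` is `int`; the example uses the unsealed module so that integer literals can be inserted directly.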

Page 13: Frédéric Gava

Parallel Dictionaries

A parallel map (dictionary) = a map on each processor:

    module Make (Ord : OrderedType) (Bal : BALANCE)
        (MakeLocMap : functor (Ord : OrderedType) -> Map.S with type key = Ord.t) =
    struct
      module Local_Map = MakeLocMap (Ord)
      type key = Ord.t
      type 'a t = ('a Local_Map.t par) * int * bool
      type 'a seq_t = 'a Local_Map.t
      (* operators as skeletons *)
    end

We need to re-implement all the operations (data skeletons).

Page 14: Frédéric Gava

Insert a Binding

add: key -> 'a -> 'a t -> 'a t

Two cases (illustrated by figures in the original slide):

If rebalanced;

Otherwise.
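The two insertion cases can be illustrated with a sequential sketch. Everything below is an assumption made for illustration, since the slides only show figures: the parallel vector is modelled by an array, the placement rule `owner` is hypothetical, and the policy for a rebalanced map (replace an existing binding where it lives, otherwise insert on the smallest local map) is one plausible reading, not the author's stated algorithm. The `int` component of the original triple is omitted because its meaning is not given.

```ocaml
module IntMap = Map.Make (struct type t = int let compare = compare end)

let p = 3  (* hypothetical number of processors *)

(* One local map per simulated processor, plus the "rebalanced" flag. *)
type 'a pmap = { locals : 'a IntMap.t array; rebalanced : bool }

let create () = { locals = Array.make p IntMap.empty; rebalanced = false }

(* Hypothetical placement rule, valid while the map is not rebalanced. *)
let owner k = ((k mod p) + p) mod p

let add k v m =
  let locals = Array.copy m.locals in
  if m.rebalanced then begin
    (* Keys may live anywhere: look for an existing binding first. *)
    let holder = ref (-1) in
    Array.iteri (fun i lm -> if IntMap.mem k lm then holder := i) locals;
    let dst =
      if !holder >= 0 then !holder
      else begin
        (* Otherwise pick the first local map of minimal size. *)
        let best = ref 0 in
        Array.iteri
          (fun i lm ->
             if IntMap.cardinal lm < IntMap.cardinal locals.(!best)
             then best := i)
          locals;
        !best
      end in
    locals.(dst) <- IntMap.add k v locals.(dst)
  end else
    (* Not rebalanced: the owner of the key is known without searching. *)
    locals.(owner k) <- IntMap.add k v locals.(owner k);
  { m with locals }

let find_opt k m =
  if m.rebalanced then
    Array.fold_left
      (fun acc lm ->
         match acc with Some _ -> acc | None -> IntMap.find_opt k lm)
      None m.locals
  else IntMap.find_opt k m.locals.(owner k)
```

The point of the sketch is the asymmetry: without rebalancing, an insertion touches a single processor, whereas after rebalancing the location of a key is no longer a function of the key, so every processor has to be consulted.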

Page 15: Frédéric Gava

Parallel Iterator

    let cardinal pmap = ParMap.fold (fun _ _ i -> i+1) pmap 0

Fold needs to respect the order of the keys;

Parallel map → sequential map;

Too many communications…

    async_fold: (key -> 'a -> 'b -> 'b) -> 'a t -> 'b -> 'b par

    let cardinal pmap =
      List.fold_left (+) 0 (total (ParMap.async_fold (fun _ _ i -> i+1) pmap 0))
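The difference between the ordered `fold` and `async_fold` can be shown on a sequential model. This is a sketch under assumptions: a parallel map is modelled as an array of standard-library maps, and `total` is a stand-in that simply turns the simulated parallel vector into a list; the real `ParMap` functions are not reproduced.

```ocaml
module IntMap = Map.Make (struct type t = int let compare = compare end)

(* Hypothetical model: one local map per simulated processor. *)
type 'a pmap = 'a IntMap.t array

(* Ordered fold: gather every binding into one sequential map first
   (this is the communication-heavy step), then fold in key order. *)
let fold f (pm : 'a pmap) init =
  let gathered =
    Array.fold_left
      (fun acc local -> IntMap.fold IntMap.add local acc)
      IntMap.empty pm in
  IntMap.fold f gathered init

(* Asynchronous fold: each processor folds its own local map only;
   the result is one partial value per processor. *)
let async_fold f (pm : 'a pmap) init =
  Array.map (fun local -> IntMap.fold f local init) pm

(* Stand-in for the `total` used in the slide. *)
let total v = Array.to_list v

let cardinal_seq pm = fold (fun _ _ i -> i + 1) pm 0
let cardinal_par pm =
  List.fold_left (+) 0 (total (async_fold (fun _ _ i -> i + 1) pm 0))

(* Example data: keys 0 and 3 on processor 0, key 1 on 1, key 2 on 2. *)
let pm : string pmap =
  [| IntMap.(add 0 "a" (add 3 "d" empty));
     IntMap.(add 1 "b" empty);
     IntMap.(add 2 "c" empty) |]

let in_key_order = fold (fun _ v acc -> acc ^ v) pm ""
let per_processor = async_fold (fun _ v acc -> acc ^ v) pm ""
```

Both cardinal functions return 4 here, because counting does not depend on the order of the keys. The string concatenation shows where the two differ: `in_key_order` is "abcd", while `per_processor` holds the partial results "ad", "b" and "c", which only the caller can combine.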

Page 16: Frédéric Gava

Parallel Sets

A subset on each processor;

Insertion/iteration as for parallel maps;

But with some binary skeletons;

Load-balancing of pairs of parallel sets using superposition.

Page 17: Frédéric Gava


Difference

3 cases:

Two normal parallel sets;

One of the parallel sets has been rebalanced;

Both parallel sets have been rebalanced;

Implies a problem with duplicate elements.

Page 18: Frédéric Gava

Difference (third case)

[Figure: the two rebalanced parallel sets S1 and S2]
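Why rebalancing complicates the difference can be seen on a small sequential model. The sketch below is an assumption-laden illustration, not the algorithm of the slides: parallel sets are arrays of local sets, "normal" sets are assumed to place equal elements on the same processor, and the rebalanced case is handled in the simplest possible way, by letting every processor see all of S2.

```ocaml
module IntSet = Set.Make (struct type t = int let compare = compare end)

(* Hypothetical model: one local subset per simulated processor. *)
type pset = IntSet.t array

(* Case 1: both sets use the same placement rule, so equal elements
   sit on the same processor and a point-wise difference is enough. *)
let diff_aligned (s1 : pset) (s2 : pset) : pset =
  Array.mapi (fun i local -> IntSet.diff local s2.(i)) s1

(* Rebalanced case, naive version: an element of S1 may have its twin
   anywhere in S2, so subtract the union of all fragments of S2. *)
let diff_rebalanced (s1 : pset) (s2 : pset) : pset =
  let all_s2 = Array.fold_left IntSet.union IntSet.empty s2 in
  Array.map (fun local -> IntSet.diff local all_s2) s1

let of_lists ls : pset = Array.of_list (List.map IntSet.of_list ls)
let to_list (s : pset) =
  IntSet.elements (Array.fold_left IntSet.union IntSet.empty s)

(* Two processors, placement by parity. *)
let s1 = of_lists [ [0; 2; 4]; [1; 3] ]
let s2 = of_lists [ [2]; [1] ]
(* Same elements as s2, but moved to the other processor. *)
let s2_moved = of_lists [ [1]; [2] ]

let ok = to_list (diff_aligned s1 s2)
let wrong = to_list (diff_aligned s1 s2_moved)
let fixed = to_list (diff_rebalanced s1 s2_moved)
```

Here `ok` and `fixed` are both [0; 3; 4], while `wrong` is [0; 1; 2; 3; 4]: once elements have moved, the point-wise difference misses them. The naive fix replicates S2 everywhere, which is exactly the kind of communication cost a dedicated algorithm for the rebalanced cases would try to avoid.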

Page 19: Frédéric Gava


Load-Balancing (1)

« Same sizes » of the local data structures;

Better performance for parallel iterations;

Load-balancing in 2 super-steps (M. Bamha and G. Hains) using a histogram.

Page 20: Frédéric Gava

Load-Balancing (2)

Generic code of the algorithm:

rebalance: ( par) (int list ) ( ) list par int par (type variables and arrows lost in extraction)

[Diagram labels: Data; Datas; Histogram; Select « n » messages; Messages; Union]
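The histogram-based idea can be sketched sequentially as follows. This is an illustration under assumptions and not the algorithm of Bamha and Hains nor the generic `rebalance` of the slides, whose signature did not survive extraction: local data are plain lists, the "messages" of the second super-step are modelled by a shared pool, and the target sizes are simply the total divided as evenly as possible.

```ocaml
(* split n xs: the first n elements of xs (or fewer), and the rest. *)
let rec split n xs =
  if n <= 0 then ([], xs)
  else match xs with
    | [] -> ([], [])
    | x :: tl -> let (a, b) = split (n - 1) tl in (x :: a, b)

(* Sequential model: local.(i) is the data held by processor i. *)
let rebalance (local : 'a list array) : 'a list array =
  let p = Array.length local in
  (* Step 1: histogram of local sizes (one exchange of integers). *)
  let histogram = Array.map List.length local in
  let total = Array.fold_left (+) 0 histogram in
  (* Even target: total/p each, one extra for the first (total mod p). *)
  let target i = total / p + (if i < total mod p then 1 else 0) in
  let result = Array.make p [] in
  let pool = ref [] in
  (* Step 2a: overloaded processors select their surplus as messages. *)
  for i = 0 to p - 1 do
    let keep, extra = split (target i) local.(i) in
    result.(i) <- keep;
    pool := !pool @ extra
  done;
  (* Step 2b: underloaded processors union what they receive. *)
  for i = 0 to p - 1 do
    let missing = target i - List.length result.(i) in
    let taken, rest = split missing !pool in
    result.(i) <- result.(i) @ taken;
    pool := rest
  done;
  result

let balanced = rebalance [| [1; 2; 3; 4; 5; 6; 7]; []; [8] |]
```

With 8 items on 3 processors the targets are 3, 3 and 2, and `balanced` is `[| [1; 2; 3]; [4; 5; 6]; [8; 7] |]`. Because every processor knows the whole histogram after the first step, each one can decide locally what to send and to whom, which is what allows the real algorithm to finish in two super-steps.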

Page 21: Frédéric Gava

Application

Outline:

Introduction;

BSML;

Parallel Data Structures in BSML;

Application;

Conclusion and future work.

Page 22: Frédéric Gava

Computation of the « nth » nearest-neighbor atoms in a molecule

Code from « Objective Caml for Scientists » (J. Harrop);

Molecule as an infinitely-repeated graph of atoms;

Computation of set differences (the neighbors);

Replace « fold » with « async_fold »;

Experiments with a silicate of 100,000 atoms and with a cluster of 5/10 machines (Pentium IV, 2.8 GHz, Gigabit Ethernet card).
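The role of set differences in the neighbor computation can be sketched on a tiny example. This is not Harrop's code and makes simplifying assumptions: the molecule is a hypothetical finite ring of 8 atoms instead of an infinitely-repeated graph, and sequential standard-library sets are used instead of the parallel ones.

```ocaml
module IntSet = Set.Make (struct type t = int let compare = compare end)

(* Hypothetical molecule: a ring of 8 atoms, each bonded to two others. *)
let n_atoms = 8
let neighbours a =
  IntSet.of_list [ (a + 1) mod n_atoms; (a + n_atoms - 1) mod n_atoms ]

(* All atoms bonded to at least one atom of s. *)
let expand s =
  IntSet.fold (fun a acc -> IntSet.union (neighbours a) acc) s IntSet.empty

(* shell n a: atoms whose shortest bond path from a has exactly n bonds.
   Shell n = neighbours of shell n-1, minus shells n-1 and n-2. *)
let shell n a =
  let rec go n prev cur =
    if n <= 0 then cur
    else
      let next = IntSet.diff (IntSet.diff (expand cur) cur) prev in
      go (n - 1) cur next in
  go n IntSet.empty (IntSet.singleton a)

let second = IntSet.elements (shell 2 0)
let fourth = IntSet.elements (shell 4 0)
let fifth = IntSet.elements (shell 5 0)
```

On the ring, `second` is [2; 6], `fourth` is [4] and `fifth` is empty. The fold inside `expand` visits every atom of the current shell independently and the partial results are merged by union, so it is presumably the kind of iteration that the slide proposes to replace with `async_fold` on parallel sets.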

Page 23: Frédéric Gava

Experiments (1)

Page 24: Frédéric Gava

Experiments (2)

Page 25: Frédéric Gava

Experiments (3)

Page 26: Frédéric Gava

Conclusion and Future Work

Outline:

Introduction;

BSML;

Parallel Data Structures in BSML;

Application;

Conclusion and future work.

Page 27: Frédéric Gava

Conclusion

BSML = BSP + ML;

Implementation of some data structures;

Modular, for simple development and maintenance;

Pure functional implementation;

Cost prediction with the BSP model;

Generic load-balancing;

Application.

Page 28: Frédéric Gava

Future Work

Proof of the implementations (pure functional);

Implementation of other data structures (trees, priority lists, etc.);

Application to other scientific problems;

Comparison with other parallel MLs (OCamlP3L, HirondML, OCaml-Flight, MSPML, etc.);

Development of a modular and parallel graph library:

Edges as parallel maps;

Vertices as parallel sets.
