Computational Science =1=of Computer Systems

12
Computational Science of Computer Systems Computational Science Complex models to understand and predict the studied phenomenons Large-Scale Distributed Systems I Complex hierarchies: grid, cluster, multicore I Massive parallelism: 10 6 cores (10 9 in 2020) I Heterogeneity (GPU, SOC) ; Dynamicity Scientific Challenge: Study system correctness and performance I Need for adapted scientific instruments, combining approaches I Modeling to understand system dynamics; Simulation to predict experiments SimGrid I Many competitors in each application domain (GridSim, PeerSim, BigSim. . . ) (1/4)

Transcript of Computational Science =1=of Computer Systems

Page 1: Computational Science =1=of Computer Systems

Computational Science of Computer Systems

Computational ScienceComplex models to understand and predict the studied phenomenons

Large-Scale Distributed SystemsI Complex hierarchies: grid, cluster, multicore

I Massive parallelism: 106 cores (109 in 2020)

I Heterogeneity (GPU, SOC) ; Dynamicity

Scientific Challenge: Study system correctness and performanceI Need for adapted scientific instruments, combining approaches

I Modeling to understand system dynamics; Simulation to predict experiments

SimGridI Many competitors in each application domain (GridSim, PeerSim, BigSim. . . )

(1/4)

Page 2: Computational Science =1=of Computer Systems

Approach and Originality in ModelingEpistemological stance

I Empirically consider large-scale computer systems as natural objects

I Complexity of these artificial artifacts reaches “natural” levels

I Ich bin ein physicist: Focus on modeling errors (more instructive)

Fine and accurate modeling of distributed applications (perfs & semantic)

SimulatedRealSweep3D

Reality sometimes sucks

Very long timeouts on pkt loss

Do we want to model it or fix it? LU (left: 32p (up:real;down sim); right: 128)

(2/4)

Page 3: Computational Science =1=of Computer Systems

Approach and Originality in Simulation

SimGrid is an Operating Simulator

I OS-like internal design, isolating user processes with simcalls

Functional ViewProcess Process Process

SimCall Interface

MaestroSimulation Modelske

rnel

Temporal View

M

U2U1

U3

Implementation

T1tn

T2

tn+1M

Efficient [parallel] simulation

0

10000

20000

30000

40000

0 500000 1e+06 1.5e+06 2e+06

Run

ning

tim

e in

sec

onds

Number of nodes

OverSim (OMNeT++)PeerSim

OverSim (simple underlay)SimGrid (sequential)SimGrid (4 threads)

dPeerSim: 2LP ; 4h / 16LP ; 1h

(but only 47s in sequential PeerSim, and 5s with SimGrid :)

Model-Checking (Safety & Liveness)1

2

iSend

3

iRecv

4

Wait

5

iRecv

1 5 3

Test TRUE

6

Test TRUE

7

iSend

8

Wait

9

iRecv

1 0

Test FALSE

1 1

MC_RANDOM (0)

1 1 5

MC_RANDOM (1)

1 2 2

iRecv

1 2

MC_RANDOM (0)

1 0 5

MC_RANDOM (1) 1 1 2

iRecv

1 3

MC_RANDOM (0)MC_RANDOM (1)

1 0 7

iRecv

MC_RANDOM (0)

1 5

MC_RANDOM (1)

2 4

iRecv

1 6

iSend

1 7

iRecv

1 8

Wait

1 9

Test TRUE

2 0

iSend

2 1

Wait

2 2

iRecv

Test FALSE

2 5

MC_RANDOM (0)

1 0 3

MC_RANDOM (1)

2 6

Test FALSE

2 7

MC_RANDOM (0)

6 6

MC_RANDOM (1)

2 8

MC_RANDOM (0)

6 2

MC_RANDOM (1)

2 9

MC_RANDOM (0)MC_RANDOM (1)

MC_RANDOM (0)

3 1

MC_RANDOM (1)

3 2

iSend

3 5

Test FALSE

3 3

Wait

Test TRUE

3 6

iSend

5 8

MC_RANDOM (0)

6 0

MC_RANDOM (1)

3 7

Wait

5 4

MC_RANDOM (0)

5 6

MC_RANDOM (1)

3 8

MC_RANDOM (0)

5 2

MC_RANDOM (1)

3 9

MC_RANDOM (0)

4 4

MC_RANDOM (1)

4 0

MC_RANDOM (0)

MC_RANDOM (1)

4 1

MC_RANDOM (0) MC_RANDOM (1)

4 2

Test TRUE

iSend

4 5

Test TRUE

4 6

iSend

4 7

Wait

4 8

iRecv

Test FALSE

Test TRUE

Wait Wait

iSend iSend

6 3

Test FALSE

Test FALSE

6 7

iSend

7 5

Test FALSE

6 8

Wait

6 9

Test TRUE

7 0

iSend

7 1

Wait

7 2

iSend

7 3

iRecv

Test FALSE

7 6

iSend

9 9

MC_RANDOM (0)

1 0 1

MC_RANDOM (1)

7 7

Wait

9 5

MC_RANDOM (0)

9 7

MC_RANDOM (1)

7 8

MC_RANDOM (0)

9 3

MC_RANDOM (1)

7 9

MC_RANDOM (0)

8 4

MC_RANDOM (1)

8 0

MC_RANDOM (0)

MC_RANDOM (1)

8 1

MC_RANDOM (0) MC_RANDOM (1)

8 2

Test TRUE

iSend

8 5

Test TRUE

8 6

iSend

8 7

Wait

8 8

iSend

8 9

iRecv

Test FALSE

Test TRUE

Wait Wait

iSend iSend

iSend

Test FALSE

MC_RANDOM (0)

1 0 9

MC_RANDOM (1)

Test FALSE

MC_RANDOM (0)

MC_RANDOM (1)

1 1 6

iSend

1 1 7

iRecv

1 1 8

Wait

1 1 9

Test TRUE

1 2 0

iSend

Wait

MC_RANDOM (0)

1 2 4

MC_RANDOM (1)

1 2 5

iSend

1 2 8

Test FALSE

1 2 6

Wait

Test TRUE

1 2 9

iSend

1 4 9

MC_RANDOM (0)

1 5 1

MC_RANDOM (1)

1 3 0

Wait

1 4 5

MC_RANDOM (0)

1 4 7

MC_RANDOM (1)

1 3 1

MC_RANDOM (0)

1 4 3

MC_RANDOM (1)

1 3 2

MC_RANDOM (0)

1 3 7

MC_RANDOM (1)

1 3 3

MC_RANDOM (0)

MC_RANDOM (1)

1 3 4

MC_RANDOM (0) MC_RANDOM (1)

1 3 5

Test TRUE

iSend

1 3 8

Test TRUE

1 3 9

iSend

Wait

Test TRUE

Wait Wait

iSend iSend

iRecv

Exhaustive Chord(2 processes)

I Aim at bug finding,not assessment

I System State Equality

I + DPOR Reduction

I Soon MPI-complient

I Soon more parallelism

I Soon statistical MC

(3/4)

Page 4: Computational Science =1=of Computer Systems

My tools

Technical SideI SimGrid has no dependency, works for Linux, Mac and Windows; C/Java

I Grid’5000 for reproducible experiments (both on modelisation and simulation)

I Paje / Triva visualization tools, R statistical post-processing

Scientific SideI ANR SONGS: Simulation Of Next Generation Systems

8 WP: Grid/P2P/Clouds/HPC; Kernel/Models/Analyse/Open Science1.8Me 2012-2016, Grenoble, Lyon, Bordeaux, Nantes, Strasbourg, NiceAction d’Envergure Nationale de l’ANR :)

I ANR USS SimGrid: Ultra Scalable Simulation (800ke 2009-2011)

I Quelques petites choses a la region, en PHC avec la belgique, dans le CPER

Communication SideI Conf.: IPDPS, HPDC, SC, CCGrid, EuroPar; CAV, Forte; EuroSys, Usenix

I Journals: TPDS, ParCo, JPDC, SPE

I SimGrid Users’ Day, SimGrid Working Workshops, SimGrid Sprints (4/4)

Page 5: Computational Science =1=of Computer Systems

Take Away Messages

SimGrid will prove helpful to your research

I Versatile: Used in several communities (scheduling, GridRPC, HPC, P2P, Clouds)

I Accurate: Model limits known thanks to validation studies

I Sound: Easy to use, extensible, fast to execute, scalable to death, well tested

I Open: User-community much larger than contributors group; LGPL120 publications (110 auteurs distincts, 5 continents), 4 PhD/ 25 commiters, 5 unaffiliated

I Around since over 10 years, and ready for at least 10 more years

Welcome to the Age of (Sound) Computational Science

I Discover: http://simgrid.gforge.inria.fr/

I Learn: 101 tutorials, user manuals and examples

I Join: user mailing list, #simgrid on irc.debian.orgWe even have some open positions ;)

(5/4)

Page 6: Computational Science =1=of Computer Systems

When simulation is better than reality

BigDFT on graphene

I Hardware bug

I Packet drops & delays

I Fix model or reality?

MapReduce on Grid’5000

(6/4)

Page 7: Computational Science =1=of Computer Systems

Modeling MPI point-to-point communication

Measurements

Small

Medium1Medium2

Detached

Small

Medium1Medium2

Detached

MPI_Send MPI_Recv

1e−04

1e−02

1e+01 1e+03 1e+05 1e+01 1e+03 1e+05Message size (bytes)

Dur

atio

n (s

econ

ds) group

Small

Medium1

Medium2

Detached

Large

Models

SMPI Asynchronousmode (k ≤ Sa)

T3

Pr

Ps

T1

T2

SMPI Detached mode(Sa < k ≤ Sd)

Ps

Pr

T2T4

T1

SMPI Synchronousmode (k > Sd)

Ps

Pr

T4 T2 (7/4)

Page 8: Computational Science =1=of Computer Systems

Open Science and CS2

Les experiences sont ameliorablesI Pas decrites avec assez de precision; le diable est dans les details

I Probleme de reproductibilite, un comble!

Les outils sont ameliorablesI Beaucoup de micro-communautes: auteur seul utilisateur

I Peu d’outils passent l’epreuve du temps

I Certains outils sont buggues voire reconnus faux

Les methodes complementaires se marient malI Formalismes differents (programme 6= prototype 6= modeles formels)

I Outils differents, sans aucune interoperabilite

I Choix de l’outil dicte par les habitudes de l’experimentateur

Vision: Open ScienceI Des outils et methodes standards pour regagner la reproductibilite

I Idee: SimGrid comme cheval de Troie des best practices experimentales

I Collaboration avec Grid’5000, Distem et tous les autres

(8/4)

Page 9: Computational Science =1=of Computer Systems

Invalidating Simulators from the LitteratureNaive flow models documented as wrong

Setting Expected Output OutputB = 100 B = 100

B = 20

B = 100 B = 100

B = 20

B = 100 B = 100

B = 20

Known issue in Narses (2002), OptorSim (2003), GroudSim (2011).

Validation by general agreement“Since SimJava and GridSim have been extensively utilized in conductingcutting edge research in Grid resource management by severalresearchers, bugs that may compromise the validity of the simulationhave been already detected and fixed.” CloudSim, ICPP’09

Setting Expected Output OutputB B B

Buggy flow model (GridSim 5.2, Nov. 25, 2010). Similar issues with naivepacket-level models.

(9/4)

Page 10: Computational Science =1=of Computer Systems

Quick Overview of Internals Organization

User-visible SimGrid Components

I MSG: heuristics as Concurrent Sequential Processes (Java/Ruby/Lua bindings)

I SimDag: heuristics as DAG of (parallel) tasks

I SMPI: simulate real applications written using MPI

SimGrid internal layers

I MSG: User-friendly syntactic sugar

I Simix: Processes, synchronizations

I SURF: Resources usage interface

I Models: Compute completion dates

SIMIX

SURF

MSG SMPI SIMDAG

User Code

PlatformDescription372

435work

remaining

variable

530530

50664

245245

Concurrentprocesses

Synchro.abstractions

...

...

...

App. spec. as concurrent code

App. spec. astask graph

...

x1

x2

x3

x3

+

xn

...

+

+ xn

Variables ResourceConstraints

≤ CLm

≤ CL2

≤ CP

≤ CL1x1

... ...

Activities

...

{

CL2

CLm CL1

Cp

(10/4)

Page 11: Computational Science =1=of Computer Systems

SimGrid Usage example: Master/workers example

1. Write the Code of your Agents

int master(int argc, char **argv) {for (i = 0; i < number_of_tasks; i++) {t=MSG_task_create(name,comp_size,comm_size,data );sprintf(mailbox,"worker-%d",i % workers_count);MSG_task_send(t, mailbox);

}

int worker(int ,char**){sprintf(my_mailbox,"worker-%d",my_id);while(1) {

MSG_task_receive(&task, my_mailbox);MSG_task_execute(task);MSG_task_destroy(task);

}

2. Describe your Experiment

XML Platform File<?xml version=’1.0’?><!DOCTYPE platform SYSTEM "surfxml.dtd"><platform version="2"><host name="host1" power="1E8"/><host name="host2" power="1E8"/>...<link name="link1" bandwidth="1E6"

latency="1E-2" />...<route src="host1" dst="host2">

<link:ctn id="link1"/></route></platform>

XML Deployment File

<?xml version=’1.0’?><!DOCTYPE platform SYSTEM "surfxml.dtd"><platform version="2"><!-- The master process --><process host="host1" function="master"><argument value="10"/><!--argv[1]:#tasks--><argument value="1"/><!--argv[2]:#workers-->

</process>

<!-- The workers --><process host="host2" function="worker">

<argument value="0"/></process></platform>

(11/4)

Page 12: Computational Science =1=of Computer Systems

Challenges of System-level State Equality

Over provisioning

fragment size 256 256 512 1024 256 256 1024 512

size used 240 200 400 924 256 648

Syntactic differences

I In malloc, blocs order can vary without impacting applicative semantic

0x100

0x100

0x200

0x200

0x300

0x300

0x400

0x400

0x500

0x500

123456

123456

aSd25

aSdYY

ffe

gcc

gcc

ffe

= 6= = =

Padding Bytes

I Data is aligned in memory for efficiency, leaving holes

Irrelevant differencesI Host-related data (pid, files), simulation-related data (time)

(12/4)