Transcript of: Computational Science of Computer Systems
Computational Science of Computer Systems
Computational Science
I Complex models to understand and predict the studied phenomena

Large-Scale Distributed Systems
I Complex hierarchies: grid, cluster, multicore
I Massive parallelism: 10^6 cores (10^9 in 2020)
I Heterogeneity (GPU, SoC); Dynamicity

Scientific Challenge: Study system correctness and performance
I Need for adapted scientific instruments, combining approaches
I Modeling to understand system dynamics; Simulation to predict experiments

SimGrid
I Many competitors in each application domain (GridSim, PeerSim, BigSim. . . )
(1/4)
Approach and Originality in Modeling

Epistemological stance
I Empirically consider large-scale computer systems as natural objects
I Complexity of these artificial artifacts reaches “natural” levels
I Ich bin ein physicist: focus on modeling errors (more instructive)

Fine and accurate modeling of distributed applications (performance & semantics)
[Figure: simulated vs. real execution traces of Sweep3D]
Reality sometimes sucks
I Very long timeouts on packet loss
I Do we want to model it or fix it?
[Figure: LU benchmark; left: 32 processes (top: real, bottom: simulated); right: 128 processes]
(2/4)
Approach and Originality in Simulation
SimGrid is an Operating Simulator
I OS-like internal design, isolating user processes with simcalls
[Figure: Functional view: user processes issue simcalls through the SimCall interface to the Maestro and simulation models in the simulation kernel. Temporal view: user contexts U1, U2, U3 run between Maestro rounds M. Implementation: worker threads T1, T2 run the user contexts between simulated timestamps tn and tn+1]
Efficient [parallel] simulation
[Figure: running time in seconds (0 to 40,000) vs. number of nodes (0 to 2,000,000) for OverSim (OMNeT++), PeerSim, OverSim (simple underlay), SimGrid (sequential) and SimGrid (4 threads)]
I dPeerSim: 2 LPs: 4 h; 16 LPs: 1 h
I (but only 47 s in sequential PeerSim, and 5 s with SimGrid :)
Model-Checking (Safety & Liveness)
[Figure: exhaustive state-space exploration of a two-process protocol; states are numbered nodes, transitions are iSend, iRecv, Wait, Test TRUE/FALSE and MC_RANDOM(0)/MC_RANDOM(1) branch points]
Exhaustive exploration of Chord (2 processes)
I Aim at bug finding, not assessment
I System State Equality
I + DPOR Reduction
I Soon MPI-compliant
I Soon more parallelism
I Soon statistical MC
(3/4)
My tools
Technical Side
I SimGrid has no dependency, works on Linux, Mac and Windows; C/Java
I Grid’5000 for reproducible experiments (on both modeling and simulation)
I Paje / Triva visualization tools, R statistical post-processing
Scientific Side
I ANR SONGS: Simulation Of Next Generation Systems
  8 WPs: Grid/P2P/Clouds/HPC; Kernel/Models/Analysis/Open Science
  1.8 M€, 2012-2016; Grenoble, Lyon, Bordeaux, Nantes, Strasbourg, Nice
  Action d’Envergure Nationale of the ANR :)
I ANR USS SimGrid: Ultra Scalable Simulation (800 k€, 2009-2011)
I A few smaller things with the region, a PHC with Belgium, and within the CPER
Communication Side
I Conferences: IPDPS, HPDC, SC, CCGrid, EuroPar; CAV, Forte; EuroSys, Usenix
I Journals: TPDS, ParCo, JPDC, SPE
I SimGrid Users’ Day, SimGrid Working Workshops, SimGrid Sprints
(4/4)
Take Away Messages
SimGrid will prove helpful to your research
I Versatile: Used in several communities (scheduling, GridRPC, HPC, P2P, Clouds)
I Accurate: Model limits known thanks to validation studies
I Sound: Easy to use, extensible, fast to execute, scalable to death, well tested
I Open: user community much larger than the contributor group; LGPL
  120 publications (110 distinct authors, 5 continents), 4 PhDs / 25 committers, 5 unaffiliated
I Around for over 10 years, and ready for at least 10 more
Welcome to the Age of (Sound) Computational Science
I Discover: http://simgrid.gforge.inria.fr/
I Learn: 101 tutorials, user manuals and examples
I Join: user mailing list, #simgrid on irc.debian.org
We even have some open positions ;)
(5/4)
When simulation is better than reality
BigDFT on graphene
I Hardware bug
I Packet drops & delays
I Fix model or reality?
MapReduce on Grid’5000
(6/4)
Modeling MPI point-to-point communication
Measurements
[Figure: measured durations (seconds) of MPI_Send and MPI_Recv vs. message size (bytes, 1e+01 to 1e+05), with point groups Small, Medium1, Medium2, Detached, Large]
Models
[Figure: SMPI asynchronous mode (k ≤ Sa), detached mode (Sa < k ≤ Sd) and synchronous mode (k > Sd), each shown as a timeline of sender Ps and receiver Pr with transfer phases T1-T4]
(7/4)
Open Science and CS2

Experiments can be improved
I Not described precisely enough; the devil is in the details
I A reproducibility problem, of all things!

Tools can be improved
I Many micro-communities: the author is the sole user
I Few tools stand the test of time
I Some tools are buggy, or even acknowledged as wrong

Complementary methods do not mix well
I Different formalisms (program ≠ prototype ≠ formal models)
I Different tools, without any interoperability
I Tool choice dictated by the experimenter’s habits

Vision: Open Science
I Standard tools and methods to regain reproducibility
I Idea: SimGrid as a Trojan horse for experimental best practices
I Collaboration with Grid’5000, Distem and all the others
(8/4)
Invalidating Simulators from the Literature

Naive flow models documented as wrong
[Figure: test settings with links of bandwidth B = 100, B = 100 and B = 20; expected output vs. the simulator’s actual output]
Known issue in Narses (2002), OptorSim (2003), GroudSim (2011).

Validation by general agreement:
“Since SimJava and GridSim have been extensively utilized in conducting cutting edge research in Grid resource management by several researchers, bugs that may compromise the validity of the simulation have been already detected and fixed.” (CloudSim, ICPP’09)

[Figure: test setting with three links of bandwidth B; expected output vs. the simulator’s actual output]
Buggy flow model (GridSim 5.2, Nov. 25, 2010). Similar issues with naive packet-level models.
(9/4)
Quick Overview of Internals Organization
User-visible SimGrid Components
I MSG: heuristics as Concurrent Sequential Processes (Java/Ruby/Lua bindings)
I SimDag: heuristics as DAG of (parallel) tasks
I SMPI: simulate real applications written using MPI
SimGrid internal layers
I MSG: User-friendly syntactic sugar
I Simix: Processes, synchronizations
I SURF: Resource-usage interface
I Models: Compute completion dates
[Figure: layered view. User code runs on MSG, SMPI or SIMDAG, specified either as concurrent code or as a task graph; SIMIX provides concurrent processes and synchronization abstractions; SURF tracks the remaining work of each activity through variables x1, ..., xn, which appear in resource constraints (x1 + x2 + ... + xn ≤ CL1, ..., ≤ CLm, ≤ CP) for links L1..Lm and processor P, built from the platform description]
(10/4)
SimGrid Usage example: Master/workers example
1. Write the Code of your Agents
int master(int argc, char **argv) {
  for (i = 0; i < number_of_tasks; i++) {
    t = MSG_task_create(name, comp_size, comm_size, data);
    sprintf(mailbox, "worker-%d", i % workers_count);
    MSG_task_send(t, mailbox);
  }
}

int worker(int argc, char **argv) {
  sprintf(my_mailbox, "worker-%d", my_id);
  while (1) {
    MSG_task_receive(&task, my_mailbox);
    MSG_task_execute(task);
    MSG_task_destroy(task);
  }
}
2. Describe your Experiment
XML Platform File

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "surfxml.dtd">
<platform version="2">
  <host name="host1" power="1E8"/>
  <host name="host2" power="1E8"/>
  ...
  <link name="link1" bandwidth="1E6" latency="1E-2"/>
  ...
  <route src="host1" dst="host2">
    <link:ctn id="link1"/>
  </route>
</platform>

XML Deployment File

<?xml version='1.0'?>
<!DOCTYPE platform SYSTEM "surfxml.dtd">
<platform version="2">
  <!-- The master process -->
  <process host="host1" function="master">
    <argument value="10"/> <!-- argv[1]: #tasks -->
    <argument value="1"/>  <!-- argv[2]: #workers -->
  </process>
  <!-- The workers -->
  <process host="host2" function="worker">
    <argument value="0"/>
  </process>
</platform>
(11/4)
Challenges of System-level State Equality
Over-provisioning
[Table: allocated fragment sizes (256, 256, 512, 1024, 256, 256, 1024, 512) vs. bytes actually used (240, 200, 400, 924, 256, 648)]
Syntactic differences
I In malloc, block order can vary without affecting application semantics
[Figure: two heaps at addresses 0x100 to 0x500 holding the blocks 123456, aSd25/aSdYY, ffe and gcc in different orders; after matching blocks, the block-wise comparison yields =, ≠, =, =]
Padding Bytes
I Data is aligned in memory for efficiency, leaving holes
Irrelevant differences
I Host-related data (pid, files), simulation-related data (time)
(12/4)