GRID superscalar: a programming model for the Grid
Raül Sirvent Pardell
Advisor: Rosa M. Badia Sala
Doctoral Thesis
Computer Architecture Department
Technical University of Catalonia
Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
Outline
1. Introduction
   1.1 Motivation
   1.2 Related work
   1.3 Thesis objectives and contributions
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
1.1 Motivation
The Grid architecture layers:
– Applications
– Grid Middleware (job management, data transfer, security, information, QoS, …)
– Distributed Resources
1.1 Motivation
What middleware should I use?
1.1 Motivation
Programming tools: are they easy?
Grid AWARE vs. Grid UNAWARE
1.1 Motivation
Can I run my programs in parallel?
Explicit parallelism vs. implicit parallelism
– Explicit: the fork/join task graph is drawn by hand
– Implicit: the parallelism is extracted from sequential code such as:

for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j))
1.1 Motivation
The Grid: a massive, dynamic and heterogeneous environment prone to failures
– Study different techniques to detect and overcome failures: checkpointing, retries, replication
1.2 Related work
System / Grid unaware / Implicit parallelism / Language
Triana: No / No / Graphical
Satin: Yes / No / Java
ProActive: Partial / Partial / Java
Pegasus: Yes / Partial / VDL
Swift: Yes / Partial / SwiftScript
1.3 Thesis objectives and contributions

Objective: create a programming model for the Grid
– Grid unaware
– Implicit parallelism
– Sequential programming
– Allows the use of well-known imperative languages
– Speeds up applications
– Includes fault detection and recovery
1.3 Thesis objectives and contributions

Contribution: GRID superscalar
– Programming interface
– Runtime environment
– Fault tolerance features
Outline
1. Introduction
2. Programming interface
   2.1 Design
   2.2 User interface
   2.3 Programming comparison
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
2.1 Design
Interface objectives
– Grid unaware
– Implicit parallelism
– Sequential programming
– Allows the use of well-known imperative languages
2.1 Design
Target applications
– Algorithms that may be easily split into tasks
  • Branch and bound computations, divide and conquer algorithms, recursive algorithms, …
– Coarse-grained tasks
– Independent tasks
  • Scientific workflows, optimization algorithms, parameter sweeps
– Main parameters: FILES
  • External simulators, finite element solvers, BLAST, GAMESS
2.1 Design
Application’s architecture: a master-worker paradigm
– The master-worker parallel paradigm fits our objectives
– Main program: the master
– Functions: the workers
  • Function = generic representation of a task
– Glue to transform a sequential application into a master-worker application: stubs and skeletons (as in RMI, RPC, …)
  • Stub: call to the runtime interface
  • Skeleton: binary which calls the user function
2.1 Design
Local scenario

app.c:
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

app-functions.c:
void matmul(char *f1, char *f2, char *f3)
{
  getBlocks(f1, f2, f3, A, B, C);
  for (i = 0; i < A->rows; i++)
    for (j = 0; j < B->cols; j++)
      for (k = 0; k < A->cols; k++)
        C->data[i][j] += A->data[i][k] * B->data[k][j];
  putBlocks(f1, f2, f3, A, B, C);
}
2.1 Design
Master-worker paradigm: app.c (the master) dispatches tasks through the Grid middleware to many instances of app-functions.c (the workers).
2.1 Design
Intermediate language concept: assembler code
  C, C++, … -> Assembler -> Processor execution
  In GRIDSs: C, C++, … -> Workflow -> Grid execution
The Execute generic interface
– Instruction set is defined by the user
– Single entry point to the runtime
– Allows easy building of programming language bindings (Java, Perl, Shell script)
  • Easier technology adoption
2.2 User interface
Steps to program an application
– Task definition
  • Identify the functions/programs in the application that are going to be executed in the computational Grid
  • All parameters must be passed in the header (remote execution)
– Interface Definition Language (IDL)
  • For every task defined, identify which parameters are input/output files and which are input/output scalars
– Programming API: master and worker
  • Write the main program and the tasks using the GRIDSs API
2.2 User interface

Interface Definition Language (IDL) file
– CORBA-IDL-like interface:
  • in/out/inout files
  • in/out/inout scalar values
– The functions listed in this file will be executed in the Grid

interface MATMUL {
  void matmul(in File f1, in File f2, inout File f3);
};
2.2 User interface
Programming API: master and worker
Master side (app.c): GS_On, GS_Off, GS_FOpen/GS_FClose, GS_Open/GS_Close, GS_Barrier, GS_Speculative_End
Worker side (app-functions.c): GS_System, gs_result, GS_Throw
2.2 User interface
Task constraints and cost specification
– Constraints: specify the needs of a task (CPU, memory, architecture, software, …)
  • Build an expression in a constraint function (evaluated for every machine), e.g.:
      other.Mem == 1024
– Cost: estimated execution time of a task (in seconds)
  • Useful for scheduling
  • Calculated in a cost function; GS_GFlops / GS_Filesize may be used, e.g.:
      cost = operations / GS_GFlops();
  • An external estimator can also be called
2.3 Programming comparison
Globus vs GRIDSs
int main()
{
  rsl = "&(executable=/home/user/sim)(arguments=input1.txt output1.txt)"
        "(file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input1.txt /home/user/input1.txt))"
        "(file_stage_out=/home/user/output1.txt gsiftp://bscgrid01.bsc.es/path/output1.txt)"
        "(file_clean_up=/home/user/input1.txt /home/user/output1.txt)";
  globus_gram_client_job_request("bscgrid02.bsc.es", rsl, NULL, NULL);

  rsl = "&(executable=/home/user/sim)(arguments=input2.txt output2.txt)"
        "(file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input2.txt /home/user/input2.txt))"
        "(file_stage_out=/home/user/output2.txt gsiftp://bscgrid01.bsc.es/path/output2.txt)"
        "(file_clean_up=/home/user/input2.txt /home/user/output2.txt)";
  globus_gram_client_job_request("bscgrid03.bsc.es", rsl, NULL, NULL);

  rsl = "&(executable=/home/user/sim)(arguments=input3.txt output3.txt)"
        "(file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input3.txt /home/user/input3.txt))"
        "(file_stage_out=/home/user/output3.txt gsiftp://bscgrid01.bsc.es/path/output3.txt)"
        "(file_clean_up=/home/user/input3.txt /home/user/output3.txt)";
  globus_gram_client_job_request("bscgrid04.bsc.es", rsl, NULL, NULL);
}

Grid-aware; explicit parallelism
2.3 Programming comparison
Globus vs GRIDSs
void sim(File input, File output)
{
  command = "/home/user/sim " + input + ' ' + output;
  gs_result = GS_System(command);
}

int main()
{
  GS_On();
  sim("/path/input1.txt", "/path/output1.txt");
  sim("/path/input2.txt", "/path/output2.txt");
  sim("/path/input3.txt", "/path/output3.txt");
  GS_Off(0);
}
2.3 Programming comparison
DAGMan vs GRIDSs
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D

(graph: A -> B, C; B, C -> D)
Explicit parallelism; no if/while clauses

int main()
{
  GS_On();
  task_A(f1, f2, f3);
  task_B(f2, f4);
  task_C(f3, f5);
  task_D(f4, f5, f6);
  GS_Off(0);
}
2.3 Programming comparison
Ninf-G vs GRIDSs

int main()
{
  grpc_initialize("config_file");
  grpc_object_handle_init_np("A", &A_h, "class");
  grpc_object_handle_init_np("B", &B_h, "class");
  for (i = 0; i < 25; i++) {
    grpc_invoke_async_np(A_h, "foo", &sid, f_in[2*i], f_out[2*i]);
    grpc_invoke_async_np(B_h, "foo", &sid, f_in[2*i+1], f_out[2*i+1]);
    grpc_wait_all();
  }
  grpc_object_handle_destruct_np(&A_h);
  grpc_object_handle_destruct_np(&B_h);
  grpc_finalize();
}

int main()
{
  GS_On();
  for (i = 0; i < 50; i++)
    foo(f_in[i], f_out[i]);
  GS_Off(0);
}

Grid-aware; explicit parallelism
2.3 Programming comparison
VDL vs GRIDSs

DV trans1( a2=@{output:tmp.0}, a1=@{input:filein.0} );
DV trans2( a2=@{output:fileout.0}, a1=@{input:tmp.0} );
DV trans1( a2=@{output:tmp.1}, a1=@{input:filein.1} );
DV trans2( a2=@{output:fileout.1}, a1=@{input:tmp.1} );
...
DV trans1( a2=@{output:tmp.999}, a1=@{input:filein.999} );
DV trans2( a2=@{output:fileout.999}, a1=@{input:tmp.999} );

int main()
{
  GS_On();
  for (i = 0; i < 1000; i++) {
    tmp = "tmp." + i;
    filein = "filein." + i;
    fileout = "fileout." + i;
    trans1(tmp, filein);
    trans2(fileout, tmp);
  }
  GS_Off(0);
}

No if/while clauses
Outline
1. Introduction
2. Programming interface
3. Runtime
   3.1 Scientific contributions
   3.2 Developments
   3.3 Evaluation tests
4. Fault tolerance at the programming model level
5. Conclusions and future work
3.1 Scientific contributions
Runtime objectives
– Extract implicit parallelism in sequential applications
– Speed up execution using the Grid
Main requirement: Grid middleware
– Job management
– Data transfer
– Security
3.1 Scientific contributions
Apply computer architecture knowledge to the Grid (the superscalar processor)
(diagram: superscalar processor block diagram with IFU, BXU, IDU, ISU, LSU, FXU and FPU units plus L2/L3 directory/control, working at the ns scale; the Grid works at the scale of seconds/minutes/hours)
3.1 Scientific contributions
Data dependence analysis: allows parallelism
Read after Write:  task1(..., f1) ; task2(f1, ...)
Write after Read:  task1(f1, ...) ; task2(..., f1)
Write after Write: task1(..., f1) ; task2(..., f1)
3.1 Scientific contributions
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j))

Generated task chains (each chained by the inout C block):
i = 0, j = 0 (k = 0, 1, 2): matmul(A(0,0), B(0,0), C(0,0)) -> matmul(A(0,1), B(1,0), C(0,0)) -> matmul(A(0,2), B(2,0), C(0,0))
i = 0, j = 1 (k = 0, 1, 2): matmul(A(0,0), B(0,1), C(0,1)) -> matmul(A(0,1), B(1,1), C(0,1)) -> matmul(A(0,2), B(2,1), C(0,1))
...
3.1 Scientific contributions
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j))

Each (i, j) pair yields a chain of tasks over k = 0, 1, 2 linked by the inout block C(i,j); chains for different (i, j) carry no dependences between them, so (i = 0, j = 0), (i = 0, j = 1), (i = 0, j = 2), …, (i = 1, j = 0), (i = 1, j = 1), (i = 1, j = 2), … can all run in parallel.
3.1 Scientific contributions
File renaming: increase parallelism
Read after Write:  task1(..., f1) ; task2(f1, ...)  (unavoidable)
Write after Read:  task1(f1, ...) ; task2(..., f1) renamed to task2(..., f1_NEW)  (avoidable)
Write after Write: task1(..., f1) ; task2(..., f1) renamed to task2(..., f1_NEW)  (avoidable)
3.2 Developments
Basic functionality
– Job submission (middleware usage)
  • Select sources for input files
  • Submit, monitor or cancel jobs
  • Results collection
– API implementation
  • GS_On: read configuration file and environment
  • GS_Off: wait for tasks, clean up remote data, undo renaming
  • GS_(F)Open: create a local task
  • GS_(F)Close: notify end of local task
  • GS_Barrier: wait for all running tasks to finish
  • GS_System: translate path
  • GS_Speculative_End: barrier until a throw; if a throw occurs, discard tasks from the throw to GS_Speculative_End
  • GS_Throw: use gs_result to notify it
3.2 Developments
Task scheduling: Directed Acyclic Graph (tasks are submitted to the Grid middleware as their dependences are satisfied)
3.2 Developments
Task scheduling: resource brokering
– A resource broker is needed (but not an objective)
– Grid configuration file
  • Information about hosts (hostname, limit of jobs, queue, working directory, quota, …)
  • Initial set of machines (can be changed dynamically)

<?xml version="1.0" encoding="UTF-8"?>
<project isSimple="yes" masterBandwidth="100000" masterBuildScript=""
         masterInstallDir="/home/rsirvent/matmul-master" masterName="bscgrid01.bsc.es"
         masterSourceDir="/datos/GRID-S/GT4/doc/examples/matmul" name="matmul"
         workerBuildScript="" workerSourceDir="/datos/GRID-S/GT4/doc/examples/matmul">
...
<workers>
<worker Arch="x86" GFlops="5.985" LimitOfJobs="2" Mem="1024" NCPUs="2"
        NetKbps="100000" OpSys="Linux" Queue="none" Quota="0"
        deploymentStatus="deployed" installDir="/home/rsirvent/matmul-worker"
        name="bscgrid01.bsc.es">
3.2 Developments
Task scheduling: resource brokering
– Scheduling policy
  • Estimation of the total execution time of a single task on a resource:
      t = FileTransferTime + ExecutionTime
  • FileTransferTime: time to transfer the needed files to a resource (calculated with the hosts information and the location of files); the fastest source for each file is selected
  • ExecutionTime: estimation of the task's run time in a resource, given by an interface function (can be calculated, or estimated by an external entity); the fastest resource for execution is favoured
  • The resource with the smallest estimation is selected
3.2 Developments
Task scheduling: resource brokering
– Match task constraints and machine capabilities
– Implemented using the ClassAd library
  • Machine: offers capabilities (from the Grid configuration file: memory, architecture, …)
  • Task: demands capabilities
– Filter candidate machines for a particular task, e.g. a task demanding
    Software = BLAST
  matches a machine offering
    SoftwareList = BLAST, GAMESS
  but not one offering
    SoftwareList = GAMESS
3.2 Developments
Task scheduling: file locality
– The FileTransferTime term of t = FileTransferTime + ExecutionTime favours machines that already hold a task's files (diagram: input files f1, f2, f3 reused on the machines where they already reside)
3.2 Developments
Other file locality exploitation mechanisms
– Shared input disks
  • NFS or replicated data
– Shared working directories
  • NFS
– Erasing unused versions of files (decreases disk usage)
– Disk quota control (locality increases disk usage and quota may be lower than expected)
3.3 Evaluation
NAS Grid Benchmarks: representative benchmark; includes different types of workflows which emulate a wide range of Grid applications
Simple optimization example: representative of optimization algorithms; workflow with two-level synchronization
New product and process development: production application; workflow with parallel chains of computation
Potential energy hypersurface for acetone: massively parallel, long-running application
Protein comparison: production application; big computational challenge; massively parallel; high number of tasks
fastDNAml: well-known application in the context of MPI for Grids; workflow with synchronization steps
3.3 Evaluation
NAS Grid Benchmarks
(workflow diagrams, each running from Launch to Report, with MF nodes marking file transfers between tasks:
  ED: independent SP tasks
  HC: chains of BT -> SP -> LU tasks
  VP: pipeline of BT -> MG -> FT stages
  MB: LU -> MG -> FT stages with dependences between them)
3.3 Evaluation
Run with classes S, W, A (2 machines x 4 CPUs)
VP benchmark must be analyzed in detail (does not scale up to 3 CPUs)
(chart: speedup vs. limit of tasks, 1 to 4, on Kadesh8, for the series MB.S, MB.W, VP.S, VP.W, VP.A; speedups range from 0 to 3)
3.3 Evaluation
Performance analysis
– GRID superscalar runtime instrumented
– Paraver tracefiles from the client side
– The lifecycle of all tasks has been studied in detail
Globus overhead (VP.W): overhead of the GRAM Job Manager polling interval
(chart: per-task time in seconds for tasks 1 to 15, split into task duration, "Request to Active" and "Active to Done" phases; up to ~45 s per task)
3.3 Evaluation
VP.S task assignment
– 14.7% of the transfers when exploiting locality
– VP is parallel, but its last part is sequentially executed
(diagram: BT -> MG -> FT chains with MF transfer nodes assigned to Kadesh8 and Khafre; arrows mark remote file transfers)
3.3 Evaluation
Conclusion: workflow and granularity are important to achieve speed up
Two-dimensional potential energy hypersurface for acetone as a function of two torsion angles
3.3 Evaluation
3.3 Evaluation
Number of executed tasks: 1120 Each task between 45 and 65 minutes Speed up: 26.88 (32 CPUs), 49.17 (64 CPUs) Long running test, heterogeneous and
transatlantic Grid
22 CPUs14 CPUs
28 CPUs
3.3 Evaluation

(figure: 15 million proteins compared against genomes)
15 million protein sequences have been compared using BLAST and GRID superscalar
3.3 Evaluation
100,000 tasks on 4,000 CPUs (= 1,000 exclusive nodes)
"Grid" of 1,000 machines with very low latency between them
– Stress test for the runtime
Saves the user from dealing with the queuing system directly
Saves the queuing system from handling a huge set of independent tasks
GRID superscalar: programming interface and runtime
Publications
Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús Labarta, "Automatic Grid workflow based on imperative programming languages", Concurrency and Computation: Practice and Experience, John Wiley & Sons, vol. 18, no. 10, pp. 1169-1186, 2006.
Rosa M. Badia, Raul Sirvent, Jesus Labarta, Josep M. Perez, "Programming the GRID: An Imperative Language-based Approach", Engineering The Grid: Status and Perspective, Section 4, Chapter 12, American Scientific Publishers, January 2006.
Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela and Rogeli Grima, "Programming Grid Applications with GRID Superscalar", Journal of Grid Computing, Volume 1, Issue 2, 2003.
GRID superscalar: programming interface and runtime
Work related to standards
R.M. Badia, D. Du, E. Huedo, A. Kokossis, I. M. Llorente, R. S. Montero, M. de Palol, R. Sirvent, and C. Vázquez, "Integration of GRID superscalar and GridWay Metascheduler with the DRMAA OGF Standard", Euro-Par, 2008.
Raül Sirvent, Andre Merzky, Rosa M. Badia, Thilo Kielmann, "GRID superscalar and SAGA: forming a high-level and platform-independent Grid programming environment", CoreGRID Integration Workshop. Integrated Research in Grid Computing, Pisa (Italy), 2005.
Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
   4.1 Checkpointing
   4.2 Retry mechanisms
   4.3 Task replication
5. Conclusions and future work
4.1 Checkpointing
Inter-task checkpointing
Recovers sequential consistency in the out-of-order execution of tasks
– A single version of every file is saved
– No need to save any data structures in the runtime
Drawback: some completed tasks may be lost
– Application-level checkpointing can avoid this
(diagram: task sequence 0 to 6 with the checkpoint position marked)
4.1 Checkpointing
Conclusions
– Low complexity in order to checkpoint a task
  • ~1% overhead introduced
– Can deal with both application-level and Grid-level errors
  • Most important when an unrecoverable error appears
– Transparent for end users
4.2 Retry mechanisms
Automatic drop of machines (diagram: a failing machine is dropped from the set of available machines)
4.2 Retry mechanisms
Soft and hard timeouts for tasks (diagram: state transitions with soft timeout, hard timeout, success and failure)
4.2 Retry mechanisms
Retry of operations (diagram: requests and syscalls are re-attempted on failure until they succeed)
4.2 Retry mechanisms
Conclusions
– Keep running despite failures
– Dynamic: when and where to resubmit
– Detects performance degradations
– No overhead when no failures are detected
– Transparent for end users
4.3 Task replication
Replicate running tasks depending on successors (diagram: a task graph where a running task with many pending successors is replicated)
4.3 Task replication
Replicate running tasks to speed up the execution (diagram: a running task replicated on a second machine; the first replica to finish is kept)
4.3 Task replication
Conclusions
– Dynamic replication: application-level knowledge is used (the workflow)
– Replication can deal with failures, hiding the retry overhead
– Replication can speed up applications in heterogeneous Grids
– Transparent for end users
– Drawback: increased usage of resources
4. Fault tolerance features
Publications
Vasilis Dialinos, Rosa M. Badia, Raül Sirvent, Josep M. Pérez and Jesús Labarta, "Implementing Phylogenetic Inference with GRID superscalar", Cluster Computing and Grid 2005 (CCGRID 2005), Cardiff, UK, 2005.
Raül Sirvent, Rosa M. Badia and Jesús Labarta, "Graph-based task replication for workflow applications", Submitted, HPCC 2009.
Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
5. Conclusions and future work
Grid-unaware programming model
Transparent features for users, exploiting parallelism and failure treatment
Used in REAL systems and REAL applications
Some future research is already ONGOING (StarSs)
5. Conclusions and future work
Future work
– Grid of supercomputers (Red Española de Supercomputación)
– Higher-scale tests (hundreds? thousands?)
– More complex brokering
  • Resource discovery/monitoring
  • New scheduling policies based on the workflow
  • Automatic prediction of execution times
– New policies for task replication
– New architectures for StarSs