
Future Generation Computer Systems 23 (2007) 23–30 — www.elsevier.com/locate/fgcs

Programming the Grid with POP-C++

Tuan-Anh Nguyen a,*, Pierre Kuonen b

a HCMC University of Technology, Faculty of Computer Science and Engineering, 268 Ly Thuong Kiet, Dst. 10, Ho Chi Minh City, Viet Nam
b University of Applied Sciences of Fribourg, EIA-FR, Fribourg, Switzerland

Received 3 May 2005; received in revised form 3 April 2006; accepted 5 April 2006. Available online 15 June 2006.
doi:10.1016/j.future.2006.04.012

* Corresponding author. Tel.: +84 8 864 72 56; fax: +84 8 864 51 37. E-mail addresses: [email protected], [email protected] (T.-A. Nguyen), [email protected] (P. Kuonen).

Abstract

Although Grid computing has been the main theme of distributed computing research over the last few years, programming on the Grid remains a major difficulty for ordinary users. The POP-C++ programming system has been built to provide Grid programming facilities that greatly ease the development and the deployment of parallel applications on the Grid. The original parallel object model used in POP-C++ combines powerful features of object-oriented programming with high-level distributed programming capabilities. The model is based on the simple idea that objects are suitable structures to encapsulate and to distribute heterogeneous data and computing elements over the Grid. Programmers can guide the resource allocation for each object through high-level resource descriptions. The object creation process, supported by the POP-C++ runtime system, is transparent to programmers. Both inter-object and intra-object parallelism are supported through various method invocation semantics. The POP-C++ programming language extends C++ to support the parallel object model with just a few new keywords. In this paper, we present the Grid programming aspects of POP-C++. With POP-C++, writing a Grid-enabled application becomes as simple as writing a sequential C++ application.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Parallel object; Distributed object; Parallelism; Programming model; Grid programming

1. Introduction

Although many researchers focus on the Grid infrastructure, such as resource management and discovery [1], the service architecture [2], Grid security [3,4], Grid data management [5], etc., Grid programming remains a major difficulty for end users. Recent efforts to bring traditional programming tools to the Grid, such as MPI [6], RMI [7], ProActive [8] or JavaSymphony [9], have attained some success. However, exploiting the performance of the Grid despite its heterogeneity is still very tricky.

We have developed a parallel programming system for the Grid called POP-C++ (formerly named ParoC++) that allows programmers to exploit the heterogeneous performance of the Grid easily and transparently. The POP-C++ system provides Grid programming capabilities and Grid deployment support at different levels: from a programming tool (the POP-C++ language and compiler) for writing Grid applications to the runtime services for running applications on the Grid.

Inspired by CORBA [10] and C++, the POP-C++ programming language extends C++ by adding a new type of "parallel object", allowing "C++-like objects" to run on distributed resources. With POP-C++, programming on the Grid is as simple as writing a sequential C++ program.

This paper focuses on the programming aspects of the POP-C++ system. We first give an overview of the system in Section 2. The POP-C++ programming model is discussed in Section 3. In Section 4, we present the POP-C++ programming language. Section 5 is devoted to the discussion of parallelism in POP-C++. Section 6 shows some experiments and a case study. Related work is discussed in Section 7 before the conclusion in Section 8.

2. POP-C++ system overview

Fig. 1 presents the POP-C++ layered architecture. The architecture supports Grid-enabled application development and deployment at different levels, from the programming language for writing applications up to the runtime system for executing applications on the Grid.

Fig. 1. The layered architecture of the POP-C++ system.

The POP-C++ runtime system consists of the infrastructure service layer, managed by Grid toolkits (e.g. the Globus Toolkit or XtremWeb [11]); the POP-C++ service layer, which interfaces with the Grid infrastructures (e.g. POP-C++ services for Globus); and the POP-C++ essential service abstractions layer, which provides a well-defined abstract interface for the programming layer. Details of the POP-C++ runtime are described in [12]. The POP-C++ programming layer, on top of this architecture, is the most important layer: it provides the necessary support for developing Grid-enabled object-oriented applications based on the parallel object model. This layer is presented in the next section.

3. The parallel object model

The model introduces a new type of object: parallel objects, which co-exist and co-operate with sequential objects during the execution of an application. Parallel objects generalize sequential objects by keeping "good" object properties such as data encapsulation, inheritance and polymorphism, and by adding the following new properties:

• Distributed shareable objects
• Dynamic and transparent object allocation driven by high-level requirement descriptions
• Various method invocation semantics.

3.1. Distributed shareable objects

Like CORBA [10], parallel objects are distributed objects. However, parallel objects offer several enhancements compared to CORBA. Firstly, the parallel object model follows the object-centric approach, where programmers can dynamically create and destroy parallel objects on demand and at any distributed place within the application.

Secondly, passing a parallel object from one place to another through remote method invocations is transparent to programmers. This helps to remove the resource boundary that exists in traditional distributed object-oriented applications.

Finally, as each parallel object is accessible from other distributed components, all parallel objects are shareable inside a single application.

3.2. Requirement-driven object allocation

In rather static, homogeneous and small distributed environments such as clusters of workstations (CoW), users know all the available machines, so they can manually assign a suitable machine to each parallel object. When the size, the heterogeneity and the volatility of the environment increase, as on the Grid, a manual selection of suitable resources becomes extremely difficult. In such environments, supporting tools should allow users to describe high-level resource requirements and have the underlying services automatically select suitable resources.

Parallel objects describe their resource requirements in a so-called object description (OD). Each constructor can be associated with an OD. The resource expressions in each OD can be parameterized with the actual inputs of that constructor. Upon creating a parallel object, the OD is automatically evaluated to generate the object-resource specification which, in turn, is used by the runtime system to perform resource discovery, resource reservation and resource allocation for the object.

An OD expression can concern the computing power, the memory space, the network bandwidth, the communication protocol, etc. It can also contain optional information such as the name of the resource discovery service, the location of the object executable, or the data encoding method (raw, XDR or SOAP/XML).

3.3. Method invocation semantics

The parallel object model enhances the method invocation semantics to support inter-object and intra-object parallelism. These semantics define the behavior of each method invocation on both sides: the outside which invokes the method (the interface side) and the internal execution of the method (the object side).

• Interface-side semantics — the semantics that affect the caller. They can be one of the following:

  · Synchronous invocation: the caller waits until the remote method execution finishes and the results are returned.

  · Asynchronous invocation: the caller returns immediately after sending its request to the remote object. This invocation type prevents the remote method from returning results directly. To retrieve them, the caller can provide a "callback" parallel object as a method argument so that the results are actively sent back through this "callback" (Fig. 2).

• Object-side semantics — the execution semantics of methods inside each parallel object. They can be one of the following:

  · Mutex invocation: this semantics guarantees the atomicity and the total barrier of all method invocations within the parallel object. A mutex invocation request is executed only if no other method instance is currently running; otherwise, it remains pending until all other method instances have terminated. While pending, the mutex also blocks all requests arriving after it from being served (barrier).

  · Sequential invocation: this guarantees the serialization of all sequential methods. All sequential methods are served in their arrival order without overlap. Waiting for a sequential invocation request does not block concurrent methods from being executed.

  · Concurrent invocation: the method executes in a new process (thread) if no mutex method is currently pending or executing (see the mutex invocation semantics). There is no guarantee of execution order or atomicity. Concurrent invocations can improve the overlap between computation and communication and enhance performance on SMP systems.

Fig. 2. Storing asynchronous method invocation results using a callback object.

Fig. 3 illustrates the different object-side method invocation semantics. Sequential invocation Seq1 is served immediately, running concurrently with Conc1. Seq2 is delayed by the ongoing execution of Seq1. When the mutex invocation Mutex1 arrives, it has to wait for the other running methods to finish. While waiting, it blocks the requests arriving after it (Conc3) (atomicity and barrier).

Fig. 3. Method execution semantics: concurrent (Conc1, Conc2, Conc3), sequential (Seq1, Seq2) and mutex (Mutex1).

Table 1
Comparison between a POP-C++ parallel object class and a C++ class

  POP-C++                           C++                       Description
  -------------------------------------------------------------------------------------------
  parclass MyClass {                class MyClass {           Class declaration
  public:                           public:
    MyClass() @{od.power(100);};    MyClass();                Constructor declaration: the parclass
                                                              adds an object description
    async conc void Method1();      void Method1();           Method declaration: parclass methods
    sync seq int Method2(int p);    int Method2(int p);       take new keywords for the different
                                                              invocation semantics
  private:                          private:                  Attribute declaration: parclass attributes
    int attr;                       int attr;                 must be private or protected
  };                                };
  ...                               ...                       Class implementation
  MyClass::MyClass()                MyClass::MyClass()        Constructor implementation: identical
  { ... }                           { ... }                   in POP-C++ and C++
  void MyClass::MyMethod()          void MyClass::MyMethod()  Method implementation: identical
  { ... }                           { ... }                   in POP-C++ and C++
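Using the parclass notation compared in Table 1, the following minimal sketch shows how results of an asynchronous invocation can be returned through a callback parallel object, as in Fig. 2. Only the parclass, sync/async, conc/seq keywords and the OD syntax come from POP-C++; the class names, method names and OD values are purely illustrative assumptions.

  parclass Result {
  public:
      Result() @{ od.power(10); };          // illustrative OD
      async seq void SetResult(double v);   // seq: result arrivals are serialized
      sync  seq double Get();               // simplification: assumes SetResult ran first
  private:
      double val;
  };

  parclass Worker {
  public:
      Worker() @{ od.power(100); };         // illustrative OD
      // async: the caller continues immediately; results return via 'res'
      async conc void Compute(int n, Result &res);
  };

  void Result::SetResult(double v) { val = v; }
  double Result::Get() { return val; }

  void Worker::Compute(int n, Result &res) {
      double sum = 0;
      for (int i = 1; i <= n; i++) sum += 1.0 / i;   // some computation
      res.SetResult(sum);   // push the result back through the callback object
  }

In a real application, Get would synchronize with SetResult, for example using the object events described in Section 4.3.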

4. POP-C++ programming language

POP-C++ implements the parallel object model (Section 3) by extending the C++ language to support the parallel object class (or parclass for short). Table 1 shows a comparison between a parclass in POP-C++ and a traditional C++ class. There are only a few differences in the class declaration.

4.1. Parclass method declarations and invocations

The user defines the invocation semantics for each parclass method. These semantics, described in Section 3.3, are specified by two keywords, one for each side: sync (synchronous) or async (asynchronous) for the interface side; and conc (concurrent), seq (sequential) or mutex for the object side.

Programmers can help the POP-C++ compiler generate efficient code by specifying which arguments have to be transferred, using the directives in (for input), out (for output), or both. Output arguments are only possible for synchronous invocations. Without these directives, the current implementation of POP-C++ applies the following rules:

• If the method is asynchronous, all arguments are input-only.
• If the method is synchronous:
  · Constant and pass-by-value arguments are input-only.
  · Pointer and array arguments are both input and output.

POP-C++ can automatically marshal/demarshal all non-pointer data types (simple types and struct types). Data pointers in C++ are ambiguous; therefore, programmers have to explicitly supply the number of elements using the directive size=<number expression> in the marshalling block.

The following example illustrates how to declare a parclass method. Sort is a synchronous-concurrent method of SortObj with two arguments: the integer array data (input and output) and the size n (input-only) of data:

  parclass SortObj {
      ...
      sync conc void Sort([in, out, size=n] int *data, int n);
  };
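Invoking such a method is then ordinary C++ at the call site. A hypothetical usage sketch follows, assuming SortObj declares a default constructor behind the elided "..." above; the array contents and size are arbitrary:

  SortObj sorter;            // object creation; the runtime allocates a resource
  int data[1000];
  // ... fill data ...
  sorter.Sort(data, 1000);   // synchronous: blocks until the call returns;
                             // [in, out]: data is marshalled to the object and back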

The current release of POP-C++ (http://www.eif.ch/gridgroup/popc, version 1.1) supports several communication protocols: TCP/IP sockets, GM/Myrinet (experimental) and HTTP (to bypass firewalls). Negotiation of the communication protocol is performed automatically at runtime. Different data encoding methods can also be used: Sun XDR; raw; compressed forms of XDR and raw encoding (with ZLIB); and SOAP/XML (experimental, to provide web-services-based access to parallel objects).

4.2. Object description

An object description (OD) can be declared for each parclass constructor, directly after the constructor declaration and before the end-of-instruction symbol ";". The syntax of an OD is as follows:

  @{resources requirement expression}

The current implementation allows programmers to specify the resource requirements in terms of computing power (MFlops; keyword: power), memory (MB; keyword: memory), bandwidth (Mb/s; keyword: network), resource location (host name or IP address; keyword: url), protocol (socket or http; keyword: protocol), and data encoding (raw, xdr, raw-zlib or xdr-zlib; keyword: encoding).

The exact syntax as well as a simple example of the OD is presented below:

  resources requirement expression :=
        od.<resource type1>(<numexpr>) |
        od.<resource type1>(<numexpr>, <numexpr>) |
        od.<resource type2>(<strexpr>)
  resource type1 := power | memory | network | walltime
  resource type2 := protocol | encoding | url
  numexpr := a number expression
  strexpr := a null-terminated string expression

Example: the constructor of the parclass Matrix (size n × m) below requires a desired computing power of 200 MFlops (at least 100 MFlops is acceptable; a non-strict OD), and the communication protocol must be socket or http:

  parclass Matrix {
      Matrix(int n, int m) @{ od.power(200, 100);
                              od.protocol("socket http"); };
      ...
  };

Object descriptions are used by the POP-C++ runtime system (see [12]) to find a suitable resource for each parallel object.
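Because OD expressions can be parameterized with the constructor's actual arguments (Section 3.2), the requirements can scale with the problem size. A hypothetical variant of the Matrix constructor; the 8-byte element size and the scaling formula are illustrative assumptions, not taken from the examples above:

  parclass Matrix {
      // The memory requirement (in MB) follows the actual matrix size;
      // the constants are illustrative only.
      Matrix(int n, int m) @{ od.memory(n * m * 8.0 / 1048576);
                              od.power(200, 100); };
      ...
  };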

4.3. Object events

Each parallel object has its own event queue, which can be used to synchronize concurrent method executions within the object and to notify them about particular events. An event is represented by a positive number whose semantics depends on the application. Waiting for an event n using eventwait(n) blocks the execution until another method raises that event by eventraise(n). One typical use of object events is to mark the arrival of new data from other parallel objects. The use of object events to improve parallel efficiency is discussed in Section 5.
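A minimal sketch of the two primitives inside a single parclass; the class name, its members and the event number 1 are illustrative, and any synchronization beyond the event itself is omitted for brevity:

  parclass Buffer {
  public:
      Buffer() @{ od.power(10); };    // illustrative OD
      async conc void Put(int v);     // producer: deposits a value
      sync  conc int  Get();          // consumer: waits for the value
  private:
      int data;
  };

  void Buffer::Put(int v) {
      data = v;
      eventraise(1);    // signal event 1: "new data available"
  }

  int Buffer::Get() {
      eventwait(1);     // block until some method has raised event 1
      return data;
  }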

4.4. Mutually exclusive execution

The user can surround code blocks inside a parallel object with the mutex keyword so that these blocks execute in mutual exclusion. Mutex blocks have parallel-object scope.
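The exact block syntax is not spelled out here, so the following sketch assumes the mutex keyword directly prefixes a brace-delimited block; the hypothetical method Reset extends the illustrative Buffer parclass of the previous sketch:

  void Buffer::Reset(int v) {
      // ... non-critical work can proceed concurrently ...
      mutex {
          // Executed in mutual exclusion with every other mutex block
          // of this parallel object (mutex blocks have parallel-object scope).
          data = v;
      }
  }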

5. Parallelism in POP-C++

We discuss a scenario where the POP model can efficiently improve the communication/computation overlap of parallel applications.

Fig. 4. Message passing versus pull data in POP-C++: (a) heterogeneous tasks; (b) MPI implementation; (c) POP-C++ implementation.

Many modern parallel and distributed applications, such as those in snow process modeling and simulation [13] or in mobile network simulation and optimization [14], are heterogeneous. Such applications require multilevel parallelism. Efficient MPI or PVM implementations of such heterogeneous applications are generally difficult, especially in heterogeneous environments like the Grid, because programmers have to manually synchronize send and receive calls on the one hand, and cope with the computation/communication imbalance of different parallel tasks (tasks T12 and T22, T13 and T23 in Fig. 4(a)) on the other.

With the message passing paradigm, the issues illustrated in Fig. 4(a) can be addressed by re-ordering the send/receive pairs between parallel tasks (Fig. 4(b)). This usually leads to the sending task waiting for others to receive data, or vice versa.

Instead of using send/receive, POP-C++ programmers use method invocations to set data on remote objects without disturbing the computation on those objects. When the computation thread needs new data from another object, it checks the "arrival" of the data using the object event queue (Section 4.3). Fig. 4(c) shows an example with two POP-C++ objects (tasks) O1 and O2. When one object produces new data, it immediately calls the asynchronous concurrent method SetData of the remote object. SetData, just before finishing, raises an "arrival" event by eventraise(arrival) on its object event queue. Computation on this object continues until it needs new data. It then checks for the event by eventwait(arrival) and has to wait only if SetData has not yet been invoked. Thus, the user can optimize the computation/communication overlap among heterogeneous parallel tasks.
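The following sketch outlines this pattern under illustrative assumptions: the parclass name Task, the fixed buffer size, the loop structure and the event number ARRIVAL are ours; only the async conc/sync conc keywords, the [in, size=n] directive and the event primitives come from POP-C++:

  const int ARRIVAL = 1;    // application-defined event number

  parclass Task {
  public:
      Task() @{ od.power(100); };                     // illustrative OD
      // Called remotely by the peer; runs concurrently with Compute
      async conc void SetData([in, size=n] double *buf, int n);
      sync  conc void Compute(Task &peer, int steps);
  private:
      double recv[1024];    // hypothetical fixed-size input buffer
  };

  void Task::SetData(double *buf, int n) {
      for (int i = 0; i < n && i < 1024; i++) recv[i] = buf[i];
      eventraise(ARRIVAL);  // mark the arrival of new data
  }

  void Task::Compute(Task &peer, int steps) {
      double out[1024];
      for (int s = 0; s < steps; s++) {
          // ... produce out from the current input ...
          peer.SetData(out, 1024);   // asynchronous: push data, keep computing
          // ... computation that does not need the peer's data ...
          eventwait(ARRIVAL);        // waits only if the peer's SetData has not arrived
          // ... use the newly received input for the next step ...
      }
  }

The sending side never blocks on a matching receive, and the receiving side blocks only when the data has genuinely not arrived yet, which is exactly the overlap shown in Fig. 4(c).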

6. Experiments

6.1. Communication cost

We test the communication service of POP-C++ with a small ping-pong program on a Pentium 4 Linux cluster with Fast Ethernet.

Fig. 5 shows the results in comparison with those of MPICH. Asynchronous invocations, thanks to the overlap, achieve better bandwidth than synchronous invocations; they are slightly better than the asynchronous (one-way) send of MPICH. With synchronous invocations, MPICH achieves somewhat better bandwidth in our experiment (15%–20% better for large messages). This is due to the extra cost of marshalling data and multiplexing remote methods in POP-C++, which cannot be overlapped on either the interface side or the object side. Although POP-C++ is less efficient than MPI in synchronous communication, it provides a much higher level of abstraction. Moreover, to improve efficiency, asynchronous communication should be used wherever possible.

In addition to the bandwidth, we also measure the latency of method invocations. POP-C++ gives a better latency than MPICH: 94 µs for synchronous invocations (no overlap) versus 123 µs for MPICH.


Fig. 5. Parallel object communication cost.

Fig. 6. Matrix multiplication speedup on Linux/Pentium 4 machines.

6.2. Matrix multiplication

We compare the performance of POP-C++ and MPICH on matrix multiplication. Matrix sizes of 4000 × 4000 (M4000) and 5000 × 5000 (M5000) are chosen for this test.

The speedup of the parallel algorithm over the sequential one is presented in Fig. 6. We achieve an almost linear speedup up to 8 processors in both the MPICH and POP-C++ implementations. The speedups with 1 processor are 0.97 (M4000) and 0.95 (M5000) in the POP-C++ implementation, and 0.95 (M4000) and 0.88 (M5000) in the MPICH implementation. With 8 processors, the speedups are almost the same in both versions: 7.4 (92.5% efficiency) for the M4000 matrix and 7.2 (90% efficiency) for the M5000 matrix. Although POP-C++ provides a much higher programming abstraction than MPICH, it achieves comparable performance in this test.

6.3. A real application

POP-C++ has been used to develop the Pattern and Defect Detection System (PDDS). PDDS is part of the European textile manufacturing project Forall-1 (project E!1955/CTI 5130.1, financed by the Swiss Government within the Eureka programme), which analyzes continuous tissue images for pattern positions and defect detection. The whole system is described in Fig. 7. PDDS takes images from the AVS, performs parallel image analysis and outputs the results to the NS, which then cuts the tissue based on pattern and defect positions. The specification requires a real-time analysis speed of at least 3.3 Mpixels/s.

Fig. 7. Overview of the Forall system.

The first experiment measures the PDDS speedup in a homogeneous environment. The second experiment takes into account changes in the computation demand and in the heterogeneous environment (CPU load).

The input for the first experiment consists of 100 frames, each of size 2048 × 2048 pixels. Fig. 8 shows the speedup on two tissues, one with small patterns (Sict2) and one with big patterns (Monti), on a network of Solaris workstations and on a Linux cluster. In both environments, an almost linear speedup is achieved.

Fig. 8. Performance of PDDS/POP-C++.

In the second experiment, PDDS is launched in a mixed environment of Solaris and Linux machines. We dynamically increase the required speed every 2 min and ask PDDS to adapt to this change online. Fig. 9 shows the actual analysis speed and the required speed over time. Additional compute objects (resources) are dynamically created to meet the demand.

Fig. 9. Adaptation to the changes.

One interesting point in Fig. 9 is that at 220 s the actual analysis speed drops because we intentionally changed the load of one machine used by PDDS. The system reacts to this change by dynamically allocating more compute objects to speed up the analysis. With this experiment we want to show that:

• POP-C++ applications can efficiently deal with computation on demand.
• POP-C++ can adaptively use heterogeneous resources of the Grid.

7. Related work

On the language side, Orca [15], MPL [16] and PO [17] provide some distributed object capabilities. Orca is a new language based on shared objects; the programming model used in Orca is Distributed Shared Memory (DSM) [18] for task parallelism. While Orca aims at using objects as a means to share data between processes, our approach combines the two concepts of shared data object and process into a single parallel object.

MPL, on the other hand, is an extension of C++ with so-called Mentat classes for parallel execution. MPL follows the data-driven model; parallelism is achieved by concurrent invocations on these objects. While Mentat objects support only asynchronous method invocations and are not shareable, POP-C++ provides a more general approach, with various invocation types (synchronous, asynchronous, concurrent, sequential, mutex) and the capability of sharing objects. Moreover, neither Orca nor MPL allows the programmer to specify the resource requirements for each object.

PO and our parallel objects both share inter-object and intra-object parallelism. The difference is in the object model: PO follows the active object model [19], with the ability to decide when and which invocation requests to serve, while our parallel object model uses a passive object model similar to C++. The Abstract Configuration Language (ACL) in PO, which specifies high-level directives for object allocation, is similar to our object description (OD); however, the ACL directives are only expressed at class level and cannot be parameterized for specific instances, whereas our OD deals directly with each object instance. Therefore, our OD can be customized based on the actual input parameters of the object.

On the tool side, COBRA [20] and Parallel Data CORBA [21] extend CORBA by encapsulating several distributed components (object parts) within an object and by implementing data parallelism based on automatic data partitioning. This differs from our approach, in which each parallel object resides in a single memory address space and parallelism is achieved by parallel execution among objects and concurrent execution of methods within the same object. In addition, the specification of resource requirements is not available in either Parallel Data CORBA or COBRA.

8. Conclusions

Exploiting the performance of heterogeneous Grid resources requires adaptation capabilities from the application. Such adaptation takes two forms: either the application components are decomposed dynamically, based on the available resources of the environment, or the components allow the infrastructure to select suitable resources by providing descriptive information about their resource requirements.

We have addressed these two forms of adaptation by introducing our parallel objects and the POP-C++ object-oriented programming system. The integration of requirement-driven object allocation into the shareable parallel object, the multiple invocation semantics, and the transparent parallel object creation/destruction are the most distinctive features of POP-C++. With POP-C++, programmers can easily "move" sequential objects to run on distributed resources with only very minor changes to the code. Programming in POP-C++ is as simple as programming in C++; the main difference lies only in determining the appropriate method invocation semantics and the high-level resource requirements in parallel object class declarations. This greatly facilitates Grid programming and allows users to apply existing object-oriented software engineering methods to develop their complex applications.

Some preliminary experiments on POP-C++ have been performed. The results showed that (1) POP-C++ can achieve a performance similar to MPICH, and (2) POP-C++ can be used efficiently to deal with computation on demand for HPC applications on the Grid.

The first version of POP-C++ has been released, allowing users to write, compile and execute POP-C++ programs on the Grid. However, many issues, especially in the POP-C++ runtime system, such as the incompatibility of object executables (dynamic link libraries), object failures, and support for distributed debugging, need to be investigated further. In addition, we are working to enhance POP-C++ in two directions: improving inter-operability by integrating Web Services into parallel objects, and enabling POP-C++ on specially designed HPC systems such as the Cray XT3 at the Swiss National Supercomputing Centre (CSCS) or the BlueGene/L at the Swiss Federal Institute of Technology in Lausanne (EPFL).

References

[1] I. Foster, C. Kesselman, Globus: A metacomputing infrastructure toolkit, International Journal of Supercomputer Applications 11 (1997) 115–128.

[2] I. Foster, C. Kesselman, J. Nick, S. Tuecke, Grid services for distributed system integration, Computer 35 (2002).

[3] V. Welch, F. Siebenlist, I. Foster, J. Bresnahan, K. Czajkowski, J. Gawor, C. Kesselman, S. Meder, L. Pearlman, S. Tuecke, Security for Grid services, in: Twelfth IEEE International Symposium on High Performance Distributed Computing, HPDC-12, 2003.

[4] D.C. Jiageng Li, A scalable authorization approach for the Globus grid system, Future Generation Computer Systems 21 (2005) 191–301.

[5] W. Allcock, J. Bester, J. Bresnahan, A. Chervenak, L. Liming, S. Meder, S. Tuecke, GridFTP Protocol Specification, GGF GridFTP Working Group Document, http://www.globus.org/research/papers.htm, 2002.

[6] I. Foster, N. Karonis, A grid-enabled MPI: Message passing in heterogeneous distributed computing systems, in: Proc. 1998 SC Conference, 1998.

[7] M. Alt, S. Gorlatch, Adapting Java RMI for grid computing, Future Generation Computer Systems 21 (2005) 699–707.

[8] D. Caromel, L. Henrio, B. Serpette, Asynchronous and deterministic objects, in: Proceedings of the 31st ACM Symposium on Principles of Programming Languages, ACM Press, 2004, pp. 123–134.

[9] A. Jugravu, T. Fahringer, JavaSymphony, a programming model for the Grid, Future Generation Computer Systems 21 (2005) 239–247.

[10] H. Balen, Distributed Object Architectures with CORBA, Cambridge University Press, 2000.

[11] F. Cappello, S. Djilali, G. Fedak, T. Herault, F. Magniette, V. Neri, O. Lodygensky, Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid, Future Generation Computer Systems 21 (2005) 417–437.

[12] T.A. Nguyen, An object-oriented model for adaptive high performance computing on the computational Grid, Ph.D. Thesis, Swiss Federal Institute of Technology, Lausanne, 2004.

[13] M. Lehning, I. Völksch, D. Gustafsson, T.A. Nguyen, M. Stähli, M. Zappa, Alpine3D: A detailed model of mountain surface processes and its application to snow hydrology, Hydrological Processes 20 (2006).

[14] P. Kuonen, F. Guidec, P. Calegari, Multilevel parallelism applied to the optimization of mobile networks, in: Proceedings of the High-Performance Computing, HPC'98, Society for Computer Simulation International, 1998, pp. 277–282.

[15] H.E. Bal, M.F. Kaashoek, A.S. Tanenbaum, Orca: A language for parallel programming of distributed systems, IEEE Transactions on Software Engineering 18 (1992) 190–205.

[16] A. Grimshaw, A. Ferrari, E. West, Parallel Programming Using C++, The MIT Press, Cambridge, MA, 1996, pp. 383–427.

[17] A. Corradi, L. Leonardi, F. Zambonelli, HPO: A programming environment for object-oriented metacomputing, in: Proc. of the 23rd EUROMICRO Conference, 1997.

[18] I. Foster, C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, 1998.

[19] R. Chin, S. Chanson, Distributed object-based programming systems, ACM Computing Surveys 23 (1991).

[20] K. Keahey, PARDIS: Programmer-level abstractions for metacomputing, Future Generation Computer Systems 15 (1999) 637–647.

[21] T. Priol, C. Rene, COBRA: A CORBA-compliant programming environment for high-performance computing, in: Proc. of Europar'98, Southampton, UK, 1998, pp. 1114–1122.

Tuan-Anh Nguyen is currently a lecturer at the Ho Chi Minh City University of Technology (HCMUT), Vietnam. He finished his Bachelor of Computer Science at the University of Technology in Ho Chi Minh City, Vietnam, in 1998. In 1999, he was awarded a two-year scholarship from the Swiss government to study at the Computer Science Theory Laboratory, EPFL. In 2001, he joined the University of Applied Sciences of Valais as a scientific collaborator for two years. He then moved to the University of Applied Sciences of Fribourg (EIA-FR) to work as a researcher in the Grid and Ubiquitous Computing Group. He received his Ph.D. from the Swiss Federal Institute of Technology (EPFL) in 2004 and continued at EIA-FR as a postdoc until the end of 2005. In 2006 he returned to HCMUT. His main research interests are high performance computing, Grid computing, programming models and object-oriented technologies.

Pierre Kuonen obtained a Ph.D. degree in computer science from the Swiss Federal Institute of Technology (EPFL) in 1993. Since then, he has worked steadily in the field of parallel and distributed high performance computing. His main interests in this domain are parallel and distributed programming (models and algorithms) and middleware for Grid systems. Since 2002 he has been a full professor at the University of Applied Sciences of Western Switzerland, Fribourg, where he leads the Grid and Ubiquitous Computing Group (www.eif.ch/gridgroup). He actively participates in the CoreGRID European Network of Excellence and co-leads the Swiss national ISS project (Intelligent Grid Scheduling System).