master-midterm.ppt
-
Upload
sandra4211 -
Category
Documents
-
view
327 -
download
1
Transcript of master-midterm.ppt
A Framework for Flexible Programming in Complex Grid
Environments04/24/08Taura Lab. 2nd Year76426 Ken Hironaka
New Context for Grid Computing• Grid Computing:– Computation across multiple clusters over WAN
• Conventionally, high performance computing– Parallel programming experts
• Broadening of demands and needs– Natural Language Processing– Genetic Sequence Analysis
– The users are extending to Non-parallel programming experts
More Applications for Grid Computing
• Just computing ⇒ computing is only 1 part• “Cloud Computing”– Applications with Grid computing backend• Handle intensive computation• load-balancing
– e.g.: Web-Applicationsbackend
Application
Publicly accessible
Simple Job Submitter is not enough
Problems with conventional Frameworks
• Conventional Grid Computing– Distributed Task computation
frameworks– No interaction
• Answer to broader Application and Demands– Complex interaction
TaskFile
Fine GrainInteraction
Need for flexible workflow coordination without loss of simplicity
Programming support for Grid
Problems with Grid Computing
• Deployment Complexity on Grid Environments– Dynamically joining nodes– Node/Network failures– Network environment• Prevalence of NAT/firewall• Unreliable WAN connections
leave
join
Fire Wall
Faulty link
Configuration? Communication (sockets)? Error Handling?Need for simple deployment on complex environments
Our Contribution
• A distributed object-oriented Programming Framework that alleviates the burden of Grid environments– Flexibility of programming without loss of simplicity– Simplicity of deployment
• Run on the Grid with minimal configuration
– Real – Life Applications• Deployed an application on over 900 cores across 9 clusters• “trouble-shooting” search engine on the Grid
– As example of Cloud Computing
Agenda
• Introduction• Related Work• Proposal• Preliminary Experiments• Conclusion and Future Work
Distributed Objects and RMI• ProActive [Huet ‘04]• Distributed Object Oriented
– Objects on remote nodes• Work Delegation
– RMI (Remote Method Invocation)• Parallel Computation via
asynchronous RMI– Possible race-conditions
• Active Objects– 1 object = 1 thread– induces deadlocks
•
foo.doJob(args)
RMI
compute
foo
-Need for synchronization ⇒ cluttered with locks/synchronization
Coding becomes complex
Async. RMI
b.f()
a.g()a
bdeadlock
Handling Joins and Failures• JoJo [Nakada ‘04]
– Master – Worker framework– Event driven coding– A handler is invoked for each
event• Task completion• Node Joins• Node Failures
Join
Failure Handler
Join Handler
- Synchronization issues- Event driven programming
-For more complex problems, coding easily becomes unreadable
Resolving Connectivity on the Grid
• ProActive [Huet ‘04]– overlay Network for
communication– Resorts to manual
network configuration files• Specify each connection
ConnectionConfiguration File
NAT
Firewall
- configuration overhead becomes enormous on Grid scale
Configure each link
Agenda
• Introduction• Related Work• Proposal• Preliminary Experiments• Conclusion and Future Work
Our Proposal• A distributed object oriented framework for the
Grid– Distributed Objects with Grid programming support
• deadlock-free synchronization• Additional constructs to cope with join/failure of node
– Automatic and Adaptive Overlay Construction for Grid Runtime
- object oriented with support for race-condition/join/failure : flexibility and simplicity-deployment requires minimal configuration : simplicity
Object Synchronization Model• parallel programming with
minimal use of explicit locks
• Distributed objects with ownership– Its method can only be
executed by 1 thread at a time : the owner thread
– Eliminates data races• Owner gives up ownership
for blocking operations– Other threads may contest
for ownership– Eliminates deadlocks for
common cases
ThTh Th Th
object ownerthread
Th
Th Th Th
object
newownerthread
Give-upOwner
ship
block
Th Th Th Th
object
unblock
re-contestfor ownership
waiting threads
Adaptation to Dynamic Resources• programming support for
joining/leaving nodes• Decentralized object lookup
– Allow joining nodes to access other objects and join the computation
• Node Failure ⇒ RMI Failure– Failure returned as exception
to method invocation– The user can catch the
exception, and perform rollback procedures if necessary
Exception!
Objects in computation
New object on joining
node
lookup
Object on failed node
Automatic Overlay Construction (1)- Automatic/Transparent
communication- Configuration ONLY for
firewalled clusters- Adapts to dynamic
joins/leaves
• Nodes create a TCP overlay network cooperatively– Each node picks a small
number of nodes to connect– Created connected graph
NAT
Firewall
Global IP
Attempt connection
established connections
Automatic Overlay Construction (2)• NAT Clusters
– NAT nodes can connect to global nodes
• Firewalled Clusters– Automatic SSH port-
forwarding• User specifies points
• Transparent Communication– Point-Point communication is
routed over the network– Ad-hoc routing Protocol
• AODV [Perkins ‘97]– Adapts to node joins/leaves
SSH
Firewalltraversal
P-to-Pcommunication
Failure Detection on Overlay• How do we detect failures
on the overlay?• RMI Failure
– Intermediate/end node failure ⇒ link failure
• Path Pointers– Forwarding nodes remember
the nexthop– RMI reply is returned the
same way• For link failure along
pointer, back-propagate the failure to the invoker
Path pointer
RMI handler
failure
Backpropagate
Agenda
• Introduction• Related Work• Proposal• Preliminary Experiments• Conclusion and Future Work
Experiment Cluster Settings
900 cores over 9 clusters
hongo
chiba
okubo
suzuk
imadekototoi
kyoto
istbs
tsubame
Global IPs
Firewall
Private IPs
All packets dropped
Overlay Construction Simulation– Evaluate the overlay construction scheme– For different cluster configurations, modified number of attempted connections
per peer– 1000 trials per each cluster/attempted connection configuration
Even for pathological case, 20 connections per
peer is enough
Dynamic Master-Worker
• A master object distributes work to worker objects– 10,000 tasks all together– Task as RMIs
• Worker nodes join/leave at runtime– New task for new node– Reassignment for tasks on failed nodes– No tasks were lost during computation
Dynamic Master-Worker
As the number of workers change, the number of assigned tasks change accordingly.The Master adaptively distributes, rolls back, and redistribute tasks.
A Real-Life Application
• Solving a combination optimization problem– Permutation Flow Shop Problem– Parallel Branch-and-Bound• Master-Worker style• Periodic updates
– Work distribution• Divide search space evenly as subtasks• Load-balancing
– Unfinished tasks are sub-divided and redistributed– Wasteful computation is quite possible
Master-Worker Coordination
• Master does RMI to Worker– Worker: periodic bound exchange with master– Not a straightforward Master-Worker application– Requires flexible framework like ours
Master
Worker
doJob() exchange_bound()
Runtime Speedup
Lacks scalability with over 900 cores
Cumulative Computation TimeGrowth in Cum. Comp. time is attributed
to increased re-execution of task
If the Cum. Comp. time is taken into account, the speed up from 169 cores to 948 cores (5.64 times) is 4.94
Troubleshoot Search Engine• Ever stuck debugging, or troubleshooting?• Re-rank google queries and give weight to pages
for web-forums and solutions– Natural language processing and machine learning
• Parallel computation on Grid backend– Real time response
backend
Search Engine
Query:“vmware kernel panic”
Compute!!
Compute!!
Compute!!
Agenda
• Introduction• Related Work• Proposal• Preliminary Experiments• Conclusion and Future Work
Conclusion
• A distributed object oriented programming framework for Grid environments– A novel distributed object oriented programming model– Grid-enabled via automatic overlay construction
• Showed that real-life Grid application needs can be addressed by our framework– Deployed actual parallel applications on over 900 cores
over 9 clusters with NAT/Firewalls, joins, and failures– Implemented a Grid computing backend for
troubleshooting search engine
Future Work
• Reliable WAN communication for the Grid overlays– Node failure– Connection failure
• Weakness of WAN connections– Router Policies• close connections after given period
– Obscure kernel bugs with NAT• Connection resets
Faulty link
WAN links are more vulnerable, and failures will occur
Some Related Work• Robust Tree Topologies
for Sensor Networks [ England ‘06]– Create spanning tree for
data reduction– Flat tree for high reliability
• Fewest Hops– Tree with short distance for
low power consumption• Shortest Path
⇒ Spanning Tree that merges the two metrics for the best of two worlds
Fewest Hop:High ReliabilityHigh Power Usage
Shortest Path:Low ReliabilityLow Power Usage
Possible Future Direction
• Our context: Grid computing– communication latency = metric for link reliability
• Fewest Hops– Reliability for node
failure
• Shortest Distance– Reliability for link failure
Short reliable links
Long faulty links
Can we construct an overlay connection topology that take the best of two worlds?
Publications• 1. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming
Framework for Complex Grid Environments. In 8th IEEE International Symposium on Cluster Computing and the Grid, May 2008 (Poster Paper. To Appear).
• 2. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In IPSJ Transactions on Advanced Computing Systems. (Conditional Accept)
• 3. Ken Hironaka, Hideo Saito, Kei Takahashi, Kenjiro Taura. A Flexible Programming Framework for Complex Grid Environments. In Proceedings of 2008 Symposium on Advanced Computing Systems and Infrastructures. (To Appear)
• 4. Ken Hironaka, Shogo Sawai, Kenjiro Taura. A Distributed Object-Oriented Library for Computation Across Volatile Resources. In Summer United Workshops on Parallel, Distributed and Cooperative Processing. August 2007
• 5. Ken Hironaka, Kenjiro Taura, Takashi Chikayama. A Low-Stretch Object Migration Scheme for Wide-Area Environments. In IPSJ Transactions on Programming. Vol 48 No.SIG 12 (PRO 34), pp.28-40, August 2007
Problems with Grid Computing (2)• Complexity of Programming on the Grid– Low Level Computing (sockets)
• Communication• Multi-threaded Computing (Synchronization)• Heavy Burden on Non-experts
– Flexibility and Integration• Grid Frameworks for task distribution• Independent parallel programming languages• Computing is not execution of many independent tasks
– Need finer grained communication• Bad interface with user application
– Java, Ruby, Python, PHP
Related Work
• Discussed with respect to criteria necessary for modern Grid computing– Workflow Coordination• Flexibility without putting the burden on the user
– Joining Nodes / Failure of resources• Handling these events should not dominate the
programming overhead– Connectivity in Wide-Area Networks• Adaptation to networks with NAT/firewall with little
manual settings
Workflow Coordination (1)
• Condor / DAGMan [Thain ‘05]– “Tasks” are expressed as script
files and distributed on idle nodes– Dependency between tasks can be
expressed in DAG (Directed Acyclic Graph)
• Ibis / Satin [Wrzesinska ‘06]– framework for divide-and-conquer
problems– Tasks can be broken into smaller
sub-tasks, on which it depends
DAG Dependency
Relationship
Central Manager
Busy Nodes
Assign
Cluster
Task
- Many computation cannot be expressed as “Tasks” with dependencies- A task’s communication is limited to others to which it has dependencies
Object Synchronization Example
class A: def __init__(self, x): self.x = x
def f(self, b): self.x += 1
#blocking RMI b.g() self.x -= 1
return
Atomic section
Atomic section
a b
b.g()
Value x stays
consistent
In method f(), instance a invokes blocking method g() on object b
only 1 thread at a
time
give-upownershipduring RMI
block
Adaptation to Dynamic Resources• Signal delivery to objects
– Unblocks any thread that is blocking in the object’s context• Can be used to notify
asynchronous events– A joining node
• Node Failure ⇒ RMI Failure– Failure returned as exception
to method invocation– The user can catch the
exception, and perform rollback procedures if necessary
exception
Th
object
blockunblock
signal
Preliminary Experiments
• Overlay Construction Simulation• A Simple Master-Worker Application with
dynamically joining/leaving nodes• A Real-life Parallel Application on the Grid• A Troubleshoot-Search Engine