Timeshared Parallel Machines
•Need resource management
•Shrink and expand individual jobs to available sets of processors
•Example: Machine with 100 processors
– Job1 arrives, can use 20-150 processors
– Assign 100 processors to it
– Job2 arrives, can use 30-70 processors, and will pay more if we meet its deadline
– Make resource allocation decisions
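The 100-processor example can be sketched in a few lines of Python. This is only an illustration under a simple assumed policy (grant the new job its minimum and shrink running jobs just enough to make room); the slides do not say which policy the real scheduler uses, and the names `Job` and `admit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_pe: int      # fewest processors the job can run on
    max_pe: int      # most processors the job can use
    alloc: int = 0   # processors currently assigned


def admit(running, new_job, total_pe):
    """Admit new_job by shrinking running jobs toward their minimums.

    Assumed policy (not from the slides): the new job receives exactly
    its minimum. Returns False, changing nothing, if it cannot fit.
    """
    free = total_pe - sum(j.alloc for j in running)
    reclaimable = sum(j.alloc - j.min_pe for j in running)
    if free + reclaimable < new_job.min_pe:
        return False
    need = max(0, new_job.min_pe - free)
    for j in running:
        give = min(need, j.alloc - j.min_pe)
        j.alloc -= give
        need -= give
    new_job.alloc = new_job.min_pe
    running.append(new_job)
    return True


machine_pe = 100
job1 = Job("Job1", min_pe=20, max_pe=150, alloc=min(150, machine_pe))
running = [job1]
job2 = Job("Job2", min_pe=30, max_pe=70)
admitted = admit(running, job2, machine_pe)
print(admitted, job1.alloc, job2.alloc)  # True 70 30
```

A real decision would also weigh what Job2 is willing to pay for meeting its deadline; that part is left out here.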
Multiple Parallel Machines
•Faucet submits a request: CPU seconds, min-max CPUs, deadline, interactive?
•Parallel machines submit bids:
– A job for 100 CPU hours may get a lower price bid if it has a less tight deadline or a more flexible PE range
– A job that requires 15 CPU minutes and a deadline of 1 minute will generate a variety of bids
– A machine with idle time on its hands: low bid
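One way a machine might turn these factors into a bid is sketched below. The pricing formula, the ideal-linear-speedup assumption, and the names `Request` and `bid` are all made up for illustration; the slides describe only the qualitative behavior (looser deadline, wider PE range, or an idle machine lead to a lower bid).

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    cpu_seconds: float
    min_pe: int
    max_pe: int
    deadline_seconds: float


def bid(req: Request, free_pe: int, utilization: float,
        base_rate: float = 1.0) -> Optional[float]:
    """Return a price, or None if this machine cannot meet the deadline.

    Assumption: ideal linear speedup, so the job needs at least
    cpu_seconds / deadline_seconds processors to finish in time.
    """
    needed = max(req.min_pe, math.ceil(req.cpu_seconds / req.deadline_seconds))
    if needed > req.max_pe or needed > free_pe:
        return None
    # Fraction of the PE range forced by the deadline (1.0 = no freedom).
    tightness = needed / req.max_pe
    # Hypothetical pricing: busier machines and tighter jobs cost more.
    return base_rate * req.cpu_seconds * (0.5 + utilization) * (1.0 + tightness)


hundred_cpu_hours = 100 * 3600
loose = Request(hundred_cpu_hours, min_pe=4, max_pe=64, deadline_seconds=48 * 3600)
tight = Request(hundred_cpu_hours, min_pe=4, max_pe=16, deadline_seconds=10 * 3600)
short = Request(15 * 60, min_pe=1, max_pe=32, deadline_seconds=60)

loose_bid = bid(loose, free_pe=64, utilization=0.5)
tight_bid = bid(tight, free_pe=64, utilization=0.5)
idle_bid = bid(short, free_pe=32, utilization=0.1)
busy_bid = bid(short, free_pe=32, utilization=0.9)
no_bid = bid(short, free_pe=8, utilization=0.1)  # needs 15 PEs, only 8 free
```

With these inputs the looser request is priced below the tighter one, the idle machine underbids the busy one, and a machine with too few free processors declines to bid.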
How to make all of this work?
•The key: fine-grained resource management model
– Work units are objects and threads rather than processes
– Data units are object data, thread stacks, ... rather than pages
– Work/data units can be migrated automatically during a run
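To see why fine-grained, migratable work units make shrinking possible, consider the sketch below: when some processors are taken away, the objects living on them are reassigned to the least-loaded remaining processors. This is a generic greedy strategy chosen for illustration, not the load balancer the runtime actually uses; `shrink` and its arguments are hypothetical names.

```python
import heapq


def shrink(placement, loads, keep):
    """Reassign objects from removed processors to the retained ones.

    placement: dict object_id -> processor
    loads:     dict object_id -> measured load of that object
    keep:      set of processors that remain after shrinking
    Returns a new placement; objects already on kept processors stay put.
    """
    pe_load = {pe: 0.0 for pe in keep}
    for obj, pe in placement.items():
        if pe in keep:
            pe_load[pe] += loads[obj]
    heap = [(load, pe) for pe, load in pe_load.items()]
    heapq.heapify(heap)

    # Move the heaviest evicted objects first (greedy heuristic).
    evicted = sorted((o for o, pe in placement.items() if pe not in keep),
                     key=lambda o: -loads[o])
    new_placement = dict(placement)
    for obj in evicted:
        load, pe = heapq.heappop(heap)   # least-loaded retained processor
        new_placement[obj] = pe
        heapq.heappush(heap, (load + loads[obj], pe))
    return new_placement


# Eight equal-weight objects spread over four processors; shrink to two.
placement = {obj: obj % 4 for obj in range(8)}
loads = {obj: 1.0 for obj in range(8)}
after = shrink(placement, loads, keep={0, 1})
```

Because the unit of migration is an object rather than a whole process, the job keeps running on fewer processors with its work spread evenly.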
Anonymous Compute Power
What is needed to make this metaphor work?
•Timeshared parallel machines in the background
– effective resource management
•Quality of computational service contracts/guarantees
•Front ends that will allow agents to submit jobs on user’s behalf: Computational Faucets
Computational Faucets
What does a Computational Faucet do?
•Submit requests to “the grid”
•Evaluate bids and decide whom to assign work
•Monitor applications (for performance and correctness)
•Provide interface to users: interacting with jobs, and monitoring behavior
What does it look like? A browser!
Faucets QoS
•User specifies desired job parameters such as: program executable name, executable platform, min PE, max PE, estimated CPU-seconds (for various PE), priority, etc.
•User does not specify machine. Faucet software contacts a central server and obtains a list of available workstation clusters, then negotiates with clusters and chooses one to submit the job.
•User can view status of clusters.
•Planned: file transfer, user authentication, merge with Appspector for job monitoring.
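The job parameters and the cluster-selection step above might look roughly like this on the client side. The field names, the `ClusterBid` record, and the lowest-price rule are assumptions for the sake of the example; the slides do not describe the actual negotiation protocol or data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobSpec:
    executable: str
    platform: str
    min_pe: int
    max_pe: int
    est_cpu_seconds: Dict[int, float]  # PE count -> estimated CPU-seconds
    priority: int = 0


@dataclass
class ClusterBid:
    cluster: str      # hypothetical cluster name from the central server's list
    platform: str
    offered_pe: int
    price: float


def choose_cluster(job: JobSpec, bids: List[ClusterBid]) -> Optional[ClusterBid]:
    """Pick the cheapest bid that matches the platform and PE range."""
    acceptable = [b for b in bids
                  if b.platform == job.platform
                  and job.min_pe <= b.offered_pe <= job.max_pe]
    return min(acceptable, key=lambda b: b.price, default=None)


job = JobSpec("sim", "linux-x86", min_pe=8, max_pe=32,
              est_cpu_seconds={8: 4000.0, 32: 4400.0})
bids = [
    ClusterBid("cluster-a", "linux-x86", offered_pe=16, price=50.0),
    ClusterBid("cluster-b", "linux-x86", offered_pe=32, price=35.0),
    ClusterBid("cluster-c", "solaris", offered_pe=32, price=10.0),   # wrong platform
    ClusterBid("cluster-d", "linux-x86", offered_pe=4, price=5.0),   # too few PEs
]
chosen = choose_cluster(job, bids)
```

Here the user never names a machine: the client filters out unsuitable clusters and submits to the cheapest remaining one.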
[Figure: Web Browser, Faucet Client, Central Server, and three Workstation Clusters]
Time-shared Parallel Machines
•To bid effectively (profitably) in such an environment, a parallel machine must be able to run well-paying (important) jobs, even when it is already running others.
•Allows a suitably written Charm++/Converse program running on a workstation cluster to dynamically change the number of CPUs it is running on, in response to a network (CCS) request.
•Works in coordination with a Cluster Manager to give a job as many CPUs as are available when there are no other jobs, while providing the flexibility to accept new jobs and scale down.
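The "as many CPUs as are available" side of this behavior can be sketched as a Cluster Manager routine that hands freed processors back to the remaining jobs once another job leaves. The function name, the first-come order, and the dictionaries are illustrative assumptions; the actual CCS request that tells a running program to resize is not shown.

```python
def expand(allocs, max_pe, total_pe):
    """Distribute idle processors to running jobs, up to each job's maximum.

    allocs: dict job -> processors currently assigned
    max_pe: dict job -> most processors that job can use
    Jobs are served in dictionary order (an assumed policy).
    """
    free = total_pe - sum(allocs.values())
    new_allocs = dict(allocs)
    for job in new_allocs:
        extra = min(free, max_pe[job] - new_allocs[job])
        new_allocs[job] += extra
        free -= extra
    return new_allocs


# Job2 has finished; Job1 grows back from 70 to the whole 100-CPU machine.
alone = expand({"Job1": 70}, {"Job1": 150}, total_pe=100)

# Two jobs share 40 idle CPUs; A is capped at 40, B takes the rest.
shared = expand({"A": 30, "B": 30}, {"A": 40, "B": 70}, total_pe=100)
```

Each changed allocation would then be sent to the corresponding job, which adjusts the number of CPUs it is running on.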
Appspector
•Appspector provides a web interface for submitting and monitoring parallel jobs.
•Submission: user specifies machine, login, password, program name (which must already be available on the target machine).
•Jobs can be monitored from any computer with a web browser. Advanced program information can be shown on the monitoring screen using CCS.
BioCoRE
•Project Based
•Workbench for Modeling
•Conferences/Chat Rooms
•Lab Notebook
•Joint Document Preparation
Goal: Simulate the process of doing research. Provide a web-based way to virtually bring scientists together.
http://www.ks.uiuc.edu/Research/biocore/