Timeshared Parallel Machines

Page 1: Timeshared Parallel Machines

•Need resource management
•Shrink and expand individual jobs to available sets of processors
•Example: Machine with 100 processors
  – Job1 arrives, can use 20-150 processors
  – Assign 100 processors to it
  – Job2 arrives, can use 30-70 processors, and will pay more if we meet its deadline
  – Make resource allocation decisions
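The 100-processor example can be worked through with a small sketch. It shows one possible allocation policy, not the scheduler described in these slides: admit jobs in priority order at their minimum size, then expand them with whatever processors remain. The `Job` class, the `priority` field (standing in for "will pay more"), and the `allocate` function are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_pe: int        # fewest processors the job can run on
    max_pe: int        # most processors the job can use
    priority: int = 0  # hypothetical: higher means better paying


def allocate(total_pe, jobs):
    """Return {job name: processors}; jobs that cannot get min_pe are left out."""
    ranked = sorted(jobs, key=lambda j: j.priority, reverse=True)
    alloc = {}
    free = total_pe
    # Pass 1: admit jobs in priority order, reserving each job's minimum.
    for job in ranked:
        if job.min_pe <= free:
            alloc[job.name] = job.min_pe
            free -= job.min_pe
    # Pass 2: expand admitted jobs toward their maximum, best paying first.
    for job in ranked:
        if job.name in alloc:
            extra = min(job.max_pe - alloc[job.name], free)
            alloc[job.name] += extra
            free -= extra
    return alloc


job1 = Job("Job1", min_pe=20, max_pe=150)
job2 = Job("Job2", min_pe=30, max_pe=70, priority=1)

alone = allocate(100, [job1])         # Job1 gets the whole machine
shared = allocate(100, [job1, job2])  # Job1 shrinks so Job2 can expand
print(alone, shared)
```

Under this policy Job1 holds all 100 processors while it is alone, then shrinks to 30 when Job2 arrives and takes its maximum of 70. Other policies (for example, shrinking Job1 only as far as Job2's deadline requires) are equally consistent with the slide.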

Page 2: Multiple Parallel Machines

•Faucet submits a request:
  – CPU seconds, min-max CPUs, deadline, interactive?
•Parallel machines submit bids:
  – A job for 100 CPU hours may get a lower price bid if it has a less tight deadline and a more flexible PE range
  – A job that requires 15 CPU minutes and has a deadline of 1 minute will generate a variety of bids
  – A machine with idle time on its hands: low bid
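The bidding behaviour above can be illustrated with a toy pricing rule. The slides do not give a formula, so everything here is an assumption: the `Request` fields mirror the request parameters listed on the slide, while the `bid` function, its `utilization` and `base_rate` parameters, the perfect-speedup runtime estimate, and the specific multipliers are hypothetical choices made only so that looser deadlines, wider PE ranges, and idle machines yield lower bids.

```python
from dataclasses import dataclass


@dataclass
class Request:
    cpu_seconds: float
    min_pe: int
    max_pe: int
    deadline_s: float
    interactive: bool = False


def bid(req, free_pe, utilization, base_rate=1.0):
    """Hypothetical pricing rule; returns None if the machine cannot bid."""
    pe = min(req.max_pe, free_pe)
    if pe < req.min_pe:
        return None                      # not enough free processors
    runtime = req.cpu_seconds / pe       # assumes perfect speedup
    if runtime > req.deadline_s:
        return None                      # deadline cannot be met
    slack = req.deadline_s / runtime     # >= 1; larger means looser deadline
    flexibility = req.max_pe / req.min_pe  # >= 1; larger means wider PE range
    price = base_rate * req.cpu_seconds * (1 + utilization)
    return price * (1 + 1 / slack + 1 / flexibility)


# Two 100 CPU-hour jobs: one rigid and tight, one flexible and relaxed.
tight = Request(cpu_seconds=360_000, min_pe=50, max_pe=50, deadline_s=7_200)
loose = Request(cpu_seconds=360_000, min_pe=10, max_pe=100, deadline_s=86_400)

# 15 CPU minutes with a 1-minute deadline: machines differ widely.
urgent = Request(cpu_seconds=900, min_pe=1, max_pe=64, deadline_s=60)
bids = {
    "small": bid(urgent, free_pe=10, utilization=0.0),  # too slow to bid
    "busy": bid(urgent, free_pe=64, utilization=0.9),
    "idle": bid(urgent, free_pe=64, utilization=0.0),
}
valid = {name: price for name, price in bids.items() if price is not None}
winner = min(valid, key=valid.get)
print(winner, valid)
```

A faucet that simply picks the lowest valid bid would choose the idle machine here; a real client might also weigh reliability or past performance.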

Page 3: How to make all of this work?

•The key: fine-grained resource management model
  – Work units are objects and threads rather than processes
  – Data units are object data, thread stacks, .. rather than pages
  – Work/Data units can be migrated automatically during a run

Page 4: Anonymous Compute Power

What is needed to make this metaphor work?
•Timeshared parallel machines in the background
  – effective resource management
•Quality of computational service
  – contracts/guarantees
•Front ends that will allow agents to submit jobs on user’s behalf:
  – Computational Faucets

Page 5: Computational Faucets

•What does a Computational Faucet do?
  – Submit requests to “the grid”
  – Evaluate bids and decide whom to assign work
  – Monitor applications (for performance and correctness)
  – Provide interface to users: interacting with jobs, and monitoring behavior
•What does it look like?
  – A browser!

Page 6: Faucets QoS

•User specifies desired job parameters such as: program executable name, executable platform, min PE, max PE, estimated CPU-seconds (for various PE), priority, etc.

•User does not specify machine. Faucet software contacts a central server and obtains a list of available workstation clusters, then negotiates with clusters and chooses one to submit the job.

•User can view status of clusters.

•Planned: file transfer, user authentication, merge with Appspector for job monitoring.

[Figure: Web Browser, Faucet Client, Central Server, and three Workstation Clusters]
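The job parameters listed on this slide could be carried in a record like the one below. This is only a sketch of the idea: the `JobSpec` class, its field names, and the example values are hypothetical and not taken from the Faucets software. It also shows why CPU-second estimates "for various PE" are useful: they let a client or cluster work out the smallest processor count that still meets a given deadline.

```python
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    executable: str
    platform: str
    min_pe: int
    max_pe: int
    # Estimated total CPU-seconds at each processor count, e.g. {8: 4000}.
    cpu_seconds_by_pe: dict = field(default_factory=dict)
    priority: int = 0

    def wall_seconds(self, pe):
        """Estimated wall-clock time on `pe` processors."""
        return self.cpu_seconds_by_pe[pe] / pe

    def smallest_pe_meeting(self, deadline_s):
        """Smallest estimated PE count within [min_pe, max_pe] that meets the deadline."""
        for pe in sorted(self.cpu_seconds_by_pe):
            in_range = self.min_pe <= pe <= self.max_pe
            if in_range and self.wall_seconds(pe) <= deadline_s:
                return pe
        return None


# Illustrative values only: total CPU time grows with PE count (overhead).
spec = JobSpec(
    executable="my_sim",
    platform="linux-x86",
    min_pe=8,
    max_pe=32,
    cpu_seconds_by_pe={8: 4000, 16: 4400, 32: 5200},
)
print(spec.smallest_pe_meeting(300))
```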

Page 7: Time-shared Parallel Machines

•To bid effectively (profitably) in such an environment, a parallel machine must be able to run well-paying (important) jobs, even when it is already running others.

•Allows a suitably written Charm++/Converse program running on a workstation cluster to dynamically change the number of CPUs it is running on, in response to a network (CCS) request.

•Works in coordination with a Cluster Manager to give a job as many CPUs as are available when there are no other jobs, while providing the flexibility to accept new jobs and scale down.

Page 8: Appspector

•Appspector provides a web interface for submitting and monitoring parallel jobs.

•Submission: user specifies machine, login, password, program name (which must already be available on the target machine).

•Jobs can be monitored from any computer with a web browser. Advanced program information can be shown on the monitoring screen using CCS.

Page 9: BioCoRE

•Project Based
•Workbench for Modeling
•Conferences/Chat Rooms
•Lab Notebook
•Joint Document Preparation

Goal: Simulate the process of doing research. Provide a web-based way to virtually bring scientists together.

http://www.ks.uiuc.edu/Research/biocore/