Parallel Computing
• The Bad News
– Hardware is not getting faster fast enough
– Too many architectures
– Existing architectures are too specific
– Programs are closely tied to architecture
– Software is being developed using a 1950s mentality
Computing Trends
• Centralized systems are a thing of the past
– Evolving towards cycle servers
• Each user has their own computer
• Workstations are networked
– Typical LAN speeds are 100 Mb/s
• For some users, a single workstation does not provide adequate computing power
A Solution
• A virtual computing environment (VCE)
– Utilize existing software to build a programming model that can be used to develop distributed and parallel applications
– Provide tools to create, debug, and execute applications on heterogeneous hardware
– Let the software map high-level descriptions of the problems to available hardware
– Programmers will no longer need to be concerned with low-level issues
Other Names
• Many scientists face problems that require weeks or months of computation to solve.
– Scientists involved in this type of research need a computing environment that delivers large amounts of computational power over a long period of time
– Such an environment is called a High Throughput Computing (HTC) environment
• In contrast, High Performance Computing (HPC) environments deliver a tremendous amount of power over a short period of time.
Workstation Users
• All VCE configurations include some workstations
• Workstations are chronically underutilized
• Workstation users can be classified as follows:
– Casual Users
– Sporadic Users
– Frustrated Users
• The VCE must help frustrated users without hurting casual and sporadic users
Other Considerations
• The VCE must be cost effective
– Use existing tools like NFS, ISIS, PVM, MPI whenever possible
– Must not require tremendous amounts of processor power
• The VCE must coexist with other software
– Non-VCE applications should not be impacted by the VCE
• The VCE must avoid kernel modifications
User's View of the VCE
• The software development module (SDM) provides tools to build and annotate an application task graph
• The execution module (EXM) compiles the application and dispatches the tasks
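As a rough illustration of the SDM/EXM split, the sketch below stores an annotated task graph as plain dictionaries and works out batches of tasks that could be dispatched together. The task names, the `cost` annotation, and the `dispatch_order` helper are all hypothetical; the slides do not describe the VCE's actual data structures.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
task_graph = {
    "load": set(),
    "filter": {"load"},
    "transform": {"load"},
    "merge": {"filter", "transform"},
}

# Hypothetical annotations that an SDM-like tool might attach to each task.
annotations = {
    "load": {"cost": 1},
    "filter": {"cost": 3},
    "transform": {"cost": 2},
    "merge": {"cost": 1},
}


def dispatch_order(graph):
    """Return batches of tasks; tasks in one batch have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to keep output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = dispatch_order(task_graph)
```

Tasks that land in the same batch (here `filter` and `transform`) are independent of each other, so an EXM-like dispatcher would be free to run them on different machines.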
The VCE
[Diagram labels: Problem Specification, Design Stage, Coding Level, Compilation Manager, Runtime Manager; module labels: SDM, EXM]
Runtime Issues
• Compilation Issues
– Executables must be prepared to maximize scheduling flexibility
– Compilations must be scheduled to maximize application performance and hardware utilization
– Java?
Runtime Issues
• Task Placement
– The criteria for selecting machines to host tasks must consider both hardware utilization and application throughput
– Hints supplied by the programmer might improve task placement decisions
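One way to picture such a placement rule is a score that combines a machine's spare capacity with its speed, plus an optional bonus from a programmer hint. This is only a sketch: the `Machine` fields, the scoring formula, and the hint mechanism are assumptions made for illustration, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    cpu_load: float        # fraction of CPU in use, 0.0 (idle) to 1.0 (busy)
    relative_speed: float  # 1.0 = baseline workstation


def placement_score(machine, hint_weight=0.0):
    """Higher is better: favor fast machines with spare capacity.

    hint_weight is a hypothetical programmer-supplied bonus for this machine.
    """
    spare_capacity = 1.0 - machine.cpu_load
    return machine.relative_speed * spare_capacity + hint_weight


def choose_host(machines, hints=None):
    """Pick the best-scoring machine; hints maps machine name to a bonus."""
    hints = hints or {}
    return max(machines, key=lambda m: placement_score(m, hints.get(m.name, 0.0)))


pool = [
    Machine("ws1", cpu_load=0.9, relative_speed=2.0),  # fast but busy
    Machine("ws2", cpu_load=0.1, relative_speed=1.0),  # slower but idle
    Machine("ws3", cpu_load=0.5, relative_speed=1.5),
]

default_choice = choose_host(pool)
hinted_choice = choose_host(pool, hints={"ws3": 0.5})
```

Without hints the idle machine `ws2` wins; a hint favoring `ws3` (for example, because the task's data already lives there) is enough to change the decision.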
Processor Utilization
• Free Parallelism
– Parallel applications with low efficiency benefit when run on idle machines
• Anticipatory Processing
– Use idle resources to perform work which may be useful if certain schedules are ultimately executed
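The free-parallelism argument can be made concrete with the usual definitions of speedup (serial time divided by parallel time) and efficiency (speedup divided by processor count). The timings below are made-up numbers chosen only to illustrate the point.

```python
def speedup(serial_time, parallel_time):
    """How many times faster the parallel run is than the serial run."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processors):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup(serial_time, parallel_time) / processors


# Hypothetical job: 80 hours serially, 20 hours on 16 otherwise idle machines.
s = speedup(serial_time=80.0, parallel_time=20.0)
e = efficiency(80.0, 20.0, processors=16)
```

An efficiency of 0.25 would be hard to justify on a dedicated parallel machine, but if the 16 workstations would otherwise sit idle, the fourfold speedup costs nothing extra.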
Load Balancing
• Central issue in the execution module
• Good application throughput must be achieved without impacting interactive users
• Many systems provide the ability to migrate tasks
Task Migration
• Various migration strategies are possible
– Redundant execution
– Check-pointing
– Dump and migrate
– Recompilation
– Byte coded tasks
Systems
• Many systems are available which provide some form of a VCE
– PVM
– MPI
– Beowulf
– Condor
– …
The Berkeley NOW Project
Condor
• Condor is a software system that runs on a cluster of workstations to harness wasted CPU cycles.
– A Condor pool consists of any number of machines, of possibly different architectures and operating systems, that are connected by a network
• To monitor the status of the individual computers in the cluster, Condor "daemons" must run all the time.
– One daemon is called the "master". Its only job is to make sure that the rest of the Condor daemons are running.
Idle Machines Only
• Two other daemons run on every machine in the pool: startd and schedd
• Startd monitors information about the machine that is used to decide if it is available to run a Condor job
– keyboard and mouse activity
– load on the CPU
• Startd also notices when a user returns to a machine that is currently running a Condor job and removes the job.
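The availability decision described above can be sketched as a small policy function. This is not Condor's implementation or configuration syntax; the `MachineState` fields and both thresholds are hypothetical values picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    seconds_since_input: float  # time since last keyboard or mouse activity
    cpu_load: float             # load caused by the owner's own work


# Hypothetical thresholds chosen for illustration only.
IDLE_SECONDS = 15 * 60
MAX_LOAD = 0.3


def is_available(state):
    """A machine may accept a job only if its owner is away and the CPU is quiet."""
    return state.seconds_since_input >= IDLE_SECONDS and state.cpu_load <= MAX_LOAD


def should_evict(state):
    """A running job is removed once the owner has been active recently."""
    return state.seconds_since_input < IDLE_SECONDS


idle_machine = MachineState(seconds_since_input=3600, cpu_load=0.05)
owner_returned = MachineState(seconds_since_input=2, cpu_load=0.05)
```

Here `idle_machine` would be offered to the pool, while `owner_returned` would both be refused new jobs and have any running job removed.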
Condor Architecture
Condor Executables
• Source code does not have to be modified in any way to be used in Condor
– However, it must be re-linked with the Condor libraries
• Once re-linked, jobs gain two crucial abilities:
– Checkpoint
– Perform remote system calls
• Condor also provides a mechanism to run binaries that have not been re-linked, which are called "vanilla" jobs
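To show what checkpointing buys a job, here is a minimal application-level sketch. It is deliberately not how Condor does it: Condor checkpoints a re-linked program without source changes, whereas this example saves its own state explicitly with `pickle`. The function name, the state layout, and the `stop_after` parameter (which simulates the job being removed from a machine) are all hypothetical.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, stop_after=None):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    stop_after simulates the job being removed from a machine part-way through.
    """
    # Resume from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"next_step": 0, "total": 0}

    steps_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_run >= stop_after:
            return None  # interrupted; progress so far is on disk
        state["total"] += state["next_step"]
        state["next_step"] += 1
        steps_run += 1
        with open(checkpoint_path, "wb") as f:
            pickle.dump(state, f)
    return state["total"]


path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
first = run_with_checkpoints(10, path, stop_after=4)  # "evicted" after 4 steps
second = run_with_checkpoints(10, path)               # resumes at step 4
```

The second call picks up where the first stopped instead of starting over, which is the property that makes checkpoint-based migration cheap.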
Condor Executables
Condor Tricks
• Matchmaking
– When a task is submitted to Condor, the system finds a machine that matches the resources required by the task
• Condor uses check-pointing to migrate jobs
– You only lose the computation that has been performed since the last checkpoint
• Condor tasks move around to find the underutilized workstations
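The matchmaking step can be pictured as comparing a job's requirements against each machine's advertised properties. The sketch below uses plain dictionaries with hypothetical keys (`arch`, `os`, `memory_mb`, `idle`, `min_memory_mb`); it is not Condor's own matchmaking language or API.

```python
def matches(job, machine):
    """True if the machine satisfies every requirement the job lists."""
    return (
        machine["arch"] == job["arch"]
        and machine["os"] == job["os"]
        and machine["memory_mb"] >= job["min_memory_mb"]
        and machine["idle"]
    )


def find_match(job, machines):
    """Return the name of the first suitable machine, or None if there is none."""
    for machine in machines:
        if matches(job, machine):
            return machine["name"]
    return None


machines = [
    {"name": "ws1", "arch": "x86", "os": "linux", "memory_mb": 64, "idle": False},
    {"name": "ws2", "arch": "sparc", "os": "solaris", "memory_mb": 128, "idle": True},
    {"name": "ws3", "arch": "x86", "os": "linux", "memory_mb": 128, "idle": True},
]
job = {"arch": "x86", "os": "linux", "min_memory_mb": 96}

chosen = find_match(job, machines)
```

In this pool only `ws3` has the right architecture, enough memory, and no active user, so it is the machine selected.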
Beowulf
• The Beowulf parallel workstation is a single-user, multiple-computer system with a direct-access keyboard and monitors. Beowulf comprises:
– 16 motherboards with Intel x86 processors
– 256 MB of DRAM (16 MB per processor board)
– 16 hard disk drives and controllers
– 2 Ethernets and controllers per processor
– 2 high-resolution monitors with controllers and 1 keyboard
• The Beowulf architecture is a fully COTS (Commodity Off The Shelf) configured system.