
The SC Grid Computing Initiative

Kirk W. Cameron, Ph.D., Assistant Professor

Department of Computer Science and Engineering

University of South Carolina

The Power Grid

• Reliable
• Universally accessible
• Standardized
• Low-cost
• Billions of different devices

Resource → Transmission → Distribution → Consumption on Demand

The Computing Grid

• Reliable
• Universally accessible
• Standardized
• Low-cost
• Billions of different devices

Resource → Transmission → Distribution → Consumption on Demand

What is a Grid?

• H/W + S/W infrastructure to provide access to computing resources †

– Dependable (guaranteed performance)
– Consistent (standard interfaces)
– Pervasive (available everywhere)
– Inexpensive (on-demand to decrease overhead)

†The Grid: Blueprint for a New Computing Infrastructure, I. Foster and C. Kesselman (Eds.), Morgan Kaufmann Publishers, 1998.

Examples

• Problem: Need to project a company’s computing needs
  – Solution: Computing on demand
  – Example: Any company with a medium-to-large number of employees

• Problem: Computational needs exceed local abilities (cost)
  – Solution: Supercomputing on demand
  – Example: Aerodynamic simulation of vehicles

• Problem: Data sets too large to be held locally
  – Solution: Collaborative computing with regional centers
  – Example: DNA sequencing analysis: derive a single sequence locally, compare it to a large non-local database

• Private sector interest
  – Grid engines (SW), hardware support, outsourcing
  – IBM (PC outsourcing), Sun (Sun ONE Grid Engine)

Large-scale Example of a Grid

NSF TeraGrid†

†The TeraGrid project is funded by the National Science Foundation and includes five partners: NCSA, SDSC, Argonne, CACR, and PSC.

Small-scale Example

[Figure: small-scale grid building blocks, each tied together by a local network: a network of workstations, a cluster of SMPs, and a 128-node DoD farm.]

[Figure: the sender-side data path, shown for each node: data moves from the application buffer in user space, through the CPU and memory hierarchy, into a network buffer for transmission.]
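The slides attach no code to this path, but a minimal MPI point-to-point exchange (MPI is my assumption here; the deck never names a message-passing library, though MPI is standard on Beowulf-class systems) shows where the application buffer sits: the program hands its user-space buffer to the library, which stages the data through network buffers before it crosses the local network.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double app_buffer[1024];   /* application buffer in user space */

    if (rank == 0) {
        for (int i = 0; i < 1024; i++) app_buffer[i] = i;
        /* The library copies from the application buffer into a network
           buffer; the NIC then pushes the bytes onto the local network. */
        MPI_Send(app_buffer, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* The receiver drains its network buffer back into user space. */
        MPI_Recv(app_buffer, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("received %.0f..%.0f\n", app_buffer[0], app_buffer[1023]);
    }

    MPI_Finalize();
    return 0;
}

Run with at least two ranks (e.g., mpirun -np 2); each box in the figure (application buffer, memory hierarchy, network buffer) is one stage this call traverses.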

[Figure: a 32-node Beowulf cluster of CPUs on a local network.]

[Figure: a large grid built from many CPUs: groups of machines on local networks, joined to one another through an interconnect cloud.]

SC Grid Initiative

Electric Power Supply

• Circa 1910
  – Electric power can be generated
  – Devices are being designed to use electricity
  – Users lack the ability to build and operate their own generators

• Electric Power Grid
  – Reliable, universally accessible, standardized, low-cost transmission and distribution technologies
  – Result: new devices and new industries to manufacture them

• Circa 2002
  – Billions of devices running on reliable, low-cost power

My background

• I’m a “systems” person
  – Intel Corporation
    • SMP memory simulation
  – Los Alamos National Laboratory
    • Performance Analysis Team (Scientific Computing)
    • DOE ASCI Project (TERA- and PETA-scale scientific apps)
  – 2nd year at USC
    • Courses: Computer Architecture, Parallel Computer Architecture, Performance Analysis
    • Research: Parallel and Distributed Computing, Computer Architecture, Performance Analysis and Prediction, Scientific Computing

• Interests: Identifying and improving performance of scientific applications through changes to algorithm and systems design (hardware, compilers, middleware (OS+))

Outline

• Gentle intro to Grid
• SC Grid Computing Initiative
• Preliminary Results
• SCAPE Laboratory

Computational Power Supply

• Analogous to the Power “Grid”
  – Heterogeneity (generators/outlets vs. machines/networks)
  – Consumer requirements
    • Power consumption vs. computational requirements
    • Service guarantees vs. QoS
    • Money to be made vs. money to be made
  – Economies of scale (power on demand?)
  – Political influence at large scale

• Local control necessary, with interfaces to the outside (standards)

Why now?

• Technological improvements (CPU, networks, memory capacity)
• Need for demand-driven access to computational power (e.g., MRI)
• Utilization of idle capacity (cycle stealing)
• Sharing of collaborative results (virtual laboratories over WANs)
• Use of new techniques and tools
  – Network-enabled solvers (Dongarra’s NetSolve at UT-Knoxville)
  – Teleimmersion (collaborative use of virtual reality: Argonne, Berkeley, MIT)

Applications of Grid Computing

• Distributed Supercomputing
  – Maximize available resources for large problems
  – Large-scale scientific computing
  – Challenges
    • Scalability of service
    • Latency tolerance
    • High performance on heterogeneous systems

• On-demand Computing
  – Access to non-local resources: computation, s/w, data, sensors
  – Driven by cost/performance rather than absolute performance
  – Example: MUSC MRI data analysis

Applications of Grid Computing

• High-Throughput Computing
  – Scheduling large numbers of independent tasks (see the task-farm sketch after this list)
  – Condor (Wisconsin)
  – http://setiathome.ssl.berkeley.edu/

• Data Intensive Computing
  – Data analysis applications
  – Grid Physics Network (http://www.griphyn.org/)

• Collaborative Computing
  – Virtual shared-space laboratories
  – Example: Boilermaker (Argonne)
    • Collaborative, interactive design of injection pollution control systems for boilers (http://www.mcs.anl.gov/metaneos)
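The high-throughput pattern above reduces to a task farm: many independent tasks, no inter-task communication, and workers kept busy until the queue drains. Below is a minimal sketch of that pattern in C with MPI; it is my construction for illustration only (Condor and SETI@home are real systems with far richer scheduling, checkpointing, and fault tolerance), and run_task is a hypothetical placeholder.

#include <mpi.h>
#include <stdio.h>

#define NUM_TASKS 100   /* illustrative number of independent tasks */

/* Placeholder for one independent unit of work (hypothetical). */
static double run_task(int id) {
    return (double)id * id;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) MPI_Abort(MPI_COMM_WORLD, 1);  /* need master + worker */

    if (rank == 0) {            /* master: dispatch tasks, retire results */
        int next = 0, done = 0;
        while (done < NUM_TASKS) {
            double result;
            MPI_Status st;
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == 1) done++;                  /* tag 1 = finished task */
            int task = (next < NUM_TASKS) ? next++ : -1; /* -1 = stop signal */
            MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
        }
        printf("all %d tasks retired\n", done);
    } else {                    /* worker: request work, compute, repeat */
        double result = 0.0;
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD); /* tag 0 = idle */
        for (;;) {
            int task;
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (task < 0) break;
            result = run_task(task);
            MPI_Send(&result, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

Because tasks never talk to each other, throughput scales with worker count until the master or the network saturates, which is exactly why idle desktop cycles are usable for this class of application.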

Other Grid-related Work

• General Scientific Community (http://www.gridforum.com)
  – NSF Middleware Initiative
  – Globus Project
  – Condor Project
  – Cactus Project
  – See the Grid Forum for a longer list…

Outline

• Gentle intro to Grid
• SC Grid Computing Initiative
• Preliminary Results
• SCAPE Laboratory

SC Grid Initiative

• Immediate increase in local computational abilities
• Ability to observe application performance and look “under the hood”
• Interface infrastructure to link with other computational Grids
• Ability to provide computational power to others on demand
• “Tinker time” to establish local expertise in Grid computing
• Incentive to collaborate outside the university and obtain external funding

SC Grid Milestones

• Increase computational abilities of SC
  – Establish preliminary working Grid on the SC campus
  – Benchmark preliminary system configurations
  – Use the established testbed for
    • Scientific computing
    • Multi-mode computing
    • Middleware development
  – Establish dedicated Grid resources
    • $45K equipment grant from the USC VP for Research

• Extend SC Grid boundaries
  – Beyond department resources
  – Beyond campus resources (MUSC)
  – Beyond state resources (IIT)

• Incorporate other technologies
  – Multi-mode applications (e.g. Odo)

Grid-enabled as of 1 Nov 02! [Slide annotations mark three of the milestones above as done and two as in progress.]

Outline

• Gentle intro to Grid
• SC Grid Computing Initiative
• Preliminary Results
• Synergistic Activities

USC CSCE Department Mini Grid

[Figure: Ethernet switches connect the department’s resource pools: NOWs built from 17 Sun Ultra 10s, 21 Sun Blade 100s, and 2 Sun Blade 150s, plus a Beowulf cluster (master node Daniel and slave nodes N01–N32).]

Node Configurations

Resources (count; hardware; OS; networking):

• Beowulf (for research): 1 cluster; 1 master node + 32 slave nodes (PIII 933 MHz, 1 GB memory); RedHat Linux 7.1; 10/100M NIC
• SUN Ultra 10 (for teaching): 17 machines; UltraSPARC-IIi 440 MHz, 256 MB memory; Solaris 2.9; 10/100M NIC
• SUN Blade 100 (for teaching): 21 machines; UltraSPARC-IIe 502 MHz, 256 MB memory; Solaris 2.9; 10/100M NIC
• SUN Blade 150 (for teaching): 2 machines; UltraSPARC-IIe 650 MHz, 256 MB memory; Solaris 2.9; 10/100M NIC

Benchmark and Testing Environment

• NPB (NAS Parallel Benchmarks 2.0)
  – Specifies a set of programs as benchmarks
  – Each benchmark has three problem sizes
    • Class A: moderately powerful workstations
    • Class B: high-end workstations or small parallel systems
    • Class C: high-end supercomputing

• We tested the performance of
  – EP: the kernel is “embarrassingly” parallel in that no communication is required for the generation of the random numbers itself
  – SP: the kernel solves three sets of uncoupled systems of equations, first in the x, then the y, and finally the z direction; the systems are scalar pentadiagonal

• Run settings (a minimal EP-style sketch follows this list)
  – When #nodes <= 16, EP runs on the NOWs
  – When #nodes > 16, EP runs on the NOWs and the Beowulf
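As the run settings above note, EP needs essentially no communication until the very end. The sketch below mirrors that structure in C with MPI; it is my illustration, not the NPB source (real EP generates Gaussian deviates with the Marsaglia polar method, whereas this stand-in estimates pi), but the communication pattern is the same: fully independent work, one final reduction.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long n_local = 1L << 20;   /* samples per rank (illustrative size) */
    srand(rank + 1);           /* independent stream per rank          */

    long local_hits = 0;       /* embarrassingly parallel phase:       */
    for (long i = 0; i < n_local; i++) {     /* no communication here  */
        double x = 2.0 * rand() / RAND_MAX - 1.0;
        double y = 2.0 * rand() / RAND_MAX - 1.0;
        if (x * x + y * y <= 1.0) local_hits++;
    }

    long hits = 0;             /* the only communication: one reduction */
    MPI_Reduce(&local_hits, &hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi ~= %f\n", 4.0 * hits / ((double)n_local * size));

    MPI_Finalize();
    return 0;
}

Because the loop dominates and the reduction costs one message per rank, doubling the node count should roughly halve the execution time, which is what the EP charts below show until slower machines join the pool.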

Execution time for EP (Class C)

[Chart: execution time (s), 0–3500, vs. number of processors (1, 2, 4, 8, 16, 32, 64).]

Performance doubles with the number of nodes, except at one point on the curve.

MFLOPS for EP (Class C)

[Chart: aggregate MFLOPS, 0–80, vs. number of processors (1, 2, 4, 8, 16, 32, 64).]

Overall MFLOPS shows the same trend (performance doubles with the number of nodes), with the same exception.

Node Performance for EP (Class C)

[Chart: MFLOPS per node, 0–3, vs. number of processors (1, 2, 4, 8, 16, 32, 64).]

MFLOPS/node illustrates the reason for the less-than-optimal application scalability: at the point where the curve drops, we incorporate older Sun machines (Sun Blade, Sun Ultra).
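The slides do not spell out the arithmetic, but the per-node curve is just parallel efficiency rescaled. For a fixed-size problem with total work $W$ (floating-point operations), single-processor time $T(1)$, and $p$-processor time $T(p)$:

\[
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
\frac{\mathrm{MFLOPS}(p)}{p} = \frac{W}{p\,T(p)} = \frac{W}{T(1)}\,E(p).
\]

A flat MFLOPS/node curve therefore means $E(p) \approx 1$ (near-perfect scaling); the drop marks where $E(p)$ falls as slower machines join, even though aggregate MFLOPS keeps rising.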

Execution Time for SP (Class B)

[Chart: execution time (s), 0–8000, vs. number of nodes (1, 4, 9, 16, 25, 36).]

More realistic problems will have performance bottlenecks: they need analysis to run efficiently.

Kirk W. Cameron, Ph.D., Assistant Professor

Department of Computer Science and Engineering
University of South Carolina