
The Great White SHARCNET

SHARCNET:

Building an Environment to Foster Computational Science

Shared Hierarchical Academic Research Computing Network

How do you study the first milliseconds of the universe?

And then change the rules?

How do you study materials that can’t be made in a lab (yet)?

On the surface of the sun? At the centre of the earth?

How do you study the effects of an amputated limb on blood flow through the heart?

How do you repeat the experiment? Where do you get volunteers?

Increasingly, the answer to “how” in these questions is: by using a computer.

In addition to experimental and theoretical science, we now have computational science!

So what does this have to do with Western and SHARCNET?

And Great White?

SHARCNET was created to meet the needs of computational researchers in Southwestern Ontario.

Western is the lead institution and the administrative home of SHARCNET.

Vision

To establish a world-leading, multi-university and college, interdisciplinary institute with an active academic-industry partnership, enabling forefront computational research in critical areas of science, engineering and business.

Focus on Infrastructure and Support

Support development of computational approaches for research in science, engineering, business, social sciences

Computational facilities
People

Computational Focus

Provide world-class computational facilities and support

Explore new computational models
• Build on “Beowulf” model

“Beowulf”

A 6th-century Scandinavian hero? Yes, but not in this context:

A collection of separate computers connected by standard communications

What’s so interesting about this?

First Beowulf

First explored in 1994 by two researchers at NASA

Built out of “common” computers
• 16 Intel ’486 processors
• 10 Mb/s Ethernet

To meet specific computational needs

Beowulf “Philosophy”

Build out of “off the shelf” computational components

Take advantage of increased capabilities, reliability and cost effectiveness of mass market computers

Take advantage of parallel computation

“Price/performance”: “cheap” supercomputing!
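Worth quantifying what “take advantage of parallel computation” can deliver: Amdahl's law (a standard result, stated here as context rather than taken from the slides) bounds the speedup of a program whose parallelizable fraction is p when spread over N processors:

    speedup(N) = 1 / ((1 - p) + p / N)

For example, p = 0.95 on a 152-processor cluster yields about 1 / (0.05 + 0.95/152) ≈ 18, which is why reducing the serial fraction and the communication overhead matters as much as adding processors.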

Growth of Beowulf

Growth in number and size of “Beowulf clusters”

Continued development of mass-produced computational elements

Continued development of communication technologies

Development of new parallel programming techniques

SHARCNET

Sought to exploit the “Beowulf” approach

High performance clusters: “Beowulf on steroids”
• Powerful “off the shelf” computational elements
• Advanced communications

Geographical separation (local use)

Connect clusters: emerging optical communications

Great White!

Processors:
• 4 Alpha processors: 833 MHz (4p-SMP)
• 4 GB of memory
• 38 SMPs: a total of 152 processors

Communications:
• 1 Gb/s Ethernet
• 1.6 Gb/s Quadrics connection

November 2001: #183 in the world
• Fastest academic computer in Canada
• 6th fastest academic computer in North America

Great White

SHARCNET

Extend the “Beowulf” approach to clusters of high performance clusters

Connect clusters: “clusters of clusters”
• Build on emerging optical communications
• Initial configuration used optical equipment from the telecommunications industry

Collectively a supercomputer!

Clusters Across Universities

[Diagram: clusters at Guelph, McMaster (Mac), and UWO, with 108, 128, 48, and 152 processors, connected by optical communication]

Experimental Computational Environment

Deeppurple: 48 processors
Greatwhite: 152 processors
Connected by optical communication (8 Gb/s)

Technical Issues: Resource Management
• How to intelligently allocate and monitor resources
• Availability
– Failure rates multiplied by number of systems
– Job migration, checkpointing (see the sketch after this list)
• Performance

User Management
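One common answer to job migration and failure recovery is application-level checkpointing: the job periodically writes enough state to disk that it can be restarted, possibly on a different node. A minimal sketch in C; the State struct, file name, and workload are illustrative assumptions, not SHARCNET specifics:

#include <stdio.h>

/* Illustrative job state; a real simulation would have far more. */
typedef struct { long step; double value; } State;

static int save_checkpoint(const char *path, const State *s) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(s, sizeof *s, 1, f) == 1;
    fclose(f);  /* production code would fsync and rename atomically */
    return ok ? 0 : -1;
}

static int load_checkpoint(const char *path, State *s) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;  /* no checkpoint yet: start from scratch */
    int ok = fread(s, sizeof *s, 1, f) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

int main(void) {
    State s = { 0, 0.0 };
    load_checkpoint("job.ckpt", &s);  /* resume if a checkpoint exists */
    for (; s.step < 1000000; s.step++) {
        /* Checkpoint before the step's work, so a restart simply
           re-executes from the last saved state with no double counting. */
        if (s.step % 100000 == 0)
            save_checkpoint("job.ckpt", &s);
        s.value += 1e-6;  /* stand-in for real work */
    }
    printf("done: value = %f\n", s.value);
    return 0;
}

After a failure, re-running the same binary picks up from the last checkpoint; the same mechanism lets a scheduler migrate a job to another node.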

Technical Issues: Data Management
• Must get the data to the processors
• Some data sets are too large to move (see the arithmetic below)
• Some HPC centres are now focusing on “Data Grids” vs “Computation Grids”
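To make the second point concrete with back-of-the-envelope arithmetic (assuming a dedicated 1 Gb/s link and ignoring protocol overhead): a 10 TB data set is roughly 8 × 10^13 bits, so at 10^9 bits/s the transfer takes about 8 × 10^4 seconds, or 22 hours. Past a certain size it is faster to move the computation to the data, which is the motivation for “Data Grids.”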

Most SHARCNET programs use MPI, which can run either over TCP or over the Quadrics transport layer.
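MPI hides the transport from the programmer: the same source runs whether the runtime carries messages over TCP between sites or over Quadrics within a cluster, since that choice is made at launch time. A minimal sketch in C (the strided partial-sum workload is purely illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    /* Each process sums a strided share of 0..999999. */
    long long local = 0, total = 0;
    for (long long i = rank; i < 1000000; i += size)
        local += i;

    /* Combine partial sums on rank 0; the message travels over whatever
       transport (TCP, Quadrics, ...) the runtime was configured to use. */
    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %lld across %d processes\n", total, size);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with mpirun, the same executable scales from one SMP node to the full cluster.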

SHARCNET: More than this!

Not just computational facilities

Focus on computational resources to provide innovative science
• Support
• Build research community

CFI-OIT: Infrastructure
ORDCF: People and programs

Objectives

Provide state of the art computational facilities
Develop a network of HPC clusters
Facilitate & enable world class computational research
Increase pool of people skilled in HPC techniques & processes
Evaluate & create a computational Grid as a means of providing supercomputing capabilities
Achieve long-term self-sustainability
Create major business opportunities in Ontario

Operating Principles

Partnership among institutions
Shared resources
Equality of opportunity
Services at no cost to researchers

Partners

Academic:
• University of Guelph
• McMaster University
• The University of Western Ontario
• University of Windsor
• Wilfrid Laurier University
• Sheridan College
• Fanshawe College

Industry:
• Hewlett Packard
• Quadrics Supercomputing World
• Platform Computing
• Nortel Networks
• Bell Canada

Support Programs: Part 1

Chairs Program: up to 14 new faculty

Fellowships: approx. $1 million per year
• Undergraduate summer jobs
• Graduate scholarships
• Postdoctoral fellowships

Support Programs: Part 2

Technical staff
• System administrators at sites
• HPC consultants at sites

Workshops
Conferences

Results??

Researchers from a variety of disciplines
• Chemistry
• Physics
• Economics
• Biology

Beginning to “ramp up”

Chemistry

Model chemical systems and processes at the atomic and electronic levels
• New quantum chemical techniques
• New computational methods

For example: molecular dynamics simulation of hydrogen formation in a single-site olefin polymerization catalyst

Economics

Research on factors that influence people to retire

Model incorporates both health and financial factors
• Previous models looked at one or the other
• Much more complex: difficult to estimate parameters

Materials

Understand friction and lubrication at molecular and atomic levels

Friction between polymer bearing surfaces

Two polymer bearing surfaces in sliding motion under good solvent conditions

Green: upper wall; red: lower wall

Astrophysics

Galaxy merger: approximately 100,000 particles (stars and dark matter)

Astrophysics

Forming giant planets from protoplanetary disks

Shows evolution of disk over about 200 years

Materials: Granular matter

Understand flow of granular matter

Study the effectiveness of mixing

Study the effect of different components on mixing

The Future?

New members: Southwestern Ontario

Support for new science
• Greater computation
• Storage facilities

New areas• Bioinformatics

The Future?

New Partners (Waterloo, Brock, York, UOIT)

Additional Capacity
– Storage
• ½ petabyte across 4 sites (multistage performance)
– Network
• 10 Gb/s core (UWO, Waterloo, Guelph, Mac)
• 1 Gb/s to other SHARCNET sites
• 10 Gb/s to Michnet, HPCVL (?)
– Upgrades
• Large capability machines at Mac, UWO, Guelph
• Large capacity machines at Waterloo
• Increased development sites

The Future?

Additional Capabilities
– Visualization

Total investment
– ~$49M (plus an additional $7 million cash from HP)

2004-2005
– With the new capabilities, SHARCNET could be in the top 100 to 150 supercomputers.
– Will be the fastest supercomputer of its kind, i.e., a distributed system whose nodes are clusters.

[Map: SHARCNET sites around Lakes Huron, Erie, and Ontario (Windsor, Western, Waterloo, Guelph, UOIT, York, Fields, Brock, McMaster, Laurier, Sheridan, Fanshawe, Robarts, Perimeter), joined by 10 Gb/s and 1 Gb/s links; scale: 100 km / 50 mi (about 0.1 ms). Legend: I = Itanium cluster, X = Xeon cluster, S = SMP, G = Grid Lab, T = Interconnect Topology Cluster, α = redeployed Alphas. Storage shown: tape, EVA disc, MSA disc.]

The Future: SHARCNET!