Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

SC05 November, 2005 [email protected] Supercomputing • Communications • NCAR Scientific Computing Div Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets John Clyne Scientific Computing Division National Center for Atmospheric Research Boulder, CO USA

Page 1: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

SC05 November, 2005

Supercomputing • Communications • Data

NCAR Scientific Computing Division

Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

John Clyne

Scientific Computing Division

National Center for Atmospheric Research

Boulder, CO USA

Page 2: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

[Numerical] models that can currently be run on typical supercomputing platforms produce data in amounts that make storage expensive, movement cumbersome, visualization difficult, and detailed analysis impossible.  The result is a significantly reduced scientific return from the nation's largest computational efforts.

We can now compute more data than we know how to analyze!!!

Page 3: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

A sampling of various technology performance curves

• Not all technologies advance at the same rate!!!

[Figure: performance gains from 1980 to present, plotted as improvement factor on a log scale from 1 to 100,000. Series: disk drive internal data rate, disk drive interface data rate, Ethernet network bandwidth, Intel microprocessor clock speed, drive capacity.]

Page 4: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Estimated Sustained GFLOPs at NCAR (Production Systems)

[Figure: sustained GFLOPs (0 to 900) versus date, Jan-97 to Jan-06. Systems in legend: IBM p5-575/HPS (bluevista); IBM Opteron/Linux (pegasus); IBM Opteron/Linux (lightning); IBM POWER4/Federation (thunder); IBM POWER4/Colony (bluesky); IBM POWER4 (bluedawn); SGI Origin3800/128; IBM POWER3 (blackforest); IBM POWER3 (babyblue); Compaq ES40/32 (prospect); SGI Origin2000/128 (ute); HP SPP-2000/64 (sioux); CRI Cray C90/16 (antero); CRI Cray J90 series. Annotations: ARCS Phases 1 through 4; blackforest (WH-1); blackforest (WH-2/NH-2).]

Page 5: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

NCAR MSS - Data Holdings

[Figure: total and unique data holdings in terabytes (0 to 3000) versus date, Jan-97 to Jan-06.]

• 40 years for the first petabyte (Nov '02)

• 20 months for the second petabyte (Jul '04)

Page 6: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Growth of individual NCAR simulation data sets

Approximate Simulation Data Set Sizes

[Figure: data set size in GB on a log scale (0.1 to 100,000) for the years 1989, 1995, 1998, 2000, 2004, and 2006. Series: Climate, Turbulence, Weather.]

Representative data sets from climate, turbulence, and weather disciplines

Page 7: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Climate simulation grid resolutions

Page 8: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Example: Compressible plume dynamics

• 504x504x2048

• 5 variables (u,v,w,rho,temp)

• ~500 time steps saved

• 9 TB of storage (4 GB per variable per time step)

• Six months compute time required on 112 IBM SP RS/6000 processors

• Three months for post-processing

• Data may be analyzed for several years

M. Rast, 2004. Images courtesy of Joseph Mendoza, NCAR/SCD
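
The storage figures on this slide can be checked with a little arithmetic. The sketch below assumes 8-byte (double precision) values and binary units (GiB, TiB); neither assumption is stated on the slide, but together they land near the quoted 4 GB per variable per time step and 9 TB total.

```python
# Back-of-the-envelope storage estimate for the compressible plume run.
# ASSUMPTIONS (not stated on the slide): 8 bytes per value, binary units.
NX, NY, NZ = 504, 504, 2048   # grid dimensions from the slide
N_VARIABLES = 5               # u, v, w, rho, temp
N_TIMESTEPS = 500             # the slide says "~500 time steps saved"
BYTES_PER_VALUE = 8           # assumed double precision


def bytes_per_variable_per_step():
    """Bytes needed to store one variable at one saved time step."""
    return NX * NY * NZ * BYTES_PER_VALUE


def total_bytes():
    """Bytes needed for all variables at all saved time steps."""
    return bytes_per_variable_per_step() * N_VARIABLES * N_TIMESTEPS


if __name__ == "__main__":
    print(f"{bytes_per_variable_per_step() / 2**30:.2f} GiB per variable per time step")
    print(f"{total_bytes() / 2**40:.2f} TiB total")
```

With 4-byte values instead, both numbers halve, so the choice of precision matters when budgeting storage.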

Page 9: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Workflow in computational science

[Diagram: workflow stages Simulation, Storage, Post-processing, Storage, and Analysis & Visualization. Stages are labeled Batch, Batch & Interactive, or Interactive. The question "Bandwidth requirements?" appears twice in the diagram.]

Page 10: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

What is meant by interactive computing?

Definition: A system is interactive if the time between a user event and the response to that event is short enough to maintain my full attention

If the response time is…

• 1-5 seconds: I’m engaged

• 5-60 seconds: I’m reading email

• 1-3 minutes: I’ve forgotten what I was trying to do

• > 3 minutes: I’ve given up!

Page 11: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

IO wait times for high resolution simulations

• Assumptions

– Single precision

– 100 MB/sec bandwidth

– No contention

Resolution   MB per variable   Scalar variable wait time (s)   Vector variable wait time (s)
128³         8                 0.1                             0.3
256³         67                0.7                             2.1
512³         537               5.0                             15.0
1024³        4295              43.0                            130.0

Interactive!

Reading mail!!
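
The wait times in the table follow directly from the stated assumptions. The sketch below recomputes them; the function names are illustrative only, and the results agree with the table up to the rounding used on the slide.

```python
# Estimated read time for one time step of a cubic N^3 grid, using the
# slide's assumptions: single precision, 100 MB/sec, no contention.
BYTES_PER_VALUE = 4          # single precision
BANDWIDTH_MB_PER_S = 100.0   # decimal megabytes per second


def megabytes_per_variable(n):
    """Size in MB of one scalar variable on an n x n x n grid."""
    return n**3 * BYTES_PER_VALUE / 1e6


def wait_seconds(n, components=1):
    """Read time for a scalar (1 component) or a vector (3 components)."""
    return components * megabytes_per_variable(n) / BANDWIDTH_MB_PER_S


if __name__ == "__main__":
    for n in (128, 256, 512, 1024):
        print(f"{n}^3: {megabytes_per_variable(n):7.0f} MB, "
              f"scalar {wait_seconds(n):6.1f} s, "
              f"vector {wait_seconds(n, 3):6.1f} s")
```

Each doubling of resolution multiplies the wait by eight, which is why 512³ already falls outside the 1-5 second "engaged" window for vector quantities.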

Page 12: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Visualization and Analysis Platform for oceanic, atmospheric, and solar Research (VAPoR)

Key components

1. Domain specific application focus: numerically simulated turbulence

2. Quantitative capabilities to support scientific data analysis

3. Integrate visualization into analysis process, interactively steering the analysis while enhancing data understanding

4. Employ multiresolution data representation as a data reduction technique

This work is funded in part through a U.S. National Science Foundation, Information Technology Research program grant

Combination of visualization with multiresolution data representation that provides sufficient data reduction to enable interactive work

Page 13: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Enabling speed/quality tradeoffs with multiresolution data representation

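• Multiple copies of data at varying power-of-two resolutions (1, 1/2, 1/4, 1/8)

• Storage costs, relative to the original data, for $d$ dimensions and $L$ coarsened levels:

$$\sum_{l=0}^{L} \frac{1}{2^{dl}} = 1 + \frac{1}{2^{d}} + \frac{1}{2^{2d}} + \frac{1}{2^{3d}} + \cdots$$

• 2D example: texture MIP mapping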
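
The storage cost of keeping multiple power-of-two copies is a geometric series in 1/2^d. The sketch below evaluates it exactly with rational arithmetic; the function names are illustrative, and the closed-form limit 2^d / (2^d - 1) is the standard geometric-series result rather than something stated on the slide.

```python
from fractions import Fraction


def pyramid_storage(d, levels):
    """Total storage, relative to the full-resolution data, of the original
    plus `levels` successively halved copies in d dimensions."""
    return sum(Fraction(1, 2 ** (d * l)) for l in range(levels + 1))


def storage_limit(d):
    """Limit of the series as the number of levels grows without bound."""
    return Fraction(2**d, 2**d - 1)


if __name__ == "__main__":
    for d in (2, 3):
        print(f"d={d}: 3 coarse levels cost {float(pyramid_storage(d, 3)):.4f}x, "
              f"limit {float(storage_limit(d)):.4f}x")
```

For 2D textures (MIP mapping) the overhead approaches one third; for 3D volumes it approaches one seventh.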

Page 14: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Wavelet Transforms for 3D Multiresolution data representation

• Permit hierarchical data representation

• Invertible and lossless (subject to floating point round off errors)

• Numerically efficient – forward and inverse transform

• No additional storage cost!!!
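
The properties listed above can be seen in a small example. The slides do not name the wavelet used, so the sketch below assumes the Haar wavelet, the simplest case, and works in 1D rather than 3D. The coefficient array has the same length as the input (no additional storage), the inverse recovers the input, and stopping the inverse early yields a coarser approximation.

```python
def haar_forward(signal):
    """Multi-level Haar transform of a list whose length is a power of two.
    Returns a same-length list: [coarsest average, details coarse to fine]."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def haar_inverse(coeffs, levels=None):
    """Rebuild the signal from Haar coefficients. If `levels` is given, stop
    after that many refinements and return an approximation of length
    2**levels instead of the full signal."""
    out = list(coeffs[:1])
    n = 1
    while n < len(coeffs) and (levels is None or n < 2**levels):
        details = coeffs[n:2 * n]
        finer = []
        for average, detail in zip(out, details):
            finer.extend((average + detail, average - detail))
        out = finer
        n *= 2
    return out


if __name__ == "__main__":
    data = [4, 2, 6, 8]
    coeffs = haar_forward(data)
    print(coeffs)                          # same length as the input
    print(haar_inverse(coeffs))            # full-resolution reconstruction
    print(haar_inverse(coeffs, levels=1))  # half-resolution approximation
```

A reader of such a file only needs the leading coefficients to obtain a coarse view, which is what makes the hierarchy useful for speed/quality tradeoffs.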

Page 15: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Compressible Convection

128³ and 512³

M. Rast, 2002

Page 16: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Compressible plume

Resolution      Problem size
504x504x2048    Full
252x252x1024    1/8
126x126x512     1/64
63x63x256       1/512

Compressible plume data set shown at native and progressively coarser resolutions

Page 17: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Rendering timings

[Figure: two log-scale plots of rendering time in seconds versus resolution (Full, 1/2, 1/4, 1/8). Panel titles: 512³ Compressible Convection; 504²x2048 Compressible Plume. Legend entries: Mdb, Vtk. Platforms labeled: SGI Octane2, 1x600MHz R14k; SGI Origin, 10x600MHz R14k. An "Interactive" marker annotates the plots.]

Reduced resolution affords responsive interaction while preserving all but finest features

Page 18: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Derived quantities

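p: pressure

ρ: density

T: temperature

χ: ionization potential

N_A: Avogadro’s number

m_e: electron mass

k: Boltzmann’s constant

h: Planck’s constant

(1) $p = \rho T$

(2) $\frac{y^2}{1-y} = \frac{1}{\rho N_A}\left(\frac{2\pi m_e k T}{h^2}\right)^{3/2} e^{-\chi/kT}$

(3) $\omega^2 = |\nabla \times \mathbf{u}|^2$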

Derived quantities produced from the simulation’s field variables as a post-process
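
As an illustration of a derived quantity computed as a post-process, the sketch below evaluates enstrophy, taken here as the squared magnitude of the vorticity ∇×u, at a single point using central differences. The helper names and the solid-body-rotation test field are assumptions for illustration; a production tool would operate on gridded data rather than an analytic function.

```python
def curl_at(u, x, y, z, h=1e-3):
    """Central-difference curl of a vector field u(x, y, z) -> (ux, uy, uz),
    evaluated at the point (x, y, z) with step h."""
    def partial(component, axis):
        lo = [x, y, z]
        hi = [x, y, z]
        lo[axis] -= h
        hi[axis] += h
        return (u(*hi)[component] - u(*lo)[component]) / (2 * h)

    return (
        partial(2, 1) - partial(1, 2),  # d(uz)/dy - d(uy)/dz
        partial(0, 2) - partial(2, 0),  # d(ux)/dz - d(uz)/dx
        partial(1, 0) - partial(0, 1),  # d(uy)/dx - d(ux)/dy
    )


def enstrophy_at(u, x, y, z, h=1e-3):
    """Squared magnitude of the vorticity at a point."""
    return sum(c * c for c in curl_at(u, x, y, z, h))


def solid_body_rotation(x, y, z):
    """Hypothetical test field: rigid rotation about the z axis."""
    return (-y, x, 0.0)


if __name__ == "__main__":
    print(curl_at(solid_body_rotation, 0.3, -0.2, 1.0))
    print(enstrophy_at(solid_body_rotation, 0.3, -0.2, 1.0))
```

Because the curl needs neighboring values in every direction, its cost scales with the number of grid points, so each halving of resolution cuts the work by roughly a factor of eight.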

Page 19: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Calculation timings for derived quantities

[Figure: calculation time in seconds on a log scale (0.01 to 10,000) versus resolution (Full, 1/2, 1/4, 1/8). Series: pressure (eq 1), ionization (eq 2), enstrophy (eq 3).]

Note: 1/2 resolution is 1/8 problem size, etc.

Deriving new quantities on interactive time scales only possible with data reduction

SGI Origin, 10x600MHz R14k

Page 20: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Integrated visualization and analysis on interactively selected subdomains:

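[Figure panels labeled with force-balance terms, including $u_\theta^2/r$ and $\frac{1}{\rho}\frac{\partial p}{\partial r}$.]

Vertical vorticity of the flow. Full domain seen from above. Subdomain from side.

Mach number of the vertical velocity. Full domain seen from above. Subdomain from side.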

Efficient analysis requires rapid calculation and visualization of unanticipated derived quantities. This can be facilitated by a combination of subdomain selection and resolution reduction.

Page 21: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

A test of multiresolution analysis: Force balance in supersonic downflows

Sites of supersonic downflow are also those of very high vertical vorticity. The cores of the vortex tubes are evacuated, with centripetal acceleration balancing that due to the inward directed pressure gradient. Buoyancy forces are maximum on the tube periphery due to mass flux convergence.

The same interpretation results from analysis at half resolution.

[Figure: force-balance panels (terms including $u_\theta^2/r$ and $\frac{1}{\rho}\frac{\partial p}{\partial r}$) shown at full and half resolution.]

Subdomain selection and reduced resolution together yield data reduction by a factor of 128!!!
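
The two reductions multiply. Halving the resolution of a 3D volume gives a factor of 8, so a combined factor of 128 implies a subdomain holding 1/16 of the full volume. The sketch below shows the arithmetic; the subdomain extents used in the example are hypothetical, since the slides do not give them.

```python
from fractions import Fraction


def reduction_factor(full_dims, sub_dims, level):
    """Ratio of full-domain, full-resolution grid points to the points in a
    subdomain after halving the resolution `level` times along each axis."""
    full_points = 1
    for n in full_dims:
        full_points *= n
    sub_points = 1
    for n in sub_dims:
        sub_points *= n
    return Fraction(full_points, sub_points) * 2 ** (len(full_dims) * level)


if __name__ == "__main__":
    full = (504, 504, 2048)
    sub = (126, 126, 2048)   # hypothetical subdomain: 1/16 of the volume
    print(reduction_factor(full, sub, level=0))  # subdomain selection alone
    print(reduction_factor(full, sub, level=1))  # plus half resolution
```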

Page 22: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

Future???

Original 20:1 Lossy Compression

Page 23: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

• Live VAPOR demonstrations, SGI Theatre (booth # 602):

– Wednesday, 11:30am

– Thursday, 3:30pm

• VAPOR URL:

– http://www.scd.ucar.edu/hss/dasg/software/vapor

Questions???

Page 24: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

• Inadequate IO bandwidth is but one impediment to interactive analysis and visualization.

• Other impediments include:

– Insufficient capacity of high-speed storage

– Reliance on un-optimized, serial applications

– Mismatch between simulation and analysis computing resources

Page 25: Desktop Techniques for the Exploration of Terascale Sized Turbulence Data Sets

NCAR Science

Space Weather

Turbulence

Atmospheric Chemistry

Climate

Weather

The Sun

More than just the atmosphere… from the earth’s oceans to the solar interior