Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis...

27
Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability (MIRANDA, BG/L) 2 trillion element mesh 2006 2012

Transcript of Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis...

Page 1: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Petascale I/O Impacts on Visualization

Hank Childs

Lawrence Berkeley National Laboratory

& UC Davis

March 24, 2010

27B elementRayleigh-Taylor Instability(MIRANDA, BG/L)

2 trillion element mesh

2006

2012

Page 2: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

How does the {peta-, exa-} scale affect visualization?

Large # of time steps

Large ensembles

High-res meshes

Large # of variables

Your mileage may vary - Are you running full machine?- How much data do you

output?

Page 3: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

The soon-to-be “good ole days” … how visualization is done right now

P0P1

P3

P2

P8P7 P6

P5

P4

P9

Pieces of data(on disk)

Read Process Render

Processor 0

Read Process Render

Processor 1

Read Process Render

Processor 2

Parallelized visualizationdata flow network

P0 P3P2

P5P4 P7P6

P9P8

P1

Parallel Simulation Code

This technique is called “pure parallelism”

Page 4: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Pure parallelism performance is based on # bytes to process and I/O rates.

Vis is almost always >50% I/O and sometimes 98% I/O

Amount of data to visualize is typically O(total mem)

FLOPs Memory I/O

Terascale machine

“Petascale machine”

Two big factors:

① how much data you have to read

② how fast you can read it Relative I/O (ratio of total memory and I/O) is

key

Page 5: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Anedoctal evidence: relative I/O really is getting slower.

Machine name Main memory I/O rate

ASC purple 49.0TB 140GB/s 5.8min

BGL-init 32.0TB 24GB/s 22.2min

BGL-cur 69.0TB 30GB/s 38.3min

Petascale machine

?? ?? >40min

Time to write memory to disk

Page 6: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Why is relative I/O getting slower?

• “I/O doesn’t pay the bills”—And I/O is becoming a dominant cost in the

overall supercomputer procurement.• Simulation codes aren’t as exposed.

—And will be less exposed with proposed future architectures.

Page 7: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Recent runs of trillion cell data sets provide further evidence that I/O dominates

7

● Weak scaling study: ~62.5M cells/core

7

#coresProblem Size

TypeMachine

8K0.5TZAIXPurple

16K1TZSun LinuxRanger

16K1TZLinuxJuno

32K2TZCray XT5JaguarPF

64K4TZBG/PDawn

16K, 32K1TZ, 2TZCray XT4Franklin2T cells, 32K procs on Jaguar

2T cells, 32K procs on Franklin

- Approx I/O time: 2-5 minutes- Approx processing time: 10 seconds

Page 8: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Summary: what are the challenges?

• Scale—We can’t read all of the data at full resolution

any more… • Q: What can we do?• A: We need algorithmic change.

• Insight—How are we going to understand it?

• (There is a lot more data than pixels!)

Page 9: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Multi-resolution techniques use coarse representations then refine.

P0P1

P3

P2

P8P7 P6

P5

P4

P9

Pieces of data(on disk)

Read Process Render

Processor 0

Read Process Render

Processor 1

Read Process Render

Processor 2

Parallelized visualizationdata flow network

P0 P3P2

P5P4 P7P6

P9P8

P1

Parallel Simulation Code

P2

P4

Page 10: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Multi-resolution: pros and cons

• Pros—Drastically reduce I/O & memory requirements

• Cons—Is it meaningful to process simplified version of

the data?—How do we generate hierarchical

representations? What costs do they incur?

Page 11: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

In situ processing does visualization as part of the simulation.

P0P1

P3

P2

P8P7 P6

P5

P4

P9

Pieces of data(on disk)

Read Process Render

Processor 0

Read Process Render

Processor 1

Read Process Render

Processor 2

P0 P3P2

P5P4 P7P6

P9P8

P1

Parallel Simulation Code

Page 12: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

In situ processing does visualization as part of the simulation.

P0P1

P3

P2

P8P7 P6

P5

P4

P9

GetAccessToData

Process RenderProcessor 0

Parallelized visualization data flow networkParallel Simulation Code

GetAccessToData

Process RenderProcessor 1

GetAccessToData

Process RenderProcessor 2

GetAccessToData

Process RenderProcessor 9

… … … …

Page 13: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

In situ: pros and cons

• Pros:—No I/O!—Lots of compute power available

• Cons:—Very memory constrained—Many operations not possible

• Once the simulation has advanced, you cannot go back and analyze it

—User must know what to look a priori• Expensive resource to hold hostage!

Page 14: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Now we know the tools … what problem are we trying to solve?

• Three primary use cases:—Exploration—Confirmation—Communication

Examples:Scientific discoveryDebugging

Examples:Data analysisImages / moviesComparison

Examples:Data analysisImages / movies

Multi-res

In situ

Will still need pure parallelism and possibly other techniques such as data subsetting and streaming.

Page 15: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Prepare for difficult conversations in the future.

• Multi-resolution:—Do you understand what a multi-resolution

hierarchy should look like for your data?—Who do you trust to generate it?—Are you comfortable with your I/O routines

generating these hierarchies while they write?—How much overhead are you willing to tolerate

on your dumps? 33+%?—Willing to accept that your visualizations are

not the “real” data?

Page 16: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Prepare for difficult conversations in the future.

• In situ:—How much memory are you willing to give up

for visualization?—Will you be angry if the vis algorithms crash?—Do you know what you want to generate a

priori? • Can you re-run simulations if necessary?

Page 17: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Visualization on BlueWaters: Two Scenarios

1) Pure parallelism continues (no SW change):—Visualization and analysis will be done using large

portions of BW, users charging against science allocations• Lots of time spent doing I/O• This increases overall I/O contention on BW

—Vis & analysis is slow people do less ( insights will be lost (?))

2) Smart techniques deployed:—Allocations are used for simulation, not vis & analysis—Less artificial I/O contention introduced—Ability to explore / interact with data

Page 18: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

18

184

VisIt is a richly featured, turnkey application

• VisIt is an open source, end user visualization and analysis tool for simulated and experimental data

– Used by: physicists, engineers, code developers, vis experts

– >100K downloads on web

• R&D 100 award in 2005• Used “heavily to exclusively” on 8 of

world’s top 12 supercomputers

217 pin reactor coolingsimulation. Run on ¼ of Argonne BG/P.

1 billion grid points

Page 19: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

19

19

Terribly Named!! Intended for more than just visualization!

Data Exploration

VisualDebugging Quantitative Analysis

Presentations

=?

Comparative Analysis

Page 20: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

20

20

VisIt has a rich feature set that can impact many science areas.

• Meshes: rectilinear, curvilinear, unstructured, point, AMR

• Data: scalar, vector, tensor, material, species• Dimension: 1D, 2D, 3D, time varying• Rendering (~15): pseudocolor, volume rendering,

hedgehogs, glyphs, mesh lines, etc…• Data manipulation (~40): slicing, contouring, clipping,

thresholding, restrict to box, reflect, project, revolve, …• File formats (~85)• Derived quantities: >100 interoperable building blocks

+,-,*,/, gradient, mesh quality, if-then-else, and, or, not• Many general features: position lights, make movie, etc• Queries (~50): ways to pull out quantitative information,

debugging, comparative analysis

Page 21: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

21

21

VisIt employs a parallelized client-server architecture.

• Client-server observations:—Good for remote

visualization—Leverages available

resources—Scales well—No need to move data

Additional design considerations:• Plugins• Multiple UIs: GUI (Qt), CLI

(Python), more…

remote machine

Parallel vis resources

Userdata

localhost – Linux, Windows, Mac

Graphics Hardware

Page 22: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

22

22

The VisIt team focuses on making a robust, usable product for end users.

• Manuals—300 page user manual—200 page command line interface manual—“Getting your data into VisIt” manual

• Wiki for users (and developers)• Revision control, nightly regression testing, etc• Executables for all major platforms• Day long class, complete with exercises

Slides from the VisIt class

Page 23: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

23

23

VisIt is a vibrant project with many participants.

• Over 50 person-years of effort• Over one million lines of code• Partnership between: Department of Energy’s Office of Nuclear

Energy, Office of Science, and National Nuclear Security Agency, and among others—NSF XD centers both expected to make large contributions.

2004-6

User communitygrows, includingAWE & ASC Alliance schools

Fall ‘06

VACET is funded

Spring ‘08

AWE enters repo

2003

LLNL user communitytransitioned to VisIt

2005

2005 R&D100

2007

SciDAC Outreach Center enablesPublic SW repo

2007

Saudi Aramcofunds LLNL to support VisIt

Spring ‘07

GNEP funds LLNL to support GNEP codes at Argonne

Summer‘07

Developers from LLNL, LBL, & ORNLStart dev in repo

‘07-’08

UC Davis & UUtah research done in VisIt repo

2000

Project started

‘07-’08

Partnership withCEA is developed

2008

Institutional supportleverages effort from many labs

Spring ‘09

More developersEntering repo allthe time

Page 24: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

24

VisIt: What’s the Big Deal?

• Everything works at scale• Robust, usable tool• Vis to code development to scientific insight

24

Page 25: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

VisIt and the smart data techiques

• Full pure parallelism implementation• Data subsetting well integrated

—(if you set up your data properly)• In situ: yes, but … it’s a memory hog• Multi-res: emerging effort in this space

Page 26: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

26

Three Ways To Get Data Into VisIt

• (1) Write to a known output format• (2) Write a plugin file format reader• (3) Integrate VisIt “in situ”

—“lib-VisIt” is linked into simulation code• (Note: Memory footprint issues with

implementation!)

—Use model:• simulation code advances• at some time interval (e.g. end of cycle), hands

control to lib-VisIt.• lib-VisIt performs vis & analysis tasks, then hands

control back to simulation code• repeat

26

Page 27: Petascale I/O Impacts on Visualization Hank Childs Lawrence Berkeley National Laboratory & UC Davis March 24, 2010 27B element Rayleigh-Taylor Instability.

Summary

• Massive data will force algorithmic change in visualization tools.

• The VisIt project is making progress on bringing these algorithmic changes to the user community.

• Contact info:—Hank Childs, LBL & UC Davis—[email protected] / [email protected]—http://vis.lbl.gov/~hrchilds

Questions??