NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE

Fast Adaptive Storage and Retrieval

Scott B. Baden

Department of Computer Science and Engineering

University of California, San Diego

Motivation

Some applications are able to distinguish interesting features from background data using on-line analysis

Features

Animation

Fast Adaptive Storage and Retrieval

• If the volume fraction of interesting data is small, then we can reduce storage, memory, and network bandwidth requirements significantly by storing only what is “needed”

• We call a scheme that realizes this capability Fast Adaptive Storage and Retrieval (FASTR)

• This is a new paradigm for scientific users, since they are reluctant to part with their data

• We use resources only to the extent that we require them: remote knowledge discovery and data browsing
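As a rough illustration of the first bullet, the Python sketch below flags "interesting" cells in a small synthetic 3D field and reports the volume fraction and the ideal storage reduction. The threshold test, the synthetic field, and every name in the code are assumptions made for illustration; the slides do not say how FASTR identifies features.

```python
# Hypothetical sketch: estimate how much storage is saved by keeping only
# "interesting" cells. The threshold criterion is an assumption, not FASTR's.

def volume_fraction(field, threshold):
    """Return the fraction of cells whose magnitude exceeds threshold."""
    total = 0
    flagged = 0
    for plane in field:
        for row in plane:
            for value in row:
                total += 1
                if abs(value) > threshold:
                    flagged += 1
    return flagged / total


def make_field(n, blob_lo, blob_hi):
    """Build an n^3 field that is 1.0 inside a cubic blob and 0.0 elsewhere."""
    return [[[1.0 if all(blob_lo <= c < blob_hi for c in (i, j, k)) else 0.0
              for k in range(n)] for j in range(n)] for i in range(n)]


field = make_field(16, 4, 8)           # a 4^3 blob inside a 16^3 volume
fraction = volume_fraction(field, 0.5)
ideal_reduction = 1.0 / fraction       # ignores the cost of storing an index
print(f"volume fraction = {fraction:.4f}, ideal reduction = {ideal_reduction:.0f}:1")
```

The smaller the volume fraction, the larger the reduction, which is why the approach pays off only when features are sparse.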

The KeLP Project

• C++ run time libraries for parallel application & library development

• Hide low level details without sacrificing performance

• Irregular block structured data

• Express communication at a high level using intuitive geometric set operations

• Also applies to data intensive applications

• KeLP I/O: out of core (Bradley Broom, Rice)
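To make "geometric set operations" concrete, here is a minimal Python sketch of how a ghost-cell exchange region can be described by growing one index box and intersecting it with a neighbor. The `Box` class and its methods are hypothetical and written for this illustration; they are not KeLP's C++ API.

```python
# Hypothetical sketch (not KeLP's API): describe communication as the
# intersection of index boxes instead of hand-written loop bounds.
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned index box with inclusive lower and upper corners."""
    lo: tuple
    hi: tuple

    def is_empty(self):
        return any(l > h for l, h in zip(self.lo, self.hi))

    def grow(self, width):
        """Return the box padded by `width` cells on every side."""
        return Box(tuple(l - width for l in self.lo),
                   tuple(h + width for h in self.hi))

    def intersect(self, other):
        """Return the overlap of two boxes (possibly empty)."""
        return Box(tuple(max(a, b) for a, b in zip(self.lo, other.lo)),
                   tuple(min(a, b) for a, b in zip(self.hi, other.hi)))

    def size(self):
        if self.is_empty():
            return 0
        count = 1
        for l, h in zip(self.lo, self.hi):
            count *= h - l + 1
        return count


left = Box((0, 0), (3, 7))
right = Box((4, 0), (7, 7))
# Cells owned by `right` that `left` needs for a one-cell ghost layer.
ghost = left.grow(1).intersect(right)
print(ghost, ghost.size())
```

The same two operations work unchanged for irregularly placed blocks, since no block needs to know how its neighbors were laid out.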

Data intensive application of KeLP

• KDistuf

• Turbulent flow with Direct Numerical Simulation

• Collaboration involving K. Nomura (UCSD MAE), W. Kerney and D. Shalit (UCSD CSE), G. Balls (UCSDSC), P. Diamessis (USC)

• Content-based data compression

• Borrow structured adaptive mesh refinement grid techniques to…

• Capture features at full resolution

• Discard remaining background data
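The capture-and-discard idea can be sketched in a few lines of Python. This simplified version keeps a single bounding box around the flagged cells and fills everything else with a constant on reconstruction. A structured adaptive mesh refinement clustering step would normally produce many tighter boxes; the threshold criterion, the dictionary layout, and all function names are assumptions for illustration only.

```python
# Hypothetical sketch: keep one bounding box around the flagged cells at full
# resolution and discard the rest. Names and storage layout are illustrative.

def bounding_box(field, threshold):
    """Return inclusive (lo, hi) corners around cells with |value| > threshold."""
    lo = hi = None
    for i, plane in enumerate(field):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                if abs(value) > threshold:
                    idx = (i, j, k)
                    lo = idx if lo is None else tuple(map(min, lo, idx))
                    hi = idx if hi is None else tuple(map(max, hi, idx))
    return lo, hi


def compress(field, threshold):
    """Store the field shape, the box origin, and the cells inside the box."""
    shape = (len(field), len(field[0]), len(field[0][0]))
    lo, hi = bounding_box(field, threshold)
    if lo is None:                      # nothing interesting: store no cells
        return {"shape": shape, "lo": None, "data": []}
    data = [[[field[i][j][k] for k in range(lo[2], hi[2] + 1)]
             for j in range(lo[1], hi[1] + 1)]
            for i in range(lo[0], hi[0] + 1)]
    return {"shape": shape, "lo": lo, "data": data}


def decompress(packed, background=0.0):
    """Rebuild a full-size field; discarded cells get the background value."""
    ni, nj, nk = packed["shape"]
    out = [[[background] * nk for _ in range(nj)] for _ in range(ni)]
    if packed["lo"] is None:
        return out
    i0, j0, k0 = packed["lo"]
    for di, plane in enumerate(packed["data"]):
        for dj, row in enumerate(plane):
            for dk, value in enumerate(row):
                out[i0 + di][j0 + dj][k0 + dk] = value
    return out


n = 8
field = [[[0.01] * n for _ in range(n)] for _ in range(n)]
for i in (2, 3):
    for j in (3, 4):
        field[i][j][5] = 2.0            # a small synthetic "feature"

packed = compress(field, threshold=1.0)
restored = decompress(packed)
stored_cells = sum(len(row) for plane in packed["data"] for row in plane)
print(f"stored {stored_cells} of {n ** 3} cells")
```

Note that the scheme is lossy by design: the feature is reproduced exactly, while background values are replaced by the fill constant.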

More about the application

• Turbulent mixing in stably stratified flow under the influence of background shear

• Solve the incompressible Navier Stokes equations

• Follow the time evolution of regions of overturned dense fluid, which are the main agents of stirring and mixing

“The efficiency of mixing in turbulent patches: inferences from direct simulations and microstructure observations,” in press, J. Phys. Ocean. Smyth, Moum, and Caldwell, 2001.

Information discovery

• Oceanographic observations are incomplete: restricted to 1 dimensional observations

• Discovery: time evolution, energy dissipation and lifetime of overturn regions, which have irregular shapes

Bill Smyth, Dept. Oceanic & Atmospheric Sciences,Oregon State University

Fast Adaptive Storage and Retrieval

• Compression depends on the data, currently on 128^3

• Best case ~ 20:1 compression (10 GB → 500 MB), worst case ~ 2.8:1

• Lempel-Ziv (gzip) gives us only 10%
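The figures above can be checked with a short calculation. The sketch assumes decimal units (1 GB = 1000 MB), assumes the worst-case ratio is applied to the same 10 GB data set, and reads the gzip figure as a 10% reduction in size; none of these interpretations is stated explicitly on the slide.

```python
# Worked arithmetic for the compression figures; the unit convention and the
# reading of "10%" as a size reduction are assumptions.
MB_PER_GB = 1000

original_mb = 10 * MB_PER_GB
best_case_mb = 500

best_ratio = original_mb / best_case_mb      # ratio implied by 10 GB -> 500 MB
worst_case_mb = original_mb / 2.8            # size left at 2.8:1
gzip_mb = original_mb * (1 - 0.10)           # size left after a 10% reduction

print(f"best case ratio:  {best_ratio:.0f}:1")
print(f"worst case size:  {worst_case_mb:.0f} MB")
print(f"gzip size:        {gzip_mb:.0f} MB")
```

Under these assumptions even the worst case leaves far less data than general-purpose compression does.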

Further savings: another application

• Use volume tracking [Silver, Rutgers] to follow individual features

• FASTR permits us to extract only the data we need out of the many features present

• Computational volume: 2M pts

• Average feature size: 1K pts

• Maximum feature size: 20K pts

• Saves an additional two orders of magnitude in communication bandwidth requirements

• Perform local analysis on a workstation
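The "two orders of magnitude" claim follows from the point counts on this slide. The sketch below assumes "2M", "1K", and "20K" are decimal counts and that transfer cost scales with the number of points moved; both are assumptions.

```python
# Worked arithmetic for the bandwidth savings; decimal point counts and
# cost-proportional-to-points are assumptions.
import math

volume_pts = 2_000_000
avg_feature_pts = 1_000
max_feature_pts = 20_000

largest_feature_reduction = volume_pts / max_feature_pts   # most conservative
average_feature_reduction = volume_pts / avg_feature_pts

print(f"largest feature: {largest_feature_reduction:.0f}x fewer points, "
      f"about {math.log10(largest_feature_reduction):.1f} orders of magnitude")
print(f"average feature: {average_feature_reduction:.0f}x fewer points, "
      f"about {math.log10(average_feature_reduction):.1f} orders of magnitude")
```

Even the largest feature is a factor of 100 smaller than the full volume, so the stated saving is the conservative figure.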

Future plans

• Develop remote analysis capability

• Integrate with DTF data handling infrastructure

• Larger scale simulations on Blue Horizon and on clusters: 256^3

• Study vortex pairs in a stratified turbulent environment

• Improved understanding of aircraft wake vortices

• Practical importance for air traffic control

Remote analysis capability

• Perform analysis on data sets stored remotely, e.g. Data Cutter

• We can perform some data analysis on a local workstation

• For highly intensive data analysis, we can use higher end resources, but again we access only the data we need
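One way to "access only the data we need" is to keep a small index of where each feature lives in the stored file and read just that byte range. The Python sketch below shows the idea with an in-memory stream standing in for a remote store. The file layout, the index structure, and the function names are all hypothetical; the sketch does not model Data Cutter or the actual FASTR storage format.

```python
# Hypothetical sketch: fetch a single feature by byte range using an index,
# instead of reading the whole data set. Layout and names are illustrative.
import io
import struct


def write_features(stream, features):
    """Append each feature's float64 values; return {id: (offset, count)}."""
    index = {}
    for feature_id, values in features.items():
        index[feature_id] = (stream.tell(), len(values))
        stream.write(struct.pack(f"<{len(values)}d", *values))
    return index


def read_feature(stream, index, feature_id):
    """Seek to one feature and read only its bytes."""
    offset, count = index[feature_id]
    stream.seek(offset)
    return list(struct.unpack(f"<{count}d", stream.read(8 * count)))


store = io.BytesIO()                       # stands in for remote storage
index = write_features(store, {7: [1.0, 2.0], 9: [3.0, 4.0, 5.0]})
wanted = read_feature(store, index, 9)     # touches 24 of the 40 stored bytes
print(wanted)
```

Only the index has to be shipped to the workstation up front; feature data moves on demand.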

Publications and people

• FASTR is based on a research prototype called MOLD, which is the M.S. thesis research of UCSD CSE student William Kerney

• “MOLD: A System for Breaking Down Large Visualization and Post-Processing Problems.” Expected March 2002.

• Peter Diamessis, then a PhD student with Keiko Nomura (UCSD MAE Dept), used MOLD to carry out an exploration of overturns

• “An Investigation of Vortical Structures and Density Overturns in Stably Stratified Homogeneous Turbulence by Means of Direct Numerical Simulation,” P. Diamessis, PhD thesis, 2001

• “Automated Tracking of Turbulent Structures in Direct Numerical Simulation,” P. Diamessis et al, PARA 2002, Helsinki, Finland. To appear.

Software availability

• FASTR: contact us

• KeLP

• Hardened version of KeLP, AKA KeLP1.4

• http://www.cse.ucsd.edu/groups/hpcl/scg/kelp

• NPACI Blue Horizon, Sun HPC, Cray T3E, Linux clusters

• Workstations: Solaris, Linux, etc.

• Dual tier variant, KeLP2.1: hierarchical KeLP for SMP clusters and SMP based machines (e.g. BH)