Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP) SDM AHM October 5, 2005 Scott A. Klasky ORNL


Page 1: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

SDM AHM

October 5, 2005

Scott A. Klasky

ORNL

Page 2: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Perhaps not just the CPES FSP

• Can we form the CAFÉ solution?
• Combustion, Astrophysics, Fusion End-to-end framework
  – Combustion SciDAC
  – Astrophysics: TSI SciDAC.
  – Fusion SciDACs (CPES, SWIM, GPS, CEMM)
  – SNS: Follow closely, and try to exchange technology.

Page 3: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Center for Plasma Edge Simulations (A Fusion Simulation Project SciDAC)

How can a particular plasma edge condition dramatically improve the confinement of fusion plasma, as observed in the experiments? The physics of the transitional edge plasma that connects the hot core (of order 100 million degrees C, or tens of keV) with the material walls is the subject of this research question.

5-year goal: Predict the edge pedestal behavior for ITER and existing devices. This question must be answered for ITER to succeed.

We are developing a testable pedestal simulation framework which incorporates the relevant spectrum of physics processes (e.g., transport, kinetic and magnetohydrodynamic stability and turbulence, flows, and atomic physics in realistic geometry) that span the range of plasma parameters relevant to ITER.

Use Kepler for the end-to-end solution, with autonomic high-performance NXM data transfers for code coupling, code monitoring, and saving results.

Codes used in this project

XGC-ET:
• A fully kinetic PIC code which will solve turbulence, neoclassical, and neutral dynamics self-consistently.
• High velocity space resolution and arbitrary shaped wall are necessary to solve this research problem.
• Will acquire the gyrokinetic machinery from the GTC code, part of the GPS SciDAC.
• Will include Degas-2 for more accurate neutral atomic physics around the boundary.

M3D-edge:
• An edge modified version of M3D MHD/2-fluid code, part of the CEMM SciDAC.
• For nonlinear MHD ELM crashes.

Linear solvers:
• Simple preconditioners for diagonally dominant systems
• Multigrid for scalable elliptic solves.
• Perfect weak scaling
• Investigation of tree code methods (e.g. fast multipole) for direct calculation of electrostatic forces (i.e., PIC w/o cells)

[Figure: workflow diagram. Box labels: Portal; Job submission; Input Files; XGC-ET Simulation on leadership-class computer; Pedestal growth; Data Interpolation (twice); MHD Linear Stability monitor with a "STABLE?" decision (True / False); M3D Simulation; XGC-ET Compute SOL; Noise Monitor; Out-of-core isosurface; Distributed Storage (twice). Image caption: M3D simulation depicting edge localized modes (ELMs).]

Page 4: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Code Coupling: Forming a computational pipeline

• 2 computers (or more)
  – 1 computer runs in batch.
  – Other system(s) are for interactive parallel use.
  – Security issues can be bypassed if we can have all computers at ORNL.

[Figure: computational pipeline. Cray XT3: XGC on 1,024P. I. cluster: Mhd-L on 4P. I. cluster: M3D on 32P. I. cluster: Noise monitor on 80P. Transfer labels: Move 10 MB in <1 second (twice); 30 GB/minute.]

Page 5: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Interfaces must be designed to couple codes.
• What variables are to be moved/what units?
• What is the data decomposition on the sending side? On the receiving side?
• Intercomm (Sussman) seems very interesting (PVM)
  – Development of algorithms and techniques for effectively solving key problems in software support for coupled simulations.
  – Concentrate on three main issues:
    • Comprehensive support for determining at runtime what data is to be moved between simulations
    • Flexibly and efficiently determining when the data should be moved
    • Effectively deploying coupled simulation codes in a Grid computing environment.
  – A major goal is to minimize the changes that must be made to each individual simulation code.
    • Accomplished by having an individual simulation model only specify what data will be made available for a potential data transfer and not specify when an actual data transfer will take place.
    • Decisions about when data transfers will take place will be made through a separate coordination specification, which generally will be provided by the person building the complete coupled simulation.
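The split between "what data is available" and "when it moves" can be illustrated with a minimal Python sketch. This is not the Intercomm API; the classes Exporter, Rule, and Coordinator, the field name "pressure", and the every-5-steps rule are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Exporter:
    """A simulation code that only declares WHAT data it can provide."""
    name: str
    fields: dict = field(default_factory=dict)  # field name -> latest values

    def publish(self, field_name, values):
        # The code never decides when a transfer happens; it only exposes data.
        self.fields[field_name] = list(values)


@dataclass
class Rule:
    """One entry of the separate coordination specification: WHEN to move WHAT."""
    source: str
    field_name: str
    target: str
    every: int  # transfer on every N-th step


class Coordinator:
    """Applies the coordination rules; written by whoever builds the coupled run."""

    def __init__(self, rules):
        self.rules = rules

    def step(self, step, exporters, inboxes):
        for rule in self.rules:
            if step % rule.every == 0:
                data = exporters[rule.source].fields[rule.field_name]
                inboxes[rule.target].append((step, rule.field_name, list(data)))


xgc = Exporter("xgc")
coordinator = Coordinator([Rule("xgc", "pressure", "m3d", every=5)])
inboxes = {"m3d": []}
for step in range(1, 11):
    xgc.publish("pressure", [0.1 * step, 0.2 * step])
    coordinator.step(step, {"xgc": xgc}, inboxes)

received_steps = [s for s, _, _ in inboxes["m3d"]]
print(received_steps)
```

Changing the transfer schedule here means editing only the list of rules, not the exporting code, which is the property the slide asks for.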

Page 6: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Look at Mb/s, not total data sizes
• Hawkes (SciDAC 2005)
  – INCITE calculation:
    • ~2000 Seaborg processors, 2.5 million hours total
    • ~5 TB data, 9.3 Mb/s.
• Blondin (SciDAC 2005)
  – 4 TB, 30 hours = 310 Mb/s
• CPES code coupling: 1.3 Mb/s, data saving (3D): 300 - 30(0)GB/10 minutes
• Future is difficult to predict for data generation rates.
  – Codes add more physics, which slows down the code; algorithms speed up the code; new variables are generated; computers speed up, …
• This is also true for analysis of the data.
  – Do we need all of the data at all of the timesteps before we can analyze?
    • Can we do analysis and data movement together?
  – Analysis/Visualization systems might have to be changed.
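The quoted rates can be roughly reproduced with a short calculation. The sketch below rests on assumptions not stated on the slide: the rate unit is read as megabits per second, 1 TB is taken as 10^12 bytes, and the 2.5 million hours are treated as processor-hours spread evenly over 2000 processors. The function name is hypothetical.

```python
def average_rate_mbit_per_s(total_bytes, wall_seconds):
    """Average output rate in megabits per second."""
    return total_bytes * 8 / wall_seconds / 1e6


TB = 1e12  # assumption: decimal terabytes

# Hawkes: 2.5 million processor-hours on ~2000 processors -> 1250 wall-clock hours
hawkes_wall_s = 2.5e6 / 2000 * 3600
hawkes = average_rate_mbit_per_s(5 * TB, hawkes_wall_s)

# Blondin: 4 TB written over 30 hours
blondin = average_rate_mbit_per_s(4 * TB, 30 * 3600)

print(f"Hawkes:  {hawkes:.1f} Mb/s")
print(f"Blondin: {blondin:.1f} Mb/s")
```

Under these assumptions the results are about 8.9 Mb/s and 296 Mb/s, within a few percent of the 9.3 and 310 quoted above; the gap is of the size one would expect from a different byte-unit convention.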

Page 7: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

What happens when the Mb/s gets too large?

• Must understand the “features” in the data.
• Use AMR-like scheme to save the data.
  – Does the data change dramatically everywhere?
  – Is the data smooth in some regions?
• Can save 100x with compression techniques, but must be able to “use” data.
  – New viz/analysis tools?
    • Could just stitch up the grid, and use old tools.
    • Useful for Level of Detail Visualization (more detail in regions which change).
• Use in combination with “smart” data caching/data compression (see below)
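One way to picture an AMR-like save is a block-wise scheme that keeps full resolution only where the data varies. The following is a minimal 1D sketch, not a method from any of the codes named here; the block size, the tolerance, and the function names save_adaptive and stitch are assumptions made for illustration.

```python
def save_adaptive(values, block, tol):
    """Keep full resolution only in blocks whose values vary by more than tol."""
    saved = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        if max(chunk) - min(chunk) > tol:
            saved.append(("fine", start, list(chunk)))
        else:
            # Smooth region: store one representative value for the whole block.
            saved.append(("coarse", start, len(chunk), sum(chunk) / len(chunk)))
    return saved


def stitch(saved):
    """Rebuild a uniform grid so that existing tools can still read the data."""
    out = []
    for record in saved:
        if record[0] == "fine":
            out.extend(record[2])
        else:
            out.extend([record[3]] * record[2])
    return out


# Smooth everywhere except one block with a sharp feature.
field = [1.0] * 12 + [1.0, 3.0, 7.0, 2.0] + [0.5] * 16
saved = save_adaptive(field, block=4, tol=0.1)
stored = sum(len(r[2]) if r[0] == "fine" else 1 for r in saved)
restored = stitch(saved)
print(f"stored {stored} values instead of {len(field)}")
```

The stitch step corresponds to "stitch up the grid, and use old tools": the reduced data is expanded back to the original layout, exact in the feature block and approximate (block average) in the smooth ones.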

Page 8: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

End-to-end/workflow requirements.

• Easy to Install:
  – Good examples (MPI, Netcdf, HDF5, LN, bbcp)
• Easy to Use:
  – Ensight-Gold
• Must have “value-added” over simple approaches.
  – Value added discussed in the following slides.
• Must be robust/fault tolerant.
  – The workflow cannot crash our simulations/nodes!

Page 9: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Need a data model
• Allows the CS community to design “modules” which can understand the data.
  – Allow for netcdf, hdf5.
  – Develop interfaces to “extract” portion of the data from the files/memory.
• Must come from the application areas teaming up with the CS community.
  – HDF5/Netcdf is not a data model.
• Can we use the data model in SciRun/AVSExpress/Ensight as a start?
  – Meshes[] (uniform, rectilinear, structured, unstructured).
    • Hierarchy in meshes (AMR).
  – Cell Centered, Vertex Centered, Edge Centered data.
  – Multiple variables on a mesh.
  – Can we use “simple” API’s in the codes which can write the data out?
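The ingredients listed above (mesh kinds, an AMR hierarchy, centering, several variables per mesh, and an "extract" interface) could be written down as a small set of types. This Python sketch is not the data model of SciRun, AVS/Express, or Ensight; every class, field, and method name in it is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class MeshKind(Enum):
    UNIFORM = "uniform"
    RECTILINEAR = "rectilinear"
    STRUCTURED = "structured"
    UNSTRUCTURED = "unstructured"


class Centering(Enum):
    CELL = "cell"
    VERTEX = "vertex"
    EDGE = "edge"


@dataclass
class Variable:
    name: str
    units: str
    centering: Centering
    values: List[float]


@dataclass
class Mesh:
    name: str
    kind: MeshKind
    parent: Optional["Mesh"] = None  # AMR hierarchy: the coarser mesh this one refines
    variables: Dict[str, Variable] = field(default_factory=dict)

    def add(self, variable):
        self.variables[variable.name] = variable

    def extract(self, name, start, stop):
        """Return a portion of one variable, independent of the storage format."""
        return self.variables[name].values[start:stop]

    def level(self):
        """Refinement level: 0 for a root mesh, parent level + 1 otherwise."""
        return 0 if self.parent is None else self.parent.level() + 1


coarse = Mesh("edge", MeshKind.UNSTRUCTURED)
fine = Mesh("edge_refined", MeshKind.UNSTRUCTURED, parent=coarse)
fine.add(Variable("temperature", "keV", Centering.VERTEX, [0.1, 0.4, 0.9, 1.6]))
portion = fine.extract("temperature", 1, 3)
print(portion, fine.level())
```

A module written against types like these would not need to know whether the values came from a netcdf file, an HDF5 file, or memory, which is the sense in which the file formats alone are not a data model.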

Page 10: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Monitoring.

• We want to “watch” portions of the data from the simulation, as the simulation progresses.
  – Want the ability to play back from t=0 to the current frame, i.e. snapshot movies.
  – Want this information presented so that we can collaborate during/after the simulation.
    • Highlight part of the data, to discuss with other users.
    • Draw on the figures.
    • Mostly 1D plots, some 2D (surface/contour) plots, some 3D plots.
    • Example (http://w3.pppl.gov/transp/ElVis/121472A03_D3D.html)

Page 11: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Portal to launch workflow/monitor jobs

• Use the portal as a front-end to the workflow.
  – Would like to see the workflow, but not monitor it.
    • Perhaps it will allow us to choose different workflows which were created?
  – Would like to launch workflow, and have automatic job submission for known “clusters/HPC”.
    • Submit to all; when one starts running, kill the rest.
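The "submit to all" policy can be sketched against a stand-in for the batch systems. In this Python sketch FakeScheduler is a hypothetical simulation, not a wrapper around any real scheduler; a real version would call each site's own submit, status, and cancel commands.

```python
class FakeScheduler:
    """Stand-in for one site's batch system (simulated, not a real scheduler)."""

    def __init__(self, name, starts_at_poll):
        self.name = name
        self.starts_at_poll = starts_at_poll  # simulated queue wait, in polls
        self.polls = 0
        self.cancelled = False

    def submit(self):
        self.polls = 0

    def is_running(self):
        self.polls += 1
        return not self.cancelled and self.polls >= self.starts_at_poll

    def cancel(self):
        self.cancelled = True


def submit_to_all(schedulers, max_polls=100):
    """Submit everywhere, keep the first job that starts, cancel the others."""
    for scheduler in schedulers:
        scheduler.submit()
    for _ in range(max_polls):
        for scheduler in schedulers:
            if scheduler.is_running():
                for other in schedulers:
                    if other is not scheduler:
                        other.cancel()
                return scheduler
    return None  # nothing started within the polling budget


sites = [
    FakeScheduler("cluster_a", starts_at_poll=5),
    FakeScheduler("cluster_b", starts_at_poll=2),
    FakeScheduler("hpc_c", starts_at_poll=9),
]
winner = submit_to_all(sites)
cancelled = [s.name for s in sites if s.cancelled]
print(winner.name, cancelled)
```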

Page 12: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Users want to write their own analysis

• Requires that they can do this in F90, C/C++, Python.
  – Need wizards to allow users to describe their input/output.
    • Similar to AVS/Express, SciRun, OpenDX.
• Common scenario
  – Users want the main data field (field_in), they want a string (“temperature”), they want a condition (>), they want an output field. They also want this to run on their cluster with M processors. They also want to change the inputs at any given time.
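The common scenario maps onto a small user-written analysis routine. The Python sketch below is only an illustration: the function name select, the dictionary layout of field_in, and the threshold value 1.0 are assumptions (the scenario names the condition but not a threshold), and the parallel run on M processors is left out.

```python
import operator

# Conditions a user may pick; the scenario above uses ">".
CONDITIONS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def select(field_in, variable, condition, threshold):
    """Return an output field: values of `variable` meeting the condition, None elsewhere."""
    compare = CONDITIONS[condition]
    return [v if compare(v, threshold) else None for v in field_in[variable]]


field_in = {
    "temperature": [0.2, 1.5, 3.0, 0.9],
    "density": [1.0, 1.1, 0.9, 1.2],
}
field_out = select(field_in, "temperature", ">", 1.0)
print(field_out)
```

Because the variable name, condition, and threshold are plain arguments, a workflow could change them between invocations, which is one reading of "change the inputs at any given time".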

Page 13: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Efficient Data Movement

• On same node
  – Use memory reference.
• On same cluster
  – Use MPI communication.
• On different clusters (NXM communication)
  – 2 approaches: memory-memory vs. files.
  – File approach is not always usable.
    • Will break the solution for “code-coupling” approaches since I/O can become the bottleneck (open/close/read/write).
  – Working with Parashar/Kohl to look into the NXM problem.
  – Do we make this part of Kepler?
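The three cases above amount to a dispatch on where the two endpoints live. A minimal Python sketch follows; the Endpoint type, the cluster and node names, and the returned labels are hypothetical, and no actual transfer is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    cluster: str
    node: str


def choose_transport(src, dst):
    """Pick the cheapest mechanism that can connect the two endpoints."""
    if src.cluster == dst.cluster and src.node == dst.node:
        return "memory-reference"
    if src.cluster == dst.cluster:
        return "mpi"
    # Different clusters: prefer memory-to-memory NXM transfer; files are the
    # alternative, but file I/O can become the bottleneck for code coupling.
    return "memory-to-memory"


xgc = Endpoint("xt3", "n001")
diag = Endpoint("xt3", "n001")
peer = Endpoint("xt3", "n002")
m3d = Endpoint("icluster", "n010")

print(choose_transport(xgc, diag))
print(choose_transport(xgc, peer))
print(choose_transport(xgc, m3d))
```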

Page 14: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Distributed Data Storage - 1

• Users do NOT want to know where their data is stored.
• Users want the FASTEST possible method to get to their data.
  – Users “seldom” look at all of their data at once.
    • Usually, we look at a handful of variables at a time, with only a few time slices at a time. (DON’T need 4 TB in a second).
  – Users require that the solution works on their laptop when traveling! (must cache results on local disk).
  – Users do NOT want to change their mode of operation during travel.

Page 15: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Distributed Data Storage - 2

• LN is a good example of an “almost” usable system.
  – Needs to directly understand HDF5/netcdf.
  – Needs to be able to cache information on local disks, and modify the eXnodes.
  – Needs to be able to work with HPSS.
• But this is NOT enough!

Page 16: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Smart data cache

• Users typically access their data in “similar” patterns.
  – Look at timestep 1 for variables A,B, look at ts=2 for A,B, …
  – If we know what the user wants, when he/she wants it, then we can use smart technologies.
• In a collaboration, the data access gets more complicated.
  – Neural Networks to the rescue!
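The single-user pattern (same variables, successive timesteps) is regular enough that even a trivial predictor can decide what to prefetch. The Python sketch below deliberately uses a simple stride rule rather than a neural network; the function name predict_next and the history format are assumptions for illustration.

```python
def predict_next(history, lookahead=1):
    """Guess the next (timestep, variable) requests from the last two accesses.

    history: list of (timestep, frozenset of variable names), oldest first.
    Returns an empty list when no steady pattern is visible.
    """
    if len(history) < 2:
        return []
    (t_prev, vars_prev), (t_last, vars_last) = history[-2], history[-1]
    stride = t_last - t_prev
    if stride == 0 or vars_prev != vars_last:
        return []  # pattern changed; do not prefetch on a guess
    return [
        (t_last + stride * k, name)
        for k in range(1, lookahead + 1)
        for name in sorted(vars_last)
    ]


# The user looked at A,B at timestep 1, then A,B at timestep 2.
history = [(1, frozenset({"A", "B"})), (2, frozenset({"A", "B"}))]
to_prefetch = predict_next(history, lookahead=2)
print(to_prefetch)
```

A rule like this breaks down once several collaborators interleave their requests, which is the situation where the slide suggests a learned model instead.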

Page 17: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Need data mining technology integrated into the solution

• We must understand the “features” of the data.
  – Requires a working relationship between application scientists and computer scientists.
• Want to detect features “on-the-fly” (from the current and previous timesteps).
• Could feature-based analysis be done by the end of the simulation?
  – Pre-compute everything possible by the end of the simulation. DO NOT REQUIRE the end user to wait for anything that we know we want.

Page 18: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Security

• Users do NOT want to deal with this.
  – But of course, they have to.
• Will DOE require single sign-ins?
  – Can “trusted” sites talk to other “trusted” sites via ports being opened from A-B?
• Will this be the death of workflow automation?
  – Can we automate data movement if we must sign on each time with unique passwords?

Page 19: Requirements for an end-to-end solution the Center for Plasma Edge Simulation (FSP)

Conclusions.

• We “need” Kepler in order for the CPES project to be successful.
  – We need NXM data moved efficiently, and monitored.
  – We need to be able to provide feedback to the simulation(s).
  – Codes must be coupled, and we need an efficient mechanism to couple the data.
• What do we do with single-logins?
  – ORNL tells me that we can have ports open from one site to another without violating the security model. What about other sites?
• Are we prepared for new architectures?
  – Cray XT3 has only 1 small pipe out to the world.