Post on 13-Jan-2016
Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations
Kalyan S. Perumalla, Ph.D.
Senior Research Staff Member, Oak Ridge National Laboratory
Adjunct Professor, Georgia Institute of Technology
IEEE DS-RT, Singapore
Oct 26, 2009
Managed by UT-Battelle for the U.S. Department of Energy, IEEE DS-RT'09, Singapore

Main Theme
Computational Power: …unprecedented potential…exploit
Simulation Scale: …stretch imagination…new scopes
“Think Big…Really Big”

Confluence of Opportunities, Needs
High-End Computing Power: Yes
Scalable Simulation Methods: Yes
Large-scale Scientific Questions: ???

Parallel Computing Power: It’s Coming
High-end computing… Coming soon to a center near you!
Access to 1000’s of cores… for every parallel simulation researcher… in just 2-3 years from now

Evidence of Growth in 10^3-Core

Now, all Top 500 are 10^3-core or More!

Switching Gears
Gear  Decade  Processors
1     1980    10^1
2     1990    10^2
3     2000    10^3
4     2010    10^4
5     2010    10^5-10^6
R     2020

Business Sensitive
Potential Areas for Discrete Event Execution on 10^5-10^6 Scale
• Cyber infrastructure simulations
  – Internet protocols, peer-to-peer designs, …
• Epidemiological simulations
  – Disease spread models, mitigation strategies, …
• Social dynamics simulations
  – Pre- and post-operations campaigns, foreign policy, …
• Vehicular mobility simulations
  – Regional- or nation-scale, …
• Agent-based simulations
  – Behavioral exploration, complex compositions, …
• Sensor network simulations
  – Wide area monitoring, situational awareness, …
• Organization simulations
  – Command and control, business processes, …
• Logistics simulations
  – Supply chain processes, contingency analyses, …
Initial models scaling to 10^3-10^4 cores

If only we look harder…
• Many nation-scale and world-scale questions are becoming relevant
• New methods and methodologies are waiting to be discovered

Slippery Slopes
[Figure: spectrum from “Gory detail” to “Abstractions”, marking the starting point for an experimental study and the tendency with evolving needs of accuracy and detail]

How do we abstract immense complexity?
Answer: Very difficult until we experiment with the system at scale

What do we mean by Gory Detail? Cyber Security Example
• Network at large
  – Topologies, bandwidths, latencies, link types, MAC protocols, TCP/IP, BGP, …
• Core systems
  – Routers, databases, service level agreements, inter-AS relationships, …
• End systems
  – Processor traits, disk traits, OS instances, daemons, services, S/W bugs, …
• “Heavy” applications and traffic
  – Video (YouTube, …), VOIP, live streams; foreground, background
• Behavioral infusion
  – Social nets (topologies, dynamics, agencies, advertisers), peer-to-peer

Example: Epidemiology or Computer Worm Propagation
• Typical dynamics model
  – Multiple variants exist, but qualitatively similar
• Excellent fit, but post-facto (!)
  – Plot collected data
• Difficult as predictive model
  – Great amount of detail buried in α
• Gory detail needed for better predictive power
  – Interaction topology
  – Resource limitations
$\frac{dI}{dt} = \alpha\, I\,(S - I)$
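To make the "typical dynamics model" concrete, here is a minimal sketch that integrates it numerically. It assumes the model has the common logistic form dI/dt = αI(S − I), with I the number of infected entities and S the total population; the slide does not define S, and the function name, parameter values, and forward-Euler scheme are illustrative assumptions rather than anything taken from the talk.

```python
def simulate_logistic(alpha, total, infected0, dt, steps):
    """Forward-Euler integration of dI/dt = alpha * I * (total - I).

    Assumed meaning of symbols (not defined on the slide):
    alpha = aggregate contact/infection rate, total = population size S.
    Returns the list of infected counts, one per time step.
    """
    infected = float(infected0)
    trajectory = [infected]
    for _ in range(steps):
        # All topology and resource detail is collapsed into the scalar alpha.
        infected += dt * alpha * infected * (total - infected)
        trajectory.append(infected)
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the S-shaped curve.
    curve = simulate_logistic(alpha=1e-4, total=10_000, infected0=1, dt=0.1, steps=300)
    for step in range(0, 301, 50):
        print(f"t={step * 0.1:5.1f}  infected={curve[step]:10.1f}")
```

The sketch shows why such a model fits collected data well after the fact yet predicts poorly: every structural effect has to be folded into the single fitted constant `alpha`.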

Slippery Slope: Cost and Time
Cost to realize experimentation capability
Time to reach experimentation capability

Our Research Organization in Discrete Event Runtimes and Applications
[Layered diagram; recoverable labels follow]
Simulation areas: Transportation Network Simulations, Sensor Network Simulations, Evacuation Decision Support, Vehicular Simulations, Communication Network Simulations, Logistics Simulations, Enterprise Simulations, Social Network Simulations, Asynchronous Scientific Simulations, …
Parallel/Distributed Discrete Event Simulation Engines: Model Execution, Synchronization, Data Integration, Interoperability
Platforms: Supercomputers, Clusters, Multi-Cores, GPGPUs, PDAs, …
Multi-Scale …
• “Enabling”
• Scalability
• Efficiency
• Correctness
• Robustness
• Usability
• Extensibility
• Integration
• Core Models
• Feasibility Demonstration
• Extensible Frameworks
• Novel Modeling Methods
• Trade-offs
  – Memory-Computation
  – Speed-Accuracy
• Customization
• Scenario Generation
• Experimentation
• Visualization
Automated Detection/Tracking Design & Analysis
Comm. Effects Design & Analysis
…
Business Sensitive

A Few of Our Current Areas, Projects
• State-level mobility
  – Multi-million intersections and links
• Epidemiological analyses
  – Detailed, billion-entity dynamics
• Wireless radio signal estimation
  – Multi-million-cell cluttered terrains
• Supercomputer design
  – Designing next architectures by simulating on current
• Internet security, protocol design
  – As-is instantiation of nodes and routers
• Populace’s cognitive behaviors
  – Large population cognition with connectionist networks
• GARFIELD-EVAC
  – 10^6-10^7-link scenarios of FL, LA, …
• RCREDIF
  – 10^9-individual infection scenarios
• RCTLM
  – 3-D 10^7-cells simulated on 10^4 cores
• µΠ
  – Performance prediction of 10^6-core MPI programs on 10^4 cores
• NetWarp
  – Hi-fi Internet test-bed

Scalable Experimentation for Cyber Security
NetWarp is our novel test-bed technology for highly scalable, detailed, rapid experimentation of cyber security and cyber infrastructures

Cyber Experimentation Approaches
[Figure: approaches placed along Fidelity, Scalability (10^2 to 10^8) and speed axes (Real-Time or Faster, As Fast As Possible; Sequential, Parallel): Hardware Testbed, Emulation System, Packet-level Simulation, Mixed Abstraction Simulation, Aggregate Models, Fully Virtualized System, NetWarp]

NetWarp Architecture
Business sensitive

DOE-Sponsored Institute for Advanced Architectures and Algorithms
“…catalyst for the co-design and development of architectures, algorithms, and applications to create synergy in their respective evolutions…”
Need highly scalable simulation methods and methodologies to simulate next generation architectures and algorithms on future supercomputing platforms…

μπ (MUPI) Performance Investigation System
• μπ = micro parallel performance investigator
  – Performance prediction for MPI, Portals and other parallel applications
  – Actual application code executed on the real hardware
  – Platform is simulated at large virtual scale
  – Timing customized by user-defined machine
• Scale is key differentiator
  – Target: 150,000 virtual cores
  – E.g., 150,000 virtual MPI ranks in simulated scenario
• Based on µsik (micro simulator kernel)
  – Scalable PDES engine
  – TCP- or MPI-connected simulation kernels

Example: MPI application over μπ
Modify MPI include and recompile
  – Change #include <mpi.h> to #include <mupi.h>
Relink to mupi library
  – Instead of -lmpi, use -lmupi
Run the modified MPI application (a μπ simulation)
  – mpirun -np 4 test -nvp 32
  – Runs test with 32 virtual MPI ranks; the simulation uses 4 real cores
  – μπ itself uses multiple real cores to run in parallel

Epidemic Disease Propagation
• Can be an extremely challenging simulation problem
• Asymptotic behaviors are relatively well understood
• Transients are poorly understood, hard to predict well
• Defined and characterized by many interlinked processes
• “Gory Detail” necessary

Epidemic Disease Propagation
Image from psc.edu
• Reaction-diffusion processes
  – Probability based on interaction times, vulnerabilities, thresholds
  – Short- and long-distance mobility, sojourn times
  – Probabilistic state transitions, infections, recoveries
• Supercomputing’08 model reported scalability only to 400 cores
  – Synchronization costs become prohibitive
  – Synchronous execution our prime suspect
• Our discrete event execution relieves synchronization costs
  – Scales to tens of thousands of cores
  – Up to 1 billion affected entities

PDES Scaling Needs
• Anticipate impending opportunities in multiple application areas of grand-scale PDES scenarios
• Prepare to capitalize on increasing computational power (300K+ cores)
• Aim to achieve computational capability to enable new PDES-based scientific solutions

Jaguar Petascale System [Cray XT5]

Jaguar: NCCS’ Cray XT5*
* Data and images from http://nccs.gov

Technological Upgrade: 10^5-Scalable PDES Frameworks
To realize scale with any of the PDES models and applications, we need the core frameworks to scale

Recent Attempts at 10^5-Core PDES Frameworks
Bauer et al (Jun’09) on Blue Gene P (Argonne)
Perumalla & Tipparaju (Jan’09) on Cray XT5 (ORNL)
Business Sensitive
Degradation beyond 64K cores observed by us as well as others
Degradation observed in more than one metric (rollback efficiency, speedup)

Implications to Discrete Event Execution on High Performance Computing Platforms
Business Sensitive

Some of our Objectives
• Scale from 10^4 cores (current) to 10^5-10^6 cores (new)
• Realize very large-scale scenarios (multi-billion entity)
• Cyber infrastructures, social computing, epidemiology, logistics
• Aid projects in simulation-based design of future generation supercomputers
Fill technological gap by achieving the highest scaling capabilities of parallel discrete event simulations
Ultimately, enable formulation of grand-scale solutions with non-traditional supercomputing simulations

Electro-magnetic (EM) Wave Propagation
• Predict receiver signal
• Account for reflectivity, transmissivity, multi-path effects
• Power level (voltage) modeled per face of grid cell

PHOLD Benchmark
• Relatively fine grained
  – ~5 microseconds computation per event
• 10 “juggler” entities per processor core
  – Analogous to grid cells, road intersections or such
• Total of 1000 “juggling balls” per core
  – Analogous to state updates exchanged among cells
• Upon receipt of a ball event, a juggler throws it back a random (exponentially distributed) time into the future to a random juggler
  – 1 in every 1000 juggling exchanges is constrained to be intra-core; the rest are inter-core
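The benchmark rules above can be sketched as a small sequential event loop. This is only an illustration of the workload shape, not the parallel implementation measured in the talk: the function name, parameter names, mean delay, and end time are assumptions, the ~5 microsecond computation per event is not modeled, and "cores" are just index ranges within one process.

```python
import heapq
import random


def run_phold(num_cores=4, jugglers_per_core=10, balls_per_core=1000,
              mean_delay=1.0, end_time=5.0, intra_core_fraction=1 / 1000,
              seed=1):
    """Sequential sketch of the PHOLD workload (needs num_cores >= 2).

    Returns (events_processed, balls_still_in_flight).
    """
    rng = random.Random(seed)
    events = []  # heap of (timestamp, sequence number, destination juggler)
    seq = 0
    # Seed every core with its share of juggling balls.
    for core in range(num_cores):
        for _ in range(balls_per_core):
            juggler = core * jugglers_per_core + rng.randrange(jugglers_per_core)
            heapq.heappush(events, (rng.expovariate(1 / mean_delay), seq, juggler))
            seq += 1
    processed = 0
    while events and events[0][0] <= end_time:
        now, _, juggler = heapq.heappop(events)
        processed += 1
        core = juggler // jugglers_per_core
        if rng.random() < intra_core_fraction:
            dest_core = core  # rare intra-core exchange
        else:
            # Pick a different core uniformly at random (inter-core exchange).
            dest_core = (core + rng.randrange(1, num_cores)) % num_cores
        dest = dest_core * jugglers_per_core + rng.randrange(jugglers_per_core)
        # Throw the ball an exponentially distributed time into the future.
        heapq.heappush(events, (now + rng.expovariate(1 / mean_delay), seq, dest))
        seq += 1
    return processed, len(events)


if __name__ == "__main__":
    print(run_phold())
```

Because every received ball is thrown again, the number of balls in flight stays constant, which is what makes PHOLD a steady-state stress test for the event engine.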

Radio Propagation: Speedup on Cray XT4

Radio Propagation: Runtime Costs on Cray XT4

Epidemic Propagation: Performance on Cray XT5

Epidemic Propagation – Parallel Run time on Cray XT5
[Plot: Runtime (seconds), axis 500 to 750, versus No. of Cores, axis 0 to 65536; series: Optimistic, Conservative]

PHOLD: Performance on Cray XT5

Scalability – Observations
• Scalability problems with current approaches not evident previously
  – Fine until 10^4 cores, but poor thereafter
• Even with discrete event, implementation is key
  – Semi-asynchronous execution scales poorly
  – Fully asynchronous execution needed

Algorithm Design and Development for Scalable Discrete Event Execution
Design algorithms optimized for Cray XT5, Blue Gene P/Q
• Design new virtual-time synchronization algorithm
• Design novel rollback control schemes
• Design discrete event-specific flow control
[Figure: Current synchronization algorithm. The LBTS computation for band d proceeds through Trial 0, …, Trial r-1, Trial r; a trial with Δ>0 is followed by another trial (d,r to d,r+1), and the computation ends when Δ==0, at which point a new band is started (d+1,0). Bands shown: Band d, Band d+1, Band d+2, …]
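The trial loop in the synchronization figure can be illustrated with a toy, single-process sketch. It is not the talk's algorithm: it assumes Δ is the global count of messages sent but not yet received (the slide does not define Δ), models one message delivery between trials, and uses hypothetical names (`Core`, `compute_lbts`, `in_flight`).

```python
from dataclasses import dataclass


@dataclass
class Core:
    pending: list        # timestamps of events queued locally
    sent: int = 0        # messages this core has sent
    received: int = 0    # messages this core has received

    def local_min(self):
        return min(self.pending, default=float("inf"))


def compute_lbts(cores, in_flight):
    """Repeat reduction trials until no message is in transit (delta == 0).

    in_flight: list of (destination core index, timestamp) for messages
    already counted in some core's `sent` but not yet delivered.
    Returns (lbts, number_of_trials).
    """
    trials = 0
    while True:
        trials += 1
        # One trial = one global reduction over minima and message counters.
        lbts = min(core.local_min() for core in cores)
        delta = sum(c.sent for c in cores) - sum(c.received for c in cores)
        if delta == 0:
            return lbts, trials  # safe: every message is accounted for
        if not in_flight:
            raise RuntimeError("counters disagree with in-flight messages")
        # Assumed progress model: one transient message lands before the next trial.
        dest, timestamp = in_flight.pop(0)
        cores[dest].pending.append(timestamp)
        cores[dest].received += 1


if __name__ == "__main__":
    cores = [Core(pending=[10.0, 12.0], sent=2), Core(pending=[11.0])]
    print(compute_lbts(cores, [(1, 7.5), (0, 9.0)]))
```

A trial that ends with Δ>0 cannot be trusted, because a message still in transit may carry a timestamp below every local minimum; this is why the loop repeats until Δ==0.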

Additional Important Algorithmic Aspects
• Novel separation of event communication from synchronization
  – Prioritization support in our communication layer
  – “QoS” support for fast synchronization
• Novel timestamp-aware buffering
  – Exploit near vs. far timestamps
  – Coordinated with virtual-time synchronization
• Efficient flow control
  – Highly unstructured inter-processor communication
• Optimized rollback dynamics
  – Stability and throttling mechanisms
  – Cancel back protocols
[Figure: Example of the “transient event” problem: a transient message in flight among Core A, Core B, Core C and Core D, drawn against wallclock time (Past to Future)]

Data Integration Interface Development
Application Programming Interface (API) to
  – Incorporate streaming input into discrete event execution
  – Achieve runtime efficiency as an important consideration
Novel concepts supporting latency-hiding
  – To permit maximal concurrency without violating time-ordering between live simulation and real-time inputs
  – Reuse optimistic synchronization for latency-hiding for unpredictable data input from external sources
[Figure: Machines joined by Interconnection Network(s); each Machine contains Processor Cores, each core runs Simulator Processes, and each Simulator Process hosts several LPs]
LP=Logical Process with its own timeline
44 Managed by UT-Battellefor the U.S. Department of Energy IEEE DS-RT'09, Singapore
Software Implementation
Runtime algorithms and data integration interfaces realized in software
– Primarily in C/C++
– Building on current software (scales to 10^4)
– Optimized for performance on Cray XT5 and Blue Gene/P
Communication to be structured flexibly
– Use MPI or Portals or combination
– Will explore potentially new layers
– Non-blocking collectives (MPI-3)
– Chapel language
[Figure: Our current scalable data structures. The µsik micro-kernel hosts user LPs and kernel LPs (KPs). Each LP holds a Future Event List (FEL), a Processed Event List (PEL), and its Local Virtual Time (LVT). The kernel keeps three queues: ECTS Q (Committable, Pc), EPTS Q (Processable, Pp), and EETS Q (Emittable, Pe).]
When update kernel Q’s?
• New LP added or deleted
• LP executes an event
• LP receives an event
[Figure: Our existing layered software. µsik processes sit on libSynk, whose modules are TM (TM Null, TM Red), RM (RM Bar), and FM (FM ShM, FM Myr, FM TCP, FM MPI), above OS/Hardware and Network. An arrow from X to Y implies X uses Y.]
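The queue-update rule above (re-key an LP when it is added or deleted, executes an event, or receives an event) can be sketched for one of the kernel queues. This is a rough illustration under assumptions the slides do not confirm: I assume the processable queue orders LPs by the earliest timestamp in their FEL, and the names `LP`, `Kernel`, `receive`, and `execute_next` are hypothetical. It uses a binary heap with lazy deletion rather than whatever structure µsik actually uses.

```python
import heapq
import itertools


class LP:
    """Logical process with the per-LP structures named on the slide."""

    def __init__(self, name):
        self.name = name
        self.fel = []    # Future Event List: min-heap of timestamps
        self.pel = []    # Processed Event List, in processing order
        self.lvt = 0.0   # Local Virtual Time

    def earliest_processable(self):
        return self.fel[0] if self.fel else float("inf")


class Kernel:
    """Keeps a 'processable' queue of LPs (assumed key: earliest FEL timestamp)."""

    def __init__(self):
        self.lps = {}
        self.queue = []                 # heap of (key, seq, name); may hold stale entries
        self.seq = itertools.count()    # tie-breaker so names are never compared

    def _rekey(self, lp):
        key = lp.earliest_processable()
        if key != float("inf"):
            heapq.heappush(self.queue, (key, next(self.seq), lp.name))

    def add_lp(self, lp):               # update trigger: new LP added
        self.lps[lp.name] = lp
        self._rekey(lp)

    def delete_lp(self, name):          # update trigger: LP deleted
        del self.lps[name]              # its queue entries become stale

    def receive(self, name, ts):        # update trigger: LP receives an event
        lp = self.lps[name]
        heapq.heappush(lp.fel, ts)
        self._rekey(lp)

    def execute_next(self):             # update trigger: LP executes an event
        while self.queue:
            key, _, name = heapq.heappop(self.queue)
            lp = self.lps.get(name)
            if lp is None or key != lp.earliest_processable():
                continue                # stale entry, skip it
            ts = heapq.heappop(lp.fel)
            lp.pel.append(ts)
            lp.lvt = ts
            self._rekey(lp)
            return name, ts
        return None


kernel = Kernel()
kernel.add_lp(LP("a"))
kernel.add_lp(LP("b"))
kernel.receive("a", 5.0)
kernel.receive("b", 3.0)
kernel.receive("a", 1.0)
order = [kernel.execute_next() for _ in range(4)]
```

Lazy deletion keeps every update at O(log n) at the cost of occasionally popping stale entries; committable and emittable queues could follow the same pattern with different keys.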
Performance Metrics
Efficiency, speedup measured using event rates
Event rate ≡ No. of events processed per wall clock sec
• Weak scaling: Ideal speedup ≡ Events/second/processor invariant with number of processors
• Strong scaling: Ideal speedup ≡ Aggregate events/second linearly increases with number of processors
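The metrics above reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and the numbers are made up, not measurements from the talk. Note that the same efficiency ratio serves both regimes; what differs is the experiment (total problem size held fixed for strong scaling, problem size per processor held fixed for weak scaling).

```python
def event_rate(events_processed, wall_clock_seconds):
    """Events processed per wall-clock second."""
    return events_processed / wall_clock_seconds


def scaling_efficiency(rate_base, procs_base, rate_n, procs_n):
    """Per-processor event rate at procs_n relative to the baseline run.

    1.0 means ideal scaling: per-processor rate unchanged (weak scaling),
    or equivalently aggregate rate grown linearly (strong scaling).
    """
    return (rate_n / procs_n) / (rate_base / procs_base)


# Made-up example: 1 processor does 1e6 events/s, 8 processors do 6e6 events/s.
base = event_rate(2_000_000, 2.0)
scaled = event_rate(12_000_000, 2.0)
efficiency = scaling_efficiency(base, 1, scaled, 8)
speedup = scaled / base
```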
Application Benchmarking and Demonstration
Entire runtime and data integration frameworks to be exercised
– Instantiate scenarios scaled up from smaller-scale scenarios in literature
– Experiment with strong-scaling as well as weak-scaling, as appropriate for each application area
At-scale simulation from each area
– Epidemiological simulations
– Human behavioral simulations
– Cyber infrastructure simulations
– Logistics simulations
$$p_i = 1 - \exp\left( \tau \sum_{r \in R} N_r \ln(1 - r\, s_i\, \rho) \right)$$
Example: Probability of infection in epidemiological model
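As a worked example, here is a small sketch of the infection probability, assuming the slide's formula is the commonly used form p_i = 1 - exp(τ · Σ_{r∈R} N_r · ln(1 - r · s_i · ρ)). The symbol meanings are my assumptions, since the slide does not define them: τ is the exposure duration, ρ the disease transmissibility, s_i the susceptibility of individual i, R the set of infectivity levels present, and N_r the number of infectious contacts with infectivity r. The function name and sample values are hypothetical.

```python
import math


def infection_probability(tau, rho, s_i, infectious_counts):
    """Return p_i = 1 - exp(tau * sum_r N_r * ln(1 - r * s_i * rho)).

    infectious_counts maps each infectivity level r to the count N_r.
    All parameter interpretations are assumptions, not from the slide.
    """
    total = sum(
        n_r * math.log(1.0 - r * s_i * rho)
        for r, n_r in infectious_counts.items()
    )
    return 1.0 - math.exp(tau * total)


# Hypothetical values: one fully infectious contact, then three of them.
p_one = infection_probability(tau=1.0, rho=0.5, s_i=1.0, infectious_counts={1.0: 1})
p_three = infection_probability(tau=1.0, rho=0.5, s_i=1.0, infectious_counts={1.0: 3})
p_none = infection_probability(tau=1.0, rho=0.5, s_i=1.0, infectious_counts={})
```

Each term ln(1 - r·s_i·ρ) is negative, so more infectious contacts or longer exposure push the exponent down and the probability toward 1.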
Example inter-entity networks
Status
Showed preliminary evidence that PDES is
– Feasible even at the largest core-counts
– Adequately scalable to over 100,000 cores
– But should be improved much, much more
Applications can now move beyond “if” and begin to contemplate “how” to use petascale discrete event execution
Methodological Alternatives
Sometimes, new modeling formulations may better suit scaling needs!
– Redefine and refine model to suit the computing platform
Example
– Ultra-scale vehicular mobility simulations on GPUs…
Example: Ultra-scale Vehicular Mobility Simulations
E.g., National Evacuation Conference
• www.nationalevacuationconference.org
Our GARFIELD Simulation & Visualization System
[Figure: animation frames showing fragment processors operating on texture memory. FP=Fragment Processor]
State  Nodes      Links      Texture (X×X)  Evac Time (hours)  Run Time (sec)
DC     9,559      14,884     1048576        35.20              54.90
LA     413,574    988,458    4194304        65.07              409.59
TN     583,484    1,335,586  3211264        157.91             353.89
FL     1,048,506  2,629,268  4194304        179.20             611.83
TX     2,073,870  5,116,492  3211264        217.60             777.65
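One way to read the table is as a faster-than-real-time factor. The sketch below assumes, without confirmation from the slide, that the evacuation time is simulated time in hours and the run time is wall-clock seconds; the names `RESULTS` and `real_time_factor` are hypothetical.

```python
# (state, simulated evacuation time in hours, wall-clock run time in seconds),
# copied from the table above.
RESULTS = [
    ("DC", 35.20, 54.90),
    ("LA", 65.07, 409.59),
    ("TN", 157.91, 353.89),
    ("FL", 179.20, 611.83),
    ("TX", 217.60, 777.65),
]


def real_time_factor(evac_hours, run_seconds):
    """Simulated seconds advanced per wall-clock second (assumed interpretation)."""
    return evac_hours * 3600.0 / run_seconds


factors = {state: real_time_factor(h, s) for state, h, s in RESULTS}

for state, factor in factors.items():
    print(f"{state}: about {factor:.0f}x faster than real time")
```

Under that reading, every scenario in the table runs hundreds to thousands of times faster than real time.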
Demo
Marketing
Simulation community’s responsibility
– Identify potential, benefits
– Invent new methods, methodologies, capabilities
– Educate about need, potential, benefit
Textbook definition of marketing
“Creating the need”
Lighter Vein or Reality?
• David Nicol once noted
“PADS research tends to scratch where it doesn’t itch”
• Now, probably time to ponder
“Have we been tolerating some (very bothersome) itches for lack of a long scratching stick?”
PADS=Parallel and Distributed Simulation
Perspective and Action
• Assume immense computing power
• Conceive large simulation-enabled solutions
“Perfect opportunity to expand our outlook in simulation-based methods and methodologies”
• 10^5-10^6 cores nearly a reality
Million-core computers impending
www.exascale.org
• Nation-scale, world-scale questions of increasing interest
Compositional dynamics of millions to billions of processes, individuals
Thank you!
Questions? Comments?