
Achieving Scalability in the Presence of Asynchrony

Thomas Sterling
Professor of Informatics and Computing

Chief Scientist and Associate Director
Center for Research in Extreme Scale Technologies (CREST)

Pervasive Technology Institute at Indiana University

Fellow, Sandia National Laboratories CSRI

June 27, 2012


Working Definition

“Asynchrony, noun: unpredictable variability of response time, either bounded or unbounded, for requests of resource access or functional services within and by computing systems. Uncertainty in time.”


Nothing New in Broad Sense

• Mass storage

• Distributed Computing

• Grids

• OS noise

• Cache replacement

• TLB misses


Manifesting Asynchrony

• Thread time to completion uncertainty

• Remote memory access apparent latency

• Long blocking time for barriers due to slowest dependent task

• Local memory access time due to cache hierarchy

• Thread start time due to resource availability


Sources of Asynchrony

• Embedded policies

• Resources outside the scope of user name space

• Communications network

• Data access

• Function service time

• Fault tolerance 


What is New for Exascale

• Scale
• Increased range of network latencies and opportunities for packet collisions and routing variations
• Deeper memory hierarchy
• Scheduling conflicts for threads to cores
• Active response to errors
• Variable instruction rate from clock and voltage change
• Finer-grain threads as normalization factor of time delays


4 Requirements

1. Exploit physical locality opportunities when they are present, even if determined at runtime
2. Avoid physical resource blocking caused by logical task blocking of asynchronous actions
3. Permit varied order of execution of delineated localized tasks
4. Expose intermediate-level parallelism & avoid over-constrained synchronization


Synthesis of Concepts

• Synchronous domains – computational complexes (e.g., threads, codelets)
– Exploit physical locality and predictable synchronous operation
– Split-phase transactions
• Blocking avoidance
– Lightweight multi-threading with active context switching
– Dynamic allocation of local execution resources (cores, hardware threads)
– Runtime task queuing
• Variable execution ordering
– Message-driven computation: parcels move work to data when preferable
– Percolation
– Constraint-based synchronization – declarative criteria for work that is event-driven
• Expose parallelism
– Data-directed execution
– Merger of flow control and data structure
– Shared name space
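To make the split-phase idea concrete, here is a minimal sketch in standard C++ (an HPX version would use hpx::future instead); fetch_row and smooth are hypothetical stand-ins for a latency-bound request and latency-tolerant local work, not anything from the original material.

```cpp
// Minimal split-phase transaction sketch using standard C++ futures.
// fetch_row and smooth are hypothetical stand-ins, not HPX/ParalleX APIs.
#include <future>
#include <numeric>
#include <vector>

std::vector<double> fetch_row(std::size_t n)   // stands in for a remote, high-latency read
{
    return std::vector<double>(n, 1.0);
}

void smooth(std::vector<double>& block)        // stands in for useful local work
{
    for (double& x : block)
        x *= 0.5;
}

int main()
{
    std::vector<double> block(1024, 2.0);

    // Phase 1: issue the request; control returns immediately with a future.
    std::future<std::vector<double>> row =
        std::async(std::launch::async, fetch_row, std::size_t(1024));

    // Overlap: local work proceeds while the request is outstanding.
    smooth(block);

    // Phase 2: consume the value only at the point it is actually needed.
    std::vector<double> r = row.get();
    double sum = std::accumulate(r.begin(), r.end(), 0.0);
    return sum > 0.0 ? 0 : 1;
}
```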


ParalleX Execution Model



HPX Runtime System Design

• Current version of HPX provides the following infrastructure on conventional systems as defined by the ParalleX execution model:
– Active Global Address Space (AGAS)
– HPX Threads and Thread Management
– Parcel Transport and Parcel Management
– Local Control Objects (LCOs)
– HPX Processes (distributed objects): namespace and policies management, locality control
– Monitoring subsystem

[Architecture diagram: HPX runtime components – LCOs, Action Manager, Parcel Port, Process Manager, Local Memory Management, Performance Monitor, Performance Counters, Interconnect]
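For orientation, a minimal HPX program sketch showing how the runtime components listed above sit underneath ordinary application code. This assumes a recent HPX release; header layout and entry points have changed across versions, so treat the exact includes and helper calls as assumptions.

```cpp
// Minimal HPX program sketch (assumes a recent HPX release; the 2012-era
// API differed in detail). <hpx/hpx_main.hpp> arranges for main() to run
// as an HPX thread on the started runtime.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>

int main()
{
    // Each OS process is a "locality"; AGAS, the thread manager, the parcel
    // port and the LCOs all operate underneath this code.
    std::cout << "locality " << hpx::get_locality_id()
              << " running " << hpx::get_os_thread_count()
              << " worker threads\n";
    return 0;
}
```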


Active Global Address Space

• Global address space throughout the system
– Removes dependency on static data distribution
– Enables dynamic load balancing of application and system data
• AGAS assigns global names (identifiers, unstructured 128-bit integers) to all entities managed by HPX
• Unlike PGAS, provides a mechanism to resolve global identifiers into corresponding local virtual addresses (LVAs)
– LVAs comprise a locality ID, the type of entity being referenced, and its local memory address
– Moving an entity to a different locality updates this mapping
– Current implementation is based on a centralized database storing the mappings, accessible over the local area network
– Local caching policies have been implemented to prevent bottlenecks and minimize the number of required round-trips
• Current implementation allows autonomous creation of globally unique ids in the locality where the entity is initially located and supports memory pooling of similar objects to minimize overhead
• Implements a garbage collection scheme for HPX objects
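As a small illustration, the sketch below (assuming a recent HPX release) enumerates the global identifiers AGAS hands out for localities; the same hpx::id_type can be used from anywhere in the system and is resolved by AGAS when used.

```cpp
// Sketch: enumerating the global ids (hpx::id_type) that AGAS assigns to
// localities. Assumes a recent HPX release.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>
#include <vector>

int main()
{
    std::vector<hpx::id_type> localities = hpx::find_all_localities();
    for (hpx::id_type const& id : localities)
    {
        // The id is location-independent; AGAS resolves it when it is used.
        std::cout << "locality global id: " << id << "\n";
    }
    return 0;
}
```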



Thread Management

• The thread manager is modular and implements work-queue based task management
• Threads are cooperatively scheduled at user level without requiring a kernel transition
• Specially designed synchronization primitives such as semaphores, mutexes, etc. allow synchronization of HPX threads in the same way as conventional threads
• Thread management currently supports several key modes:
– Global Thread Queue
– Local Queue (work stealing)
– Local Priority Queue (work stealing)
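A hedged sketch of this user-level threading model (assuming a recent HPX release): each hpx::async call creates a lightweight HPX thread that the work-queue scheduler multiplexes onto cores without kernel transitions.

```cpp
// Sketch: many lightweight HPX threads scheduled cooperatively in user space.
// Assumes a recent HPX release.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    std::vector<hpx::future<std::uint64_t>> results;
    for (std::uint64_t i = 0; i != 10000; ++i)
    {
        // Creates an HPX thread, not an OS thread; the work-queue scheduler
        // multiplexes these onto the available cores.
        results.push_back(hpx::async([i] { return i * i; }));
    }

    hpx::wait_all(results);   // suspends only this HPX thread, not a core

    std::uint64_t sum = 0;
    for (auto& f : results)
        sum += f.get();
    std::cout << "sum of squares: " << sum << "\n";
    return 0;
}
```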



LCOs (Local Control Objects)

• LCOs provide a means of controlling parallelization and synchronization of HPX threads
• Enable event-driven thread creation and can support in-place data structure protection and on-the-fly scheduling
• Preferably embedded in the data structures they protect
• Abstraction of a multitude of different functionalities:
– Event-driven HPX-thread creation
– Protection of data structures from race conditions
– Automatic on-the-fly scheduling of work
• An LCO may create (or reactivate) an HPX thread as a result of being “triggered”
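The sketch below illustrates the triggering behaviour with a local promise/future pair acting as an LCO (assuming a recent HPX release; the class has been renamed across versions): the continuation is registered up front, but no thread runs it until the LCO fires.

```cpp
// Sketch: a local promise/future pair used as an LCO. Setting the value
// "triggers" the LCO and (re)activates an HPX thread for the continuation.
// Assumes a recent HPX release (hpx::lcos::local::promise; newer releases
// spell it hpx::promise).
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>

int main()
{
    hpx::lcos::local::promise<int> p;        // the LCO
    hpx::future<int> f = p.get_future();

    // Event-driven work: registered now, executed only once the LCO fires.
    hpx::future<void> done = f.then([](hpx::future<int> v) {
        std::cout << "triggered with " << v.get() << "\n";
    });

    p.set_value(42);                         // trigger the LCO
    done.get();
    return 0;
}
```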



Constraint‐based Synchronization

• Supports Dynamic-Adaptive Task Scheduling
• Declarative Semantics for Continuation of Execution
– Defines conditions for work to be performed
– Not imperative code by user
• Establishes Criteria for Task Instantiation
• Supports DAG flow control representation
• Examples:
– Dataflow
– Futures



Future LCOs

• Future LCOs are essential to HPX because they are the primary method of controlling asynchrony in HPX
• Compatible with std::future<>
• Futures act as proxies for values that are being computed asynchronously by actions

[Diagram: the consumer thread on Locality 1 suspends and another thread executes while Locality 2 executes the future’s producer thread; the result is returned through the future object and the consumer thread resumes]
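A minimal sketch of the proxy behaviour in the diagram (assuming a recent HPX release): the consumer obtains a future for a value produced asynchronously, and get() suspends only the consumer HPX thread while the scheduler runs other work. With an action and a remote locality id the same shape applies across localities; slow_producer here is a hypothetical local stand-in.

```cpp
// Sketch: a future as a proxy for an asynchronously computed value.
// slow_producer is a hypothetical stand-in for work done elsewhere.
// Assumes a recent HPX release.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>

double slow_producer()
{
    double s = 0.0;
    for (int i = 1; i <= 1000000; ++i)
        s += 1.0 / i;
    return s;
}

int main()
{
    hpx::future<double> value = hpx::async(slow_producer);

    // ... the consumer can run unrelated work here ...

    // get() suspends this HPX thread (not an OS core) until the value arrives.
    std::cout << "result: " << value.get() << "\n";
    return 0;
}
```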


Dataflow LCOs

• Dataflow LCOs are an extension of futures that enable dependency-driven asynchrony
• Compatible with std::future<>
• Computation of the action associated with a dataflow does not begin until all the arguments of the dataflow are ready
– Non-LCO arguments are always ready
– LCO arguments, such as futures, might not be ready
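A hedged sketch of this dependency-driven behaviour (assuming a recent HPX release exposing hpx::dataflow): the combining function runs only when both future arguments are ready, while the plain scalar argument counts as always ready.

```cpp
// Sketch: hpx::dataflow defers the combining function until all future
// arguments are ready; non-future arguments are ready immediately.
// Assumes a recent HPX release.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>
#include <utility>

int main()
{
    hpx::future<double> a = hpx::async([] { return 3.0; });
    hpx::future<double> b = hpx::async([] { return 4.0; });

    // Runs only once both a and b have values; 'scale' is always ready.
    hpx::future<double> c = hpx::dataflow(
        [](hpx::future<double> x, hpx::future<double> y, double scale) {
            return scale * (x.get() + y.get());
        },
        std::move(a), std::move(b), 2.0);

    std::cout << "result: " << c.get() << "\n";   // prints 14
    return 0;
}
```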


Parcel Management

• Active messages (parcels)
• Destination address, function to execute, parameters, continuation
• Any inter-locality messaging is based on parcels
• In HPX, parcels are represented as polymorphic objects
• An HPX entity, on creating a parcel object, hands it to the parcel handler
• The parcel handler serializes the parcel; all dependent data is bundled along with the parcel
• At the receiving locality the parcel is de-serialized and causes an HPX thread to be created based on its content

[Diagram: an HPX thread on Locality 1 hands a parcel object to the Parcel Handler via put(); the serialized parcel crosses to Locality 2, where it is de-serialized and the Action Manager creates an HPX thread from its content]
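To tie the pieces together, a hedged sketch (assuming a recent HPX release; square is a hypothetical example function): invoking a plain action on another locality's global id causes the runtime to generate exactly such a parcel, carrying the action, its serialized arguments, and a continuation that delivers the result back into the returned future.

```cpp
// Sketch: a remote action invocation that travels as a parcel.
// 'square' is a hypothetical example function; assumes a recent HPX release.
#include <hpx/hpx_main.hpp>
#include <hpx/hpx.hpp>
#include <iostream>
#include <vector>

int square(int x)
{
    std::cout << "executing on locality " << hpx::get_locality_id() << "\n";
    return x * x;
}
HPX_PLAIN_ACTION(square, square_action);   // makes 'square' remotely invocable

int main()
{
    // Prefer a remote locality if one exists, otherwise stay local.
    std::vector<hpx::id_type> remotes = hpx::find_remote_localities();
    hpx::id_type target = remotes.empty() ? hpx::find_here() : remotes.front();

    // This call becomes a parcel: destination id, action, serialized
    // arguments, and a continuation that sets the returned future.
    hpx::future<int> result = hpx::async<square_action>(target, 7);
    std::cout << "7 squared = " << result.get() << "\n";
    return 0;
}
```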



Adaptive Mesh Refinement

• Why adaptive mesh refinement?
• The AMR dataflow in ParalleX
• The impact of granularity
• 3-D test problem: comparisons between publicly available MPI-based AMR toolkits and ParalleX AMR

• Impact of nedmalloc, concur, tcmalloc, and jemalloc


Constraint-based Synchronization for AMR



Application: Adaptive Mesh Refinement (AMR) for Astrophysics simulations

• ParalleX-based AMR removes all global computation barriers, including the timestep barrier (so not all points have to reach the same timestep in order to proceed computing)


MHDe Code Tests

• All C++
• PPM Reconstruction with HLLE Numerical Flux
• AMR
• Equations described in http://arxiv.org/abs/gr-qc/0605102


Finite Temperature Equation of State

• Putting in the right nuclear physics (polytropes don’t have the right compactness)
• To do anything with neutrinos, you need a temperature: neutrino cooling, dynamics of hypernovae
• To do anything with radiation, you need a physical temperature
• This is a spring-board for new astrophysics and microphysics


Equations of State Come in Tables

• Not practical to do the EOS calculation in place; it’s best to use a look-up table
• The table covers a lot of physics for you
• In high-energy astrophysics:
– Temperatures vary from 0 to > 100 MeV
– Proton fraction changes from 0 to 0.6
– Density varies across 10 orders of magnitude
• Finer-grid tables are better for accuracy
• Tables need to cover a wide variety of conditions: black hole formation, neutron star mergers, nucleosynthesis
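Purely for illustration, a minimal C++ sketch of the look-up idea: a 1-D slice of a tabulated EOS with linear interpolation on a uniform grid in log10(density). Real tables are three-dimensional (density, temperature, proton fraction), and the numbers below are placeholders, not physical data.

```cpp
// Illustrative only: 1-D tabulated EOS slice with linear interpolation in
// log10(rho). The table values are placeholders, not physical data.
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

struct EosTable1D
{
    double log_rho_min;                // log10 of the lowest tabulated density
    double dlog_rho;                   // uniform spacing in log10(rho)
    std::vector<double> pressure;      // tabulated pressure (placeholder values)

    double lookup(double rho) const
    {
        double x = (std::log10(rho) - log_rho_min) / dlog_rho;
        if (x < 0.0) x = 0.0;                                    // clamp below the table
        std::size_t i = static_cast<std::size_t>(x);
        if (i + 1 >= pressure.size()) i = pressure.size() - 2;   // clamp above the table
        double t = x - static_cast<double>(i);
        return (1.0 - t) * pressure[i] + t * pressure[i + 1];    // linear interpolation
    }
};

int main()
{
    // Placeholder table spanning several orders of magnitude in density.
    EosTable1D table{2.0, 1.0, {1e20, 1e22, 1e24, 1e27, 1e30, 1e33, 1e35}};
    std::cout << "P(rho = 1e5) ~ " << table.lookup(1.0e5) << "\n";
    return 0;
}
```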


Preliminary Radial Pulsation Frequencies

[Plot: preliminary radial pulsation frequencies; horizontal axis spans 100–700, vertical axis spans approximately 0.00076–0.00079]


Conclusions

• Exascale computing (really, trans-Petaflops) will exhibit an increasing degree of asynchronous functionality
• Runtime adaptation is probably required to address these sources of unpredictability in time-domain behavior
• Key concepts, when employed in synergy, may address this:
– Multi-threading
– Constraint-based synchronization represents an experimental step
– Dynamic, overlapped/multiphase message-driven execution
– Global address space
• Early runtime experiments with the ParalleX-based HPX runtime:
– ParalleX provides an experimental model with the HPX implementation
– Demonstrates adaptivity in the presence of asynchrony
