Dax: A Massively Threaded Visualization and Analysis Toolkit for … · 2014. 4. 8. · Sandia...
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2014-1176P
Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale
GPU Technology Conference March 26, 2014
Kenneth Moreland Sandia National Laboratories
Robert Maynard Kitware, Inc.
A Toolkit for Scientific Visualization
Sci Vis: A Conglomeration of Many Geometric and Other Tools
Motivation
Slide of Doom
System Parameter      2011          “2018”                    Factor Change
System Peak           2 PetaFLOPS   1 ExaFLOP                 500
Power                 6 MW          ≤ 20 MW                   3
System Memory         0.3 PB        32 – 64 PB                100 – 200
Total Concurrency     225K          1B × 10 or 1B × 100       40,000 – 400,000
Node Performance      125 GF        1 TF or 10 TF             8 – 80
Node Concurrency      12            1,000 or 10,000           83 – 830
Network BW            1.5 GB/s      100 GB/s or 1000 GB/s     66 – 660
System Size (nodes)   18,700        1,000,000 or 100,000      50 – 500
I/O Capacity          15 PB         300 – 1000 PB             20 – 67
I/O BW                0.2 TB/s      20 – 60 TB/s              10 – 30
Extreme Scale is Threads, Threads, Threads!
§ To succeed at extreme scale, you need to consider the finest possible level of concurrency
§ Expect each thread to process exactly one element
§ Disallow communication among threads
              Jaguar – XT5     Titan – XK7                Exascale*
Cores         224,256          299,008 and 18,688 gpu     1 billion
Concurrency   224,256 way      70 – 500 million way       10 – 100 billion way
Memory        300 Terabytes    700 Terabytes              128 Petabytes
*Source: Scientific Discovery at the Exascale, Ahern, Shoshani, Ma, et al.
Project Goals
§ Reduce the challenges of writing highly concurrent algorithms.
“Everybody who learns concurrency thinks they understand it, ends up finding mysterious races they thought weren’t possible, and discovers that they didn’t actually understand it yet after all.” Herb Sutter
Approach
Functor Mapping [Baker, et al. 2010]
functor()
Applied to Topologies
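As a rough illustration of mapping a functor over a topology, the sketch below visits every cell of a 1D uniform grid and hands the functor only that cell's two point values. It is plain serial C++, not Dax code; `CellAverage` and `MapCells1D` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A stateless functor: it sees the point values of one cell and nothing else.
struct CellAverage {
  double operator()(double left, double right) const {
    return 0.5 * (left + right);
  }
};

// Hypothetical helper (not part of Dax): applies a functor to every cell of a
// 1D uniform grid. Each iteration is independent of the others, so the loop
// could be handed to a parallel scheduler without changing the functor.
template <typename Functor>
std::vector<double> MapCells1D(const std::vector<double>& pointField,
                               Functor f) {
  std::vector<double> cellField;
  if (pointField.size() < 2) return cellField;  // fewer than two points: no cells
  cellField.resize(pointField.size() - 1);
  for (std::size_t cell = 0; cell + 1 < pointField.size(); ++cell) {
    // Cell `cell` connects point `cell` and point `cell + 1`.
    cellField[cell] = f(pointField[cell], pointField[cell + 1]);
  }
  assert(cellField.size() + 1 == pointField.size());  // one value per cell
  return cellField;
}
```

Calling `MapCells1D(points, CellAverage{})` on a four-point field yields three cell values, one per cell.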
Framework
Dax Framework
Control Environment (dax::cont): Grid Topology, Array Handle, Invoke
Execution Environment (dax::exec): Worklet, Cell Operations, Field Operations, Basic Math, Make Cells
Device Adapter: Allocate, Transfer, Schedule, Sort, …
Device Adapter Contents
§ Tag (struct DeviceAdapterFoo { };)
§ Execution Array Manager
§ Schedule
§ Scan
§ Sort
§ Other support algorithms
  § Stream compact, copy, parallel find, unique
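One way an empty tag struct can select a back end is compile-time tag dispatch, sketched below. This is an assumption about the general technique, not the Dax implementation; `DeviceAdapterTagSerialSketch`, `DeviceAlgorithmSketch`, and `FillWithSquares` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tag for a serial back end (illustrative, not a Dax tag).
struct DeviceAdapterTagSerialSketch {};

// The primary template is left undefined, so naming an unsupported tag
// fails at compile time rather than at run time.
template <typename DeviceTag>
struct DeviceAlgorithmSketch;

// Specialization that supplies the algorithms for the serial tag.
template <>
struct DeviceAlgorithmSketch<DeviceAdapterTagSerialSketch> {
  // Schedule: call functor(index) once for every index in [0, count).
  template <typename Functor>
  static void Schedule(Functor functor, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) functor(i);
  }

  // Sort: order the values ascending, in place.
  static void Sort(std::vector<int>& values) {
    std::sort(values.begin(), values.end());
  }
};

// Generic code names only the tag; the back end is fixed at compile time.
template <typename DeviceTag>
std::vector<int> FillWithSquares(std::size_t count) {
  std::vector<int> out(count);
  DeviceAlgorithmSketch<DeviceTag>::Schedule(
      [&out](std::size_t i) { out[i] = static_cast<int>(i * i); }, count);
  assert(out.size() == count);
  return out;
}
```

Supporting another device would then mean adding one more tag and one more specialization, leaving `FillWithSquares` untouched.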
[Figure: a functor in the Control Environment is scheduled as many worklet instances in the Execution Environment. Stages: Transfer, Schedule, Compute.]
[Figure: Scan. Input 8 3 5 5 3 6 0 7 4 0; output 8 11 16 21 24 30 30 37 41 41.]
[Figure: Sort. Input 8 3 5 5 3 6 0 7 4 0; output 0 0 3 3 4 5 5 6 7 8.]
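The two number rows above are consistent with an inclusive prefix sum and an ascending sort of the same ten-element input. A serial sketch using only the C++ standard library reproduces them; `InclusiveScan` and `Sorted` are hypothetical helper names, and a device adapter would supply parallel versions of the same operations.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Inclusive scan: output[i] is the sum of input[0] through input[i].
std::vector<int> InclusiveScan(const std::vector<int>& input) {
  std::vector<int> output(input.size());
  std::partial_sum(input.begin(), input.end(), output.begin());
  return output;
}

// Sort: returns a copy of the values in ascending order.
std::vector<int> Sorted(std::vector<int> values) {
  std::sort(values.begin(), values.end());
  return values;
}
```

With the input `8 3 5 5 3 6 0 7 4 0`, `InclusiveScan` gives `8 11 16 21 24 30 30 37 41 41` and `Sorted` gives `0 0 3 3 4 5 5 6 7 8`.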
Anatomy of a Worklet
Execution Environment
struct Sine: public dax::exec::WorkletMapField {
  // Arguments the control side passes to Invoke: one input field, one output field.
  typedef void ControlSignature(Field(In), Field(Out));
  // operator() receives argument 1; its return value fills argument 2.
  typedef _2 ExecutionSignature(_1);
  DAX_EXEC_EXPORT
  dax::Scalar operator()(dax::Scalar v) const {
    return dax::math::Sin(v);
  }
};
Control Environment
dax::cont::ArrayHandle<dax::Scalar> inputHandle =
    dax::cont::make_ArrayHandle(input);
dax::cont::ArrayHandle<dax::Scalar> sineResult;
dax::cont::DispatcherMapField<Sine> dispatcher;
dispatcher.Invoke(inputHandle, sineResult);
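To make the split between the two environments concrete, here is a Dax-free mock of the pattern: a control-side dispatcher sizes the output and then calls the worklet once per element. `SineSketch` and `MockDispatcherMapField` are hypothetical stand-ins, and the assumption that the dispatcher allocates the output is inferred only from `sineResult` being passed in empty.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the Sine worklet: plain C++ with no Dax dependencies.
struct SineSketch {
  double operator()(double v) const { return std::sin(v); }
};

// Hypothetical dispatcher (not the Dax class). It plays the control-side
// role: size the output, then invoke the worklet once for each element.
template <typename Worklet>
class MockDispatcherMapField {
 public:
  explicit MockDispatcherMapField(Worklet worklet = Worklet())
      : worklet_(worklet) {}

  void Invoke(const std::vector<double>& input,
              std::vector<double>& output) const {
    output.resize(input.size());  // output array is sized for the caller
    for (std::size_t i = 0; i < input.size(); ++i) {
      output[i] = worklet_(input[i]);  // return value lands in the output slot
    }
    assert(output.size() == input.size());
  }

 private:
  Worklet worklet_;
};
```

The worklet never sees the arrays themselves, only one value at a time, which is what lets a real dispatcher run the calls on many threads.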
struct Zip2: public dax::exec::WorkletMapField {
  typedef void ControlSignature(Field(In), Field(In), Field(Out));
  typedef _3 ExecutionSignature(_1, _2);
  DAX_EXEC_EXPORT
  dax::Vector2 operator()(dax::Scalar x, dax::Scalar y) const {
    return dax::make_Vector2(x, y);
  }
};
struct ImagToPolar: public dax::exec::WorkletMapField {
  typedef void ControlSignature(Field(In), Field(In), Field(Out), Field(Out));
  typedef void ExecutionSignature(_1, _2, _3, _4);
  DAX_EXEC_EXPORT
  void operator()(dax::Scalar real, dax::Scalar imaginary,
                  dax::Scalar &magnitude, dax::Scalar &phase) const {
    magnitude = dax::math::Magnitude(dax::make_Vector2(real, imaginary));
    phase = dax::math::ATan2(imaginary, real);
  }
};
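For readers without Dax at hand, the per-element arithmetic of ImagToPolar can be checked with the standard library alone. This sketch assumes `dax::math::Magnitude` is the Euclidean norm and `dax::math::ATan2` matches `std::atan2`; `Polar` and `ImagToPolarSketch` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Result pair for one complex number in polar form.
struct Polar {
  double magnitude;
  double phase;  // radians, in (-pi, pi]
};

// Standard-library equivalent of the worklet body for a single element.
Polar ImagToPolarSketch(double real, double imaginary) {
  Polar p;
  p.magnitude = std::hypot(real, imaginary);  // sqrt(real^2 + imaginary^2)
  p.phase = std::atan2(imaginary, real);      // angle from the real axis
  return p;
}
```

For example, the input 3 + 4i has magnitude 5, and a purely imaginary input such as 0 + 1i has phase π/2.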
struct Advect: public dax::exec::WorkletMapField {
  typedef void ControlSignature(Field(In), Field(In), Field(In),
                                Field(Out), Field(Out), Field(Out), Field(Out));
  typedef void ExecutionSignature(_1, _2, _3, _4, _5, _6, _7);
  DAX_EXEC_EXPORT
  void operator()(dax::Vector3 startPosition,
                  dax::Vector3 startVelocity,
                  dax::Vector3 acceleration,
                  dax::Vector3 &endPosition,
                  dax::Vector3 &endVelocity,
                  dax::Scalar &rotation,
                  dax::Scalar &angularVelocity) const {
    ...
  }
};
struct Contour: public dax::exec::WorkletInterpolatedCell {
  typedef void ControlSignature(Topology, Geometry(Out), Field(Point,In));
  typedef void ExecutionSignature(Vertices(_1), _2, _3, VisitIndex);
  template<typename CellTag>
  DAX_EXEC_EXPORT
  void operator()(const CellVertices<CellTag>& verts,
                  InterpolatedCellPoints<CellTagTriangle>& outCell,
                  const CellField<dax::Scalar,CellTag> &values,
                  dax::Id inputCellVisitIndex) const {
    ...
  }
};
Example Code Mandelbulb Fractal
Mandelbulb “nth power” of 3D Vector
where
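A commonly cited definition of this power is the White and Nylander form in spherical coordinates. It is given here as an assumption, since the exact notation used in the talk is not shown in this text:

```latex
\[
  \mathbf{v}^{n} := r^{n}
  \left\langle
    \sin(n\theta)\cos(n\phi),\;
    \sin(n\theta)\sin(n\phi),\;
    \cos(n\theta)
  \right\rangle ,
  \qquad \mathbf{v} = \langle x, y, z \rangle
\]
\[
  r = \sqrt{x^{2} + y^{2} + z^{2}}, \qquad
  \phi = \arctan\frac{y}{x}, \qquad
  \theta = \arctan\frac{\sqrt{x^{2} + y^{2}}}{z}
\]
```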
Mandelbulb Iteration
§ Fractal created by iterating on power operation, offsetting by input coordinate.
§ Our implementation counts how many iterations to “escape.”
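The iteration described above can be sketched per input coordinate in plain C++. This is not the Dax implementation from the talk: the White and Nylander power formula, the power n = 8, the escape radius of 2, and the iteration budget of 32 are all assumptions, and `Vec3`, `MandelbulbPower`, and `MandelbulbEscapeCount` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
  double x, y, z;
};

// "nth power" of a 3D vector in spherical coordinates
// (assumed White and Nylander form).
Vec3 MandelbulbPower(const Vec3& v, double n) {
  const double r = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
  const double theta = std::atan2(std::sqrt(v.x * v.x + v.y * v.y), v.z);
  const double phi = std::atan2(v.y, v.x);
  const double rn = std::pow(r, n);
  return Vec3{rn * std::sin(n * theta) * std::cos(n * phi),
              rn * std::sin(n * theta) * std::sin(n * phi),
              rn * std::cos(n * theta)};
}

// Iterates v <- v^n + c starting from the origin. Returns the zero-based
// iteration at which |v| first exceeds escapeRadius, or maxIterations if
// the point does not escape within the budget.
int MandelbulbEscapeCount(const Vec3& c, double n = 8.0,
                          int maxIterations = 32,
                          double escapeRadius = 2.0) {
  assert(maxIterations >= 0);
  Vec3 v{0.0, 0.0, 0.0};
  for (int i = 0; i < maxIterations; ++i) {
    const Vec3 p = MandelbulbPower(v, n);
    v = Vec3{p.x + c.x, p.y + c.y, p.z + c.z};  // offset by the input coordinate
    const double lengthSquared = v.x * v.x + v.y * v.y + v.z * v.z;
    if (lengthSquared > escapeRadius * escapeRadius) return i;
  }
  return maxIterations;
}
```

Because each coordinate is processed independently, the function body has the shape of a field-map worklet: one input coordinate in, one iteration count out.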
Acknowledgements
§ This work was supported by the DOE Office of Science, Advanced Scientific Computing Research, under award number 10-014707, program manager Lucy Nowell.
§ Additional support by the Director, Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. 12-015215, through the Scientific Discovery through Advanced Computing (SciDAC) Institute of Scalable Data Management, Analysis and Visualization.
§ Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.