Scaling Post-Meshing Operations on Next Generation Platforms€¦ · Scaling Post-Meshing...

Photos placed in horizontal position with even amount

of white spacebetween photos

and header

Photos placed in horizontal position

with even amount of white space

between photos and header

Sandia National Laboratories is a multi mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

RoshanQuadros,BrianCarnes,MadisonBrewer,andByronHanks

DOECentersofExcellencePerformancePortabilityMeetingAug22-24,2017Denver

ScalingPost-MeshingOperationsonNextGenerationPlatforms

SAND No: SAND2017-8825 PE

Parallel decomposition,refinement, smoothing, & projection

Percept

CUBITandPerceptcombinedworkflowforgeneratinglargemeshes

Initial Mesh Generation: advanced meshing

algorithms for Tri, Quad, Tet, and Hex mesh

generation

CUBITprovidesextensivecapabilitiesforpreparinggeometryandgeneratinganinitialmeshhttps://cubit.sandia.gov

Post-MeshingOperations:Refinement->Projection->Smoothing

Mesh refinement workflow:• Generate refined meshes in

memory from an existing mesh

• Project new boundary nodes onto geometry

• Smooth interior mesh nodes to improve mesh quality

Supported mesh types:• block-structured• unstructured• hybrid

Post-MeshingOperator#1:Projection

GeometryKernel:OpenNURBS

§ OpenSourcefromwww.rhino3d.com

§ Lightweight&easytoPort

§ Queryoperationsarethreadsafe

§ Supportsvariouscurves&surfacedefinitions

ParallelKernel:ProjectPointsonaNURBSSurface

ProgrammingModel:MPI+Kokkos (OpenMP)Threelevelsofparallelismisrequired:

1) DistributedmemoryparallelismviaMPI

2) SharedmemorythreadlevelparallelismontheMICdeviceusingKokkoswithOpenMP runtime

3) VectorizationforVectorProcessingUnit(VPU)

Hardware: Trinity testbed containing 72 core KNL

Image courtesy of http://www.hotchips.org

ProgrammingModel:MPI+Kokkos (OpenMP){ // MPI distributes data to n processes

ON_Surface *surface;ON_3dPoint *buff_p;MPI_Comm_size( MPI_COMM_WORLD, &numtasks);…if ( rank == 0 ){

for( int r=1; r < numtasks; r++){…ierr = MPI_Send ( p_start, num_pnts*3, MPI_DOUBLE, r, Tag, MPI_COMM_WORLD);

}} else{

ierr = MPI_Recv ( buff_p, num_pnts*3, MPI_DOUBLE, MPI_ANY_SOURCE, Tag, MPI_COMM_WORLD, &status );

}projection_method( surface, buff_p, num_pnts );

}void projection_method( const ON_Surface *surface, ON_3dPoint *buff_p, const int num_pnts ){

…// Kokkos handles thread level parallelismKokkos::parallel_for ( num_pnts, KOKKOS_LAMBDA(const int i ){

...// OpenNURBS API for projecting a point on a surface double u, v;surface->GetClosestPoint( buff_p[i], &u, &v );ON_3dPoint projected_pt = surface->PointAt( u, v );

PointProjectionScalingResultsonKNL

933.071108.85

1 10 100 1000

Time(sec)

Threadsperrank

mpi_1 mpi_2 mpi_4 mpi_8 mpi_16 mpi_32 mpi_64

Speedup (MPI 32 ranks) = 28X Speedup (MPI + Kokkos) = 77X

ThreadAffinityonKNL

1 10 100 1000Threads per KNL

Blue - ScatterRed - Compact

Post-meshingOperator#2:RefinementStructured grid refinement was a relatively simple process§ Removed pointers and references to main memory objects§ Replaced structured grid data structures with Kokkos views§ Algorithm :

§ Allocate new mesh (view)§ For each block (in serial)

§ For each element in old mesh (in parallel)§ Interpolate coordinates for new mesh§ Transfer existing node coordinates

ScalabilityofRefinement

Sequence of meshes• 0.5M, 4M, 33M elements• multiple blocks (12)

Compare• serial• GPU• threading (OpenMP)

Better scalability with increasing problem size

Blocks were refined sequentially

Post-MeshingOperator#3:SmoothingSmoothingstructuredgridswasmorecomplexprocess§ computeglobalqualitymetricandgradient§ nonlinearCGoptimizationwithlinesearch§ communicationbetweenstructuredblocks

(gradients)Example:smoothingalargecubewithinitialrandomlyperturbednodes

Untangling: invalid elements Smoothing: metric/gradientUntangling: metric/gradient

§ Abstractmeshinterfaceforbothstructuredandunstructured§ highcostofkernelcalls§ sub-optimalinterfacetostructuredgrid§ functionsnotsafeforthreadsorGPU

§ Suggestionforabstractinterfaces:§ designedwithsharedmemory(Kokkos)fromstart§ otherwise,optforspecializedinterfaces.

SmoothingPitfall:AbstractInterfaces

StructuredMesh

UnstructuredMesh

Smoother<MeshBase>• dependsonmany

virtualfunctionsinMeshType class

MeshBase

§ Totalmetric(longdouble)wassumofindividualmetrics(double)§ ProblematicforGPUbuildsasCUDAonlyusesupto64bits(double

precision)forfloatingpointrepresentation.§ Causesillegalmemoryaccess:

§ cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered

§ CertainSTLclassesandfunctionsareproblematiconGPUs§ array,unorderedmap,vector,set,…

§ array=>Kokkos::Array§ unorderedmap =>Kokkos::UnorderedMap

§ std::maxwasrewrittenlocally

SmoothingPitfall:HardwareDifferences

§ MemorylayoutinitiallycausedverypoorperformanceontheGPU§ Smoothingtestcase:perturbedcubefollowedbymeshsmoothing

SmoothingPitfall:MemoryLayout

FutureWork§ Projection:

• StudyhighwatermarkofmemoryusagefordifferentcombinationofMPIranksandThreadsperrank

§ UnstructuredRefinement:• Morecomplicatedalgorithmthanstructured• Example:determinenumberofnewnodes

• useKokkosmaptostoreneedednodes• valuesstoredforeverymeshedge/face• mapalsousedtointerpolatenewcoordinates

§ Smoothing:• Investigateothersmoothingalgorithms(ellipticsmoother)

ThankYou

Scaling Post-Meshing Operations on Next Generation Platforms€¦ · Scaling Post-Meshing...

Documents

Transcript of Scaling Post-Meshing Operations on Next Generation Platforms€¦ · Scaling Post-Meshing...

Advanced Techniques in ANSYS Meshing - padtinc.com · Advanced Techniques in ANSYS Meshing ...

Surface Meshing

Adaptive Meshing and Distortion Control - imechanica · –Adaptive meshing is supported for all step-dependent features (contact, mass scaling, etc.). –Adaptive meshing can be

Digital Platforms for Agro-advisory and Business service Delivery ...€¦ · Digital Platforms for Agro-advisory and Business service Delivery: Lessons from Scaling-up of AgroTech

Platforms, Scales and Networks: Meshing a Local ......Platforms, Scales and Networks: Meshing a Local Sustainable Sharing Economy Ann Light1,2* & Clodagh Miskelly1 *1University of

Scaling Post-Meshing Operations on Next …...Scaling Post-Meshing Operations on Next Generation Platforms SAND No: SAND2017-8825 PE Parallel decomposition, refinement, smoothing,

Platforms and Algorithms for Big Data Analyticsdmkd.cs.vt.edu/TUTORIAL/Bigdata/Slides.pdf · 2016-03-22 · Spark Vertical Scaling Platforms High Performance Computing (HPC) Clusters

Adaptive Meshing - mashayekhi.iut.ac.ir · Adaptive Meshing Adaptive meshing basics – Adaptive remeshing is performed in ABAQUS/Explicit using the arbitrary Lagrangian-Eulerian

CHAMELEON: BUILDING A RECONFIGURABLE EXPERIMENTAL … · auto-scaling, high-availability, cloud federation, etc. Development of new models, algorithms, platforms, auto-scaling HA,

Mapped Meshing

Lecture 3: Introduction to ANSYS Meshing · Meshing – 3D Geometry •3D cell Types •First Meshing Approach Part/Body based • Meshing occurs at part or body level. • Meshing

Optimal Meshing

Scaling Up Nutrition (SUN) — PowerPoint - fao.org · Scaling Up Nutrition ... with strong in-country leadership & a shared space (multi-stakeholder platforms) ... academia, private

Meshing techniques

Meshing Help - ceet.niu.edu (1).pdf · ANSYS Workbench and Mechanical APDL Application Meshing Differences ... CFX-Mesh Method Control ... Meshing Application Tutorials ...

Meshing of a detailed DrivAer Body with ANSYS Meshing and ...

Meshing 010

Lecture 5: Cutcell Meshing - dl.ptecgroup.irdl.ptecgroup.ir/.../Fluent_Meshing_14.5_L05_CutCell.pdf · 14.5 Release Lecture 5: Cutcell Meshing Introduction to ANSYS Fluent Meshing

ANSYS 14 Meshing Automatic Robust Meshing

Re Meshing