Scalable System for Large Unstructured Mesh Simulation — Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate (29th Nov 2012)


  • Slide 1
  • Scalable System for Large Unstructured Mesh Simulation. Miguel A. Pasenau, Pooyan Dadvand, Jordi Cotela, Abel Coll and Eugenio Oñate
  • Slide 2
  • Overview: Introduction; Preparation and Simulation (More Efficient Partitioning, Parallel Element Splitting); Post Processing (Results Cache, Merging Many Partitions, Memory Usage, Off-screen Mode); Conclusions, Future Lines and Acknowledgements
  • Slide 3
  • Overview: Introduction; Preparation and Simulation (More Efficient Partitioning, Parallel Element Splitting); Post Processing (Results Cache, Merging Many Partitions, Memory Usage, Off-screen Mode); Conclusions, Future Lines and Acknowledgements
  • Slide 4
  • Introduction. Education: Master in Numerical Methods, training courses, seminars, etc. Publishing: journals, books, etc. Research: PhD theses, conferences, projects, etc. One of the International Centers of Excellence on Simulation-Based Engineering and Sciences [Glotzer et al., WTEC Panel Report on International Assessment of Research and Development in Simulation-Based Engineering and Science, World Technology Evaluation Center (wtec.org), 2009].
  • Slide 5
  • Introduction. Simulation: structures
  • Slide 6
  • Introduction. CFD: Computational Fluid Dynamics
  • Slide 7
  • Introduction: geomechanics, industrial forming processes, electromagnetism, acoustics, bio-medical engineering, coupled problems, earth sciences
  • Slide 8
  • Introduction. Simulation workflow: geometry description (provided by CAD or created in GiD) → preparation of analysis data (GiD) → computer analysis → visualization of results (GiD)
  • Slide 9
  • Introduction. Analysis data generation: read in and correct CAD data; assignment of boundary conditions; definition of analysis parameters; generation of analysis data; assignment of material properties; etc.
  • Slide 10
  • Introduction. Visualization of numerical results: deformed shapes, temperature distributions, pressures, etc.; vector and contour plots; graphs and line diagrams; result surfaces; animated sequences; particle line flow diagrams
  • Slide 11
  • Slide 12
  • Introduction. Goal: run a CFD simulation with 100 million elements using in-house tools. Hardware: a cluster with a master node (2 × Intel Quad Core E5410, 32 GB RAM; 3 TB disk with a dedicated Gigabit link to the master node), 10 nodes (2 × Intel Quad Core E5410, 16 GB RAM each) and 2 nodes (2 × AMD Opteron Quad Core 2356, 32 GB each); 96 cores and 224 GB RAM available in total; Infiniband 4x DDR, 20 Gbps.
  • Slide 13
  • Introduction. Airflow around an F1 car model
  • Slide 14
  • Introduction. Kratos: multi-physics, open-source framework, parallelized for shared- and distributed-memory machines. GiD: geometry handling and data management, first coarse mesh, merging and post-processing of results.
  • Slide 15
  • Introduction. Workflow: geometry, conditions and materials → coarse mesh generation → partition → distribution (communication plan: part 1, part 2, …, part n) → refinement → calculation (res. 1, res. 2, …, res. n) → merge → visualize
  • Slide 16
  • Overview: Introduction; Preparation and Simulation (More Efficient Partitioning, Parallel Element Splitting); Post Processing (Results Cache, Merging Many Partitions, Memory Usage, Off-screen Mode); Conclusions, Future Lines and Acknowledgements
  • Slide 17
  • Preparation and simulation. Workflow: geometry, conditions and materials → coarse mesh generation → partition → distribution (communication plan: part 1, part 2, …, part n) → refinement → calculation (res. 1, res. 2, …, res. n) → merge → visualize
  • Slide 18
  • Meshing. A single workstation has limited memory and time, so the mesh is built in three steps: (1) on a single node, GiD generates a coarse mesh of 13 million tetrahedra; (2) on a single node, Kratos + METIS divide and distribute the mesh; (3) in parallel, Kratos refines the mesh locally. A sketch of the divide step follows below.
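As a rough illustration of what the Kratos + METIS divide step does, the sketch below partitions a (tiny) tetrahedral mesh with the METIS 5 C API; the two-element mesh and all variable names are illustrative, not the actual Kratos code.

```cpp
// Sketch: partition a tetrahedral mesh with METIS 5 (illustrative only).
#include <metis.h>
#include <cstdio>
#include <vector>

int main() {
    idx_t ne = 2;   // number of elements: two tets sharing a face
    idx_t nn = 5;   // number of nodes
    // Element connectivity in CSR form: element e owns nodes
    // eind[eptr[e]] .. eind[eptr[e+1]-1].
    std::vector<idx_t> eptr = {0, 4, 8};
    std::vector<idx_t> eind = {0, 1, 2, 3,  1, 2, 3, 4};

    idx_t ncommon = 3;   // tets are adjacent when they share 3 nodes (a face)
    idx_t nparts  = 2;   // target number of partitions (one per rank)
    idx_t objval  = 0;   // METIS reports the resulting edge-cut here
    std::vector<idx_t> epart(ne), npart(nn);

    if (METIS_PartMeshDual(&ne, &nn, eptr.data(), eind.data(),
                           /*vwgt=*/nullptr, /*vsize=*/nullptr,
                           &ncommon, &nparts,
                           /*tpwgts=*/nullptr, /*options=*/nullptr,
                           &objval, epart.data(), npart.data()) != METIS_OK)
        return 1;

    for (idx_t e = 0; e < ne; ++e)
        std::printf("element %d -> partition %d\n", (int)e, (int)epart[e]);
    return 0;
}
```

Each rank then receives (or, in the improved scheme described later, reads) the elements whose epart entry names it.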
  • Slide 19
  • Preparation and simulation. Workflow: geometry, conditions and materials → coarse mesh generation → partition → distribution (communication plan: part 1, part 2, …, part n) → refinement → calculation (res. 1, res. 2, …, res. n) → merge → visualize
  • Slide 20
  • Efficient partitioning: before. Rank 0 reads the model, partitions it and sends the partitions to the other ranks (rank 0 → ranks 1, 2, 3).
  • Slide 21
  • Efficient partitioning: before. Rank 0 reads the model, partitions it and sends the partitions to the other ranks (rank 0 → ranks 1, 2, 3).
  • Slide 22
  • Efficient partitioning: before. It requires large memory on node 0, it spends cluster time on partitioning that could be done elsewhere, and each rerun needs repartitioning; on the other hand, the working procedure is the same for OpenMP and MPI runs.
  • Slide 23
  • Efficient partitioning: now. The partitions are divided and written on another machine, and each rank reads its own data separately; a sketch follows below.
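A minimal sketch of that per-rank reading, assuming one pre-partitioned, Kratos-style .mdpa file per rank; the naming pattern and the parsing step are hypothetical.

```cpp
// Sketch: every MPI rank opens only its own partition file; there is
// no rank-0 bottleneck because partitioning was done offline.
#include <mpi.h>
#include <cstdio>
#include <fstream>
#include <string>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Hypothetical naming pattern for the pre-written partitions.
    const std::string path = "model_part_" + std::to_string(rank) + ".mdpa";
    std::ifstream in(path);
    if (!in) {
        std::fprintf(stderr, "rank %d: cannot open %s\n", rank, path.c_str());
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    // ... parse this rank's nodes, elements and the interface data
    //     needed for the communication plan ...

    MPI_Finalize();
    return 0;
}
```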
  • Slide 24
  • Preparation and simulation. Workflow: geometry, conditions and materials → coarse mesh generation → partition → distribution (communication plan: part 1, part 2, …, part n) → refinement → calculation (res. 1, res. 2, …, res. n) → merge → visualize
  • Slide 25
  • Local refinement: triangle. [Diagram: the split cases for a triangle (i, j, k) with edge midpoints (l, m, n): one refined edge yields 2 children, two refined edges yield 3, and three refined edges yield 4.]
  • Slide 26
  • Local refinement: triangle. The split case is selected according to the node IDs. The decision is not made for best element quality, but it is very good for parallelization (OpenMP and MPI); see the sketch below.
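A sketch of how such an ID-based decision can work for the two-refined-edges case of a triangle (i, j, k) with midpoints l on edge ij and m on edge jk. The child (l, j, m) is fixed; the remaining quad (i, l, m, k) is split along a diagonal chosen purely from global node IDs, so OpenMP threads and MPI ranks reach the same triangulation with no communication. The tie-break rule shown is illustrative, not necessarily the one Kratos uses.

```cpp
#include <vector>

using Id = long;                 // global node id
struct Tri { Id a, b, c; };

// Triangle (i,j,k) with refined edges ij and jk (midpoints l and m).
std::vector<Tri> split_two_edges(Id i, Id j, Id k, Id l, Id m) {
    std::vector<Tri> kids = { {l, j, m} };   // this child is always created
    if (k < i) {                 // illustrative id-based tie-break
        kids.push_back({i, l, k});           // diagonal l-k
        kids.push_back({l, m, k});
    } else {
        kids.push_back({i, l, m});           // diagonal i-m
        kids.push_back({i, m, k});
    }
    return kids;                 // same result on every process
}
```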
  • Slide 27
  • Local refinement: tetrahedron. [Diagram: a father element and its child elements.]
  • Slide 28
  • Local refinement: examples
  • Slide 29
  • Local refinement: examples
  • Slide 30
  • Local refinement: examples
  • Slide 31
  • Local refinement: uniform. A uniform refinement can be used to obtain a mesh with 8 times more elements (each tetrahedron is split into 8 children, as sketched below), but it does not improve the geometry representation.
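For reference, the 1:8 uniform split of a tetrahedron produces four corner children plus four children from the inner octahedron. The octahedron diagonal used below (m02-m13, Bey's rule) is one common choice; the deck does not state which diagonal GiD/Kratos use.

```cpp
#include <array>

using Id = long;
struct Tet { Id n[4]; };

// Corners v0..v3; m_ab is the midpoint of edge v_a-v_b.
std::array<Tet, 8> split_uniform(Id v0, Id v1, Id v2, Id v3,
                                 Id m01, Id m02, Id m03,
                                 Id m12, Id m13, Id m23) {
    return {{
        {v0,  m01, m02, m03},   // four corner children
        {m01, v1,  m12, m13},
        {m02, m12, v2,  m23},
        {m03, m13, m23, v3 },
        {m01, m02, m03, m13},   // inner octahedron, cut along m02-m13
        {m01, m02, m12, m13},
        {m02, m03, m13, m23},
        {m02, m12, m13, m23},
    }};
}
```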
  • Slide 32
  • Introduction. Workflow: geometry, conditions and materials → coarse mesh generation → partition → distribution (communication plan: part 1, part 2, …, part n) → refinement → calculation (res. 1, res. 2, …, res. n) → merge → visualize
  • Slide 33
  • Parallel calculation. Calculated using 12 × 8 = 96 MPI processes; less than 1 day for 400 time steps; about 180 GB of memory used. The single volume mesh of 103 million tetrahedra is split into 96 files, each holding a mesh portion and its results.
  • Slide 34
  • Overview: Introduction; Preparation and Simulation (More Efficient Partitioning, Parallel Element Splitting); Post Processing (Results Cache, Merging Many Partitions, Memory Usage, Off-screen Mode); Conclusions, Future Lines and Acknowledgements
  • Slide 35
  • Post processing. Workflow: geometry, conditions and materials → coarse mesh generation → partition → distribution (communication plan: part 1, part 2, …, part n) → refinement → calculation (res. 1, res. 2, …, res. n) → merge → visualize
  • Slide 36
  • Post-process. Challenges to face: a single node; big files (tens or hundreds of GB); merging lots of files; batch post-processing; maintaining generality.
  • Slide 37
  • Big files: results cache. A user-definable memory pool is used to cache results: results read from files (single, multiple or merged) and created results (cuts, extrusions, Tcl-generated and temporal results); mesh information is kept separately.
  • Slide 38
  • Big files: results cache. [Diagram: a results cache table of RC entries with timestamps; each result's RC info records file, offset, type and memory footprint (file 1 … file n); an open-files table stores file, handle and type; the cache works at the granularity of a single result.]
  • Slide 39
  • Big files: results cache. GiD verifies the results file(s) and records each result's position in the file and its memory footprint. The results of the latest analysis step are kept in memory; other results are loaded on demand, the oldest ones are unloaded when space is needed, and entries are touched on use. A sketch of this policy follows below.
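A minimal sketch of that policy (load on demand, touch on use, evict the oldest when the pool is full), assuming a classic LRU list over a hash table; all names are illustrative, not GiD's actual implementation.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

class ResultsCache {
public:
    explicit ResultsCache(std::size_t budget_bytes) : budget_(budget_bytes) {}

    // Return a result's values, loading them from file if not cached.
    const std::string& get(const std::string& key) {
        auto it = table_.find(key);
        if (it != table_.end()) {                  // hit: touch on use
            lru_.splice(lru_.begin(), lru_, it->second.pos);
            return it->second.data;
        }
        std::string data = load_from_file(key);    // miss: load on demand
        used_ += data.size();
        while (used_ > budget_ && !lru_.empty()) { // evict oldest results
            Entry& victim = table_[lru_.back()];
            used_ -= victim.data.size();
            table_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(key);
        Entry& e = table_[key];
        e.data = std::move(data);
        e.pos  = lru_.begin();
        return e.data;
    }

private:
    struct Entry {
        std::string data;                          // raw result values
        std::list<std::string>::iterator pos;      // position in LRU list
    };
    // Stub for the sketch: the real code would seek to the offset
    // recorded in the open-files table and read the result's values.
    std::string load_from_file(const std::string&) {
        return std::string(1024, 'x');
    }
    std::size_t budget_, used_ = 0;
    std::list<std::string> lru_;                   // front = most recent
    std::unordered_map<std::string, Entry> table_;
};

int main() {
    ResultsCache cache(4096);
    cache.get("pressure@step_1");   // miss: loaded from "file"
    cache.get("pressure@step_1");   // hit: touched, stays newest
    return 0;
}
```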
  • Slide 40
  • Big files: results cache. Chinese harbour example: 104 GB results file, 7.6 million tetrahedra, 2,292 time steps; 3.16 GB of memory used (2 GB results cache).
  • Slide 41
  • Big files: results cache. Chinese harbour example: 104 GB results file, 7.6 million tetrahedra, 2,292 time steps; 3.16 GB of memory used (2 GB results cache).
  • Slide 42
  • Merging many partitions. Before: 2, 4, …, 10 partitions; now: 32, 64, 128, … partitions of a single volume mesh. Every derived calculation is postponed: skin extraction, finding boundary edges, smoothed normals, neighbour information, creation of graphical objects.
  • Slide 43
  • Merging many partitions. Telescope example, 23,870,544 tetrahedra. Before: 32 partitions in 24 min 10 s. After: 32 partitions in 4 min 34 s, 128 partitions in 10 min 43 s. Single file: 2 min 16 s.
  • Slide 44
  • Merging many partitions
  • Slide 45
  • Merging many partitions. Racing car example, 103,671,344 tetrahedra. Before: 96 partitions in more than 5 hours. After: 96 partitions in 51 min 21 s. Single file: 13 min 25 s.
  • Slide 46
  • Memory usage. Around 12 GB of memory used, with a spike of 15 GB (MS Windows) or 17.5 GB (Linux), including: the volume mesh (103 million tetrahedra), the skin mesh (6 million triangles), several surface and cut meshes, the stream-line search tree, 2 GB of results cache, and animations.
  • Slide 47
  • Pictures
  • Slide 48
  • Pictures
  • Slide 49
  • Pictures
  • Slide 50
  • Batch post-processing: off-screen. GiD runs with no interaction and no window. Command line: gid -offscreen [WxH] -b+g batch_file_to_run. Useful to launch costly animations in the background or in a queue, to use GiD as a template generator, or to use GiD behind a web server (Flash Video animation). The animation window now has a button that generates a batch file for off-screen GiD, ready to be sent to a batch queue. An example invocation follows below.
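Following the command-line syntax above, a costly animation could for instance be rendered in a queue with `gid -offscreen 1920x1080 -b+g make_animation.bch`, where the resolution and the batch-file name are placeholders; the window-less process then writes the animation without needing a display.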
  • Slide 51
  • Animation
  • Slide 52
  • Overview: Introduction; Preparation and Simulation (More Efficient Partitioning, Parallel Element Splitting); Post Processing (Results Cache, Merging Many Partitions, Memory Usage, Off-screen Mode); Conclusions, Future Lines and Acknowledgements
  • Slide 53
  • Conclusions. The implemented improvements helped us achieve the milestone: prepare, mesh, calculate and visualize a CFD simulation with 103 million tetrahedra. In GiD, even modest machines profit from these improvements.
  • Slide 54
  • Future lines. Faster tree creation for stream lines (currently ~90 s creation time and 2-3 s per stream line). Mesh simplification and level-of-detail (LOD) criteria for geometry and results: faster drawing of surface meshes, iso-surfaces and cuts; faster cuts and stream lines on volume meshes; near real-time interaction. Parallelization of other GiD algorithms: skin and boundary-edge extraction; parallel creation of cuts and stream lines.
  • Slide 55
  • Challenges. 10^9-10^10 tetrahedra, 6×10^8-6×10^9 triangles. A large workstation with Infiniband to the cluster and 80 GB or 800 GB of RAM? Hard disk? Post-process as the backend of a web server on the cluster? Security issues? Post-process embedded in the solver? Output of both the original mesh and a simplified one?
  • Slide 56
  • Acknowledgements. Ministerio de Ciencia e Innovación, E-DAMS project; European Commission, Real-time project.
  • Slide 57
  • Comments, questions… ?
  • Slide 58
  • Thanks for your attention. Scalable System for Large Unstructured Mesh Simulation.