HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering...

15
2018 NDIA GROUND VEHICLE SYSTEMS ENGINEERING AND TECHNOLOGY SYMPOSIUM Modeling & Simulation, Testing and Validation (MSTV) Technical Session August 7-9, 2018 – Novi, Michigan HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO-SIMULATION OF VEHICLE–TERRAIN INTERACTION Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Dan Negrut Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI ABSTRACT Current modeling and simulation capabilities permit tackling complex multi-physics problems, such as those encountered in ground vehicle mobility studies, using high-fidelity physics-based models for all involved subsystems, including the vehicle, tires, and deformable terrain. However, these come at significant computational burden; research and development on new software architecture and paral- lelization techniques is crucial in enabling such predictive simulation capabilities to be useful in design of new vehicles or in operational settings. In this paper, we describe the architecture, philosophy, and implementation of a distributed message- passing-based granular terrain simulation capability and its incorporation into an explicit force– displacement co-simulation framework to enable effective simulation of multi-physics mobility prob- lems. We demonstrate that the proposed infrastructure has good parallel scaling characteristics and can thus effectively leverage available computing resources. Furthermore, we show that the outer com- munication layer, also implemented with a message passing approach, is effective and adds negligible overhead. 1 INTRODUCTION Multibody system dynamics models and tools for vehicle simulation, both commercial and open- source, are nowadays at very high levels of ma- turity and sophistication and are routinely used, both in research and industry, for design and anal- ysis of ground vehicles, particularly for on-road scenarios. However, simulations of off-road vehi- cles continue to rely heavily on empirical or semi- empirical methods that, while computationally ex- peditious, lack the level of fidelity necessary to cap- ture all effects in complex vehicle – terrain interac- tions. Relatively recent advances in terramechan-

Transcript of HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering...

Page 1: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

2018 NDIA GROUND VEHICLE SYSTEMS ENGINEERING AND TECHNOLOGY SYMPOSIUMModeling & Simulation, Testing and Validation (MSTV) Technical Session

August 7-9, 2018 – Novi, Michigan

HIGH PERFORMANCE COMPUTING FRAMEWORK FORCO-SIMULATION OF VEHICLE–TERRAIN INTERACTION

Radu SerbanDept. of Mechanical Engineering

University of Wisconsin - MadisonMadison, WI

Nicholas OlsenDept. of Mechanical Engineering

University of Wisconsin - MadisonMadison, WI

Dan NegrutDept. of Mechanical Engineering

University of Wisconsin - MadisonMadison, WI

ABSTRACT

Current modeling and simulation capabilities permit tackling complex multi-physics problems, suchas those encountered in ground vehicle mobility studies, using high-fidelity physics-based models forall involved subsystems, including the vehicle, tires, and deformable terrain. However, these come atsignificant computational burden; research and development on new software architecture and paral-lelization techniques is crucial in enabling such predictive simulation capabilities to be useful in designof new vehicles or in operational settings.

In this paper, we describe the architecture, philosophy, and implementation of a distributed message-passing-based granular terrain simulation capability and its incorporation into an explicit force–displacement co-simulation framework to enable effective simulation of multi-physics mobility prob-lems. We demonstrate that the proposed infrastructure has good parallel scaling characteristics andcan thus effectively leverage available computing resources. Furthermore, we show that the outer com-munication layer, also implemented with a message passing approach, is effective and adds negligibleoverhead.

1 INTRODUCTION

Multibody system dynamics models and tools forvehicle simulation, both commercial and open-source, are nowadays at very high levels of ma-turity and sophistication and are routinely used,both in research and industry, for design and anal-

ysis of ground vehicles, particularly for on-roadscenarios. However, simulations of off-road vehi-cles continue to rely heavily on empirical or semi-empirical methods that, while computationally ex-peditious, lack the level of fidelity necessary to cap-ture all effects in complex vehicle – terrain interac-tions. Relatively recent advances in terramechan-

Page 2: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

ics modeling and associated simulation softwareare motivating a push towards physics-based, high-fidelity modeling and simulation of the overall off-road mobility problem, including the vehicle, tires,and soil. An example of efforts in this direction isthe ongoing work on producing a next generationof the NATO Reference Mobility Model [1] usinga completely physics-based solution, including full3-D vehicle multibody models and high-fidelity soiland terrain representations [2].

While solutions coupling high-fidelity vehiclemodels with fully-resolved granular dynamics ter-rain models have been demonstrated to capturethe effects important for predictive mobility stud-ies [3, 4, 5], they are computationally very inten-sive; this is especially the case for simulations cou-pling Finite Element Analysis (FEA) for tire mod-eling with Discrete Element Method (DEM) forgranular soil representation. Monolithic simula-tions are prohibitively expensive and undesirablylimit freedom in selecting appropriate numericalmethods and/or parallelization technique due tothe inherent multi-physics nature of the problem.Furthermore, solutions such as using a movingpatch approach to decrease the computational costof deformable soil simulations are not always ap-propriate (for example, when interested in multi-pass effects or simulations of vehicle convoys).

Limitations imposed by monolithic solutions canbe alleviated by adopting a co-simulation approachto the multi-physics mobility problem, allowing in-dependent and concurrent simulation of subsys-tems naturally separated by their physical char-acteristics (vehicle – tire – terrain). Such tech-niques have been demonstrated to be suitable foreffectively calculate mobility metrics such as draw-bar pull and motion resistance, investigate de-formable tire behavior (strain and stress distri-butions), and quantify multi-pass effects on ter-rain compaction and vehicle performance [3]. Aspart of previous work, we introduced a softwareframework which implements an explicit force–

displacement co-simulation technique to decouplethe global problem involving a vehicle multibodysystem, flexible FEA-based tires, and deformableterrain modeled using DEM techniques [4]. Thatsolution relies on multi-core parallelization of thegranular terrain simulation, implemented throughOpenMP [6]. While it was demonstrated that thispermits effectively advancing the state of all in-volved subsystems independently and in parallel,thus reducing the overall time to solution to thecost of the most expensive subsystem, it was alsonoted that scenarios involving large-scale terrainpatches quickly lead to the granular dynamics sim-ulation becoming the computational bottleneck.This is due to the fact that multi-core paralleliza-tion does not scale well beyond a relatively smallnumber of threads owing to cache coherence andfalse-sharing issues [7]. Other single-node alterna-tives, such as using GPU acceleration for the gran-ular dynamics simulation are still limited in scala-bility, primarily due to the available device mem-ory, computational cost of out-of-core implemen-tations, or complexity of multi-GPU approaches.

The work presented herein was therefore moti-vated by the desire to provide a fully scalable solu-tion for granular terrain simulations (limited onlyby available computing resources) while maintain-ing the advantages provided by a co-simulation ap-proach to the vehicle–tire–terrain global problem.Both strong and weak scalability of the proposedsolution are important and both are consideredherein. In the context of a co-simulation frame-work, the final goal in establishing a scalable gran-ular dynamics capability is ensuring that, in allscenarios of interest, the computational bottleneckis always the simulation of one flexible FEA-basedtire. While this calculation can be (and usuallyis) expensive itself, this is a fixed computationalcost. Furthermore, the subsystem separation pro-vided by the co-simulation approach allows workon improving efficiency of FEA calculations to oc-cur as a separate task; this is the object of ongoing

HPC framework for co-simulation of vehicle–terrain interactionPage 2 of 15

Page 3: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

related work.

The outcome is a hybrid platform which im-plements parallelism at multiple levels, includinga distributed granular dynamics solver based ona domain-decomposition approach and leveragingmulti-core parallel computing on each subdomain.This co-simulation framework relies on MessagePassing Interface (MPI) [8] for the data exchangein the outer co-simulation layer and on a hybridMPI-OpenMP parallelization for highly-scalableterrain simulation.

This paper is organized as follows. In the nextsection we describe the overall approach and thearchitecture of the co-simulation framework andprovide a brief overview of the vehicle and flexi-ble tire modeling and simulation capabilities avail-able in Chrono. Section 3 discusses the hybriddistributed/multi-core parallel solution techniquefor large-scale granular dynamics problems, basedon distributed domain-decomposition coupled withmulti-core parallelization of each local subproblem.We provide results from numerical experiments inSection 4 where the emphasis is on computationaland parallel efficiency. We conclude the paper inSection 5 where we also discuss related ongoingand planned research and development efforts.

2 TECHNICAL APPROACH

The co-simulation framework described here wasimplemented within the Chrono multi-physicspackage [9, 10]. Developed through a joint uni-versity effort, Chrono is available under a BSD-3license [11] and is distributed as a suite of mid-dleware libraries organized in separate functionalmodules. The core module provides support forthe modeling, simulation, and visualization of rigidmultibody systems; optional modules provide sup-port for additional classes of problems (e.g., fi-nite element analysis and fluid-solid interaction),for modeling and simulation of specialized systems(such as ground vehicles and granular dynamics

problems), provide specialized parallel computingalgorithms for large-scale simulations (multi-core,GPU, and distributed), or offer additional utilitiesfor post-processing and run-time visualization.

As an open-source, freely available softwarepackage, Chrono fosters collaborative developmentand offers a convenient framework for exploringand developing new simulation and architectureparadigms, such as the one discussed in this pa-per. Furthermore, its permissive licensing modelenables cost-effective design of experiment analy-ses allowing parallel high throughput computingsimulations, relevant in applications such as off-road vehicle mobility assessment.

We provide next an overview of the hybrid MPI-MPI-OpenMP co-simulation framework proposedas a solution to simulating complex vehicle–terraininteraction in which the terrain is representedthrough a Discrete Element Method (DEM) on do-mains of practically arbitrary dimensions and witharbitrary levels of resolution of the granular ma-terial (of course, modulated by available comput-ing resources and numerical limitations inherentto simulating particles at very small scale). Wethen briefly list the Chrono modules that enablesimulation of the various subsystems consideredhere, in particular its ground vehicle modeling ca-pabilities through the Chrono::Vehicle module andthe Chrono::FEA finite element analysis (FEA)module used to model and simulate flexible tires.We concentrate most of the discussion on efficientparallelization of granular dynamics simulations,focusing in particular on the design, architec-ture, and implementation of Chrono::Distributed,a new module targeted at distributed simulationof many-body dynamics with friction and contact.

2.1 High-level framework overview

The proposed co-simulation framework (see Fig. 1)relies on two MPI layers, one for implementing theco-simulation data exchange and an inner layer for

HPC framework for co-simulation of vehicle–terrain interactionPage 3 of 15

Page 4: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

the domain-decomposition-based distributed gran-ular terrain simulation, and a multi-core parallelcomputing layer used to accelerate FEA calcula-tions for the tire subsystems as well as the localgranular dynamics problems in each terrain sub-domain.

The top level of the co-simulation framework is athree-way communication mechanism between thevehicle subsystem, the four (or more) tire subsys-tems, and the terrain. This layer implements anexplicit force–displacement co-simulation scheme,wherein the vehicle exchanges wheel state and tireforces with each individual tire subsystem and eachtire exchanges (deformable or rigid) mesh nodestates and contact forces with the terrain subsys-tem. The current implementation relies on MPI(see Fig. 1) but could be implemented with a dif-ferent mechanism.

Decoupling the simulation of the inherentlymulti-physics off-road mobility problem into vehi-cle, tire, and deformable terrain subsystems allowsgreater flexibility in accelerating the simulation.First, each subsystem can evolve its internal dy-namics with a suitable integration time step, whichcan be different between the various subsystemsand dictated by accuracy and stability consider-ation for that subsystem. Second, each subsys-tem can employ a different numerical integrationscheme, suitable for its specific dynamics problem;for instance, we use an adaptive HHT (Hilber-Hughes-Taylor) scheme in Chrono::FEA while us-ing a semi-implicit Euler scheme for granular dy-namics problems. Finally, and most importantly inthe context of this work, a co-simulation approachpermits leveraging different and independent par-allelization techniques for each subsystem, as dic-tated by the structure, type, and characteristic ofeach subproblem.

Computation of internal forces and Jacobiansin Chrono::FEA, particularly compute-intensivewhen using the Absolute Nodal Coordinate For-mulation (ANCF), leverages shared memory multi-

core parallelism through OpenMP. Granular ter-rain simulation can be performed either ina shared memory, OpenMP-based parallel set-ting, or else in a distributed fashion using theChrono::Distributed hybrid parallel model. As de-scribed below in Section 3 and demonstrated inSection 4.1, the latter offers greatest scalabilityand parallel performance.

2.2 Vehicle and tire simulation

Chrono::Vehicle [12] is a specialized Chrono mod-ule which provides a collection of templates (pa-rameterized models) for various topologies of bothwheeled and tracked vehicle subsystems, as wellas support for closed-loop and interactive drivermodels, and run-time and off-line visualizationof simulation results. Chrono::Vehicle leveragesand works in tandem with other Chrono mod-ules, such as Chrono::FEA for finite element sup-port; Chrono::Granular for granular dynamics sup-port; Chrono::Irrlicht and Chrono::OpenGL forrun-time visualization; and Chrono::Parallel andChrono::Distributed for parallel computing sup-port.

Chrono::Vehicle provides a comprehensive setof vehicle subsystem templates (for tires, suspen-sions, steering mechanisms, drivelines, sprockets,track shoes, etc.), templates for external systems(for powertrains, drivers, terrain models), and ad-ditional utility classes and functions for vehiclevisualization, monitoring, and collection of sim-ulation results. As a C++ middleware library,Chrono::Vehicle requires the user to provide classesfor a concrete instantiation of a particular tem-plate. An optional Chrono library provides com-plete sets of such concrete C++ classes for a fewground vehicles, both wheeled and tracked, whichcan serve as examples for other specific vehiclemodels. An alternative mechanism for definingconcrete instantiation of vehicle system and sub-system templates is based on input specification

HPC framework for co-simulation of vehicle–terrain interactionPage 4 of 15

Page 5: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Figure 1: Schematic of the hybrid MPI-MPI-OpenMP co-simulation framework. The outer MPI layer is responsible fororchestrating the inter-system co-simulation communication. The inner MPI-OpenMP layer is responsible for the hybridparallel granular terrain simulation.

files in the JSON format.

The particular off-road vehicle used for the nu-merical experiments in Section 4 was a four-wheeldrive off-road vehicle with independent double-wishbone suspension and Pitman arm steering (seeFig. 2).

Chrono::Vehicle currently supports three differ-ent categories of tire models: rigid, semi-empirical,and finite element. Rigid tires can be modeled ascylindrical shapes or else as non-deformable tri-angular meshes. From the second class of tiremodels, Chrono::Vehicle provides template imple-mentations for Pacejka (89 and 2002), Fiala, Lu-gre, and TMeasy tire models, all suitable for ma-neuvers on rigid terrain. Finally, the third classof tire models offered are full finite element rep-resentations of the tire, using either ANCF orReissner shell elements. The ANCF-based flex-ible tire model in Chrono::Vehicle, used in thisstudy, is based on the laminated ANCF shell el-

Figure 2: Wheeled vehicle with double wishbone suspen-sions and Pitman arm steering.

HPC framework for co-simulation of vehicle–terrain interactionPage 5 of 15

Page 6: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

ement [13, 14]. Additional details on the actualChrono implementation of this element can befound in [15].

2.3 Granular terrain simulation

While providing a full-fledged multibody simula-tion framework, a unique characteristic of Chronolies with its mature support for frictional con-tact. For such problems, Chrono implements bothnon-smooth (NSC) and smooth (SMC) contact ap-proaches. The former relies on a complementarityformulation which, in conjunction with a Coulombfriction law, results in a Differential VariationalInequality (DVI) form of the equations of mo-tion. The smooth contact method (also known asa penalty method) can be viewed as regulariza-tion approach wherein normal and tangential con-tact forces are based on local body deformationat the contact point. NSC and SMC also differ interms of their modeling capabilities, parameteriza-tions, as well as in their computational complexityand amenability to parallel computing. The twoapproaches to frictional contact implemented inChrono have been compared and validated in [16].For further details on NSC, the reader is directedto [17, 18]. The SMC formulation is discussedin [19, 20].

Large-scale simulations of granular material(i.e., problems with millions of bodies interactingthrough contact and friction), where each parti-cle is considered individually and all inter-particleinteractions accounted for, are collectively knownunder the name Discrete Element Method (DEM).Because NSC formulations for frictional contactare relatively more recent, the term DEM is usuallyassociated with SMC-type formulations. To distin-guish between the two, here we use the acronymsDEM-C (for complementarity) and DEM-P (forpenalty).

As mentioned above, a distinguishing character-istic of the two formulations (NSC and SMC) and

therefore of the two corresponding DEM flavors isthe type of resulting equations of motion. WhileDEM-C requires the solution of a global optimiza-tion problem (obtained after manipulations of theDVI equations of motion), inclusion of frictionalcontact forces in DEM-P is a local process, on aper-contact basis. As such, DEM-P is much easierto parallelize, particularly in a distributed comput-ing environment.

3 PARALLEL GRANULAR DYNAMICS

In order to enable large-scale, high-resolutiongranular dynamics simulations, we developedthe Chrono::Distributed module of Chrono.Chrono::Distributed wraps existing OpenMP sup-port in Chrono::Parallel with MPI support fora hybrid parallel solution to large-scale com-putational granular dynamics. The intent ofChrono::Distributed is to break a large granu-lar simulation into many smaller, mostly-disjointChrono::Parallel simulations which run on power-ful many-core nodes with fast interconnect. Onthis type of architecture, Chrono uses parallelismat three levels to utilize a large number of CPUcores and memory and accelerate simulation: MPIfor domain-decomposition, OpenMP for sharedmemory parallelization of the local problem oneach processor, and SIMD instruction-level paral-lelism within each local thread.

3.1 Multi-core granular dynamics

Chrono::Parallel, an optional module of Chrono,provides multi-core shared-memory support forgranular dynamics using OpenMP directives andthe Thrust library with the OpenMP back-end.

Chrono::Parallel is constructed so that it relieson the core Chrono functionality for its model-ing capabilities, while using different data struc-ture and algorithms more suited to OpenMPshared-memory parallelism. Most importantly,

HPC framework for co-simulation of vehicle–terrain interactionPage 6 of 15

Page 7: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Chrono::Parallel uses a single data container whichimplements a Structure of Arrays (SoA) scheme.This contrasts with Chrono’s main Array of Struc-tures (AoS) representation of data. The SoA formis much more amenable to loop parallelism and toSIMD capabilities on modern CPUs. Dependen-cies of Chrono::Parallel include OpenMP [6] (forparallel for loops), Thrust [21] (for parallel sort,scan, and reduction algorithms), and Blaze [22](for high-performance sparse matrix representa-tion).

Chrono::Parallel gives the user access to alarge set of the modeling elements availablein Chrono; in particular, these include rigidbodies with frictional contact (both penalty-and complementarity-based), arbitrary collisionshapes, kinematic joints, force elements (springs,actuators, etc.), as well as 1-D shaft elements re-quired for simulation of complete Chrono::Vehiclemodels of both wheeled and tracked vehicle sys-tems. Chrono::Parallel also provides implemen-tations of most DVI solvers in Chrono (includ-ing Barzilai-Borwein and Nesterov-type solvers),but the module is currently limited to a semi-implicit Euler time stepping scheme. Because ofthis restriction, finite-element models cannot becurrently tackled with Chrono::Parallel directly.

Another module, Chrono::Granular, provides anumber of utility functions for specification ofgranular materials with either preset or user-defined distributions of material attributes. Theseinclude specifications of particle size and shape,physical properties, and contact material charac-teristics. Chrono::Granular also provides efficientspatial sampling for initializing granular materialin prismatic, cylindrical, and spherical domains ac-cording to regular grids, hexagonal close-packing,and Poisson disk sampling (which provides uni-formly distributed initial positions with guaran-teed minimum separation).

Parallel collision detection. Both DEM-P andDEM-C approaches require a mechanism for col-

lision detection, a process responsible for produc-ing a geometric characterization of interactions ofcolliding shape pairs at any given system configu-ration. This characterization includes the pair ofclosest points, normal, radii of curvature at thecontact point, etc. The most basic approach tocollision detection naively tests all pairs of con-tact shapes in the system, which gives it O(n2)complexity and therefore renders it impractical forlarge-scale problems. In order to improve upon thecomplexity of the naive algorithm, most collisiondetection systems use (at least) a two-phase pro-cess. The first phase, called broad-phase, quicklydismisses pairs of shapes that are clearly not in-teracting. In order to do this efficiently, broad-phase uses bounding shapes (spheres, axis-alignedboxes, or object-oriented boxes) and specializeddata structures like dynamic trees or hierarchicalgrids to spatially organize collision shapes; this re-duces the set of possible interactions to only in-clude pairs of shapes near to each other. In a sec-ond narrow-phase, the pairs of shapes identifiedduring broad-phase are closely examined geomet-rically, either using analytical methods for shapeswith simple geometry, or else with algorithmslike Gilbert-Johnson-Keerthi (GJK) or Minkovski-Portal-Refinement (MPR) which are able to de-termine the contact characterization for pairs ofgeneral convex shapes. Depending on their repre-sentation, concave collision shapes may require aconvex decomposition pre-processing step.

The broad-phase in Chrono::Parallel implementsa binning algorithm (see [23] for implementationdetails), suitable for both multi-core and GPUshared-memory parallel environments, and analyt-ical algorithms with a fall-back on MPR for thenarrow-phase collision detection.

3.2 Distributed granular dynamics

Chrono::Distributed is a relatively minimal mod-ule which provides synchronization functionality in

HPC framework for co-simulation of vehicle–terrain interactionPage 7 of 15

Page 8: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

order to coordinate Chrono::Parallel simulations,which each run on their own MPI rank. In thedistributed framework, the predefined spatial sim-ulation domain is broken into subdomains alongan axis and each subdomain is assigned to a rank.Each rank is then only responsible for computa-tions on the bodies in its designated subdomainand in a thin border layer with its neighbor sub-domains. This boundary layer extends slightlyinto both subdomains and is populated with bod-ies that are seen on both MPI ranks. In orderto maintain consistency between neighbor subdo-mains Chrono::Distributed stores a status for eachbody on each rank to denote whether informationfor that body must be shared with neighboringranks. These statuses are induced by a spatialbreakdown of each subdomain as show in Fig. 3.Each subdomain consists of three types of regions:owned, shared, and ghost. Bodies in the ownedregion are seen only by that rank and no synchro-nization is necessary for them. Shared bodies arethose bodies still in the rank’s subdomain, but thatare close enough to affect bodies in the neighbor-ing subdomain. They are advanced in time by thisrank, but are represented as a proxy body on theneighboring rank. Finally, bodies in the ghost re-gion lie outside of the subdomain, but are closeenough to affect bodies in the subdomain. Theyare advanced in time by the neighboring rank, butexist as proxies in order to couple the neighboringsubsystems. Note that shared bodies on one rankare represented by proxy bodies on another, andvice versa.

Communication between ranks is done at theend of each time-step, at which point all bodieson each rank are classified into the aforementionedstatuses based on their location. A series of MPImessages between ranks then (i) creates new proxybodies for bodies which have moved into a borderregion, (ii) deletes proxy bodies for bodies whichhave moved out of a border region, and (iii) up-dates proxy bodies from the appropriate rank in

Figure 3: Regions of an internal subdomain i inChrono::Distributed. In addition to bodies in its ownedregion, subdomain i also integrates the states of bodies inghost layers of adjacent subdomains.

order to maintain continuity between all ranks.MPI communication consists of packing buffers ofMPI custom data types and sending them point-to-point between neighboring pairs of ranks. In theframework, it is important to note that communi-cation is never global, and that each rank need onlycommunicate with a maximum of two neighbors.This prevents major scaling penalties associatedwith collective MPI transactions in large commu-nicators.

Chrono::Distributed does not require a special-ized distributed collision detection algorithm. In-stead, in line with its design philosophy, it pro-vides enough state information about the systemthrough appropriate replication of shared bodieswith proxies on neighboring MPI ranks to simplyuse the existing Chrono::Parallel shared memoryparallel collision detection described above, in par-allel and independently on each processor.

HPC framework for co-simulation of vehicle–terrain interactionPage 8 of 15

Page 9: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

4 NUMERICAL EXPERIMENTS

The primary focus of the simulations and numer-ical experiments presented herein was on scala-bility and parallel efficiency of overall solutionand distributed terrain simulation within the co-simulation framework. We do not address the is-sue of performance of the Chrono::FEA module(which, as shown below can and does become thecomputational bottleneck); while a very importantaspect in itself and critical to the effectiveness ofthe proposed approach, this falls outside the scopeof this paper but is the object of ongoing work.Similarly, we do not concentrate on using the pro-posed framework for conducting actual mobilitystudies for off-road vehicles on deformable terrain.The ability of capturing relevant mobility metricswith a simulation capable of complex, high-fidelityrepresentation of a vehicle model, flexible tires, anddeformable granular soil has already been demon-strated in [3, 4].

All numerical experiments reported here wereconducted on a Cray XC30 system. The computenodes are equipped with 12-core IntelR© XeonR© E5-2697 v2 processors and are interconnected with adedicated Cray Aries high-speed network.

4.1 Scalability properties

We assessed the scalability characteristics ofChrono::Distributed, first as a stand-alone gran-ular dynamics solver and second within the co-simulation framework. We present results and an-alyze both strong and weak scalability. The formeris concerned with the change in solution time as thenumber of computing resources is increased on aproblem of fixed total size; in the context of off-roadvehicle mobility, good strong scaling properties arerelevant in accelerating solution of a given maneu-ver on a fixed terrain domain with fixed soil char-acteristics when resource allocation is increased.Weak scaling is defined as the variation in solutiontime with number of processors for a fixed prob-

lem size per processor; good weak scaling proper-ties translate in the ability to increase resolutionof terrain representation proportional to computeresource allocation, with minimal effect on time tosolution.

Denoting with T (n) the time to solution (wall-clock time) for a simulation conducted on n proces-sors, appropriate performance metrics for parallelefficiency can be defined as:

Es(n) =T (1)

nT (n)for strong scaling, (1)

Ew(n) =T (1)

T (n)for weak scaling. (2)

Perfect scaling corresponds to E(n) = 1 for all n.Taking into account the current limitations of

Chrono::Distributed, most notably the lack of dy-namic load balancing, a scalability analysis wasperformed on a settling simulation of layered gran-ular material. This particular test was also se-lected because it is representative of the use ofChrono::distributed for simulation of granular ter-rain for mobility studies. Strong and weak scalingperformance of Chrono::Distributed is summarizedin Tables 1 and 2, respectively. These simulationscorrespond to 2 s of settling for the correspond-ing number of particles in a domain of width 0.5m and a base length of 4 m. For the weak scal-ing experiments, the computational domain lengthwas increased proportionally with the number ofprocessors (MPI ranks), up to 128 m when using32 processors. In all cases, the granular materialconsisted of identical spheres of radius 1.25 mm,initialized in layers with uniformly distributed po-sitions in each layer in a rectangular grid.

The main reason for departures from a per-fect parallel efficiency value E(n) = 1 in domain-decomposition architectures such as the one imple-mented in Chrono::Parallel is the increased costof inter-process communication (more interfaceboundary in the strong scaling experiments ormore data per message in the case of weak scal-

HPC framework for co-simulation of vehicle–terrain interactionPage 9 of 15

Page 10: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

MPIRanks

Particles Ratio Time (s) E(n)

1 1,236,372 1 23,489.5 -

2 1,236,372 1 11,826.7 0.993

4 1,236,372 1 5,954.3 0.986

8 1,236,372 1 2,772.5 1.059

16 1,236,372 1 1,439.5 1.020

32 1,236,372 1 712.3 1.031

Table 1: Strong scaling results on settling problem. The

parallel efficiency is defined as: Es(n) = T (1)nT (n) . Wall-clock

times represent the time to simulate the process for 1 s.

MPIRanks

Particles Ratio Time (s) E(n)

1 1,236,372 1 23,489.5 -

2 2,472,744 2 23,901.0 0.983

4 4,944,700 4 23,997.6 0.979

8 9,889,400 8 24,099.4 0.975

16 19,778,012 16 24,406.5 0.962

32 39,555,236 32 24,480.8 0.960

Table 2: Weak scaling results on settling problem. Here,

parallel efficiency is defined as: Ew(n) = T (1)T (n) . Wall-clock

times represent the time to simulate the process for 1 s.

ing). As the results in Tables 1 and 2 indicate,the implementation of the communication layer inChrono::Distributed, most notably the judicioususe of asynchronous (immediate) point-to-pointMPI communication, results in minimal commu-nication overhead and thus translates in parallelefficiencies close to 1. It is worth noting that,in some instances, strong scaling yields efficienciesabove 1.0. The reasons for this are (i) the slightlynonlinear scaling of the broad-phase collision de-tection phase in Chrono::Parallel, which performsbetter on small problem sizes, and (ii) likely bettercache utilization on smaller local problem sizes.

4.2 Full vehicle on deformable terrain

To demonstrate the capabilities of the proposedframework and the effectiveness of a distributedsimulation of the underlying granular terrain sub-system, we chose to simulate the following two sce-narios:

• Acceleration test of a vehicle with flexible tireson granular terrain (Fig. 4a). The goal ofthis example is to demonstrate that, beyonda certain point, the resolution of the granu-lar terrain representation does not affect over-all performance (given enough computing re-sources). This is due to the fact that the timeevolution of the granular terrain subsystemoccurs at a fraction of the cost of advancingthe dynamics of a single flexible FEA-basedtire subsystem.

• Double lane change (DLC) maneuver of a ve-hicle with rigid-mesh tires on granular terrain(Fig. 4b). The purpose of this example isto eliminate the limiting effect of FEA-basedtires on overall time to solution and demon-strate the strong scalability characteristics ofthe distributed terrain simulation within thecontext of the proposed co-simulation frame-work.

HPC framework for co-simulation of vehicle–terrain interactionPage 10 of 15

Page 11: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

(a) Acceleration test; ANCF-based flexible tires. (b) Double-lane change; rigid-mesh tires.

Figure 4: Simulation snapshots for off-road vehicle on granular terrain.

Parameters of the two types of simulations aboveare summarized in Table 3, including number ofparticles and particle radius. In all cases, the soilcohesion is set at 100 kPa. The vehicle has an over-all mass of 2550 kg, independent front and rear sus-pension and a Pitman arm steering mechanism.In simulations A and B, each FEA-based tire isslick and is modeled with a 90× 24 mesh of multi-layered, orthotropic ANCF shell elements [24],with the section geometry created by cubic splines.The default tire internal pressure is assumed to be200 kPa. Rigid-mesh tires in experiments C andD are modeled with a 2× 90× 24 fixed triangularmesh. In the acceleration tests, the vehicle throttleinput is ramped to 80% over a duration of 0.2 s andkept constant afterwards; steering and braking in-puts are maintained at 0%. The DLC simulationsrely on a driver controller which adjusts steeringinput to follow the path depicted in Fig. 4b whilethrottle and braking inputs are adjusted to main-tain a constant speed of 10 m/s.

The good parallel scaling characteristics ofChrono::Distributed (see Section 4.1) are main-tained when integrating this module within theco-simulation framework for the simulation of theterrain subsystem. This is illustrated by the tim-ing results presented in Table 3. These simulationspresented in this section were set up as to permit

a scaling assessment of the underlying distributedterrain simulation and an evaluation of the over-all co-simulation framework performance when in-creasing computing resources. Indeed, the acceler-ation tests using flexible tires (columns A and B)can be used to estimate the weak scaling efficiency,with an adjusted metric defined as:

E ′w =

T (nA)

T (nB)· P (nB)/nB

P (nA)/nA

, (3)

where P (n) is the total problem size (number ofparticles) solved with n processors. Consideringonly the terrain subsystem, we obtain E ′

w = 0.852;the decreased efficiency can be explained by theload imbalance introduced by the additional proxybodies representing the tire mesh within the ter-rain subsystem. With the setup considered here,in both cases A and B advancing the state ofthe terrain subsystem between two co-simulationcommunication points is done completely in “theshadow” of the flexible tire simulations; in fact,such simulations could have been performed withsignificantly higher granular resolution (or alterna-tively on many fewer MPI ranks) without affectingthe time to solution. Furthermore, comparing theoverall wall-clock time to the cost of the slowesttire node, it can be seen that the communicationoverhead in the outer MPI layer – dedicated to im-plementing the force–displacement co-simulation

HPC framework for co-simulation of vehicle–terrain interactionPage 11 of 15

Page 12: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

Problem setup and simulation parametersA B C D

Maneuver Acceleration Acceleration DLC DLC

Tire model ANCF ANCF Rigid Rigid

Domain size [m × m] 8× 3 8× 3 110× 6 110× 6

Particle radius[mm] 12.5 10.0 12.5 12.5

Number particles 283,162 591,090 6,513,518 6,513,518

Step-size [ms] 0.04 0.04 0.04 0.04

Number MPI ranks 5 + 8 5 + 16 5 + 16 5 + 32

Average timing information, per-step [ms]A B C D

Vehicle subsystem 0.93 0.93 0.92 0.92

Terrain subsystem 348.03 391.57 3953.67 1992.25

Tire 0 (front-left) 3467.53 3072.81 1.37 1.39

Tire 1 (front-right) 3435.08 3444.10 1.42 1.41

Tire 2 (rear-left) 3483.08 3488.41 1.46 1.41

Tire 3 (rear-right) 3472.58 2977.90 1.44 1.39

Overall 3547.53 3545.19 3966.21 2002.35

Table 3: Problem setup and average per-step timing for the 4 vehicle co-simulation experiments. All simulations used astep-size of 4 · 10−5 s. Simulations A and B involved ANCF-based flexible tires, while C and D used rigid-mesh tires. Alltiming information is reported in milliseconds and represent cost per co-simulation step.

data exchange – is minimal and represents only1.6% – 1.8% of the overall time to solution.

Similarly, the double-lane change simulationspresented in Section 4.2 can be used to assessthe strong scaling characteristics of the distributedterrain simulation when incorporated in the co-simulation framework. This is because these runswere performed using rigid-mesh tires, thus effec-tively taking out of the equation the integrationcost on the tire nodes. To accommodate the factthat timing results are available only on 16 and 32processors in the terrain MPI intra-communicator,we use the modified efficiency metric:

E ′s =

T (nC)

T (nD)· nD

nC

(4)

which, using the timing information in Table 3,results in a value of E ′

s = 0.99 indicating closeto perfect strong scaling of the terrain calculation.Here again, it can be seen that the overhead ofcommunication overhead in the outer MPI layer is

negligible (0.3% – 0.5%).

5 CONCLUSIONS

We have discussed the architecture and design ofa hybrid MPI-MPI-OpenMP co-simulation frame-work and demonstrated its applicability to large-scale off-road vehicle mobility simulations. Takingadvantage of the hardware configuration of currentsupercomputer clusters, the proposed method-ology leverages parallelism at multiple levels,from multiple node distributed MPI, to multi-core shared-memory within a node, and SIMDinstruction-level parallelism within a thread. Scal-ing analysis results, for both stand-alone granulardynamics problems and within the co-simulationframework itself, demonstrate that the proposedsolution can effectively take advantage of avail-able computing resources and enable simulationsof large terrain patches, highly-resolved granularmaterial, or both. This is shown to be possible at

HPC framework for co-simulation of vehicle–terrain interactionPage 12 of 15

Page 13: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

the computational cost dictated by the simulationof a single flexible FEA tire subsystem.

Future work. Chrono::Distributed was designedand architected as a stand-alone Chrono modulededicated to providing an MPI-based distributedparallel simulation capability for generic granularproblems using domain-decomposition. However,in its first incarnation it is tailored to problemsthat are typical of simulating deformable terrain.Most notably, it is currently assumed that thereis little particle migration between subdomains.For the scaling characteristics demonstrated herefor settling simulations to carry over to problemssuch as granular flow, additional work is requiredto implement dynamic load-balancing. Along thesame lines, future development plans include theability to perform domain decomposition alongone, two, or three axes, not necessarily alignedwith the global coordinate frame. Furthermore,Chrono::Distributed will be extended to optionallyuse the DEM-C approach (complementarity-basedfrictional contact formulation) by providing sup-port for the distributed solution of the resultingdifferential-algebraic-variational problem and as-sociated conic-constrained quadratic optimizationproblem.

As demonstrated by the numerical experimentsprovided in this paper, the adoption of a dis-tributed parallel solution for the granular terrainsimulation achieves the stated goal of ensuring thatthe computational bottleneck is transferred to theFEA-based tire simulation. Ongoing work is ded-icated to accelerating and increasing performanceof the Chrono::FEA module by improving the el-ement formulation and implementation, enhanc-ing the numerical integration and implementinga time-adaptive scheme, and exploring alternativeparallelization techniques (multi-core or GPU).

Finally, we plan on enhancing the co-simulationframework itself by formalizing it into a dedicatedgeneral-purpose Chrono module, suitable to ad-dress complex multi-physics problems that incor-

porate multibody systems, rigid and flexible bod-ies, granular material, and fluid phase.

Acknowledgments

Chrono development was supported in part by U.S.Army TARDEC Rapid Innovation Fund grant No.W911NF-13-R-0011, Topic No. 6a, “Maneuver-ability Prediction”. Support for the developmentof Chrono::Vehicle was provided by U.S. ArmyTARDEC grant W56HZV-08-C-0236.

REFERENCES

[1] R. B. Ahlvin and P. W. Haley. NATO Ref-erence Mobility Model: Edition II. NRMMUser’s Guide. US Army Engineer WaterwaysExperiment Station, 1992.

[2] M. McCullough, P. Jayakumar, J. Dasch, ,and D. Gorsich. The next generation nato ref-erence mobility model development. In Proc.8th Americas Regional Conference of the In-ternational Society for Terrain-Vehicle Sys-tems, 2016.

[3] A. M. Recuero, R. Serban, B. Peterson,H. Sugiyama, P. Jayakumar, and D. Negrut.A high-fidelity approach for vehicle mobilitysimulation: Nonlinear finite element tires op-erating on granular material. Journal of Ter-ramechanics, 72:39 – 54, 2017.

[4] R. Serban, D. Negrut, A. M. Recuero, andP. Jayakumar. A co-simulation framework forhigh-performance, high-fidelity simulation ofground vehicle-terrain interaction. Intl. J. Ve-hicle Performance, in press, 2018.

[5] Tamer M Wasfy, Hatem M Wasfy, andJeanne M Peters. Coupled multibody dy-namics and discrete element modeling of vehi-cle mobility on cohesive granular terrains. In

HPC framework for co-simulation of vehicle–terrain interactionPage 13 of 15

Page 14: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

ASME 2014 International Design Engineer-ing Technical Conferences and Computers andInformation in Engineering Conference, pagesV006T10A050–V006T10A050. American So-ciety of Mechanical Engineers, 2014.

[6] OpenMP. Specification Standard 4.5.Available online at http://openmp.org/wp/,2017.

[7] David A Patterson and John L Hennessy.Computer organization and design: the hard-ware/software interface. Morgan Kaufmann,Burlington, Massachusets., 5 edition, 2013.

[8] Message P Forum. MPI: A message-passing interface standard. Technical report,Knoxville, TN, USA, 1994.

[9] A. Tasora, R. Serban, H. Mazhar, A. Pa-zouki, D. Melanz, J. Fleischmann, M. Tay-lor, H. Sugiyama, and D. Negrut. Chrono:An open source multi-physics dynamics en-gine. In T. Kozubek, editor, High Perfor-mance Computing in Science and Engineering– Lecture Notes in Computer Science, pages19–49. Springer, 2016.

[10] Project Chrono. Chrono: An OpenSource Framework for the Physics-BasedSimulation of Dynamic Systems. http://

projectchrono.org. Accessed: 2016-03-07.

[11] Project Chrono Development Team.Chrono: An Open Source Frame-work for the Physics-Based Simula-tion of Dynamic Systems. https:

//github.com/projectchrono/chrono.Accessed: 2017-05-07.

[12] R. Serban, M. Taylor, D. Negrut, andA. Tasora. Chrono::Vehicle Template-BasedGround Vehicle Modeling and Simulation.Intl. J. Veh. Performance, accepted, in press,2018.

[13] H. Yamashita, A. Valkeapaa, P. Jayakumar,and H. Sugiyama. Bi-linear shear deformableANCF shell element using continuum me-chanics approach. In Proceedings of ASMEInternational Conference on Multibody Sys-tems, Nonlinear Dynamics, and Control, Buf-falo, NY. ASME, 2014.

[14] Hiroki Yamashita, Antti I Valkeapaa, Param-sothy Jayakumar, and Hiroyuki Sugiyama.Continuum mechanics based bilinear shear de-formable shell element using absolute nodalcoordinate formulation. Journal of Computa-tional and Nonlinear Dynamics, 10(5):051012,2015.

[15] B. Peterson and A. M. Recuero. Validationof a bilinear, gradient-deficient ancf shellelement in Chrono::FEA – http://www.

projectchrono.org/assets/validations/

fea/ancfshellvalidation.pdf. Technicalreport, University of Wisconsin–Madison,2016.

[16] A. Pazouki, M. Kwarta, K. Williams,W. Likos, R. Serban, P. Jayakumar, andD. Negrut. Compliant versus rigid contact:A comparison in the context of granular dy-namics. Phys. Rev. E, 96, 2017.

[17] M. Anitescu and A. Tasora. An iterativeapproach for cone complementarity problemsfor nonsmooth dynamics. Computational Op-timization and Applications, 47(2):207–235,2010.

[18] D. Negrut, R. Serban, and A. Tasora. Posingmultibody dynamics with friction and con-tact as a differential complementarity prob-lem. ASME Journal of Computational andNonlinear Dynamics, 13(1):014503, 2017.

[19] P. Cundall and O. Strack. A discrete elementmodel for granular assemblies. Geotechnique,29:47–65, 1979.

HPC framework for co-simulation of vehicle–terrain interactionPage 14 of 15

Page 15: HIGH PERFORMANCE COMPUTING FRAMEWORK FOR CO …...Radu Serban Dept. of Mechanical Engineering University of Wisconsin - Madison Madison, WI Nicholas Olsen Dept. of Mechanical Engineering

Proceedings of the 2018 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS)

[20] C. O’Sullivan. “Particle-Based Discrete Ele-ment Modeling: Geomechanics Perspective”.Int. J. Geomech., 11(6):449–464, 2011.

[21] Nathan Bell and Jared Hoberock. Thrust: Aproductivity-oriented library for cuda. GPUcomputing gems – Jade edition, 2:359–371,2011.

[22] K. Iglberger, G. Hager, J. Treibig, andU. Rude. High performance smart expres-sion template math libraries. In High Perfor-mance Computing and Simulation (HPCS),2012 International Conference on, pages 367–373, July 2012.

[23] H. Mazhar. Physics Based Simulation Us-ing Complementarity and Hybrid Lagrangian-Eulerian Methods. PhD thesis, University ofWisconsin–Madison, 2016.

[24] Hiroki Yamashita, Paramsothy Jayakumar,and Hiroyuki Sugiyama. Physics-based flex-ible tire model integrated with lugre tire fric-tion for transient braking and cornering anal-ysis. Journal of Computational and NonlinearDynamics, 11(3):031017, 2016.

HPC framework for co-simulation of vehicle–terrain interactionPage 15 of 15