© 2011 ANSYS, Inc. November 5, 2014 1 HPC and the EM Products Vincent Delafosse


HPC and the EM Products

Vincent Delafosse


EBU HPC is a product that enables acceleration of the solution for a single design point (1 project – 1 solve).

This product can be used by any EBU product (Maxwell, Q3D, HFSS).

The EBU HPC product and the MBU/FBU HPC product are not the same, and they are not compatible. The main reason is that they use different license servers.

EBU HPC


When HPC? When DSO?

• HPC enables:

– Multiprocessing in our static solvers (MS, Eddy, ES)

– SDM (Spectral Decomposition Method, for frequency sweeps) in the eddy current solver

– Full parallelization in the Transient solver

• DSO enables distributed parametric analysis

– Highest level of parallel analysis, providing best linearity and scaling

– Optimetrics product is necessary

• “HPC accelerates model extraction”

• “DSO accelerates design extraction”


We can solve 4 frequencies at a time in an Eddy Current problem.

The frequency sweep is defined in the Solve Setup; Optimetrics does not need to be involved.

SDM example

30:20 vs. 14:16 with 8 cores: a 2.13x speed-up
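The quoted speed-up follows directly from the two solve times. Here is a minimal sketch of the arithmetic, assuming the figures are clock-style durations (the ratio is the same whether they are read as mm:ss or hh:mm). The helper name `to_units` is illustrative only.

```python
def to_units(clock: str) -> int:
    """Convert a two-field clock string 'a:b' into the smaller unit (a * 60 + b)."""
    major, minor = clock.split(":")
    return int(major) * 60 + int(minor)


baseline = to_units("30:20")  # reference solve time quoted on the slide
with_sdm = to_units("14:16")  # solve time with 8 cores quoted on the slide

speed_up = baseline / with_sdm
print(f"{speed_up:.2f}x")  # 2.13x
```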


The Multi-Threading includes:

- Initial Tau Mesh

- Non Linear Newton-Raphson Loop

- Matrix Assembly

- Matrix Solving

- Matrix Postprocessing

Uses OpenMP with shared memory

HPC Solution for 3D transient

Terminology:

A desktop possesses one or several Processors.

Each Processor can have multiple cores

In Maxwell, you specify the number of cores you want to use; these cores can be located over several processors but they have to share the same memory.

Note:

Maxwell cannot run a single design simulation over a cluster

Full Parallelization of 3D Transient

Enabled through EBU HPC license


• “MP” product becomes legacy

• Existing MP customers can still use MP

• New 3D transient parallel solver is enabled only by HPC license; existing MP license cannot activate the new 3D transient parallel solver

• HPC can be consumed by individual Tasks or by Pack

– n HPC tasks can activate n cores

– One HPC pack can activate up to 8 cores

– Two HPC packs can activate up to 32 cores

• It is not possible to run a simulation using tasks and a pack at the same time
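The task/pack rules above can be written down as a small lookup. This is only a sketch of the rules as stated on the slide, not an ANSYS API. The function name `max_cores` is hypothetical, and pack counts other than 1 and 2 are deliberately left undefined because the slide does not cover them.

```python
# Core limits per number of HPC packs, exactly as listed on the slide.
PACK_CORES = {1: 8, 2: 32}


def max_cores(tasks: int = 0, packs: int = 0) -> int:
    """Return how many cores the given HPC licenses can activate for one simulation."""
    if tasks and packs:
        # The slide states that tasks and a pack cannot be used at the same time.
        raise ValueError("tasks and packs cannot be combined in one simulation")
    if packs:
        if packs not in PACK_CORES:
            raise ValueError("pack count not covered by the slide")
        return PACK_CORES[packs]
    return tasks  # n HPC tasks activate n cores


print(max_cores(tasks=4))  # 4
print(max_cores(packs=1))  # 8
print(max_cores(packs=2))  # 32
```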

HPC Solution - Changes for R14.5


3D IPM synchronous machine with motion

Mesh size: 120,000 tets – around 2 GB of RAM

Machine: 4 x Xeon CPU [email protected] processors

(32 cores total – 512 GB of RAM)

HPC Solution – Small Designs

| Number of cores | Average time per nonlinear iteration | Average speed-up compared w/ 1 core |
|---|---|---|
| 1 | 70 s | - |
| 2 | 41 s | 1.7 |
| 4 | 21 s | 3.33 |
| 8 | 18 s | 3.88 |
| 12 | 18 s | 3.88 |


HPC Solution – Small Designs

Chart: speed-up (axis from 0 to 4.5) versus number of cores (1, 2, 4, 8, 12), comparing MP and New HPC.


3D IPM synchronous machine with motion, Eddy current in Magnets

Mesh size: 515,000 tets – around 10GB of RAM

Machine: 8 x Xeon CPU [email protected] processors – 32 cores total

HPC Solution – Medium Designs

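| Number of cores | Average time per nonlinear iteration | Average speed-up compared w/ 1 core | Average speed-up per core |
|---|---|---|---|
| 1 | 15 min | - | - |
| 4 | 5 min 36 s | 2.67 | 0.67 |
| 8 | 3 min 55 s | 3.82 | 0.48 |
| 12 | 3 min 25 s | 4.40 | 0.37 |
| 16 | 2 min 52 s | 5.24 | 0.32 |
| 24 | 2 min 37 s | 5.73 | 0.24 |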
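The two derived columns can be recomputed from the iteration times: speed-up is the 1-core time divided by the n-core time, and speed-up per core is that value divided by n. The sketch below assumes the times as transcribed from the table; the function name `scaling` is illustrative. Recomputed values can differ from the slide in the last digit, presumably because the listed times are rounded.

```python
# Average time per nonlinear iteration in seconds, transcribed from the table.
times = {
    1: 15 * 60,
    4: 5 * 60 + 36,
    8: 3 * 60 + 55,
    12: 3 * 60 + 25,
    16: 2 * 60 + 52,
    24: 2 * 60 + 37,
}


def scaling(times_by_cores):
    """Map core count -> (speed-up vs. 1 core, speed-up per core)."""
    base = times_by_cores[1]
    rows = {}
    for cores, seconds in sorted(times_by_cores.items()):
        speed_up = base / seconds
        rows[cores] = (speed_up, speed_up / cores)
    return rows


for cores, (speed_up, per_core) in scaling(times).items():
    print(f"{cores:>2} cores: speed-up {speed_up:.2f}, per core {per_core:.2f}")
```

The per-core column falls steadily as cores are added, which is the diminishing return the table is meant to show.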


HPC Solution – Medium Designs

Note: each processor has 8 cores, hence using 12 cores is the least favorable scenario

Chart: speed-up (axis from 0 to 7) versus number of cores (1, 4, 8, 12, 16, 24), comparing MP and New HPC.


3D IPM synchronous machine with motion

Mesh size: 1,350,000 tets – around 35GB of RAM

Machine: 8 x Xeon CPU [email protected] processors – 32 cores total

HPC Solution – Large Designs

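| Number of cores | Average time per nonlinear iteration | Average speed-up compared w/ 1 core | Average speed-up per core |
|---|---|---|---|
| 1 | 8 h 43 min | - | - |
| 8 | 1 h 19 min | 6.62 | 0.82 |
| 16 | 1 h 04 min | 8.17 | 0.5 |
| 24 | 1 h 03 min | 8.30 | 0.34 |
| 32 | 58 min | 9 | 0.28 |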


HPC Solution – Large Designs

Chart: speed-up (axis from 0 to 10) versus number of cores (1, 8, 16, 24, 32), comparing MP and New HPC.


How to read the Profile

Matrix Solving Time (for all 8 iterations)

Iterations needed to achieve convergence

Number of cores used

Within a time step, several nonlinear iterations (Newton-Raphson method) are run to achieve proper convergence (the error has to be lower than the nonlinear residual).

Each nonlinear iteration consists of several steps, but the most time-consuming part is the matrix solving.

Solver time, excluding matrix solving (for all 8 iterations)

The total solve time for this time step is 1:04:02 + 10:58 = 1:15:00

Mesh Size
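The total quoted for this time step is the sum of the two profile entries. A short sketch using Python's standard `datetime.timedelta`, with the values copied from the slide; the variable names are illustrative:

```python
from datetime import timedelta

matrix_solving = timedelta(hours=1, minutes=4, seconds=2)  # 1:04:02, all 8 iterations
other_solver_time = timedelta(minutes=10, seconds=58)      # 10:58, all 8 iterations
iterations = 8

total = matrix_solving + other_solver_time
print(total)               # 1:15:00
print(total / iterations)  # average time per nonlinear iteration
```

Matrix solving accounts for most of the total, which matches the remark that it is the most time-consuming part of each iteration.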


Technology update


(Wikipedia) A computer cluster consists of a set of loosely connected computers that work together so that in many respects they can be viewed as a single system.

The components of a cluster are usually connected to each other through fast local area networks, each node (computer used as a server) running its own instance of an operating system.

Each node is a computer by itself, and the CPU of each node can only access that node’s memory.

A Definition


A Maxwell ‘solve’ can only benefit from shared memory hardware:

- Desktop

- Laptop

- A node of a cluster

All the cores involved in the computation of a single design point must be located on the same board and access the same memory.

It is NOT possible to use 2 desktops or 2 nodes of a cluster with Maxwell for a single project.

Note: HFSS can use DDM (Domain Decomposition Method) technology to do this. For Maxwell it is still a long-term research topic, due to the complexity of handling nonlinear materials.

EM technology – full parallelization


A Parametric sweep can be accelerated with DSO. Several design points can be solved at the same time. Each design point can use one or several cores (with proper HPC license) of a

- Desktop

- Laptop

- A node of a cluster

All the cores involved in the computation of a single design point must be located on the same board and access the same memory.

DSO can run across several pieces of hardware.

EM technology – DSO


Case 1: A single 8-core desktop can solve:

- 8 design points of a sweep at the same time, each design point using 1 core. The desktop memory has to be large enough to support the 8 projects.

- 4 design points of a sweep at the same time, each design point using 2 cores. That requires HPC and DSO licenses.

- …

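Case 2: Three 8-core desktops can solve:

- 24 design points at the same time, each design point using 1 core.

- 3 design points at the same time, each design point using 8 cores.

They cannot solve 2 design points with each design point using 12 cores, because all the cores used by one design point must be on the same machine.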
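The two cases reduce to one rule: a design point cannot span machines, so each machine contributes only as many simultaneous design points as fit within its own cores. The sketch below is a minimal check of that rule; it ignores memory and license limits, and the function name `can_run` is hypothetical.

```python
def can_run(machine_cores, design_points, cores_per_point):
    """Check whether `design_points` can be solved at the same time when each
    uses `cores_per_point` cores and no design point spans two machines."""
    slots = sum(cores // cores_per_point for cores in machine_cores)
    return slots >= design_points


one_desktop = [8]
three_desktops = [8, 8, 8]

print(can_run(one_desktop, 8, 1))      # True
print(can_run(one_desktop, 4, 2))      # True
print(can_run(three_desktops, 24, 1))  # True
print(can_run(three_desktops, 3, 8))   # True
print(can_run(three_desktops, 2, 12))  # False: 12 cores do not fit on one 8-core machine
```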

DSO/HPC configuration examples


Many large companies have clusters on which other simulation tools are also running. Scheduling tools are used to balance the load between different tasks and different software. Adopting our proprietary DSO technology should not be an issue, as Maxwell is compatible with the major scheduling programs: LSF, …

This means that you can start Maxwell/Optimetrics in this environment, and the jobs will be passed to the nodes of the cluster by the scheduler.

Using our own DSO is the only viable solution, as the nodes of these clusters usually don’t have a graphics card. DSO only uses the solver process on the nodes. It is not possible to run a Maxwell solver without the UI involved.

Clusters


HPC/ DSO offering


• HPC can be consumed by individual Tasks or by Pack

– n HPC tasks can activate n cores

– One HPC pack can activate up to 8 cores

– Two HPC packs can activate up to 32 cores

• DSO can be consumed by individual Tasks or by Pack

– p DSO tasks can activate p Maxwell solves

– One DSO pack can activate up to 10 runs

Product bundles


If you have 3 DSO tasks and 4 HPC tasks, each design variation can use the 4 HPC tasks.

When solving the parametric sweep:

- DSO task 1 can use 4 cores

- DSO task 2 can use 4 cores

- ….

At the end, 3 * 4 cores are used, for a total of 12 cores

DSO also distributes HPC


There is a lot of room for upsell opportunities with HPC and DSO. The value that the customer gets is great. Most importantly, HPC can benefit both large and small customers.

It is important to understand how our customers want to use Maxwell and what hardware they have. You should ask the following questions:

1) How many Maxwell 3D licenses does the customer have?

2) How many physically separated machines will be used?

3) How many cores are on each machine?

4) How much RAM on each machine? How large are their problems?

5) Do they own Optimetrics?

6) Do they want to solve parametric simulations with many rows?

Customer configurations


1) How many Maxwell 3D licenses does the customer have?

→ If it is more than one, one HPC pack may not be ideal, as it cannot be split between seats. HPC tasks might provide a better return.

2) How many physically separated machines will be used?

→ If it is more than one, it limits the number of cores that can be used by one Maxwell run.

3) How many cores are on each machine?

→ We have to provide the best combination of HPC/DSO tasks/packs to maximize hardware availability.

Customer configurations


4) How much RAM on each machine?

→ Not enough RAM will limit the possibility of running several DSO tasks on the same machine.

5) Do they own Optimetrics?

→ No Optimetrics = No DSO

6) How much parametric analysis do they plan to do?

→ The goal is to get a feeling for how much benefit they would get from having more DSO tasks than HPC tasks.

Customer configurations


Doing any parametric analysis?

- No: Offer 2/4 tasks or an HPC pack, depending on the budget, to maximize usage of cores.

- Yes: Are they doing a lot of parametric analysis? Do the 3D projects require a lot of RAM?

  - If not RAM intensive, with a lot of parametric analysis: 2 HPC tasks and 4 DSO tasks

  - If not RAM intensive, with occasional parametric analysis: 4 HPC tasks or a pack, and 2 DSO tasks

  - If RAM intensive, make sure the total number of DSO tasks does not exceed the RAM available
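The RAM constraint in the last bullet can be estimated before choosing a DSO/HPC mix. This is a rough sketch, assuming every concurrent design point needs about the same amount of memory. The function name and the 64 GB machine are hypothetical; the per-design figures (about 2 GB and 35 GB) are the ones quoted for the small and large benchmark designs.

```python
def max_concurrent_points(total_cores, cores_per_point, ram_gb, ram_per_point_gb):
    """Upper bound on design points solved at once on a single machine."""
    by_cores = total_cores // cores_per_point
    by_ram = int(ram_gb // ram_per_point_gb)
    return min(by_cores, by_ram)


# Hypothetical 8-core machine with 64 GB of RAM, 2 cores per design point.
print(max_concurrent_points(8, 2, 64, 2))   # 4: limited by cores
print(max_concurrent_points(8, 2, 64, 35))  # 1: limited by RAM
```

When the RAM bound is the smaller of the two, extra DSO tasks on that machine bring no benefit, and the budget is better spent on HPC tasks.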

Case 1: 1 seat of Maxwell, 8 core Machine


As HPC packs cannot be split between seats, the best value for the customer may be to have 4 or 6 HPC tasks, so that they can assign the tasks to each seat on a case-by-case basis.

If they have the budget, two 8-core HPC packs are the solution to offer.

Case 2: 2 seats of Maxwell