Transcript of 03_HFSS_HPC_and_DSO Recent Advances and Case Studies.pdf
Productivity Tools for HFSS: Increasing Speed and Size with DSO and HPC
© 2010 ANSYS, Inc. All rights reserved. 1 2010 ANSYS Regional Conferences
Objectives
• Introduction
• Brief overview of technology and acronyms
• Licensing options & Enabling technologies
– DSO
– HPC
• Case studies
• Conclusions
Introduction
• This talk will focus on additions to a standard HFSS solver to:
– Increase efficiency
• Solve more models in less time
• Solve a single model in less time
– Increase capacity
• Solve larger models
Technology Overview
Computing Terminology
• HPC – High Performance Computing
– Uses supercomputers and computer clusters (connected
groups of computers) to solve advanced computation
problems [From Wikipedia]
• Socket – Connector on a computer motherboard that holds a packaged processor device
– Modern motherboards contain as few as 1 socket and as many as 4 sockets
• Core – Single computing unit
– Modern processor packages contain as few as 2 cores and as many as 12
• Common configuration
• Dual-socket, quad-core package → 8 cores in a box
ANSYS Terminology
• On top of all these hardware choices, we’ve added license choices to improve productivity for different classes of problems
– DSO – Distributed Solve Option
– DDM – Domain Decomposition Method
• Feature of HPC licensing
– MPO – Multi-processing option
• Feature of HPC licensing
HFSS Adaptive Mesh
HFSS Solver Terminology
• Each time HFSS solves the Volumetric Field Solution it must solve a matrix of unknowns.
– The solution describes the field behavior for that particular mesh
– This is done for each adaptive pass and directly solved frequency point.
• HFSS offers 3 solver options to apply to the matrix equation Ax = b:
1. Direct Solver (Default)
• Traditional solver used in HFSS
• Very stable
• Can be memory and time intensive for large matrices
2. Iterative Solver
• Added in HFSSv11
• More memory efficient than the Direct Solver
• Can be more time efficient than the Direct Solver
3. Domain Decomposition (more on this later…)
• Added in HFSSv12
HFSS Solver Terminology
Direct Solver
• The Direct Solver obtains an exact solution to the matrix equation
• Common Direct Matrix Solver Methods:
– Gaussian Elimination
– LU Decomposition
Ax = b
• Best uses for the Direct Solver
– Moderately sized matrices
– Large number of excitations

For example, after elimination the system takes upper-triangular form:

[ a11 a12 a13 a14 ] [ x1 ]   [ b1 ]
[  0  a22 a23 a24 ] [ x2 ] = [ b2 ]
[  0   0  a33 a34 ] [ x3 ]   [ b3 ]
[  0   0   0  a44 ] [ x4 ]   [ b4 ]
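The final step of a direct solve can be sketched in a few lines. This is an illustrative Python sketch, not ANSYS code: once Gaussian elimination or LU decomposition has reduced A to the upper-triangular form shown above, x is recovered exactly in a single backward pass.

```python
# Back-substitution on an upper-triangular system U x = b:
# the last row has one unknown, and each earlier row uses the
# already-computed unknowns to its right.

def back_substitute(U, b):
    """Solve U x = b for an upper-triangular matrix U (list of rows)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

# 4x4 example matching the slide's a11..a44 layout (values invented)
U = [[2.0, 1.0, 0.0, 1.0],
     [0.0, 3.0, 1.0, 0.0],
     [0.0, 0.0, 1.0, 2.0],
     [0.0, 0.0, 0.0, 4.0]]
b = [5.0, 4.0, 3.0, 4.0]
x = back_substitute(U, b)
```

The exact-arithmetic character of this pass is why the direct solver is very stable, and why its cost grows quickly with matrix size.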
HFSS Solver Terminology
Iterative Solver
• How does it work?
– The Iterative Matrix Solver works by "guessing" a solution to the matrix of unknowns, and then recursively updating the "guess" until an error tolerance has been reached
• What is the advantage?
– Reduced RAM and simulation time
• Best uses for the Iterative Solver
– Large matrices (>30,000 tets)
– Moderate port count (2 ports per processor)
– For 1st, 2nd and mixed order basis functions only

[Flowchart: MPCG iterative matrix solver: initial guess → preconditioner → update solution and search direction → converges? (no: repeat; yes: done)]
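The flowchart above can be sketched as a minimal Jacobi-preconditioned conjugate-gradient loop. This is an illustrative sketch only; the actual HFSS MPCG solver is a production implementation, not this code.

```python
# Minimal preconditioned CG: initial guess -> preconditioner ->
# update solution and search direction -> convergence test.

def pcg(A, b, tol=1e-10, max_iter=200):
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    x = [0.0] * n                               # initial guess
    r = [bi - avi for bi, avi in zip(b, matvec(x))]
    z = [r[i] / A[i][i] for i in range(n)]      # Jacobi (diagonal) preconditioner
    p = list(z)                                 # initial search direction
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]      # update solution
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:          # converged?
            break
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]  # new direction
        rz = rz_new
    return x

# Small symmetric positive-definite test system
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b)
```

Because the matrix is only ever applied, never factored, the memory footprint stays close to that of the matrix itself, which is the RAM advantage cited above.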
Increases Simulation Capacity
• Compared direct and iterative matrix solvers for JSF example
• Iterative solver requires 3-4x less RAM than direct solver
Productivity Option #1: Distributed Solve (DSO)
Distributed Solve Option (DSO)
• Distributed Solve is a productivity enhancement option that accelerates solution times for frequency sweeps and model variations by leveraging a network of processors.
• Offers a near-linear speed-up over conventional single-license simulation sweeps by distributing and simultaneously solving across a network of computers
• Increases throughput by speeding up turn-around time for individual simulations
[Diagram: design variations (Optimetrics / ANSYS DesignXplorer) and frequency sweeps (HFSS) distributed across machines]
Distributed Solve - Applications
• Applications
• What-if studies
• Design of experiments (DOE)
• Dynamic circuit model generation
• Design for Six Sigma (DFSS)
• Broad-band frequency sweeps
• Licensing
• Hardware independent
– Mix different CPU/cores and RAM
– User-defined machine selection
– Group setting for solver MP
• OS independent
– Supports Windows and/or Linux
– LSF/PBS/SunGrid/HPC enabled
• Solver independent
– Common license for supported solvers
– MP ready
• Flexible
– Share the licenses in the pool between multiple users or simulations
• License options:
– Singles – 1 design point
– Saver Pack – 10 design points
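The DSO idea can be sketched with a worker pool: each design point is an independent solve, so a machine list can process them concurrently. In this hedged sketch, solve_variation() is a hypothetical stand-in for a full HFSS solve of one design point, and a thread pool plays the role of the distributed machine list.

```python
# Each design variation is an independent task, so a pool of workers
# (networked machines, in real DSO) can solve them simultaneously.
from concurrent.futures import ThreadPoolExecutor

def solve_variation(params):
    """Hypothetical stand-in 'solver': fake result for one design point."""
    wire_radius, pitch = params
    return {"wire_radius": wire_radius, "pitch": pitch,
            "s11_db": -20.0 * wire_radius / pitch}  # placeholder metric

# Parametric sweep: every parameter combination is one DSO task
sweep = [(r, p) for r in (0.06, 0.10, 0.15) for p in (0.5, 1.0)]

with ThreadPoolExecutor(max_workers=4) as pool:     # the "machine list"
    results = list(pool.map(solve_variation, sweep))
```

Since the tasks share no state, throughput scales almost linearly with the number of workers, which is the near-linear speed-up claimed above.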
DSO Examples: Parametric
• Optimetrics analysis of circular waveguide phased array
• Parametric sweep over 45 scan angles
• 5X faster when distributed to 6 CPUs
• Optimetrics analysis of PIFA
radiating element
• Parametric sweep of antenna geometry
• 7.5X faster when distributed to 8 CPUs
[Plots: S11 for Element 1 parametric sweep, dB(S(P1,P1)) vs. frequency from 2.0 to 3.0 GHz (−35 to 0 dB), one curve per swept extra element length; scan impedance]
DSO Example
Investigating Solution Space
• Distributed analysis used to quickly explore multi-dimensional design space
– Wire radius
– Pitch spacing
– Helix radius
• DSO distributes frequency and parametric sweeps to a network of processors
• Approximately linear increase in simulation throughput
• Highly scalable to large numbers of processors
• Multi-processor nodes can be utilized
Solution Space Exploration
Helical Wire Antenna
• Wire radius varied to determine impact on input impedance
– Used DSO to solve 27X faster
• 3D plots created in HFSS to easily visualize solution space
– Return loss as function of frequency and wire radius
[3D plot: return loss vs. frequency (3–4 GHz) and wire radius (0.06–0.15 in), with the acceptable wire radius range indicated]
DSO Example: Molex Connector
Frequency Sweep Distribution
• Adaptive process completed on one machine with frequency sweep sub-bands sent to multiple machines
[Flowchart: parametric model generation → mesh generation → adaptive mesh refinement → convergence? (no: refine again; yes: frequency sweep, distributed as Sweep #1 through Sweep #N across machines)]

Sweep             | #pts to Converge | Clock Time | Delta to Reference
Reference         | 76               | 22h26m     | 1x
DSO Interpolating | 78               | 3h52m      | 5.8x
DSO Discrete      | NA               | 2h41m      | 8.4x
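The sub-band distribution above can be sketched as a simple partitioning of sweep points across machines. split_subbands() is a hypothetical helper for illustration, not an HFSS API.

```python
# Split N frequency points into contiguous sub-bands, one per machine,
# spreading any remainder over the first few machines.

def split_subbands(f_start, f_stop, n_points, n_machines):
    step = (f_stop - f_start) / (n_points - 1)
    points = [f_start + i * step for i in range(n_points)]
    size, extra = divmod(n_points, n_machines)
    bands, idx = [], 0
    for m in range(n_machines):
        count = size + (1 if m < extra else 0)  # spread the remainder
        bands.append(points[idx:idx + count])
        idx += count
    return bands

# e.g. the 76-point reference sweep over a hypothetical 1-10 GHz band
bands = split_subbands(1.0, 10.0, 76, 4)
```

Each machine then solves only its own sub-band after the shared adaptive pass, which is where the 5.8x and 8.4x wall-clock gains in the table come from.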
Productivity Option #2: High Performance Computing Option (HPC)
Ansoft HPC Overview
Bigger
• Domain Decomposition (DDM)
– A distributed-memory parallel solver technique that distributes mesh sub-domains to a network of processors.
– This method is a hybrid iterative and direct solver technique that significantly increases simulation capacity by distributing the RAM usage across multiple computers.
– Enables the solution of higher-fidelity and larger models
Faster
• Multi-Processing (MP)
– The MP option is used for solving models on a single machine with multiple processors/cores which share RAM.
– Increases throughput by speeding up turn-around time for individual simulations
Ansoft HPC - Applications
• Applications
• Electrically Large RF/Antenna Designs
• Antenna Placement
• Radome Design
• Radar Cross-Section (RCS)
• EMC Analysis
Cell Tower
EMC Analysis
• Industries
• Aerospace and Defense
• Wireless/Mobile Platforms
• Communications
• Healthcare
RCS – 24 GHz
Friend or Foe Antenna
Satellite
Medical
Domain Decomposition for HFSS
• New feature in HFSS v12
• Distributes mesh sub-domains to network of processors
• Distributed memory parallel technique
• Significantly increases simulation capacity
[Figure: HPC distributes mesh subdomains to networked processors and memory]
– 64-bit meshing
• Highly scalable to large numbers of processors
• Multi-processor nodes can be utilized
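The hybrid iterative/direct character of DDM can be illustrated on a toy 1-D Poisson problem: each overlapping subdomain is solved directly, while the subdomains exchange boundary values iteratively (an alternating Schwarz sweep). This is a conceptual sketch only, far simpler than the HFSS domain solver.

```python
# Alternating Schwarz on -u'' = f over (0,1), u(0)=u(1)=0, finite
# differences: direct tridiagonal solves per subdomain, iterative
# exchange of interface values between them.

def solve_tridiag(n, h, f, left, right):
    """Direct (Thomas) solve of -u'' = f on n points, Dirichlet ends."""
    a, bdiag, c = -1.0, 2.0, -1.0          # stencil -1, 2, -1
    d = [h * h * f] * n
    d[0] += left
    d[-1] += right
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c / bdiag, d[0] / bdiag
    for i in range(1, n):
        m = bdiag - a * cp[i - 1]
        cp[i] = c / m
        dp[i] = (d[i] - a * dp[i - 1]) / m
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return u

N = 39                       # interior grid points
h = 1.0 / (N + 1)
f = 1.0                      # constant load; exact solution x(1-x)/2
u = [0.0] * N
mid, ov = N // 2, 4          # two subdomains overlapping by 4 points
for _ in range(30):          # Schwarz sweeps: exchange boundary values
    # Domain 1: indices 0 .. mid+ov-1, right boundary from domain 2
    u[:mid + ov] = solve_tridiag(mid + ov, h, f, 0.0, u[mid + ov])
    # Domain 2: indices mid-ov .. N-1, left boundary from domain 1
    u[mid - ov:] = solve_tridiag(N - (mid - ov), h, f, u[mid - ov - 1], 0.0)
```

Each subdomain's direct solve needs only its own slice of memory, which is how DDM spreads RAM usage across machines in the full 3-D case.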
Domain Decomposition Example
Cellular Base Station Array
• GSM base station tower with radome-enclosed antenna arrays
– 950 MHz
– Electronic downtilt
• Domain solver used to predict installed antenna patterns
– 34 domains
– 3.5 GB average RAM per domain
– 16M unknowns
– 119 GB Total Effective RAM used
[Figures: base station, printed dipole arrays]
Technology Comparison
[Charts: solver fitness of the Direct, Iterative, and DDM solvers vs. excitations/RHS and geometric complexity, and vs. electrical size/fidelity; regions labeled HPC for MP and HPC for DDM]
Ansoft HPC - Licensing
• Allocation
– Each Simulation consumes one or more HPC packs
• Each individual pack enables 8 Parallel
• Parallel count increases quickly with multiple packs
• Flexible Technology Access
– Enable MP or DDM or DDM with MP

Packs per Simulation | 1 | 2  | 3   | 4   | 5
Parallel Enabled     | 8 | 32 | 128 | 512 | 2048
• Scalable Licensing
– HPC Packs
– HPC Workgroup (Volume access to parallel)
• 128 to 2048 Parallel shared across any number of Simulations
– Enterprise
• HPC license count is determined by the larger of the two:
1. # of Simulations (solvers)
2. # of Cores
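The pack-to-parallel table follows a simple pattern, inferred here from the numbers shown (actual licensing terms may differ): each additional pack quadruples the enabled parallel count.

```python
# Parallel enabled per simulation as a function of HPC packs:
# 8 * 4**(packs - 1), matching the table 1->8, 2->32, 3->128, 4->512, 5->2048.

def parallel_enabled(packs):
    return 8 * 4 ** (packs - 1)

table = {p: parallel_enabled(p) for p in range(1, 6)}
# table -> {1: 8, 2: 32, 3: 128, 4: 512, 5: 2048}
```

This also matches the later licensing summary, where 1 pack enables 8 cores and 3 packs enable up to 128 cores for one problem.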
Ansoft HPC
Multi-Processor Option
• Single workstation solution to increase simulation throughput
– Takes advantage of multi-core and/or multi-processor computing resources
• Capability introduced in HFSS v8 for direct matrix solver
– Parallelized matrix solver for multiple processors with shared memory
• Enhanced by addition of iterative matrix solver in HFSS
– Parallelized matrix pre-conditioner
– Parallelized excitations
MP Option for Helix Design
• Element model converges with 30k mesh elements
• Multi-processor option reduces direct solver time by factor of 2x
– 20 seconds vs 40 seconds
• Array model converges with 330k mesh elements
• Multi-processor option reduces iterative solver time by factor of 2.5x
– 8 minutes vs 20 minutes
Multi-processor option significantly decreases design iteration time at element and array levels
Case Studies
Finite Array on Spacecraft
• Electrically very large model with high level of geometrical detail
• Historically beyond the realm of full-wave EM solvers
– Typically analyzed using asymptotic approximations, which may sacrifice accuracy
– Challenging but important design problem
• Full-wave analysis now possible using HFSS v12
Array on Spacecraft
• Efficiently solved using HPC Option
– Domain solver
• Solution profile
– 25M unknowns
– 34 domains
– 6 GB average RAM per domain
– 204 GB Total Effective RAM used
• Domain solver used to
predict on-vehicle patterns
– 11 domains
– 1.7 GB average RAM per
domain
– 19 GB Total Effective
RAM used
Automobile with GSM Antenna
Surface Currents
Electric Fields Around Vehicle
Far-field Radiation Pattern
Radiation Pattern
Apache Helicopter RCS
• Military rotary-wing aircraft
– 1 GHz
– Monostatic RCS
• Domain solver used to predict scattering signature
– 12 domains
– 4.5 GB average RAM per domain
– 6M unknowns
– 54 GB Total Effective RAM used
Surface Currents
Monostatic RCS
Ground Transport Vehicle with
Covert Patch Antennas
• Domain solver used to predict installed antenna patterns
– Two L-band patch elements
mounted on Humvee roof
• Solution profile
– 6 domains
– 0.75 GB average RAM per domain
– 4.5 GB Total Effective RAM used
RFID System in Loading Dock
• Domain solver used to solve RFID system in industrial dock door environment
– 900 MHz system
– 2 patch antenna readers on pedestals
– 12 tags distributed throughout pallet of
packaged items
• Solution profile
– 7 domains
– 2 GB average RAM per
domain
– 14 GB Total Effective RAM used
Field due to reader
Field due to tag
How big can you go?
• HumVee with 1 L-band patch antenna
• In proximity to cement wall with rebar
• Freq = 1.8 GHz
• 14,424 λ³
• 64 Domains
• 2 procs / domain
• 128 cores
• 50M Unknowns
• 409 GB Total RAM
Summary
Summary: DSO
• Many parametric variations / design space exploration
– DoE & ANSYS DesignXplorer
• Broadband Frequency Sweeps
– Signal Integrity / EMI problems
• In both cases, each parametric variation or frequency point will be limited by the available RAM per core in the Distributed machine list
Licensing: DSO
• DSO licensed in either single-task or multi-task bundles
– License enables MP functionality for each task
– Typical bundle is 10 tasks
Summary: HPC
• Solving the previously unsolvable models
– electrically HUGE
• Solving the existing problems on one machine, faster than before
Licensing: HPC
• HPC is licensed in either Packs or Workgroup/Enterprise
– 1 Pack enables 8 cores for 1 problem
• Up to 8 cores for domains OR MP
– 2 Packs enables
• 8 cores each for 2 problems OR
• 32 cores for 1 problem
– 3 Packs enables
• 8 cores each for 3 problems OR
• 32 cores for 2 problems AND 8 cores for 1 problem OR
• 128 cores for 1 problem
• Count the cores!