(949) 481-3267 | www.saratechinc.com
FEMAP/NX NASTRAN PERFORMANCE TUNING
Chris Teague - Saratech
Saratech proprietary and confidential Slide Number: 2
NX Nastran Hardware Performance History
§ Running Nastran in 1984:
♦ Cray Y-MP, 32 Bits! (X-MP was only 24 Bits)
♦ Four Vector Processors (167 MHz)
♦ 256 MB of RAM (note the MB; a PC had 256K)
♦ 333 Mflops per processor
♦ $3-$4 Million, plus a special room
§ Comparison in 2016:
♦ 64 Bits
♦ Dual Core (1.85 GHz)
♦ 2 GB (2048 MB) of RAM
♦ 340 Mflops (Single Thread) / 613 Mflops (Multi Thread)
♦ iPhone 6s, $689
♦ NX Nastran currently not ported to iOS
NX Nastran Hardware Performance History
§ Rack Server in 2016:
♦ Dell R930, 64 Bits
♦ Up to 4 Processors with up to 18 Cores Each (72 Cores Total), or clock speeds up to 3.2 GHz
♦ 1.5 TB of RAM
♦ 1.8 Gflops per processor
♦ High Speed PCIe-based SSD disk drive (2.8 GB/s Read / 2.2 GB/s Write speed)
♦ $85K with 6.5 TB PCIe SSD, 1.5 TB RAM, 4x 3.2 GHz 4-core Xeon processors
§ Blade Array System
♦ Up to 30 Blades, each configured like a single server
§ So how much faster do our Nastran jobs run with this huge increase in computing power?
NX Nastran Performance Tuning Tips
§ What is LP-64 vs ILP-64?
§ Hardware and OS Selection
§ NX Nastran Scratch Drive
§ I/O Performance, OS Settings
§ Buffer Size
§ Hyperthreading
§ Element Iterative Solver
§ SMP vs DMP
§ GPGPU
NX NASTRAN LP-64 vs ILP-64
§ There are two 64-bit versions of NX Nastran:
♦ LP-64
• Standard version when running through FEMAP
• 4-Byte Words
• 8 GB RAM limit
♦ ILP-64
• Optional version when running through FEMAP
• 8-Byte Words
• 20 TB RAM limit, which in practice means the RAM limit of the machine you are running on
§ When running NX Nastran on the command line, the “L” executables are ILP-64. The “w” executables will bring up a file browser.
§ In some cases, ILP-64 may offer improved accuracy
NX NASTRAN LP-64 vs ILP-64
§ In general, the standard LP-64 version of NX Nastran is faster for models that do not need more than 8 GB of RAM allocated to the solver
§ For larger models that need more than 8 GB of RAM for the solver, you will need to use the ILP-64 version and have enough RAM available.
§ For performance reasons, you don’t want to allocate more than about 50% of RAM to NX Nastran. The remaining RAM is needed for the OS and for I/O caching, which is a huge help to NX Nastran performance.
§ This means that if you need to use the ILP-64 version of NX Nastran, you will want at least 16 GB of RAM. Larger models may require more. Sometimes LOTS more!
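The sizing rules on the last two slides reduce to two small decisions. This is a minimal sketch, assuming the 8 GB LP-64 limit and the roughly 50% allocation rule of thumb stated above; the function names are hypothetical and are not part of FEMAP or NX Nastran.

```python
# Hypothetical helpers encoding the LP-64 / ILP-64 guidance from these slides.
# Assumed thresholds: LP-64 solver memory limit of 8 GB, and allocating no
# more than ~50% of physical RAM to NX Nastran.

LP64_LIMIT_GB = 8.0       # LP-64 (4-byte words): solver allocation limit
MAX_RAM_FRACTION = 0.50   # leave the rest for the OS and I/O caching


def choose_version(solver_mem_gb: float) -> str:
    """Pick the 64-bit NX Nastran build suggested for a given solver memory need."""
    return "LP-64" if solver_mem_gb <= LP64_LIMIT_GB else "ILP-64"


def min_physical_ram_gb(solver_mem_gb: float) -> float:
    """Smallest physical RAM that keeps the solver at or under ~50% of RAM."""
    return solver_mem_gb / MAX_RAM_FRACTION


if __name__ == "__main__":
    for need_gb in (4.0, 8.0, 12.0, 24.0):
        print(f"solver needs {need_gb:4.1f} GB -> {choose_version(need_gb)}, "
              f"install at least {min_physical_ram_gb(need_gb):.0f} GB of RAM")
```

For example, a job that needs 12 GB for the solver falls into ILP-64 territory and, under the 50% rule, wants at least 24 GB of physical RAM.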
Hardware and OS Selection
§ Processors
♦ Faster processors are good (faster I/O speed is just as important, though, if not more)
♦ A large L2 or L3 processor cache can improve performance (Xeon can help here)
♦ Multi-core is good, but (usually) don’t choose more cores over fewer cores with a faster clock speed. For example:
• Intel Xeon E7-8880 v3, 2.3 GHz, 45 MB cache, 18 cores
• Intel Xeon E7-8893 v3, 3.2 GHz, 45 MB cache, 4 cores
§ Memory
♦ As much as the budget allows, and the fastest available
♦ Saratech ran a large job with mem=24 GB on a system with 64 GB of RAM. Nastran used up all available RAM for 2-3 hours, the extra being used for I/O caching. See the Task Manager graph:
[Figure: Task Manager memory usage graph]
Hardware and OS Selection
§ Disk
♦ SATA-based SSDs are significantly faster than mechanical drives
♦ PCIe-based SSD devices are faster still, and are available in laptop, desktop, and server models.
♦ Examples:
• SanDisk SX350-3200, 3.2 TB, 2.8 GB/s read, 2.2 GB/s write speed (Servers)
• Intel 750 Series, 1.2 TB, 2.5 GB/s read, 1.2 GB/s write speed (Workstations)
§ Operating System
♦ Linux is generally faster than Windows on the same hardware, due to its superior I/O performance
♦ Because of this, most HPC cluster systems run Linux
♦ Windows is more popular on the desktop due to the wide variety of applications that run on Windows.
[Image: Intel 750 Series PCIe SSD]
Hardware and OS Selection
§ Priorities for getting the most performance for the least money:
♦ Maximum number of *fast* cores with large cache
♦ Add as much RAM as possible, and go for the fastest RAM allowed
♦ Maximize I/O bandwidth and disk speed
♦ Add GPU processing for some large dynamics problems (more on this later)
§ I always recommend at least two disks, and three if possible:
♦ Disk 1: Fast drive for OS & Applications
♦ Disk 2: Very fast drive for NASTRAN & FEMAP scratch space (keep it empty when not running NX Nastran & FEMAP)
♦ Disk 3: Large drive for data storage
♦ NASTRAN does so much disk I/O that it is better to give it its own drive for scratch files, and to make that drive as fast as possible: a PCIe SSD, or even a RAID of SSDs. We don’t want the OS and application data needs to slow down our NASTRAN job.
NX Nastran Scratch Drive
§ The Nastran scratch folder should point to a fast disk, or a RAID array (RAID 0)
§ Local disk drives are preferred
§ Using a network-mounted NFS or SMB (Windows Shared Drive) connection will generally carry significant performance penalties
§ Even laptops can have two drives: try mSATA cards, or even PCIe in newer laptops
[Images: SanDisk Fusion ioMemory – SX350-3200; Samsung 850 EVO M.2 SSD]
NX Nastran Scratch Drive
§ You can set the NX Nastran scratch drive in the rc file
§ The nastran rc file for FEMAP can be found in FEMAPv113/nastran/conf, where 113 is the version of FEMAP that you have installed
§ Sample from my laptop:
♦ Auth=28000@LocalHost
♦ Sdir=e:\scratch
♦ program=FEMAP
♦ scr=yes
♦ buffsize=32769
♦ memory=.45*physical
♦ smem=20.0X
§ The “E” drive is a 512 GB mSATA SSD card
[Image: Samsung 850 EVO M.2 SSD]
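The rc file is a plain list of keyword=value settings. As a minimal sketch, assuming one setting per line and case-insensitive keywords (the slides write both “Buffsize” and “buffsize”), the sample above can be read into a dictionary. The parser is illustrative only; it does not handle comments or any rc syntax beyond what the sample shows.

```python
# Minimal, hypothetical parser for keyword=value lines like the rc sample above.
# Assumptions: one setting per line, case-insensitive keywords, blank lines
# ignored. Comment syntax is deliberately not handled.

SAMPLE_RC = r"""
Auth=28000@LocalHost
Sdir=e:\scratch
program=FEMAP
scr=yes
buffsize=32769
memory=.45*physical
smem=20.0X
"""


def parse_rc(text: str) -> dict[str, str]:
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blank or non-assignment lines
        key, value = line.split("=", 1)  # split on the first "=" only
        settings[key.strip().lower()] = value.strip()
    return settings


if __name__ == "__main__":
    rc = parse_rc(SAMPLE_RC)
    print(rc["sdir"], rc["buffsize"], rc["memory"], rc["smem"])
```

Lower-casing the keys means a lookup for `buffsize` finds the value whether the file spells it “Buffsize” or “buffsize”.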
NX Nastran & FEMAP Scratch Drive in FEMAP Preferences
§ FEMAP scratch drive
§ NX Nastran scratch drive
OS Settings: I/O Cache
§ Reading from and writing to disk drives is much slower than accessing RAM, even with an SSD
§ Data that is written is often read back soon afterward
§ Keeping information in memory instead of on disk reduces disk seek times
§ Make use of unallocated memory as a disk buffer (I/O cache)
OS Settings: Enabling Disk I/O Cache
§ Read cache is enabled by default on Linux and Windows
§ Enable write cache on Linux using the “hdparm” command or an equivalent
§ On Windows, use the “Device Manager” property settings to enable write caching on the Nastran scratch drive, in the “Policies” tab
Buffer Size
§ The NX Nastran buffer size is the size (in words) of each I/O unit
§ The default size in NX Nastran 9 is 8193, which works well for small models (<100K DOF)
§ For larger models (>400K DOF), increasing the buffer size to 32769 may help. This is the default in NX Nastran 10
§ This can be done by editing the nastran rc file so that the line reads: Buffsize=32769
§ The nastran rc file for FEMAP can be found in FEMAPv113/nastran/conf, where 113 is the version of FEMAP that you have installed
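The buffer-size guidance can be written as a small lookup. This is a sketch only: the 400K DOF threshold and the 8193/32769 values come from the slide, while keeping the installed default for models at or below 400K DOF (the slide leaves the 100K-400K range open) is an assumption. The helper names are hypothetical.

```python
# Hypothetical helper encoding the buffsize guidance from the slide.
# From the slide: 8193 suits small models (<100K DOF); 32769 may help for
# models above 400K DOF. Assumption: at or below 400K DOF, keep the default.

NX9_DEFAULT_BUFFSIZE = 8193
LARGE_MODEL_BUFFSIZE = 32769   # also the NX Nastran 10 default
LARGE_MODEL_DOF = 400_000


def suggest_buffsize(dof: int, installed_default: int = NX9_DEFAULT_BUFFSIZE) -> int:
    """Return a buffsize (in words) for a model with the given number of DOF."""
    if dof > LARGE_MODEL_DOF:
        return LARGE_MODEL_BUFFSIZE
    return installed_default  # smaller models: leave the installed default


def buffsize_rc_line(buffsize: int) -> str:
    """Format the rc file line shown on the slide."""
    return f"Buffsize={buffsize}"


if __name__ == "__main__":
    for dof in (80_000, 250_000, 1_200_000):
        print(dof, buffsize_rc_line(suggest_buffsize(dof)))
```

On an NX Nastran 10 install you would pass `installed_default=32769`, in which case every model keeps 32769.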
NX Nastran Settings: Memory
§ Starting with NX Nastran 10, the new default memory settings in the rc file are:
♦ Memory=0.45*physical (45% of total RAM installed in the workstation)
♦ Smem=20.0X (20% of the Memory value above)
♦ Buffpool=20.0X (same as Smem)
§ These settings are more appropriate for large models and machines with more RAM
§ Inspect the f04 file to see whether you have optimum settings for your model
§ Note: Unless SMEM is large enough to contain all scratch files, it is better to set it to zero
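To see what these defaults mean on a given machine, the arithmetic can be spelled out. This is a minimal sketch that follows the slide’s reading of the values (memory is 45% of installed RAM, and an “X” value is a percentage of that memory); the function names are hypothetical, not NX Nastran APIs.

```python
# Resolve NX Nastran 10 style rc memory defaults for a given machine, using
# the slide's description: memory=0.45*physical is 45% of installed RAM, and
# an "X" value such as smem=20.0X is a percentage of that memory value.


def parse_x_percent(value: str) -> float:
    """Turn an rc value like '20.0X' into the number 20.0 (percent of memory)."""
    return float(value.strip().rstrip("Xx"))


def resolve_memory_defaults(physical_gb: float,
                            mem_fraction: float = 0.45,
                            smem: str = "20.0X",
                            buffpool: str = "20.0X") -> dict[str, float]:
    memory_gb = mem_fraction * physical_gb
    return {
        "memory_gb": memory_gb,
        "smem_gb": memory_gb * parse_x_percent(smem) / 100.0,
        "buffpool_gb": memory_gb * parse_x_percent(buffpool) / 100.0,
    }


if __name__ == "__main__":
    for ram_gb in (16, 64, 256):
        r = resolve_memory_defaults(ram_gb)
        print(f"{ram_gb} GB RAM -> mem {r['memory_gb']:.1f} GB, "
              f"smem {r['smem_gb']:.2f} GB, buffpool {r['buffpool_gb']:.2f} GB")
```

On a 64 GB workstation this gives a memory allocation of about 28.8 GB, with about 5.8 GB each for SMEM and the buffer pool.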
NX NASTRAN MEMORY LAYOUT
[Figure: NX Nastran memory layout diagram]
NX NASTRAN MEMORY
§ The f04 file gives a summary of the memory that was allocated. The allocations correspond to the areas shown on the previous slide.
§ Here is an example from a TET10 model with around 650,000 elements:
** PHYSICAL FILES LARGER THAN 2GB ARE SUPPORTED ON THIS PLATFORM
0 ** MASTER DIRECTORIES ARE LOADED IN MEMORY.
USER OPENCORE (HICORE) = 433308364 WORDS
EXECUTIVE SYSTEM WORK AREA = 316925 WORDS
MASTER(RAM) = 70822 WORDS
SCRATCH(MEM) AREA = 3276900 WORDS ( 100 BUFFERS)
BUFFER POOL AREA (GINO/EXEC) = 1671729 WORDS ( 51 BUFFERS)
TOTAL NX NASTRAN MEMORY LIMIT = 438644740 WORDS
§ This model was run with mem=1673 MB
§ Remember, LP-64 uses 4 bytes per word, and ILP-64 uses 8 bytes per word
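Because the f04 reports memory in words, converting to megabytes depends on the word size. The sketch below uses the 4-byte (LP-64) and 8-byte (ILP-64) word sizes from the slide and the numbers from the f04 excerpt; treating “MB” as 1024 × 1024 bytes is an assumption, and the helper names are hypothetical.

```python
# Convert f04 memory figures (reported in words) to megabytes.
# Word sizes are from the slides: 4 bytes (LP-64) or 8 bytes (ILP-64).
# Assumption: "MB" means 1024 * 1024 bytes.

BYTES_PER_WORD = {"LP-64": 4, "ILP-64": 8}
BYTES_PER_MB = 1024 * 1024

# Values copied from the f04 excerpt (TET10 model, LP-64 run).
F04_AREAS = {
    "USER OPENCORE (HICORE)": 433308364,
    "EXECUTIVE SYSTEM WORK AREA": 316925,
    "MASTER(RAM)": 70822,
    "SCRATCH(MEM) AREA": 3276900,
    "BUFFER POOL AREA (GINO/EXEC)": 1671729,
}
F04_TOTAL = 438644740  # TOTAL NX NASTRAN MEMORY LIMIT


def words_to_mb(words: int, version: str = "LP-64") -> float:
    return words * BYTES_PER_WORD[version] / BYTES_PER_MB


def mb_to_words(mb: float, version: str = "LP-64") -> int:
    return int(mb * BYTES_PER_MB // BYTES_PER_WORD[version])


if __name__ == "__main__":
    # The individual areas add up to the reported total.
    assert sum(F04_AREAS.values()) == F04_TOTAL
    for name, words in F04_AREAS.items():
        print(f"{name:30s} {words_to_mb(words):10.1f} MB")
    print(f"{'TOTAL':30s} {words_to_mb(F04_TOTAL):10.1f} MB")  # about 1673 MB
```

The LP-64 total of 438,644,740 words works out to about 1673 MB, in line with the mem=1673 MB setting; the same word count would need roughly twice as much memory under ILP-64.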
HOW MUCH MEMORY IS ENOUGH?
§ Look in the f04 file for USER OPENCORE (HICORE).
§ Compare it to the HIWATER usage reported toward the end of the f04 file.
§ If HIWATER is getting close to or over HICORE, the job would likely benefit from more memory (mem=x)
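The HICORE/HIWATER comparison can be sketched in a few lines. The regular expression matches the HICORE line as it appears in the f04 excerpt shown earlier; the HIWATER value is passed in by hand because its f04 layout is not shown on these slides, and the 90% “close to” cut-off is a hypothetical choice.

```python
import re
from typing import Optional

# Matches the HICORE line as it appears in the f04 excerpt on the earlier slide.
HICORE_PATTERN = re.compile(r"USER OPENCORE \(HICORE\)\s*=\s*(\d+)\s*WORDS")


def read_hicore(f04_text: str) -> Optional[int]:
    """Return the HICORE value in words, or None if the line is not found."""
    match = HICORE_PATTERN.search(f04_text)
    return int(match.group(1)) if match else None


def needs_more_memory(hiwater_words: int, hicore_words: int,
                      close_fraction: float = 0.90) -> bool:
    """True when HIWATER is close to (here: >= 90% of) or above HICORE.

    The 90% cut-off is a hypothetical reading of "getting close to".
    """
    return hiwater_words >= close_fraction * hicore_words


if __name__ == "__main__":
    sample = "USER OPENCORE (HICORE) = 433308364 WORDS"
    hicore = read_hicore(sample)
    # HIWATER is entered by hand here; its f04 layout is not shown on the slides.
    for hiwater in (150_000_000, 420_000_000):
        print(hiwater, needs_more_memory(hiwater, hicore))
```

If the check returns True, the next step per the slide is to rerun with a larger mem= value.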
SETTING MEMORY SIZE IN FEMAP
§ FEMAP uses MB units, and memory can be set in the NASTRAN Executive and Solution Options form. 0 is the default, which uses NASTRAN’s default from the rc file
§ For Windows, don’t allocate more than about 50% of the machine’s physical memory, to avoid performance issues (swapping). Less may be better, since Windows uses the remaining memory for I/O caching
§ The NX Nastran 10 default of 45% is pretty good for most cases, until you get to workstations/servers with a large amount of RAM
HYPERTHREADING
§ Some modern Intel CPUs support Hyperthreading.
§ Hyperthreading presents each physical core to the OS as two virtual CPUs, so one core can run two threads.
§ There can be a performance advantage on some desktop applications, but it’s very small.
§ Nastran, like other Windows programs, sees a virtual CPU as a real CPU, since that is what Intel intended.
§ Since NX NASTRAN is very CPU intensive, it expects the virtual CPU to perform like a real CPU, but it won’t.
§ NX NASTRAN will usually perform better if you turn off Hyperthreading. This is typically done in the BIOS.
§ Some Xeon processors do not have Hyperthreading for this reason
Element Iterative Solver
§ For models that are mostly solid elements, the Element Iterative Solver can offer significant performance improvements (2-3x)
§ It does not help shell or bar elements, and will be ignored in dynamics solutions
§ Set this in the Solution form
NX Nastran Linear Contact Solutions
§ Specify the proper search distance
§ Large search distances typically involve more active contacts for the first few iterations
Multiple CPUs – SMP vs DMP
§ Shared Memory Parallel (SMP) is a single machine with multiple processors that share common memory and a common I/O system (disks).
§ Distributed Memory Parallel (DMP) is a set of multiple machines, or a cluster, each with one or more processors, communicating over a network. Each machine has its own memory and its own I/O system.
[Figure: SMP and DMP architecture diagrams]
DMP vs. SMP
§ SMP – Shared Memory Parallel
♦ Common Memory Pool, Common I/O Pool
♦ Desktop/Laptop hardware
♦ Speedup tapers off at 8 or so cores
♦ No extra license needed
§ DMP – Distributed Memory Parallel
♦ Multiple machines with one or more processors communicating over a network (Desktop/Cluster)
♦ Each machine has its own memory and disk I/O
♦ Uses the Message Passing Interface (MPI), which must be installed in the OS
♦ Highly Scalable
♦ Extra license needed – can now be supported with a Femap license
§ DMP Solutions:
♦ 101 – Linear Statics
♦ 103 – Normal Modes
♦ 105/108/111/112 – Buckling, Direct/Modal Frequency, Modal Transient Response
♦ 200 – Design Optimization
Multiple CPUs – SMP Setup in FEMAP
§ If you would like to use multiple CPUs to solve a NASTRAN run, FEMAP lets you set the number right above the Solver Memory setting.
§ If you are running NASTRAN on your desktop machine and want to keep using it for other work, it is recommended to leave one CPU available for other applications
§ This can also be done in the input file with: NASTRAN PARALLEL=x
§ PARALLEL is also a command line option, and can be set in the rc file if you would like to have a default number of processors
§ There is no extra license needed for SMP
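Choosing a PARALLEL value that leaves one CPU free is simple arithmetic, with one wrinkle from the Hyperthreading slide: the OS reports logical CPUs, not physical cores. This is a minimal sketch, assuming Hyperthreading exposes exactly two logical CPUs per core; the helper names are hypothetical and only the `NASTRAN PARALLEL=x` statement comes from the slide.

```python
import os
from typing import Optional


def suggest_parallel(logical_cpus: Optional[int] = None,
                     hyperthreading: bool = False,
                     keep_free: int = 1) -> int:
    """Suggest a PARALLEL value, leaving `keep_free` cores for other work.

    Assumes hyperthreading exposes two logical CPUs per physical core.
    """
    logical = logical_cpus if logical_cpus is not None else (os.cpu_count() or 1)
    physical = logical // 2 if hyperthreading else logical
    return max(1, physical - keep_free)


def parallel_statement(n: int) -> str:
    """Input-file statement shown on the slide."""
    return f"NASTRAN PARALLEL={n}"


if __name__ == "__main__":
    print(parallel_statement(suggest_parallel()))          # this machine
    print(parallel_statement(suggest_parallel(16, True)))  # 8 cores with HT -> 7
```

Since SMP speedup tapers off at around 8 cores, a much larger value is unlikely to help on a single machine.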
AMD Professional Graphics for NX® | August 2015
AMD PROFESSIONAL GRAPHICS ADVANTAGE
• 100+ app certifications
• Rock-solid drivers
• Three-year warranty
• Simultaneous render & compute
• Up to six 4K displays
• Intelligent power technologies
• Application optimizations
• Latest API support
• PCIe® 3.0 support
INNOVATION | PERFORMANCE | RELIABILITY
Image courtesy of Siemens PLM Software
AMD FIREPRO™ W-SERIES GRAPHICS PRODUCT STACK
Tiers (top to bottom): UHE, HE, Midrange, 2D/3D Entry
AMD FirePro™ W-Series
Model | Memory | Power
W9100 | 16GB GDDR5 | 275W
W8100 | 8GB GDDR5 | 220W
W7100 | 8GB GDDR5 | 150W
W5100 | 4GB GDDR5 | <75W
W2100 | 2GB DDR3, LP | 26W
W4100 | 2GB GDDR5, LP | <50W
Recommended for NX/FEMAP: AMD FirePro™ W4100, W5100, W7100
The Right Solution for your PLM Workflow
Workflows: Drafting and Modeling; Design and Validation; Large Assemblies and Rendering; Visualize, Review and Mark-up; Simulation (NX Nastran)
Products: AMD FirePro™ W2100, W4100, W5100, W7100, W8100, W9100
Images courtesy of Siemens PLM Software
NX NASTRAN
• High performance GPUs and OpenCL™ accelerate modal frequency response calculations in NX Nastran.
• This solution makes it possible to compute a large number of modes over a wide frequency range, economically and efficiently.
• Results of the AMD FirePro OpenCL acceleration for NX Nastran Modal Frequency Response:
OpenCL-accelerated solution
• Up to 25x faster than serial
• Up to 4x faster than the top-of-the-line 24-core CPU run time
System Configuration: Supermicro H8DGi-F Dual Opteron Motherboard, 24-core Magny-Cours, with AMD FirePro W8000
Ref.: Siemens 2012 NX CAE Symposium Presentation: Accelerating Modal Frequency Response in NX Nastran with AMD GPUs, by Hoffnung and Reymond
SCALABLE PROFESSIONAL GPU SOLUTIONS
• Desktop Workstations
• Servers
• Mobile Workstations & Thin Clients
AMD provides a wide range of products for a wide range of software solutions
Using GPGPU with NX Nastran (OpenCL)
§ For modal frequency response (SOL 111) with more than 5000 modes, if you have a fast GPU card such as the AMD FirePro W9100, turning on GPGPU acceleration in the NASTRAN Executive and Solution Options form may help
§ NVIDIA Tesla K40 and Intel Xeon Phi 7120D are also supported by NX Nastran
FEMAP Performance Graphics
§ Performance graphics vs. regular graphics comparison
• Model: 6 million nodes / elements
• Action: full model display / group / full model display
Graphics Preferences - Options
§ Hardware Acceleration: Turning this off disables the hardware driver. Use it if you are having significant graphics problems and want to find out whether the graphics driver is the cause
§ Performance Graphics (11.1 and higher): Uses a new graphics architecture to improve the performance of initial draw and dynamic rotation. Needs OpenGL 4.2 or higher.
§ Memory Optimization: Should be off unless your models are very big and swapping is occurring. If that is the case, turning this on can improve drawing speed. If not, it will slow things down.
§ Multi-Model Memory: Uses more memory to make the transition faster when switching between models.
§ Auto Regenerate: Forces a redraw after every command. It’s slower, but keeps the graphics up to date during modifications.
Graphics Preferences – OpenGL
§ Enabling the Performance Graphics option can dramatically improve performance on models with a large number of:
♦ Solids
♦ Points
♦ Nodes
♦ Solid and Shell Elements
§ Set Max VBO MB (memory) to no more than 75% of your graphics card memory
§ The sample settings shown are for a graphics card with 2 GB of VRAM
§ Min VBO B is set to 1024 by default, and this should work well with most graphics cards
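As a worked example of the 75% guideline, the 2 GB card in the sample works out to a Max VBO MB of at most 1536. The tiny helper below is hypothetical (it is not a FEMAP API) and simply applies that percentage to the card’s memory in MB.

```python
def max_vbo_mb(vram_mb: int, fraction: float = 0.75) -> int:
    """Upper bound for Max VBO MB: no more than 75% of graphics card memory."""
    return int(vram_mb * fraction)


if __name__ == "__main__":
    for vram_mb in (2048, 4096, 8192):
        print(f"{vram_mb} MB VRAM -> Max VBO MB <= {max_vbo_mb(vram_mb)}")
```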
Graphics Preferences – Dynamic Rotation
§ Include in Dynamic Rotation options - switching off any of these options should improve performance.
♦ Some key options:
• Element Symbols - if you have a lot of lumped masses and springs
• Mesh Size - if you have a large number of curves with mesh sizes on them
• Labels and Undeformed - switching these off helps performance.
• Elements as Free Edges - this adds a slight delay when starting and finishing dynamic rotation, but the rotation itself is much quicker. For some models, e.g. a mesh on a sphere, there is no free edge, and you will see nothing as the model rotates.
Graphics Card Performance Considerations
Desktop resolution should be taken into consideration when using Femap. A very high screen resolution can increase the time animations need to generate and the time individual windows need to refresh. This is something to consider for Ultra HD (4K/2160p) monitors with resolutions of 3840x2160.
If Femap appears to be having graphics errors, the cause could be your graphics card driver. Update the drivers for your graphics card often!
• Drivers from the manufacturer of the graphics card chipset tend to be more stable than drivers from the maker of the graphics card (e.g. use an ATI or nVidia driver vs. an ASUS driver)
• You should also set your graphics card performance settings to the defaults. In some cases, setting a card for optimum performance for an application may cause Femap to crash.
Database Preferences
§ The database memory limit is set to 20% of available system RAM by default. When FEMAP needs more, it will swap to the scratch disk, slowing things down. Increasing this number will leave less RAM available for FEMAP operations other than the database. In some cases it may be better to lower this number.
§ The Max Cached Label must be set to an ID higher than any entity in the model.
§ The Open/Save method setting may improve read/write performance if you are experiencing slow performance. Clicking the Read/Write Test button will automatically run a test and determine the best setting for your hardware. It takes about 1.2 GB of disk space and a few minutes of time