Data-Centric Press Workshop - HPC Advisory Council · 2.3 ghz base 71.5mb cache 48 cores 3.8 gh z...

Data-Centric Press WorkshopMove | Store |Process

Dr. Jean-Laurent PhilippeSenior HPC Technical SpecialistIntel Data Center Group Sales

UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.


90%OF WORLD’S DATA WAS CREATED

IN THE PAST TWO YEARS

CLOUDIFICATION EXPANSIONAND SCALE TO THE POINT OF USE DATA-CENTRIC ERA

Deliver new services faster to handle the deluge of data

CLOUD

COMMS

ENTERPRISE

deliver operational efficiencies

Deliver secure infrastructure to protect data privacy 2%AND ONLY

IS BEING USEDSource: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/

PERFORMANCE TO PROPEL INSIGHTS

BUSINESS RESILIENCETHROUGH Hardware-enhanced security

AGILE SERVICE DELIVERYFASTER TIME TO VALUE

GLOBALECOSYSTEM

WORKLOADOPTIMIZED

CUSTOMERORIENTED

UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)

BUILT-INVALUE

© COPYRIGHT 2019. INTEL CORPORATION.


STORE MORE PROCESS EVERYTHING

INTEL ATOM® C PROCESSORS

Move Faster

SILICON PHOTONICS

INTEL® NERVANA™ NNP

SOFTWARE ANDSYSTEM-LEVELOPTIMIZED

HARDWAREENHANCEDSECURITY

FASTER TIME TOVALUE


INTEL® MOVIDIUS™ TECHNOLOGY

OMNI-PATH FABRIC

ETHERNET

DC SERIES SSDQ L C 3 D N A N D D R I V E

INTEL® XEON® D PROCESSORS

INTEL® XEON® SCALABLE PROCESSORS

INTEL® FPGAS




Move Faster

SILICON PHOTONICS




FASTER TIME TOVALUE



OMNI-PATH FABRIC

ETHERNET




INTEL® FPGAS

APRIL 2 ANNOUNCEMENTS

LEADERSHIP WORKLOAD

PERFORMANCE GROUNDBREAKING

MEMORY INNOVATION

EMBEDDEDARTIFICIAL INTELLIGENCE

ACCELERATION

ENHANCEDAGILITY & UTILIZATION

HARDWARE ENHANCED

SECURITY


BUILT-INVALUE


UNINTERRUPTED

INTEL.COM/XEONSCALABLE

Leadership XEONPERFORMANCE




HighPerformance

computing

Analytics &Artificial

intelligence

HIGHDENSITY

INFRASTRUCTURE

2XMORE COMPUTE

DENSITY

UPTO112

CORES3.8

INTEL® TURBO BOOST TECHNOLOGY 2.0

UPTO

GHZ 3

DDR4-2933 Mt/S

TBUP

TOUPTO

2S SYSTEM 2S SYSTEM

2.3 GHz BASE71.5MB CACHE

48 CORES3.8 GHz TURBO

2.6 GHz BASE77MB CACHE


2.3 GHz BASE71.5MB CACHE


Leadership XEONPERFORMANCE



Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 50. No product or component

can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using

specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully

evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.


30Xai performance with

intel® dl boost

UPTO2X

Average performance improvement

5.8XBETTER PERFORMANCETHAN AMD EPYC 7601

UPTO

COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSOR

HIGHEST DENSITYINTEL ® XEON® SCALABLE PROCESSOR

CORES IN A 2S SYSTEM

HIGHEST DDR4NATIVE BANDWIDTH OF ANY

INTEL® XEON® PLATFORM

HIGHEST FLOPSPER 2S SYSTEM WITH

INTEL® ARCHITECTURE

COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSORS (JULY 2017)

COMPARED TO INTEL® XEON® PLATINUM 9282 PROCESSOR RUNNING LINPACK

1 2 3

Intel® Xeon® Platinum 9222 processor: Intel® Turbo Boost Technology frequency and base frequency are preliminary; final values to be published at a future date.


2XSYSTEM MEMORY

CAPACITY

UPTO8+

SOCKETS4.4

INTEL® TURBO BOOST TECHNOLOGY 2.0

UPTO

GHZ 36

SYSTEM MEMORYIn a eight socket system

TB

UPTO

& 16 Gb DIMMs

UPTO DDR42933 MT/s

WORLD RECORDSAND COUNTING…

USING INTEL® OPTANE™ DC PERSISTENT MEMORY AND DRAM

COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSORIN a SYSTEM

95 World Records featuring Intel® Xeon® processors as of September 14, 2018.https://newsroom.intel.com/news/intel-xeon-scalable-processors-set-95-new-performance-world-records/ INTEL.COM/XEONSCALABLE


14Xai performance with

intel® dl boost

UPTO

COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSORS (JULY 2017)

3.1XBETTER PERFORMANCETHAN AMD EPYC 7601

UPTO

COMPARED TO INTEL® XEON® PLATINUM 8282 PROCESSOR RUNNING LINPACK

5-year refreshPerformance improvement

VM DENSITY COMPARED TOINTEL® XEON® E5-2600 V2 PROCESSOR

3.50XUPTO

AVERAGE PERFORMANCE IMPROVEMENT

COMPARED TO INTEL® XEON®GOLD 5100 PROCESSOR

1.33XUPTO

INTEL®INFRASTRUCTUREMANAGEMENTTECHNOLOGIES

AGILE SERVICE DELIVERY WITH ENHANCED EFFICENCY

MITIGATIONS FORSIDE-CHANNELmethods

Encryption +Accelerators

BUSINESS RESILIENCE WITH HARDWARE-ENHANCED SECURITY

INTEL®Securitylibraries

Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 50 and 51. No product or

component can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured

using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully



INTEL®DEEPLEARNINGBOOST

INTEL®Speed selectTECHNOLOGY

4 5 6 7

Intel® Speed Select Technology is available on select processors.


Business resilience with HARDWARE-ENHANCED SECURITY Agile service delivery with enhanced EFFICENCY

INTEL® Security libraries• New Intel® Threat Detection Technology• Intel® Trusted Execution Technology • Intel® Cloud Integrity Technology

MITIGATIONS FOR SIDE-CHANNEL methods• Enhanced performance over software-only

mitigations INTEL® Speed select TECHNOLOGY• Configurable core/frequency processor attributes• Prioritize workload performance• Supports platform TCO optimizations

Encryption + Accelerators• Available with Intel® QuickAssist Technology• Integrated Intel® Advanced Vector Extensions 512• Intel® Optane™ DC persistent memory on-module

data encryption


INTEL® INFRASTRUCTURE MANAGEMENT TECHNOLOGIES• Industry-leading Intel® Virtualization Technology

• Seamless VM migration for over 5 generations• Enhanced Intel® Resource Director Technology

• New Intel resource orchestration software• New Intel® Ethernet 800 series with Application Device

Queues (ADQ) and Dynamic Device Personalization (DDP)• New Intel® Optane™ DC D4800X (Dual Port) SSDs

INTEL® DEEP LEARNING BOOST• New integrated AI acceleration for inference


AI.INTEL.COM


Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 52. No product or component

can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using

specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully



I N T E L O P T I M I Z A T I O N F O R C A F F E R E S N E T - 5 0 N E W A I A C C E L E R A T I O N

INF

ER

EN

CE

TH

RO

UG

HP

UT

(IM

AG

ES

/S

EC

)

1.0

5.7X

Jul’17 Dec’18

VECTOR NEURAL NETWORK INSTRUCTION

For BUILT-IN inference Acceleration

Optimizations for Developers & Data ScientistsI N T E L ® X E O N ® P L A T I N U M 8 2 0 0 P R O C E S S O R S W I T H

I N T E L ® D L B O O S T

I N T E L ® X E O N ® P L A T I N U M 9 2 0 0 P R O C E S S O R S W I T H

I N T E L ® D L B O O S T

Apr’19

Toolkit

optimizedFrameworks

Data type 8

9

10

11

I N T E L ® X E O N ® P L A T I N U M 8 1 0 0 P R O C E S S O R S




Move Faster

SILICON PHOTONICS




FASTER TIME TOVALUE



OMNI-PATH FABRIC

ETHERNET




INTEL® FPGAS


Storage

Cost performance GAp

Storage Performance Gap

Persistent Memory

MemoryDRAMHOT TIER

HDD / TAPECOLD TIER

3D NAND SSDWARM TIER

ImprovingSSD performance

Delivering efficient storage

Intel® QLC 3D Nand SSD

INTEL.COM/OPTANE



BREAK IO BOTTLENECKS

FASTER RECOVERY

HIGH SPEED STORAGE

INCREASE MEMORY SIZE

CONSOLIDATE WORKLOADS

SCALE UP TO SCALE OUT

AFFORDABLE

ALTERNATIVE TO DRAM

IMPROVE TCO

ON-MODULE ENCRYPTIONINTEL.COM/OPTANEDCPERSISTENTMEMORY


Deliver benefits on , to support to

INTEL.COM/OPTANEDCPERSISTENTMEMORY


Intel® Optane™ DC Persistent Memory Performance Details▪ Intel® Optane™ DC persistent memory is programmable for

different power limits for power/performance optimization• 12W – 18W, in 0.25 watt granularity - for example: 12.25W, 14.75W, 18W

• Higher power settings give best performance

▪ Performance varies based on traffic pattern• Contiguous 4 cacheline (256B) granularity vs. single random cacheline (64B) granularity

• Read vs. writes

• Examples:Granularity Traffic Module Bandwidth

256B (4x64B) Read

256GB, 18W

8.3 GB/s

256B (4x64B) Write 3.0 GB/s

256B (4x64B) 2 Read/1 Write 5.4 GB/s

64B Read 2.13 GB/s

64B Write 0.73 GB/s

64B 2 Read/1 Write 1.35 GB/s

Intel® Optane™ DC Persistent Memory latency▪ Orders of magnitude lower latency than SSD

▪ 2X read/write bandwidth vs disk, with one module, more with multiple modules

Ranges from 180ns to 340ns(vs. DRAM ~70ns)

Optane SSD (4KB) (P4800X)

DCPMM (256B)Read

Read idle latency

Smaller granularity(vs. 4K)

DCPMM (256B)Read/Write

NAND SSD (4KB) (P4610)

1. 256B granularity (64B accesses). Note 4K granularity gives about same performance as 256B

Performance results are based on testing as of Feb 22, 2019 and may not reflect all publicly available security updates. No product or component can be absolutely secure.

Results have been estimated based on tests conducted on pre-production systems, and provided to you for informational purposes. Software and workloads used in performance tests may have been optimized for performance only on Intel

microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You

should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to

www.intel.com/benchmarks. Configuration: see slide 41 and 42.

DCPMM (64B)Read/Write

DCPMM (64B)Read

Lo

we

r is

be

tte

r


More capacity

Go fasterMINIMIZE DOWNTIME

SAVE MORECost / DB Terabyte

3 TB DRAM + 6 TB Intel® Optane™ DC persistent memory = 9TB TOTAL

Restart time

20 mins Restart time

90 seconds

~$62,495 USD ~$38,357 USD

Faster restart15,1613x

Cost savings1639%

Performance results are based on testing as of Jan 30, 2019 and may not reflect all publicly available security updates. No product or component can be absolutely secure.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. For detailed configs and pricing see slide/page 55.

Columnar store entire reload into DRAM for 1.3TB dataset is 20 mins. Entire system restart before is 32 Minutes and with DC PMEM it is 13.5 Minutes (12 mins for OS + 1.5 mins)

Pricing Guidance as of March 1, 2019. Intel does not guarantee any costs or cost reduction. You should consult other information and performance tests to assist you in your purchase decision.

CPU: 4 Intel® Xeon® Platinum 8280M ProcessorMEMORY: 48 x 128GB DDR4 for 6 TB system

CPU: 4 Intel® Xeon® Platinum 8280L ProcessorMEMORY: 24 x 128GB DDR4 + 24 x 256GB Intel Optane DC persistent memory for 9TB system



http://www.intel.com/benchmarks

DDR4 DRAM Only DDR4 DRAM with

CPU: Intel® Xeon® Platinum 8276 processorMEMORY: 768 GB DDR4 DRAM memory

CPU: Intel® Xeon® Platinum 8276 processorMEMORY: 192 GB DDR4 DRAM memory + 1 TB Intel® Optane™ DC persistent memory

Up to33% More

memory

30% Lower cost per VM18

System memory 768 GB

22 vms

~$1,588 USD/VM

system memory1 TB

30VMs

~$1,108 USD/VM

memory mode – Windows Server® 2019/Hyper-V

Up toSAVE MORE

Up to36% More vms

per node17DO MORE

Performance results are based on testing as of Jan 31, 2019 and may not reflect all publicly available security updates. No product or component can be absolutely secure.Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. For detailed configs and pricing see slide/page 54.





STORAGE

MEMORY

PERSISTENT MEMORY

DRAMHOT TIER

HDD / TAPECOLD TIER

Intel® 3D Nand SSDs

INTEL® OPTANE™ SSD DC D4800X DUAL-PORT • PERFORMANCE + RESILIENCY FOR CRITICAL ENTERPRISE IT APPS• DUAL PORT CONNECTIONS ENABLE 24X7 DATA AVAILABILITY

WITH REDUNDANT, HOT SWAPPABLE DATA PATHS

INTEL® SSD D-5 P4326 E1.L• COST-OPTIMIZED, ENABLES GREATER WARM STORAGE• NEW E1.L FORM FACTOR SCALABLE TO ~1PB (IN 1U)


INTEL.COM/OPTANE


hpcAI / analyticsstorage Infrastructure COMMSdatabase

Hyper-Converged (HCI)

Machine Learning Analytics

Apache Spark*

Real Time Analytics

SAS*

RDMA/Replication

Oracle Exadata*

Content Delivery Network (CDN)

Comms SP custom

Caching/Persistence

SAP HANA*MS-SQL*

Oracle Exadata*Aerospike*

Redis* RocksDB*

Scratch & IO Nodes

HPC Flex Memory

SDS

Ceph*

Memory

VMWare ESXi*MSFT Hyper-V*

KVM*Redis Labs*

Memcached*VDI

Storage memoryVMware vSANMicrosoft S2D

Nutanix*Cisco HyperFlex*

VMWare ESXi*MSFT Hyper-V*

KVM*

INTEL.COM/OPTANE


BEST FIT IS SHOWN, BOTH PRODUCTS MAY BE VIABLE




Move Faster

SILICON PHOTONICS




FASTER TIME TOVALUE



OMNI-PATH FABRIC

ETHERNET




INTEL® FPGAS


Optimized Libraries

Optimized AI Frameworks

Compilers

Profiling tools

Toolkits /Tool Suites

SOFTWARE.INTEL.COM


Parallel Programming FRAMEWORK

Intel® Math Kernel Library for Deep Neural Networks Intel® Machine Learning Scalable Library Intel® Data Analytics Acceleration Library Intel® Integrated Performance Primitives

TensorFlow*, PyTorch*Caffe*, PaddlePaddle*

MXNET*, …

Intel® VTune™ AmplifierIntel® Advisor

Intel® Inspector

Intel® Compilers (LLVM, GCC, Fortran)nGraph

Intel® Parallel Studio XE Intel® System Studio Intel® Distribution of OpenVINO toolkitIntel® Distribution for Python* Data Plane Development Kit Persistent Memory Development KitIntel® Resource Director Technology (OWCA & Intel® PRM) Storage Performance Development Kit

Threading Building BlocksIntel® MPI Library

OpenMP*


OWCA: ORCHESTRATION-AWARE WORKLOAD COLLOCATION AGENTINTEL® PRM: INTEL® PLATFORM RESOURCE MANAGER

UNDER EMBARGO UNTIL 2 APRIL 2019 AT 12:00PM (PACIFIC TIME)

What benefits have you achieved/do you expect to achieve from Solution Templates from trusted 3rd party?

Source: Reduce Complexity To Maximize Performance, a commissioned study conducted by Forrester Consulting on behalf of Intel, December 2018 https://www.intel.com/content/www/us/en/business/reduce-complexity-maximize-performance.html

Technical Benefits BUSINESS BENEFITS49%46%44%

46%42%37%

Better System Performance

Improved Security

Faster Deployment

Faster Time To Value

Reduced Operational Costs

Greater ROI

INTEL.COM/SELECTSOLUTIONS


https://www.intel.com/content/www/us/en/business/reduce-complexity-maximize-performance.html


Benchmarked for specific workloads to

deliver optimal performance

Workload optimized

Deploy smoothly with pre-defined settings and system-wide tuning

Fast & easy to deploy

Eliminate guessworkthrough tightly specified

HW and SW components

Simplified evaluation

Simplify and accelerate deployment of workload-optimized infrastructure




Analytics Artificial Intelligence

Hybrid Cloud

Network Transformation

HPC

Microsoft Azure Stack*

Red Hat OpenShift* Container Platform

Huawei FusionStorage*

Blockchain: Hyperledger Fabric*

Microsoft SQL Server Enterprise Data

WarehouseLinux*

Genomics Analytics

BigDL on Apache Spark*Universal Customer Premises Equipment

NFVi: Red Hat*

NFVi: Ubuntu*

NFVi: FusionSphere*

Simulation & Modeling

ProfessionalVisualization

1H’19 Solution Refresh - Existing Workloads

SAP HANA*

AI Inferencing Microsoft SQL Server Enterprise Data

WarehouseWindows Server* VMware vSAN*

Visual Cloud Delivery Network

HPC AI ConvergedMicrosoft Windows Server Software Defined*

1H’19 Solutions on NEW Workloads



Microsoft SQL Server* Business Operations




ISV/OSV DEVELOPMENT PARTNERS OEMs SCALE PARTNERS




Move Faster

SILICON PHOTONICS




FASTER TIME TOVALUE



OMNI-PATH FABRIC

ETHERNET




INTEL® FPGAS


BUILDING UPON 20+ YEARS OF INTEL® XEON® PLATFORM LEADERSHIPINTEL IS LAUNCHING GROUNDBREAKING INNOVATIONS OPTIMIZED FOR DIGITAL TRANSFORMATION

Move Faster STORE MORE process everything SOFTWARE & SYSTEM-LEVEL OPTIMIZED

PERFORMANCEIMPROVEMENT

GENOVERGEN

MAINSTREAM PERFORMANCE IMPROVEMENTS

GENOVERGEN

ai performance with intel® dl boost

END-TO-END AI ACCELERATED PROCESSORS

GENOVERGEN

PERFORMANCEIMPROVEMENT

WORKLOAD SPECIALIZED FOR NFV, CLOUD AND IOT

Memorycapacity

GENOVERGEN

BREAKTHROUGH INNOVATION

Data-centric portfolio launch

accelerating workloads from the multi-cloud to the edge and back

Fastest Intel® Xeon® Adoption in History

2 5 12

Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 50 and 53. No product or

component can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured

using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully


200+ PARTNERDESIGNS

EARLY ACCESSPROGRAM

&


INTEL® XEON® PLATINUM # PROCESSOR2 # # α α

PROCESSOR LEVEL9 PLATINUM8 PLATINUM6 GOLD5 GOLD4 SILVER3 BRONZE

PROCESSOR GENERATION2 SECOND GENERATION1 FIRST GENERATION

PROCESSOR SKU## E.G. 20, 34, … LARGE DDR MEMORY TIER SUPPORT (UP TO 4.5TB)

MEDIUM DDR MEMORY TIER SUPPORT (UP TO 2.0TB)NETWORKING/NETWORK FUNCTION VIRTUALIZATIONSEARCHTHERMALVM DENSITY VALUEINTEL® SPEED SELECT TECHNOLOGY

LMNSTVY

PROCESSOR OPTIONS

ALL INFORMATION PROVIDED IS SUBJECT TO CHANGE WITHOUT NOTICE.INTEL MAY MAKE CHANGES TO SPECIFICATIONS AND PRODUCT DESCRIPTIONS AT ANY TIME, WITHOUT NOTICE.

CONTACT YOUR INTEL REPRESENTATIVE TO OBTAIN THE LATEST INTEL PRODUCT SPECIFICATIONS.© COPYRIGHT 2019. INTEL CORPORATION.


NOTICES & DISCLAIMERS


Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.

No product or component can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks .

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks .

Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as property of others.

© 2019 Intel Corporation.

http://www.intel.com/

http://www.intel.com/

http://www.intel.com/go/turbo

NOTICES & DISCLAIMERS


FTC Optimization NoticeIf you make any claim about the performance benefits that can be achieved by using Intel software developer tools (compilers or libraries), you must use the entire text on the same viewing plane (slide) as the claim.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.Notice Revision #20110804


Footnotes and Configuration Details


Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Software and workloads used in

performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to

any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more

complete information visit www.intel.com/benchmarks.

1 - 2x Average Performance Improvement compared with Intel® Xeon® Platinum 8180 processor. Geomean of est SPECrate2017_int_base, est SPECrate2017_fp_base, Stream Triad, Intel Distribution of Linpack, server side Java. Platinum 92xx vs Platinum

8180: 1-node, 2x Intel® Xeon® Platinum 9282 cpu on Walker Pass with 768 GB (24x 32GB 2933) total memory, ucode 0x400000A on RHEL7.6, 3.10.0-957.el7.x86_65, IC19u1, AVX512, HT on all (off Stream, Linpack), Turbo on all (off Stream, Linpack), result: est

int throughput=635, est fp throughput=526, Stream Triad=407, Linpack=6411, server side java=332913, test by Intel on 2/16/2019. vs. 1-node, 2x Intel® Xeon® Platinum 8180 cpu on Wolf Pass with 384 GB (12 X 32GB 2666) total memory, ucode 0x200004D on

RHEL7.6, 3.10.0-957.el7.x86_65, IC19u1, AVX512, HT on all (off Stream, Linpack), Turbo on all (off Stream, Linpack), result: est int throughput=307, est fp throughput=251, Stream Triad=204, Linpack=3238, server side java=165724, test by Intel on 1/29/2019.

2 - Up to 30X AI performance with Intel® DL Boost compared to Intel® Xeon® Platinum 8180 processor (July 2017). Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282(56 cores per socket), HT ON, turbo ON, Total

Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS:SE5C620.86B.0D.01.0241.112020180249, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187,

MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, No datalayer DummyData:3x224x224, 56

instance/2 socket, Datatype: INT8 vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM.

CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘,

OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command,

training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from

https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.

3 – Up to 5.8X better performance than AMD EPYC 7601 compared to Intel® Xeon® Platinum 9282 processor running LINPACK. AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS

ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 18.0.0.128, AMD BLIS

ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score =1095GFs, tested by Intel as of July 31, 2018. vs. 1-node, 2x Intel® Xeon® Platinum 9282 cpu on Walker Pass with 768 GB (24x 32GB 2933) total memory, ucode 0x400000A on RHEL7.6,

3.10.0-957.el7.x86_65, IC19u1, AVX512, HT off, Turbo on, score=6411, test by Intel on 2/16/2019. 1-node, 2x Intel® Xeon® Platinum 8280M cpu on Wolf Pass with 384 GB (12 X 32GB 2933) total memory, ucode 0x400000A on RHEL7.6, 3.10.0-957.el7.x86_65,

IC19u1, AVX512, HT off Linpack, Turbo on, score=3462, test by Intel on 1/30/2019.

4 – Up to 3.50X 5-Year Refresh Performance Improvement VM density compared to Intel® Xeon® E5-2600 v6 processor: 1-node, 2x E5-2697 v2 on Canon Pass with 256 GB (16 slots / 16GB / 1600) total memory, ucode 0x42c on RHEL7.6, 3.10.0-957.el7.x86_65, 1x Intel

400GB SSD OS Drive, 2x P4500 4TB PCIe, 2*82599 dual port Ethernet, Virtualization Benchmark, VM kernel 4.19, HT on, Turbo on, score: VM density=74, test by Intel on 1/15/2019. vs. 1-node, 2x 8280 on Wolf Pass with 768 GB (24 slots / 32GB / 2666) total memory, ucode 0x2000056 on

RHEL7.6, 3.10.0-957.el7.x86_65, 1x Intel 400GB SSD OS Drive, 2x P4500 4TB PCIe, 2*82599 dual port Ethernet, Virtualization Benchmark, VM kernel 4.19, HT on, Turbo on, score: VM density=21, test by Intel on 1/15/2019.

5 –1.33X Average Performance Improvement compared to Intel® Xeon® Gold 5100 Processor: Geomean of est SPECrate2017_int_base, est SPECrate2017_fp_base, Stream Triad, Intel Distribution of Linpack, server side Java. Gold 5218 vs Gold 5118: 1-node,

2x Intel® Xeon® Gold 5218 cpu on Wolf Pass with 384 GB (12 X 32GB 2933 (2666)) total memory, ucode 0x4000013 on RHEL7.6, 3.10.0-957.el7.x86_65, IC18u2, AVX2, HT on all (off Stream, Linpack), Turbo on, result: est int throughput=162, est fp

throughput=172, Stream Triad=185, Linpack=1088, server side java=98333, test by Intel on 12/7/2018. 1-node, 2x Intel® Xeon® Gold 5118 cpu on Wolf Pass with 384 GB (12 X 32GB 2666 (2400)) total memory, ucode 0x200004D on RHEL7.6, 3.10.0-

957.el7.x86_65, IC18u2, AVX2, HT on all (off Stream, Linpack), Turbo on, result: est int throughput=119, est fp throughput=134, Stream Triad=148.6, Linpack=822, server side java=67434, test by Intel on 11/12/2018









6 – Up to 14X AI Performance Improvement with Intel® DL Boost compared to Intel® Xeon® Platinum 8180 Processor (July 2017). Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 384

GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x200004d), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, nvme1n1 INTEL SSDPE2KX040T7 SSD 3.7TB, Deep

Learning Framework: Intel® Optimization for Caffe version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model:

https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, DummyData, 4 instance/2 socket, Datatype: INT8 vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28

cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in

SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe:

(http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other

topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries

version 2018.0.20170425. Caffe run with “numactl -l“.

7 – Up to 3.1X Better Performance than AMD EPYC 7601 compared to Intel® Xeon® Platinum 8282 Processor running LINPACK. AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON,

BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 18.0.0.128, AMD

BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score =1095GFs, tested by Intel as of July 31, 2018. vs. 1-node, 2x Intel® Xeon® Platinum 9282 cpu on Walker Pass with 768 GB (24x 32GB 2933) total memory, ucode 0x400000A on RHEL7.6,

3.10.0-957.el7.x86_65, IC19u1, AVX512, HT off, Turbo on, score=6411, test by Intel on 2/16/2019. 1-node, 2x Intel® Xeon® Platinum 8280M cpu on Wolf Pass with 384 GB (12 X 32GB 2933) total memory, ucode 0x400000A on RHEL7.6, 3.10.0-957.el7.x86_65,

IC19u1, AVX512, HT off Linpack, Turbo on, score=3462, test by Intel on 1/30/2019.



https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt

http://github.com/intel/caffe/

https://github.com/intel/caffe/tree/master/models/intel_optimized_models







8 - 1x inference throughput improvement in July 2017 (baseline): Tested by Intel as of July 11th 2017: Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate

driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables:

KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference

measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs

from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but

are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.

9 - 5.7x inference throughput improvement in December 2018 vs baseline: Tested by Intel as of November 11th 2018 :2 socket Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz / 28 cores HT ON , Turbo ON Total Memory 376.46GB (12slots / 32 GB / 2666

MHz). CentOS Linux-7.3.1611-Core, kernel: 3.10.0-862.3.3.el7.x86_64, SSD sda RS3WC080 HDD 744.1GB,sdb RS3WC080 HDD 1.5TB,sdc RS3WC080 HDD 5.5TB , Deep Learning Framework Intel® Optimization for caffe version:

551a53d63a6183c233abaa1a19458a25b672ad41 Topology::ResNet_50_v1 BIOS:SE5C620.86B.00.01.0014.070920180847 MKLDNN: 4e333787e0d66a1dca1218e99a891d493dbc8ef1 instances: 2 instances socket:2 (Results on Intel® Xeon® Scalable

Processor were measured running multiple instances of the framework. Methodology described here: https://software.intel.com/en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi) NoDataLayer. Datatype: INT8

Batchsize=64 vs Tested by Intel as of July 11th 2017:2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release

7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56,

CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe

time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models

(ResNet-50). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.

10 – 14x inference throughput improvement vs baseline: Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS:

SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x200004d), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, nvme1n1 INTEL SSDPE2KX040T7 SSD 3.7TB, Deep Learning Framework: Intel® Optimization for

Caffe version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model:

https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, DummyData, 4 instance/2 socket, Datatype: INT8 vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28

cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in

SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe:

(http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other

topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries

version 2018.0.20170425. Caffe run with “numactl -l“.

11 - 30x inference throughput improvement with CascadeLake-AP vs baseline: Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282(56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/

2933 MHz), BIOS:SE5C620.86B.0D.01.0241.112020180249, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit

hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, No datalayer DummyData:3x224x224, 56 instance/2 socket, Datatype: INT8

vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core),

Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set

with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time”

command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-

50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.





https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners

https://software.intel.com/en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi






https://github.com/intel/caffe d554cbf1










12 – Up to 1.25 to 1.58X NVF Workload Performance Improvement comparing Intel® Xeon® Gold 6230N processor to Intel® Xeon® Gold 6130 processor.

VPP IP Security: Tested by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:

PLYDCRB1.86B.0155.R08.1806130538, ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: VPP IPSec w/AESNI (AES-GCM-128) (Max Gbits/s (1420B)), Workload version: VPP v17.10, Compiler: gcc7.3.0, Results: 179. Tested

by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019

(HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: VPP IPSec w/AESNI (AES-GCM-128) (Max Gbits/s (1420B)), Workload version: VPP v17.10, Compiler: gcc7.3.0, Results: 225

VPP FIB: Tested by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYDCRB1.86B.0155.R08.1806130538, ucode:

0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: VPP FIB (Max Mpackets/s (64B)), Workload version: VPP v17.10 in ipv4fib configuration, Compiler: gcc7.3.0, Results: 160. Tested by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold

6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel:

4.20.0-042000rc6-generic, Benchmark: VPP FIB (Max Mpackets/s (64B)), Workload version: VPP v17.10 in ipv4fib configuration, Compiler: gcc7.3.0, Results: 212.9

Virtual Firewall: Tested by Intel on 10/26/2018 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 4x Intel X710-DA4, Bios: PLYDCRB1.86B.0155.R08.1806130538,

ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: Virtual Firewall (64B Mpps), Workload version: opnfv 6.2.0, Compiler: gcc7.3.0, Results: 38.9. Tested by Intel on 2/04/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on

Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-

generic, Benchmark: Virtual Firewall (64B Mpps), Workload version: opnfv 6.2.0, Compiler: gcc7.3.0, Results: 52.3

Virtual Broadband Network Gateway: Tested by Intel on 11/06/2018 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:

PLYDCRB1.86B.0155.R08.1806130538, ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: Virtual Broadband Network Gateway (88B Mpps), Workload version: DPDK v18.08 ip_pipeline application, Compiler: gcc7.3.0, Results:

56.5. Tested by Intel on 1/2/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode:

0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Virtual Broadband Network Gateway (88B Mpps), Workload version:DPDK v18.08 ip_pipeline application, Compiler: gcc7.3.0, Results: 78.7

VCMTS: Tested by Intel on 1/22/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Supermicro*-X11DPH-Tq platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 4x Intel XXV710-DA2, Bios: American Megatrends Inc.*

version: '2.1', ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Virtual Converged Cable Access Platform (iMIX Gbps), Workload version: vcmts 18.10, Compiler: gcc7.3.0 , Other software: Kubernetes* 1.11, Docker*

18.06, DPDK 18.11, Results: 54.8. Tested by Intel on 1/22/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:

PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Virtual Converged Cable Access Platform (iMIX Gbps), Workload version: vcmts 18.10 , Compiler: gcc7.3.0, Other software:

Kubernetes* 1.11, Docker* 18.06, DPDK 18.11, Results: 83.7

OVS DPDK: Tested by Intel on 1/21/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 4x Intel XXV710-DA2, Bios: PLYXCRB1.86B.0568.D10.1901032132,

ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: Open Virtual Switch (on 4C/4P/8T 64B Mpacket/s), Workload version: OVS 2.10.1, DPDK-17.11.4, Compiler: gcc7.3.0, Other software: QEMU-2.12.1, VPP v18.10, Results: 9.6.

Tested by Intel on 1/18/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.86B.0568.D10.1901032132, ucode:

0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Open Virtual Switch (on 6P/6C/12T 64B Mpacket/s), Workload version: OVS 2.10.1, DPDK-17.11.4, Compiler: gcc7.3.0, Other software: QEMU-2.12.1, VPP v18.10, Results:

15.2. Tested by Intel on 1/18/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor with SST-BF enabled on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:

PLYXCRB1.86B.0568.D10.1901032132, ucode: 0x4000019 (HT= ON, Turbo= ON (SST-BF)), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Open Virtual Switch (on 6P/6C/12T 64B Mpacket/s), Workload version: OVS 2.10.1, DPDK-17.11.4, Compiler:

gcc7.3.0, Other software: QEMU-2.12.1, VPP v18.10, Results: 16.9









13

> 45% latency reduction with Open Source

Redis using 2nd gen Intel® Xeon®

Scalable Processors and Intel®

Ethernet 800 Series with ADQ.

Calculation: (new - old) / old x 100%

Rtt Average Latency across all run for baseline vs ADQ

(382-1249)/1249 * 100% = -69% Reduction

in Rtt Average Latency

14

> 30% throughput improvement with Open Source

Redis using 2nd gen Intel® Xeon®

Scalable Processors and Intel®

Ethernet 800 Series with ADQ.

Calculation: (new - old) / old x 100%

Transaction Request Rate across all run for baseline vs ADQ

(79601-44345)/44345 * 100% = 80% Throughput Improvement

Source: Intel internal testing as of February 2019

Configuration details provided in the following tables:


SUT ClientTest by Intel IntelTest date 2/11/2019 2/11/2019

Platform Intel® Server Board S2600WFTF Dell* PowerEdge* R720

# Nodes 1 11# Sockets 2 2

CPU 2nd Generation Intel® Xeon® Scalable processor 8268 @ 2.8GHz Intel® Xeon® processor E5-2697 v2

Cores/socket, Threads/socket 24/48 12 / 24

ucode 0x3000009 0x428

HT On On

Turbo On On

BIOS version SE5C620.86B.01.00.0833.051120182255 2.5.4

System DDR Mem Config: slots / cap / run-speed 8 slots / 128GB / 2400MT/s 16 slots / 128GB / 1600MT/s

System DCPMM Config: slots / cap / run-speed 2 slots / 1024GB -

Total Memory/Node (DDR+DCPMM) 1024GB 128GB

Storage - boot 1x Intel SSD (OS Drive 64GB) 1x Dell (OS Drive 512GB)

Storage - application drives - -

NIC 1x Intel® Ethernet Network Adapter E810-CQDA2 1x Intel® Ethernet X520-DA2

PCH Intel® C620 Series Chipset Intel® C600 Series Chipset

Other HW (Accelerator)

OS CentOS 7.6 CentOS 7.4

Kernel 4.19.18 (Linux.org Stable) 3.10.0-693.21.1.el7

IBRS (0=disable, 1=enable) 1 0

eIBRS (0=disable, 1=enable) 0 0

Retpoline (0=disable, 1=enable) 1 0

IBPB (0=disable, 1=enable) 1 0

PTI (0=disable, 1=enable) 1 1

Mitigation variants (1,2,3,3a,4, L1TF) 1,2,3,L1TF 1,2,3,L1TF

Workload & version Redis 4.0.10 redis-benchmark 4.0.10

Compiler gcc (GCC) 4.8.5 20150623 -

NIC Driver ice 08.15 ixgbe 4.4.0-k-rh7.4




Baseline Config(DRAM)

AD 2-2-2 Config

System Lightning Ridge (4S) Lightning Ridge (4S)

CPU Intel® Xeon® 8280M Intel® Xeon® 8280L

CPUs per node 4-socket @ 28 core / socket 4-socket @ 28 core / socket

Memory 6TB48x 128GB DDR4 @ 2666 MT/s

9TB24x 256GB Intel® Optane™ DC PMEM24x 128GB DDR4 @ 2666 MT/s

Network 10 GbE Intel X520 NIC 10 GbE Intel X520 NIC

Storage 60x Intel SSD DC S4600 SATA 480GB TB

90x Intel SSD DC S4600 SATA 480GB TB

BIOS WW48’18 WW48’18

OS or VM version

SUSE 15 SUSE 15

WL Version Intel IT workload Intel IT workload

SAP HANA* database size

3TB 6TB

Securitymitigations

Variants 1,2,3 enabled Variants 1,2,3 enabled

Date costsprojected

March 1, 2019 March 1, 2019

15 - 13x faster restart time and 39% less cost configs: 16 - 39% less cost pricing details:



Footnotes and Configuration Details17 & 18 - 36% more VMs per node & 30% lower estimated cost per VM configs:


Config1-DDR4 (Similar Cost) Config2-Intel Optane DC persistent memory (Similar Cost)

Test by Intel Intel

Test date 01/31/2019 01/31/2019

Platform Confidential – Refer to M. Strassmaier if a need to know exists

Confidential – Refer to M. Strassmaier if a need to know exists

# Nodes 1 1

# Sockets 2 2

CPU Cascade Lake B0 8272L Cascade Lake B0 8272L

Cores/socket, Threads/socket 26/52 26/52

HT ON ON

Turbo ON ON

BKC version – E.g. ww47 WW42 WW42

Intel Optane DC persistent memory FW version

5253 5253

System DDR Mem Config: slots / cap / run-speed

24 slots / 32GB / 2666 12 slots / 16 GB / 2666

System DCPMM Config: slots / cap / run-speed

8 slots /128GB / 2666

Total Memory/Node (DDR, DCPMM) 768GB, 0 192GB, 1TB

Storage - boot 1x Samsung PM963 M.2 960GB 1x Samsung PM963 M.2 960GB

Storage - application drives 7 x Samsung PM963 M.2 960GB, 4x Intel SSDs S4600 (1.92TB)

7x Samsung PM963 M.2 960GB, 4x Intel SSDs S4600 (1.92TB)

NIC 1xIntel X520 SR2 (10Gb) 1x Intel X520 SR2 (10Gb)

PCH LBG QS/PRQ – T – B2 LBG QS/PRQ – T – B2

Other HW (Accelerator)

OS Windows Server 2019 RS5-17763 Windows Server 2019 RS5-17763

Kernel

Workload & version OLTP Cloud Benchmark OLTP Cloud Benchmark

Compiler

Libraries

Other SW (Frameworks, Topologies…)

Data-Centric Press Workshop - HPC Advisory Council · 2.3 ghz base 71.5mb cache 48 cores 3.8 gh z...

Documents

Transcript of Data-Centric Press Workshop - HPC Advisory Council · 2.3 ghz base 71.5mb cache 48 cores 3.8 gh z...