Data-Centric Press Workshop - HPC Advisory Council · 2.3 ghz base 71.5mb cache 48 cores 3.8 gh z...
Transcript of Data-Centric Press Workshop - HPC Advisory Council · 2.3 ghz base 71.5mb cache 48 cores 3.8 gh z...
Data-Centric Press WorkshopMove | Store |Process
Dr. Jean-Laurent PhilippeSenior HPC Technical SpecialistIntel Data Center Group Sales
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
90%OF WORLD’S DATA WAS CREATED
IN THE PAST TWO YEARS
CLOUDIFICATION EXPANSIONAND SCALE TO THE POINT OF USE DATA-CENTRIC ERA
Deliver new services faster to handle the deluge of data
CLOUD
COMMS
ENTERPRISE
deliver operational efficiencies
Deliver secure infrastructure to protect data privacy 2%AND ONLY
IS BEING USEDSource: https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/
PERFORMANCE TO PROPEL INSIGHTS
BUSINESS RESILIENCETHROUGH Hardware-enhanced security
AGILE SERVICE DELIVERYFASTER TIME TO VALUE
GLOBALECOSYSTEM
WORKLOADOPTIMIZED
CUSTOMERORIENTED
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
BUILT-INVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
STORE MORE PROCESS EVERYTHING
INTEL ATOM® C PROCESSORS
Move Faster
SILICON PHOTONICS
INTEL® NERVANA™ NNP
SOFTWARE ANDSYSTEM-LEVELOPTIMIZED
HARDWAREENHANCEDSECURITY
FASTER TIME TOVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® MOVIDIUS™ TECHNOLOGY
OMNI-PATH FABRIC
ETHERNET
DC SERIES SSDQ L C 3 D N A N D D R I V E
INTEL® XEON® D PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
INTEL® FPGAS
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
STORE MORE PROCESS EVERYTHING
INTEL ATOM® C PROCESSORS
Move Faster
SILICON PHOTONICS
INTEL® NERVANA™ NNP
SOFTWARE ANDSYSTEM-LEVELOPTIMIZED
HARDWAREENHANCEDSECURITY
FASTER TIME TOVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® MOVIDIUS™ TECHNOLOGY
OMNI-PATH FABRIC
ETHERNET
DC SERIES SSDQ L C 3 D N A N D D R I V E
INTEL® XEON® D PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
INTEL® FPGAS
APRIL 2 ANNOUNCEMENTS
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
STORE MORE PROCESS EVERYTHING
INTEL ATOM® C PROCESSORS
Move Faster
SILICON PHOTONICS
INTEL® NERVANA™ NNP
SOFTWARE ANDSYSTEM-LEVELOPTIMIZED
HARDWAREENHANCEDSECURITY
FASTER TIME TOVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® MOVIDIUS™ TECHNOLOGY
OMNI-PATH FABRIC
ETHERNET
DC SERIES SSDQ L C 3 D N A N D D R I V E
INTEL® XEON® D PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
INTEL® FPGAS
APRIL 2 ANNOUNCEMENTS
LEADERSHIP WORKLOAD
PERFORMANCE GROUNDBREAKING
MEMORY INNOVATION
EMBEDDEDARTIFICIAL INTELLIGENCE
ACCELERATION
ENHANCEDAGILITY & UTILIZATION
HARDWARE ENHANCED
SECURITY
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
BUILT-INVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
UNINTERRUPTED
INTEL.COM/XEONSCALABLE
Leadership XEONPERFORMANCE
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
INTEL.COM/XEONSCALABLE
© COPYRIGHT 2019. INTEL CORPORATION.
HighPerformance
computing
Analytics &Artificial
intelligence
HIGHDENSITY
INFRASTRUCTURE
2XMORE COMPUTE
DENSITY
UPTO112
CORES3.8
INTEL® TURBO BOOST TECHNOLOGY 2.0
UPTO
GHZ 3
DDR4-2933 Mt/S
TBUP
TOUPTO
2S SYSTEM 2S SYSTEM
2.3 GHz BASE71.5MB CACHE
48 CORES3.8 GHz TURBO
2.6 GHz BASE77MB CACHE
56 CORES3.8 GHz TURBO
2.3 GHz BASE71.5MB CACHE
32 CORES3.7 GHz TURBO
Leadership XEONPERFORMANCE
INTEL.COM/XEONSCALABLE
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 50. No product or component
can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using
specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
© COPYRIGHT 2019. INTEL CORPORATION.
30Xai performance with
intel® dl boost
UPTO2X
Average performance improvement
5.8XBETTER PERFORMANCETHAN AMD EPYC 7601
UPTO
COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSOR
HIGHEST DENSITYINTEL ® XEON® SCALABLE PROCESSOR
CORES IN A 2S SYSTEM
HIGHEST DDR4NATIVE BANDWIDTH OF ANY
INTEL® XEON® PLATFORM
HIGHEST FLOPSPER 2S SYSTEM WITH
INTEL® ARCHITECTURE
COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSORS (JULY 2017)
COMPARED TO INTEL® XEON® PLATINUM 9282 PROCESSOR RUNNING LINPACK
1 2 3
Intel® Xeon® Platinum 9222 processor: Intel® Turbo Boost Technology frequency and base frequency are preliminary; final values to be published at a future date.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
2XSYSTEM MEMORY
CAPACITY
UPTO8+
SOCKETS4.4
INTEL® TURBO BOOST TECHNOLOGY 2.0
UPTO
GHZ 36
SYSTEM MEMORYIn a eight socket system
TB
UPTO
& 16 Gb DIMMs
UPTO DDR42933 MT/s
WORLD RECORDSAND COUNTING…
USING INTEL® OPTANE™ DC PERSISTENT MEMORY AND DRAM
COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSORIN a SYSTEM
95 World Records featuring Intel® Xeon® processors as of September 14, 2018.https://newsroom.intel.com/news/intel-xeon-scalable-processors-set-95-new-performance-world-records/ INTEL.COM/XEONSCALABLE
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
14Xai performance with
intel® dl boost
UPTO
COMPARED TO INTEL® XEON® PLATINUM 8180 PROCESSORS (JULY 2017)
3.1XBETTER PERFORMANCETHAN AMD EPYC 7601
UPTO
COMPARED TO INTEL® XEON® PLATINUM 8282 PROCESSOR RUNNING LINPACK
5-year refreshPerformance improvement
VM DENSITY COMPARED TOINTEL® XEON® E5-2600 V2 PROCESSOR
3.50XUPTO
AVERAGE PERFORMANCE IMPROVEMENT
COMPARED TO INTEL® XEON®GOLD 5100 PROCESSOR
1.33XUPTO
INTEL®INFRASTRUCTUREMANAGEMENTTECHNOLOGIES
AGILE SERVICE DELIVERY WITH ENHANCED EFFICENCY
MITIGATIONS FORSIDE-CHANNELmethods
Encryption +Accelerators
BUSINESS RESILIENCE WITH HARDWARE-ENHANCED SECURITY
INTEL®Securitylibraries
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 50 and 51. No product or
component can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured
using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL®DEEPLEARNINGBOOST
INTEL®Speed selectTECHNOLOGY
4 5 6 7
Intel® Speed Select Technology is available on select processors.
INTEL.COM/XEONSCALABLE
Business resilience with HARDWARE-ENHANCED SECURITY Agile service delivery with enhanced EFFICENCY
INTEL® Security libraries• New Intel® Threat Detection Technology• Intel® Trusted Execution Technology • Intel® Cloud Integrity Technology
MITIGATIONS FOR SIDE-CHANNEL methods• Enhanced performance over software-only
mitigations INTEL® Speed select TECHNOLOGY• Configurable core/frequency processor attributes• Prioritize workload performance• Supports platform TCO optimizations
Encryption + Accelerators• Available with Intel® QuickAssist Technology• Integrated Intel® Advanced Vector Extensions 512• Intel® Optane™ DC persistent memory on-module
data encryption
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® INFRASTRUCTURE MANAGEMENT TECHNOLOGIES• Industry-leading Intel® Virtualization Technology
• Seamless VM migration for over 5 generations• Enhanced Intel® Resource Director Technology
• New Intel resource orchestration software• New Intel® Ethernet 800 series with Application Device
Queues (ADQ) and Dynamic Device Personalization (DDP)• New Intel® Optane™ DC D4800X (Dual Port) SSDs
INTEL® DEEP LEARNING BOOST• New integrated AI acceleration for inference
INTEL.COM/XEONSCALABLE
AI.INTEL.COM
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 52. No product or component
can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using
specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
© COPYRIGHT 2019. INTEL CORPORATION.
I N T E L O P T I M I Z A T I O N F O R C A F F E R E S N E T - 5 0 N E W A I A C C E L E R A T I O N
INF
ER
EN
CE
TH
RO
UG
HP
UT
(IM
AG
ES
/S
EC
)
1.0
5.7X
Jul’17 Dec’18
VECTOR NEURAL NETWORK INSTRUCTION
For BUILT-IN inference Acceleration
Optimizations for Developers & Data ScientistsI N T E L ® X E O N ® P L A T I N U M 8 2 0 0 P R O C E S S O R S W I T H
I N T E L ® D L B O O S T
I N T E L ® X E O N ® P L A T I N U M 9 2 0 0 P R O C E S S O R S W I T H
I N T E L ® D L B O O S T
Apr’19
Toolkit
optimizedFrameworks
Data type 8
9
10
11
I N T E L ® X E O N ® P L A T I N U M 8 1 0 0 P R O C E S S O R S
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
STORE MORE PROCESS EVERYTHING
INTEL ATOM® C PROCESSORS
Move Faster
SILICON PHOTONICS
INTEL® NERVANA™ NNP
SOFTWARE ANDSYSTEM-LEVELOPTIMIZED
HARDWAREENHANCEDSECURITY
FASTER TIME TOVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® MOVIDIUS™ TECHNOLOGY
OMNI-PATH FABRIC
ETHERNET
DC SERIES SSDQ L C 3 D N A N D D R I V E
INTEL® XEON® D PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
INTEL® FPGAS
APRIL 2 ANNOUNCEMENTS
Storage
Cost performance GAp
Storage Performance Gap
Persistent Memory
MemoryDRAMHOT TIER
HDD / TAPECOLD TIER
3D NAND SSDWARM TIER
ImprovingSSD performance
Delivering efficient storage
Intel® QLC 3D Nand SSD
INTEL.COM/OPTANE
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
BREAK IO BOTTLENECKS
FASTER RECOVERY
HIGH SPEED STORAGE
INCREASE MEMORY SIZE
CONSOLIDATE WORKLOADS
SCALE UP TO SCALE OUT
AFFORDABLE
ALTERNATIVE TO DRAM
IMPROVE TCO
ON-MODULE ENCRYPTIONINTEL.COM/OPTANEDCPERSISTENTMEMORY
© COPYRIGHT 2019. INTEL CORPORATION.
Deliver benefits on , to support to
INTEL.COM/OPTANEDCPERSISTENTMEMORY
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Intel® Optane™ DC Persistent Memory Performance Details▪ Intel® Optane™ DC persistent memory is programmable for
different power limits for power/performance optimization• 12W – 18W, in 0.25 watt granularity - for example: 12.25W, 14.75W, 18W
• Higher power settings give best performance
▪ Performance varies based on traffic pattern• Contiguous 4 cacheline (256B) granularity vs. single random cacheline (64B) granularity
• Read vs. writes
• Examples:Granularity Traffic Module Bandwidth
256B (4x64B) Read
256GB, 18W
8.3 GB/s
256B (4x64B) Write 3.0 GB/s
256B (4x64B) 2 Read/1 Write 5.4 GB/s
64B Read 2.13 GB/s
64B Write 0.73 GB/s
64B 2 Read/1 Write 1.35 GB/s
Intel® Optane™ DC Persistent Memory latency▪ Orders of magnitude lower latency than SSD
▪ 2X read/write bandwidth vs disk, with one module, more with multiple modules
Ranges from 180ns to 340ns(vs. DRAM ~70ns)
Optane SSD (4KB) (P4800X)
DCPMM (256B)Read
Read idle latency
Smaller granularity(vs. 4K)
DCPMM (256B)Read/Write
NAND SSD (4KB) (P4610)
1. 256B granularity (64B accesses). Note 4K granularity gives about same performance as 256B
Performance results are based on testing as of Feb 22, 2019 and may not reflect all publicly available security updates. No product or component can be absolutely secure.
Results have been estimated based on tests conducted on pre-production systems, and provided to you for informational purposes. Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to
www.intel.com/benchmarks. Configuration: see slide 41 and 42.
DCPMM (64B)Read/Write
DCPMM (64B)Read
Lo
we
r is
be
tte
r
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
More capacity
Go fasterMINIMIZE DOWNTIME
SAVE MORECost / DB Terabyte
3 TB DRAM + 6 TB Intel® Optane™ DC persistent memory = 9TB TOTAL
Restart time
20 mins Restart time
90 seconds
~$62,495 USD ~$38,357 USD
Faster restart15,1613x
Cost savings1639%
Performance results are based on testing as of Jan 30, 2019 and may not reflect all publicly available security updates. No product or component can be absolutely secure.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. For detailed configs and pricing see slide/page 55.
Columnar store entire reload into DRAM for 1.3TB dataset is 20 mins. Entire system restart before is 32 Minutes and with DC PMEM it is 13.5 Minutes (12 mins for OS + 1.5 mins)
Pricing Guidance as of March 1, 2019. Intel does not guarantee any costs or cost reduction. You should consult other information and performance tests to assist you in your purchase decision.
CPU: 4 Intel® Xeon® Platinum 8280M ProcessorMEMORY: 48 x 128GB DDR4 for 6 TB system
CPU: 4 Intel® Xeon® Platinum 8280L ProcessorMEMORY: 24 x 128GB DDR4 + 24 x 256GB Intel Optane DC persistent memory for 9TB system
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL.COM/OPTANEDCPERSISTENTMEMORY
DDR4 DRAM Only DDR4 DRAM with
CPU: Intel® Xeon® Platinum 8276 processorMEMORY: 768 GB DDR4 DRAM memory
CPU: Intel® Xeon® Platinum 8276 processorMEMORY: 192 GB DDR4 DRAM memory + 1 TB Intel® Optane™ DC persistent memory
Up to33% More
memory
30% Lower cost per VM18
System memory 768 GB
22 vms
~$1,588 USD/VM
system memory1 TB
30VMs
~$1,108 USD/VM
memory mode – Windows Server® 2019/Hyper-V
Up toSAVE MORE
Up to36% More vms
per node17DO MORE
Performance results are based on testing as of Jan 31, 2019 and may not reflect all publicly available security updates. No product or component can be absolutely secure.Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. For detailed configs and pricing see slide/page 54.
Pricing Guidance as of March 1, 2019. Intel does not guarantee any costs or cost reduction. You should consult other information and performance tests to assist you in your purchase decision.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
INTEL.COM/OPTANEDCPERSISTENTMEMORY
STORAGE
MEMORY
PERSISTENT MEMORY
DRAMHOT TIER
HDD / TAPECOLD TIER
Intel® 3D Nand SSDs
INTEL® OPTANE™ SSD DC D4800X DUAL-PORT • PERFORMANCE + RESILIENCY FOR CRITICAL ENTERPRISE IT APPS• DUAL PORT CONNECTIONS ENABLE 24X7 DATA AVAILABILITY
WITH REDUNDANT, HOT SWAPPABLE DATA PATHS
INTEL® SSD D-5 P4326 E1.L• COST-OPTIMIZED, ENABLES GREATER WARM STORAGE• NEW E1.L FORM FACTOR SCALABLE TO ~1PB (IN 1U)
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL.COM/OPTANE
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
hpcAI / analyticsstorage Infrastructure COMMSdatabase
Hyper-Converged (HCI)
Machine Learning Analytics
Apache Spark*
Real Time Analytics
SAS*
RDMA/Replication
Oracle Exadata*
Content Delivery Network (CDN)
Comms SP custom
Caching/Persistence
SAP HANA*MS-SQL*
Oracle Exadata*Aerospike*
Redis* RocksDB*
Scratch & IO Nodes
HPC Flex Memory
SDS
Ceph*
Memory
VMWare ESXi*MSFT Hyper-V*
KVM*Redis Labs*
Memcached*VDI
Storage memoryVMware vSANMicrosoft S2D
Nutanix*Cisco HyperFlex*
VMWare ESXi*MSFT Hyper-V*
KVM*
INTEL.COM/OPTANE
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
BEST FIT IS SHOWN, BOTH PRODUCTS MAY BE VIABLE
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
STORE MORE PROCESS EVERYTHING
INTEL ATOM® C PROCESSORS
Move Faster
SILICON PHOTONICS
INTEL® NERVANA™ NNP
SOFTWARE ANDSYSTEM-LEVELOPTIMIZED
HARDWAREENHANCEDSECURITY
FASTER TIME TOVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® MOVIDIUS™ TECHNOLOGY
OMNI-PATH FABRIC
ETHERNET
DC SERIES SSDQ L C 3 D N A N D D R I V E
INTEL® XEON® D PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
INTEL® FPGAS
APRIL 2 ANNOUNCEMENTS
Optimized Libraries
Optimized AI Frameworks
Compilers
Profiling tools
Toolkits /Tool Suites
SOFTWARE.INTEL.COM
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Parallel Programming FRAMEWORK
Intel® Math Kernel Library for Deep Neural Networks Intel® Machine Learning Scalable Library Intel® Data Analytics Acceleration Library Intel® Integrated Performance Primitives
TensorFlow*, PyTorch*Caffe*, PaddlePaddle*
MXNET*, …
Intel® VTune™ AmplifierIntel® Advisor
Intel® Inspector
Intel® Compilers (LLVM, GCC, Fortran)nGraph
Intel® Parallel Studio XE Intel® System Studio Intel® Distribution of OpenVINO toolkitIntel® Distribution for Python* Data Plane Development Kit Persistent Memory Development KitIntel® Resource Director Technology (OWCA & Intel® PRM) Storage Performance Development Kit
Threading Building BlocksIntel® MPI Library
OpenMP*
© COPYRIGHT 2019. INTEL CORPORATION.
OWCA: ORCHESTRATION-AWARE WORKLOAD COLLOCATION AGENTINTEL® PRM: INTEL® PLATFORM RESOURCE MANAGER
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 12:00PM (PACIFIC TIME)
What benefits have you achieved/do you expect to achieve from Solution Templates from trusted 3rd party?
Source: Reduce Complexity To Maximize Performance, a commissioned study conducted by Forrester Consulting on behalf of Intel, December 2018 https://www.intel.com/content/www/us/en/business/reduce-complexity-maximize-performance.html
Technical Benefits BUSINESS BENEFITS49%46%44%
46%42%37%
Better System Performance
Improved Security
Faster Deployment
Faster Time To Value
Reduced Operational Costs
Greater ROI
INTEL.COM/SELECTSOLUTIONS
© COPYRIGHT 2019. INTEL CORPORATION.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 12:00PM (PACIFIC TIME)
Benchmarked for specific workloads to
deliver optimal performance
Workload optimized
Deploy smoothly with pre-defined settings and system-wide tuning
Fast & easy to deploy
Eliminate guessworkthrough tightly specified
HW and SW components
Simplified evaluation
Simplify and accelerate deployment of workload-optimized infrastructure
INTEL.COM/SELECTSOLUTIONS
© COPYRIGHT 2019. INTEL CORPORATION.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 12:00PM (PACIFIC TIME)
Analytics Artificial Intelligence
Hybrid Cloud
Network Transformation
HPC
Microsoft Azure Stack*
Red Hat OpenShift* Container Platform
Huawei FusionStorage*
Blockchain: Hyperledger Fabric*
Microsoft SQL Server Enterprise Data
WarehouseLinux*
Genomics Analytics
BigDL on Apache Spark*Universal Customer Premises Equipment
NFVi: Red Hat*
NFVi: Ubuntu*
NFVi: FusionSphere*
Simulation & Modeling
ProfessionalVisualization
1H’19 Solution Refresh - Existing Workloads
SAP HANA*
AI Inferencing Microsoft SQL Server Enterprise Data
WarehouseWindows Server* VMware vSAN*
Visual Cloud Delivery Network
HPC AI ConvergedMicrosoft Windows Server Software Defined*
1H’19 Solutions on NEW Workloads
INTEL.COM/SELECTSOLUTIONS
© COPYRIGHT 2019. INTEL CORPORATION.
Microsoft SQL Server* Business Operations
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 12:00PM (PACIFIC TIME)
INTEL.COM/SELECTSOLUTIONS
© COPYRIGHT 2019. INTEL CORPORATION.
ISV/OSV DEVELOPMENT PARTNERS OEMs SCALE PARTNERS
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
STORE MORE PROCESS EVERYTHING
INTEL ATOM® C PROCESSORS
Move Faster
SILICON PHOTONICS
INTEL® NERVANA™ NNP
SOFTWARE ANDSYSTEM-LEVELOPTIMIZED
HARDWAREENHANCEDSECURITY
FASTER TIME TOVALUE
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® MOVIDIUS™ TECHNOLOGY
OMNI-PATH FABRIC
ETHERNET
DC SERIES SSDQ L C 3 D N A N D D R I V E
INTEL® XEON® D PROCESSORS
INTEL® XEON® SCALABLE PROCESSORS
INTEL® FPGAS
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
BUILDING UPON 20+ YEARS OF INTEL® XEON® PLATFORM LEADERSHIPINTEL IS LAUNCHING GROUNDBREAKING INNOVATIONS OPTIMIZED FOR DIGITAL TRANSFORMATION
Move Faster STORE MORE process everything SOFTWARE & SYSTEM-LEVEL OPTIMIZED
PERFORMANCEIMPROVEMENT
GENOVERGEN
MAINSTREAM PERFORMANCE IMPROVEMENTS
GENOVERGEN
ai performance with intel® dl boost
END-TO-END AI ACCELERATED PROCESSORS
GENOVERGEN
PERFORMANCEIMPROVEMENT
WORKLOAD SPECIALIZED FOR NFV, CLOUD AND IOT
Memorycapacity
GENOVERGEN
BREAKTHROUGH INNOVATION
Data-centric portfolio launch
accelerating workloads from the multi-cloud to the edge and back
Fastest Intel® Xeon® Adoption in History
2 5 12
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. Configurations and benchmark details can be found on slide/page 50 and 53. No product or
component can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured
using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully
evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
200+ PARTNERDESIGNS
EARLY ACCESSPROGRAM
© COPYRIGHT 2019. INTEL CORPORATION.
&
© COPYRIGHT 2019. INTEL CORPORATION.
INTEL® XEON® PLATINUM # PROCESSOR2 # # α α
PROCESSOR LEVEL9 PLATINUM8 PLATINUM6 GOLD5 GOLD4 SILVER3 BRONZE
PROCESSOR GENERATION2 SECOND GENERATION1 FIRST GENERATION
PROCESSOR SKU## E.G. 20, 34, … LARGE DDR MEMORY TIER SUPPORT (UP TO 4.5TB)
MEDIUM DDR MEMORY TIER SUPPORT (UP TO 2.0TB)NETWORKING/NETWORK FUNCTION VIRTUALIZATIONSEARCHTHERMALVM DENSITY VALUEINTEL® SPEED SELECT TECHNOLOGY
LMNSTVY
PROCESSOR OPTIONS
ALL INFORMATION PROVIDED IS SUBJECT TO CHANGE WITHOUT NOTICE.INTEL MAY MAKE CHANGES TO SPECIFICATIONS AND PRODUCT DESCRIPTIONS AT ANY TIME, WITHOUT NOTICE.
CONTACT YOUR INTEL REPRESENTATIVE TO OBTAIN THE LATEST INTEL PRODUCT SPECIFICATIONS.© COPYRIGHT 2019. INTEL CORPORATION.
© COPYRIGHT 2019. INTEL CORPORATION.
NOTICES & DISCLAIMERS
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.
No product or component can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks .
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks .
Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as property of others.
© 2019 Intel Corporation.
NOTICES & DISCLAIMERS
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
FTC Optimization NoticeIf you make any claim about the performance benefits that can be achieved by using Intel software developer tools (compilers or libraries), you must use the entire text on the same viewing plane (slide) as the claim.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.Notice Revision #20110804
© COPYRIGHT 2019. INTEL CORPORATION.
Footnotes and Configuration Details
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Software and workloads used in
performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit www.intel.com/benchmarks.
1 - 2x Average Performance Improvement compared with Intel® Xeon® Platinum 8180 processor. Geomean of est SPECrate2017_int_base, est SPECrate2017_fp_base, Stream Triad, Intel Distribution of Linpack, server side Java. Platinum 92xx vs Platinum
8180: 1-node, 2x Intel® Xeon® Platinum 9282 cpu on Walker Pass with 768 GB (24x 32GB 2933) total memory, ucode 0x400000A on RHEL7.6, 3.10.0-957.el7.x86_65, IC19u1, AVX512, HT on all (off Stream, Linpack), Turbo on all (off Stream, Linpack), result: est
int throughput=635, est fp throughput=526, Stream Triad=407, Linpack=6411, server side java=332913, test by Intel on 2/16/2019. vs. 1-node, 2x Intel® Xeon® Platinum 8180 cpu on Wolf Pass with 384 GB (12 X 32GB 2666) total memory, ucode 0x200004D on
RHEL7.6, 3.10.0-957.el7.x86_65, IC19u1, AVX512, HT on all (off Stream, Linpack), Turbo on all (off Stream, Linpack), result: est int throughput=307, est fp throughput=251, Stream Triad=204, Linpack=3238, server side java=165724, test by Intel on 1/29/2019.
2 - Up to 30X AI performance with Intel® DL Boost compared to Intel® Xeon® Platinum 8180 processor (July 2017). Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282(56 cores per socket), HT ON, turbo ON, Total
Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS:SE5C620.86B.0D.01.0241.112020180249, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187,
MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, No datalayer DummyData:3x224x224, 56
instance/2 socket, Datatype: INT8 vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM.
CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘,
OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command,
training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from
https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.
3 – Up to 5.8X better performance than AMD EPYC 7601 compared to Intel® Xeon® Platinum 9282 processor running LINPACK. AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS
ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 18.0.0.128, AMD BLIS
ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score =1095GFs, tested by Intel as of July 31, 2018. vs. 1-node, 2x Intel® Xeon® Platinum 9282 cpu on Walker Pass with 768 GB (24x 32GB 2933) total memory, ucode 0x400000A on RHEL7.6,
3.10.0-957.el7.x86_65, IC19u1, AVX512, HT off, Turbo on, score=6411, test by Intel on 2/16/2019. 1-node, 2x Intel® Xeon® Platinum 8280M cpu on Wolf Pass with 384 GB (12 X 32GB 2933) total memory, ucode 0x400000A on RHEL7.6, 3.10.0-957.el7.x86_65,
IC19u1, AVX512, HT off Linpack, Turbo on, score=3462, test by Intel on 1/30/2019.
4 – Up to 3.50X 5-Year Refresh Performance Improvement VM density compared to Intel® Xeon® E5-2600 v6 processor: 1-node, 2x E5-2697 v2 on Canon Pass with 256 GB (16 slots / 16GB / 1600) total memory, ucode 0x42c on RHEL7.6, 3.10.0-957.el7.x86_65, 1x Intel
400GB SSD OS Drive, 2x P4500 4TB PCIe, 2*82599 dual port Ethernet, Virtualization Benchmark, VM kernel 4.19, HT on, Turbo on, score: VM density=74, test by Intel on 1/15/2019. vs. 1-node, 2x 8280 on Wolf Pass with 768 GB (24 slots / 32GB / 2666) total memory, ucode 0x2000056 on
RHEL7.6, 3.10.0-957.el7.x86_65, 1x Intel 400GB SSD OS Drive, 2x P4500 4TB PCIe, 2*82599 dual port Ethernet, Virtualization Benchmark, VM kernel 4.19, HT on, Turbo on, score: VM density=21, test by Intel on 1/15/2019.
5 –1.33X Average Performance Improvement compared to Intel® Xeon® Gold 5100 Processor: Geomean of est SPECrate2017_int_base, est SPECrate2017_fp_base, Stream Triad, Intel Distribution of Linpack, server side Java. Gold 5218 vs Gold 5118: 1-node,
2x Intel® Xeon® Gold 5218 cpu on Wolf Pass with 384 GB (12 X 32GB 2933 (2666)) total memory, ucode 0x4000013 on RHEL7.6, 3.10.0-957.el7.x86_65, IC18u2, AVX2, HT on all (off Stream, Linpack), Turbo on, result: est int throughput=162, est fp
throughput=172, Stream Triad=185, Linpack=1088, server side java=98333, test by Intel on 12/7/2018. 1-node, 2x Intel® Xeon® Gold 5118 cpu on Wolf Pass with 384 GB (12 X 32GB 2666 (2400)) total memory, ucode 0x200004D on RHEL7.6, 3.10.0-
957.el7.x86_65, IC18u2, AVX2, HT on all (off Stream, Linpack), Turbo on, result: est int throughput=119, est fp throughput=134, Stream Triad=148.6, Linpack=822, server side java=67434, test by Intel on 11/12/2018
© COPYRIGHT 2019. INTEL CORPORATION.
Footnotes and Configuration Details
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Software and workloads used in
performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit www.intel.com/benchmarks.
6 – Up to 14X AI Performance Improvement with Intel® DL Boost compared to Intel® Xeon® Platinum 8180 Processor (July 2017). Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 384
GB (12 slots/ 32GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x200004d), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, nvme1n1 INTEL SSDPE2KX040T7 SSD 3.7TB, Deep
Learning Framework: Intel® Optimization for Caffe version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model:
https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, DummyData, 4 instance/2 socket, Datatype: INT8 vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28
cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in
SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe:
(http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other
topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries
version 2018.0.20170425. Caffe run with “numactl -l“.
7 – Up to 3.1X Better Performance than AMD EPYC 7601 compared to Intel® Xeon® Platinum 8282 Processor running LINPACK. AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON,
BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 18.0.0.128, AMD
BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score =1095GFs, tested by Intel as of July 31, 2018. vs. 1-node, 2x Intel® Xeon® Platinum 9282 cpu on Walker Pass with 768 GB (24x 32GB 2933) total memory, ucode 0x400000A on RHEL7.6,
3.10.0-957.el7.x86_65, IC19u1, AVX512, HT off, Turbo on, score=6411, test by Intel on 2/16/2019. 1-node, 2x Intel® Xeon® Platinum 8280M cpu on Wolf Pass with 384 GB (12 X 32GB 2933) total memory, ucode 0x400000A on RHEL7.6, 3.10.0-957.el7.x86_65,
IC19u1, AVX512, HT off Linpack, Turbo on, score=3462, test by Intel on 1/30/2019.
© COPYRIGHT 2019. INTEL CORPORATION.
Footnotes and Configuration Details
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Software and workloads used in
performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit www.intel.com/benchmarks.
8 - 1x inference throughput improvement in July 2017 (baseline): Tested by Intel as of July 11th 2017: Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate
driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables:
KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference
measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs
from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),and https://github.com/soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but
are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.
9 - 5.7x inference throughput improvement in December 2018 vs baseline: Tested by Intel as of November 11th 2018 :2 socket Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz / 28 cores HT ON , Turbo ON Total Memory 376.46GB (12slots / 32 GB / 2666
MHz). CentOS Linux-7.3.1611-Core, kernel: 3.10.0-862.3.3.el7.x86_64, SSD sda RS3WC080 HDD 744.1GB,sdb RS3WC080 HDD 1.5TB,sdc RS3WC080 HDD 5.5TB , Deep Learning Framework Intel® Optimization for caffe version:
551a53d63a6183c233abaa1a19458a25b672ad41 Topology::ResNet_50_v1 BIOS:SE5C620.86B.00.01.0014.070920180847 MKLDNN: 4e333787e0d66a1dca1218e99a891d493dbc8ef1 instances: 2 instances socket:2 (Results on Intel® Xeon® Scalable
Processor were measured running multiple instances of the framework. Methodology described here: https://software.intel.com/en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi) NoDataLayer. Datatype: INT8
Batchsize=64 vs Tested by Intel as of July 11th 2017:2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release
7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56,
CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe
time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models
(ResNet-50). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.
10 – 14x inference throughput improvement vs baseline: Tested by Intel as of 2/20/2019. 2 socket Intel® Xeon® Platinum 8280 Processor, 28 cores HT On Turbo ON Total Memory 384 GB (12 slots/ 32GB/ 2933 MHz), BIOS:
SE5C620.86B.0D.01.0271.120720180605 (ucode: 0x200004d), Ubuntu 18.04.1 LTS, kernel 4.15.0-45-generic, SSD 1x sda INTEL SSDSC2BA80 SSD 745.2GB, nvme1n1 INTEL SSDPE2KX040T7 SSD 3.7TB, Deep Learning Framework: Intel® Optimization for
Caffe version: 1.1.3 (commit hash: 7010334f159da247db3fe3a9d96a3116ca06b09a) , ICC version 18.0.1, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a, model:
https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, DummyData, 4 instance/2 socket, Datatype: INT8 vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28
cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in
SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe:
(http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other
topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries
version 2018.0.20170425. Caffe run with “numactl -l“.
11 - 30x inference throughput improvement with CascadeLake-AP vs baseline: Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282(56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/
2933 MHz), BIOS:SE5C620.86B.0D.01.0241.112020180249, Centos 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit
hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, No datalayer DummyData:3x224x224, 56 instance/2 socket, Datatype: INT8
vs Tested by Intel as of July 11th 2017: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core),
Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set
with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time”
command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-
50),. Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l“.
© COPYRIGHT 2019. INTEL CORPORATION.
Footnotes and Configuration Details
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Software and workloads used in
performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit www.intel.com/benchmarks.
12 – Up to 1.25 to 1.58X NVF Workload Performance Improvement comparing Intel® Xeon® Gold 6230N processor to Intel® Xeon® Gold 6130 processor.
VPP IP Security: Tested by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:
PLYDCRB1.86B.0155.R08.1806130538, ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: VPP IPSec w/AESNI (AES-GCM-128) (Max Gbits/s (1420B)), Workload version: VPP v17.10, Compiler: gcc7.3.0, Results: 179. Tested
by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019
(HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: VPP IPSec w/AESNI (AES-GCM-128) (Max Gbits/s (1420B)), Workload version: VPP v17.10, Compiler: gcc7.3.0, Results: 225
VPP FIB: Tested by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYDCRB1.86B.0155.R08.1806130538, ucode:
0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: VPP FIB (Max Mpackets/s (64B)), Workload version: VPP v17.10 in ipv4fib configuration, Compiler: gcc7.3.0, Results: 160. Tested by Intel on 1/17/2019 1-Node, 2x Intel® Xeon® Gold
6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel:
4.20.0-042000rc6-generic, Benchmark: VPP FIB (Max Mpackets/s (64B)), Workload version: VPP v17.10 in ipv4fib configuration, Compiler: gcc7.3.0, Results: 212.9
Virtual Firewall: Tested by Intel on 10/26/2018 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 4x Intel X710-DA4, Bios: PLYDCRB1.86B.0155.R08.1806130538,
ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: Virtual Firewall (64B Mpps), Workload version: opnfv 6.2.0, Compiler: gcc7.3.0, Results: 38.9. Tested by Intel on 2/04/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on
Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-
generic, Benchmark: Virtual Firewall (64B Mpps), Workload version: opnfv 6.2.0, Compiler: gcc7.3.0, Results: 52.3
Virtual Broadband Network Gateway: Tested by Intel on 11/06/2018 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:
PLYDCRB1.86B.0155.R08.1806130538, ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: Virtual Broadband Network Gateway (88B Mpps), Workload version: DPDK v18.08 ip_pipeline application, Compiler: gcc7.3.0, Results:
56.5. Tested by Intel on 1/2/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.PFT.0569.D08.1901141837, ucode:
0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Virtual Broadband Network Gateway (88B Mpps), Workload version:DPDK v18.08 ip_pipeline application, Compiler: gcc7.3.0, Results: 78.7
VCMTS: Tested by Intel on 1/22/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Supermicro*-X11DPH-Tq platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 4x Intel XXV710-DA2, Bios: American Megatrends Inc.*
version: '2.1', ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Virtual Converged Cable Access Platform (iMIX Gbps), Workload version: vcmts 18.10, Compiler: gcc7.3.0 , Other software: Kubernetes* 1.11, Docker*
18.06, DPDK 18.11, Results: 54.8. Tested by Intel on 1/22/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:
PLYXCRB1.PFT.0569.D08.1901141837, ucode: 0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Virtual Converged Cable Access Platform (iMIX Gbps), Workload version: vcmts 18.10 , Compiler: gcc7.3.0, Other software:
Kubernetes* 1.11, Docker* 18.06, DPDK 18.11, Results: 83.7
OVS DPDK: Tested by Intel on 1/21/2019 1-Node, 2x Intel® Xeon® Gold 6130 Processor on Neon City platform with 12x 16GB DDR4 2666MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 4x Intel XXV710-DA2, Bios: PLYXCRB1.86B.0568.D10.1901032132,
ucode: 0x200004d (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.15.0-42-generic, Benchmark: Open Virtual Switch (on 4C/4P/8T 64B Mpacket/s), Workload version: OVS 2.10.1, DPDK-17.11.4, Compiler: gcc7.3.0, Other software: QEMU-2.12.1, VPP v18.10, Results: 9.6.
Tested by Intel on 1/18/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios: PLYXCRB1.86B.0568.D10.1901032132, ucode:
0x4000019 (HT= ON, Turbo= OFF), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Open Virtual Switch (on 6P/6C/12T 64B Mpacket/s), Workload version: OVS 2.10.1, DPDK-17.11.4, Compiler: gcc7.3.0, Other software: QEMU-2.12.1, VPP v18.10, Results:
15.2. Tested by Intel on 1/18/2019 1-Node, 2x Intel® Xeon® Gold 6230N Processor with SST-BF enabled on Neon City platform with 12x 16GB DDR4 2999MHz (384GB total memory), Storage: 1x Intel® 240GB SSD, Network: 6x Intel XXV710-DA2, Bios:
PLYXCRB1.86B.0568.D10.1901032132, ucode: 0x4000019 (HT= ON, Turbo= ON (SST-BF)), OS: Ubuntu* 18.04 with kernel: 4.20.0-042000rc6-generic, Benchmark: Open Virtual Switch (on 6P/6C/12T 64B Mpacket/s), Workload version: OVS 2.10.1, DPDK-17.11.4, Compiler:
gcc7.3.0, Other software: QEMU-2.12.1, VPP v18.10, Results: 16.9
© COPYRIGHT 2019. INTEL CORPORATION.
Footnotes and Configuration Details
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)
Performance results are based on testing as of dates shown in configuration and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Software and workloads used in
performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to
any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more
complete information visit www.intel.com/benchmarks.
13
> 45% latency reduction with Open Source
Redis using 2nd gen Intel® Xeon®
Scalable Processors and Intel®
Ethernet 800 Series with ADQ.
Calculation: (new - old) / old x 100%
Rtt Average Latency across all run for baseline vs ADQ
(382-1249)/1249 * 100% = -69% Reduction
in Rtt Average Latency
14
> 30% throughput improvement with Open Source
Redis using 2nd gen Intel® Xeon®
Scalable Processors and Intel®
Ethernet 800 Series with ADQ.
Calculation: (new - old) / old x 100%
Transaction Request Rate across all run for baseline vs ADQ
(79601-44345)/44345 * 100% = 80% Throughput Improvement
Source: Intel internal testing as of February 2019
Configuration details provided in the following tables:
© COPYRIGHT 2019. INTEL CORPORATION.
SUT ClientTest by Intel IntelTest date 2/11/2019 2/11/2019
Platform Intel® Server Board S2600WFTF Dell* PowerEdge* R720
# Nodes 1 11# Sockets 2 2
CPU 2nd Generation Intel® Xeon® Scalable processor 8268 @ 2.8GHz Intel® Xeon® processor E5-2697 v2
Cores/socket, Threads/socket 24/48 12 / 24
ucode 0x3000009 0x428
HT On On
Turbo On On
BIOS version SE5C620.86B.01.00.0833.051120182255 2.5.4
System DDR Mem Config: slots / cap / run-speed 8 slots / 128GB / 2400MT/s 16 slots / 128GB / 1600MT/s
System DCPMM Config: slots / cap / run-speed 2 slots / 1024GB -
Total Memory/Node (DDR+DCPMM) 1024GB 128GB
Storage - boot 1x Intel SSD (OS Drive 64GB) 1x Dell (OS Drive 512GB)
Storage - application drives - -
NIC 1x Intel® Ethernet Network Adapter E810-CQDA2 1x Intel® Ethernet X520-DA2
PCH Intel® C620 Series Chipset Intel® C600 Series Chipset
Other HW (Accelerator)
OS CentOS 7.6 CentOS 7.4
Kernel 4.19.18 (Linux.org Stable) 3.10.0-693.21.1.el7
IBRS (0=disable, 1=enable) 1 0
eIBRS (0=disable, 1=enable) 0 0
Retpoline (0=disable, 1=enable) 1 0
IBPB (0=disable, 1=enable) 1 0
PTI (0=disable, 1=enable) 1 1
Mitigation variants (1,2,3,3a,4, L1TF) 1,2,3,L1TF 1,2,3,L1TF
Workload & version Redis 4.0.10 redis-benchmark 4.0.10
Compiler gcc (GCC) 4.8.5 20150623 -
NIC Driver ice 08.15 ixgbe 4.4.0-k-rh7.4
Footnotes and Configuration Details
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
Baseline Config(DRAM)
AD 2-2-2 Config
System Lightning Ridge (4S) Lightning Ridge (4S)
CPU Intel® Xeon® 8280M Intel® Xeon® 8280L
CPUs per node 4-socket @ 28 core / socket 4-socket @ 28 core / socket
Memory 6TB48x 128GB DDR4 @ 2666 MT/s
9TB24x 256GB Intel® Optane™ DC PMEM24x 128GB DDR4 @ 2666 MT/s
Network 10 GbE Intel X520 NIC 10 GbE Intel X520 NIC
Storage 60x Intel SSD DC S4600 SATA 480GB TB
90x Intel SSD DC S4600 SATA 480GB TB
BIOS WW48’18 WW48’18
OS or VM version
SUSE 15 SUSE 15
WL Version Intel IT workload Intel IT workload
SAP HANA* database size
3TB 6TB
Securitymitigations
Variants 1,2,3 enabled Variants 1,2,3 enabled
Date costsprojected
March 1, 2019 March 1, 2019
15 - 13x faster restart time and 39% less cost configs: 16 - 39% less cost pricing details:
Pricing Guidance as of March 1, 2019. Intel does not guarantee any costs or cost reduction. You should consult other information and performance tests to assist you in your purchase decision.
UNDER EMBARGO UNTIL 2 APRIL 2019 AT 10:00 AM (PACIFIC TIME)© COPYRIGHT 2019. INTEL CORPORATION.
Footnotes and Configuration Details17 & 18 - 36% more VMs per node & 30% lower estimated cost per VM configs:
Pricing Guidance as of March 1, 2019. Intel does not guarantee any costs or cost reduction. You should consult other information and performance tests to assist you in your purchase decision.
Config1-DDR4 (Similar Cost) Config2-Intel Optane DC persistent memory (Similar Cost)
Test by Intel Intel
Test date 01/31/2019 01/31/2019
Platform Confidential – Refer to M. Strassmaier if a need to know exists
Confidential – Refer to M. Strassmaier if a need to know exists
# Nodes 1 1
# Sockets 2 2
CPU Cascade Lake B0 8272L Cascade Lake B0 8272L
Cores/socket, Threads/socket 26/52 26/52
HT ON ON
Turbo ON ON
BKC version – E.g. ww47 WW42 WW42
Intel Optane DC persistent memory FW version
5253 5253
System DDR Mem Config: slots / cap / run-speed
24 slots / 32GB / 2666 12 slots / 16 GB / 2666
System DCPMM Config: slots / cap / run-speed
8 slots /128GB / 2666
Total Memory/Node (DDR, DCPMM) 768GB, 0 192GB, 1TB
Storage - boot 1x Samsung PM963 M.2 960GB 1x Samsung PM963 M.2 960GB
Storage - application drives 7 x Samsung PM963 M.2 960GB, 4x Intel SSDs S4600 (1.92TB)
7x Samsung PM963 M.2 960GB, 4x Intel SSDs S4600 (1.92TB)
NIC 1xIntel X520 SR2 (10Gb) 1x Intel X520 SR2 (10Gb)
PCH LBG QS/PRQ – T – B2 LBG QS/PRQ – T – B2
Other HW (Accelerator)
OS Windows Server 2019 RS5-17763 Windows Server 2019 RS5-17763
Kernel
Workload & version OLTP Cloud Benchmark OLTP Cloud Benchmark
Compiler
Libraries
Other SW (Frameworks, Topologies…)
© COPYRIGHT 2019. INTEL CORPORATION.