The Future of SPARC64 - Fujitsu...
The Future of SPARC64™
Kevin Oltendorf, VP, R&D, Fujitsu Management Services of America
Raffle at the Conclusion
Copyright 2012 Fujitsu Management Services of America, Inc. All rights reserved.
Your benefits:
• Grandstand Suite Seats
• Free Beer, Wine and Fried snacks
• Early access with Fujitsu private shuttle
Today’s SPARC Enterprise Server Family
Server line-up fully adapted to changing business environments
Supported OS: Oracle Solaris 10 and Oracle Solaris 11
M-Series: Server Consolidation Platform for Broad Range of Applications
SPARC64 VII/VII+
• High performance
• High scalability
• High reliability
• Applicable to mission-critical servers
• M3000: 1 CPU (2/4 core), 2.86 GHz
• M4000: 4 CPU (16 core), 2.66 GHz
• M5000: 8 CPU (32 core), 2.66 GHz
• M8000: 16 CPU (64 core), 3.0 GHz
• M9000: 64 CPU (256 core), 3.0 GHz
T-Series: The Most Cost-Efficient Platform for Web and Application Servers
SPARC T4-1, T4-2, T4-4
• High throughput
• Energy and space saving
• High performance, up to 4 sockets/32 cores
• T4-1: 1 CPU (8 core), 2.85 GHz
• T4-2: 2 CPU (16 core), 2.85 GHz
• T4-4: 4 CPU (32 core), 3.0 GHz
Video: SPARC64™ X Introduction
Evolution from SPARC64 VIIIfx (K computer)
• 8 core/socket SPARC64 VIIIfx -> 16 core/socket SPARC64 X
• 1 thread/core -> 2 threads/core
• <2GHz clock speed -> 3GHz
• -> 24MB shared L2$
• -> ~3 Billion transistors
• -> 28nm CMOS technology
• -> 288 GIPS / 102 GB/s (peak) memory throughput
• -> SWoC (Software on Chip)
Also shown:
• Hybrid air/water cooling
• Scalable building block design
• High Speed Interconnect
Fujitsu Processor Development
[Figure: roadmap of Fujitsu processors across the periods ~1999, 2000~2003, 2004~2007, 2008~2011, and 2012~, with each technology generation labeled by transistor count (Tr), wiring, and process.]
SPARC64™ Processor: SPARC64, SPARC64 II, SPARC64 GP, SPARC64 V, SPARC64 V+, SPARC64 VI, SPARC64 VII, SPARC64 VIIIfx, SPARC64 IXfx, SPARC64 X
Mainframe: GS8600, GS8800, GS8800B, GS8900, GS21 600, GS21 900, GS21
Technology generations shown:
• Tr=10M, CMOS Al, 350nm
• Tr=30M, CMOS Al, 250nm / 220nm
• Tr=30M, CMOS Cu, 180nm / 150nm
• Tr=46M, CMOS Cu, 180nm
• Tr=190M, CMOS Cu, 130nm
• Tr=400M, CMOS Cu, 90nm
• Tr=500M, CMOS Cu, 90nm
• Tr=540M, CMOS Cu, 90nm
• Tr=600M, CMOS Cu, 65nm
• Tr=760M, CMOS Cu, 45nm
• Tr=1B, CMOS Cu, 40nm
• Tr=2.95B, CMOS Cu, 28nm
High Performance Technology: Store Ahead, Branch History, Prefetch, Single-chip CPU, Non-Blocking $, O-O-O Execution, Super-Scalar, L2$ on Die, Multi-core, Multi-thread, HPC-ACE, System On Chip, Hardware Barrier, Virtual Machine Architecture, Software On Chip
High Reliability Technology: $ ECC, Register/ALU Parity, Instruction Retry, $ Dynamic Degradation, RC/RT/History
SPARC64™ X Design Concept
Combine UNIX and HPC Fujitsu processor features to realize an extremely high throughput UNIX processor.
• SPARC64™ VII/VII+ (UNIX processor) features
  • High CPU frequency (up to 3GHz)
  • Multicore/Multithread
  • Scalability: up to 64 sockets
• SPARC64™ VIIIfx (HPC processor) features
  • HPC-ACE: Innovative ISA extensions to SPARC-V9
  • High Memory B/W: peak 64GB/s, Embedded Memory Controller
Added new features vital to current and future UNIX servers
• Virtual Machine Architecture
• Software On Chip
• Embedded IOC (PCI-GEN3 controller)
• Direct CPU-CPU interconnect
SPARC64™ X Chip Overview
Architecture Features
• 16 cores x 2 threads
• SWoC (Software on Chip)
• Shared 24 MB L2$
• Embedded Memory and IO Controller
28nm CMOS
• 23.5mm x 25.0mm
• 2,950M transistors
• 1,500 signal pins
• 3GHz
Performance (peak)
• 288GIPS/382GFlops
• 102GB/s memory throughput
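As a rough cross-check of the peak figures above, the sketch below divides them by the core count and clock to get implied per-core, per-cycle rates. It assumes every peak number was quoted at exactly 3GHz across all 16 cores, which the slide does not state; the constant and function names are illustrative only.

```python
# Figures copied from the slide. Assumption: the 3 GHz clock applies to all peak numbers.
CORES = 16
CLOCK_HZ = 3.0e9
PEAK_GIPS = 288.0
PEAK_GFLOPS = 382.0
PEAK_MEM_GBPS = 102.0


def per_core_per_cycle(peak_giga: float) -> float:
    """Convert a chip-wide peak rate (giga-units per second) to units per core per cycle."""
    return peak_giga * 1e9 / (CORES * CLOCK_HZ)


instr_per_cycle = per_core_per_cycle(PEAK_GIPS)
flops_per_cycle = per_core_per_cycle(PEAK_GFLOPS)
# Memory throughput is a chip-wide figure, so divide by the clock only.
mem_bytes_per_cycle = PEAK_MEM_GBPS * 1e9 / CLOCK_HZ

print(f"{instr_per_cycle:.2f} instructions per core per cycle")
print(f"{flops_per_cycle:.2f} floating-point operations per core per cycle")
print(f"{mem_bytes_per_cycle:.1f} bytes of memory traffic per chip cycle")
```

Under these assumptions the figures work out to 6 instructions and just under 8 floating-point operations per core per cycle, and 34 bytes of memory traffic per chip cycle.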
[Figure: chip floor plan with labeled blocks: 16 Cores, L2 Cache Data, L2 Cache Control, MAC, DDR3 Interface]
SPARC64™ X interconnects
SPARC64™ VII/VII+ interconnects
• 4 CPUs require 8 additional LSIs to be connected with DIMM
• DIMM i/f: 4.35GB/s (STREAM triad)
SPARC64™ X interconnects
• No additional LSIs to be connected with DIMM
• DIMM i/f: 65.6GB/s (STREAM triad)
• CPU i/f: 14.5GB/s x 5 ports (peak)
  • 3 ports: glueless 4-way CPU interconnect
  • 2 ports: > 4-way CPU
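The interconnect figures above can be related to each other with a few lines of arithmetic. This is a sketch that treats the two DIMM interface numbers as directly comparable, which the slide implies but does not spell out; the 102GB/s peak value is taken from the chip overview slide, and all names are illustrative.

```python
# Figures from the slides, in GB/s.
VII_DIMM_TRIAD = 4.35   # SPARC64 VII/VII+ DIMM interface, STREAM triad
X_DIMM_TRIAD = 65.6     # SPARC64 X DIMM interface, STREAM triad
X_DIMM_PEAK = 102.0     # SPARC64 X peak memory throughput (chip overview slide)
CPU_PORT_PEAK = 14.5    # per CPU-to-CPU port, peak
CPU_PORTS = 5

# How much more measured memory bandwidth the newer DIMM interface delivers.
triad_gain = X_DIMM_TRIAD / VII_DIMM_TRIAD

# Total CPU-to-CPU bandwidth if all five ports run at peak simultaneously.
aggregate_cpu_if = CPU_PORT_PEAK * CPU_PORTS

# Fraction of the quoted peak that the STREAM triad measurement reaches.
triad_efficiency = X_DIMM_TRIAD / X_DIMM_PEAK

print(f"DIMM i/f gain: x{triad_gain:.1f}")
print(f"Aggregate CPU i/f: {aggregate_cpu_if:.1f} GB/s")
print(f"Triad / peak: {triad_efficiency:.0%}")
```

Read this way, the DIMM interface gains roughly a factor of 15, the five CPU ports total 72.5GB/s at peak, and the triad measurement reaches about 64% of the quoted peak memory throughput.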
[Figure: two block diagrams. SPARC64™ VII/VII+ interconnects (SPARC Enterprise M8000): four CPUs connected through four SC and four MAC chips to DDR2 DIMMs. SPARC64™ X interconnects: four CPUs, each attached to its own DDR3 DIMMs (102GB/s) and linked CPU to CPU (14.5GB/s).]
Reliability, Availability, Serviceability
New RAS features from SPARC64™ VII/VII+
• Floating-point registers are ECC protected
• Number of checkers increased to ~53,000 to identify a failure point more precisely
→ Guarantees Data Integrity
[Figure: SPARC64™ X RAS diagram, with units marked as 1bit error Correctable, 1bit error Detectable, or 1bit error harmless]
Units                       Error detection and correction scheme
Cache (Tag)                 ECC / Duplicate & Parity
Cache (Data)                ECC / Parity
Register                    ECC (INT/FP) / Parity (Others)
ALU                         Parity / Residue
Cache dynamic degradation   Yes
HW Instruction Retry        Yes
History                     Yes
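The distinction the table draws between "detectable" (parity) and "correctable" (ECC) single-bit errors can be illustrated with a textbook code. The sketch below uses even parity and a Hamming(7,4) code purely as an illustration; it is not the scheme Fujitsu uses in SPARC64 X, whose actual code widths are not given in the slides.

```python
def parity_bit(bits):
    """Even parity: the extra bit that makes the total number of 1s even."""
    return sum(bits) % 2


def parity_ok(stored):
    """True if a word stored with its parity bit still has an even number of 1s."""
    return sum(stored) % 2 == 0


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code):
    """Return (corrected codeword, 1-based error position); position 0 means no error."""
    s1 = code[0] ^ code[2] ^ code[4] ^ code[6]
    s2 = code[1] ^ code[2] ^ code[5] ^ code[6]
    s3 = code[3] ^ code[4] ^ code[5] ^ code[6]
    position = s1 + 2 * s2 + 4 * s3
    fixed = list(code)
    if position:
        fixed[position - 1] ^= 1   # flip the bit the syndrome points at
    return fixed, position


word = [1, 0, 1, 1]

# Parity: a single flipped bit is detected, but its location is unknown.
stored = word + [parity_bit(word)]
stored[2] ^= 1
print("parity check passes:", parity_ok(stored))

# ECC: the syndrome locates the flipped bit, so it can be repaired.
codeword = hamming74_encode(word)
damaged = list(codeword)
damaged[4] ^= 1
repaired, position = hamming74_correct(damaged)
print("error at position", position, "repaired:", repaired == codeword)
```

Parity costs one bit per word and can only report that something went wrong, which is why it pairs naturally with a recovery mechanism such as instruction retry; ECC spends more check bits to repair the data in place.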
SPECint®_rate2006
[Bar chart, axis 0 to 25,000: IBM POWER 7: 11,300. SPARC64 X (64 sockets): 22,445 (estimated). Annotation: x2]
SPARC64 X Prototype provided twice the performance of IBM Power 7 server, based on measured estimates.
The IBM POWER7 result reflects a result published as of November, 2010. The SPARC64 X prototype result is a measured estimate run on pre-production hardware. SPEC and SPECint are registered trademarks of the Standard Performance Evaluation Corporation. For the latest SPEC benchmark results, visit http://www.spec.org
Software on Chip performance - Highlight
• NUMBER: x430
• Crypto: x163
• Compare: x15
• Copy: x12
• HASH (*): x7
(*): HASH is one of the functions being used frequently in DBMS transactions.
NUMBER Performance
Format conversion required between decimal on Database and binary on Processors
Exclusive arithmetic unit inside of SPARC64 X offloads the conversion to hardware.
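To make the format conversion described above concrete, here is a minimal sketch of what software has to do when decimal data must pass through binary arithmetic. It uses packed BCD (two decimal digits per byte) as a stand-in decimal format; this is an assumption for illustration, not the database's actual NUMBER encoding and not the SPARC64 X instruction set, and all function names are hypothetical.

```python
def bcd_to_int(packed: bytes) -> int:
    """Decode packed BCD (two digits per byte, most significant first) to a binary integer."""
    value = 0
    for byte in packed:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("invalid BCD digit")
        value = value * 100 + high * 10 + low
    return value


def int_to_bcd(value: int, length: int) -> bytes:
    """Encode a non-negative integer as packed BCD in exactly `length` bytes."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray(length)
    for i in range(length - 1, -1, -1):
        value, two_digits = divmod(value, 100)
        out[i] = ((two_digits // 10) << 4) | (two_digits % 10)
    if value:
        raise OverflowError("value does not fit in the requested length")
    return bytes(out)


def add_decimal_in_software(a: bytes, b: bytes) -> bytes:
    """Add two packed-BCD operands via a decimal -> binary -> decimal round trip."""
    # One extra byte leaves room for a carry out of the top digit.
    return int_to_bcd(bcd_to_int(a) + bcd_to_int(b), len(a) + 1)


a = int_to_bcd(1234, 8)   # 8-byte operands, loosely mirroring the slide's SUM(8B+8B)
b = int_to_bcd(8766, 8)
total = add_decimal_in_software(a, b)
print(total.hex(), "=", bcd_to_int(total))
```

Every addition here pays for two decodes and one encode, each a loop over digits. A dedicated decimal arithmetic unit removes that round trip, which is the offload the slide refers to.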
[Bar chart: SUM(8B+8B) chip throughput in Mop/s, axis 0 to 1000. SPARC64 VII+ (3GHz): 103. SPARC64 X (3GHz) w/o Vector: 793 (x7.7). SPARC64 X (3GHz) w/ Vector: 45,044 (x430).]
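The speedup labels on this chart can be recomputed from the throughput values. The pairing of values to bars below is inferred from the printed ratios rather than stated on the slide, so treat it as an assumption; the variable names are illustrative.

```python
# Chip throughput in Mop/s, as read from the chart.
VII_PLUS = 103       # SPARC64 VII+ (3GHz)
X_NO_VECTOR = 793    # SPARC64 X (3GHz), w/o Vector (assumed pairing)
X_VECTOR = 45_044    # SPARC64 X (3GHz), w/ Vector (assumed pairing)

speedup_no_vector = X_NO_VECTOR / VII_PLUS
speedup_vector = X_VECTOR / VII_PLUS

print(f"w/o Vector: x{speedup_no_vector:.1f}")
print(f"w/ Vector:  x{speedup_vector:.0f}")
```

The first ratio reproduces the x7.7 label. The second comes out near 437, slightly above the x430 printed on the slide, which suggests the slide rounds down.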
Crypto Performance
• Crypto-compliance mandatory for current & future ICT transactions
• SPARC64 X’s micro-architecture boosts encryption & decryption processes
[Bar charts: chip throughput in MB/s, axis 0 to 50,000.
Encrypt: SPARC64 VII+ (3GHz): 295. SPARC64 X (3GHz): 46,880 (x158.9).
Decrypt: SPARC64 VII+ (3GHz): 286. SPARC64 X (3GHz): 46,495 (x162.6).]
STREAM Benchmark Triad
[Bar chart, axis 0 to 4,000: IBM POWER 595: 806. SPARC64 X (64 sockets): 3527 (x4.9).]
SPARC64 X prototype provides x5 performance of IBM Power 595, based on measured estimates.
SPARC64 X result is measured by pre-production system. Source of IBM POWER 595 data is in http://www.streambench.org/
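For readers unfamiliar with the benchmark, the triad kernel computes a[i] = b[i] + q * c[i] over large arrays and reports the memory bandwidth sustained. The sketch below shows the kernel and the byte accounting in pure Python. Interpreter overhead dominates here, so its output is in no way comparable to the slide's figures; the real benchmark is written in C or Fortran, and the function names and array sizes are illustrative.

```python
import time


def triad(a, b, c, q):
    """STREAM triad kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_bandwidth_gbps(n=100_000, q=3.0, repeats=3):
    """Best-of-`repeats` bandwidth in GB/s, counting three arrays of 8-byte values."""
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, q)
        best = min(best, time.perf_counter() - start)
    bytes_moved = 3 * 8 * n   # read b, read c, write a
    return bytes_moved / best / 1e9


print(f"{triad_bandwidth_gbps():.3f} GB/s (pure Python, illustrative only)")
```

Taking the best of several runs follows the usual practice of discarding warm-up effects; the array size would need to be far larger than the caches for a meaningful memory measurement.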
SPARC64™ X CPU Summary
• SPARC64™ X is Fujitsu’s 10th SPARC processor, designed for Fujitsu’s next-generation UNIX server
• SPARC64™ X integrates 16 cores + 24MB L2 cache with over 100GB/s (peak) memory bandwidth
• SPARC64™ X strengthens RAS features
• SPARC64™ X chip is up and running in the lab
• SPARC64™ X has shown 7 times the throughput of SPARC64™ VII+ w/o compiler tuning
• SWoC is effective at accelerating specific software functions
• Fujitsu will continue to develop the SPARC64™ series
Cooling
Smart Cooling Chassis
New Technologies
• Straight Cooling Frame (backplane-less)
• LLC (Liquid Loop Cooling) system
• High-Efficiency/Density PSU
• Reduced size (no metal heat sinks)
• Space savings & performance: CPUs & DIMMs much closer, without engineering compromises
• Reduced size (50% smaller than prior PSUs with same rated amperage)
• Less ventilation space & lower fan power
Hybrid Cooling with LLC (Liquid Loop Cooling)
• Air Cooling + Liquid Cooling: coolant circulates heat from CPUs to a radiator to be air-cooled.
• Self-contained liquid cooling for System-on-Chip CPU
• Anti-freeze-like liquid flows through plates, tubes, tanks & radiator.
• Employs new gasket technology
• 2 LLCs per Building Block, each cooling 2 CPUs
• 12x2 (redundant) inline micro-pumps per Building Block
• Pump rotation and liquid temperature constantly monitored
• Heatsink-less design allows components to be densely placed
  • Lower latency
  • Saves 3U of space in each 4-socket Building Block
  • Weight savings of 55kg per Building Block
• Self-contained cooling for the life of the system
  • No maintenance required
  • No external pipes/hoses
[Diagram labels: Air Cooling, Radiator, Pump, Liquid Cooling]
SPARC64™ X Prototype Systems
[Photos: Prototype; 4U Prototype; 1U Prototype]
SPARC64™ X Prototype Power On (Years in development/thousands of staff overcoming challenges)
Scalable Building Blocks
• Linear Hardware Scalability
• Investment Protection: start small, grow very big
• Maximum 16 Building Blocks tested
• Linear Performance/Scalability with unique Fujitsu 14.5GB/s Optical Interconnects (no software stack/no device drivers to interrupt OS)
• Embedded System on Chip IOCs: connect up to 4BBs. For >4BBs, IOBox is used.
Solaris Operating Environment
SPARC64™ X Prototypes in testing w/Solaris 10/11 internal builds
• sun4v architecture with Oracle VM support (same as T-series)
• Solaris 10 & 11 Control and Guest Domains
High performance and manageability
• Uniform server management and same OS across all platforms
• Scalable up to max. 1024 cores / 2048 threads
Investment protection through binary compatibility
• Long history of Solaris binary compatibility over ten years assures customer investment protection
Variety and choice in ISV solutions
• Many more ISV products available on Solaris
TCO reduction through virtualization at no extra cost
• Virtualization functions such as Oracle VM and software partitioning (Solaris Containers) can be used at no extra cost
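The 1024 cores / 2048 threads maximum quoted above is consistent with figures given on earlier slides. The sketch below multiplies them out, assuming the largest configuration is the 16 tested Building Blocks with 4 sockets each; the slides do not state this derivation explicitly.

```python
# Per-slide figures; the combination into one maximum configuration is an assumption.
MAX_BUILDING_BLOCKS = 16   # "Maximum 16 Building Blocks tested"
SOCKETS_PER_BB = 4         # "4-socket Building Block"
CORES_PER_SOCKET = 16      # "16 cores x 2 threads"
THREADS_PER_CORE = 2

sockets = MAX_BUILDING_BLOCKS * SOCKETS_PER_BB
cores = sockets * CORES_PER_SOCKET
threads = cores * THREADS_PER_CORE

print(f"{sockets} sockets, {cores} cores, {threads} threads")
```

The 64-socket result also matches the "SPARC64 X (64 sockets)" label on the benchmark charts.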
Oracle VM for SPARC
[Diagram: Hypervisor (firmware) hosting an S11 Control Domain, an Oracle Solaris 10 Guest Domain, and an Oracle Solaris 11 Guest Domain. Inside the domains, a Solaris 11 Container, two Solaris 10 Containers, and a Solaris 8/9 Container run Applications A, B, C, and D.]
• Flexible virtualization at the firmware layer
• Virtual hardware domain has CPU thread-level granularity
• Multiple OS versions co-exist
• Redundant Virtual I/O Service
• Common management interface across SPARC (same OVM as for T-series)
• Finer granularity ensures the most efficient resource utilization
• No CPU or Memory virtualization overhead
• Choice: Hardware or Virtual partitioning + Solaris Containers!
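Thread-level granularity means a domain's CPU allocation is counted in individual hardware threads rather than whole cores or sockets. The toy model below illustrates that idea only. The `Hypervisor` class and its methods are hypothetical and are not the Oracle VM interface, and the domain names and sizes are made up.

```python
class Hypervisor:
    """Toy model that hands out hardware threads to named domains one thread at a time."""

    def __init__(self, total_threads: int):
        self.free = list(range(total_threads))   # ids of unassigned hardware threads
        self.domains = {}                        # domain name -> list of thread ids

    def add_domain(self, name: str, threads: int) -> list:
        """Assign `threads` free hardware threads to a new domain."""
        if name in self.domains:
            raise ValueError(f"domain {name!r} already exists")
        if threads < 1 or threads > len(self.free):
            raise ValueError("not enough free hardware threads")
        self.domains[name] = self.free[:threads]
        self.free = self.free[threads:]
        return self.domains[name]

    def remove_domain(self, name: str) -> None:
        """Return a domain's hardware threads to the free pool."""
        self.free = sorted(self.free + self.domains.pop(name))


# One 16-core, 2-thread socket offers 32 hardware threads.
hv = Hypervisor(32)
hv.add_domain("control", 4)
hv.add_domain("guest_s10", 11)   # an odd count: not a whole number of cores
hv.add_domain("guest_s11", 17)
print({name: len(ids) for name, ids in hv.domains.items()}, "free:", len(hv.free))
```

Because allocations are not rounded up to core boundaries, the three domains consume all 32 threads with nothing stranded, which is the utilization benefit the slide attributes to finer granularity.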
Raffle
Your benefits:
• Grandstand Suite Seats
• Free Beer, Wine and Fried snacks
• Early access with Fujitsu private shuttle