7a.3.09
8/7/2019 7a.3.09
Power-efficient scalable multi-core and high-speed IO clocking architecture
Nasser Kurd
Praveen Mosalikanti
Intel Corporation
Session 7A
CMOSET, Sep 25, 09
-
Introduction
Clocking has significant impact on power and performance
Large percentage of power
Deep state exit latencies & voltage/frequency transitions
Clock skew margins
IO timing: QPI, PCIe, DDR
Adaptive techniques to reduce power and improve margins
Talk covers: clock circuit innovations enabling the power-efficient, scalable and modular Intel Core i7 and i5 (Nehalem) family
-
Outline
High level Nehalem overview
Clock Generation
Clock Distribution
Adaptive Frequency System
Intel QuickPath Technology Clocking
Conclusion
-
The First Nehalem Processor
A Modular Design for Flexibility
[Die diagram: Misc IO (two blocks), QPI0, QPI1, Memory Controller, four Cores, Queue, Shared L3 Cache]
QPI: Intel QuickPath Interconnect, BW up to ~25.6GB/s
Memory BW up to 32GB/s
Nehalem: Next Generation Intel Microarchitecture
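The two headline bandwidth figures can be reproduced with simple arithmetic. The sketch below is a sanity check rather than slide content: it assumes a 2-byte data payload per transfer in each QPI direction at the 6.4GT/s rate listed later in the talk, and three 8-byte-wide DDR channels at 1333MT/s. The function names and default parameters are illustrative.

```python
def qpi_bandwidth_gb_per_s(gt_per_s, payload_bytes=2, directions=2):
    """Peak QPI bandwidth: transfers/s x payload bytes x link directions.

    payload_bytes=2 and directions=2 are assumptions, not slide content.
    """
    return gt_per_s * payload_bytes * directions


def ddr_bandwidth_gb_per_s(mt_per_s, channels=3, bytes_per_transfer=8):
    """Peak memory bandwidth: transfers/s x bus width x channel count.

    channels=3 and bytes_per_transfer=8 are assumptions, not slide content.
    """
    return mt_per_s * bytes_per_transfer * channels / 1000.0


qpi_peak = qpi_bandwidth_gb_per_s(6.4)      # compare with "BW up to ~25.6GB/s"
memory_peak = ddr_bandwidth_gb_per_s(1333)  # compare with "Memory BW up to 32GB/s"
print(f"QPI: {qpi_peak:.1f} GB/s, memory: {memory_peak:.1f} GB/s")
```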
-
Clock Generation Architecture
-
Clock Generation Design Goals
Modular & scalable
Decoupled frequency and voltages
Power efficient clocking architecture
[Die diagram: QPI0, QPI1, Memory Controller, four Cores, LLC]
-
PLL Architecture
Local PLL placement
On-die LVR per PLL
[PLL block diagram: BCLK 133MHz, FPLL, UPLL, four CPLLs, two QPLLs, DPLL, LPLL]
CPLL: Core PLL
QPLL: QPI PLL
FPLL: Filter PLL
DPLL: DDR PLL
UPLL: Un-core PLL
Frequency labels in diagram: 4.8, 5.9, 6.4GT/s; 800, 1066, 1333MT/s; 667MHz to multi-GHz; 266, 533MHz
-
PLL Loop
Filter PLL: higher sampling frequencies
Clock distribution in PLL loop
Adaptive duty cycle adjust loop
Adaptive clocking system
[Block diagram labels: central filter PLL, feedback divider, ref ck (1X, 2X, 4X), local adaptive PLL, core feedback divider, global clock dist., local clocking, fb ck, analog supply, digital supply, DCS, duty cycle adjust, duty cycle sentinel]
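One way to read the 1X/2X/4X reference options: for a given output frequency, a higher reference lets the phase detector sample more often and shrinks the feedback divide ratio. The sketch below works this out, assuming 1X/2X/4X denote multiples of the 133MHz BCLK. The target frequency and the function name are hypothetical, and the slides do not state the divide ratios actually used.

```python
from fractions import Fraction

BCLK_MHZ = Fraction(400, 3)  # 133.33 MHz base clock, kept exact as a fraction


def feedback_divide_ratio(target_mhz, ref_multiple):
    """Divide ratio N such that target = N x (ref_multiple x BCLK)."""
    return target_mhz / (ref_multiple * BCLK_MHZ)


TARGET_MHZ = 20 * BCLK_MHZ  # hypothetical core clock of about 2.67 GHz

ratios = {m: feedback_divide_ratio(TARGET_MHZ, m) for m in (1, 2, 4)}
for multiple, ratio in ratios.items():
    ref_mhz = float(multiple * BCLK_MHZ)
    print(f"{multiple}X ref = {ref_mhz:.1f} MHz -> divide by {ratio}")
```

A smaller divide ratio generally allows a wider loop bandwidth, which is consistent with the faster lock and lower jitter the talk reports for the higher reference multiples.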
-
Measured Lock Time And Jitter
30% jitter reduction
56% lock time reduction
[Bar chart: lock time and long term jitter for 1X, 2X, and 4X reference; data labels 1, 0.8, 0.75, 0.7, 0.44]
-
Why Adaptive clocking
Fixed Freq
Varying Core Digital Supply
Varying Freq
Varying Latency
Setup problem
[Diagram: PLL on the analog supply drives CLK through the clk distribution on the digital supply to two flops; the flop-to-flop data path is on the digital supply]
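The setup problem can be illustrated with a first-order timing model. This is a sketch under stated assumptions: path delay scales as 1/(V - Vt), the adaptive clock stretches by exactly the same factor as the data path, and every voltage, delay, and name below is a hypothetical value chosen for illustration.

```python
V_NOM = 1.1              # nominal digital supply in volts (hypothetical)
V_DROOP = 1.0            # supply during a droop in volts (hypothetical)
V_T = 0.3                # threshold voltage in volts (hypothetical)
NOMINAL_DELAY_PS = 300.0  # data path delay at V_NOM (hypothetical)
FIXED_PERIOD_PS = 320.0   # clock period with margin at V_NOM (hypothetical)


def path_delay_ps(v):
    """Assumed delay model: delay is proportional to 1 / (V - Vt)."""
    return NOMINAL_DELAY_PS * (V_NOM - V_T) / (v - V_T)


def setup_slack_ps(period_ps, v):
    """Positive slack means the data arrives before the capturing edge."""
    return period_ps - path_delay_ps(v)


# Fixed-frequency clock: the period stays put while the data path slows down.
fixed_slack = setup_slack_ps(FIXED_PERIOD_PS, V_DROOP)

# Adaptive clock: the period stretches by the same factor as the data path.
stretch = path_delay_ps(V_DROOP) / path_delay_ps(V_NOM)
adaptive_slack = setup_slack_ps(FIXED_PERIOD_PS * stretch, V_DROOP)

print(f"fixed: {fixed_slack:.1f} ps, adaptive: {adaptive_slack:.1f} ps")
```

In this toy model the fixed clock goes negative on slack during the droop while the adaptive clock keeps its margin; real tracking is imperfect, so the benefit would be smaller.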
-
Why 1st droop
-
Adaptive Frequency System (AFS)
Digital supply noise resistive coupling: 1st droop
Voltage Compare And Track (VCAT): DC tracking
[Block diagram labels: on-board VRM, on-die LVR, supply control, analog supply, digital supply, mixer (R1, R2), VCAT, adaptive PLL control, PFD, CP, VCO, clock frequency control, clock, core; inset plot of frequency and voltage versus time]
-
Adaptive Frequency Benefit
[Plot: core voltage versus core current, showing the DC load line, the 1st droop transient, and the AFS benefit]
-
Measured AFS Frequency Upside
Higher sensitivity increases benefit
Dependent on voltage, temp, & cores
[Plot: percentile axis (0.1%, 50%, 99.9%) versus frequency upside (0%, 2.5%, 5%), with curves for low and higher sensitivity]
-
Summary of Clock Generation
Scalable performance and power efficient architecture are enabled by
Filter PLL
Fast lock PLL
Local PLL with decoupled frequency and voltages
Adaptive duty cycle correction
Adaptive Frequency
Improves top bin yield
Up to 5% frequency improvement at same voltage
Lower power at same frequency
-
17
Core Clock Distribution Design
Metrics
Low power
High level of automation
Scalable to next process generation
Approach: pseudo-Grid topology
-
Core Clock Distribution
[Diagram: two PLLs, vertical spine, horizontal spine, M8 grid wire]
-
Un-Core Clock Distribution Issues
Reality of Un-Core
Long routes & large variation in clock density
Multiple clock and voltage domains
Difficult to fully automate
Un-Core Approach
Hybrid clocking
Custom solution per domain
Clock grid in highly loaded regions
Point to point clock distribution in lightly loaded regions
Adaptive clock compensation
-
Un-Core Distribution Architecture
[Floorplan diagram: UPLL, LLC spine, L. grid, R. grid, CORE-0, CORE-1, CORE-2, CORE-3]
-
Clock Distribution Summary
Extensive power/performance tradeoffs in all clocks
Un-core custom solutions save routing
Trade higher skew for lower power
High degree of automation in core
Quickly retune for changes
Generates all required schematics/layout
-
Configurable Intel QuickPath Technology Clocking Architecture
-
I/O Clock Design Goals
Enable very high bandwidth interfaces
Tight clock specs
Accumulated jitter
Jitter amplification
Duty cycle
Scalable clocking
Performance and power
-
Intel QuickPath Interconnect (Intel QPI) TX/RX Clock Architecture
TX: low jitter PLL, duty cycle correction, shallow dist.
RX: TA-DLL, low swing distribution
[Link diagram: TX side with TX PLL, DCC, CLK driver, TX CLK, and TX Data [20]; 20 data pairs and 1 clock pair between TX and RX; RX side with CLK amp/DCC, RX CLK, DLL, PI, and RX Data [20]; full-swing and low-swing clock segments, phase dist., bias]
-
Reduced I/O PLL Jitter
Lower VCO gain
Adjustable VCO range
Capacitive and load tuning
Decrease noise
Increase current
Improve PSRR
On-die VR & exploit higher voltages
-
Transmit Duty Cycle Correction
Analog DCC integrated into transmit PLL
[Block diagram: PLL loop with PFD, CP + LPF, VCO, and /N divider (refclk, fbclk); DCC loop with detector (err, errb) and corrector acting on ck/ckb]
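The detector/corrector arrangement behaves like a feedback loop that drives the duty-cycle error toward zero. The following is a discrete-time behavioral sketch only: the slide describes an analog DCC inside the transmit PLL, and the integrator model, loop gain, step count, and function name here are assumptions made for illustration.

```python
def run_dcc_loop(initial_duty, gain=0.25, steps=40):
    """Return the duty cycle seen at each step of a simple integrating loop."""
    correction = 0.0
    history = []
    for _ in range(steps):
        duty = initial_duty + correction  # corrector shifts the clock's duty cycle
        history.append(duty)
        error = 0.5 - duty                # detector: deviation from the 50% target
        correction += gain * error        # integrate the error (assumed loop filter)
    return history


trace = run_dcc_loop(initial_duty=0.60)
print(f"start: {trace[0]:.3f}, settled: {trace[-1]:.3f}")
```

With a gain between 0 and 1, the residual error in this model shrinks geometrically by a factor of (1 - gain) per step.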
-
IO DLL
Self-Biased DLL (SBDLL)
22.5 degree resolution
Frequency-based capacitive load tuning
improve performance
Time-Averaging
reduce jitter & restore duty cycle
Low swing distribution with PVT tracking
-
DLL Delay Element
Frequency-based capacitive load tuning (FCT)
Further extends delay range
[Schematic: differential delay element with pbias and nbias, inputs in/inb, outputs out/outb, and capacitive loads enabled by FCTenb[0] and FCTenb[1]]
-
DLL Time AVG Concept 1
Phase mix adjacent cycles
Average HF jitter
[Timing diagram: CLK cycle n and CLK cycle n-1 edges separated by t1; time-averaged output CLK TA at t1/2]
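To see why mixing adjacent cycles attenuates high-frequency jitter, consider edge-time errors that are uncorrelated from one cycle to the next. The sketch below is a behavioral model, not the circuit: it treats the mixer as an ideal average of the current edge error and the previous one, and the 5ps RMS Gaussian jitter is a made-up input.

```python
import random
import statistics


def time_average(edge_errors_ps):
    """Average each cycle's edge-time error with the previous cycle's error."""
    return [
        (edge_errors_ps[i] + edge_errors_ps[i - 1]) / 2
        for i in range(1, len(edge_errors_ps))
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
jitter_ps = [rng.gauss(0.0, 5.0) for _ in range(20000)]  # hypothetical HF jitter
averaged_ps = time_average(jitter_ps)

ratio = statistics.pstdev(averaged_ps) / statistics.pstdev(jitter_ps)
print(f"RMS jitter after averaging / before: {ratio:.2f}")
```

For uncorrelated errors the ratio comes out near 1/sqrt(2). Slowly varying jitter is nearly identical in both cycles, so this kind of averaging leaves it largely unattenuated.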
-
DLL Time AVG Concept 2
Phase mix adjacent clock phases
Uniform clock phases
[Timing diagram: Ph n-1 and Ph n mixed to produce Ph out]
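Mixing adjacent phases can also even out static spacing errors along the delay line. The sketch below assumes a nominal 22.5 degree step and a hypothetical alternating mismatch of plus or minus 2 degrees, with the mixer modeled as an ideal average; the names and numbers are illustrative and not taken from the slides.

```python
NOMINAL_STEP_DEG = 22.5
MISMATCH_DEG = 2.0  # hypothetical alternating phase error

# Raw delay-line phases: odd taps arrive late, even taps arrive early.
raw_phases = [
    k * NOMINAL_STEP_DEG + (MISMATCH_DEG if k % 2 else -MISMATCH_DEG)
    for k in range(17)
]


def mix_adjacent(phases):
    """Ideal phase mixer: each output is the mean of two neighboring phases."""
    return [(phases[k] + phases[k - 1]) / 2 for k in range(1, len(phases))]


def spacings(phases):
    """Phase step between consecutive outputs."""
    return [b - a for a, b in zip(phases, phases[1:])]


mixed_phases = mix_adjacent(raw_phases)
print("raw spacing range:", min(spacings(raw_phases)), max(spacings(raw_phases)))
print("mixed spacing range:", min(spacings(mixed_phases)), max(spacings(mixed_phases)))
```

An alternating error is the best case because neighboring errors cancel exactly; with random mismatch the averaging would reduce the spacing error rather than remove it.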
-
Time Average (Continued)
[Schematic: phase mixer with pbias and nbias, differential inputs in1/in1# and in2/in2#, differential output out/out#]
-
SBDLL + TA1
[Block diagram: delay-line phases Ph1/Ph1# through Ph8/Ph8# feed four TA1 blocks with outputs Ck0/Ck180, Ck45/Ck225, Ck90/Ck270, Ck135/Ck315]
-
SBDLL + TA1 + TA2
[Block diagram: four TA1 blocks followed by four TA2 blocks; clock phases Ck0/Ck180, Ck45/Ck225, Ck90/Ck270, Ck135/Ck315 labeled at both stages]
-
TA-DLL Jitter Attenuation Simulation
DLL jitter attenuation ~27%
Final attenuation at the receiver ~20%
[Plot: jitter (axis 0.2 to 1.4) at each stage: Delay Line, TA1, TA2, BFR, PI]
-
TA-DLL Duty Cycle Correction Simulation
DLL +/-15% duty cycle correction
[Plot: duty cycle (axis 30% to 75%) at each stage: Delay Line, TA1, TA2, BFR; curves for 65% input and 35% input, with TA 50% output]
-
Jitter Measurement: TA Disabled
PP jitter: 69.8ps
-
Jitter Measurement: TA Enabled
PP jitter reduction ~16%
-
I/O Summary
High speed requires optimum clocking
At transmit:
Shallow TX differential clock distribution
Optimally tuned transmit PLL
Transmit duty cycle correction
At receive: innovative receive DLL
27% jitter attenuation
+/-15% receive duty cycle correction
Low-swing clock distribution for better PSRR
Continuous PVT tracking
-
Conclusion
Clock innovations: key enabler for modular & scalable processors
Power efficiency
Chip frequency adapts to power supply voltage and droops
Fast power state transitions with faster PLL lock time
Duty cycle adapts to transistor variation and lifetime stress
Dynamic clock skew compensation
High speed IO: QPI/DDR/PCIe
Optimized power, PLL and clock delivery
Jitter attenuating techniques