®
Clock Generation and Distribution for High-Performance Processors
Stefan RusuSenior Principal Engineer
Enterprise Microprocessor DivisionIntel Corporation
SoC 2004 – Stefan Rusu 2®
Outline
• Clock Distribution Trends• Distribution Networks• De-skew Circuits• Jitter Reduction Techniques• Clock Power Dissipation• Future Directions• Summary
SoC 2004 – Stefan Rusu 3®
Clock Definition and Parameters• The clock is a periodic synchronization signal used as a time
reference for data transfers in synchronous digital systems
• Skew– Spatial variation of the clock signal
as distributed through the chip– Global vs. local skew
• Clock jitter– Temporal variation of the clock
with respect to a reference edge– Long-term vs. cycle-to-cycle jitter
• Duty cycle variation– 50/50 design target
RefClk
EndClk
tskew
tjitter
thigh tlow
SoC 2004 – Stefan Rusu 4®
Processor Frequency Trend
10
100
1000
10000
1987 1989 1991 1993 1995 1997 1999 2001 2003 2005
Freq
uenc
y [M
Hz]
386486
Pentium®
Pentium Pro®
Pentium® II
Pentium® III
Pentium® 4
SoC 2004 – Stefan Rusu 5®
Clock Skew Trend
0
100
200
300
400
500
600
100 1000 10000Processor Frequency [MHz]
Clo
ck S
kew
[ps]
Source: ISSCC and JSSC papers
SoC 2004 – Stefan Rusu 6®
Relative Clock Skew
0
2.5
5
7.5
10
100 1000 10000Processor Frequency [MHz]
Clo
ck S
kew
as
Perc
enta
geof
Cyc
le T
ime
[%]
• Clock skew accounts in average for ~5% of the cycle time
Source: ISSCC and JSSC papers
SoC 2004 – Stefan Rusu 7®
Sources of Clock Skew• With a perfectly balanced distribution, device mismatch
is the largest contributor to the clock skew
0 20 40 60
Device Mismatch (Le)
Supply Mismatch
Load Mismatch
Temperature Mismatch
Percent
Geannopoulos, ISSCC-1998
SoC 2004 – Stefan Rusu 8®
Clock Jitter Trend
0
100
200
300
400
500
100 1000 10000Processor Frequency [MHz]
Clo
ck J
itter
[ps]
Source: ISSCC and JSSC papers
SoC 2004 – Stefan Rusu 9®
Outline
• Clock Distribution Trends• Distribution Networks• De-skew Circuits• Jitter Reduction Techniques• Clock Power Dissipation• Future Directions• Summary
SoC 2004 – Stefan Rusu 10®
Clock Distribution Networks
Mesh GridTree
Tapered H-TreeH-Tree X-Tree
SoC 2004 – Stefan Rusu 11®
Inductance Effect
Xanthopoulos, ISSCC-2001
SoC 2004 – Stefan Rusu 12®
Itanium® Processor Clock Hierarchy
GlobalDistribution
VCC/2
DSK
DSK
DSKPLL
CLKPCLKN
RCD
RCDDLCLK
OTB
RegionalDistribution
LocalDistribution
ReferenceClock
MainClock
Rusu, ISSCC-2000
SoC 2004 – Stefan Rusu 13®
Local Clock Distribution• Local clock distribution enables flexible skew
management to support:– Intentional clock skew insertion for timing optimization– Clock gating for power reduction
Normal Local Clock Buffers Intentional
Skew Buffer
CombinatorialBlock
Regional Clock Grid
Rusu, ISSCC-2000
SoC 2004 – Stefan Rusu 14®
Itanium® 2 Processor Clock Distribution• First level: Pseudo-differential, impedance matched branching,
balanced h-tree • Second level: balanced, width and length tuned binary h-tree• Second Level Clock Buffers: adjustable delay buffer• Gaters: all constant input loading with load-tuned drive strength
gaters
SLCBs(33)
primarydriver
Repeaters(5) Each SLCB ~70 tap points of ~8 gaters each
Anderson, ISSCC-2002
SoC 2004 – Stefan Rusu 15®
Optical Skew Probing
Vout
Idn
Photon
Vin
s
• Clock edge generates infrared photon emission• Emission peak indicates clock transition edge
Tam, VLSI Symposium, 2003
SoC 2004 – Stefan Rusu 16®
Optical Probing Results
Tam, VLSI Symposium, 2003
SoC 2004 – Stefan Rusu 17®
130nm Itanium® 2 Skew Profile
-100
10203040506070
1 6 11 16 21Clock Zone
Rel
ativ
e D
elay
(ps)
DefaultFuse AdjustedSCAN Adjusted
Tam, VLSI Symposium, 2003
SoC 2004 – Stefan Rusu 18®
Pentium® 4 Processor Clock Network
PLL
• 2GHz triple-spine clock distribution (180nm)
Kurd, JSSC-2001
SoC 2004 – Stefan Rusu 19®
90nm Clock Distribution
skew in ps
• Sub-10ps clock skew demonstrated in a 90nm processor using clock tree averaging
Bindal, ISSCC 2003
SoC 2004 – Stefan Rusu 20®
Pentium® 4 Processor Clock Skew
22ps
7ps
130nm Pentium® 4 Processor 90nm Pentium® 4 Processor
• 90nm design has 3x lower clock skew than the 130nm
Schutz, ISSCC 2004
SoC 2004 – Stefan Rusu 21®
Alpha* Processors ClockingProduct 21064 21164 21264
300MHz 600MHz
Transistors 1.7M 9.3M 9.3M
Process 0.75um 4ML 0.5um, 4ML 0.35um, 6ML
Power 25W 50W 72W
Clock load 2.75nF 3.75nF 2.8nF
Frequency 166MHz
* Other names and brands may be claimed as the property of others
pre-driver
final drivers
PLL
ClockFloorplan
Clockskewplot
Skew
(ps)
Chip Horizontal AxisChip Vertical Axis
15
30
45
60
75
0
Gronowski, JSSC 1998
SoC 2004 – Stefan Rusu 22®
1.2GHz Alpha* Processor Clock
GCLK
NCLK
L2LCLK L2RCLK
DLL DLL
DLL
PLL
Xanthopoulos, ISSCC-2001* Other names and brands may be claimed as the property of others
SoC 2004 – Stefan Rusu 23®
Power4* Clock Distribution
PLLBypassRef clk in
Clock Distribution
Ref clk out
Feedback
Global ClockGrid
PLL out 123
4
• Dual core, SOI process, 174M transistors• Measured clock skew below 25ps
Restle, ISSCC-2002* Other names and brands may be claimed as the property of others
SoC 2004 – Stefan Rusu 24®
YY
XX
Delay (ps) buffer
level 3
buffer level 2
grid
Tuned sector trees
Sector bufferslevel 4
buffer level 1
Power4* - 3D Skew Visualization
Restle,ISSCC-2002
800
700
600
500
400
300
200
100* Other names and brands may be claimed as the property of others
SoC 2004 – Stefan Rusu 25®
Outline
• Clock Distribution Trends• Distribution Networks• De-skew Circuits• Jitter Reduction Techniques• Clock Power Dissipation• Future Directions• Summary
SoC 2004 – Stefan Rusu 26®
Dual-Zone Clock Deskew
Delay Line
Delay SR
Right Spine
Left Spine CL
Core
PD
FB ClkX Clk
Clk_Gen
Deskew Ctl
Delay Line
Delay SR
Geannopoulos, ISSCC-1998
SoC 2004 – Stefan Rusu 27®
Itanium® Processor Clock Deskew
• Distributed array of deskew buffers to reduce process related skew
• 8 deskew clusters each holding up to 4 buffers
• 30 deskew zones
DSK
DSK
DSK
DSKDSK
DSKDSK
DSK
CDC
DSK = Cluster of 4 deskew buffers
CDC = Central Deskew ControllerRusu, ISSCC-2000
SoC 2004 – Stefan Rusu 28®
Itanium® Processor Deskew Buffer
Enable#
20-bit Delay Control Register
Output
Input
TAP I/F Step size = 8.5psDeskew range = 170ps
• Small step size enables fine skew control over a wide range
• TAP read / write access to Control Register enables faster timing debug and performance tuning
Rusu, ISSCC-2000
SoC 2004 – Stefan Rusu 29®
Pentium® 4 Processor Deskew
Logical diagram of the skew optimization circuit
Phase detector network
Kurd, JSSC-2001
SoC 2004 – Stefan Rusu 30®
Deskew Techniques Summary
Author Source ClockZones
SkewBefore
StepSize
60ps 12ps
8ps
Kurd ISSCC-01 47 64ps 16ps 8ps
Stinson ISSCC-03 23 60ps 7ps 7ps
110ps
ISSCC-98
SkewAfter
Geannopoulos
ISSCC-00
2
30
15ps
Rusu 28ps
• Clock deskew techniques compensate for device and interconnect within-die variations
• Deskew circuits cut clock skew to less than a quarter of the original value
SoC 2004 – Stefan Rusu 31®
Useful Clock Skew
050
100150200250300
Freq
uenc
y Im
prov
emen
t (M
Hz)
1 2 3 4
Samples
0
10
20
30
40
Freq
uenc
y Im
prov
emen
t (M
Hz)
1 2 3
Samples
Initial Stepping Subsequent Stepping
• Use de-skew buffers to insert intentional skew to maximize the processor operating frequency
• Larger benefit achieved in early steppings
Tam, VLSI Symposium, 2003
SoC 2004 – Stefan Rusu 32®
Outline
• Clock Distribution Trends• Distribution Networks• De-skew Circuits• Jitter Reduction Techniques• Clock Power Dissipation• Future Directions• Summary
SoC 2004 – Stefan Rusu 33®
Pentium® 4 Processor Jitter Reduction
Vcc
R
C
IVcc - IR
10% dip inCore Supply
2% dip inFilteredSupply
-60
-40
-20
0
20
40
60
0 10 20 30 40 50Cycle #
Jitte
r (ps
)
With FilterNo Filter
• RC-filtered power supply for clock drivers reduces clock distribution jitter
Kurd, JSSC-2001
SoC 2004 – Stefan Rusu 34®
Alpha* Processor Voltage Regulator
DLL
LPF -+
2.5V1.5V
-60
-50
-40
-30
-20
-10
0
1.0E+02 1.0E+04 1.0E+06 1.0E+08 1.0E+10Frequency [Hz]
PSR
R [d
B]
• Voltage regulator ensures optimum DLL tracking• Supply noise frequencies over 1MHz are attenuated
by more than 15dBXanthopoulos, ISSCC-2001
* Other names and brands may be claimed as the property of others
SoC 2004 – Stefan Rusu 35®
On-Die Clock Jitter Detector
ArrayPhase
Detector
Counter
Internal Clock
PostProcessCircuitry
+Registers
Phase bins
2
0.5 * DL
0.5 * DL
Digital LPF
n
inc/dec
ref
clk
up/dn#
Kuppuswamy, VLSI Symposium 2001
SoC 2004 – Stefan Rusu 36®
Array Phase Detectorclk ref
FF FF FF FF FF FF FFFFFF . . .. . .
• 7 elements above and below center, with increasing positive and negative built-in offset away from center
• Phase offset created by progressively delaying data wrt clock
SoC 2004 – Stefan Rusu 37®
Histogram Mode Operation
Array Phase Detector
XOR Logic
Error Detection Logic jitter error
coun
t
bins
SoC 2004 – Stefan Rusu 38®
Graph Mode OperationA
rray
Pha
se D
etec
tor
XOR
Log
icEr
ror D
etec
tion
Logi
c
jitter error encoded
bins
time
SoC 2004 – Stefan Rusu 39®
Outline
• Clock Distribution Trends• Distribution Networks• De-skew Circuits• Jitter Reduction Techniques• Clock Power Dissipation• Future Directions• Summary
SoC 2004 – Stefan Rusu 40®
Clock Power Breakdown Example• 30% of the total power is attributed to clock• Most of the clock power is used in the final
clock buffers and flip-flops
70.2%
26.2%
1.5%
2.1%
1st Level2nd Level3rd LevelRest of chip
Anderson, ISSCC-2002
SoC 2004 – Stefan Rusu 41®
Clock Power Reduction
• Reduce clock frequency– Multiple frequency domains– Dual edge triggered flip-flops
• Reduce voltage swing– Low swing clocks
Clock Power = f * C * V2
• Reduce clock loading– Clock gating– Clock-on-demand flip-flop– Optimized routing
SoC 2004 – Stefan Rusu 42®
Half Swing Clocking
• Requires four clock signals– Two clock phases with a swing between
Vdd and Vdd/2 drive the PMOS devices
– The other two phases with a swing between Gnd and Vdd/2 drive the NMOS transistors
• Experimental savings of 67% were demonstrated on a 0.5µm CMOS test chip with only 0.5ns speed degradation
• Requires additional area for the special clock drivers and suffers from skew problems between the four phases
Kojima, JSSC 1995
SoC 2004 – Stefan Rusu 43®
Clock-on-demand Flip-Flop
• Activates internal clock only when the input data will change the output - equivalent to single bit clock gating
• Longer setup time and sensitive to hold time violations Hamada, ISSCC 1999
SoC 2004 – Stefan Rusu 44®
XScale Processor Clock Gating• Three hierarchical clock
gating levels– Top level – stop clock
– Unit level – 83 enables
– Local clock buffers –400 unique enables C
LK S
PIN
E
GC
LK_IC1
GC
LK_RF1G
CLK_D
C1
GC
LK_MA2
GLB_M
A_EN
GCLK_DA1
GCLK_DA2
GCLK_IA1
GCLK_IA2DA_BNK1_EN#
DA_BNK1_EN
IA_BNK1_EN#
IA_BNK1_EN
GC
LK_D
A9
GC
LK_D
A10
GC
LK_I
A9
GC
LK_I
A10
EGCLK (M6)
(M5)
Clark, JSSC 11/2001
SoC 2004 – Stefan Rusu 45®
Dual Edge Triggered Flip-Flop
Mn3
Mn4
CLK
D
CLK3
CLK
CLK1
D
CLK4
CLK1
X Y
Inv1 Inv2 Inv3 Inv4
CLK CLK1 CLK2 CLK3 CLK4
Q
1st STAGE: X 1st STAGE: Y2nd STAGE
I1
Mp2
Mp3
CL
Q
Mn1
Mn2
Mp1
Mn7
Mn8
Mp6
Mn5
Mn6
Mp4
Mp5Mp8Mp7
Mn9
Mn10
I2
I3
• Operates at half the clock frequency• Requires tight control of the clock duty cycle
Nedovic, ESSCIRC 2002
SoC 2004 – Stefan Rusu 46®
Outline
• Clock Distribution Trends• Distribution Networks• De-skew Circuits• Jitter Reduction Techniques• Clock Power Dissipation• Future Directions• Summary
SoC 2004 – Stefan Rusu 47®
Rotary Clock Distribution
• Transmission line based,self-regenerating rotary clock generator
Wood, ISSCC-2001
SoC 2004 – Stefan Rusu 48®
Standing Wave Oscillator
O’Mahony, ISSCC-2003
SoC 2004 – Stefan Rusu 49®
10GHz Clock Grid Test Chip
• Fabricated in a 0.18µm 1.8V 6M CMOS process
• Very low clock skew and power consumption
• Attractive alternativefor 10GHz clockingand beyond
SoC 2004 – Stefan Rusu 50®
Optical Clock Distribution• Board-level guided-wave H-tree distribution• Monolithic silicon-based detection• Couplers provide tolerance for horizontal and vertical
misalignment of the flip-chip assembly• Optical transmission is immune to process variations,
power-grid noise and temperature
J.D. Meindl, Georgia Institute of Technology, 2000
SoC 2004 – Stefan Rusu 51®
Summary• High performance processors require a low skew
and jitter clock distribution network
• Clock distribution techniques are optimized to achieve the best skew and jitter with reduced area and power consumption
• Deskew techniques are demonstrated to cut the skew to ¼ of its original value
• On-die supply filters are used to reduce jitter
• Intensive research focuses on novel clock distribution techniques
Top Related