Inductor Design for Global Resonant Clock Distribution in a 28-nm CMOS Processor

2013 DAC Designer/User Track Presentation
Inductor Design for Global Resonant Clock Distribution in a 28-nm CMOS Processor
Visvesh Sathe³, Padelis Papadopoulos², Alvin Loke³, Tarek Khan¹, Anand Raman², Gerry Vandevalk³, Nikolas Provatas², Vincent Ross¹
¹ Advanced Micro Devices, Inc. ² Helic, Inc. ³ Formerly at Advanced Micro Devices, Inc.

Page 1: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

2013 DAC Designer/User Track Presentation

Inductor Design for Global Resonant Clock Distribution in a 28-nm CMOS Processor

Visvesh Sathe³, Padelis Papadopoulos², Alvin Loke³, Tarek Khan¹, Anand Raman², Gerry Vandevalk³, Nikolas Provatas², Vincent Ross¹

¹ Advanced Micro Devices, Inc.
² Helic, Inc.
³ Formerly at Advanced Micro Devices, Inc.

Page 2: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

AMD/Helic, Inductor Design for Resonant-clocked Processor

Outline

• Resonant Clock Distribution
• Inductor Design and Analysis Challenges
• Helic VeloceRaptor/X
• Inductor Extraction using VeloceRaptor/X
• Silicon Correlation
• Conclusion

Page 3: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Processor Global Clock Distribution

• Significant global clock loading
  – 7-ps clock skew target across > 20-mm² core area
  – Constrained clock latency from grid to timing elements
• Typical core power consumption breakdown (chart labeled AMD “Piledriver”)
  – Clocking: 24%
  – Standard cells: 19%
  – Gaters: 16%
  – Macros: 18%
  – Flops: 18%
  – Bus: 5%

Page 4: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Basic Resonant Clocking Operation

• Rely on efficient resonance between Ltank and Cclk near ω0
• Efficient operation around ω0
• Driving clock at much lower frequencies
  – Reduced efficiency, warped clock waveform

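[Figure: resonant clock circuit with nodes preclk and clk and elements Ltank, Cclk, and Cacgnd (>> Cclk), shown with a plot of |Zclk| versus frequency on which ω0 is marked.]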
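
A minimal numerical sketch of the resonance condition on this slide, assuming an ideal lumped tank in which Cacgnd is large enough to act as an AC ground, so that Ltank (with a small series resistance) sits in parallel with Cclk. The component values and function names are hypothetical and are not taken from the presentation.

```python
import math


def resonant_frequency_hz(l_tank_h, c_clk_f):
    """Natural frequency f0 = 1 / (2*pi*sqrt(L*C)) of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_tank_h * c_clk_f))


def tank_impedance_magnitude(freq_hz, l_tank_h, c_clk_f, r_series_ohm):
    """|Z| of (R + jwL) in parallel with 1/(jwC), with Cacgnd treated as AC ground."""
    w = 2.0 * math.pi * freq_hz
    z_l = complex(r_series_ohm, w * l_tank_h)   # inductor branch with series loss
    z_c = 1.0 / complex(0.0, w * c_clk_f)       # clock load capacitance
    return abs(z_l * z_c / (z_l + z_c))


# Hypothetical lumped values for illustration only (not from the slides).
L_TANK_H = 20e-12      # effective inductance of all tank inductors in parallel
C_CLK_F = 100e-12      # total global clock capacitance
R_SERIES_OHM = 0.05    # effective series resistance of the inductor branch

f0 = resonant_frequency_hz(L_TANK_H, C_CLK_F)
for scale in (0.25, 0.5, 1.0, 2.0):
    z = tank_impedance_magnitude(scale * f0, L_TANK_H, C_CLK_F, R_SERIES_OHM)
    print(f"f = {scale:4.2f} * f0 -> |Zclk| = {z:.3f} ohm")
```

With these assumed values the tank impedance is highest near f0 and falls off on either side, consistent with the slide's point that driving the clock well below ω0 is less efficient.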

Page 5: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

AMD Resonant Clocking

• 90 inductors distributed over custom power grid, signal wires, and core circuitry

Page 6: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Inductor Design

• Clock macro and bump pitch constrain inductor size
• Metal sharing with existing power → cut-aways
• Centered power straps, HCK tree for mutual inductance

Page 7: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Inductor and Grid Problem Summary

• 87 × 65 μm spiral over 113 × 126 μm custom grid
• 12 metal layers (2 thick)
  – Width: 0.13 to 5.7 μm
  – Thickness: 0.1 to 1.2 μm
• > 5 μm/μm² interconnect length to be extracted!

[Figure: interconnect density (μm/μm²) versus metal levels included (M11-M10, M11-M7, M11-M3); vertical axis from 0 to 6.]
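
As a rough sense of scale, the figures on this slide can be multiplied out. The sketch below assumes the > 5 μm/μm² density is normalized to the 113 × 126 μm grid footprint, which the slide does not state explicitly; the variable names are illustrative.

```python
# Dimensions quoted on the slide, in micrometers.
GRID_W_UM, GRID_H_UM = 113.0, 126.0
SPIRAL_W_UM, SPIRAL_H_UM = 87.0, 65.0

# Lower bound on interconnect density quoted on the slide (um of wire per um^2).
# Assumption: the density is referenced to the full grid footprint.
MIN_DENSITY_UM_PER_UM2 = 5.0

grid_area_um2 = GRID_W_UM * GRID_H_UM
spiral_area_um2 = SPIRAL_W_UM * SPIRAL_H_UM
min_wire_length_um = MIN_DENSITY_UM_PER_UM2 * grid_area_um2

print(f"grid area:        {grid_area_um2:.0f} um^2")
print(f"spiral footprint: {spiral_area_um2 / grid_area_um2:.0%} of grid area")
print(f"wire to extract:  more than {min_wire_length_um / 1000.0:.1f} mm per inductor site")
```

Under that assumption, each inductor site involves more than about 71 mm of interconnect, which is why extraction capacity and runtime matter here.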

Page 8: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Inductor Design Methodology

• Goal: Achieve desired L with maximum Q on a highly customized inductor
• Available design variables
  – Winding width, outer spacing, inner spacing (NESW)
  – Winding height, winding width
• Multiple extractions within reasonable time is vital
• Extraction customization per-metal is crucial
  – Top metal layers dominate magnetic interaction, lower level metals have minimal interaction
  – Per-metal extraction/merging mode selection (R/C/RC/RLC/RLCk)
• Process-aware, temperature-sensitive extraction

Page 9: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

What is VeloceRaptor/X?

Rapid, high-capacity multi-GHz EM extraction

• Maxwell equations-based RLCk model per metal segment
• Inductance calculations based on magnetic vector potential
  – Skin and proximity effects, substrate losses, capacitive and magnetic coupling
• Silicon-proven accuracy
• Use model:
  – In situ selection of nets and pin definition
  – Netlist and symbol creation for the marked nets
  – Model annotation and simulation

Page 10: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

VeloceRaptor/X Offers…

• High capacity and speed
• Multithreading support
• S-parameters and RLCk netlist output
  – Temperature-aware model
  – Mixed-mode R/C/RC/RLC/RLCk per any net layer
  – Layout-dependent effects captured
• Direct GDS extraction
• Batch-mode support
• Numerical network reduction

Page 11: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Inductor-over-Grid Model Validation

• Mixed-mode extraction per net layer:
  – M11 to Mx: RLCk
  – M(x-1) to M3: RC
• RLCk extraction below M07 has negligible impact
  – Increasing interconnect density, runtime, memory requirement
  – No improvement in model accuracy when adding more RLCk layers

Metals (RLCk)   Density (µm/µm²)   Extraction Time (sec)   RAM (MB)   Netlist Size (KB)
M11-M10         0.312              517                     880        87
M11-M9          0.578              528                     1020       95
M11-M8          1.34               3402                    3650       96
M11-M7          2.27               6895                    6624       97
M11-M6          2.93               10033                   12564      99
M11-M5          3.85               14055                   21564      102

Callout on table: best tradeoff between model accuracy and runtime/memory requirements
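
The scaling in the table on this slide can be made explicit by normalizing each row to the M11-M10 case. This is a small sketch over the numbers transcribed from the slide; the helper name `relative_cost` is illustrative.

```python
# Rows transcribed from the slide: (RLCk layers, density in um/um^2,
# extraction time in s, RAM in MB, netlist size in KB).
ROWS = [
    ("M11-M10", 0.312, 517, 880, 87),
    ("M11-M9", 0.578, 528, 1020, 95),
    ("M11-M8", 1.34, 3402, 3650, 96),
    ("M11-M7", 2.27, 6895, 6624, 97),
    ("M11-M6", 2.93, 10033, 12564, 99),
    ("M11-M5", 3.85, 14055, 21564, 102),
]


def relative_cost(rows):
    """Normalize density, time, and RAM of every row to the first row."""
    _, d0, t0, m0, _ = rows[0]
    return {name: (d / d0, t / t0, m / m0) for name, d, t, m, _ in rows}


for name, (d_rel, t_rel, m_rel) in relative_cost(ROWS).items():
    print(f"{name:8s} density x{d_rel:5.1f}  time x{t_rel:5.1f}  RAM x{m_rel:5.1f}")
```

Going from M11-M10 to M11-M5 multiplies extraction time by roughly 27 and RAM by roughly 24.5, while the netlist size grows only from 87 KB to 102 KB.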

Page 12: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Turnaround Time vs. Metal Density

Page 13: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Test Chip Silicon Validation

• Very good agreement between measured and extracted L and Q

[Figure: quality factor (axis 0.5 to 5.5) and inductance in nH (axis 1.0 to 1.8) versus frequency (1.9 to 5.9 GHz); curves for measurement and for RLCk extraction of M11-M10, M11-M09, M11-M08, M11-M07, M11-M06, and M11-M05.]
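
One common way to obtain L and Q curves like these from a one-port S-parameter model is to convert S11 to an input impedance and take L = Im(Z)/ω and Q = Im(Z)/Re(Z). The slides do not say which definitions were used, so the sketch below is an assumption; the 50 Ω reference impedance and the synthetic R and L values are hypothetical.

```python
import math


def z_from_s11(s11, z0_ohm=50.0):
    """Input impedance of a one-port from its reflection coefficient."""
    return z0_ohm * (1 + s11) / (1 - s11)


def inductance_and_q(z_in, freq_hz):
    """Effective series inductance (H) and quality factor of a one-port."""
    omega = 2.0 * math.pi * freq_hz
    return z_in.imag / omega, z_in.imag / z_in.real


# Synthetic check with hypothetical values: a series R-L one-port.
FREQ_HZ = 4.0e9
R_OHM, L_H, Z0_OHM = 8.0, 1.4e-9, 50.0
z_true = complex(R_OHM, 2.0 * math.pi * FREQ_HZ * L_H)
s11 = (z_true - Z0_OHM) / (z_true + Z0_OHM)   # what a solver or VNA would report

l_est, q_est = inductance_and_q(z_from_s11(s11, Z0_OHM), FREQ_HZ)
print(f"L = {l_est * 1e9:.3f} nH, Q = {q_est:.2f}")
```

Applying the same two functions to both the extracted and the measured S-parameters at each frequency point would give directly comparable L and Q curves.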

Page 14: Inductor Design for Global Resonant Clock Distribution  in a 28-nm CMOS Processor

Conclusions

• Resonant clocking reduces global clock distribution power
• Use of multiple distributed on-chip inductors poses a significant challenge to inductor extraction
  – Metal-rich extraction environment
  – Significant mutual inductance with underlying and adjacent circuits and power grids
• Exploiting design structure and VeloceRaptor/X capabilities enabled efficient inductor optimization
• Batch mode and per-metal, per-net extraction produced a model with sufficient detail to accurately capture silicon behavior