Practical ARM® CPU Digital Implementation on TSMC 10nm. Yin Yan, Cadence Design Systems. ARM Tech Symposia, Shenzhen, November 2015


Page 1: Practical ARM CPU Digital Implementation on TSMC 10nm (armtechforum.com.cn/attached/article/Cadence_Shenzhen20151210134012.pdf)

Yin Yan, Cadence Design Systems

ARM Tech Symposia

Shenzhen

November 2015

Practical ARM® CPU Digital Implementation on TSMC 10nm

Page 2

2 © 2015 Cadence Design Systems, Inc. All rights reserved.

• ARM + Cadence Collaboration

• 10nm Design (Synthesis, Implementation, Signoff) Challenges

• 10nm Design Flow for High-end ARM CPU Implementation

• Cadence® Advanced Node 10nm RTL2Signoff Flow and Unique Solutions

• Foundry Qualification of Cadence Tools

• Summary

Outline

Page 3

• Ongoing close collaboration on ARM CPUs, ARM Mali™ GPUs, system IP, embedded processors, ARM Artisan® libraries, and ARM POP™ IP

• Cadence and ARM engineering relationship at all ARM design locations worldwide

− Engineering teams at Cambridge/Sheffield, Sophia, San Jose, Austin, Bangalore, and Hsinchu

• Cadence and ARM have collaborated throughout the entire development of multiple cores to drive technology leadership in the ecosystem for power, performance, and area (PPA) goals (2011-2015)

− Trial implementations run in parallel by both ARM and Cadence

− Regular interaction with Cadence AEs and R&D

• ARM uses Cadence® tools and flow internally for high-performance processor and GPU development

− ARM + Cadence collaborative work on reference flow development

− Testchips currently include ARM Cortex®-A72, A57, A53, A15, A12, A9, and A7, plus POP IP and Cortex-M embedded processors such as the M7 and others

ARM and Cadence Collaboration

Page 4

ARM and Cadence are successfully engaged in several joint projects

[Timeline figure, 2010 to 2015. Legend: iRM (Implementation Reference Methodology), Testchip, Hardened Macro. Projects shown: Cortex-A9 hard macro; dual-core Cortex-A9; 40nm POP IP; Cortex-A15 iRM; Cortex-A15 R3 iRM update; Cortex-A15 LP iRM; quad-core Cortex-A15; dual-core Cortex-A15 28nm; 20nm Cortex-A15 testchip; Cortex-A7 iRM; Cortex-A12 iRM; GF 28nm Cortex-A12; Cortex-A17 iRM; GF 28nm Cortex-A17; Cortex-A53 iRM; Cortex-A53; Cortex-A57 iRM; Cortex-A57 iRM refresh; 16nm Cortex-A57; 16nm Cortex-A53/A57; 16nm Cortex-A53/A57 PPA push; Cortex-A72 iRM; Cortex-M7 iRM; GPU Mali-T604 iRM; Mali-T678 iRM; Mali-T720 iRM; Mali-T880; next-gen Mali; next-gen GPU; CCN-504 iRM; CCN-504; next-gen system IP iRM; ARMv8 ARM big.LITTLE™ + GPU testchip; Samsung 20nm testchip; IBM 20nm testchip; IBM ARM 14nm FinFET testchip; Samsung A7 14nm FinFET; next-gen big core; next-gen core on a foundry advanced node. Process labels: cmos32lp, 40nm, 28nm, 20nm, 16nm.]

ARM and Cadence Projects History

Page 5

• A global collaborative program to develop advanced process eco-systems

• Ensure ARM next-generation CPU IP is optimized for advanced processes

• Ensure the EDA eco-system and design flows are ready for lead partners

• Provide feedback on power, performance, and area bottlenecks to ARM design teams

• Identify additional physical IP requirements for optimal PPA

• Educate partners and accelerate next-generation process adoption

A proven approach, following on from the 16nm collaboration

ARM and Cadence Collaborating on 10nm

Page 6

• ARM + Cadence collaboration

• 10nm design (synthesis, implementation, signoff) challenges

• 10nm design flow for high-end ARM CPU implementation

• Cadence® Advanced Node 10nm RTL2Signoff flow and unique solutions

• Foundry qualification of Cadence tools

• Summary

Outline

Page 7

Design challenges at 10nm

[Figure: node progression 20nm, 16nm, 10nm, annotated with 1st gen DPT, exponential wire resistance, 1st gen FinFET, fin-grid snapping, pin access, 2nd gen FinFET, 2nd gen DPT, and “Coloring a must!”]

Physical

• Full coloring flow required in P&R, extraction, and DRC

• Unidirectional (H/V) use of M1/M2 impacts cell architecture and routability

• Significant increase in critical rules for routing and placement

Electrical

• Resistance increase on wires and vias (Mx and Vx)

• Variability (wire width, spacing)

• Tracks of different colors may differ dramatically in resistance

• Length-based coupling capacitance handling

• Expect the need for more accuracy in timing and variation

Route-driven implementation

Color-driven implementation
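The electrical bullets above say that tracks of different colors can differ dramatically in resistance. The sketch below is a hypothetical model, not a foundry extraction rule: it assumes a simple rectangular-wire formula R = ρL/(W·T) and a made-up printed-width bias per mask, and shows how the same 10 um route changes resistance depending on which color it is assigned.

```python
# Sketch: per-color (per-mask) wire resistance with a simple R = rho * L / (W * T) model.
# Every numeric value below is a hypothetical placeholder, not foundry data.

RHO_OHM_UM = 0.022        # hypothetical effective resistivity, ohm * um
THICKNESS_UM = 0.040      # hypothetical metal thickness, um
NOMINAL_WIDTH_UM = 0.024  # hypothetical drawn wire width, um

# Hypothetical printed-width bias per mask. Each color is exposed separately,
# so the two colors can end up with different effective widths.
MASK_WIDTH_BIAS_UM = {"A": +0.002, "B": -0.002}


def wire_resistance(length_um: float, mask: str) -> float:
    """Resistance (ohm) of a straight wire segment printed on the given mask."""
    width_um = NOMINAL_WIDTH_UM + MASK_WIDTH_BIAS_UM[mask]
    return RHO_OHM_UM * length_um / (width_um * THICKNESS_UM)


if __name__ == "__main__":
    r_a = wire_resistance(10.0, "A")
    r_b = wire_resistance(10.0, "B")
    print(f"10 um on mask A: {r_a:.1f} ohm, mask B: {r_b:.1f} ohm "
          f"({(r_b / r_a - 1) * 100:.0f}% higher on B)")
```

Because width sits in the denominator, a few nanometers of mask-dependent bias on a narrow wire produces a double-digit percentage resistance difference, which is why extraction and timing need to know each wire's color.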

Page 8

Accelerating 10nm Adoption

• Start early so that IP, tools and process can be optimized at the same time

• Ensure ARM IP, process and EDA tools are co-optimized to:

− Provide best-in-class silicon implementations of ARM-based systems

− Pipe-clean the eco-system for ARM partner designs

− Lower the barrier to entry for new process adoption

− Reduce time to market through optimized products and support

Early Process Exploration

Eco-system Development

Library Development

PPA Optimization

Deployment

Page 9

An evolution from 16nm FinFET

• Focus on efficiency – Optimize performance within established mobile device power budgets

• Power grid integrity is crucial – Meeting electromigration and IR drop targets whilst maintaining placement density

• Interconnect (wire) resistance continues to dominate – Use of optimized physical libraries is more critical

• Maximize the benefits of process scaling

ARM CPU 10nm Challenges

[Chart: typical process design rules and SoC design complexity (vertical axis 0 to 7,000)]

Page 10

• ARM + Cadence collaboration

• 10nm design (synthesis, implementation, signoff) challenges

• 10nm design flow for high-end ARM CPU implementation

• Cadence® Advanced Node 10nm RTL2Signoff flow and unique solutions

• Foundry qualification of Cadence tools

• Summary

Outline

Page 11

• Understand critical differences between 16nm and 10nm design flows

• Primary goal is to tweak the flow recipe and understand tool/IP interaction

– Impact of new CPU microarchitecture

– Feedback on tool, library, and IP interactions

– Optimizing floorplan, power distribution, and overall power, performance, and area

• Traverse the entire flow from RTL to production-quality sign-off

– All issues uncovered and validated

• Iterate and improve as IP, process, and libraries mature

Design Flow Considerations

Page 12

Synthesis Strategy

• Exploration to understand the power and performance envelope

− Selecting the right Vt mix for design PPA targets

• Analyze library usage for potential bottlenecks

− Dominant cell type on critical paths

• Based on Cadence recommended flow

− Tuning of parameters for optimal PPA trade-offs

• Physically aware mapping

− Ensures correlation and predictability through the flow

• Synthesis targets a single Vt and channel length

− Based on performance and power targets
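The "dominant cell type on critical paths" analysis can be approximated by tallying library cells on failing paths. A minimal sketch, assuming the paths have already been parsed from a timing report into (slack, cell list) pairs; the input format and the cell names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical input: (slack in ns, library cell types along the path),
# e.g. parsed from a synthesis timing report.
Path = Tuple[float, List[str]]


def dominant_cells(paths: Iterable[Path], slack_threshold_ns: float = 0.0,
                   top_n: int = 3) -> List[Tuple[str, int]]:
    """Return the most frequent cell types on paths with slack below the threshold."""
    counts: Counter = Counter()
    for slack_ns, cells in paths:
        if slack_ns < slack_threshold_ns:
            counts.update(cells)
    return counts.most_common(top_n)


if __name__ == "__main__":
    example_paths = [
        (-0.020, ["DFF_X1", "NAND2_X1", "INV_X2", "NAND2_X1"]),
        (-0.010, ["NAND2_X1", "AOI22_X1"]),
        (0.050, ["INV_X2", "INV_X2", "INV_X2"]),  # positive slack: ignored
    ]
    print(dominant_cells(example_paths, top_n=1))
```

A cell type that dominates failing paths is a hint about where library variants or Vt choices are limiting performance; the threshold lets the same tally cover near-critical paths too.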

Page 13

The Genus Advantage

• Significant run time and overall QoR benefits

• Improved correlation through the flow

• Increased performance with minimal leakage impact

• Synthesis run time over 4x faster (~3 hours)

Page 14

Floorplan

• New process and processor combination

− Floorplan trials are critical

− Macro placement changes with process

• Use of placement regions based on microarchitectural feedback

• Macro placement needs to follow the FinFET grid

• Double-patterned M2 memory pins must be aligned to ensure correct coloring
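The last two floorplan bullets combine: a macro origin must sit on the FinFET grid, and its double-patterned M2 pins must land on tracks of the color they were designed for. A minimal sketch, assuming a hypothetical 48 nm fin pitch, a hypothetical 64 nm M2 pitch, and M2 tracks that alternate colors A/B starting at y = 0; real pitches and color rules come from the foundry technology files.

```python
import math
from typing import List, Optional

FIN_PITCH_NM = 48       # hypothetical placement (fin) grid in y
M2_PITCH_NM = 64        # hypothetical M2 track pitch in y
M2_COLORS = ("A", "B")  # hypothetical: M2 tracks alternate A, B, A, ... from y = 0


def snap_up(value_nm: int, pitch_nm: int) -> int:
    """Smallest multiple of pitch_nm that is >= value_nm."""
    return math.ceil(value_nm / pitch_nm) * pitch_nm


def m2_track_color(y_nm: int) -> Optional[str]:
    """Color of the M2 track at y, or None if y is not on an M2 track."""
    if y_nm % M2_PITCH_NM != 0:
        return None
    return M2_COLORS[(y_nm // M2_PITCH_NM) % 2]


def place_macro(y_request_nm: int, pin_offsets_nm: List[int], pin_colors: List[str],
                max_steps: int = 100) -> Optional[int]:
    """First fin-grid y >= request where every M2 pin lands on a track of its library color."""
    if len(pin_offsets_nm) != len(pin_colors):
        raise ValueError("each pin needs exactly one expected color")
    y = snap_up(y_request_nm, FIN_PITCH_NM)
    for _ in range(max_steps):
        if all(m2_track_color(y + off) == color
               for off, color in zip(pin_offsets_nm, pin_colors)):
            return y
        y += FIN_PITCH_NM
    return None


if __name__ == "__main__":
    # Hypothetical memory macro with two M2 pins, both drawn on color A.
    print(place_macro(100, [0, 128], ["A", "A"]))
```

With these numbers, y = 192 is on both grids but puts the pins on color B tracks, so the search continues to the next position that satisfies both the fin grid and the pin colors.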

Page 15

Power Grid

• Incorrect power grid construction will significantly impact routing density and performance

• No M2 power connection to standard cells

− Potential EM and dynamic IR implications

• Vertical power straps need to follow the FinFET grid

• Horizontal straps must follow even poly pitch

• Traditional “striped” PG methodology used

Both runs: Cortex-A9 CPU with the same power grid pitches

Good design has optimally placed M3 straps

Bad design has suboptimally placed M3 straps, unnecessarily blocking routing tracks
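The good/bad comparison above comes down to where a strap sits relative to the routing tracks. A minimal sketch, assuming hypothetical dimensions (40 nm track pitch, 20 nm wires, 18 nm minimum spacing) and a simplified rule that a track is blocked when a minimum-width wire on it would violate spacing to the strap; it sweeps candidate strap offsets and reports how many tracks each one blocks.

```python
from typing import List, Sequence, Tuple

# Hypothetical dimensions in nm; not TSMC 10nm values.
TRACK_PITCH = 40
WIRE_WIDTH = 20
MIN_SPACING = 18
N_TRACKS = 10


def blocked_tracks(strap_center: float, strap_width: float) -> List[int]:
    """Track indices where a minimum-width wire would violate spacing to the strap."""
    keepout = strap_width / 2 + MIN_SPACING + WIRE_WIDTH / 2
    return [i for i in range(N_TRACKS)
            if abs(i * TRACK_PITCH - strap_center) < keepout]


def best_offset(base_center: float, strap_width: float,
                offsets: Sequence[float]) -> Tuple[float, int]:
    """Return (offset, blocked-track count) for the offset that blocks the fewest tracks."""
    scored = [(off, len(blocked_tracks(base_center + off, strap_width))) for off in offsets]
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for off in (0, 10, 20, 30):
        print(off, blocked_tracks(200 + off, strap_width=20))
    print(best_offset(200, 20, (0, 10, 20, 30)))
```

With these numbers a 20 nm strap centered on a track blocks one track, while any off-track position blocks two. For a much wider strap the best position can change, so sweeping candidates is more robust than assuming on-track is always best.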

Page 16

• Place and route uses a single Vt/channel length library

• Timing and power are optimized against dominant corners

– Balance of run time vs. overall QoR

• Placement is DRC-aware

– Respects interaction between standard cells

• Control of wire-dominated paths is crucial

– Using placement bounds to constrain potential long paths

– Layer promotion for long nets to less resistive metal layers

• Waveform propagation models and AOCV are essential at 10nm

Place & Route
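Layer promotion pays off because wire delay grows with the square of length times the per-unit resistance of the layer. A minimal sketch, assuming the Elmore model for a driver, a distributed RC wire, and a lumped load, with hypothetical per-micron resistance and capacitance for a thin lower layer ("M2") and a thick upper layer ("M8"); real promotion decisions also weigh via resistance and track availability, which are ignored here.

```python
from dataclasses import dataclass
from typing import Sequence

# 1 ohm * 1 fF = 1e-15 s = 1e-3 ps
OHM_FF_TO_PS = 1e-3


@dataclass(frozen=True)
class Layer:
    name: str
    r_ohm_per_um: float  # hypothetical wire resistance per um
    c_ff_per_um: float   # hypothetical wire capacitance per um


def elmore_delay_ps(layer: Layer, length_um: float, r_drv_ohm: float, c_load_ff: float) -> float:
    """Elmore delay of a driver, a distributed RC wire, and a lumped load."""
    r_wire = layer.r_ohm_per_um * length_um
    c_wire = layer.c_ff_per_um * length_um
    # The driver charges all downstream C; the distributed wire adds r*c/2 plus r*C_load.
    delay = r_drv_ohm * (c_wire + c_load_ff) + r_wire * (0.5 * c_wire + c_load_ff)
    return delay * OHM_FF_TO_PS


def pick_layer(layers: Sequence[Layer], length_um: float,
               r_drv_ohm: float, c_load_ff: float) -> Layer:
    """Choose the layer with the smallest Elmore delay for this net."""
    return min(layers, key=lambda lyr: elmore_delay_ps(lyr, length_um, r_drv_ohm, c_load_ff))


if __name__ == "__main__":
    layers = [Layer("M2", 50.0, 0.2), Layer("M8", 2.0, 0.2)]  # hypothetical values
    for lyr in layers:
        print(lyr.name, round(elmore_delay_ps(lyr, 200.0, 500.0, 2.0), 1), "ps")
    print("promote to", pick_layer(layers, 200.0, 500.0, 2.0).name)
```

The wire term r·c·L²/2 is what makes long nets on resistive lower layers so costly; the driver term is the same on both layers here because the capacitance per micron was kept equal.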

Page 17

• Full production-quality sign-off approach

– TSMC recommended timing margins

– Signed off for timing, power integrity, and physical rules

• Cadence® Voltus™ solution used for all power integrity checks

– Static/dynamic IR, electromigration, in-rush current

• Signed off and optimized across multiple PVT and extraction corners

• Leakage recovery performed using Tempus™ ECO functionality

• Power analysis early and often to ensure the PG structure is correct

• Stage-based OCV is critical to avoid over-fixing

Sign-Off
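Stage-based OCV avoids over-fixing because random variation partially averages out along deeper paths, so a depth-dependent derate can be smaller than a single flat derate sized for short paths. A minimal sketch, assuming a hypothetical derate table indexed only by stage count and a hypothetical flat derate; real AOCV tables are also indexed by distance and are interpolated.

```python
import bisect
from typing import Sequence

# Hypothetical late-derate table: (path depth in stages, derate factor).
AOCV_LATE_TABLE = [(1, 1.15), (2, 1.10), (4, 1.07), (8, 1.05), (16, 1.04)]
FLAT_OCV_LATE = 1.10  # hypothetical single derate applied to every path


def aocv_derate(depth: int) -> float:
    """Derate for the largest tabulated depth <= depth (conservative step lookup)."""
    depths = [d for d, _ in AOCV_LATE_TABLE]
    idx = bisect.bisect_right(depths, depth) - 1
    return AOCV_LATE_TABLE[max(idx, 0)][1]


def late_arrival_ps(stage_delays_ps: Sequence[float], derate: float) -> float:
    """Derated late arrival of a path made of the given stage delays."""
    return sum(stage_delays_ps) * derate


if __name__ == "__main__":
    path = [10.0] * 10   # ten 10 ps stages (hypothetical)
    required_ps = 107.0  # hypothetical required time
    for label, derate in (("flat OCV", FLAT_OCV_LATE), ("AOCV", aocv_derate(len(path)))):
        slack = required_ps - late_arrival_ps(path, derate)
        print(f"{label}: derate {derate:.2f}, slack {slack:+.1f} ps")
```

In this example the flat derate reports a violation on a ten-stage path that the depth-based derate passes, which is the kind of pessimism that would otherwise trigger unnecessary fixes.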

Page 18

• Implemented using Cadence® tools

• Full production-quality DFT solution

– Full scan compression

– Memory BIST

– At-speed scan and memory BIST support

• Power domain per CPU core

• Full production margins for timing, power, and physical sign-off

Design Flow

Cadence Full-Flow Digital Solution

Page 19

• ARM + Cadence collaboration

• 10nm design (synthesis, implementation, signoff) challenges

• 10nm design flow for high-end ARM CPU implementation

• Cadence® Advanced Node 10nm RTL2Signoff flow and unique solutions

• Foundry qualification of Cadence tools

• Summary

Outline

Page 20

Cadence Full-Flow Digital Solution vs. Previous Generation

[Figure comparing flows across Synthesis, Implementation, and Signoff]

Traditional flow: separate placer, optimization, CTS, router, timing, power, and extraction engines at each stage

Cadence Full-Flow Digital Solution:

• Unified placement engine

• Best-in-class PPA optimization

• Unified timing/power/extraction

• Unified CTS and global router

Key points: up to 10X TAT/capacity gain; 10-20% better PPA; full-flow correlation and design convergence; massively parallel, unified engines; core PPA algorithms; early signoff optimization with reduced iterations

Page 21

Innovus Technology

GigaPlace™ next-generation placement

• Slack-driven, layer-aware, fully analytic

GigaOpt™ power-driven optimization

• All GigaOpt transforms made power-aware

• Minimizes leakage, internal, and switching power

Advanced CCOpt™ and slack-driven routing

• FlexH (compared with a regular CTS tree and an H-tree): Flex H-tree improves cross-corner variation

• Slack-driven routing reduces SI TNS
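FlexH is listed as improving cross-corner variation. The property an H-tree contributes is symmetry: every tap is reached through the same wire length and the same sequence of segments, so tap delays tend to track each other as corners change. The sketch below only demonstrates that equal-length property for an idealized H-tree over a hypothetical 400 x 400 region; it is not the FlexH algorithm, which the slide does not describe.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def h_tree_taps(x: float, y: float, width: float, height: float,
                levels: int) -> List[Tuple[Point, float]]:
    """Return (tap location, root-to-tap Manhattan wire length) for an idealized H-tree.

    Each level drives the centers of the four quadrants of its region (the H shape),
    and every branch at a given level has the same length.
    """
    if levels == 0:
        return [((x, y), 0.0)]
    dx, dy = width / 4, height / 4
    taps: List[Tuple[Point, float]] = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            for point, length in h_tree_taps(x + sx * dx, y + sy * dy,
                                             width / 2, height / 2, levels - 1):
                taps.append((point, length + dx + dy))
    return taps


if __name__ == "__main__":
    taps = h_tree_taps(0.0, 0.0, 400.0, 400.0, levels=2)
    lengths = {length for _, length in taps}
    print(len(taps), "taps, distinct root-to-tap lengths:", lengths)
```

All sixteen taps share one root-to-tap length, whereas a tree built greedily from sink locations generally does not.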

Page 22

Innovus: 10nm color-driven implementation

Flow: Netlist → GigaPlace™ placer → GigaOpt™ → CCOpt™ → NanoRoute™

M2 Power

• Unique correct-by-construction approach

• Prevents odd-cycle conflicts

• Massively parallel for increasing rule conflicts

Page 23

10nm color-driven placement and routing

• Color-driven GigaPlace™ technology

• NanoRoute™ via odd-cycle prevention: actively prevents pin access that causes “odd cycles”

Status of SADP-enabled Innovus System
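Why odd cycles matter: with two masks, any two shapes closer than the same-mask spacing must go on different masks. Build a conflict graph with one node per shape and one edge per too-close pair; a legal two-mask assignment exists exactly when that graph is bipartite, and any odd cycle makes it impossible. A minimal sketch of the textbook BFS check (not Innovus's algorithm) that returns either a mask assignment or an offending odd cycle; the node names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def two_color(nodes: Sequence[Hashable], edges: Sequence[Edge]
              ) -> Tuple[Optional[Dict[Hashable, str]], Optional[List[Hashable]]]:
    """BFS 2-coloring of a conflict graph.

    Returns (mask assignment, None) if the graph is bipartite,
    otherwise (None, odd cycle as a list of nodes).
    """
    adj: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    color: Dict[Hashable, int] = {}
    parent: Dict[Hashable, Optional[Hashable]] = {}
    for start in nodes:
        if start in color:
            continue
        color[start], parent[start] = 0, None
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v], parent[v] = 1 - color[u], u
                    queue.append(v)
                elif color[v] == color[u]:
                    return None, _odd_cycle(u, v, parent)
    return {n: "AB"[c] for n, c in color.items()}, None


def _odd_cycle(u, v, parent):
    """Walk both BFS-tree branches up to their common ancestor; edge u-v closes the cycle."""
    path_u = []
    node = u
    while node is not None:
        path_u.append(node)
        node = parent[node]
    on_u_path = set(path_u)
    path_v = []
    node = v
    while node not in on_u_path:
        path_v.append(node)
        node = parent[node]
    return path_u[: path_u.index(node) + 1] + path_v[::-1]


if __name__ == "__main__":
    # Three mutually conflicting M2 shapes: no two-mask assignment exists.
    print(two_color(["s1", "s2", "s3"], [("s1", "s2"), ("s2", "s3"), ("s3", "s1")]))
```

Because u and v have the same BFS color, their depths have the same parity, so the returned cycle always has an odd number of nodes; a router that avoids creating such cycles never reaches an uncolorable state.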

Page 24

Color-driven GigaPlace technology

• Fixed color placement

• Vertical edge constraints

• Placement considering power buses and power bus vias

• Global placement for routability improvement

• Boundary (endcap) cell insertion

Power Bus
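Two of the GigaPlace items, fixed color placement and vertical edge constraints, are legality checks on a placement row. A minimal sketch, assuming a hypothetical encoding: each cell has left/right edge types with a table of forbidden abutting pairs, and a fixed-color cell must start on an even or odd site so that its pre-colored pins line up with the track colors.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Cell:
    name: str
    width_sites: int
    left_edge: str                       # hypothetical edge-type label
    right_edge: str
    color_parity: Optional[int] = None   # hypothetical: required site parity (0 even, 1 odd)


# Hypothetical rule: two cells whose facing edges are both "E1" may not abut.
FORBIDDEN_ABUTMENTS: Set[Tuple[str, str]] = {("E1", "E1")}


def row_violations(row: List[Tuple[Cell, int]]) -> List[str]:
    """Check one placement row given as (cell, starting site index) pairs."""
    placed = sorted(row, key=lambda item: item[1])
    issues = []
    for cell, site in placed:
        if cell.color_parity is not None and site % 2 != cell.color_parity:
            issues.append(f"{cell.name}: fixed color needs site parity "
                          f"{cell.color_parity}, got site {site}")
    for (left, l_site), (right, r_site) in zip(placed, placed[1:]):
        l_end = l_site + left.width_sites
        if l_end > r_site:
            issues.append(f"{left.name}/{right.name}: overlap")
        elif l_end == r_site and (left.right_edge, right.left_edge) in FORBIDDEN_ABUTMENTS:
            issues.append(f"{left.name}/{right.name}: forbidden vertical-edge abutment")
    return issues


if __name__ == "__main__":
    u1 = Cell("u1", 3, "E0", "E1")
    u2 = Cell("u2", 2, "E1", "E0", color_parity=0)
    print(row_violations([(u1, 0), (u2, 3)]))  # abutting at an odd site
    print(row_violations([(u1, 0), (u2, 4)]))  # one-site gap at an even site
```

Moving u2 by one site fixes both problems at once, which is the kind of local adjustment a color- and edge-aware placer has to make during legalization.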

Page 25

Color-driven NanoRoute technology

Power Bus

• All 10nm rules

• Enhanced pin access to handle large same-mask cut spacing

• Fat M2 pin handling: track alignment and rules

• Fixed color methodology: users define color/mask in the cell library

• M1 routing

• One-side spacing of NDR, mask NDR
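Large same-mask cut spacing means two cuts (vias) on the same mask must be much farther apart than two cuts on different masks, which constrains where a pin can be accessed. A minimal sketch, assuming hypothetical center-to-center rules of 80 nm (same mask) and 40 nm (different masks); it checks spacing and picks the first legal pin-access point and mask. It is not NanoRoute's pin-access algorithm.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Hypothetical center-to-center spacing rules (nm).
SAME_MASK_SPACING_NM = 80.0
DIFF_MASK_SPACING_NM = 40.0

Cut = Tuple[str, float, float, str]  # (name, x, y, mask)


def violates(a: Cut, b: Cut) -> bool:
    """True if two cuts are closer than the spacing rule for their mask pair."""
    dist = math.hypot(a[1] - b[1], a[2] - b[2])
    rule = SAME_MASK_SPACING_NM if a[3] == b[3] else DIFF_MASK_SPACING_NM
    return dist < rule


def all_violations(cuts: Sequence[Cut]) -> List[Tuple[str, str]]:
    """Names of every cut pair that breaks the spacing rules."""
    return [(a[0], b[0]) for a, b in combinations(cuts, 2) if violates(a, b)]


def pick_access(existing: Sequence[Cut], candidates: Sequence[Tuple[str, float, float]],
                masks: Sequence[str] = ("A", "B")) -> Optional[Cut]:
    """Return the first candidate access cut (with a mask) that is legal against existing cuts."""
    for name, x, y in candidates:
        for mask in masks:
            trial = (name, x, y, mask)
            if not any(violates(trial, cut) for cut in existing):
                return trial
    return None


if __name__ == "__main__":
    existing = [("v1", 0.0, 0.0, "A"), ("v2", 100.0, 0.0, "B")]
    print(pick_access(existing, [("p", 50.0, 0.0), ("p", 0.0, 90.0)]))
```

The first candidate, midway between two cuts on different masks, fails on either mask because it would be too close to the same-mask neighbor; only the second candidate is accessible.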

Page 26

Production-proven Genus and Innovus speedup

Genus™ TAT speedup

Page 27

• ARM + Cadence collaboration

• 10nm design (synthesis, implementation, signoff) challenges

• 10nm design flow for high-end ARM CPU implementation

• Cadence® Advanced Node 10nm RTL2Signoff flow and unique solutions

• Foundry qualification of Cadence tools

• Summary

Outline

Page 28

10nm color-driven Cadence digital flow

Quantus™ QRC

• 10nm extraction certification

• Color-aware resistance change

• Length-based coupling cap handling

Tempus™, Voltus™, PVS Solutions

• 10nm STA qualification, SOCV, SOCV-driven MMMC sign-off ECO

• Color-aware EM /IR

• Color-aware Metal Fill, Fill ECO

Innovus™ Implementation System

• PreRoute Statistical RC Extraction

• Statistical OCV-driven Optimization

• Color-Aware Track Assignment

Close 10nm collaboration with IP/foundry partners
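One idea behind the statistical OCV (SOCV) listed under Tempus is to carry a mean and sigma per stage and combine sigmas by root-sum-square along the path, instead of adding a worst-case margin at every stage. A minimal sketch, assuming independent Gaussian stage delays and hypothetical numbers; production SOCV also handles correlation and non-Gaussian effects, which are not modeled here.

```python
import math
from typing import Sequence, Tuple

Stage = Tuple[float, float]  # (mean delay ps, sigma ps), hypothetical values


def socv_arrival_ps(stages: Sequence[Stage], n_sigma: float = 3.0) -> float:
    """Path mean plus n_sigma times the RSS of stage sigmas (independent stages)."""
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean + n_sigma * sigma


def per_stage_worst_ps(stages: Sequence[Stage], n_sigma: float = 3.0) -> float:
    """Pessimistic alternative: add n_sigma margin to every stage, then sum."""
    return sum(m + n_sigma * s for m, s in stages)


if __name__ == "__main__":
    path = [(10.0, 1.0)] * 4
    print("SOCV:", socv_arrival_ps(path), "ps; per-stage worst case:",
          per_stage_worst_ps(path), "ps")
```

For n identical stages the RSS sigma grows as √n rather than n, so the gap to the per-stage worst case widens on deeper paths.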

Page 29

• ARM + Cadence collaboration

• 10nm design (synthesis, implementation, signoff) challenges

• 10nm design flow for high-end ARM CPU implementation

• Cadence® Advanced Node 10nm RTL2Signoff flow and unique solutions

• Foundry qualification of Cadence tools

• Summary

Outline

Page 30

• Cadence and ARM continue to collaborate on advanced process nodes

• Our proven successful collaboration model accelerates adoption and time to market for our partners

• 10nm has arrived, and the eco-system is ready for your next ARM-based chip

Summary

Page 31

Come visit us at the Cadence booth

Page 32

© 2015 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence and the Cadence logo are registered trademarks of Cadence Design Systems. ARM and the ARM logo are registered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. PCI-SIG, PCI Express, and PCIe are registered trademarks and/or service marks of PCI-SIG. All other trademarks are the property of their respective owners.