Team Members Instructor - Aseem Sayalaseemsayal.in/wp-content/uploads/2016/05/Design-of... ·...


VLSI II – Final Review

Team Members

Aseem Sayal (ICS)

Kai Wang (CS)

Kshitiz Gupta (ICS)

Ronak Oswal (ICS)

Vipul Goyal (ICS)

Instructor

Prof. Mark McDermott

Team 3

High Performance


Examples of 800MHz processors…


Agenda

• Design Specification

• Organization Chart

• ASIC Design Flow and Methodology

• Design Updates

– Synthesis – Optimizations & Results

– Floorplanning – Optimizations & Results

– Powerplanning – Optimizations & Results

– Placement – Optimizations & Results

– CTS – Optimizations & Results

– Route – Optimizations & Results

– ECO – Optimizations & Results

• Timing Convergence

• Formal Verification – Optimizations & Results

• Design Caveats

• Results

• Major Issues faced and their resolution

• How to further improve design?

• Key Learnings – “What would we like to do differently?”

• Economic Aspects


Design Specifications

QoR                 Target Specification   Results Achieved   Improvement
Cycle Time (ns)     1.25                   1.25               MET
Total Power (mW)    280                    218.40             22%
Die Area (mm^2)     1.05                   1.00               4.76%
Utilization (%)     65                     70.37              8.26%
Max. IR Drop (mV)   50                     48.42              3.16%
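The Improvement column can be reproduced from the target and achieved values. The following is a minimal sketch, assuming improvement is measured relative to the target and that utilization is the only metric where a higher value is better; the function name `improvement_pct` is ours, not from the slides.

```python
def improvement_pct(target, achieved, higher_is_better=False):
    """Relative improvement over the target, in percent."""
    delta = achieved - target if higher_is_better else target - achieved
    return 100.0 * delta / target

# (target, achieved, higher_is_better) taken from the Design Specifications table
specs = {
    "Total Power (mW)": (280, 218.40, False),
    "Die Area (mm^2)": (1.05, 1.00, False),
    "Utilization (%)": (65, 70.37, True),
    "Max. IR Drop (mV)": (50, 48.42, False),
}

results = {
    name: round(improvement_pct(target, achieved, hib), 2)
    for name, (target, achieved, hib) in specs.items()
}

for name, pct in results.items():
    print(f"{name}: {pct}%")
```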


Organization Chart

Program Manager (Prof. Mark McDermott)

Netlist/Cons Delivery
Owner: Kai, Vipul
Support: Kshitiz

Floor planning, Power planning
Owner: Ronak, Vipul
Support: Aseem, Kai

PNR
Owner: Kshitiz, Aseem
Support: Ronak, Vipul

Timing, Formal and Power Signoff
Owner: Aseem, Ronak
Support: Kshitiz, Kai

Application Engineer Support (TA Wuxi Li)


ASIC Design Flow and Methodology

DC → Floor Planning → Power Planning → Placement → CTS → Route → Signoff (Timing/Power)


Generic Optimization:

1. Creating a group path to optimize near-critical paths:

By default, DC optimizes only the worst critical path on a particular clock.

By creating a group path with a critical range, DC optimizes all paths within that range.

Synthesis


2. Feeding the DEF from placement back to DC in topographical mode

This gives DC more accurate physical design information, so it does not have to rely on approximations based on wire load models.

Synthesis


• The critical path involves the execute stage of the Amber cores.

• This stage reads the register file and performs ALU operations.

• Reading the register file constitutes ~20% of the critical path delay; it involves combinational logic for selection (mux) and condition encoding.

• Ideally, the selection should be performed in the previous stage (decode).

Synthesis


• The two Amber cores don’t interact with each other.

– Should be symmetrically placed ideally

• The Amber cores don’t talk to the I/O pins directly.

• Ethernet and Boot Memory were not a part of the critical paths.

– Timed on slower clocks

– In our previous (midterm review) floorplan, the Ethernet logic was disrupting the placement of standard cells for Core 0.

• Aspect ratio should ideally be 1:1

– horizontal and vertical metal layers

• Standard cell utilization depends on the area of the core/die.

Floorplan Considerations


Floorplan

[Figure: two floorplan diagrams with macro labels: Core 0 D-Cache, Core 0 I-Cache, Core 1 I-Cache, Core 1 D-Cache, Boot Memory, Ethernet RAM]

• Minimize corners inside the floorplan.

• Have placement blockages around the memories.

• To save area without reducing utilization

– Added a notch to our design (no spec required it to stay rectangular).


[Figure: floorplan showing macro placement (Core 0 D-Cache, Core 0 I-Cache, Core 1 I-Cache, Core 1 D-Cache, Boot Memory, Ethernet RAM), input ports, output and in/out ports, and blockages]

• Die area: 1 mm^2

• Memories Placed

• Blockages Added

• Pin placement

• Pre-route SRAMs

Floorplan


Power rings

Vertical power straps - M8 layer

Horizontal power straps - M1 layer

Power Plan Optimizations


– psynopt

• Command that performs timing-driven optimizations on top of place_opt. Expected to either improve timing or leave it unchanged.

• Gave good results when run around 5-6 times (depends on the design).

• Degrades results when run more often than that.

– magnet_placement

• Assigns certain cells to be “magnets” so that the cells connected to them are placed close by.

• Works only for macro pins and/or I/O pins.

– create_bounds

• Assigns a bound to a group of cells. Can either specify physical locations or just mark the cells as a group to be placed close to each other.

• When used on one path: no effect, as another path becomes critical.

• When used on all paths: degrades timing.

Placement Optimizations & Results
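The observation that repeated psynopt passes help up to a point and then hurt suggests checking the result after every pass and stopping at the first one that does not improve. The following is a minimal sketch of that stopping rule in Python. It does not call any tool: `run_pass` is a hypothetical callback standing in for "run one optimization pass and report the WNS violation", and the numbers in `fake_wns` are made up for illustration.

```python
def optimize_until_no_gain(run_pass, initial_wns_ps, max_passes=10):
    """Repeat an optimization pass while the WNS violation keeps shrinking.

    run_pass(i) is a hypothetical callback that returns the WNS violation
    in ps after pass i (smaller is better).
    """
    best = initial_wns_ps
    history = [initial_wns_ps]
    for i in range(1, max_passes + 1):
        wns = run_pass(i)
        history.append(wns)
        if wns >= best:
            # No improvement: stop and keep the previous result.
            break
        best = wns
    return best, history

# Made-up WNS violations (ps): index 0 is the starting point, index i is after pass i.
fake_wns = [62, 55, 49, 44, 41, 39, 43, 47]
best, history = optimize_until_no_gain(lambda i: fake_wns[i], fake_wns[0])
print(f"best WNS violation: {best} ps after {len(history) - 2} improving passes")
```

In a real flow the design would also have to be saved after each pass so that the last improving result can be restored.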

[Figures: Placement Optimizations & Results (four slides)]


– “place_opt –effort high –cts” without congestion constraint

• congestion led to longer paths

– Optimal number of “psynopt” commands

• An arbitrary number of optimization commands will not necessarily yield ideal results

• Need to monitor results after every iteration

– Optimizing skew using the “clock-balancing” switch

• Analyses the design to determine which paths can be used for useful skew.

– Optimizing clock with “inter_clock_balance” switch

– Optimize skews for critical path groups, i.e. amber0 and amber1 groups

– Usage of “clock_opt –only_psyn” command

– Use “fix hold” switch with clock_opt

– Optimize clock for concurrent clock and data

– Incrementally optimize clock for both clock and data

The order in which the above commands are run also matters!

CTS Optimizations

[Figure: Clock Tree: sys_clk, sys_clk_slow]

[Figure: Clock Tree: mrx_clk, mtx_clk, brd_clk]

[Figure: Clock Buffers: sys_clk, sys_clk_slow]

[Figure: Critical Path (CTS)]

[Figure: Critical Timing Path Slack]

[Figure: CTS Results]


– Pre-routing instances and standard cells

– Using non-default routing rules for critical group paths

• define wire width and spacing rules and via types

– Exploring switches inside “set_route_options” for global routing

• Timing Driven

• Congestion Weights

• Track Assignment Timing Driven

– Setting up repair loops within “set_route_opt_strategy”

• Specifies the number of detail routing iterations performed by the route_opt command.

– Restricting critical group paths to specific metal layers using “set_net_routing_layer_constraints”

– Pre-routing critical paths and clock nets

– Stages of “route_opt” and incremental “route_opt”

Route Optimizations & Results

[Figures: Critical Path (two slides)]

ECO Optimizations and Results

• POST ROUTE RESULTS:

• POST ECO ITERATION 1 RESULTS:

– Fixed max_transition, max_capacitance, and hold violations

• POST ECO ITERATION 2 RESULTS:

– Fixed hold violations

• POST ECO ITERATION 3 RESULTS:

– Fixed max_capacitance and hold violations

[Figure: Post ECO IR Drop Maps (VSS IR drop map, VDD IR drop map)]

Timing Convergence

Initial Floorplan (Bad) – Without PNR optimizations

             DC       Placement   CTS       Route     ECO + PT
Setup WNS:   17ps     250ps       170ps     220ps     224ps
Setup TNS:   8.63ns   471.9ns     226ns     297ns     237.9ns
Hold WNS:    -        -           240ps     250ps     193ns
Hold TNS:    -        -           28.58ns   26.81ns   26.7ns

Timing Convergence

Final Floorplan (Good) – Without PNR optimizations

             DC       Placement   CTS        Route     ECO + PT
Setup WNS:   6ps      62ps        51ps       88ps      144ps
Setup TNS:   0.8ns    84.9ns      78ns       127ns     189.9ns
Hold WNS:    -        -           222ps      250ps     0ps
Hold TNS:    -        -           128.58ns   116.8ns   0

Timing Convergence

Final Floorplan (Good) – With PNR optimizations

DC Placement CTS Route ECO + PT

Optimizations:

- Path Groups (Amber1, Amber0)
- Psynopt
- Magnet placement
- Create bounds
- Clock_opt
- Clock balancing
- Inter clock balancing
- psynopt
- hold fix
- Clock tree optimizations
- optimal fanout
- selective layers
- Skew optimization
- Timing driven
- Concur. clock/data opt.
- Hold fix
- Trans fix
- Max cap fix
- Setup fix
- PBA mode
- Uncertainty
- Routing options
- track driven
- timing driven
- congestion weights
- Route opt
- NDR critical path pre route
- Search loop strategy

Timing Convergence

Final Floorplan (Good) – With PNR optimizations

             DC       Placement   CTS       Route     ECO + PT
Setup WNS:   2.2ps    50.34ps     2.20ps    28.2ps    0ps
Setup TNS:   96.3ps   20.62ns     24.62ps   5.79ns    0ps
Hold WNS:    -        -           217ps     240ps     0ps
Hold TNS:    -        -           13.68ns   49.92ns   0ps

[Figure: Critical Path (Max mode)]

[Figure: Critical Path (Min mode)]

[Figure: Timing QoR]

We verified the RTL against the DC netlist and the RTL against the PNR netlist.

All verification runs passed.

Formality


• Before verification, the nets of the reference and implementation designs must be matched. In our case, matching is based on names, which can change after PNR.

• To ensure correct name matching (short of matching manually with regular expressions), synopsys_auto_setup must be set so that Formality can read the SVF file, which contains the name-change information.

• synopsys_auto_setup is not set by default.

Formality


• In this project, we ran Formality verification several times during design iterations, and at relatively late stages.

• In more complex projects, regression tests should be run more often, after a certain amount of change, to reduce the cost of debugging.

Regression Test


• To deal with high-fanout components, such as the muxes for the register file and the barrel shifter, setting constraints is also a potential optimization technique, but it was not adopted in the final design.

• In our experiments, we found that this approach introduces trade-offs in other aspects.

For example, setting max_transition should theoretically force the tool to increase drive strength on the critical path; however, it also increased the number of logic levels by 7% and increased capacitance. The same is true for fanout.

Synthesis Exploration


• Reducing the placement area for core logic (by using blockages) helps improve the timing of the critical path, since it reduces interconnect length.

• However, we found that excessively reducing the placement area has a negative impact on timing. We believe this is due to increased adjacent capacitance and/or increased congestion.

[Figure: normalized TNS and number of violations for various blockage sizes]

Placement Blockage Experiments


Results

Specification              Achieved Value/Spec.
Cycle Time (ns)            1.25
Die Area (mm^2)            1.00
Utilization (%)            60.40%
Power Consumption (mW)     218.40
Max. IR Drop VDD (mV)      47.672
Max. IR Drop VSS (mV)      48.421
LVS Check                  PASS
Formality – DC vs RTL      PASS
Formality – ICC vs RTL     PASS
Max. Trans Violations      0
Max. Fanout Violations     1
Max. Cap. Violations       23

Placement Utilization Fix

[Figure: soft blockage region with hardly any cells sitting in it]

[Figure: soft blockage replaced with a hard blockage]

Results

Specification              Achieved Value/Spec.
Cycle Time (ns)            1.25
Die Area (mm^2)            1.00
Utilization (%)            70.37%
Power Consumption (mW)     218.40
Max. IR Drop VDD (mV)      47.672
Max. IR Drop VSS (mV)      48.421
LVS Check                  PASS
Formality – DC vs RTL      PASS
Formality – ICC vs RTL     PASS
Max. Trans Violations      0
Max. Fanout Violations     1
Max. Cap. Violations       23

– Floorplan

• We had to spend a lot of time getting the floorplan right.

• We ran multiple iterations through route with different floorplans to see the effect of notches, memory placement, etc. on timing.

Major Issues Faced (and their resolution)


• With WNS at CTS at ~60ps, WNS at route was ~90ps.

• Brought WNS/TNS at CTS down to 2ps/24ps

– Route was still around 85ps/64ns.

• Tried route optimizations, but to no avail.

• Bringing WNS at CTS below 1ps also had no effect.

• It turned out that our overall utilization at the time was around 75%.

– We looked into the detailed reports on a hunch.

– 60% of the design had a utilization of >87.5%: congestion was too high.

• Went back and increased the area at the floorplan stage.

– Fixed the issue.

– Took about 1-2 weeks to figure out.

Major Issues Faced (and their resolution)
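The root cause above was that an overall utilization of around 75% hid regions that were far denser. The following is a minimal sketch of that kind of check, assuming the design is divided into bins with a known cell area and bin area. The bin values are hypothetical, chosen only so that the totals reproduce the two figures quoted on the slide (75% overall, 60% of the area above 87.5%); they are not taken from the actual reports.

```python
def utilization_summary(bins, threshold=0.875):
    """Return (overall utilization, fraction of area in bins above threshold).

    bins is a list of (cell_area, bin_area) pairs.
    """
    total_cell = sum(cell for cell, _ in bins)
    total_area = sum(area for _, area in bins)
    hot_area = sum(area for cell, area in bins if cell / area > threshold)
    return total_cell / total_area, hot_area / total_area

# Hypothetical bins: six dense bins at 90% and four sparse bins at 52.5%.
bins = [(90.0, 100.0)] * 6 + [(52.5, 100.0)] * 4
overall, hot_fraction = utilization_summary(bins)
print(f"overall utilization: {overall:.1%}")
print(f"area above 87.5% utilization: {hot_fraction:.1%}")
```

The average looks acceptable on its own, which is why the per-region report had to be inspected.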


• Another such issue came up during LVS.

• After meeting timing, we ran “verify_lvs”

– Got 2 opens (VDD and VSS).

– Standard cells were not connecting to the power rails.

• Our power grid at the time had only vertical straps.

• Redesigned the complete power grid using horizontal straps in M1.

– Started getting shorts.

– Had to tune the strap widths carefully to remove both opens and shorts.

• The main issue was that we had to do this after meeting timing. The TAT was very high because tool runs took longer, and our finalized design could have become useless.

– We also had to fix timing for multiple designs.

Major Issues Faced (and their resolution)


– Removing the blockages in the design, either by using notches or by shifting two of the boot memories up into those slots.

• Saves area!

– One routing strategy was to pre-route the critical paths using switches inside set_route_opt_strategy. Our runs using this did not work: they hung on the search loop variable.

• Could possibly improve performance

– Since our power spec was relaxed and our focus was on timing, we did not attempt to optimize power. We could force non-critical paths (which may be using LVT cells right now) to use only HVT/RVT cells.

• Saves power!

– We are meeting the given specs, so any improvement on these fronts is a tradeoff with $$$$!

How to improve the design?


• LVS and M1 power straps from the start – 1 week

• Over-designing at placement doesn’t help much; CTS has great potential to meet timing

• Take the optimizations from the manuals with a pinch of salt. Not every switch helps, for example:

– ‘psynopt’ run multiple times degraded the design after a certain point,

– ‘focal_opt’ didn’t help in the route stage,

– clock_opt –concurrent_clock_and_data did not give good results, but –incremental_concurrent_clock_and_data works!

– Random experimentation is not a good option; small experiments followed by analysis work.

Key Learnings – “What would we like to do differently?”


Economic Aspects

Assumptions:

• This goes in Apple A5 (800MHz)

• Total SoC cost = $50 (Source: QC)

• CPU cost = 10% (Source: QC)

• No. of shipments = 500 million (Source: Appleinsider, Statista.com )

• Applications: iPhone 4S, iPad 2, iPad mini, iPod touch


Area Savings in $$$

• Saving 1mm^2 in a 100mm^2 die (1%) results in a $1 saving per SoC (Source: QC)

• For 5% area savings in 1.05mm^2, overall savings per SoC is 10 cents per die

Dollar savings = 0.05*0.1*500million = $2.5million

Power Savings in $$$

• Power is a huge concern these days. Snapdragon markets butter test!!

• Saving 22% power in the CPU will result in more than 3-4% overall power savings

• Assuming the SoC is sold at 0.1% higher cost ($5/CPU, $50/SoC)

Additional dollar earnings = 0.001*50*500million = $25million

TAT Reduction in $$$

• Assume a physical design phase of 20 weeks (Source: QC)

• TAT reduction = 1 week (5%)

• Assuming 80% is NRE cost and 20% is RE

• RE costs include engineering, infrastructure and licensing cost

Dollar savings = 0.05*0.2*5*500million = $25million
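The three dollar figures follow directly from the stated factors. The following is a minimal sketch that simply re-evaluates the slide's own products; the variable names and the reading of each factor (for example, the 5 in the TAT formula as the $5 CPU cost) are our assumptions, not something the slides spell out.

```python
SHIPMENTS = 500_000_000   # units shipped, from the assumptions slide
SOC_COST_USD = 50.0       # total SoC cost
CPU_COST_USD = 5.0        # CPU share (10% of the SoC cost)

# Each line mirrors one formula from the slide, factor by factor.
area_savings = 0.05 * 0.1 * SHIPMENTS
power_earnings = 0.001 * SOC_COST_USD * SHIPMENTS
tat_savings = 0.05 * 0.2 * CPU_COST_USD * SHIPMENTS

for label, value in [
    ("Area savings", area_savings),
    ("Power earnings", power_earnings),
    ("TAT savings", tat_savings),
]:
    print(f"{label}: ${value / 1e6:.1f} million")
```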


Questions?

Thank You!

Q&A