Physical Design Aspects


Transcript of Physical Design Aspects

Page 1: Physical Design Aspects

Physical design

1) How do you define the clock for a divide-by-4 clock? What does the RTL code look like for synchronous and asynchronous divide-by-4 clock generation? What timing checks are required for the divide-by-4 clock generation flops?

2) In what scenarios is the create_generated_clock -combinational option used? This is the clock steering effect ("promiscuous mode"), as famously called by Paul Zimmer.

3) In what cases is register replication or duplication helpful? How do you handle these extra flops in equivalence checking? What happens in gate-level simulation for these extra flops? Does anything have to be taken care of from a DFT perspective?

4) What are virtual clocks? Why are they required?

5) Describe all the timing checks for an integrated clock gating cell.

6) What is a lockup latch? Why is it required? Do you use an active-high or active-low lockup latch when a negative-edge flop of one clock domain interacts with a negative-edge flop of another clock domain?

7) Given the following RTL code, write out its equivalent schematic: a) clock gating

8) What is a power switch? How do you distribute power switches across the block?

9) Logic restructuring of a circuit to improve a critical timing path.

10) Assume that there are critical paths from a bunch of flops to a macro. These paths do not meet timing. However, the paths originating from the macro to the flops are relaxed. What action do you have to take in logical synthesis/physical synthesis and in CTS?

Page 2: Physical Design Aspects

1) How do you define the clock for a divide-by-4 clock? What does the RTL code look like for synchronous and asynchronous divide-by-4 clock generation? What timing checks are required for the divide-by-4 clock generation flops?

Verilog RTL code:

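Synchronous counter:

    // Both flops are clocked by clk
    always @(posedge clk)
      if (rst) div[1:0] <= 2'b00;
      else     div[1:0] <= div[1:0] + 1'b1;

    // div[1] is high for 2 out of every 4 clk cycles: clk divided by 4
    always @(*)
      divclk4 = div[1];

Asynchronous (ripple) counter:

    // First stage: divide clk by 2
    always @(posedge clk)
      if (rst) divclk2 <= 1'b0;
      else     divclk2 <= ~divclk2;

    // Second stage: clocked by the output of the first stage
    always @(posedge divclk2)
      if (rst) divclk4 <= 1'b0;
      else     divclk4 <= ~divclk4;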

Schematic:

Explanation: advantages of synchronous design over asynchronous design

In the synchronous counter both flops get the same clock, and the output of the second flop is the divide-by-4 clock, since it is high for 2 out of every 4 clock cycles. In the asynchronous counter the output of the first divide-by-2 flop becomes the clock of the next flop. This is not a preferred design. Imagine a div-64 design: you get a ripple chain of 6 divide-by-2 counters (div2/div4/div8/div16/div32/div64), which means the clock generated at div64 carries 6 clock-to-Q delays. In the synchronous counter, even for div64, the delay is only 1 clock-to-Q delay.

In the asynchronous counter design, inserting the clock test mux in the RTL is also cumbersome.

For CTS, the synchronous design is preferred. With the synchronous counter you define 1 master clock and 1 div4 generated clock. With the asynchronous counter you have to define 1 master clock, 1 divide-by-2 clock and 1 divide-by-4 clock. More clocks in the design means more CTS constraints have to be applied, and setup/hold timing checks have to be done for more clocks.
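As an illustration of the extra clock the asynchronous counter needs, the sketch below shows one possible set of SDC constraints for the ripple divider. The instance names divclk2_reg and divclk4_reg, the port name clk and the 10 ns period are assumptions made for the example; they depend on how the RTL is synthesized and are not taken from a real netlist.

```tcl
# Assumed names: port clk, flops divclk2_reg and divclk4_reg (hypothetical).
# Master clock on the input port.
create_clock -name master_clock -period 10 [get_ports clk]

# First ripple stage: divide-by-2 of the master clock, defined on the flop output.
create_generated_clock -name div2 -divide_by 2 \
    -source [get_ports clk] [get_pins divclk2_reg/Q]

# Second ripple stage: divide-by-2 of div2, i.e. divide-by-4 of the master clock.
create_generated_clock -name div4 -divide_by 2 \
    -source [get_pins divclk2_reg/Q] [get_pins divclk4_reg/Q]
```

Every additional ripple stage adds one more generated clock in this style, which is where the extra CTS and timing-check effort comes from.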

Page 3: Physical Design Aspects

How to define the clock for a synchronous div-4 generator:

    create_clock -name master_clock -period 10 [get_ports clock]
    create_generated_clock -name div4 -divide_by 4 -source [get_ports clock] [get_pins div_reg_1/Q]

Timing checks for the synchronous counter:

Notice that the div_reg_0 and div_reg_1 flops both get master_clock, and the div4 clock is defined on the div_reg_1 output pin.

Two paths exist for div_reg_0:

Path 1: both setup and hold checks have to be done.
    Launch clock for div_reg_0 is master_clock.
    Capture clock for div_reg_0 is master_clock.

Path 2: both setup and hold checks have to be done.
    Launch clock for div_reg_0 is div4.
    Capture clock for div_reg_0 is master_clock.

One path exists for div_reg_1:

Path 1: both setup and hold checks have to be done.
    Launch clock for div_reg_1 is div4.
    Capture clock for div_reg_1 is master_clock.
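The checks listed above could be inspected with timing reports along the following lines. This is a rough sketch assuming a PrimeTime-style shell and the clock names master_clock and div4 used in this section; the exact report options you need may differ in your flow.

```tcl
# Setup (max-delay) check on paths launched by div4 and captured by master_clock.
report_timing -from [get_clocks div4] -to [get_clocks master_clock] -delay_type max

# Hold (min-delay) check on the same launch/capture clock pair.
report_timing -from [get_clocks div4] -to [get_clocks master_clock] -delay_type min

# Setup and hold checks on paths launched and captured by master_clock.
report_timing -from [get_clocks master_clock] -to [get_clocks master_clock] -delay_type max
report_timing -from [get_clocks master_clock] -to [get_clocks master_clock] -delay_type min
```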

Page 4: Physical Design Aspects

3) In what cases is register replication or duplication helpful? How do you handle these extra flops in equivalence checking? What happens in gate-level simulation for these extra flops? Does anything have to be taken care of from a DFT perspective?

What is register replication or duplication?

When a register has a high fanout (say 30) and the timing path through its Q pin is critical, you can replicate (duplicate) the register any number of times so that each copy drives only part of the load. The other way of tackling the problem is to balance the clock tree or to make the clock arrive early at this flop.

The following command duplicates registers in Design Compiler:

    dc_shell> set_register_replication -help
    Usage: set_register_replication    # set_register_replication
        [-max_fanout <integer>]   (Specifies the value to which the 'register_replication' attribute to be set, that is, the maximum fanout: Value >= 1)
        [-num_copies <integer>]   (Specifies the value to replicate the register n times: Value >= 2)
        object_list <list>        (list of registers on which the attribute 'register_replication' is to be set)

Either of the following commands can be used to replicate A_reg:

    set_register_replication -max_fanout 10 [get_cells A_reg]

or

    set_register_replication -num_copies 3 [get_cells A_reg]

After the register has been replicated into 3 copies, the scenario is as follows.

[Figure: before replication, a single flop A_reg (D, Q) drives a fanout of 30. After replication, the three flops A_reg, A_reg_dup and A_reg_dup_1 each drive a fanout of 10.]

Page 5: Physical Design Aspects

Also make sure that there is a good amount of slack on the D pin of A_reg; otherwise the duplication effort is of no use, because the logic driving D now has to drive every copy of the register.
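One way to check this before replicating is to look at the slack on both sides of the register. The sketch below assumes a Synopsys-style shell and that the flop pins are named D and Q; the pin names depend on the library cell and are assumptions here.

```tcl
# Slack of the worst setup path ending at the D pin of A_reg (pin name assumed).
# Replication adds load here, so this path needs positive margin.
report_timing -to [get_pins A_reg/D] -delay_type max

# Worst setup path through the Q pin of A_reg (pin name assumed).
# This is the critical path that replication is meant to improve.
report_timing -through [get_pins A_reg/Q] -delay_type max
```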

How to handle these extra flops in equivalence checking?

You need an explicit command in the LEC script stating that these flops are equivalent.

Conformal LEC command:

add instance equivalences A_reg A_reg_dup A_reg_dup_1 -revised

What happens in gate level simulation for these extra flops?

Since the extra flops are replicas of the original base register, on every clock cycle the value on their Q pins is the same as that of the original register. No special care has to be taken as long as timing in STA is met for all the flops.

Does something have to be taken care from DFT perspective?

You have to make sure that all the duplicated registers are in the scan chain.

More details about cloning can be found in the following SNUG India 2010 paper: "Automatic cloning of register and combinational logic"