Source: Advanced ASIC Chip Synthesis. 2nd Ed. Himanshu Bhatnagar. Kluwer Academic Publishers
• Key Problem: Timing assumptions made during pre-layout synthesis differ widely from the post-layout reality.
• This happens because interconnect delay dominates the overall propagation delay in DSM (Deep Sub-Micron) technologies.
• As a result, achieving timing closure becomes a challenge.
Traditional SoC Design Flow
[Flowchart] Steps in order:
• Concept + Market Research
• Architectural Specs & RTL coding
• RTL Simulation
• Logic Synthesis, Optimization & Scan Insertion
• Formal Verification (RTL vs Gates)
• Pre-layout STA (Timing OK?)
• Floorplanning, Placement, CT Insertion & Global Routing
• Transfer Clock Tree to DC
• Formal Verification (Scan Inserted Netlist vs CT Inserted Netlist)
• Post Global Route STA (Timing OK?)
• Detailed Routing
• Post-layout STA (Timing OK?)
• Tape out
Each "Timing OK?" decision has a Yes branch that continues down the flow and a No branch that loops back to an earlier step.
[Flowchart] Design Compiler synthesis flow:
• Develop HDL files
• Specify Libraries (library objects)
– link_library, target_library, symbol_library, synthetic_library
• Read Design
– analyze, elaborate, read_file
• Define Design Environment
– set_operating_conditions, set_wire_load_model, set_drive, set_driving_cell, set_load, set_fanout_load, set_min_library
• Set Design Constraints
– Design Rule Constraints: set_max_transition, set_max_fanout, set_max_capacitance
– Design Optimisation Constraints: create_clock, set_clock_latency, set_propagated_clock, set_clock_uncertainty, set_clock_transition, set_input_delay, set_output_delay, set_max_area
• Select Compile Strategy
– Top Down, Bottom Up
• Optimize the Design
– compile
• Analyze and Resolve Design Problems
– check_design, report_area, report_constraint, report_timing
• Save the Design database
– write
Design Compiler Setup Files
• .synopsys_dc.setup
– Library paths
– Company-wide and project-wide design environment variables and commands
– UNIX variables
• Three files at three locations. All three are read in the following order:
– Synopsys root ($SYNOPSYS/admin/setup): Affects all users. Only the system administrator can modify it. In a small startup with a single ASIC project, this is the place to enforce project-wide discipline.
– Home directory: Affects all DC activities of the user. Project-wide enforcement could happen at this level if the designer is involved in a single project (less likely).
– Working directory: Affects the current invocation of DC. If a person is working on more than one Synopsys project (more likely), project-wide enforcement should happen at this level, with one working directory for each project.
• Repeated commands are overridden: a setting in a file read later replaces the same setting from a file read earlier.
Libraries & Search Path
• Technology Library
Created by the ASIC vendor in Synopsys format, which is now an open standard. Cells are defined by their names, function, timing, net delay, parasitic information, and units for time, resistance, capacitance etc.
• Target Library
A technology library that Design Compiler maps to during optimization.
• Link Library
The technology library that contains the definitions of the cells used in the mapped design. In principle it should be the same as target_library unless a technology translation is being performed.
[Figure: library cell example with inputs a, b, c, d and output z = (a + b)(cd); cell label EON1]
• Symbol Library
Definition of graphics symbols. Cells in Symbol Library must match
• DesignWare Library
A DesignWare component library is a collection of reusable circuit-design building blocks that are tightly integrated into the Synopsys synthesis environment.
• GTECH Library
The GTECH library is the Synopsys generic technology library. It is technology-independent and included with the Design Compiler software. GTECH parts are Synopsys unmapped representations of Boolean functions (library cell placeholders). GTECH instantiation allows for a technology-independent HDL description and the accuracy of instantiation.
• search_path
If the library variables only specify file names, search_path is used to locate the libraries. By default it points to the current working directory and $SYNOPSYS/libraries/syn.
Synopsys Design Objects
• Design: A circuit that performs one or more logical functions
• Cell: An instance of a design or library primitive within a design
• Reference: The name of the original design that a cell instance points to
• Port: The input or output of a design
• Pin: The input or output of a cell
• Net: A wire that connects ports to ports or ports to pins
• Clock: A timing reference object to describe a waveform for timing analysis
Synopsys Design Objects - Schematic
[Figure: schematic of design Top (ports A, B, C, Clk, Z[0:1]; cells U1 to U4; nets bus0, bus1, inv0, inv1) and of design Parity (ports Ain, Bin, Cin, Q0, Q1; cells U5, U6), annotated with the labels Design, Reference, Cell, Net, Pin, Port and Clock]

             Parity                          Top
Designs      {"Top", "Parity", "Regfile"}    {"Top", "Parity", "Regfile"}
Cells        {"U5", "U6"}                    {"U1", "U2", "U3", "U4"}
References   {"EXNOR3", "INVX1"}             {"Parity", "Regfile", "INVX1"}
Synopsys Design Objects - VHDL
ENTITY Top IS
  PORT(
    A, B, C, Clk : IN STD_LOGIC;
    Z : OUT STD_LOGIC_VECTOR(1 DOWNTO 0));
END Top;

ARCHITECTURE structural OF Top IS
  ...
  SIGNAL bus0, bus1, inv0, inv1: STD_LOGIC;
BEGIN
  U1 : Parity
    PORT MAP( Ain => A, Bin => B, Cin => C, Q0 => bus0, Q1 => bus1);
  U2 : Regfile
    PORT MAP(...
END structural;

[Callouts in the figure mark the Design, Reference, Cell, Port, Pin and Net in the code above]

Design object   VHDL interpretation
Design          Name of entity, function or procedure
Cell            Instantiated component or subroutine
Reference       Name of used component or subroutine
Port            Input/Output port
Pin             Port inside the reference
Net             Local signals or variables
Clock           No interpretation
Reading Assignment
Read about these commands in the Synopsys documentation:
• find and filter
• read / analyze / elaborate
• compile
• report_timing
Also read about what attributes and variables are.
Outline of this course module
• Synopsys Design Environment Essentials
• CMOS essentials for logic synthesis
• Constraint Classification
• Load and Drive Constraints
• Clocking constraints
• Operating Conditions Constraints
• Static Timing Analysis
• Chip Level Timing and Multiple Clock Domains
MOSFET Transistor
Source: MIT. Course 6.375. Lecture L06. 2006
Key qualitative Characteristics of MOSFET transistors
RC Model of an inverter
Wires
Distributed RC wire model
This is also known as Elmore Delay model
Manual insertion of Repeaters
Lumped RC wire model
Estimate the rise time
Transistor capacitance:
1. The width of each transistor is the scaling factor (16/8/2/1) multiplied by the minimum transistor width, which is 0.5 µm.
2. Multiply Cg,N / Cg,P / Cd,N / Cd,P by the width of the transistor to get the gate and drain capacitances of the N and P transistors.
3. A wider transistor has more capacitance.
Transistor resistance:
1. Divide Reff,N / Reff,P by the width of the transistor to get the resistance of the N and P transistors.
2. A wider transistor has less resistance.
Wire resistance: The sheet resistance (0.07) is per unit square. Since the wire width is 0.25 µm, the resistance of a 1 µm × 0.25 µm segment is 0.07/0.25. This factor is multiplied by the length, 250 µm.
Wire capacitance: It is made up of two parts. The bottom (area) capacitance is 250 × 0.25 (area) × CA,M2. The side capacitance is found by multiplying the length, 250, by CL,M32.
The factor 2.2 comes from the 10% to 90% Vdd swing: loge(0.9Vdd / 0.1Vdd) ≈ 2.2.
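The steps in these notes can be sketched as a small calculation. This is only an illustration under a lumped, single-pole RC assumption: the wire dimensions and sheet resistance come from the notes, while the driver resistance, the capacitance coefficients and the load are hypothetical placeholders, because the slide values for Reff, CA,M2 and the gate capacitances are not reproduced in this transcript.

```python
import math

def wire_resistance(sheet_res_ohm_per_sq, width_um, length_um):
    """Wire resistance = sheet resistance x number of squares (length / width)."""
    return sheet_res_ohm_per_sq * length_um / width_um

def wire_capacitance(width_um, length_um, c_area_f_per_um2, c_side_f_per_um):
    """Bottom (area) capacitance plus side capacitance along the length."""
    return width_um * length_um * c_area_f_per_um2 + length_um * c_side_f_per_um

def rise_time(r_ohm, c_farad):
    """10% to 90% rise time of a single-pole RC stage: ln(0.9 / 0.1) * R * C."""
    return math.log(0.9 / 0.1) * r_ohm * c_farad

# Values taken from the notes: 0.07 per square, 0.25 um wide, 250 um long.
R_WIRE = wire_resistance(0.07, 0.25, 250.0)  # about 70 ohm

# Hypothetical values, chosen only to make the sketch runnable.
R_DRIVER = 2.0e3     # ohm, stands in for Reff divided by the driver width
C_AREA = 0.03e-15    # F per um^2, stands in for the area coefficient
C_SIDE = 0.04e-15    # F per um, stands in for the side coefficient
C_LOAD = 10e-15      # F, stands in for gate and drain capacitance

C_WIRE = wire_capacitance(0.25, 250.0, C_AREA, C_SIDE)
T_RISE = rise_time(R_DRIVER + R_WIRE, C_WIRE + C_LOAD)

print(f"R_wire = {R_WIRE:.1f} ohm, C_wire = {C_WIRE * 1e15:.2f} fF")
print(f"lumped rise time = {T_RISE * 1e12:.1f} ps")
```

Lumping all resistance in front of all capacitance is pessimistic compared with the distributed (Elmore) model, which is one reason the two models appear as separate slides.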
Constraints
• Technology, Operating and Manufacturing Constraints
– Max rise time, max capacitance
– Operating Conditions
  • Vdd, Temperature
  • Drive current, Load
– Process Variations
  • Fast corner, Slow corner
– Physical Design
  • Antenna rules
• Optimisation Constraints
– Performance (clock)
– Area
– Power
Generic Synthesis Flow
[Flowchart: Design → Create a solution → Evaluate the solution (Analysis) → Constraints Met? The loop is driven by the Design Optimisation Constraints and by the Technology, Operating & Manufacturing Constraints]
Static Timing Analysis (STA)
• Exhaustively verifies that
– the timing constraints (clock) are met for a design
– for a given technology (Standard Cell Library) and
– a set of specified operating conditions
• Limitations of the alternative, simulation:
– Not exhaustive
– Accuracy
  • RTL
  • Gate Level
    – SDF back annotation
    – Dependent on STA
  • Circuit level SPICE simulations are impractical
– Time (STA also takes time, but is bounded)
PROCESS (clk)
BEGIN
  IF rising_edge(clk) THEN
    s <= a * b;
  END IF;
END PROCESS;
Timing Models - Accuracy
• Untimed
• Transaction Level - SystemC
– Multiple cycles
– Bus transactions, Transmit/Receive, Encode/Decode
• Cycle Accurate - RTL
– What happens in each clock cycle is accurately known
• Gate Level - Event Driven
– Physical details of computation, storage and interconnect operations known
– Delay in wire is not known
– Clock is ideal
• Layout Level
– Delay in wire known
– Clock is real
– Relative position of standard cells is known
Delay Parameters – Intrinsic Delay & Slew
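[Figure: gate with inputs A = 1 and B and output Z, and a transistor-level view with nodes P, Q, R, x, y, z. Two waveforms follow: one marked at 0.5 Vdd between t1 and t2, the other marked at 0.3 Vdd and 0.7 Vdd between t1 and t2]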
Path Delay Calculation
[Flow diagram: Library and Design → Delay Computation Through Gate → Delay and Slew At Gate Output → Delay Computation Through Wire → Delay and Slew At Next Gate Input; example path through nodes A, B, C, D]
Environment Conditions for Analysis
• The intrinsic delays and the slews are characterised using SPICE simulation by sweeping the many parameters that affect the intrinsic delay and slew
• All the paths are exhaustively covered
Paths & Path Groups
[Figure: two example circuits with nets labelled a to t, each containing clocked D flip-flops (D, Q, clk)]
• Paths
Start point: input ports or clock pins of sequential devices
End point: output ports or data input pins of sequential devices
• Path groups
Paths are organised in groups identified by the clocks controlling their endpoints.
Timing Arcs
• Positive unate timing arc: Combines rise delays with rise delays, and fall delays with fall delays. An example is an AND gate cell delay or an interconnect (net) delay.
• Negative unate timing arc: Combines incoming rise delays with local fall delays, and incoming fall delays with local rise delays. An example is a NAND gate.
• Non-unate timing arc: Combines local delay with the worst-case incoming delay value. Non-unate timing arcs are present in logic functions whose output value change cannot be predicted from the direction of the change on the input value. An example is an XOR gate.
• Accuracy of estimates is critical
• Intrinsic delays are accurate after logic synthesis
• Slew and net delays are estimated, and known accurately only after physical synthesis
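The unate classification of a timing arc can be checked by brute force on a gate's truth table. The sketch below is an illustration only, not how a library characterisation tool works: `classify_arc` is a hypothetical helper that toggles one input pin for every combination of the other inputs and records whether the output follows, opposes, or does both.

```python
from itertools import product

def classify_arc(func, n_inputs, pin):
    """Classify the arc from input `pin` to the output of boolean function `func`."""
    follows = opposes = False
    for others in product((0, 1), repeat=n_inputs - 1):
        low = list(others)
        low.insert(pin, 0)       # input vector with the pin at 0
        high = list(others)
        high.insert(pin, 1)      # same vector with the pin at 1
        out_low, out_high = func(*low), func(*high)
        if out_high > out_low:
            follows = True       # rising input gives rising output
        elif out_high < out_low:
            opposes = True       # rising input gives falling output
    if follows and opposes:
        return "non-unate"
    if follows:
        return "positive unate"
    if opposes:
        return "negative unate"
    return "no arc"

GATES = {
    "AND": lambda a, b: a & b,
    "NAND": lambda a, b: 1 - (a & b),
    "XOR": lambda a, b: a ^ b,
}

for name, gate in GATES.items():
    print(name, "->", classify_arc(gate, 2, 0))
```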
Factors Affecting Delay and Slew
Discrete factors:
1. Geometry & Dimension
2. Specific Path
3. Transition Direction
4. Related Pin
[Figure: 4 Input NAND gate; transistor-level schematic with inputs A, B, transistors P1, P2, N1, N2 and output Z]
Factors Affecting Delay and Slew
Load on the gate:
• Load of all the inputs that this output has to drive
• Load of the interconnect wires
• Tri-stated wires
Input slew:
• Transition time at the previous gate
• The interconnect
• Primary input: drive strength, driver cell
Constraints
Technology Constraints (cannot be relaxed)
• Max Transition
• Max Fanout
• Max Capacitance
• Min Capacitance
Design Constraints
• Set Load
• Set Drive (inverse of resistance)
[Figure: block with input A and outputs Z1, Z2, Z3; annotations: set_load, set_drive or set_driving_cell]
• If drive or driving cell is not specified, the synthesis tool assumes infinite drive strength
• If load is not specified, the synthesis tool assumes zero load
Interpolation and Extrapolation
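[Figure: delay table indexed by slew (S1, S2) and load (L1, L2) with entries D11, D12, D21, D22; the delay D at an intermediate point (S, L) is obtained from the intermediate values D1 and D2 using a piece-wise linear model]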
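The piece-wise linear lookup suggested by the figure labels can be sketched as a bilinear interpolation. The index convention is an assumption, since the figure itself is not reproduced here: Dij is taken to be the table delay at slew Si and load Lj, D1 and D2 are the values interpolated along the load axis, and all numbers are hypothetical. Points outside the table range are extrapolated with the same formula.

```python
def interpolate_delay(s, l, s1, s2, l1, l2, d11, d12, d21, d22):
    """Bilinear interpolation (or extrapolation) of a 2x2 delay table.

    Assumed convention: d_ij is the delay at slew s_i and load l_j.
    """
    wl = (l - l1) / (l2 - l1)       # fractional position along the load axis
    ws = (s - s1) / (s2 - s1)       # fractional position along the slew axis
    d1 = d11 + wl * (d12 - d11)     # delay at slew s1, interpolated in load
    d2 = d21 + wl * (d22 - d21)     # delay at slew s2, interpolated in load
    return d1 + ws * (d2 - d1)      # final interpolation in slew

# Hypothetical table: slew in ns, load in pF, delay in ns.
TABLE = dict(s1=0.1, s2=0.5, l1=0.01, l2=0.05,
             d11=0.10, d12=0.20, d21=0.15, d22=0.30)

print(interpolate_delay(0.3, 0.03, **TABLE))   # point inside the table
print(interpolate_delay(0.6, 0.06, **TABLE))   # extrapolated point
```

When weights fall outside the range 0 to 1 the result is an extrapolation, which is generally less trustworthy than an interpolated value.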
Process, Voltage, Temperature (PVT) Variation & Operating Conditions
[Figure: three plots of delay versus process, voltage and temperature, each marked with best, nominal and worst points]

Operating Conditions
Name    Library   Process   Temp   Volt   Interconnect Model
WCCOM   my_lib    1.50      70     1.1    worst_case_tree
WCIND   my_lib    1.50      80     1.1    worst_case_tree
WCMIL   my_lib    1.50      125    1.0    worst_case_tree
BCCOM   my_lib    1.50      0      1.2    best_case_tree
BCIND   my_lib    1.50      -40    1.2    best_case_tree
BCMIL   my_lib    1.50      -55    1.3    best_case_tree
PVT Variation: An Example
Consider a minimum size NMOS device in a 1.2 µm CMOS process with VGS = VDS = 5 V. The device size is W = 1.8 µm, Leff = 0.9 µm, and the nominal saturation current is

$I_D = \frac{1}{2} k'_n \frac{W}{L} (V_{GS} - V_t)^2 = \frac{1}{2} (19.6 \times 10^{-6}) (2) (5 - 0.75)^2 = 354\ \mu\text{A}$

Now consider the variation in the following parameters:
• 25 % variation in threshold voltage Vt
• 10 % variation in transconductance k'n, mainly due to variation in oxide thickness
• ±0.15 µm (about 10 %) variation in W and L. Variations in W and L are uncorrelated.
• ±0.5 V (10 %) variation in power supply voltage

$I_{D,\max} = \frac{(19.6 + 1.96) \times 10^{-6}}{2} \cdot \frac{1.8 + 0.15}{0.9 - 0.15} \cdot \left( (5 + 0.5) - (0.75 - 0.1875) \right)^2 = 683\ \mu\text{A}$

$I_{D,\min} = \frac{(19.6 - 1.96) \times 10^{-6}}{2} \cdot \frac{1.8 - 0.15}{0.9 + 0.15} \cdot \left( (5 - 0.5) - (0.75 + 0.1875) \right)^2 = 176\ \mu\text{A}$

The speed of a device is proportional to its drain current, so this spread in current translates into a variation in the speed of the circuit.
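The three drain currents in this example can be reproduced with a few lines of arithmetic. The sketch below uses only the values quoted in the example; the function and variable names are illustrative.

```python
def saturation_current(k_prime, w, l, vgs, vt):
    """Square-law saturation current: 0.5 * k' * (W / L) * (Vgs - Vt)^2."""
    return 0.5 * k_prime * (w / l) * (vgs - vt) ** 2

K = 19.6e-6      # A/V^2, nominal transconductance
W, L = 1.8, 0.9  # um, nominal width and effective length
VGS = 5.0        # V, nominal supply
VT = 0.75        # V, nominal threshold voltage

# Every parameter is pushed in the direction that raises (or lowers) the current.
nominal = saturation_current(K, W, L, VGS, VT)
fastest = saturation_current(K * 1.10, W + 0.15, L - 0.15, VGS + 0.5, VT * 0.75)
slowest = saturation_current(K * 0.90, W - 0.15, L + 0.15, VGS - 0.5, VT * 1.25)

for label, current in (("nominal", nominal), ("max", fastest), ("min", slowest)):
    print(f"{label:8s} {current * 1e6:6.0f} uA")
print(f"max / min ratio = {fastest / slowest:.2f}")
```

The corner currents differ by almost a factor of four, which is why libraries are characterised at separate best and worst operating conditions.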
Derating
Libraries are characterized for various operating conditions.
Further characterisation is done to see how the delay model responds to changes in process, voltage and temperature. This is done by holding two parameters constant and sweeping the third.
This yields derating factors for process, voltage and temperature.
Sequential Arcs
Timing relationship between:
1. two input pins
2. two consecutive events on the same input pin
The sequential arcs are:
1. Pulse Width
2. Setup
3. Hold
4. Recovery
5. Removal
Pulse Width
[Timing diagram: rst_n pulse compared with the pulse width requirement. Not met: reset may have no effect]
1. Width of high and low phases of clocks
2. Width of active level of asynchronous inputs like reset
Setup
[Timing diagram: data and clk with the setup requirement. Not met: new data may not get latched]
Data should be stable for the setup time before the arrival of the clock edge.
What happens if the setup time is violated?
Hold
[Timing diagram: data and clk with the hold requirement. Not met: old data may not get latched]
Data should be stable for the hold time after the arrival of the clock edge.
What happens if the hold time is violated?
Recovery and Removal
• Recovery: Minimum time between de-assertion of an asynchronous control signal and the next active clock edge. Can be formulated as a setup check.
[Timing diagram: rst_n and clk with the recovery requirement. Not met: clk may not have effect]
• Removal: Minimum time after an active clock edge for which an asynchronous control signal should remain asserted. Can be formulated as a hold check.
[Timing diagram: clk and rst_n with the removal requirement. Not met: clk may override rst_n]
What is the reason for setup and hold
[Figure: two cross-coupled inverters (Vin1/Vout1 and Vin2/Vout2, with Vin1 = Vout2 and Vin2 = Vout1) and their voltage transfer curves, which intersect at points a, b and c]
Transistor Level Schematic of a D-Flop
http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time
Working of the D-Flop at Transistor Level
Setup and Hold Time at Circuit Level
The time it takes data D to reach node Z is called the setup time.
The time it takes data D to reach node W is called the hold time.
Negative Hold Time
Generalizing Setup & Hold Constraints
[Figure: flop F1 inside the boundary of the flop; data reaches F1 through delay D1 and clk reaches F1 through delay C1]
Setup Constraint
1. Assume C1 is zero.
2. clk reaches F1 before data has arrived at F1 and registers wrong data.
3. To avoid this, data should stabilize D1 time before the arrival of clk.
4. In reality, C1 is never zero, so data should stabilize D1 - C1 time before the arrival of clk.
5. As there are multiple D1 paths and multiple C1 paths, the complete and safe setup constraint is max(data path delays) - min(clock path delays).
Hold Constraint
1. Assume D1 is zero.
2. Data reaches F1 before clk has arrived at F1. When the clk arrives, new data has overwritten the previous data.
3. To avoid this, data should remain stable C1 time after the arrival of clk.
4. In reality, D1 is never zero, so data should remain stable C1 - D1 time after the arrival of clk.
5. The complete and safe hold constraint is max(clock path delays) - min(data path delays).
Negative Hold
[Timing diagram: Negative Hold, seen at the device interface. clk and data (Stable, New) are shown both at the device interface and at the latching element. Flop F1 sits inside the boundary of the flop with data delay D1 and clock delay C1]
1. Typically clock paths are well buffered and faster.
2. There can be substantial data path delay, especially in scan flops.
3. max(data path delays) - min(clock path delays) is always positive. This implies that the setup constraint is never negative.
4. max(clock path delays) - min(data path delays) can be negative. This implies that the hold constraint can be negative.
Setup + Hold (cannot be negative) = Max(clock path) + Max(data path) - Min(clock path) - Min(data path)
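The setup and hold formulas can be tried out on a handful of internal path delays. The sketch below is illustrative only; the delay values are hypothetical and the function names are not part of any tool.

```python
def setup_constraint(data_delays, clock_delays):
    """Setup = max(data path delays) - min(clock path delays)."""
    return max(data_delays) - min(clock_delays)

def hold_constraint(data_delays, clock_delays):
    """Hold = max(clock path delays) - min(data path delays)."""
    return max(clock_delays) - min(data_delays)

# Hypothetical delays inside the flop boundary, in ns.
DATA_PATHS = [0.30, 0.45, 0.60]   # slow data paths, e.g. through a scan mux
CLOCK_PATHS = [0.20, 0.25]        # well buffered, faster clock paths

setup = setup_constraint(DATA_PATHS, CLOCK_PATHS)
hold = hold_constraint(DATA_PATHS, CLOCK_PATHS)

print(f"setup = {setup:+.2f} ns")
print(f"hold  = {hold:+.2f} ns")
print(f"setup + hold = {setup + hold:+.2f} ns")
```

With these numbers the hold constraint is negative while setup + hold stays positive, because the sum equals the spread of the clock path delays plus the spread of the data path delays, and neither spread can be negative.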
Specifying Input Delay
[Figure: FF1 in inBlock (clk-to-Q) drives logic m into myDesign, where logic n feeds FF2 (tsetup); the input delay covers clk-to-Q plus m; both flops are clocked by clk]
set_input_delay -clock Clock 8 "data_in_2"
• Good design practice mandates that inBlock does not have combinatorial logic ("m") driving its output.
• These days "m" is more likely to be the result of global interconnect delay.
• Early floorplanning is a good way to estimate the delay due to "m".
• If floorplanning is not done, a good bet is 50-60% of the clock cycle.
• The characterize command automatically calculates the input delay from the parent design.
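Specifying Output Delay
[Figure: FF1 in myDesign (clk-to-Q) drives logic s to the output; in outBlock, logic t feeds FF2 (tsetup); the output delay covers t plus tsetup; both flops are clocked by clk]
set_output_delay -clock Clk -max -fall 10 {"Z<0>" "Z<1>"}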
General Timing Constraints
[Figure: inputs I1, I2 and clock clk; flops F1, F2, F3; combinatorial blocks C0 to C4; outputs O1, O2]
Four kinds of path groups exist:
1. Input to Output, e.g., I2 to O2
2. Input to Register, e.g., I1 to F1
3. Register to Register, e.g., F1 to F2
4. Register to Output, e.g., F3 to O1
Notation:
• TI1, TI2 are input delays
• DQ1, DQ2 and DQ3 are clk-to-Q delays
• S1, S2 and S3 are setup constraints
• H1, H2 and H3 are hold constraints
• C0-C4 are combinatorial delays
• P is the clock period
Input to Output (I2 to O2):
O2 = TI2 + C4
Input to Register (I1 to F1):
TI1 + C0 ≤ P - S1
TI1 + C0 ≥ H1
Setup Slack: P - S1 - TI1 - C0
Hold Slack: TI1 + C0 - H1
Register to Register (F1 to F2):
DQ1 + C1 ≤ P - S2
DQ1 + C1 ≥ H2
Setup Slack: P - S2 - DQ1 - C1
Hold Slack: DQ1 + C1 - H2
Setup and hold slacks should be positive.
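The slack expressions can be evaluated with a short sketch. It assumes, as in the F1 to F2 path, that the arrival time at the capturing flop is the launching delay (an input delay or the launching flop's clk-to-Q delay) plus the combinatorial delay. All numeric values are hypothetical.

```python
def setup_slack(period, setup, launch_delay, logic_delay):
    """Setup slack = P - S - (launch delay + combinatorial delay)."""
    return period - setup - (launch_delay + logic_delay)

def hold_slack(hold, launch_delay, logic_delay):
    """Hold slack = (launch delay + combinatorial delay) - H."""
    return (launch_delay + logic_delay) - hold

P = 10.0  # ns, clock period (hypothetical)

# Input to register, I1 to F1: launch delay is the input delay TI1, logic is C0.
in_to_reg = (setup_slack(P, 0.5, 4.0, 3.0), hold_slack(0.2, 4.0, 3.0))

# Register to register, F1 to F2: launch delay is clk-to-Q of F1, logic is C1.
reg_to_reg = (setup_slack(P, 0.5, 0.6, 7.0), hold_slack(0.2, 0.6, 7.0))

for name, (s_slack, h_slack) in (("I1 -> F1", in_to_reg), ("F1 -> F2", reg_to_reg)):
    status = "met" if s_slack >= 0 and h_slack >= 0 else "VIOLATED"
    print(f"{name}: setup slack {s_slack:.2f} ns, hold slack {h_slack:.2f} ns ({status})")
```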
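Gate Level Simulation
[Block diagram: the Gate Level Design feeds both the Simulator (with the Simulation Library) and the Timing Analysis Tool (with the Timing Library); an SDF File connects the timing analysis tool to the simulator]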
Clock Distribution
Source: MIT. Course 6.375. Lecture L06. 2006
Clock Skew
The basic assumption in a synchronous system is that all the sequential elements in the design sample their inputs at the same time, marked by a clock signal. In reality, the clock signal does not arrive at all the sequential elements at the same time. The difference in time between the reference clock signal and the local clock signal at a sequential element is called the clock skew. Clock skew would not be a problem if the clock signal were uniformly delayed at all the sequential elements; it is the non-uniform delay of the clock signal that creates the problem. The delay depends on the distance of the sequential element from the clock source and on the local load.
The primary reason for the delay is the large load seen by the clock signal. The load consists of all the sequential elements in the design and the clock net itself, which behaves as a distributed RC line (or a higher order model) and can be several centimetres long in a large chip. The total capacitance of a single clock line easily measures hundreds of pF and can reach into the nF range. The total clock capacitance of the Alpha processor equals 3.25 nF, which is 40% of the total switching capacitance of the entire chip.
Clock Skew in Alpha Processor
[Figure: clock skew map of the Alpha processor with the clock drivers marked]
Clock Skew
Clock Jitter
Clock Skew and Sequential Circuit Performance
[Figure: pipeline In → CL1, R1 → CL2, R2 → CL3, R3 → Out, with local clock times $t_\phi'$, $t_\phi''$, $t_\phi'''$. (a) Race between clock and data. (b) Data should be stable before the clock pulse is applied.]
Each synchronous module is composed of combinational logic CL and a flop (register R) and is characterised by six timing parameters: the minimum and maximum propagation delays of the register, $t_{r,\min}$ and $t_{r,\max}$, and of the combinational logic, $t_{l,\min}$ and $t_{l,\max}$; the propagation delay of the interconnect, $t_i$; and the local clock skew, $t_\phi$.
The maximum propagation delay corresponds to the time taken by the slowest output to respond to any transition at the input. This delay constrains the maximum allowable clock speed. The minimum propagation delay corresponds to the time taken by at least one output to start responding to a transition at the input. This delay is typically much smaller than the maximum delay and determines the amount of skew a circuit can tolerate before a race condition occurs. With $\delta = t_\phi'' - t_\phi'$ denoting the skew between R1 and R2, if $\delta$ is greater than $t_{r,\min} + t_i + t_{l,\min}$, the inputs at R2 can change before the previous inputs are latched.
Race condition (a):
$t_\phi'' \le t_\phi' + t_{r,\min} + t_i + t_{l,\min}$, or $\delta \le t_{r,\min} + t_i + t_{l,\min}$
Clock period (b):
$t_\phi'' + T \ge t_\phi' + t_{r,\max} + t_i + t_{l,\max}$, or $T \ge t_{r,\max} + t_i + t_{l,\max} - \delta$
Positive and Negative Clock Skew
[Figure: pipeline of registers R and combinational logic CL carrying Data. (a) Positive skew. (b) Negative skew.]
$\delta \le t_{r,\min} + t_i + t_{l,\min}$
$T \ge t_{r,\max} + t_i + t_{l,\max} - \delta$
• Positive Skew ($\delta > 0$): In this case the clock is routed in the same direction as the data and the first equation needs to be satisfied. Violating it will result in malfunctioning of the circuit. Observe that increasing the clock period does not help. The positive skew actually helps improve the clock speed, as it enters the constraint on the clock period T with a negative sign.
• Negative Skew ($\delta < 0$): Negative skew occurs when the data is routed in the direction opposite to the clock signal. The first equation is unconditionally satisfied and the circuit works correctly independent of the skew. Unfortunately, negative skew limits the clock speed and thus lowers the performance, as predicted by the second equation: the skew reduces the time available for computation by $|\delta|$.
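The effect of the sign of the skew on the two constraints can be checked numerically. The sketch below is a minimal illustration with hypothetical delays; it treats the skew between R1 and R2 as a single number, positive when the clock travels with the data.

```python
def is_race_free(skew, tr_min, ti, tl_min):
    """First constraint: skew <= tr,min + ti + tl,min."""
    return skew <= tr_min + ti + tl_min

def min_clock_period(skew, tr_max, ti, tl_max):
    """Second constraint: T >= tr,max + ti + tl,max - skew."""
    return tr_max + ti + tl_max - skew

# Hypothetical delays in ns.
TR_MIN, TR_MAX = 0.2, 0.4   # register propagation delay
TL_MIN, TL_MAX = 0.3, 3.0   # combinational logic delay
TI = 0.1                    # interconnect delay

for skew in (0.5, 0.8, -0.5):
    safe = is_race_free(skew, TR_MIN, TI, TL_MIN)
    period = min_clock_period(skew, TR_MAX, TI, TL_MAX)
    print(f"skew {skew:+.1f} ns: race free = {safe}, minimum T = {period:.1f} ns")
```

A small positive skew shortens the minimum period, a large positive skew causes a race that no clock period can fix, and a negative skew is always race free but lengthens the minimum period.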
[Timing diagrams: launch clock and capture clock waveforms for three cases: setup time met and hold time met; setup time violated and hold time violated; setup time violated and hold time met]
Setup violations result from worst case timing.
Hold violations result from best case timing.
[Figure: FF1 → logic → FF2 → logic, with the startpoint and endpoint marked, and the setup relationship and hold relationship shown between the clock edges]
Chip Level Timing Issues
[Figure: two chip floorplans, each showing a CGU and blocks numbered 1 to 8]
Blocks 4 & 8 communicate and need their clocks to be skew aligned.
The data signals between blocks 4 & 8 could take more than one clock cycle and can get routed through blocks 5 and 6.
This makes chip level timing closure difficult and sensitive to geometry. A hierarchical design style is preferable, where each chiplet is timing closed independently and the chip is composed from such chiplets.
Solution: Latency insensitive design.
Categories of Synchronization
[Figure: synchronization schemes arranged between clock based synchronization and data based synchronization, against constraints complexity and latency ambiguity: GS, GALS, GRLS (KTH Technology), Double Latch, Handshake (2 Phase, 4 Phase), Asynchronous 2 Clock FIFO]
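Send and Forget – Double Latching
[Figure: source S and destination D connected by an ACL (Asynchronous Communication Link), clocked by CLKs and CLKD; the source payload Ps passes through two D flip-flops clocked by CLKD to become the destination payload PD]
[Figure: metastability. Inverter transfer curve (Vin, Vout) with levels VIH, VIL and the metastable point VMS, and the node voltage resolving to 1 or 0 over time t]
$v(t) = V_{MS} + (v(0) - V_{MS})\, e^{t/\tau}$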
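The exponential in the metastability figure, v(t) = VMS + (v(0) - VMS) e^(t/τ), can be solved for the time a node needs to leave the metastable region. The sketch below is a derivation from that equation only; the voltage levels and time constant are hypothetical, and `resolution_time` is an illustrative helper, not a characterised flop parameter.

```python
import math

def node_voltage(t, v0, v_ms, tau):
    """v(t) = V_MS + (v(0) - V_MS) * exp(t / tau)."""
    return v_ms + (v0 - v_ms) * math.exp(t / tau)

def resolution_time(v0, v_ms, v_target, tau):
    """Time for the node to move from v0 to v_target, away from V_MS."""
    if v0 == v_ms:
        raise ValueError("starting exactly at V_MS never resolves in this model")
    ratio = (v_target - v_ms) / (v0 - v_ms)
    if ratio <= 0:
        raise ValueError("node resolves toward the other rail")
    return max(0.0, tau * math.log(ratio))

# Hypothetical values: V_MS = 0.6 V, V_IH = 0.9 V, tau = 50 ps.
V_MS, V_IH, TAU = 0.6, 0.9, 50e-12

for start in (0.61, 0.601, 0.6001):
    t = resolution_time(start, V_MS, V_IH, TAU)
    print(f"v(0) = {start} V -> resolves after {t * 1e12:.0f} ps")
```

Each factor of ten closer to VMS adds a fixed τ·ln(10) to the resolution time, which is why a second latch stage, giving the first one a full clock period to settle, makes failures rare.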
Send and Forget – Double Latching
Advantages
• Good choice for single bit control data
• Gray coded multi bit data payloads are also suitable targets
Disadvantages
• No flow control: send and forget
• A metastable signal fanned out to multiple targets could resolve to different values
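Handshake ACL (Asynchronous Communication Link)
[Figure: source S (CLKs) and destination D (CLKD) connected by the ACL. Each side has an FSM; the handshake signals RS, AS, RD and AD cross the clock boundary through D flip-flops, alongside the payloads Ps and PD]
Ps: Source Payload
Pd: Destination Payload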
The data payload frequency must be lower than the rate allowed by the worst-case round trip delay of the flow control.
• 2-phase: 3Ts + 3Td ≥ TPs
• 4-phase: 6Ts + 6Td ≥ TPs
Example: Source: 27 MHz, Destination: 200 MHz
Maximum isochronous data rate using the 2-phase protocol: 3 × (37 ns) + 3 × (5 ns) = 126 ns, i.e. about 7.9 MHz
[Timing diagram: TPs shown against the round trip delay, 3Ts + 3Td for 2-phase and 6Ts + 6Td for 4-phase]
TPs: The period for which data remains valid/asserted
1. Note that TPs does not decide the data payload frequency. TPs is less than the round trip delay, so that the next payload can be transferred immediately after the round trip delay is over.
2. The period (TPL) corresponding to the data payload frequency has to be at least the worst case round trip delay, i.e. 3Ts + 3Td ≤ TPL and 6Ts + 6Td ≤ TPL for the 2-phase and 4-phase protocols respectively. The 27 MHz / 200 MHz example illustrates this.
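The payload rate limit for both protocols follows directly from the round trip expressions. The sketch below repeats the 27 MHz / 200 MHz example with the rounded clock periods used there (37 ns and 5 ns); the function names are illustrative.

```python
# Clock cycles spent on each side per transfer, from the round trip expressions.
CYCLES_PER_SIDE = {"2-phase": 3, "4-phase": 6}

def round_trip_ns(ts_ns, td_ns, protocol):
    """Worst-case round trip delay: n*Ts + n*Td with n = 3 (2-phase) or 6 (4-phase)."""
    n = CYCLES_PER_SIDE[protocol]
    return n * ts_ns + n * td_ns

def max_payload_rate_mhz(ts_ns, td_ns, protocol):
    """Highest payload rate whose period TPL still covers the round trip delay."""
    return 1e3 / round_trip_ns(ts_ns, td_ns, protocol)

TS_NS = 37.0  # source clock period, 27 MHz rounded as in the example
TD_NS = 5.0   # destination clock period, 200 MHz

for protocol in ("2-phase", "4-phase"):
    delay = round_trip_ns(TS_NS, TD_NS, protocol)
    rate = max_payload_rate_mhz(TS_NS, TD_NS, protocol)
    print(f"{protocol}: round trip {delay:.0f} ns, max payload rate {rate:.2f} MHz")
```

The 4-phase protocol halves the achievable payload rate because every transfer needs twice as many synchronised signal transitions.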
2 Clock Asynchronous FIFO
• Fail safe, self correcting:
– Write logic could think the FIFO is full when it is not
– Read logic could think that the FIFO is empty when it is not
• Not suitable for island hopping:
– Storage in the write island is a problem
– Typically the read side needs to be read every cycle
GALS Globally Asynchronous Locally Synchronous
Source: ETH, Zurich
GALS
Clocking and Communication Schemes
• Synchronous Design: phase and skew aligned
• Mesochronous Design: same clock frequency and phase aligned
• Ratiochronous Design: different clock frequencies with a rational relationship, phase aligned (KTH research)
• Plesiochronous: no rational clock relationship, phase relationship drifts
• Asynchronous
Ideal vs Real Clock
During the initial phase of synthesis the clock is ideal.
The set_auto_disable_drc_nets command should be used to prevent DC from wasting time on fixing DRC violations on high fanout nets like resets and clocks.
Model skew and jitter effects using the set_clock_uncertainty command.
Model clock network latency using the set_clock_latency command.
Once the clock tree has been inserted, use the set_propagated_clock command to use the actual clock. Back annotation using the read_sdf command is required.
Modelling Clock Skew