806, 8th Floor,
BPTP Park Centra, Sector – 30, Gurgaon.
Pin: 122001 Tel: +91-124-4117336
Website: http://www.logic-fruit.com
PCI Express: PCIE overview, understanding of Gen4 Equalization.
By: Aditya Locharla, R & D Engineer
Logic Fruit Technologies
White Paper
whitepaper @ Logic Fruit Technologies
Overview:
PCIe is the industry-standard I/O interconnect, supporting speeds of up to 16 GT/s per lane in Gen 4.0. Its ability to support such high speeds in the physical layer comes from its capacity to recover data through the process of equalization. Equalization is required when the device is operating at 8 GT/s and above.
This document introduces PCIe, explains why equalization is needed, and then describes how equalization is performed in PCIe at 8 GT/s and higher rates.
Keywords: PCIe; Ordered Sets; Equalization; Symbols; Pre-cursor; Cursor; Post-cursor.
I. PCIe Introduction:
The PCI bus was first introduced in the early 1990s, and it had a unifying effect on the different I/O buses available on PCs at that time. It was popular for various reasons such as processor-independent operation, plug-and-play operation, etc. Although PCI was a success, it had serious limitations: for example, its bus clock frequency did not improve at the same rate as processor speeds, which made it inadequate for certain applications. This created the need for derivatives. PCI Express was introduced by Intel primarily to serve multimedia applications, including streaming audio and video, which require guaranteed bandwidth and deterministic latency; without these, the user experiences glitches.
The PCI Express architecture is specified in layers. Write and read operations generated by software are passed to the transaction layer, where packets are generated. The data link layer adds sequence numbers and a CRC to transmitted packets, and checks them on reception. The basic physical layer link contains two simplex channels that are implemented as a transmit pair and a receive pair. PCI Express introduces the concept of multiple lanes to increase the bandwidth. The physical layer supports x1, x2, x4, x8, x12, x16, and x32 link widths, and the bandwidth scales linearly with the number of lanes. So far, four generations of PCIe have been released, with the per-lane rate roughly doubling each time: 2.5, 5, 8, and 16 GT/s. The PCI Express architecture is designed to support future performance enhancements via speed upgrades and advanced encoding techniques, and these changes affect only the physical layer definition. The layering is shown in figure 1.
Figure 1: PCIe layering diagram (Transaction Layer, Data Link Layer, and Physical Layer; the Physical Layer is split into a logical sub-block and an electrical sub-block with Tx and Rx paths).
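The linear scaling of bandwidth with lane count can be illustrated with a short calculation. The sketch below is not part of the original paper; it assumes 8b/10b encoding at 2.5 and 5 GT/s and 128b/130b encoding at 8 and 16 GT/s, and the function and table names are hypothetical.

```python
# Illustrative sketch: payload bandwidth of a PCIe link in one direction.
# Assumption: 8b/10b encoding at 2.5 and 5 GT/s, 128b/130b at 8 and 16 GT/s.
RATES_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}
ENCODING = {1: (8, 10), 2: (8, 10), 3: (128, 130), 4: (128, 130)}
VALID_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def link_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Return bandwidth in Gbit/s after line encoding, before packet overhead."""
    if gen not in RATES_GT_S:
        raise ValueError(f"unknown generation {gen}")
    if lanes not in VALID_WIDTHS:
        raise ValueError(f"unsupported link width x{lanes}")
    payload_bits, total_bits = ENCODING[gen]
    return RATES_GT_S[gen] * payload_bits / total_bits * lanes


if __name__ == "__main__":
    for gen in sorted(RATES_GT_S):
        x1 = link_bandwidth_gbps(gen, 1)
        x16 = link_bandwidth_gbps(gen, 16)
        print(f"Gen{gen}: x1 = {x1:.3f} Gbit/s, x16 = {x16:.3f} Gbit/s")
```

The figures exclude packet headers and other protocol overhead, so they are an upper bound on usable throughput rather than a measured value.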
II. PCIe Equalization:
A. Equalization:
By definition, equalization is the process of adjusting the balance between frequency components within an electrical signal. After this process, the frequency response of the system looks flat (i.e., the response is made equal at all frequencies), hence the term “equalization.” Figure 2 shows the frequency response before and after equalization.
Figure 2: Frequency response before and after equalization.
PCIe Gen 4.0 has a bit rate of 16 GT/s, and the channel introduces significant signal attenuation at such high rates. The high-frequency components of the PCIe 4.0 signal are diminished while passing through such a band-limited channel. The result is distortion and spreading of the transmitted signal over multiple symbols, generating inter-symbol interference (ISI) and bit errors at the receiver [3] and producing a distorted eye. When no equalization is applied, the eye is closed.
To compensate for this distortion, Tx equalization, Rx equalization, and equalization training are executed in the device. All 8.0 and 16.0 GT/s transmitters must support the equalization procedure, while it is optional for receivers.
Figure 3: Tx Equalization FIR filter.
Tx voltage parameters include equalization coefficients, equalization presets, and maximum/minimum voltage swings. The Tx coefficients are based on the FIR filter relationship shown in figure 3. The coefficients C-1, C0, and C+1 are called the pre-cursor, cursor, and post-cursor values. The pre-cursor voltage (Vc) is referred to as pre-shoot, while the post-cursor voltage (Vb) is referred to as de-emphasis. A low-frequency pattern within the compliance pattern (selected because it has minimal ISI) is used to measure the voltage swing and the presets.
B. De-emphasis and Pre-shoot
Figure 3 shows an example of the resultant output signal when a binary input stream is applied to a 3-tap FIR filter. The output takes a different value just before and just after a polarity inversion of the input bit stream. Voltages Va, Vb, Vc, and Vd correspond to the de-emphasis, flat-level, pre-shoot, and maximum-boost events, respectively, as shown in figure 4.
Figure 4: Tx voltage levels and Equalization
ratios.
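The behaviour of the 3-tap FIR filter can be sketched in a few lines. This is an illustration rather than the paper's implementation: it assumes bits map to +1/-1, that the coefficients are normalized so that |C-1| + C0 + |C+1| = 1, and that the stream is padded by repeating its edge bits. The function name is hypothetical.

```python
# Sketch of a 3-tap Tx FIR equalizer applied to a binary stream.
# Assumptions: bits map to +1/-1, |c_pre| + c0 + |c_post| = 1, and the
# stream is padded at both ends by repeating the edge bit.

def tx_fir(bits, c_pre, c_post):
    """Return the normalized output level for each input bit.

    c_pre  : pre-cursor coefficient C-1 (zero or negative)
    c_post : post-cursor coefficient C+1 (zero or negative)
    """
    c0 = 1.0 - abs(c_pre) - abs(c_post)  # cursor coefficient C0
    x = [1.0 if b else -1.0 for b in bits]
    out = []
    for n in range(len(x)):
        nxt = x[n + 1] if n + 1 < len(x) else x[n]  # weighted by C-1
        prv = x[n - 1] if n > 0 else x[n]           # weighted by C+1
        out.append(c_pre * nxt + c0 * x[n] + c_post * prv)
    return out


if __name__ == "__main__":
    # Preset P7 coefficients from Table 1: C-1 = -0.100, C+1 = -0.200
    levels = tx_fir([0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0], -0.100, -0.200)
    print([round(v, 3) for v in levels])
```

Under these assumptions, the first bit after a transition sits at the highest level of a run, repeated bits settle to the flat level, the last bit before a transition rises to the pre-shoot level, and an isolated bit reaches the maximum-boost level.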
Behavioral Rx equalization is not needed at 2.5 and 5.0 GT/s: the combination of the worst-case channel, the behavioral Rx package, and Tx jitter at these rates still yields an open eye. At 8.0 and 16 GT/s, the stressed eye is closed, which makes it infeasible to measure directly. A behavioral receiver equalizer that implements a 1st-order CTLE and a 2nd-order DFE is therefore used to overcome the closed-eye scenario.
The Tx voltage parameters to be measured include the equalization coefficients, the equalization presets, and the min/max voltage swings. A low-frequency compliance pattern consisting of 64 zeros followed by 64 ones is used to measure the Tx voltage swing and equalization presets at 8.0 GT/s and 16 GT/s. Table 1 lists the presets that all devices must support at 8 GT/s and 16 GT/s.
Preset  Preshoot (dB)   De-emphasis (dB)   C-1      C+1
P4      0.0             0.0                0.000    0.000
P1      0.0             -3.5 +/- 1 dB      0.000    -0.167
P0      0.0             -6.0 +/- 1.5 dB    0.000    -0.250
P9      3.5 +/- 1 dB    0.0                -0.166   0.000
P8      3.5 +/- 1 dB    -3.5 +/- 1 dB      -0.125   -0.125
P7      3.5 +/- 1 dB    -6.0 +/- 1.5 dB    -0.100   -0.200
P5      1.9 +/- 1 dB    0.0                -0.100   0.000
P6      2.5 +/- 1 dB    0.0                -0.125   0.000
P3      0.0             -2.5 +/- 1 dB      0.000    -0.125
P2      0.0             -4.4 +/- 1.5 dB    0.000    -0.200
P10     0.0             -                  0.000    -
Table 1: Presets for 8GT/s and 16GT/s.
Presets from P11 – P15 are reserved for future
development.
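The dB columns of Table 1 can be cross-checked against the coefficient columns. The sketch below assumes the normalization |C-1| + C0 + |C+1| = 1 and the ratio definitions de-emphasis = 20·log10(Vb/Va) and pre-shoot = 20·log10(Vc/Vb); neither is stated explicitly in this paper, and the function name is hypothetical. P10 is omitted because the table gives no numeric values for it.

```python
import math

# Sketch: derive pre-shoot and de-emphasis (dB) from the FIR coefficients.
# Assumptions: |C-1| + C0 + |C+1| = 1,
#   de-emphasis = 20*log10(Vb/Va), pre-shoot = 20*log10(Vc/Vb).

def preset_ratios_db(c_pre, c_post):
    """Return (preshoot_db, deemphasis_db) for the given C-1 and C+1."""
    c0 = 1.0 - abs(c_pre) - abs(c_post)
    va = c0 - abs(c_pre) + abs(c_post)  # first bit after a transition
    vb = c0 - abs(c_pre) - abs(c_post)  # flat level inside a long run
    vc = c0 + abs(c_pre) - abs(c_post)  # last bit before a transition
    return 20 * math.log10(vc / vb), 20 * math.log10(vb / va)


# (C-1, C+1) pairs copied from Table 1.
PRESETS = {
    "P0": (0.000, -0.250), "P1": (0.000, -0.167), "P2": (0.000, -0.200),
    "P3": (0.000, -0.125), "P4": (0.000, 0.000), "P5": (-0.100, 0.000),
    "P6": (-0.125, 0.000), "P7": (-0.100, -0.200), "P8": (-0.125, -0.125),
    "P9": (-0.166, 0.000),
}

if __name__ == "__main__":
    for name, (c_pre, c_post) in sorted(PRESETS.items()):
        preshoot, deemphasis = preset_ratios_db(c_pre, c_post)
        print(f"{name}: preshoot {preshoot:+.2f} dB, "
              f"de-emphasis {deemphasis:+.2f} dB")
```

With these assumptions, the computed values fall within the tolerances listed in the table for every preset from P0 to P9.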
III. Equalization procedure for PCIe 4.0
The PCIe 4.0 specification uses an adaptive algorithm similar to that of the 3.0 specification to adjust the transmitter and receiver setup of each lane and so improve signal quality when operating at 8.0 GT/s and higher data rates.
All the lanes associated with the LTSSM must participate in the equalization procedure. The procedure must be executed during the first data rate change to 8.0 GT/s as well as to 16 GT/s. Components must store the Transmitter setups that were agreed to during the equalization procedures and use them for future operation at 8.0 GT/s and higher data rates. Components are permitted to fine-tune their Receiver setup even after the equalization procedure is complete, as long as doing so does not cause the Link to become unreliable.
The equalization procedure can be initiated
either autonomously or by software. It is strongly
recommended that components use the
autonomous mechanism. However, a
component that chooses not to participate in the
autonomous mechanism must have its
associated software ensure that the software-
based mechanism is applied.
The equalization procedure is considered complete once the Transmitter and Receiver setup of each Lane has been adjusted for each common data rate supported above 5.0 GT/s. The Downstream Port is required to make the transition from L0 to Recovery to change the data rate and to perform the equalization procedure. The Upstream Port is permitted, but not required, to make this transition. The Downstream Port must not advertise 16 GT/s support in Recovery if it entered Recovery with the intention of performing the 8.0 GT/s equalization procedure. Upon completion of the 8.0 GT/s speed change and equalization, only the Downstream Port can advertise the next data rate. It is therefore mandatory to perform 8 GT/s equalization before 16 GT/s equalization.
If the Downstream Port wants to redo equalization, it may also request the Upstream Port to make a re-equalization request. To make this request, the Downstream Port sets the Request Equalization and Quiesce Guarantee bits to 1b and sets the Equalization Request Data Rate bit to the data rate at which equalization must be redone, in the TS2 Ordered Sets it sends in Recovery.RcvrCfg. The Upstream Port may eventually respond, if it is capable of doing so, by re-entering Recovery with the Request Equalization and Quiesce Guarantee bits set to 1b and the Equalization Request Data Rate bit set to the data rate requested by the Downstream Port. For more detail on state transitions during link-up, refer to the PCI Express Base Specification, Revision 4.0.
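The re-equalization request bits can be pictured as a single byte. The sketch below packs and unpacks TS2 Symbol 6 using the bit positions listed for TS2 in Section IV of this paper (bit 5, bit 6, bit 7). The encoding of the data rate bit (0 for 8.0 GT/s, 1 for 16.0 GT/s) is an assumption, and the function names are hypothetical.

```python
# Sketch: TS2 Symbol 6 bits used for a re-equalization request.
# Bit positions follow the TS2 description in Section IV of this paper.
# Assumption: Equalization Request Data Rate 0 = 8.0 GT/s, 1 = 16.0 GT/s.
EQ_REQ_DATA_RATE_BIT = 5
QUIESCE_GUARANTEE_BIT = 6
REQUEST_EQUALIZATION_BIT = 7


def build_ts2_symbol6(request_eq: bool, quiesce: bool, rate_16g: bool) -> int:
    """Pack the three request flags into one byte; bits 4:0 stay reserved (0)."""
    value = 0
    if rate_16g:
        value |= 1 << EQ_REQ_DATA_RATE_BIT
    if quiesce:
        value |= 1 << QUIESCE_GUARANTEE_BIT
    if request_eq:
        value |= 1 << REQUEST_EQUALIZATION_BIT
    return value


def parse_ts2_symbol6(value: int) -> dict:
    """Unpack the request flags from a received TS2 Symbol 6 byte."""
    return {
        "request_equalization": bool(value >> REQUEST_EQUALIZATION_BIT & 1),
        "quiesce_guarantee": bool(value >> QUIESCE_GUARANTEE_BIT & 1),
        "rate_16g": bool(value >> EQ_REQ_DATA_RATE_BIT & 1),
    }


if __name__ == "__main__":
    symbol6 = build_ts2_symbol6(request_eq=True, quiesce=True, rate_16g=True)
    print(f"{symbol6:#04x}", parse_ts2_symbol6(symbol6))
```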
As mentioned, the state transitions start from L0 to Recovery.RcvrLock and return to L0 once equalization is complete. The state transitions for this equalization procedure are shown below.
Figure 5: Equalization state machine transitions
Figure 5 shows the state transitions from path 1 to path 9. The state changes follow these paths in the ideal case, where the Ordered Sets are transmitted and received without significant errors, i.e., speed change and link equalization complete successfully. The “directed_speed_change” and “start_equalization_w_preset” variables are set and cleared in their respective states to complete the proper state transitions.
A. PATH 1:
L0 is the normal operating state in which data and control packets can be transmitted and received, i.e., L0 supports transaction layer packets (TLPs) and data link layer packets (DLLPs). L0 can operate at a data rate of 2.5 GT/s, 5 GT/s, 8 GT/s, or 16 GT/s. When the device has reached L0 at 2.5 GT/s, 5 GT/s, or 8 GT/s and needs to move to the next highest common rate, the Downstream Port has to initiate a state transition from L0 to Recovery.RcvrLock, following path 1. The Downstream Port sets the “directed_speed_change” variable to ‘1’. In Recovery.RcvrLock, the transmitter sends TS1 Ordered Sets with the speed_change bit (Symbol 4, bit 7) set to the same value as the “directed_speed_change” variable. On receiving this speed change request from the Downstream Port, the Upstream Port also moves from L0 to Recovery.RcvrLock, so both ports follow path 1.
B. PATH 2:
The device moves from Recovery.RcvrLock to Recovery.RcvrCfg through path 2 once it receives eight consecutive TS1 or TS2 Ordered Sets on all lanes with link and lane numbers that match those being transmitted on these lanes.
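The path 2 exit condition amounts to a run-length check on each lane. The sketch below is a simplified model under stated assumptions: received Ordered Sets are represented by a hypothetical `TrainingSet` record, and any non-matching Ordered Set resets the count. The names and data layout are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSet:
    """Hypothetical record of one received training Ordered Set."""
    kind: str  # "TS1" or "TS2"
    link: int  # received link number
    lane: int  # received lane number


def ready_for_rcvrcfg(received, tx_link, tx_lane, needed=8):
    """True once `needed` consecutive matching TS1/TS2 are seen on one lane."""
    run = 0
    for ts in received:
        matches = (ts.kind in ("TS1", "TS2")
                   and ts.link == tx_link and ts.lane == tx_lane)
        run = run + 1 if matches else 0  # a mismatch restarts the count
        if run >= needed:
            return True
    return False


def all_lanes_ready(per_lane_received, tx_link):
    """per_lane_received maps lane number -> list of TrainingSet on that lane."""
    return bool(per_lane_received) and all(
        ready_for_rcvrcfg(seq, tx_link, lane)
        for lane, seq in per_lane_received.items()
    )


if __name__ == "__main__":
    lane0 = [TrainingSet("TS1", 3, 0)] * 8
    lane1 = [TrainingSet("TS2", 3, 1)] * 8
    print(all_lanes_ready({0: lane0, 1: lane1}, tx_link=3))
```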
C. PATH 3:
When the device is in Recovery.RcvrCfg, the transmitter sends TS2 Ordered Sets on all the configured lanes with the received link and lane numbers. In the transmitted TS2, the Transmitter Preset and Receiver Preset Hint fields are set to the values specified in the Equalization Control Registers, and the speed_change bit (Symbol 4, bit 7) is set to the same value as the “directed_speed_change” variable. The state transitions to Recovery.Speed through path 3 if eight consecutive EQ TS2 or 8GT EQ TS2 Ordered Sets are received with the speed_change bit (Symbol 4, bit 7) set to ‘1’. In Recovery.RcvrCfg, the “start_equalization_w_preset” variable is set to 1b and the received preset values are saved as the initial preset values.
D. PATH 4:
Once the device completes the speed change, the “directed_speed_change” variable is reset to ‘0’ and the device moves to the Recovery.RcvrLock state through path 4.
E. PATH 5:
The state transitions from Recovery.RcvrLock to Recovery.Equalization because the “start_equalization_w_preset” variable was already set to ‘1’ while the device was in the Recovery.RcvrCfg state.
The Link equalization procedure enables components to adjust the Transmitter and Receiver setup of each Lane to improve the signal quality. The procedure has four phases, defined below. During equalization, the phase information is transmitted through the Equalization Control (EC) bits in the TS Ordered Sets.
Phase 0: The Upstream Port sends TS1 Ordered Sets with EC = 00b (Symbol 6, bits 1:0) and the Preset value (Symbol 6, bits 6:3) that it received in the EQ TS2 Ordered Sets from the Downstream Port during the Recovery.RcvrCfg sub-state. This Phase is not applicable to the Downstream Port, which starts directly in Phase 1. The Upstream Port moves to Phase 1 once it successfully receives two consecutive TS1 Ordered Sets with EC = 01b.
Phase 1: In this Phase, both components make the Link operational enough at 8.0 GT/s or above to exchange TS1 Ordered Sets, so that the remaining phases can fine-tune their Transmitter/Receiver pairs. Both components advertise their FS (Symbol 7, bits 5:0) and LF (Symbol 8, bits 5:0) values in the respective TS1 fields. The Downstream Port enters Phase 1 first and starts transmitting TS1 Ordered Sets with EC = 01b, using the Preset values from each Lane’s Equalization Control Register (part of the Secondary PCI Express Extended Capability). After receiving these TS1 Ordered Sets with EC = 01b, the Upstream Port transitions to Phase 1, where it continues to transmit the same Preset values it was transmitting in Phase 0. The Downstream Port, after receiving TS1 Ordered Sets with EC = 01b from the Upstream Port, transitions to Phase 2.
Phase 2: In this Phase, the Upstream Port helps the Downstream Port to fine-tune its transmitter equalization setting (Preset/Coefficients) until a BER of less than 10^-12 is achieved on all downstream lanes. The bit error rate is calculated at the physical layer. Multiple iterations may be performed to obtain the optimum equalization settings. The Downstream Port enters Phase 2 first and transmits TS1 Ordered Sets with EC = 10b. For the first iteration, the Preset values are kept the same as in Phase 1. For subsequent iterations, the Preset/Coefficient values are those of the Preset/Coefficient change request received from the Upstream Port in this Phase. The Use_Preset field (Symbol 6, bit 7) identifies whether the current request is to change the Preset or the Coefficients.
If Use_Preset = 1, the current request is for a Preset change and the requested Preset is given in the Transmitter Preset field (Symbol 6, bits 6:3).
If Use_Preset = 0, the current request is for a Coefficient change and the requested Coefficients are given in the Pre-Cursor (Symbol 7, bits 5:0), Cursor (Symbol 8, bits 5:0), and Post-Cursor (Symbol 9, bits 5:0) fields.
The Upstream Port evaluates the received TS1 Ordered Sets and may request the Port on the other side of the link to change its Preset or Coefficients.
The entire process of requesting a different Preset
or coefficients and evaluating the received TS1
OS is repeated until the Upstream Port is satisfied
that the required BER is achieved on the
downstream lanes. Then the Upstream Port
transitions to the next Phase (Phase 3).
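The Preset/Coefficient request described for Phase 2 can be decoded from TS1 Symbols 6 to 9 with a few shifts and masks. The sketch below uses the bit positions quoted in the text above; the function name and the returned dictionary layout are hypothetical, and it does not model Phase 1, where Symbols 7 and 8 carry FS and LF instead.

```python
# Sketch: decode a Phase 2/3 equalization request from TS1 Symbols 6 to 9.
# Bit positions are those quoted in the text; names are illustrative.

def decode_ts1_eq_request(sym6: int, sym7: int, sym8: int, sym9: int) -> dict:
    """Return the EC phase and the requested Preset or Coefficients."""
    fields = {
        "ec": sym6 & 0x03,                   # Symbol 6, bits 1:0
        "use_preset": bool(sym6 >> 7 & 1),   # Symbol 6, bit 7
    }
    if fields["use_preset"]:
        fields["preset"] = sym6 >> 3 & 0x0F  # Symbol 6, bits 6:3
    else:
        fields["pre_cursor"] = sym7 & 0x3F   # Symbol 7, bits 5:0
        fields["cursor"] = sym8 & 0x3F       # Symbol 8, bits 5:0
        fields["post_cursor"] = sym9 & 0x3F  # Symbol 9, bits 5:0
    return fields


if __name__ == "__main__":
    # Use_Preset = 1, Preset = 7, EC = 10b
    print(decode_ts1_eq_request(0b10111010, 0, 0, 0))
    # Use_Preset = 0, coefficients 5 / 40 / 10, EC = 10b
    print(decode_ts1_eq_request(0b00000010, 5, 40, 10))
```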
Phase 3: This Phase is similar to Phase 2, except that the roles of the Downstream and Upstream Ports are interchanged. In this Phase, the Downstream Port helps the Upstream Port to fine-tune its transmitter equalization setting (Preset/Coefficients) until a BER of less than 10^-12 is achieved on all upstream lanes.
F. PATH 6:
Once equalization has completed successfully, as instructed by the upper layers, the device moves from Recovery.Equalization to Recovery.RcvrLock.
G. PATH 7:
Once the condition of receiving eight consecutive TS Ordered Sets is satisfied, the device moves to Recovery.RcvrCfg.
H. PATH 8:
Because the “directed_speed_change” variable was reset to ‘0’ after the successful speed change and the speed_change bit (Symbol 4, bit 7) in the received TS Ordered Sets is ‘0’, the device moves from Recovery.RcvrCfg to Recovery.Idle.
I. PATH 9:
The upper layers direct the device to its next state by setting the corresponding register. For example, if the link is operating at 8 GT/s but the highest rate advertised and received is 16 GT/s, the device moves to L0 regardless of the upper-layer registers; otherwise, once the highest common rate has been reached, the device moves to the state as directed.
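The nine paths described above can be summarized as a transition table. The sketch below only records the ideal-case sequence from this section and checks that each path starts where the previous one ended; it assumes path 9 leaves Recovery.Idle for L0, and it is not a model of the full LTSSM.

```python
# Sketch: ideal-case equalization paths as (from_state, to_state) pairs.
# Assumption: path 9 starts in Recovery.Idle and returns to L0.
PATHS = {
    1: ("L0", "Recovery.RcvrLock"),
    2: ("Recovery.RcvrLock", "Recovery.RcvrCfg"),
    3: ("Recovery.RcvrCfg", "Recovery.Speed"),
    4: ("Recovery.Speed", "Recovery.RcvrLock"),
    5: ("Recovery.RcvrLock", "Recovery.Equalization"),
    6: ("Recovery.Equalization", "Recovery.RcvrLock"),
    7: ("Recovery.RcvrLock", "Recovery.RcvrCfg"),
    8: ("Recovery.RcvrCfg", "Recovery.Idle"),
    9: ("Recovery.Idle", "L0"),
}


def walk_paths(paths=PATHS, start="L0"):
    """Follow the paths in numeric order and return the visited states."""
    state = start
    visited = [state]
    for number in sorted(paths):
        src, dst = paths[number]
        if src != state:
            raise ValueError(f"path {number} starts in {src}, not {state}")
        state = dst
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(" -> ".join(walk_paths()))
```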
IV. ORDERED SETS
The equalization procedure described above mainly uses five types of Ordered Sets. With 128b/130b encoding at 8 GT/s and above, each block consists of two header bits followed by 128 bits (16 symbols). The symbols of each Ordered Set are described below.
A. EIEOS: Electrical Idle Exit Ordered Set
The Electrical Idle Exit Ordered Set (EIEOS) is
transmitted only when operating at speeds other
than 2.5 GT/s. It is a low-frequency pattern
transmitted periodically to help ensure that
receiver Electrical Idle exit circuitry can detect
an exit from Electrical Idle. It is transmitted once
every 32 frames.
Symbol numbers   Value
Even symbols     00h
Odd symbols      FFh
Table 2: EIEOS for 8GT/s and 16GT/s.
B. EIOS: Electrical Idle Ordered Set
Before a Transmitter enters Electrical Idle, it
must always send the Electrical Idle Ordered Set
(EIOS), unless otherwise specified.
At 2.5 GT/s and 5 GT/s, it is a COM symbol followed by three IDL symbols. At 8 GT/s and above, every symbol of the Ordered Set has the value 66h.
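The symbol patterns of the two Ordered Sets above are simple enough to generate directly. The sketch below builds the 16 symbols of an EIEOS using the even/odd pattern listed in Table 2, and of an EIOS assuming that all 16 symbols carry 66h. The two header bits of the block are not modelled, and the function names are hypothetical.

```python
# Sketch: 16-symbol payloads of the EIEOS and EIOS at 128b/130b rates.
# Assumptions: EIEOS uses the even/odd pattern listed in Table 2, and all
# 16 symbols of the EIOS are 66h. The 2-bit block header is not modelled.
SYMBOLS_PER_BLOCK = 16


def eieos_symbols() -> bytes:
    """Even-numbered symbols are 00h, odd-numbered symbols are FFh."""
    return bytes(0x00 if i % 2 == 0 else 0xFF
                 for i in range(SYMBOLS_PER_BLOCK))


def eios_symbols() -> bytes:
    """Every symbol is 66h."""
    return bytes([0x66] * SYMBOLS_PER_BLOCK)


if __name__ == "__main__":
    print("EIEOS:", eieos_symbols().hex(" "))
    print("EIOS: ", eios_symbols().hex(" "))
```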
C. SKP: Skip Ordered Set
SKP Ordered Sets are used to compensate for differences between the bit-rate frequencies at the two ends of a Link. The Receiver's Physical Layer sub-block must include elastic buffering that performs this compensation. A SKP Ordered Set can be 8, 12, 16, 20, or 24 symbols long, because the receiver can add or remove symbols in groups of 4. The last four symbols can carry LFSR, data parity, or error status information, and the values carried change with the LTSSM state. Refer to the PCI Express Base Specification, Revision 4.0, for details.
D. TS1 & TS2:
TS1 and TS2 are the training sequences used by
both the devices doing link training to
communicate the capabilities. As explained
before, equalization also uses bits inside symbols
of TS Ordered sets.
Symbol 0
  TS1: 1Eh
  TS2: 2Dh
Symbol 1
  TS1 and TS2: Link number
Symbol 2
  TS1 and TS2: Lane number
Symbol 3
  TS1 and TS2: N_FTS
Symbol 4
  TS1 and TS2: Data rate identifier
    Bit 0 – Reserved.
    Bit 1 – 2.5 GT/s Data Rate Supported. Must be set to 1b.
    Bit 2 – 5.0 GT/s Data Rate Supported. Must be set to 1b.
    Bit 3 – 8.0 GT/s Data Rate Supported. Must be set to 1b.
    Bit 4 – 16.0 GT/s Data Rate Supported.
    Bit 5 – Reserved.
    Bit 6 – Autonomous Change/Selectable De-emphasis.
    Bit 7 – speed_change.
Symbol 5
  TS1 and TS2: Training Control (for each bit, 0b = De-assert, 1b = Assert)
    Bit 0 – Hot Reset.
    Bit 1 – Disable Link.
    Bit 2 – Loopback.
    Bit 3 – Disable Scrambling in 2.5 GT/s and 5.0 GT/s data rates; Reserved in other data rates.
  TS1 only:
    Bit 4 – Compliance Receive.
    Bit 7:5 – Reserved.
  TS2 only:
    Bit 7:4 – Reserved.
Symbol 6
  TS1:
    Bit 1:0 – Equalization Control (EC). This field is only used in the Recovery.Equalization and Loopback LTSSM states. In all other LTSSM states, it must be set to 00b.
    Bit 2 – Reset EIEOS Interval Count. This bit is defined for use in the Recovery.Equalization LTSSM state. In all other LTSSM states, it is Reserved.
    Bit 6:3 – Transmitter Preset.
    Bit 7 – Use Preset/Equalization Redo. This bit is defined for use in the Recovery.Equalization, Recovery.RcvrLock and Loopback LTSSM states.
  TS2:
    Bit 4:0 – Reserved.
    Bit 5 – Equalization Request Data Rate. This bit is defined for use in the Recovery.RcvrCfg LTSSM state. In all other LTSSM states, it is Reserved.
    Bit 6 – Quiesce Guarantee. This bit is defined for use in the Recovery.RcvrCfg LTSSM state. In all other LTSSM states, it is Reserved.
    Bit 7 – Request Equalization. This bit is defined for use in the Recovery.RcvrCfg LTSSM state. In all other LTSSM states, it is Reserved.
Symbol 7
  TS1:
    Bit 5:0 – FS when the EC field of Symbol 6 is 01b. Otherwise, Pre-cursor Coefficient.
    Bit 7:6 – Reserved.
  TS2:
    Bit 2:0 – 16.0 GT/s Receiver Preset Hint.
    Bit 6:3 – 16.0 GT/s Transmitter Preset.
    Bit 7 – 1b.
Symbol 8
  TS1:
    Bit 5:0 – LF when the EC field of Symbol 6 is 01b. Otherwise, Cursor Coefficient.
    Bit 7:6 – Reserved.
  TS2: 45h
Symbol 9
  TS1:
    Bit 5:0 – Post-cursor Coefficient.
    Bit 6 – Reject Coefficient Values.
    Bit 7 – Parity (P).
  TS2: 45h
Symbol 10
  TS1: 4Ah
  TS2: 45h
Symbol 11
  TS1: 4Ah
  TS2: 45h
Symbol 12
  TS1: 4Ah
  TS2: 45h
Symbol 13
  TS1: 4Ah
  TS2: 45h
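As an illustration of how the Symbol 4 fields listed above are read, the sketch below decodes the data rate identifier of a received TS1 or TS2 into the advertised rates and the two flag bits. The function name and the returned dictionary layout are hypothetical.

```python
# Sketch: decode Symbol 4 (data rate identifier) of a TS1/TS2 Ordered Set.
# Bit positions follow the symbol description above; names are illustrative.
RATE_BITS = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s", 4: "16.0 GT/s"}


def decode_data_rate_identifier(sym4: int) -> dict:
    """Return the advertised data rates and the bit 6 / bit 7 flags."""
    return {
        "supported_rates": [name for bit, name in RATE_BITS.items()
                            if sym4 >> bit & 1],
        "autonomous_change_or_selectable_deemphasis": bool(sym4 >> 6 & 1),
        "speed_change": bool(sym4 >> 7 & 1),
    }


if __name__ == "__main__":
    # All four rates advertised, speed_change set (bits 7, 4, 3, 2, 1).
    print(decode_data_rate_identifier(0b10011110))
```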