PCI Express: PCIE overview, understanding of Gen4 Equalization.

806, 8 th Floor, BPTP Park Centra, Sector – 30, Gurgaon. Pin: 122001 Tel: +91-124-4117336 Website: http://www.logic-fruit.com PCI Express: PCIE overview, understanding of Gen4 Equalization. By: Aditya Locharla, R & D Engineer Logic Fruit Technologies White Paper


whitepaper @ Logic Fruit Technologies P

A

G

Overview:

PCIe is the industry-standard I/O interconnect, supporting speeds of up to 16 GT/s per lane in Gen 4.0. Its ability to sustain such speeds at the physical layer comes from its capacity to recover data through the process of equalization. Equalization is required when a device operates at 8 GT/s and above.

This document introduces PCIe, explains why equalization is needed, and then describes how equalization is performed for PCIe at 8 GT/s and higher rates.

Keywords: PCIe; Ordered Sets; Equalization; Symbols; Pre-cursor; Cursor; Post-cursor.

I. PCIe Introduction:

The PCI bus was first introduced in the early 1990s, and it had a unifying effect on the different I/O buses available on the PC at that time. It was popular for various reasons, such as processor independence and plug-and-play operation. Although PCI was a success, it had significant limitations: for example, its bus clock frequency did not improve at the same rate as processor speeds, which made it inadequate for certain applications. The need therefore arose for derivatives. PCI Express was introduced by Intel primarily to serve multimedia applications, including streaming audio and video, which require guaranteed bandwidth and deterministic latency; without these, the user experiences glitches.

The PCI Express architecture is specified in layers. Write and read operations are generated by software and passed to the transaction layer, where packets are generated. The data link layer adds sequence numbers and a CRC to outgoing packets, and checks them on reception. The basic physical layer link consists of two simplex channels, implemented as a transmit pair and a receive pair. PCI Express introduces the concept of multiple lanes to increase bandwidth: the physical layer supports x1, x2, x4, x8, x12, x16, and x32 link widths, and bandwidth scales linearly with the number of lanes. So far, four generations of PCIe have been released, with bandwidth almost doubling each time: 2.5, 5, 8, and 16 GT/s per lane. The architecture is designed to support future performance enhancements through speed upgrades and advanced encoding techniques, and these changes affect only the physical layer definition. The layering is shown in Figure 1.

Figure 1: PCIe layering diagram (Transaction Layer, Data Link Layer, and Physical Layer with its logical and electrical sub-blocks; Tx and Rx).
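To make the "almost doubling" and the linear lane scaling concrete, the following minimal Python sketch estimates the one-direction bit rate of a link. It assumes 8b/10b encoding at 2.5 and 5 GT/s (not stated in this paper) and 128b/130b encoding at 8 and 16 GT/s, and it ignores packet and protocol overhead. The function and table names are illustrative only.

```python
# Per-lane line rates (GT/s) for the four generations named in the text.
LINE_RATE_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0}

# Assumed encoding efficiency: 8b/10b for Gen 1-2, 128b/130b for Gen 3-4.
ENCODING_EFFICIENCY = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130}

# Link widths listed in the text.
SUPPORTED_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def link_bandwidth_gbps(generation: int, lanes: int) -> float:
    """Approximate one-direction bit rate in Gb/s after line encoding."""
    if lanes not in SUPPORTED_WIDTHS:
        raise ValueError(f"unsupported link width x{lanes}")
    return LINE_RATE_GT_S[generation] * ENCODING_EFFICIENCY[generation] * lanes


if __name__ == "__main__":
    for gen in sorted(LINE_RATE_GT_S):
        x1 = link_bandwidth_gbps(gen, 1)
        x16 = link_bandwidth_gbps(gen, 16)
        print(f"Gen {gen}: x1 = {x1:.2f} Gb/s, x16 = {x16:.2f} Gb/s")
```

Under these assumptions, the step from 5 GT/s to 8 GT/s raises the line rate by only 1.6 times, but the lighter encoding brings the usable per-lane rate from 4.0 Gb/s to about 7.88 Gb/s, which is why the bandwidth still almost doubles.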


II. PCIe Equalization:

A. Equalization:

Equalization is, by definition, the process of adjusting the balance between the frequency components of an electrical signal. After this process, the frequency response of the system looks flat (i.e., the response is made equal at all frequencies), hence the term “equalization.” Figure 2 shows the frequency response before and after equalization.

Figure 2: Frequency response before equalization (uncorrected) and after equalization (corrected).

PCIe Gen 4.0 runs at 16 GT/s, and at such rates the channel introduces significant signal attenuation. The high-frequency components of a PCIe 4.0 signal are diminished as it passes through such a band-limited channel. The result is distortion and spreading of the transmitted signal over multiple symbols, which generates inter-symbol interference (ISI) and bit errors at the receiver [3] and produces a distorted eye. When no equalization is applied, the eye is closed.

To compensate for this distortion, Tx equalization, Rx equalization, and equalization training are performed in the device. All 8.0 GT/s and 16.0 GT/s transmitters must support the equalization procedure, while it is optional for receivers.

Figure 3: Tx Equalization FIR filter.

Tx voltage parameters include equalization coefficients, equalization presets, and maximum/minimum voltage swings. The Tx coefficients are based on the FIR filter relationship shown in Figure 3. The coefficients C-1, C0, and C+1 are called the pre-cursor, cursor, and post-cursor values. The pre-cursor voltage (Vc) is referred to as pre-shoot, while the post-cursor voltage (Vb) is referred to as de-emphasis. A low-frequency pattern within the compliance pattern (selected for minimal ISI) is used to measure the voltage swing and preset values.

B. De-emphasis and Pre-shoot

Fig. 3 shows an example of the output signal that results when a binary input stream is applied to a 3-tap FIR filter. The output takes a different value just before and just after a polarity inversion of the input bit stream. Voltages Va, Vb, Vc, and Vd correspond to the de-emphasis, flat-level, pre-shoot, and maximum-boost events, respectively, as shown in Figure 4.


Figure 4: Tx voltage levels and Equalization

ratios.
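The relationship between the three coefficients and the four voltage levels can be illustrated with a short Python sketch of a 3-tap FIR filter. It is a simplified model, not a description of any particular transmitter: bits are mapped to -1/+1, the output is normalized to full swing, and the cursor is derived from the assumed normalization |C-1| + C0 + |C+1| = 1. The function name `tx_fir` is hypothetical.

```python
def tx_fir(bits, c_pre, c_post):
    """Apply a 3-tap Tx FIR to a bit sequence and return normalized levels.

    c_pre (C-1) and c_post (C+1) are zero or negative. The cursor C0 is
    derived from the assumed normalization |C-1| + C0 + |C+1| = 1.
    """
    c0 = 1.0 - abs(c_pre) - abs(c_post)
    symbols = [1.0 if b else -1.0 for b in bits]  # map 0/1 to -1/+1
    out = []
    for n, cur in enumerate(symbols):
        # Hold the edge values so the first and last bits see a steady line.
        prev = symbols[n - 1] if n > 0 else cur
        nxt = symbols[n + 1] if n + 1 < len(symbols) else cur
        out.append(c_pre * nxt + c0 * cur + c_post * prev)
    return out


if __name__ == "__main__":
    # Coefficients of preset P7 from Table 1: C-1 = -0.100, C+1 = -0.200.
    levels = tx_fir([0, 0, 0, 1, 1, 1, 0, 1, 0, 0], -0.100, -0.200)
    print([round(v, 3) for v in levels])
```

With the P7 coefficients, the output magnitudes are 0.4 inside a run of equal bits (flat level, Vb), 0.6 on the last bit before a transition (pre-shoot, Vc), 0.8 on the first bit after a transition (Va), and 1.0 on an isolated bit (maximum boost, Vd).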

Behavioral Rx equalization is not needed at 2.5 GT/s and 5.0 GT/s: the combination of worst-case channel, behavioral Rx package, and Tx jitter at these rates still yields an open eye. At 8.0 GT/s and 16 GT/s, the stressed eye is closed, which makes measurement infeasible. To overcome the closed eye, a behavioral receiver equalizer implementing a 1st-order CTLE and a 2nd-order DFE is used.

The Tx voltage parameters to be measured include the equalization coefficients, equalization presets, and minimum/maximum voltage swings. A low-frequency compliance pattern consisting of 64 zeros followed by 64 ones is used to measure the Tx voltage swing and the equalization presets at 8.0 GT/s and 16 GT/s. The following is the list of presets that all devices have to use at 8 GT/s and 16 GT/s.

Preset   Preshoot (dB)   De-emphasis (dB)   C-1      C+1
P4       0.0             0.0                0.000    0.000
P1       0.0             -3.5 +/- 1         0.000    -0.167
P0       0.0             -6.0 +/- 1.5       0.000    -0.250
P9       3.5 +/- 1       0.0                -0.166   0.000
P8       3.5 +/- 1       -3.5 +/- 1         -0.125   -0.125
P7       3.5 +/- 1       -6.0 +/- 1.5       -0.100   -0.200
P5       1.9 +/- 1       0.0                -0.100   0.000
P6       2.5 +/- 1       0.0                -0.125   0.000
P3       0.0             -2.5 +/- 1         0.000    -0.125
P2       0.0             -4.4 +/- 1.5       0.000    -0.200
P10      0.0             -                  0.000    -

Table 1: Presets for 8GT/s and 16GT/s.

Presets P11 to P15 are reserved for future development.
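The pre-shoot and de-emphasis columns of Table 1 can be cross-checked against the C-1 and C+1 columns. The Python sketch below does this under two assumptions that are not spelled out in this paper: the coefficients are normalized so that |C-1| + C0 + |C+1| = 1, and the ratios are defined as pre-shoot = 20 log10(Vc/Vb) and de-emphasis = 20 log10(Vb/Va). P10 is left out because Table 1 gives no numeric values for it. All names are illustrative.

```python
import math


def preset_ratios_db(c_pre, c_post):
    """Return (preshoot_dB, de_emphasis_dB) for coefficients C-1 and C+1."""
    c0 = 1.0 - abs(c_pre) - abs(c_post)  # assumed normalization
    va = c0 + c_pre - c_post             # first bit after a transition
    vb = c0 + c_pre + c_post             # flat level inside a long run
    vc = c0 - c_pre + c_post             # last bit before a transition
    preshoot = 20 * math.log10(vc / vb)
    de_emphasis = 20 * math.log10(vb / va)
    return preshoot, de_emphasis


# (C-1, C+1, nominal preshoot dB, nominal de-emphasis dB) copied from Table 1.
PRESETS = {
    "P4": (0.000, 0.000, 0.0, 0.0),
    "P1": (0.000, -0.167, 0.0, -3.5),
    "P0": (0.000, -0.250, 0.0, -6.0),
    "P9": (-0.166, 0.000, 3.5, 0.0),
    "P8": (-0.125, -0.125, 3.5, -3.5),
    "P7": (-0.100, -0.200, 3.5, -6.0),
    "P5": (-0.100, 0.000, 1.9, 0.0),
    "P6": (-0.125, 0.000, 2.5, 0.0),
    "P3": (0.000, -0.125, 0.0, -2.5),
    "P2": (0.000, -0.200, 0.0, -4.4),
}

if __name__ == "__main__":
    for name, (c_pre, c_post, _, _) in PRESETS.items():
        ps, de = preset_ratios_db(c_pre, c_post)
        print(f"{name}: preshoot {ps:+.2f} dB, de-emphasis {de:+.2f} dB")
```

For every preset in the table, the values computed this way fall within 0.1 dB of the nominal figures, well inside the stated tolerances.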

III. Equalization procedure for PCIe 4.0

The PCIe 4.0 specification uses an adaptive algorithm similar to that of the 3.0 specification to adjust the transmitter and receiver setup of each lane, improving signal quality when operating at 8.0 GT/s and higher data rates.

All lanes associated with the LTSSM must participate in the equalization procedure. The procedure must be executed during the first data rate change to 8.0 GT/s as well as to 16 GT/s. Components must store the Transmitter setups that were agreed to during the equalization procedures and use them for future operation at 8.0 GT/s and higher data rates. Components are permitted to fine-tune their Receiver setup even after the equalization procedure is complete, as long as doing so does not make the Link unreliable.

The equalization procedure can be initiated

either autonomously or by software. It is strongly

recommended that components use the

autonomous mechanism. However, a

component that chooses not to participate in the

autonomous mechanism must have its

associated software ensure that the software-

based mechanism is applied.


The equalization procedure is considered complete once the Transmitter and Receiver setup of each Lane has been adjusted for each common data rate supported above 5.0 GT/s. The Downstream Port is required to make the transition from L0 to Recovery to change the data rate and to perform the equalization procedure; the Upstream Port is permitted, but not required, to make this transition. The Downstream Port must not advertise 16 GT/s support in Recovery if it entered Recovery with the intention of performing the 8.0 GT/s equalization procedure. Only upon completion of the 8.0 GT/s speed change and equalization can the Downstream Port advertise the next data rate. It is therefore mandatory to execute 8 GT/s equalization before running 16 GT/s equalization.
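The ordering rule above can be sketched in a few lines of Python. This is only an illustration of the sequencing described in the text; the function name and the way supported rates are passed in are assumptions made for the example.

```python
# Data rates (GT/s) that require equalization, lowest first.
EQUALIZED_RATES = (8.0, 16.0)


def equalization_plan(downstream_rates, upstream_rates):
    """Return the rates (GT/s) to equalize, in the order they must be run."""
    common = set(downstream_rates) & set(upstream_rates)
    plan = []
    for rate in EQUALIZED_RATES:
        if rate not in common:
            break  # 16 GT/s equalization may only follow 8 GT/s equalization
        plan.append(rate)
    return plan


if __name__ == "__main__":
    gen4 = [2.5, 5.0, 8.0, 16.0]
    gen3 = [2.5, 5.0, 8.0]
    print(equalization_plan(gen4, gen4))
    print(equalization_plan(gen4, gen3))
```

Two Gen 4 capable ports equalize at 8 GT/s and then at 16 GT/s, while a Gen 4 port facing a Gen 3 port stops after the 8 GT/s pass.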

If the Downstream Port wants to redo equalization, it may request the Upstream Port to make a re-equalization request. To make this request, the Downstream Port sets the Request Equalization and Quiesce Guarantee bits to 1b, and sets the Equalization Request Data Rate bit to the data rate at which equalization must be redone, in the TS2 Ordered Sets it sends in Recovery.RcvrCfg. The Upstream Port may eventually respond by re-entering Recovery with the Request Equalization and Quiesce Guarantee bits set to 1b and the Equalization Request Data Rate bit set to the data rate requested by the Downstream Port, if it is capable of doing so. For more detail on state transitions during link-up, refer to the PCI Express Base Specification, Revision 4.0.

As mentioned, the state transitions from L0 to Recovery.RcvrLock and returns to L0 once equalization completes. The state transitions for the equalization procedure are shown below.

Figure 5: Equalization state machine transitions

As Figure 5 shows, the state transitions follow paths 1 through 9. These transitions occur in the ideal case, where the Ordered Sets are transmitted and received without significant errors, i.e., the speed change and link equalization complete successfully. The “directed_speed_change” and “start_equalization_w_preset” variables are set and cleared in their respective states to drive the correct state transitions.

A. PATH 1:

L0 is the normal operating state in which data and control packets can be transmitted and received, i.e., L0 supports Transaction Layer Packets and Data Link Layer Packets. L0 can operate at a data rate of 2.5 GT/s, 5 GT/s, 8 GT/s, or 16 GT/s. When the device reaches L0 at 2.5 GT/s, 5 GT/s, or 8 GT/s and needs to link up at the next highest common rate, the Downstream Port has to initiate a state transition from L0 to Recovery.RcvrLock along path 1. The Downstream Port sets the “directed_speed_change” variable to ‘1’. In Recovery.RcvrLock, the transmitter sends TS1 Ordered Sets with the speed change bit (Symbol 4, bit 7) set to the same value as the “directed_speed_change” variable. On receiving this speed change request from the Downstream Port, the Upstream Port also transitions from L0 to Recovery.RcvrLock, so both ports follow path 1.

B. PATH 2:

The device moves from Recovery.RcvrLock to Recovery.RcvrCfg along path 2 once it receives eight consecutive TS1 or TS2 Ordered Sets on all lanes, with link and lane numbers that match those being transmitted on these lanes.

C. PATH 3:

When the device is in Recovery.RcvrCfg, the transmitter sends TS2 Ordered Sets on all configured lanes with the received link and lane numbers. In the transmitted TS2, the Transmitter Preset and Receiver Preset Hint fields are set to the values specified in the Equalization Control Registers, and the speed change bit (Symbol 4, bit 7) is set to the same value as the “directed_speed_change” variable. The state transitions to Recovery.Speed along path 3 if eight consecutive EQ TS2 or 8GT EQ TS2 Ordered Sets are received with the speed change bit (Symbol 4, bit 7) set to ‘1’. In Recovery.RcvrCfg, the “start_equalization_w_preset” variable is set to 1b, and the received preset values are saved as the initial preset values.

D. PATH 4:

Once the device completes the speed change, the “directed_speed_change” variable is reset to ‘0’ and the device moves to the Recovery.RcvrLock state along path 4.

E. PATH 5:

The state transitions from Recovery.RcvrLock to Recovery.Equalization because the “start_equalization_w_preset” variable was already set to ‘1’ while the device was in the Recovery.RcvrCfg state.

The Link equalization procedure enables components to adjust the Transmitter and Receiver setup of each Lane to improve signal quality. The procedure has four phases, defined below. During equalization, phase information is conveyed through the Equalization Control (EC) bits in the TS Ordered Sets.

Phase 0: The Upstream Port sends TS1 Ordered Sets with EC = 00b (Symbol 6, bits 1:0) and the Preset value (Symbol 6, bits 6:3) it received in the EQ TS2 Ordered Sets from the Downstream Port during the Recovery.RcvrCfg sub-state. This Phase does not apply to the Downstream Port, which starts directly in Phase 1. The Upstream Port moves to Phase 1 once it successfully receives two consecutive TS1 Ordered Sets with EC = 01b.

Phase 1: In this Phase, both components make the Link operational enough at 8.0 GT/s or above to exchange TS1 Ordered Sets, so that the remaining phases can be completed to fine-tune their Transmitter/Receiver pairs. Both components advertise their FS (Symbol 7, bits 5:0) and LF (Symbol 8, bits 5:0) values in the respective TS1 fields. The Downstream Port enters Phase 1 first and starts transmitting TS1 Ordered Sets with EC = 01b, using the Preset values from each Lane’s Equalization Control Register (part of the Secondary PCI Express Extended Capability). After receiving these TS1 Ordered Sets with EC = 01b, the Upstream Port transitions to Phase 1, where it continues to transmit the same Preset values it was transmitting in Phase 0. The Downstream Port, after receiving TS1 Ordered Sets with EC = 01b from the Upstream Port, transitions to Phase 2.

Phase 2: In this Phase, the Upstream Port helps the Downstream Port fine-tune its transmitter equalization setting (Preset/Coefficient) until a BER of less than 10^-12 is achieved on all downstream lanes. The bit error rate is calculated at the physical layer. Multiple iterations may be performed to obtain the optimum equalization settings. The Downstream Port enters Phase 2 first and transmits TS1 Ordered Sets with EC = 10b. For the first iteration, the Preset values are kept the same as in Phase 1. For subsequent iterations, the Preset/Coefficient values are those in the Preset/Coefficient change request received from the Upstream Port in this Phase. The Use_Preset field (Symbol 6, bit 7) identifies whether the current request is to change the Preset or the Coefficients.

If Use_Preset = 1, the current request is for a Preset change, and the requested Preset is carried in the appropriate field (Symbol 6, bits 6:3).

If Use_Preset = 0, the current request is for a Coefficient change, and the requested Coefficients are carried in the appropriate fields: Pre-cursor (Symbol 7, bits 5:0), Cursor (Symbol 8, bits 5:0), and Post-cursor (Symbol 9, bits 5:0).

The Upstream Port evaluates the received TS1 Ordered Sets and may request the Port on the other side of the Link to change its Preset or Coefficients.

The entire process of requesting a different Preset

or coefficients and evaluating the received TS1

OS is repeated until the Upstream Port is satisfied

that the required BER is achieved on the

downstream lanes. Then the Upstream Port

transitions to the next Phase (Phase 3).

Phase 3: This Phase is similar to Phase 2, except that the roles of the Downstream and Upstream Ports are interchanged. In this Phase, the Downstream Port helps the Upstream Port fine-tune its transmitter equalization setting (Preset/Coefficient) until a BER of less than 10^-12 is achieved on all upstream lanes.
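The request-and-evaluate loop of Phases 2 and 3 can be sketched as follows. This is a deliberately simplified illustration: `measure_ber` is a hypothetical callback standing in for "ask the link partner to apply this setting, then evaluate the incoming TS1 Ordered Sets", and the order in which candidates are tried is an assumption. How a real receiver judges link quality is left to the implementation.

```python
TARGET_BER = 1e-12


def tune_remote_transmitter(measure_ber, candidate_presets, max_iterations=16):
    """Request presets one at a time until the measured BER meets the target.

    measure_ber(preset) is a hypothetical callback that returns the bit error
    rate observed after the link partner applied the requested preset.
    Returns the accepted preset, or None if no candidate met the target.
    """
    for iteration, preset in enumerate(candidate_presets):
        if iteration >= max_iterations:
            break
        if measure_ber(preset) < TARGET_BER:
            return preset
    return None


if __name__ == "__main__":
    # Made-up BER figures, used only to exercise the loop.
    fake_ber = {"P4": 1e-6, "P8": 1e-9, "P7": 1e-13}
    print(tune_remote_transmitter(fake_ber.get, ["P4", "P8", "P7"]))
```

In Phase 2 the Upstream Port plays the role of the caller and the Downstream Port's transmitter is the one being tuned; in Phase 3 the roles are swapped.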

F. PATH 6:

Once equalization completes successfully, as instructed by the upper layers, the device moves from Recovery.Equalization to Recovery.RcvrLock.

G. PATH 7:

Once the condition of receiving eight consecutive TS Ordered Sets is satisfied, the device moves to Recovery.RcvrCfg.

H. PATH 8:

Because the “directed_speed_change” variable was reset to ‘0’ after the successful speed change, and the speed change bit (Symbol 4, bit 7) in the received TS Ordered Sets is ‘0’, the device moves from Recovery.RcvrCfg to Recovery.Idle.

I. PATH 9:

The upper layers direct the device’s next state by setting the corresponding register. If, for example, the link is at 8 GT/s but the highest speed advertised and received is 16 GT/s, the device moves to L0 regardless of the state directed by the upper-layer registers; otherwise, once the common rate has been reached, the device moves to the state as directed.
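The nine paths described above can be written down as a small transition table and checked for continuity. The Python sketch below reflects one reading of the text for the ideal case only: path 9 is taken to lead from Recovery.Idle back to L0, and error exits and timeouts are left out. The names are illustrative.

```python
# Path number -> (state left, state entered), as read from paths 1 to 9.
PATHS = {
    1: ("L0", "Recovery.RcvrLock"),
    2: ("Recovery.RcvrLock", "Recovery.RcvrCfg"),
    3: ("Recovery.RcvrCfg", "Recovery.Speed"),
    4: ("Recovery.Speed", "Recovery.RcvrLock"),
    5: ("Recovery.RcvrLock", "Recovery.Equalization"),
    6: ("Recovery.Equalization", "Recovery.RcvrLock"),
    7: ("Recovery.RcvrLock", "Recovery.RcvrCfg"),
    8: ("Recovery.RcvrCfg", "Recovery.Idle"),
    9: ("Recovery.Idle", "L0"),  # ideal case assumed here
}


def walk(paths, start="L0"):
    """Follow the paths in numeric order and return the states visited."""
    state = start
    visited = [state]
    for number in sorted(paths):
        source, destination = paths[number]
        if source != state:
            raise ValueError(f"path {number} starts at {source}, not {state}")
        state = destination
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(" -> ".join(walk(PATHS)))
```

The walk starts and ends in L0 and passes through Recovery.RcvrLock three times: once before the speed change, once before equalization, and once after it.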

IV. ORDERED SETS

The equalization procedure described above mainly uses five types of Ordered Sets. With 128b/130b encoding at 8 GT/s and above, each block consists of 128 bits (16 symbols) plus two header bits. The symbols of each Ordered Set are described below.

A. EIEOS: Electrical Idle Exit Ordered Set

The Electrical Idle Exit Ordered Set (EIEOS) is

transmitted only when operating at speeds other

than 2.5 GT/s. It is a low-frequency pattern

transmitted periodically to help ensure that

receiver Electrical Idle exit circuitry can detect

an exit from Electrical Idle. It is transmitted once

every 32 frames.

Symbol numbers   Value
Even symbols     00h
Odd symbols      FFh

Table 2: EIEOS for 8GT/s and 16GT/s.

B. EIOS: Electrical Idle Ordered Set

Before a Transmitter enters Electrical Idle, it

must always send the Electrical Idle Ordered Set

(EIOS), unless otherwise specified.

At 2.5 GT/s and 5 GT/s, it is a COM symbol followed by three IDL symbols. At 8 GT/s and above, every symbol has the value 66h.

C. SKP: Skip Ordered Set

SKP Ordered Sets are used to compensate for differences between the bit-rate frequencies at the two ends of a Link. The Receiver’s physical layer sub-block must include elastic buffering, which performs this compensation. A SKP Ordered Set can be 8, 12, 16, 20, or 24 symbols long, because the receiver can add or remove 4 symbols from the received data. The last four symbols can carry LFSR, data parity, or error status information; the values carried change with the LTSSM state. Refer to the PCI Express Base Specification, Revision 4.0.

D. TS1 & TS2:

TS1 and TS2 are the training sequences that the two devices performing link training use to communicate their capabilities. As explained earlier, equalization also uses bits within the symbols of the TS Ordered Sets.

Symbol 0
  TS1: 1Eh
  TS2: 2Dh

Symbol 1
  TS1 and TS2: Link number

Symbol 2
  TS1 and TS2: Lane number

Symbol 3
  TS1 and TS2: N_FTS

Symbol 4
  TS1 and TS2: Data rate identifier
    Bit 0 – Reserved.
    Bit 1 – 2.5 GT/s Data Rate Supported. Must be set to 1b.
    Bit 2 – 5.0 GT/s Data Rate Supported. Must be set to 1b.
    Bit 3 – 8.0 GT/s Data Rate Supported. Must be set to 1b.
    Bit 4 – 16.0 GT/s Data Rate Supported.
    Bit 5 – Reserved.
    Bit 6 – Autonomous Change/Selectable De-emphasis.
    Bit 7 – speed_change.

Symbol 5
  TS1 and TS2: Training Control
    Bit 0 – Hot Reset (0b = De-assert, 1b = Assert).
    Bit 1 – Disable Link (0b = De-assert, 1b = Assert).
    Bit 2 – Loopback (0b = De-assert, 1b = Assert).
    Bit 3 – Disable Scrambling in 2.5 GT/s and 5.0 GT/s data rates; Reserved in other data rates (0b = De-assert, 1b = Assert).
  TS1 only:
    Bit 4 – Compliance Receive (0b = De-assert, 1b = Assert).
    Bit 7:5 – Reserved.
  TS2 only:
    Bit 7:4 – Reserved.

Symbol 6
  TS1:
    Bit 1:0 – Equalization Control (EC). This field is only used in the Recovery.Equalization and Loopback LTSSM states. In all other LTSSM states, it must be set to 00b.
    Bit 2 – Reset EIEOS Interval Count. This bit is defined for use in the Recovery.Equalization LTSSM state. In all other LTSSM states, it is Reserved.
    Bit 6:3 – Transmitter Preset.
    Bit 7 – Use Preset/Equalization Redo. This bit is defined for use in the Recovery.Equalization, Recovery.RcvrLock and Loopback LTSSM states.
  TS2:
    Bit 4:0 – Reserved.
    Bit 5 – Equalization Request Data Rate. This bit is defined for use in the Recovery.RcvrCfg LTSSM state. In all other LTSSM states, it is Reserved.
    Bit 6 – Quiesce Guarantee. This bit is defined for use in the Recovery.RcvrCfg LTSSM state. In all other LTSSM states, it is Reserved.
    Bit 7 – Request Equalization. This bit is defined for use in the Recovery.RcvrCfg LTSSM state. In all other LTSSM states, it is Reserved.

Symbol 7
  TS1:
    Bit 5:0 – FS when the EC field of Symbol 6 is 01b. Otherwise, Pre-cursor Coefficient.
    Bit 7:6 – Reserved.
  TS2:
    Bit 2:0 – 16.0 GT/s Receiver Preset Hint.
    Bit 6:3 – 16.0 GT/s Transmitter Preset.
    Bit 7 – 1b.

Symbol 8
  TS1:
    Bit 5:0 – LF when the EC field of Symbol 6 is 01b. Otherwise, Cursor Coefficient.
    Bit 7:6 – Reserved.
  TS2: 45h

Symbol 9
  TS1:
    Bit 5:0 – Post-cursor Coefficient.
    Bit 6 – Reject Coefficient Values.
    Bit 7 – Parity (P).
  TS2: 45h

Symbols 10 to 13
  TS1: 4Ah
  TS2: 45h

Symbols 14 and 15
  TS1: 4Ah or DC balance
  TS2: 45h or DC balance

Table 3: TS1 and TS2 Ordered Sets.
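As an illustration of how the bit fields listed above might be unpacked in software, the following Python sketch decodes Symbol 4 (common to TS1 and TS2) and Symbol 6 of a TS1 Ordered Set. The function names and the dictionary keys are hypothetical; only the bit positions are taken from the table.

```python
def decode_symbol4(value):
    """Decode the Data Rate Identifier (Symbol 4) of a TS1/TS2 Ordered Set."""
    rates = []
    # Bits 1 to 4 advertise 2.5, 5.0, 8.0 and 16.0 GT/s support.
    for bit, rate in ((1, 2.5), (2, 5.0), (3, 8.0), (4, 16.0)):
        if value & (1 << bit):
            rates.append(rate)
    return {
        "rates_gt_s": rates,
        "autonomous_change_or_selectable_de_emphasis": bool(value & 0x40),
        "speed_change": bool(value & 0x80),
    }


def decode_ts1_symbol6(value):
    """Decode Symbol 6 of a TS1 Ordered Set (equalization fields)."""
    return {
        "ec": value & 0x03,                          # bits 1:0
        "reset_eieos_interval_count": bool(value & 0x04),  # bit 2
        "transmitter_preset": (value >> 3) & 0x0F,   # bits 6:3
        "use_preset": bool(value & 0x80),            # bit 7
    }


if __name__ == "__main__":
    print(decode_symbol4(0x9E))
    print(decode_ts1_symbol6(0xBA))
```

For example, a Symbol 4 value of 9Eh advertises all four data rates with speed_change set, and a TS1 Symbol 6 value of BAh carries EC = 10b (Phase 2) with Use_Preset set and a Transmitter Preset field of 7.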

V. Future PCIe:

About Logic fruit technologies: