
Equalizing Filter Design for Cross-talk Cancellation

by

Jihong Ren

B. Sc. (Electrical Engineering), Huazhong University of Science and Technology, 1995

M. Eng. (Electrical Engineering), Huazhong University of Science and Technology, 1998

M. Sc. (Neuroscience), The University of British Columbia, 2000

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE STUDIES

(Department of Computer Science)

we accept this thesis as conforming to the required standard

The University of British Columbia

June 2002

© Jihong Ren, 2002

Abstract

As interconnect line width and spacing decrease and operating clock rates increase, interconnect has become a bottleneck in developing high-speed integrated circuits, multichip modules, printed circuit boards, and systems. With small line spacing, mutual capacitance and inductance approach the level of self-capacitance and self-inductance, and can severely degrade signal integrity. The well-known equalizing filter method can significantly improve signal integrity. This thesis explores the effectiveness of equalizing filters in cross-talk cancellation for high-speed, off-chip buses. It demonstrates that linear programming provides effective methods for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of high-speed digital buses.

Contents

Abstract
Contents
List of Tables
List of Figures
Acknowledgments

1 Introduction
  1.1 Motivation
  1.2 Method and Proposed System Structure
  1.3 Contributions
  1.4 Thesis Outline

2 Background
  2.1 Transmission channel limitations
  2.2 Equalization
    2.2.1 Design Methods
    2.2.2 Application of equalizing filters in cross-talk cancellation for the local telephone subscriber loop
  2.3 Summary

3 Coupled Distributed RLC Interconnect Model
  3.1 Interconnect Model
  3.2 Bus parameters and Simulation results

4 Linear Equalizing Filter Design
  4.1 Measurements of filter performance
  4.2 Preliminaries
    4.2.1 Convolution
    4.2.2 Matrix Representations of Convolution
  4.3 Least Squares Optimization Method with Pseudo-random Input
    4.3.1 Motivation
    4.3.2 Least Square problem formulation
    4.3.3 An example
  4.4 Linear Programming Method with Worst-case Input
    4.4.1 Motivation
    4.4.2 Linear Programming Problem formulation
    4.4.3 Smoothing filter
    4.4.4 An example
  4.5 Testing results: Comparison of LSQ method and LP method
    4.5.1 Worst-case input sequence
    4.5.2 Indirect coupling
    4.5.3 Over-fitting
    4.5.4 Minimum bit time
  4.6 Time-variant Linear FIR Filter
  4.7 Optimized Smoothing Filter
  4.8 Summary

5 Predictor-Corrector Algorithm with Model Reduction
  5.1 Mehrotra's predictor-corrector algorithm
  5.2 Implementation Details
    5.2.1 Starting and Stopping
    5.2.2 Solving the linear systems
  5.3 Ill-conditioning and Model Reduction
  5.4 Results

6 Conclusions and Future Work
  6.1 Future work

Bibliography

List of Tables

4.1 Performance of equalizing filters with different sizes for a bus 32-bits wide and 5 cm long.
4.2 Performance of equalizing filters with different sizes for buses 32-bits wide. All filters designed using the LP method.
4.3 Performance of different smoothing filters with �� equalizing filters designed by the LP method at 300 ps.
5.1 linprog() iteration display
5.2 Iteration display of our approach: Mehrotra interior-point method with model reduction.

List of Figures

1.1 Proposed transmission network structure.
2.1 A coupled microstrip transmission line.
2.2 Simple lumped model for two coupled interconnects
2.3 Block diagram of an equalized transmission channel (from [3]).
2.4 Simplified model for full-duplex transmission over a linear multi-input/multi-output channel
3.1 Analytical solution from equation 3.16 vs. Spice simulation results
4.1 An illustrative eye diagram
4.2 Example of a data eye
4.3 Predistorted signal: equalizing filter output
4.4 Examples of output signal for 32-bit interconnect network
4.5 Eye-diagrams for a 32-bit interconnect network
4.6 Frobenius norm of the bus impulse response.
4.7 System with smoothing filter at the receiver end.
4.8 Example of output signals for systems with and without the equalizing filter designed
4.9 Pseudo-random test: eye diagrams for systems with and without the equalizing filter designed
4.10 Worst-case test vs. Pseudo-random test
4.11 Worst-case performance of different equalizing filters designed with the LP method
4.12 Indirect coupling between non-adjacent lines
4.13 Eye diagram for system with � � �� equalizing filters designed by the LP method. Grey traces indicate high signal transmitted. Black traces indicate low signal transmitted.
4.14 Magnitude of overshoot increases with the size of the equalizing filter designed with the LP method
4.15 The convolution procedure of the time-variant FIR filter

Acknowledgments

First of all, I would like to thank my supervisor, Dr. Mark Greenstreet. This thesis would not have been possible without his inspiration, extensive support, patience and encouragement. I would also like to thank my husband, Rui Li, for his consistent support.

JIHONG REN

The University of British Columbia

June 2002


Chapter 1

Introduction

1.1 Motivation

Advances in digital integrated circuit (IC) fabrication technology have resulted in exponential growth in the speed and integration levels of ICs. With more and more circuits placed on each die, high-performance systems require larger and larger I/O bandwidth. This demand has been addressed by increasing the number of high-speed signals and the per-pin

interconnection bandwidth. Although the number of I/Os has increased from��

pins in the 1970s, to several hundred pins per IC now [18], this growth is being rapidly

out-paced by the bandwidth demands. To continue to improve overall system performance,

the per-pin interconnection bandwidth must scale with the speed and integration level of

ICs. However, without new approaches, we will soon reach the limit set by the intrinsic

properties of copper lines.

The number of I/Os increases by 12% per year, half of which is due to the increase in chip perimeter and half of which is due to the increase in pin density. On chip, both the number of devices and clock rates have increased at 50-60% per year, creating a growing bandwidth gap. Higher bit-rates and pin densities have reached the point where interconnections are no longer well-behaved short interconnections. With the decreasing cross-sectional areas of interconnections, the line resistance per unit length has increased to the point that long interconnections can no longer be considered lossless. Resistive effects are particularly severe at high bit-rates because of both the high-frequency roll-off of RC transmission lines and the increase of resistance with frequency due to the skin effect.

To achieve maximum packing density, designers attempt to place signal lines as close to each other as possible. This introduces problems of electromagnetic coupling (cross-talk), which are exacerbated by high data rates. Cross-talk has become a critical issue in interconnect performance and hence overall system performance. Traditionally, cross-talk is reduced by carefully controlling line geometry and arranging circuits to decrease the coupled line length. Moreover, signaling conventions that are less susceptible to coupled energy can be used. These methods reduce cross-talk in a somewhat ad hoc way. For example, as a rule of thumb, a ratio of two-to-one for line spacing against line width is commonly used, based on the assumption that cross-talk decreases monotonically with the increase in line spacing. However, this simple assumption can fail for high bit-rate design. The relationship between line spacing and line width is non-linear, and a two-to-one ratio between spacing and width may actually result in higher coupled energy than smaller line spacing [11][20]. Furthermore, while these methods might reduce the amount of cross-talk, the problem of cross-talk still exists. New approaches to cross-talk reduction are needed.

1.2 Method and Proposed System Structure

Equalizing filters have been used effectively for cross-talk cancellation in acoustic applications such as telephone subscriber line systems [1][6][7]. Recently, they have been used to compensate for the frequency-dependent attenuation of transmission lines [2].

Figure 1.1: Proposed transmission network structure. (Blocks shown: transmitter, filter network with one filter per wire, bus, receiver.)

This thesis explores the effectiveness of equalizing filters in cross-talk cancellation for high-bandwidth, digital communication. The proposed system structure is depicted in figure 1.1. In this transmission system, an equalizing filter is assigned to each wire of the bus. Each filter takes the input signals on a wire and its adjacent wires as its inputs, and outputs a predistorted signal onto the wire. For a bus with $N$ wires, the filter system can be viewed as an $N \times N$ network. Cross-talk is eliminated if the filter network is designed in such a way that the concatenation of the filter network and the bus has a frequency response in the form of a diagonal matrix.
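The diagonal-response condition can be illustrated numerically at a single frequency. The following is a minimal sketch, not the design method of this thesis: the 3-wire response matrix `H` holds made-up illustrative values, and the filter network `G` is taken to be the exact matrix inverse, ignoring the restriction that each filter sees only its own and adjacent wires.

```python
import numpy as np

# Hypothetical 3-wire bus response at one frequency (illustrative values only).
# Off-diagonal entries represent cross-talk between wires.
H = np.array([
    [0.80 + 0.10j, 0.15 - 0.05j, 0.02 + 0.00j],
    [0.15 - 0.05j, 0.80 + 0.10j, 0.15 - 0.05j],
    [0.02 + 0.00j, 0.15 - 0.05j, 0.80 + 0.10j],
])

# Idealized filter network at this frequency: the inverse of the bus response.
G = np.linalg.inv(H)

# Signals pass through the filter network first, then through the bus.
cascade = H @ G

# Cross-talk in the cascade is whatever remains off the diagonal.
off_diagonal = cascade - np.diag(np.diag(cascade))
max_crosstalk = np.max(np.abs(off_diagonal))
print("largest off-diagonal magnitude:", max_crosstalk)
```

With an exact inverse the off-diagonal terms vanish up to rounding error. A practical filter network has finite length and limited connectivity, so it can only approximate this condition.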

Several optimal filter design strategies are explored, such as the linear programming method and the least-squares method. Matlab simulation results show that the resulting filters dramatically reduce cross-talk and substantially increase the maximum bandwidth that can be achieved by buses on PC boards. Thus, the equalizing filter method is promising for cross-talk cancellation and merits further investigation.

1.3 Contributions

This thesis demonstrates that linear programming models provide effective methods for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of high-speed digital buses on printed circuit boards. The following are the major contributions supporting this thesis:

• Equalizing filter design for high-speed digital buses can be formulated as a least squares optimization problem, using an $L_2$ metric for optimality. This metric ensures the quality of the received signal "on average".

• The $L_\infty$ metric corresponds to the traditional eye height measurement of signal integrity and guarantees worst-case performance. The filter design problem for $L_\infty$ optimality can be formulated as a linear programming problem.

• An evaluation of the linear programming and least squares methods for a variety of filter configurations shows that both offer a dramatic increase in bandwidth when compared with a bus with no filter or with transmitter pre-emphasis without cross-talk cancellation. Furthermore, the filters designed for the $L_\infty$ optimality criterion using linear programming significantly outperform their counterparts designed by the traditional least-squares method when evaluated for digital data transmission.

• To evaluate these methods, I implemented them both using Matlab. In doing so, I found that the Matlab optimization package does not always converge for the linear programming problems presented in this thesis. Therefore, I implemented an interior-point method with a model reduction technique that successfully solves the linear programming problems encountered.

1.4 Thesis Outline

In this thesis, Chapter 2 introduces the equalizing filter technique and its existing applications. Chapter 3 describes a coupled distributed RLC model for transmission lines. Based on this model, Chapter 4 discusses various techniques, such as least squares and linear programming, that I explored to design optimal linear FIR equalizing filters. Chapter 5 is devoted to Mehrotra's interior point method with a model reduction technique that is used to solve our particular linear programming problem introduced in Chapter 4.

Chapter 2

Background

Computer system performance is often limited by communication bandwidths between chips and between subsystems. A typical signaling system consists of a transmitter, a channel, and a receiver. The transmitter encodes digital information as analogue waveforms on the transmission channel, such as a circuit board trace. At the other end of the transmission channel, the receiver samples and quantizes the signal to recover the original digital information. Although we often think of transmission channels such as wires as being ideal, with zero resistance, capacitance and inductance, real wires are not ideal but rather parasitic circuit elements whose geometry affects their electrical properties. Moreover, with small line spacing, inductive and capacitive cross-talk can severely degrade signal integrity. With the growth in integration levels, interconnect line width and spacing decrease, and interconnect has become a bottleneck in high-speed digital designs.

This chapter first discusses the channel characteristics, particularly PC board traces.

I then provide background on the equalizing filter technique and an overview of its related,

existing applications.

Figure 2.1: A coupled microstrip transmission line. (Dimensions labelled in the figure: w, s, t, h.)

Figure 2.2: Simple lumped model for two coupled interconnects

2.1 Transmission channel limitations

Transmission channels, such as PC board traces and coaxial or twisted-pair cables, have limited bandwidths that are determined by their physical characteristics: the size and construction of their conductor and shield, and the dielectric material. In this thesis, I am particularly interested in high-speed interconnect on PC boards. Thus, the following discussion focuses on PC board traces. Figure 2.1 shows typical microstrip interconnections. A simple lumped model for two coupled interconnects is shown in figure 2.2.

The resistance per unit length of a trace is given by the resistivity of the trace material (typically copper) divided by the cross-sectional area of the trace. The cross-sectional area is the product of the width of the trace and its thickness. The width is determined by the design. The thickness is specified when the board is manufactured, in ounces of copper per square foot. A board with 1 oz copper has a conductor thickness of roughly 35 microns. More accurate models consider the skin effect: at high frequencies, currents flow closer to the surface of the trace, resulting in a frequency-dependent increase in the series resistance [10][3].
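As a quick check of the resistance relation, the sketch below computes the DC resistance per unit length of a rectangular trace. The copper resistivity is an assumed room-temperature textbook value, and the trace dimensions (75 microns wide, 34.5 microns thick) are the ones quoted for the test bus in Section 3.2. Skin effect is ignored.

```python
# Assumed bulk resistivity of copper near room temperature, in ohm-metres.
RHO_COPPER = 1.68e-8


def dc_resistance_per_cm(width_m, thickness_m, resistivity=RHO_COPPER):
    """DC resistance per centimetre of a rectangular trace (no skin effect)."""
    area = width_m * thickness_m          # cross-sectional area in m^2
    ohms_per_metre = resistivity / area
    return ohms_per_metre / 100.0         # convert ohm/m to ohm/cm


r_per_cm = dc_resistance_per_cm(width_m=75e-6, thickness_m=34.5e-6)
print(f"R = {r_per_cm:.4f} ohm/cm")
```

Under these assumptions the result is about 0.065 ohm/cm, close to the 0.066 ohm/cm used for the bus model in Chapter 3.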

The capacitance per unit length ($C$) and the inductance per unit length ($L$) of a microstrip trace are determined by many factors including its width and height and its separation from the ground plane. Electric and magnetic fields between adjacent traces lead to the coupling capacitance, $C_c$, and the mutual inductance, $L_m$, respectively.

For PC board traces, the loss in transmission is primarily due to the series resistive component of the copper ($R$). Because of this loss, without a special transmission scheme, off-chip signaling on long wires, even with good current-mode signaling methods, is limited to about 1 GHz [2]. Full-swing unterminated signaling methods that are used in most digital systems have even lower limits. With narrow wires and smaller line spacing, the coupling inductance and capacitance between adjacent lines approach the level of self-inductance and self-capacitance. In high-speed circuits, because of fast signal rise times, coupling effects are severe and have become a primary concern for present and future high-speed, high-density circuit design. Besides the resistive properties of the line, the coupling effects further limit the maximum bit-rate at which data can be transmitted correctly.

2.2 Equalization

An ideal transmission channel would in all cases deliver the near-end signal $V_{in}(t)$ from the driver without distortion to the far-end receiver, i.e. $V_{out}(t) = V_{in}(t - t_d)$, where $t_d$ is the propagation delay across the channel. Thus, an ideal channel would have the transfer function $e^{-s t_d} I$, where $s = j\omega$ and $I$ is the identity matrix. If an equalizing filter has a transfer function that equals the inverse of the transfer function of the channel, the concatenation of the equalizer and the channel has a flat frequency and phase response. This is the equalization technique widely used to actively compensate for the channel transfer function. Channel equalization can be performed at the transmitter end, as shown in figure 2.3, preceding the actual channel driver. Transmitters that utilize equalizing filters are called pre-distorting transmitters. The equalizing filter can also be incorporated into the receiver, which is called receiver equalization. It can also be split between the two ends.

Figure 2.3: Block diagram of an equalized transmission channel (from [3]). (Blocks shown: transmitter, equalizer G(s), channel H(s).)

• Pre-distorting Transmitters

Pre-distorting transmitters integrate equalizing filters, commonly realized as finite impulse response (FIR) digital filters. While infinite impulse response (IIR) filters [9] can be more flexible than FIR filters, they are generally not used for high data rate transmission because of the difficulty of calculating the IIR recurrence (i.e. feedback) at very high rates. The inputs to the equalizing FIR filters are the present and past transmitted symbols. The output of the FIR filter is a weighted sum of these symbols. The length of the filter depends on the number of symbols that affect the response of the channel to the current symbol. The filter coefficients depend on the channel characteristics.

Pre-distorting transmitters were first used by Poulton et al. [2] in a serial channel over copper wires at 4 Gb/s to reduce intersymbol interference caused by frequency-dependent attenuation of the channel. Later, other groups [4][17] used the same technique to design high-speed serial link transceivers. FIR equalizing filters built into transmitters are easy to implement at very high speed because the transmitted symbols are available at the transmitter end. Furthermore, because the transmitted symbols are either 1s or 0s, multiplication with the filter coefficients is easy. For example, in [2], a five-tap FIR filter is implemented with digital adders, and a digital-to-analog converter (DAC) is used to generate pre-distorted pulses. However, because transmitters generally do not have information about the received signals, FIR filter coefficients are obtained either by characterization of channel properties in advance [2][4], or by adaptive implementation with feedback information from the receiver end [17].

• Receiver Equalization

Receiver equalization can be realized either with analog filters preceding the analog-to-digital converter (ADC) or with digital filters following the ADC. The latter is the usual technique because digital filters are easy to implement and adapt. Moreover, more complex and non-linear filters can be implemented. However, it is well known that receiver equalization amplifies high-frequency noise [8]. Furthermore, high-speed ADC technology has historically lagged behind high-speed DAC technology. Therefore, pre-distorting transmitters are commonly used in high-speed transmission systems that run at GHz speeds. Recently, Horowitz's group realized an 8-Gsamples/s ADC in 0.25 µm CMOS, which makes high-speed links with equalization at the receiver end possible [19].
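The weighted-sum behaviour of a pre-distorting FIR filter can be sketched in a few lines. The two-tap coefficients and the bit pattern below are hypothetical values chosen for illustration; they are not taken from [2] or from the designs in this thesis.

```python
import numpy as np


def predistort(bits, taps):
    """Causal FIR filter: output[n] = sum_k taps[k] * bits[n - k].

    Symbols before the start of the sequence are treated as zero.
    """
    symbols = np.asarray(bits, dtype=float)
    # Full convolution, truncated to the length of the input sequence.
    return np.convolve(symbols, taps)[: len(symbols)]


taps = np.array([1.0, -0.25])        # hypothetical two-tap pre-emphasis filter
bits = [0, 1, 1, 1, 0, 0, 1, 0]      # transmitted symbols, present and past

drive_levels = predistort(bits, taps)
print(drive_levels)
```

Because each symbol is 0 or 1, every output level is just the sum of the taps whose corresponding symbol is 1, which is why the filter can be built from adders feeding a DAC.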

2.2.1 Design Methods

The following are two methods that are currently used to design equalizing filters.

• Zero-forcing method

The transfer function $H$ of the channel can be derived from models established for each particular channel (reviewed in [18]). The frequency response of the channel, and hence the desired frequency response of the equalizing filter, is then calculated at each frequency point. This set of discrete points is used to obtain a discrete impulse response function using the inverse Fourier transform. The following two steps are used to obtain a more manageable impulse response function.

– Windowing: $g_w(n) = g_d(n)\,w(n)$, where $g_d(n)$ is the desired impulse response and $w(n)$ is the windowing function. This step is needed to obtain a filter with a finite number of taps.

– Delaying: $g_w(n)$ is shifted to the right until the samples are all indexed by non-negative integers, to obtain a causal filter.

In practice, large windows must be used to obtain effective equalizing filters. Accordingly, many researchers have turned to optimization methods to obtain good approximate equalizing filters. This is the approach that I take in this thesis.

• Least Squares Minimization

With an ideal transmission channel, the received signal is a delayed version of the transmitted signal. Using least squares minimization, the equalizing filter design problem becomes that of determining the filter coefficients that minimize the $L_2$ norm of the difference between the received signal and the delayed version of the transmitted signal.

This method is used in optimal pre-emphasis equalizing filter design in [2][19] to build serial links that operate at over 1 Gb/s. It is also widely used to design equalizers for telephone subscriber systems [1][6][7].
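The least squares idea can be sketched for a single channel as follows. This is an illustrative setup rather than the formulation developed later in the thesis: the channel impulse response `h`, the filter length and the target delay are all assumed values, and the input is a unit impulse rather than a data sequence.

```python
import numpy as np


def convolution_matrix(h, n_taps):
    """Return A such that A @ g equals np.convolve(h, g) for len(g) == n_taps."""
    rows = len(h) + n_taps - 1
    A = np.zeros((rows, n_taps))
    for k in range(n_taps):
        A[k:k + len(h), k] = h       # column k is h shifted down by k samples
    return A


h = np.array([0.1, 0.6, 0.3, 0.1])   # hypothetical channel impulse response
n_taps, delay = 8, 3                 # assumed filter length and target delay

A = convolution_matrix(h, n_taps)
target = np.zeros(A.shape[0])
target[delay] = 1.0                  # ideal cascade response: a pure delay

# Filter coefficients minimizing the L2 norm of (channel * filter - target).
g, *_ = np.linalg.lstsq(A, target, rcond=None)

equalized = A @ g
residual = np.linalg.norm(equalized - target)
print("filter taps:", np.round(g, 3))
print("L2 residual:", residual)
```

The residual is generally non-zero because a finite FIR filter cannot invert the channel exactly; lengthening the filter or changing the delay trades complexity against residual error.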

Figure 2.4: Simplified model for full-duplex transmission over a linear multi-input/multi-output channel. H(t), G(t), P(t) and R(t) are the impulse responses of the far-end channel, near-end channel, transmitter filter and receive filter respectively. (Other signals labelled in the figure: a(t), b(t), n(t).)

2.2.2 Application of equalizing filters in cross-talk cancellation for the local telephone subscriber loop

Equalizing filters are used to reduce intersymbol interference caused by the characteristics of a single channel [2][4][17][19]. Until now, no work has been reported on the application of equalizing filters to cross-talk cancellation for high-speed buses that run at multi-Gb/s rates. Along with the limited bandwidth of transmission channels, cross-talk is another critical problem that limits the maximum data rate that can be achieved by high-density wide buses. Local telephone subscriber loops have the same problem. Bundles of twisted copper wires are used in local telephone subscriber loops. Because of the close physical proximity, cross-talk interference from neighbouring channels is one of the major limitations on the maximum data rate that can be achieved over the loops [7]. Multichannel equalization can effectively suppress both near- and far-end cross-talk [6][7].

In these papers, a cable of twisted pairs that is terminated at a single physical location is treated as a single multi-input/multi-output channel. Cross-talk is then characterized by off-diagonal components of the matrix impulse response of the channel. The multichannel adaptive FIR equalizers, the transmitter and the receiver process the entire vector of inputs and outputs (see figure 2.4). Rather than directly diagonalizing the system transfer function matrix, the multichannel equalizers are designed to minimize the $L_2$ norm of the difference between the received signal and the transmitted waveform. In Salz's work [16], the minimum mean square error (MMSE) linear equalizer for the $N \times N$ channel is completely specified, assuming uncorrelated data and white noise. Later, Honig et al. [6] generalized Salz's work by assuming correlated data symbols, pulse amplitude modulation (PAM) signals and colored noise.

2.3 Summary

The equalization technique has been successfully used to compensate for resistive effects of transmission lines [2][4][17]. With this technique and carefully chosen signaling methods, multi-Gb/s serial links have been built. Equalization is also commonly used in telephone subscriber systems to cancel near-end and far-end cross-talk [7][1][6]. In this thesis, I explore the effectiveness of the equalization technique in cross-talk cancellation for high-speed, off-chip buses. Moreover, besides using the least squares optimization technique that is commonly used to design equalizing filters, this thesis is the first work that formulates the optimal equalizing filter design problem for high-speed digital buses as a linear programming problem.

Chapter 3

Coupled Distributed RLC Interconnect Model

3.1 Interconnect Model

An electrical model of a uniform transmission line has inductance $L$, resistance $R$, capacitance $C$ and parallel conductance $G$, all per unit length. The term $G$ models the effects of current leakage and is practically zero for most digital transmission on integrated circuits and printed circuit boards.

We would like our system to be able to operate at bit rates greater than 2 Gbits/sec. Assuming that the rise and fall times are 10% of the bit time, edges have an electrical length of $l$ = rise time (ps) / delay (ps/cm) = 50 (ps) / 33 (ps/cm) = 1.51 cm, where 33 ps/cm is the propagation delay of light in a vacuum. The propagation delay of signals traveling in other media such as a PCB trace is larger [10], and thus the corresponding electrical length is even smaller. For example, the common FR-4 printed circuit board material has a dielectric constant of about 4.5 and a propagation delay of about 71 ps/cm. The electrical length of an edge at 2 Gbits/sec is then 0.7 cm. As a rule of thumb, distributed models should be used when the wire length is greater than or equal to $l/6$. Thus the critical dimension separating lumped from distributed systems for printed circuit boards is 0.117 cm. The wire lengths we consider here are in the range of 2 to 50 cm. Thus a distributed model is needed to correctly model the behavior of this system at multi-Gbit/sec data rates.

Assuming the TEM mode of wave propagation, for a lossy multiconductor system of $N$ wires, we have an $N \times N$ inductance matrix $\mathbf{L}$, capacitance matrix $\mathbf{C}$ and resistance matrix $\mathbf{R}$, where $L_{ij}$ and $C_{ij}$ are the mutual inductance and coupling capacitance between lines $i$ and $j$, respectively. For simplicity, the following assumptions are made:

• Coupling between lines is entirely due to mutual inductance and mutual capacitance. There is no conductance between wires of the bus or between wires of the bus and ground. Only coupling between adjacent lines is taken into account. We ignore direct coupling between wires of the bus that are not adjacent.

• Every wire is assumed to have the same characteristics.

• Wires are assumed to be arranged around a cylinder so that every wire has the same surroundings as the others.

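With the above assumptions, the $\mathbf{L}$ and $\mathbf{R}$ matrices are shown below, writing $L$ and $R$ for the self-inductance and resistance of each wire and $L_m$ for the mutual inductance between adjacent wires. The capacitance matrix $\mathbf{C}$ has the same structure as $\mathbf{L}$.

$$\mathbf{L} = \begin{bmatrix} L & L_m & 0 & \cdots & 0 & L_m \\ L_m & L & L_m & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ L_m & 0 & 0 & \cdots & L_m & L \end{bmatrix}, \qquad \mathbf{R} = \begin{bmatrix} R & 0 & \cdots & 0 \\ 0 & R & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R \end{bmatrix} \tag{3.1}$$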
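The structure implied by these assumptions (identical wires, adjacent-only coupling, wires arranged around a cylinder so that the first and last wires are neighbours) can be generated programmatically. The sketch below is illustrative: the helper name `ring_coupling_matrix` is hypothetical, and the numeric values are placeholders in the same units as the bus parameters of Section 3.2, with the mutual-to-self ratio of 0.31 being an assumed reading.

```python
import numpy as np


def ring_coupling_matrix(n_wires, self_value, mutual_value):
    """Matrix with self terms on the diagonal and mutual terms between
    each wire and its two neighbours, wrapping around the ends."""
    if n_wires < 3:
        raise ValueError("need at least 3 wires for distinct left/right neighbours")
    m = np.zeros((n_wires, n_wires))
    for i in range(n_wires):
        m[i, i] = self_value
        m[i, (i + 1) % n_wires] = mutual_value   # right-hand neighbour
        m[i, (i - 1) % n_wires] = mutual_value   # left-hand neighbour
    return m


N_WIRES = 5
L_SELF = 3.99e-9            # H/cm, self-inductance per unit length
L_MUTUAL = 0.31 * L_SELF    # H/cm, assumed mutual-to-self ratio
R_SELF = 0.066              # ohm/cm

L_matrix = ring_coupling_matrix(N_WIRES, L_SELF, L_MUTUAL)
R_matrix = R_SELF * np.eye(N_WIRES)   # no resistive coupling between wires
print(L_matrix / L_SELF)
```

The wrap-around entries in the corners are what make every row a shifted copy of the first, so every wire sees the same environment.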


The behavior of this distributed system can be described by the following partial

differential equation, where voltage vector � and current vector � are both functions of

position � and time�. � ��

� � � � � � �� (3.2)� ��

� �� (3.3)

Taking the Fourier transformation of these equations yields:

�� (3.4)

� �� (3.5)

where�

is the Fourier transform of � , � is the Fourier transform of � , and � � � � .Differentiating equation 3.4 with respect to � and substituting equation 3.5 into the result

gives � � ��

� � �� (3.6)

Let � �� . Let � be a diagonalizing matrix for � , i.e., � �� is the diag-

onal matrix�

whose diagonal elements are the eigenvalues of � . Rewriting equation 3.6

with � yields:

� ��

� � � �� (3.7)

Let�� and

� � �� , we get

� � ��

� � � �(3.8)

This differential equation has the general solution

��

� ��(3.9)

16

For a bus with non-zero resistive and capacitive or inductive components, the elements of�

and� �

are complex numbers. Combining equation 3.9 with the definition of� �

yields:

� � � � ��

� �� (3.10)

Assuming all source ends are terminated with an impedance of� ��

and the load ends are

left open, we have the following boundary conditions.

�� length �

� (3.11)� �� (3.12)

Combined with equation 3.4 and 3.10, the first boundary condition given above yields:

� ��

� � ��length (3.13)

From equation 3.10, we know that:

� � ��

�� (3.14)

Thus, equation 3.12 yields:

� � � � � � � �� (3.15)

Equations 3.13, 3.15 yield the final solution

� � � � ��

� �� (3.16)

with

� � � � ��

length � � � � �� length � � � � ��

� ��

length � � (3.17)

where � is the identity matrix. Note that� �

� ��, and

� � � � � � �. Thus, the

frequency response of the bus is:

� � � � ��

length � � � � ��

length � (3.18)


with � � � �

defined in equation 3.17. The inverse Fourier transform yields the impulse response of the bus, which is used extensively in the next chapter. Note that the frequency response of the bus is a square matrix at each frequency. The impulse response of the bus is also a square matrix at each time sample. Entry $(i, j)$ at time $t$ denotes the response on wire $i$ at time $t$ given an impulse input on wire $j$ at time $0$.

3.2 Bus parameters and Simulation results

I validated the model derived above by comparing its predictions with Spice simulations. Figure 3.1a shows the solution of equation 3.16 computed with Matlab, and figure 3.1b shows the Spice simulation results. The parameters used in both simulations are: bus width = 3, length =

5 cm, � = 0.066 ohm/cm, � = 0.8 pF/cm,�

= 3.99 nH/cm,� � � � = 0.31, � � � � = 0.23, � � �

= 5.0 V, bit time = 500 ps,��

= 10% *bit time = 50 ps. These parameters correspond to

microstrip lines 34.5 µm thick (1 oz copper), 75 µm wide with 75 µm separation between lines, running above a ground plane with a dielectric thickness of 100 µm and a dielectric constant of $\epsilon_r = 4.5$. The bus parameters are computed using formulas given in [10].

Figure 3.1: Analytical solution from equation 3.16 (upper panel) vs. Spice simulation results (lower panel) of 3-bit bus. All lines are quiet except line 1.

Chapter 4

Linear Equalizing Filter Design

In this chapter, I present techniques for the design of linear equalizing filters. I first in-

troduce the idea of a data eye and its use to quantify filter performance. The next section

defines notations that simplify the mathematical presentation of linear equalizing filter de-

sign. Then, I introduce the least squares (LSQ) method and the linear programming (LP)

method, followed by test results. Finally, based on the linear FIR filter designs, time-variant

FIR filter design and optimal smoothing filter design are discussed.

4.1 Measurements of filter performance

The effects of distortion and noise are often illustrated using eye diagrams. An illustrative eye diagram is shown in figure 4.1. It is called an eye diagram because of its shape. During the sample interval, the signal must be either distinctly high or distinctly low; it must not pass through the center of the eye. This allows the receiver to unambiguously determine the value of the bit that was transmitted. The signal can change between sampling intervals. I also restrict how high (or low) the signal may go; otherwise, with scaling, any eye opening can be made arbitrarily large.

Figure 4.1: An illustrative eye diagram. [The plot shows v(t) against t, with the target high and target low levels, the sample interval and the next sample interval, the eye width w, the margins h_under and h_over, and the regions marked Good and Bad.]

Eye height is defined as

height�� under

� ��target

� � over�

(4.1)

where $h_{\text{under}}$ and $h_{\text{over}}$ are defined in figure 4.1. The eye height and width are often used as an indication of signal integrity. Figure 4.2 shows how a data eye is formed by overlaying a signal waveform over multiple cycles.

The eye width, $w$ in figure 4.1, is the time during which the separation between high-going and low-going signals is greater than zero. In practice, the receiver will attempt to sample the signal near the moment of the widest eye opening. Due to uncertainties in the timing of the transmitter and receiver and in the delay of the interconnect, the actual sampling may occur at some time other than this ideal. The eye width gives an indication of the robustness of the interface to these timing uncertainties.
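The overlay described above can be sketched in a few lines of Python. This is only an illustration: it measures the raw separation between the lowest high trace and the highest low trace at each tap, which is not the normalized eye height of equation 4.1, and the function names and the sample waveform are assumptions.

```python
def fold_into_eye(samples, taps_per_bit):
    """Cut a tap-rate waveform into one trace per bit period."""
    n_bits = len(samples) // taps_per_bit
    return [samples[b * taps_per_bit:(b + 1) * taps_per_bit] for b in range(n_bits)]


def eye_opening(samples, bits, taps_per_bit):
    """For each tap position within the bit, return (lowest high trace) minus
    (highest low trace). bits[b] is +1 or -1 for trace b; both values must occur."""
    traces = fold_into_eye(samples, taps_per_bit)
    n = min(len(traces), len(bits))
    highs = [t for t, b in zip(traces[:n], bits[:n]) if b > 0]
    lows = [t for t, b in zip(traces[:n], bits[:n]) if b < 0]
    return [
        min(t[k] for t in highs) - max(t[k] for t in lows)
        for k in range(taps_per_bit)
    ]


def eye_width_taps(opening):
    """Longest run of consecutive taps where the opening is greater than zero."""
    best = run = 0
    for value in opening:
        run = run + 1 if value > 0 else 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    high = [0.0, 0.8, 0.9, 0.2]      # hypothetical trace for a transmitted +1
    low = [0.1, -0.7, -0.9, -0.1]    # hypothetical trace for a transmitted -1
    opening = eye_opening(high + low + high + low, [1, -1, 1, -1], 4)
    print(opening, eye_width_taps(opening))
```

The sketch assumes the waveform is already aligned to bit boundaries; in practice the bus delay would be removed first.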

In this thesis, the effectiveness of a filter is quantified in the following three ways:

• eye height of the output signal given a pseudo-random input sequence.

• eye height of the output signal given the worst-case input sequence.

• the smallest bit time (or highest bit rate) at which the eye height of the output signals is greater than a specified amount, e.g. 50% of the nominal signal level, and the eye width is greater than another specified amount, e.g. 25% of the bit time.

Figure 4.2: Example of a data eye. The upper panel shows a random signal. Its corresponding eye diagram is shown in the lower panel.

4.2 Preliminaries

By defining some notation up-front, the presentation of the filter design methods can be

more succinct and direct. The responses of filters and buses are naturally written as con-

volutions while linear and least squares problems are naturally formulated with matrices.

Here I define some notation to show the connection between various convolutions and their

corresponding matrix representations.

Let $v \in \mathbb{R}^m$ be a vector of size $m$. The $m$ components are $v(0), \ldots, v(m-1)$. I'll write $|v|$ to denote the size of $v$, $\|v\|_2$ to denote the $L_2$ norm of $v$, and $\|v\|_\infty$ to denote the $L_\infty$ norm of $v$.

Some matrix abbreviations used below are:

$I_n$: The $n \times n$ identity matrix

$0_{m,n}$: The $m \times n$ matrix of zeros

� � The

��

�matrix where

� � � ��

(4.2)

4.2.1 Convolution

Linear Convolution: Let $u$ and $v$ be two vectors. The linear convolution of $u$ and $v$ is the vector of size $|u| + |v| - 1$ defined below:

\[ (u * v)(k) = \sum_{i=0}^{k} u(i)\, v(k-i) \tag{4.3} \]

where any term whose index falls outside the range of its vector is taken to be zero. Linear convolution is commutative and associative.

Circular Convolution: Let $u$ and $v$ be two vectors in $\mathbb{R}^n$. Let $u \circledast v$ denote the circular convolution of $u$ and $v$:

\[ (u \circledast v)(k) = \sum_{i=0}^{n-1} u(i)\, v((k-i) \bmod n) \tag{4.4} \]

Circular convolution is commutative and associative.

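Let $v$ be a vector and $n$ be an integer with $n \geq |v|$. The zero-extension of $v$ pads $v$ with zero elements to produce a vector of size $n$:

\[ \mathrm{extend}(v, n)(i) = \begin{cases} v(i) & \text{if } i < |v| \\ 0 & \text{otherwise} \end{cases} \tag{4.5} \]

Zero-extension is a linear operator:

\[ \mathrm{extend}(v, n) = \begin{pmatrix} I_{|v|} \\ 0_{n-|v|,\,|v|} \end{pmatrix} v \tag{4.6} \]

Let $\mathrm{extend}(|v|, n)$ be the left matrix on the right hand side of the equation.

Linear convolution can be expressed as circular convolution of zero-extended vectors:

\[ u * v = \mathrm{extend}(u, |u| + |v| - 1) \circledast \mathrm{extend}(v, |u| + |v| - 1) \tag{4.7} \]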
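The relationship between linear convolution, circular convolution and zero-extension can be checked directly. The sketch below is a plain-Python illustration with hypothetical function names; it is not code from the thesis.

```python
def linear_conv(u, v):
    """Linear convolution: the result has len(u) + len(v) - 1 entries."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out


def circular_conv(u, v):
    """Circular convolution of two vectors of the same size."""
    n = len(u)
    if len(v) != n:
        raise ValueError("circular convolution needs vectors of equal size")
    return [sum(u[i] * v[(k - i) % n] for i in range(n)) for k in range(n)]


def extend(v, n):
    """Zero-extension: pad v with zeros up to size n."""
    return list(v) + [0.0] * (n - len(v))


if __name__ == "__main__":
    u, v = [1.0, 2.0, 3.0], [1.0, -1.0]
    n = len(u) + len(v) - 1
    print(linear_conv(u, v))
    print(circular_conv(extend(u, n), extend(v, n)))
```

Padding to size $|u| + |v| - 1$ is what prevents the wrap-around terms of the circular convolution from overlapping the linear result.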

Block Linear Convolution: Let � �� be a matrix. We can think of � as a column

of � matrices:

�

�

�

� ...

� � ��

��

(4.8)

where each of the � �is a � �� matrix. The block linear convolution of � � � � � � and

� �� is defined similarly as linear convolution:

� � ��

� � ��

�� (4.9)


The block linear convolution of matrix � � � � � � and vector � � � � � is defined simi-

larly.

Block Circular Convolution: The block circular convolution of � and�

, where � � � �� :

� ��

� ��

� � �� (4.10)

Block circular convolution is associative. It is commutative if the product of the sub-

matrices is commutative, for example, if the sub-matrices are all symmetric or all circulant

(circulant matrices are defined in sec 4.2.2 below). Extending the extend operator to block

matrices, let � �� be a matrix, and let�� .


�� (4.11)

Zero extension on block matrices is a linear operator just as it is for vectors.

Block linear convolution can be expressed as block circular convolution of zero-

extended matrices:

� ��

extend � � � � �� extend � � � � �

��

(4.12)

where � �� .

4.2.2 Matrix Representations of Convolution

In this section, I will first present matrix representations for linear convolution, then extend

it to block linear convolution.

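Let $v \in \mathbb{R}^n$ be a vector. Let $C_v$ be the circulant matrix [5] generated by $v$:

\[ C_v(i, j) = v((i - j) \bmod n) \tag{4.13} \]

The form of this circulant matrix is depicted below:

\[
C_v = \begin{pmatrix}
v(0) & v(n-1) & \cdots & v(1) \\
v(1) & v(0) & \cdots & v(2) \\
\vdots & \vdots & \ddots & \vdots \\
v(n-1) & v(n-2) & \cdots & v(0)
\end{pmatrix}
\tag{4.14}
\]

Let $u$ and $v$ be two vectors of the same size. Equations 4.4 and 4.13 yield:

\[ u \circledast v = C_u\, v \tag{4.15} \]

Furthermore, if $v_1, v_2, \ldots, v_k$ are all vectors of the same size, then

\[ v_1 \circledast v_2 \circledast \cdots \circledast v_k = C_{v_1} C_{v_2} \cdots C_{v_{k-1}}\, v_k \tag{4.16} \]

Note that matrix multiplication of circulant matrices is commutative and associative, just like the corresponding convolution.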
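A small numerical check makes the link between circulant matrices and circular convolution concrete. The sketch below assumes the convention that the first column of the circulant matrix holds the generating vector; the function names are illustrative.

```python
def circulant(v):
    """Circulant matrix whose first column is v: entry (i, j) is v[(i - j) mod n]."""
    n = len(v)
    return [[v[(i - j) % n] for j in range(n)] for i in range(n)]


def matvec(m, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def matmul(a, b):
    """Matrix-matrix product for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def circular_conv(u, v):
    """Circular convolution of two vectors of the same size."""
    n = len(u)
    return [sum(u[i] * v[(k - i) % n] for i in range(n)) for k in range(n)]


if __name__ == "__main__":
    v, w = [1, 2, 3], [4, 5, 6]
    print(matvec(circulant(v), w))   # same values as circular_conv(v, w)
    print(circular_conv(v, w))
```

With integer inputs the comparison is exact, so the two printed lists should be identical.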

Let � �� be a matrix, and let row

� � � � � � � � be the vector such that

row� � � � � � � � � � � � � � (4.17)

Likewise, let col� � � � � ��

�be the vector such that

col� � � � � � � � � � � � � � (4.18)

Convolution can be expressed with all arguments represented as matrices:

� � ��

col� � � � � � ��

��

�(4.19)

Using equation 4.7, linear convolution can be expressed using matrix multiplication:

� � ��

extend � � � � �

� � � � � � � � extend � � � � � � � � � � � � � (4.20)


Define � �� as the matrix given by

� � extend � � �� (4.21)

The form of this matrix is depicted below:

� �

v(0)

v(1)

v(m−1)

v(m−1)

v(m−1)

v(m−1)

v(m−1)

v(0)

v(0)

0

0(4.22)

where � � � � . The linear convolution of � � � � � � � can be written as

� � ��

�� col� � ��

(4.23)

where

� � �� (4.24)

The matrix representation for linear convolution described above can be extended

to block linear convolution. Let � � � � � � be a matrix. As described in the previous

section, the matrix � can be regarded as a column of � submatrices of dimension � � �

each.

The block circulant matrix generated by � is

� �

�

� � � ��

......

...

� � ��

��

(4.25)


For those who prefer formulas to ellipses:

� � � � � � � � � � �div � � �

�� div � � �� (4.26)

Let � � � � � � � � be matrices. Equations 4.10 and 4.25 yield:

� ��

(4.27)

Using equation 4.12, block linear convolution of � � � � � � and� � � � � �

can be expressed using matrix multiplication:

� ��

extend � � �� extend � � �� extend � � ��

col� � � � � � � � � � � � �

(4.28)

where � � � �� is defined as extend � � �� ,

� � �� , and col

� � � � � � � � is

defined in the obvious manner.

Block linear convolution of � � � � � � with � � � � � can also be expressed as

matrix multiplication:

� � � � �� extend � � �� (4.29)

where� � �

� .

Define the following operators:

� � block� � � � � �� creates circulant blocks from vector � �� .

� � � � � � � � �� div �� (4.30)

The form of this matrix is depicted below:

�

�

� � � � � � � � � � � ��

...

� ��

��

(4.31)


� vec2cir� � � � �� converts a vector � � � � � to a circulant matrix:

vec2cir� � � � �� extend � block

� � � � �� (4.32)

Define � �� as the matrix given by vec2cir

� � � � �� . This matrix has the

same form as � � (see equation 4.14), except that now each block is a circulant

matrix of size � � � . Notice that � �� is a block circulant matrix and

extend � � �� col� � � � ��

With these operators, it is straightforward to see that for � � � � � � and � �� ,

� � � col� � � � � � �� (4.33)

where� � �

� .

The block linear convolution of two vectors � � �� and � � � � �� is defined

as:

� �� col

� � � � �� (4.34)

where� �

� � �. The block linear convolution of � ��

�� can be written as

� ��

�� col� � � � ��

� �� (4.35)

where

� � ��

� (4.36)


4.3 Least Squares Optimization Method with Pseudo-random Input

4.3.1 Motivation

As discussed in the previous section, an ideal bus would in all cases deliver the near-end signal to the far-end receiver without distortion, with some amount of delay. Thus, in the ideal case, the expected output signal is simply a delayed version of the input signal. The goal of filter design is to find a set of filter coefficients that makes the output signal as close to this ideal output signal as possible. Following the example of [6][7], I use RMS error (the $L_2$ metric) in this section as a measure of the distance of the filter output from the ideal, delayed signal. In this case, filter design can be formalized as a least squares optimization problem. In section 4.4, I use the worst-case difference between a signal and the target as a measure of distance (the $L_\infty$ metric) and show that the resulting filter design problem is an instance of linear programming.

4.3.2 Least Squares problem formulation

• Input

Consider a bus with $n_{bus}$ wires. Let $m_{input}$ denote the length of the input training sequence in bit times. Thus, an input is a function that gives a value, +1 or -1, for each wire $w \in \{0, \ldots, n_{bus} - 1\}$ at each time $t \in \{0, \ldots, m_{input} - 1\}$. This function can be represented by a vector, input, with input$(n_{bus}\, t + w)$ denoting the value of the $w$-th wire at time $t$. Because filter coefficients are given in tap times, oversampling is needed to convert the input from a sequence in bit times to a sequence in tap times. Let input $\in \mathbb{R}^{m_{input} n_{bus}}$ be a vector and $k$ be a positive integer. The oversample operator, oversample(input, $k$, $n_{bus}$), computes in $\in \mathbb{R}^{k\, m_{input} n_{bus}}$, which is the $k$ times oversample of input:

\[ \mathrm{in}(i) = \mathrm{input}\big( n_{bus}\,(i \;\mathrm{div}\; (k\, n_{bus})) + (i \bmod n_{bus}) \big) \]

The oversample operator is linear. In particular, oversample($m_{input}$, $k$, $n_{bus}$) is a matrix with:

\[
\mathrm{oversample}(m_{input}, k, n_{bus})(i, j) =
\begin{cases}
1 & \text{if } i \;\mathrm{div}\; (k\, n_{bus}) = j \;\mathrm{div}\; n_{bus} \text{ and } i \bmod n_{bus} = j \bmod n_{bus} \\
0 & \text{otherwise}
\end{cases}
\]

Thus,

\[ \mathrm{oversample}(\mathrm{input}, k, n_{bus}) = \mathrm{oversample}(m_{input}, k, n_{bus}) \cdot \mathrm{input} \]

Define input � �� as the vector given by oversample

�input

� � � � bus�.

The form of this vector is depicted below:

input � � ��

�

input

input ...

input � �� Repeats � bus

� �more times

input � ��input � ��

...

input� � ��

Repeats � bus� �

more times...

input � � ��

��

(4.37)
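The oversampling step simply repeats each bit-time slice of the input once per tap. The sketch below assumes the wire-interleaved layout described above, where entry `t * n_bus + w` holds the bit on wire `w` at bit time `t`; the function name and arguments are illustrative.

```python
def oversample(inp, k, n_bus):
    """Repeat every bit-time slice of a wire-interleaved input k times.

    inp[t * n_bus + w] is the bit (+1 or -1) on wire w at bit time t.
    The result uses the same layout, indexed by tap time instead of bit time.
    """
    n_bits = len(inp) // n_bus
    out = []
    for tap in range(k * n_bits):
        t = tap // k                      # bit time that this tap belongs to
        out.extend(inp[t * n_bus:(t + 1) * n_bus])
    return out


if __name__ == "__main__":
    # Two wires, two bit times, two taps per bit (hypothetical values).
    print(oversample([1, -1, -1, -1], k=2, n_bus=2))
```

Each output index `i` maps back to input index `n_bus * (i // (k * n_bus)) + i % n_bus`, which is the index arithmetic used in the text.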


• Buses

In much the same manner as above, the impulse response of a bus with $n_{bus}$ wires is a column of $n_{bus} \times n_{bus}$ matrices, with each matrix giving the response corresponding to a particular delay. Let $m_{bus}$ denote the length of the bus impulse response in tap times. The bus impulse response can be represented by an $n_{bus} m_{bus} \times n_{bus}$ matrix, bus, where bus$(n_{bus}\, t + w_{out}, w_{in})$ is the response of the $w_{out}$ output wire of the bus after a delay of $t$ tap times to the $w_{in}$ input wire.

Let in be the vector for the input of the bus in tap time.

in

input � � ��Let in � � �

� �� denote the value of the input at tap time t:

in �� in

�� bus�� bus

� � �

Likewise, let bus � �� denote the bus impulse response at time

�:

bus �� bus

�� bus�� bus

� � �

Let output � �� be the vector for the output of the bus and let output �

be the output at tap time�:

output � � �� bus � �

�in� (4.38)

Equation 4.38 has the form of a block linear convolution. Thus,

output

col� � � bus � in � �� (4.39)

where� �

bus� � � input. Moreover, in this thesis, for simplicity, I assume that

wires are arranged around a cylinder (see chapter 3). This means that all wires have

the same characteristics, and bus � is a circulant matrix. Let � � �� be the


vector whose�� bus

� � � �� bus� �

components are the first column of bus � .That is,

bus

block�

��

bus � � ��

Thus the output signal of the bus given input in bit time is:

output

col� � � � � ��

input � � �� (4.40)

where� �

bus� � � input.
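The bus output described in this item is a convolution in which each tap of the impulse response is a matrix acting on the vector of wire values. As a minimal sketch, the code below stores the impulse response and the input as nested lists rather than the flattened vectors used in the text; that layout and the function name are assumptions made for readability.

```python
def bus_output(bus, inp):
    """Block linear convolution of a bus impulse response with an input.

    bus[tau][w_out][w_in] is the response on wire w_out, tau taps after an
    impulse on wire w_in. inp[t][w] is the input on wire w at tap time t.
    Returns out[t][w] for t in range(len(bus) + len(inp) - 1).
    """
    n_bus = len(inp[0])
    out = [[0.0] * n_bus for _ in range(len(bus) + len(inp) - 1)]
    for tau, h in enumerate(bus):
        for t, x in enumerate(inp):
            for o in range(n_bus):
                out[t + tau][o] += sum(h[o][i] * x[i] for i in range(n_bus))
    return out


if __name__ == "__main__":
    # Hypothetical 2-wire bus with a 2-tap impulse response.
    bus = [
        [[0.5, 0.1], [0.1, 0.5]],
        [[0.25, 0.0], [0.0, 0.25]],
    ]
    pulse_on_wire_0 = [[1.0, 0.0]]
    print(bus_output(bus, pulse_on_wire_0))
```

Driving a single pulse on one wire returns the corresponding column of each impulse response matrix, which is a quick way to check the indexing.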

• Filter

In figure 1.1, a filter is depicted for each wire of the bus. Because all wires have the same characteristics, I assume that every filter is the same. For an $n_{bus}$-bit bus, this filter system can also be viewed as an $n_{bus} \times n_{bus}$ input/output network. Thus, similar to the bus, the input/output relationship of the filter system with $m_{fir}$ taps can be expressed as:

filterOutput

col� � �� input � � �� (4.41)

where� � �

�� is the filter coefficient vector of the filter for wire 0 of the bus. It


has the form depicted below:

�

�

� � � ��

� � � � �

� � ��

� � �

� � � � �� fir

� � �

��

(4.42)

where� � ��

denotes the contribution of the input on wire 0 at time 0 to the filter output

for wire � at time�.

Because the bus is symmetric, I restrict my attention to symmetric filters. That is, in

the�

vector depicted above,

� � � � � � � �� for � � � � �� fir� � � � � � � � � � bus

� � �

Moreover, inputs on wires far away produce very little cross-talk. Therefore, it may be practical to force the filter coefficients for these wires to zero to simplify the implementation of the filter. In this thesis, filters of various sizes are investigated. Filter size is defined as filter length $\times$ filter width. An $m_{fir} \times n_{fir}$ filter contains $n_{fir}$ sets of $m_{fir}$ filter coefficients for inputs on the wire itself and the $n_{fir} - 1$ nearest wires in both


direction. Its filter coefficient vector fir �� is depicted below:

fir

�

fir � � �fir � � �...

fir � � � ��

fir � � �...

fir � � � ��

fir� � �

��

Define filterExtend��

fir�� fir

�� bus� � �

� � � � �� as the matrix depicted below:�

� � � ��

� � � ��

� � � ��

� �

��

(4.43)

where � � � � � � �� denotes the horizontal concatenation of a column vector � � � � � �� with a matrix

� � � � �� . Operator filterExtend�fir

�� fir

�� bus�

transforms fir to the full

filter coefficient vector�

in equation 4.42.

filterExtend�fir

�� fir

�� bus�

filterExtend��

fir�� fir

�� bus� �

fir (4.44)


Denote fir� �� as the vector given by filterExtend

�fir

�� fir

�� bus�, which equals the full

filter coefficient vector�

depicted previously. Thus, with equation 4.41, the output

signal of the filter system with�

fir � � fir filters can be expressed as:

filterOutput

col� � �� fir

� �� input � � �� (4.45)

where� �

fir� � � input.

� Target signal

Let � be the target signal which is a delayed version of the input signal.

I considered two ways to approximate the expected delay.

– LC delay: $\text{length} \cdot \sqrt{LC}$.

– Approximate the delay by determining the peak of the Frobenius norm of the bus impulse response.

The second method is more accurate because the effect of resistance is also taken into account, especially for long buses where RC delay dominates LC delay. In this thesis, all results are obtained with the second method.
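The second delay estimate can be sketched as follows: compute the Frobenius norm of the impulse response matrix at each tap and take the tap where it peaks. The nested-list layout `bus[tau][w_out][w_in]`, the function names and the sample values are assumptions for illustration.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a matrix (list of rows)."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def estimate_delay_taps(bus):
    """Tap index at which the Frobenius norm of the impulse response peaks."""
    norms = [frobenius_norm(h) for h in bus]
    return max(range(len(norms)), key=norms.__getitem__)


if __name__ == "__main__":
    # Hypothetical 2-wire impulse response that peaks at the third tap.
    bus = [
        [[0.0, 0.0], [0.0, 0.0]],
        [[0.2, 0.1], [0.1, 0.2]],
        [[0.6, 0.2], [0.2, 0.6]],
        [[0.3, 0.1], [0.1, 0.3]],
    ]
    print(estimate_delay_taps(bus))
```

If several taps share the maximum norm, this sketch returns the earliest one.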

Let �� be the following matrix:

��

�� if � � � ��

� otherwise

where�

is the approximated delay in tap time and

� � � input��

The target signal � is given by

�� input � � ��

(4.46)


� Output signal

With above analysis, it is straightforward to express the output signal of the system

with�

fir � � fir filters in figure 1.1 using matrix multiplication. Let the vector output �� represent the output of the system in tap time:

output

col� � � h � �� fir

� �� input � � ��

col� � � h � �� input � � �� fir

� ��

h �� input � �� extend �� fir�� bus

�� filterExtend

��fir�� fir

�� bus�fir

(4.47)

where� �

input � � �fir� �

bus.

Let

� h � �� input � � �� extend �� fir

�� bus

�� filterExtend

��fir�� fir

�� bus�

(4.48)

Then

output � �

fir (4.49)

� Least Squares Problem

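With equations 4.46 and 4.49, the least squares problem is:

\[ \min_{\mathrm{fir}} \; \| A \cdot \mathrm{fir} - b \|_2 \]

where $A$ is the matrix defined in equation 4.48 and $b$ is the target signal of equation 4.46. Given $A$ and $b$, I used QR decomposition (i.e. the backslash command in Matlab) to find the vector fir that minimizes the least squares error of the over-determined system $A \cdot \mathrm{fir} = b$.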
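For readers without Matlab, the same kind of over-determined system can be solved with a small QR routine. The sketch below uses modified Gram-Schmidt, which is a simpler stand-in for the factorization Matlab performs internally and assumes the matrix has full column rank; the function name and the example data are hypothetical.

```python
def lstsq_qr(a, b):
    """Minimise ||a x - b||_2 for a full-column-rank matrix a (list of rows)."""
    m, n = len(a), len(a[0])
    # q[j] starts as column j of a and is orthonormalised in place.
    q = [[a[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):
            r[k][j] = sum(q[k][i] * q[j][i] for i in range(m))
            q[j] = [q[j][i] - r[k][j] * q[k][i] for i in range(m)]
        r[j][j] = sum(x * x for x in q[j]) ** 0.5
        q[j] = [x / r[j][j] for x in q[j]]
    # Solve the triangular system R x = Q^T b by back substitution.
    qtb = [sum(q[j][i] * b[i] for i in range(m)) for j in range(n)]
    x = [0.0] * n
    for j in reversed(range(n)):
        tail = sum(r[j][k] * x[k] for k in range(j + 1, n))
        x[j] = (qtb[j] - tail) / r[j][j]
    return x


if __name__ == "__main__":
    # Fit y = c0 + c1 * t to three hypothetical points (t, y).
    a = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    b = [1.0, 2.0, 4.0]
    print(lstsq_qr(a, b))
```

For the large, structured matrices that arise in filter design, a library solver would be the practical choice; the routine above is only meant to show the computation.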

4.3.3 An example

To show the effectiveness of the equalizing filter approach in cross-talk cancellation, con-

sider a 32-bit bus with length 5 cm. The electrical parameters of the bus are: � = 0.066


�/cm, � = 0.8 pF/cm,

�= 3.99 nH/cm,

� � � � = 0.31, � � � � = 0.23. Filter design parameters

are:�

fir� , � fir

�, taps per bit = 4, bit time = 400 ps, length of the training sequence,

�input � � � bits.

For this particular example, in equation 4.46 and 4.49,

� Bus width � bus = 32. Length of the bus impulse response,�

bus is set to be 16 taps.

Thus, h is a vector of length:�

bus � � bus��

.

� input is a vector of length: � bus ��

input �� .

� fir � � �� .

� �is a matrix of size

� � � � �� .

Pseudo-random input sequences are used as test sequences. By comparing the eye opening with and without the designed filter, we get an indication of the effectiveness of the filter.

Figure 4.3 shows the predistorted input signal on wire 1. The waveforms in figure 4.4 clearly show that the 8 × 2 filter greatly reduces the overshoot and undershoot, which is also shown by the eye diagrams in figure 4.5. With the 8 × 2 FIR filter, the eye height is increased from 31% to 82%. This suggests that equalizing filters are a very promising method for cross-talk cancellation on high-speed buses. More thorough testing results are presented in section 4.5.

4.4 Linear Programming Method with Worst-case Input

In this section, a linear programming method is introduced with the assumption that we can

solve the formulated linear programming problem.

Figure 4.3: Predistorted signal: equalizing filter output. [Plot of voltage (V) against t (ns), showing the input signal and the predistorted signal on wire 1.]

4.4.1 Motivation

Although the least squares optimization method works and the FIR filter described greatly

improves the eye height of signals transmitted, this method has several shortcomings.

• The filter designed by the LSQ optimization method depends greatly upon the pseudo-random input pattern used as the training sequence. To get a good filter design, a long training sequence must be used, which makes the filter design very slow for wide buses, which occur frequently in practice.

• The design objective is to transmit the bits without error. It is assumed that as long as a bit satisfies the eye specification, it will be received correctly. Thus, getting some bits that already satisfy the eye specification closer to the target signal doesn't matter. It's the worst-case pattern that determines the eye height. Thus, the $L_2$ metric doesn't strictly correspond to eye height, the metric defined in equation 4.1. For example, it is possible that for a training sequence, some filter coefficient set produces a very small RMS error but the output signal has one bad trace. It is that one bad trace which determines the eye height. Certainly, we can reformulate the same problem as a linear programming problem, such that for a given training sequence (pseudo-random input), the $L_\infty$ metric is minimized. However, in order to guarantee worst-case performance, ideally, all possible input combinations should be part of the training sequence. This is obviously not practical.

It turns out that for a given set of filter coefficients, the worst-case input pattern can be determined and thus the worst-case eye height can be computed. This section is devoted to this method, which minimizes the $L_\infty$ metric over all possible inputs.

Figure 4.4: Examples of output signal for a 32-bit interconnect network with (lower panel) and without (upper panel) 8 × 2 equalizing filters designed with the LSQ method. [Both panels plot voltage (V) against t (ns) and show the input signal and the output signal on wire 1.]

Figure 4.5: Eye diagrams for a 32-bit interconnect network with (lower panel) and without (upper panel) 8 × 2 equalizing filters designed with the LSQ method. Red traces indicate high signal transmitted. Blue traces indicate low signal transmitted. [Panel titles: system without filters, eye height 29%, eye width 75%; system with 8*2 filters, eye height 82%, eye width 75%. Both panels plot voltage (V) against t (ps).]

First, I show that the search space for the� �

metric is convex even when more general,

non-linear filters are considered. Formalize an eye height specification as a sequence of

tuples:

� ��

A filter�

satisfies�

if and only if for every input every output satisfies:

� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �� output

�� input

��

output�� input

�� (4.50)

where $k$ is the number of taps per bit, $\delta$ is the expected delay of the bus, output = h $*$ $f$(input), and $f$ is the filter function. Let $f \models E$ denote that filter $f$ satisfies eye $E$.

Let $f_1$ and $f_2$ be two filters that satisfy some eye opening constraint $E$. That is, $f_1 \models E$ and $f_2 \models E$. Let $f' = \alpha f_1 + (1 - \alpha) f_2$, where $0 \leq \alpha \leq 1$. Because the bus, h, is linear, a system with $f'$ produces output signals that are the same linear combination of what is produced by systems with $f_1$ and $f_2$. It then follows from equation 4.50 that $f' \models E$. Thus the space of filters that satisfy eye opening constraint E is convex.

The objective is to send -1, 1 signals down the bus as clearly as possible. In this

system, every wire of the bus has the same configuration and thus is interchangeable. Thus,

the original objective is the same as trying to send down 1 on wire 1 as clearly as possible

with the worst-case disturbances from other wires and preceding and following bits.

The output signal on wire 1 for the current bit is simply the sum of the effects on wire 1 at the current bit from

• the input signal on wire 1 for the current bit, which is the signal expected to come through if there is no disturbance.

• the input signal on wire 1 at other bit times and also the input signals on other wires for the current bit and other bits, which produce disturbances on the first wire at the current bit.

Thus, the optimization problem can be restated as the following: Given that the

current bit input on the first wire is 1, find the best set of filter coefficients that makes the

output signal on wire 1 at the current bit as close to 1 as possible for the worst-case input

sequence which produces the largest disturbances on wire 1 from other bit times and other

wires.

� ��

subject to

undisturbed�

disturbances�

��

undisturbed�

disturbances�

��

(4.51)


4.4.2 Linear Programming Problem formulation

I now focus on the practical case where the filter is linear and FIR, and show that the design problem is an instance of linear programming. The goal remains to send a 1 along wire 1 as clearly as possible. A quantified version of this goal is: for the worst-case input sequence with 1 at the current bit, the output signal at some given sampling time is as close to 1 as possible. That is, at this sampling point, the eye height is as high as possible. A reasonable sampling point is $\delta$, the delay of the bus. Equation 4.51 shows that to formulate the LP problem for the equalizing filter design, we need to know the undisturbed output at the sampling point and the largest total disturbance at the sampling point.

Let in be the input sequence that is 1 bit long and in which only the bit on the first wire is 1:

\[ \mathrm{in}(i) = \begin{cases} 1 & \text{if } i = 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.52} \]

Because the whole system is linear and circulant, the response to this pulse input in gives us all the information we need to compute the output for the worst-case disturbances. From section 4.3, for the system with $m_{fir} \times n_{fir}$ FIR filters, we know that the corresponding output is given by G $\cdot$ fir, where G is as given by equation 4.48:

G

h � �� in � � �� extend �� fir

�� bus

�� filterExtend

��fir�� fir

�� bus�

where� � � �

fir� �

bus is the length of the response in tap time. Different rows of G�fir

represent the response on some wire at some tap time. The contribution of the bit from

equation 4.52 to the output at the sampling time is:

undisturbed

row� � � � bus

�G� �

fir (4.53)

Responses from other wires and responses from the first wire arising from earlier and later bits are the disturbances. For example, the disturbance on wire 1 at the sampling time caused by input on the second wire $b$ bit times earlier is the same as the disturbance on wire 2 at the sampling time from the input on the first wire $b$ bit times earlier. Moreover, it is the same as the response of the original pulse input from equation 4.52 observed on wire 2 at the tap time that is $b$ bit times later than the sampling time. These equivalences are due to the linearity and symmetry of the system. Thus,

disturbance� � � � � row

�� bus� � � G � � fir (4.54)

where disturbance$(w, b)$ is the disturbance on the first wire at the sampling time given an input of 1 on the $w$-th wire $b$ bit times earlier. If the disturbances from other bit times and other wires are all positive, we get the largest total disturbance and hence the worst-case disturbances. Let d$(w, b)$ denote the worst-case, positive disturbance on wire 1 at the sampling time from the input on the $w$-th wire $b$ bit times earlier. Noting that each input to the filter is either +1 or -1, the following inequality constraints compute the absolute value function needed to obtain d$(w, b)$:

d� � � � � �

row�� bus

� � � G � � fir

d� � � � � � �

row�� bus

� � � G � � fir(4.55)

Because the cost function is positive monotonic in each of the d$(w, b)$, either the first constraint or the second constraint is tight at the optimal point.
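For a fixed set of filter coefficients, the worst case described above needs no search: every other bit is chosen to push the sampled value in the same direction, so the total disturbance is the sum of absolute values. The sketch below illustrates this with hypothetical numbers; the function names are assumptions, and the disturbance list stands for the values disturbance$(w, b)$ over all wires and bit offsets.

```python
def worst_case_range(undisturbed, disturbances):
    """Lowest and highest sampled value over all +1/-1 choices of the other bits."""
    total = sum(abs(d) for d in disturbances)
    return undisturbed - total, undisturbed + total


def worst_case_pattern(disturbances):
    """Bits (+1 or -1) for the other inputs that pull the sampled value lowest."""
    return [-1 if d > 0 else 1 for d in disturbances]


if __name__ == "__main__":
    undisturbed = 0.9                          # hypothetical response to the current bit
    disturbances = [0.125, -0.0625, 0.03125]   # hypothetical coupling terms
    print(worst_case_range(undisturbed, disturbances))
    print(worst_case_pattern(disturbances))
```

The pair of inequality constraints in the text plays the same role as `abs` here, but in a form a linear programming solver can handle.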

Let��

be the matrix that contains all the rows in G that matter. The total number of

rows are � � � �� . To calculate the total disturbances from other bit times and wires, ideally,

an infinitely long history should be considered because of the infinite impulse response of the bus. This is not practical. Notice that most of the energy of the impulse response extends over about 6 times the LC delay of the bus (see figure 4.6). For the particular bus model presented in this thesis, the LC delay is about 250 ps. All the results presented here are obtained with:

\[ m_{bus} = 6 \cdot \text{length} \cdot \sqrt{LC} \quad \text{(tap time)} \]

Figure 4.6: Frobenius norm of the bus impulse response. [Plot of the Frobenius norm of the impulse response of the bus against t (ps), with the peak and the future and history regions marked.]

Moreover, notice that the bus impulse response does not rise immediately to the peak. This

means that not only the history bits affect the output of the current bit but also a few future

bits. Among the � � � �� rows, there are � � � � � future bits, 1 current bit and the rest are history

bits. For � � � � � � bus� � � � � � � � � � � � � � � � � � ��

row�� bus

� � � � � � row��

bus��

G�

(4.56)

Thus row� � � � �� bus

� � � �� fir gives the undisturbed output. Here is the equalizing filter

design problem as a linear programming problem in fir�d�

� :


� ��

�

�� row� � � � � � � � bus

� � � ��

��

�row� � � � � � � � bus

� � � � � � � � �

��

��

�fir

d

�

��

��

�

� �

��(4.57)

4.4.3 Smoothing filter

It was found that if we average a few taps of the current output bit and use that as the

objective function of the LP problem, the eye height obtained is better than simply asking

the optimizer to bring one tap of the current output bit as close to 1 as possible. However,

a corresponding smoothing filter is needed at the receiver end in order to get the desired

output signal. Fortunately, such averaging behavior is typical of the input circuits on real

chips [10]. The new system structure is shown in figure 4.7. Smoothing filters will be

further examined in section 4.7.

Assuming we are averaging over 3 taps (a more sophisticated strategy will be dis-

cussed in section 4.7), define a smoothing operator:

smooth� � �

�

� � � � � � � � � � � � � �� ...

......

� � � � � � � � � �

�� (4.58)
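A three-tap average of this kind can be sketched as below. The alignment of the window is an assumption: this version averages each tap with the two taps before it and treats samples before the start of the signal as zero, which may differ from the exact operator in equation 4.58.

```python
def smooth(signal, width=3):
    """Trailing moving average: each output tap is the mean of the current tap
    and the previous width - 1 taps, with missing earlier taps counted as zero."""
    out = []
    for t in range(len(signal)):
        window = signal[max(0, t - width + 1):t + 1]
        out.append(sum(window) / width)
    return out


if __name__ == "__main__":
    # Hypothetical tap-rate signal: three high taps followed by two zero taps.
    print(smooth([3.0, 3.0, 3.0, 0.0, 0.0]))
```

Because the average is a fixed linear combination of taps, it can be written as a matrix and folded into G, which is how the text uses it.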

Figure 4.7: System with smoothing filter at the receiver end. [Block diagram: Transmitter, Equalizing filter, Bus, Smoothing filter, Receiver.]

With the smoothing operator, now

G

smooth h �� in � � �� extend �� fir�� bus

�� filterExtend

��fir�� fir

�� bus�

(4.59)

Moreover, the delay of the bus might not be the best sampling point. It was found

that the best sampling point depended on the filter size and the bit time. For example, at

300 ps bit time, � � � equalizing filters designed with 1 tap extra delay in addition to the

bus delay give the highest eye height (81%) among all � � � filters. It is also better than the

system without the smoothing filter (74%). In the rest of this thesis, all testing results are

obtained with the extra delay varied to give the best eye height.

4.4.4 An example

The following example shows the effectiveness of the equalizing filter approach (LP method)

in cross-talk cancellation. I use the same bus parameters and filter size as the example given

in section 4.3.3. A pseudo-random test sequence is used.

For this particular example, the LP problem formulated has the following properties:

� fir � � �� .

• number of disturbance variables, |d|: 223. Thus, the total number of variables is 240.

• number of constraints: 448.

From figures 4.5 and 4.9, note that the eye-height for the filter designed by the LP

method (� �

norm) is slightly higher than that for the LSQ filter (� �

norm), � �� vs. � �� .

As expected, optimizing for eye-height produces greater actual eye-height than the “average

case” optimization of the LSQ method. The eye width for LP is significantly smaller than

that for LSQ,� � � vs. �

� �. This is expected because the LP filter is optimized for eye-

height at a specific sampling point, whereas the LSQ objective function considers the entire

waveform. Section 4.5 presents further comparisons.

The speed of FIR filter design with the LP method largely depends on the size of the

LP problem formulated. Thus it depends on how many bits (number of disturbances) are

used to design the filter and the size of the filter. The number of disturbances is determined

by the length of the bus impulse response in bit time. The smaller the bit time, the larger the

LP problem. For an �� filter design at 400 ps, on a Linux box with a 800MHz Pentium

III CPU and 256MB memory, it finishes within a few seconds. Based on this method, I

investigated other variations of linear FIR filters, such as time-variant linear FIR filters and

other types of smoothing filters.

4.5 Testing results: Comparison of LSQ method and LP method

4.5.1 Worst-case input sequence

In section 4.3.3 and 4.4.4, pseudo-random input sequences were used to measure the eye

opening (eye height and eye width). By comparing the eye opening with and without the

filter designed, we get an indication of the effectiveness of the filter. A shortcoming of
using pseudo-random input sequences for testing is that the result varies considerably from
run to run if the input sequence is not long enough, while simulation with a very long input

[Figure 4.8 plots: voltage (V) versus t (ns), 0 to 60 ns, showing the input signal and the output signal on wire 1. Upper panel: system without filters. Lower panel: system with 8*2 filters.]

Figure 4.8: Example of output signals for systems with (lower panel) and without (upper panel) the equalizing filter designed with the LP method.

[Figure 4.9 plots: eye diagrams, voltage (V) versus t (ps), 0 to 800 ps. Upper panel title: "Eye diagram for system without filters (eye height 29%, eye width 75%)".]

Figure 4.9: Pseudo-random test: eye diagrams for systems with (lower panel) and without (upper panel) the equalizing filter designed with the LP method. Red traces indicate high signal transmitted. Blue traces indicate low signal transmitted.

sequence takes a long time. Inspired by the LP filter design procedure, I used the worst-

case input sequence for each filter instead of pseudo-random input sequence as the testing

sequence. Since the total disturbance from other bit times and other wires is the largest for

the worst-case input, the eye opening is the smallest among all input sequences and hence

the most representative.

For a given set of filter coefficients, the worst-case input sequence input with length

� � � �� (where�

is defined in equation 4.48, � is the number of taps per bit) can be found

by:

• For every wire i and every bit j, calculate the resulting disturbance on wire 1 at a
given sampling time. If the filter coefficients are obtained with the LP method, the
sampling time is the same as the one used in the LP filter design. For an arbitrary
set of filter coefficients, the sample point is not defined in advance. Instead, I consider
every tap time as a possible sample point and select the one with the best eye height
as the sample point. Accordingly, the worst-case input sequence is determined by
finding the worst-case input for each possible sampling time and concatenating these
sequences together.

• Set the input for the bit being sampled on wire 1 to 1. We are looking for the largest
negative disturbances when 1 is sent. The negation of this input sequence is also a
worst-case input sequence.

• For every other wire/bit pair (i, j): if disturbance(i, j) < 0, set input(i, j) = 1;
otherwise set input(i, j) = −1.
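The sign-selection rule for the worst-case input can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `disturbance[i][j]` is taken to be the contribution at the sample point on the observed wire when a +1 is sent on wire i at bit j, inputs are ±1, and the names `worst_case_input` and `sampled_output` as well as the numbers are hypothetical.

```python
def worst_case_input(disturbance, victim=(0, 0)):
    """Choose +1/-1 for every (wire, bit) so the sampled value is smallest.

    The victim entry is the bit being sampled and is fixed to +1.
    Every other entry gets the sign that makes its contribution
    negative (or zero).
    """
    inputs = []
    for i, row in enumerate(disturbance):
        bits = []
        for j, d in enumerate(row):
            if (i, j) == victim:
                bits.append(1)
            else:
                bits.append(1 if d < 0 else -1)
        inputs.append(bits)
    return inputs


def sampled_output(disturbance, inputs):
    """Superpose all contributions at the sample point."""
    return sum(d * b
               for d_row, b_row in zip(disturbance, inputs)
               for d, b in zip(d_row, b_row))


# Hypothetical responses: 0.9 is the main response of the sampled bit,
# the other entries are disturbances from other bits and another wire.
disturbance = [[0.9, 0.05, -0.02],
               [0.04, -0.03, 0.0]]
worst = worst_case_input(disturbance)
worst_value = sampled_output(disturbance, worst)
```

With these numbers the sampled value is the main response minus the sum of the absolute disturbances, which is the quantity the worst-case test is meant to expose.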

In this section, all testing results are obtained with input sequences that are con-

catenations of the worst-case input sequence and pseudo-random input sequences, unless

otherwise indicated.

[Figure 4.10 plots: two eye diagrams, voltage (V) versus t (ps), 0 to 600 ps.]

Figure 4.10: Worst-case test (upper panel) vs. pseudo-random test (lower panel): eye diagrams for systems with 8 × 3 equalizing filters designed with the LP method. Red traces indicate high signal transmitted. Blue traces indicate low signal transmitted.

[Figure 4.11 plot: eye height (0% to 100%) versus filter width (0 to 20), one curve each for filters with 4, 8, 12 and 16 taps.]

Figure 4.11: Worst-case performance of different equalizing filters designed with the LPmethod. Only equalizing filters with eye width greater than 25% are shown. Simulationparameters are: � � � � � � /cm, � � � pF/cm,

� � �� nH/cm,

� � � � = 0.31, � � � � =0.23. Filter design parameters are: taps per bit = 4, bit time = 300 ps.

The upper panel of figure 4.10 shows an eye diagram obtained with such a testing
sequence. Comparison with the eye diagram in the lower panel shows that the worst-case
input sequence occurs rarely, and that there is a significant difference between the
worst-case eye height (80%) and the pseudo-random eye height (92%). This suggests that if a
certain bit error rate can be tolerated, for example by using an error-correcting code, the
maximum bit rate could be improved further.

4.5.2 Indirect coupling

Figure 4.12: Indirect coupling between non-adjacent lines

For simplicity, the distributed coupled RLC model only considers capacitive and inductive
coupling between adjacent lines. Thus, I originally thought that it should be enough for the
filter design to use only information on the line being equalized and its two nearest
neighbours. For a three-bit interconnect, this method considers all wires and therefore does
give the optimal result. However, for a bus with more than three lines, considering more
lines than just the adjacent ones (increasing the filter width) achieves better
cross-talk cancellation. Figure 4.11 shows the performance of filters with

different sizes at 300 ps bit time. Compared with an � � � filter, the � � � filter has a much

greater eye opening. This indicates that although the direct coupling between non-adjacent

lines is weak and ignored in the interconnect network model, the indirect coupling between

non-adjacent lines is strong and shouldn’t be ignored in the equalizing filter design. It is

also shown in figure 4.11 that for the bus considered this trend nears its asymptote when

filter width is larger than 4.

The indirect coupling between non-adjacent lines is illustrated in figure 4.12. From

figure 4.12, a pattern of transfer function in the frequency domain was conjectured. That is,

� � � � � � � ��

� � � � � � ��

� � � � � � �� If this pattern existed, a simple filter considering all lines could be designed. Unfortunately,

since we are considering far end noise cancellation, which is not only a function of input

voltage but also a function of distance � , this speculated pattern does not occur in practice.

[Figure 4.13 plot: eye diagram, voltage (V) versus t (ps), 0 to 600 ps, vertical axis from −15 to 15. Panel title: "eye diagram for system with 8*16 filters (eye height 99%, eye width 0%)".]

Figure 4.13: Eye diagram for system with 8 × 16 equalizing filters designed by the LP method. Red traces indicate high signal transmitted. Blue traces indicate low signal transmitted.

4.5.3 Over-fitting

Compared with the LSQ method, the LP method is fast and guarantees worst-case per-

formance. However, it has an over-fitting problem. Figure 4.14 shows the trend that the

magnitude of overshoot increases with the size of equalizing filter designed. It suggests

that with more degrees of freedom, the optimizer tends to put more energy into the filter in

order to get higher eye height at the sampling time, which results in much greater overshoot.

For some inputs, the output signal changes abruptly but comes close to the target right at
the sampling time, resulting in a larger eye height but also a much smaller eye width. This
effect is clearly visible in figure 4.13, which shows an eye diagram obtained with an 8 × 16
filter designed with the LP method. Its eye height is the largest among all filters
with length 8, but the eye is barely open and the overshoot is enormous.

[Figure 4.14 plots: overshoot (V) on a logarithmic scale from 1 to 1000000. Upper panel: overshoot versus number of filter coefficients (0 to 300). Lower panel: overshoot versus filter width (0 to 20), one curve each for filters with 4, 8, 12 and 16 taps.]

Figure 4.14: Magnitude of overshoot increases with the size of the equalizing filter designed with the LP method. Upper panel shows this trend with the number of filter coefficients as the x axis. Lower panel shows this trend with the filter width as the x axis, for a given filter length. All simulations are done with the same parameters as in figure 4.11.

The lower panel of figure 4.14 shows that for a given filter length, the magnitude of overshoot
increases with the filter width. For filter widths less than 6, however, the magnitude of the
overshoot does not follow a similar trend with the filter length. Thus, the filter width plays a
more critical role than the filter length in the over-fitting problem. This could be explained by
the fact that wires further away produce less disturbance on a wire than history bits on the wire
itself. Thus, with a longer filter, the optimizer can easily push the eye height up without
putting more energy into the filter. When the filter is long enough to cover the most
significant portion of the bus impulse response, this trend reaches its limit. As figure 4.11
shows, at 300 ps bit time, filters with a length of 8 taps already do a very good job
of cross-talk cancellation. Compared with 12 tap filters, 16 tap filters do not significantly
improve the eye height.

Moreover, for the bus considered, the improvement of eye height from increasing the filter
width also stops when the filter width is more than 4 (see figure 4.11). So very long and wide
filters bring no further benefit, yet the filter becomes more and more
complicated and expensive. In this sense, the over-fitting problem of the LP method may not be a
serious problem in practice.

Furthermore, instead of trying to bring 1 tap as close to 1 as possible, the LP method
can easily be formulated to bring 2 taps as close to 1 as possible. This avoids sharp
transitions and hence decreases the amount of overshoot. In practice, this method
reduces the severity of the over-fitting problem of the LP method when designing
large filters.

4.5.4 Minimum bit time

To simplify design and yet achieve reasonable cross-talk cancellation, an important question
is how many lines away should be considered when designing the equalizing filter. In other
words, what is an appropriate width for the filter? Another question is how many taps (filter
length) we should use. Obviously, the longer the filter, the better the noise cancellation, but
the more expensive the design.

Taps   Width   Minimum bit time (ps)
               LP      LSQ
  4      1     679     684
  4      2     338     575
  4      3     332     551
  4      4     298     390
  4      6     298     390
  4      8     240     390
  4     16     228     400
  8      1     679     684
  8      2     240     343
  8      3     234     343
  8      4     228     343
  8      6     200     343
  8      8     200     343
 12      1     679     684
 12      2     234     297
 12      3     200     230
 12      4     200     228
 16      1     679     684
 16      2     234     297
 16      3     200     230
  0      0         740

Table 4.1: Performance of equalizing filters with different sizes for a bus 32-bits wide and 5 cm long.

Table 4.1 shows simulation results with the filter length and width varied. To evaluate
the performance of an equalizing filter, I use the maximum operating frequency (minimum
bit time) at which the height of the eye is around 50% and the eye width is over 25%. In
these simulations, design parameters are the same as in figure 4.10. Table 4.1 shows that:

� Equalizing filters designed with both the LP method and the LSQ method effectively

improve the maximum bit rate of the bus.

� Equalizing filters designed with the LP method have better performance than equal-

izing filters designed with the LSQ method for every configuration considered. As

discussed further below, the advantage for the LP method is most pronounced for

wide filters.

� An � � � filter is a good choice in terms of performance and cost. Although An � � �

filter does improve the eye height at lower bit rate (see figure 4.11), it has similar

minimum bit time as the � � � filter.

Note that width = 1 is separate pre-emphasis for each line. With width = 1, LSQ and LP
have similar performance. Because the focus of this work is on cross-talk cancellation,
high-frequency attenuation caused by skin effect is not built into the bus model. Because
of this, the performance of the system without filters (width = 0) and of systems with
pre-emphasis filters (width = 1) is similar (740 ps vs. 679 ps). With cross-talk cancellation
(width ≥ 2), the performance of the bus is greatly improved (2.7 to 3.4 times higher bit rate
than independent pre-emphasis). With width ≥ 2, LP is significantly better than LSQ. This
might not be a completely fair comparison because all the LP results were obtained with
an additional smoothing filter at the receiver end, whereas the LSQ results were obtained
without any smoothing filter. The LSQ method can easily be applied to the system with
a smoothing filter at the receiver end. However, the LSQ method with smoothing is no
better than LSQ with an increased number of taps. For example, 8 tap filters designed by
the LSQ method with a 3 tap smoothing filter are no better than 12 tap filters designed by
the LSQ method without any smoothing filter. Table 4.1 shows that the performance of
the LP method is better than this upper bound on the performance of the LSQ method with
a smoothing filter. It appears that LP and LSQ are comparably well suited for designing filters
for independent pre-emphasis. However, LP is much better for cross-talk cancellation. The
table also shows that cross-talk cancellation is essential to obtain high bit rates with wide
buses.

Table 4.2 shows simulation results with the filter length and width varied for buses
20 cm long and 50 cm long. Again, the performance of the equalizing filters is quantified by
the minimum bit time at which the height of the eye is around 50% and the eye width is over
25%. All filters are designed with the LP method.

Taps   Width   Minimum bit time (ps)
               20 cm bus   50 cm bus
  4      1       2354        6627
  4      2       1419        2510
  4      3       1341        2446
  4      4       1264        2432
  4      6       1264        2432
  4      8       1108        2432
  8      1       2295        6005
  8      2       1030        2393
  8      3        932        2198
  8      4        874        2198
  8      6        835        2042
 12      1       2295        6005
 12      2        932        2315
 12      3        797        2042
 12      4        777        1963
  0      0       2734        6807

Table 4.2: Performance of equalizing filters with different sizes for buses 32-bits wide. All filters designed using the LP method.

4.6 Time-variant Linear FIR Filter

A bit time consists of several tap times. In most examples presented previously, there are

4 taps per bit. Among them, the first tap and the last tap are bit-transition taps. The other

two are stable taps. The receiver samples stable taps and is insensitive to the input value at

bit-transition taps. The FIR filter designed above has no information about bit transitions. It
treats every tap the same, no matter whether it is a bit-transition tap or a stable tap. What if
we treat them differently and let the filter know about bit transitions? This leads
to the design of a time-variant linear FIR filter. The idea is to assign a set of filter coefficients
to each tap position within a bit. For example, for an 8 × 3 FIR filter and 4 taps per bit, the
time-variant linear FIR filter contains 4 sets of 8 × 3 filter coefficients. In this way, the
optimizer can differentiate transition taps from stable taps and assign different filter
coefficients to their corresponding filters. The way this time-variant filter works is illustrated
in figure 4.15. In the figure, different sets of filter coefficients are indicated by different line
styles.

This problem can be formulated as a linear programming problem in the same way as the
original simple linear FIR filter design. The only difference between the two linear
programming formulations is the way the filter output is calculated. Let
$\mathrm{fir}_1, \ldots, \mathrm{fir}_k$ denote the $k$ sets of filter coefficient vectors.
The time-variant linear FIR filter coefficient vector is:

$$F = \begin{bmatrix} \mathrm{fir}_1 \\ \mathrm{fir}_2 \\ \vdots \\ \mathrm{fir}_k \end{bmatrix} \tag{4.60}$$

[Figure 4.15: diagram of the convolution procedure, with the four coefficient sets labelled FIR 1, FIR 2, FIR 3 and FIR 4.]

Figure 4.15: The convolution procedure of the time-variant FIR filter. Different sets of filter coefficients are indicated by different line styles.

For a given input sequence input in bit time, the filter output is:

filterOutput

shuffle

�

in

in

� � in

in

��

F (4.61)

where in �

input � � �� with� ��

input � � �fir� �

bus� � �� , and shuffle �

�� is the following matrix:

shuffle� � � � �

�� if� � div � � � � �� and

� � �� div� � � �

� otherwise(4.62)

The correctness of the time-variant FIR filter design is checked by adding a set
of equality constraints which specify that all four sets of filter coefficients are equal. With
these equality constraints, the design gives the same set of filter coefficients as the
simple FIR filter designed in section 4.4, and the same objective value. So the time-variant
FIR filter should be at least as good as the simple FIR filter designed in section
4.4. For example, in the case of 4 taps per bit and the same design parameters as for figure 4.10,
a time-variant 8 × 3 FIR filter has a worst-case eye height of 86% (vs. 80% for the simple 8 × 3
FIR filter). The improvement in eye height shows that it does help to assign different filter
coefficients to different taps. However, the benefit might not be large enough to justify the
extra cost in an implementation.
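The idea of switching coefficient sets by tap position, and the consistency check with equal sets, can be illustrated with a small single-wire Python sketch. It is only an assumption-laden illustration: the thesis filters span several wires, whereas `fir` and `time_variant_fir` here act on one tap-rate sequence, the choice that output tap t uses set number t mod (number of sets) is an assumed phase alignment, and all coefficient values are made up.

```python
def fir(coeffs, x):
    """Plain causal FIR filter: y[t] = sum_k coeffs[k] * x[t - k]."""
    return [sum(c * x[t - k] for k, c in enumerate(coeffs) if t - k >= 0)
            for t in range(len(x))]


def time_variant_fir(coeff_sets, x):
    """FIR filter whose coefficient set depends on the tap position.

    Output tap t is computed with coeff_sets[t % len(coeff_sets)],
    so with 4 taps per bit each tap position has its own set.
    """
    period = len(coeff_sets)
    return [sum(c * x[t - k]
                for k, c in enumerate(coeff_sets[t % period])
                if t - k >= 0)
            for t in range(len(x))]


# Two bits at 4 taps per bit (hypothetical +1 then -1).
x = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
base = [0.5, 0.3, 0.2]

# With all four sets equal, the time-variant filter reduces to the plain one.
same = time_variant_fir([base] * 4, x)

# Hypothetical sets that treat the two transition taps differently.
varied = time_variant_fir(
    [[0.7, 0.2, 0.1], base, base, [0.4, 0.4, 0.2]], x)
```

Because the equal-set case reproduces the plain filter exactly, the time-variant design space contains the simple design as a special case.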

4.7 Optimized Smoothing Filter

The system structure shown in figure 4.7 naturally leads to the topic of optimized smoothing
filter design. In previous sections, a smoothing filter which simply averages over 3 taps was
used. Test results show that this is a good choice. However, it is not optimal. For example,
the eye diagrams suggest that the weights assigned to the 3 taps should not be the
same. The first tap and the last tap are closer to a bit transition and should have smaller
weights. The middle tap contributes more to the eye height and should be assigned a larger
weight. Moreover, there is no reason to limit the window size of the smoothing filter to only
3 taps.

The system where the coefficients of both the equalizing filter and the smoothing
filter are taken as variables is not linear, because the output values of the bus depend on the
product of the coefficients of the two filters. Thus, an LP formulation is no longer possible in
this case. The following simple strategy addresses this problem. Given a set of smoothing

filter coefficients, a set of optimal equalizing filter coefficients and the resulting optimal

objective value can be easily computed with the LP method presented previously. Thus the

LP method can be treated as a function with smoothing filter coefficients as its variables and

the objective value � as its return value. Then a general optimizer (in particular, fmincon()

Filters           Smoothing filter                       Eye height at
(taps × width)                                           bit time 300 ps
-                 -                                      7%
8 × 3             No smoothing filter                    74%
8 × 3             simple averaging over 3 data points    81%
8 × 3             optimized 3 tap window                 86%
8 × 3             optimized 9 tap window                 95.5%

Table 4.3: Performance of different smoothing filters with 8 × 3 equalizing filters designed by the LP method at 300 ps.

provided by the Matlab Optimization Toolbox) can be used to minimize this function over the
space of smoothing filter coefficients. This strategy does not guarantee finding the
optimal solution. In fact, it does not even indicate how close we are to the
optimal point. However, it does find better smoothing filter coefficients and equalizing filter
coefficients, in terms of a lower objective value and a larger eye height (see table 4.3).

Table 4.3 shows the effectiveness of this approach. In this example, the bus is a

32-bit bus with a length of 5 cm. The electrical parameters of the bus are: � = 0.066�

/cm,

� = 0.8 pF/cm,�

= 3.99 nH/cm,� � � � = 0.31, � � � � = 0.23. Other simulation parameters are:

taps per bit = 4,�

fir� , � fir

� .

4.8 Summary

In this chapter, I described two methods of linear equalizing filter design: the LSQ method

corresponding to� �

metric and the LP method corresponding to� �

metric. The simula-

tion results presented in section 4.5 demonstrate that the LP method outperforms the LSQ

method and provides effective methods for designing cross-talk canceling equalizing filters

that greatly increase the bandwidth of high-speed digital buses. Based on the linear FIR

filter designs, time-variant FIR filter design and optimized smoothing filter design were


presented.


Chapter 5

Predictor-Corrector Algorithm with

Model Reduction

As shown in chapter 4, an optimal filter design problem can be formulated into the following

LP problem:� ��

(5.1)

In chapter 4, the LP method was introduced on the assumption that the LP problem
formulated could be solved. It turns out that the linprog() routine provided by Matlab does
not converge when more than 7 bits are used to design the filter. In the case of large-scale
problems, linprog() implements LIPSOL (Linear Interior Point Solver [21]), which is
a variant of Mehrotra’s predictor-corrector algorithm, a primal-dual interior-point method.
It is known that when approaching the optimal solution, the system gets more and more
ill-conditioned, which may eventually lead to non-convergence. In this chapter, I describe
an approach that I implemented to overcome the ill-conditioning problem and that can be used
to solve problems where $A$ is too large to be given explicitly. This approach employs
Mehrotra’s predictor-corrector algorithm along with a model reduction technique.

Section 5.1 introduces Mehrotra’s predictor-corrector algorithm. Then, several major
issues in implementation are discussed. Section 5.3 is devoted to the model reduction
technique that I used to overcome the problem of ill-conditioning. Although the method
introduced here is used to solve the linear filter design problem (with $A$ explicitly given),
it could easily be adapted to solve linear programs with $A$ given implicitly, by using an
iterative solver instead of Cholesky factorization to solve the linear systems encountered.

5.1 Mehrotra’s predictor-corrector algorithm

Primal-dual interior point methods outperform the simplex method on many larger prob-

lems and perform better than other interior point methods [14]. Among many general al-

gorithmic approaches, the most effective one in practice has proven to be the primal-dual

infeasible-interior-point approach, including a number of variants and enhancements such

as Mehrotra’s predictor-corrector technique [13]. The Matlab function linprog() implements
a variant of this algorithm in the case of large-scale problems.

Consider the LP problem in standard form:

$$\min_x \; c^T x \quad \text{subject to} \quad Ax = b, \; x \ge 0 \tag{5.2}$$

where $A \in \mathbb{R}^{m \times n}$, which determines the sizes of the other vectors involved. The dual problem
for equation 5.2 is

$$\max_{y, s} \; b^T y \quad \text{subject to} \quad A^T y + s = c, \; s \ge 0 \tag{5.3}$$

It is well known that primal-dual solutions of equations 5.2 and 5.3 are characterized
by the Karush-Kuhn-Tucker conditions [14]:

$$F(x, y, s) = \begin{bmatrix} A^T y + s - c \\ Ax - b \\ XSe \end{bmatrix} = 0, \qquad (x, s) \ge 0 \tag{5.4}$$

where

$$X = \mathrm{diag}(x_1, \ldots, x_n), \quad S = \mathrm{diag}(s_1, \ldots, s_n), \quad e = (1, \ldots, 1)^T.$$

The system of equations 5.4 can be solved by applying Newton’s method and carrying out
a line search to enforce the non-negativity constraints on $x$ and $s$. Unfortunately, often
we can only take a small step before the non-negativity constraints are violated. Therefore,
the pure Newton’s method with line search converges very slowly in this case. Rather
than solving the system of equations 5.4 directly, primal-dual interior point methods introduce
the concept of a central path. The central path is parameterized by a scalar $\tau$, and consists of
the set of points that are solutions of the following system for $\tau > 0$:

$$\begin{bmatrix} A^T y + s - c \\ Ax - b \\ XSe \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \tau e \end{bmatrix}, \qquad (x, s) > 0 \tag{5.5}$$

The role of $\tau$ is to enforce that all the complementarity products $x_i s_i$ have the same
value for all indices. Hence, the central path keeps iterates biased towards the interior of
the nonnegative orthant $(x, s) \ge 0$. As $\tau$ approaches 0, the solution of the system 5.5
approaches the optimal solution $(x^*, y^*, s^*)$, which is the solution of the system 5.4.
In practice, $\tau$ is defined as the product of a centering parameter $\sigma$ and a complementarity
gap $\mu$, where $\sigma \in [0, 1]$ and $\mu = x^T s / n$.

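Mehrotra’s predictor-corrector algorithm [13][14] implements the basic ideas
described above with an extra second-order correction. It consists of three major steps.

Given an initial point $(x^0, y^0, s^0)$ with $(x^0, s^0) > 0$.

For $k = 0, 1, 2, \ldots$

Predictor step: This step computes the pure Newton (affine-scaling) direction
$(\Delta x^{\mathrm{aff}}, \Delta y^{\mathrm{aff}}, \Delta s^{\mathrm{aff}})$ by solving:

$$\begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x^{\mathrm{aff}} \\ \Delta y^{\mathrm{aff}} \\ \Delta s^{\mathrm{aff}} \end{bmatrix} = \begin{bmatrix} -r_c \\ -r_b \\ -XSe \end{bmatrix} \tag{5.6}$$

where $r_b = Ax - b$ and $r_c = A^T y + s - c$ are the residuals in primal and dual
feasibility respectively.

Adaptive approach to compute the centering parameter $\sigma$: This parameter is calculated in
terms of the complementarity gap at the current point and the complementarity gap
after a hypothetical step in the affine-scaling direction is taken. The step sizes in the
affine-scaling direction are calculated by

$$\alpha^{\mathrm{pri}}_{\mathrm{aff}} = \max\{\alpha \in [0, 1] : x + \alpha \Delta x^{\mathrm{aff}} \ge 0\}, \qquad \alpha^{\mathrm{dual}}_{\mathrm{aff}} = \max\{\alpha \in [0, 1] : s + \alpha \Delta s^{\mathrm{aff}} \ge 0\} \tag{5.7}$$

$$\mu_{\mathrm{aff}} = (x + \alpha^{\mathrm{pri}}_{\mathrm{aff}} \Delta x^{\mathrm{aff}})^T (s + \alpha^{\mathrm{dual}}_{\mathrm{aff}} \Delta s^{\mathrm{aff}}) / n$$

Then set the centering parameter to $\sigma = (\mu_{\mathrm{aff}} / \mu)^3$. Thus, the centering parameter
is small when good progress can be made in the affine direction and large when the
affine direction produces little improvement and more centrality is needed. This choice
trades off between the twin goals of reducing $\mu$ and improving centrality.

Corrector step: This step solves the following equations to get a corrected, centered step
direction. It is essentially a step based on the Taylor series expansion of the complementarity
equations [15].

$$\begin{bmatrix} 0 & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \\ \Delta s \end{bmatrix} = \begin{bmatrix} -r_c \\ -r_b \\ -XSe - \mathrm{Diag}(\Delta x^{\mathrm{aff}}) \, \mathrm{Diag}(\Delta s^{\mathrm{aff}}) \, e + \sigma \mu e \end{bmatrix} \tag{5.8}$$

Compute the step sizes $\alpha^{\mathrm{pri}}$ and $\alpha^{\mathrm{dual}}$ in the same way as in equation 5.7, and update
$(x^{k+1}, y^{k+1}, s^{k+1}) = (x^k + \alpha^{\mathrm{pri}} \Delta x, \; y^k + \alpha^{\mathrm{dual}} \Delta y, \; s^k + \alpha^{\mathrm{dual}} \Delta s)$.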
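The step-size rule and the adaptive centering parameter can be sketched as follows. This is a minimal Python illustration assuming the usual textbook form of Mehrotra's heuristic, with a ratio test for the step length and the cube of the gap ratio for the centering parameter; the function names and the numerical values are hypothetical, and the safeguards a production solver adds (for example, stepping slightly short of the boundary) are left out.

```python
def max_step(v, dv):
    """Largest alpha in [0, 1] such that v + alpha * dv stays >= 0."""
    alpha = 1.0
    for vi, dvi in zip(v, dv):
        if dvi < 0:
            alpha = min(alpha, -vi / dvi)
    return alpha


def centering_parameter(x, s, dx_aff, ds_aff):
    """Return (sigma, mu, mu_aff) for the current iterate.

    mu is the current complementarity gap x's/n, mu_aff the gap after
    a hypothetical step in the affine-scaling direction.
    """
    n = len(x)
    mu = sum(xi * si for xi, si in zip(x, s)) / n
    a_pri = max_step(x, dx_aff)
    a_dual = max_step(s, ds_aff)
    mu_aff = sum((xi + a_pri * dxi) * (si + a_dual * dsi)
                 for xi, si, dxi, dsi in zip(x, s, dx_aff, ds_aff)) / n
    return (mu_aff / mu) ** 3, mu, mu_aff


# Hypothetical iterate and affine-scaling direction.
sigma, mu, mu_aff = centering_parameter(
    x=[1.0, 2.0], s=[2.0, 1.0],
    dx_aff=[-2.0, 1.0], ds_aff=[1.0, -0.5])
```

In this example the affine step reduces the gap from 2 to 0.625, so the resulting centering parameter is small and the corrector step stays close to the affine direction.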

5.2 Implementation Details

With the above framework, we still need to specify the following: (1) the initial point and the
stopping criteria; (2) how to solve linear systems 5.6 and 5.8.

5.2.1 Starting and Stopping

The starting point selection is based on [12]. Matlab lsqr() solver solves the system� � �

and makes an initial estimate to the primal variable � , denoted as � . Then define �

� ��

� � �� and � � � � � � � , where � � � denotes

� norm.

Then, for each � � � � � �� , set ��

�and �

��

��. At last compute

�

given� � �

�

�.

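The stopping criterion is the standard one [21]:

$$\mathrm{error} = \frac{\|r_b\|}{\max(1, \|b\|)} + \frac{\|r_c\|}{\max(1, \|c\|)} + \frac{|c^T x - b^T y|}{\max(1, |c^T x|, |b^T y|)} \le \mathrm{tol} \tag{5.9}$$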

where �� , �� are the residuals in primal and dual feasibility. In this implementation,��

is

� � � � by default.
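A total relative error of this kind can be computed as in the Python sketch below. The formula used, the sum of the relative primal residual, the relative dual residual and the relative duality gap, is the LIPSOL-style criterion as I understand it and should be read as an assumption here; the function name `total_relative_error` and the tiny test problem are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a list of numbers."""
    return math.sqrt(sum(t * t for t in v))


def total_relative_error(A, b, c, x, y, s):
    """Relative primal residual + relative dual residual + relative gap.

    Standard form assumed: min c'x  s.t.  Ax = b, x >= 0,
    with dual constraints A'y + s = c, s >= 0.
    """
    m, n = len(A), len(A[0])
    r_b = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
    r_c = [sum(A[i][j] * y[i] for i in range(m)) + s[j] - c[j]
           for j in range(n)]
    primal = sum(cj * xj for cj, xj in zip(c, x))
    dual = sum(bi * yi for bi, yi in zip(b, y))
    return (norm(r_b) / max(1.0, norm(b))
            + norm(r_c) / max(1.0, norm(c))
            + abs(primal - dual) / max(1.0, abs(primal), abs(dual)))


# Tiny problem: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0.
A, b, c = [[1.0, 1.0]], [1.0], [1.0, 2.0]
err_opt = total_relative_error(A, b, c, x=[1.0, 0.0], y=[1.0], s=[0.0, 1.0])
err_mid = total_relative_error(A, b, c, x=[0.5, 0.5], y=[0.0], s=[1.0, 2.0])
```

At the optimal primal-dual pair all three terms vanish, while the feasible but non-optimal point is penalized only through the duality gap term.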


5.2.2 Solving the linear systems

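The special structure of the left-hand-side matrix in 5.6 and 5.8 allows us to reformulate
them as systems with positive definite matrices [14]. For example, 5.6 can be reformulated
into the following system:

$$\begin{aligned} A D^2 A^T \Delta y &= -r_b - A X S^{-1} r_c + A S^{-1} r_{xs} \\ \Delta s &= -r_c - A^T \Delta y \\ \Delta x &= -S^{-1} r_{xs} - X S^{-1} \Delta s \end{aligned} \tag{5.10}$$

with $D = S^{-1/2} X^{1/2}$ and $r_{xs} = XSe$, where $r_b = Ax - b$ and $r_c = A^T y + s - c$ are the
primal and dual residuals. Cholesky factorization is used to solve
the first equation in this linear system. Then $\Delta s$ and $\Delta x$ are obtained. If we don’t know
anything about $A$ explicitly except its dimensions, iterative methods can be used to solve
this linear system.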
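The Cholesky-based solve of the positive definite system can be sketched in plain Python. This is only an illustration under stated assumptions: the matrix is formed as A·diag(x_i/s_i)·Aᵀ (the usual normal-equations matrix for this kind of method), the matrices are small dense lists of lists, and the names `normal_matrix`, `cholesky` and `solve_spd` are hypothetical. A real implementation would use a sparse factorization.

```python
import math


def normal_matrix(A, x, s):
    """Form M = A * diag(x_i / s_i) * A^T for dense list-of-lists A."""
    m, n = len(A), len(A[0])
    d2 = [xj / sj for xj, sj in zip(x, s)]
    return [[sum(A[i][j] * d2[j] * A[k][j] for j in range(n))
             for k in range(m)]
            for i in range(m)]


def cholesky(M):
    """Return lower-triangular L with M = L * L^T (M must be SPD)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(acc) if i == j else acc / L[j][j]
    return L


def solve_spd(M, rhs):
    """Solve M z = rhs by forward and backward substitution."""
    L = cholesky(M)
    n = len(M)
    w = [0.0] * n
    for i in range(n):
        w[i] = (rhs[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (w[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


# Hypothetical data: two constraints, three variables.
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
M = normal_matrix(A, x=[1.0, 2.0, 3.0], s=[1.0, 1.0, 1.0])
dy = solve_spd(M, [7.0, 12.0])
```

Only the m-by-m matrix has to be factorized, which is the reason for reducing the larger block system to this form before solving.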

5.3 Ill-conditioning and Model Reduction

Each iteration of the predictor-corrector algorithm involves solving linear systems whose
left-hand-side matrix is the same as $A D^2 A^T$ in equation 5.10. As we approach the optimal
point, either $x_i$ or $s_i$ (for each $i = 1, \ldots, n$) decreases to zero. Thus the elements of
the diagonal matrix $D$ take on both huge and tiny values. For this reason, ill-conditioning
often occurs during the final stages of the predictor-corrector algorithm. In practice, this
prevented linprog() from converging to a point that satisfied the error tolerance for the
linear programs arising in filter design using the LP formulation presented in chapter 4. My
implementation uses two techniques to handle this ill-conditioning: hopping to the optimal
point and model reduction.

• Hop over to the optimal

If we are really close to the optimal, the LP solver can hop over to the nearest vertex
on the polytope boundary and check if it is optimal. The simplest thing to do is to
set the smallest $n - m$ components of $x$ to 0 and solve $Ax = b$ for the remaining
components of $x$. Since the $x$ variables in the primal form reflect the marginal
costs of the corresponding constraints in the dual form, $x_i = 0$ means that the $i$-th
constraint in the dual form is non-essential (the optimal solution is not affected by
perturbations of this constraint). So we solve the remaining part of $A^T y = c$ for $y$.
Then we can get values for the slack variables $s$. After we get the solution, we need to
check its feasibility and optimality (i.e. make sure that all non-negativity constraints on
$x$ and the slack variables $s$ are satisfied).

I tried this for linear FIR filter design problems. The solutions I got had negative
$x$ and $s$ values. This suggests that although we are close to the optimal (the system gets
ill-conditioned), we are not close enough to be able to identify the optimal vertex.

• Model Reduction

Although we are not close enough to be able to hop over to the optimal solution
directly, we know that we are in a region close to the optimal vertex. In this region, we
should be able to identify some of the non-essential constraints (although not all of
them). Since the $x$ variables of the primal form reflect the marginal costs of the
corresponding constraints of the dual form, the $x_i$'s corresponding to these ready-to-identify
non-essential constraints are very small and contribute to the ill-conditioning of the
linear system. After we identify them, by setting those $x_i$'s to 0 and ignoring their
corresponding constraints in the dual form, we reduce the original LP problem to a
smaller problem with a smaller condition number. Then we solve the smaller
problem as before. If the total error gets smaller than the tolerance, we stop. If the
system gets ill-conditioned, we do model reduction again. We keep doing this until either
the total error is smaller than the tolerance or all $n - m$ non-essential constraints are
identified.

– Empirical criterion for � � being small:

If � ��

, set � � to 0 and label � � �

constraint of the dual form

as non-essential. The choice of� � � � was simply to make sure that � � was

relatively small.

– When to do model reduction?

In this implementation, when the linear system gets ill-conditioned, model re-

duction will be done. cond() provided by Matlab is used to calculate the condi-

tion number of the matrix. A default threshold� � �

is used to determine whether

model reduction is needed. In the case that we don’t have the explicit form of

� (right-hand-side matrix) available, we have no way to calculate its condition

number. However, the relative residual of the iterative solver minres() is a good

indication of how ill-conditioned the system is. It appears that, for a fixed number

of iterations, the more ill-conditioned the system is, the larger the relative

residual is.

– Model reduction guesses are verified at the end by plugging the final solution into

the original problem and checking the optimality requirements.
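The model reduction step and its final verification can be sketched as follows. This is a minimal illustration rather than the thesis implementation. It assumes the standard primal form (minimize c'x subject to Ax = b, x >= 0) with dual form (maximize b'y subject to A'y <= c), so that column j of A is the j-th dual constraint. The names `reduce_model`, `dropped_constraints_hold`, `small_tol` and `feas_tol`, and the toy data, are hypothetical; the threshold value is a placeholder, not the empirical one used in the thesis.

```python
def reduce_model(A, c, x, small_tol):
    """Drop dual constraints whose primal variable x[j] is below small_tol.

    A is a list of rows; column j of A is the j-th dual constraint a_j' y <= c[j].
    Returns the reduced A and c, plus the kept and dropped column indices.
    """
    keep = [j for j, xj in enumerate(x) if xj >= small_tol]
    dropped = [j for j, xj in enumerate(x) if xj < small_tol]
    A_red = [[row[j] for j in keep] for row in A]
    c_red = [c[j] for j in keep]
    return A_red, c_red, keep, dropped


def dropped_constraints_hold(A, c, y, dropped, feas_tol=1e-9):
    """Verify a final dual solution y against the constraints that were dropped."""
    for j in dropped:
        lhs = sum(A[i][j] * y[i] for i in range(len(y)))
        if lhs > c[j] + feas_tol:
            return False  # the guess "constraint j is non-essential" was wrong
    return True


# Toy data (hypothetical): 2 dual variables, 4 dual constraints.
A = [[1.0, 0.0, 1.0, 1.0],
     [0.0, 1.0, 1.0, -1.0]]
c = [1.0, 1.0, 1.5, 2.0]
x = [0.5, 1e-9, 0.3, 2e-10]  # primal iterate close to the optimal vertex

A_red, c_red, keep, dropped = reduce_model(A, c, x, small_tol=1e-6)
y_final = [1.0, 0.5]  # stand-in for the solution of the reduced problem
ok = dropped_constraints_hold(A, c, y_final, dropped)
```

In this toy case the two kept constraints are active at `y_final`, and the two dropped ones are satisfied with slack, which is the situation the verification step is meant to confirm.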

5.4 Results

Suppose 12 bits are used to design an � � � linear FIR filter for a 32-bit bus at 400 ps

bit time. This linear FIR filter design problem can be formulated as an LP problem with

768 inequality constraints and 408 variables. It is naturally in dual form. In this section, we

show the results obtained with linprog() and with our approach. The linprog() iteration display is

shown in table 5.1. The linprog() routine does not converge on this problem. Table 5.2 shows

the iteration display of the method presented in this chapter, which does converge.

There are 7 model reductions along the way. They remove 17, 26, 39, 78, 93, 75 and 13

constraints respectively, for a total of 341 constraints.

Residuals:  Primal         Dual           Duality    Total
            Infeasibility  Infeasibility  Gap        Relative Error
Iter 0:     2.74e+03       9.95e+01       3.96e+05   1.00e+03
Iter 1:     2.13e-09       5.82e+01       1.46e+05   4.11e+01
Iter 2:     5.15e-08       4.83e-02       4.39e+02   1.00e-00
Iter 3:     1.11e-07       5.10e-04       3.64e+00   9.99e-01
Iter 4:     6.41e-04       1.87e-04       1.02e+00   4.77e-01
Iter 5:     1.27e-02       3.72e-05       3.40e-01   2.26e-01

Exiting: One or more of the residuals, duality gap, or total relative error
has grown 100000 times greater than its minimum value so far: the dual
appears to be infeasible (and the primal unbounded). (The primal residual <
TolFun=1.00e-08.)

Table 5.1: linprog() iteration display

Residuals:  Primal         Dual           Duality     Total            # constraints
            Infeasibility  Infeasibility  Gap         Relative Error   reduced
Iter 0      0.02           10.8843        194.059     3.5854
Iter 1      0.0004         0.21769        7.7587      1.0515
Iter 2      8e-06          0.050589       1.871       0.9993
Iter 3      1.8542e-07     0.014951       0.5932      0.59672
Iter 4      8.1055e-08     0.0048963      0.2152      0.21635
Iter 5      4.5586e-08     0.00016507     0.089084    0.089123
Iter 6      8.3459e-09     7.0761e-05     0.037025    0.037042
Iter 7      1.457e-09      1.9964e-05     0.011225    0.01123
Iter 8      4.3047e-10     7.1626e-06     0.0042428   0.0042445
Iter 9      1.4528e-10     3.0168e-06     0.0018633   0.001864
Iter 0      0.00028457     7.0108e-07     0.00055431  0.00083904       17
Iter 0      0.00054624     1.8242e-07     0.00020812  0.00075441       26
Iter 0      0.00015359     6.6702e-08     3.7658e-05  0.00019127       39
Iter 0      0.00026999     1.4576e-08     7.4835e-06  0.00027748       78
Iter 0      4.4736e-05     4.2838e-09     1.603e-06   4.634e-05        93
Iter 0      2.7846e-05     1.6497e-10     6.0702e-08  2.7906e-05       75
Iter 0      1.1211e-05     3.362e-12      1.2376e-09  1.1212e-05       13
Iter 1      2.2422e-07     6.7534e-14     2.2472e-11  2.2424e-07

Table 5.2: Iteration display of our approach: Mehrotra interior-point method with model reduction.
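The constraint counts quoted here can be checked with a few lines of arithmetic. The sketch below only tallies the numbers stated in the text. The last quantity rests on an assumption that the thesis does not state at this point: that the optimal vertex is nondegenerate, so that exactly as many constraints are active there as there are variables.

```python
# Numbers taken from the text: problem size and the seven model reductions.
n_constraints = 768
n_variables = 408
reductions = [17, 26, 39, 78, 93, 75, 13]

total_removed = sum(reductions)            # constraints labelled non-essential
remaining = n_constraints - total_removed  # constraints left in the final problem

# Assumption (not from the thesis): a nondegenerate optimal vertex has exactly
# n_variables active constraints, so at most this many can be non-essential.
max_non_essential = n_constraints - n_variables

print(total_removed, remaining, max_non_essential)
```

Under that assumption, the 341 removed constraints stay below the upper bound of 360, so the final reduced problem still contains every constraint that can be active at the optimum.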


Chapter 6

Conclusions and Future Work

This thesis explores the effectiveness of equalizing filters in cross-talk cancellation for high-

speed, off-chip buses. It demonstrates that linear programming provides effective methods

for designing cross-talk canceling equalizing filters that greatly increase the bandwidth of

high-speed digital buses. For 5 cm long, 32-bit wide PCB buses that are closely spaced (75

μm width and 75 μm separation) and use a simple full-swing voltage signaling method, a system

with � � � equalizing filters can operate at 4.1GHz. Without equalizing filters, such buses

can only operate at 1.3GHz, or at 1.47GHz with pre-emphasis but no cross-talk cancellation.
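To put these operating rates in proportion, the short calculation below computes the improvement factors implied by the three figures quoted above. The variable names are illustrative only; the rates are the ones stated in the text.

```python
# Operating rates quoted in the text, in GHz.
rate_equalized_ghz = 4.1      # with cross-talk canceling equalizing filters
rate_plain_ghz = 1.3          # without equalizing filters
rate_preemphasis_ghz = 1.47   # pre-emphasis only, no cross-talk cancellation

gain_vs_plain = rate_equalized_ghz / rate_plain_ghz
gain_vs_preemphasis = rate_equalized_ghz / rate_preemphasis_ghz

print(f"{gain_vs_plain:.2f}x over no equalization")
print(f"{gain_vs_preemphasis:.2f}x over pre-emphasis alone")
```

The equalized bus therefore runs roughly three times faster than either baseline.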

In this thesis, a coupled distributed RLC interconnect model is first constructed

and validated. Based on the bus model, the first technique used to design equalizing fil-

ters is the least squares method. The least squares method produces equalizing filters that

greatly improve signal integrity and the maximum bit rate at which signals can be received

correctly. Next, because the whole transmission network is linear, the equalizing filter de-

sign problem can be formulated into a linear programming problem which uses the� �

metric corresponding to the traditional eye height measurement of signal integrity. Equal-

izing filters designed with the linear programming approach have better performance than


the filters designed with the least squares method (see section 4.5). Another advantage of

the linear programming method over the least squares method is that it does not depend

on pseudo-random sequences and guarantees worst-case performance. The simulation results

presented in chapter 4 show that the equalization technique is a promising method for

cross-talk cancellation in high-speed buses.

Moreover, scaling trends in VLSI technology favor this approach. Long buses

cost more and support lower data rates. The cost of the bus justifies added circuitry on

the chip. The lower data rate provides more time for the filtering operations. Furthermore,

improvements in chip fabrication are producing smaller and faster circuits for implementing

the filter while buses remain big and slow. This also contributes to the favorability of adding

more sophisticated equalizing filters.

6.1 Future work

This work has demonstrated by simulation the effectiveness of the equalization technique for

cross-talk cancellation in high-speed PCB buses. A natural and necessary next step is circuit

design, PCB fabrication and testing. Beyond this, the following ideas should be explored

to further improve the performance of the equalizing filter design:

• Figure 4.6 shows the impulse response of the bus considered in this thesis. The source

end of the bus is terminated with resistors whose resistance equals the approximated

input impedance of the bus:� ��

However, the bus traces are not perfect LC lines. Their resistance, mutual inductance

and coupling capacitance make the input impedance calculation inaccurate. This is

shown in figure 4.6 by the reflections in the impulse response. Because of the limit on

filter length, these reflections degrade the performance of the equalizing filters. Optimization

routines, such as Matlab’s fminunc, can be used to minimize the amplitude

of the reflections by tuning the input resistor.

• With current ADC/DAC technology, more taps per bit can be implemented at over

1GHz bit rate. It was observed that by increasing the number of taps per bit, the

performance of equalizing filters for longer buses can be improved substantially. This

issue should be investigated more thoroughly and systematically in the future.

• Skin effect

Because the primary goal of this thesis is cross-talk cancellation, high-frequency at-

tenuation caused by skin effect was not taken into account in the bus model. How-

ever, skin effect is one of the major components that limit the off-chip signaling above

1GHz [2]. It has been shown that pre-emphasis for serial links greatly improves their

bandwidth [2]. In the future, skin effect should be incorporated into the bus model.

The method presented in this thesis can be used directly with the bus models that

include the skin effect.

• Differential signaling

Differential signaling is commonly used to achieve high speed signaling and improve

signal integrity. The down-side of the differential signaling method is that it doubles

the number of wires. Is the differential signaling method inherently the best one

given� � wires for transmitting � signals? Moreover, can we improve performance

by adding � wires to a � -bit bus to transmit � signals? The linear programming

method for equalizing filter design can be adapted to answer these questions.

• Multi-level signaling

Multi-level signaling has attracted a great deal of recent interest. It uses multiple voltage

levels and hence has lower fundamental frequencies than the simple binary signaling

at the same data rate. Many off-chip communication links employ multi-level signal-

ing to achieve higher performance with limited bandwidth. Methods presented in this

thesis can be easily adapted to design equalizing filters for multi-level signaling.

• Non-linear filter design

Because filters are relatively narrow, look-up tables may be a simple and practical im-

plementation. By exploiting the symmetries of the bus, these tables can be made quite

compact. There is no reason that the table entries must correspond to linear combina-

tions of the inputs. Thus, non-linear filters may be easy to implement and they may be

able to handle the apparently rare worst-case input patterns more effectively. I started

to explore this idea. For example, each entry of a look-up table can correspond to the

filter output for a particular type of input. Input types can be divided according to the

number of transitions. Given an input type on a wire at the current bit, input types

on its adjacent wires and preceding bit times are constrained. The linear equalizing

filter design suggests that a long history and a few future bits need to be considered in order

to get a good filter. The chain effect of the input type constraints makes it impractical

to find the worst-case input for the non-linear filter design. More sophisticated

optimization methods may solve this problem and are a topic for future work.
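As an illustration of the look-up table idea in the last item, the sketch below groups input patterns into types by their number of transitions. It is only one possible reading of "input type": it counts transitions between consecutive bits within a short window on a single wire, and it ignores the constraints between adjacent wires. The function names and the window length of 4 are hypothetical.

```python
from itertools import product


def count_transitions(bits):
    """Number of positions where two consecutive bits differ."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def build_type_table(window_len):
    """Group every binary window of the given length by its transition count."""
    table = {}
    for bits in product((0, 1), repeat=window_len):
        table.setdefault(count_transitions(bits), []).append(bits)
    return table


table = build_type_table(4)
sizes = {k: len(v) for k, v in table.items()}
print(sizes)
```

Each key of `table` would then index one look-up table entry (or a small group of entries), so the table size grows with the number of input types rather than with the number of raw input patterns.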


Bibliography

[1] P.M. Crespo and M.L. Honig. Pole-zero decision feedback equalization with a rapidly converging adaptive IIR algorithm. IEEE Journal of Selected Areas in Communications, 9:817–829, 1991.

[2] W.J. Dally and J.W. Poulton. Transmitter equalization for 4-GBPs signaling. IEEE Micro, 1:48–56, 1997.

[3] W.J. Dally and J.W. Poulton. Digital Systems Engineering. Cambridge University Press, 1998.

[4] A. Fiedler, R. Mactaggart, J. Welch, and S. Krishnan. A 1.0625Gbps transceiver with 2x-oversampling and transmit signal pre-emphasis. In Proc. of ISSCC97, pages 238–239, 1997.

[5] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, 1996.

[6] M.L. Honig, P. Crespo, and K. Steiglitz. Suppression of near- and far-end crosstalk by linear pre- and post-filtering. IEEE Journal of Selected Areas in Communications, 10:614–629, 1992.

[7] M.L. Honig, K. Steiglitz, and B. Gopinath. Multichannel signal processing for data communications in the presence of crosstalk. IEEE Transactions on Communications, 38:551–558, 1990.

[8] M. Horowitz, C. Ken, and S. Sidiropoulos. High speed electrical signalling: Overview and limitations. IEEE Micro, 18:12–24, 1998.

[9] L. Jackson. Digital Filters and Signal Processing. Kluwer Academic Publishers, 1996.

[10] H. Johnson and M. Graham. High-Speed Digital Design: A Handbook of Black Magic. Prentice Hall, 1993.

[11] L. Lu and V. Ungvichian. Crosstalk versus interline space in ultra high speed digital PCBs. In IEEE International Symposium on EMC, pages 629–634, 1998.

[12] I. J. Lustig, R.E. Marsten, and D.F. Shanno. On implementing Mehrotra’s predictor-corrector interior point method for linear programming. SIAM Journal on Optimization, 2:435–449, 1992.

[13] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2:575–601, 1992.

[14] J. Nocedal and S. Wright. Numerical Optimization, pages 395–417. Springer Series in Operations Research, Springer Press, 1999.

[15] V. Rico-Ramirez and A.W. Westerberg. Interior point methods on the solution of conditional models. Technical report, Carnegie Mellon University, 1997.

[16] J. Salz. Digital transmission over cross-coupled linear channels. Bell System Technology Journal, 64:1147–1159, 1985.

[17] V. Stojanovic, G. Ginis, and M.A. Horowitz. Transmit pre-emphasis for high-speed time-division-multiplexed serial-link transceiver. IEEE Transactions on Communications, 38:551–558, 2001.

[18] S.K. Tewksbury. Microelectronic Systems Interconnections. IEEE Press, 1995.

[19] C.K. Yang, V. Stojanovic, S. Modjtahedi, M.A. Horowitz, and W.F. Ellersick. A serial-link transceiver based on 8-GSamples/s A/D and D/A converters in 0.25-μm CMOS. In IEEE International Conference on Communications, pages 1934–1939, 2002.

[20] T.S. Yeo, C.S. Ng, M.S. Leong, and P.S. Kooi. Interline coupling of ultra-high-speed pulse propagation on PCB. IEEE Transactions on Electromagnetic Compatibility, 35(3):401–404, August 1993.

[21] Y. Zhang. Solving large-scale linear programs by interior-point methods under the MATLAB environment. Technical Report TR96-01, University of Maryland, July 1995.