Institut für Parallele und Verteilte Systeme Abteilung Parallele Systeme

Universität Stuttgart Universitätsstraße 38

D70569 Stuttgart

Master's Thesis No. 2993

Implementation of a FPGA-based Interface to a High Speed Image Sensor

Thomas Grob

May 2010


Abstract

This thesis is part of a project in which a high speed camera is developed. The subject of this work is the interconnection of a LUPA-3000 image sensor and an FPGA. The FPGA is connected to the multi-channel Low-Voltage Differential Signaling (LVDS) data interface and handles the calibration of the individual channels. The LVDS receiver interface can handle asynchronous data signals and synchronizes them for subsequent processing. This complex LVDS receiver design is discussed and its functionality explained in detail. For testing, a VHDL design was developed that includes an asynchronous LVDS transmitter, which transfers data to the receiver component through wires that interconnect the FPGA I/Os. After the entire design had been simulated, it was tested in practice on an FPGA evaluation board. This communication system was verified using a ChipScope logic analyzer.

The interface design that is connected to the image sensor includes the receiver component as well as a unit that provides an easy-to-use configuration interface for programming the image sensor. In addition, the exposure control is realized within the VHDL design.

To evaluate the hardware design that is connected to the image sensor, a SystemC testbench was developed. It includes a software model of the LUPA-3000 image sensor to verify the functionality of the overall design.


Contents

1. Introduction
  1.1. Conceptual formulation
  1.2. Document configuration
2. Low-Voltage Differential Signaling
  2.1. Technical specification
    2.1.1. Differential signaling
  2.2. Architecture
3. LVDS Communication
  3.1. System description
    3.1.1. FPGA evaluation board
  3.2. Transmitter
    3.2.1. Serializer
    3.2.2. Asynchronous transmitter
  3.3. Receiver
    3.3.1. Data and clock timing
    3.3.2. Data alignment
    3.3.3. Compensation capabilities
  3.4. Design synthesis
    3.4.1. Clock generator
    3.4.2. Pin mapping
  3.5. The ChipScope logic analyzer
  3.6. Test & verification
    3.6.1. Test configuration
4. Image sensor - LUPA-3000
  4.1. Sensor Architecture
    4.1.1. Pixel architecture and timing
  4.2. Serial Peripheral Interface
    4.2.1. SPI registers
  4.3. Readout
    4.3.1. Cyclic redundancy check
  4.4. Software model
5. Data and control interface for the LUPA-3000 image sensor
  5.1. Serial peripheral interface
  5.2. Exposure control
    5.2.1. Timing parameter calculation
  5.3. LVDS receiver
  5.4. Clock domain crossing
6. SystemC test environment
  6.1. SystemC transport delay
  6.2. Image Builder
  6.3. Stimulator
  6.4. Demonstration
7. Conclusion
  7.1. Summary
  7.2. Future work
A. Abbreviations and Acronyms
B. FPGA pin mapping
C. SystemC and VHDL co-simulation
Bibliography

List of Tables

3.1. Data path timing definitions
3.2. Clock path timing definitions
3.3. Data path delay overview for Virtex5 XC5VSX50T with -1 speed grade
3.4. Clock path delay overview for Virtex5 XC5VSX50T with -1 speed grade
3.5. Delay settings for all LVDS channels, assuming a clock speed of 400 MHz
4.1. Selection of important SPI addresses
4.2. Sync channel values
5.1. SPI register address space extension
B.1. Pin mapping of the expansion connectors on the ML506 evaluation board

List of Figures

2.1. LVDS communication line
2.2. Cross section of a differential wire pair, with its coupled fields
2.3. Voltage level of a differential signal
2.4. Parallel clock SerDes with 8:1 serialization
3.1. Schematic description of the LVDS communication system
3.2. Virtex5 FPGA evaluation board ML506
3.3. Block diagram of the LVDS transmitter
3.4. 8:1 serializer, consisting of master and slave OSERDES component
3.5. Example operation of the byte crusher component for a delay of 10 bits
3.6. Block diagram of the LVDS receiver
3.7. Data and clock path timing
3.8. Timing window for sampling clock edge
3.9. Sample data eye diagram
3.10. Five steps of the bit alignment process
3.11. Deserializer, consisting of master and slave ISERDES component
3.12. Clock edge adaption for low speed signals
3.13. Delay taps counter settings
3.14. Data output of ISERDES and the receiver FIFO outputs
3.15. Data inputs of the comparator
4.1. LUPA-3000 image sensor, copyright by Cypress Semiconductor Corp
4.2. Column multiplex scheme of the sensor architecture
4.3. Pixel schematic of a 6-T pixel cell
4.4. Pixel timing during frame overhead time (FOT)
4.5. Serial Peripheral Interface (SPI) read timing
4.6. SPI write timing
4.7. Pipelined operations, integration and readout are done in parallel
4.8. Exposure and readout timing
4.9. Sync channel and data channel values during image readout
4.10. Circuit for CRC generation with the polynomial implemented in LUPA-3000
4.11. Structural diagram of the LUPA-3000 SystemC model
5.1. Structural description of the VHDL design
5.2. Finite state machine of the SPIwrapper module
5.3. First finite state machine of the exposure control module
5.4. Timing diagram of the frame timing
5.5. Second finite state machine of the exposure control module
5.6. Excel sheet for timing parameter calculation
5.7. Basic synchronizer circuit
6.1. Complete SystemC testbench
6.2. Abstract illustration of a practical image readout

1. Introduction

This thesis is part of a project in which a high speed camera is developed. This camera will have a very high resolution of 1710 x 1696 pixels at a frame rate of 480 frames per second. These requirements are quite ambitious, because the amount of data that has to be processed is very high (13.3 Gbit/s).

The core components of the camera are a high speed image sensor and an FPGA. The FPGA design controls the image sensor through a configuration interface as well as exposure signals. In addition, it has to process the incoming data stream of the image sensor. The LVDS receiver component is tested in practice on an FPGA evaluation board. Moreover, a VHDL design is developed that controls the image sensor, provides an external programming interface and includes the LVDS receiver component. This design is tested with a software model of the image sensor, which was additionally developed based on the image sensor's data sheet. The task definition for this thesis is given in the following section.

1.1. Conceptual formulation

The subject of this thesis is the implementation and verification of an interface module on an FPGA that connects to a high-performance image sensor. The module controls and configures the image sensor and receives image data from it. These functions require the module to connect to the following three interfaces of the image sensor:

• Configuration interface: Serial Peripheral Interface

• Control interface: group of TTL signals without a clock that control the capture process of an image

• High-speed data interface: based on 34 LVDS connections


A major problem of high-speed data interfaces with high clock frequencies is that the signals of the interface get out of sync. This asynchronism is caused by length variations of the conductor paths, tolerances in the chip package and variances of the voltage levels between signals. To counteract this, the connection must be re-synchronized at the receiver side by using delay elements for each signal. Besides the interface module, the FPGA will contain further logic for image processing. This logic belongs to a clock domain with a different frequency than the clock domains of the configuration and data interfaces of the image sensor. Therefore, data needs to be transferred between different clock domains. The verification of the interface module should prove the following:

• Functionality of the configuration and control interface

• Functionality and real-time performance of the data interface

The verification of the control and configuration functionality should be based on a model of the image sensor, which has to be developed using a high-level language (e.g., C, Matlab). The verification of the data interface, however, requires implementing a transmitter module on the FPGA. The transmitter module has to be connected to its counterpart using conductor paths outside the FPGA.

1.2. Document configuration

The second chapter of this thesis deals with the basics of a Low-Voltage Differential Signaling (LVDS) interconnection. Details of the technical specification are discussed and typical architectures involving multiple LVDS connections are explained.

Chapter 3 introduces a VHDL implementation of an LVDS transmitter and receiver that use multiple LVDS interconnections. Both components are described in detail, especially the complex algorithms in the receiver that ensure correct sampling of asynchronous input signals. The transmitter and receiver are instantiated in a Toplevel design, which includes some additional blocks that generate the data to be transmitted through the LVDS interface and compare the received data with the originally sent data. This design is synthesized and tested on an FPGA evaluation board. The transmitter and receiver components are connected using wires between the FPGA pins.

The image sensor LUPA-3000 is described in Chapter 4. The general architecture and the timing of an image readout are discussed. Moreover, the serial peripheral interface, which is used to program the image sensor, is presented. Finally, the software model of the LUPA-3000, which was developed in SystemC, is introduced. SystemC is a C++ class library that can be used to model hardware components in a high-level language.

Chapter 5 discusses the VHDL interface implementation that is connected to the image sensor. It consists of an adapted version of the LVDS receiver and some additional components that control the LUPA-3000 image sensor.

The complete system, with the LUPA-3000 software model and the VHDL interface, is connected in a testbench, which is discussed in Chapter 6. The testbench includes a stimulator unit and an image builder that reconstructs an image from the data provided by the LVDS receiver.


2. Low-Voltage Differential Signaling

Low-Voltage Differential Signaling (LVDS) is commonly used as a high speed point-to-point connection over short distances. One LVDS channel always uses two interconnections, a positive (p) and a negative (n) one, which carry voltages of opposite sign. The voltage difference between both interconnections represents a logical state.

This interface standard for high speed data transmission, which uses low voltage signals, was developed and standardized in the mid-1990s. It describes only the physical layer and not the upper-layer protocols. It is commonly used to interconnect components within one circuit board where high speed transmission is required. LVDS is widely used in different applications, e.g., in computers, where the PCI Express bus utilizes this technique, and in displays, many of which are controlled through this interface. Other fields of application are industrial vision, medical engineering and automotive electronics.

LVDS is standardized in ANSI/TIA/EIA-644-1995 and IEEE Std 1596.3-1996 [IEE96]. Based on these standards, National Semiconductor has published a textbook with detailed explanations and design guidelines [Nat08].

2.1. Technical specification

There are many differential signaling techniques available, and some devices claim to have an LVDS interface but do not comply with the ANSI/TIA/EIA-644 standard. According to the LVDS standard, the maximum data rate is 3.125 Gbps (gigabits per second) at a voltage swing of ±350 mV.

2.1.1. Differential signaling

A typical LVDS communication line is unidirectional, so there is one driver and one receiver. Data is always transmitted from the driver to the receiver. A communication line consists of two conductors which are driven with voltages of opposite sign. Evaluating the voltage difference has the advantage that disturbances during transmission have less influence on the transmitted signal, because both conductors are influenced equally; this gives excellent noise immunity.

Figure 2.1 shows the typical structure of an LVDS communication line. The driver consists of a current source that delivers a constant 3.5 mA. Depending on the bit to be sent, the driver switches the voltage on the wires.

Figure 2.1.: LVDS communication line (driver with current source, receiver with 100 Ω termination resistor)

A termination resistor of 100 Ω across the receiver inputs terminates the transmission line. This termination resistor matches the line impedance and avoids signal reflections. The nominal voltage drop across the termination resistor, and thus at the receiver input, is 350 mV.
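The nominal value follows from Ohm's law applied to the two figures given in the text (3.5 mA driver current, 100 Ω termination). A minimal Python sketch of this arithmetic; the function name is illustrative and not part of the thesis:

```python
def termination_voltage(current_a: float, resistance_ohm: float) -> float:
    """Voltage drop across the termination resistor (Ohm's law, V = I * R)."""
    return current_a * resistance_ohm


# Values from the text: 3.5 mA driver current, 100 ohm termination resistor.
v_diff = termination_voltage(3.5e-3, 100.0)
print(f"{v_diff * 1e3:.0f} mV")
```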

The capacitively coupled (AC coupled) field between both conductors is displayed in Fig. 2.2. It is responsible for the excellent noise immunity, because electromagnetic disturbances influence both conductors similarly. Because the receiver only evaluates the difference of the signal, a disturbance that is present on both conductors poses no problem. More details about the coupling and the line termination are discussed in [Nat08, chapter 4].
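The cancellation of a disturbance that is present on both conductors can be illustrated with a small numerical model. The Python sketch below is only an illustration: the function names, the noise values and the 1.2 V common-mode level are assumptions made for the example, while the 350 mV differential swing is taken from the text.

```python
from typing import List, Tuple


def drive(bit: int, swing: float = 0.35, common_mode: float = 1.2) -> Tuple[float, float]:
    """Return the voltages (v_p, v_n) for one bit; swing is the differential voltage."""
    half = swing / 2
    if bit:
        return common_mode + half, common_mode - half
    return common_mode - half, common_mode + half


def receive(v_p: float, v_n: float) -> int:
    """Decide the logical state from the sign of the voltage difference only."""
    return 1 if (v_p - v_n) > 0 else 0


bits: List[int] = [1, 0, 1, 1, 0]
noise: List[float] = [0.4, -0.3, 0.25, -0.5, 0.1]  # same disturbance on both wires

received = []
for bit, n in zip(bits, noise):
    v_p, v_n = drive(bit)
    received.append(receive(v_p + n, v_n + n))  # common-mode noise cancels

print(received == bits)
```

Although the assumed disturbance is larger than the signal swing in several samples, the difference between the two conductors is unchanged, so every bit is recovered.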

A sample signal with the corresponding voltage level (±350 mV) on the conductors is shown in Fig. 2.3. A change of the transmitted logical state changes only the direction of the current, not its amplitude. The color of each voltage trace corresponds to the conductor wire in Fig. 2.1. Further information about the electrical specification is given in [IEE96]. The maximum distance for an LVDS communication line is 10 m, but this distance can only be bridged when low-loss cables are used. Moreover, LVDS is often used to connect two devices within one circuit board, so large distances are not the main focus of LVDS.

Figure 2.2.: Cross section of a differential wire pair, with its coupled fields

Figure 2.3.: Voltage level of a differential signal

There are other LVDS topologies available that deal with multiple drivers and receivers. This is

known as Multipoint LVDS (M-LVDS) and is standardized in ANSI/TIA/EIA-899. Another LVDS

type, called Bus LVDS (B-LVDS), focuses on multiple receivers, but is not standardized. Since these

Multipoint topologies are not important in the following, further details are omitted here, but can be

found in [Nat08].

2.2. Architecture

A common architecture which is used in combination with LVDS is the so-called Parallel Clock SerDes.

In this architecture multiple LVDS channels are used in parallel, where one channel transmits the clock

signal and the others are used for data. The fact that the clock signal for the data channel sampling

is transmitted in parallel classifies the system as source synchronous. The SerDes in the architecture

name stands for Serializer/Deserializer. Each LVDS channel takes the data of a multiplexer which

serializes a specified number of parallel bits. On the receiver side the serial data stream is parallelized

again. A schematic describing this architecture is available in Fig. 2.4.

In this example, each LVDS channel serializes 8 parallel bits for transmission. Depending on the system, the serialization direction can be most significant bit (MSB) first or least significant bit (LSB) first. Here, the MSB is transmitted first. The receiver uses the received clock signal to sample the data channels. For this reason it is necessary to generate a clock signal that matches the serial data rate on an LVDS channel. Therefore, the parallel clock is multiplied by a factor of 8 at single data rate (SDR) or by 4 when double data rate (DDR) is used. DDR uses both rising and falling edges for sampling, so the clock speed can be halved compared to SDR. At the receiver side the signal of each data channel is sampled and parallelized again. Furthermore, the divided clock signal is again synchronous to the parallel data. The number of LVDS channels can be chosen freely, depending on the application for which this data interface is used.
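The relation between the parallel clock, the serialization factor and the data rate mode can be written down in a few lines. The following Python sketch is only an illustration; the function names and the 50 MHz example frequency are assumptions and do not come from the thesis.

```python
from typing import List


def serial_clock_hz(parallel_clock_hz: float, factor: int = 8, ddr: bool = False) -> float:
    """Clock frequency needed on the serial side for a factor:1 serialization."""
    bits_per_clock_cycle = 2 if ddr else 1  # DDR samples on both clock edges
    return parallel_clock_hz * factor / bits_per_clock_cycle


def serialize_msb_first(word: int, width: int = 8) -> List[int]:
    """Return the bits of word in transmission order, MSB first."""
    return [(word >> i) & 1 for i in range(width - 1, -1, -1)]


parallel = 50e6  # hypothetical parallel clock of 50 MHz
print(serial_clock_hz(parallel) / 1e6)            # SDR: parallel clock times 8
print(serial_clock_hz(parallel, ddr=True) / 1e6)  # DDR: parallel clock times 4
print(serialize_msb_first(0xB1))
```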

Figure 2.4.: Parallel clock SerDes with 8:1 serialization (transmitter side: one MUX 8:1 serializer with inputs D0 to D7 and an LVDS driver per data channel, plus a clock multiplier (*8) fed by the parallel clock; receiver side: LVDS buffers, one DEMUX 8:1 deserializer with outputs D0 to D7 per data channel, plus a clock divider (/8) that recovers the parallel clock)

There are also other architectures available, e.g., embedded clock SerDes, where the rising clock edge is inserted periodically into the data stream and is used by the receiver for synchronization. Another architecture is known as 8b/10b SerDes, where 8 data bits are coded into a 10-bit word and an additional comma character is used for synchronization. In this architecture no clock is transmitted; the receiver uses its own clock signal for sampling. Additional architectures and more details about their design are given in [Nat08].

3. LVDS Communication

To connect the LUPA-3000 image sensor to the Field Programmable Gate Array (FPGA), an LVDS receiver design was developed in this work. The image sensor contains a parallel clock SerDes transmitter, similar to the one introduced in the previous chapter, but with 34 LVDS channels. A complete FPGA-to-FPGA LVDS communication system was developed, which fulfills the requirements established by the LUPA-3000 image sensor. The receiver part of the design will later be adapted for connection to the image sensor. Furthermore, the system contains a test unit, which verifies the functionality of the communication line, and especially of the receiver component, to exclude errors.

This chapter demonstrates the LVDS communication system in practice. First, the VHDL implementation of the LVDS transmitter and receiver is introduced, synthesized and tested on an FPGA evaluation board. During this process several problems occur that have to be dealt with to establish a working connection. The implementation of the transmitter and receiver is based on [Bur06] and the corresponding code examples, but is extended and changed to fulfill the needs of the given problem. For example, the number of LVDS channels is increased to 34 and the sampling process in the receiver is adapted to compensate for delays among the channels. In the following, existing and new work are explicitly distinguished.

3.1. System description

A Toplevel design was developed, which contains the LVDS transmitter and receiver components and some additional modules that are used to compare the data values before and after transmission. The output ports of the transmitter and the input ports of the receiver are connected inside the testbench for simulation, or by wires between the FPGA I/Os for testing on hardware. A block diagram of the overall system is shown in Fig. 3.1.

Figure 3.1.: Schematic description of the LVDS communication system. The output ports of the transmitter and the input ports of the receiver are connected inside a testbench or through wires among the FPGA I/Os. (Blocks shown: data generator, FIFO, LVDS transmitter, LVDS receiver, sync converter, receiver FIFOs for the sync channel and data channels 0 to 5 crossing the clock domain, data comparator; signals: clock, sync, data, training done; bus widths: 48 bits for the combined data, 8 bits per channel.)

The data generator creates data bytes, one for each channel, that are written to a FIFO and transmitted through the LVDS channels. The LVDS receiver reads the serial data from the channels and writes the bytes to the corresponding FIFO. Finally, the data comparator reads all FIFOs simultaneously and compares the sent data with the received data. Transmission errors are indicated by a signal.

In the beginning, the transmitter sends a training pattern. This is a predefined byte that is known to the receiver. The receiver calibrates each channel and sets the training done signal when every LVDS channel has performed the bit and word alignment successfully. During bit alignment the incoming serial signal is delayed until the sampling clock edge is placed exactly in the middle between two bit transitions. This is necessary because jitter can shorten the time during which a bit is stable. The word alignment process is responsible for correctly concatenating adjacent bits into one byte on the parallel side.
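The two calibration steps can be sketched in software. The Python model below is a simplified illustration under stated assumptions, not the receiver algorithm of the thesis: it assumes that bit alignment has a list of delay settings marked stable or unstable and picks the middle of the longest stable run, and that word alignment shifts the word boundary one bit at a time until the training pattern appears. The training byte 0x2C and all function names are hypothetical.

```python
from typing import List, Optional


def center_of_stable_window(stable: List[bool]) -> Optional[int]:
    """Return the delay index in the middle of the longest run of stable settings."""
    best_start, best_len = 0, 0
    start = None
    for i, ok in enumerate(stable + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start + best_len // 2 if best_len else None


def rotate_left(word: int, width: int = 8) -> int:
    """Shift the word boundary by one bit position."""
    mask = (1 << width) - 1
    return ((word << 1) | (word >> (width - 1))) & mask


def shifts_needed(received: int, training: int, width: int = 8) -> Optional[int]:
    """Count one-bit shifts until the received word equals the training pattern."""
    word = received
    for shifts in range(width):
        if word == training:
            return shifts
        word = rotate_left(word, width)
    return None  # pattern not found at any word boundary


TRAINING = 0x2C  # hypothetical training byte, chosen to have no rotational symmetry
sweep = [False, False, True, True, True, True, True, False, True, False]
print(center_of_stable_window(sweep))
print(shifts_needed(0x85, TRAINING))
```

The training byte must not be invariant under rotation (0x00, 0xFF or 0xAA would be unsuitable), otherwise several word boundaries would match and the alignment would be ambiguous.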

Once the receiver sets the training done signal, the calibration phase has ended. The signal enables the data generator and the sync converter module. The data generator starts producing bytes (counting from 0 to 255 in a loop) for each data channel and an extra byte for the sync channel. The generated data and sync values are written to the LVDS transmitter, the sync converter and the FIFO. The sync channel is equivalent to a data channel; the only difference is that the sync channel is interpreted as status information. Every sync value different from 0 indicates valid data on the data channels. The sync converter activates the write enable signal of the FIFO for each valid data value.
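The interplay of data generator, sync value and FIFO write enable can be modelled in a few lines of Python. This is a behavioural sketch only; the names are illustrative, and the assumption that all six channels carry the same counter value is made for brevity and is not stated in the thesis.

```python
from typing import Iterator, List


def data_generator(channels: int = 6) -> Iterator[List[int]]:
    """Yield one byte per data channel, counting from 0 to 255 in a loop."""
    value = 0
    while True:
        yield [value] * channels  # assumption: same counter value on every channel
        value = (value + 1) % 256


def write_enable(sync: int) -> bool:
    """Sync converter rule from the text: every sync value other than 0 marks valid data."""
    return sync != 0


fifo: List[List[int]] = []
gen = data_generator()
for sync in [0, 1, 1, 0, 5]:  # hypothetical sync values
    word = next(gen)
    if write_enable(sync):
        fifo.append(word)  # only valid words are stored for later comparison

print([w[0] for w in fifo])
```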

The receiver uses the incoming LVDS clock signal to sample the incoming data signals. When all channels have been calibrated independently, the training done signal is raised. The whole receiver module is driven by the incoming LVDS clock. In Fig. 3.1 the part of the design that is connected to this clock is placed inside the green box, which indicates an independent clock domain. The incoming data from the receiver is written into the clock domain crossing FIFO of each channel. Moreover, these FIFOs are used to compensate for the skew among the data channels.

Back in the transmitter's clock domain, the comparator component reads all FIFOs from the receiver. When the sync value is valid, the FIFO from the data generator is read and compared with the values from the receiver. Both data vectors are only compared when the sync signal of the receiver is valid; otherwise more data is read from the receiver's FIFOs until sync is valid. The comparator has a 6-bit output, where each bit represents the correct functionality of one data channel. During the initialization phase these bits are set. When the comparator detects an error, the bit of the corresponding channel is reset to 0.
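The behaviour of the 6-bit status output can be sketched as follows. The Python class is a hypothetical software model of the described behaviour, not the VHDL component; in particular, it assumes that a cleared bit stays cleared until the next initialization.

```python
from typing import List


class Comparator:
    """Per-channel status bits: set at initialization, cleared on a mismatch."""

    def __init__(self, channels: int = 6) -> None:
        self.channels = channels
        self.status = (1 << channels) - 1  # all bits set during initialization

    def compare(self, sent: List[int], received: List[int]) -> int:
        """Compare one data vector and return the updated status bits."""
        for ch in range(self.channels):
            if sent[ch] != received[ch]:
                self.status &= ~(1 << ch)  # assumption: the bit is not set again
        return self.status


cmp_unit = Comparator()
print(bin(cmp_unit.compare([1] * 6, [1] * 6)))
print(bin(cmp_unit.compare([2] * 6, [2, 2, 9, 2, 2, 2])))  # error on channel 2
print(bin(cmp_unit.compare([3] * 6, [3] * 6)))
```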

3.1.1. FPGA evaluation board

To test and verify the functionality of the VHDL implementation, an FPGA evaluation board (Xilinx ML506) is used. This board consists of a Virtex5 FPGA (device XC5VSX50T of the Virtex5 family, speed grade -1, in an FFG1136 package) and further peripheral hardware. It has several expansion headers that are connected to LVDS-capable differential pin pairs and some other I/Os of the FPGA. The output ports of the transmitter and the input ports of the receiver are mapped to these pin connectors. Finally, wires are used to connect the LVDS inputs and outputs with each other. Fig. 3.2 shows the evaluation board with the attached wire interconnection. The ML506 has one expansion header with 16 pairs of differential signal connections to the FPGA I/Os [Xil09b]. Because each channel occupies one pair at the transmitter output and one at the receiver input, the sample design has only 8 LVDS channels: 6 data channels, one sync channel and one channel for clock transmission. In general a Virtex FPGA has more differential I/Os, but on this evaluation board only a limited number is available through expansion headers.

Figure 3.2.: Virtex5 FPGA evaluation board ML506 with attached wire connection on the I/O pins

3.2. Transmitter

The transmitter module takes data or the training pattern on the parallel side and performs a 8:1

serialization for each LVDS channel. This results in a frequency which is 8 times higher at the serial

side, than at the parallel side. However, because DDR transmission is used, positive and negative clock

edges are used for sampling, hence the frequency is only 4 times higher. Therefore, the Transmitter

has two clock input signals, clock and clockdiv. The clock signal has the speed of the LVDS channels

and is four times faster than clockdiv (parallel side clock). The Toplevel design contains a phase

locked loop (PLL) that generates the clock signals clock, clockdiv and clk200 a 200 MHz reference

clock. This reference clock is needed for the IODELAY elements. The schematic of the transmitter

module is displayed in Fig. 3.3. The schematic is similar to the one of [Bur06], the only differences

are the number of channels that is decreased to six and the additional byte crusher and ODELAY

components for each channel. These additional components are needed to simulate a transmitter

which is not completely synchronous. Their functionality is explained in Section 3.2.2 in detail.
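The relation between the two transmitter clocks can be checked with a few lines of arithmetic. The following is only a sketch; the function name and its parameters are chosen for illustration and do not appear in the VHDL design.

```python
def serializer_clocks(bit_rate_mbps, ratio=8, ddr=True):
    """Return (clock, clockdiv) in MHz for one LVDS channel.

    bit_rate_mbps: serial bit rate of the channel in Mb/s
    ratio:         serialization factor (8 for the 8:1 serializer)
    ddr:           True if both clock edges transfer a bit
    """
    bits_per_clock = 2 if ddr else 1
    clock_mhz = bit_rate_mbps / bits_per_clock   # LVDS (serial side) clock
    clockdiv_mhz = bit_rate_mbps / ratio         # parallel side clock
    return clock_mhz, clockdiv_mhz


clock, clockdiv = serializer_clocks(800)
print(clock, clockdiv, clock / clockdiv)
```

With DDR transmission the ratio clock / clockdiv is 4, without DDR it would be 8.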

3.2.1. Serializer

The serialization is done by so-called OSERDES components. These components are available in

the input/output blocks of Xilinx FPGAs (IOB) to handle common I/O operations and to save pro-

grammable logic blocks. A single OSERDES primitive can only perform a 6:1 serialization, but it is

possible to concatenate two OSERDES blocks in a master/slave fashion to serialize up to 10 bits. Fig.


Figure 3.3.: Block diagram of the LVDS transmitter

3.4 clarifies the structure of such bundled OSERDES components, using the example of an 8:1 serialization.

There are shiftin and shiftout ports available for the concatenation. It is important to know that the

serialized data stream starts with the LSB or rather input D1.
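The bit order can be illustrated with a small behavioral model. This is a sketch only: the split of the byte into data[5:0] for the master and data[7:6] for the slave follows Fig. 3.3, while the function names are hypothetical and the exact slave input pins are not modeled.

```python
def split_master_slave(byte):
    """Split a byte into the bits fed to the master and the slave OSERDES."""
    master = [(byte >> i) & 1 for i in range(6)]     # data[5:0]
    slave = [(byte >> i) & 1 for i in range(6, 8)]   # data[7:6]
    return master, slave


def serialize_lsb_first(byte):
    """Return the serial bit sequence of one byte, LSB first."""
    master, slave = split_master_slave(byte)
    return master + slave   # master bits leave first, then the slave bits


print(serialize_lsb_first(0xA5))
```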

3.2.2. Asynchronous transmitter

In source synchronous systems the receiver assumes that data and clock arrive synchronously or only

with a minimal skew. Such an idealized system is described in [Bur06]. To test a receiver that should

be able to handle larger skews between the clock and data channels or among the data channels, a

transmitter is needed that can generate such asynchronicities. Therefore, the given basic design from

[Bur06] is extended.


Figure 3.4.: 8:1 serializer, consisting of master and slave OSERDES component

This extended transmitter can generate two different types of delay. One type of delay is generated

by bit insertion, which is done by the byte crusher component. The other one is generated by the so-called

output delay primitives, which are available in the I/O blocks of newer Virtex FPGAs.

Byte crusher

The byte crusher introduces a delay to the signal by simply adding a predefined number of bits to

the data stream. This is performed by some logic that works like a shift register. The minimal delay

is given by the duration of one bit which depends on the frequency of the LVDS clock. For DDR

transmission this can be calculated according to the following equation:

$$T = \frac{1}{2 \cdot \mathit{frequency}}$$

where T is the duration of one bit and frequency the speed of the LVDS clock. Because the OSERDES

primitive produces a serial data stream which starts with the LSB, the byte crusher has to add the

delay bits in front of the LSB of the first data byte. This process is illustrated in Fig. 3.5 for a delay

of 10 bits. For a serial data stream where the MSB is sent first, the delay bits have to be inserted in

front of the MSB.

The byte crusher has an internal buffer whose size is the number of delay bits plus 8. In this example the buffer

width is 18 bits. The lower 8 bits of the buffer are always assigned to the output of the byte crusher.

In each clock cycle the buffer content is shifted by 8 to the right and the input byte is written

to the upper 8 bits of the buffer. With this mechanism

the byte crusher can introduce delays of arbitrary length to the signal.
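The buffer mechanism can be sketched as a behavioral model. This is not the VHDL implementation; it assumes that the buffer is initialized with zeros and that the output is taken after the buffer update of each cycle. All names are chosen for illustration.

```python
def byte_crusher(bytes_in, delay_bits):
    """Delay a byte stream by delay_bits serial bits (LSB is sent first)."""
    width = delay_bits + 8
    mask = (1 << width) - 1
    buf = 0                      # assumption: buffer starts with zeros
    bytes_out = []
    for byte in bytes_in:
        # shift the buffer by 8 to the right, write the input to the upper 8 bits
        buf = ((buf >> 8) | (byte << delay_bits)) & mask
        # the lower 8 bits are the output of this clock cycle
        bytes_out.append(buf & 0xFF)
    return bytes_out


def to_bits(byte_list):
    """Serial bit sequence of a byte list, LSB of each byte first."""
    return [(b >> i) & 1 for b in byte_list for i in range(8)]


data = [0xA5, 0x3C, 0xFF, 0x01]
delayed = byte_crusher(data, 10)
print(to_bits(data))
print(to_bits(delayed))
```

In the serial view the delayed stream is the input stream with 10 additional bits in front of the LSB of the first data byte.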


Figure 3.5.: Example operation of the byte crusher component for a delay of 10 bits.

Output delay

The output delay, or in Xilinx terminology IODELAY, is a primitive in the I/O block which can be

used in combination with a SERDES component. It can delay a signal by multiple 75 ps steps, so-

called taps. There are up to 63 taps available in an IODELAY component. The maximally achievable

delay with 63 taps is 4.725 ns. The tap setting can be either fixed or variable. A fixed tap setting is

hard coded and cannot be changed during operation, whereas a variable tap setting can be initialized

arbitrarily and changed during operation in single tap steps with increment and decrement signals.

At IODELAY instantiation the developer has to decide whether an input or output signal should be

delayed. In the following, the direction is indicated by the term IDELAY for input or ODELAY for

output signals.

For the transmitter the tap settings of the ODELAY primitives are fixed. For example, at an LVDS

clock speed of 400 MHz, one bit has a duration of 1.25 ns, so approximately 17 taps are needed to

perform a shift of one bit. With the combination of a byte crusher and an ODELAY very fine steps

can be attained even for a long delay.
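The tap arithmetic, and the split of a long delay into whole bits for the byte crusher and remaining taps for the ODELAY, can be sketched as follows. The helper names are hypothetical; the 75 ps tap size and the 63 tap limit are the values given above.

```python
import math

TAP_PS = 75      # delay of one tap in picoseconds
MAX_TAPS = 63    # number of taps of one IODELAY


def bit_duration_ps(clock_mhz):
    """Duration of one bit for DDR transmission: T = 1 / (2 * frequency)."""
    return 1e6 / (2 * clock_mhz)


def taps_per_bit(clock_mhz):
    """Number of taps needed to shift the signal by at least one bit."""
    return math.ceil(bit_duration_ps(clock_mhz) / TAP_PS)


def split_delay(delay_ps, clock_mhz):
    """Split a delay into whole bits (byte crusher) and taps (ODELAY)."""
    bits, rest_ps = divmod(delay_ps, bit_duration_ps(clock_mhz))
    return int(bits), round(rest_ps / TAP_PS)


print(bit_duration_ps(400), taps_per_bit(400))
print(split_delay(3000, 400))
```

For a 400 MHz LVDS clock this reproduces the 1.25 ns bit duration and the 17 taps mentioned above.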

The clock signal is routed through an ODDR and an ODELAY element to the LVDS driver. The

ODDR module alternately outputs the values of its inputs D1 (1) and D2 (0) on the rising

and falling clock edges. When this component is omitted, such that the clock signal is routed directly

through the ODELAY to the LVDS driver, simulation and synthesis work fine. However,

practical tests have shown that the LVDS receiver on the FPGA then does not receive any clock signal.

For this reason an ODDR component is required.

3.3. Receiver

The receiver performs a 1:8 deserialization of the incoming signal. Of particular interest are the algorithm

for adapting the sampling position (bit alignment), the word alignment process and finally the skew

compensation among the channels.

As illustrated in the block diagram of the receiver in Fig. 3.6, the incoming differential signals pass

the LVDS input buffers, which transform them to single ended signals. After passing the input delay


primitives, the ISERDES components use the clock signal for sampling. This clock signal is obtained

from the LVDS channel, which is common for source synchronous systems. During calibration after the

system reset the resource sharing control module selects one channel after another to perform the data

alignment. This alignment process is controlled by the bit align machine, which adapts the channel

delay introduced by the IDELAY components and the word alignment performed in the ISERDES.

The control outputs of the bit align machine (bitslip, increment/decrement) are demultiplexed to the

ISERDES and IDELAY elements of each channel, but this is only implied in the illustration.

Finally, after successful calibration, the byte parser starts to work. It searches for the first byte that

is equal to a user defined pattern and raises a data valid flag which is used to enable the write signal

for the FIFO. This action automatically compensates skews among the data channels that exceed one

byte.

Apart from the algorithm used for the bit and word alignment, the receiver component was entirely

reworked. The different number of LVDS channels makes it necessary to adapt the resource sharing

control block. One of the most significant changes made in the receiver component affects the internal

distribution of the clock signal; the original implementation was completely replaced.

In [Bur06] the incoming clock signal is distributed by a regional clock buffer (BUFR). This, however,

is only possible when all ISERDES components are located within one bank. In Xilinx FPGAs the

I/O blocks, which contain the IODELAY and SERDES primitives, are organized in banks. A common

bank contains 40 I/O blocks, some special banks have only 20 I/O blocks. A clock signal that is driven

by a BUFR, is only available within one bank [Xil09d, chapter 1]. In contrast, a global clock signal is

available in all banks of the FPGA, but depending on the device the number of global clock lines is

limited and the distribution delay is much higher than for regional clocks.

Because this design should face the requirements for the interconnection with the image sensor, it is

assumed that the LVDS channel inputs are spread over multiple banks. Therefore, it is necessary to

use a global clock buffer (BUFG) to distribute the clock signal.

The differential input clock is transformed to a single ended clock by the LVDS input buffer. Afterwards,

the signal is fed into a BUFG that drives a phase locked loop (PLL) to generate the clockdiv

signal, which has a 4 times larger period than the clock input. Both PLL outputs, clock (feedback)

and clockdiv (1/4), are fed into BUFG components to make them available in the whole receiver design.

Virtex5 FPGAs have phase locked loop (PLL) and digital clock manager (DCM) primitives available

to perform up and down sampling of clock signals. For this problem a PLL was chosen, because it

can handle frequencies up to 600 MHz and it offers an easy mechanism for clock down sampling by an

integer divider. In contrast, a DCM can only handle frequencies up to 450 MHz, which is

not enough to reach the maximum achievable data rate. The fact that the DCM does not need time

to stabilize the output signals can be neglected, because the stabilization time of the PLL does not cause any problem. When PLLs

are used, the design has to be kept in reset until the PLL has locked (stabilized) its output signals.


Figure 3.6.: Block diagram of the LVDS receiver


3.3.1. Data and clock timing

To understand the bit alignment process with the adaption of the sampling position, it is necessary

to have a closer look at the post-place & route timing report. This report is generated by the Xilinx

ISE software² and lists how much time the clock and data signals spend in each component. By

merging the information of the data and clock path timing, it is possible to determine a timing window

where the data signal will be sampled. The information of the timing report is visualized in Fig. 3.7.

For now it is assumed that the transmitter works as an ideal source synchronous component, so neither

the byte crusher nor the ODELAY introduce any delay to the signals.

Figure 3.7.: Data and clock path timing

The diagram shows the path of the data and clock signal from the input pad to the ISERDES primitive

which performs the sampling. The paths are divided into segments which introduce a delay to the

signal. The descriptions of the timing segments of the data path are available in Table 3.1 and the

ones of the clock path in Table 3.2. The concrete minimum and maximum values of the individual

timing parameters for this design are shown in Tables 3.3 and 3.4. These numbers correspond

to the Virtex5 XC5VSX50T with -1 speed grade. The raw timing information is subject to minor

changes in subsequent revisions of the ISE tool³.

With the timing information given, it is possible to calculate the setup and hold time. These timing

parameters refer to the sampling process of a flip-flop. The setup time is the duration for which the signal

is stable before the sampling edge arrives. In contrast, the hold time measures how long the signal is

stable after the sampling edge has occurred. So the sum of the setup and hold time describes the timing

window in which the signal is stable.

² The report can be found in the ISE software under Tools, Timing Analyzer, Post-Place & Route...

³ The timing information was generated with ISE Design suite, version 11.3 L.57


Timing parameter    Description
TIOPI               Delay of the IOB input buffer
TIODDO_IDATAIN      Delay from the I pin of IOB pad to the D input of the ISERDES.
                    Propagation delay through IODELAY
TISCKD_DDLY_DDR     Delay from the D input of the ISERDES to the sampling registers in
                    the ISERDES (setup and hold times of ISERDES)
                    with respect to CLK at DDR mode

Table 3.1.: Data path timing definitions

Timing parameter    Description
TIOPI               Delay of the IOB input buffer
TIODDO_IDATAIN      Delay from the I pin of IOB pad to the D input of the ISERDES.
                    Propagation delay through IODELAY
TNETx               Distribution delay of the clock net
TBGCKO_O            Delay from BUFG input to output
TPLLCKO_CLKFBOUT    Delay introduced by PLL component

Table 3.2.: Clock path timing definitions

Timing parameter    Minimum data path delay    Maximum data path delay
TIOPI                1.120 ns                   1.168 ns
TIODDO_IDATAIN       0.917 ns                   0.527 ns
TISCKD_DDLY_DDR     -0.089 ns                   0.352 ns
Total                1.948 ns                   2.047 ns

Table 3.3.: Data path delay overview for Virtex5 XC5VSX50T with -1 speed grade

Timing parameter    Minimum clock path delay   Maximum clock path delay
TIOPI                1.082 ns                   1.123 ns
TIODDO_IDATAIN       0.917 ns                   0.527 ns
TNET1                1.162 ns                   1.263 ns
TBGCKO_O             0.230 ns                   0.250 ns
TNET2                0.095 ns                   1.578 ns
TPLLCKO_CLKFBOUT    -1.846 ns                  -3.461 ns
TNET3                1.523 ns                   1.655 ns
TBGCKO_O             0.230 ns                   0.250 ns
TNET4                0.113 ns                   1.889 ns
Total                3.506 ns                   5.074 ns

Table 3.4.: Clock path delay overview for Virtex5 XC5VSX50T with -1 speed grade


With the given timing information for the clock and data path, the setup time can be obtained.

Setup Time = Max Data Delay − Min Clock Delay (3.1)

= 2.047 ns − 3.506 ns = −1.459 ns

A negative setup time indicates that the clock signal reaches the pin of the input flip-flop after the

data signal. The following equation yields the hold time:

Hold Time = Max Clock Delay − Min Data Delay (3.2)

= 5.074 ns − 1.948 ns = 3.126 ns

The timing window, in which the sampling edge occurs, is the sum of setup and hold

time.

Timing Window = Setup Time + Hold Time (3.3)

= −1.459 ns + 3.126 ns = 1.667 ns
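Equations 3.1 to 3.3 can be reproduced from the values in Tables 3.3 and 3.4. The following sketch uses integer picoseconds to avoid rounding effects; the variable and function names are chosen for illustration only.

```python
# (parameter, minimum delay, maximum delay) in picoseconds, from Table 3.3
DATA_PATH = [
    ("TIOPI", 1120, 1168),
    ("TIODDO_IDATAIN", 917, 527),
    ("TISCKD_DDLY_DDR", -89, 352),
]

# (parameter, minimum delay, maximum delay) in picoseconds, from Table 3.4
CLOCK_PATH = [
    ("TIOPI", 1082, 1123),
    ("TIODDO_IDATAIN", 917, 527),
    ("TNET1", 1162, 1263),
    ("TBGCKO_O", 230, 250),
    ("TNET2", 95, 1578),
    ("TPLLCKO_CLKFBOUT", -1846, -3461),
    ("TNET3", 1523, 1655),
    ("TBGCKO_O", 230, 250),
    ("TNET4", 113, 1889),
]


def totals(path):
    """Sum the minimum and maximum delays of all segments of a path."""
    return sum(row[1] for row in path), sum(row[2] for row in path)


data_min, data_max = totals(DATA_PATH)
clock_min, clock_max = totals(CLOCK_PATH)

setup_ps = data_max - clock_min      # equation (3.1)
hold_ps = clock_max - data_min       # equation (3.2)
window_ps = setup_ps + hold_ps       # equation (3.3)

print(setup_ps, hold_ps, window_ps)
```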

The calculations for the timing window are independent of the clock speed. Further details about

the timing analysis of source synchronous systems can be found in [KGA03]. The actual situation for

LVDS clock speeds of 200 MHz, 400 MHz and 600 MHz in DDR mode is displayed in Fig. 3.8. The

waveform indicates the duration of a single bit, depending on the clock speed. The timing window for

the sampling clock edge is large because a global clock is used and the predictable timing windows of

the clock path components, especially the PLL and the clock net 4 (TNET4), are larger than for regional

clocks.
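To relate the timing window to the three clock speeds, the window can be expressed in bit durations. This is a small sketch with hypothetical names; the window of 1667 ps is the result of equation 3.3.

```python
WINDOW_PS = 1667   # timing window from equation (3.3) in picoseconds


def bit_duration_ps(clock_mhz):
    """Duration of one bit for DDR transmission."""
    return 1e6 / (2 * clock_mhz)


def window_in_bits(clock_mhz):
    """Size of the timing window in units of one bit duration."""
    return WINDOW_PS / bit_duration_ps(clock_mhz)


for clock_mhz in (200, 400, 600):
    print(clock_mhz, bit_duration_ps(clock_mhz), window_in_bits(clock_mhz))
```

At 200 MHz the window is shorter than one bit, whereas at 400 MHz and 600 MHz it is longer than one bit, so the position of the sampling edge within the data eye cannot be predicted.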

Figure 3.8.: Timing window for sampling clock edge (setup time 1.46 ns, hold time 3.13 ns; bit durations: 200 MHz, 400 Mb/s, T = 2.5 ns; 400 MHz, 800 Mb/s, T = 1.25 ns; 600 MHz, 1200 Mb/s, T = 0.83 ns)

This sample timing window depicts the problem that it cannot be assumed that the sampling clock

edge occurs in the middle of the data eye. The data eye pattern, also known as an eye diagram, is a

typical oscilloscope waveform in which a digital data signal is repetitively sampled. An eye represents

the duration of a bit. For transitions between two bits the signal can stay at 1 or 0 or can change its

value. A sample eye diagram is shown in Fig. 3.9; the ideal sampling position is in the middle of a

data eye.


For higher speeds or bad signal quality the jitter increases and the transition edges flatten. Therefore

the data eyes become smaller. To sample the signal at a position where the current bit is stable, the

sampling edge has to be moved to the middle between two signal transitions, i.e. to the center of the

data eye. This process is called bit alignment.

Figure 3.9.: Sample data eye diagram, the ideal sampling position is marked with a dashed line

3.3.2. Data alignment

Bit alignment

Because there is no possibility to move the sampling clock edge and to adapt each channel separately,

the signals of the channels are delayed to be positioned ideally for sampling. In order to delay the

signal the IDELAY primitives in the receiver are used. This process is controlled by the bit align

machine and can basically be divided into five steps. The following textual descriptions are illustrated

in Fig. 3.10.

1. Initial sampling at an arbitrary position

The position may be somewhere in the data stream, in a data eye or in a transition.

2. Find end of first transition

When the initial sampling position is stable, the signal is delayed until the current and last data

sample (ISERDES output) are different. That is when the transition is found.

3. Walk through the transition and find the beginning

The signal is delayed again until the data samples are again stable. The sampling position is on

the right hand side of the data eye.

4. Walk through the open data eye, count the taps and find the next transition

Now the signal is delayed again in order to move the sampling position to the left hand side of

the data eye. During this process the taps are counted.


5. Go back to the middle of the data eye

When the left hand side of the data eye is found, the delay is decremented by half of the counted taps in order

to move the sampling position to the middle of the data eye.
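The five steps can be sketched with a strongly simplified model. The assumptions are: the bit duration and the width of a transition are given in taps, a sample is "stable" whenever the sampling position lies inside the data eye, and each delay tap moves the sampling position one tap to the left. The real state machine compares consecutive ISERDES output bytes instead; all names below are hypothetical.

```python
def bit_align(initial_phase, bit_taps, transition_taps, max_taps=63):
    """Return (final tap setting, largest tap setting used).

    initial_phase:   initial sampling position inside a bit, in taps;
                     positions 0 .. transition_taps-1 lie in the transition
    bit_taps:        duration of one bit in taps
    transition_taps: width of the transition region in taps
    """
    def stable(taps):
        # each delay tap moves the sampling position one tap to the left
        phase = (initial_phase - taps) % bit_taps
        return phase >= transition_taps

    def increment(taps):
        if taps >= max_taps:
            raise OverflowError("more than %d taps needed" % max_taps)
        return taps + 1

    taps = 0                       # step 1: initial sampling
    while stable(taps):            # step 2: find the first transition
        taps = increment(taps)
    while not stable(taps):        # step 3: walk through the transition
        taps = increment(taps)
    eye_taps = 0
    while stable(taps):            # step 4: walk through the open data eye
        taps = increment(taps)
        eye_taps += 1
    peak = taps
    taps -= eye_taps // 2          # step 5: go back to the middle of the eye
    return taps, peak


# 17 taps per bit correspond to a 400 MHz LVDS clock, 3 taps per transition
# is an arbitrary illustrative value
for phase in (0, 8, 16):
    print(phase, bit_align(phase, bit_taps=17, transition_taps=3))
```

In this model the largest tap setting occurs when the initial sampling position is at the right end of the data eye, which corresponds to the worst case discussed in Chapter 3.3.3.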

The signal of a channel is delayed by an IDELAY component. This works exactly like the ODELAY

primitive, with the only difference being the direction of the signal, cf. Chapter 3.2.2.

Figure 3.10.: Five steps of the bit alignment process

The bit sampling is performed by the ISERDES component. Therefore, one sampling process refers

to sampling 8 bits in a row and generating one byte. The process described above is simplified and

only deals with single bits. To understand the state machine, which executes the given algorithm,

it is important to keep that in mind. A precise state diagram that exactly matches the VHDL

implementation of the state machine is available in [Bur06, page 17]. When the bit alignment is

complete, all bits are sampled in the middle of the data eye, so the first alignment step is completed.

The next step is the word alignment, which concatenates the correct adjacent bits of the data stream

to a byte.

Deserializer and word alignment

The deserializer is the main unit inside the receiver. It is available as a primitive in the I/O block, similar

to the OSERDES component in the transmitter. A single ISERDES component can only deserialize 6

bits. However, two ISERDES components can be bundled to deserialize up to 10 bits. A block diagram

with a master/slave arrangement of two ISERDES components is shown in Fig. 3.11.

An ISERDES component parallelizes the incoming bit stream, where the first incoming bit is treated

as LSB, the last one as MSB. The ISERDES primitive has the additional functionality of shifting the

sampling window on the serial data stream. This is necessary, because it is not possible to predict

where the sampling starts in the data stream. Hence, the byte at the output of the ISERDES can

contain bits, that originally were located in two adjacent transmitted bytes. To shift the sampling


Figure 3.11.: Deserializer, consisting of master and slave ISERDES component

window, the bitslip signal has to be asserted high during a rising edge of clockdiv. In DDR mode each

bitslip pulse alternately results in a shift of 3 bits to the right or 1 bit to the left. After 8 bitslip

cycles the sampling window moves back to the original position. Further details about the ISERDES

and the bitslip operation are given in [Xil09d, Chapter 8].
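That the alternating shifts reach every position of the sampling window and return to the start after 8 bitslip cycles can be checked with a short sketch. The sign convention (right shift counted as positive) and the assumption that the sequence starts with the 3 bit shift are chosen for illustration.

```python
def bitslip_offsets(pulses, width=8):
    """Offsets of the sampling window after each bitslip pulse in DDR mode."""
    offset = 0
    offsets = [offset]
    for pulse in range(pulses):
        # alternate between 3 bits to the right and 1 bit to the left
        step = 3 if pulse % 2 == 0 else -1
        offset = (offset + step) % width
        offsets.append(offset)
    return offsets


print(bitslip_offsets(8))
```

All 8 possible window positions occur exactly once before the sequence repeats, so any word boundary can be found with at most 7 bitslip pulses.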

Byte offset compensation

The given algorithm compensates only the fine grained delays that are smaller than the duration of

8 bits. For larger delays the algorithm works fine as well, but then there is a byte offset among the

channels at the receiver’s output. To compensate this skew, the byte parser component in combination

with a FIFO for each data channel balances the data signals when larger delays occur. The byte parser

is activated when the training is finished and waits for the first byte that is different from the training

pattern. With this mechanism the skew among the channels is compensated.
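The byte offset compensation described in this subsection can be sketched as follows. The training pattern value 0xB8 and the function name are hypothetical; the model also assumes that the first data byte differs from the training pattern.

```python
TRAINING_PATTERN = 0xB8   # hypothetical value, used for illustration only


def byte_parser(stream, training_pattern=TRAINING_PATTERN):
    """Return the bytes that would be written to the FIFO of one channel."""
    for index, byte in enumerate(stream):
        if byte != training_pattern:
            # data valid: this byte and all following bytes are written
            return stream[index:]
    return []


T = TRAINING_PATTERN
channels = [
    [T, T, 0x01, 0x02, 0x03],        # channel without byte offset
    [T, T, T, 0x11, 0x12, 0x13],     # channel delayed by one byte
]
fifos = [byte_parser(stream) for stream in channels]
print(fifos)
```

Because every FIFO starts with the first data byte of its channel, reading the FIFOs in parallel delivers bytes that belong together, independent of the byte offset.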

3.3.3. Compensation capabilities

The combination of bit and word alignment enables the receiver to compensate arbitrary delays that

are smaller than the duration of 8 serial bits. This algorithm was introduced in [Bur06]. The bit

alignment adapts the sampling position; however, in the worst case the initial sampling position is

located at the right end of the data eye. Then the bit alignment algorithm has to delay the signal by

nearly two full data eyes, as described before in Chapter 3.3.2. Because there are only 63 delay taps


available, the LVDS speed cannot be arbitrarily slow. With 63 taps the maximum delay that can be

achieved is

63 · 75 ps = 4725 ps.

This maximum delay limits the maximum duration of two bits, which results in a minimum frequency

for the LVDS clock. In the worst case the duration of two bits is equal to 63 taps or 4725 ps. Hence,

the maximum duration of one bit is 2362.5 ps. This results in a minimum clock speed for DDR

transmission of

$$\frac{1}{2 \cdot 2362.5 \cdot 10^{-12}\,\mathrm{s}} = 211.64\,\mathrm{MHz}$$

This minimum speed causes a problem, because the LUPA-3000 image sensor is specified to work at

a speed of 206 MHz. For this reason, the receiver component needs to be extended to work at lower

frequencies.
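The limit can be reproduced with a few lines of arithmetic. The variable names are chosen for illustration; the tap size, the number of taps and the sensor speed are the values given in the text.

```python
TAP_PS = 75          # delay of one tap in picoseconds
MAX_TAPS = 63        # taps available in one IDELAY
SENSOR_CLOCK_MHZ = 206

max_delay_ps = MAX_TAPS * TAP_PS            # 63 taps
max_bit_ps = max_delay_ps / 2               # two bits must fit into the delay
min_clock_mhz = 1e6 / (2 * max_bit_ps)      # DDR: two bits per clock period

print(max_delay_ps, max_bit_ps, round(min_clock_mhz, 2))
print("sensor speed sufficient:", SENSOR_CLOCK_MHZ >= min_clock_mhz)
```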

Sampling position adaption for low speed signals

During the calibration it is neither possible to determine whether the transmission speed is high

enough nor if 63 taps are enough for the bit alignment. The given implementation of the receiver

does not detect a tap overflow in the IDELAY primitive. When the IDELAY component uses 63 taps

and another increment pulse is detected, the tap counter is set to 0. Such a tap overflow, or a tap

setting of 63, indicates that the clock speed is too slow and the sampling clock edge should be delayed.

To avoid the overflow it is necessary to trace the current tap setting and trigger the sampling edge

adaption when the tap counter increases to 63. In the implementation in [Bur06] a tap counter is used

for each channel, but only for demonstration purposes. The IDELAY primitive itself does not provide

a tap counter; it only has an enable and an increment/decrement input. The tap counter

observes these signals and adjusts the counter value.

A possibility to avoid the tap overflows is to delay the clock signal using the IDELAY primitive on the

clock channel. In the extended implementation, developed in this work, the counters for each channel

are used to trigger the clock edge adaption. This is done by means of a state machine inside the

receiver that observes the current tap setting and starts the clock edge adaption as soon as a counter

increases to 63. The state machine controls the reset for the resource sharing control and delays the

clock signal. A phase shift of the incoming clock signal causes an unlock of the PLL which triggers a

reset of all other blocks inside the receiver. This internal reset of the receiver starts a new calibration

for all channels. The clock edge adaption is maximally started once during a calibration process. A

situation where the sampling clock edge has to be adapted is shown in Fig. 3.12.
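The interplay of the tap counters and the clock edge adaption can be sketched as a behavioral model. The class and function names are hypothetical, and the wrap-around from 0 to 63 on a decrement is an assumption; the text only describes the overflow from 63 to 0.

```python
class TapCounter:
    """Shadow counter that follows the control signals of one IDELAY."""

    MAX_TAP = 63

    def __init__(self):
        self.value = 0

    def update(self, enable, increment):
        """Evaluate the enable and increment/decrement signals for one cycle."""
        if enable:
            step = 1 if increment else -1
            # assumption: the tap setting wraps around in both directions
            self.value = (self.value + step) % (self.MAX_TAP + 1)
        return self.value


def clock_adaption_needed(counters, already_adapted):
    """Trigger the clock edge adaption at most once per calibration."""
    if already_adapted:
        return False
    return any(counter.value == TapCounter.MAX_TAP for counter in counters)


counters = [TapCounter() for _ in range(7)]
for _ in range(63):
    counters[2].update(enable=True, increment=True)
print(counters[2].value, clock_adaption_needed(counters, already_adapted=False))
```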

The original sampling edge occurs at the end of a data eye and the transmission speed is so slow that

more than 63 taps would be needed to detect two bit transitions. When this happens, the clock signal

is delayed by a specific number of taps. This delay should have approximately the length of 1/4 of a

clock period, which is equal to a phase shift of 90°, because the new initial sampling position should

be close to the previous bit transition.


Figure 3.12.: Clock edge adaption for low speed signals

When the worst case is assumed, the original sampling position is located directly in front of the

transition. Hence, with a clock delay of 90°, two bit transitions of the data

signal can be detected by using a delay of 1.5 bit durations. There are 62 taps available that can

be used without running into a new overflow. The duration of one bit is then

$$\frac{62 \cdot 75\,\mathrm{ps}}{1.5} = 3100\,\mathrm{ps},$$

which results in a new minimum clock speed for DDR transmission of

$$\frac{1}{2 \cdot 3100 \cdot 10^{-12}\,\mathrm{s}} = 161.3\,\mathrm{MHz}.$$

Interconnections

which run at least this speed are guaranteed to be calibrated correctly, but also slower speeds could be

possible, because the transitions have a duration which increases the probability of detection with less

taps. With these adaptions there should not be any problem for slower interconnections to calibrate

the LVDS channels correctly. This clock edge adaption
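The numbers of the clock edge adaption can be checked with the following sketch. The function names are hypothetical; the quarter period clock delay in taps for the 206 MHz sensor clock is only an illustrative calculation and not a value taken from the implementation.

```python
TAP_PS = 75   # delay of one tap in picoseconds


def min_ddr_clock_mhz(usable_taps, bit_durations):
    """Minimum DDR clock if bit_durations bits must fit into usable_taps."""
    max_bit_ps = usable_taps * TAP_PS / bit_durations
    return 1e6 / (2 * max_bit_ps)


def quarter_period_taps(clock_mhz):
    """Number of taps that approximates a 90 degree clock phase shift."""
    period_ps = 1e6 / clock_mhz
    return round(period_ps / 4 / TAP_PS)


print(round(min_ddr_clock_mhz(62, 1.5), 1))
print(quarter_period_taps(206))
```

With the adapted clock edge, the 206 MHz of the image sensor lies above the new minimum clock speed.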

3.4. Design synthesis

The complete design introduced in Chapter 3.1 and described in the previous chapters was synthesized

using the Xilinx XST tool. During this process all unnecessary signals and modules are removed due

to design optimization. This could lead to a problem, because signals that should be observed may be

removed in the optimization process.

The following example illustrates this. The IODELAY primitives have no output signal that indicates

the current tap setting of the delay element. Therefore, the receiver contains an extra counter that

is sensitive to the increment and decrement signals that control the tap settings. Before the clock

delay extension was introduced, these counters were removed from the design because they were not

connected to any output. The following method can always be applied when a specific signal is needed

for debugging in the synthesized design.

The synthesis tool can be forced to keep signals that would be removed during optimization using the

following declaration in the VHDL code:

attribute KEEP of signalName : signal is "true";

where KEEP is declared as attribute KEEP : string; and signalName is the name of the signal that

should not be trimmed during synthesis.


3.4.1. Clock generator

The Toplevel design which was synthesized has a single ended clock input which expects a 100 MHz

clock. The ML506 board has an Integrated Device Technology (IDT) EEPROM Programmable Clock

Generator which generates a 100 MHz single-ended clock amongst others [Xil09b, Page 19]. This clock

signal is used to drive the overall design, further details are presented in Appendix B.

The Toplevel module contains a PLL which performs an upsampling of the incoming clock signal

to 600 MHz and 150 MHz or 400 MHz and 100 MHz for the clock and clockdiv signals. These clock

signals finally drive all components except the receiver. Fig. 3.1 clarifies the different clock domains,

one for the transmitter side (including data generator and comparator) and the other one for the

receiver side. In addition, the PLL generates a 200 MHz clock which is needed for the IDELAYCTRL

component. This module is connected to nothing other than the reference clock, but it is necessary to

instantiate it for every design that includes an IODELAY primitive. Additional information about

this IDELAYCTRL component can be found in [Xil09d, Chapter 7].
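How a single PLL can derive all three clocks from the 100 MHz input can be illustrated with integer arithmetic. This sketch simply uses the smallest common multiple of all frequencies as internal frequency; it does not model the permitted frequency range of the real PLL primitive, and it does not claim to reproduce the settings used in the design. All names are hypothetical.

```python
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def pll_settings(input_mhz, outputs_mhz):
    """Return (multiplier, {output frequency: divider}) for integer ratios."""
    internal_mhz = input_mhz
    for frequency in outputs_mhz:
        internal_mhz = lcm(internal_mhz, frequency)
    multiplier = internal_mhz // input_mhz
    dividers = {f: internal_mhz // f for f in outputs_mhz}
    return multiplier, dividers


print(pll_settings(100, [600, 150, 200]))   # clock, clockdiv, clk200
print(pll_settings(100, [400, 100, 200]))
```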

3.4.2. Pin mapping

The inputs and outputs of the Toplevel module are physically mapped to the FPGA pins. This

mapping is specified in a *.ucf file, in the VHDL project directory. The clock input is mapped to a

100 MHz input of the clock generator and the reset is connected to the board reset which is active-low

(pin J14). Internally the reset is handled active-high. For this reason the reset signal is inverted inside

the Toplevel design.

The differential LVDS input and output ports are mapped to the expansion headers. The expansion

header pins are connected with wires to establish a connection between the LVDS transmitter and the

receiver. The input and output pins are distributed over two banks in different regions of the FPGA,

hence it is necessary to use a global clock buffer (BUFG) for the LVDS clock input at the receiver

side. Due to this, the clock input pins have to be chosen carefully because only a few FPGA I/Os

can handle clock signals that have to be distributed globally. The details about these pin mapping

exceptions and the complete mapping are discussed in Appendix B.

3.5. The ChipScope logic analyzer

To verify the functionality of the design under different conditions, it is necessary to visualize some

internal signals like the delay tap settings and the output of the comparator unit. Therefore a special

kind of logic analyzer is used, which is known as ChipScope.

The ChipScope software provides a possibility to trace internal signals of hardware designs. It needs a

cable connection between the PC and the FPGA JTAG⁴ port, where so-called logic analyzer cores can be

triggered to record a specific number of data samples. For this purpose, one or more Integrated Logic Analyzer

⁴ Joint Test Action Group (JTAG) is a standardized interface for debugging and programming embedded hardware


cores (ILAs) can be instantiated in the VHDL design. Such an ILA core has a clock input, at least one

trigger signal and an arbitrary number of data inputs. The clock is used to sample the specified signal

when the trigger condition has fired. Besides, the number of data samples, that should be recorded,

can be specified, but is limited by the capacity of free block RAMs that are available in the FPGA.

Other cores exist that provide more specific features, but they are not needed to analyze this design,

cf. [Xil09a] for further details. To read the recorded data with the ChipScope software, it is necessary

to instantiate an Integrated Controller core (ICON). This core handles the JTAG connection with the

PC, which is used for data transmission.

It is possible to instantiate the cores directly inside the VHDL code like any other VHDL module and

connect to the desired clock, trigger and data signals. However, there is an easier solution that inserts the cores after the synthesis and before the translation step: ChipScope provides a Core Inserter tool that takes the results of the XST synthesis and offers a convenient interface to select arbitrary signals of the design and to trace them with an ILA.

Finally, the ChipScope analyzer software is used to arm the trigger and display the waveform of the

recorded signals. The PC on which the software is running is connected to the FPGA's JTAG port via a USB programming cable. When the connection is established, the trigger condition has to be chosen. All kinds of logical expressions on the trigger signals can be used to form the trigger condition. When the trigger is armed, the recording of the data signals starts as soon as the trigger condition is fulfilled. The values of the data signals are immediately shown in the waveform.

There are a few things that should be known when working with the ChipScope software:

• Recording clock signals is impossible and always results in an unroutable signal error in the Place

& Route phase.

• Each ILA instance contains a latch. These latches are summarized in the Map report of the

design summary.

• Each time the inserted cores are changed, the computational effort for the translate process

increases.

• The ChipScope JTAG connection to the FPGA board is normally blocked by the Windows

firewall.

In the ChipScope user guide [Xil09a] all necessary information for getting started with the software is

provided.


3.6. Test & verification

One purpose of the LVDS communication system was to find the speed limit of an LVDS interconnection

and to verify the algorithm of the data alignment. Furthermore, it was intended to test the skew

compensation among the channels and to evaluate differences in the signal propagation time caused

by length variation of the connecting wire.

The cable connector on the evaluation board, shown in Fig. 3.2, was taken from an old IDE hard disc

drive cable. The cables are crimped into the plug connector, which is the error-prone weak point

of the connection. The connection cables have a length of 20 cm or 50 cm.

To verify the function of the design, it is necessary to observe the ok output vector of the comparator

module, which indicates the error-free transmission of all data channels. Furthermore, the tap counter setting of each channel, the receiver output and the inputs of the comparator should be recorded. This is done by three different ILAs: one for the tap counter settings, another one for the ISERDES output values, the receiver FIFO output values and the ok signals, and a third one for both comparator inputs and the ok signals.

3.6.1. Test configuration

For the test all channels, except channel 1, had a 20 cm cable connection, channel 1 had a 50 cm wire.

Moreover, the transmitter introduces different delays to the channels. The left hand side of Table 3.5

contains the delay settings of the byte crusher and ODELAY components that were used in the tests.

The combination of both delays results in a total delay that depends on the

frequency, which was chosen to be 400 MHz (LVDS clock). Channel 0 does not have any artificial delay

introduced, therefore it can be regarded as reference. The other channels have arbitrary combinations

of both delays.

channel  byte crusher  ODELAY  total delay  byte offset  bit offset  rest delay  rest delay
         (bits)        (taps)  (ns)         (bytes)      (bit)       (ps)        (taps)
0        0             0       0            0            0           0           0
1        4             15      6.1          0            4           1125        15
2        50            60      67.0         6            5           750         10
3        8             0       10.0         1            0           0           0
4        11            10      14.5         1            3           750         10
5        17            63      26.0         2            4           975         13
sync     23            0       28.8         2            7           0           0
clock    -             0       0            0            0           0           0

Table 3.5.: Delay settings for all LVDS channels and expected compensation, assuming a clock speed of 400 MHz


At the receiver the delay of the signal is treated by different components, depending on the duration.

There are three levels of delays that are compensated with different mechanisms.

• Byte offsets (delays with duration of 8 bit or multiples) are compensated with the per channel

FIFOs.

• Bit offsets (delays with duration of 1, 2, ..., 7 bits) are treated by the ISERDES bitslip function.

• All other smaller delays are compensated by adaption of the IDELAY component.

Each delay of a channel is divided into these parts, which results in the fact that the tap counter

for each channel will only indicate delay components that are smaller than a bit or byte offset. The

expected split delay components for each channel were calculated and presented on the right hand

side of Table 3.5.
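The split into the three delay levels can be reproduced with a few lines of arithmetic. The following is only a minimal sketch, assuming a bit duration of 1250 ps (400 MHz, DDR) and a tap delay of 75 ps, which are the values underlying Table 3.5. The function name split_delay is chosen for illustration and is not part of the VHDL design.

```python
# Sketch: split an artificial transmitter delay into the three compensation levels.
BIT_PS = 1250  # duration of one bit at 400 MHz DDR (800 Mb/s)
TAP_PS = 75    # delay of one ODELAY/IDELAY tap, as used in Table 3.5


def split_delay(byte_crusher_bits, odelay_taps):
    """Return (total_ps, byte_offset, bit_offset, rest_ps, rest_taps)."""
    total_ps = byte_crusher_bits * BIT_PS + odelay_taps * TAP_PS
    # Multiples of 8 bits are absorbed by the per channel FIFOs.
    byte_offset, remainder_ps = divmod(total_ps, 8 * BIT_PS)
    # Whole bits (1 ... 7) are absorbed by the ISERDES bitslip function.
    bit_offset, rest_ps = divmod(remainder_ps, BIT_PS)
    # Whatever is left has to be covered by the delay taps.
    rest_taps = rest_ps // TAP_PS
    return total_ps, byte_offset, bit_offset, rest_ps, rest_taps


# Channel 2 of Table 3.5: byte crusher delay of 50 bits and 60 ODELAY taps.
print(split_delay(50, 60))
```

Calling the function with the byte crusher and ODELAY settings of a channel yields the expected byte offset, bit offset and rest delay listed on the right hand side of Table 3.5.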

Fig. 3.13 shows a screen shot of the ChipScope logic analyzer with an ILA that recorded the tap

settings for the data and sync channels. The reference channel 0 (TAP 0) used 23 taps to sample in

the middle of the data eye. Because there is no additional delay on this channel, the other channels should use approximately the same number of taps or fewer to compensate for the small delay component. The delay that is introduced by the transmitter or during transmission, compared to the delay of the reference channel, decreases the delay that has to be generated by the receiver.

Figure 3.13.: ChipScope screen shot with taps counter for the data and sync channels

Similar to channel 0, channel 3 also has no delay in the small component. Therefore the tap counter

value is equal, but due to variations in the signal quality during the calibration phase it may happen

that the tap counter value varies by ±1 tap.


In general the delay introduced by the transmitter and the IDELAY of the receiver should add up

to approximately 22 - 23 taps or 1.650 ns - 1.725 ns like for channel 0. This also holds for the other

channels. Channel 6 (TAP 6, sync channel) probably does not sample completely in the center of the

data eye, but since no bit error occurs, the sampling position is acceptable. Such variances occur when

a strong jitter is present during calibration. A transition edge is then detected earlier or later and the

measured center of the data eye varies.

For correct interpretation of the screen shot it is important that channel 1 uses a 50 cm wire and all

other channels have 20 cm cables. This causes an extra delay during transmission. Unfortunately,

this delay is not completely measurable, but it probably has a size of approximately 2225 ps. This is obtained by the calculation:

$$\text{ODELAY} + \text{IDELAY} - \text{reference tap setting} = 15 + 21 - 23 = 13\ \text{taps}$$

where 13 taps generate a delay of 975 ps. This is far too little, because for the 30 cm of additional cable it would result in an unrealistic signal speed of $307.7 \cdot 10^{6}\,\mathrm{m/s}$, which exceeds the speed of light. Hence, there is an additional bit delay which is compensated by the bitslip function. A 1 bit delay amounts to 1250 ps, which results in a total delay of approximately 2225 ps. The corresponding signal speed for this delay would be $134.8 \cdot 10^{6}\,\mathrm{m/s}$, which is a realistic speed.

The previous calculation is only a rough estimate to interpret the results of the simulation. Other

tests with different delay settings led to comparable results.
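The estimate for channel 1 can be checked numerically. The sketch below assumes a tap delay of 75 ps, a bit duration of 1250 ps and an additional cable length of 30 cm (50 cm instead of 20 cm). All names are chosen for illustration only.

```python
# Sketch: plausibility check of the cable delay estimated for channel 1.
TAP_PS = 75                      # delay of one tap
BIT_PS = 1250                    # one bit at 400 MHz DDR
EXTRA_CABLE_M = 0.50 - 0.20      # channel 1 uses 50 cm, the reference 20 cm
SPEED_OF_LIGHT = 299_792_458     # m/s


def signal_speed(delay_ps):
    """Signal speed in m/s if the extra cable causes the given delay."""
    return EXTRA_CABLE_M / (delay_ps * 1e-12)


rest_taps = 15 + 21 - 23         # ODELAY + IDELAY - reference tap setting
rest_ps = rest_taps * TAP_PS     # delay visible in the tap counters only

speed_taps_only = signal_speed(rest_ps)
speed_with_bitslip = signal_speed(rest_ps + BIT_PS)

# A speed above the speed of light shows that a whole bit must be hidden
# in the bitslip compensation.
print(speed_taps_only > SPEED_OF_LIGHT, speed_with_bitslip < SPEED_OF_LIGHT)
```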

To test the robustness of the communication, instead of the 20 cm cables, some longer cables with

50 cm were used. At a frequency of 400 MHz the cable length difference had no effect on the error-free

transmission, but at higher speeds, e.g. 600 MHz, many bit errors occurred.

Another ILA traces the signals of the ISERDES output, the receiver FIFO’s outputs that are connected

to the comparator unit and the ok signal. The waveforms, displayed in Fig. 3.14, show the byte offsets

among the channels. The signal data from iserdes 0-5 display the ISERDES outputs. Channel 0 is

the reference channel, the other channels have byte offsets between one and six bytes, compared to

this channel.

The data pattern 00 (hex) clarifies the byte offset of each channel (data from iserdes) compared to

channel 0 which has no delay. The reference position of channel 0 is marked with a red line in Fig.

3.14. Channel 1, 3 and 4 have 1 byte offset, channel 2 has 6 bytes offset and channel 6 has 2 bytes

offset. When comparing these values with the calculation results of the expected values in Table 3.5

it turns out that channel 1 has a 1 byte larger offset than expected. This is due to the fact that the

bitslip is not only used for the compensation of bit offsets, but also to adapt the sampling window.

Sometimes the byte offset increases by one due to shifts of the ISERDES bitslip function. This effect is non-deterministic, because it cannot be predicted where the initial sampling starts. For this reason it is not known how many bitslip cycles are performed by the ISERDES.

Moreover, the screen shot in Fig. 3.14 shows that the data signals (rxfifo1 - rxfifo5) arrive completely balanced at the comparator input (data sample 295) and that the comparator has not detected any bit failures, because the ok signals are still set. This screen shot proves the ability of processing asynchronous input signals and balancing all channels to generate a synchronous data stream.

Figure 3.14.: Data output of ISERDES, the receiver FIFO outputs at the comparator and the ok signal

Finally, the third ILA recorded the inputs of the comparator, the receiver FIFO’s outputs and the

output of the data generator FIFO and again the ok signal. The result is shown in Fig. 3.15. The

data generator produces values from 0 to 255, then it waits one cycle where the sync signal is set

invalid and then it starts again counting with a valid sync value. Each time when an invalid sync

value occurs the receiver’s FIFOs run out of data. This happens at data sample 27 in Fig. 3.15. The

data generator FIFO does not run out of data because of the delay introduced during transmission,

hence this FIFO contains always more data tokens than the receiver FIFOs. The ok signal stays valid

because the comparator does not compare sample 27, since at least one of the receiver FIFOs is empty.

The data generator FIFO has a width of 48 bits, but for clarification this bitvector is split into single

bytes in order to be able to directly compare it to the corresponding receiver FIFO output.

The tests with the ChipScope logic analyzer have shown that the system is extremely robust at a clock speed of 400 MHz, i.e. 800 Mb/s. No bit errors were detected at this clock speed. In contrast, when the clock speed is increased to 600 MHz (1200 Mb/s), bit errors occur occasionally, but more than 90% of the data bytes are still correct. Especially if long cables with a length of 50 cm are used, the bit error rate increases significantly.

Figure 3.15.: Data inputs of the comparator

The clock speed of the whole design was verified using a clock divider that drives an LED on the evaluation board. The divider is connected to the 200 MHz reference clock, divides it down to a frequency of 2 Hz and drives the output port, which is mapped to a status LED. The frequency of the design was verified by counting the light pulses per minute.
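The numbers behind this check are simple. The following sketch only illustrates the arithmetic; the constant and function names are assumptions and do not appear in the VHDL design.

```python
# Sketch: arithmetic behind the LED based clock check.
REF_CLK_HZ = 200_000_000   # reference clock on the evaluation board
LED_HZ = 2                 # blink frequency after the divider

divide_ratio = REF_CLK_HZ // LED_HZ       # reference clock cycles per LED period
expected_pulses_per_minute = LED_HZ * 60  # what has to be counted by eye


def measured_ref_clock(pulses_counted, seconds=60):
    """Reference clock frequency in Hz implied by the counted light pulses."""
    return pulses_counted / seconds * divide_ratio


print(divide_ratio, expected_pulses_per_minute)
```

If the counted number of pulses per minute deviates from the expected value, the implied reference clock frequency deviates by the same factor.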

In addition, another test was performed to check the correctness of the clock edge adaption algorithm.

The master clock speed of the LVDS was reduced to 150 MHz. At this speed a clock edge adaption is very likely. To make sure that a clock edge adaption was triggered, an LED on the evaluation board was used to indicate the use of the adaption process. The data signals were checked again with the ChipScope ILAs. No transmission errors occurred and the LED indicated that the clock edge was shifted to allow correct sampling in the middle of the data eye. All these tests have shown that the receiver component can deal with the kinds of impairments that may occur.


4. Image sensor - LUPA-3000

This chapter deals with the LUPA-3000 CMOS image sensor. Its functionality and interfaces are

described, since this is the basis for the software model of the image sensor. The specification of the

LUPA-3000 defines requirements for the hardware design, which should be connected later on to the

image sensor.

All aspects that are necessary to understand the functionality of the LVDS data

interface are treated, as well as the exposure control and the Serial Peripheral Interface (SPI) that

are implemented in the SystemC model of the sensor. The interested reader is referred to the original

data sheet [Cyp09] to get more detailed information. A picture of the image sensor is shown in Fig.

4.1.

Figure 4.1.: LUPA-3000 image sensor, copyright by Cypress Semiconductor Corp


4.1. Sensor Architecture

The sensor has a resolution of 1696 x 1710 pixels (columns x rows) and each row is divided into 53

kernels with a width of 32 pixels each. Pixel position (0,0) is located in the lower left corner. All 32

pixel cell output values of one kernel are transferred to a column amplifier. The column amplifier amplifies

the signal level and transfers the values to the even or odd kernel bus (32 bit width) depending on

the kernel number. 64 analog-to-digital converters (ADCs) read the kernel buses and generate digital 8

bit values for each pixel. Two ADCs at a time alternately provide the input data for one LVDS driver.

This column multiplex scheme is shown in Fig. 4.2. Consequently, LVDS driver 0 always transmits the

first pixel of a kernel, LVDS driver 1 the second pixel and so on. Due to this multiplex scheme, the

pixel data of one kernel is always transmitted in parallel through 32 LVDS channels.

Figure 4.2.: Column multiplex scheme of the sensor architecture (LVDS driver 0 is fed by ADC 0 and ADC 1, LVDS driver 1 by ADC 2 and ADC 3, ..., LVDS driver 31 by ADC 62 and ADC 63; each kernel covers 32 pixels)
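The multiplex scheme implies a simple mapping from a pixel column to its kernel and LVDS channel. The following sketch illustrates this mapping. The pairing of LVDS driver k with ADC 2k and ADC 2k+1 follows Fig. 4.2, but which of the two ADCs serves the even and which the odd kernels is an assumption made here for illustration, and the function name locate_column is hypothetical.

```python
# Sketch: map a pixel column to kernel, LVDS channel and ADC.
PIXELS_PER_KERNEL = 32
KERNELS_PER_ROW = 53   # 53 * 32 = 1696 columns


def locate_column(col):
    """Return (kernel, lvds_channel, adc) for a column index 0 ... 1695."""
    if not 0 <= col < PIXELS_PER_KERNEL * KERNELS_PER_ROW:
        raise ValueError("column index out of range")
    # The position inside the kernel selects the LVDS channel.
    kernel, lvds_channel = divmod(col, PIXELS_PER_KERNEL)
    # Assumption: the first ADC of a pair serves the even kernels,
    # the second one the odd kernels.
    adc = 2 * lvds_channel + (kernel % 2)
    return kernel, lvds_channel, adc


print(locate_column(33))
```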

The image sensor has a differential clock input, which is known as master clock and specified for a speed

of 206 MHz. The differential clock output which is synchronous to the LVDS channels operates at the

same frequency. The LVDS data and sync channels operate at double data rate (DDR). Internally,

the input clock is divided by four and is called sensor clock (51.5 MHz).
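These clock relations determine the data rate of the sensor. As a rough sketch, derived only from the figures given above (206 MHz master clock, DDR, 8 bit per pixel, 32 data channels), each channel delivers exactly one pixel value per sensor clock cycle:

```python
# Sketch: data rates that follow from the clock scheme of the sensor.
MASTER_CLK_HZ = 206_000_000
DATA_CHANNELS = 32
BITS_PER_PIXEL = 8

sensor_clk_hz = MASTER_CLK_HZ // 4                            # internal sensor clock
bits_per_second_per_channel = 2 * MASTER_CLK_HZ               # DDR: two bits per clock cycle
pixels_per_second_per_channel = bits_per_second_per_channel // BITS_PER_PIXEL
pixels_per_second = pixels_per_second_per_channel * DATA_CHANNELS

# One pixel per channel and sensor clock cycle, i.e. one kernel per cycle.
print(sensor_clk_hz, pixels_per_second_per_channel, pixels_per_second)
```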


4.1.1. Pixel architecture and timing

A pixel consists of a photo diode, 6 transistors and a capacitor. This kind of pixel is known as 6-T pixel

and has a global synchronous shutter feature. This feature allows a simultaneous reset and exposure

of all pixels. The 6-T pixel schematic is shown in Fig. 4.3.

Figure 4.3.: Pixel schematic of a 6-T pixel cell

The signals connected to the pixels are controlled through timers. These timing parameters can

be changed using the configuration interface. At the end of the exposure cycle, each pixel value is

transferred immediately to the Vmem capacitor to wait for its readout. The pixel values are then read out row by row from the storage capacitors. This use of intermediate storage in the pixel reduces the gradual overexposure that can occur down the image when the exposure is not simultaneous and the rows are read out directly from the active area.

The exposure time is controlled by the exposure1 pin in normal operation mode. In dual slope mode

two exposure pins, exposure1 and exposure2 are used. Dual slope performs a normal exposure first

and then resets all pixels that have reached the maximum value and performs a second exposure for the

pixels that have been reset. This dual slope mode can increase the dynamic range of an image, but

it is only applicable under constant lighting conditions. Further details about the multi slope exposure

method are available in [Gmb10].

When the exposure cycle is finished, the frame overhead time (FOT) starts. It is the time needed until the pixel data is stable and ready to be read out. When the FOT starts, Vmem is brought low and precharge and sample are set to high. The precharge pulse deletes old information from the storage node to avoid image lag. When precharge is low again, the sampling is completed during the remaining duration of the sample pulse. The rising edge of Vmem triggers the pixel reset signal. The general sequence of the pixel control signals is shown in the timing diagram in Fig. 4.4.

Further details about the pixel timing and all control signals are available in the data sheet [Cyp09, page 4], but all important details were discussed here. The timer values of the Vmem, precharge, sample and FOT timers can be programmed by the user through the Serial Peripheral Interface (SPI), which is discussed in the next section.


Figure 4.4.: Pixel timing during FOT (signals vmem, precharge, sample, pixel reset and data; the durations are set by VMEM_TIMER, PRECHARGE_TIMER, SAMPLE_TIMER and FOT_TIMER; the data changes from invalid to valid)

4.2. Serial Peripheral Interface

The Serial Peripheral Interface (SPI) is used to program the behavior of the LUPA-3000 or to read out the current settings. This interface is also implemented in the software model of the image sensor, and the hardware design connecting to it has to implement the counterpart. The LUPA-3000 has 128 SPI registers with a size of 8 bit each to store many different settings like timing parameters, image size and region (smaller resolution).

Some address ranges are not in use, or at least not documented in the data sheet. The SPI is a simple bus that uses four signals: clock, chip-select, MOSI (Master out Slave in) and MISO (Master in Slave out). In the SPI interconnection the LUPA-3000 image sensor is used as the slave device. The four signals are used to transmit the address and data bytes serially between master and slave. The maximum clock frequency supported by the sensor is 10 MHz. All read and write operations are executed serially, MSB first, and they are always initiated by the bus master. During operation the chip-select (CS) is brought low. First an 8 bit command consisting of a read/write bit (C) and a 7 bit address (a<6> - a<0>) is sent through the MOSI wire. The read timing is shown in Fig. 4.5.

Figure 4.5.: SPI read timing (signals CS, spi_clk, MOSI and MISO; MOSI carries C, a<6> ... a<0> followed by don't care bits, MISO returns d<7> ... d<0>)

For read commands the read bit (C) is set to zero. When the address has been transmitted completely, the LUPA-3000 immediately responds with the register content on the MISO signal, starting again with the MSB.

The timing of an SPI write operation is shown in Fig. 4.6. When the write bit (C) is set to one, the LUPA-3000 expects 7 address bits (a<6> - a<0>) and a byte value (d<7> - d<0>) on the MOSI channel. There is no response on the MISO channel.

Figure 4.6.: SPI write timing (signals CS, spi_clk, MOSI and MISO; MOSI carries C, a<6> ... a<0> followed by d<7> ... d<0>)
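The bit sequences on the MOSI wire for both operations can be sketched as follows. This only illustrates the framing described above (command bit C first, then the 7 bit address, MSB first, for writes followed by the data byte). The function names are chosen for illustration and do not appear in the VHDL design or the data sheet.

```python
# Sketch: MOSI bit sequences for SPI read and write accesses.
def spi_command(address, write):
    """8 bit command: C (1 = write, 0 = read) followed by a<6> ... a<0>."""
    if not 0 <= address < 128:
        raise ValueError("address must fit into 7 bits")
    return (0x80 if write else 0x00) | address


def to_bits(value, width):
    """List of bits, MSB first."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]


def spi_write_bits(address, data):
    """16 bits on MOSI: command byte followed by d<7> ... d<0>."""
    if not 0 <= data < 256:
        raise ValueError("data must fit into 8 bits")
    return to_bits(spi_command(address, True), 8) + to_bits(data, 8)


def spi_read_bits(address):
    """8 command bits on MOSI; the register content follows on MISO."""
    return to_bits(spi_command(address, False), 8)


print(spi_write_bits(5, 0xA1))
print(spi_read_bits(12))
```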

4.2.1. SPI registers

The content of the SPI registers is used to control the LUPA-3000 in a very comprehensive way. A list

of a selection of the most important registers is available in Table 4.1. The given registers influence

the behavior of the image sensor’s software model which will be introduced in Chapter 4.4, the other

registers are not important with respect to the basic functionality of the LUPA-3000.

The readout modes can be changed between normal operation, test image readout and training mode.

The internal pixel timing durations of vmem, precharge, sample, FOT and row overhead time (ROT)

can be specified. The dual slope exposure can be (de)activated. Besides it is possible to reduce the

image size, known as region of interest (ROI) to increase the frame rate. The appropriate ROI can

be controlled through the y start, y end, x start and number of kernels attributes. During changes of

the SPI registers, the sensor should be kept in sequencer reset (SPI address 0, bit 1) which interrupts

light integration and image readout.

The most important SPI registers and their functionality are discussed here. A complete list of all SPI

registers and more detailed description are available in the data sheet [Cyp09].


Address   Bits   Name                  Description
0         <0>    Power down            Power down analog core
          <1>    Reset n seq           Reset n of on chip sequencer
          <2>    Red rot               Enable reduced ROT mode
          <3>    Ds en                 Enable dual slope operation
1         <4:0>  ROT TIMER             Length of ROT: n + 2 sensor clocks
2         <7:0>  PRECHARGE TIMER       Length of pixel precharge: 4 · n sensor clocks
3         <7:0>  SAMPLE TIMER          Length of pixel sample: 4 · n sensor clocks
4         <7:0>  VMEM TIMER            Length of pixel vmem: 4 · n sensor clocks
5         <7:0>  FOT TIMER             Length of FOT: (4 · n) + 2 sensor clocks
6         <5:0>  NB OF KERNELS         Number of kernels to readout, minimum 4
7         <7:0>  Y START <7:0>         Start pointer Y readout
8         <2:0>  Y START <10:8>
9         <7:0>  Y END <7:0>           End pointer Y readout
10        <2:0>  Y END <10:8>
11        <4:0>  X START               Start pointer X
12        <0>    Training en           0: Transmit test patterns
                                       1: Transmit training patterns
          <1>    Bypass en             0: Ignore TRAINING EN bit, image readout
                                       1: Evaluate TRAINING EN bit
30        <7:0>  FIXED                 Fixed, read only register
31        <7:0>  CHIP REV NB           Chip revision number
32        <7:0>  SOF                   Start Of Frame keyword
33        <7:0>  SOL                   Start Of Line keyword
34        <7:0>  EOL                   End Of Line keyword
35        <7:0>  IDLE A                Idle A keyword, used as training pattern
36        <7:0>  IDLE B                Idle B keyword, used as training pattern
71        <0>    crc en                Enable crc for data channel
          <1>    crc sync en           Enable crc for sync channel
96 - 127  <7:0>  Test patterns 0 - 31  Test patterns for each channel

Table 4.1.: Selection of important SPI addresses
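As an illustration of how a region of interest is spread over these registers, the following sketch packs the ROI attributes into the register addresses 6 to 11 of Table 4.1. The function name roi_registers and the range checks are assumptions made for illustration; the unit of the X START value is not stated in the table and is therefore only checked against its 5 bit width.

```python
# Sketch: pack a region of interest into the SPI registers 6 ... 11.
def roi_registers(y_start, y_end, x_start, nb_kernels):
    """Return a dict {SPI address: 8 bit register value} for the ROI."""
    if not 4 <= nb_kernels <= 53:
        raise ValueError("number of kernels must be between 4 and 53")
    if not (0 <= y_start < 2**11 and 0 <= y_end < 2**11):
        raise ValueError("Y pointers are 11 bit values")
    if not 0 <= x_start < 2**5:
        raise ValueError("X START is a 5 bit value")
    return {
        6: nb_kernels,              # NB OF KERNELS <5:0>
        7: y_start & 0xFF,          # Y START <7:0>
        8: (y_start >> 8) & 0x07,   # Y START <10:8>
        9: y_end & 0xFF,            # Y END <7:0>
        10: (y_end >> 8) & 0x07,    # Y END <10:8>
        11: x_start,                # X START <4:0>
    }


print(roi_registers(0, 1709, 0, 53))
```

Each entry of the returned dictionary corresponds to one SPI write operation. As stated above, the sensor should be kept in sequencer reset while the registers are changed.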


4.3. Readout

The image sensor operates in pipelined mode, which enables light integration for the next frame and

image readout of the current frame in parallel. This process is visualized in Fig. 4.7.

Figure 4.7.: Pipelined operations, integration and readout are done in parallel (the integration of frame x+1 overlaps with the readout of frame x; a frame readout consists of the FOT and the lines L0 ... L1709; a line readout consists of the ROT and the kernels K1 ... K53)

One frame readout is divided into frame overhead time (FOT) and the specified number of lines.

Furthermore each line is divided into row overhead time (ROT) and the specified number of kernels.

Fig. 4.7 illustrates this process for the maximum resolution supported by the LUPA-3000. The data of each kernel is transferred in parallel via the 32 LVDS data channels.
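From this structure the readout time of a frame can be estimated. The following is only a rough sketch: it assumes that one kernel is transferred per sensor clock cycle (which follows from the DDR data rate at 206 MHz and 8 bit pixels), uses the timer formulas of Table 4.1, and the timer register values in the example call are hypothetical.

```python
# Sketch: estimate of the frame readout time in sensor clock cycles.
SENSOR_CLK_HZ = 51_500_000


def frame_readout_clocks(fot_timer, rot_timer, lines=1710, kernels=53):
    """Sensor clock cycles for FOT plus all lines (ROT + kernels each)."""
    fot = 4 * fot_timer + 2      # FOT TIMER: (4 * n) + 2 sensor clocks
    rot = rot_timer + 2          # ROT TIMER: n + 2 sensor clocks
    # Assumption: one kernel per sensor clock cycle.
    return fot + lines * (rot + kernels)


def frame_rate(fot_timer, rot_timer, lines=1710, kernels=53):
    """Upper bound of the frame rate in frames per second."""
    return SENSOR_CLK_HZ / frame_readout_clocks(fot_timer, rot_timer, lines, kernels)


# Hypothetical timer settings at full resolution.
print(frame_readout_clocks(10, 7), frame_rate(10, 7))
```

The sketch also shows why a smaller region of interest increases the frame rate: both the number of lines and the number of kernels per line enter the cycle count directly.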

The light integration is controlled through the exposure1 (and the exposure2 ) signal. A falling edge

of the exposure1 signal immediately starts the FOT. This activity is visualized in timing diagram 4.8.

Figure 4.8.: Exposure and readout timing (signals exposure1, pixel Vmem, pixel reset, pixel sample and DATA; DATA shows the FOT followed by the lines L1, L2, L3 ... Lx; annotations: integration time, sample timer, wait till ROT)

When the FOT is finished, the pixel reset is set to high and the readout starts. The pixel reset should be high for at least 3 µs. After this period the next exposure cycle can start, which is indicated by a rising edge of the exposure1 signal. When the rising edge occurs during readout, it is internally delayed until the next ROT, otherwise the falling edge of the pixel reset would introduce disturbance to the image. The given timing diagram visualizes the situation where the exposure is active for longer than the time needed for the readout. When the falling edge of the exposure signal occurs during readout, the readout is interrupted, but the current line is finished first, because the falling edge is internally delayed until the current line readout is completed. A timing diagram visualizing this situation and one clarifying the dual slope integration are available in [Cyp09, page 31].

With the overall frame timing in mind, a closer look can be taken at the sync and

data channels. The sync channel is used to tell the receiver which data is currently transmitted on

the data channels. An exemplary data stream is shown in Fig. 4.9, the corresponding abbreviations

are explained in Table 4.2. The sync value constants can be programmed via the SPI, the addresses

and default values are also mentioned in the table.

Sync | Ia | Ib | Ix | SOF | EOL | Ix | Ix | SOL | a<15:8> | a<7:0>   | Ix       | EOL   | Ix  | Ix | SOL | a<15:8> | a<7:0>
Data | Ia | Ib | Ix | Ix  | Ix  | Ix | Ix | Ix  | col i   | col i+32 | col i+64 | col x | CRC | Ix | Ix  | col i   | col i+32

(The number of idle words between EOL and SOL depends on the ROT.)

Figure 4.9.: Sync channel and data channel values during image readout

Keyword  Description                                  SPI address  Value
SOF      Start of frame                               32           32
SOL      Start of line                                33           34
EOL      End of line                                  34           35
Ia       Idle word A                                  35           235
Ib       Idle word B                                  36           235
Ix       Idle word A or B                             -            -
a<15:8>  Address of line being readout (upper 8 bits) -            -
a<7:0>   Address of line being readout (lower 8 bits) -            -
CRC      CRC checksum of the previous picture row     -            -

Table 4.2.: Sync channel values

When the image sensor is idle, the sync and data channels alternately send the idle A (Ia) and idle B (Ib) values. Both values are set to 235 by default. These idle values are used as training pattern, too. During FOT the idle patterns are transmitted on all channels; the last byte during FOT on the sync channel is the start of frame (SOF) keyword. The first ROT starts with a misplaced end of line (EOL), which can be ignored. The next values on the sync channel are idle values, whose number depends on the ROT length. The last value sent during ROT is the start of line (SOL) keyword, as shown in Fig. 4.9. Now the data transmission of the first line starts. All 32 data channels transmit the pixel values of one kernel in parallel. This is repeated until all kernels of one line are transmitted, i.e. at least 4 times, because the minimum image width is 4 kernels (128 pixels). Meanwhile the sync channel transmits the line number, split into two bytes, and continues with sending idle words until the second last kernel is transmitted. The last kernel of a line is indicated with an end of line (EOL) on the sync channel. If the cyclic redundancy check (CRC) transmission is enabled, the checksum is sent instead of the next idle byte. Now the next ROT starts and the sequence continues as described above until all lines of the image are transmitted.
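The sync channel sequence of a single line can be sketched as follows. The keyword values are the defaults from Table 4.2; the function names are hypothetical, and the misplaced EOL of the first ROT as well as the optional sync CRC are deliberately left out to keep the sketch short.

```python
# Sketch: sync channel words of one line and a matching parser.
SOL, EOL, IDLE = 34, 35, 235   # default keyword values from Table 4.2


def sync_words_for_line(line_no, n_kernels, rot_idles):
    """Sync words of one ROT followed by the readout of one line."""
    if n_kernels < 4:
        raise ValueError("the minimum image width is 4 kernels")
    words = [IDLE] * rot_idles + [SOL]                 # ROT ends with SOL
    words += [(line_no >> 8) & 0xFF, line_no & 0xFF]   # a<15:8>, a<7:0>
    words += [IDLE] * (n_kernels - 3)                  # up to the second last kernel
    words += [EOL]                                     # marks the last kernel
    return words


def parse_line(words):
    """Return (line_no, n_kernels) recovered from the sync words of one line."""
    start = words.index(SOL)
    line_no = (words[start + 1] << 8) | words[start + 2]
    # The two address bytes are skipped, since they may equal a keyword value.
    end = words.index(EOL, start + 3)
    return line_no, end - start


print(parse_line(sync_words_for_line(1709, 53, 9)))
```

The parser works by position rather than by value after the SOL keyword, because the line address bytes are ordinary data and can coincide with one of the keyword values.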


For calibration of the receiver the training mode can be activated in SPI register 12.

As long as the training mode is activated all channels transmit idle patterns. In the test image

mode, which can be activated alternatively in the same register, the sync channel works like in normal

operation but the data channels transmit the test patterns stored in SPI registers 96 - 127, instead of

image data from the ADCs.

The sync and data channels are synchronous to the output clock and operate at the master clock

speed in DDR mode. There is a delay between input and output clock of approximately 2.5 ns.

4.3.1. Cyclic redundancy check

A cyclic redundancy check (CRC) is a hash function that can be calculated with little computational

effort, but provides a method to detect errors that occurred during transmission. It is calculated

before and after transmission; if both checksums are equal, it is very likely that no error occurred during

transmission. The LUPA-3000 calculates a CRC checksum for each line of the picture and each data

channel. The CRC calculation for the data channels is enabled by default, a CRC insertion for the

sync channel is also possible but not enabled by default. The position of the sync CRC is probably at

the same position as in the data channel and replaces an idle byte, but this is not explicitly mentioned

in the data sheet.

The general form of a CRC polynomial in modulo 2 arithmetic is

$$G(x) = c_r x^r + \dots + c_2 x^2 + c_1 x^1 + c_0 x^0 \mod 2 \qquad (4.1)$$

where r is the degree of the generator polynomial, which defines the length of the checksum in bits, and $c_i$ indicates the presence or absence of a coefficient. The generator polynomial implemented in the LUPA-3000 is given by the following equation:

$$G(x) = x^8 + x^6 + x^3 + x^2 + 1 \mod 2 \qquad (4.2)$$

This equation can be implemented by the circuit shown in Fig. 4.10 to calculate the CRC of a serial

data stream. The given circuit is different from the one shown in the data sheet, but generates the

same CRC in a more comprehensible way.

Figure 4.10.: Circuit for CRC generation with the polynomial implemented in LUPA-3000 (shift register stages x7 ... x0 with XOR feedback taps for the coefficients c8, c6, c3, c2 and c0; input IN, output OUT)

In the beginning of a calculation all registers are initialized with 1s to improve the bit error detection

capabilities. The ⊕ operator indicates an XOR operation, i.e. an addition in modulo 2 arithmetic.

As long as new data is available at the input, the switch is in the lower position. When all bits that

should be included into the checksum are inside the circuit the switch is set to the upper feedback

position to write the 8 checksum bits to the output.


4.4. Software model

In order to verify the hardware design, that should be connected to the LUPA-3000, it is necessary to

provide a model of the sensor. The model was developed in SystemC, which is a C++ class library.

This library uses general C++ syntax, but provides an easy method to describe concurrent hardware,

similar to VHDL, but in an object oriented way. Besides, SystemC is an open source IEEE standard.

The latest version is available at http://www.systemc.org. Furthermore, ModelSim is able to co-

simulate designs which include SystemC and VHDL modules. For these reasons SystemC was chosen.

The compilation and co-simulation in ModelSim of the LUPA-3000 model is explained in Appendix

C, further details about the subject matter are discussed in [Men04].

The LUPA-3000 is modeled as a single SystemC module containing several methods that run as SC_THREADs and SC_METHODs in order to behave like the original sensor. SC_THREADs normally run continuously and block to wait for an event, e.g. a rising edge of a signal. In contrast, SC_METHODs do not run continuously; they are called each time a specified event happens. Block diagram 4.11 roughly describes the internal dependencies of the module. All I/O ports are of data type bool, except the 32 data channels, which are modeled as sc_uint, which is equivalent to a vector of bool. For the interconnection with VHDL these ports can be mapped to std_logic or std_logic_vector, respectively.

Boxes with rotating circles represent SC_THREADs that run in endless loops and block from time

to time to wait for an event to happen. The internal clk thread generates the internal clock signal at

master clock speed for the whole sensor and interacts with the sensor clk thread that generates the

internal sensor clock signal with a four times larger period than the master clock.

The SPI is controlled by the SPI thread, which interacts with the interface ports and the SPI register
memory block, which is simply modeled as an array of sc_uint<8> (8-bit unsigned integers).

Incoming exposure pulses are handled by the frame timing thread. It models the vmem, precharge,

sample and pixel reset signals and handles the timing of the FOT timer, according to the current SPI

register settings. The frame timing thread sets the internal and/or external signals and notifies events
that trigger event handler methods when the expected duration of the signal is over. These event
handler methods run as SC_METHODs that are sensitive to a specific event. The helper functions
are omitted in Fig. 4.11 to keep it as simple as possible.

During FOT the fot output signal is high, but one sensor clock cycle less than the actual FOT duration,

refer to Table 4.2. Similarly, the rot output is high during ROT, but one sensor clock cycle less than

the actual ROT duration. Timing diagrams illustrating this are available in [Cyp09, page 34].

With the overall timing structure generated by the frame timing thread and the current SPI register
settings, the data provider is able to determine which information has to be transferred to the LVDS
driver. The data provider distinguishes between four possible states: FOT, ROT, write picture data
and idle. By default, the data provider is in the idle state and idle patterns are transmitted on the
channels. As soon as the frame timing thread detects the end of an exposure pulse, it activates the
internal FOT signal, which notifies the data provider to change its state to FOT. When the duration
of FOT is over, an SC_METHOD deactivates the internal and external FOT signals and starts the first
ROT. Now the data provider switches to the ROT state and transmits the corresponding values. As soon
as ROT is over, which is detected by another SC_METHOD, the next state is always the write picture
data state. In this state, the picture data for one line is transmitted. The x start and NB_OF_KERNELS
parameters that define one line are read from the SPI registers. At the end of each line it is checked
whether y end has already been reached or another line follows. Depending on this, a new ROT may be
started by the data provider. When the last line of the image has been transmitted, the data provider
switches back to the idle state and waits for the next notification from the frame timing thread.

[Figure 4.11 shows the blocks SPI, SPI registers, Internalclk, Sensorclk, Frame timing, Dataprovider,
GetPixel (picture.mat) and LVDS driver with LVDS sync and LVDS buffers 0 to 31. Ports: spi_clk,
spi_cs, spi_mosi, spi_miso, clk_in_p/clk_in_n, reset_n, exposure1, exposure2, fot, rot, sync_p/sync_n,
lvds_p_0/lvds_n_0 to lvds_p_31/lvds_n_31, clk_out_p/clk_out_n.]
Figure 4.11.: Structural diagram of the LUPA-3000 SystemC model
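The state sequence of the data provider described above can be summarized as a small next-state function. This is only a sketch in plain C++ without SystemC; the names ProviderState, ProviderInputs and nextProviderState are hypothetical and do not appear in the model itself.

```cpp
#include <cassert>

// The four states of the data provider named in the text.
enum class ProviderState { Idle, Fot, Rot, WritePictureData };

// Hypothetical condition flags; in the model these correspond to
// notifications from the frame timing thread and its SC_METHODs.
struct ProviderInputs {
    bool fotStarted;    // end of an exposure pulse detected
    bool fotFinished;   // FOT duration is over
    bool rotFinished;   // ROT duration is over
    bool lineFinished;  // all picture data of the current line was sent
    bool lastLine;      // y end has been reached
};

ProviderState nextProviderState(ProviderState s, const ProviderInputs& in) {
    switch (s) {
        case ProviderState::Idle:
            return in.fotStarted ? ProviderState::Fot : ProviderState::Idle;
        case ProviderState::Fot:
            return in.fotFinished ? ProviderState::Rot : ProviderState::Fot;
        case ProviderState::Rot:
            // After ROT the next state is always write picture data.
            return in.rotFinished ? ProviderState::WritePictureData : ProviderState::Rot;
        case ProviderState::WritePictureData:
            if (!in.lineFinished) return ProviderState::WritePictureData;
            // Another line follows: start a new ROT; otherwise go idle.
            return in.lastLine ? ProviderState::Idle : ProviderState::Rot;
    }
    return ProviderState::Idle;
}
```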

When a normal image readout happens, the getPixel method is used to access an 8-bit
gray scale image which is stored in a Matlab workspace file (*.mat). Each Matlab installation contains
C++ libraries that provide methods to access *.mat files. The data provider writes the current
output values for the LVDS driver to internal FIFOs in order to avoid sampling complexities in the
LVDS driver component. The LVDS driver thread reads all FIFOs and outputs the values bit by bit
at master clock speed. When the test image readout is activated, the image data is replaced by the
test pattern programmed via the SPI. The serial data stream generated by the LVDS driver starts
with the MSB, in contrast to the Xilinx ISERDES components, which always expect the LSB first.
This problem is handled by the LVDS receiver, which is explained later on in detail.


Each time an exposure cycle is finished, the model prints the settings for the current image readout

on the console. Such a status report is shown below:

************************************************

single slope

exposure duration: 79638400 ps 4100 sensor clock cycles

FOT: 40

ROT: 7

NB_OF_KERNELS: 21

y start: 1010

y end: 1260

x start: 0

lines: 251

pixels per line: 672

************************************************

All parameters that directly influence the exposure and image readout are presented. These are

exposure mode and duration, the FOT and ROT timing values and all parameters belonging to the

region of interest (ROI).

The exposure duration of a single frame is used to adjust the brightness of the image that is currently
read out. This process is somewhat arbitrary, because the physical behavior of the image sensor for
different exposure times is unknown. Therefore, a reasonable adaptation of the image brightness for an
interval of 3 to 2062 µs was chosen. Since all pixel values of the image are in a range from 0
(black) to 255 (white), the adaptation in the given interval ranges from +150 for low exposure to -150
for high exposure durations. For exposure durations beyond the interval the maximum adaptation of
±150 is chosen. When the sum of a pixel value and the brightness correction value is outside the range
of 0 to 255, the value is set to the maximum or minimum value, respectively. This clipping reduces the
dynamic range of the image.
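The brightness correction can be illustrated with a short sketch. The text only gives the end points of the interval and the correction values at these end points, with the sign convention as stated above; the linear interpolation between them is an assumption made here for illustration, and the function names brightnessOffset and adaptPixel are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Correction value for a given exposure duration in microseconds.
// End points (3 us -> +150, 2062 us -> -150) are taken from the text.
// ASSUMPTION: the values in between are interpolated linearly.
int brightnessOffset(double exposureUs) {
    const double lo = 3.0;
    const double hi = 2062.0;
    const double maxAdj = 150.0;
    double t = (exposureUs - lo) / (hi - lo);
    t = std::min(1.0, std::max(0.0, t));  // saturate outside the interval
    return static_cast<int>(std::lround(maxAdj - 2.0 * maxAdj * t));
}

// Apply the correction to one 8-bit pixel and clip to the range 0..255.
std::uint8_t adaptPixel(std::uint8_t pixel, double exposureUs) {
    const int v = static_cast<int>(pixel) + brightnessOffset(exposureUs);
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}
```

The clipping in adaptPixel is what reduces the dynamic range: every pixel brighter than 105 saturates at 255 when the full correction of +150 is applied.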

Furthermore, a Matlab script cut_img_gen.m was written, which generates a Matlab image variable from
an existing picture. The script reads an arbitrary image, e.g. a *.jpg file, performs an RGB-to-gray
conversion and crops it to a resolution of 1710 x 1696. The position of the cropped region can be

chosen inside the script.


5. Data and control interface for the LUPA-3000 image sensor

The requirements for the design that controls the image sensor were specified in such a way that
the design provides a simple method to control the exposure duration and to take a specified
number of pictures per second.

The SPI is used to program the exposure duration and the frames per second parameter into the

controlling design. Some unused successive SPI addresses are used for these new parameters. With

these parameters, the knowledge of the constant sensor clock cycles per second and the other SPI

registers, the duration of one frame can be calculated and a specified number of frames per second can

be taken.

Block diagram 5.1 describes the complete structure of the controller design. The design is basically

divided into three units: the SPI wrapper, the exposure control and the LVDS receiver. The SPI

wrapper controls the programming interface of the image sensor and keeps the copies of the SPI

registers in the SPI memory up to date. The exposure control reads the SPI memory in order to get

the current exposure settings specified by the user. The synchronizer block is used to arbitrate the

memory access in order to avoid conflicts. The LVDS receiver is only coupled with other components

of the design through the training done signal, because it only provides the data from the LVDS

interface.

These units work at different clock speeds. The colored boxes in the background of the diagram assign
the components to clock domains. Signals and blocks that are part of more than one clock domain
have to be treated in a special way to avoid timing problems. This is discussed in Chapter 5.4.


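[Figure 5.1 shows the MOSI addr, MOSI data and MISO FIFOs, the SPI signals CS, MOSI and MISO, the SPI
memory ports addr, din and dout, the Training_Done, exposure1, exposure2 and operate signals, the data
channels 0 to 31 and the sync channel.]
Figure 5.1.: Structural description of the VHDL design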

5.1. Serial peripheral interface

The SPI wrapper component communicates with the LUPA-3000 by means of the standard SPI signals
and works at a clock speed of 10 MHz. It receives commands from outside
through the MOSI addr FIFO, which has a width of 8 bit. For write commands it expects to find a
data token in the MOSI data FIFO (8 bit width). All responses to read commands are written to
the MISO FIFO (8 bit width). Read and write commands are forwarded to the image sensor, except for
those in the address range from 18 to 27, because these addresses are used to save the exposure settings. For

write commands where the SPI address is smaller than 32, the corresponding data value is additionally

written to the SPI memory.
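The address handling described in this paragraph can be expressed as a small decision function. The sketch below only models the two rules stated above; the names SpiRoute and routeSpiCommand are hypothetical and not part of the VHDL design.

```cpp
#include <cassert>
#include <cstdint>

// Result of the address decoding for one SPI command.
struct SpiRoute {
    bool forwardToSensor;  // command is serialized and sent to the LUPA-3000
    bool writeToMemory;    // data value is additionally stored in the SPI memory
};

SpiRoute routeSpiCommand(std::uint8_t address, bool isWrite) {
    // Addresses 18 to 27 hold the exposure settings and stay inside the FPGA.
    const bool isExtension = (address >= 18 && address <= 27);
    SpiRoute r;
    r.forwardToSensor = !isExtension;
    // Write commands below address 32 keep the local register copy up to date.
    r.writeToMemory = isWrite && (address < 32);
    return r;
}
```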

The SPI memory in the design has a size of 32 bytes. It stores a copy of the SPI registers 0 - 17 and
uses 10 bytes for the exposure1, exposure2offset and frames per second parameters. These extra bytes
are located at the memory addresses 18 - 27; the exact mapping is described in Table 5.1.

The memory addresses 0-17 are initialized with the default values for the SPI registers, defined in the

data sheet [Cyp09].


Address  Bits   Name                        Description
18       <7:0>  Frames per second <7:0>     Number of frames per second
19       <7:0>  Frames per second <15:8>
20       <7:0>  exposure 1 <7:0>            Duration of exposure1 pulse in sensor clocks
21       <7:0>  exposure 1 <15:8>
22       <7:0>  exposure 1 <23:16>
23       <7:0>  exposure 1 <31:24>
24       <7:0>  exposure 2 offset <7:0>     Offset length between start of exposure1
25       <7:0>  exposure 2 offset <15:8>    and exposure2 in sensor clocks
26       <7:0>  exposure 2 offset <23:16>
27       <7:0>  exposure 2 offset <31:24>
Table 5.1.: SPI register address space extension
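The byte layout of Table 5.1 places the least significant byte at the lowest address. As an illustration, the following sketch assembles the three parameters from a 32 byte memory image. The struct and function names are hypothetical; only the address mapping is taken from the table.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct ExposureSettings {
    std::uint16_t framesPerSecond;  // addresses 18-19
    std::uint32_t exposure1;        // addresses 20-23, in sensor clock cycles
    std::uint32_t exposure2Offset;  // addresses 24-27, in sensor clock cycles
};

// Combine `count` bytes starting at `base`, lowest address = bits <7:0>.
static std::uint32_t readLittleEndian(const std::array<std::uint8_t, 32>& mem,
                                      std::size_t base, std::size_t count) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        value |= static_cast<std::uint32_t>(mem[base + i]) << (8 * i);
    }
    return value;
}

ExposureSettings decodeExposureSettings(const std::array<std::uint8_t, 32>& mem) {
    ExposureSettings s;
    s.framesPerSecond = static_cast<std::uint16_t>(readLittleEndian(mem, 18, 2));
    s.exposure1 = readLittleEndian(mem, 20, 4);
    s.exposure2Offset = readLittleEndian(mem, 24, 4);
    return s;
}
```

For example, an exposure1 duration of 4100 sensor clock cycles (0x00001004) is stored as 0x04 at address 20 and 0x10 at address 21.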

Since the SPI memory is used by both the SPI wrapper and the exposure control block, it is important to
have an arbitration; otherwise the exposure control could read a memory location
that is currently being written by the SPI wrapper. This would result in an unpredictable situation. For
this reason, a token is passed between both components to ensure exclusive memory
access. The token is a single signal which is forwarded by the synchronizer component across the
clock domain border. Only the component owning the token is allowed to access the memory;
meanwhile, the other component is locked.

During system start up the training mode of the image sensor has to be activated. This is achieved

by a state machine implemented in the SPI wrapper. A state diagram is shown in Fig. 5.2. During

the RESET state some internal signals are set before the sensor is programmed in the INIT state. The
following actions are performed in this state: resetting the sequencer, programming idle patterns a and b,
enabling training and enabling the sequencer. The idle patterns that are used for the training are set
to the bit-symmetric string "00100100", because the LUPA-3000 sends the data with the MSB first
and the LVDS receiver expects the LSB first. Since this pattern reads the same in both bit orders, the
problem is solved for the training phase.

Now the state machine remains in the WAIT state until the LVDS receiver sets the TRAINING DONE

signal. This happens when the calibration of all channels was performed successfully. After that, the

state machine switches to the START state to program the sensor for normal operation. Moreover, the
idle pattern is changed to the value "11011011" to signal to the LVDS receiver that the training phase is
over and that the received data is valid and should be written into the output FIFOs.
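Both idle patterns have the property that they are unchanged when the bit order is reversed, which is why the MSB/LSB mismatch does not matter for them. The following sketch shows the reordering for a single byte; the function name reverseBits is chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the bit order of one byte (bit 0 becomes bit 7 and so on).
std::uint8_t reverseBits(std::uint8_t b) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        // Shift the result left and append bit i of the input.
        r = static_cast<std::uint8_t>((r << 1) | ((b >> i) & 1u));
    }
    return r;
}
```

The patterns "00100100" (0x24) and "11011011" (0xDB) map onto themselves, whereas ordinary pixel data such as 0x01 becomes 0x80 and therefore has to be reordered.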

Now the SPI wrapper enters the IDLE state, which means that the state machine is waiting for the

next SPI read or write operation. As soon as data tokens are available in the MOSI addr FIFO,

the state machine switches to the SPIR state for a read operation or to the SPIW state for a write

operation. Depending on the address, the command is serialized and transmitted to the LUPA-3000,

and/or the SPI memory is accessed to handle the request. The SPI memory is always involved when

addresses smaller than 30 are accessed in an operation. After each command the state machine returns
to the IDLE state. If there are no more commands available and a write command was invoked
previously, the SYNC state is entered. In the SYNC state the activation token is passed through the
synchronizer to the exposure control component. This causes the state machine to switch immediately
to the LOCK state, where it remains until the activation token is passed back.

[Figure 5.2 shows the states RESET, INIT, WAIT, START, IDLE, SPIR, SPIW, SYNC and LOCK.]
Figure 5.2.: Finite state machine of the SPI wrapper module
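The transitions of the SPI wrapper state machine can be collected in one next-state function. This is a behavioral sketch in C++, not the VHDL implementation. The input flags are hypothetical names, and two transitions are assumptions because the text does not state them explicitly: INIT is left as soon as the programming is done (modeled here as unconditional), and LOCK returns to IDLE when the token comes back.

```cpp
#include <cassert>

enum class SpiState { Reset, Init, Wait, Start, Idle, SpiR, SpiW, Sync, Lock };

// Hypothetical condition flags evaluated by the state machine.
struct SpiInputs {
    bool trainingDone;      // TRAINING DONE signal from the LVDS receiver
    bool commandAvailable;  // token present in the MOSI addr FIFO
    bool commandIsWrite;    // command is a write operation
    bool writeSinceSync;    // a write command was invoked since the last SYNC
    bool tokenReturned;     // activation token passed back by exposure control
};

SpiState nextSpiState(SpiState s, const SpiInputs& in) {
    switch (s) {
        case SpiState::Reset: return SpiState::Init;
        case SpiState::Init:  return SpiState::Wait;  // ASSUMPTION: programming finished
        case SpiState::Wait:  return in.trainingDone ? SpiState::Start : SpiState::Wait;
        case SpiState::Start: return SpiState::Idle;
        case SpiState::Idle:
            if (in.commandAvailable) {
                return in.commandIsWrite ? SpiState::SpiW : SpiState::SpiR;
            }
            return in.writeSinceSync ? SpiState::Sync : SpiState::Idle;
        case SpiState::SpiR:
        case SpiState::SpiW:  return SpiState::Idle;
        case SpiState::Sync:  return SpiState::Lock;
        case SpiState::Lock:  // ASSUMPTION: back to IDLE once the token returns
            return in.tokenReturned ? SpiState::Idle : SpiState::Lock;
    }
    return SpiState::Reset;
}
```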

5.2. Exposure control

The exposure control block contains two state machines that work in parallel, one for handling the

memory access and calculation of the exposure timing parameters, the other one for the control of

the exposure output pins. At first, the FSM for the memory access is discussed. A corresponding

schematic description is shown in Fig. 5.3.

After reset, the LOCK state is entered. Now the state machine waits until the activation token is
received. As soon as this happens, the SPI memory is read (addresses 0, 1, 5 - 10 and 18 - 27). With
the information from the SPI registers it is possible to calculate the readout duration of a frame, which
is called frame period and is calculated in the CALC_READ_DURATION state. The formula for the
calculation is taken from [Cyp09, page 5], but changed in such a way that the resulting unit is sensor clock
cycles and not seconds, because a hardware design has no notion of time, only of clock cycles.

Therefore, the exposure control block has to run at sensor clock speed to correctly perform timing of

the exposure signals.


[Figure 5.3 shows the states LOCK, READMEM, CALC_READ_DURATION, CALC_TOTAL_DURATION, DELAY,
CALC_DELAY_PER_FRAME and SYNC.]
Figure 5.3.: First finite state machine of the exposure control module

The frame period in sensor clock cycles is calculated by

\[ \text{Frame period} = \text{FOT} + \text{lines} \cdot \left( \text{ROT} + \frac{\text{pixels}}{4} \cdot \text{dataPeriod} \right) \tag{5.1} \]

where FOT is the FOT duration in sensor clock cycles, ROT the ROT duration in sensor clock cycles,
lines the number of lines of the current frame, pixels the number of pixels per line and dataPeriod
the period of one bit on the LVDS channel, measured in sensor clock cycles. The dataPeriod is a
constant of 1/8, because DDR is used (factor 1/2) and one sensor clock period is 4 times longer than the
master clock period (factor 1/4). Knowing this, and due to the fact that pixels is always
32 · NB_OF_KERNELS, the formula simplifies to:

\[ \text{Frame period} = \text{FOT} + \text{lines} \cdot \left( \text{ROT} + \text{NB\_OF\_KERNELS} \right) \tag{5.2} \]

Formula (5.2) can be implemented in hardware without any problem.
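The two formulas can be checked against each other with a few lines of integer arithmetic. The sketch below uses hypothetical function names; the example values are those of the status report in Chapter 4.4 (FOT 40, ROT 7, NB_OF_KERNELS 21, y start 1010, y end 1260).

```cpp
#include <cassert>
#include <cstdint>

// Number of lines in the region of interest, both limits inclusive.
std::uint32_t linesInRoi(std::uint32_t yStart, std::uint32_t yEnd) {
    return yEnd - yStart + 1;
}

// Formula (5.1) with dataPeriod = 1/8: pixels / 4 * 1/8 = pixels / 32.
// The integer division is exact because pixels is a multiple of 32.
std::uint32_t framePeriodGeneral(std::uint32_t fot, std::uint32_t rot,
                                 std::uint32_t lines, std::uint32_t pixels) {
    return fot + lines * (rot + pixels / 32);
}

// Formula (5.2), which needs only additions and one multiplication.
std::uint32_t framePeriod(std::uint32_t fot, std::uint32_t rot,
                          std::uint32_t lines, std::uint32_t nbOfKernels) {
    return fot + lines * (rot + nbOfKernels);
}
```

With the status report values, the region of interest has 251 lines of 672 pixels, and both formulas give 40 + 251 · (7 + 21) = 7068 sensor clock cycles.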

In the CALC_TOTAL_DURATION state the constant number of sensor clock cycles per second, which
is stored in the constants.vhd file, is divided by the frames per second parameter from the SPI register
extension. The result is called cycles per frame and is the time available for the readout of one frame
plus a delay of arbitrary size. The division is performed by a Xilinx divider IP core, which is
available for free in the ISE Design Suite. The division needs 25 clock cycles to complete; meanwhile,
the FSM stays in the current state until the divider asserts the ready signal.

The last calculation step is the subtraction of the exposure1 duration from the cycles per frame value,
which is done by a subtracter IP core. This core has a delay of one clock cycle. For this reason, the
DELAY state is entered before the exposure offset is calculated in the CALC_DELAY_PER_FRAME
state. For clarification, diagram 5.4 visualizes the dependencies of the timing values.

One second consists of a specific number of sensor clock cycles. Dividing the sensor clock cycles per
second by the frames per second parameter results in the cycles per frame value. Because the image
sensor can expose and read out in parallel, the exposure for the next frame is done during the current
frame readout. For this reason the exposure offset is cycles per frame minus exposure duration. The
delay between two readouts depends on the length of the cycles per frame and the current frame period.
For frame rates near the maximum, the delay is very small. The exposure pins have a minimum hold
requirement of 15 master clock cycles, which corresponds to 3.75 sensor clock cycles. Therefore, the
minimum delay value is always 4 sensor clock cycles; a smaller value would not be long enough.

[Figure 5.4 shows one second (cycles per second) divided into intervals of cycles per frame. Each
interval consists of the exposure offset followed by the exposure, and in parallel of the readout
(frame period) followed by the delay.]
Figure 5.4.: Timing diagram of the frame timing

In the CALC_DELAY_PER_FRAME state it is checked whether the given parameters are correct or
at least achievable. It may happen that the exposure duration is longer than the cycles per frame, that the
number of cycles per frame is smaller than the frame period (frame rate too high) or that the exposure
offset is smaller than 4. As soon as one of these cases is detected, the delay is set to 4 and the exposure
pulses are activated in such a way that the maximum achievable frame rate for the given exposure
duration is obtained.
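The calculation and the three plausibility checks can be summarized as follows. This is one possible reading of the description in software form, not the VHDL code. The names are hypothetical, and the fallback branch simply sets both the exposure offset and the delay to the minimum of 4 cycles, which is an assumption about how the corrected timing is represented.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kMinDelay = 4;  // minimum delay in sensor clock cycles

struct ExposureTiming {
    std::uint32_t cyclesPerFrame;
    std::uint32_t exposureOffset;  // cycles per frame minus exposure duration
    std::uint32_t delay;           // cycles per frame minus frame period
    bool feasible;                 // false if one of the checks failed
};

ExposureTiming calcExposureTiming(std::uint32_t cyclesPerSecond,
                                  std::uint32_t framesPerSecond,
                                  std::uint32_t exposure,
                                  std::uint32_t framePeriod) {
    ExposureTiming t{0, kMinDelay, kMinDelay, false};
    if (framesPerSecond == 0) {
        return t;  // avoid division by zero; treated as infeasible here
    }
    t.cyclesPerFrame = cyclesPerSecond / framesPerSecond;
    const bool exposureFits = exposure <= t.cyclesPerFrame;
    const bool readoutFits = framePeriod <= t.cyclesPerFrame;
    const bool offsetOk = exposureFits && (t.cyclesPerFrame - exposure) >= kMinDelay;
    if (exposureFits && readoutFits && offsetOk) {
        t.exposureOffset = t.cyclesPerFrame - exposure;
        t.delay = t.cyclesPerFrame - framePeriod;
        t.feasible = true;
    }
    // Otherwise the minimum values set above are kept (ASSUMPTION).
    return t;
}
```

The clock constant used in the usage below (50,000,000 cycles per second) is an arbitrary example value, not the one stored in constants.vhd.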

Finally, all necessary values for the exposure control are calculated and the state machine switches to
the SYNC state in order to pass the activation token back to the SPI wrapper. The token signal has to
stay high for at least one rising edge of the SPI clock. Because the sensor clock is at least 5 times
faster than the SPI clock, the duration of the SYNC state is chosen to be 15 cycles. When
the token has been passed back to the SPI wrapper, the LOCK state is entered and the state machine waits
again for the activation token.

As mentioned in the beginning of this chapter, there is a second state machine that is responsible for

the behavior of the exposure signals. The corresponding state diagram is shown in Fig. 5.5.

Figure 5.5.: Second finite state machine of the exposure control module

There are only three states in the FSM. The initial state is WAIT_FOR_1ST_SYNC, where the state
machine stays until the concurrently running FSM described before reaches the SYNC state for the first
time. This event indicates that all necessary data has been read from the memory and the timing parameters
for the exposure control have been calculated. In the IDLE state the necessary parameters for the exposure
control are copied.

If the OPERATE and TRAINING_DONE signals are high and the other FSM is in state SYNC or
LOCK, the state is switched to EXPOSE; otherwise it remains in the IDLE state until all conditions
are fulfilled. The operate signal is an external input signal that activates the exposure operation as
long as it is high.

During the EXPOSE state the exposure signals are controlled. Generally, the exposure2 signal is always
low, unless the dual slope mode is enabled. At the beginning of an exposure cycle, both exposure
signals are low until the exposure offset is over. After that, the exposure1 signal is brought high for
the exposure1 duration. When dual slope is enabled, the exposure2 signal is brought high exactly
exposure2offset sensor clock cycles after the rising edge of the exposure1 signal. Both exposure signals
are brought low again when the exposure1 duration is over. Now the next cycle starts immediately
with the exposure delay.

This continues as long as the internal stopExp signal is low. When the signal is high, the state machine
switches to the IDLE state to copy the necessary parameters again. The stopExp signal is brought

high when the other state machine is in the SYNC state. This indicates that a new configuration

was programmed. The stopExp signal is also brought high when the operate signal is disabled. With

the use of the stopExp signal it is guaranteed that the exposure is not aborted and that a newly

programmed configuration is not ignored.
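The waveform produced in the EXPOSE state can be described as a function of the sensor clock cycle count. The sketch below is a simplified model that ignores the stopExp handling and assumes that one exposure cycle consists of the exposure offset followed by the exposure1 duration; the names ExposurePins and exposurePinsAt are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct ExposurePins {
    bool exposure1;
    bool exposure2;
};

// Levels of both exposure pins at a given sensor clock cycle, counted
// from the start of the EXPOSE state. All durations are in sensor clocks.
ExposurePins exposurePinsAt(std::uint64_t cycle,
                            std::uint32_t exposureOffset,
                            std::uint32_t exposure1Duration,
                            std::uint32_t exposure2Offset,
                            bool dualSlope) {
    const std::uint64_t period =
        static_cast<std::uint64_t>(exposureOffset) + exposure1Duration;
    if (period == 0) {
        return {false, false};  // degenerate configuration, keep pins low
    }
    const std::uint64_t t = cycle % period;  // position inside the current cycle
    const bool e1 = t >= exposureOffset;     // high for the exposure1 duration
    // exposure2 rises exposure2Offset cycles after the rising edge of exposure1.
    const bool e2 = dualSlope && e1 && (t - exposureOffset) >= exposure2Offset;
    return {e1, e2};
}
```

With an exposure offset of 4, an exposure1 duration of 10 and an exposure2 offset of 3, exposure1 rises in cycle 4, exposure2 rises in cycle 7 and both fall in cycle 14, where the next cycle begins.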

5.2.1. Timing parameter calculation

To verify the calculations of the exposure control block, an Excel spreadsheet was created that
determines the timing settings for a given configuration. A screenshot of the spreadsheet is given in
Fig. 5.6. All fields marked in yellow are mandatory fields for user inputs; the other fields are then
calculated automatically. Column D contains the default values of certain parameters.

The master clock speed of the design is used to calculate the master clock period and the cycles per
second constant (in sensor clock cycles). Both values are required in the constants.vhd file and the
period is needed in the test environment (lupa_constants.h). This is discussed in more detail
in Chapter 6.

The frame period is calculated with the SPI register values for the FOT timer (fot n), ROT timer (rot

n) and the region of interest (ROI) (y start, y end, nb of kernels). The exposure timing values (cycles

per frame, exposure offset and delay), as described in diagram 5.4 are calculated by means of the SPI

register extension. The values of the SPI register extension are converted into binary representation

and split into 8 bit blocks (columns E to H), because this format is needed when the SPI register is

programmed in the SystemC testbench. Besides, the maximum frame rate for the given frame period

is calculated.


Figure 5.6.: Excel sheet for timing parameter calculation

Warning messages are displayed when the given timing constraints cannot be fulfilled or when the

hold requirements for the exposure signals are violated.

5.3. LVDS receiver

The basic LVDS receiver component was already introduced in Chapter 3.3. The only difference here
is that the number of data channels is increased to 34. Therefore, it is necessary to add an LVDS
input buffer, an IDELAY, an ISERDES and a byte parser component for each new channel. Besides,
the multiplexer providing the input for the bit align machine needs to be extended and the resource
sharing control has to take into account the larger number of channels.

At the end of the training phase, the idle or training pattern is changed, as already mentioned in
the SPI wrapper description in Chapter 5.1. This is done to inform the byte parser that valid data
is transmitted. The byte parser eliminates byte offsets among the LVDS channels. The transition
between old and new idle patterns marks a unique synchronization point in the byte stream, which
is used to start the data output of the receiver component. The different idle patterns are also used to
notify the byte parser to reorder the bits of each byte received from the ISERDES, because the LUPA-3000
sends the MSB first and the ISERDES expects the LSB first.

Now all parts of the VHDL interface design have been described. When the individual blocks that operate
at different clock speeds communicate with each other, the clock domain crossing signals have to be
treated in a special way, which is explained in detail in the next chapter.

5.4. Clock domain crossing

Advanced hardware designs, like the one introduced in the previous chapter, use multiple clocks for

different components. These designs generally have a problem when data or control signals are passed

from one clock domain to another. The signal appears asynchronous in the new clock domain. The

circuit that receives the signal has to synchronize it to avoid metastability.

Metastability appears when a flip-flop samples an unstable signal, e.g. during a transition. Then the
flip-flop's output voltage level is non-deterministic and it is not predictable whether the output voltage
will converge to a correct voltage level, stay at an intermediate voltage level or oscillate before
it settles down. To avoid metastability, the incoming signal must be stable within a small timing
window around the sampling edge. This window is divided into setup and hold time, the time before
and after the sampling edge during which the signal has to remain stable. If a design meets these timing
requirements, the probability that the flip-flop will fail is negligibly small. Most synthesis tools cannot
determine whether asynchronous signals meet the timing requirements of the sampling flip-flop. For
this reason, circuits that eliminate the effects caused by asynchronous signals should be used.

The easiest method for sampling asynchronous signals is the concatenation of two flip-flops without
combinatorial logic between them. Besides, the last gate in the transmitting clock domain has to be a
flip-flop; combinatorial logic is not allowed here. The reason for this limitation is that combinatorial
logic can cause signal delays which aggravate the metastability problem. Such a basic synchronizer
circuit is visualized in Fig. 5.7.

Figure 5.7.: Basic synchronizer circuit

When more than two flip-flops are concatenated in the receiver clock domain, the occurrence of
metastability is less probable, but the incoming signal is delayed by more than two clock cycles. This kind
of synchronizer should also be used when the speed of the clock domains is equivalent but the clocks
are not synchronous. For clock domains that have different speeds it is necessary that the signal in the
sending clock domain is stable for at least two clock cycles of the receiving clock domain. Generally,
the pulse in the transmitting clock domain has to be at least twice the length of a clock cycle in the
receiving domain in order to fulfill the sampling theorem; otherwise it may happen that the pulse is
not detected. The sampling theorem implies that the sampling frequency has to be at least twice the
signal frequency to make sure that all signal states are detected.
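The latency of the basic synchronizer and the pulse length rule can be illustrated with a purely behavioral model. The sketch below does not model metastability itself, only the two register stages and the rule of thumb for the pulse length; the class and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of two concatenated flip-flops in the receiving domain.
class TwoFlopSynchronizer {
public:
    // Call once per rising edge of the receiving clock with the current
    // level of the asynchronous input. Returns the synchronized output.
    bool clock(bool asyncIn) {
        ff2_ = ff1_;     // second stage takes the old value of the first stage
        ff1_ = asyncIn;  // first stage samples the asynchronous input
        return ff2_;
    }

private:
    bool ff1_ = false;
    bool ff2_ = false;
};

// Rule from the text: a pulse must last at least two clock periods of the
// receiving domain to be detected reliably. Times are in picoseconds.
bool pulseLongEnough(std::uint64_t pulsePs, std::uint64_t rxPeriodPs) {
    return pulsePs >= 2 * rxPeriodPs;
}
```

A level change appears at the output after the second receiving clock edge. As an example for the pulse rule, a token held for 15 sensor clock cycles of 19424 ps each (the period that follows from the status report in Chapter 4.4) lasts about 291 ns and therefore exceeds two periods of the 10 MHz SPI clock (200 ns).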

If the clock domain crossing signal has to be buffered, the easiest method to avoid metastability
is the use of a first in, first out buffer (FIFO) with different read and write clocks. In this way,
metastability is handled inside the FIFO implementation and the hardware developer does not have
to care about it. This method is usually chosen when the signal has multiple bits or when a buffer
element is needed between the clock domains. In the interface for the image sensor all clock domain
crossing signals can be handled with one of these methods. There are further methods available which
may be used for more specialized applications, cf. [Ste03].

The LUPA-3000 controller design contains some signals which are used across clock domains. One is
the training done signal set by the LVDS receiver. It passes through exactly the basic synchronizer
introduced before: an output flip-flop in the receiver and two concatenated flip-flops in the exposure
control and the SPI wrapper. The hold requirement of the training done signal is always fulfilled, because
this signal has only one rising edge, which occurs at the end of the training phase.

The synchronizer component, which is used for the arbitration of SPI wrapper and exposure control,

simply implements two basic synchronizers. One basic synchronizer for each direction, to ensure the

correct sampling of the activation token. The hold requirement of two clock cycles in the receiving

clock domain for the activation token is fulfilled, because the state machines ensure that the token

signal is high long enough.

The operate signal from an external source is passed through two flip-flops before it is used internally.
The minimum hold requirement for this signal is two sensor clock cycles.


6. SystemC test environment

The previous chapters presented all parts needed for the complete system. In the next step they

are connected in a SystemC testbench. This testbench instantiates the LUPA-3000 SystemC model,

the VHDL implementation of the interface, a channel delay block that models delays in the LVDS

interconnection, an image builder which takes the data at the output of the VHDL block and generates

image data out of it and finally the stimulator that is connected to the SPI interface and controls the

whole design. A structural overview is shown in Fig. 6.1. ModelSim provides a method to
co-simulate VHDL and SystemC. The simulation of the project developed in this thesis is described
in Appendix C. A general explanation of the possibilities offered by ModelSim is given in [Men04].

Several clock signals are generated in the SystemC top module. The master_clk_p clock runs at master
clock speed to drive the LUPA-3000; master_clk_n is an inverted version of it. The clkExp clock runs
four times slower than the master clock (sensor clock speed) and drives the exposure control block.
The spi_clk clock runs at 10 MHz and drives the SPI wrapper component in the VHDL design and the
spi_clk input of the image sensor. Finally, a 200 MHz reference clock named clk200 is required for
the IDELAY primitives in the LVDS receiver component.

The channel delay block is used to model different lengths of the interconnection between the image
sensor and the connecting interface design. These delays have the same functionality as the delays
which were generated in the LVDS transmitter in the test design discussed in Chapter 3. They
can be used to model, e.g., connections of different length on a printed circuit board (PCB). The delay
component consists of several delay blocks, each of which delays a positive and a negative signal by the
same time. Each of these blocks uses two single delay blocks, one for the positive and one for the
negative signal.


[Figure 6.1 shows the SystemC testbench containing the Stimulator, the LUPA-3000 model, the channel
delays block, the Image Builder and the VHDL design (SPI wrapper, SPI memory, synchronizer, Exposure
Control and LVDS receiver). Signals: MOSI addr, MOSI data and MISO FIFOs; CS, MOSI, MISO;
Training_Done; exposure1, exposure2; operate; LVDS_0_P/LVDS_0_N to LVDS_31_P/LVDS_31_N,
LVDS_SYNC_P/LVDS_SYNC_N, LVDS_CLK_P/LVDS_CLK_N; data channels 0 to 31 and sync channel.]
Figure 6.1.: Complete SystemC testbench, with the LUPA-3000 model and the VHDL implementation
of the interface design


6.1. SystemC transport delay

The implementation of a delay in SystemC is not as simple as it seems, because SystemC has no built-in
delay. In contrast, VHDL has a so-called transport delay which can delay a signal by a
specified time. In SystemC the only mechanism available is the wait() statement, which is given a time
value or waits for an event. With only this mechanism available it is not straightforward to build
a transport delay element, but there is a way to overcome this problem.

To build such a delay, an sc_fifo and an sc_event_queue are necessary. An sc_fifo is a normal FIFO model of
the SystemC library and an sc_event_queue is a special kind of FIFO which can hold multiple pending
event notifications that belong to the same signal or module. A read process (SC_METHOD) writes each
new input value to the FIFO and queues an event in the event queue. This is done as soon as the data
input changes. The event is notified after the specified transport delay time. Another process, an
SC_THREAD, is sensitive to event notifications from the event queue, each of which indicates that the
oldest data token in the FIFO should be written to the output.

The following code snippet contains the code of the module with the ports, internal variables and the
constructor.

#include <systemc.h>

SC_MODULE(Delay)
{
    sc_in<bool> in;
    sc_out<bool> out;

    sc_fifo<bool> delay_channel_fifo;  // values that are currently "in flight"
    sc_event_queue channel_event_q;    // one pending notification per queued value
    sc_time transpDelay;

    void read_in();
    void write_out();

    SC_HAS_PROCESS(Delay);

    Delay(sc_module_name name_, sc_time delay_ = sc_time(0, SC_NS)) :
        sc_module(name_), delay_channel_fifo(50)
    {
        transpDelay = delay_;

        SC_METHOD(read_in);
        sensitive << in;
        dont_initialize(); // prevent read_in from running during initialization

        SC_THREAD(write_out);
        sensitive << channel_event_q;
    }
};


The prototypes of read_in() and write_out() are declared in the module declaration; the constructor
registers them as an SC_METHOD and an SC_THREAD, respectively. Each time the data input changes,
the read_in() method is executed, because it is sensitive to the in input port. The implementation
of the functions is shown in the following code.

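void Delay::read_in()
{
    // store the new input value and schedule its release after the delay
    delay_channel_fifo.write(in.read());
    channel_event_q.notify(transpDelay);
}

void Delay::write_out()
{
    while (true)
    {
        wait(); // resumes on the next notification of channel_event_q
        out.write(delay_channel_fifo.read());
    }
}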

Each time an event notification is pending, the write_out() thread is unblocked, because its wait()
statement is sensitive to events from channel_event_q. The sensitivity is defined in the constructor of
the delay module.

This implementation is able to handle signal pulses that are shorter than the actual delay, which
is necessary for a meaningful transport delay model. There is only one problem with the current
implementation: the maximum delay is limited by the FIFO size and depends on the minimum
data period. For example, if a clock signal is to be delayed, the period in which the signal is stable
is half the clock period. The maximum delay is therefore given by

maximum delay = data period · FIFO size

When the delay is chosen too large, the simulation may fail because the read_in() method
tries to write into the full FIFO. The default FIFO size in SystemC is 16; in the given code it is set
to 50 in the constructor.
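The limit can be checked with a small occupancy model. The following is a minimal sketch in plain C++ (it does not use SystemC); the function names and the use of picoseconds as time unit are assumptions made for illustration. It assumes that one value is written every data period and that each value leaves the FIFO exactly one delay later.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Upper bound from the text: maximum delay = data period * FIFO size.
std::uint64_t max_transport_delay_ps(std::uint64_t data_period_ps, std::uint64_t fifo_size) {
    return data_period_ps * fifo_size;
}

// Occupancy model: a value is written every data_period_ps and is read
// delay_ps later. Returns false as soon as a write would hit a full FIFO.
bool fifo_never_overflows(std::uint64_t delay_ps, std::uint64_t data_period_ps,
                          std::size_t fifo_size, std::size_t writes) {
    std::deque<std::uint64_t> leave_times;  // read time of every queued value
    for (std::size_t k = 0; k < writes; ++k) {
        const std::uint64_t now = k * data_period_ps;
        // values whose delay has elapsed have already been read
        while (!leave_times.empty() && leave_times.front() <= now) {
            leave_times.pop_front();
        }
        if (leave_times.size() >= fifo_size) {
            return false;  // corresponds to read_in() writing into a full FIFO
        }
        leave_times.push_back(now + delay_ps);
    }
    return true;
}
```

With a FIFO size of 50 and an assumed data period of 1000 ps, the model accepts a delay of 50000 ps and rejects 50001 ps. The boundary case assumes that a read scheduled for the same instant as a write is executed first.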

6.2. Image Builder

During operation the Image Builder component reads the FIFO outputs of the VHDL design and
reconstructs the transmitted image. Reconstruction is performed by evaluating the sync
channel values. According to the synchronization sequence, the lines of each frame are read one after
another and put back together.

Each time a new frame starts, the last one is appended to a Matlab workspace file (results.mat). For

the storage process the Matlab libraries are used, similar to the read access of the getPixel method in

the LUPA-3000 model.


The Image Builder block should run at least at the speed of the sensor clock. With a slower clock, a
FIFO overflow will happen sooner or later and cause data loss at the LVDS receiver output FIFOs.

6.3. Stimulator

The stimulator is the block that controls the behavior of all other components. At the
beginning of a simulation the stimulator holds the system in reset with the active-low reset_n signal for
500 ns. Then arbitrary parameters can be programmed through the SPI. It is necessary to program
at least the exposure1 duration to bring the system into a state where the active operate signal enables
the exposure process. If a frame rate lower than the maximum achievable one is desired, the corresponding
parameter should be set. For dual slope exposure it is necessary to program the duration of the
exposure2 offset. Additionally, a region of interest (ROI) or any other parameter can be
changed. Configuration settings can be programmed at any time during operation. Before configuration
changes are made, the operate signal should be disabled and the sequencer reset (SPI register 0, bit
1) has to be enabled. Then the new parameters can be programmed through the SPI. When all changes
are made, the sequencer reset has to be disabled again, otherwise no image readout is possible.
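The reconfiguration sequence described above can be sketched as follows. This is a plain C++ illustration, not the stimulator code of the testbench: the SensorControl type, the register width, the interpretation of "bit 1" as the mask 1 << 1 and the re-enabling of operate at the end are assumptions.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using SpiWrite = std::pair<std::uint8_t, std::uint16_t>;  // (register address, value)

// Hypothetical stand-in for the signals driven by the stimulator.
struct SensorControl {
    bool operate = false;
    std::map<std::uint8_t, std::uint16_t> regs;  // register width is an assumption
    std::vector<SpiWrite> history;               // SPI writes in the order issued

    std::uint16_t spi_read(std::uint8_t addr) const {
        auto it = regs.find(addr);
        return it == regs.end() ? 0 : it->second;
    }
    void spi_write(std::uint8_t addr, std::uint16_t value) {
        regs[addr] = value;
        history.push_back(SpiWrite(addr, value));
    }
};

const std::uint8_t kSequencerReg = 0;               // "SPI register 0" from the text
const std::uint16_t kSequencerResetMask = 1u << 1;  // "bit 1", assuming the LSB is bit 0

void reconfigure(SensorControl& s, const std::vector<SpiWrite>& settings) {
    s.operate = false;  // stop the exposure process first
    s.spi_write(kSequencerReg,
                static_cast<std::uint16_t>(s.spi_read(kSequencerReg) | kSequencerResetMask));
    for (const SpiWrite& w : settings) {
        s.spi_write(w.first, w.second);  // program the new parameters
    }
    s.spi_write(kSequencerReg,
                static_cast<std::uint16_t>(s.spi_read(kSequencerReg) & ~kSequencerResetMask));
    s.operate = true;  // assumed: resume operation after the changes
}
```

The history vector only exists to make the order of the SPI accesses visible: sequencer reset on, new settings, sequencer reset off.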

The clock speed for the stimulator component can be chosen arbitrarily, because all output signals
are treated as asynchronous signals in the receiving components. All components that are controlled
through reset_n expect it to be asynchronous, including the LUPA-3000. The exposure control
block treats the operate signal as asynchronous, utilizing a basic synchronizer as explained in Section
5.4. Finally, the MOSI addr, MOSI data and MISO FIFOs have different read and write clocks to
decouple the clock domains.

6.4. Demonstration

Finally, the complete system can be simulated. In the beginning all components are reset by the
global reset signal controlled by the stimulator. Then the SPI wrapper programs the settings for the
calibration. The LUPA-3000 transmits the training pattern on all channels. These data signals are
delayed arbitrarily inside the channel delay component. The LVDS receiver calibrates each channel
individually and raises the Training_Done signal when all channels are aligned successfully. After that,
the SPI wrapper brings the image sensor into normal operation mode. In addition, settings from the
external SPI input are programmed to the shared SPI register and the LUPA-3000. Finally, the
exposure control starts to control the image sensor with the given exposure settings. As soon as
data is available from the receiver FIFOs, the image builder starts to reconstruct the received image.
The image builder can read entirely synchronous data from the receiver FIFOs, because the receiver
balances all channels. For this reason any kind of delay can be introduced between the image sensor
and the LVDS receiver. The image builder will never notice any delay among the channels.

The following code shows the status message from the LUPA-3000 model, obtained by a ModelSim

simulation of the entire system:


# ************************************************

# single slope

# exposure duration: 776960 ns 40000 sensor clock cycles

# FOT: 40

# ROT: 7

# NB_OF_KERNELS: 21

# y start: 1010

# y end: 1260

# x start: 19

# lines: 251

# pixels per line: 672

# ************************************************

The printout contains the duration of the last exposure pulse, which influences the brightness of the image,
the internal timing settings and the settings for the current region of interest (ROI). y start
and y end are given as line numbers, whereas x start denotes an (odd) kernel number. These settings
result in the image that is shown in Fig. 6.2. The images inside the figure are equivalent to the
Matlab files used as input or generated as output of the design. This diagram visualizes the image
readout of the system. All system components are abstracted to show only the control and data signal
flow between the components.

[Figure 6.2: data flow diagram. Blocks: user input, SPI wrapper, exposure control, LUPA-3000, LVDS receiver, Image builder. Signals: SPI, exposure. The region of interest is marked in the input image.]

Figure 6.2.: Abstract illustration of a practical image readout; the ROI is indicated by white lines in the input image

The received image is a little darker than the original one, which depends on the exposure duration.
A shorter exposure pulse would result in a brighter image. The settings for the image exposure
and readout are programmed through the SPI by the stimulator. These settings are hard-coded
in the stimulator module. The test was performed with different delays on the channels; however,
only the different tap settings in the receiver indicate that a delay was compensated. The byte offset
compensation could be observed in the fill level of the FIFOs. The image builder does not notice
any asynchronicity, because the output of the receiver FIFOs is always completely balanced.


7. Conclusion

7.1. Summary

This thesis dealt with the interconnection of a LUPA-3000 image sensor and an FPGA. The challenging
requirements of the LVDS data interface were successfully turned into a VHDL implementation of
the receiver component. The basics of an LVDS connection were discussed to give an introduction to
the problems that have to be dealt with.

For testing purposes an LVDS communication line with a transmitter and a receiver component was
established. The transmitter component is able to generate asynchronous output signals,
which are not common in source-synchronous systems. These asynchronous signals were used to
test the robustness of the LVDS receiver. The receiver uses a calibration algorithm to ensure that
the incoming signals are sampled at an ideal position. In addition, the word alignment ensures that
deserialization is done correctly. Finally, the byte offsets among the channels are compensated by the
use of FIFOs.

A design that contains the transmitter and receiver components was simulated and finally tested on
an FPGA evaluation board. The FPGA IOs were connected using wire pairs of different lengths
to evaluate the influence of the interconnection length. The design running on the FPGA
was verified with the software logic analyzer ChipScope, for which a comprehensive introduction
was given. With this logic analyzer the LVDS communication
line was verified to ensure that a working interconnection was established. The evaluation has shown
that the receiver can handle asynchronous signals that are arbitrarily delayed. In addition, wires of
different lengths were used to evaluate the propagation delay. This delay is not negligible, so
a circuit board designer should take it into account.


Furthermore, a software model of the image sensor was developed in SystemC, which is a C++ class
library for hardware modeling. The LUPA-3000 model behaves like the original image sensor and was
used to verify the functionality of the interface design that is connected to the sensor. The interface
design, which was developed in VHDL, includes an adapted version of the receiver component, a
component that controls the exposure signals, and a controller that connects to the configuration
interface (SPI) of the image sensor.

Finally, a SystemC testbench was created to perform a co-simulation of the LUPA-3000 model and
the corresponding interface design. With this testbench it is possible to model the complete system
behavior and to reconstruct an image file from the data transferred to the LVDS receiver in the VHDL
design.

7.2. Future work

Regarding the LVDS communication on the FPGA, it would be interesting to use an oscilloscope to see
the real data eye. This would allow the signal quality to be evaluated for different clock speeds and cable
lengths. In addition, deeper research on the dependency between interconnection length and signal
delay could help to explain the simulation results for different cable lengths more precisely. Furthermore,
the influence of the voltage level used in the LVDS connection would be interesting: do higher voltage
levels increase the signal quality and allow higher speeds?

The LUPA-3000 model implements a brightness adjustment of the image, depending on the exposure
duration. There is no guarantee that the adjustment algorithm behaves like the original sensor. For this
reason the behavior should be adjusted so that it matches the original sensor. Moreover, the
dual slope exposure performs the same brightness adaptation as the single slope mode, but in practice
this mode increases the dynamic range of the image, so this behavior should be implemented, too. In
addition, the pixel control signals Vmem, precharge, sample and pixel reset are active during the frame
overhead time (FOT) and influence the image. Unfortunately, it is not known what kind of influence
these signals have on the image; practical tests with the image sensor could clarify their effect
and help to improve the software model.

Finally, the VHDL implementation of the interface design should be tested on the camera hardware.
For this, it is necessary to build another VHDL design that is connected to the interface design
developed in this work. This extended design has to provide an external interface for the image data
readout and a control input that connects to some kind of user interface.


A. Abbreviations and Acronyms

ADC analog digital converter

BUFG global clock buffer

BUFR regional clock buffer

CRC cyclic redundancy check

DCM digital clock manager

DDR double data rate

EOL end of line

FIFO first in, first out buffer

FOT frame overhead time

FPGA Field Programmable Gate Array

ICON Integrated Controller core

ILA Integrated Logic Analyzer core

IOB input/output block of Xilinx FPGAs

LSB least significant bit

MSB most significant bit

PCB printed circuit board


PLL phase locked loop

ROI region of interest

ROT row overhead time

SOF start of frame

SOL start of line

SPI Serial Peripheral Interface

LVDS Low-Voltage Differential Signaling


B. FPGA pin mapping

The inputs and outputs of the Toplevel module are physically mapped to the FPGA pins. This
mapping is specified in a *.ucf file. The clock input is mapped to pin AH15, which is driven by the
clock generator at 100 MHz. The global reset, which is active-low, is connected to pin J14; internally
the reset is handled as active-high.

The LVDS outputs and inputs are mapped to the expansion header which contains the differential
FPGA I/Os. This header is called the J4 connector and its pins are located in the I/O banks 11 and 13.
This connector contains normal differential I/Os and two differential clock inputs¹, but these clock
inputs can only drive regional clocks. To drive a global clock buffer a special global clock input² is
necessary; such a differential input is available in the J5 connector. The information about the Virtex-5
I/Os of the different devices is available in [Xil09c], the schematic for the ML506 FPGA board is given
in [Xil08], and the layout of the expansion headers is shown in [Xil08, page 11]. The complete pin mapping
for the differential I/Os is described in Table B.1.

The first two columns describe the pin numbers in the expansion header, the schematic net name is

the full name of the corresponding pin and the FPGA pin describes the pin position in the FPGA

package, similar to a chessboard (digits for the columns and letters for the rows). The PlanAhead tool

of the ISE Design Suite helps to generate the *.ucf file and shows the FPGA package graphically.
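The chessboard-like naming can be illustrated with a small helper that splits a package pin name into its row letters and its column number. This is only a sketch; the PackagePin type and the function name are assumptions and are not part of any Xilinx tool.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct PackagePin {
    std::string row;  // letter part of the name, e.g. "AH"
    int column;       // digit part of the name, e.g. 15
    bool valid;       // true if the name is letters followed by digits only
};

PackagePin parse_package_pin(const std::string& name) {
    PackagePin pin{"", 0, false};
    std::size_t i = 0;
    // leading letters give the row
    while (i < name.size() && std::isalpha(static_cast<unsigned char>(name[i]))) {
        pin.row += static_cast<char>(std::toupper(static_cast<unsigned char>(name[i])));
        ++i;
    }
    const std::size_t digits_begin = i;
    // trailing digits give the column
    while (i < name.size() && std::isdigit(static_cast<unsigned char>(name[i]))) {
        pin.column = pin.column * 10 + (name[i] - '0');
        ++i;
    }
    pin.valid = !pin.row.empty() && i > digits_begin && i == name.size();
    return pin;
}
```

For the clock input pin AH15 mentioned above, this yields row "AH" and column 15.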

¹ Known as CC pins, as described in [Xil09c]
² Known as GC pins, as described in [Xil09c]


Pin Pair    Schematic Net Name        FPGA pin      Differential       Direction  Bank
Pos  Neg    Pos          Neg          Pos    Neg    Channel Mapping
4    2      HDR2 4       HDR2 2       L34    K34    0                  out        11
8    6      HDR2 8       HDR2 6       K33    K32    1                  in         11
12   10     HDR2 12      HDR2 10      P32    N32    2                  in         11
16   14     HDR2 16      HDR2 14      T33    R34    3                  in         11
20   18     HDR2 20      HDR2 18      R33    R32    4                  out        11
24   22     HDR2 24      HDR2 22      U33    T34    5                  out        11
28   26     HDR2 28      HDR2 26      U32    U31    sync               out        11
32   30     HDR2 32      HDR2 30      V32    V33    clk                out        13
36   34     HDR2 36      HDR2 34      W34    V34    -                  -          13
40   38     HDR2 40      HDR2 38      Y33    AA33   1                  out        13
44   42     HDR2 44      HDR2 42      AF34   AE34   0                  in         13
48   46     HDR2 48      HDR2 46      AF33   AE33   2                  out        13
52   50     HDR2 52      HDR2 50      AC34   AD34   3                  out        13
56   54     HDR2 56      HDR2 54      AC32   AB32   4                  in         13
60   58     HDR2 60      HDR2 58      AC33   AB33   5                  in         13
64   62     HDR2 64      HDR2 62      AN32   AP32   sync               in         13
27   28     GPIO LED 2   GPIO LED 4   G15    G16    clk                in         3

Table B.1.: Pin mapping of the expansion connectors (J4 and J5) on the ML506 evaluation board


C. SystemC and VHDL co-simulation

To simulate the given code in ModelSim, the C++ compiler for ModelSim has to be installed.
If gcc for ModelSim is not installed (the gcc-4.2.1-mingw32 folder in the ModelSim directory is
missing), follow the same process as for downloading ModelSim. The compiler is available from the
same site. The steps are:

1. Go to http://www.model.com/content/modelsim-downloads
2. Click on the link to ModelSim SE
3. Click on the Downloads tab
4. Click on the Download link
5. Complete the registration form, then click on the Request Download button
6. Click on the ftp link
7. Download gcc-4.2.1-mingw32
8. Unzip the file into the ModelSim installation directory
9. Set CppPath in the [sccom] section of the modelsim.ini file in the installation directory,
   e.g., CppPath = C:\modeltech_6.5\gcc-4.2.1-mingw32\bin\g++

The given design was tested with ModelSim SE PLUS 6.5 and Matlab 2008a. The following script can
be used to compile the SystemC part of the design. The VHDL components should be compiled in
advance, using the standard compilation flow in ModelSim. In addition, it is necessary to compile the
XilinxCoreLib and add it to the ModelSim libraries. To compile these libraries Xilinx provides the
compxlib tool, which can be started from the command line by invoking compxlib.


First all open simulations are exited and the old SystemC objects are deleted.

quit -sim

vdel -allsystemc

Then the VHDL Toplevel file is compiled to generate the corresponding SystemC module with the

scgenmod command which maps the VHDL data types to SystemC. In this case std_logic is mapped

to bool and std_logic_vector is mapped to sc_uint. The output is written to a header file located

in the system path SYSTEMC_SRC where all SystemC files are located.

vcom ./src_vhdl/Toplevel.vhd

scgenmod -map std_logic=bool -map std_logic_vector=sc_uint vhdltop > $env(SYSTEMC_SRC)/vhdl_toplevel.h

Now all *.cpp files are compiled, including the necessary Matlab headers.

sccom -I"C:/Programme/matlab2008a/extern/include" -g $env(SYSTEMC_SRC)/*.cpp

After successful compilation all object files are linked together, including the necessary Matlab DLLs.

sccom -L "C:/Programme/matlab2008a/bin/win32/" -l libeng -l libmx -l libmat -link

Finally, the simulation with the SystemC testbench is loaded.

vsim -do sc_wave.do -t ps -novopt work.mti_top

run 1200 us

All commands described above are available as a compilation script named compileSC.do in the project
directory. For the script it is necessary to set two system variables: SYSTEMC_SRC, where all SystemC
files are located, and MASTER_DIR, which points to the directory where all subproject folders are located.
In addition, the path of the Matlab installation has to be adapted in the script. General information
about SystemC and VHDL co-simulation with ModelSim is given in [Men04].


Bibliography

[Bur06] Greg Burton. XAPP855: 16-Channel, DDR LVDS Interface with Per-Channel Alignment.

Xilinx, October 2006. v1.0.

[Cyp09] Cypress Semiconductor Corporation. LUPA-3000 Datasheet, August 2009. advance.

[Gmb10] The Imaging Source Europe GmbH. Multi slope. Website, 22 April 2010. The Imaging

Source Europe GmbH.

[IEE96] IEEE. IEEE standard for Low-Voltage Differential Signals (LVDS) for Scalable Coherent

Interface (SCI), Jul 1996.

[KGA03] Sean Koontz, Maria George, and Markus Adhiwiyogo. System Interface Timing Parameters.

Xilinx, April 2003. v1.0.

[Men04] Mentor Graphics. SystemC Verification with ModelSim, 2004.

[Nat08] National Semiconductor. LVDS Owner’s Manual, 2008.

[Ste03] Mike Stein. Crossing the abyss: asynchronous signals in a synchronous world. EDN Electronics
Design, Strategy, News, 310388:59-69, July 2003.

[Xil08] Xilinx. ML505, ML506, ML507 Schematics, January 2008.

[Xil09a] Xilinx. ChipScope Pro 11.3 Software and Cores - User Guide, 11.3 edition, September 2009.

v11.3.

[Xil09b] Xilinx. ML506 Evaluation Platform User Guide, October 2009. v3.1.1.

[Xil09c] Xilinx. Virtex-5 FPGA Packaging and Pinout Specification, December 2009. v4.7.

[Xil09d] Xilinx. Virtex-5 FPGA User Guide, November 2009. v5.2.
