Institut für Parallele und Verteilte Systeme Abteilung Parallele Systeme
Universität Stuttgart Universitätsstraße 38
D70569 Stuttgart
Master's Thesis No. 2993
Implementation of a FPGA-based Interface to a High Speed Image Sensor
Thomas Grob
May 2010
Abstract
This thesis is part of a project in which a high speed camera is developed. The subject of this work
is the interconnection of a LUPA-3000 image sensor and an FPGA. The FPGA is connected to the
multi-channel Low-Voltage Differential Signaling (LVDS) data interface and handles the calibration
of the individual channels. The LVDS receiver interface can handle asynchronous data signals and
synchronizes them for subsequent processing. This complex LVDS receiver design is discussed and its
functionality explained in detail. For testing, a VHDL design was developed that includes an asynchronous
LVDS transmitter, which transfers data to the receiver component through wires that interconnect the
FPGA I/Os. After simulation, the entire design was tested in practice on an FPGA evaluation board.
This communication system was verified using a ChipScope logic analyzer.
The interface design that is connected to the image sensor includes the receiver component as well
as a unit that provides an easy-to-use configuration interface for programming the image sensor.
In addition, the exposure control is realized within the VHDL design.
To evaluate the hardware design that is connected to the image sensor, a SystemC testbench was
developed. It includes a software model of the LUPA-3000 image sensor to verify the functionality
of the overall design.
Contents
1. Introduction
1.1. Conceptual formulation
1.2. Document configuration
2. Low-Voltage Differential Signaling
2.1. Technical specification
2.1.1. Differential signaling
2.2. Architecture
3. LVDS Communication
3.1. System description
3.1.1. FPGA evaluation board
3.2. Transmitter
3.2.1. Serializer
3.2.2. Asynchronous transmitter
3.3. Receiver
3.3.1. Data and clock timing
3.3.2. Data alignment
3.3.3. Compensation capabilities
3.4. Design synthesis
3.4.1. Clock generator
3.4.2. Pin mapping
3.5. The ChipScope logic analyzer
3.6. Test & verification
3.6.1. Test configuration
4. Image sensor - LUPA-3000
4.1. Sensor Architecture
4.1.1. Pixel architecture and timing
4.2. Serial Peripheral Interface
4.2.1. SPI registers
4.3. Readout
4.3.1. Cyclic redundancy check
4.4. Software model
5. Data and control interface for the LUPA-3000 image sensor
5.1. Serial peripheral interface
5.2. Exposure control
5.2.1. Timing parameter calculation
5.3. LVDS receiver
5.4. Clock domain crossing
6. SystemC test environment
6.1. SystemC transport delay
6.2. Image Builder
6.3. Stimulator
6.4. Demonstration
7. Conclusion
7.1. Summary
7.2. Future work
A. Abbreviations and Acronyms
B. FPGA pin mapping
C. SystemC and VHDL co-simulation
Bibliography
List of Tables
3.1. Data path timing definitions
3.2. Clock path timing definitions
3.3. Data path delay overview for Virtex5 XC5VSX50T with -1 speed grade
3.4. Clock path delay overview for Virtex5 XC5VSX50T with -1 speed grade
3.5. Delay settings for all LVDS channels, assuming a clock speed of 400 MHz
4.1. Selection of important SPI addresses
4.2. Sync channel values
5.1. SPI register address space extension
B.1. Pin mapping of the expansion connectors on the ML506 evaluation board
List of Figures
2.1. LVDS communication line
2.2. Cross section of a differential wire pair, with its coupled fields
2.3. Voltage level of a differential signal
2.4. Parallel clock SerDes with 8:1 serialization
3.1. Schematic description of the LVDS communication system
3.2. Virtex5 FPGA evaluation board ML506
3.3. Block diagram of the LVDS transmitter
3.4. 8:1 serializer, consisting of master and slave OSERDES component
3.5. Example operation of the byte crusher component for a delay of 10 bits
3.6. Block diagram of the LVDS receiver
3.7. Data and clock path timing
3.8. Timing window for sampling clock edge
3.9. Sample data eye diagram
3.10. Five steps of the bit alignment process
3.11. Deserializer, consisting of master and slave ISERDES component
3.12. Clock edge adaption for low speed signals
3.13. Delay taps counter settings
3.14. Data output of ISERDES and the receiver FIFO outputs
3.15. Data inputs of the comparator
4.1. LUPA-3000 image sensor, copyright by Cypress Semiconductor Corp
4.2. Column multiplex scheme of the sensor architecture
4.3. Pixel schematic of a 6-T pixel cell
4.4. Pixel timing during frame overhead time (FOT)
4.5. Serial Peripheral Interface (SPI) read timing
4.6. SPI write timing
4.7. Pipelined operations, integration and readout are done in parallel
4.8. Exposure and readout timing
4.9. Sync channel and data channel values during image readout
4.10. Circuit for CRC generation with the polynomial implemented in LUPA-3000
4.11. Structural diagram of the LUPA-3000 SystemC model
5.1. Structural description of the VHDL design
5.2. Finite state machine of the SPIwrapper module
5.3. First finite state machine of the exposure control module
5.4. Timing diagram of the frame timing
5.5. Second finite state machine of the exposure control module
5.6. Excel sheet for timing parameter calculation
5.7. Basic synchronizer circuit
6.1. Complete SystemC testbench
6.2. Abstract illustration of a practical image readout
1. Introduction
This thesis is part of a project in which a high speed camera is developed. This camera will have
a very high resolution of 1710 x 1696 pixels and a frame rate of 480 frames per second. These
requirements are quite ambitious, because the amount of data that has to be processed is very high
(13.3 GBit/s).
The core components of the camera are a high speed image sensor and an FPGA. The FPGA design
controls the image sensor through a configuration interface as well as exposure signals. In addition, it
has to process the incoming data stream of the image sensor. The LVDS receiver component is tested
in practice on an FPGA evaluation board. Moreover, a VHDL design is developed that controls the
image sensor, provides an external programming interface and includes the LVDS receiver component.
This design is tested with a software model of the image sensor, which was additionally developed based
on the image sensor’s data sheet. The task definition for this thesis is given in the following section.
1.1. Conceptual formulation
The subject of this thesis is the implementation and verification of an interface module on an FPGA that
connects to a high-performance image sensor. The module controls and configures the image sensor
and receives image data from it. These functions require the module to connect
to the following three interfaces of the image sensor:
• Configuration interface: Serial Peripheral Interface
• Control interface: Group of TTL signals without a clock that control the capture process of an
image
• High-speed data interface: Based on 34 LVDS connections
A major problem of high-speed data interfaces with high clock frequencies is that the signals of the interface
get out of sync. This asynchronism is caused by length variations of the conductor paths, tolerances in the
chip package and variances of the voltage levels between signals. To counteract this, the connection
must be re-synchronized at the receiver side by using delay elements for each signal. Besides the
interface module, the FPGA will contain further logic for image processing. This logic belongs to a
clock domain with a different frequency than the clock domains of the configuration and data interface
of the image sensor. Therefore, data needs to be transferred between different clock domains. The
verification of the interface module should prove:
• Functionality of the configuration and control interface
• Functionality and real-time performance of the data interface
The verification of the control and configuration functionality should be based on a model of the
image sensor which has to be developed using a high level language (e.g., C, Matlab). However,
the verification of the data interface requires implementing a transmitter module on the FPGA. The
transmitter module has to be connected to its counterpart using conductor paths outside the FPGA.
1.2. Document configuration
The second chapter of this thesis deals with the basics of a Low-Voltage Differential Signaling (LVDS)
interconnection. Details about the technical specification are discussed and typical architectures
involving multiple LVDS connections are explained.
The following chapter introduces a VHDL implementation of an LVDS transmitter and receiver that
use multiple LVDS interconnections. Both components are described in detail, especially the complex
algorithms in the receiver that ensure correct sampling of asynchronous input signals. The transmitter
and receiver are instantiated in a Toplevel design, which includes some additional blocks that generate
the data to be transmitted through the LVDS interface and compare the received data with the originally
sent data. This design is synthesized and tested on an FPGA evaluation board. The transmitter and
receiver components are connected using wires between the FPGA pins.
The image sensor LUPA-3000 is described in Chapter 4. The general architecture and the timing of
an image readout are discussed. Moreover, the serial peripheral interface, which is used to program the
image sensor, is presented. Finally, the software model of the LUPA-3000, which was developed
in SystemC, is introduced. SystemC is a C++ class library that can be used to model hardware
components in a high level language.
In Chapter 5 the VHDL interface implementation that is connected to the image sensor is discussed.
It consists of an adapted version of the LVDS receiver and some more components that control the
LUPA-3000 image sensor.
The complete system, with the LUPA-3000 software model and the VHDL interface, is connected in
a testbench, which is discussed in Chapter 6. The testbench includes a stimulator unit and an image
builder that reconstructs an image out of the data provided by the LVDS receiver.
2. Low-Voltage Differential Signaling
Low-Voltage Differential Signaling (LVDS) is commonly used as a high speed point-to-point connection
over small distances. One LVDS channel always uses two interconnections, a positive (p) and a negative
(n) one, which transmit voltages of opposite sign. The voltage difference between both interconnections
represents a logical state.
This interface standard for high speed data transmission, which uses low voltage signals, was developed
and standardized in the mid-1990s. It only describes the physical layer and not the upper layer
protocols. It is commonly used to interconnect components within one circuit board where high speed
transmission is required. LVDS is widely used in different applications, e.g., in computers, where the PCI
Express bus utilizes this technique, and in many displays, which are controlled through this interface. Other fields
of application are industrial vision, medical engineering and automotive electronics.
LVDS is standardized in ANSI/TIA/EIA-644-1995 and IEEE Std 1596.3-1996 [IEE96]. Based on these
standards, National Semiconductor has published a textbook with many detailed explanations and
design guidelines [Nat08].
2.1. Technical specification
There are many differential signaling techniques available, and some devices claim to have an LVDS
interface but do not comply with the ANSI/TIA/EIA-644 standard. According to the LVDS standard, the maximum
data rate is 3.125 Gbps (gigabits per second) at a voltage swing of ±350 mV.
2.1.1. Differential signaling
A typical LVDS communication line is unidirectional, so there is one driver and one receiver. The
data are always transmitted from the driver to the receiver. A communication line consists of two
conductors, which are driven with voltages of opposite sign. Evaluating the voltage difference
has the advantage that disturbances during transmission have less influence on the transmitted signal,
because both conductors are influenced equally. This results in excellent noise immunity.
Figure 2.1 shows the typical structure of an LVDS communication line. The driver consists of a current
source that delivers a constant 3.5 mA. Depending on the bit that should be sent, the driver switches
the voltage on the wires.
Figure 2.1.: LVDS communication line (driver with current source, receiver with 100 Ω termination resistor)
A termination resistor of 100 Ω across the receiver inputs terminates the transmission line. This
termination resistor matches the line impedance and avoids signal reflection. The nominal
voltage drop across the termination resistor, and thus at the receiver input, is 350 mV.
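As a quick plausibility check, the nominal receiver input voltage follows from Ohm's law applied to the two values given above (3.5 mA driver current, 100 Ω termination resistor). The following Python lines are only an illustrative sketch; the function name is chosen here and does not appear in the design.

```python
def termination_voltage(current_a, resistance_ohm):
    """Voltage drop across the termination resistor (Ohm's law: V = I * R)."""
    return current_a * resistance_ohm

# Values from the text: 3.5 mA driver current, 100 ohm termination resistor.
v_diff = termination_voltage(3.5e-3, 100.0)
print(f"{v_diff * 1e3:.0f} mV")
```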
The capacitively coupled (AC coupled) field between both conductors is displayed in Fig. 2.2. It is
responsible for the excellent noise immunity, because electromagnetic disturbances influence both
conductors similarly. Because the receiver only evaluates the difference of the signal, a disturbance
that is present in both conductors poses no problem. More details about the coupling and the line
termination are discussed in [Nat08, chapter 4].
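The effect of evaluating only the difference can be illustrated numerically. The sketch below is a simplified model, not a description of a real receiver circuit: the absolute conductor voltages, the disturbance amplitude and the polarity convention (positive difference means logical 1) are assumptions chosen for illustration.

```python
def receiver_input(v_p, v_n):
    """A differential receiver evaluates only the difference of both conductors."""
    return v_p - v_n

def logical_state(v_diff):
    """Assumed polarity: positive difference -> 1, negative difference -> 0."""
    return 1 if v_diff > 0 else 0

# Illustrative conductor voltages in volts with a difference of 350 mV.
v_p, v_n = 1.375, 1.025
noise = 0.2  # hypothetical disturbance that couples equally into both conductors

clean = receiver_input(v_p, v_n)
disturbed = receiver_input(v_p + noise, v_n + noise)
# The common disturbance cancels out, so the detected state is unchanged.
same_state = logical_state(clean) == logical_state(disturbed)
```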
A sample signal with the corresponding voltage level (±350 mV) on the conductors is shown in Fig. 2.3.
A change of the transmitted logical state changes only the current direction, not its amplitude. The
color of each voltage level corresponds to the conductor wire in Fig. 2.1. Further information about
the electrical specification is given in [IEE96]. The maximum distance for an LVDS communication
line is 10 m, but this distance can only be bridged when low loss cables are used. Moreover, LVDS is
often used to connect two devices within one circuit board, so large distances are not the main focus
of LVDS.
Figure 2.2.: Cross section of a differential wire pair, with its coupled fields
Figure 2.3.: Voltage level of a differential signal
There are other LVDS topologies available that deal with multiple drivers and receivers. This is
known as Multipoint LVDS (M-LVDS) and is standardized in ANSI/TIA/EIA-899. Another LVDS
type, called Bus LVDS (B-LVDS), focuses on multiple receivers, but is not standardized. Since these
Multipoint topologies are not important in the following, further details are omitted here, but can be
found in [Nat08].
2.2. Architecture
A common architecture which is used in combination with LVDS is the so-called Parallel Clock SerDes.
In this architecture multiple LVDS channels are used in parallel, where one channel transmits the clock
signal and the others are used for data. The fact that the clock signal for the data channel sampling
is transmitted in parallel classifies the system as source synchronous. The SerDes in the architecture
name stands for Serializer/Deserializer. Each LVDS channel takes the data of a multiplexer which
serializes a specified number of parallel bits. On the receiver side the serial data stream is parallelized
again. A schematic describing this architecture is available in Fig. 2.4.
In this example, each LVDS channel serializes 8 parallel bits that are transmitted. Depending on the
system the serialization direction can be most significant bit (MSB) or least significant bit (LSB) first.
Here, the MSB is transmitted first. The receiver will use the received clock signal to sample the data
channels. For this reason, it is necessary to generate a clock signal that is as fast as the serial data
on an LVDS channel. Therefore, the clock is multiplied by a factor of 8 at single data rate (SDR) or
by 4 when double data rate (DDR) is used. DDR uses rising and falling edges for sampling, so the
clock speed can be halved compared to SDR. At the receiver side the signal of the data channels is
sampled and parallelized again. Furthermore, the divided clock signal is again synchronous to the
parallel data. The number of LVDS channels can be chosen arbitrarily depending on the problem for
which this data interface should be utilized.
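The relation between the parallel clock and the transmitted clock can be written down in a few lines. The following Python sketch only restates the multiplication factors given above (8 for SDR, 4 for DDR with 8:1 serialization); the function name and the 100 MHz parallel clock are assumptions used for illustration.

```python
def serial_clock_hz(parallel_clock_hz, bits_per_word=8, ddr=False):
    """Clock frequency needed on the serial side of a parallel clock SerDes.

    With SDR one bit is sampled per clock cycle, with DDR two bits
    (rising and falling edge), so the required clock frequency is halved.
    """
    bits_per_cycle = 2 if ddr else 1
    return parallel_clock_hz * bits_per_word / bits_per_cycle

parallel_clock = 100e6  # hypothetical parallel clock of 100 MHz
sdr_clock = serial_clock_hz(parallel_clock)            # factor 8
ddr_clock = serial_clock_hz(parallel_clock, ddr=True)  # factor 4
```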
There are also other architectures available e.g., embedded clock SerDes where the rising clock edge
is inserted periodically into the data stream and is used by the receiver for synchronization. Another
architecture is known as 8b/10b SerDes, where 8 data bits are coded to a 10 bit word and an additional
comma character is used for synchronization. In this architecture no clock is transmitted, the receiver
uses its own clock signal for sampling. Additional architectures and more details about their design
are given in [Nat08].
Figure 2.4.: Parallel clock SerDes with 8:1 serialization (transmitter: parallel clock, clock multiplier (*8), 8:1 MUX serializers and LVDS drivers; receiver: LVDS buffers, 8:1 DEMUX deserializers and clock divider (/8))
3. LVDS Communication
To connect the LUPA-3000 image sensor with the Field Programmable Gate Array (FPGA), an LVDS
receiver design was developed in this work. The image sensor contains a parallel clock SerDes
transmitter, similar to the one introduced in the previous chapter, but with 34 LVDS channels. A complete
FPGA-to-FPGA LVDS communication system was developed, which fulfills the requirements
established by the LUPA-3000 image sensor. The receiver part of the design will later be adapted to
be connected to the image sensor. Furthermore, the system contains a test unit, which verifies the
functionality of the communication line and especially of the receiver component to exclude errors.
This chapter deals with a demonstration of the LVDS communication system in practice. First, the
VHDL implementation of the LVDS transmitter and the receiver is introduced, synthesized and tested
on an FPGA evaluation board. During this process several problems occur that have to be dealt with
to establish a working connection. The implementation of the transmitter and receiver is based on
[Bur06] and the corresponding code examples, but extended and changed to fulfill the needs of the given
problem. For example, the number of LVDS channels is increased to 34 and the sampling process in
the receiver is adapted to compensate for delays among the channels. In the following, existing and
new work are explicitly distinguished.
3.1. System description
A Toplevel design was developed, which contains the LVDS transmitter and receiver component and
some additional modules that are used to compare the data values before and after transmission. The
output ports of the transmitter and the input ports of the receiver are connected inside the testbench
for the simulation or by wires between the FPGA I/Os when testing practically. A block diagram of
the overall system is shown in Fig. 3.1.
Figure 3.1.: Schematic description of the LVDS communication system; the output ports of the transmitter
and the input ports of the receiver are connected inside a testbench or through wires between
the FPGA I/Os (blocks: data generator, FIFO, LVDS transmitter, LVDS receiver, sync converter, FIFO sync and FIFO channels 0 to 5 for clock domain crossing, data comparator; signals: clock, sync, data, training done)
The data generator creates data bytes, one for each channel, that are written to a FIFO and transmitted
through the LVDS channels. The LVDS receiver reads the serial data from the channels and writes
the bytes into the corresponding FIFO. Finally, the data comparator reads all FIFOs simultaneously
and compares the sent data with the received data. Transmission errors are indicated by a signal.
In the beginning, the transmitter sends a training pattern. This is a predefined byte that is known
to the receiver. The receiver calibrates each channel and sets the training done signal when each
LVDS channel has performed the bit and word alignment successfully. During bit alignment the incoming
serial signal is delayed until the sampling clock edge is placed exactly in the middle between two bit
transitions. This has to be done because jitter can decrease the time during which a bit is stable. Moreover,
the word alignment process is responsible for correctly concatenating adjacent bits to one byte on the
parallel side.
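The idea behind word alignment can be illustrated with a small software model. The sketch below is not the alignment logic of the receiver, which is implemented in hardware; it only shows, under stated assumptions, how a known training pattern allows the byte boundary in a serial bit stream to be found. The pattern value 0x2C, the LSB-first bit order and all function names are assumptions made for this example.

```python
TRAINING_PATTERN = 0x2C  # hypothetical training byte, not taken from the design

def words_from_bits(bits, offset):
    """Group a serial bit stream (LSB first) into bytes, starting at 'offset'."""
    words = []
    for start in range(offset, len(bits) - 7, 8):
        word = 0
        for i in range(8):
            word |= bits[start + i] << i
        words.append(word)
    return words

def find_word_offset(bits, pattern):
    """Try all 8 possible byte boundaries and return the one that yields the pattern."""
    for offset in range(8):
        words = words_from_bits(bits, offset)
        if words and all(word == pattern for word in words):
            return offset
    return None

# Serial stream of five training bytes whose first 5 bits were lost,
# so the next byte boundary lies 3 bits into the received stream.
stream = [(TRAINING_PATTERN >> i) & 1 for _ in range(5) for i in range(8)][5:]
offset = find_word_offset(stream, TRAINING_PATTERN)
```

The training pattern must not be identical to any of its own rotations, otherwise several boundaries would match.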
A set training done signal sent by the receiver indicates the end of the calibration phase.
It enables the data generator and the sync converter module. The data generator starts producing
arbitrary bytes (counting from 0 to 255 in a loop) for each data channel and an extra byte for the
sync channel. The generated data and the sync values are written to the LVDS transmitter, the sync
converter and the FIFO. The sync channel is equivalent to a data channel; the only difference is that
the sync channel is interpreted as status information. Every sync value different from 0 indicates valid
data on the data channels. The sync converter activates the write enable signal of the FIFO for each
valid data value.
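The behavior of the sync converter can be summarized in a short software model. This is only a sketch of the rule stated above (a sync value different from 0 marks valid data); the function names and the example stream are hypothetical and not taken from the VHDL design.

```python
def sync_converter(sync_value):
    """Return the FIFO write enable: every sync value different from 0 marks valid data."""
    return sync_value != 0

def fill_fifo(stream):
    """Write only those data words to the FIFO whose sync value is valid."""
    fifo = []
    for sync_value, data_word in stream:
        if sync_converter(sync_value):
            fifo.append(data_word)
    return fifo

# Hypothetical stream of (sync value, data word) pairs; the data bytes count upwards.
stream = [(0, 0), (1, 1), (1, 2), (0, 3), (7, 4)]
fifo = fill_fifo(stream)  # keeps the words 1, 2 and 4
```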
The receiver uses the incoming LVDS clock signal to sample the incoming data signals. When all
channels have been calibrated independently, the training done signal is raised. The whole receiver module is
driven by the incoming LVDS clock. In Fig. 3.1 the part of the design that is connected to this clock
is placed inside the green box, which indicates an independent clock domain. The incoming data from
the receiver is written into the clock domain crossing FIFO of each channel. Moreover, these FIFOs
are used to compensate for the skew among the data channels.
Back in the transmitter’s clock domain, the comparator component reads all FIFOs from the receiver.
When the sync value is valid, the FIFO from the data generator is read and compared with the values
from the receiver. Both data vectors are only compared when the sync signal of the receiver is valid;
otherwise more data is read from the receiver’s FIFOs until sync is valid. The comparator has a 6 bit
output whose bits represent the correct functionality of the data channels. During the initialization
phase these bits are set. When the comparator detects an error, the corresponding bit of the channel
is reset to 0.
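The status output of the comparator can be modeled as follows. The sketch assumes that bit i of the 6 bit output belongs to data channel i and that a cleared bit stays cleared; both points, as well as the function name, are assumptions made for illustration.

```python
NUM_CHANNELS = 6

def compare(sent_words, received_words):
    """Return the 6 bit status; a bit is reset to 0 once its channel shows an error."""
    status = (1 << NUM_CHANNELS) - 1  # all bits are set during initialization
    for sent, received in zip(sent_words, received_words):
        for channel in range(NUM_CHANNELS):
            if sent[channel] != received[channel]:
                status &= ~(1 << channel)  # reset the bit of the faulty channel
    return status

sent = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
received = [[1, 2, 3, 4, 5, 6], [7, 8, 0, 10, 11, 12]]  # error on channel 2
status = compare(sent, received)
```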
3.1.1. FPGA evaluation board
To test and verify the functionality of the VHDL implementation, an FPGA evaluation board (Xilinx
ML506) is used. This board consists of a Virtex5 FPGA (device XC5VSX50T of the Virtex5 family, speed
grade -1, in an FFG1136 package) and further peripheral hardware. It has
several expansion headers that are connected to LVDS-capable differential pin pairs and some other
I/Os of the FPGA. The output ports of the transmitter and the input ports of the receiver are mapped
to these pin connectors. Finally, some wires are used to connect the LVDS inputs and outputs with each
other. Fig. 3.2 shows the evaluation board with the attached wire interconnection. The ML506 has
one expansion header with 16 pairs of differential signal connections to the FPGA I/Os [Xil09b]. For
this reason, the sample design has only 8 LVDS channels, each of which occupies one pair at the
transmitter and one at the receiver: 6 data channels, one sync channel and one
channel for clock transmission. In general, a Virtex FPGA has more differential I/Os, but on this
evaluation board only a limited number of them is available through expansion headers.
Figure 3.2.: Virtex5 FPGA evaluation board ML506 with attached wire connection on the I/O pins
3.2. Transmitter
The transmitter module takes data or the training pattern on the parallel side and performs an 8:1
serialization for each LVDS channel. This would result in a frequency that is 8 times higher on the serial
side than on the parallel side. However, because DDR transmission is used, positive and negative clock
edges are used for sampling, hence the frequency is only 4 times higher. Therefore, the transmitter
has two clock input signals, clock and clockdiv. The clock signal has the speed of the LVDS channels
and is four times faster than clockdiv (parallel side clock). The Toplevel design contains a phase
locked loop (PLL) that generates the clock signals clock, clockdiv and clk200, a 200 MHz reference
clock. This reference clock is needed for the IODELAY elements. The schematic of the transmitter
module is displayed in Fig. 3.3. The schematic is similar to the one in [Bur06]; the only differences
are the number of channels, which is decreased to six, and the additional byte crusher and ODELAY
components for each channel. These additional components are needed to simulate a transmitter
that is not completely synchronous. Their functionality is explained in detail in Section 3.2.2.
3.2.1. Serializer
Figure 3.3.: Block diagram of the LVDS transmitter (per channel: byte crusher, master/slave OSERDES pair, ODELAY and LVDSEXT_25 output buffer driving DATA_TX_P/DATA_TX_N; the clock is forwarded through an ODDR to CLOCK_TX_P/CLOCK_TX_N; inputs: data [47:0], sync, training pattern, training_done, CLOCK, CLOCKDIV)
The serialization is done by so-called OSERDES components. These components are available in
the input/output blocks (IOB) of Xilinx FPGAs to handle common I/O operations and to save
programmable logic blocks. A single OSERDES primitive can only perform a 6:1 serialization, but it is
possible to concatenate two OSERDES blocks in a master/slave fashion to serialize up to 10 bits. Fig.
3.4 clarifies the structure of such bundled OSERDES components, using the example of an 8:1 serialization.
There are shiftin and shiftout ports available for the concatenation. It is important to know that the
serialized data stream starts with the LSB, i.e., input D1.
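The bit order on the serial side can be made explicit with a small software model. The sketch below only illustrates what "LSB first" means for one byte; it does not model the OSERDES primitive itself, and the function name is chosen for this example.

```python
def serialize_lsb_first(byte):
    """Return the 8 bits of 'byte' in transmission order, starting with the LSB."""
    return [(byte >> i) & 1 for i in range(8)]

# 0xB1 = 0b10110001: the LSB (1) is sent first, the MSB (1) last.
bits = serialize_lsb_first(0xB1)
print(bits)
```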
3.2.2. Asynchronous transmitter
In source synchronous systems the receiver assumes that data and clock arrive synchronously or only
with a minimal skew. Such an idealized system is described in [Bur06]. To test a receiver that should
be able to handle larger skews between the clock and data channels or among the data channels, a
transmitter is needed that can generate such asynchronicities. Therefore, the given basic design from
[Bur06] is extended.
Figure 3.4.: 8:1 serializer, consisting of master and slave OSERDES component (parallel side: Bit 1 to Bit 8 on the D inputs of master and slave, which are connected through the SHIFTIN/SHIFTOUT ports; serial side: output Q, LSB first)
This extended transmitter can generate two different types of delay. One type of delay is generated
by bit insertion, which is done by the byte crusher component. The other one is generated by so-called
output delay primitives, which are available in the I/O blocks of newer Virtex FPGAs.
Byte crusher
The byte crusher introduces a delay to the signal by simply adding a predefined number of bits to
the data stream. This is performed by some logic that works like a shift register. The minimal delay
is given by the duration of one bit which depends on the frequency of the LVDS clock. For DDR
transmission this can be calculated according to the following equation:
$$T = \frac{1}{2 \cdot \text{frequency}}$$
where T is the duration of one bit and frequency the speed of the LVDS clock. Because the OSERDES
primitive produces a serial data stream which starts with the LSB, the byte crusher has to add the
delay bits in front of the LSB of the first data byte. This process is illustrated in Fig. 3.5 for a delay
of 10 bits. For a serial data stream where the MSB is sent first, the delay bits have to be inserted in
front of the MSB.
The byte crusher has an internal buffer whose width is the number of delay bits plus 8. In this example
the buffer width is 18 bits. In each clock cycle the buffer content is shifted by 8 to the right and the
input byte is written to the upper 8 bits of the buffer. The lower 8 bits are always assigned to the
output of the byte crusher. With this mechanism
the byte crusher can introduce delays of arbitrary length to the signal.
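The buffer mechanism described above can be illustrated with a small behavioural model in Python. This is only a sketch and not the VHDL implementation: the names `byte_crusher` and `to_bits`, the assumption that the inserted delay bits are zero, and the exact order of shifting, writing and reading within one clock cycle are chosen for illustration.

```python
def to_bits(data):
    """Serialize bytes LSB first, as the OSERDES does."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def byte_crusher(data, delay_bits):
    """Delay a byte stream by delay_bits serial bits (behavioural model)."""
    buffer = 0  # conceptually delay_bits + 8 bits wide, assumed to start at zero
    output = []
    for byte in data:
        # Shift the buffer right by 8 and write the input byte to the upper 8 bits.
        buffer = (buffer >> 8) | (byte << delay_bits)
        # The lower 8 bits of the buffer always form the output byte.
        output.append(buffer & 0xFF)
    return output
```

When the returned bytes are serialized LSB first, the result should be the original bit stream preceded by `delay_bits` zero bits, which corresponds to the behaviour shown in Fig. 3.5 for a delay of 10 bits.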
Figure 3.5.: Example operation of the byte crusher component for a delay of 10 bits.
Output delay
The output delay, or in Xilinx terminology IODELAY, is a primitive in the I/O block which can be
used in combination with a SERDES component. It can delay a signal in steps of 75 ps, so-called
taps. Up to 63 taps are available in an IODELAY component, so the maximum achievable delay is
63 · 75 ps = 4.725 ns. The tap setting can be fixed or variable. A fixed tap setting is hard coded
and cannot be changed during operation, whereas a variable tap setting can be initialized
arbitrarily and changed in single tap steps with increment and decrement signals.
At IODELAY instantiation the developer has to decide whether an input or an output signal should be
delayed. In the following, the direction is indicated by the term IDELAY for input or ODELAY for
output signals.
For the transmitter the tap settings of the ODELAY primitives are fixed. For example, at an LVDS
clock speed of 400 MHz one bit has a duration of 1.25 ns, so approximately 17 taps
(1250 ps / 75 ps ≈ 16.7) are needed to perform a shift of one bit. With the combination of a byte
crusher and an ODELAY, very fine steps can be attained even for a long delay.
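The tap arithmetic used in this section can be reproduced with a few lines of Python. The sketch below only restates the numbers given in the text (75 ps per tap, 63 taps, DDR transmission); the constant and function names are chosen for illustration.

```python
TAP_PS = 75     # delay of one IODELAY tap in picoseconds, as used in the text
MAX_TAPS = 63   # highest tap setting of an IODELAY component


def bit_duration_ps(clock_mhz):
    """Duration of one bit for DDR transmission: T = 1 / (2 * frequency)."""
    return 1e6 / (2 * clock_mhz)


def taps_per_bit(clock_mhz):
    """Number of taps that comes closest to a shift of one bit."""
    return round(bit_duration_ps(clock_mhz) / TAP_PS)
```

For a 400 MHz LVDS clock, `bit_duration_ps(400)` gives 1250 ps and `taps_per_bit(400)` gives 17, while `MAX_TAPS * TAP_PS` gives the maximum delay of 4725 ps.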
The clock signal is routed through an ODDR and an ODELAY element to the LVDS driver. The
ODDR module alternately outputs the values of its inputs D1 (1) and D2 (0) on the rising
and falling clock edges. When this component is omitted, such that the clock signal is routed directly
through the ODELAY to the LVDS driver, simulation and synthesis work fine. However,
practical tests have shown that the LVDS receiver on the FPGA then does not receive any clock signal.
For this reason an ODDR component is required.
3.3. Receiver
The receiver performs a 1:8 deserialization of the incoming signal. Interesting here are the algorithm
for adapting the sampling position (bit alignment), the word alignment process and finally the skew
compensation among the channels.
As illustrated in the block diagram of the receiver in Fig. 3.6, the incoming differential signals pass
the LVDS input buffers, which transform them to single ended signals. After passing the input delay
19
Chapter 3. LVDS Communication
primitives, the ISERDES components use the clock signal for sampling. This clock signal is obtained
from the LVDS channel, which is common for source synchronous systems. During calibration after the
system reset the resource sharing control module selects one channel after another to perform the data
alignment. This alignment process is controlled by the bit align machine, which adapts the channel
delay introduced by the IDELAY components and the word alignment performed in the ISERDES.
The control outputs of the bit align machine (bitslip, increment/decrement) are demultiplexed to the
ISERDES and IDELAY elements of each channel; this is only implied in the illustration.
Finally, after successful calibration, the byte parser starts to work. It searches for the first byte that
is equal to a user-defined pattern and raises a data valid flag, which is used to enable the write signal
for the FIFO. This action automatically compensates skews among the data channels that exceed one
byte.
Apart from the algorithm used for the bit and word alignment, the receiver component was entirely
reworked. The different number of LVDS channels makes it necessary to adapt the resource sharing
control block. One of the most significant changes made in the receiver component affects the internal
distribution of the clock signal, for which the original implementation was completely replaced.
In [Bur06] the incoming clock signal is distributed by a regional clock buffer (BUFR). This, however,
is only possible when all ISERDES components are located within one bank. In Xilinx FPGAs the
I/O blocks, which contain the IODELAY and SERDES primitives, are organized in banks. A common
bank contains 40 I/O blocks, some special banks have only 20 I/O blocks. A clock signal that is driven
by a BUFR is only available within one bank [Xil09d, chapter 1]. In contrast, a global clock signal is
available in all banks of the FPGA, but depending on the device the number of global clock lines is
limited and the distribution delay is much higher than for regional clocks.
Because this design should meet the requirements for the interconnection with the image sensor, it is
assumed that the LVDS channel inputs are spread over multiple banks. Therefore, it is necessary to
use a global clock buffer (BUFG) to distribute the clock signal.
The differential input clock is transformed to a single ended clock by the LVDS input buffer.
Afterwards, the signal is fed into a BUFG that drives a phase locked loop (PLL) to generate the clockdiv
signal, which has a 4 times larger period than the clock input. Both PLL outputs, clock (feedback)
and clockdiv (1/4), are fed into BUFG components to make them available in the whole receiver design.
Virtex5 FPGAs have phase locked loop (PLL) and digital clock manager (DCM) primitives available
to perform up and down sampling of clock signals. For this problem a PLL was chosen, because it
can handle frequencies up to 600 MHz and it offers an easy mechanism for clock down sampling by an
integer divider. In contrast a DCM can handle only frequencies up to 450 MHz which is far less and
not enough to reach the maximum achievable data rate. The fact that the DCM does not need time
to stabilize the output signals can be neglected, because this does not cause any problem. When PLLs
are used, the design has to be kept in reset until the PLL has locked (stabilized) its output signals.
Figure 3.6.: Block diagram of the LVDS receiver (blocks shown per channel: LVDS_25 input buffer,
IDELAY, master/slave ISERDES, byte parser; shared: bit align machine, resource sharing control,
IDELAYCTRL with 200 MHz reference clock; clock path: LVDS_25 input buffer, IDELAY, BUFG, PLL, BUFG)
3.3.1. Data and clock timing
To understand the bit alignment process with the adaption of the sampling position, it is necessary
to have a closer look at the post-place & route timing report. This report is generated by the Xilinx
ISE software² and lists how much time the clock and data signals spend in each component. By
merging the information of the data and clock path timing, it is possible to determine a timing window
in which the data signal will be sampled. The information of the timing report is visualized in Fig. 3.7.
For now it is assumed that the transmitter works as an ideal source synchronous component, so neither
the byte crusher nor the ODELAY introduces any delay to the signals.
Figure 3.7.: Data and clock path timing (data path: IBUFDS, IODELAY, ISERDES sampling FFs;
clock path: IBUFGDS, IODELAY, BUFG, PLL, BUFG, ISERDES clk input)
The diagram shows the path of the data and clock signal from the input pad to the ISERDES primitive
which performs the sampling. The paths are divided into segments which introduce a delay to the
signal. The timing segments of the data path are described in Table 3.1 and those of the clock path
in Table 3.2. The concrete minimum and maximum values of the individual timing parameters for this
design are shown in Tables 3.3 and 3.4. These numbers correspond
to the Virtex5 XC5VSX50T with -1 speed grade. The raw timing information is subject to minor
changes in subsequent revisions of the ISE tool³.
With the timing information given, it is possible to calculate the setup and hold time. These timing
parameters refer to the sampling process of a flip flop. The setup time is the duration when the signal
is stable before the sampling edge arrives. In contrast the hold time measures how long the signal is
stable after the sampling edge occurred. So the sum of the setup and hold time describes the timing
window when the signal is stable.
² The report can be found in the ISE software under Tools, Timing Analyzer, Post-Place & Route...
³ The timing information was generated with ISE Design suite, version 11.3 L.57
Timing parameter     Description
TIOPI                Delay of the IOB input buffer
TIODDO_IDATAIN       Delay from the I pin of the IOB pad to the D input of the ISERDES
                     (propagation delay through IODELAY)
TISCKD_DDLY_DDR      Delay from the D input of the ISERDES to the sampling registers in the ISERDES
                     (setup and hold times of the ISERDES with respect to CLK in DDR mode)
Table 3.1.: Data path timing definitions
Timing parameter     Description
TIOPI                Delay of the IOB input buffer
TIODDO_IDATAIN       Delay from the I pin of the IOB pad to the D input of the ISERDES
                     (propagation delay through IODELAY)
TNETx                Distribution delay of the clock net
TBGCKO_O             Delay from BUFG input to output
TPLLCKO_CLKFBOUT     Delay introduced by the PLL component
Table 3.2.: Clock path timing definitions
Timing parameter     Minimum data path delay    Maximum data path delay
TIOPI                 1.120 ns                   1.168 ns
TIODDO_IDATAIN        0.917 ns                   0.527 ns
TISCKD_DDLY_DDR      -0.089 ns                   0.352 ns
Total                 1.948 ns                   2.047 ns
Table 3.3.: Data path delay overview for Virtex5 XC5VSX50T with -1 speed grade
Timing parameter     Minimum clock path delay   Maximum clock path delay
TIOPI                 1.082 ns                   1.123 ns
TIODDO_IDATAIN        0.917 ns                   0.527 ns
TNET1                 1.162 ns                   1.263 ns
TBGCKO_O              0.230 ns                   0.250 ns
TNET2                 0.095 ns                   1.578 ns
TPLLCKO_CLKFBOUT     -1.846 ns                  -3.461 ns
TNET3                 1.523 ns                   1.655 ns
TBGCKO_O              0.230 ns                   0.250 ns
TNET4                 0.113 ns                   1.889 ns
Total                 3.506 ns                   5.074 ns
Table 3.4.: Clock path delay overview for Virtex5 XC5VSX50T with -1 speed grade
With the given timing information for the clock and data path, the setup time can be obtained.
Setup Time = Max Data Delay − Min Clock Delay (3.1)
= 2.047 ns − 3.506 ns = −1.459 ns
A negative setup time indicates that the clock signal reaches the pin of the input flip-flop after the
data signal. The following equation yields the hold time:
Hold Time = Max Clock Delay − Min Data Delay (3.2)
= 5.074 ns − 1.948 ns = 3.126 ns
The timing window, in which the sampling edge occurs, is the sum of the setup and hold
time.
Timing Window = Setup Time + Hold Time (3.3)
= −1.459 ns + 3.126 ns = 1.667 ns
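The calculation of equations (3.1) to (3.3) can be retraced with the following Python sketch. The delay values are copied from Tables 3.3 and 3.4 and converted to picoseconds to avoid rounding effects; the function and variable names are chosen for illustration only.

```python
# (parameter, minimum delay, maximum delay) in picoseconds, from Table 3.3
DATA_PATH = [
    ("TIOPI", 1120, 1168),
    ("TIODDO_IDATAIN", 917, 527),
    ("TISCKD_DDLY_DDR", -89, 352),
]

# (parameter, minimum delay, maximum delay) in picoseconds, from Table 3.4
CLOCK_PATH = [
    ("TIOPI", 1082, 1123),
    ("TIODDO_IDATAIN", 917, 527),
    ("TNET1", 1162, 1263),
    ("TBGCKO_O", 230, 250),
    ("TNET2", 95, 1578),
    ("TPLLCKO_CLKFBOUT", -1846, -3461),
    ("TNET3", 1523, 1655),
    ("TBGCKO_O", 230, 250),
    ("TNET4", 113, 1889),
]


def path_totals(path):
    """Return the summed (minimum, maximum) delay of a path."""
    return sum(lo for _, lo, _ in path), sum(hi for _, _, hi in path)


def timing_window(data_path, clock_path):
    """Return (setup, hold, window) according to equations (3.1) to (3.3)."""
    data_min, data_max = path_totals(data_path)
    clock_min, clock_max = path_totals(clock_path)
    setup = data_max - clock_min   # (3.1)
    hold = clock_max - data_min    # (3.2)
    return setup, hold, setup + hold  # (3.3)
```

Calling `timing_window(DATA_PATH, CLOCK_PATH)` reproduces the values given above: a setup time of -1459 ps, a hold time of 3126 ps and a timing window of 1667 ps.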
The calculations for the timing window are independent of the clock speed. Further details about
the timing analysis of source synchronous systems can be found in [KGA03]. The actual situation for
LVDS clock speeds of 200 MHz, 400 MHz and 600 MHz in DDR mode is displayed in Fig. 3.8. The
waveform indicates the duration of a single bit, depending on the clock speed. The timing window for
the sampling clock edge is large because a global clock is used and the predictable timing windows of
the clock path components especially the PLL and the clock net 4 (TNET4) are larger than for regional
clocks.
Figure 3.8.: Timing window for sampling clock edge (setup time 1.46 ns, hold time 3.13 ns; bit
durations in DDR mode: T = 2.5 ns at 200 MHz / 400 Mb/s, T = 1.25 ns at 400 MHz / 800 Mb/s,
T = 0.83 ns at 600 MHz / 1200 Mb/s)
This sample timing window depicts the problem that it cannot be assumed that the sampling clock
edge occurs in the middle of the data eye. The data eye pattern, also known as an eye diagram, is a
typical oscilloscope waveform in which a digital data signal is repetitively sampled. An eye represents
the duration of a bit. For transitions between two bits the signal can stay at 1 or 0 or can change its
value. A sample eye diagram is shown in Fig. 3.9. The ideal sampling position is in the middle of a
data eye.
For higher speeds or bad signal quality the jitter increases and the transition edges flatten. Therefore
the data eyes become smaller. To sample the signal at a position where the current bit is stable, the
sampling edge has to be moved to the middle between two signal transitions, i.e. to the center of the
data eye. This process is called bit alignment.
Figure 3.9.: Sample data eye diagram, the ideal sampling position is marked with a dashed line
3.3.2. Data alignment
Bit alignment
Because there is no possibility to move the sampling clock edge and to adapt each channel separately,
the signals of the channels are delayed to be positioned ideally for sampling. In order to delay the
signal, the IDELAY primitives in the receiver are used. This process is controlled by the bit align
machine and can basically be divided into five steps. The following textual descriptions are illustrated
in Fig. 3.10.
1. Initial sampling at an arbitrary position
The position may be somewhere in the data stream, in a data eye or in a transition.
2. Find end of first transition
When the initial sampling position is stable, the signal is delayed until the current and last data
sample (ISERDES output) are different. That is when the transition is found.
3. Walk through the transition and find the beginning
The signal is delayed again until the data samples are again stable. The sampling position is on
the right hand side of the data eye.
4. Walk through the open data eye, count the taps and find the next transition
Now the signal is delayed again in order to move the sampling position to the left hand side of
the data eye. During this process the taps are counted.
5. Go back to the middle of the data eye
When the left hand side of the data eye is found, half of the delay taps are decremented in order
to move the sampling position to the middle of the data eye.
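The five steps above can be sketched as a simplified Python model. It is an illustration under several assumptions and not the state machine from [Bur06]: the comparison of consecutive ISERDES samples is replaced by an idealized `stable` test, the transition is modelled as a fixed region of 200 ps at the beginning of each bit, and the names `bit_align`, `offset_ps` and `transition_ps` are hypothetical.

```python
def bit_align(offset_ps, bit_ps=1250, transition_ps=200, tap_ps=75, max_taps=63):
    """Return the tap setting that places the sampling point near the eye centre."""

    def stable(tap):
        # Delaying the data moves the sampling point to the left within the bit.
        phase = (offset_ps - tap * tap_ps) % bit_ps
        return phase >= transition_ps  # outside the assumed transition region

    def increment(tap):
        if tap >= max_taps:
            raise OverflowError("more than %d taps required" % max_taps)
        return tap + 1

    tap = 0                      # step 1: initial sampling at an arbitrary position
    while stable(tap):           # step 2: delay until the transition is reached
        tap = increment(tap)
    while not stable(tap):       # step 3: walk through the transition
        tap = increment(tap)
    eye_taps = 0
    while stable(tap):           # step 4: walk through the open eye, count the taps
        tap = increment(tap)
        eye_taps += 1
    return tap - eye_taps // 2   # step 5: go back to the middle of the data eye
```

With the default values, which correspond to a 400 MHz LVDS clock, the returned tap setting places the sampling point within about one tap of the eye centre. For slow clocks the model raises an `OverflowError` when more than 63 taps would be required.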
The signal of a channel is delayed by an IDELAY component. This works exactly like the ODELAY
primitive, with the only difference being the direction of the signal, cf. Chapter 3.2.2.
Figure 3.10.: Five steps of the bit alignment process
The bit sampling is performed by the ISERDES component. Therefore, one sampling process refers
to sampling 8 consecutive bits and generating one byte. The process described above is simplified and
only deals with single bits. To understand the state machine which executes the given algorithm,
it is important to keep that in mind. A precise state diagram that exactly matches the VHDL
implementation of the state machine is available in [Bur06, page 17]. When the bit alignment is
complete, all bits are sampled in the middle of the data eye, so the first alignment step is completed.
The next step is the word alignment, which concatenates the correct adjacent bits of the data stream
to a byte.
Deserializer and word alignment
The deserializer is the main unit inside the receiver. It is available as a primitive in the I/O block,
similar to the OSERDES component in the transmitter. A single ISERDES component can only deserialize 6
bits, but two ISERDES components can be bundled to deserialize up to 10 bits. A block diagram
with a master/slave arrangement of two ISERDES components is shown in Fig. 3.11.
Figure 3.11.: Deserializer, consisting of master and slave ISERDES component
An ISERDES component parallelizes the incoming bit stream, where the first incoming bit is treated
as LSB and the last one as MSB. The ISERDES primitive has the additional functionality of shifting the
sampling window on the serial data stream. This is necessary because it is not possible to predict
where the sampling starts in the data stream. Hence, the byte at the output of the ISERDES can
contain bits that were originally located in two adjacent transmitted bytes. To shift the sampling
window, the bitslip signal has to be asserted high during a rising edge of clockdiv. In DDR mode each
bitslip pulse alternately results in a shift of 3 bits to the right or 1 bit to the left. After 8 bitslip
cycles the sampling window moves back to the original position. Further details about the ISERDES
and the bitslip operation are given in [Xil09d, Chapter 8].
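The idea of the word alignment, i.e. searching for the window position at which the training pattern appears at the deserializer output, can be illustrated with the following Python sketch. It does not reproduce the actual bitslip sequence of the ISERDES but simply tries the eight window positions in ascending order. The function names and the training pattern 0x2C in the usage note are assumptions made for this example.

```python
def to_byte(bits):
    """Assemble 8 serial bits into a byte; the first received bit is the LSB."""
    return sum(bit << i for i, bit in enumerate(bits))


def find_word_offset(serial_bits, training_pattern):
    """Return the window offset (0..7) at which the training pattern appears."""
    for offset in range(8):
        window = serial_bits[offset:offset + 8]
        if len(window) == 8 and to_byte(window) == training_pattern:
            return offset
    return None  # pattern not found at any window position
```

If, for example, a stream of repeated 0x2C bytes is received with its first 3 bits missing, the search returns an offset of 5, i.e. the start of the next complete byte. The training pattern has to be chosen such that none of its bit rotations equals the pattern itself, otherwise the result is ambiguous.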
Byte offset compensation
The given algorithm compensates only the fine grained delays that are smaller than the duration of
8 bits. For larger delays the algorithm works fine as well, but then there is a byte offset among the
channels at the receiver’s output. To compensate this skew, the byte parser component in combination
with a FIFO for each data channel balances the data signals when larger delays occur. The byte parser
is activated when the training is finished and waits for the first byte that is different from the training
pattern. With this mechanism the skew among the channels is compensated.
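The effect of the byte parser can be illustrated with a short Python model. It is a sketch under the assumption that each channel delivers a list of already word-aligned bytes and that the first payload byte differs from the training pattern; the names `byte_parser` and `align_channels` are hypothetical.

```python
def byte_parser(channel_bytes, training_pattern):
    """Drop all bytes before the first byte that differs from the training pattern."""
    for index, byte in enumerate(channel_bytes):
        if byte != training_pattern:
            return channel_bytes[index:]  # these bytes would be written to the FIFO
    return []


def align_channels(channels, training_pattern):
    """Return the FIFO contents of all channels, truncated to a common length."""
    fifos = [byte_parser(channel, training_pattern) for channel in channels]
    depth = min(len(fifo) for fifo in fifos)
    return [fifo[:depth] for fifo in fifos]
```

Two channels that carry the same payload but a different number of leading training bytes yield identical FIFO contents, so reading all FIFOs simultaneously removes the byte offset among the channels.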
3.3.3. Compensation capabilities
The combination of bit and word alignment enables the receiver to compensate arbitrary delays that
are smaller than the duration of 8 serial bits. This algorithm was introduced in [Bur06]. The bit
alignment adapts the sampling position, however in the worst case the initial sampling position is
located at the right end of the data eye. Then the bit alignment algorithm has to delay the signal by
nearly two full data eyes, as described before in Chapter 3.3.2. Because there are only 63 delay taps
available, the LVDS speed cannot be arbitrarily slow. With 63 taps the maximum delay that can be
achieved is
63 · 75 ps = 4725 ps.
This maximum delay limits the maximum duration of two bits, which results in a minimum frequency
for the LVDS clock. In the worst case the duration of two bits is equal to 63 taps or 4725 ps. Hence,
the maximum duration of one bit is 2362.5 ps. This results in a minimum clock speed for DDR
transmission of
$$\frac{1}{2 \cdot 2362.5 \cdot 10^{-12}\,\mathrm{s}} = 211.64\,\mathrm{MHz}$$
This minimum speed causes a problem, because the LUPA-3000 image sensor is specified to work at
a speed of 206 MHz. For this reason, the receiver component needs to be extended to work at lower
frequencies.
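The minimum clock speed derived above can be checked with a small Python function. The sketch only restates the calculation from the text; the function name and its parameters are chosen for illustration.

```python
def min_ddr_clock_mhz(usable_taps, tap_ps, bits_to_cover):
    """Minimum LVDS clock for which bits_to_cover bit durations fit into the taps."""
    longest_bit_ps = usable_taps * tap_ps / bits_to_cover
    # DDR transmission: frequency = 1 / (2 * T); with T in ps the result is in MHz.
    return 1e6 / (2 * longest_bit_ps)
```

With 63 taps of 75 ps and two bit durations to cover, `min_ddr_clock_mhz(63, 75, 2)` evaluates to approximately 211.64 MHz, which is above the 206 MHz specified for the LUPA-3000.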
Sampling position adaption for low speed signals
During the calibration it is possible to determine neither whether the transmission speed is high
enough nor whether 63 taps are enough for the bit alignment. The given implementation of the receiver
does not detect a tap overflow in the IDELAY primitive. When the IDELAY component uses 63 taps
and another increment pulse is detected, the tap counter is set to 0. Such a tap overflow or a tap
setting of 63 indicates that the clock speed is too slow and the sampling clock edge should be delayed.
To avoid the overflow it is necessary to trace the current tap setting and trigger the sampling edge
adaption when the tap counter increases to 63. In the implementation in [Bur06] a tap counter is used
for each channel, but only for demonstration purposes. The IDELAY primitive itself does not provide
a tap counter; it only has an enable and an increment/decrement input. The tap counter
recognizes these signals and adjusts the counter value.
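Such a shadow counter can be modelled in Python as follows. The sketch assumes that the counter wraps around in both directions (the text only states the wrap from 63 to 0 on increment) and uses the hypothetical names `TapCounter`, `clock` and `limit_reached`.

```python
class TapCounter:
    """Shadow counter that mirrors the tap setting of one IDELAY primitive."""

    MAX_TAP = 63

    def __init__(self):
        self.value = 0
        self.limit_reached = False  # set as soon as the counter increases to 63

    def clock(self, enable, increment):
        """Evaluate the enable and increment/decrement inputs for one clock cycle."""
        if not enable:
            return
        if increment:
            self.value = (self.value + 1) % (self.MAX_TAP + 1)
        else:
            # Wrap-around on decrement is an assumption of this model.
            self.value = (self.value - 1) % (self.MAX_TAP + 1)
        if increment and self.value == self.MAX_TAP:
            self.limit_reached = True
```

In the extended receiver, a flag of this kind is the condition that would start the clock edge adaption before a further increment pulse causes the overflow to 0.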
A possibility to avoid the tap overflows is to delay the clock signal using the IDELAY primitive on the
clock channel. In the extended implementation, developed in this work, the counters for each channel
are used to trigger the clock edge adaption. This is done by means of a state machine inside the
receiver that observes the current tap setting and starts the clock edge adaption as soon as a counter
increases to 63. The state machine controls the reset for the resource sharing control and delays the
clock signal. A phase shift of the incoming clock signal causes an unlock of the PLL which triggers a
reset of all other blocks inside the receiver. This internal reset of the receiver starts a new calibration
for all channels. The clock edge adaption is started at most once during a calibration process. A
situation where the sampling clock edge has to be adapted is shown in Fig. 3.12.
The original sampling edge occurs at the end of a data eye and the transmission speed is so slow that
more than 63 taps would be needed to detect two bit transitions. When this happens, the clock signal
is delayed by a specific number of taps. This delay should have approximately the length of 1/4 of a
clock period, which is equal to a phase shift of 90°, because the new initial sampling position should
be close to the previous bit transition.
Figure 3.12.: Clock edge adaption for low speed signals (original sampling position: more than 63
taps needed; delayed sampling position after the clock delay: fewer than 63 taps needed)
When the worst case is assumed, the original sampling position is located directly in front of the
transition. Hence, with a clock delay of 90° two bit transitions of the data signal can be detected
by using a delay of 1.5 bit durations. There are 62 taps available that can be used without running
into a new overflow. The maximum duration of one bit is then
$$\frac{62 \cdot 75\,\mathrm{ps}}{1.5} = 3100\,\mathrm{ps},$$
which results in a new minimum clock speed for DDR transmission of
$$\frac{1}{2 \cdot 3100 \cdot 10^{-12}\,\mathrm{s}} = 161.3\,\mathrm{MHz}.$$
Interconnections which run at least at this speed are guaranteed to be calibrated correctly, but slower
speeds may also be possible, because the transitions have a finite duration, which increases the
probability of detection with fewer taps. With these adaptions there should not be any problem for
slower interconnections to calibrate the LVDS channels correctly. This clock edge adaption
3.4. Design synthesis
The complete design introduced in Chapter 3.1 and described in the previous chapters was synthesized
using the Xilinx XST tool. During this process all unnecessary signals and modules are removed due
to design optimization. This could lead to a problem, because signals that should be observed may be
removed in the optimization process.
The following example illustrates this. The IODELAY primitives have no output signal which indicates
the current tap setting of the delay element. Therefore, the receiver contains an extra counter that
is sensitive to the increment and decrement signals that control the tap settings. Before the clock
delay extension was introduced, these counters were removed from the design because they were not
connected to any output. The following method can always be applied when a specific signal is needed
for debugging in the synthesized design.
The synthesis tool can be forced to keep signals that would be removed during optimization using the
following declaration in the VHDL code:
attribute KEEP of signalName : signal is "true";
where KEEP is declared as attribute KEEP : string; and signalName is the name of the signal that
should not be trimmed during synthesis.
3.4.1. Clock generator
The Toplevel design which was synthesized has a single ended clock input which expects a 100 MHz
clock. The ML506 board has an Integrated Device Technology (IDT) EEPROM Programmable Clock
Generator which generates a 100 MHz single-ended clock amongst others [Xil09b, Page 19]. This clock
signal is used to drive the overall design, further details are presented in Appendix B.
The Toplevel module contains a PLL which performs an upsampling of the incoming clock signal
to 600 MHz and 150 MHz or 400 MHz and 100 MHz for the clock and clockdiv signals. These clock
signals finally drive all components except the receiver. Fig. 3.1 clarifies the different clock domains,
one for the transmitter side (including data generator and comparator) and the other one for the
receiver side. In addition, the PLL generates a 200 MHz clock which is needed for the IDELAYCTRL
component. This module is connected to nothing else than the reference clock, but it is necessary to
instantiate it for every design that includes an IODELAY primitive. Additional information about
this IDELAYCTRL component can be found in [Xil09d, Chapter 7].
3.4.2. Pin mapping
The inputs and outputs of the Toplevel module are physically mapped to the FPGA pins. This
mapping is specified in a *.ucf file, in the VHDL project directory. The clock input is mapped to a
100 MHz input of the clock generator and the reset is connected to the board reset which is active-low
(pin J14). Internally the reset is handled active-high. For this reason the reset signal is inverted inside
the Toplevel design.
The differential LVDS input and output ports are mapped to the expansion headers. The expansion
header pins are connected with wires to establish a connection between the LVDS transmitter and the
receiver. The input and output pins are distributed over two banks in different regions of the FPGA,
hence it is necessary to use a global clock buffer (BUFG) for the LVDS clock input at the receiver
side. Due to this, the clock input pins have to be chosen carefully because only a few FPGA I/Os
can handle clock signals that have to be distributed globally. The details about these pin mapping
exceptions and the complete mapping are discussed in Appendix B.
3.5. The ChipScope logic analyzer
To verify the functionality of the design under different conditions, it is necessary to visualize some
internal signals like the delay tap settings and the output of the comparator unit. Therefore a special
kind of logic analyzer is used, which is known as ChipScope.
The ChipScope software provides a possibility to trace internal signals of hardware designs. It needs a
cable connection between the PC and the JTAG⁴ port of the FPGA, through which so-called logic
analyzer cores can be triggered to record a specific number of data samples. For this purpose, one or
more Integrated Logic Analyzer cores (ILAs) can be instantiated in the VHDL design. Such an ILA core
has a clock input, at least one trigger signal and an arbitrary number of data inputs. The clock is
used to sample the specified signals when the trigger condition has fired. Besides, the number of data
samples that should be recorded can be specified, but it is limited by the capacity of the free block
RAMs available in the FPGA.
Other cores exist that provide more specific features, but they are not needed to analyze this design,
cf. [Xil09a] for further details. To read the recorded data with the ChipScope software, it is necessary
to instantiate an Integrated Controller core (ICON). This core handles the JTAG connection with the
PC, which is used for data transmission.
⁴ Joint Test Action Group is a standardized interface for debugging and programming embedded hardware
It is possible to instantiate the cores directly inside the VHDL code like any other VHDL module and
connect them to the desired clock, trigger and data signals. However, there is an easier solution that
enables the core insertion after the synthesis and before the translation step: ChipScope provides a
Core Inserter tool that takes the results of the XST synthesis and provides a comfortable interface to
select arbitrary signals of the design and to trace them with an ILA.
Finally, the ChipScope analyzer software is used to arm the trigger and display the waveform of the
recorded signals. The PC on which the software is running is connected to the FPGA’s JTAG port via a
USB programming cable. When the connection is established, the trigger condition has to be chosen.
All kinds of logical expressions on the trigger signals can be used to generate a trigger condition which
is used to arm the trigger. When the trigger is armed, the recording of the data signals starts as
soon as the trigger condition is fulfilled. The values of the data signals are immediately shown in the
waveform.
There are a few things that should be known when working with the ChipScope software:
• Recording clock signals is impossible and always results in an unrouteable signal error in the Place
& Route phase.
• Each ILA instance contains a latch. These latches are summarized in the Map report of the
design summary.
• Each time the inserted cores are changed, the computational effort for the translate process
increases.
• The ChipScope JTAG connection to the FPGA board is normally blocked by the Windows
firewall.
In the ChipScope user guide [Xil09a] all necessary information for getting started with the software is
provided.
3.6. Test & verification
One purpose of the LVDS communication system was to find the speed limit of an LVDS interconnection
and to verify the algorithm of the data alignment. Furthermore, it was intended to test the skew
compensation among the channels and to evaluate differences in the signal propagation time caused
by length variation of the connecting wire.
The cable connector on the evaluation board, shown in Fig. 3.2, was taken from an old IDE hard disc
drive cable. The cables are crimped into the plug connector, which is the error-prone weak point
of the connection. The connection cables have a length of 20 cm or 50 cm.
To verify the function of the design, it is necessary to observe the ok output vector of the comparator
module, which indicates the error free transmission of all data channels. Furthermore, the tap counter
setting of each channel, the receiver output and the inputs of the comparator should be recorded, which
is done by 3 different ILAs: one ILA for the tap counter settings, another one for the ISERDES output
values, the receiver FIFO output values and the ok signals, and a third one for both comparator
inputs and the ok signals.
3.6.1. Test configuration
For the test all channels except channel 1 had a 20 cm cable connection, while channel 1 had a 50 cm wire.
Moreover, the transmitter introduces different delays to the channels. The left hand side of Table 3.5
contains the delay settings of the byte crusher and ODELAY components that were used in the tests.
The combination of both delays results in a total delay that can be calculated as a function of the
frequency, which was chosen to be 400 MHz (LVDS clock). Channel 0 does not have any artificial delay
introduced, therefore it can be regarded as reference. The other channels have arbitrary combinations
of both delays.
channel  byte crusher  ODELAY  total delay  byte offset  bit offset  rest delay  rest delay
         (bits)        (taps)  (ns)         (bytes)      (bits)      (ps)        (taps)
0        0             0       0            0            0           0           0
1        4             15      6.1          0            4           1125        15
2        50            60      67.0         6            5           750         10
3        8             0       10.0         1            0           0           0
4        11            10      14.5         1            3           750         10
5        17            63      26.0         2            4           975         13
sync     23            0       28.8         2            7           0           0
clock    -             0       0            0            0           0           0
Table 3.5.: Delay settings for all LVDS channels and expected compensation, assuming a clock speed
of 400 MHz
At the receiver the delay of the signal is treated by different components, depending on the duration.
There are three levels of delays that are compensated with different mechanisms.
• Byte offsets (delays with duration of 8 bit or multiples) are compensated with the per channel
FIFOs.
• Bit offsets (delays with duration of 1, 2, ..., 7 bits) are treated by the ISERDES bitslip function.
• All other smaller delays are compensated by adaption of the IDELAY component.
The delay of each channel is divided into these parts. Consequently, the tap counter of each channel
only indicates the delay component that is smaller than one bit period. The expected delay
components for each channel were calculated and are presented on the right hand side of
Table 3.5.
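The split of a transmitter delay into the three components can be reproduced with a few lines of
Python. This is only a sketch for checking the right hand side of Table 3.5, not part of the VHDL
design. It assumes a bit period of 1250 ps (400 MHz clock, DDR) and a tap delay of 75 ps, which is
the value implied by the table (15 taps correspond to 1125 ps). The function name split_delay is
hypothetical.

```python
BIT_PS = 1250  # one bit period at 400 MHz DDR, in picoseconds
TAP_PS = 75    # assumed delay of one ODELAY/IDELAY tap, in picoseconds


def split_delay(byte_crusher_bits, odelay_taps):
    """Split a transmitter delay into the parts handled by FIFO, bitslip and IDELAY.

    Returns (byte offset, bit offset, rest delay in ps, rest delay in taps).
    """
    total_ps = byte_crusher_bits * BIT_PS + odelay_taps * TAP_PS
    # Whole bit periods are handled by bitslip and FIFO, the remainder by IDELAY.
    total_bits, rest_ps = divmod(total_ps, BIT_PS)
    # Whole bytes are handled by the per channel FIFO, the remaining bits by bitslip.
    byte_offset, bit_offset = divmod(total_bits, 8)
    return byte_offset, bit_offset, rest_ps, rest_ps // TAP_PS


# Channel 2 of Table 3.5: byte crusher 50 bits, ODELAY 60 taps
print(split_delay(50, 60))
```

Integer picoseconds are used on purpose, so that no rounding errors can shift a delay across a bit
boundary.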
Fig. 3.13 shows a screen shot of the ChipScope logic analyzer with an ILA that recorded the tap
settings for the data and sync channels. The reference channel 0 (TAP 0) used 23 taps to sample in
the middle of the data eye. Because there is no additional delay on this channel, the other channels
should use approximately the same number of taps or fewer to compensate the small delay component.
The delay that is introduced by the transmitter or during transmission, compared to the delay of the
reference channel, decreases the delay that has to be generated by the receiver.
Figure 3.13.: ChipScope screen shot with tap counters for the data and sync channels
Similar to channel 0, channel 3 also has no delay in the small component. Therefore the tap counter
value is equal, but due to variations in the signal quality during the calibration phase it may happen
that the tap counter value varies by ±1 tap.
In general the delay introduced by the transmitter and the IDELAY of the receiver should add up
to approximately 22 - 23 taps or 1.650 ns - 1.725 ns like for channel 0. This also holds for the other
channels. Channel 6 (TAP 6, sync channel) probably does not sample completely in the center of the
data eye, but since no bit error occurs, the sampling position is acceptable. Such variances occur when
a strong jitter is present during calibration. A transition edge is then detected earlier or later and the
measured center of the data eye varies.
For correct interpretation of the screen shot it is important that channel 1 uses a 50 cm wire and all
other channels have 20 cm cables. This causes an extra delay during transmission. Unfortunately,
this delay cannot be measured completely, but it probably amounts to approximately 2225 ps. This
value is obtained by the calculation:
ODELAY + IDELAY − reference tap setting
= 15 + 21 − 23
= 13 taps
where 13 taps generate a delay of 975 ps. This is far too little, because for the 30 cm length
difference it would result in an unrealistic signal speed of 307.7 · 10^6 m/s, which exceeds the
speed of light. Hence, there is an additional bit delay which is compensated by the bitslip function.
A 1 bit delay corresponds to 1250 ps, which results in a total delay of approximately 2225 ps. The
corresponding signal speed for this delay would be 134.8 · 10^6 m/s, which is a realistic speed.
The previous calculation is only a rough estimate to interpret the results of the simulation. Other
tests with different delay settings led to comparable results.
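The estimate can be checked numerically. The following Python sketch repeats the calculation under
the same assumptions as the text: a tap delay of 75 ps, a bit period of 1250 ps and a cable length
difference of 30 cm (50 cm minus 20 cm). The function names are hypothetical and only serve this
illustration.

```python
TAP_PS = 75     # assumed delay of one tap in picoseconds
BIT_PS = 1250   # bit period at 400 MHz DDR in picoseconds


def extra_delay_ps(odelay_taps, idelay_taps, reference_taps, bitslips=0):
    """Delay of a channel relative to the reference channel, in picoseconds."""
    taps = odelay_taps + idelay_taps - reference_taps
    return taps * TAP_PS + bitslips * BIT_PS


def signal_speed(length_difference_m, delay_ps):
    """Propagation speed in m/s for a given cable length difference and delay."""
    return length_difference_m / (delay_ps * 1e-12)


without_bitslip = extra_delay_ps(15, 21, 23)           # tap component only
with_bitslip = extra_delay_ps(15, 21, 23, bitslips=1)  # plus one bit period

print(without_bitslip, signal_speed(0.3, without_bitslip))
print(with_bitslip, signal_speed(0.3, with_bitslip))
```

The first speed is above the speed of light in vacuum (about 299.8 · 10^6 m/s), which is why the
delay without the additional bit period cannot be correct.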
To test the robustness of the communication, longer cables with a length of 50 cm were used instead
of the 20 cm cables. At a frequency of 400 MHz the cable length difference had no effect on the
error-free transmission, but at higher speeds, e.g. 600 MHz, many bit errors occurred.
Another ILA traces the signals of the ISERDES outputs, the receiver FIFO outputs that are connected
to the comparator unit and the ok signal. The waveforms, displayed in Fig. 3.14, show the byte offsets
among the channels. The signals data from iserdes 0-5 display the ISERDES outputs. Channel 0 is
the reference channel; the other channels have byte offsets between one and six bytes compared to
this channel.
The data pattern 00 (hex) makes the byte offset of each channel (data from iserdes) visible compared
to channel 0, which has no delay. The reference position of channel 0 is marked with a red line in Fig.
3.14. Channels 1, 3 and 4 have an offset of 1 byte, channel 2 has an offset of 6 bytes and channel 6
has an offset of 2 bytes. When comparing these values with the expected values calculated in Table 3.5,
it turns out that channel 1 has an offset that is 1 byte larger than expected. This is due to the fact
that the bitslip is not only used for the compensation of bit offsets, but also to adapt the sampling
window. Sometimes the byte offset increases by one due to shifts of the ISERDES bitslip
function. This effect is non-deterministic, because it cannot be predicted where the initial sampling
starts. For this reason it is not known how many bitslip cycles are performed by the ISERDES.
Moreover, the screen shot in Fig. 3.14 shows that the data signals (rxfifo1 - rxfifo5) arrive completely
balanced at the comparator input (data sample 295) and that the comparator has not detected any
bit failures, because the ok signals are still set. This screen shot demonstrates the ability to process
asynchronous input signals and to balance all channels to generate a synchronous data stream.
Figure 3.14.: Data output of ISERDES, the receiver FIFO outputs at the comparator and the ok signal
Finally, the third ILA recorded the inputs of the comparator, i.e. the receiver FIFO outputs and the
output of the data generator FIFO, and again the ok signal. The result is shown in Fig. 3.15. The
data generator produces values from 0 to 255, then it waits one cycle in which the sync signal is set
invalid, and then it starts counting again with a valid sync value. Each time an invalid sync
value occurs the receiver's FIFOs run out of data. This happens at data sample 27 in Fig. 3.15. The
data generator FIFO does not run out of data because of the delay introduced during transmission;
hence this FIFO always contains more data tokens than the receiver FIFOs. The ok signal stays valid
because the comparator does not compare sample 27, since at least one of the receiver FIFOs is empty.
The data generator FIFO has a width of 48 bits, but for clarity this bit vector is split into single
bytes in order to be able to directly compare it to the corresponding receiver FIFO output.
The tests with the ChipScope logic analyzer have shown that the system is extremely robust at a
clock speed of 400 MHz, i.e. 800 Mb/s. No bit errors were detected at this clock speed. In
contrast, when the clock speed is increased to 600 MHz (1200 Mb/s), bit errors occur occasionally,
but more than 90% of the data bytes are still correct. The bit error rate increases significantly,
especially if long cables with a length of 50 cm are used.
Figure 3.15.: Data inputs of the comparator
The clock speed of the whole design was verified using a clock divider that drives an LED on the
evaluation board. The divider is connected to the 200 MHz reference clock, divides it to a frequency
of 2 Hz and drives the output port, which is mapped to a status LED. The frequency of the design
was verified by counting the light pulses per minute.
In addition, another test was performed to check the correctness of the clock edge adaption algorithm.
The master clock speed of the LVDS was reduced to 150 MHz. At this speed a clock edge adaption is
very likely to occur. To make sure that a clock edge adaption was triggered, an LED on the evaluation
board was used to indicate the use of the adaption process. The data signals were checked again with
the ChipScope ILAs. No transmission errors occurred and the LED indicated that the clock edge was
shifted to allow correct sampling in the middle of the data eye. All these tests have shown that the
receiver component can cope with the impairments examined here.
4. Image sensor - LUPA-3000
This chapter deals with the LUPA-3000 CMOS image sensor. Its functionality and interfaces are
described, since this is the basis for the software model of the image sensor. The specification of the
LUPA-3000 defines requirements for the hardware design, which should be connected later on to the
image sensor.
All aspects that are necessary to understand the functionality of the LVDS data interface are
treated, as well as the exposure control and the Serial Peripheral Interface (SPI), which are
implemented in the SystemC model of the sensor. The interested reader is referred to the original
data sheet [Cyp09] for more detailed information. A picture of the image sensor is shown in Fig.
4.1.
Figure 4.1.: LUPA-3000 image sensor, copyright by Cypress Semiconductor Corp
4.1. Sensor Architecture
The sensor has a resolution of 1696 x 1710 pixels (columns x rows) and each row is divided into 53
kernels with a width of 32 pixels each. Pixel position (0,0) is located in the lower left corner. All 32
pixel cell output values of one kernel are transferred to a column amplifier. The column amplifier
amplifies the signal level and transfers the values to the even or odd kernel bus (32 bit width)
depending on the kernel number. 64 analog-to-digital converters (ADCs) read the kernel buses and
generate digital 8 bit values for each pixel. Two ADCs alternately provide the input data for each
LVDS driver. This column multiplex scheme is shown in Fig. 4.2. Consequently, LVDS driver 0 always
transmits the first pixel of a kernel, LVDS driver 1 the second pixel and so on. Due to this multiplex
scheme, the pixel data of one kernel is always transmitted in parallel through 32 LVDS channels.
Figure 4.2.: Column multiplex scheme of the sensor architecture (ADC 0 and ADC 1 feed LVDS driver 0,
ADC 2 and ADC 3 feed LVDS driver 1, ..., ADC 62 and ADC 63 feed LVDS driver 31)
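The mapping of a pixel column to kernel and LVDS channel follows directly from the multiplex scheme
and can be sketched in Python. The kernel and channel numbers follow from the description; the
assignment of even kernels to the first ADC of a pair and of odd kernels to the second one is an
assumption made for this illustration and is not confirmed by the text. The function name
locate_column is hypothetical.

```python
COLUMNS = 1696
PIXELS_PER_KERNEL = 32


def locate_column(column):
    """Return (kernel, LVDS channel, ADC) for a pixel column of the sensor."""
    if not 0 <= column < COLUMNS:
        raise ValueError("column out of range")
    # The position inside the kernel selects the LVDS driver.
    kernel, channel = divmod(column, PIXELS_PER_KERNEL)
    # Assumption: even kernels use the first ADC of a driver, odd kernels the second.
    adc = 2 * channel + kernel % 2
    return kernel, channel, adc


print(locate_column(0))     # first pixel of the first kernel
print(locate_column(1695))  # last pixel of the last kernel
```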
The image sensor has a differential clock input, which is known as master clock and specified for a speed
of 206 MHz. The differential clock output which is synchronous to the LVDS channels operates at the
same frequency. The LVDS data and sync channels operate at double data rate (DDR). Internally,
the input clock is divided by four and is called sensor clock (51.5 MHz).
4.1.1. Pixel architecture and timing
A pixel consists of a photo diode, 6 transistors and a capacitor. This kind of pixel is known as 6-T pixel
and has a global synchronous shutter feature. This feature allows a simultaneous reset and exposure
of all pixels. The 6-T pixel schematic is shown in Fig. 4.3.
Figure 4.3.: Pixel schematic of a 6-T pixel cell
The signals connected to the pixels are controlled through timers. These timing parameters can
be changed using the configuration interface. At the end of the exposure cycle, each pixel value is
transferred immediately to the Vmem capacitor to wait for its readout. The pixel values are then
read out row by row from the storage capacitors. This use of intermediate storage in the pixel avoids
the gradual overexposure that can occur down the image when the exposure is not simultaneous
and the rows are read out directly from the active area.
The exposure time is controlled by the exposure1 pin in normal operation mode. In dual slope mode
two exposure pins, exposure1 and exposure2, are used. Dual slope performs a normal exposure first,
then resets all pixels that have reached the maximum value and performs a second exposure for the
pixels that have been reset. This dual slope mode can increase the dynamic range of an image, but
it is only applicable under constant lighting conditions. Further details about the multi slope
exposure method are available in [Gmb10].
When the exposure cycle is finished the frame overhead time (FOT) starts. This is the time needed
until the pixel data is stable and ready to be read out. When FOT starts, Vmem is brought low and
precharge and sample are set to high. The precharge pulse deletes old information from the storage
node to avoid image lag. When precharge is low again, the sampling is completed during the remaining
duration of the sample pulse. The rising edge of Vmem triggers the pixel reset signal. The general
sequence of the pixel control signals is shown in the timing diagram in Fig. 4.4.
Further details about the pixel timing and all control signals are available in the data sheet [Cyp09,
page 4], but all important details were discussed here. The timer values of the Vmem, precharge,
sample and FOT timers can be programmed by the user through the Serial Peripheral Interface (SPI),
which is discussed in Section 4.2.
Figure 4.4.: Pixel timing during FOT (signals: vmem, precharge, sample, pixel reset, data; intervals:
VMEM_TIMER, PRECHARGE_TIMER, SAMPLE_TIMER, FOT_TIMER; the data changes from invalid to valid)
4.2. Serial Peripheral Interface
The Serial Peripheral Interface (SPI) is used to program the behavior of the LUPA-3000 or to read out
the current settings. This interface is also implemented in the software model of the image sensor and
the hardware design connecting to it has to implement the counterpart. The LUPA-3000 has 128
SPI registers with a size of 8 bit each to store many different settings like timing parameters, image
size and region (smaller resolution).
Some address ranges are not in use, or at least not documented in the data sheet. The SPI is a simple
bus that uses four signals: clock, chip-select, MOSI (Master out Slave in) and MISO (Master in Slave
out)1. These four signals are used to transmit the address and data bytes serially between master
and slave. The maximum clock frequency supported by the sensor is 10 MHz. All read and write
operations are executed serially, starting with the MSB, and they are always initiated by the bus
master. During operation the chip-select (CS) is brought low. First an 8 bit command consisting of a
read/write bit (C) and a 7 bit address (a<6> - a<0>) is sent through the MOSI wire. The read
timing is shown in Fig. 4.5.
Figure 4.5.: SPI read timing (signals: CS, spi_clk, MOSI, MISO; MOSI carries C, a<6> ... a<0> and
is don't care afterwards; MISO then carries d<7> ... d<0>)
For read commands the read bit (C) is set to zero. When the address has been transmitted completely,
the LUPA-3000 immediately responds with the register content on the MISO signal, starting again with
the MSB.
1 In the SPI interconnection the LUPA-3000 image sensor is used as slave device.
The timing of an SPI write operation is shown in Fig. 4.6. When the write bit (C) is set
to one, the LUPA-3000 expects 7 address bits (a<6> - a<0>) and a byte value (d<7> - d<0>) on
the MOSI channel. There is no response on the MISO channel.
Figure 4.6.: SPI write timing (signals: CS, spi_clk, MOSI, MISO; MOSI carries C, a<6> ... a<0>
followed by d<7> ... d<0>; MISO is not driven)
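The bit sequences of both operations can be illustrated with a short Python sketch. It only builds
the bits that the master shifts out on MOSI, MSB first, with C = 0 for a read and C = 1 for a write
as described above. The function names are hypothetical; clock generation and chip-select handling
are not modeled.

```python
def spi_command(address, write):
    """Build the 8 bit command: read/write bit C followed by a 7 bit address."""
    if not 0 <= address < 128:
        raise ValueError("the LUPA-3000 has 128 SPI registers (addresses 0 - 127)")
    return (0x80 if write else 0x00) | address


def msb_first(byte):
    """Return the 8 bits of a byte in transmission order, MSB first."""
    return [(byte >> position) & 1 for position in range(7, -1, -1)]


def spi_read_bits(address):
    """MOSI bits of a read: only the command, the data follows on MISO."""
    return msb_first(spi_command(address, write=False))


def spi_write_bits(address, value):
    """MOSI bits of a write: command followed by the data byte."""
    if not 0 <= value < 256:
        raise ValueError("register values are 8 bit wide")
    return msb_first(spi_command(address, write=True)) + msb_first(value)


print(spi_read_bits(12))
print(spi_write_bits(1, 0xA5))
```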
4.2.1. SPI registers
The content of the SPI registers is used to control the LUPA-3000 in a very comprehensive way. A
selection of the most important registers is listed in Table 4.1. The given registers influence
the behavior of the image sensor's software model, which will be introduced in Section 4.4; the other
registers are not important with respect to the basic functionality of the LUPA-3000.
The readout modes can be changed between normal operation, test image readout and training mode.
The internal pixel timing durations of vmem, precharge, sample, FOT and row overhead time (ROT)
can be specified. The dual slope exposure can be (de)activated. Besides, it is possible to reduce the
image size to a region of interest (ROI) in order to increase the frame rate. The appropriate ROI can
be controlled through the y start, y end, x start and number of kernels attributes. During changes of
the SPI registers, the sensor should be kept in sequencer reset (SPI address 0, bit 1), which
interrupts light integration and image readout.
Only the most important SPI registers and their functionality are discussed here. A complete list of
all SPI registers and a more detailed description are available in the data sheet [Cyp09].
Address   Bits   Name                  Description
0         <0>    Power down            Power down analog core
          <1>    Reset n seq           Reset n of on chip sequencer
          <2>    Red rot               Enable reduced ROT mode
          <3>    Ds en                 Enable dual slope operation
1         <4:0>  ROT TIMER             Length of ROT: n + 2 sensor clocks
2         <7:0>  PRECHARGE TIMER       Length of pixel precharge: 4 · n sensor clocks
3         <7:0>  SAMPLE TIMER          Length of pixel sample: 4 · n sensor clocks
4         <7:0>  VMEM TIMER            Length of pixel vmem: 4 · n sensor clocks
5         <7:0>  FOT TIMER             Length of FOT: (4 · n) + 2 sensor clocks
6         <5:0>  NB OF KERNELS         Number of kernels to readout, minimum 4
7         <7:0>  Y START <7:0>         Start pointer Y readout
8         <2:0>  Y START <10:8>
9         <7:0>  Y END <7:0>           End pointer Y readout
10        <2:0>  Y END <10:8>
11        <4:0>  X START               Start pointer X
12        <0>    Training en           0: Transmit test patterns
                                       1: Transmit training patterns
          <1>    Bypass en             0: Ignore TRAINING EN bit, image readout
                                       1: Evaluate TRAINING EN bit
30        <7:0>  FIXED                 Fixed, read only register
31        <7:0>  CHIP REV NB           Chip revision number
32        <7:0>  SOF                   Start Of Frame keyword
33        <7:0>  SOL                   Start Of Line keyword
34        <7:0>  EOL                   End Of Line keyword
35        <7:0>  IDLE A                Idle A keyword, used as training pattern
36        <7:0>  IDLE B                Idle B keyword, used as training pattern
71        <0>    crc en                Enable crc for data channel
          <1>    crc sync en           Enable crc for sync channel
96 - 127  <7:0>  Test patterns 0 - 31  Test patterns for each channel
Table 4.1.: Selection of important SPI addresses
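How the register values of Table 4.1 translate into durations and register contents can be sketched
in Python. The formulas are taken from the table; the sensor clock of 51.5 MHz is the master clock
of 206 MHz divided by four. The function names are hypothetical and the sketch does not access any
hardware.

```python
SENSOR_CLK_HZ = 206e6 / 4  # sensor clock, 51.5 MHz


def rot_clocks(n):
    """Length of ROT in sensor clocks for ROT TIMER value n (address 1)."""
    return n + 2


def pixel_timer_clocks(n):
    """Length of precharge, sample or vmem in sensor clocks (addresses 2 - 4)."""
    return 4 * n


def fot_clocks(n):
    """Length of FOT in sensor clocks for FOT TIMER value n (address 5)."""
    return 4 * n + 2


def clocks_to_us(clocks):
    """Convert a number of sensor clocks into microseconds."""
    return clocks / SENSOR_CLK_HZ * 1e6


def y_pointer_registers(pointer, low_address):
    """Split an 11 bit Y pointer into the two registers <7:0> and <10:8>."""
    if not 0 <= pointer < 2048:
        raise ValueError("Y pointers are 11 bit wide")
    return {low_address: pointer & 0xFF, low_address + 1: pointer >> 8}


print(fot_clocks(40), clocks_to_us(fot_clocks(40)))
print(y_pointer_registers(1010, 7))   # Y START, addresses 7 and 8
print(y_pointer_registers(1260, 9))   # Y END, addresses 9 and 10
```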
4.3. Readout
The image sensor operates in pipelined mode, which enables light integration for the next frame and
image readout of the current frame in parallel. This process is visualized in Fig. 4.7.
Figure 4.7.: Pipelined operations, integration and readout are done in parallel (the integration of
frame x+1 overlaps the readout of frame x; a frame readout consists of FOT followed by the lines
L0 ... L1709; a line readout consists of ROT followed by the kernels K1 ... K53)
One frame readout is divided into the frame overhead time (FOT) and the specified number of lines.
Furthermore, each line is divided into the row overhead time (ROT) and the specified number of kernels.
Fig. 4.7 illustrates this process for the maximum resolution supported by the LUPA-3000. The data of
each kernel is transferred in parallel via the 32 LVDS data channels.
The light integration is controlled through the exposure1 (and the exposure2) signal. A falling edge
of the exposure1 signal immediately starts the FOT. This activity is visualized in Fig. 4.8.
Figure 4.8.: Exposure and readout timing (signals: exposure1, pixel reset, pixel sample, pixel Vmem,
DATA; marked intervals: integration time, sample timer, FOT, wait till ROT; DATA carries the lines
L1, L2, L3, ..., Lx)
When the FOT is finished the pixel reset is set to high and the readout starts. The pixel reset should
be high for at least 3 µs. After this period the next exposure cycle can start, which is indicated by a
rising edge of the exposure1 signal. When the rising edge occurs during readout it is internally delayed
until the next ROT, otherwise the falling edge of the pixel reset would introduce disturbance to the
image. The given timing diagram visualizes the situation where the exposure is active longer than the
time needed for the readout. When the falling edge of the exposure signal occurs during readout, the
readout is interrupted, but the current line is finished first, because the falling edge is internally
delayed until the current line readout is completed. A timing diagram visualizing this situation and
one clarifying the dual slope integration are available in [Cyp09, page 31].
With the overall frame timing in mind, a closer look can be taken at the sync and
data channels. The sync channel is used to tell the receiver which data is currently transmitted on
the data channels. An exemplary data stream is shown in Fig. 4.9; the corresponding abbreviations
are explained in Table 4.2. The sync value constants can be programmed via the SPI; the addresses
and default values are also given in the table.
Sync:  Ia  Ib  Ix  SOF  EOL  Ix  Ix  SOL  a<15:8>  a<7:0>    Ix        EOL    Ix   Ix  SOL  a<15:8>  a<7:0>
Data:  Ia  Ib  Ix  Ix   Ix   Ix  Ix  Ix   col i    col i+32  col i+64  col x  CRC  Ix  Ix   col i    col i+32
(the number of idle words between EOL and SOL depends on ROT)
Figure 4.9.: Sync channel and data channel values during image readout
Keyword  Description                                  SPI address  Value
SOF      Start of frame                               32           32
SOL      Start of line                                33           34
EOL      End of line                                  34           35
Ia       Idle word A                                  35           235
Ib       Idle word B                                  36           235
Ix       Idle word A or B                             -            -
a<15:8>  Address of line being readout (upper 8 bits) -            -
a<7:0>   Address of line being readout (lower 8 bits) -            -
CRC      CRC checksum of the previous picture row     -            -
Table 4.2.: Sync channel values
When the image sensor is idle, the sync and data channels alternately send the idle A (Ia) and idle B
(Ib) values. Both values are set to 235 by default. These idle values are used as training pattern, too.
During FOT the idle patterns are transmitted on all channels; the last byte during FOT on the sync
channel is the start of frame (SOF) keyword. The first ROT starts with a misplaced end of line (EOL),
which can be ignored. The next values on the sync channel are idle values; their number depends on
the ROT length. The last value sent during ROT is the start of line (SOL) keyword, as shown in
Fig. 4.9. Now the data transmission of the first line starts. All 32 data channels transmit the pixel
values of one kernel in parallel. This is repeated until all kernels of one line are transmitted, i.e.
at least 4 times, because the minimum image width is 4 kernels (128 pixels). The sync channel
transmits the line number, split into two bytes, and continues with sending idle words until the
second last kernel is transmitted. The last kernel of a line is indicated with an end of line (EOL)
on the sync channel. If the cyclic redundancy check (CRC) transmission is enabled, the checksum is
sent instead of the next idle byte. Now the next ROT starts and continues as described
above until all lines of the image are transmitted.
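A receiver can recover the line numbers from the sync channel by looking for the SOL keyword and
reading the two following bytes. The Python sketch below illustrates this with the default keyword
values of Table 4.2. It is a simplified illustration and not the receiver design of this work: it
operates on a list of already aligned sync bytes, and it assumes that a byte with the SOL value
outside of a line address always is a keyword, which may not hold if a CRC is inserted into the sync
channel. The function name line_numbers is hypothetical.

```python
SOF, SOL, EOL, IDLE = 32, 34, 35, 235  # default values from Table 4.2


def line_numbers(sync_bytes):
    """Return the line addresses that follow each SOL keyword in a sync stream."""
    lines = []
    index = 0
    while index < len(sync_bytes):
        if sync_bytes[index] == SOL and index + 2 < len(sync_bytes):
            upper, lower = sync_bytes[index + 1], sync_bytes[index + 2]
            lines.append((upper << 8) | lower)
            # Skip the address bytes, they may equal a keyword value by chance.
            index += 3
        else:
            index += 1
    return lines


# FOT with SOF, first ROT, line 1010 with three kernels, second ROT, start of line 1011
stream = [IDLE, IDLE, IDLE, SOF, EOL, IDLE, IDLE, SOL, 0x03, 0xF2,
          IDLE, EOL, IDLE, IDLE, SOL, 0x03, 0xF3]
print(line_numbers(stream))
```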
For calibration of the receiver the training mode can be activated in SPI register 12.
As long as the training mode is activated all channels transmit idle patterns. In the test image
mode, which can be activated alternatively in the same register, the sync channel works as in normal
operation but the data channels transmit the test patterns stored in SPI registers 96 - 127 instead of
image data from the ADCs.
The sync and data channels are synchronous to the output clock and operate at the master clock
speed in DDR mode. There is a delay between input and output clock of approximately 2.5 ns.
4.3.1. Cyclic redundancy check
A cyclic redundancy check (CRC) is a hash function that can be calculated with little computational
effort and provides a method to detect errors that occurred during transmission. It is calculated
before and after transmission; if both checksums are equal, it is very likely that no error occurred
during transmission. The LUPA-3000 calculates a CRC checksum for each line of the picture and each
data channel. The CRC calculation for the data channels is enabled by default; a CRC insertion for the
sync channel is also possible but not enabled by default. The sync CRC is probably located at
the same position as in the data channel and replaces an idle byte, but this is not explicitly
mentioned in the data sheet.
The general form of a CRC polynomial in modulo 2 arithmetic is
G(x) = c_r x^r + ... + c_2 x^2 + c_1 x^1 + c_0 x^0 mod 2 (4.1)
where r is the degree of the generator polynomial, which defines the length of the checksum in bits,
and c_i indicates the presence or absence of the coefficient of x^i. The generator polynomial
implemented in the LUPA-3000 is given by the following equation:
G(x) = x^8 + x^6 + x^3 + x^2 + 1 mod 2 (4.2)
This equation can be implemented by the circuit shown in Fig. 4.10 to calculate the CRC of a serial
data stream. The given circuit is different from the one shown in the data sheet, but generates the
same CRC in a more comprehensible way.
Figure 4.10.: Circuit for CRC generation with the polynomial implemented in LUPA-3000 (8 bit shift
register x7 ... x0 with XOR feedback according to the coefficients c0, c2, c3, c6 and c8, serial
input IN and output OUT)
At the beginning of a calculation all registers are initialized with 1s to improve the bit error
detection capabilities. The ⊕ operator indicates an XOR operation, i.e. an addition in modulo 2
arithmetic. As long as new data is available at the input, the switch is in the lower position. When
all bits that should be included in the checksum are inside the circuit, the switch is set to the
upper feedback position to write the 8 checksum bits to the output.
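A software counterpart of such a shift register can be sketched in Python. The polynomial of
equation (4.2) and the initialization with 1s are taken from the text. The bit order (MSB first),
the absence of a final inversion and the way the initial value enters the calculation are
assumptions of this sketch, so its result is not guaranteed to match the checksum sent by the
sensor. The function name crc8 is hypothetical.

```python
POLY = 0x4D  # x^6 + x^3 + x^2 + 1, the x^8 term is implicit in the 8 bit register


def crc8(data, init=0xFF):
    """Bitwise CRC over a byte sequence, processing each byte MSB first."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                # The bit shifted out is 1: subtract (XOR) the generator polynomial.
                crc = ((crc << 1) ^ POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


line = bytes(range(53))           # example payload, one byte per kernel
checksum = crc8(line)
# Appending the checksum to the data yields a remainder of zero at the receiver.
print(crc8(line + bytes([checksum])) == 0)
```

With this formulation the receiver does not need to compare two checksums; it can run the same
calculation over data and checksum and test the result for zero.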
4.4. Software model
In order to verify the hardware design that will be connected to the LUPA-3000, it is necessary to
provide a model of the sensor. The model was developed in SystemC, which is a C++ class library.
This library uses general C++ syntax, but provides an easy method to describe concurrent hardware,
similar to VHDL, but in an object oriented way. Besides, SystemC is an open source IEEE standard.
The latest version is available at http://www.systemc.org. Furthermore, ModelSim is able to
co-simulate designs which include SystemC and VHDL modules. For these reasons SystemC was chosen.
The compilation and co-simulation in ModelSim of the LUPA-3000 model is explained in Appendix
C; further details about the subject matter are discussed in [Men04].
The LUPA-3000 is modeled as a single SystemC module containing several methods, which run as
SC_THREADs and SC_METHODs in order to behave like the original sensor. SC_THREADs normally
run continuously and block to wait for an event, e.g. a rising edge of a signal. In contrast,
SC_METHODs do not run continuously; they are called each time a specified event happens. Block
diagram 4.11 roughly describes the internal dependencies of the model. All I/O ports are of data type
bool, except the 32 data channels, which are modeled as sc_uint, which is equivalent to a bool vector.
For the interconnection with VHDL these ports can be mapped to std_logic or std_logic_vector,
respectively.
Boxes with rotating circles represent SC_THREADs that run in endless loops and block from time
to time to wait for an event to happen. The internal clk thread generates the internal clock signal at
master clock speed for the whole sensor and interacts with the sensor clk thread, which generates the
internal sensor clock signal with a period four times longer than that of the master clock.
The SPI is controlled by the SPI thread, which interacts with the interface ports and the SPI register
memory block, which is simply modeled as an array of sc_uint<8> (8 bit unsigned integer).
Incoming exposure pulses are handled by the frame timing thread. It models the vmem, precharge,
sample and pixel reset signals and handles the timing of the FOT timer according to the current SPI
register settings. The frame timing thread sets the internal and/or external signals and notifies
events that trigger event handler methods when the expected duration of the signal is over. These
event handler methods run as SC_METHODs that are sensitive to a specific event. The helper functions
are omitted in Fig. 4.11 to keep it as simple as possible.
During FOT the fot output signal is high, but for one sensor clock cycle less than the actual FOT
duration, refer to Table 4.2. Similarly, the rot output is high during ROT, but for one sensor clock
cycle less than the actual ROT duration. Timing diagrams illustrating this are available in
[Cyp09, page 34].
With the overall timing structure generated by the frame timing thread and the current SPI register
settings, the data provider is able to determine which information has to be transferred to the LVDS
driver. The data provider distinguishes between four possible states: FOT, ROT, write picture data
and idle. Normally, the data provider is in the idle state and idle patterns are transmitted on the
channels. As soon as the frame timing thread detects the end of an exposure pulse, it activates the
internal FOT signal, which notifies the data provider to change its state to FOT. When the duration
of FOT is over, an SC_METHOD deactivates the internal and external FOT signals and starts the first
ROT. Now the data provider switches to the ROT state and transmits the corresponding values. As soon
as ROT is over, which is detected by another SC_METHOD, the next state is always write picture data.
In this state, the picture data for one line is transmitted. The x start and NB_OF_KERNELS parameters
that define one line are read from the SPI registers. At the end of each line it is checked whether y
end is already reached or another line follows. Depending on this, a new ROT may be started by the
data provider. When the last line of the image has been transmitted, the data provider switches back
to the idle state and waits for the next notification of the frame timing thread.
Figure 4.11.: Structural diagram of the LUPA-3000 SystemC model (blocks: Internal clk, Sensor clk,
SPI, SPI registers, Frame timing, Data provider, GetPixel with picture.mat, LVDS buffer 0 - 31,
LVDS sync, LVDS driver; ports: clk_in_p/n, reset_n, exposure1, exposure2, spi_clk, spi_cs, spi_mosi,
spi_miso, fot, rot, clk_out_p/n, sync_p/n, lvds_p/n_0 ... lvds_p/n_31)
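The state sequence of the data provider for one frame can be summarized in a small Python sketch. It
only reproduces the order of the four states described above for a given range of lines; the timing,
the events and the transmitted values of the SystemC model are not covered. The names State and
readout_states are hypothetical.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    FOT = "FOT"
    ROT = "ROT"
    WRITE_PICTURE_DATA = "write picture data"


def readout_states(y_start, y_end):
    """Yield (state, line) pairs for one frame; line is None outside of a line."""
    yield State.IDLE, None                 # waiting for the end of an exposure pulse
    yield State.FOT, None                  # frame overhead time
    for line in range(y_start, y_end + 1):
        yield State.ROT, line              # row overhead time before each line
        yield State.WRITE_PICTURE_DATA, line
    yield State.IDLE, None                 # y end reached, back to idle


for state, line in readout_states(1010, 1011):
    print(state.value, line)
```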
When a normal image readout happens, the getPixel method is utilized to provide access to an 8 bit
gray scale image which is stored in a Matlab workspace file (*.mat). Each Matlab installation contains
C++ libraries that provide some methods to access *.mat files. The data provider writes the current
output values for the LVDS driver to internal FIFOs in order to avoid sampling complexities in the
LVDS driver component. The LVDS driver thread reads all FIFOs and outputs the values bit by bit
at master clock speed. When the test image readout is activated the image data is replaced by the
test patterns programmed via the SPI. The serial data stream generated by the LVDS driver starts
with the MSB, in contrast to the Xilinx ISERDES components, which always expect the LSB first.
This problem is handled by the LVDS receiver, which is explained later on in detail.
Each time an exposure cycle is finished, the model prints the settings for the current image readout
on the console. Such a status report is shown below:
************************************************
single slope
exposure duration: 79638400 ps 4100 sensor clock cycles
FOT: 40
ROT: 7
NB_OF_KERNELS: 21
y start: 1010
y end: 1260
x start: 0
lines: 251
pixels per line: 672
************************************************
All parameters that directly influence the exposure and image readout are presented. These are
exposure mode and duration, the FOT and ROT timing values and all parameters belonging to the
region of interest (ROI).
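The derived values of the status report can be recomputed from the other entries, which is a quick
plausibility check. The Python sketch below uses the numbers of the report shown above; the function
name roi_size is hypothetical.

```python
PIXELS_PER_KERNEL = 32


def roi_size(y_start, y_end, nb_of_kernels):
    """Return (lines, pixels per line) of a region of interest."""
    lines = y_end - y_start + 1            # y start and y end are both included
    pixels_per_line = nb_of_kernels * PIXELS_PER_KERNEL
    return lines, pixels_per_line


print(roi_size(1010, 1260, 21))

# Exposure duration divided by the number of sensor clock cycles gives the
# sensor clock period used by the model, in picoseconds.
period_ps = 79638400 // 4100
print(period_ps, 1e6 / period_ps)          # period in ps and frequency in MHz
```

The resulting period of 19424 ps corresponds to a frequency of about 51.5 MHz, the sensor clock.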
The exposure duration of a single frame is used to adjust the brightness of the image currently being
read out. This process is somewhat arbitrary, because the physical behavior of the image sensor for
different exposure times is unknown. Therefore, a reasonable adaptation of the image brightness was
chosen for exposure durations between 3 µs and 2062 µs. Since all pixel values of the image are in a
range from 0 (black) to 255 (white), the adaptation in the given interval reaches from +150 for low
exposure durations to -150 for high exposure durations. For exposure durations outside the interval
the maximum adaptation of ±150 is applied. When adding the brightness correction value yields a
result outside the range of 0 to 255, the value is set to the minimum or maximum value. This clipping
reduces the dynamic range of the image.
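The brightness adaptation described above can be sketched in a few lines of C++. This is only an illustration, not the code of the LUPA-3000 model: the text gives the end points (+150 at 3 µs, -150 at 2062 µs) and the clipping to 0..255, while the linear interpolation between the end points and all function names are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Interval and maximum correction as stated in the text.
const double kMinExposureUs = 3.0;
const double kMaxExposureUs = 2062.0;
const int kMaxCorrection = 150;

// ASSUMPTION: the correction falls linearly from +150 to -150 inside the
// interval. Outside the interval the end-point value is used.
int brightnessCorrection(double exposureUs) {
    const double e = std::min(std::max(exposureUs, kMinExposureUs), kMaxExposureUs);
    const double t = (e - kMinExposureUs) / (kMaxExposureUs - kMinExposureUs);
    return static_cast<int>(std::lround(kMaxCorrection * (1.0 - 2.0 * t)));
}

// Adds the correction to an 8-bit gray value and clips the result to 0..255.
int adaptPixel(int pixel, double exposureUs) {
    const int corrected = pixel + brightnessCorrection(exposureUs);
    return std::min(std::max(corrected, 0), 255);
}
```

With this sketch a pixel value of 200 at the shortest exposure saturates at 255, which is the loss of dynamic range mentioned above.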
Furthermore, a Matlab script cut_img_gen.m was built, which generates a Matlab image variable out
of an existing picture. The script reads an arbitrary image, e.g. a *.jpg file, performs an RGB to gray
conversion and crops it to a resolution of 1710 x 1696. The position of the cropped region can be
chosen inside the script.
5 Data and control interface for the LUPA-3000 image sensor
The requirements for the design which controls the image sensor were specified in such a way that
the design should provide a simple method to control the exposure duration and to take a specified
number of pictures per second.
The SPI is used to program the exposure duration and the frames per second parameter into the
controlling design. Some unused successive SPI addresses are used for these new parameters. With
these parameters, the knowledge of the constant sensor clock cycles per second and the other SPI
registers, the duration of one frame can be calculated and a specified number of frames per second can
be taken.
Block diagram 5.1 describes the complete structure of the controller design. The design is basically
divided into three units: the SPI wrapper, the exposure control and the LVDS receiver. The SPI
wrapper controls the programming interface of the image sensor and keeps the copies of the SPI
registers in the SPI memory up to date. The exposure control reads the SPI memory in order to get
the current exposure settings specified by the user. The synchronizer block is used to arbitrate the
memory access in order to avoid conflicts. The LVDS receiver is only coupled with other components
of the design through the training done signal, because it only provides the data from the LVDS
interface.
These units work at different clock speeds. The colored boxes in the background of the diagram assign
the components to clock domains. Signals and blocks that are part of more than one clock domain
have to be treated in a special way to avoid timing problems. This is discussed later on in Section
5.4.
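[Figure 5.1: block diagram of the controller design. Recoverable labels: MOSI addr, MOSI data and MISO FIFOs; SPI signals CS, MOSI and MISO; Training_Done; memory ports addr, din and dout; exposure1 and exposure2; data channels 0 to 31; sync channel; operate.]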
Figure 5.1.: Structural description of the VHDL design
5.1. Serial peripheral interface
The SPI wrapper component communicates with the LUPA-3000 by means of the standard SPI signals.
Therefore, this component works at a clock speed of 10 MHz. It receives commands from outside
through the MOSI addr FIFO, which has a width of 8 bits. For write commands it expects to find a
data token in the MOSI data FIFO (8 bits wide). All responses to read commands are written to
the MISO FIFO (8 bits wide). Read and write commands are forwarded to the image sensor, except
for the address range from 18 to 27, because these addresses are used to save the exposure settings. For
write commands where the SPI address is smaller than 32, the corresponding data value is additionally
written to the SPI memory.
The SPI memory in the design has a size of 32 bytes. It stores a copy of the SPI registers 0 to 17 and
uses 10 bytes for the frames per second, exposure1 and exposure2 offset parameters. These extra bytes
are located at the memory addresses 18 to 27; the exact mapping is described in Table 5.1.
The memory addresses 0 to 17 are initialized with the default values for the SPI registers, defined in the
data sheet [Cyp09].
Address  Bits   Name                        Description
18       <7:0>  Frames per second <7:0>     Number of frames per second
19       <7:0>  Frames per second <15:8>
20       <7:0>  exposure 1 <7:0>            Duration of exposure1 pulse in sensor clocks
21       <7:0>  exposure 1 <15:8>
22       <7:0>  exposure 1 <23:16>
23       <7:0>  exposure 1 <31:24>
24       <7:0>  exposure 2 offset <7:0>     Offset length between start of exposure1
25       <7:0>  exposure 2 offset <15:8>    and exposure2 in sensor clocks
26       <7:0>  exposure 2 offset <23:16>
27       <7:0>  exposure 2 offset <31:24>
Table 5.1.: SPI register address space extension
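Table 5.1 stores each parameter with its least significant byte at the lowest address. The following C++ sketch shows how a multi-byte value could be split into and reassembled from the 8 bit memory cells. The 32 byte memory and the start addresses 18, 20 and 24 come from the table; the helper names and the use of std::array are assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Start addresses of the register extension, taken from Table 5.1.
const int kAddrFramesPerSecond = 18;  // 2 bytes
const int kAddrExposure1 = 20;        // 4 bytes
const int kAddrExposure2Offset = 24;  // 4 bytes

using SpiMemory = std::array<std::uint8_t, 32>;

// Stores 'value' in 'numBytes' consecutive cells, least significant byte first.
void storeValue(SpiMemory& mem, int addr, std::uint32_t value, int numBytes) {
    for (int i = 0; i < numBytes; ++i) {
        mem[addr + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu);
    }
}

// Reassembles a value from 'numBytes' consecutive cells.
std::uint32_t loadValue(const SpiMemory& mem, int addr, int numBytes) {
    std::uint32_t value = 0;
    for (int i = 0; i < numBytes; ++i) {
        value |= static_cast<std::uint32_t>(mem[addr + i]) << (8 * i);
    }
    return value;
}
```

An exposure1 duration of 40000 sensor clock cycles (0x9C40), for example, ends up as 0x40 at address 20 and 0x9C at address 21.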
Since the SPI memory is used by both the SPI wrapper and the exposure control block, arbitration is
required. Otherwise the exposure control could read a memory location that is currently being written
by the SPI wrapper, which would lead to unpredictable results. For this reason, a token is passed
between both components to ensure exclusive memory access. The token is a single signal which is
forwarded by the synchronizer component across the clock domain border. Only the component owning
the token is allowed to access the memory; meanwhile the other component is locked.
During system start-up the training mode of the image sensor has to be activated. This is achieved
by a state machine implemented in the SPI wrapper. A state diagram is shown in Fig. 5.2. In the
RESET state some internal signals are set before the sensor is programmed in the INIT state. The
following actions are performed in this state: resetting the sequencer, programming idle patterns a and
b, enabling training and enabling the sequencer. The idle patterns that are used for the training are set
to the bit-symmetric binary string "00100100", because the LUPA-3000 sends the data with the MSB
first and the LVDS receiver expects the LSB first. With this symmetric pattern the bit order does not
matter during the training phase.
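Why a symmetric idle pattern sidesteps the bit order mismatch can be checked with a small C++ sketch. The function names are hypothetical; the two patterns are the ones named in this section (0x24 = "00100100" for training, 0xDB = "11011011" after training).

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a byte (MSB-first <-> LSB-first).
std::uint8_t reverseBits(std::uint8_t value) {
    std::uint8_t result = 0;
    for (int i = 0; i < 8; ++i) {
        result = static_cast<std::uint8_t>((result << 1) | (value & 1u));
        value = static_cast<std::uint8_t>(value >> 1);
    }
    return result;
}

// A byte is bit-symmetric if it reads the same in both bit orders.
bool isBitSymmetric(std::uint8_t value) {
    return reverseBits(value) == value;
}
```

Both idle patterns read the same in either bit order, so the receiver can recognize them before any bit reordering is applied. Ordinary pixel data such as 0x01 would arrive as 0x80 and therefore has to be reordered.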
Now the state machine remains in the WAIT state until the LVDS receiver sets the TRAINING DONE
signal. This happens when the calibration of all channels has been performed successfully. After that,
the state machine switches to the START state to program the sensor for normal operation. Moreover,
the idle pattern is changed to the value "11011011" to signal to the LVDS receiver that the training
phase is over and that the received data is valid and should be written into the output FIFOs.
Now the SPI wrapper enters the IDLE state, which means that the state machine is waiting for the
next SPI read or write operation. As soon as data tokens are available in the MOSI addr FIFO,
the state machine switches to the SPIR state for a read operation or to the SPIW state for a write
operation. Depending on the address, the command is serialized and transmitted to the LUPA-3000,
and/or the SPI memory is accessed to handle the request. The SPI memory is always involved when
addresses smaller than 30 are accessed in an operation. After each command the state machine returns
to the IDLE state. If there are no more commands available and a write command was invoked
previously, the SYNC state is entered. In the SYNC state the activation token is passed through the
synchronizer to the exposure control component. This causes the state machine to switch immediately
to the LOCK state, where it remains until the activation token is passed back.
[Figure 5.2: state diagram with the states RESET, INIT, WAIT, START, IDLE, SPIR, SPIW, SYNC and LOCK.]
Figure 5.2.: Finite state machine of the SPI wrapper module
5.2. Exposure control
The exposure control block contains two state machines that work in parallel, one for handling the
memory access and calculation of the exposure timing parameters, the other one for the control of
the exposure output pins. At first, the FSM for the memory access is discussed. A corresponding
schematic description is shown in Fig. 5.3.
After reset, the LOCK state is entered. Now the state machine waits until the activation token is
received. As soon as this happens the SPI memory is read (addresses 0, 1, 5 to 10 and 18 to 27). With
the information from the SPI registers it is possible to calculate the readout duration of a frame, which
is called the frame period and is calculated in the CALC_READ_DURATION state. The formula for the
calculation is taken from [Cyp09, page 5], but changed so that the resulting unit is sensor clock
cycles and not seconds, because a hardware design has no notion of time, only of clock cycles.
Therefore, the exposure control block has to run at sensor clock speed to time the exposure signals
correctly.
[Figure 5.3: state diagram with the states LOCK, READMEM, CALC_READ_DURATION, CALC_TOTAL_DURATION, DELAY, CALC_DELAY_PER_FRAME and SYNC.]
Figure 5.3.: First finite state machine of the exposure control module
The frame period in sensor clock cycles is calculated by
\[ \text{Frame period} = \text{FOT} + \text{lines} \cdot \left( \text{ROT} + \frac{\text{pixels}}{4} \cdot \text{dataPeriod} \right) \tag{5.1} \]
where FOT is the FOT duration in sensor clock cycles, ROT the ROT duration in sensor clock cycles,
lines the number of lines of the current frame, pixels the number of pixels per line and dataPeriod
the period of one bit on the LVDS channel, measured in sensor clock cycles. The dataPeriod is a
constant of 1/8, because DDR is used (factor 1/2) and one sensor clock period is 4 times longer than the
master clock period (factor 1/4). Knowing this, and because pixels is always 32 · NB_OF_KERNELS,
the formula is simplified to:
\[ \text{Frame period} = \text{FOT} + \text{lines} \cdot \left( \text{ROT} + \text{NB\_OF\_KERNELS} \right) \tag{5.2} \]
Formula (5.2) can be implemented in hardware without any problem.
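As a plausibility check, both forms of the formula can be evaluated for the region of interest printed in the status report of the sensor model (FOT 40, ROT 7, y start 1010, y end 1260, 21 kernels, 672 pixels per line). The C++ sketch below is illustrative only: the function names are hypothetical, the line count is assumed to be y end - y start + 1 (which matches the reported 251 lines), and the FOT and ROT values are assumed to be given directly in sensor clock cycles.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: the number of lines is the inclusive difference of the ROI rows.
std::uint32_t linesInRoi(std::uint32_t yStart, std::uint32_t yEnd) {
    return yEnd - yStart + 1;
}

// Formula (5.1) with dataPeriod = 1/8: (pixels / 4) * (1 / 8) = pixels / 32.
std::uint32_t framePeriodFromPixels(std::uint32_t fot, std::uint32_t rot,
                                    std::uint32_t lines, std::uint32_t pixels) {
    return fot + lines * (rot + pixels / 32);
}

// Formula (5.2): simplified form using the number of kernels.
std::uint32_t framePeriod(std::uint32_t fot, std::uint32_t rot,
                          std::uint32_t lines, std::uint32_t nbOfKernels) {
    return fot + lines * (rot + nbOfKernels);
}
```

For the values above both functions yield 40 + 251 · (7 + 21) = 7068 sensor clock cycles.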
In the CALC_TOTAL_DURATION state the constant number of sensor clock cycles per second, which
is stored in the constants.vhd file, is divided by the frames per second parameter from the SPI register
extension. The result is called cycles per frame and is the time available for the readout of one frame
plus a delay of arbitrary length. The division is carried out by a Xilinx divider IP core which is
available for free in the ISE Design Suite. The division needs 25 clock cycles to complete; meanwhile
the FSM stays in the current state until the divider asserts the ready signal.
The last calculation step is the subtraction of the exposure1 duration from the cycles per frame value,
which is done by a subtracter IP core. This core has a delay of one clock cycle. For this reason, the
DELAY state is entered before the exposure offset is calculated in the CALC_DELAY_PER_FRAME
state. For clarification, Fig. 5.4 visualizes the dependencies of the timing values.
One second consists of a specific number of sensor clock cycles. Dividing the sensor clock cycles per
second by the frames per second parameter results in the cycles per frame value. Because the image
sensor can expose and read out in parallel, the exposure for the next frame is done during the current
frame readout. For this reason the exposure offset is cycles per frame minus exposure duration. The
delay between two readouts depends on the length of the cycles per frame and the current frame period.
For frame rates near the maximum, the delay is very small. The exposure pins have a minimum hold
requirement of 15 master clock cycles, which corresponds to 3.75 sensor clock cycles. Therefore, the
minimum delay value is always 4 sensor clock cycles; 3 sensor clock cycles would not be long enough.
[Figure 5.4: timing diagram. Recoverable labels: cycles per second = 1 second, cycles per frame, exposure, exposure offset, readout, frame period, delay.]
Figure 5.4.: Timing diagram of the frame timing
In the CALC_DELAY_PER_FRAME state it is checked whether the given parameters are correct or
at least achievable. It may happen that the exposure duration is longer than the cycles per frame, that
the number of cycles per frame is smaller than the frame period (frame rate too high) or that the
exposure offset is smaller than 4. As soon as one of these cases is detected, the delay is set to 4 and the
exposure pulses are activated in such a way that the maximum achievable frame rate for the given
exposure duration is obtained.
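The calculation chain of the first state machine can be summarized in a C++ sketch. It is one possible reading of the description above, not the VHDL implementation: the structure and function names are hypothetical, a frames per second value of 0 is assumed to request the maximum frame rate, and the fallback is assumed to pick the shortest frame that leaves at least 4 sensor clock cycles of both exposure offset and readout delay.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

const std::uint64_t kMinDelay = 4;  // minimum delay in sensor clock cycles (from the text)

struct FrameTiming {
    std::uint64_t cyclesPerFrame;  // length of one frame slot
    std::uint64_t exposureOffset;  // cycles per frame minus exposure1 duration
    std::uint64_t delay;           // cycles per frame minus frame period
    bool adjusted;                 // true if the requested frame rate was not achievable
};

FrameTiming computeTiming(std::uint64_t cyclesPerSecond, std::uint64_t framesPerSecond,
                          std::uint64_t exposure1, std::uint64_t framePeriod) {
    // ASSUMPTION: framesPerSecond == 0 means "as fast as possible".
    const std::uint64_t requested =
        (framesPerSecond == 0) ? 0 : cyclesPerSecond / framesPerSecond;
    // ASSUMPTION: shortest frame slot that still fits readout, exposure and minimum delay.
    const std::uint64_t shortest = std::max(framePeriod, exposure1) + kMinDelay;

    FrameTiming t;
    t.adjusted = requested < shortest;
    t.cyclesPerFrame = t.adjusted ? shortest : requested;
    t.exposureOffset = t.cyclesPerFrame - exposure1;
    t.delay = t.cyclesPerFrame - framePeriod;
    return t;
}
```

In the tests of this sketch a round, hypothetical value of 50,000,000 sensor clock cycles per second is used; the real constant is defined in constants.vhd and is not given in this section.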
Finally, all necessary values for the exposure control are calculated and the state machine switches to
the SYNC state in order to pass the activation token back to the SPI wrapper. The duration of the
SYNC state is chosen to be 15 cycles, because the sensor clock is at least 5 times faster than the SPI
clock and the token signal has to be kept high for at least one rising edge of the SPI clock. When
the token is passed back to the SPI wrapper, the LOCK state is entered and the state machine waits
again for the activation token.
As mentioned in the beginning of this chapter, there is a second state machine that is responsible for
the behavior of the exposure signals. The corresponding state diagram is shown in Fig. 5.5.
Figure 5.5.: Second finite state machine of the exposure control module
There are only three states in the FSM. The initial state is WAIT FOR 1ST SYNC, where the state
machine stays until the concurrently running FSM described before reaches the SYNC state the first
time. This event indicates that all necessary data was read from the memory and the timing parameters
for the exposure control were calculated. In the IDLE state the necessary parameters for the exposure
control are copied.
If the OPERATE and TRAINING DONE signals are high and the other FSM is in state SYNC or
LOCK, the state is switched to EXPOSE, otherwise it remains in the IDLE state, until all conditions
are fulfilled. The operate signal is an external input signal that activates the exposure operation as
long as it is high.
During the EXPOSE state the exposure signals are controlled. Generally, the exposure2 signal is always
low unless the dual slope mode is enabled. At the beginning of an exposure cycle, both exposure
signals are low until the exposure offset is over. After that, the exposure1 signal is brought high for
the exposure1 duration. When dual slope is enabled, the exposure2 signal is brought high exactly
exposure2 offset sensor clock cycles after the rising edge of the exposure1 signal. Both exposure signals
are brought low again when the exposure1 duration is over. Now the next cycle starts immediately
with the next exposure offset.
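The behavior of the two exposure pins within one cycle can be modelled as a pure function of the sensor clock cycle counter. The C++ sketch below follows the description above; the names are hypothetical, and one exposure cycle is assumed to consist of exactly the exposure offset followed by the exposure1 duration (the sum must be greater than zero).

```cpp
#include <cassert>
#include <cstdint>

struct ExposurePins {
    bool exposure1;
    bool exposure2;
};

// Returns the level of both exposure pins at sensor clock cycle 'cycle',
// counted from the start of the EXPOSE state.
ExposurePins exposurePinsAt(std::uint64_t cycle, std::uint64_t exposureOffset,
                            std::uint64_t exposure1Duration,
                            std::uint64_t exposure2Offset, bool dualSlope) {
    // ASSUMPTION: one exposure cycle = offset phase + exposure1 pulse.
    const std::uint64_t cycleLength = exposureOffset + exposure1Duration;
    const std::uint64_t t = cycle % cycleLength;  // position inside the current cycle

    ExposurePins pins = {false, false};
    if (t >= exposureOffset) {
        pins.exposure1 = true;
        // exposure2 rises exposure2Offset cycles after exposure1, in dual slope mode only.
        pins.exposure2 = dualSlope && (t - exposureOffset >= exposure2Offset);
    }
    return pins;
}
```

With an offset of 10, an exposure1 duration of 20 and an exposure2 offset of 5, exposure1 is high during cycles 10 to 29 and exposure2 during cycles 15 to 29; both are low again at cycle 30, where the next exposure cycle starts.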
This continues as long as the internal stopExp signal is low. When the signal is high, the state machine
switches to the IDLE state to copy again the necessary parameters. The stopExp signal is brought
high when the other state machine is in the SYNC state. This indicates that a new configuration
was programmed. The stopExp signal is also brought high when the operate signal is disabled. With
the use of the stopExp signal it is guaranteed that the exposure is not aborted and that a newly
programmed configuration is not ignored.
5.2.1. Timing parameter calculation
To verify the calculations of the exposure control block, an Excel spreadsheet was created that
determines the timing settings for a given configuration. A screenshot of the spreadsheet is given in
Fig. 5.6. All fields marked in yellow are mandatory user inputs; the other fields are then
calculated automatically. Column D contains the default values of certain parameters.
The master clock speed of the design is used to calculate the master clock period and the cycles per
second constant (in sensor clock cycles). Both values are required in the constants.vhd file and the
period is needed in the test environment (lupa_constants.h). This is discussed in more detail
in Chapter 6.
The frame period is calculated with the SPI register values for the FOT timer (fot n), ROT timer (rot
n) and the region of interest (ROI) (y start, y end, nb of kernels). The exposure timing values (cycles
per frame, exposure offset and delay), as described in diagram 5.4 are calculated by means of the SPI
register extension. The values of the SPI register extension are converted into binary representation
and split into 8 bit blocks (columns E to H), because this format is needed when the SPI register is
programmed in the SystemC testbench. Besides, the maximum frame rate for the given frame period
is calculated.
Figure 5.6.: Excel sheet for timing parameter calculation
Warning messages are displayed when the given timing constraints cannot be fulfilled or when the
hold requirements for the exposure signals are violated.
5.3. LVDS receiver
The basic LVDS receiver component was already introduced in Chapter 3.3. The only difference here
is that the number of data channels is increased to 34. Therefore, it is necessary to add an LVDS
input buffer, an IDELAY, an ISERDES and a byte parser component for each new channel. Besides,
the multiplexer providing the input for the bit align machine needs to be extended and the resource
sharing control has to take into account the larger number of channels.
At the end of the training phase, the idle or training pattern is changed, as already mentioned in
the SPI wrapper description in Chapter 5.1. This is done to inform the byte parser that valid data
is transmitted. The byte parser eliminates byte offsets among the LVDS channels. The transition
between old and new idle patterns indicates a unique synchronization point in the byte stream, which
is used to start the data output of the receiver component. The different idle patterns are also used to
notify the byte parser to reorder the bits of a byte received from the ISERDES, because the LUPA-3000
sends the MSB first and the ISERDES expects the LSB first.
Now all parts of the VHDL interface design have been described. When blocks that operate
at different clock speeds communicate with each other, the clock domain crossing signals have to be
treated in a special way, which is explained in detail in the next section.
5.4. Clock domain crossing
Advanced hardware designs, like the one introduced in the previous chapter, use multiple clocks for
different components. These designs generally have a problem when data or control signals are passed
from one clock domain to another. The signal appears asynchronous in the new clock domain. The
circuit that receives the signal has to synchronize it to avoid metastability.
Metastability appears when a flip-flop samples an unstable signal, e.g. during a transition. Then the
flip-flop's output voltage level is non-deterministic and it is not predictable whether the output voltage
will converge to a correct voltage level, stay at an intermediate voltage level or oscillate before
it settles down. To avoid metastability, the incoming signal must be stable within a small timing
window around the sampling edge. This window is divided into setup and hold time, the time before
and after the sampling edge during which the signal has to remain stable. If a design meets these timing
requirements, the probability that the flip-flop will fail is negligibly small. Most synthesis tools cannot
determine whether asynchronous signals meet the timing requirements of the sampling flip-flop. For
this reason, circuits that eliminate the effects caused by asynchronous signals should be used.
The easiest method for sampling asynchronous signals is the concatenation of two flip-flops without
combinatorial logic between them. Besides, the last gate in the transmitting clock domain has to be a
flip-flop; combinatorial logic is not allowed here. The reason for this limitation is that combinatorial
logic can cause signal delays which aggravate the metastability problem. Such a basic synchronizer
circuit is visualized in Fig. 5.7.
Figure 5.7.: Basic synchronizer circuit
When more than two flip-flops are concatenated in the receiver clock domain, the occurrence of
metastability is less probable, but the incoming signal is delayed by more than two clock cycles. This
kind of synchronizer should also be used when the speed of the clock domains is equivalent but the clocks
are not synchronous. For clock domains that have different speeds it is necessary that the signal in the
sending clock domain is stable for at least two clock cycles of the receiving clock domain. Generally,
a pulse in the transmitting clock domain has to be at least twice as long as a clock cycle in the
receiving domain in order to fulfill the sampling theorem; otherwise it may happen that the pulse is
not detected. The sampling theorem implies that the sampling frequency has to be at least twice the
signal frequency to make sure that all signal states are detected.
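The latency of the basic synchronizer can be illustrated with a cycle-based C++ model. This is only a behavioral sketch with hypothetical names: it shows the two-stage pipeline in the receiving clock domain, but it cannot model metastability itself, which is an analog effect.

```cpp
#include <cassert>

// Behavioral model of two concatenated flip-flops in the receiving clock domain.
class TwoFlopSynchronizer {
public:
    // Models one rising edge of the receiving clock. 'asyncIn' is the level of
    // the incoming signal at that edge. Returns the synchronized output.
    bool clock(bool asyncIn) {
        stage2_ = stage1_;   // second flip-flop takes the old value of the first
        stage1_ = asyncIn;   // first flip-flop samples the asynchronous input
        return stage2_;
    }

private:
    bool stage1_ = false;
    bool stage2_ = false;
};
```

A level change at the input becomes visible at the output on the second clock edge after it was sampled, which is the two-cycle delay mentioned above.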
When the clock domain crossing signal has to be buffered, the easiest method to avoid metastability
is the use of a first in, first out buffer (FIFO) with different read and write clocks. The
metastability is then handled inside the FIFO implementation and the hardware developer does not have
to care about it. This method is usually chosen when the signal has multiple bits or when a buffer
element is needed between the clock domains. In the interface for the image sensor all clock domain
crossing signals can be handled with one of these methods. There are further methods available which
may be used for more specialized applications, cf. [Ste03].
The LUPA-3000 controller design contains some signals which cross clock domains. One is
the training done signal set by the LVDS receiver. It passes through exactly the basic synchronizer
introduced before: an output flip-flop in the receiver and two concatenated flip-flops in each of the
exposure control and the SPI wrapper. The hold requirement of the training done signal is always
fulfilled because this signal has only one rising edge, which occurs at the end of the training phase.
The synchronizer component, which is used for the arbitration of SPI wrapper and exposure control,
simply implements two basic synchronizers, one for each direction, to ensure the
correct sampling of the activation token. The hold requirement of two clock cycles in the receiving
clock domain for the activation token is fulfilled, because the state machines ensure that the token
signal is high long enough.
The operate signal from an external source is passed through two flip-flops before it is used internally.
The minimum hold requirement for this signal is two sensor clock cycles.
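The hold requirement of two receiving clock cycles can be checked numerically. The C++ sketch below uses hypothetical function names and clock periods in picoseconds. The SPI clock period of 100000 ps follows from the 10 MHz stated in Section 5.1; the sensor clock period of 19424 ps is derived from the status report of the sensor model (79638400 ps for 4100 sensor clock cycles) and is therefore specific to that simulation setup.

```cpp
#include <cassert>
#include <cstdint>

// True if a pulse of 'pulseCycles' source clock cycles stays stable for at
// least two clock cycles of the receiving domain.
bool pulseIsLongEnough(std::uint64_t pulseCycles, std::uint64_t srcPeriodPs,
                       std::uint64_t dstPeriodPs) {
    return pulseCycles * srcPeriodPs >= 2 * dstPeriodPs;
}

// Smallest number of source clock cycles that satisfies the requirement
// (integer ceiling of 2 * dstPeriod / srcPeriod).
std::uint64_t minPulseCycles(std::uint64_t srcPeriodPs, std::uint64_t dstPeriodPs) {
    return (2 * dstPeriodPs + srcPeriodPs - 1) / srcPeriodPs;
}
```

Under these assumptions the 15 sensor clock cycles of the SYNC state (about 291 ns) cover two SPI clock periods (200 ns), while 5 cycles would not.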
6 SystemC test environment
The previous chapters presented all parts needed for the complete system. In the next step they
are connected in a SystemC testbench. This testbench instantiates the LUPA-3000 SystemC model,
the VHDL implementation of the interface, a channel delay block that models delays in the LVDS
interconnection, an image builder which takes the data at the output of the VHDL block and generates
image data out of it and finally the stimulator that is connected to the SPI interface and controls the
whole design. A structural overview is shown in Fig. 6.1. ModelSim provides a method to
co-simulate VHDL and SystemC. How the project developed in this thesis is simulated
is described in Appendix C. A general explanation of the possibilities
offered by ModelSim is given in [Men04].
Several clock signals are generated in the SystemC top module: the master clk p, which runs at master
clock speed to drive the LUPA-3000, and an inverted version of it named master clk n. Additionally,
there is the clkExp clock, which runs four times slower than the master clock (sensor clock speed)
and drives the exposure control block, and the spi clk, which runs at 10 MHz and drives the SPI wrapper
component in the VHDL design and the spi clk input of the image sensor. Finally, a 200 MHz
reference clock named clk200 is required for the IDELAY primitives in the LVDS receiver
component.
The channel delay block is used to model different lengths of the interconnection between the image
sensor and the connecting interface design. These delays have the same function as the delays
which were generated in the LVDS transmitter in the test design discussed in Chapter 3. They
can be used to model, e.g., connections of different length on a printed circuit board (PCB). The delay
component consists of several delay blocks, each of which delays a positive and a negative signal by
the same time. These blocks use two single delay blocks, one for the positive and one for the negative signal.
[Figure 6.1: block diagram. The SystemC testbench contains the LUPA-3000 model, the channel delays, the Image Builder and the Stimulator. The VHDL block contains the LVDS receiver, the SPI wrapper, the SPI memory, the synchronizer and the Exposure Control. Recoverable signal labels: MOSI addr, MOSI data, MISO; CS, MOSI, MISO; Training_Done; addr, din, dout; exposure1, exposure2; LVDS_0_P/LVDS_0_N to LVDS_31_P/LVDS_31_N; LVDS_SYNC_P/LVDS_SYNC_N; LVDS_CLK_P/LVDS_CLK_N; data channels 0 to 31; sync channel; operate.]
Figure 6.1.: Complete SystemC testbench, with the LUPA-3000 model and the VHDL implementation
of the interface design
6.1. SystemC transport delay
The implementation of a delay in SystemC is not as simple as it seems, because SystemC has no
built-in delay. In contrast, VHDL has a so-called transport delay which can delay a signal by a
specified time. In SystemC the only mechanism available is the wait() statement, which either takes a
time value or waits for an event. With only this mechanism available it is not straightforward to build
a transport delay element, but there is a way to overcome this problem.
To build such a delay, an sc_fifo and an sc_event_queue are necessary. An sc_fifo is a normal FIFO
model of the SystemC library and an sc_event_queue is a special kind of FIFO which can queue multiple
events that depend on the same signal or module. A read process (SC_METHOD) writes each
new input value to the FIFO and queues an event in the event queue. This is done as soon as the data
input changes. The event is notified after the specified transport delay time. A second process
(SC_THREAD) is sensitive to event notifications from the event queue, each of which indicates that the
next data token in the FIFO should be written to the output.
The following code snippet contains the module declaration with the ports, internal variables and the
constructor.
#include <systemc.h>

SC_MODULE(Delay) {
    sc_in<bool> in;
    sc_out<bool> out;
    sc_fifo<bool> delay_channel_fifo;   // input values waiting to be output
    sc_event_queue channel_event_q;     // one queued event per stored value
    sc_time transpDelay;                // transport delay applied to each change
    void read_in();
    void write_out();
    SC_HAS_PROCESS(Delay);
    Delay(sc_module_name name_, sc_time delay_ = sc_time(0, SC_NS))
        : sc_module(name_), delay_channel_fifo(50) {
        transpDelay = delay_;
        SC_METHOD(read_in);
        sensitive << in;
        dont_initialize();  // prevent read_in from running during initialization
        SC_THREAD(write_out);
        sensitive << channel_event_q;
    }
};
The prototypes of read_in() and write_out() are declared in the module declaration; the constructor
registers them as an SC_METHOD and an SC_THREAD, respectively. Each time the data input changes
the read_in() method is executed, because it is sensitive to the in input port. The implementation
of the functions is shown in the following code.
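void Delay::read_in() {
    delay_channel_fifo.write(in.read());   // store the new input value
    channel_event_q.notify(transpDelay);   // schedule its output after the delay
}

void Delay::write_out() {
    while (true) {
        wait();                                 // wait for the next queued event
        out.write(delay_channel_fifo.read());   // output the next stored value
    }
}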
Each time an event notification is pending the write_out() thread is unblocked, because its wait()
statement is sensitive to events from the channel_event_q. The sensitivity is defined in the constructor
of the delay module.
This implementation is able to handle signal pulses that are shorter than the actual delay, which
is necessary for a sensible transport delay model. There is only one problem with the current
implementation: the maximum delay is limited by the FIFO size and depends on the minimum
data period. If, for example, a clock signal is delayed, the period in which the signal is stable
is half the clock period. The maximum delay is therefore given by
maximum delay = data period · FIFO size
When the delay is chosen too large, the simulation may fail because the read_in() method
tries to write into the full FIFO. The default FIFO size in SystemC is 16; in the given code it is set
to 50 in the constructor.
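The relation between delay, data period and FIFO size can be turned into a small sizing helper. The C++ sketch below uses hypothetical function names and times in picoseconds. The example numbers assume a master clock period of 4856 ps, which is a quarter of the sensor clock period of 19424 ps seen in the status reports of the sensor model, so a delayed clock signal is stable for 2428 ps.

```cpp
#include <cassert>
#include <cstdint>

// maximum delay = data period * FIFO size
std::uint64_t maxDelayPs(std::uint64_t dataPeriodPs, std::uint64_t fifoSize) {
    return dataPeriodPs * fifoSize;
}

// Smallest FIFO size that supports the requested delay
// (integer ceiling of delay / data period).
std::uint64_t requiredFifoSize(std::uint64_t delayPs, std::uint64_t dataPeriodPs) {
    return (delayPs + dataPeriodPs - 1) / dataPeriodPs;
}
```

Under these assumptions the default FIFO size of 16 allows a clock delay of about 38.8 ns and the size of 50 used in the constructor allows about 121.4 ns; a delay of 100 ns would need at least 42 entries.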
6.2. Image Builder
During operation the image builder component reads the FIFO outputs of the VHDL design and
reconstructs the transmitted image. The reconstruction is based on the evaluation of the sync
channel values. According to the synchronization sequence, the lines of each frame are read one after
another and put back together.
Each time a new frame starts, the last one is appended to a Matlab workspace file (results.mat). For
the storage process the Matlab libraries are used, similar to the read access of the getPixel method in
the LUPA-3000 model.
The image builder block should run at least at the speed of the sensor clock. With a slower clock a
FIFO overflow will happen sooner or later and cause data loss at the LVDS receiver output FIFOs.
6.3. Stimulator
The stimulator is the block that controls the complete behavior of the other components. At the
beginning of a simulation the stimulator holds the system in reset with the active-low reset n signal for
500 ns. Then arbitrary parameters can be programmed through the SPI. It is necessary to program
at least the exposure1 duration to bring the system into a state where the active operate signal enables
the exposure process. If a frame rate lower than the maximum achievable one is desired, the
corresponding parameter should be set. For dual slope exposure it is necessary to program the
exposure2 offset. Additionally, a region of interest (ROI) or any other parameter can be
changed. Configuration settings can be programmed at any time during operation. Before configuration
changes are made, the operate signal should be disabled and the sequencer reset (SPI register 0, bit
1) has to be enabled. Then new parameters can be programmed through the SPI. When all changes
have been made it is necessary to disable the sequencer reset again, otherwise no image readout is possible.
The clock speed for the stimulator component can be chosen arbitrarily, because all output signals
are treated as asynchronous signals in the receiving components. All components that are controlled
through reset n expect it to be asynchronous, including the LUPA-3000. The exposure control
block treats the operate signal as asynchronous, utilizing a basic synchronizer as explained in Section
5.4. Finally, the MOSI addr, MOSI data and MISO FIFOs have different read and write clocks to
decouple the clock domains.
6.4. Demonstration
Finally, the complete system can be simulated. In the beginning all components are reset by the
global reset signal controlled by the stimulator. Then the SPI wrapper programs the settings for the
calibration. The LUPA-3000 transmits the training pattern on all channels. These data signals are
delayed arbitrarily inside the channel delay component. The LVDS receiver calibrates each channel
individually and raises the training done signal when all channels are aligned successfully. After that,
the SPI wrapper brings the image sensor into normal operation mode. In addition, settings from the
external SPI input are programmed to the shared SPI register and the LUPA-3000. Finally, the
exposure control starts to control the image sensor with the given exposure settings. As soon as
data is available from the receiver FIFOs the image builder starts to reconstruct the received image.
The image builder can read entirely synchronous data from the receiver FIFOs, because the receiver
balances all channels. For this reason any kind of delay can be introduced between the image sensor
and the LVDS receiver; the image builder will never notice any delay among the channels.
The following listing shows the status message from the LUPA-3000 model, obtained by a ModelSim
simulation of the entire system:
# ************************************************
# single slope
# exposure duration: 776960 ns 40000 sensor clock cycles
# FOT: 40
# ROT: 7
# NB_OF_KERNELS: 21
# y start: 1010
# y end: 1260
# x start: 19
# lines: 251
# pixels per line: 672
# ************************************************
The printout contains the duration of the last exposure pulse, which influences the brightness of the
image, the internal timing settings (FOT and ROT), and the settings for the current region of interest
(ROI). y start and y end are given as line numbers, whereas x start denotes an (odd) kernel number.
These settings result in the image shown in Fig. 6.2. The images inside the figure are equivalent to the
Matlab files used as input or generated as output of the design. The diagram visualizes the image
readout of the system. All system components are abstracted to show only the control and data signal
flow between the components.
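The numbers in the status message are consistent with each other, which can be checked with a short Python calculation. The value of 32 pixels per kernel is an assumption inferred from the printed values (672 / 21); the function names are hypothetical.

```python
PIXELS_PER_KERNEL = 32  # assumption inferred from the printout: 672 / 21


def roi_size(y_start, y_end, nb_of_kernels, pixels_per_kernel=PIXELS_PER_KERNEL):
    """Return (lines, pixels per line) of a region of interest."""
    lines = y_end - y_start + 1  # both line numbers are inclusive
    pixels_per_line = nb_of_kernels * pixels_per_kernel
    return lines, pixels_per_line


def sensor_clock_period_ns(exposure_ns, exposure_cycles):
    """Sensor clock period implied by an exposure given in ns and in clock cycles."""
    return exposure_ns / exposure_cycles
```

With the printed values, roi_size(1010, 1260, 21) gives 251 lines of 672 pixels, and sensor_clock_period_ns(776960, 40000) gives a sensor clock period of 19.424 ns.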
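[Figure 6.2: block diagram of the image readout with the blocks user input, SPI wrapper, exposure
control, LUPA-3000, LVDS receiver and image builder, the signals SPI and exposure, and the region of
interest marked in the input image]
Figure 6.2.: Abstract illustration of a practical image readout; the ROI is indicated by white
lines in the input image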
The received image is slightly darker than the original one; this depends on the exposure duration.
A shorter exposure pulse would result in a brighter image. The settings for the image exposure
and readout are programmed through the SPI by the stimulator, where they are hard coded.
The test was performed with different delays on the channels; however, the only visible indication
that a delay was compensated is the different tap settings in the receiver. The byte offset
compensation could be observed in the fill level of the FIFOs. The image builder does not notice
any asynchronicity because the output of the receiver FIFOs is always completely balanced.
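How a tap setting can follow from a delay is sketched below in Python. The sketch assumes a common approach: the receiver sweeps the input delay taps, notes for each tap whether the training pattern is sampled correctly, and then selects the middle of the widest stable window. This is not necessarily the calibration algorithm of the receiver described in this thesis, and the function name is hypothetical.

```python
def center_of_widest_window(stable):
    """Return the tap index in the middle of the longest run of stable taps.

    stable: sequence of booleans; stable[i] is True when the data sampled
    with delay tap i matched the expected training pattern.
    Returns None when no tap is stable.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel that closes a run reaching the last tap.
    for tap, ok in enumerate(list(stable) + [False]):
        if ok and run_start is None:
            run_start = tap
        elif not ok and run_start is not None:
            run_len = tap - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2
```

A channel with a different delay shifts the stable window and therefore ends up with a different tap. The sketch does not handle a window that wraps around the end of the tap range.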
7. Conclusion
7.1. Summary
This thesis dealt with the interconnection of a LUPA-3000 image sensor and an FPGA. The challenging
requirements of the LVDS data interface were successfully turned into a VHDL implementation of
the receiver component. The basics of an LVDS connection were discussed to give an introduction to
the problems that have to be dealt with.
For testing purposes, an LVDS communication line with a transmitter and a receiver component was
established. The transmitter component can generate asynchronous output signals, which are not
common in source-synchronous systems; these asynchronous signals were used to test the robustness
of the LVDS receiver. The receiver uses a calibration algorithm to ensure that the incoming signals
are sampled at an ideal position. In addition, the word alignment ensures that deserialization is
done correctly. Finally, the byte offsets among the channels are compensated by the use of FIFOs.
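The idea behind the word alignment can be sketched in a few lines of Python. The sketch models the alignment as a plain rotation of the deserialized word, which is a simplification of the real deserializer. The word width of 8 bits and the training word 0x2B are hypothetical values chosen for illustration, and both function names are assumptions.

```python
def rotate_left(word, shift, width):
    """Rotate a width-bit word to the left by shift bit positions."""
    shift %= width
    mask = (1 << width) - 1
    return ((word << shift) | (word >> (width - shift))) & mask


def find_bitslip(received, training_word, width=8):
    """Return how many one-bit rotations turn received into training_word.

    Returns None when no rotation matches, for example when the channel
    does not carry the training pattern at all.
    """
    for slips in range(width):
        if rotate_left(received, slips, width) == training_word:
            return slips
    return None
```

For example, a training word 0x2B that arrives as 0x65 is recovered after three one-bit shifts. The training word should have no rotational symmetry, otherwise more than one shift count would match.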
A design that contains the transmitter and receiver components was simulated and finally tested on
an FPGA evaluation board. The FPGA IOs were connected using wire pairs of different lengths
to evaluate the influence of the interconnection length. The design running on the FPGA
was verified using the software logic analyzer ChipScope, to which a comprehensive introduction
was given. With this logic analyzer, the LVDS communication line was verified to ensure that a
working interconnection was established. The evaluation has shown that the receiver can handle
asynchronous signals that are arbitrarily delayed. In addition, wires of different lengths were used
to evaluate the propagation delay. This delay is not negligible, so a circuit board designer should
take it into account.
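A rough estimate shows why the propagation delay matters. The following Python sketch computes the delay of a wire from its length and expresses the skew between two wires in bit periods. The velocity factor of 0.66 and the example lengths and bit rate are assumptions chosen for illustration; they are not measurements from this work.

```python
SPEED_OF_LIGHT_M_PER_NS = 0.299792458  # speed of light in metres per nanosecond


def propagation_delay_ns(length_m, velocity_factor=0.66):
    """Delay of a wire of the given length; velocity_factor is an assumed value."""
    return length_m / (velocity_factor * SPEED_OF_LIGHT_M_PER_NS)


def skew_in_bit_periods(length_a_m, length_b_m, bit_rate_mbps, velocity_factor=0.66):
    """Delay difference between two wires, expressed in bit periods."""
    skew_ns = abs(propagation_delay_ns(length_a_m, velocity_factor)
                  - propagation_delay_ns(length_b_m, velocity_factor))
    bit_period_ns = 1000.0 / bit_rate_mbps
    return skew_ns / bit_period_ns
```

Under these assumptions one metre of wire adds about 5 ns, and a length difference of 0.4 m at 400 Mbit/s already corresponds to roughly 0.8 bit periods.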
Furthermore, a software model of the image sensor was developed in SystemC, a C++ class
library for hardware modeling. The LUPA-3000 model behaves like the original image sensor and was
used to verify the functionality of the interface design that is connected to the sensor. The interface
design, which was developed in VHDL, includes an adapted version of the receiver component, a
component that controls the exposure signals, and a controller that connects to the configuration
interface (SPI) of the image sensor.
Finally, a SystemC testbench was created to perform a co-simulation of the LUPA-3000 model and
the corresponding interface design. With this testbench it is possible to model the complete system
behavior and to reconstruct an image file from the data transferred to the LVDS receiver in the VHDL
design.
7.2. Future work
Regarding the LVDS communication on the FPGA, it would be interesting to use an oscilloscope to see
the real data eye. This would make it possible to evaluate the signal quality for different clock speeds
and cable lengths. In addition, deeper research on the dependency between interconnection length and
signal delay could help to explain the simulation results for different cable lengths more precisely.
Furthermore, the influence of the voltage level used in the LVDS connection would be interesting to
investigate: do higher voltage levels increase the signal quality and allow higher speeds?
The LUPA-3000 model implements a brightness adjustment of the image that depends on the exposure
duration. There is no guarantee that the adjustment algorithm behaves like the original sensor, so
it should be adjusted to match the original sensor. Moreover, the dual slope exposure currently
performs the same brightness adaptation as the single slope mode, whereas in practice this mode
increases the dynamic range of the image; this behavior should be implemented, too. In
addition, the pixel control signals Vmem, precharge, sample and pixel reset are active during the frame
overhead time (FOT) and influence the image. Unfortunately, it is not known what kind of influence
these signals have on the image; practical tests with the image sensor could clarify their effect
and help to improve the software model.
Finally, the VHDL implementation of the interface design should be tested on the camera hardware.
For this, it is necessary to build another VHDL design that is connected to the interface design
developed in this work. This extended design has to provide an external interface for the image data
readout and a control input that connects to some kind of user interface.
A. Abbreviations and Acronyms
ADC analog digital converter
BUFG global clock buffer
BUFR regional clock buffer
CRC cyclic redundancy check
DCM digital clock manager
DDR double data rate
EOL end of line
FIFO first in, first out buffer
FOT frame overhead time
FPGA Field Programmable Gate Array
ICON Integrated Controller core
ILA Integrated Logic Analyzer core
IOB input/output block of Xilinx FPGAs
LSB least significant bit
MSB most significant bit
PCB printed circuit board
PLL phase locked loop
ROI region of interest
ROT row overhead time
SOF start of frame
SOL start of line
SPI Serial Peripheral Interface
LVDS Low-Voltage Differential Signaling
B. FPGA pin mapping
The inputs and outputs of the Toplevel module are physically mapped to the FPGA pins. This
mapping is specified in a *.ucf file. The clock input is mapped to pin AH15, which is driven by the
clock generator at 100 MHz. The global reset, which is active-low, is connected to pin J14; internally
the reset is handled as active-high.
The LVDS outputs and inputs are mapped to the expansion header that contains the differential
FPGA I/Os. This header is the J4 connector, and its pins are located in I/O banks 11 and 13.
The connector contains normal differential I/Os and two differential clock inputs¹, but these clock
inputs can only drive regional clocks. To drive a global clock buffer, a special global clock input² is
necessary; such a differential input is available on the J5 connector. Information about the Virtex-5
I/Os of the different devices is available in [Xil09c], the schematic for the ML506 FPGA board is given
in [Xil08], and the layout of the expansion headers is shown in [Xil08, page 11]. The complete pin
mapping for the differential I/Os is described in Table B.1.
The first two columns describe the pin numbers in the expansion header, the schematic net name is
the full name of the corresponding pin and the FPGA pin describes the pin position in the FPGA
package, similar to a chessboard (digits for the columns and letters for the rows). The PlanAhead tool
of the ISE Design Suite helps to generate the *.ucf file and shows the FPGA package graphically.
¹ Known as CC pins, as described in [Xil09c].
² Known as GC pins, as described in [Xil09c].
Differential                                            Channel
Pin Pair      Schematic Net Name        FPGA pin      Mapping   Direction  Bank
Pos    Neg    Pos          Neg          Pos    Neg
4      2      HDR2 4       HDR2 2       L34    K34    0         out        11
8      6      HDR2 8       HDR2 6       K33    K32    1         in         11
12     10     HDR2 12      HDR2 10      P32    N32    2         in         11
16     14     HDR2 16      HDR2 14      T33    R34    3         in         11
20     18     HDR2 20      HDR2 18      R33    R32    4         out        11
24     22     HDR2 24      HDR2 22      U33    T34    5         out        11
28     26     HDR2 28      HDR2 26      U32    U31    sync      out        11
32     30     HDR2 32      HDR2 30      V32    V33    clk       out        13
36     34     HDR2 36      HDR2 34      W34    V34    -         -          13
40     38     HDR2 40      HDR2 38      Y33    AA33   1         out        13
44     42     HDR2 44      HDR2 42      AF34   AE34   0         in         13
48     46     HDR2 48      HDR2 46      AF33   AE33   2         out        13
52     50     HDR2 52      HDR2 50      AC34   AD34   3         out        13
56     54     HDR2 56      HDR2 54      AC32   AB32   4         in         13
60     58     HDR2 60      HDR2 58      AC33   AB33   5         in         13
64     62     HDR2 64      HDR2 62      AN32   AP32   sync      in         13
27     28     GPIO LED 2   GPIO LED 4   G15    G16    clk       in         3
Table B.1.: Pin mapping of the expansion connectors (J4 and J5) on the ML506 evaluation board
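As a sketch of how rows of Table B.1 translate into the *.ucf file, the following Python snippet generates LOC constraints for differential pairs. The pin names are taken from the first two rows of the table; the net names (lvds_out0, lvds_in1) and the _p/_n suffixes are hypothetical and have to be replaced by the port names of the actual Toplevel module.

```python
def ucf_loc_lines(net_base, pos_pin, neg_pin):
    """Return the two UCF LOC constraints for one differential pair."""
    return [
        'NET "{0}_p" LOC = "{1}";'.format(net_base, pos_pin),
        'NET "{0}_n" LOC = "{1}";'.format(net_base, neg_pin),
    ]


# First two rows of Table B.1; the net names are hypothetical.
PAIRS = [
    ("lvds_out0", "L34", "K34"),
    ("lvds_in1", "K33", "K32"),
]


def build_ucf(pairs):
    """Build the text of a UCF fragment for a list of differential pairs."""
    lines = []
    for net_base, pos_pin, neg_pin in pairs:
        lines.extend(ucf_loc_lines(net_base, pos_pin, neg_pin))
    return "\n".join(lines)
```

Calling build_ucf(PAIRS) yields four NET lines. Further constraints, such as the I/O standard of the pins, would have to be added separately.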
C. SystemC and VHDL co-simulation
To simulate the given code in ModelSim, the C++ compiler for ModelSim must be installed. If gcc
for ModelSim is not installed (the gcc-4.2.1-mingw32 folder is missing from the ModelSim directory),
it can be obtained from the same site and through the same process as ModelSim itself. To
download it:
1. Go to http://www.model.com/content/modelsim-downloads
2. Click on the link to ModelSim SE
3. Click on the Downloads tab
4. Click on the Download link
5. Complete the registration form, then click on the Request Download button
6. Click on the ftp link
7. Download gcc-4.2.1-mingw32
8. Unzip the file into the ModelSim installation directory
9. Set CppPath in the [sccom] section of the modelsim.ini file in the installation directory,
e.g., CppPath = C:\modeltech_6.5\gcc-4.2.1-mingw32\bin\g++
The given design was tested with ModelSim SE PLUS 6.5 and Matlab 2008a. The following script can
be used to compile the SystemC part of the design. The VHDL components should be compiled in
advance, using the standard compilation flow in ModelSim. In addition, the XilinxCoreLib has to be
compiled and added to the ModelSim libraries. Xilinx provides the compxlib tool for compiling these
libraries; it can be started from the command line by invoking compxlib.
First, all open simulations are closed and the old SystemC objects are deleted.
quit -sim
vdel -allsystemc
Then the VHDL Toplevel file is compiled, and the corresponding SystemC module is generated with the
scgenmod command, which maps the VHDL data types to SystemC types. In this case std_logic is mapped
to bool and std_logic_vector is mapped to sc_uint. The output is written to a header file in the
directory given by the SYSTEMC_SRC system variable, where all SystemC files are located.
vcom ./src_vhdl/Toplevel.vhd
scgenmod -map std_logic=bool -map std_logic_vector=sc_uint vhdltop > $env(SYSTEMC_SRC)/vhdl_toplevel.h
Now all *.cpp files are compiled, including the necessary Matlab headers.
sccom -I"C:/Programme/matlab2008a/extern/include" -g $env(SYSTEMC_SRC)/*.cpp
After successful compilation all object files are linked together, including the necessary Matlab DLLs.
sccom -L "C:/Programme/matlab2008a/bin/win32/" -l libeng -l libmx -l libmat -link
Finally, the simulation with the SystemC testbench is loaded.
vsim -do sc_wave.do -t ps -novopt work.mti_top
run 1200 us
All commands described above are available as a compilation script named compileSC.do in the project
directory. The script requires two system variables to be set: SYSTEMC_SRC, the directory where all
SystemC files are located, and MASTER_DIR, which points to the directory where all subproject folders
are located. In addition, the path of the Matlab installation has to be adapted in the script. General
information about SystemC and VHDL co-simulation with ModelSim is given in [Men04].
Bibliography
[Bur06] Greg Burton. XAPP855: 16-Channel, DDR LVDS Interface with Per-Channel Alignment.
Xilinx, October 2006. v1.0.
[Cyp09] Cypress Semiconductor Corporation. LUPA-3000 Datasheet, August 2009. advance.
[Gmb10] The Imaging Source Europe GmbH. Multi slope. Website, 22 April 2010. The Imaging
Source Europe GmbH.
[IEE96] IEEE. IEEE standard for Low-Voltage Differential Signals (LVDS) for Scalable Coherent
Interface (SCI), Jul 1996.
[KGA03] Sean Koontz, Maria George, and Markus Adhiwiyogo. System Interface Timing Parameters.
Xilinx, April 2003. v1.0.
[Men04] Mentor Graphics. SystemC Verification with ModelSim, 2004.
[Nat08] National Semiconductor. LVDS Owner’s Manual, 2008.
[Ste03] Mike Stein. Crossing the abyss: asynchronous signals in a synchronous world. EDN
Electronics Design, Strategy, News, 310388:59 – 69, July 2003.
[Xil08] Xilinx. ML505, ML506, ML507 Schematics, January 2008.
[Xil09a] Xilinx. ChipScope Pro 11.3 Software and Cores - User Guide, 11.3 edition, September 2009.
v11.3.
[Xil09b] Xilinx. ML506 Evaluation Platform User Guide, October 2009. v3.1.1.
[Xil09c] Xilinx. Virtex-5 FPGA Packaging and Pinout Specification, December 2009. v4.7.
[Xil09d] Xilinx. Virtex-5 FPGA User Guide, November 2009. v5.2.