HIGH SPEED MMDS TRANSCEIVER IMPLEMENTATION WITH GPS CLOCK SYNCHRONIZATION

A Thesis

Submitted to the Faculty of Graduate Studies and Research

in Partial Fulfillment of the Requirements

For the Degree of

Doctor of Philosophy

in Engineering

UNIVERSITY OF REGINA

BY

Anh van Dinh

Regina, Saskatchewan

August 2000

© Copyright 2000: A. V. Dinh

National Library of Canada

Acquisitions and Bibliographic Services

395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


ABSTRACT

This thesis presents the first phase in the hardware implementation of a high-speed transceiver to be used in a Multi-channel Multi-point Distribution Services (MMDS) system. Based on standard specifications, various building blocks were implemented using FPGA prototypes. Among these blocks, the forward error correction (FEC) block, which protects data integrity, is the most complicated and expensive to implement. This block includes the Reed-Solomon (RS) codec and byte interleaving to correct both random and burst errors caused by the channel. The high-speed RS decoder was designed using highly efficient algorithms and low-latency VLSI circuits that implement arithmetic in the Galois field GF(2^8). Results show that a data rate of 80 Mbs was obtained using FPGA prototypes. A data rate of 200 Mbs should be achievable when ASICs are developed from the synthesizable Verilog code written for this thesis.

In addition to a high bit rate, the MMDS system also requires robust synchronization in order to transmit a high-speed data stream and to ensure data integrity. Timing and synchronization, covering both clock and frame synchronization, are critical in system design. In the absence of SONET and SDH, a precise frequency from the Global Positioning System (GPS) is used as the system reference clock instead of crystal oscillators. This frequency and time have very high accuracy and are directly and continuously traceable to Coordinated Universal Time (UTC). A 10 MHz reference clock with TTL output levels was generated from a GPS clock. This clock was used in the development of the high-speed MMDS system and is proposed for system clock synchronization. Precision timing combined with a word synchronization scheme makes the MMDS system simple, robust, low cost and reliable.

ACKNOWLEDGEMENTS

I wish to express my gratitude to the organizations and people whose support, supervision, guidance, advice and encouragement were helpful to this research.

First, I would like to thank TRLabs for its continued support through financial assistance, advice, supervision and the facilities required for the research. The financial assistance from the Natural Sciences and Engineering Research Council of Canada from September 1997 to October 1999 is highly appreciated. Various graduate study scholarships provided by the Faculty of Graduate Studies and Research of the University of Regina are also acknowledged.

Second, I would like to extend my sincere gratitude to my advisors, Dr. R. J. Palmer, University of Regina, and Dr. R. J. Bolton, University of Saskatchewan, who provided support, advice, assistance and technical help during the research. Special thanks go to Dr. R. Mason, Carleton University, for his extensive help and advice from the beginning to the end of this research. I thank Mr. J. Toth of Edge Networks Inc. in Winnipeg for initiating this project. I also thank Mr. A. Kostiuk for his help during my time at TRLabs Regina. The work of Mr. N. McLeod in synthesizing the GF(2^m) inversion circuits using CMOS IS 5 technology at TRLabs Saskatoon is gratefully acknowledged. I also appreciate the help of the faculty members and staff of the Faculty of Engineering of the University of Regina over my years of study.

Finally, I owe a great debt to my family, especially my wife, over the past five years since I started my graduate studies. I thank my family for their patience, understanding, encouragement and unfailing moral support over the long period of time spent on this research.

TABLE OF CONTENTS

Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Acronyms
1. INTRODUCTION
1.1 The MMDS technology
1.2 Research objectives, contribution and methodology
1.2.1 System modeling and simulation
1.2.2 Hardware description language for basic building blocks
1.2.3 Synchronization
1.3 Contributions
1.4 Dissertation outline
2. DAVIC VERSION 1.2 SPECIFICATIONS AND MMDS SYSTEM
2.1 Digital Audio-Visual Council Version 1.2 specifications
2.2 MMDS system
2.2.1 Baseband interface, Synchronization and Randomization
2.2.2 Reed-Solomon codec
2.2.3 Convolutional interleaver/de-interleaver
2.2.4 Byte to symbol mapping
2.2.5 Differential encoder/decoder
2.2.6 TCM encoder/decoder
2.2.7 Quadrature amplitude modulation
2.2.8 QAM constellation mapping
2.2.9 Baseband filtering
2.2.10 Radio frequency interface
2.2.11 System synchronization
3. MMDS TRANSMITTER IMPLEMENTATION
3.1 Baseband interface, Synchronization byte inversion and Randomization
3.2 Reed-Solomon encoder
3.3 Convolutional interleaver
3.4 Byte-to-m-tuple conversion and differential encoder
3.4.1 Byte-to-m-tuple conversion
3.4.2 Differential encoding
3.5 QAM mapping
4. MMDS RECEIVER IMPLEMENTATION
4.1 QAM de-mapping
4.2 Differential decoder and m-to-byte conversion
4.2.1 Differential decoder
4.2.2 M-to-byte conversion
4.3 Convolutional de-interleaver
4.4 Reed-Solomon decoder
4.4.1 Syndrome calculation
4.4.2 Error locator polynomial
4.4.3 Error magnitude polynomial calculation and Chien search
4.4.4 Error value generation
4.4.4.1 Low latency power-sum circuit in GF(2^8)
4.4.4.2 Low latency exponential circuit in GF(2^8)
4.4.4.3 Low latency inversion and division circuits in GF(2^8)
4.4.5 Correction
4.4.6 High-speed RS decoder design summary
4.5 De-randomization
5. MMDS SYNCHRONIZATION USING GPS CLOCK
5.1 The need for synchronization
5.2 GPS clock derivation and application in a MMDS system
5.2.1 GPS clock versus crystal oscillators
5.2.2 GPS clock
5.2.3 Using GPS clock in MMDS transceiver prototype
5.3 MMDS system synchronization
5.3.1 Clock synchronization
5.3.2 Frame synchronization
6. RESULTS
6.1 System simulation
6.2 Transceiver FPGA implementation
6.3 GPS clock testing
7. CONCLUSIONS, CONTRIBUTIONS AND FUTURE WORK
7.1 Conclusions
7.2 Contributions
7.3 Future work
REFERENCES
APPENDIX A: QAM constellations
A.1 16 QAM
A.2 64 QAM
A.3 256 QAM
APPENDIX B: Galois Field
B.1 Galois field
B.2 Construction of GF(2^m)
B.3 Vector representation of GF(2^m)
APPENDIX C: Algorithm to Find Error Locator Polynomial for RS Decoder
C.1 The Berlekamp-Massey algorithm
C.2 The Euclidean algorithm

LIST OF FIGURES

Figure 1.1 A typical MMDS installation
Figure 1.2 High-speed MMDS system
Figure 2.1 Transceiver block diagram
Figure 2.2 Framing structure of MPEG2-TS
Figure 2.3 Conceptual diagram of the convolutional interleaver/de-interleaver
Figure 2.4 Byte to m-tuple conversion for 64 QAM
Figure 2.5 Implementation of the differential encoding of the two MSBs
Figure 2.6 Detailed reference model of the TCM encoder
Figure 3.1 Interface, synchronization and synchronization inversion
Figure 3.2 Parallel to serial conversion and serial to parallel conversion
Figure 3.3 Randomization and de-randomization
Figure 3.4 Reed-Solomon encoder architecture
Figure 3.5 Interleaving 64 QAM
Figure 3.6 Differential encoding of the two MSBs
Figure 3.7 Mapping and de-mapping of 256 QAM
Figure 4.1 Schematic diagram for the differential decoder
Figure 4.2 De-interleaving for 64 QAM
Figure 4.3 Reed-Solomon decoder block diagram
Figure 4.4 Syndrome calculation
Figure 4.5 Multiplier in GF(2^8)
Figure 4.6 Chien search algorithm
Figure 4.7 Schematic diagram for p4 circuit in GF(2^8)
Figure 4.8 Low latency inversion and division architecture in GF(2^8)
Figure 5.1 Model of a communication system
Figure 5.2 Generic GPS receiver block diagram
Figure 5.3 Frequency based GPS clock
Figure 5.4 Time based GPS clock generation
Figure 5.5 GPS clock TTL output
Figure 5.6 MMDS system synchronization using GPS clock
Figure 5.7 Early/late-gate data symbol synchronization
Figure 6.1 Equipment setup
Figure 6.2 Matlab and Simulink simulation setup
Figure 6.3 Input and output waveforms of the Simulink simulation
Figure 6.4 RS encoder waveform
Figure A.1 Variety of QAM constellations
Figure A.2 16 QAM constellation diagram
Figure A.3 64 QAM constellation diagram
Figure A.4 256 QAM constellation diagram

LIST OF TABLES

Table 2.1 Conversion of QAM constellations
Table 6.1 Prototype resources and operation frequency of the transceiver
Table 6.2 Comparison of the GF(2^8) arithmetic implementation
Table 6.3 Comparison of the GF(2^m) inversion implementation
Table 6.4 GF(2^m) field generating polynomials p(x)
Table 6.5 Delay time of the inversion circuit in GF(2^m)
Table B.1 Table of GF(2^8) generated by p(x) = x^8 + x^4 + x^3 + x^2 + 1

LIST OF ACRONYMS

ADC: Analog to Digital Converter
AM: Amplitude Modulation
ASIC: Application Specific Integrated Circuit
AWGN: Additive White Gaussian Noise
BER: Bit Error Rate
B-M: Berlekamp-Massey
C/A: Coarse Acquisition
CATV: Conventional Cable TV
DAVIC: Digital Audio-Visual Council
dB: decibel
DBS: Direct Broadcast Satellite
DSP: Digital Signal Processing
EDAC: Error Detection And Correction
FCC: Federal Communications Commission
FEC: Forward Error Correction
FET: Field Effect Transistor
FIFO: First-In First-Out
FPGA: Field Programmable Gate Array
GF: Galois Field
GHz: Gigahertz
GPS: Global Positioning System
HDL: Hardware Description Language
HRM: High Reliability Marker
IF: Intermediate Frequency
IP: Intellectual Property
ITFS: Instructional Television Fixed Services
JPEG: Joint Photographic Experts Group
LFSR: Linear Feedback Shift Register
LSB: Least Significant Bit
Mbs: Megabits per second
MDS: Multi-point Distribution Services
MHz: Megahertz
MMDS: Multi-channel Multi-point Distribution Services
MPEG: Moving Picture Experts Group
MPEG2-TS: MPEG2 Transport Stream
MSB: Most Significant Bit
MUX: Multiplexer
NRZ: Non-Return-to-Zero
PID: Packet Identifier
PLL: Phase-Locked Loop
PON: Passive Optical Network
ppm: parts per million
PPS: Pulse Per Second
PN: Pseudorandom Noise
PRBS: Pseudo Random Binary Sequence
PSK: Phase Shift Keying
QAM: Quadrature Amplitude Modulation
RAM: Random Access Memory
RF: Radio Frequency
ROM: Read Only Memory
RMS: Root Mean Square
RS: Reed-Solomon
SA: Selective Availability
SDH: Synchronous Digital Hierarchy
SNR: Signal to Noise Ratio
SONET: Synchronous Optical Network
TCM: Trellis Coded Modulation
USNO: US Naval Observatory
UTC: Coordinated Universal Time
VCO: Voltage Controlled Oscillator
VHDL: VHSIC (Very High Speed Integrated Circuit) Hardware Description Language
VHF: Very High Frequency
VLSI: Very Large Scale Integration

CHAPTER 1

INTRODUCTION

1.1 The MMDS technology

High-speed wireless data communication in general, and Multi-channel Multi-point Distribution Services (MMDS) in particular, are becoming increasingly important as a means of delivering high-speed data services, because wired solutions are prohibitively expensive in many environments. MMDS provides cable television and data services transmitted through the air at microwave frequencies.

MMDS uses a band close to that of the Instructional Television Fixed Service (ITFS) (i.e., the 2500-2686 MHz band) and has a configuration similar to ITFS. The antenna is usually omnidirectional to reach all subscribers in a "coverage circle", hence the name Multi-point Distribution Services (MDS). This service was the beginning of what is known as "wireless cable". While ITFS was a distance learning tool especially suited to delivering education to business and industry, individuals, organizations and other institutions, MDS was conceived as an alternative or supplement to conventional cable TV (CATV). The Federal Communications Commission (FCC) reallocated eight of the lightly used ITFS channels for use by commercial over-the-air pay-TV operations. This allocation allowed simultaneous broadcast of many more channels in addition to MDS. The practice of using these new channels became MMDS: Multi-channel Multi-point Distribution Services.

Using a network consisting of a main transmitter and multiple repeaters, MMDS microwave signals are transmitted to small receiving antennas on subscribers' rooftops, as shown in Figure 1.1. The transmitter site is at the center of the coverage area, on a pre-existing tower or on top of a tall building, and the transmitter antenna is located at the highest feasible elevation for greater effective coverage. A typical MMDS installation requires that line-of-sight be maintained between the transmitter antenna and the receiver antenna. The RF signal is then routed from the antenna into the home through a coaxial cable to a set-top receiver.

Figure 1.1 A typical MMDS installation [1]

The cable studio, along with the headend, receives programming from a variety of sources: broadcasts of local TV stations, playback of video tapes, direct "live" feeds from various locations, multiple satellite dishes receiving TV signals from around the world, and digital data from other sources. Each source is assigned a channel number, processed to improve quality and then encoded. These programs are up-converted from regular VHF channel frequencies to the RF band and supplied to the wireless channel transmitter, which amplifies and broadcasts the information on microwave channels.

MMDS uses either analog or digital technology to deliver information to subscribers. For cable television, digital signals yield laser-disc picture quality as well as CD-quality sound. By using data compression techniques, digital technology increases the number of channels that can be transmitted over the existing wireless spectrum to at least six times the current analog capacity. MMDS can also offer Internet access and data services to customers. Depending on the terrain, a single MMDS transmitter can serve subscribers within 50 kilometers.

Wireless MMDS has many advantages over wired cable systems: no expensive, time-consuming cable installation, higher signal reliability, higher channel capacity, higher availability, lower operating costs, lower capital requirements, and market competitiveness. Given that signals are transmitted in the gigahertz range (S band), picture reception is not materially affected by weather interference such as rain fade [2].

One distinct advantage of MMDS is speed. MMDS is capable of handling the accelerating demand for high-speed data and helps alleviate the bandwidth-starved last-mile bottleneck. However, to remain competitive with Direct Broadcast Satellite (DBS), MMDS has to provide a cheaper set-top box at the subscriber site and higher speed for data services.

A frequency band has been allocated for MMDS. In March 1997, the FCC in the United States licensed thirty channels, each with a 6 MHz bandwidth, in the frequency ranges of 2150-2162 MHz and 2500-2689.875 MHz. This licensing allowed more extensive research, development and marketing of the services provided by MMDS. However, early research was mostly in the area of fast conversion from analog to digital MMDS to accommodate more channels for the customers connected to MMDS networks. Data rates are still in the lower range (less than 60 Mbs for the downlink and 20 Mbs for the uplink).
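The 33-channel total stated in the next paragraph can be checked against the two frequency ranges, assuming the 6 MHz channels tile each range contiguously from its lower edge. The sketch below uses the 2500-2686 MHz ITFS band edge quoted earlier in this section; the function name `channels_in_band` is hypothetical.

```python
import math


def channels_in_band(low_mhz: float, high_mhz: float, width_mhz: float = 6.0) -> int:
    """Whole channels of width_mhz that fit contiguously between low_mhz and high_mhz."""
    return math.floor((high_mhz - low_mhz) / width_mhz)


# Frequency ranges quoted in the text, in MHz (assumed contiguous 6 MHz channel plan).
BANDS = [(2150.0, 2162.0), (2500.0, 2686.0)]

if __name__ == "__main__":
    per_band = [channels_in_band(lo, hi) for lo, hi in BANDS]
    print(per_band, sum(per_band))  # prints [2, 31] 33
```

Using the 2689.875 MHz upper edge instead of 2686 MHz still yields 31 whole channels in the upper range, so the total is unchanged.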

Using the designated frequencies, a total of 33 channels are available, each 6 MHz wide. By using video compression and spectrally efficient digital modulation, the number of programs delivered in this bandwidth can be dramatically increased [3]. Audio and video information can be digitized and compressed, and many programs can be multiplexed into a single MPEG2 transport stream. Forward Error Correction (FEC) is used to decrease the system bit error rate (BER) and obtain higher data reliability. The output data stream is modulated onto a 44 MHz IF carrier by a 64- or 256-point Quadrature Amplitude Modulator (QAM). Using a higher-order QAM raises the data rate by increasing the number of bits per symbol. The IF signal must be upconverted to the GHz range before transmission. Mixers and amplifiers with a flat response are required in the RF stage, as well as a low phase-noise local oscillator. The best way to achieve low phase-noise performance is to multiply up the frequency of a voltage-controlled crystal oscillator (VCXO) phase-locked to a GPS clock reference [3]. At the transmitter stage, a GaAs FET power amplifier should be used, since these devices have lower distortion at these frequencies [3].
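The effect of constellation size on data rate follows from R_b = R_s log2(M): each M-QAM symbol carries log2(M) bits. A minimal sketch, using a hypothetical symbol rate chosen only for illustration and ignoring FEC and TCM overhead; the function names are hypothetical.

```python
def bits_per_symbol(m: int) -> int:
    """log2(M) for an M-point constellation; M must be a power of two."""
    if m < 2 or m & (m - 1):
        raise ValueError("constellation size must be a power of two >= 2")
    return m.bit_length() - 1


def gross_bit_rate(symbol_rate: float, m: int) -> float:
    """Raw bit rate R_b = R_s * log2(M), before FEC/TCM overhead is removed."""
    return symbol_rate * bits_per_symbol(m)


if __name__ == "__main__":
    rs = 5e6  # hypothetical symbol rate of 5 Msymbol/s, for illustration only
    for m in (64, 256):
        print(f"{m}-QAM: {bits_per_symbol(m)} bits/symbol, "
              f"{gross_bit_rate(rs, m) / 1e6:.0f} Mbit/s")
```

At a fixed symbol rate, moving from 64 QAM (6 bits/symbol) to 256 QAM (8 bits/symbol) raises the raw rate by a factor of 8/6, at the cost of closer constellation points and therefore a higher required SNR.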

The proposed MMDS system is shown in Figure 1.2. A base station broadcasts signals through the air. Receivers at the terminals pick up the signals and process them to recover digital data in the different designated channels. The system RF link is in the S band (GHz range); the microwave frequency is 2.5 GHz, which is allocated by the FCC for MMDS use. The modulation technique used in the system is either 64 or 256 QAM. A fundamental aspect of the system will be the use of software radio techniques with wideband sampling and multi-channel digital synchronization.

Figure 1.2 High-speed MMDS system (figure labels: 200 Mbits/s, 64/256 QAM, DAVIC 1.2 specification; terminal)

The design is based on the Digital Audio-Visual Council (DAVIC) Version 1.2 (1997) specification [4]. The system is intended to carry Moving Picture Experts Group 2 Transport Stream (MPEG2-TS) data packets. The transceiver hardware is located in the base station and at the receiving terminals. The transceiver includes both a transmitter and a receiver co-existing in the same ASIC; a transmitter is needed at the receiving terminal so that a two-way communications link can be established if required. Both uplink and downlink require synchronization to ensure the integrity of the transmitted data. Synchronization implies that all system elements must run at the same clock rate. If the MMDS system is connected to global communication networks, its clock must be synchronized to a standard time, because global timing must be maintained for the network to communicate effectively within itself and with other networks.
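To give a sense of why all elements must share one clock rate, the sketch below computes how quickly two free-running clocks drift apart by whole bits. The 10 ppm frequency mismatch in the example is a hypothetical oscillator tolerance chosen for illustration, not a measured value, and the function names are hypothetical.

```python
def slip_rate(bit_rate: float, offset_ppm: float) -> float:
    """Bits per second by which two clocks differing by offset_ppm drift apart."""
    return bit_rate * offset_ppm / 1e6


def time_to_one_bit_slip(bit_rate: float, offset_ppm: float) -> float:
    """Seconds until the accumulated drift reaches one full bit period."""
    return 1.0 / slip_rate(bit_rate, offset_ppm)


if __name__ == "__main__":
    rate = 200e6   # intended MMDS bit rate, 200 Mbs
    offset = 10.0  # hypothetical oscillator mismatch in ppm (assumption)
    print(f"drift: {slip_rate(rate, offset):.0f} bits/s, "
          f"one bit slip every {time_to_one_bit_slip(rate, offset) * 1e3:.2f} ms")
```

Under these assumptions, the two ends of a 200 Mbs link would disagree by about 2000 bits every second, or one bit every 0.5 ms, which illustrates why a common, accurate reference clock is attractive.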

1.2 Research objectives and methodology

The main objective of this research is to develop a low-cost, reliable, high-speed MMDS system. The low-cost solution will provide subscribed customers with high-speed data services without installing fiber or cables, which would be especially attractive in sparsely populated areas. The high-speed MMDS system is to be achieved through the investigation of a sophisticated VLSI transceiver and of system synchronization. The system will provide customers with small set-top boxes containing a highly integrated receiver. This research addresses the need for the implementation of a high-speed MMDS system and has two objectives:

1) To design a high-speed transceiver for the wireless MMDS data link with an intended bit rate of 200 Mbs. A high level of integration is required to reduce the cost of the terminals. Of primary concern is the potential for a low-cost VLSI implementation of the baseband portion of the transceiver in a small number of ASICs.

2) To design a framework for the use of GPS clock and timing to perform digital network synchronization for the communications link in the MMDS system.

Presently, the speed of MMDS is 40 Mbs due to technology and design limitations. Higher speed requires a higher operating frequency in the transceiver building blocks. Among the blocks, FEC is the major speed limitation because of the low bit rate of error correction decoders. Therefore, increasing the throughput of the FEC block and integrating it successfully into the system allow a higher system data rate.
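The link between decoder clock frequency and system bit rate can be made concrete with a short calculation. It assumes, for illustration only, a datapath that consumes a fixed number of bits per clock cycle (one byte per cycle by default, in keeping with the byte-oriented RS code); the function name is hypothetical.

```python
def required_clock_hz(bit_rate: float, bits_per_cycle: int = 8) -> float:
    """Clock frequency for a datapath that accepts bits_per_cycle bits each cycle."""
    if bits_per_cycle <= 0:
        raise ValueError("bits_per_cycle must be positive")
    return bit_rate / bits_per_cycle


if __name__ == "__main__":
    for rate in (40e6, 80e6, 200e6):  # bit rates mentioned in this thesis
        print(f"{rate / 1e6:.0f} Mbs -> {required_clock_hz(rate) / 1e6:.1f} MHz byte clock")
```

Under this assumption, a byte-wide decoder would need a 25 MHz clock to sustain 200 Mbs; processing more bits per cycle lowers the clock requirement but enlarges the circuit.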

One of the major topics of concentration in this research is the development of a fast FEC block and its integration into the high-speed transceiver. This development requires theoretical and architectural improvements in applying abstract algebra to the design of error correction integrated circuits. New clocking schemes, low-latency circuits and better algorithms to perform calculations in Galois fields must be developed and implemented.
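As an illustration of the Galois-field arithmetic involved, the following is a minimal bit-level software model of multiplication and inversion in GF(2^8). It assumes the field is generated by p(x) = x^8 + x^4 + x^3 + x^2 + 1 (0x11D), the polynomial listed for Table B.1. The function names are hypothetical, and the inversion uses the textbook identity a^(-1) = a^254 (a swapped-in technique), not the low-latency inversion circuits developed in this thesis.

```python
PRIM_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1 (assumed field polynomial)


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements given as 8-bit integers (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:            # an x^8 term appeared: reduce modulo p(x)
            a ^= PRIM_POLY
    return result


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power by square-and-multiply."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result


def gf_inv(a: int) -> int:
    """Inverse via a^254, since a^255 = 1 for every nonzero a in GF(2^8)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


if __name__ == "__main__":
    print(hex(gf_mul(0x80, 0x02)))  # x^7 * x = x^8 = x^4 + x^3 + x^2 + 1 -> 0x1d
```

In hardware, the shift-and-XOR loop in `gf_mul` unrolls into a combinational array of AND and XOR gates; a software model like this is mainly useful as a reference for checking such circuits in simulation.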

Another major topic of concentration in this research is system synchronization. A high-speed system places more stringent demands on system synchronization to provide data integrity. While these services are ensured, the system cost must remain low. Therefore, an inexpensive and reliable synchronization scheme must be sought and applied to the MMDS system. In addition, the system must be able to connect synchronously to the global communications network.

To develop the low-cost, high-speed, robust and reliable MMDS system, three main areas have been studied:

1) Modeling and simulation of the MMDS system,

2) Implementation of the transceiver in FPGA, using synthesizable Hardware Description Language (HDL) code for the various building blocks,

3) Development of a synchronization scheme using Synchronous Optical Network (SONET), Synchronous Digital Hierarchy (SDH) or a GPS clock.

System simulation was carried out to determine the functionality of the building blocks and the complete system. Basic building blocks of the baseband circuitry were developed using FPGA prototypes. Verilog HDL code for each block was written, compiled, simulated and fitted into the designated FPGA. In the future, major components of the system will be integrated into a high-density mixed-signal ASIC (i.e., a system on chip) using this hardware code on the best available technology. Synchronization was studied to find a suitable scheme, in particular the use of the GPS clock at the base station and terminals in the absence of SONET and SDH. Research methodologies and procedures are described as follows:

1.2.1 System modeling and simulation

The available simulation tool is Simulink®, which runs with MATLAB®. It is a tool for modeling, simulating and analyzing dynamic systems. System blocks can be built in a hardware-like fashion using gates and flip-flops, or using math functions written in MATLAB or in the C programming language. Hardware-like building blocks simulate much more slowly in Simulink® than equivalent math functions. However, this method is preferred because, being built from basic logic gates, the simulation corresponds closely to the hardware implementation. A computer with a large amount of memory (256 MB) is required to simulate these hardware-compatible blocks, and the complete system may be too large to put into a single simulation file if memory is limited. Therefore, a combination of hardware blocks and math functions was used to build and run simulations at the system level.

The system was built using a bottom-up approach (i.e., hierarchical and reusable models). Each unit of the system was built and simulated separately to determine its functionality. All the units were then connected into a complete system and simulated with both digital and analog signal processing. Different levels of white Gaussian or Rayleigh noise were added to the channel to determine the SNR, BER, etc. Using MATLAB scopes and display blocks, results could be viewed while the simulation was running. Parameters could be changed and the results viewed immediately, which makes "what-if" exploration easy.
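As a small illustration of the noise-injection idea (not a reproduction of the Simulink models, which are not listed here), the Python sketch below sends Gray-coded QPSK, the simplest QAM case, through additive white Gaussian noise at a chosen Eb/N0 and counts bit errors. The function names, the QPSK choice and the default parameters are assumptions made for this example.

```python
import math
import random

def qpsk_ber_awgn(ebn0_db, nbits=200_000, seed=1):
    """Monte Carlo BER of Gray-coded QPSK (4-QAM) over an AWGN channel."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    # Each rail carries one bit at amplitude +/-1 (Eb = 1), so N0/2 = 1 / (2 * Eb/N0).
    sigma = math.sqrt(1 / (2 * ebn0))
    nsym = nbits // 2
    errors = 0
    for _ in range(nsym):
        for _rail in range(2):                      # in-phase rail, then quadrature rail
            bit = rng.getrandbits(1)
            received = (1 - 2 * bit) + rng.gauss(0.0, sigma)   # bit 0 -> +1, bit 1 -> -1
            decided = 1 if received < 0 else 0
            errors += decided != bit
    return errors / (2 * nsym)

def qpsk_ber_theory(ebn0_db):
    """Closed-form Gray-coded QPSK BER: Q(sqrt(2 Eb/N0)) = 0.5 * erfc(sqrt(Eb/N0))."""
    return 0.5 * math.erfc(math.sqrt(10 ** (ebn0_db / 10)))
```

Comparing the simulated value with qpsk_ber_theory at a few Eb/N0 points is a quick check that the noise scaling is right before heavier blocks are added to a simulation chain.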

1.2.2 Hardware description language for basic building blocks

Baseband circuitry was developed using FPGA prototypes. Based on the functionality of each circuit in the system, Verilog HDL code was written to implement the block with the appropriate input, output and clocking requirements. Verilog HDL is one of the two most common HDLs used by integrated circuit designers; the other is VHDL. An HDL allows the design to be simulated early in the design cycle in order to correct errors or experiment with different architectures. Synthesizable Verilog code is used as the input of the synthesis program, which generates a gate-level description (i.e., a netlist) of the circuit for simulation or fabrication purposes. Designs described in an HDL are technology independent, easy to design and debug, and usually more readable than schematics, particularly for large circuits. The code can be written in either a structural or a behavioral style. Verilog has many language constructs that can be used to describe a design at four levels of abstraction: algorithm level, register transfer level, gate level and switch level. Verilog constructs that are not synthesizable were not used in the design. Hierarchical and reusable designs were emphasized when the modules were built.

To develop the FPGA prototype, Altera's MaxPlus II® software was used. The Verilog HDL code of each block was imported, compiled, simulated, fitted and programmed into the designated FPGAs. Using the simulator and waveform editor, these blocks were simulated to verify both functional and timing operation. Symbol and bit rates were determined in the prototype using timing simulation and register performance analysis. Once compiled, block symbols were created. A graphic editor was used to connect these block symbols into a multi-block system before recompiling, partitioning, fitting, simulating and programming into the FPGA devices.

Since the HDL code is synthesizable and technology independent, an ASIC can be developed without rewriting the code. SYNOPSYS® is the synthesis software used to synthesize the code and generate a netlist. CADENCE® is an integrated circuit design and layout package; it imports the netlist generated by SYNOPSYS and performs the necessary steps to lay out the circuit for fabrication. Major components of the MMDS system will be integrated into high-density, mixed-signal integrated circuits. Available technologies allow cost-effective devices with higher speed, higher circuit density, smaller size and lower power consumption. Depending on the technology used to fabricate the IC in the future, the system operating frequency will be much higher than that of the current FPGA prototypes. All of the Verilog code written for this thesis is contained in [5].

1.2.3 Synchronization

In addition to hardware implementation, synchronization was emphasized in this research to provide a robust system with data integrity. Synchronization involves the estimation of both time and frequency. In an MMDS system, a common clock is used in both the transmitter and the receiver, and all the building blocks operate at the same frequency. Time synchronization is therefore emphasized in the system for data recovery.

There are three levels of time synchronization: symbol synchronization, frame synchronization and network synchronization. The fundamental time-synchronization process is symbol synchronization: the demodulator must recognize when to start and end the detection of each symbol, and a timing error degrades the detection performance of the receiver. The next level, frame synchronization, allows the reconstruction of the received message. Finally, network synchronization allows coordination with other users so that the communications resource is used efficiently.
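Frame synchronization can be pictured as a search for periodic sync bytes. The hypothetical helper below (not part of the thesis design) scans a received byte stream for an offset at which a sync byte, 0x47 or its inverted form 0xB8, recurs every 204 bytes, the RS-protected frame length of the MMDS system described in Chapter 2. Requiring several consecutive hits is an assumed safeguard against locking onto payload bytes that happen to equal 0x47.

```python
SYNC, SYNC_INV = 0x47, 0xB8

def find_frame_alignment(stream, period=204, checks=4):
    """Return the first offset at which a sync byte (0x47 or 0xB8) appears at
    `checks` consecutive multiples of `period`, or None if no offset qualifies."""
    for offset in range(period):
        positions = [offset + k * period for k in range(checks)]
        if positions[-1] >= len(stream):
            break                      # not enough data to confirm this or any later offset
        if all(stream[p] in (SYNC, SYNC_INV) for p in positions):
            return offset
    return None
```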

In network synchronization, every clock can be traced back to a highly stable reference source. All the major telecommunications networks have set up national synchronization networks in order to distribute a common timing reference to all of their equipment. This timing reference is traceable to a national Primary Reference Clock (PRC). The PRC is a time-server clock that maintains Coordinated Universal Time (UTC). Since MMDS is connected to larger networks for data transmission, a clock traceable to UTC is required for network synchronization. The GPS clock was chosen to clock the MMDS system instead of crystal oscillators because of its performance and its traceability to UTC.

1.3 Contributions

The contributions of this work can improve the efficiency of wireless communication systems in both design and implementation:

1. The thesis provides system simulation source code for the MMDS system using MATLAB and Simulink. Because the MMDS system is a standard communication system, the simulation code can be used to simulate other wireless communication systems with minor modifications to individual blocks.

2. This thesis provides technology-independent, synthesizable HDL source code with which the communications industry can fabricate a high-speed baseband MMDS transceiver. Technology transfer, with all the source code, is available through TRLabs [5].

3. The thesis proposes new algorithms for fast calculation of multiplication and inversion in Galois fields. High-speed Galois field arithmetic HDL source code is also provided for Galois fields of different degrees [5].

4. The thesis shows a novel implementation of a convolutional interleaver and de-interleaver accomplished by using a new clocking scheme [5].

5. The thesis also provides synthesizable HDL code for a high-speed, low-complexity (255,239) RS decoder core [5].

6. The thesis proposes a novel application of the Global Positioning System: a cost-effective synchronization scheme for the MMDS system. The system uses a single precision GPS clock at the transmitter and transfers this standard-time-traceable clock to all of the receivers.

1.4 Dissertation outline

This thesis presents the proposed high-speed MMDS system development in two sections. The first section covers the hardware implementation of the transceiver. The RF circuit, equalizer and matched filter are excluded from this thesis; these circuits are being studied at Carleton University in Ottawa, Canada. The second section covers the integration of a GPS clock into the MMDS system for synchronization purposes.

The first section deals with the implementation of a high-speed MMDS system that meets standard specifications. The intended speed of 200 Mb/s is much higher than the DAVIC-specified speed of 60 Mb/s [4]. Specifications of the building blocks are used as they are or modified to suit high-speed operation. Although the DAVIC specifications were followed, all of the synthesizable Verilog HDL code was generated manually without using any outside source code.

In this research, an extensive effort was dedicated to the development of a high-performance FEC for high-speed operation. The design of the convolutional interleaver and de-interleaver lowers the FEC hardware complexity. The development of low-latency VLSI architectures for Galois field arithmetic makes it possible to design the high-speed RS decoder used in the system. Three chapters are devoted to the first section on implementation. Chapter 2 describes the details of the MMDS system and the DAVIC standards used to build the high-speed transceiver. The hardware implementation of the transmitter and the receiver is described in Chapter 3 and Chapter 4, respectively.

The second section deals with system synchronization. The DAVIC specification does not specify the use of a GPS clock for system synchronization. The proposed GPS synchronization architecture was developed as part of this research and is the property of TRLabs. The proposed synchronization scheme takes advantage of new developments in a wide variety of GPS applications. A precision GPS clock, which is traceable to UTC, is used for system synchronization instead of crystal oscillators. Chapter 5 addresses synchronization issues and the integration of a GPS clock into the high-speed MMDS system.

Chapter 6 presents the results achieved in the system simulation, the hardware implementation and the GPS clock synchronization. Finally, Chapter 7 draws conclusions from the research and presents areas where future research may be conducted.

CHAPTER 2

DAVIC VERSION 1.2 SPECIFICATIONS AND MMDS SYSTEM

The high-speed MMDS system was developed based on the standards and specifications set by the Digital Audio-Visual Council (DAVIC) Version 1.2. In this chapter, the specifications from this standards body are introduced and the building blocks of the baseband transceiver are described.

2.1 Digital Audio-Visual Council Version 1.2 specifications

DAVIC is a non-profit association based in Switzerland, with a membership of over 219 companies from more than 20 countries. It represents all sectors of the audio-visual industry: manufacturing (computer, consumer electronics and telecommunications equipment) and service (broadcasting, telecommunications and CATV), as well as a number of government agencies and research organizations. DAVIC was established in 1994 "with the aim of promoting the success of interactive digital audio-visual applications and services by promulgating specifications of open interfaces and protocols that maximize interoperability, not only across geographical boundaries but also across diverse applications, services and industries" [4].

The purpose of DAVIC is to advance the success of emerging digital audio-visual applications and services, initially of the broadcast and interactive type. This is done by providing internationally agreed specifications of open interfaces and protocols that maximize interoperability across countries, applications and services.

The goals of DAVIC are to identify, select, augment, develop and obtain the endorsement by formal standards bodies of specifications of interfaces, protocols and architectures of digital audio-visual applications and services. The DAVIC 1.2 specification was developed by participating DAVIC members. Submissions for this specification came from both members and non-members in response to "Calls For Proposals" issued in March 1996. The DAVIC 1.2 specification is a superset of DAVIC 1.1 and was released in 1997. It includes 13 parts. Parts 1 and 2 describe the DAVIC functionalities and system reference models, as well as the scenarios in which it is to operate. Parts 3 and 4 provide specifications for the service provider system architecture and the delivery system architecture and interfaces. The service consumer system architecture and high-level application program interface are described in Part 5. Part 6 is reserved for future use. Part 7 describes high- and mid-layer protocols. Part 8 provides the lower-layer protocols and physical interface specifications. The remainder of the specification describes the information representation, the basic security, the usage information protocols, the reference points, interfaces and dynamics, and the conformance and interoperability.

The high-speed MMDS transceiver was built based on the Part 8 specifications. One section in this part, the physical layer interface, describes the complete physical layer structure. This structure includes framing, channel coding and modulation for the carriage of content-information flow from a source to a destination through MPEG2-TS. This physical layer interface supports unidirectional transmission over radio frequencies up to 10 GHz and is referred to as the QAM link on MMDS. QAM is specified because of its spectral efficiency. Three levels of modulation, 16 QAM, 64 QAM and 256 QAM, are defined to allow flexible implementation of the MMDS system.

2.2 MMDS system

The system block diagram of the baseband transceiver is depicted in Figure 2.1. To ensure a Bit Error Rate (BER) of less than 10^-12 at the receiver end, FEC is employed using a Reed-Solomon (RS) codec and Trellis Coded Modulation (TCM). Protection against burst errors is achieved through byte interleaving. A differential encoder/decoder on the two most significant bits of each symbol is used to make the QAM constellation rotation invariant. Randomization is also used for spectrum shaping and synchronization purposes. End-to-end network synchronization is performed using a common clock and a frame synchronization technique.

Brief function descriptions of the various blocks in the system are as follows:

Baseband interfacing and synchronization: This block adapts the data structure to the format of the signal source. The framing structure is in accordance with MPEG2-TS (including synchronization bytes).

Sync 1 inversion and randomization: This unit inverts the Sync 1 byte according to the MPEG2 framing structure and randomizes the data stream for spectrum shaping purposes.

Reed-Solomon (RS) coder: This block applies a shortened RS code to each randomized transport packet to generate an error-protected packet. This code is also applied to the synchronization byte itself.

Convolutional interleaver: This block performs convolutional interleaving of the error-protected packets with I = 12, M = 17 (for 16 and 64 QAM) and I = 204, M = 1 (for 256 QAM). The periodicity of the synchronization bytes remains unchanged.

Byte to m-tuple conversion: This block converts the bytes generated by the interleaver into QAM symbols.

Differential encoding: To obtain a rotation-invariant constellation, this block applies differential encoding to the two MSBs of each symbol.

Trellis Coded Modulation (TCM) encoder/decoder: When used, the TCM replaces the 'Byte to m-tuple conversion' and 'Differential encoding' blocks. Its purpose is to convolutionally encode the bits into the modulation and to perform the differential encoding. When it is not used, it is bypassed.

Baseband shaping: This block maps the differentially encoded m-tuples to I and Q signals and applies square-root raised cosine filtering to the I and Q signals prior to QAM modulation.

QAM modulation and physical interface: This block performs QAM modulation, followed by interfacing of the QAM-modulated signal to the Radio Frequency (RF) MMDS channel.

MMDS receiver: The receiver performs the inverse of the signal processing described above for the modulation process in order to recover the baseband signal.

Detailed descriptions of the blocks follow.

2.2.1 Baseband interface, Synchronization and Randomization

This unit adapts the data structure to the format of the signal source. Data coming into the transmitter is formed into frames. The framing structure is in accordance with the MPEG2 standard of the Moving Picture Experts Group (MPEG).

MPEG2 is a video compression standard developed in the mid-1990s for digital television. MPEG2 is based on the discrete cosine transform and is an evolutionary extension of earlier video compression standards (H.261, JPEG and MPEG1). Audio and video information can be digitized and compressed, and many programs can be multiplexed into a single MPEG2-TS.

The MPEG2-TS is defined by the International Organization for Standardization (ISO) [7]. It is composed of 188-byte packets, each with one byte for synchronization purposes and three header bytes containing service identification, scrambling and control information, followed by 184 bytes of MPEG2 or auxiliary data. The system framing structure is shown in Figure 2.2.
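The header layout itself is fixed by the MPEG2 systems standard (ISO/IEC 13818-1). As a minimal sketch, the Python function below decodes the sync byte and the three header bytes of one packet; the class and field names are illustrative choices, while the bit positions follow that standard.

```python
from dataclasses import dataclass

@dataclass
class TSHeader:
    sync: int
    transport_error: bool
    payload_unit_start: bool
    priority: bool
    pid: int                       # 13-bit packet identifier (service identification)
    scrambling_control: int
    adaptation_field_control: int
    continuity_counter: int

def parse_ts_header(packet: bytes) -> TSHeader:
    """Decode the sync byte and 3 header bytes of a 188-byte MPEG2-TS packet."""
    if len(packet) != 188:
        raise ValueError("MPEG2-TS packets are 188 bytes long")
    if packet[0] != 0x47:
        raise ValueError("missing 0x47 sync byte")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TSHeader(
        sync=packet[0],
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,
        scrambling_control=(b3 >> 6) & 0x3,
        adaptation_field_control=(b3 >> 4) & 0x3,
        continuity_counter=b3 & 0xF,
    )
```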

The total length of an MPEG2-TS packet is 188 bytes, as shown in Figure 2.2a. This includes 1 synchronization byte (i.e., 0x47). The processing order at the transmitting side always starts from the MSB of the synchronization byte (i.e., the MSB of 01000111). In order to ensure adequate binary transitions for clock recovery, the data at the output of the MPEG2-TS multiplexer is randomized in accordance with the configuration depicted in Figure 2.2c. Randomization is a process of removing autocorrelation from a signal, i.e., shaping its spectrum to resemble white noise at the transmitter to ease symbol or bit timing recovery at the receiver. The randomization process uses a generator polynomial to generate a pseudo-random sequence and then applies the sequence to the data stream.

Figure 2.2 Framing structure of MPEG2-TS: (a) MPEG2 transport stream MUX packet (sync byte 01000111 followed by 187 bytes); (b) randomized TS packets, with sync bytes and randomized sequence R (PRBS period = 1503 bytes); (c) PRBS randomizer/derandomizer (initialization sequence, enable, clear/randomized data input and output).

The polynomial used for the Pseudo Random Binary Sequence (PRBS) generator in this process is 1 + x^14 + x^15. Loading the sequence '100101010000000' into the PRBS registers, as indicated in Figure 2.2c, initiates the start of every 8 transport packets. To provide an initialization signal for the derandomizer at the receiver, the MPEG2 synchronization byte of the first transport packet in a group of eight packets is bit-wise inverted from 0x47 to 0xB8 (i.e., 10111000).

The first bit at the output of the PRBS generator is applied to the first bit of the first byte following the inverted MPEG2 synchronization byte (i.e., 0xB8). To aid other synchronization functions, during the MPEG2 synchronization bytes of the subsequent 7 transport packets the PRBS generation continues, but its output is disabled, leaving these bytes unrandomized. The period of the PRBS sequence is therefore 1503 bytes (187 + 7 × 188), as shown in Figure 2.2b. The randomization process is also active when the modulator input bit-stream is non-existent, or when it is non-compliant with the MPEG2-TS format (i.e., 1 synchronization byte + 187 packet bytes). This avoids the emission of an unmodulated carrier from the modulator [4]. After frame randomization, the packets are sent to the FEC block, which includes an RS encoder and a convolutional interleaver.
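The randomizer can be modelled in software to check hardware output. The sketch below assumes the DVB-style register numbering in which '100101010000000' loads stages 1 to 15 in order and the PRBS output is the XOR of stages 14 and 15, fed back into stage 1; the function and class names are hypothetical. Because randomization is an XOR with a position-dependent sequence and the sync inversion is its own inverse, running the same function at the receiver derandomizes the group.

```python
PACKET_LEN = 188
GROUP = 8
INIT = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # stages 1..15: '100101010000000'

class Prbs:
    """PRBS generator for 1 + x^14 + x^15 (15-stage shift register)."""
    def __init__(self):
        self.reg = INIT[:]

    def next_byte(self):
        value = 0
        for _ in range(8):
            bit = self.reg[13] ^ self.reg[14]     # taps at stages 14 and 15
            self.reg = [bit] + self.reg[:14]      # shift; output is fed back into stage 1
            value = (value << 1) | bit            # first generated bit becomes the MSB
        return value

def randomize_group(group):
    """Randomize (or, applied again, derandomize) one group of 8 transport packets."""
    if len(group) != GROUP * PACKET_LEN:
        raise ValueError("expected 8 packets of 188 bytes")
    prbs = Prbs()                                 # re-initialized for every group
    out = []
    for i, byte in enumerate(group):
        if i == 0:
            out.append(byte ^ 0xFF)               # invert first sync byte: 0x47 <-> 0xB8
        elif i % PACKET_LEN == 0:
            prbs.next_byte()                      # PRBS keeps running, output disabled
            out.append(byte)                      # sync bytes 2..8 stay unrandomized
        else:
            out.append(byte ^ prbs.next_byte())
    return out
```

With an all-zero payload the randomized bytes are the PRBS itself, which under this register convention begins 0x03, 0xF6.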

2.2.2 Reed-Solomon codec

To achieve the level of error protection required for MMDS transmission of digital data (i.e., BER = 10^-12), FEC based on Reed-Solomon (RS) coding [7] is used. FEC means that a digital system can detect and reconstruct an erroneously transmitted message at the receiver without requesting a retransmission. In this type of Error Detection and Correction (EDAC) strategy, the FEC system accomplishes this by analyzing the redundant data transmitted along with the message. RS coding is one of the means of obtaining the required system BER, and RS codes have a pronounced effect on the efficiency of digital communication channels [8]. For example, the (255,239) RS code achieves a BER of 10^-12 from a considerably higher uncoded BER [9,10]. In the development of the high-speed MMDS system, RS codes were chosen over other codes because their basic unit of information is a symbol (i.e., the codes are based on byte-wide symbols). The MMDS system carries 8-bit symbols in the MPEG2-TS, which fit directly into this category of RS codes. An encoder and a decoder are used to realize the RS code in the system.

RS codes are well suited to correcting both the random and the burst errors caused by over-the-air transmission channels such as that of the MMDS system [11,12]. The codec (coder/decoder) can be implemented in either software or hardware. So far, software implementations have not reached the operating frequencies needed for a high system data rate; hardware implementation, such as in an ASIC, has the highest throughput. The theoretical architecture and implementation of the RS codec have been the subject of a number of research papers [12-14] and are also among the subjects of this research. Today, RS codes remain among the most efficient codes that can be implemented using state-of-the-art software and hardware technology.

Each MPEG2 transport packet consists of 188 bytes (i.e., 188 8-bit symbols), and the closest code that can be used is the (255,239) RS code. Since its 255 - 239 = 16 parity symbols allow t = 16/2 = 8 symbol errors to be corrected, this code can correct any combination of up to 8 symbol errors in a codeword of 255 symbols. Each 8-bit symbol is one of the 256 elements of the Galois field GF(2^8). A Galois field, or finite field, is a finite set of elements in which arithmetic operations can be performed without leaving the set. The field is generated by a primitive field polynomial [11]. Appendix B provides a complete description of Galois fields.
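The thesis develops its own fast multiplication and inversion architectures; those are not reproduced here. As a plain reference model, useful for example when checking HDL simulation results, the sketch below implements GF(2^8) arithmetic with the field polynomial p(x) = x^8 + x^4 + x^3 + x^2 + 1 (0x11D), using shift-and-add multiplication and inversion by a^254, which follows from a^255 = 1 for every nonzero element. The function names are illustrative.

```python
PRIM_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a, b):
    """Multiply two GF(2^8) elements by shift-and-add, reducing modulo p(x)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1
        if a & 0x100:            # an x^8 term appeared: reduce by p(x)
            a ^= PRIM_POLY
    return result

def gf_pow(a, n):
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while n:
        if n & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        n >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse a^(2^8 - 2) = a^254, valid for any nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)
```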

The primitive field polynomial p(x) = x^8 + x^4 + x^3 + x^2 + 1 is used to generate the finite field for the (255,239) RS code. The RS codec also uses a code generator polynomial to generate codewords in the encoder and to find and correct errors in the decoder. The (255,239) RS code uses the following generator polynomial over GF(2^8):

$$g(x) = (x+\alpha)(x+\alpha^{2})(x+\alpha^{3})\cdots(x+\alpha^{16}),$$

or, in terms of polynomial coefficients:

$$g(x) = g_0 + g_1x + g_2x^{2} + g_3x^{3} + \cdots + g_{15}x^{15} + x^{16},$$

and, in terms of the primitive element \alpha of the finite field GF(2^8):

$$g(x) = \alpha^{136} + \alpha^{240}x + \alpha^{208}x^{2} + \alpha^{195}x^{3} + \alpha^{181}x^{4} + \alpha^{158}x^{5} + \alpha^{201}x^{6} + \alpha^{100}x^{7} + \alpha^{11}x^{8} + \alpha^{83}x^{9} + \alpha^{167}x^{10} + \alpha^{107}x^{11} + \alpha^{113}x^{12} + \alpha^{110}x^{13} + \alpha^{106}x^{14} + \alpha^{121}x^{15} + x^{16}.$$
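The expanded coefficients can be checked numerically. The sketch below builds log/antilog tables for the primitive element α = 0x02 under p(x) = x^8 + x^4 + x^3 + x^2 + 1 and multiplies out (x + α^i) for i = 1, ..., 16, the root set that matches the α^121 x^15 and α^136 terms of the expansion. The helper names are illustrative.

```python
PRIM_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1

# Antilog (EXP) and log (LOG) tables for the primitive element alpha = 0x02
EXP, LOG = [0] * 255, [0] * 256
val = 1
for power in range(255):
    EXP[power] = val
    LOG[val] = power
    val <<= 1
    if val & 0x100:
        val ^= PRIM_POLY

def gf_mul(a, b):
    """GF(2^8) product via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

def rs_generator(nroots=16, first_root=1):
    """Coefficients g_0..g_nroots (lowest degree first) of the product of
    (x + alpha^i) for i = first_root, ..., first_root + nroots - 1."""
    g = [1]
    for i in range(first_root, first_root + nroots):
        root = EXP[i % 255]
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= gf_mul(c, root)   # root * (c x^j)
            nxt[j + 1] ^= c             # x * (c x^j)
        g = nxt
    return g
```

Evaluating [LOG[c] for c in rs_generator()] lists the exponents from g_0 upward. Setting first_root=0 gives the α^0, ..., α^15 root convention used, for example, in the DVB specifications, so the convention should be checked against the specification being implemented.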

RS codes are linear block codes and are used here in systematic form. A systematic code leaves the data unchanged and appends parity symbols to the data stream. These parity symbols are the remainder obtained by dividing the (shifted) data polynomial by the code generator polynomial.

The MPEG2-TS packet is 188 symbols long and is encoded using the (255,239) RS code, which must therefore be shortened to a (204,188) RS code. The shortened RS code is implemented by prepending 51 (= 239 - 188) symbols, all set to zero, to the 188 MPEG2 symbols at the input of the (255,239) encoder. After encoding, these zero symbols are discarded, leaving 204 symbols to be transmitted. At the decoder, the 51 zero symbols are reinserted before the decoding process begins.
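A minimal sketch of the shortening step follows, assuming a conventional LFSR-style systematic encoder rather than the thesis's Verilog architecture, and the generator with roots α^1 to α^16: the packet is prefixed with 51 zeros, encoded as a (255,239) codeword, and the zeros are dropped. Leading zero symbols do not change the value of the codeword polynomial at any root, which is why the receiver can simply reinsert them. All function names are hypothetical.

```python
PRIM_POLY = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1

EXP, LOG = [0] * 255, [0] * 256
val = 1
for power in range(255):
    EXP[power] = val
    LOG[val] = power
    val <<= 1
    if val & 0x100:
        val ^= PRIM_POLY

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[(LOG[a] + LOG[b]) % 255]

def rs_generator(nroots=16):
    g = [1]                                    # lowest degree first
    for i in range(1, nroots + 1):             # roots alpha^1 .. alpha^16
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= gf_mul(c, EXP[i])
            nxt[j + 1] ^= c
        g = nxt
    return g

GEN = rs_generator()

def rs_encode(msg):
    """Systematic encoding: append the remainder of msg(x) * x^16 divided by g(x).
    msg is ordered highest-degree symbol first, i.e. in transmission order."""
    nparity = len(GEN) - 1
    parity = [0] * nparity                     # parity[0] = highest-degree remainder term
    for m in msg:
        feedback = m ^ parity[0]
        parity = parity[1:] + [0]              # shift the division register
        for j in range(nparity):
            parity[j] ^= gf_mul(feedback, GEN[nparity - 1 - j])
    return list(msg) + parity

def encode_204_188(packet):
    """Shortened (204,188) code: prepend 51 zeros, encode with (255,239), drop the zeros."""
    if len(packet) != 188:
        raise ValueError("expected a 188-byte packet")
    return rs_encode([0] * 51 + list(packet))[51:]

def syndrome(codeword, i):
    """Evaluate the received polynomial at alpha^i using Horner's rule."""
    s = 0
    for c in codeword:
        s = gf_mul(s, EXP[i]) ^ c
    return s
```

All 16 syndromes of a valid codeword are zero; a nonzero syndrome flags errors for the decoder to locate and correct.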

To increase the efficacy of the RS code against burst errors of the transmitting

channel, the RS codec is followed by a convolutional interleaver.

2.2.3 Convolutional interleaver/de-interleaver

The errors caused by noise are sometimes not random bit errors but long runs of errored bits that affect many symbols, or large groups of short errors. This results in more erroneous symbols than can be corrected in a single block (i.e., beyond the correcting capacity of the code). One way to overcome this burst-error problem is to add interleaving. Interleaving data in the system extends the random-error correcting capability of a code to the point that it also becomes useful in a burst-noise environment. The overall effect of interleaving is to spread out long burst errors so that they appear to the decoder as independent random errors or as shorter, more manageable bursts.

Interleaving in error correction coding increases system robustness by making the system more immune to bursty errors, which are typical in over-the-air transmission. This interleaving function is essential for transport channels that require a low BER. It improves the efficiency of the RS encoder/decoder by spreading burst errors across several codewords [15]. In the MMDS system, a block of 204 symbols is either entirely corrected by the RS decoder (i.e., when it contains 8 or fewer symbol errors) or not corrected at all (i.e., when it contains more than 8 symbol errors). With interleaving, a burst of more than 8 symbol errors is spread over many codewords, each of which may then be correctable.

The basic operation of an interleaving subsystem is to rearrange the encoded data over a span of several block lengths. The span length of the interleaver is determined by the amount of error protection required, which depends on the length of the bursts encountered on the channel. The de-interleaver must know the details of the data arrangement so that the data stream can be de-interleaved before it is decoded.

The interleaver is composed of I branches, cyclically connected to the input byte stream by the input switch on the left of Figure 2.3. The number of branches, I, depends on the QAM order used: I = 12 for 16 and 64 QAM, and I = 204 for 256 QAM. Branch j (j = 0, 1, ..., I-1) is a First In First Out (FIFO) shift register made of j stages of M cells each, where M = N/I and N = 204 is the error-protected frame length (i.e., 188 MPEG2 bytes and 16 parity bytes from the RS encoder). Each cell of the FIFO contains 1 byte, and the input and output switches are synchronized.

Figure 2.3 Conceptual diagram of the convolutional interleaver/de-interleaver (branch indexes 0 to I-1; the sync word is routed through branch 0; each box M is an M-stage FIFO shift register; the switches advance one byte per position; I = 12 for 16 and 64 QAM, I = 204 for 256 QAM).

The interleaved frame is composed of overlapping error-protected packets (from the RS encoder) and is delimited by MPEG2 synchronization bytes (preserving the periodicity of 204 bytes). For synchronization purposes, the synchronization bytes (0x47) and the inverted synchronization bytes (0xB8) are always routed into branch "0" of the interleaver (corresponding to a null delay; see Figure 2.3).

The de-interleaver is similar in principle to the interleaver, but the branch indexes are reversed (i.e., branch "0" corresponds to the largest delay). De-interleaver synchronization is achieved by routing the first recognized synchronization byte into branch "0". Figure 2.3 depicts the convolutional interleaver and de-interleaver used in the MMDS system for the different QAM levels (i.e., the number of branches I and the cell depth M). Every byte passes through a total of (I-1) × M cells in the interleaver and de-interleaver combined, so the end-to-end delay is I × (I-1) × M bytes (2244 bytes for I = 12, M = 17), while the frame length N = I × M = 204 bytes is preserved as the sync-byte period.
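The branch structure can be modelled with one FIFO per branch. In the Python sketch below (function names hypothetical; hardware clocking is ignored), byte n is routed to branch n mod I, branch j of the interleaver holds j × M cells and branch j of the de-interleaver holds (I-1-j) × M cells, and all FIFOs start filled with zeros.

```python
from collections import deque

def _fifo_bank(data, depths):
    """Route byte n through FIFO n % len(depths); FIFO j has depths[j] cells."""
    fifos = [deque([0] * d) for d in depths]   # FIFOs start filled with zeros
    out = []
    for n, byte in enumerate(data):
        fifo = fifos[n % len(fifos)]
        fifo.append(byte)
        out.append(fifo.popleft())              # a depth-0 branch passes the byte straight through
    return out

def interleave(data, I=12, M=17):
    return _fifo_bank(data, [j * M for j in range(I)])            # branch 0: no delay

def deinterleave(data, I=12, M=17):
    return _fifo_bank(data, [(I - 1 - j) * M for j in range(I)])  # branch 0: largest delay
```

Because 204 is a multiple of I = 12, every sync byte enters branch 0 and leaves the interleaver undelayed. In this model every byte passes through (I-1) × M cells in total, so the interleaver/de-interleaver pair delays the stream by I × (I-1) × M byte periods, 2244 for I = 12 and M = 17.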

2.2.4 Byte to symbol mapping

After convolutional interleaving, an exact mapping of bytes into symbols is performed. The mapping relies on the use of byte boundaries in the modulation system. In the MMDS system this mapping is needed in particular when 64 QAM is used, because 8-bit bytes must then be converted into 6-bit symbols before QAM modulation. In this case, the MSB of symbol Z is taken from the MSB of byte V, as shown in Figure 2.4 for 64 QAM.

Figure 2.4 Byte to m-tuple conversion for 64 QAM (interleaver output bytes b7 ... b0 regrouped into 6-bit m-tuples). Note: b0 is the Least Significant Bit (LSB) of each byte or m-tuple. In this conversion, each byte results in more than one m-tuple, labeled Z, Z+1, etc., with Z being transmitted before Z+1.

Correspondingly, the next significant bit of the symbol is taken from the next significant bit of the byte. For 2^m-QAM modulation, the process maps k bytes into n symbols such that 8k = m × n. Figure 2.4 illustrates the conversion for a 64 QAM system, where m = 6, k = 3 and n = 4 (3 × 8 = 4 × 6 = 24 bits).
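A minimal sketch of the bit regrouping, assuming the MSB-first ordering described above; the function name is illustrative. With m = 6, three bytes become four 6-bit m-tuples.

```python
def bytes_to_mtuples(data, m=6):
    """Regroup bytes into m-bit symbols, MSB first (k bytes -> n symbols with 8k = m*n)."""
    if (8 * len(data)) % m:
        raise ValueError("byte count must fill a whole number of m-bit symbols")
    bits = []
    for byte in data:
        bits.extend((byte >> (7 - i)) & 1 for i in range(8))   # b7 first, b0 last
    symbols = []
    for start in range(0, len(bits), m):
        value = 0
        for bit in bits[start:start + m]:
            value = (value << 1) | bit                          # first bit becomes the symbol MSB
        symbols.append(value)
    return symbols
```

For 16 QAM (m = 4) each byte splits into two symbols, and for 256 QAM (m = 8) the mapping is the identity.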

2.2.5 Differential encoder/decoder

Differential coding provides protection against the phase ambiguity of the recovered carrier, which can leave the received constellation rotated (e.g., by 180°). This feature is essential when QAM is used, since a rotation of the constellation moves each received symbol to a different position in the QAM constellation. Differential coding makes the QAM constellation invariant to π/2 rotations. The DAVIC 1.2 specification requires that the two most significant bits of each symbol be coded differentially before symbol mapping, as shown in Figure 2.5.

Figure 2.5 Implementation of the differential encoding of the two MSBs (input from the convolutional interleaver; q bits b_{q-1} ... b_0, with q = 2 for 16 QAM, q = 4 for 64 QAM and q = 6 for 256 QAM).

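The differential encoding of the two MSBs is given by the following Boolean expressions:

$$I_k = \overline{(A_k \oplus B_k)}\cdot(A_k \oplus I_{k-1}) + (A_k \oplus B_k)\cdot(A_k \oplus Q_{k-1})$$

$$Q_k = \overline{(A_k \oplus B_k)}\cdot(B_k \oplus Q_{k-1}) + (A_k \oplus B_k)\cdot(B_k \oplus I_{k-1})$$

where k indicates the present state and k-1 the previous state.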
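The sketch below implements the two-MSB differential encoder and a matching decoder in the commonly used DVB-C form, taken here as an assumption: when A_k = B_k, I_k = A_k XOR I_{k-1} and Q_k = B_k XOR Q_{k-1}; otherwise I_k = A_k XOR Q_{k-1} and Q_k = B_k XOR I_{k-1}. The helper rotate_90 is hypothetical and assumes a quadrant labelling in which one 90° rotation maps (I, Q) to (Q, NOT I). Every quadrant transition produced by the encoder is then a power of that rotation, so a rotated constellation decodes correctly after the first symbol.

```python
def diff_encode(msb_pairs, init=(0, 0)):
    """Differentially encode (A_k, B_k) pairs into quadrant bits (I_k, Q_k)."""
    i_prev, q_prev = init
    out = []
    for a, b in msb_pairs:
        if a ^ b == 0:
            i_k, q_k = a ^ i_prev, b ^ q_prev    # 0 or 180 degree quadrant step
        else:
            i_k, q_k = a ^ q_prev, b ^ i_prev    # +/-90 degree quadrant step
        out.append((i_k, q_k))
        i_prev, q_prev = i_k, q_k
    return out

def diff_decode(quadrants, init=(0, 0)):
    """Recover (A_k, B_k) from consecutive quadrant bits."""
    i_prev, q_prev = init
    out = []
    for i_k, q_k in quadrants:
        if (i_k ^ i_prev) == (q_k ^ q_prev):      # transition came from the A_k = B_k branch
            out.append((i_k ^ i_prev, q_k ^ q_prev))
        else:
            out.append((i_k ^ q_prev, q_k ^ i_prev))
        i_prev, q_prev = i_k, q_k
    return out

def rotate_90(quadrant):
    """Effect of one 90-degree constellation rotation on the quadrant bits (assumed labelling)."""
    i_k, q_k = quadrant
    return (q_k, i_k ^ 1)
```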

2.2.6 TCM encoder/decoder

Trellis-coded modulation (TCM) is a combined modulation and coding technique for band-limited channels with which coding gains relative to uncoded modulation are achieved without bandwidth expansion [16,17]. With simple TCM schemes, coding gains of 3 dB are obtained easily, and gains of up to 6 dB can be achieved using more complex schemes [16]. This gain comes from the efficiency of TCM codes [11]. TCM differs from traditional error-control coding in that the coding is used to make transmission errors less likely rather than to detect and correct them: because the transmitted signal sequences must follow a predetermined trellis diagram, they are farther apart than uncoded sequences. The TCM scheme in MMDS modifies neither the shape of the constellation nor the spectrum shape, but only provides additional coding gain that increases the coverage area of the transmitter [4]. TCM achieves these coding gains without sacrificing bandwidth, at the expense of greater receiver complexity [18].

The TCM coding scheme consists of a combination of a differential encoder, a convolutional encoder and a mapper to the QAM signal constellation. This process takes place in the MMDS transceiver of Figure 2.1 after the RS encoder and convolutional interleaver. Figure 2.6 shows the detailed reference model of the TCM encoder.

Figure 2.6 Detailed reference model of the TCM encoder (differential encoder, convolutional encoder and mapping).

The differential encoder has the same function as the one described in Section 2.2.5. The convolutional encoder is a 16-state, rate 2/3 encoder (i.e., one parity bit is added for every two data bits). The encoder is implemented as a 3-bit shift register interconnected by XOR and AND logic.

The convolutional encoder operates as follows. The output of the three encoder memory bits in Figure 2.6 is called the delay state, and the set of output bits is known as the path state. The size of the encoder memory is referred to as its constraint length. The path taken by the coded data follows a trellis structure [11,19]. The particular path chosen at each time interval depends on the current path state of the encoder. The three-bit path state is part of the output of the encoder. The other input bits are not encoded and are passed directly from the input stage to the output. The output symbol is larger than the input symbol, since it contains error-correction information in addition to the message data. The output bits are mapped into a QAM constellation and modulated for transmission.

In the receiver, a maximum-likelihood Viterbi decoding algorithm is used to estimate and reconstruct the original transmitted data [11,20-24]. The Viterbi algorithm maximizes the correlation between the received vector and a table of possible codewords while sequentially performing the opposite operation of the encoder [25]. It makes use of the past history and reliability information to decode incoming data. A necessary ingredient of the decision decoder is a suitable distance (or cost) function. The decoder keeps track of all the possible states until it decides which one to select; the actual decision is delayed until sufficient information is available. The length of the past history analyzed by the decoder (i.e., the tracking length) is one of the key factors affecting the performance of the Viterbi decoder. In general, the tracking length should be four or five times the encoder constraint length; any further increase in tracking length provides only a small increase in performance [11,19,26].
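To make the trellis search and the cost function concrete, the following is a minimal hard-decision Viterbi decoder sketch. It is not the 16-state, rate-2/3 MMDS TCM decoder: it assumes a hypothetical textbook rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal), uses the Hamming distance as the cost function, and, for brevity, defers all decisions to the end of a zero-terminated block instead of using a fixed tracking length such as the 32 used here.

```python
# Hypothetical textbook code: rate 1/2, constraint length K = 3, generators 7 and 5 (octal).
# This is NOT the 16-state, rate-2/3 MMDS TCM code; it only illustrates the trellis search.
G = (0b111, 0b101)
K = 3
N_STATES = 1 << (K - 1)


def parity(x):
    return bin(x).count("1") & 1


def branch_output(state, bit):
    """Encoder output bits and next state for one input bit."""
    reg = (bit << (K - 1)) | state          # newest bit in the MSB position
    return [parity(reg & g) for g in G], reg >> 1


def encode(bits):
    state, out = 0, []
    for b in list(bits) + [0] * (K - 1):    # flush back to the all-zero state
        coded, state = branch_output(state, b)
        out.extend(coded)
    return out


def viterbi_decode(received, n_info):
    INF = float("inf")
    metric = [0] + [INF] * (N_STATES - 1)   # encoder starts in state 0
    history = []                            # survivor (previous state, input bit) per step
    for t in range(len(received) // 2):
        r = received[2 * t:2 * t + 2]
        new_metric = [INF] * N_STATES
        back = [None] * N_STATES
        for s in range(N_STATES):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                coded, ns = branch_output(s, b)
                # Cost function: Hamming distance between expected and received bits
                cost = metric[s] + sum(c != x for c, x in zip(coded, r))
                if cost < new_metric[ns]:
                    new_metric[ns] = cost
                    back[ns] = (s, b)
        metric = new_metric
        history.append(back)
    # Trace back from state 0 (the block was zero-terminated)
    state, bits = 0, []
    for back in reversed(history):
        state, b = back[state]
        bits.append(b)
    bits.reverse()
    return bits[:n_info]
```

With a single flipped channel bit, the survivor path ending in state 0 still reproduces the message, because every other terminated path is at least the free distance of this code (5) away from the transmitted one.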

The complexity and delay of the Viterbi decoder depend on the number of states and the tracking length of the code [25,27,28]. The tracking length of this Viterbi TCM decoder is 32. In the implementation, the TCM block has been excluded due to the complexity of the Viterbi decoder; it is reserved for future study, which will investigate the trade-off between receiver complexity and coding gain as a way to increase the system coverage area.

2.2.7 Quadrature amplitude modulation

QAM is used as a means of encoding digital information over communication links. It is a form of digital modulation in which the digital information is contained in both the amplitude and the phase of the transmitted carrier [29]; it is therefore a combination of amplitude and phase modulation techniques. QAM is an extension of multiphase Phase Shift Keying (PSK), which is a type of phase modulation. The primary difference between the two is the lack of a constant envelope in QAM versus the presence of a constant envelope in PSK. In general, QAM serves to conserve bandwidth, since the in-phase and quadrature information signals are sent in the same bandwidth [30]. QAM is used because of its high spectral efficiency [17].

QAM is closely related to the original non-return-to-zero (NRZ) baseband transmission. All QAM versions can be formed by generating two multilevel pulse sequences from the initial NRZ sequence and applying these to two carriers that are offset by a phase shift of 90 degrees. Each modulated carrier then yields an AM signal with suppressed carrier. Since carrier multiplication in the time domain corresponds to a shift in the frequency domain, the modulated spectrum maintains the shape of the two-sided baseband signal spectrum.

The spectrum of a QAM system is determined by the spectrum of the baseband signals applied to the quadrature channels. Since these signals have the same basic structure as the baseband PSK signals, QAM spectrum shapes are identical to PSK spectrum shapes with equal numbers of signal points. Even though the spectrum shapes are identical, the error performances of the two systems are quite different: with large numbers of signal points, QAM systems outperform their PSK counterparts [29-31]. The basic reason is that, for the same number of points and the same signal power, the distance between signal points in a PSK system is smaller than the distance between points in a comparable QAM system [29].
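The distance argument can be checked with a short calculation. The sketch below assumes M-PSK points on the unit circle and square M-QAM with per-axis levels ±1, ±3, ..., scaled to the same (unit) average energy, and compares the minimum distance between neighbouring points.

```python
import math


def psk_dmin(m):
    """Minimum distance of M-PSK with unit energy (points on the unit circle)."""
    return 2 * math.sin(math.pi / m)


def square_qam_dmin(m):
    """Minimum distance of square M-QAM normalized to unit average energy.

    Each axis uses levels +/-1, +/-3, ..., +/-(sqrt(M)-1): neighbour spacing is 2
    and the average symbol energy is 2*(M-1)/3.
    """
    return 2 / math.sqrt(2 * (m - 1) / 3)


for m in (16, 64, 256):
    print(f"M={m:3d}  QAM dmin={square_qam_dmin(m):.4f}  PSK dmin={psk_dmin(m):.4f}")
```

For 16, 64 and 256 points the QAM minimum distance comes out about 1.6, 3.1 and 6.2 times larger than the PSK one, which is why the QAM advantage grows with the number of signal points.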

QAM can have any number of discrete digital levels. Common levels are 4 QAM, 16 QAM, 64 QAM and 256 QAM. QAM is based on amplitude modulation of "quadrature" carriers, 90 degrees out of phase with each other. The DAVIC 1.2 specifications define 16 QAM, 64 QAM and 256 QAM levels. Two grades of QAM level are specified for MMDS in DAVIC 1.2:

Grade | QAM levels
A | 16 and 64
B | 16, 64 and 256

A QAM modulator (transmitter) will support at least one of the QAM levels: 16, 64 or 256. A QAM demodulator (receiver) will support grade A or grade B. The modulation of the MMDS system will therefore be QAM with 16, 64 or 256 points in the constellation diagram. The 16 QAM uses 4-bit symbols, the 64 QAM uses 6-bit symbols and the 256 QAM uses 8-bit symbols in the mapping and modulation process. Since MPEG2-TS uses 8-bit symbols, the conversion from 8-bit symbols to 6-bit symbols described in Section 2.2.4 is required if 64 QAM is used.

2.2.8 QAM constellation mapping

Prior to modulation, symbols are mapped to constellation positions according to their values. These position maps are called QAM constellations. The system constellation diagrams for 16 QAM, 64 QAM and 256 QAM are defined in Figure A.2, Figure A.3 and Figure A.4, respectively (see Appendix A). As shown in the diagrams, the constellation points in Quadrant 1 are converted to Quadrants 2, 3 and 4 by changing the two MSBs (i.e., I_k and Q_k in Figure 2.5) and by rotating the q LSBs according to the rule given in Table 2.1 below.

Table 2.1 Conversion of constellation of quadrant 1 to other quadrants of the constellation diagrams given in Figure A.2, Figure A.3 and Figure A.4 in Appendix A.

Quadrant | MSBs | LSBs rotation

2.2.9 Baseband filtering

This filtering is required for baseband shaping. The baseband shaping controls the shape of the pulses used to transmit the sample values [30], and this pulse shape determines the degree of intersymbol interference during transmission. The ideal shape, which has no crosstalk between the symbols, cannot be achieved in the filtering process, because the ideal filter cannot be built in the real world. The desirable compromise is the raised-cosine characteristic, since a raised-cosine filter, unlike an ideal filter, can be built. The roll-off of the filter's pulse shape is set by a roll-off factor: a smaller roll-off factor gives a closer approximation to the ideal shape, while a larger roll-off factor uses more excess bandwidth but gives pulse tails that decay faster.

In the MMDS system, the I and Q signals from the QAM mapping are square-root raised-cosine filtered before modulation. The square-root raised-cosine filter has the square root of the raised-cosine characteristic. The roll-off factor of the filter, α, is 0.13 or 0.15 depending on the channel bandwidth (i.e., 6 MHz or 8 MHz). The square-root raised-cosine filter has a theoretical function defined by the following expression:

where $f_N = \frac{1}{2T_s} = \frac{R_s}{2}$ is the Nyquist frequency.

The impulse response of the transmitter filter characteristic is given in the following section. The time-domain response of a square-root raised-cosine pulse with excess bandwidth parameter α is given by:

where T is the symbol period.

The output signal is defined as

where T is the symbol period ($T = 1/f_s$), and $f_0$ is the modulator's carrier frequency. The values of $I_n$ and $Q_n$ are as follows: for 16 QAM, $I_n$ and $Q_n$ are equal to ±1 or ±3, independent of each other. For 64 QAM, $I_n$ and $Q_n$ are equal to ±1, ±3, ±5 or ±7, independent of each other. For 256 QAM, $I_n$ and $Q_n$ are equal to ±1, ±3, ±5, ±7, ±9, ±11, ±13 or ±15, independent of each other.

The convolution of the transmitter filter impulse response with itself will have intersymbol interference of less than -40 dB (RMS) [4].
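The ISI requirement can be illustrated numerically. The sketch below uses the standard textbook square-root raised-cosine impulse response, normalized so that h(0) = 1 - α + 4α/π with T = 1 (an assumption, since the exact normalization used in the specification is not reproduced here), convolves it with itself by a Riemann sum, and reports the worst sample at nonzero multiples of T relative to the peak. It models an ideal, untruncated-in-practice filter, so it only shows that the square-root pair meets the Nyquist criterion; a real implementation adds truncation and quantization error.

```python
import math


def srrc(t, T=1.0, alpha=0.15):
    """Textbook square-root raised-cosine impulse response, h(0) = 1 - a + 4a/pi."""
    x = t / T
    if abs(t) < 1e-12:
        return 1.0 - alpha + 4.0 * alpha / math.pi
    if alpha > 0 and abs(abs(x) - 1.0 / (4.0 * alpha)) < 1e-9:
        # Limit at t = +/- T/(4*alpha), where the general formula is 0/0
        return (alpha / math.sqrt(2.0)) * (
            (1 + 2 / math.pi) * math.sin(math.pi / (4 * alpha))
            + (1 - 2 / math.pi) * math.cos(math.pi / (4 * alpha)))
    num = math.sin(math.pi * x * (1 - alpha)) + 4 * alpha * x * math.cos(math.pi * x * (1 + alpha))
    den = math.pi * x * (1 - (4 * alpha * x) ** 2)
    return num / den


def raised_cosine_sample(n, alpha, sps=16, span=20):
    """(h*h)(nT) for T = 1, approximated by a Riemann sum with spacing T/sps.

    Sampling is far above the pulse bandwidth, so the main error comes from
    truncating the integral to |t| <= span symbols.
    """
    dt = 1.0 / sps
    return dt * sum(srrc(m * dt, 1.0, alpha) * srrc(n - m * dt, 1.0, alpha)
                    for m in range(-span * sps, span * sps + 1))


def isi_db(alpha, n_max=10):
    """Worst |(h*h)(nT)|, n = 1..n_max, relative to the peak, in dB."""
    peak = raised_cosine_sample(0, alpha)
    worst = max(abs(raised_cosine_sample(n, alpha)) for n in range(1, n_max + 1))
    return 20 * math.log10(worst / peak)


for a in (0.13, 0.15):
    print(f"alpha={a}: peak={raised_cosine_sample(0, a):.5f}, worst ISI={isi_db(a):.1f} dB")
```

The cascade of the two square-root filters is a raised-cosine pulse with a unit peak and (numerically) zero crossings at the other symbol instants, well below the -40 dB figure for both roll-off factors.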

2.2.10 Radio frequency interface

After filtering, the digital I and Q signals are converted to analog and modulated in quadrature onto a 44 MHz IF carrier according to:

$$s(t) = I\cos(\omega t) + Q\sin(\omega t). \tag{2.5}$$

The IF signal is then transported and fed to the MMDS main transmitter. The transmitter upconverts the 44 MHz signal to an MMDS channel in the GHz range by mixing the signal with a local oscillator (LO). The mixer output is then amplified to an average power level ranging from 15 to 100 watts before being sent to the antenna for transmission. At the receiving end, the signal is downconverted to a 44 MHz IF and demodulated by a QAM demodulator.
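The quadrature principle, s(t) = I cos ωt + Q sin ωt, and its inverse can be sketched in a few lines. This is an idealized model, not the analog chain described above: it assumes a sample rate of exactly 8 samples per 44 MHz carrier cycle, holds one (I, Q) pair for 100 whole carrier cycles, and recovers I and Q by mixing with 2cos and 2sin and averaging, which works because cos and sin are orthogonal over whole cycles.

```python
import math

F_IF = 44e6      # IF carrier frequency from the text
FS = 8 * F_IF    # assumed sample rate: 8 samples per carrier cycle
N = 8 * 100      # assumed: one symbol held for 100 whole carrier cycles


def modulate(i_level, q_level):
    """s(t) = I*cos(w t) + Q*sin(w t), sampled N times at FS."""
    w = 2 * math.pi * F_IF
    return [i_level * math.cos(w * n / FS) + q_level * math.sin(w * n / FS)
            for n in range(N)]


def demodulate(samples):
    """Coherent demodulation: mix with 2cos / 2sin and average over whole cycles."""
    w = 2 * math.pi * F_IF
    i_est = 2 * sum(s * math.cos(w * n / FS) for n, s in enumerate(samples)) / len(samples)
    q_est = 2 * sum(s * math.sin(w * n / FS) for n, s in enumerate(samples)) / len(samples)
    return i_est, q_est


print(demodulate(modulate(3, -5)))   # recovers approximately (3, -5)
```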

The use of QAM places very stringent specifications on the local oscillator phase noise and amplifier linearity of MMDS transmitters. The higher the QAM order used in the system, the better the performance required from the LO and the amplifier [3].

2.2.11 System synchronization

Figure 2.1 indicates that all of the transceiver building blocks must operate using a common clock and must be synchronized within the system. The reason for using the same clock and system synchronization becomes clear by examining the clock and sync generation in the transmitter and the clock and sync regeneration in the receiver in Figure 2.1. This also implies that the system must be synchronized in both frequency and time.

System synchronization is achieved through two measures: clock synchronization and frame synchronization within the system. The clock synchronization requirement is the more stringent of the two, since the transmitter would be connected to the infrastructure networks. These networks have the ability to use the standard clocks of SONET or SDH, which adhere to a global standard time called UTC. For the MMDS system to synchronize properly to the global network, its clock must be traceable to a universal standard time.

DAVIC 1.2 specifies that, in the absence of SONET or SDH, a locally generated clock can be used as the system clock. This clock must meet certain requirements, which are discussed in Section 5.3.1. In addition, the clock used in the receivers must be regenerated from the transmitter clock so that it can be traced back. This feature is necessary specifically when the receivers transmit data back to the transmitter over the two-way communication link.

In general, compliance with the synchronization requirements ensures the data integrity of the high-speed MMDS communication link throughout the hierarchy. The randomization process ensures binary transitions for clock recovery at the receiver. The insertion of a synchronization byte every 188 bytes, and of an inverted synchronization byte every eight MPEG2-TS packets, provides frame synchronization. The MMDS system also uses another tool to improve packet synchronization robustness, the High Reliability Marker (HRM). The HRM is formatted as a field carried in the normal payload area of a standard MPEG2-TS null packet. The HRM packet is inserted into the MPEG2-TS prior to the framing operations of randomization and interleaving.

CHAPTER 3

MMDS TRANSMITTER IMPLEMENTATION

The transmitter blocks shown in Figure 2.1 were implemented using FPGA prototypes with the intention of developing them into an ASIC. The blocks were described as Verilog modules, which were designed separately and then connected into the system with appropriate clocking. Five main modules are implemented in the transmitter: the baseband interface, the Reed-Solomon encoder, the convolutional interleaver, the m-tuple conversion and differential encoder, and the QAM mapping. This chapter describes these modules in detail. The trellis-coded modulation encoder is omitted, as stated in Chapter 2.

3.1 Baseband interface, Synchronization byte inversion and Randomization

Four modules implement these three functions in VLSI. The first module generates the synchronization byte, inverts it where required, and converts 187-byte blocks of MPEG data into 188-byte MPEG2-TS packets. As shown in Figure 2.2, the inverted synchronization byte is at the beginning of every 8 packets and the other 7 synchronization bytes are in the subsequent packets. These synchronization bytes are added to the 187 MPEG data bytes to form 188-byte packets before randomization. A simplified schematic diagram for this module is shown in Figure 3.1.

Figure 3.1 Interface, synchronization and synchronization inversion
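The behaviour of this first module can be sketched as follows. The sketch uses the MPEG-2 transport stream sync byte 0x47 and its bitwise inverse 0xB8; the function and constant names are hypothetical, and the sketch models only the byte sequence, not the counter-based Verilog hardware.

```python
SYNC = 0x47       # MPEG-2 TS sync byte
SYNC_INV = 0xB8   # bitwise inverse of 0x47, used at the start of every 8th packet


def frame_packets(payloads):
    """Prepend a sync byte to each 187-byte payload to form 188-byte packets.

    The first packet of every group of eight receives the inverted sync byte.
    """
    packets = []
    for k, payload in enumerate(payloads):
        if len(payload) != 187:
            raise ValueError("each payload must be 187 bytes")
        sync = SYNC_INV if k % 8 == 0 else SYNC
        packets.append(bytes([sync]) + bytes(payload))
    return packets
```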

The second module converts the MPEG2 packets into a serial bit stream to be randomized. This is a simple parallel-to-serial conversion in which the input is an 8-bit symbol and the output is a serial bit stream. It consists of a 3-bit counter that controls an 8-to-1 multiplexer. The diagram for this module is shown in Figure 3.2a.

Figure 3.2 (a) Parallel-to-serial conversion and (b) serial-to-parallel conversion

The third module randomizes the data stream using the PRBS described in Section 2.2.1. The initialization sequence is loaded into the module at the beginning of every 8 packets; the generation of the sequence is restarted by the inverted synchronization byte. This signal is derived from the first module, where the inverted synchronization byte is generated. The schematic diagram for this module is shown in Figure 3.3.

Figure 3.3 Randomization and de-randomization
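A bit-level model of the randomizer is sketched below. Because the PRBS of Section 2.2.1 is not reproduced here, the sketch assumes the DVB-style generator 1 + x^14 + x^15 with the initialization sequence 100101010000000; treat both as assumptions to be checked against Section 2.2.1. Since the data is XORed with the sequence, the same circuit with the same reset performs de-randomization.

```python
# Assumption: DVB-style PRBS generator 1 + x^14 + x^15 with initialization
# sequence "100101010000000"; the actual sequence is the one defined in Section 2.2.1.
INIT = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # register stages 1..15


class Prbs:
    def __init__(self):
        self.reset()

    def reset(self):
        """Reload the initialization sequence (done at each inverted sync byte)."""
        self.reg = list(INIT)

    def next_bit(self):
        out = self.reg[13] ^ self.reg[14]   # XOR of stages 14 and 15
        self.reg = [out] + self.reg[:-1]    # shift; feedback enters stage 1
        return out


def randomize(bits, prbs):
    """XOR data bits with the PRBS; applying it twice restores the data."""
    return [b ^ prbs.next_bit() for b in bits]
```

Because this generator polynomial is primitive, the register visits all 2^15 - 1 nonzero states before repeating, so the sequence has maximal length.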

The last module is a simple serial-to-parallel conversion whose function is the inverse of the second module. The input is the serial randomized data and the output is an 8-bit randomized symbol ready to be sent to the Reed-Solomon encoder. Two clocks operate these modules; for the serial-to-parallel conversion, one clock is 8 times faster than the other. Figure 3.2b depicts the diagram for the serial-to-parallel module.
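The two conversions can be modelled behaviourally as below. The sketch assumes MSB-first bit order; the loop index plays the role of the 3-bit counter that drives the 8-to-1 multiplexer, and each output symbol corresponds to eight cycles of the faster bit clock.

```python
def parallel_to_serial(symbols):
    """8-bit symbols -> bit stream, MSB first (assumed bit order)."""
    bits = []
    for s in symbols:
        for count in range(8):           # 3-bit counter value selects the mux input
            bits.append((s >> (7 - count)) & 1)
    return bits


def serial_to_parallel(bits):
    """Bit stream -> 8-bit symbols; one symbol per 8 bit-clock cycles."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    symbols = []
    for k in range(0, len(bits), 8):
        s = 0
        for b in bits[k:k + 8]:
            s = (s << 1) | b
        symbols.append(s)
    return symbols
```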

3.2 Reed-Solomon encoder

Reed-Solomon codes are based on a special area of mathematics known as Galois fields, or finite fields. The RS encoder needs to carry out arithmetic operations (i.e., addition, subtraction and multiplication) in this field to perform its function, and these operations require special hardware. Addition and subtraction are identical in this field and use only bit-wise XOR gates, while multiplication and division are more complex and require fast calculation algorithms. The implementation of these two operations is described in detail in Section 4.4.4. Appendix B provides an explanation of GF arithmetic.

In this (255,239) RS codec, each codeword consists of 239 message symbols and 16 redundant check symbols. The encoder generates 16 parity symbols from the 239 message symbols it receives. The parity symbols are then appended to the end of the message to form a 255-symbol codeword, c(x). All valid codewords are exactly divisible by the generator polynomial, g(x). In systematic form, the 16 parity check symbols are the remainder, r(x), resulting from dividing the shifted message polynomial, $x^{16}a(x)$, by the generator polynomial, g(x) [11]. In general, the systematic RS encoder performs the following:

$$x^{16}a(x) = q(x)\,g(x) + r(x) \tag{3.1}$$

or

$$c(x) = r(x) + x^{16}a(x), \tag{3.2}$$

where q(x) is a quotient and g(x) is the code generator. The generator polynomial

is repeated here for convenience:

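$$g(x) = \alpha^{136} + \alpha^{240}x + \alpha^{208}x^{2} + \alpha^{195}x^{3} + \alpha^{181}x^{4} + \alpha^{158}x^{5} + \alpha^{201}x^{6} + \alpha^{100}x^{7} + \alpha^{11}x^{8} + \alpha^{83}x^{9} + \alpha^{167}x^{10} + \alpha^{107}x^{11} + \alpha^{113}x^{12} + \alpha^{110}x^{13} + \alpha^{106}x^{14} + \alpha^{121}x^{15} + x^{16}. \tag{3.3}$$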

In a hardware implementation, the polynomial division to find the remainder, r(x), is accomplished using the 16-stage Linear Feedback Shift Register (LFSR) depicted in Figure 3.4. After the 239 message symbols have passed through the RS encoder, the 16 parity check symbols are generated and sent out to complete the codeword. As shown in Figure 3.4, between any two consecutive shift registers there is an 8-bit XOR that performs finite field $GF(2^8)$ addition. The feedback path, containing the quotient from each division step, is broadcast to 16 constant finite field multipliers.

Figure 3.4 Reed-Solomon encoder architecture (legend: multiplier and adder of two $GF(2^8)$ elements, storage device for one $GF(2^8)$ field element, gate controlled by a 255 counter; message a(x) in, parity r(x) and codeword out)

The first 239 bytes of the encoder output are the same as the message input. As soon as the whole message has entered the circuit, the parity-check symbols are in the registers. The gate then turns off and the parity symbols are emptied from the registers into the codeword. The cycle repeats when a new message block, a(x), enters the encoder.
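The LFSR division of Figure 3.4 can be modelled in software as a check on Equations 3.1 to 3.3. The sketch below builds $GF(2^8)$ from p(x) = x^8 + x^4 + x^3 + x^2 + 1 and assumes that the generator roots are α^1 to α^16, which is consistent with the constant term α^136 (the product of the roots) and the x^15 coefficient α^121 (the sum of the roots) of Equation 3.3. It is a behavioural model, not the Verilog design.

```python
# GF(2^8) tables for p(x) = x^8 + x^4 + x^3 + x^2 + 1 (0x11D), alpha = 2
PRIM = 0x11D
EXP = [0] * 512
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n_parity=16, first_root=1):
    """g(x) = (x + alpha^first_root)...(x + alpha^(first_root+n_parity-1)).
    Coefficients lowest degree first: g[0] = g_0, ..., g[16] = 1."""
    g = [1]
    for j in range(first_root, first_root + n_parity):
        nxt = [0] * (len(g) + 1)
        for k, c in enumerate(g):
            nxt[k] ^= gf_mul(c, EXP[j])   # c * alpha^j
            nxt[k + 1] ^= c               # c * x
        g = nxt
    return g


def rs_encode(msg, g):
    """Systematic encoder modelled on the LFSR of Figure 3.4.
    msg: message symbols, highest degree first; returns msg followed by parity."""
    n_par = len(g) - 1
    reg = [0] * n_par                     # reg[i] = coefficient of x^i of the remainder
    for m in msg:
        fb = m ^ reg[n_par - 1]           # quotient symbol broadcast to all multipliers
        for i in range(n_par - 1, 0, -1):
            reg[i] = reg[i - 1] ^ gf_mul(fb, g[i])
        reg[0] = gf_mul(fb, g[0])
    return list(msg) + reg[::-1]          # parity, highest degree first


def poly_eval(coeffs_high_first, x):
    """Horner evaluation of a polynomial given highest degree first."""
    y = 0
    for c in coeffs_high_first:
        y = gf_mul(y, x) ^ c
    return y
```

A valid codeword evaluates to zero at every generator root, which is the software equivalent of "exactly divisible by g(x)".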

In the design, a combination of XOR gates, instead of a general $GF(2^8)$ multiplier, is used to multiply the quotient by the generator polynomial coefficients, $g_i$, since these coefficients are known constants. The code generator coefficients, $g_0$ to $g_{15}$, are given in Equation 3.3. This technique reduces both the hardware and the latency of the multiplier. An example of $GF(2^8)$ multiplication of a field element by a constant using XOR gates is described below. Let

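$$A(x) = a_0 + a_1\alpha + a_2\alpha^{2} + a_3\alpha^{3} + \cdots + a_7\alpha^{7}$$

be an arbitrary element of $GF(2^8)$ to be multiplied by the coefficient $g_0 = \alpha^{136}$. The multiplication is as follows:

$$\alpha^{136}A(x) = a_0\alpha^{136} + a_1\alpha^{137} + a_2\alpha^{138} + \cdots + a_7\alpha^{143}.$$

Replace $\alpha^{136} = 1+\alpha+\alpha^{2}+\alpha^{3}+\alpha^{6}$, $\alpha^{137} = \alpha+\alpha^{2}+\alpha^{3}+\alpha^{4}+\alpha^{7}$, $\alpha^{138} = 1+\alpha^{5}$, ..., $\alpha^{143} = \alpha^{2}+\alpha^{4}+\alpha^{6}$ in the above equation. These vector representations of $\alpha^{i}$ can be found in Table B.1. The elements of this $GF(2^8)$ are generated using the field generator polynomial given in Section 2.2.2, $p(x) = x^{8}+x^{4}+x^{3}+x^{2}+1$.

Collecting the terms by powers of α, the multiplication $\alpha^{136}A(x)$ becomes:

$$\alpha^{136}A(x) = (a_0\oplus a_2\oplus a_5) + (a_0\oplus a_1\oplus a_3\oplus a_6)\alpha + (a_0\oplus a_1\oplus a_4\oplus a_5\oplus a_7)\alpha^{2} + (a_0\oplus a_1\oplus a_6)\alpha^{3} + (a_1\oplus a_5\oplus a_7)\alpha^{4} + (a_2\oplus a_6)\alpha^{5} + (a_0\oplus a_3\oplus a_7)\alpha^{6} + (a_1\oplus a_4)\alpha^{7}.$$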

As shown, the multiplication of an arbitrary element of $GF(2^8)$ by a generator coefficient, $g_i$, can be implemented using a combination of XOR gates. This procedure is much simpler and faster than using the multiplier described in Section 4.4.4. Implementing $\alpha^{136}A(x)$ in this way saves 88% of the hardware compared with a general $GF(2^8)$ multiplier (17 gates versus 141 gates). This is another reason why the RS encoder implementation is much simpler than the RS decoder implementation.
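The XOR network for any constant coefficient can be derived mechanically: column i of the network is the vector form of the constant times α^i, so output bit j is the XOR of the input bits a_i whose column has bit j set. The sketch below (hypothetical helper names, p(x) = x^8 + x^4 + x^3 + x^2 + 1 as in Section 2.2.2) reproduces the α^136 network above, counts its two-input XOR gates, and checks it against a general bit-serial multiplier.

```python
PRIM = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Bit-serial GF(2^8) multiply modulo p(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIM
    return r


def alpha_pow(n):
    v = 1
    for _ in range(n):
        v = gf_mul(v, 2)   # alpha = x = 0b10
    return v


def xor_network(const):
    """For each output bit j, list the input bits a_i that are XORed together.
    Column i of the network is the vector form of const * alpha^i."""
    cols = [gf_mul(const, 1 << i) for i in range(8)]
    return [[i for i in range(8) if (cols[i] >> j) & 1] for j in range(8)]


def apply_network(net, a):
    """Evaluate the XOR network on an 8-bit input a = (a_7 ... a_0)."""
    out = 0
    for j, taps in enumerate(net):
        bit = 0
        for i in taps:
            bit ^= (a >> i) & 1
        out |= bit << j
    return out


net = xor_network(alpha_pow(136))
print(net)                                        # taps per output bit
print(sum(max(len(t) - 1, 0) for t in net))       # two-input XOR gate count
```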

In the encoder implementation, addition of two elements of $GF(2^8)$ is also required. This finite field arithmetic is implemented using a simple bit-wise XOR as follows:

$$A(x)\oplus B(x) = (a_0\oplus b_0) + (a_1\oplus b_1)\alpha + (a_2\oplus b_2)\alpha^{2} + (a_3\oplus b_3)\alpha^{3} + (a_4\oplus b_4)\alpha^{4} + (a_5\oplus b_5)\alpha^{5} + (a_6\oplus b_6)\alpha^{6} + (a_7\oplus b_7)\alpha^{7}.$$

3.3 Convolutional Interleaver

Traditionally, the convolutional interleaver and deinterleaver are implemented using external RAM to store the data of the shift registers. The use of RAM can limit the speed of the system because of off-chip operation. One way to increase the speed of the convolutional interleaver is to place the shift registers on-chip. However, an on-chip convolutional interleaver requires a substantial number of flip-flops to implement the shift registers. For example, 8976 flip-flops are required to implement the (66 × 17 = 1122) shift-register stages for the 8-bit symbols of a 64 QAM interleaver with I = 12 and M = 17.

This research sought to reduce the hardware requirement and to increase the speed of the on-chip interleaver. By changing the clocking scheme of the flip-flops, the amount of hardware can be reduced significantly [32]. The reason is that the registers are only needed at the times when data must be input to or output from the FIFO shift-register blocks; no shifting of data is required between the times of input and output. The timing to clock these shift registers is derived from the data distributor signals, which are available at the front end (the distributor) of the interleaver, as shown in Figure 3.5. This clocking technique reduces the number of flip-flops required for the 64 QAM interleaver by more than 85%: only 1056 flip-flops were used in the implementation, compared with 8976. A small number of logic gates is needed to interface these shift registers. The drawback of this implementation is the routing complexity of the design caused by the new clocking architecture. The description of the convolutional interleaver below is for 64 QAM, where I = 12, N = 204 and M = N/I = 17, and it is shown in Figure 3.5.

Figure 3.5 Interleaving 64 QAM (distributor and commutator; I = 12 branches for 16 and 64 QAM; M = 204/12, 17-stage FIFO shift registers; sync word route)
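The register counts quoted above follow directly from the branch structure. The short calculation below assumes, as the text describes, that branch j holds j FIFOs of M = 17 stages, that a conventional design uses one flip-flop per bit per stage, and that the clock-enabled design uses 2 flip-flops per bit per FIFO.

```python
I, M, BITS = 12, 17, 8   # 64 QAM interleaver: 12 branches, M = 204 / 12 = 17

# Conventional design: branch j holds j FIFOs of M stages, one flip-flop per bit per stage
fifos = sum(range(I))                 # 0 + 1 + ... + 11 = 66 FIFOs
stages = fifos * M                    # byte-wide shift-register stages
conventional_ff = stages * BITS

# Clock-enabled design: 2 flip-flops per bit per 17-stage FIFO
reduced_ff = fifos * 2 * BITS
saving = 1 - reduced_ff / conventional_ff

print(fifos, stages, conventional_ff, reduced_ff, f"{saving:.1%}")
```

This reproduces the 1122 stages, the 8976 and 1056 flip-flop figures, and a saving of about 88%, consistent with the "more than 85%" stated above.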

A counter generates a count from 0 to 11 to distribute incoming symbols to the 12 branches of the interleaver at the distributor. Starting with the synchronization byte, the 204 bytes of the MPEG2-TS packet are routed through the branches. The first byte goes through branch 0 and has no delay. The counter advances and the second byte is routed to branch 1. The counter keeps advancing to distribute the incoming data to the other branches until it resets to 0 after 12 counts, and the process then repeats. Branch 1 of the interleaver has one 17-stage FIFO to delay data, and the last branch has a total of eleven 17-stage FIFOs.

The FIFOs are constructed using the new clocking scheme instead of 17-stage registers, which reduces hardware. In this implementation, a 17-stage shift register (M = 17) uses only 2 D-type flip-flops per bit. An active-high control signal enables each flip-flop. Data is clocked into the first register and passed to the second register; the output of the second register is the output of the 17-stage FIFO. This output is enabled and clocked out when it is needed. The timing for these enable signals comes from the counter at the distributor. The global clock is used to clock the data, so no extra clock is required in this operation. As the incoming data contains 8-bit symbols, only 16 D-type flip-flops are used to implement an M = 17 stage FIFO shift register, as shown in Figure 3.5.

At the output of the interleaver, a commutator picks up data from the 12 branches and delivers it to the output. The commutator reverses the function of the distributor; that is, it collects data from the branches to form the interleaved output stream. The counter used in the distributor also controls the switching of the commutator in order to synchronize the input and output of the interleaver. Having been delayed by the FIFOs of the branches, the data is recombined in time and sent to the byte-to-m-tuple conversion shown in Figure 2.1. Verilog code was written to describe the interleaver. The code was compiled, fitted, and simulated using an Altera FPGA prototype. An operating speed of 160 Mb/s was obtained when the interleaver was implemented in the Altera FLEX10K FPGA. The results are shown in Table 6.1.
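A behavioural model of the interleaver and the matching de-interleaver (Section 4.3) is sketched below for I = 12 and M = 17. Each branch is modelled as a plain FIFO queue (a simplification: it ignores the two-flip-flop clock-enable structure) and a single counter drives both the distributor and the commutator. Branch j of the interleaver delays by j × M branch visits and branch j of the de-interleaver by (I − 1 − j) × M, so the pair delays every symbol by the same (I − 1) × M × I = 2244 symbol periods.

```python
from collections import deque

I, M = 12, 17   # branches and FIFO depth unit for 64 QAM (N = 204, M = N / I)


class ConvInterleaver:
    """Branch j delays its symbols by j*M branch visits (de-interleaver: (I-1-j)*M)."""

    def __init__(self, deinterleave=False):
        self.fifos = []
        for j in range(I):
            depth = (I - 1 - j) * M if deinterleave else j * M
            self.fifos.append(deque([0] * depth))
        self.branch = 0   # shared distributor/commutator counter, 0..I-1

    def push(self, symbol):
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % I
        if not fifo:          # zero-delay branch
            return symbol
        fifo.append(symbol)
        return fifo.popleft()


il, dil = ConvInterleaver(), ConvInterleaver(deinterleave=True)
data = [(5 * k + 1) % 256 for k in range(4000)]
out = [dil.push(il.push(s)) for s in data]
```

After the initial 2244-symbol fill, `out` reproduces `data` in order, which is how the branch-0 synchronization alignment of the de-interleaver can be checked in simulation.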

3.4 Byte-to-m tuple conversion and differential encoding

Figure 2.1 shows that these two blocks can be bypassed if TCM is used in the transceiver. The conversion and coding functions of these blocks serve the QAM constellation mapping and de-mapping. The conversion from 8-bit symbols to 6-bit symbols is required if a 64 QAM system is used. Differential coding is required for both 64 and 256 QAM to protect against phase rotation of the two MSBs. This protection is necessary because these two bits determine the quadrant of each symbol in its QAM constellation.

3.4.1 Byte-to-m tuple conversion

As shown in Figure 2.4, the input of this block is an 8-bit byte from the interleaver and the output is a 6-bit symbol to the differential encoder. The conversion is a simple rearrangement of the bits of every 3 input bytes (i.e., 24 bits): four symbols are output for every three bytes input, and the cycle repeats. The block is implemented using a counter that controls the bit arrangement, and it is described in synthesizable behavioral Verilog code.
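The bit rearrangement can be sketched as follows. The exact bit order is defined in Figure 2.4; this sketch assumes MSB-first ordering, so the most significant bit of the first byte becomes the most significant bit of the first 6-bit symbol.

```python
def bytes_to_6bit(data):
    """Three 8-bit bytes (24 bits) -> four 6-bit symbols, MSB first (assumed ordering)."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3")
    out = []
    for k in range(0, len(data), 3):
        word = (data[k] << 16) | (data[k + 1] << 8) | data[k + 2]   # 24-bit group
        out.extend((word >> shift) & 0x3F for shift in (18, 12, 6, 0))
    return out


print(bytes_to_6bit([0xFF, 0x00, 0xFF]))   # [63, 48, 3, 63]
```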

3.4.2 Differential encoding

Using the Boolean expression for differential encoding described in Chapter 2 (Equation 2.1), the differential encoding can be described in words as:

a). If both inputs are 1, change both outputs.

b). If one input is 1, change an output as follows:

-If the previous outputs are equal, change the output whose input is 1.

-If the previous outputs are unequal, change the output whose input is 0.

c). If both inputs are 0, leave both outputs unchanged.

Figure 3.6 depicts the hardware implementation of the differential encoding algorithm. Only the two MSBs (i.e., y[6] and y[7]) are encoded; the other lower bits pass through without encoding, as shown in Figure 2.5 (Section 2.2.5). Since this is a simple circuit, only a few gates and flip-flops are required to implement it. The output of the encoder (i.e., z[6] and z[7]) is pipelined along with the lower bits (i.e., bits 0 to 5) to present all bits of the symbol to the QAM mapping block simultaneously.

Figure 3.6 Differential encoding of the two MSBs.
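The verbal rules translate directly into a small behavioural model. In the sketch below, the pairing of input A (y[7]) with output I and input B (y[6]) with output Q is a hypothetical choice made for illustration, and the outputs are assumed to start at (0, 0).

```python
def diff_encode(pairs, prev=(0, 0)):
    """Apply the differential encoding rules to (A, B) MSB pairs; returns (I, Q) pairs.

    Assumption: input A controls output I and input B controls output Q.
    """
    i_prev, q_prev = prev
    out = []
    for a, b in pairs:
        i_out, q_out = i_prev, q_prev            # both inputs 0: outputs unchanged
        if a and b:                              # both inputs 1: change both outputs
            i_out, q_out = 1 - i_prev, 1 - q_prev
        elif a or b:                             # exactly one input is 1
            if i_prev == q_prev:                 # previous outputs equal:
                change_i = bool(a)               #   change the output whose input is 1
            else:                                # previous outputs unequal:
                change_i = not a                 #   change the output whose input is 0
            if change_i:
                i_out = 1 - i_prev
            else:
                q_out = 1 - q_prev
        out.append((i_out, q_out))
        i_prev, q_prev = i_out, q_out
    return out
```

In hardware, the same behaviour is a two-flip-flop state machine with a handful of gates, which is why the encoder in Figure 3.6 is so small.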

3.5 QAM mapping

The 16, 64 and 256 QAM constellations are shown in Appendix A. The points on the constellation are arranged such that adjacent points are as far apart as possible, which is one of the factors that make QAM outperform PSK. Depending on the value of the input symbol, the two outputs of the QAM mapping take different levels. These levels are used in the mixer to form the QAM IF signal. If the system uses 256 QAM, there are 16 levels (a 4-bit word) for each of the I and Q outputs for each 8-bit input byte.

The QAM mapping is implemented using ROM, since this is easy and simple and does not require a large amount of memory (256x8 for 256 QAM). In the FPGA prototype, ROM is simple to implement in Altera FLEX devices by using megacores in the MAX+PLUS II package. A megacore is a pre-verified HDL design file that performs a specific task for complex system-level functions. The content of the ROM is stored in a file that is used to compile and program the FPGA. In Verilog, a ROM can also be described using "if" statements, but this approach uses more resources than a memory IP core. Figure 3.7 shows the mapping and de-mapping of 256 QAM using ROM.

Figure 3.7 Mapping and de-mapping of 256 QAM (each implemented as a 256x8 ROM)
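The ROM approach amounts to two lookup tables that are inverses of each other: the mapping ROM is addressed by the 8-bit symbol and returns the two 4-bit level words, and the de-mapping ROM is addressed by those two words and returns the symbol. The table contents below use a hypothetical mapping (upper nibble to I, lower nibble to Q, Gray-coded per axis), not the DAVIC constellation of Figure A.4; only the ROM structure is meant to carry over.

```python
def gray(n):
    return n ^ (n >> 1)


def level(index):
    """4-bit level word 0..15 -> amplitude -15, -13, ..., +15."""
    return 2 * index - 15


# Hypothetical 256-QAM mapping ROM (256 x 8): word = (I word << 4) | Q word
MAP_ROM = [(gray(s >> 4) << 4) | gray(s & 0xF) for s in range(256)]

# De-mapping ROM (256 x 8): address = (I word << 4) | Q word, data = symbol
DEMAP_ROM = [0] * 256
for s, word in enumerate(MAP_ROM):
    DEMAP_ROM[word] = s


def map_symbol(s):
    """Return the (I, Q) amplitude levels for an 8-bit symbol."""
    word = MAP_ROM[s]
    return level(word >> 4), level(word & 0xF)
```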

The two signals, I and Q, are then sent to pulse-shaping filtering with a square-root raised-cosine filter (Section 2.2.9) before being modulated onto a carrier. The receiver receives the signal from the channel and reverses the signal processing of the transmitter, as discussed in the next chapter.

CHAPTER 4

MMDS RECEIVER IMPLEMENTATION

The receiver blocks shown in Figure 2.1 were implemented using FPGA prototypes with the intention of developing an ASIC as the final product. The blocks were described as Verilog HDL modules, which were implemented separately and then connected into the complete system with appropriate clocking. Five main modules were implemented in the receiver: the QAM de-mapping, the differential decoder and m-tuple conversion, the convolutional de-interleaver, the Reed-Solomon decoder and the baseband interface. The TCM decoder is omitted, as stated in Chapter 2.

4.1 QAM de-mapping

This block is implemented as the inverse of the QAM mapping in the transmitter. It has two 4-bit inputs (I and Q) and an 8-bit symbol output. The QAM de-mapping is also implemented using ROM: the two inputs are combined to form the ROM address, as shown in Figure 3.7 (see Section 3.5). The inverse mapping is obtained by recalculating the contents of the ROM and storing them in a file for compilation.

4.2 Differential decoder and m-to-byte conversion

As shown at the MMDS receiver side in Figure 2.1, these two blocks are bypassed if the TCM decoder is used. The blocks decode the differential code and, if the transceiver uses 64 QAM, rearrange the bits from 6-bit symbols into 8-bit symbols. The differential decoder is necessary for all QAM levels.

4.2.1 Differential decoder

The differential decoder performs the reverse function of the differential encoder in the transmitter. From the description of the differential encoder in Section 2.2.5 and Section 3.4.2, the differential decoder can be described in words as:

a) If both inputs change, make both outputs 1; if neither input changes, make both outputs 0.

b) If only one input changes and the previous inputs are equal, set the output of the input that changes to 1.

c) If only one input changes and the previous inputs are unequal, set the output opposite the input that changes to 1.

The differential decoding can be written as the following Boolean expression:

where k indicates the present state and (k-1) indicates the previous state. Figure 4.1 depicts the schematic diagram implementing the differential decoder algorithm above.

Only the two MSBs (i.e., z[6] and z[7]) are decoded; the other lower bits, which are not shown in the diagram, pass through without decoding. The decoder is more complex than the encoder and requires more gates to implement. The decoder output is also pipelined for high-speed operation and synchronized with the undecoded bits before being delivered to the m-to-byte conversion block.

Figure 4.1 Schematic diagram for the differential decoder
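The decoder can be derived directly from the encoder rules of Section 3.4.2. If both or neither of the received MSBs change, both outputs equal the changes. When exactly one changes, the output of the changed input is 1 if the previous inputs were equal, and otherwise the opposite output is 1. The sketch below implements this and includes the encoder, written in an equivalent Boolean form, for checking. The pairing of A with I and B with Q, and the (0, 0) starting state, are assumptions made for illustration.

```python
def encode_step(a, b, i_prev, q_prev):
    """Differential encoder of Section 3.4.2 in Boolean form (assumes A -> I, B -> Q)."""
    x = a ^ b
    i = ((1 - x) & (a ^ i_prev)) | (x & (a ^ q_prev))
    q = ((1 - x) & (b ^ q_prev)) | (x & (b ^ i_prev))
    return i, q


def diff_decode(pairs, prev=(0, 0)):
    """Recover (A, B) from received (I, Q) MSB pairs."""
    i_prev, q_prev = prev
    out = []
    for i_new, q_new in pairs:
        di, dq = i_new ^ i_prev, q_new ^ q_prev   # which inputs changed
        if di == dq:                              # both or neither changed
            a, b = di, dq
        elif i_prev == q_prev:                    # previous inputs equal:
            a, b = di, dq                         #   output of the changed input is 1
        else:                                     # previous inputs unequal:
            a, b = dq, di                         #   output opposite the changed input is 1
        out.append((a, b))
        i_prev, q_prev = i_new, q_new
    return out


pairs = [((k // 2) % 2, (k // 3) % 2) for k in range(50)]
i_prev = q_prev = 0
sent = []
for a, b in pairs:
    i_prev, q_prev = encode_step(a, b, i_prev, q_prev)
    sent.append((i_prev, q_prev))
```

Decoding `sent` with `diff_decode` returns the original `pairs`.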

4.2.2 M-to-byte conversion

This module is the inverse of the byte-to-m-tuple conversion in the transmitter, and it is used only for 64 QAM. The block takes in four 6-bit symbols and sends out three 8-bit bytes, rearranging the order of the 24 input bits to produce the three outputs as shown in Figure 2.4 (Section 2.2.4). The symbols are converted into a bit stream, and the bits are held and combined into the required output. The cycle repeats with every four input symbols and three 8-bit output symbols. The module is implemented in Verilog in the same way as its counterpart, the byte-to-m-tuple conversion in the transmitter (i.e., using a behavioral description in Verilog).
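A behavioural sketch of the reverse conversion follows. As with the transmitter side, the bit order of Figure 2.4 is assumed to be MSB first; with that assumption, four 6-bit symbols are packed into a 24-bit word and split into three bytes.

```python
def sixbit_to_bytes(symbols):
    """Four 6-bit symbols -> three 8-bit bytes, MSB first (assumed ordering)."""
    if len(symbols) % 4:
        raise ValueError("length must be a multiple of 4")
    out = []
    for k in range(0, len(symbols), 4):
        word = 0
        for s in symbols[k:k + 4]:
            word = (word << 6) | (s & 0x3F)       # accumulate the 24-bit group
        out.extend([(word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF])
    return out


print(sixbit_to_bytes([63, 48, 3, 63]))   # [255, 0, 255]
```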

4.3 Convolutional de-interleaver

The de-interleaver is implemented similarly to the interleaver, but the branch indexes are reversed (i.e., branch "0" corresponds to the largest delay). De-interleaver synchronization is achieved by routing the first recognized synchronization byte into branch "0", so that the synchronization byte now has the longest delay. As in the interleaver, each 17-stage shift register is replaced by the FIFO constructed with the new clocking scheme for the flip-flops. The de-interleaver and the interleaver require the same amount of hardware, since one is the inverse of the other. Figure 4.2 shows the hardware construction of the 64 QAM de-interleaver. The operating principle and clocking of the de-interleaver are similar to those of the interleaver described in Section 3.3.

Figure 4.2 De-interleaving for 64 QAM (M = 204/12, 17-stage FIFO shift registers; sync word route)

4.4 Reed-Solomon decoder

The RS decoder is the most complex block to implement in the transceiver, since it involves Galois field calculations and a complicated decoding algorithm. The slow calculations in $GF(2^8)$ arithmetic and the complexity of the decoding algorithms limit the decoder throughput. Many RS decoder cores in both hardware and software have been studied and developed in the past [12,33-41], but they are still low in correction capacity and cannot satisfy the required bit rate of 200 Mb/s. An operating frequency of at least 25 MHz (200 Mb/s divided by 8 bits per symbol) is required for the decoder to be used in the high-speed MMDS system [32,42]. The high data rate of the MMDS transceiver depends entirely on theoretical and architectural improvements of this RS decoder [32]. Theoretical and VLSI architecture developments are essential to design a high-speed decoder, and they are subjects of this research. These developments not only increase the system speed, they also reduce the hardware requirement of the codec, since the ultimate goal is to design a low-cost, high-speed transceiver.

The transmitted codeword is corrupted by noise or other disturbances in the channel, so the received codeword, r(x), at the decoder input is the sum of the codeword, c(x), and the errors, e(x):

$$ r(x) = c(x) + e(x) \qquad (4.2) $$

The purpose of the decoder is to find the locations and values of the errors in the vector e(x), which is concealed in the received codeword, r(x). The correction is made by adding the errors, e(x), to the received codeword, r(x), to recover the original codeword, c(x); addition and subtraction are the same operation in GF(2^8):

$$ c(x) = r(x) + e(x) \qquad (4.3) $$

Compared to encoding, the decoding of Reed-Solomon codes is much more involved; the process is shown in Figure 4.3.

Figure 4.3 Reed-Solomon decoder block diagram (decoder core: syndrome calculation, error locator generation, error evaluator, and error generation with error locations; delay block for r(x))

There are four steps in the decoding of RS codes. First, the syndrome calculation computes 16 syndromes, which represent the error pattern of the received codeword, r(x). Second, the division-free Berlekamp-Massey algorithm evaluates the error locator polynomial, σ(x), whose roots identify the error locations in r(x). Third, the error evaluator computes the error magnitude polynomial, Z(x), and the Chien search finds the error locations. Finally, the last step generates the errors, e(x), and performs the error correction, r(x) + e(x).

The decoder core introduces latency since it takes time to generate the errors. Latency is the time required for the data to flow through the decoder and is measured in symbol clock cycles. Decoder latency is not critical in this implementation because the DAVIC 1.2 specification does not specify an RS decoder latency. The received codeword, r(x), must be delayed so that each symbol is aligned with its error value when the correction is made, because the error locations refer to positions in r(x). This delay is shown as the delay block in Figure 4.3. The delay (or correction time of the decoder) is the same regardless of the number of errors in r(x), because every received codeword goes through the same number of steps in the decoding process. The input and output rates of the decoder are one byte per clock cycle. The implementation of the four decoding steps is now described in detail.

4.4.1 Syndrome calculation

The syndrome calculator is similar to the encoder, and its outputs are zero if there is no error in the received codeword. Sixteen syndromes have to be computed to correct 8 symbol errors. The syndrome, S_i, is found by substituting the root, α^i, of the generator polynomial, g(x), into the received polynomial, r(x), i.e., S_i = r(α^i). The syndrome, S_i, can also be computed by dividing r(x) by x + α^i [11]. This division results in the equality

$$ r(x) = q_i(x)\,(x + \alpha^i) + b_i \qquad (4.4) $$

where q_i(x) is the quotient and the remainder, b_i, is a constant in GF(2^8). Substituting x = α^i into Equation (4.4) makes x + α^i = 0, so the syndrome, S_i, equals the remainder, b_i. This division is accomplished using the circuit shown in Figure 4.4.

The multiplication in the syndrome evaluation is implemented using XOR gates, as in the encoder, because the roots of the generator polynomial, α^i, are known constants (see Section 3.2). Since the codeword length is 255, the syndrome calculation requires 255 clock cycles to complete. This is the first latency of the decoder core. Although this is the simplest step in the decoding process, it introduces the longest delay. At the end of each received codeword, r(x), the syndromes are sent out and held for 255 clock cycles, while the syndrome evaluation continues with the next available r(x).

Figure 4.4 Syndrome calculation (multiplication by α^i; D flip-flops, clock not shown)
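A behavioral model of the Figure 4.4 register shows why the division by x + α^i reduces to Horner's rule: each clock cycle, the register contents are multiplied by the constant α^i and the next received byte is added. The Python sketch below is an illustration, not the thesis circuit. It assumes the field generator p(x) = x^8 + x^4 + x^3 + x^2 + 1 (quoted in Section 4.4.4.1), α = 0x02, and generator roots α^1 to α^16; the parameter `first_root` and all function names are hypothetical. Results are stored 0-based, so `S[k]` holds r(α^(first_root + k)).

```python
PRIM = 0x11D  # field generator p(x) = x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8) modulo p(x)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def gf_pow_alpha(n):
    """alpha^n with alpha = 0x02."""
    r = 1
    for _ in range(n % 255):
        r = gf_mul(r, 0x02)
    return r


def syndromes(received, nsyn=16, first_root=1):
    """S[k] = r(alpha^(first_root + k)), k = 0..nsyn-1.

    received[0] is the highest-degree coefficient, i.e. the first byte in.
    Each clock the register is multiplied by the constant alpha^i and the
    incoming byte is added, which is Horner's rule for r(alpha^i).
    """
    out = []
    for k in range(nsyn):
        root = gf_pow_alpha(first_root + k)
        s = 0
        for byte in received:
            s = gf_mul(s, root) ^ byte
        out.append(s)
    return out


# Usage: g(x) = (x + alpha)(x + alpha^2)...(x + alpha^16) is itself a codeword
g = [1]                                     # highest-degree coefficient first
for i in range(1, 17):
    root = gf_pow_alpha(i)
    g = [a ^ gf_mul(b, root) for a, b in zip(g + [0], [0] + g)]
codeword = [0] * (255 - len(g)) + g         # zero-padded to n = 255 bytes
corrupted = list(codeword)
corrupted[254 - 10] ^= 0x5A                 # error value 0x5A at x^10
```

All 16 syndromes of `codeword` vanish, while the single error in `corrupted` gives S[k] = 0x5A·α^(10(k+1)).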

4.4.2 Error locator polynomial calculation

There are two main algorithms to evaluate the error locator polynomial: the Berlekamp-Massey (B-M) algorithm [11] and the Euclidean algorithm [34]. Appendix C describes the algorithms in detail. Both algorithms solve the set of 16 equations established by the syndromes. Among the error patterns that satisfy these equations, the one with the smallest number of errors is the right solution [11]. The Euclidean algorithm tends to be more widely used in practice because it is easier to implement. However, the B-M algorithm tends to lead to more efficient hardware and software implementations.

This implernentation utilizes a rnodified algorithm, the division-fiee B-M

algorithm [43], to h d the error location polynomial, o(x). This modified algorithm

avoids 16 divisions in evaluating o(x) in the traditional B-M algorithm. It reduces the

decoder complexity because the irnplernentation of a division in G F ( ~ ~ ) is hardware

extensive. The detail of this division-free algorithm is described in [44] and is briefly

stated as follow:

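Set $\sigma^{(0)}(x) = 1$, $\lambda^{(0)}(x) = 1$, $\gamma^{(0)} = 1$ and $L_0 = 0$.

For $r = 1$ to $16$, compute:

$$ \Delta^{(r)} = \sum_{j=0}^{L_{r-1}} \sigma^{(r-1)}_j \, S_{r-1-j} $$

$$ \begin{bmatrix} \sigma^{(r)}(x) \\ \lambda^{(r)}(x) \end{bmatrix} = \begin{bmatrix} \gamma^{(r-1)} & \Delta^{(r)} x \\ \delta_r & (1-\delta_r)\,x \end{bmatrix} \begin{bmatrix} \sigma^{(r-1)}(x) \\ \lambda^{(r-1)}(x) \end{bmatrix} $$

$$ \gamma^{(r)} = \delta_r \Delta^{(r)} + (1-\delta_r)\,\gamma^{(r-1)}, \qquad L_r = \delta_r\,(r - L_{r-1}) + (1-\delta_r)\,L_{r-1} $$

where

$$ \delta_r = \begin{cases} 1, & \text{if } \Delta^{(r)} \neq 0 \text{ and } 2L_{r-1} \le r-1 \\ 0, & \text{otherwise} \end{cases} $$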

After 16 iterations, the error locator polynomial σ(x) = σ_0 + σ_1x + σ_2x^2 + ... + σ_8x^8 is obtained. In each iteration, the matrix equation of the algorithm is evaluated to find σ^(r)(x) and λ^(r)(x). Each iteration yields the coefficients of these two polynomials and passes them to the next iteration. Because the degrees of the two polynomials increase as the iterations proceed, the complexity of an iteration depends on the degrees of the polynomials it processes (i.e., the first step is much simpler than the sixteenth step).
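The iteration can be modelled directly in software. The Python sketch below illustrates the division-free recursion under stated assumptions rather than reproducing the decoder hardware: it uses the field x^8 + x^4 + x^3 + x^2 + 1 with α = 0x02, stores the syndromes 0-based as S_0 to S_15 with S_k = r(α^(k+1)) (generator roots α^1 to α^16 assumed), and the function name `bm_division_free` is hypothetical. Since no division is performed, the returned σ(x) is the normalized error locator multiplied by a nonzero constant, which leaves its roots unchanged.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

EXP = []                      # EXP[n] = alpha^n, alpha = 0x02
v = 1
for _ in range(255):
    EXP.append(v)
    v <<= 1
    if v & 0x100:
        v ^= PRIM


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def poly_eval(p, x):
    """Evaluate p(x), coefficients lowest degree first (Horner's rule)."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def bm_division_free(S):
    """Division-free Berlekamp-Massey over the syndromes S_0..S_{2t-1}.

    Returns (sigma, L): sigma is a nonzero multiple of the error locator
    polynomial (lowest degree first) and L is the number of errors found.
    """
    n = len(S)
    sigma = [1] + [0] * n     # sigma^(0)(x) = 1
    lam = [1] + [0] * n       # lambda^(0)(x) = 1
    gamma, L = 1, 0           # gamma^(0) = 1, L_0 = 0
    for r in range(1, n + 1):
        # Delta^(r) = sum_{j=0}^{L} sigma_j^(r-1) * S_{r-1-j}
        delta = 0
        for j in range(L + 1):
            delta ^= gf_mul(sigma[j], S[r - 1 - j])
        x_lam = [0] + lam[:-1]                       # x * lambda^(r-1)(x)
        new_sigma = [gf_mul(gamma, s) ^ gf_mul(delta, xl)
                     for s, xl in zip(sigma, x_lam)]
        if delta != 0 and 2 * L <= r - 1:            # delta_r = 1
            lam, gamma, L = sigma, delta, r - L
        else:                                        # delta_r = 0
            lam = x_lam
        sigma = new_sigma
    return sigma, L


# Usage: three errors at chosen positions; S_k = sum of e * X^(k+1), X = alpha^pos
errs = {5: 0x11, 77: 0xA3, 200: 0x01}
S = [0] * 16
for pos, val in errs.items():
    for k in range(16):
        S[k] ^= gf_mul(val, EXP[pos * (k + 1) % 255])
sigma, L = bm_division_free(S)
```

The roots of `sigma` are α^(-5), α^(-77) and α^(-200), the inverses of the three error locators.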

In this step of the decoding process, a number of multipliers in GF(2^8) are required to implement the algorithm. Many efficient multipliers have been reported [45-49]. During the research, a low latency multiplier was developed and used in the decoder. This GF(2^8) multiplication circuit has the same complexity as the LSB-first multiplier described in [45], but its circuit latency is lower (see Table 6.2). The algorithm for this multiplier is described in the following example.

The multiplication is carried out in two steps for two arbitrary elements A(x) and B(x) in GF(2^8). First, the product D(x) = A(x)B(x) is computed, and then the modular reduction, P(x) = D(x) mod G(x), is performed, where G(x) is the field generator polynomial of GF(2^8). The polynomial D(x) has a degree of 14, and its coefficients can be found using (all sums are modulo 2, i.e., XOR):

$$ d_k = \sum_{\substack{i+j=k \\ 0 \le i,j \le 7}} a_i b_j, \qquad 0 \le k \le 14 \qquad (4.5) $$

and the coefficients of the product P(x) are:

$$ p_i = d_i + \sum_{k=8}^{14} g_{i,k}\, d_k, \qquad 0 \le i \le 7 \qquad (4.6) $$

where g_{i,k} = 0 or 1 is the coefficient generated by the field generator polynomial (the coefficient of α^i in the representation of α^k). These coefficients are in the first 14 rows of Table B.1 in Appendix B, where i is the column number and k is the row number.

A complete description of the GF(2^8) multiplier implementation is as follows. Let

A(x) = a_0 + a_1α + a_2α^2 + a_3α^3 + ... + a_7α^7 and B(x) = b_0 + b_1α + b_2α^2 + b_3α^3 + ... + b_7α^7

be two elements to be multiplied in GF(2^8), with the product P(x) = A(x)B(x). Using Equation (4.5), the coefficients d_0 to d_14 of the polynomial D(x) are found (for example, d_0 = a_0b_0, d_1 = a_0b_1 + a_1b_0 and d_14 = a_7b_7), and the modular product P(x) = p_0 + p_1α + p_2α^2 + p_3α^3 + ... + p_7α^7 has the coefficients, p_i, given by Equation (4.6).

Figure 4.5 depicts a module of the multiplier. In this multiplier, the total gate count is 8 × 8 = 64 AND gates and 77 XOR gates. However, the number of XOR gates in the VLSI implementation is reduced because gate combinations repeat. The total delay for this multiplier is D_M = D_A + 6D_X, where D_A and D_X are the delays of the AND gate and the XOR gate, respectively. The longest delay in the first step comes from the calculation of

Figure 4.5 Multiplier in GF(2^8)
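The two-step multiplication (the product D(x) followed by the modular reduction) translates almost line by line into a bit-level model. In the Python sketch below, which illustrates the equations rather than the circuit itself, `G[k]` holds α^k for k = 0 to 14 under the field generator x^8 + x^4 + x^3 + x^2 + 1, so bit i of `G[k]` plays the role of g_{i,k}; the function names are hypothetical. A naive count of the AND and XOR operations, before any sharing of repeated XOR combinations, reproduces the 64 AND and 77 XOR gate figures quoted above.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

# G[k] = alpha^k in polynomial basis, k = 0..14; bit i of G[k] is g_{i,k}
G = []
v = 1
for _ in range(15):
    G.append(v)
    v <<= 1
    if v & 0x100:
        v ^= PRIM


def bit(x, i):
    return (x >> i) & 1


def gf_mul_bitlevel(a, b):
    # Step 1: d_k = XOR of (a_i AND b_j) over i + j = k, k = 0..14
    d = [0] * 15
    for i in range(8):
        for j in range(8):
            d[i + j] ^= bit(a, i) & bit(b, j)
    # Step 2: p_i = d_i XOR (XOR of g_{i,k} AND d_k for k = 8..14)
    p = 0
    for i in range(8):
        pi = d[i]
        for k in range(8, 15):
            pi ^= bit(G[k], i) & d[k]
        p |= pi << i
    return p


def naive_gate_count():
    """AND and XOR gates before sharing repeated XOR combinations."""
    and_gates = 8 * 8
    terms = [0] * 15
    for i in range(8):
        for j in range(8):
            terms[i + j] += 1
    xor_product = sum(t - 1 for t in terms)                         # step 1
    xor_reduce = sum(bin(G[k]).count("1") for k in range(8, 15))    # step 2
    return and_gates, xor_product + xor_reduce
```

Because the model is bilinear in the bits of A and B, checking all basis pairs α^i·α^j is enough to confirm it against field multiplication.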

The evaluation of the error locator polynomial, σ(x), accounts for most of the decoder hardware. The decoder core also incurs extra latency in this step because of the 16 iterations, which involve many multiplications. A total of 20 clock cycles is needed to complete this computation in the FPGA prototype implementation of the decoder.

The error locator polynomial, σ(x), has a degree of eight or less depending on the number of errors in r(x). All of its nine coefficients, σ_0 to σ_8, are elements of the finite field GF(2^8), including zero. These coefficients are used to compute the error magnitude polynomial, Z(x), and to find the error locations in the next step.

4.4.3 Error magnitude polynomial calculation and Chien search

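The error magnitude polynomial, Z(x), is defined as

$$ Z(x) = \sigma(x)\,S(x) \bmod x^{16}, $$

where S(x) = S_0 + S_1x + ... + S_15x^15 is the syndrome polynomial, and Z(x) is of degree 7 or less. This polynomial conveys the values of the errors, e(x), and, as its definition states, it is the product of two polynomials over GF(2^8). The coefficients of Z(x) are computed by the convolution of the coefficients of σ(x) and S(x) using

$$ Z_i = \sum_{j=0}^{i} \sigma_j S_{i-j}, \qquad 0 \le i \le 7, $$

that is,

$$ Z(x) = \sigma_0 S_0 + (\sigma_0 S_1 + \sigma_1 S_0)\,x + (\sigma_0 S_2 + \sigma_1 S_1 + \sigma_2 S_0)\,x^2 + (\sigma_0 S_3 + \sigma_1 S_2 + \sigma_2 S_1 + \sigma_3 S_0)\,x^3 + \cdots $$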

This calculation requires several multiplications, which are carried out using the multiplier circuit described previously. Each coefficient, Z_i, is evaluated in a separate circuit as soon as the error locator polynomial coefficients, σ_i, are available. The higher the index i, the more product terms Z_i contains and the longer it takes to evaluate. This step adds decoder latency because of the time required to evaluate the coefficients of Z(x); in the FPGA implementation, only one clock cycle of delay was added.
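The convolution for Z_i can be sketched as follows. The Python code assumes coefficient lists stored lowest degree first and a shift-and-add multiplier for the field x^8 + x^4 + x^3 + x^2 + 1; the function name is hypothetical. Each Z_i is accumulated independently, mirroring the separate circuits described above, and Z_i involves i + 1 products, which is why the higher-index coefficients take longer.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def error_magnitude_poly(sigma, S, t=8):
    """Z_i = sum_{j=0}^{i} sigma_j * S_{i-j} for 0 <= i < t (lowest degree first)."""
    Z = []
    for i in range(t):
        acc = 0
        for j in range(min(i, len(sigma) - 1) + 1):
            acc ^= gf_mul(sigma[j], S[i - j])   # one product term of Z_i
        Z.append(acc)
    return Z
```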

The roots of the error locator polynomial are found by exhaustively evaluating σ(x) at x = α^i for i = 0 to 254. This technique is referred to as the Chien search [11]. Figure 4.6 depicts the block diagram for the Chien search. In this search, the error locator polynomial, σ(x), is evaluated in every clock cycle and tested for zero; each zero result identifies a root of the polynomial.

Figure 4.6 Chien search algorithm (sum and zero detection feeding the root generator)

In this search, the eight coefficients σ_1 to σ_8 of the error locator polynomial are cyclically multiplied and summed with σ_0 to find the zeros. If the sum is zero, the current element is a root of the polynomial, σ(x), and it marks an error position in the received codeword, r(x). The output of the sum and zero block is sent to the root generator, which generates the corresponding root of the polynomial when a root is detected. The multiplications and the zero test of the sum are evaluated within one clock cycle. Each product is fed back through a shift register and multiplied again by its constant, so that the term σ_jα^(ij) of one clock cycle becomes σ_jα^((i+1)j) in the next. There are eight multipliers in the search, and they are implemented using XOR gates as described in Section 3.2, since their constants are known (i.e., α, α^2, α^3, ..., α^8). The roots are elements of GF(2^8) and are sent to the error generator block of the decoding process. The inverses of the roots in the field give the error locations in the received codeword, r(x).
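A register-level model of the search shows how the eight constant multipliers replace a full polynomial evaluation in every clock cycle. The Python sketch below is illustrative only: it assumes coefficient lists stored lowest degree first, the field x^8 + x^4 + x^3 + x^2 + 1 with α = 0x02, and that a root α^i corresponds to error position (255 - i) mod 255, i.e., the inverse of the root. The usage example builds σ(x) from chosen error positions; all names are hypothetical.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

EXP = []                     # EXP[n] = alpha^n, n = 0..254
v = 1
for _ in range(255):
    EXP.append(v)
    v <<= 1
    if v & 0x100:
        v ^= PRIM


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def chien_search(sigma, n=255):
    """Test x = alpha^i for i = 0..n-1; register j holds sigma_j * alpha^(i*j)."""
    regs = list(sigma[1:])                           # sigma_1, sigma_2, ...
    consts = [EXP[j] for j in range(1, len(sigma))]  # alpha^1, alpha^2, ...
    positions = []
    for i in range(n):
        total = sigma[0]
        for r in regs:
            total ^= r                               # sum of all terms
        if total == 0:                               # zero detect: root found
            positions.append((n - i) % n)            # location = inverse of root
        regs = [gf_mul(r, c) for r, c in zip(regs, consts)]  # next clock
    return positions


# Usage: sigma(x) = product of (1 + X x) with X = alpha^pos
error_positions = (0, 3, 100, 200)
sigma = [1]
for pos in error_positions:
    sigma = [a ^ gf_mul(EXP[pos], b) for a, b in zip(sigma + [0], [0] + sigma)]
found = chien_search(sigma)
```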

4.4.4 Error value generation

The error value generator takes the error magnitude polynomial, Z(x), and the roots (if any) of the error locator polynomial to generate the errors. The corrected codeword, c(x), is computed in the decoder using the following algorithm:

For i = 0 to 254:
    if σ(α^(-i)) = 0 then c_i = r_i + Z(α^(-i)) / σ'(α^(-i))
    else c_i = r_i

in which r(x) is the received vector at the decoder, Z(x) is the error magnitude polynomial, σ'(x) is the derivative of σ(x), and α^(-i) is the root of σ(x) associated with error location i.

In the algorithm, the condition σ(α^(-i)) = 0 is detected by the Chien search, which finds the roots of σ(x). The division gives the error values, e(x). These are the errors that were added by the channel and corrupted the received codeword at those positions of r(x). In the calculation, the derivative, σ'(x), is formed from the odd-degree coefficients of σ(x) only, because jσ_j = 0 for even j in GF(2^8), i.e.,

$$ \sigma'(x) = \sigma_1 + \sigma_3 x^2 + \sigma_5 x^4 + \sigma_7 x^6. $$
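The error value formula can be checked end to end in software. The Python sketch below is a model under explicit assumptions, not the decoder circuit: it uses the field x^8 + x^4 + x^3 + x^2 + 1, assumes generator roots α^1 to α^16 (the convention under which e = Z(α^(-i)) / σ'(α^(-i)) holds without an extra factor), stores received symbols by position (`received[i]` is r_i), and computes the inverse as B^254. To mimic the division-free Berlekamp-Massey output, σ(x) is deliberately scaled by an arbitrary nonzero constant; Z(x) scales by the same constant, so the ratio, and hence the error value, is unchanged. Function names are hypothetical.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

EXP = []                     # EXP[n] = alpha^n
v = 1
for _ in range(255):
    EXP.append(v)
    v <<= 1
    if v & 0x100:
        v ^= PRIM


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def gf_inv(b):
    """b^-1 = b^254 for nonzero b (square-and-multiply)."""
    r, e = 1, 254
    while e:
        if e & 1:
            r = gf_mul(r, b)
        b = gf_mul(b, b)
        e >>= 1
    return r


def poly_eval(p, x):
    """Horner evaluation, coefficients lowest degree first."""
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc


def derivative(sigma):
    """sigma'(x) = sigma_1 + sigma_3 x^2 + sigma_5 x^4 + ... in GF(2^8)."""
    return [c if j % 2 else 0 for j, c in enumerate(sigma)][1:]


def correct(received, sigma, Z):
    dsigma = derivative(sigma)
    out = list(received)
    for i in range(255):
        x = EXP[(255 - i) % 255]                     # alpha^-i
        if poly_eval(sigma, x) == 0:                 # root found by Chien search
            out[i] ^= gf_mul(poly_eval(Z, x), gf_inv(poly_eval(dsigma, x)))
    return out


# Usage: all-zero codeword corrupted by two errors
errors = {7: 0x3C, 150: 0xE1}
received = [0] * 255
for pos, val in errors.items():
    received[pos] ^= val
S = [0] * 16                                         # S_k = r(alpha^(k+1))
for pos, val in errors.items():
    for k in range(16):
        S[k] ^= gf_mul(val, EXP[pos * (k + 1) % 255])
sigma = [0x37]                                       # arbitrary nonzero scale
for pos in errors:
    sigma = [a ^ gf_mul(EXP[pos], b) for a, b in zip(sigma + [0], [0] + sigma)]
Z = [0] * 16                                         # sigma(x) S(x) mod x^16
for i in range(16):
    for j in range(min(i, len(sigma) - 1) + 1):
        Z[i] ^= gf_mul(sigma[j], S[i - j])
corrected = correct(received, sigma, Z)
```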

The error is generated by multiplying Z(α^(-i)) by the inverse of σ'(α^(-i)). The decoder has to generate each error within one clock cycle, before receiving another root from the Chien search. One way to increase the decoder throughput is therefore to reduce the latency of this division: the faster the calculation, the shorter the symbol clock period of the decoder can be. A low latency circuit to perform this division is necessary for a successful implementation of the high-speed RS decoder used in MMDS systems.

The error generation includes the evaluation of Z(α^(-i)) and σ'(α^(-i)) and the division between the two, as shown in the algorithm. The evaluations of Z(α^(-i)) and of the odd term σ'(α^(-i)) take place simultaneously as soon as the root, α^(-i), is available from the Chien search. These evaluations are realized using the low latency power-sum circuit described in Section 4.4.4.1 below. After these evaluations at α^(-i), the division is performed. Division between two elements of a finite field is the multiplication of the first by the inverse of the second. The delay of the error evaluation therefore includes the time to invert the odd term, σ'(α^(-i)), and multiply it by Z(α^(-i)). Since the error is generated within one clock cycle, this evaluation must be completed in one clock period.

Inversion in Galois Fields is always more complex to implement and takes much longer to compute than multiplication. The theoretical and VLSI architecture developments of the inversion circuit in GF(2^8) are described in Section 4.4.4.3. Since the error generation requires inversion, the latency of the inversion circuit sets the error generation time. This time is critical in the decoding process because it determines the decoder clock cycle. Therefore, the lower the latency of the inversion circuit, the faster the decoder can operate.

The evaluations of Z(α^(-i)) and σ'(α^(-i)) are implemented using a newly developed power-sum circuit, P = C + AB^2, in GF(2^8) [50]. The use of this circuit significantly reduces the computation of the decoder core in both hardware and latency. To evaluate Z(x) at the root α^(-i) (or σ'(x) at the root α^(-i)), expand

$$ Z(x) = Z_0 + Z_1 x + Z_2 x^2 + Z_3 x^3 + Z_4 x^4 + Z_5 x^5 + Z_6 x^6 + Z_7 x^7 $$

as

$$ Z(x) = (Z_0 + Z_2 x^2) + x\,(Z_1 + Z_3 x^2) + x^4 (Z_4 + Z_6 x^2) + x^5 (Z_5 + Z_7 x^2). $$

Replacing x by α^(-i), each bracketed term has the form C + AB^2 with B = α^(-i) and is computed by one power-sum circuit. The number of multipliers and exponentiations in the computation is also reduced. In addition, the powers 2^n of an element in GF(2^8) can be computed using very few XOR gates with minimum delay. The power-sum and exponentiation circuits are described in the next two sections.
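The regrouping can be checked numerically. In the Python sketch below, which assumes the field x^8 + x^4 + x^3 + x^2 + 1 and uses hypothetical function names, `power_sum` is a behavioral stand-in for the P = C + AB^2 circuit, and the four bracketed terms are combined with B, B^4 and B^5. A plain Horner evaluation is included only as a reference for comparison. The same grouping applies to σ'(x) = (σ_1 + σ_3x^2) + x^4(σ_5 + σ_7x^2).

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def power_sum(c, a, b):
    """Behavioral model of the power-sum circuit: P = C + A * B^2."""
    return c ^ gf_mul(a, gf_mul(b, b))


def eval_by_power_sums(Z, b):
    """Z(b), deg Z <= 7, as (Z0+Z2b^2) + b(Z1+Z3b^2) + b^4(Z4+Z6b^2) + b^5(Z5+Z7b^2)."""
    z = list(Z) + [0] * (8 - len(Z))
    b2 = gf_mul(b, b)
    b4 = gf_mul(b2, b2)          # an XOR-only exponentiation in hardware
    b5 = gf_mul(b4, b)
    return (power_sum(z[0], z[2], b)
            ^ gf_mul(b, power_sum(z[1], z[3], b))
            ^ gf_mul(b4, power_sum(z[4], z[6], b))
            ^ gf_mul(b5, power_sum(z[5], z[7], b)))


def eval_horner(Z, b):
    """Reference: direct Horner evaluation, lowest degree first."""
    acc = 0
    for c in reversed(Z):
        acc = gf_mul(acc, b) ^ c
    return acc


Z_example = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
```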

4.4.4.1 Low latency power-sum circuit in GF(2^8)

The power-sum circuit used in the decoder core has low complexity and low latency [50]. The algorithm to compute the power-sum can be summarized as follows.

Let A(x), B(x) and C(x) be three arbitrary elements used to find the power-sum P(x) = C(x) + A(x)B^2(x). To obtain P(x), first compute the unreduced power-sum D(x) = C(x) + A(x)B^2(x), and then perform the modular reduction operation P(x) = D(x) mod G(x). In GF(2^8), B^2(x) = B(x^2), and the power-sum operation becomes the task of finding the coefficients, d_k, of D(x) using

$$ d_k = c_k + \sum_{\substack{i+2j=k \\ 0 \le i,j \le 7}} a_i b_j, \qquad 0 \le k \le 21, \quad c_k = 0 \text{ for } k > 7 \qquad (4.8) $$

Then the coefficients, p_i, of P(x) can be computed using

$$ p_i = d_i + \sum_{k=8}^{21} g_{i,k}\, d_k, \qquad 0 \le i \le 7 \qquad (4.9) $$

where g_{i,k} = 0 or 1 is the coefficient generated by the field generator polynomial, p(x) = x^8 + x^4 + x^3 + x^2 + 1. The coefficients g_{i,k} can be seen in the first 14 rows of Table B.1. The algorithm for the power-sum circuit is closely related to the one used in the multiplier circuit in Section 4.4.2.

Using Equation (4.8), the coefficients d_0 to d_21 of D(x) are obtained (for example, d_0 = c_0 + a_0b_0 and d_21 = a_7b_7), and the coefficients of P(x) then follow according to Equation (4.9).

The schematic diagram for this circuit is similar to Figure 4.5 in Section 4.4.2, except that D(x) now has 22 coefficients; the product, P(x), has 8 coefficients of 0 or 1.
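The power-sum can be modelled bit by bit in the same way as the multiplier. The Python sketch below is illustrative: `ALPHA[k]` holds α^k for k = 0 to 21 under p(x) = x^8 + x^4 + x^3 + x^2 + 1, so bit i of `ALPHA[k]` is g_{i,k}, and only the fourteen powers α^8 to α^21 are needed for the reduction; the function name is hypothetical. Because squaring is linear over GF(2), the model is bilinear in the bits of A and B, so agreement on all basis pairs α^i, α^j confirms it against A·B^2.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

ALPHA = []                 # ALPHA[k] = alpha^k, k = 0..21; bit i is g_{i,k}
v = 1
for _ in range(22):
    ALPHA.append(v)
    v <<= 1
    if v & 0x100:
        v ^= PRIM


def bit(x, i):
    return (x >> i) & 1


def power_sum_bitlevel(a, b, c):
    """P = C + A*B^2, using B^2(x) = B(x^2)."""
    # d_k = c_k + sum over i + 2j = k of a_i b_j, k = 0..21
    d = [bit(c, k) if k < 8 else 0 for k in range(22)]
    for i in range(8):
        for j in range(8):
            d[i + 2 * j] ^= bit(a, i) & bit(b, j)
    # p_i = d_i + sum over k = 8..21 of g_{i,k} d_k
    p = 0
    for i in range(8):
        pi = d[i]
        for k in range(8, 22):
            pi ^= bit(ALPHA[k], i) & d[k]
        p |= pi << i
    return p
```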

4.4.4.2 Low latency exponential circuit in GF(2^8)

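The following is a simple computation, developed in the research, for the powers 2^n of an element in the finite field GF(2^8). Let

$$ \beta = b_0 + b_1\alpha + \cdots + b_6\alpha^6 + b_7\alpha^7 $$

be an element in GF(2^8). The coefficients of β^(2^n) can be found using

$$ \beta^{2^n} = \sum_{k=0}^{7} b_k\, \alpha^{2^n k}, \qquad n = 1, 2, \ldots, 7 \qquad (4.10) $$

because squaring is linear in GF(2^8) and b_k^2 = b_k for b_k in {0, 1}. For higher values of n, the powers repeat (i.e., β^(2^8) = β, β^(2^9) = β^2, β^(2^10) = β^4, β^(2^11) = β^8, ...).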

These powers are uniquely expressed in terms of the coefficients, b_k, of the element β. Fast calculations of the coefficients using Equation (4.10) are implemented in VLSI using only XOR gates. The number of XOR gates is reduced because gate combinations repeat. The maximum delay in this computation is 3D_X, where D_X is the XOR gate delay. The example below shows the implementation to calculate β^4 in GF(2^8).

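According to Equation (4.10), the expression for β^4 is

$$ \beta^4 = b_0 + b_1\alpha^4 + b_2\alpha^8 + b_3\alpha^{12} + b_4\alpha^{16} + b_5\alpha^{20} + b_6\alpha^{24} + b_7\alpha^{28}. $$

Replacing α^8 = (1 + α^2 + α^3 + α^4), α^12 = (1 + α^2 + α^3 + α^6 + α^7), α^16 = (α^2 + α^3 + α^6), ... from the table of GF(2^8) into β^4 and combining terms with the same powers of α gives

$$ \beta^4 = (b_0 \oplus b_2 \oplus b_3 \oplus b_6) + (b_6)\,\alpha + (b_2 \oplus b_3 \oplus b_4 \oplus b_5 \oplus b_6)\,\alpha^2 + (b_2 \oplus b_3 \oplus b_4 \oplus b_6 \oplus b_7)\,\alpha^3 + (b_1 \oplus b_2 \oplus b_5 \oplus b_7)\,\alpha^4 + (b_5)\,\alpha^5 + (b_3 \oplus b_4)\,\alpha^6 + (b_3 \oplus b_5 \oplus b_6)\,\alpha^7 $$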

The schematic diagram for this circuit is shown in Figure 4.7.

Figure 4.7 Schematic diagram for the β^4 circuit in GF(2^8).
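The XOR networks for every power 2^n can be generated mechanically from Equation (4.10). In the Python sketch below (illustrative only, with hypothetical function names and the field x^8 + x^4 + x^3 + x^2 + 1), `frobenius_taps(n)` lists, for each output bit of β^(2^n), the input bits b_k that are XORed together. For n = 2 it yields the β^4 coefficients derived in the example above, and no output bit has more than five inputs; since each output combines at most eight inputs, a balanced XOR tree never needs more than three levels, which is consistent with the 3D_X delay.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

EXP = []                     # EXP[n] = alpha^n
v = 1
for _ in range(255):
    EXP.append(v)
    v <<= 1
    if v & 0x100:
        v ^= PRIM


def bit(x, i):
    return (x >> i) & 1


def frobenius_taps(n):
    """taps[i] = input bits k such that alpha^(k * 2^n) has bit i set."""
    return [[k for k in range(8) if bit(EXP[(k << n) % 255], i)] for i in range(8)]


def power_2n(beta, n):
    """beta^(2^n) computed with XORs only, as the VLSI circuit would."""
    out = 0
    for i, ks in enumerate(frobenius_taps(n)):
        acc = 0
        for k in ks:
            acc ^= bit(beta, k)
        out |= acc << i
    return out
```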

4.4.4.3 Low latency inversion and division circuits in GF(2^8)

The high-speed operation of the decoder is obtained partly through a low latency inversion circuit that was developed in the research. In the past, considerable effort has been made to develop efficient schemes for finite field inversion and division [49,51-55]. This low latency circuit was developed to increase the throughput of the designed RS decoder, and it is lower in both latency and complexity than other designs [56]. The architecture of this circuit is described as follows.

The inversion of an element, B, in GF(2^8) can be expressed as

$$ B^{-1} = B^{254} = B^{2} \cdot B^{4} \cdot B^{8} \cdot B^{16} \cdot B^{32} \cdot B^{64} \cdot B^{128}, $$

since B^255 = 1 for any nonzero B and 254 = 2 + 4 + 8 + 16 + 32 + 64 + 128.

This equation shows that the inverse computation can be realized using exponentiation (B^(2^n)) and multiplication circuits. The fast implementation of these two circuits results in the low latency of the inversion circuit. The exponentiation of an element to the powers 2^n is easily implemented using only XOR gates in VLSI, as shown in the previous section. This implementation yields a very fast calculation, and all the exponentiations can be computed simultaneously before they are multiplied. Figure 4.8 shows the two steps of the low latency inversion circuit architecture: the exponentiation calculation and the multiplication.

Figure 4.8 Low latency inversion and division architectures in GF(2^8) (exponentiation stage followed by multiplication stage)

The inversion process begins with the calculation of the exponentiations of element B. Seven exponentiations, B^2, B^4, B^8, B^16, B^32, B^64 and B^128, must be evaluated first. These exponentiations are computed using Equation (4.10); for example,

$$ B^{128} = b_0 + b_1\alpha^{128} + b_2\alpha + b_3\alpha^{129} + b_4\alpha^2 + b_5\alpha^{130} + b_6\alpha^3 + b_7\alpha^{131}. $$

Replacing α^8 = (1 + α^2 + α^3 + α^4), α^10 = (α^2 + α^4 + α^5 + α^6), α^12 = (1 + α^2 + α^3 + α^6 + α^7), and so on from Table B.1 for GF(2^8) into each B^(2^n), these exponentiations eventually become combinations of XOR gates.

The maximum delay in this step is 3D_X, where D_X is the XOR gate delay. Using this architecture, all of the exponentiation terms of the element B to be inverted are evaluated simultaneously before the multiplication begins. In an actual VLSI design, the number of XOR gates is reduced significantly (by 12%) because combinations of the b_i's repeat.

In the multiplication stage, six multipliers are required, arranged in three consecutive levels as shown in Figure 4.8. The multiplication uses the multiplier circuit described in Section 4.4.2. In this low latency architecture for the inversion, there are only 3 consecutive multiplications instead of the 6 power-sum calculations of the architecture in [52]. As a result, the latency of the inversion circuit is reduced significantly while low complexity is maintained. Compared to the architecture proposed in [52], there is a reduction of 25% in latency and 10% in hardware to implement the inversion in GF(2^8) [56].

The total delay for this inversion circuit is t_I = t_1 + t_2, where t_1 is the delay of the exponential calculations (3D_X) and t_2 is the delay of the multiplication stage, which is 3D_M (D_M is the delay of the multiplier described in Section 4.4.2). The total latency of the inversion circuit is therefore D_I = 3D_X + 3D_M.

Another multiplier performs the division between two elements C and B in GF(2^8), as shown in Figure 4.8, since C/B = C · B^(-1). The delay for this division is D_D = 3D_X + 4D_M. The division circuit is used to generate the error in the decoder, as shown in the error value generation algorithm.
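The two-stage inverter and the divider can be modelled as follows. The Python sketch is behavioral: it computes B^(2^n) by repeated squaring for brevity, whereas the circuit computes each power directly from the bits of B with XOR networks, and it uses a shift-and-add multiplier for the field x^8 + x^4 + x^3 + x^2 + 1. The six multiplications are arranged in three levels (3 + 2 + 1), matching the 3D_M term in D_I, and the division adds a fourth multiplier level, matching D_D. Function names are hypothetical.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return r


def gf_inv_tree(b):
    """B^-1 = B^2 B^4 B^8 B^16 B^32 B^64 B^128 for nonzero B."""
    # Stage 1: the seven exponentiations B^(2^n), n = 1..7. Repeated squaring
    # is used here for brevity; the circuit computes each power in parallel.
    powers = []
    p = b
    for _ in range(7):
        p = gf_mul(p, p)
        powers.append(p)
    b2, b4, b8, b16, b32, b64, b128 = powers
    # Stage 2: six multipliers in three levels
    m1, m2, m3 = gf_mul(b2, b4), gf_mul(b8, b16), gf_mul(b32, b64)   # level 1
    m4, m5 = gf_mul(m1, m2), gf_mul(m3, b128)                        # level 2
    return gf_mul(m4, m5)                                            # level 3


def gf_div(c, b):
    """C / B = C * B^-1: one more multiplier after the inverter."""
    return gf_mul(c, gf_inv_tree(b))
```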

4.4.5 Correction

The errors generated by the decoder core are used to correct the corrupted received codeword, r(x), which is delayed by the decoder latency using RAM. The correction step in Figure 4.3 is very simple, since the addition of two elements in GF(2^8) is simply a bit-wise XOR operation between the two.

4.4.6 High-speed RS decoder design summary

In the decoding process, the time to generate errors in the last step is critical since it determines the decoder throughput. The computations of the error locator polynomial, σ(x), and the error magnitude polynomial, Z(x), introduce delay, but no per-symbol clocking is required in these steps. These two computations are complex and hardware intensive, but they do not determine the operating speed of the decoder. The multiplication and addition in the Chien search are fast because they are implemented using only XOR gates; the simulation and implementation results showed that this step takes only 60% of the time required for error generation. Therefore, the operating frequency of the RS codec depends on the error generation time of the decoder core.

Most of the error generation time is devoted to the evaluation of Z(α^(-i)) and the inversion of the odd term σ'(x), as shown in the error generation algorithm. Of the two, the inversion time is more critical than the evaluation of Z(α^(-i)). Using fast power-sum, multiplication, exponentiation and division circuits, the error generation time was 80 ns when the decoder was implemented in the Altera FPGA prototypes.

The designed RS decoder achieved a data rate of 96 Mb/s and had a latency of 278 clock cycles when it was implemented in an Altera FPGA. The decoder core applied appropriate pipelining when the modules were connected, and this pipelining added only two clock cycles of delay to the decoder. A data rate of 200 Mb/s is expected when the codec is implemented in an ASIC. At an operating frequency of 25 MHz (a 40 ns clock period), this ASIC will have a latency of less than 12 μs (278 × 40 ns ≈ 11.1 μs).
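The latency and rate figures can be cross-checked with a few lines of arithmetic. In the Python sketch below, the per-stage cycle counts are the ones quoted in Sections 4.4.1 to 4.4.6 (255 for the syndromes, 20 for σ(x), 1 for Z(x) and 2 for pipelining); assuming no other stage adds whole clock cycles, they sum to the measured 278. The sketch also shows that an 80 ns error generation time permits a clock of up to 12.5 MHz, which is consistent with the 96 Mb/s (12 MHz × 8 bits) FPGA result. The names are illustrative.

```python
# Per-stage latencies quoted in the text, in clock cycles
STAGE_CYCLES = {
    "syndrome calculation": 255,
    "error locator (division-free B-M)": 20,
    "error magnitude Z(x)": 1,
    "pipelining between modules": 2,
}
BITS_PER_SYMBOL = 8   # one byte per clock cycle


def latency_cycles():
    return sum(STAGE_CYCLES.values())


def latency_us(f_clk_hz):
    return latency_cycles() / f_clk_hz * 1e6


def data_rate_mbps(f_clk_hz):
    return f_clk_hz * BITS_PER_SYMBOL / 1e6


max_fpga_clock_hz = 1 / 80e-9          # limited by the 80 ns error generation
asic_latency_us = latency_us(25e6)     # about 11.1 us at 25 MHz
```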

The de-randomization process in the receiver is the same as the randomization in the transmitter. Because the PRBS generator is identical, it is not necessary to develop a separate de-randomizer. In the de-randomization process, the 8-bit symbol data is serialized and sent to the input of a randomizer, and a serial-to-parallel conversion then converts the de-randomized bit stream back into an 8-bit symbol output.
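Because de-randomization reuses the randomizer, the operation is its own inverse: XORing the same PRBS sequence twice restores the data. The Python sketch below illustrates this with serial bit generation and MSB-first packing. The generator polynomial 1 + x^14 + x^15 and the seed are assumptions borrowed from DVB-C-style randomizers rather than values stated in this section, sync-byte handling and periodic re-initialization are omitted, and the function names are hypothetical.

```python
SEED = 0b100101010000000   # illustrative 15-bit initial state (assumed)


def prbs_bytes(count, seed=SEED):
    """Serial PRBS from 1 + x^14 + x^15 (assumed), packed MSB first into bytes."""
    reg = seed
    out = []
    for _ in range(count):
        byte = 0
        for _ in range(8):
            fb = ((reg >> 13) ^ (reg >> 14)) & 1   # taps at stages 14 and 15
            reg = ((reg << 1) | fb) & 0x7FFF       # shift the 15-stage register
            byte = (byte << 1) | fb                # serial-to-parallel conversion
        out.append(byte)
    return out


def randomize(data, seed=SEED):
    """Serialize, XOR with the PRBS and repack; used in both directions."""
    return [d ^ p for d, p in zip(data, prbs_bytes(len(data), seed))]


payload = [0x47, 0x00, 0x11, 0xFF, 0x80]
scrambled = randomize(payload)
restored = randomize(scrambled)   # the "de-randomizer" is the same function
```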

Chapter 5

MMDS SYNCHRONIZATION USING GPS CLOCK

Timing and synchronization are critical in the design of any digital communication system. Synchronization plays an important role since it ensures the smooth transfer of information. The goal of synchronization is to align the time and frequency scales of the clocks so that every piece of equipment in the communication network operates synchronously. This chapter describes the need for communication system synchronization and the proposed synchronization architecture for the MMDS system.

5.1 The need for synchronization

The topic of synchronization was introduced with the evolution of digital communication, and it becomes more important as higher transmission speeds are required. Synchronization is a serious challenge for modern communication systems in ensuring the integrity of the transmitted data. It has been discussed in the literature, and in recent years the topic has become popular [19,29,57-61]. A communication system can be classified as synchronous if there exists a time reference common to both the transmitter and the receiver [58]. Analog systems are generally not synchronous. If synchronization is required in an analog system, it is a requirement imposed by the source, as in television transmission, not by the communication system itself.

In digital communications, synchronization is required because the system has to run at the same clock frequency and the symbols must be recognized and synchronized for the various elements in the system to function properly. Any multiplexing scheme must be truly synchronous throughout the network, with a single master clock defining the slot intervals for all frames. The frames are constructed by interleaving at rates derived from this single clock, with all frames aligned. When a digital communication system is operated over a large geographic area, it is usually set up in a hierarchical arrangement (i.e., a network), and synchronization becomes even more important. Network synchronization has recently become a popular topic because standards must be set for the network to operate smoothly and cost effectively [57].

The block diagram in Figure 5.1 depicts a generalized communication system model, and the designed MMDS system uses the same structure. The source represents the source of information to be transmitted. The data sampler converts the random process to a random sequence by a sampling operation. The source encoder maps the data samples onto data words, that is, onto sequences of digits or data symbols. The channel encoder adds redundancy to the digital sequence at its input for error correction purposes. The function of the modulator is to convert the sequence of symbols at the encoder output into a sequence of waveforms suitable for transmission over the communication channel. The channel is generally assumed to be both wideband and time-invariant and to be perturbed by noise, which is assumed to be a sample function of a white Gaussian process. Each block on the receiver side of Figure 5.1 performs the inverse operation of the corresponding block on the transmitter side.

Figure 5.1 Model of a communication system (transmitter, noisy channel, and receiver ending in the data regenerator and destination)

In the communication model above, two sequences of events are said to be synchronous if corresponding events in the two sequences occur simultaneously. Synchronization is defined simply as the process of bringing about, or retaining, a synchronous situation [29,58]. It is only necessary to identify the two sequences of events to be synchronized, one taking place at the transmitter and the other at the receiver. For the two sequences to be synchronized, there must exist a common time reference between the transmitter and the receiver. Each block shown in Figure 5.1 represents a specific synchronization constraint, i.e., a specific requirement that the common time reference must satisfy.

The synchronization process can be divided into two modes. In the first mode, the clock synchronization mode, the clocks that regulate the two sequences being synchronized (i.e., the transmitter and receiver clocks) are forced to run at the same rate. In the second mode, the higher order synchronization mode, a corresponding pair of events in the two sequences is identified and made to occur simultaneously. Clearly, if the same event occurs in two identical sequences simultaneously, and if the sequences are processed at the same rate, the sequences are, and will remain, synchronized.

If the transmitter and receiver clocks are both sufficiently stable relative to the required synchronization accuracy, the clock synchronization mode may be bypassed. When this is not the case, however, techniques must be devised to provide the needed clock synchronization. Traditional methods of clock synchronization include transmitting the transmitter clock signal along with the information, using the carrier itself as a clock, and deriving the carrier frequency and phase from the data signal [60].

Once the transmitter and receiver clocks have been synchronized, the second mode of the synchronization process begins. Events taking place in each of the blocks in the receiver portion of Figure 2.1 must be synchronized with the corresponding events taking place in the analogous block in the transmitter portion. Efficient demodulation requires the demodulator to be synchronized with the modulator so as to know when the waveform representing one sequence of digits (or symbol) ceases and the next one begins. This is called symbol synchronization. The channel decoder cannot decode correctly unless it can identify the beginning of each code word; this is called code word synchronization. Similarly, the source decoder is useless unless the digits appearing at its input can be separated into groups, one group corresponding to each data sample. This is called data word synchronization. Since the significance of a particular data sample may be defined only in terms of its position in a sequence of samples, sometimes called the frame, the data regenerator must frequently be synchronized with the data samples; this is called frame synchronization.

The higher order synchronization mode is implemented in MMDS by inserting synchronization bytes and inverted synchronization bytes into the frame structure of the MPEG2-TS. This synchronization scheme is sufficient for the second mode of synchronization as specified in the DAVIC 1.2 specification. Clock synchronization is more critical in the MMDS system, especially for high-speed operation, because the fast clock imposes a stringent condition on clock synchronization. As a result, a stable clock with a high degree of precision is desired for system clock synchronization. Finding an efficient synchronization scheme for the high-speed MMDS system is the topic of this part of the thesis. The use of a GPS clock was investigated to replace the crystal oscillator for clock synchronization in the absence of SONET and SDH in the network. The system clock frequency must be stable and traceable to a standard time, as stated in Section 2.2.11. Receiving and using this precision GPS clock for synchronization are discussed in the next section.

5.2 GPS clock derivation and application in an MMDS system

Imperfect clock tuning in the receiver of the system degrades the performance of the synchronization loop and, hence, the overall system reliability and data transmission quality. This section describes why the GPS clock is selected and how it is used as an MMDS system clock.

5.2.1 GPS clock versus crystal oscillators

Traditionally, low-cost crystal oscillators have been used to generate reference frequencies for synchronization. Crystal oscillators only work well for low speed data transmission. Their main disadvantage is that their frequency drifts due to temperature fluctuations, aging and inaccuracy [62].

Even expensive crystal oscillators drift by a small amount each day, and they must be adjusted to maintain long-term reliability and accurate time. Maintaining this reliability is a major problem [62]. To solve this dilemma, telecommunications companies use a frequency reference distribution system, which is linked to an atomic reference source, to continuously steer the crystal oscillators to the reference time. At low data speeds, crystal oscillators with reference steering provide adequate synchronization accuracy and reliability.

An alternative solution is to install very high precision clocks at each terminal location, but the cost of such atomic or cesium tube clocks is too high. In addition to the long-term reliability problem, this is an expensive option, especially if redundant clocks are needed. Besides the drift of the crystals, solutions must also be sought for synchronizing each independent crystal with the standard time of the higher hierarchy in the network. This hierarchy synchronization requirement makes the use of crystal clocks even more expensive. In contrast, an inexpensive GPS receiver may be available at each base station to generate a stable local clock.

The use of a GPS clock has many advantages: high accuracy, high reliability, high stability, worldwide access, precise time, low calibration cost, small size, low power, low unit cost, and low installation and maintenance costs [62]. Receiving the GPS clock becomes less expensive as the technology matures. In addition, the GPS clock satisfies the 50 ppm clock tolerance required by the DAVIC 1.2 specifications [63]. As a result, instead of using a crystal oscillator to generate the system clock, a GPS clock can be used in the high-speed MMDS system for clocking and synchronization purposes.
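Checking a clock against the 50 ppm tolerance is simple arithmetic on a measured frequency. In this sketch the reading is assumed to come from a frequency counter measuring the 10 MHz output; the constants and function names are illustrative and are not defined by DAVIC 1.2.

```python
NOMINAL_HZ = 10_000_000       # 10 MHz reference output
DAVIC_TOLERANCE_PPM = 50.0    # network-clock tolerance quoted from DAVIC 1.2


def offset_ppm(measured_hz, nominal_hz=NOMINAL_HZ):
    """Fractional frequency offset expressed in parts per million."""
    return (measured_hz - nominal_hz) / nominal_hz * 1e6


def meets_davic_tolerance(measured_hz, nominal_hz=NOMINAL_HZ):
    """True if the measured frequency lies within +/-50 ppm of nominal."""
    return abs(offset_ppm(measured_hz, nominal_hz)) <= DAVIC_TOLERANCE_PPM


if __name__ == "__main__":
    for reading in (10_000_000.0001, 10_000_400, 9_999_400):
        print(reading, f"{offset_ppm(reading):+.5f} ppm", meets_davic_tolerance(reading))
```

A 10 MHz source is allowed to be off by up to 500 Hz under this tolerance, so a GPS-disciplined reference with a fractional offset near 1e-11 passes with many orders of magnitude to spare.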

The choice of a GPS clock over a crystal oscillator for system synchronization is also based on the reference time to which the clock is set. The obvious choice of reference time is UTC [64,65]. This feature is essential for the MMDS system, since the system is connected to the higher hierarchy of the global communication link. Therefore, another advantage of using a GPS clock is that it is traceable to the international standard time used by SONET and SDH. How the available GPS signals are received and converted into a useful clock for time synchronization in the MMDS system is described in the next section.

5.2.2 GPS clock

GPS is a worldwide resource of unprecedented accuracy and precision for time and position. Precise measurement of time and time intervals is at the heart of GPS. The entire system is based on very accurate time kept by atomic standards on board each of the satellites, which are monitored and controlled by the US Naval Observatory (USNO) [62,66]. The USNO Master Clock is the time and frequency standard for all of these systems. Thus, this clock system must stay at least one step ahead of the demands made on its accuracy, and developments planned for the years ahead must be anticipated and supported.

The Master Clock system now incorporates hydrogen masers, which in the short term are more stable than cesium beam atomic clocks, and mercury ion frequency standards [64]. These represent the most advanced technologies available to date. Highly accurate portable atomic clocks have been transported aboard GPS satellites in order to synchronize the time at Naval Bases and other Department of Defense facilities around the world with the Master Clock. Accurate time synchronization with the Master Clock is now beginning to be carried out through the use of atomic clocks in GPS satellites, which will provide the primary means of time synchronization and worldwide time distribution [67]. As a result, GPS receivers locked to the satellites can provide the user with very accurate, inexpensive and traceable time. The received GPS frequency exceeds the Stratum 1 level requirements of the communications industry (0.3 µs in time and 10^-11 in frequency) [65]. With this precision, a GPS clock can be used for clocking purposes in digital communication circuitry.

The GPS system consists of three parts: the space segment, the operational control segment and the user equipment. The GPS constellation includes 24 satellites in orbits inclined at about 55 degrees to the equator. The clocks, or more appropriately the frequency references, are carried aboard the satellites and are used to generate signals with precise and synchronized timing marks. Each satellite carries a pair of cesium and rubidium atomic standards [64,68]. The frequency stability of these clocks over a day is about one part in 10^14 and one part in 10^13, respectively. The satellite clocks are maintained in synchronism by monitoring the signals from a network of tracking stations. These stations are operated by the US Department of Defense as part of the GPS operational control segment. Each GPS satellite transmits continuously on two frequencies in the L band: 1575.42 MHz (L1) and 1227.6 MHz (L2). The L1 signal is modulated by a pseudorandom noise (PN) code called the Coarse/Acquisition (C/A) code. The GPS signal format is known as direct sequence spread spectrum [69]. The user equipment receives GPS signals for navigation and timing purposes. A general GPS receiver block diagram is shown in Figure 5.2.


Figure 5.2 Generic GPS receiver block diagram [69]
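The C/A code is a Gold code formed by adding the outputs of two 10-stage shift registers, G1 and G2, modulo 2. The minimal sketch below assumes the generator polynomials and G2 output-tap pairs published in the GPS interface specification (ICD-GPS-200, now IS-GPS-200); only four PRNs are tabulated to keep it short, and the function names are illustrative. The first ten chips of PRN 1 come out as 1100100000 (octal 1440).

```python
# G2 output-tap pairs (stage numbers) for a few PRNs, per the GPS interface specification.
G2_TAPS = {1: (2, 6), 2: (3, 7), 3: (4, 8), 4: (5, 9)}


def ca_code(prn, length=1023):
    """Generate the GPS C/A Gold code (chips as 0/1) for one of the PRNs above."""
    t1, t2 = G2_TAPS[prn]
    g1 = [1] * 10          # index 0 is stage 1, index 9 is stage 10
    g2 = [1] * 10
    chips = []
    for _ in range(length):
        g2_out = g2[t1 - 1] ^ g2[t2 - 1]          # phase-selected G2 output
        chips.append(g1[9] ^ g2_out)
        fb1 = g1[2] ^ g1[9]                                     # G1 = 1 + x^3 + x^10
        fb2 = g2[1] ^ g2[2] ^ g2[5] ^ g2[7] ^ g2[8] ^ g2[9]     # G2 = 1 + x^2 + x^3 + x^6 + x^8 + x^9 + x^10
        g1 = [fb1] + g1[:9]
        g2 = [fb2] + g2[:9]
    return chips


def periodic_correlation(a, b, lag):
    """Periodic correlation of two 0/1 sequences after mapping 0 -> +1, 1 -> -1."""
    n = len(a)
    return sum((1 - 2 * a[i]) * (1 - 2 * b[(i + lag) % n]) for i in range(n))


if __name__ == "__main__":
    code = ca_code(1)
    print("".join(map(str, code[:10])))   # 1100100000
```

The sharp correlation peak (1023 at zero lag, small three-valued sidelobes elsewhere) is what lets the receiver's correlators separate satellites and measure code timing.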

The antenna is normally right-hand circularly polarized to match the incoming signal, and its pattern is hemispherical. A well-designed GPS antenna must have good multipath-rejection characteristics [70]. The analog front end of the receiver performs filtering, amplification and down-conversion. After analog-to-digital conversion (ADC), the baseband processor processes the digitized signal to provide the navigation and timing information.

For timing purposes, a GPS clock receives the satellite signals, locks onto the GPS frequency and locally regenerates a stable clock, as shown in Figure 5.3. The GPS clock used in this research to study MMDS system synchronization was manufactured by Absolute Time (GPS CLOCK™ Model 100A) [71].

Figure 5.3 Frequency based GPS clock [71]

The architecture of this GPS clock is different from that of the standard time-based GPS receivers shown in Figure 5.4. It is optimized for frequency applications, which improves the timing performance of the clock: it is a frequency-based receiver, not a time-based one. Instead of slaving an oscillator to the 1 pulse per second (PPS) output of a GPS receiver, the GPS clock slaves the oscillator of the GPS receiver to the satellites and derives the 1 PPS from the locked oscillator frequency output. As a result, the GPS clock output is more stable, more accurate and more precise.

The operation of the GPS clock is as follows. A circularly polarized antenna receives the C/A code signals from the GPS satellites. The antenna module consists of an L1 frequency antenna element and a preamplifier, and it interfaces to the receiver via an antenna cable. The GPS clock contains a processor, frequency generation hardware and RF/IF circuits. The reference frequency for the system is a 10 MHz crystal oscillator. The reference signal drives the PLL at 44.456 MHz. The RF signal is downconverted by the RF/IF circuitry with appropriate filtering and automatic gain control. The analog IF signal is then digitized with a sample-and-hold circuit and a 3-bit ADC. The digital data is sent to a DSP for processing. The DSP is implemented using Correlator ASICs™; each correlator performs all the correlation, signal processing and tracking of an individual satellite. The DSP interfaces to the microprocessor for control and data output.
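The correlation work done in each correlator channel can be illustrated with a serial code-phase search: the receiver slides a local replica of the PN code against the received chips and keeps the lag with the largest correlation. This is a generic textbook sketch, not the Absolute Time correlator design. It uses a 10-stage maximal-length sequence (the same polynomial as the GPS G1 register) as a stand-in PN code and models noise crudely by inverting a block of received chips; the function names are illustrative.

```python
def m_sequence(length=1023):
    """Maximal-length sequence from the LFSR 1 + x^3 + x^10, as +1/-1 chips."""
    reg = [1] * 10
    chips = []
    for _ in range(length):
        chips.append(1 - 2 * reg[9])
        reg = [reg[2] ^ reg[9]] + reg[:9]
    return chips


def code_phase_search(received, replica):
    """Return (lag, correlation) that maximizes the periodic correlation."""
    n = len(replica)
    best_lag, best_corr = 0, None
    for lag in range(n):
        corr = sum(received[(i + lag) % n] * replica[i] for i in range(n))
        if best_corr is None or corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


if __name__ == "__main__":
    replica = m_sequence()
    n, true_delay = len(replica), 300
    received = [replica[(i - true_delay) % n] for i in range(n)]
    for i in range(100):                  # crude "noise": invert 100 received chips
        received[i] = -received[i]
    print(code_phase_search(received, replica))   # (300, 823)
```

With 100 of 1023 chips corrupted, the true lag still correlates to 823 while every other lag stays at or below 199, because an m-sequence has an off-peak autocorrelation of only -1.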

Figure 5.4 Time based GPS clock generation

To lock onto a GPS satellite's frequency and time, the GPS clock operates in time transfer mode. In this mode, the clock first surveys its location by tracking at least four satellites. Once its location is known, only one satellite is required for timing. The clock measures the satellite frequency and adjusts its internal oscillator. Once the clock jam-sets to the GPS satellite's time and frequency, so that the phase of the 1 PPS coincides with satellite time, it closes the phase-locked loop and continues to lock the oscillator to the satellite's frequency. In normal operation, the GPS reference signal determines the long-term stability of the GPS clock frequency output. In the unlikely event of satellite signal loss or interruption, the GPS clock enters an intelligent holdover mode to maintain accuracy until the GPS reference signal is regained. In this mode, the internal oscillator remains set to the last known frequency until the satellites are once again acquired.
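The steering and holdover behaviour can be mimicked with a first-order frequency-locked loop: each second the GPS measurement reveals the residual fractional frequency error, and the controller removes a fixed fraction of it; in holdover the last correction is simply frozen. This is a hypothetical model (the gain, aging rate and function name are assumptions), not the vendor's firmware, and it ignores 1 PPS phase alignment and measurement noise.

```python
def discipline(initial_offset, aging_per_s, gps_available, gain=0.5):
    """Simulate steering of an oscillator's fractional frequency error.

    initial_offset -- free-running fractional frequency offset (e.g. 2e-8)
    aging_per_s    -- linear change of that offset per second (oscillator aging)
    gps_available  -- iterable of booleans, one per 1-second update
    Returns the residual fractional frequency error observed in each second.
    """
    free_running = initial_offset
    correction = 0.0
    residuals = []
    for locked in gps_available:
        residual = free_running + correction
        residuals.append(residual)
        if locked:
            correction -= gain * residual   # steer toward the GPS reference
        # in holdover the correction stays frozen at its last value
        free_running += aging_per_s
    return residuals


if __name__ == "__main__":
    res = discipline(2e-8, 1e-13, [True] * 100 + [False] * 100)
    print(f"locked: {res[99]:.2e}  after 100 s holdover: {res[199]:.2e}")
```

With gain 0.5 and an aging rate of 1e-13 per second, the locked residual settles near aging/gain = 2e-13; once the satellites are lost, the residual grows by the aging rate every second, which is why holdover accuracy depends on the quality of the internal oscillator.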

The Absolute Time GPS clock unit has a frequency accuracy of 1 part in 10^11 over a one-day average, and 5 parts in 10" over a one-week average [71]. The time accuracy relative to Coordinated Universal Time (UTC) is 300 ns with Selective Availability (SA) on and 100 ns with SA off. The stability of the frequency output (10 MHz) is 1 part in 10" for averaging times from 0.1 to 100 seconds, and the time stability (1 PPS) is less than 1 ns rms pulse-to-pulse jitter [71]. The clock thus has real-time direct traceability to the USNO and thereby ultimately to the internationally defined frequency and time.
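Short-term stability figures quoted for averaging times such as 0.1 to 100 s are conventionally expressed as an Allan deviation; that this is how the vendor specifies the figure is an assumption here. Given fractional-frequency readings y_i = (f_i - f_0)/f_0 taken every τ0 seconds, for instance from a frequency counter, the non-overlapping Allan deviation at τ = m·τ0 can be computed as in this sketch (function names are illustrative).

```python
import math


def fractional_frequency(readings_hz, nominal_hz):
    """Convert frequency readings to fractional offsets y = (f - f0) / f0."""
    return [(f - nominal_hz) / nominal_hz for f in readings_hz]


def allan_deviation(y, m=1):
    """Non-overlapping Allan deviation at tau = m * tau0.

    sigma_y^2(tau) = 1 / (2 (M - 1)) * sum_k (ybar_{k+1} - ybar_k)^2,
    where ybar_k are consecutive averages of m samples and M is their count.
    """
    groups = len(y) // m
    if groups < 2:
        raise ValueError("need at least two averaging intervals")
    ybar = [sum(y[k * m:(k + 1) * m]) / m for k in range(groups)]
    diffs = [(b - a) ** 2 for a, b in zip(ybar, ybar[1:])]
    return math.sqrt(sum(diffs) / (2 * len(diffs)))
```

A constant frequency offset gives an Allan deviation of zero, since the measure responds only to fluctuations; the offset itself is an accuracy figure, which is why accuracy and stability are specified separately above.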

5.2.3 Using GPS clock in MMDS transceiver prototype

A 2 Vp-p sine-wave output from the unit can be converted into a 10 MHz TTL output to clock the MMDS system components, using the comparator circuit shown in Figure 5.5. This simple circuit uses a zero-crossing detector to convert a sine wave into a TTL square wave. The LT1720 comparator made by Linear Technology is used for zero-crossing detection. This high-speed comparator (4.5 ns) operates on a single +5 V supply and provides a rail-to-rail output [72]. The internal design of this device minimizes oscillations due to feedback, because the sensitive inverting input is placed away from the output and shielded by the power rail. In addition to the high stability of the device itself, care was taken in the layout of the PC board: a double-sided PC board with appropriate grounding was used for the circuit of Figure 5.5. The circuit was checked and tested to confirm its performance and stability. In addition, the manufacturer's test results show that this device has a high degree of reliability [72].

The voltage divider (R1, R2) shown in the schematic diagram of Figure 5.5 is required at the input, since the maximum negative input voltage of the comparator is -0.2 V. The TTL output of the comparator is connected directly to clock the FPGA prototype for system development.


Figure 5.5 GPS clock TTL output
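The divider requirement and the zero-crossing action can be checked numerically. The sketch assumes a plain two-resistor divider (R1 in series, R2 to ground); the resistor values of Figure 5.5 are not given in the text, so R1 = 4.7 kΩ and R2 = 1 kΩ are hypothetical. The comparator is modelled as an ideal sign detector with 0 V / 5 V output levels, ignoring its 4.5 ns propagation delay and any hysteresis.

```python
import math

COMPARATOR_MIN_INPUT_V = -0.2   # most negative allowed input, per the text


def divided_peak(vpp, r1, r2):
    """Peak amplitude of a symmetric sine after a divider with R1 in series and R2 to ground."""
    return (vpp / 2) * r2 / (r1 + r2)


def zero_crossing_ttl(samples, low=0.0, high=5.0):
    """Ideal zero-crossing comparator: output high whenever the input is above 0 V."""
    return [high if v > 0 else low for v in samples]


def rising_edges(levels):
    """Count low-to-high transitions in a sampled logic waveform."""
    return sum(1 for a, b in zip(levels, levels[1:]) if b > a)


if __name__ == "__main__":
    peak = divided_peak(2.0, r1=4700, r2=1000)           # hypothetical resistor values
    print(f"divided peak {peak:.3f} V, negative swing OK: {-peak >= COMPARATOR_MIN_INPUT_V}")
    fs, f0 = 1e9, 10e6                                   # 1 GS/s model of a 10 MHz input
    wave = [peak * math.sin(2 * math.pi * f0 * n / fs + 0.1) for n in range(1000)]
    print("rising edges in 1 us:", rising_edges(zero_crossing_ttl(wave)))   # 10
```

Scaling the 1 V peak down to about 0.175 V keeps the negative half-cycle inside the -0.2 V limit, while a zero-crossing threshold still yields exactly one rising edge per input cycle.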

5.3 MMDS system synchronization

Careful synchronization planning is necessary in both the wired and the wireless worlds, because the robustness of any communication network depends on its synchronization. The two objectives in designing a synchronous system are clock synchronization and word synchronization. All the elements in the network must run at the same clock rate, and words must be synchronized to ensure the integrity of the transmitted data. The synchronization system has to be simple, low cost, robust and reliable. It must also meet the specification for synchronization time. In the MMDS system, the receiver uses the transmit clock to clock all components; this clock is derived from the received data clock. To ensure adequate binary transitions for clock recovery, the system data packets (MPEG2-TS) are randomized as described in Section 2.2.1. The synchronization byte and its inversion are added to the transport stream for frame synchronization, and they provide the initialization signal for the de-randomization process. These synchronization bytes provide the required system frame synchronization.
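For illustration, the sketch below follows the DVB energy-dispersal scheme (PRBS 1 + x^14 + x^15 loaded with 100101010000000, first sync byte of every group of eight packets inverted from 0x47 to 0xB8), on which the DAVIC MMDS downstream framing is assumed here to be based; Section 2.2.1 gives the exact parameters used in this work. Because randomization is a pure XOR, the same routine de-randomizes when the receiver hands it the normal sync byte back. The function names are illustrative.

```python
SYNC, SYNC_INV = 0x47, 0xB8
PACKET_LEN, GROUP = 188, 8
PRBS_INIT = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # stages 1..15


def prbs_bytes(count):
    """PRBS 1 + x^14 + x^15 reloaded with PRBS_INIT, packed MSB-first into bytes."""
    reg = PRBS_INIT[:]
    out = []
    for _ in range(count):
        byte = 0
        for _ in range(8):
            bit = reg[13] ^ reg[14]          # taps at stages 14 and 15
            reg = [bit] + reg[:-1]
            byte = (byte << 1) | bit
        out.append(byte)
    return out


def scramble_group(packets, first_sync):
    """(De)randomize one group of eight 188-byte MPEG-2 TS packets.

    The first sync byte is replaced by first_sync (SYNC_INV when transmitting,
    SYNC when receiving); the other sync bytes pass through unscrambled while
    the PRBS keeps running.
    """
    assert len(packets) == GROUP and all(len(p) == PACKET_LEN for p in packets)
    prbs = prbs_bytes(GROUP * PACKET_LEN - 1)
    k = 0
    out = []
    for i, pkt in enumerate(packets):
        buf = bytearray(pkt)
        if i == 0:
            buf[0] = first_sync
        else:
            k += 1                           # PRBS clocks during the sync byte, output unused
        for j in range(1, PACKET_LEN):
            buf[j] ^= prbs[k]
            k += 1
        out.append(bytes(buf))
    return out
```

An all-zero payload leaves the transmitter as the raw PRBS (0x03, 0xF6, ...), which shows how randomization guarantees bit transitions for the clock recovery even when the source data has none.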

5.3.1 Clock synchronization

In the continuous-time world, establishing a common time base at physically separated locations presents some serious challenges. Typical systems use independent time bases, frequently derived from crystal oscillators. Although crystal oscillators provide accurate time references at low cost, "accurate" is not adequate to maintain the integrity of discrete-time data [60]. In addition, time references often must be identical, at least in the sense of long-term averages, within the system and across the communication link hierarchy. In other words, a system must be synchronized both internally and with other systems. The first step in the synchronization procedure is usually to slave the receiver clock to the transmitter clock, thereby establishing a common clock reference throughout the system. Since the receiver clock is derived from the transmitted signal, which carries clocking information, clock recovery is required at the receiver. This recovered clock is called "loop timed" since it comes from the transmitter. Many clock recovery schemes have been developed and described in the literature [19,58-61,73-76]. A few are:

1) Carrier synchronization: This approach is used with coherent detection, where knowledge of both the frequency and the phase of the carrier signal is necessary. The optimum receiver is a PLL using either a Costas loop or an m-th power loop [74].

2) Symbol synchronization: The information needed to establish symbol synchronization is actually present in the message-bearing signal itself. Either a clock is transmitted along with the data and then extracted at the receiver, or the clock is extracted by processing the demodulated baseband waveforms. The second approach avoids wasting transmitter power.

3) Maximum-likelihood symbol synchronization: The maximum-likelihood decision with respect to the symbol epoch is to accept the epoch that maximizes its density function [74]. Depending on the modulation technique used in the transmission, different symbol synchronizers are used.

4) Tracking symbol synchronization: The maximum-likelihood method assumes that the receiver's symbol period is stable, or else that its clock is slaved to that of the transmitter. Any subsequent fluctuation in the symbol epoch will then be reflected in the receiver clock. Nevertheless, it is often advantageous to track variations in the symbol epoch directly without relying on the auxiliary clock. This is the clock recovery scheme used in the MMDS system.

The DAVIC 1.2 specification defines the timing for the MMDS network:

"The transmitter in the network device will use a transmit clock which is derived from the network clock (e.g. SONET clock, SDH clock, PON clock, ...) to allow end-to-end network synchronization. In the absence of a network clock, the network device will use a locally generated clock with a maximum tolerance of 50ppm. The transmitter in the user device will use a transmitter clock that is derived from its received data clock, i.e., the user device is loop timed. In the absence of a valid clock derived from the received data clock, the user will not perform any upstream access on the media" [4].

The use of a GPS clock meets the standard set above and provides the timing required for the MMDS system, as described in Section 5.2. Figure 5.6 depicts the system clock synchronization configuration, in which the GPS clock is generated at the base station and transmitted to multiple receivers. The clock is recovered at the receivers for clock synchronization.

Figure 5.6 MMDS system synchronization using GPS clock

Clock recovery attempts to synchronize the receiver clock with the baseband symbol-rate transmitter clock [17]. The MMDS receivers use an early-late gate technique to recover the clock from the transmitted data. The clock is extracted by processing the demodulated baseband waveforms. Since the symbols must be distinguishable, it should be possible to determine directly from the received sequence exactly when the transition from one symbol to the next takes place. Using the baseband signal for clock recovery avoids wasting transmitted power on a separate clock. Figure 5.7 shows the early-late gate clock recovery.

Figure 5.7 Early-late gate data symbol synchronizer [17]

In this clock recovery scheme, correlators are used instead of the equivalent matched filters. Both correlators integrate over a full symbol interval T, one starting δT early relative to the estimated transition time and the other starting δT late. The error signal, which is the difference between the absolute values of the two correlator outputs, is low-pass filtered. The output of the low-pass filter is applied to a VCO that controls the charging and discharging instants of the correlators. The closed loop of the recovery circuit is narrowband relative to the symbol rate 1/T. The instantaneous frequency of the local clock is advanced or retarded iteratively until the equilibrium point is reached, and symbol synchronization is thereby established.
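A discrete-time sketch of this loop: rectangular ±1 NRZ symbols with T samples per symbol, two correlators opened δ samples before and after the current epoch estimate, and the error |early| - |late| fed back through a first-order loop whose gain stands in for the low-pass filter and VCO. The sample-level model, gain, random test data and function names are assumptions for illustration, not the MMDS receiver design.

```python
import random


def nrz_waveform(symbols, samples_per_symbol, offset):
    """Rectangular +/-1 NRZ waveform whose symbol boundaries fall at offset + k*T."""
    wave = [symbols[0]] * offset
    for s in symbols:
        wave.extend([s] * samples_per_symbol)
    return wave


def early_late_track(wave, n_symbols, T, delta, gain=0.1, tau=0.0):
    """Early-late gate epoch tracking; returns the final epoch estimate in samples."""
    for k in range(1, n_symbols - 2):
        start = k * T + tau
        e0 = int(round(start - delta))      # early correlator opens delta before the estimate
        l0 = int(round(start + delta))      # late correlator opens delta after it
        early = sum(wave[e0:e0 + T])        # integrate over one full symbol interval
        late = sum(wave[l0:l0 + T])
        error = (abs(early) - abs(late)) / T
        tau -= gain * error                 # error > 0 means the estimate is late
    return tau


if __name__ == "__main__":
    rng = random.Random(1)
    symbols = [rng.choice((-1, 1)) for _ in range(2000)]
    wave = nrz_waveform(symbols, samples_per_symbol=16, offset=5)
    print(early_late_track(wave, len(symbols), T=16, delta=4))   # settles near 5
```

Averaged over random data, the error is proportional to the timing offset when the offset is smaller than δ, so the loop pulls the estimate onto the true transition. With no symbol transitions both correlators integrate the same value and the loop holds its estimate, which is why transmitted data must be randomized.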

For the FPGA implementation, an early-late gate synchronizer Altera Megafunction® has been developed [77]. The synchronizer is fundamentally a digital phase-locked loop. It provides phase lock between an internally generated data clock and an input data stream. The synchronizer includes a phase detector, an up-down counter loop filter and a digitally controlled oscillator. The phase detector provides the error between the data clock and the input data stream. The up-down counter accumulates the phase error output according to its sign and magnitude. The digitally controlled oscillator advances or retards the phase of the locally generated data clock whenever the error accumulator exceeds a specified error threshold. This threshold is programmable, which allows the synchronizer to trade acquisition time against data clock jitter.
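The up-down counter loop filter and digitally controlled oscillator can be modelled behaviourally as follows. The phase detector is idealized to report only the sign of the phase error at each data transition, the DCO moves the local clock in steps of 1/16 bit, and the threshold is the programmable parameter described above. These choices, and the class and function names, are assumptions rather than the megafunction's actual interface.

```python
class UpDownCounterFilter:
    """Accumulates phase-error signs; emits a +/-1 correction at the threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def update(self, error_sign):
        self.count += error_sign
        if self.count >= self.threshold:
            self.count = 0
            return +1          # advance the local clock phase
        if self.count <= -self.threshold:
            self.count = 0
            return -1          # retard the local clock phase
        return 0


def dpll(data_phase, threshold, steps_per_bit=16, transitions=200):
    """Return the DCO phase after each data transition (units of 1/steps_per_bit bit)."""
    half = steps_per_bit // 2
    clock_phase = 0
    loop_filter = UpDownCounterFilter(threshold)
    history = []
    for _ in range(transitions):
        error = (data_phase - clock_phase + half) % steps_per_bit - half   # wrapped to [-half, half)
        sign = (error > 0) - (error < 0)
        clock_phase = (clock_phase + loop_filter.update(sign)) % steps_per_bit
        history.append(clock_phase)
    return history


if __name__ == "__main__":
    for th in (2, 4):
        h = dpll(data_phase=5, threshold=th)
        print(f"threshold {th}: locked after {h.index(5) + 1} transitions")
```

Halving the threshold halves the acquisition time (10 instead of 20 transitions for a 5-step offset), but each correction then needs fewer consistent phase-detector decisions, so noise moves the clock more readily and jitter tends to rise.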

The clock at the receiver is regenerated from the symbol clock. This clock is loop timed to the transmitter clock, which is a UTC-traceable GPS frequency and time. The use of a GPS clock is much simpler and less expensive than the use of expensive independent clocks. One GPS clock unit at the base station provides clocking for all of the receivers within the coverage area of the MMDS system.

5.3.2 Frame synchronization

After the clock rate is recovered from the received signal, higher-order synchronization begins, providing the information the various components in the system need to operate synchronously. This synchronization includes coding and inserting special symbols for word synchronization. The DAVIC 1.2 specification calls for inserting synchronization and inverted synchronization bytes into the MPEG2-TS. This is discussed in more detail in Section 2.2.
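A receiver typically finds this word alignment by searching for the sync pattern at the 188-byte packet spacing and declaring lock only after several consecutive hits, since a single 0x47 can also occur in randomized payload. A minimal sketch; the three-packet confirmation count and the function name are assumptions, not DAVIC requirements.

```python
SYNC_BYTES = (0x47, 0xB8)   # MPEG-2 TS sync byte and its inversion
PACKET_LEN = 188


def find_packet_alignment(stream, confirm=3):
    """Return the first offset at which `confirm` consecutive packets start with a
    sync (or inverted sync) byte, or None if no alignment is found."""
    for offset in range(PACKET_LEN):
        positions = [offset + k * PACKET_LEN for k in range(confirm)]
        if positions[-1] < len(stream) and all(stream[p] in SYNC_BYTES for p in positions):
            return offset
    return None
```

For example, a stream that starts with 50 bytes of residue before a packet beginning with 0xB8 locks at offset 50.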

CHAPTER 6

RESULTS

This chapter presents the results of the system simulation using Matlab, the hardware implementation of Figure 2.1 in Altera FPGA devices, and the GPS clock testing. Figure 6.1 shows the equipment configuration for MMDS system development and testing. The equipment shown in the figure is described as follows.

Figure 6.1 Equipment set up (GPS antenna, RG-213 antenna cable, GPS clock, TTL interface, HP 53132A universal counter)

Matlab and Simulink were installed on the computer for system simulation. The Altera software (MaxPlus II® Version 9.3) was installed on the computer to compile Verilog HDL code into FPGA programming files. These files were used to program the FPGA device through the Altera ByteBlaster cable connected between the computer and the prototype board. Once programmed, the FPGA was configured as the function block described in the Verilog code. The logic analyzer captured the input and output waveforms of the FPGA to verify the functionality of the building blocks at a specific speed.

The GPS antenna was mounted on the building's roof to receive GPS signals. The RG-213 antenna cable carried the signals to the GPS clock unit. The GPS clock generated a stable local clock for system clocking, as described in Section 5.2.1. An RS-232 cable was connected between the GPS clock unit and the computer for control and operation monitoring. The TTL interface circuit board converted the clock into a TTL-level output to clock the Altera FPGA prototype board. The universal counter was used to measure the GPS clock frequency and its stability. The following sections present the results obtained during the research.

6.1 System simulation

For system simulation, all of the system building blocks were built using a combination of standard blocks from the Simulink library, basic logic gates and/or math functions. These math functions could be written in Matlab or in the C programming language. Figure 6.2 shows the transceiver with the appropriate RF interface. This simulation set-up corresponds to the communication system model of Figure 5.1 and the proposed MMDS system. The simulation files are in Simulink (i.e., Matlab) format and are included in [5].

The system starts with an information source: an ADC that digitizes an analog signal generated by the analog source. The digital data is encoded with an RS encoder, a convolutional interleaver and a differential encoder. The encoded data is then mapped into QAM constellations (64 or 256) to generate two quadrature signals, I and Q. The I and Q outputs of the QAM mapping block are then sent to the raised-cosine filters, with an appropriate roll-off factor α and data interpolation, for filtering. The QAM modulator modulates the filtered I and Q signals before sending them to the transmission channel.

Figure 6.2 Matlab and Simulink simulation setup

The receiver reverses the transmitter process, as shown in Figure 6.2. The RF signal is QAM demodulated to recover the quadrature I and Q signals. These signals are then filtered before being sent to the QAM de-mapping. The last process in the receiver is FEC decoding, which corrects any data corrupted by the channel.

As shown in the simulation set-up, a common clock (Tsample) is used for both the transmitter and the receiver. This indicates that system clock synchronization must be established. In addition to clocking the various building blocks, the clock also enters a delay block. This block generates a delay signal equal to the delay of the convolutional interleaver. This delayed signal establishes the time synchronization between the transmitter and the receiver.
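The size of that delay block follows from the interleaver structure. In a Forney convolutional interleaver with I branches whose delays grow in steps of M symbols, every symbol spends (I - 1)·M branch visits in the interleaver and deinterleaver combined, giving an end-to-end delay of I(I - 1)M symbols. The sketch assumes I = 12 and M = 17, the DVB-C values that the DAVIC interleaver is taken to follow here; Chapter 3 gives the parameters actually implemented, and the class name is illustrative.

```python
from collections import deque


class ConvolutionalInterleaver:
    """Forney convolutional (de)interleaver built from `branches` FIFOs.

    Branch j of the interleaver delays symbols by j*unit branch visits; the
    deinterleaver (reverse=True) uses the complementary delays (branches-1-j)*unit.
    """

    def __init__(self, branches, unit, reverse=False, fill=0):
        delays = [j * unit for j in range(branches)]
        if reverse:
            delays.reverse()
        self.fifos = [deque([fill] * d) for d in delays]
        self.branch = 0

    def push(self, symbol):
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)
        if not fifo:                 # zero-delay branch passes straight through
            return symbol
        fifo.append(symbol)
        return fifo.popleft()


def end_to_end_delay(branches, unit):
    """Total interleaver + deinterleaver delay in symbols."""
    return branches * (branches - 1) * unit
```

With I = 12 and M = 17 the combined delay is 12 × 11 × 17 = 2244 symbols, so the receiver-side frame alignment in the simulation has to be offset by that amount.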

In the simulation, Simulink scopes were connected at various points to verify the functions of the transceiver building blocks. For simplicity, only a few scopes are shown in Figure 6.2. For example, scopes 1-5 captured the output waveforms of the transmitter building blocks, scopes 6 and 8 displayed the channel waveforms without noise and with AWGN added, and scopes 9-14 showed the output waveforms of the receiver building blocks. The noise was adjusted and injected into the channel to give an SNR of 20 dB. This SNR is required for a raw BER of 10^-5 at the receiver without FEC [3]. The FEC should correct the errors to achieve a BER of 10^-12 at the output. Figure 6.3 shows the input (Scope 1) and output (Scope 14) waveforms of the system simulation.

Figure 6.3 Input (Scope 1) and output (Scope 14) waveforms of the Simulink simulation

The receiver recovered the input signal after it had been encoded, modulated and sent over the channel. The delay between the input scope and the output scope came from the convolutional interleaving process and the RS decoder latency. This simulation demonstrated a functional system. The next task was the hardware design and implementation of the high-speed transceiver.

6.2 Transceiver FPGA implementation

Using manually written Verilog code, the various blocks of the baseband transceiver were implemented in FPGAs. The blocks are described in Chapter 3 and Chapter 4. Synthesis and simulation tools were used to simulate the blocks and verify their functional and timing behaviour. The simulation test bed was provided by Altera MaxPlus II, and the test bench was incorporated into the Verilog code. The input and output ports of the FPGA prototype board were also used for testing. The hardware simulation results were compared with the theoretical Matlab simulation results of Section 6.1 for all of the building blocks. All of the blocks implemented in the Altera FPGA devices were checked to verify that they worked correctly.

The hardware requirements and operating frequencies of these blocks are summarized in Table 6.1. In this table, the hardware requirement is given by the number of logic cells (LCs) used in the designated Altera device, and the operating frequency is the speed obtained while testing the individual blocks.

Table 6.1 Prototype resources and operating frequency of the transceiver

Building block                             Altera FPGA               Number of LCs   Speed (MHz)
Baseband interface, randomization          EPF10K20RC240-4           40              40
Convolutional interleaver/deinterleaver    EPF10K30RC208-3           1356/1356       20
Differential codec, QAM mapping            EPF10K20RC240-4           35              40
RS encoder                                 EPF10K20RC240-4           196             40
RS decoder                                 EPF10K200A                12745           12
Baseband transceiver                       EPF10K100B and EP20K400   15728           10

Due to the complexity of the RS encoder/decoder, the transceiver was implemented using two FPGA devices. The first device (the Altera EPF10K100B) contained the baseband interface, the convolutional interleaver/deinterleaver and the differential codec, along with the QAM mapping/de-mapping. The second device (the Altera EP20K400) implemented the RS codec, which included the encoder and the decoder. Only a small portion (2%) of the device was required for the encoder, compared with the large amount of hardware dedicated to the decoder.

The number of LCs in the results clearly shows that the main complexity of the transceiver lies in the implementation of the FEC (i.e., the RS codec and the data interleaving that together correct random and burst errors). The other building blocks are quite simple, since they require only basic logic gates and no complex algorithms. Among the blocks, the RS decoder is the most complex and requires a substantial amount of hardware. As mentioned in Section 4.4, the complexity of the decoder core lies in the implementation of the algorithm that finds the error locator polynomial, σ(x). This algorithm required 7875 LCs and occupied 62% of the decoder core hardware.

Table 6.1 also shows the operating frequencies of the various transceiver building blocks. With the low-speed Altera FPGA devices, most of the blocks operated at a data rate of 320 Mb/s (i.e., a clock rate of 40 MHz), except for the FEC (20 MHz for the interleaver and 12 MHz for the RS decoder).
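Since the datapath handles one 8-bit symbol (byte) per clock, which is what the 40 MHz / 320 Mb/s pairing implies, the data rates follow directly from the clock rates. The helper below (names hypothetical) reproduces those figures and the byte clock that the 200 Mb/s target discussed later would require.

```python
BITS_PER_CLOCK = 8   # one byte-wide RS symbol per clock cycle


def data_rate_mbps(clock_mhz, bits_per_clock=BITS_PER_CLOCK):
    """Throughput of a byte-per-clock datapath."""
    return clock_mhz * bits_per_clock


def clock_needed_mhz(target_mbps, bits_per_clock=BITS_PER_CLOCK):
    """Clock rate required to sustain a target bit rate."""
    return target_mbps / bits_per_clock


if __name__ == "__main__":
    for block, clk in [("most blocks", 40), ("interleaver", 20), ("RS decoder", 12)]:
        print(f"{block:>12}: {clk} MHz -> {data_rate_mbps(clk)} Mb/s")
    print("200 Mb/s target needs", clock_needed_mhz(200), "MHz")
```

At 12 MHz the FPGA RS decoder sustains only 96 Mb/s, below the 25 MHz byte clock that 200 Mb/s requires, which is why the decoder sets the system data rate.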

The following is an example of the results of implementing a system block in an FPGA. The RS encoder was implemented in the Altera EPF10K20RC240-4 device on the prototype board. The device was clocked with the GPS clock and connected to the logic analyzer, as shown in Figure 6.1. The waveform captured by the logic analyzer was compared with the theoretical values to verify the functionality of the encoder. The captured waveform was identical to the corresponding simulation waveforms from either Simulink or Altera MaxPlus II simulation. The output waveform of the RS encoder is shown in Figure 6.4.

In this test, an 8-bit counter generated an input data stream of 0x00-0xEE (0-238 decimal). The data stream entered the encoder, and 16 parity symbols were generated. For the data stream 0x00-0xEE, the parity symbols had the values 0x3A, 0xEC, 0x98, OxX, 0x58, 0x1F, 0x14, 0xA8, 0x79, 0x3C, 0x20, 0x0A, 0xBF, 0xA6, 0x04 and 0x65. The parity symbols were appended to the end of the data stream to form a 255-symbol codeword. The cycle repeated when another 239 symbols from the data stream entered the encoder.

Figure 6.4 RS encoder waveform (logic analyzer capture of CODE 0-7; 16 parity symbols between the 1st and 2nd codewords)

The encoder output waveform captured by the logic analyzer in Figure 6.4 is zoomed in to show the 8-bit codewords around the 16 parity symbols. Eight channels of the logic analyzer captured the 8 bits of the codeword (i.e., CODE 0-7) at the output pins of the EPF10K20RC240-4 Altera FPGA device. The waveform also shows that the 10 MHz GPS clock was used: during the 0x00-0xEE data portion, CODE 0 (the least significant bit of the counter) toggles on every clock cycle and therefore indicates the clock. The 16 parity symbols are shown at the center of Figure 6.4. For simplicity, the waveforms of other transceiver building blocks are not shown.
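The encoder test can be mirrored in software with a systematic RS(255,239) encoder built as the usual LFSR division by the generator polynomial. The sketch assumes the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and generator roots α^0 to α^15, as in the DVB RS(204,188) code that is shortened from RS(255,239); if the Chapter 4 encoder uses different conventions, its parity bytes will differ from this sketch's. The check therefore confirms only that every codeword has zero syndromes and does not attempt to reproduce the parity values listed above. Function names are illustrative.

```python
PRIM = 0x11D          # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1
NROOTS = 16           # 16 parity symbols: RS(255, 239)

# antilog/log tables for GF(2^8) with primitive element alpha = 2
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly(nroots=NROOTS):
    """g(x) = (x + alpha^0)(x + alpha^1)...(x + alpha^(nroots-1)), highest degree first."""
    g = [1]
    for i in range(nroots):
        g = [hi ^ gf_mul(lo, EXP[i]) for hi, lo in zip(g + [0], [0] + g)]
    return g


def rs_encode(msg, nroots=NROOTS):
    """Systematic encoding: append the remainder of msg(x) * x^nroots mod g(x)."""
    g = generator_poly(nroots)
    parity = [0] * nroots
    for symbol in msg:
        feedback = symbol ^ parity[0]
        parity = parity[1:] + [0]
        if feedback:
            for j in range(nroots):
                parity[j] ^= gf_mul(feedback, g[j + 1])
    return list(msg) + parity


def syndromes(codeword, nroots=NROOTS):
    """Evaluate the codeword polynomial at alpha^0 .. alpha^(nroots-1) (Horner's rule)."""
    result = []
    for i in range(nroots):
        s = 0
        for c in codeword:
            s = gf_mul(s, EXP[i]) ^ c
        result.append(s)
    return result


if __name__ == "__main__":
    cw = rs_encode(list(range(239)))          # the 0x00-0xEE test pattern
    print(" ".join(f"0x{p:02X}" for p in cw[239:]))
```

Zero syndromes mean the codeword is a multiple of g(x); the decoder relies on exactly this property, since any nonzero syndrome signals an error.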

As shown in Table 6.1, the RS decoder is the block that limits the system data rate. Therefore, higher RS decoder core throughput is essential for the implementation of the high-speed transceiver. The most significant achievement in implementing the high-bit-rate transceiver was the successful design of the RS decoder. Without this high-speed codec, the desired bit rate of 200 Mb/s for the MMDS system could not be achieved.

The speed of the RS decoder could not have been increased without the new architectures for GF(2^8) arithmetic developed during this research, including the multiplication and inversion circuits. Of the two, the low-latency inversion circuit was the more important and was used to increase the operating frequency of the RS decoder core. The new inversion circuit architecture increased the RS decoder throughput by 25% compared with the previously proven inversion circuit.

The improved multiplication circuit was also used throughout the decoder core, mostly to reduce the hardware required to implement the algorithm that finds the error locator and error magnitude polynomials. The multiplication circuit reduces the RS codec complexity (i.e., a smaller, lower-cost transceiver), while the inversion circuit increases the codec symbol rate (i.e., a higher system speed).

A comparison between the previous test circuits and the new architectures for GF(2^8) multiplication and inversion was performed and is reported in [56]. For the previous test circuits, the LSB-first multiplication circuit described in [45] and the inversion architecture described in [52] were used. The new multiplication and inversion architectures for GF(2^8) arithmetic are described in Section 4.4.2 and Section 4.4.4. Verilog code was written to describe the circuits. The code was synthesized in MaxPlus II® to implement the circuits in the EPF10K20RC240-4 Altera FPGA device, and synthesized to 0.5 µm CMOS ASIC for comparison. Simulations were performed to verify the function and the delay of the circuits. The delay was measured as the time between the input and the output. Table 6.2 compares the hardware requirement and circuit latency of the previous test circuits and the new circuits for GF(2^8).

Table 6.2 Comparison of the GF(2^8) arithmetic implementation

GF(2^8) arithmetic   New circuit [56]               Previous test circuit [52]
                     Number of LCs   Delay (ns)     Number of LCs   Delay (ns)
Multiplication       53              22             54              23
Inversion            370             71             381             110

In this table, the number of LCs indicates the hardware required to implement the circuits. The results show that the new inversion circuit outperforms the previous test circuit by 3% in hardware complexity and by 30% in delay. The improved multiplication circuit is also simpler to implement, with less delay.

For the multipliers, even though the difference in hardware requirements between the two architectures is small (see Table 6.2), a substantial amount of hardware was saved in the RS decoder, because a large number of multipliers were used to implement the core. The results clearly indicate that the new architectures reduce the complexity and latency of the GF arithmetic circuitry, which in turn makes the design of the high-speed MMDS system possible.

The inversion circuits for other fields GF(2^m) were also compared, and the results were presented in [56]. Table 6.3 compares the hardware complexity and latency of the inversion circuits for different degrees of the finite field, with the degree m ranging from 3 to 10.

Table 6.3 Comparison of the GF(2^m) inversion implementation

Degree m                                        3     4     5     6     7     8     9    10
Proposed low-latency inversion circuit [56]
    Logic cells                                 3    18    75   110   196   370   501   678
Previous test inversion circuit [52]
    Logic cells                                 3    14    61   118   209   381   493   655
    Delay (ns)                                 12    19    39    61    81   110   120   134

Substantial improvement in both complexity and latency can be obtained using the proposed inversion circuits as the degree m of the finite field increases. The results show an increase of at least 25% in operating frequency using the low-latency inversion circuits for m = 7 to 10. For low degrees of the GF, m = 4 and 5, the proposed circuit requires a small amount of additional hardware in order to achieve lower latency. This is due to the implementation complexity of the multipliers in these fields: the hardware required to implement these multipliers depends on the field generating polynomials.
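The dependence on the field generating polynomial can be illustrated with a short sketch (all function names are hypothetical). It multiplies in GF(2^m) for an arbitrary p(X), encoded as an integer bit mask, and checks whether p(X) is primitive by computing the multiplicative order of X: p(X) is primitive exactly when X has order 2^m - 1. The m = 3 polynomial 1 + X + X^3 is taken from Table 6.4; the m = 8 polynomial 0x11D is an assumption, and the m = 4 polynomial 1 + X + X^2 + X^3 + X^4 is included as a counterexample that is irreducible but not primitive. In this serial shift-and-reduce multiplier, each reduction step XORs in the low-order terms of p(X), so a polynomial with fewer nonzero terms needs fewer XOR gates.

```python
def gf2m_mul(a: int, b: int, poly: int, m: int) -> int:
    """Shift-and-add product of a and b modulo p(X) (bit mask poly, degree m)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly
    return result


def order_of_x(poly: int, m: int):
    """Multiplicative order of X modulo p(X), or None if X never returns to 1."""
    x = 0b10
    value = x
    for k in range(1, 1 << m):
        if value == 1:
            return k
        value = gf2m_mul(value, x, poly, m)
    return None


def is_primitive(poly: int, m: int) -> bool:
    return order_of_x(poly, m) == (1 << m) - 1


def reduction_xors(poly: int) -> int:
    """XOR gates per reduction step in the serial multiplier above: every nonzero
    term of p(X) except X^m (which only clears the overflow bit) and the constant
    term (which lands on the bit vacated by the shift)."""
    return bin(poly).count("1") - 2


if __name__ == "__main__":
    for m, poly in [(3, 0b1011), (8, 0x11D), (4, 0b11111)]:
        print(m, bin(poly), order_of_x(poly, m), is_primitive(poly, m), reduction_xors(poly))
```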

Table 6.4 lists the standard field generating polynomials used in GF(2^m) for this comparison. These finite fields can be found in [11].

Table 6.4 GF(2^m) field generating polynomials p(X)

m     Primitive polynomial p(X)
3     1 + X + X^3

Further investigation into the implementation of the transceiver revealed that an ASIC would run more than 3 times faster than the FPGA prototypes.

1. The RS encoder was synthesized into 0.5 µm CMOS technology by Mr. Neil McLeod at TRLabs Saskatoon (September 1999). This ASIC simulation operated at a data rate of 920 Mb/s (i.e., an operating frequency of 115 MHz). The speed increased 2.8 times compared to the FPGA implementation, even with a relatively old technology (0.5 µm CMOS).

2. All of the HDL code for the GF(2^m) inversions was also synthesized into 0.5 µm CMOS. Table 6.5 shows the speed comparison between the FPGA prototypes and the ASIC simulation. The results in this table clearly show that an ASIC conversion can increase the speed by 2.4 to 5 times over the FPGA prototype. In particular, the inversion in GF(2^8) became 4.7 times faster with the conversion (highlighted in Table 6.5).

Table 6.5 Delay time of the inversion circuit in GF(2^m)

GF degree    ASIC delay    FPGA delay    Speed increment

As shown in Section 4.4.4, the inversion time in GF(2^8) is critical for the operating speed of the RS decoder core in the MMDS transceiver, since the speed of the high-speed RS decoder depends solely on this inversion time. As explained in Chapter 4, the speed of the RS decoder in turn determines the final speed of the complete transceiver.

Results of the partial synthesis of the HDL code indicate that an ASIC conversion should operate at least 3 times faster than the current FPGA clock rate of 10 MHz. The ASIC needs to be only 2.5 times faster than the FPGA prototype for the transceiver to operate at 200 Mb/s (i.e., 25 MHz). Therefore, a data rate of 200 Mb/s can easily be achieved using current technology (0.18 µm CMOS) when the synthesizable HDL code is implemented in an ASIC. The ASIC implementation of the transceiver is outside the scope of this thesis and is suggested for future work.
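The clock-rate figures above follow from a simple conversion, sketched here under the assumption, implied by the text's own conversions (920 Mb/s to 115 MHz, 200 Mb/s to 25 MHz), that the datapath processes one 8-bit byte per clock. The helper names are hypothetical.

```python
BITS_PER_CLOCK = 8  # assumption: one 8-bit symbol processed per clock cycle


def clock_mhz(bit_rate_mbps: float) -> float:
    """Clock frequency (MHz) needed to sustain a bit rate (Mb/s)."""
    return bit_rate_mbps / BITS_PER_CLOCK


def required_speedup(target_mbps: float, current_clock_mhz: float) -> float:
    """Factor by which the current clock must increase to reach target_mbps."""
    return clock_mhz(target_mbps) / current_clock_mhz


if __name__ == "__main__":
    print(clock_mhz(80.0))                 # 10.0 (FPGA prototype at 80 Mb/s)
    print(clock_mhz(200.0))                # 25.0 (target data rate)
    print(required_speedup(200.0, 10.0))   # 2.5
    print(clock_mhz(920.0))                # 115.0 (0.5 um ASIC RS encoder)
```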

6.3 GPS clock testing

The GPS clock unit was set up and connected in the lab in the configuration of Figure 6.1. The GPS antenna was connected to the unit through a 30 m antenna cable. The clock output from the unit was connected to the TTL interface circuit, which converts the 10 MHz sine wave into a square-wave TTL output clock. This clock was used to clock the FPGA device on which the transceiver building blocks were implemented.

The precision and stability of the clock were measured using a Hewlett Packard Model 53132A Universal Counter, a 12-digit counter with 150 ps time-interval resolution that provides a very accurate frequency count. Once locked, the GPS clock frequency was stable at 10 MHz with an accuracy of ±5 × 10^-11 according to the frequency counter (i.e., ±0.05 Hz out of 10 MHz). The frequency remained constant as long as the unit was locked to the GPS satellites.

The antenna was then disconnected from the clock unit to simulate a system failure. The unit lost its lock to GPS, and the output frequency remained at its last value before the failure. However, over a 48-hour period while unlocked, the frequency drifted by 10^-11 (i.e., 0.1 Hz out of 10 MHz); this was the frequency drift of the internal crystal of the GPS clock while it remained unlocked from GPS. The antenna was then reconnected to the unit, and the GPS clock re-locked to the satellites and continued to operate in locked mode. The output frequency returned to its value before the failure occurred. The re-locking time after the failure was only 3 minutes, since the unit needed to lock to only one satellite to enter time-transfer mode (i.e., the location is already known and the unit only needs to obtain the time). There was no change in the clock frequency if the unit lost the satellite signals for less than 2 hours before starting the re-locking process. This test represents the case of a short GPS disruption.

The clock system has operated for seven months since it was installed in the lab. The TTL GPS clock output was used to clock the FPGA prototype board for system development, as shown in Figure 6.1. The stability testing and the clock operation indicate that an accurate, stable and reliable frequency can be obtained from GPS for system clocking. The accuracy of the clock exceeds the Stratum 2 level used at second-level nodes of communication networks, such as base stations.

As a result, the precision of the GPS clock is sufficient for it to serve as the MMDS system clock reference. The clock is used at the transmitter, sent over the link and received at the receivers, which recover it for system synchronization purposes (i.e., clock and frame synchronization). With this loop-timed GPS clock, the system clock is traceable to UTC and the MMDS system can be connected directly to the global communications network.

CHAPTER 7

CONCLUSIONS, CONTRIBUTIONS AND FUTURE WORK

This final chapter presents conclusions drawn from the research and proposes areas in which future work can be conducted to improve the system.

7.1 Conclusions

The demand for faster data delivery services puts pressure on the development of high-speed communication systems, and MMDS is no exception: high-speed systems are needed to deliver data services over the air using microwave signals. MMDS systems have many advantages over wired networks, including fast and low-cost installation, higher signal reliability and higher channel capacity. In addition, an MMDS system is capable of handling the accelerating demand for high-speed data services despite bandwidth limitations. To be competitive with existing video and data services over the air, a high-speed MMDS system must be low cost and reliable. To develop such a system, two objectives were set for the research:

1. To investigate the implementation of a high-bit-rate (200 Mb/s) transceiver in a compact, high-speed ASIC.

2. To find a suitable system synchronization scheme for low-cost MMDS systems.

System simulation is the first step to ensure that the designed system is functional and can be implemented in hardware.

The results from system simulation using Simulink and Matlab have demonstrated a functional system. The simulated building blocks were built using basic gates, which are analogous to a hardware implementation. This simulation method ensures that the mathematical functions and algorithms used in the system can be realized in hardware.

Based on the DAVIC Version 1.2 specifications, various building blocks of the high-speed MMDS transceiver were implemented using FPGA prototypes. The design uses manually written, synthesizable Verilog HDL to describe all of the transceiver building blocks, so the code can be used to implement the transceiver in either an FPGA or an ASIC.

It has been found that the data integrity protection of the system, namely the forward error correction (FEC) scheme of the transceiver, is very expensive to implement. This scheme includes the Reed-Solomon codec and the byte interleaving, which together correct both random and burst errors caused by the channel. Besides requiring extensive hardware, the FEC also limits the system speed. In general, the most difficult task in implementing the high-speed MMDS system is ensuring a system BER of 10^-12. Much of the effort and resources were therefore allocated to the FEC of the transceiver, both to ensure data integrity and to increase the system data rate. Extensive hardware resources were used to implement the FEC algorithm, especially the RS decoder. The main speed limitation came from the implementation of the GF(2^8) arithmetic used in the RS decoder, namely the multiplication and division of elements in this field.

Once these problems were identified, the research concentrated on the development of a low-cost, efficient, high-speed FEC, in particular the high-speed (255,239) RS codec. This included a theoretical investigation of RS error correction and an efficient implementation of the chosen architecture. The operating speed of the RS decoder core was critical because the decoder throughput limited the overall transceiver data rate. The other system building blocks had speeds far beyond the system requirement, even though they were implemented using relatively slow FPGA devices.
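As a rough software reference for the (255,239) code, a systematic RS encoder can be sketched as below. It assumes the field polynomial 0x11D, primitive element α = 2, and a generator polynomial with consecutive roots α^0 to α^15, a convention used in DVB-style specifications; the thesis's Verilog encoder may use a different convention, and all names are hypothetical. Polynomials are stored lowest-degree coefficient first, so the 16 parity symbols occupy positions 0 to 15 and the 239 message symbols positions 16 to 254.

```python
# Assumptions: GF(2^8) with p(X) = 0x11D, alpha = 2, generator roots alpha^0..alpha^15.
PRIM, N, K = 0x11D, 255, 239
NSYM = N - K  # 16 parity symbols, t = 8 correctable symbol errors

EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    """Product of two polynomials stored lowest-degree coefficient first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] ^= gf_mul(pi, qj)
    return out


def generator_poly():
    g = [1]
    for i in range(NSYM):
        g = poly_mul(g, [EXP[i], 1])   # multiply by (X + alpha^i)
    return g


def poly_eval(p, x):
    """Horner evaluation of p (lowest-degree coefficient first) at x."""
    acc = 0
    for coef in reversed(p):
        acc = gf_mul(acc, x) ^ coef
    return acc


def rs_encode(msg, g):
    """Systematic encoding: c(X) = m(X)*X^16 + (m(X)*X^16 mod g(X))."""
    if len(msg) != K:
        raise ValueError("message must have 239 symbols")
    rem = [0] * NSYM + list(msg)                 # m(X) * X^(n-k)
    dg = len(g) - 1
    for i in range(len(rem) - 1, dg - 1, -1):    # long division by monic g(X)
        coef = rem[i]
        if coef:
            for j in range(dg + 1):
                rem[i - dg + j] ^= gf_mul(coef, g[j])
    return rem[:NSYM] + list(msg)                # parity symbols, then the message


if __name__ == "__main__":
    g = generator_poly()
    msg = [(7 * i + 3) % 256 for i in range(K)]
    codeword = rs_encode(msg, g)
    # Every codeword is divisible by g(X), so it vanishes at each root alpha^i.
    print(all(poly_eval(codeword, EXP[i]) == 0 for i in range(NSYM)))
```

The final check is exactly what the decoder's syndrome computation tests: a received word with no errors evaluates to zero at every generator root.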

The speed improvement of the RS decoder is obtained from the new VLSI implementations of the GF(2^8) arithmetic. The most complex calculation in this field is division, which consists of an inversion of one element followed by a multiplication of two elements. Of the two operations, the inversion speed is more important, since the decoder throughput relies on the inversion time. A low-latency inversion circuit for GF(2^8) was developed during the research, which increased the speed of the transceiver. This low-latency inversion circuit improved on the previously tested circuit by 3% in hardware and 28% in latency. In addition to the speed improvement, a new decoding algorithm for the RS decoder (i.e., the division-free Berlekamp-Massey (B-M) algorithm) was used to reduce the hardware complexity of the core. Using the hand-written Verilog code, the designed transceiver was implemented in Altera FPGA devices. The prototypes have achieved a system bit rate of 80 Mb/s.
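The division-free B-M idea can be illustrated in software. The sketch below uses one common inversionless formulation: each update scales the current locator by the last nonzero discrepancy instead of dividing by it, so no field inversion is needed, and a Chien search then finds the error positions. It assumes the same field (0x11D) and syndrome convention (roots α^0 to α^15) as a typical (255,239) code, works on an error pattern added to the all-zero codeword, and stops at error location (the Forney step for the error values is omitted). It is not the thesis's decoder, and all names are hypothetical.

```python
# Field GF(2^8) with p(X) = 0x11D and primitive element alpha = 2 (assumed).
PRIM, N, NSYM = 0x11D, 255, 16

EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def poly_eval(p, x):
    """Horner evaluation; p is stored lowest-degree coefficient first."""
    acc = 0
    for coef in reversed(p):
        acc = gf_mul(acc, x) ^ coef
    return acc


def syndromes(received):
    """S_i = r(alpha^i) for i = 0 .. NSYM-1."""
    return [poly_eval(received, EXP[i]) for i in range(NSYM)]


def inversionless_bm(s):
    """Division-free Berlekamp-Massey: returns a scaled error-locator polynomial."""
    lam, b = [1], [1]          # current locator and correction polynomial
    k, gamma = 0, 1            # length-change indicator, last nonzero discrepancy
    for r in range(len(s)):
        delta = 0              # discrepancy: sum_j lam_j * s_(r-j)
        for j, lj in enumerate(lam):
            if r - j >= 0:
                delta ^= gf_mul(lj, s[r - j])
        xb = [0] + b           # X * b(X)
        new_lam = []
        for i in range(max(len(lam), len(xb))):  # gamma*lam - delta*X*b (minus = XOR)
            t1 = gf_mul(gamma, lam[i]) if i < len(lam) else 0
            t2 = gf_mul(delta, xb[i]) if i < len(xb) else 0
            new_lam.append(t1 ^ t2)
        if delta != 0 and k >= 0:
            b, k, gamma = lam, -k - 1, delta
        else:
            b, k = xb, k + 1
        lam = new_lam
    while len(lam) > 1 and lam[-1] == 0:   # drop zero high-order coefficients
        lam.pop()
    return lam


def chien_search(lam):
    """Error positions i with lam(alpha^-i) == 0."""
    return [i for i in range(N) if poly_eval(lam, EXP[(255 - i) % 255]) == 0]


if __name__ == "__main__":
    received = [0] * N                       # all-zero codeword ...
    for pos, val in {3: 0x5A, 100: 0x01, 250: 0xFF}.items():
        received[pos] ^= val                 # ... plus three symbol errors
    print(chien_search(inversionless_bm(syndromes(received))))
```

Scaling the locator by a nonzero constant does not move its roots, which is why the Chien search works directly on the unnormalized result.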

The research continued with the investigation of using a GPS clock for system synchronization. Timing and synchronization are critical in the design of any digital communications system: the synchronization process ensures the integrity of transmitted data by clocking all the elements in the network at the same rate. The MMDS system requires robust synchronization in order to transmit a high-speed data stream and to ensure its data integrity. In the absence of SONET and SDH, a precise frequency derived from the Global Positioning System is to be used as the MMDS system reference clock instead of crystals. The GPS clock frequency and time are highly accurate and are directly and continuously traceable to UTC. The GPS clock is a cost-effective way to equip the system with a precise reference frequency for high-speed data transmission, since it has many advantages over conventional crystal oscillators, including precision, stability, reliability, availability and low set-up cost.

A GPS clock was set up and tested during the research. The clock locked onto the GPS satellites and generated a local clock. Test results demonstrated a high-precision (10^-11), stable clock generated from the GPS frequency and time. This precision is adequate for the high-speed MMDS system clock requirement. An interface circuit was built to convert the GPS clock to a TTL output level. This TTL output clock was used to clock the FPGA prototype board, and the board was used to design and develop various blocks of the transceiver.

To synchronize the high-speed MMDS system, a GPS clock is to be installed at the transmitter, and the clock is received and recovered at the surrounding receivers using an early-late gate synchronization technique. This synchronization scheme ensures that the transmitter and receivers are synchronized in both frequency and time. In addition to its simplicity, using a single GPS clock at the base station makes more sense from a cost point of view. Another important feature of the GPS clock is its traceability to UTC, which is used as the standard time for global communication networks. This property allows the MMDS system to synchronize properly to other hierarchical communications nodes.
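The early-late gate principle can be sketched in an idealized form. Assuming a noise-free, rectangular-pulse signal, its correlation with the local reference is a triangle that peaks at the correct timing. The loop takes one sample slightly early and one slightly late, and moves its timing estimate toward the larger of the two until they balance. The sketch uses hypothetical names and illustrative values for the gate spacing and loop gain; a real receiver would work on noisy samples and filter the error signal.

```python
def correlation(tau: float, symbol_period: float = 1.0) -> float:
    """Triangular autocorrelation of a rectangular pulse; peak at tau = 0."""
    return max(0.0, 1.0 - abs(tau) / symbol_period)


def early_late_track(true_offset: float, estimate: float = 0.0,
                     gate: float = 0.25, gain: float = 0.25,
                     iterations: int = 50) -> float:
    """Idealized early-late gate loop returning the final timing estimate."""
    for _ in range(iterations):
        early = correlation(estimate - gate - true_offset)
        late = correlation(estimate + gate - true_offset)
        error = late - early     # positive when the estimate is too early
        estimate += gain * error
    return estimate


if __name__ == "__main__":
    print(round(early_late_track(0.3), 6))
```

Inside the gate, the error signal is proportional to the timing offset, so with this gain the residual offset halves on every iteration.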

In addition to clock synchronization, the MMDS system also uses frame synchronization. This includes the insertion of synchronization and inverted synchronization bytes into the MPEG-2 TS data packets. A highly reliable marker is also used to increase data integrity at the receiver end. In conclusion, the combination of GPS clock and frame synchronization meets the MMDS synchronization requirements robustly and reliably while keeping the system simple and low cost.

7.2 Contributions

The results show that the Matlab and Simulink source code simulates the entire MMDS system successfully. The code is available in a TRLabs report [5] and can be used to simulate other wired or wireless communication systems. The code can easily be changed to simulate a particular system (i.e., a different decoding scheme, modulation technique, QAM order, filter, RF frequency, channel characteristics, etc.).

The synthesizable HDL code provided in [5] will benefit both industry and other researchers in the further study or implementation of any baseband wireless transceiver. A system data rate of 200 Mb/s with a BER of 10^-12 over a wireless channel can be achieved using this transceiver, provided that the HDL code is implemented in an ASIC. The use of this transceiver is not limited to MPEG-2 data; the designed transceiver can transmit and receive any data stream.

The novel clocking scheme of the convolutional interleaver and deinterleaver greatly reduces the complexity of the block. Data transfer in the block is clocked using its own counter to reduce the number of storage devices (i.e., flip-flops). A hardware reduction of over 80% has been obtained.
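One way to picture a counter-driven interleaver in software is to give each branch of a Forney-type convolutional interleaver a circular buffer and its own read/write pointer (counter), instead of shifting data through a chain of registers. The sketch below is a hypothetical model of that idea, not the thesis's exact clocking scheme. The branch count I and unit delay M are parameters; DVB-style systems use I = 12 and M = 17, which is an assumption here. Cascading the interleaver and deinterleaver returns the input delayed by I(I - 1)M symbols.

```python
class ConvolutionalInterleaver:
    """Forney convolutional interleaver/deinterleaver built from circular buffers.

    Branch j delays its symbols by j*M branch visits (interleaver) or by
    (I-1-j)*M visits (deinterleaver). Each branch keeps its own pointer, so
    only one buffer location is read and written per symbol.
    """

    def __init__(self, branches: int, unit_delay: int, deinterleave: bool = False):
        self.branches = branches
        self.depths = [((branches - 1 - j) if deinterleave else j) * unit_delay
                       for j in range(branches)]
        self.buffers = [[0] * depth for depth in self.depths]
        self.pointers = [0] * branches
        self.branch = 0  # commutator position

    def push(self, symbol: int) -> int:
        j = self.branch
        self.branch = (j + 1) % self.branches
        depth = self.depths[j]
        if depth == 0:                       # zero-delay branch passes straight through
            return symbol
        buf, p = self.buffers[j], self.pointers[j]
        out = buf[p]                         # oldest symbol in this branch
        buf[p] = symbol                      # overwrite it with the newest one
        self.pointers[j] = (p + 1) % depth   # advance this branch's counter
        return out


if __name__ == "__main__":
    I, M = 3, 2
    tx = ConvolutionalInterleaver(I, M)
    rx = ConvolutionalInterleaver(I, M, deinterleave=True)
    data = list(range(1, 41))
    out = [rx.push(tx.push(s)) for s in data]
    delay = I * (I - 1) * M
    print(delay, out[delay:delay + 5])
```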

The designed (255,239) RS decoder core runs at least 25% faster and is 35% smaller than other implementations. The reduction in hardware and the gain in speed are obtained by using a division-free B-M algorithm and efficient GF arithmetic circuits. The GF(2^m) arithmetic VLSI circuits can also be used in other areas that use abstract algebra, such as cryptography. The new algorithms reduce hardware complexity while increasing the operating speed of these circuits. The most important improvement is the parallel architecture of the inversion circuit developed in this thesis. All the HDL code for the tested GF(2^m) arithmetic circuits is included in [5].

The use of a precision GPS clock for synchronization is a novel application. The two important characteristics of the GPS clock are its precision (10^-11 in frequency) and its time reference (i.e., universal standard time). These make the synchronization of the MMDS system with any global communication network simple and robust. Using a single GPS clock and replicating the clock at all of the receivers provides a loop-timed clock in a very cost-effective manner compared with the use of expensive crystal oscillators.

7.3 Future work

There are several areas in which work and research can be done to improve the system performance. The immediate work required is the fabrication of the transceiver in an ASIC using currently available technology. This is straightforward: since the Verilog code is synthesizable, no code modification is required. The desired speed of 200 Mb/s should be achieved when the final ASIC is built. In the FPGA implementation, the transceiver was implemented using two devices, one dedicated to the RS codec and the other to the remaining transceiver building blocks. When the ASIC is made, however, the whole transceiver should be placed in one ASIC if possible, given the advantages of a one-chip solution. The system-on-a-chip solution not only reduces the cost of the ASIC, it also eliminates the interface otherwise required for data transfer between the chips.

The GPS clock synchronization work will be complete with the verification of data transfer. A complete system set-up with the RF/IF circuits is required to perform this test: models of the transmitter and the receiver must be built in order to send and receive test data.

Another direction for future work is the investigation of the use of TCM in the system. TCM can be used to increase system coverage given its 3 dB coding gain, as shown in Section 2.2.6. The TCM encoder can be represented as a finite-state machine, since the codes follow a trellis structure, so it is quite simple to implement the encoder in VLSI using HDL. In contrast, designing the TCM decoder requires extensive work, as in the case of the RS decoder. The decoder is very complex because it requires a special decoding algorithm. Many decoding algorithms are available for the convolutional code used in TCM [11]. Among them, the Viterbi algorithm, a maximum-likelihood method, has the best performance in terms of both hardware implementation and speed. Many studies and implementations of the Viterbi algorithm used in TCM for various constraint lengths have been published [18-28].
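The core of the Viterbi algorithm is an add-compare-select search over the code trellis that keeps, for each state, only the path with the smallest accumulated metric. The sketch below shows this for a plain rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal) and hard-decision Hamming metrics. It is a simplified illustration, not the V.32 or DAVIC trellis code, and it stores whole survivor paths instead of using a traceback memory as a hardware decoder would. Function names are hypothetical.

```python
K = 3                          # constraint length
GENERATORS = (0b111, 0b101)    # (7, 5) octal
N_STATES = 1 << (K - 1)


def _outputs(reg: int):
    """Encoder output bits for a shift-register content (newest bit is the MSB)."""
    return [bin(reg & g).count("1") & 1 for g in GENERATORS]


def conv_encode(bits):
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state
        out.extend(_outputs(reg))
        state = reg >> 1
    return out


def viterbi_decode(symbols):
    """Hard-decision Viterbi decoding that keeps one survivor path per state."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)   # encoder starts in state 0
    paths = [[] for _ in range(N_STATES)]
    for t in range(0, len(symbols), 2):
        received = symbols[t:t + 2]
        new_metrics = [inf] * N_STATES
        new_paths = [[] for _ in range(N_STATES)]
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                branch = sum(e != r for e, r in zip(_outputs(reg), received))
                nxt = reg >> 1
                candidate = metrics[state] + branch
                if candidate < new_metrics[nxt]:     # add-compare-select
                    new_metrics[nxt] = candidate
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]


if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0]   # last K-1 zeros flush the encoder
    coded = conv_encode(data)
    coded[5] ^= 1                                      # one channel bit error
    print(viterbi_decode(coded) == data)
```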

The challenge in the design of this TCM decoder lies in its long tracking length, because the complexity of the TCM decoder is proportional to this length. More details on the implementation of the Viterbi decoder can be found in [25, 27]. Integrating this TCM decoder into the system would increase the transceiver complexity; as a result, the system operating speed and the cost of the ASIC have to be considered.

A further topic of study is the MMDS transmitter. Since the designed system uses 256-QAM for data transmission, the transmitter requires a higher SNR and more stringent linearity specifications. The geometry of the signal constellation shows that the distance between an ideal symbol and a decision boundary decreases as the number of bits per symbol increases (i.e., from 64-QAM to 256-QAM, as shown in Figure A.3 and Figure A.4). As a result, Gaussian noise, local oscillator phase noise, and mixer and amplifier non-linearity in the transmitter are more likely to create symbol errors in 256-QAM. Among the components of the transmitter, the IF signal processing circuitry is critical to transmitter performance. In particular, the frequency response, delay correction and linearity correction of these circuits should be fully understood. The up-conversion from IF to RF also requires good phase-noise performance from the local oscillator: for 256-QAM, a phase shift of 3.7 degrees will destroy the signal. One suggestion for better phase-noise performance is to multiply up a voltage-controlled crystal oscillator that is phase-locked to the GPS reference frequency available at the transmitter site. The non-linearity of the amplifier can be improved using a feedforward linearization technique. The principle of this technique is to sample the input and output of the high-power amplifier and subtract the two signals; the resulting distortion signal is amplified and injected out of phase with the final high-power output. In this manner, the distortion of the high-power amplifier is canceled.
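The 3.7-degree figure can be checked from the constellation geometry. Assume a square 256-QAM constellation with levels ±1, ±3, ..., ±15 on each axis and a noise-free receiver. A pure rotation then moves the outermost corner point (15, 15), which has the largest radius, onto the decision boundary at 14 once the angle reaches acos(14 / (15·√2)) - 45°. The short sketch below (hypothetical function name) evaluates this for 256-QAM and 64-QAM. It gives about 3.7 degrees and about 7.7 degrees respectively, which shows how quickly the phase margin shrinks as the number of bits per symbol increases.

```python
import math


def corner_phase_margin_deg(levels_per_axis: int) -> float:
    """Rotation, in degrees, that moves the corner point of a square QAM
    constellation (levels +/-1, +/-3, ...) onto its nearest decision boundary."""
    corner = levels_per_axis - 1        # e.g. 15 for 256-QAM (16 levels per axis)
    boundary = corner - 1               # threshold between the two outer levels
    radius = math.hypot(corner, corner)
    return math.degrees(math.acos(boundary / radius)) - 45.0


if __name__ == "__main__":
    for name, levels in (("64-QAM", 8), ("256-QAM", 16)):
        print(name, round(corner_phase_margin_deg(levels), 2))
```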

The channel frequency response of the system is also a topic for future research. Any response other than the ideal unity response will distort the raised-cosine response and create intersymbol interference. In addition, the effects of multipath delay and fading are interesting subjects to investigate. The multipath effect can cause the waveform to no longer cross zero at every symbol time. The study of an adaptive equalizer to eliminate intersymbol interference due to multipath propagation in this radio channel is essential.
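The zero-crossing argument can be illustrated numerically. The sketch below samples a raised-cosine pulse at the symbol instants, with and without a single delayed echo. The symbol period T = 1 and roll-off of 0.35 are assumptions, and so are the two-ray echo model and its gain and delay. Without the echo, the samples at n ≠ 0 are zero apart from rounding error. With an echo of gain 0.3 delayed by half a symbol, they are not, and this residual is the intersymbol interference an adaptive equalizer would have to remove.

```python
import math


def sinc(x: float) -> float:
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def raised_cosine(t: float, beta: float = 0.35) -> float:
    """Raised-cosine pulse with symbol period T = 1 and roll-off factor beta."""
    denom = 1.0 - (2.0 * beta * t) ** 2
    if abs(denom) < 1e-12:          # removable singularity at |t| = 1/(2*beta)
        return (math.pi / 4.0) * sinc(1.0 / (2.0 * beta))
    return sinc(t) * math.cos(math.pi * beta * t) / denom


def symbol_samples(echo_gain: float = 0.0, echo_delay: float = 0.0, span: int = 5):
    """Received pulse at t = n*T for a direct path plus one delayed echo."""
    return {n: raised_cosine(n) + echo_gain * raised_cosine(n - echo_delay)
            for n in range(-span, span + 1)}


def isi_sum(samples) -> float:
    """Sum of |sample| at every symbol instant other than n = 0."""
    return sum(abs(v) for n, v in samples.items() if n != 0)


if __name__ == "__main__":
    print(isi_sum(symbol_samples()))                                # rounding error only
    print(isi_sum(symbol_samples(echo_gain=0.3, echo_delay=0.5)))   # clearly nonzero
```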

REFERENCES

[1] Lawrence Behr Associates, "Wireless Evolution, Definition and Current Practice," Technical Note 115, Lawrence Behr Associates, Inc., Greenville, North Carolina 27835, USA, 1999.

[2] CAI Wireless Systems Inc., "MMDS Wireless Cable Backgrounder," CAI Technology Updates, CAI Wireless Systems Inc., June 1998.

[3] David Urban, "MMDS Transmitter for High Data Rate Digital Video Delivery," ADC Technical Papers, ADC Telecommunications, Microwave Systems Division, PA, USA, July 1997.

[4] Digital Audio-Visual Council (DAVIC), "The DAVIC 1.2 Specifications," DAVIC, Geneva, Switzerland, 1997.

[5] A. Dinh and R. J. Bolton, "TRLabs Research Report," Regina TRLabs, 108-2 Research Drive, Regina, Saskatchewan, Canada, July 2000.

[6] The International Organization for Standardization, "Coding of audio, picture, multimedia, and hypermedia information prior to framing in the multirate structure," [ISO/IEC 13818-1] ISO/IEC Document 13818-1.

[7] I. S. Reed and G. Solomon, "Polynomial Codes over Certain Finite Fields," J. Soc. Ind. Appl. Math., Vol. 8, pp. 300-304, June 1960.

[8] Robert J. McEliece, The Theory of Information and Coding, A Mathematical Framework for Communication, Addison-Wesley Publishing Company, 1997.

[9] Advanced Hardware Architectures, Inc., "Primer: Reed-Solomon Error Correction Codes (ECC)," AHA Application Note, Doc. # ANMOI-0395, AHA Inc., Pullman, WA, USA, 1996.

[10] Co-Optic Inc., "COic5130A Specifications, Programmable Reed-Solomon Error Correction Encoder and Decoder," Co-Optic Inc., Palo Alto, CA, USA, 1998.

[11] Shu Lin and Daniel J. Costello, Jr., Error Control Coding, Fundamentals and Applications, Prentice Hall, New Jersey, 1983.

[12] Bernard Sklar, Digital Communications, Fundamentals and Applications, Prentice Hall, Englewood Cliffs, New Jersey, 1988.

[13] J. M. Hsu and C. L. Wang, "An Area-Efficient Pipelined VLSI Architecture for Decoding of Reed-Solomon Codes Based on Time-Domain Algorithm," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 6, pp. 864-871, December 1997.

[14] C. C. Hsu, I. S. Reed and T. K. Truong, "Use of the RS Decoder as an RS Encoder for Two-Way Digital Communications and Storage Systems," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 4, No. 1, pp. 91-92, February 1994.

[15] Altera Corporation, "Interleaver/Deinterleaver MegaCore Function," Solution Brief 42, Altera Corporation, 101 Innovation Drive, San Jose, CA 95134, USA, June 1999.

[16] G. Ungerboeck, "The state of the art in Trellis Coded Modulation," Coded Modulation and Bandwidth-Efficient Transmission, Edited by E. Biglieri, M. Luise, Elsevier Science Publishers B.V., N.Y., USA, pp. 3-14, 1992.

[17] William Webb and Lajos Hanzo, Modern Quadrature Amplitude Modulation, Principles and Applications for Fixed and Wireless Communications, Pentech Press Publishers, London, England, 1994.

[18] S. Benedetto, C. Guerra, M. Mondin, A. Pincetti and F. Pasello, "Receiver Design for 8-PSK Trellis Coded Modulation in a TDMA Burst Mode Satellite Link," Coded Modulation and Bandwidth-Efficient Transmission, Edited by E. Biglieri, M. Luise, Elsevier Science Publishers B.V., N.Y., USA, pp. 103-116, 1992.

[19] Simon Haykin, Communication Systems, 3rd Edition, John Wiley & Sons, New York, 1994.

[20] C. Y. Chang and K. Yao, "Systolic Array Processing of the Viterbi Algorithm," IEEE Trans. on Information Theory, Vol. 35, No. 1, pp. 76-86, January 1989.

[21] B. A. Harvey, "Adaptive Viterbi Decoding for ARQ and Reduced Complexity Decoding," Proceedings of The 10th International Conference on Wireless Communications, Calgary, Canada, Vol. 1, pp. 239-250, July 6-8, 1998.

[22] Y. Savaria, F. El-Hassan, H. Khali and M. Sawan, "An Effective Hardware Software Implementation of a Viterbi Decoder Using an FPGA-based Reconfigurable Computing Platform," The 5th Canadian Workshop on Field-Programmable Devices (FPD'98): Technology, Tools and Applications, École Polytechnique de Montréal, Montréal, Québec, Canada, June 7-10, 1998.

[23] Olaf J. Joeressen and Heinrich Meyr, "A 40 Mb/s Soft-Output Viterbi Decoder," IEEE Journal of Solid-State Circuits, Vol. 30, No. 7, pp. 812-818, July 1995.

[24] A. M. Michelson and A. H. Levesque, Error Control Techniques for Digital Communications, John Wiley & Sons, New York, 1985.

[25] A. Dinh, R. Mason and J. Toth, "High-speed V.32 Trellis Encoder/Decoder Implementation using FPGA," IEEE International Symposium on Circuits and Systems (ISCAS '99) Proceedings, Orlando, Florida, pp. IV-295 to IV-298, May 30-June 2, 1999.

[26] Mansoor A. Christie, "Viterbi Implementation on the TMS320C5x for V.32 Modems," Digital Signal Processing Applications, Semiconductor Group, Document # SPRA099, Texas Instruments Incorporated, Texas, 1996.

[27] Lihong Jia, Yonghong Gao and Jouni Isoaho, "Design of a Super-Pipelined Viterbi Decoder," ISCAS '99 Proceedings, Orlando, Florida, pp. I-132 to I-136, May 30-June 2, 1999.

[28] Chi-Young Tsui, Roger S. K. Cheng and Curtis Ling, "Low Power ACS Unit Design for the Viterbi Decoder," ISCAS '99 Proceedings, Orlando, Florida, pp. I-137 to I-141, May 30-June 2, 1999.

[29] Wayne Tomasi, Advanced Electronic Communications Systems, 3rd Edition, Prentice Hall, Englewood Cliffs, N.J., USA, 1994.

[30] Martin S. Roden, Analog and Digital Communications Systems, Prentice Hall, Englewood Cliffs, N.J., USA, 1991.

[31] Kamilo Feher, Advanced Digital Communications, Systems and Signal Processing Techniques, Prentice Hall Inc., Englewood Cliffs, N.J., USA, 1987.

[32] A. Dinh, R. J. Bolton, R. Mason, and R. Palmer, "Multi-channel Multi-point Distribution Services System Transceiver Implementation," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Proceedings, Victoria, B.C., Canada, pp. 242-245, August 22-24, 1999.

[33] Shunghoon Kwon and Hyunchul Shin, "An Area Efficient VLSI Architecture of a Reed-Solomon Decoder/Encoder for Digital VCRs," IEEE Trans. on Consumer Electronics, Vol. 43, No. 4, pp. 1019-1027, November 1997.

[34] Dariush Dabiri and Ian F. Blake, "Fast Parallel Algorithms for Decoding Reed-Solomon Codes Based on Remainder Polynomials," IEEE Trans. on Information Theory, Vol. 41, No. 4, pp. 873-885, July 1995.

[35] Tetsuo Iwaki, Toshihisa Tamaka, Eiji Yamada, Tohru Okuda and Taizoth Sasada, "Architecture of a High Speed Reed-Solomon Decoder," IEEE Trans. on Consumer Electronics, Vol. 40, No. 1, pp. 75-81, February 1994.

[36] H. M. Shao, T. K. Truong, L. J. Deusch, J. H. Yuen and I. S. Reed, "A VLSI Design of a Pipeline Reed-Solomon Decoder," IEEE Trans. on Computers, Vol. C-34, No. 5, pp. 393-403, May 1985.

[37] Keeichi Iwamura, Yasumori Dohi and Hideki Imai, "A Design of Reed-Solomon Decoder with Systolic-Array Structure," IEEE Trans. on Computers, Vol. 44, No. 1, pp. 118-122, January 1995.

[38] Y. R. Shayan and Tho Le-Ngoc, "A Cellular Structure for a Versatile Reed-Solomon Decoder," IEEE Transactions on Computers, Vol. 46, No. 1, pp. 80-85, January 1997.

[39] S. R. Whitaker and J. A. Canaris, "Reed-Solomon VLSI Codec for Advanced Television," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 1, No. 2, pp. 230-236, June 1991.

[40] J. C. Huang, et al., "An Area-Efficient Versatile Reed-Solomon Decoder for ADSL," ISCAS '99 Proceedings, Orlando, Florida, USA, pp. I-517 to I-520, May 30-June 2, 1999.

[41] Hynman Chang and Myung H. Sunwoo, "A Low Complexity Reed-Solomon Architecture Using the Euclid's Algorithm," ISCAS '99 Proceedings, Orlando, Florida, USA, pp. I-513 to I-516, May 30-June 2, 1999.

[42] A. Dinh and R. J. Bolton, "Design of a High Speed (255,239) Reed-Solomon Codec," IEEE Canada Wescanex '99 Proceedings, Calgary, Alberta, Canada, October 29-30, 1999.

[43] Yousef R. Shayan and Tho Le-Ngoc, "Modified Time-Domain Algorithm for Decoding Reed-Solomon Codes," IEEE Trans. on Communications, Vol. 41, No. 7, pp. 1036-1038, July 1993.

[44] Leilei Song and Keshab K. Parhi, "Low-Energy Software Reed-Solomon Codecs Using Specialized Finite Field Datapath and Division-Free Berlekamp-Massey Algorithm," ISCAS '99 Proceedings, Orlando, Florida, USA, pp. I-84 to I-89, May 30-June 2, 1999.

[45] S. K. Jain, L. Song and K. K. Parhi, "Efficient Semi-Systolic Architecture for Finite Field Arithmetic," IEEE Trans. on VLSI Systems, Vol. 6, No. 1, pp. 101-113, March 1998.

[46] C. Paar, P. Fleischmann and P. Roelse, "Efficient Multiplier Architecture for Galois Fields GF(2^4n)," IEEE Trans. on Computers, Vol. 47, No. 2, pp. 162-169, February 1998.

[47] L. Song and K. K. Parhi, "Low-Complexity Modified Mastrovito Multipliers Over Finite Field GF(2^M)," ISCAS '99 Proceedings, Orlando, Florida, USA, pp. I-508 to I-512, May 30-June 2, 1999.

[48] C. C. Wang, T. K. Truong, et al., "VLSI Architectures for Computing Multiplications and Inverses in GF(2^m)," IEEE Transactions on Computers, Vol. C-34, No. 8, pp. 709-717, October 1998.

[49] Sebastien T. J. Fenn, Mohammed Benaissa and David Taylor, "GF(2^m) Multiplication and Division Over the Dual Basis," IEEE Trans. on Computers, Vol. 45, No. 3, pp. 319-327, March 1996.

[50] J. H. Guo and C. L. Wang, "A Low Time-Complexity, Hardware-Efficient Bit-Parallel Power-Sum Circuit for Finite Field GF(2^m)," ISCAS '99 Proceedings, Orlando, Florida, USA, pp. I-521 to I-524, May 30-June 2, 1999.

[51] M. A. Hasan, "Double-Basis Multiplicative Inversion Over GF(2^m)," IEEE Trans. on Computers, Vol. 47, No. 9, pp. 960-970, September 1998.

[52] Shyue-Win Wei, "VLSI Architectures for Computing Exponentiations, Multiplicative Inverses, and Divisions in GF(2^m)," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 44, No. 10, pp. 847-855, October 1997.

[53] J. H. Guo and C. L. Wang, "Systolic Array Implementation of Euclid's Algorithm for Inversion and Division in GF(2^m)," IEEE Transactions on Computers, Vol. 47, No. 10, pp. 1161-1167, October 1998.

[54] C. C. Wang, S. K. Truong, et al., "VLSI Architectures for Computing Multiplications and Inverses in GF(2^m)," IEEE Trans. on Computers, Vol. C-34, No. 8, pp. 709-717, October 1998.

[55] Yong-Jin Jeong and Wayne Burleson, "VLSI Array Synthesis for Polynomial GCD Computation and Application to Finite Field Division," IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications, Vol. 41, No. 12, pp. 891-897, December 1994.

[56] A. V. Dinh and R. J. Bolton, "A Low Latency Architecture for Computing Multiplicative Inverses and Divisions in GF(2^m)," Canadian Conference on Electrical and Computer Engineering (CCECE 2000) Proceedings, Halifax, Nova Scotia, Canada, pp. 43-47, May 7-10, 2000.

[57] John C. Bellamy, "Digital Network Synchronization," IEEE Communications Magazine, Vol. 33, No. 4, pp. 70-83, April 1995.

[58] J. J. Stiffler, Theory of Synchronous Communications, Prentice Hall, USA, 1971.

[59] William C. Lindsey, Synchronization Systems in Communication and Control, Prentice Hall, USA, 1972.

[60] Jack Smith, Modern Communication Circuits, McGraw-Hill International Editions, 1996.

1611 J. Das, S. K. Mullick and P. K. Chaîîerjee, Princ@les of Digital Communications, John Wiley & Sons, 1986.

[62] G. Smith and J. Kates, "GPS precise time for VME bus," VME Bus Systems, pp. 27-48, April/May 1996.

[63] A. V. Dinh, R. J. Bolton, R. J. Palmer and R. Mason, "Multichannel Multipoint Distribution Services System Synchronization Using Global Positioning System Clock," Canadian Conference on Electrical and Computer Engineering (CCECE 2000) Proceedings, Halifax, Nova Scotia, Canada, pp. 875-879, May 7-10, 2000.

[64] P. Enge and P. Misra, "Scanning the Issue/Technology, Special Issue on Global Positioning System," Proceedings of the IEEE, Vol. 87, No. 1, pp. 3-15, January 1999.

[65] W. Lewandowski, J. Azoubib, and W. Klepczynski, "GPS: Primary Tool for Time Transfer," Proceedings of the IEEE, Vol. 87, No. 1, pp. 163-172, January 1999.

[66] Elliot D. Kaplan, Understanding GPS: Principles and Applications, Mobile Communications Series, Artech House Publishers, Boston, USA, 1996.

[67] J. B. Bullock, et al., "Test results and analysis of a low cost core GPS receiver for time transfer applications," Motorola Position and Navigation Systems Business, presented at the 1997 IEEE Frequency Control Consortium in Orlando, Florida, USA, 1997.

[68] Steven C. Fisher and Kamran Ghassemi, "GPS IIF-The Next Generation," Proceedings of the IEEE, Vol. 87, No. 1, pp. 24-32, January 1999.

[69] M. S. Braasch and A. J. Van Dierendonck, "GPS Receiver Architectures and Measurements," Proceedings of the IEEE, Vol. 87, No. 1, pp. 163-172, January 1999.

[70] Charles C. Counselman, "Multipath-Rejecting GPS Antenna," Proceedings of the IEEE, Vol. 87, No. 1, pp. 86-91, January 1999.

[71] Absolute Time, "Model 100A/B GPS Clock User Manual," Absolute Time Corporation, San Jose, California, October 1996.

[72] Linear Technology, "LT1704 Data Sheet," Linear Technology Inc., USA, 1998.

[73] Boaz Patt-Shamir and Sergio Rajsbaum, MIT, "A Theory of Clock Synchronization," Proceedings of the 26th Symp. on Theory of Computing, May 1994.

[74] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd Edition, Kluwer Academic Publishers, 1997.

[75] Aldo Nunzio D'Andrea and Marco Luise, "Optimization of Symbol Timing Recovery for QAM Data Demodulators," IEEE Trans. on Communications, Vol. 44, No. 3, pp. 399-406, March 1996.

[76] Daeyoung Kim, Madihally J. Narasimha and Donald C. Cox, "Design of Optimal Interpolation Filter for Symbol Timing Recovery," IEEE Trans. on Communications, Vol. 45, No. 7, pp. 877-884, July 1997.

[77] Altera Corporation, "Early/Late Gate Synchronizer Megafunction," Solution Brief 17, Altera Corporation, 101 Innovation Drive, San Jose, CA, USA, June 1997.

[78] C. R. Cahn, "Performance of digital phase modulation communication systems," IRE Trans. on Communications, Vol. CS-7, pp. 3-6, May 1959.

[79] J. G. Proakis, Digital Communications, McGraw-Hill, New York, New York, 1983.

[80] David C. Buchthal and Douglas E. Cameron, Modern Abstract Algebra, Prindle, Weber & Schmidt Publishers, Boston, USA, 1987.

[81] J. H. van Lint, Introduction to Coding Theory, Second Edition, Springer-Verlag, Berlin, Germany, 1992.

[82] Man Young Rhee, Error Correcting Coding Theory, McGraw-Hill, New York, New York, 1989.

Appendix A: QAM Constellations

Since its discovery in the early 1960s, QAM has continued to gain interest and practical applications. In recent years, many new ideas and techniques have been proposed, enabling wider QAM deployment. A large number of constellations have been proposed for QAM transmission over Gaussian channels; the idea began with Cahn [78] and evolved through the years. The three constellations shown in Figure A.1 are often referred to in the literature. The essential problem is to maintain a large minimum distance, $d_{min}$, between the points while keeping the average power required for the constellation to a minimum [17].

Figure A.1 Variety of QAM constellations (panels: Type I, Type II and Type III QAM constellations)

Calculation of $d_{min}$ and the average power is a geometric procedure and has been performed for a range of constellations [79]. The results show that the square constellation (Type III) is optimal for Gaussian channels. The other two types require more energy to achieve the same $d_{min}$ as the square constellation and are generally not preferred. The following figures show the QAM constellations used in the MMDS system. There are three levels of QAM, as defined in Section 2.2.7: 16 QAM, 64 QAM and 256 QAM. The constellation points in the 2nd, 3rd and 4th quadrants are located by changing the two MSBs and rotating the LSBs according to the rule in Section 2.2.8. $I_k$ and $Q_k$ are the two MSBs in each quadrant and should be prepended to the constellation values to complete the m-bit value.
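As a rough numerical check on the square (Type III) constellations, the sketch below computes $d_{min}$ and the average power of square 16-, 64- and 256-QAM by brute force. It assumes the points sit on an odd-integer grid (±1, ±3, ...) and that all symbols are equally likely; the helper names are illustrative only, and the Type I and Type II geometries are not modelled because their exact point positions are not given here.

```python
import itertools


def square_qam(bits_per_symbol: int) -> list[complex]:
    """Square 2^m-QAM points (m even) on an odd-integer grid, e.g. +/-1, +/-3 for 16-QAM."""
    side = 1 << (bits_per_symbol // 2)                  # points per axis: 4, 8 or 16
    levels = [2 * k - (side - 1) for k in range(side)]  # symmetric odd amplitudes
    return [complex(i, q) for i in levels for q in levels]


def min_distance(points: list[complex]) -> float:
    """d_min: smallest Euclidean distance between any two constellation points."""
    return min(abs(p - q) for p, q in itertools.combinations(points, 2))


def average_power(points: list[complex]) -> float:
    """Mean of I^2 + Q^2 over all points, assuming equiprobable symbols."""
    return sum(p.real ** 2 + p.imag ** 2 for p in points) / len(points)


for m in (4, 6, 8):  # 16-, 64- and 256-QAM
    pts = square_qam(m)
    print(f"{len(pts):>3}-QAM: d_min = {min_distance(pts):.0f}, "
          f"average power = {average_power(pts):.0f}")
```

With this spacing $d_{min}$ stays at 2 while the average power grows as $2(M-1)/3$ (10, 42 and 170 for M = 16, 64 and 256), which is the energy cost of carrying more bits per symbol at the same point spacing.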

A.1 16 QAM

The 16 QAM constellation has 16 points, as shown in Figure A.2. The symbols are 4-bit words.

Figure A.2 16 QAM Constellation diagram

A.2 64 QAM

The 64 QAM has 64 points in the constellation as shown in Figure A.3. The symbols are 6-bit words.

Figure A.3 64 QAM Constellation Diagram

A.3 256 QAM

The 256 QAM has 256 points in the constellation as shown in Figure A.4. The symbols are 8-bit words.

Figure A.4 256 QAM Constellation Diagram

Appendix B: Galois Fields

B.1 Galois field

A Galois field (or finite field) is a set of symbols that obey a set of restrictions allowing addition, subtraction, multiplication, and division to be performed on them [11,80,81]. Finite field mathematics is important in FEC digital circuitry because it allows arithmetic to be performed on binary vectors (i.e., bytes) without expanding their size. For instance, adding two bytes together must result in another byte, rather than a 9-bit word.

It turns out that the restrictions that must be imposed on the symbols are the same restrictions that define a finite field. These rules were discovered by Evariste Galois and, as a result, these fields are called Galois fields (GF) [80].

A GF with Q symbols in it is referred to as GF(Q), and Q is called the order of the field. For example, $GF(2^8)$ has order 256. A GF obeys the following simple rules:

1) The elements of the field must form a commutative group under addition. If two elements are added together, the result is another element in the same field. Furthermore, addition must commute (i.e., $a + b = b + a$).

2) As in the case of addition, multiplying an element by another element results in another element in the field. Multiplication also commutes (i.e., $a \cdot b = b \cdot a$).

3) Multiplication must distribute over addition. This means that:

$$a \cdot (b + c) = (a \cdot b) + (a \cdot c).$$

4) The number of elements Q in the field must be equal to $q^m$, where q is a prime number and m is a positive integer. For example, $GF(2^8)$ has q = 2 and m = 8.
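For the byte field these rules can be checked almost exhaustively in software. The following is a minimal sketch, assuming the field generator polynomial $p(x) = x^8 + x^4 + x^3 + x^2 + 1$ (0x11D); that polynomial is an assumption here, chosen because it reproduces the worked example $\alpha^{213} + \alpha^{76} = \alpha^{122}$ later in this appendix. Addition is bitwise XOR and multiplication is shift-and-add with reduction modulo p(x), so every result stays an 8-bit value. The function names are illustrative.

```python
PRIM_POLY = 0x11D  # assumed field generator p(x) = x^8 + x^4 + x^3 + x^2 + 1


def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) of two bytes: coefficient-wise modulo 2, i.e. XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two bytes, reduced modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a        # add the current multiple of a
        b >>= 1
        a <<= 1
        if a & 0x100:          # a degree-8 term appeared: reduce using p(x)
            a ^= PRIM_POLY
    return result


# Full 256 x 256 product table, so the checks below are plain lookups.
MUL = [[gf_mul(a, b) for b in range(256)] for a in range(256)]


def check_field_rules() -> bool:
    elements = range(256)
    for a in elements:
        for b in elements:
            s, p = gf_add(a, b), MUL[a][b]
            assert 0 <= s < 256 and 0 <= p < 256          # closure: results stay bytes
            assert s == gf_add(b, a) and p == MUL[b][a]   # rules 1 and 2: commutativity
    for a in elements:                                    # rule 3, for a few sample values of c
        for b in elements:
            for c in (1, 2, 0x53, 0xCA, 0xFF):
                assert MUL[a][gf_add(b, c)] == gf_add(MUL[a][b], MUL[a][c])
    for a in range(1, 256):                               # every nonzero a has an inverse
        assert any(MUL[a][b] == 1 for b in range(1, 256))
    return True


print(check_field_rules(), 256 == 2 ** 8)  # rule 4: the order is q^m with q = 2, m = 8
```

Distributivity is sampled over five values of c only to keep the run short; the inverse check is what makes division well defined.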

B.2 Construction of $GF(2^m)$

The construction of the elements of $GF(2^m)$ is based on the binary field $GF(2)$. It begins with the two elements 0 and 1, a new symbol $\alpha$, and the definition of a multiplication "$\cdot$". A sequence of powers of $\alpha$ is introduced as follows:

$$\alpha^j = \alpha \cdot \alpha \cdots \alpha \quad (j \text{ times}).$$

It follows from the definition of the multiplication that

$$0 \cdot \alpha^j = \alpha^j \cdot 0 = 0, \qquad 1 \cdot \alpha^j = \alpha^j \cdot 1 = \alpha^j, \qquad \alpha^i \cdot \alpha^j = \alpha^j \cdot \alpha^i = \alpha^{i+j}.$$

From these properties of the multiplication, the following set of elements is defined:

$$F = \{0, 1, \alpha, \alpha^2, \ldots, \alpha^j, \ldots\}.$$

The element 1 is denoted $\alpha^0$. The condition imposed on the element $\alpha$ is that the set F contain only $2^m$ elements and be closed under the multiplication defined above (i.e., the product of any two elements of F is an element of F).

Let p(x) be a primitive polynomial of degree m over $GF(2)$ (i.e., the coefficients of p(x) are either 0 or 1). This polynomial is called the field generator polynomial. The condition imposed on this polynomial is $p(\alpha) = 0$. Under this condition, the set F becomes finite and contains the following elements:

$$F = \{0, 1, \alpha, \alpha^2, \ldots, \alpha^{2^m-2}\},$$

and the nonzero elements of F are closed under the multiplication operation "$\cdot$". The nonzero elements also form a commutative group under "$\cdot$". Under the addition operation "+", all the elements of F are closed, and the set is also a commutative group.
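The construction can be sketched directly: start from $\alpha^0 = 1$, multiply by $\alpha$ (a left shift of the coefficient bits) and replace every $\alpha^m$ term using $p(\alpha) = 0$. When p(x) is primitive, the powers run through all $2^m - 1$ nonzero elements before repeating. The function names and test polynomials are illustrative, and 0x11D ($x^8 + x^4 + x^3 + x^2 + 1$) is assumed, not confirmed, to be the polynomial behind Table B.1.

```python
def field_powers(prim_poly: int, m: int) -> list[int]:
    """alpha^0 .. alpha^(2^m - 2) as m-bit integers; bit i is the coefficient of alpha^i."""
    powers, value = [], 1
    for _ in range((1 << m) - 1):
        powers.append(value)
        value <<= 1                 # multiply by alpha
        if value & (1 << m):        # an alpha^m term appeared:
            value ^= prim_poly      # replace it using p(alpha) = 0
    return powers


def is_primitive(prim_poly: int, m: int) -> bool:
    """True when the 2^m - 1 powers of alpha are all distinct and nonzero."""
    powers = field_powers(prim_poly, m)
    return 0 not in powers and len(set(powers)) == (1 << m) - 1


# GF(2^8) under the assumed p(x) = x^8 + x^4 + x^3 + x^2 + 1 (0x11D):
F = [0] + field_powers(0x11D, 8)    # {0, 1, alpha, alpha^2, ..., alpha^254}
print(len(F), len(set(F)), is_primitive(0x11D, 8), is_primitive(0x11B, 8))
```

By contrast, the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ (0x11B) is not primitive: its powers of $\alpha$ repeat after only 51 steps, so $p(\alpha) = 0$ alone does not guarantee that $\alpha$ generates every nonzero element.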

The set F of $GF(2^8)$ elements for a given p(x) is shown in Table B.1. This particular field is used in the designed RS codec. The table shows two representations of the elements: the power representation, $\alpha^i$, and the polynomial representation (the coefficients of $1, \alpha, \alpha^2, \ldots, \alpha^7$, which are either 0 or 1). The first representation is convenient for multiplication and the second for addition.

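To multiply two elements $\alpha^i$ and $\alpha^j$ in the field, one simply adds their exponents and uses the fact that $\alpha^{255} = 1$. For example, $\alpha^{15} \cdot \alpha^{198} = \alpha^{213}$ and $\alpha^{213} \cdot \alpha^{76} = \alpha^{289-255} = \alpha^{34}$. To divide $\alpha^i$ by $\alpha^j$, one simply multiplies $\alpha^i$ by the multiplicative inverse of $\alpha^j$, which is $\alpha^{255-j}$. For example, $\alpha^{213}/\alpha^{76} = \alpha^{213} \cdot \alpha^{179} = \alpha^{392} = \alpha^{137}$. To add $\alpha^{213}$ and $\alpha^{76}$, one uses their polynomial representations in the table. For example,

$$\alpha^{213} + \alpha^{76} = (\alpha + \alpha^4 + \alpha^5 + \alpha^6 + \alpha^7) + (\alpha + \alpha^2 + \alpha^3 + \alpha^4) = \alpha^2 + \alpha^3 + \alpha^5 + \alpha^6 + \alpha^7 = \alpha^{122}.$$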
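The power and polynomial representations map naturally onto antilog (EXP) and log (LOG) lookup tables, a common way to implement these rules in software or ROM. The sketch below is one possible arrangement; the table and function names are hypothetical, and it assumes $p(x) = x^8 + x^4 + x^3 + x^2 + 1$, under which it reproduces the multiplication, division and addition examples in this section.

```python
PRIM_POLY = 0x11D  # assumed p(x) = x^8 + x^4 + x^3 + x^2 + 1; Table B.1 may use another

# EXP[k] = alpha^k in polynomial (bit) form; LOG is its inverse for nonzero elements.
EXP, LOG = [0] * 255, [0] * 256
_value = 1
for _k in range(255):
    EXP[_k] = _value
    LOG[_value] = _k
    _value <<= 1
    if _value & 0x100:
        _value ^= PRIM_POLY


def alpha(k: int) -> int:
    """alpha^k, using alpha^255 = 1 to reduce the exponent."""
    return EXP[k % 255]


def gf_add(a: int, b: int) -> int:
    return a ^ b                           # add polynomial representations bit by bit


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]    # add exponents modulo 255


def gf_inv(a: int) -> int:
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return EXP[(255 - LOG[a]) % 255]       # inverse of alpha^j is alpha^(255 - j)


def gf_div(a: int, b: int) -> int:
    return gf_mul(a, gf_inv(b))


print(LOG[gf_mul(alpha(213), alpha(76))])  # 34
print(LOG[gf_div(alpha(213), alpha(76))])  # 137
print(LOG[gf_add(alpha(213), alpha(76))])  # 122 (with the assumed p(x))
```

Multiplication and division become one table read and one modulo-255 addition, while addition needs no table at all, which mirrors the observation that each representation suits one operation.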

B.3 Vector representation of $GF(2^m)$

Another useful representation for the field elements in $GF(2^m)$ is an m-dimensional vector. Let $a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{m-1}\alpha^{m-1}$ be the polynomial representation of a field element $\beta$ in $GF(2^m)$, where $a_i = 0$ or 1. Then this element is represented by an ordered sequence of m components, called an m-tuple, as follows:

$$(a_0, a_1, a_2, \ldots, a_{m-1}),$$

where the m components are simply the coefficients of the polynomial representation of $\beta$. The 8-tuples of $GF(2^8)$ are shown in the second column of Table B.1.

Using this representation, addition is easy to define. The addition is performed component by component according to the rules of $GF(2)$ arithmetic. To add $\beta$ and $\gamma = (b_0, b_1, \ldots, b_{m-1})$, one simply adds the corresponding components of their m-tuple representations:

$$(a_0 + b_0, a_1 + b_1, \ldots, a_{m-1} + b_{m-1}),$$

where each $a_i + b_i$ is a modulo-2 addition, i.e., a single XOR operation. The components of the resulting m-tuple are the coefficients of the polynomial representation of $\beta + \gamma$, which is an element of the set (i.e., the set is closed under addition). For the $GF(2^8)$ example, the sum of $\alpha^{213}$ and $\alpha^{76}$ is:

$$\alpha^{213} + \alpha^{76} = (0\,1\,0\,0\,1\,1\,1\,1) + (0\,1\,1\,1\,1\,0\,0\,0) = (0\,0\,1\,1\,0\,1\,1\,1) = \alpha^{122}.$$
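The m-tuple view corresponds to what the hardware stores: one wire per coefficient, with addition implemented as one XOR gate per bit. This small sketch converts field elements to $(a_0, a_1, \ldots, a_7)$ tuples and adds them component-wise; the helper names are hypothetical, and the generator polynomial $x^8 + x^4 + x^3 + x^2 + 1$ is an assumption under which the tuples for $\alpha^{213}$, $\alpha^{76}$ and $\alpha^{122}$ match the example above.

```python
PRIM_POLY = 0x11D  # assumed field generator p(x) = x^8 + x^4 + x^3 + x^2 + 1
M = 8


def alpha_power(k: int) -> int:
    """alpha^k as an integer whose bit i is the coefficient of alpha^i."""
    value = 1
    for _ in range(k % 255):
        value <<= 1
        if value & 0x100:
            value ^= PRIM_POLY
    return value


def to_tuple(value: int) -> tuple[int, ...]:
    """m-tuple (a0, a1, ..., a_{m-1}): coefficients in ascending powers of alpha."""
    return tuple((value >> i) & 1 for i in range(M))


def from_tuple(bits: tuple[int, ...]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def add_tuples(b: tuple[int, ...], g: tuple[int, ...]) -> tuple[int, ...]:
    """Component-wise modulo-2 addition, i.e. one XOR per bit position."""
    return tuple(x ^ y for x, y in zip(b, g))


beta, gamma = to_tuple(alpha_power(213)), to_tuple(alpha_power(76))
total = add_tuples(beta, gamma)
print(beta, gamma, total, total == to_tuple(alpha_power(122)))
```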

Appendix C: Algorithms to Find Error Locator Polynomial for RS Decoder

In decoding an RS code, it is necessary to determine both the location and the magnitude of each error in the received vector. The information required to determine them is contained in the syndrome polynomial, S(x). This polynomial is found by substituting the roots of the code generator polynomial into the received codeword. Two polynomials are derived from S(x): the error locator polynomial, $\sigma(x)$, which holds the error locations, and the error magnitude polynomial, Z(x), which contains the error values. The $\sigma(x)$ of minimum degree that solves the problem must satisfy the following key equation:

$$S(x)\,\sigma(x) \equiv Z(x) \mod x^{2t},$$

where t is the number of symbol errors that can be corrected by the code.
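The key equation can be checked numerically when the error positions are known in advance. The sketch below assumes $p(x) = x^8 + x^4 + x^3 + x^2 + 1$, generator roots $\alpha^0, \ldots, \alpha^{2t-1}$ and t = 8, the values of the DVB RS(204,188) code; whether the MMDS codec uses exactly these parameters is not established here. It builds a codeword as a multiple of g(x), adds three symbol errors, computes $S_j = r(\alpha^j)$, forms $\sigma(x) = \prod_l (1 + X_l x)$ from the known locators $X_l = \alpha^{i_l}$, and confirms that $S(x)\sigma(x) \bmod x^{2t}$ has degree below the number of errors. All names are illustrative.

```python
PRIM_POLY = 0x11D  # assumed p(x) = x^8 + x^4 + x^3 + x^2 + 1

# Antilog table doubled so that LOG[a] + LOG[b] never needs a modulo.
EXP, LOG = [0] * 510, [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY


def mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


# Polynomials are coefficient lists, lowest degree first: [c0, c1] = c0 + c1 x.
def poly_mul(p: list[int], q: list[int]) -> list[int]:
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= mul(a, b)
    return out


def poly_eval(p: list[int], x: int) -> int:
    acc = 0
    for c in reversed(p):      # Horner's rule
        acc = mul(acc, x) ^ c
    return acc


def generator_poly(two_t: int) -> list[int]:
    """g(x) = (x + alpha^0)(x + alpha^1)...(x + alpha^(2t-1)); root offset 0 assumed."""
    g = [1]
    for j in range(two_t):
        g = poly_mul(g, [EXP[j], 1])
    return g


def syndromes(received: list[int], two_t: int) -> list[int]:
    """S_j = r(alpha^j): substitute each generator root into the received word."""
    return [poly_eval(received, EXP[j]) for j in range(two_t)]


T = 8                                                               # correctable symbol errors
codeword = poly_mul([7, 0, 42, 199, 3, 88], generator_poly(2 * T))  # any multiple of g(x)
received = list(codeword)
error_positions, error_values = [1, 9, 17], [0x35, 0x01, 0xC4]
for pos, val in zip(error_positions, error_values):
    received[pos] ^= val

S = syndromes(received, 2 * T)
sigma = [1]
for pos in error_positions:          # sigma(x) = prod_l (1 + X_l x), X_l = alpha^pos
    sigma = poly_mul(sigma, [1, EXP[pos]])
Z = poly_mul(S, sigma)[: 2 * T]      # S(x) sigma(x) mod x^(2t)
print(syndromes(codeword, 2 * T) == [0] * (2 * T))                      # valid codeword
print(Z[len(error_positions):] == [0] * (2 * T - len(error_positions)))  # deg Z < errors
```

The decoder, of course, does not know $\sigma(x)$ in advance; the two algorithms below are ways of recovering it from S(x) alone.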

Evaluating these polynomials is the most complex step in decoding an RS code; the complexity of the RS decoder lies in finding the $\sigma(x)$ of minimum degree. Berlekamp was the first to develop a computationally efficient method of solving the key equation, and several different methods have been developed since then. Sugiyama showed that Euclid's algorithm for finding the greatest common divisor of two polynomials could also be adapted to this purpose [39]. There are two main algorithms to find $\sigma(x)$: the Berlekamp-Massey algorithm and the Euclidean algorithm. Both have roughly the same computational complexity, and both have been revised or modified to suit different decoding architectures [13,14,34,35,41,43].

1. The Berlekamp-Massey algorithm

Berlekamp devised the algorithm in 1967 and Massey discussed it in 1968. The algorithm is described as follows [82]:

1. Start at $n = 0$ with the initial conditions:

3. If $d_n \neq 0$, we have

4. For either $d_n = 0$ or $d_n \neq 0$, the next discrepancy is

$$d_{n+1} = S_{n+2} + \sum_{i=1}^{l_{n+1}} \sigma_i^{(n+1)} S_{n+2-i}.$$

5. The iteration stops at $n = 2t - 1$.
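For comparison with the steps above, here is a sketch of the Berlekamp-Massey algorithm in Massey's shift-register formulation (a current connection polynomial C(x), a saved polynomial B(x) and a discrepancy d at each step), followed by a Chien search for the error positions. This is the common textbook form, not necessarily the exact indexing used in [82]; the field polynomial 0x11D, the generator roots $\alpha^0, \ldots, \alpha^{2t-1}$ and all function names are assumptions.

```python
PRIM_POLY = 0x11D  # assumed p(x) = x^8 + x^4 + x^3 + x^2 + 1

EXP, LOG = [0] * 510, [0] * 256   # antilog table doubled: no modulo needed
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY

def mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a: int) -> int:
    return EXP[255 - LOG[a]]      # a must be nonzero

def poly_eval(p: list[int], x: int) -> int:
    """Evaluate p (coefficients lowest degree first) at x by Horner's rule."""
    acc = 0
    for c in reversed(p):
        acc = mul(acc, x) ^ c
    return acc

def syndromes(received: list[int], two_t: int) -> list[int]:
    """S_j = r(alpha^j) for j = 0 .. 2t-1 (root offset 0 is an assumption)."""
    return [poly_eval(received, EXP[j]) for j in range(two_t)]

def berlekamp_massey(S: list[int]) -> list[int]:
    """Return sigma(x), lowest degree first, with sigma(0) = 1."""
    C, B = [1], [1]          # current and previous connection polynomials
    L, shift, b = 0, 1, 1    # register length, x-power applied to B, last nonzero discrepancy
    for n in range(len(S)):
        d = S[n]             # discrepancy d_n = S_n + sum_{i=1..L} C_i S_{n-i}
        for i in range(1, L + 1):
            d ^= mul(C[i], S[n - i])
        if d == 0:
            shift += 1
            continue
        coef = mul(d, inv(b))
        T = list(C)
        C += [0] * max(0, len(B) + shift - len(C))
        for i, bi in enumerate(B):   # C(x) <- C(x) + (d/b) x^shift B(x)
            C[i + shift] ^= mul(coef, bi)
        if 2 * L <= n:               # the register must grow
            L, B, b, shift = n + 1 - L, T, d, 1
            C += [0] * max(0, L + 1 - len(C))
        else:
            shift += 1
    return C[: L + 1]

def chien_search(sigma: list[int], n: int) -> list[int]:
    """Positions i < n with sigma(alpha^-i) = 0, i.e. error locators X = alpha^i."""
    return [i for i in range(n) if poly_eval(sigma, EXP[(255 - i) % 255]) == 0]

# Demo, t = 8. Syndromes depend only on the error pattern, so the all-zero
# codeword (valid for any linear code) is used as the transmitted word.
T = 8
received = [0] * 40
error_positions = [1, 9, 17]
for pos, val in zip(error_positions, [0x35, 0x01, 0xC4]):
    received[pos] ^= val

sigma = berlekamp_massey(syndromes(received, 2 * T))
print(chien_search(sigma, len(received)))
```

Each of the 2t iterations needs one discrepancy and at most one polynomial update, which is why this form maps well onto a fixed number of pipelined hardware steps.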

2. The Euclidean algorithm

In general, the Euclidean algorithm, which evaluates the error locator polynomial $\sigma(x)$ and the error magnitude polynomial Z(x), is expressed as follows:

1. Four polynomials are initialized:

2. The algorithm iteratively updates the two polynomials $\sigma(x)$ and Z(x) as follows:

a. Divide $Z_{i-2}(x)$ by $Z_{i-1}(x)$ to obtain the quotient $Q_i(x)$ and the remainder $Z_i(x)$.

The iteration is continued until the degree of $Z_i(x)$ is less than t. Then set $\sigma(x) = \sigma_i(x)$ and $Z(x) = Z_i(x)$.
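A sketch of the Euclidean (Sugiyama) procedure follows, assuming the usual initialization $Z_{-1}(x) = x^{2t}$, $Z_0(x) = S(x)$, $\sigma_{-1}(x) = 0$, $\sigma_0(x) = 1$ and the update $\sigma_i(x) = \sigma_{i-2}(x) + Q_i(x)\,\sigma_{i-1}(x)$ (minus and plus coincide in $GF(2^m)$). These initial values, the field polynomial 0x11D, the root offset and the helper names are assumptions; the final scaling by $\sigma_i(0)^{-1}$ normalizes the result so that $\sigma(0) = 1$.

```python
PRIM_POLY = 0x11D  # assumed p(x) = x^8 + x^4 + x^3 + x^2 + 1

EXP, LOG = [0] * 510, [0] * 256   # antilog table doubled: no modulo needed
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= PRIM_POLY

def mul(a: int, b: int) -> int:
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a: int) -> int:
    return EXP[255 - LOG[a]]      # a must be nonzero

# Polynomials: coefficient lists, lowest degree first, trailing zeros trimmed.
def trim(p: list[int]) -> list[int]:
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def degree(p: list[int]) -> int:
    p = trim(p)
    return -1 if p == [0] else len(p) - 1

def poly_add(p: list[int], q: list[int]) -> list[int]:
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) ^ (q[i] if i < len(q) else 0) for i in range(n)])

def poly_mul(p: list[int], q: list[int]) -> list[int]:
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= mul(a, b)
    return trim(out)

def poly_eval(p: list[int], x: int) -> int:
    acc = 0
    for c in reversed(p):         # Horner's rule
        acc = mul(acc, x) ^ c
    return acc

def poly_divmod(num: list[int], den: list[int]) -> tuple[list[int], list[int]]:
    num, den = trim(num), trim(den)
    if len(num) < len(den):
        return [0], num
    rem, quot = list(num), [0] * (len(num) - len(den) + 1)
    lead_inv = inv(den[-1])
    for k in range(len(quot) - 1, -1, -1):   # cancel the top remaining coefficient
        coef = mul(rem[k + len(den) - 1], lead_inv)
        quot[k] = coef
        for j, dj in enumerate(den):
            rem[k + j] ^= mul(coef, dj)
    return trim(quot), trim(rem[: len(den) - 1] or [0])

def euclid_key_equation(S: list[int], t: int) -> tuple[list[int], list[int]]:
    """Solve S(x) sigma(x) = Z(x) mod x^(2t); return (sigma, Z) with sigma(0) = 1."""
    z_prev, z_cur = [0] * (2 * t) + [1], trim(S)   # Z_-1 = x^(2t), Z_0 = S(x)
    s_prev, s_cur = [0], [1]                       # sigma_-1 = 0, sigma_0 = 1
    while degree(z_cur) >= t:
        q, r = poly_divmod(z_prev, z_cur)
        z_prev, z_cur = z_cur, r
        s_prev, s_cur = s_cur, poly_add(s_prev, poly_mul(q, s_cur))
    scale = inv(s_cur[0])
    return [mul(scale, c) for c in s_cur], [mul(scale, c) for c in z_cur]

# Demo: all-zero codeword plus three symbol errors, t = 8.
T = 8
received = [0] * 40
error_positions = [1, 9, 17]
for pos, val in zip(error_positions, [0x35, 0x01, 0xC4]):
    received[pos] ^= val
S = [poly_eval(received, EXP[j]) for j in range(2 * T)]   # S_j = r(alpha^j)
sigma, Z = euclid_key_equation(S, T)
print([i for i in range(len(received)) if poly_eval(sigma, EXP[(255 - i) % 255]) == 0])
```

Unlike Berlekamp-Massey, this form delivers Z(x) alongside $\sigma(x)$, so the error values can be computed without a separate pass over the syndromes.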
