HIGH SPEED MMDS TRANSCEIVER IMPLEMENTATION WITH GPS CLOCK SYNCHRONIZATION
A Thesis
Submitted to the Faculty of Graduate Studies and Research
in Partial Fulfillment of the Requirements
For the Degree of
Doctor of Philosophy
in Engineering
UNIVERSITY OF REGINA
BY
Anh van Dinh
Regina, Saskatchewan
August 2000
© Copyright 2000: A. V. Dinh
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street / 395, rue Wellington, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
ABSTRACT
This thesis presents the first phase in the hardware implementation of a high-speed
transceiver to be used in a Multi-channel Multi-point Distribution Services
(MMDS) system. Based on standard specifications, various building blocks were
implemented using FPGA prototypes. Among the blocks, the forward error correction for
data integrity protection is the most complicated and expensive to implement. This block
includes the Reed-Solomon (RS) codec and byte interleaving to correct both random and
burst errors caused by the channel. The high-speed RS decoder was designed using
highly efficient algorithms and low latency VLSI circuits which are used to implement
arithmetic in the Galois field GF(2^8). Results show a data rate of 80 Mbs was obtained
using FPGA prototypes. A data rate of 200 Mbs should be achieved when ASICs are
developed using the synthesizable Verilog code that was developed for this thesis.
In addition to a high bit rate, the MMDS system also requires robust
synchronization in order to transmit a high-speed data stream and to ensure data integrity.
Timing and synchronization are critical in system design. These include both clock and
frame synchronization. In the absence of SONET and SDH, a precise frequency from the
GPS is used for the system reference clock instead of crystal oscillators. This frequency
and time have very high accuracy that is directly and continuously traceable to
Coordinated Universal Time. A reference 10 MHz clock with TTL output levels was
generated from a GPS clock. This clock was used in the development of the high-speed
MMDS system and is proposed for system clock synchronization.
Precision timing combined with a word synchronization scheme makes the MMDS
system simple, robust, low cost and reliable.
ACKNOWLEDGEMENTS
I wish to express my gratitude to some of the organizations and people whose
support, supervision, guidance, advice and encouragement were helpful to this research.
First, I would like to thank TRLabs for its continued support in financial
assistance, advice, supervision and the facilities required for the research. The financial
assistance from the Natural Sciences and Engineering Research Council of Canada from
September 1997 to October 1999 is highly appreciated. Various graduate study
scholarships provided by the Faculty of Graduate Studies and Research of the University
of Regina are also acknowledged.
Second, I would like to extend my sincere gratitude to my advisors: Dr. R. J.
Palmer, University of Regina, and Dr. R. J. Bolton, University of Saskatchewan, who
provided support, advice, assistance and technical help during the research. Special
thanks go to Dr. R. Mason, Carleton University, for his generous help and advice from the
beginning to the end of this research. I thank Mr. J. Toth of Edge Networks Inc. in
Winnipeg for the initiation of this project. I also thank Mr. A. Kostiuk for his help during
my time at TRLabs Regina. The work of Mr. N. McLeod to synthesize the GF(2^m)
inversion circuits using CMOS IS 5 technology at TRLabs Saskatoon is highly
acknowledged. I also appreciate the help from Engineering Faculty Members and Faculty
of Engineering Staff of the University of Regina over my years of study.
Finally, I owe a great debt to my family, especially my wife, over the past five years
since I started my graduate study. I thank my family for their patience, understanding,
encouragement and unfailing moral support over the long period of time spent on the
research.
TABLE OF CONTENTS
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
List of Acronyms
1. INTRODUCTION
1.1 The MMDS technology
1.2 Research objectives, contribution and methodology
1.2.1 System modeling and simulation
1.2.2 Hardware description language for basic building blocks
1.2.3 Synchronization
1.3 Contributions
1.4 Dissertation outline
2. DAVIC VERSION 1.2 SPECIFICATIONS AND MMDS SYSTEM
2.1 Digital Audio-Visual Council Version 1.2 specifications
2.2 MMDS system
2.2.1 Baseband interface, Synchronization and Randomization
2.2.2 Reed-Solomon codec
2.2.3 Convolutional interleaver/de-interleaver
2.2.4 Byte to symbol mapping
2.2.5 Differential encoder/decoder
2.2.6 TCM encoder/decoder
2.2.7 Quadrature amplitude modulation
2.2.8 QAM constellation mapping
2.2.9 Baseband filtering
2.2.10 Radio frequency interface
2.2.11 System synchronization
3. MMDS TRANSMITTER IMPLEMENTATION
3.1 Baseband interface, Synchronization byte inversion and Randomization
3.2 Reed-Solomon encoder
3.3 Convolutional interleaver
3.4 Byte-to-m tuple conversion and differential encoder
3.4.1 Byte-to-m tuple conversion
3.4.2 Differential encoding
3.5 QAM mapping
4. MMDS RECEIVER IMPLEMENTATION
4.1 QAM de-mapping
4.2 Differential decoder and m-to-byte conversion
4.2.1 Differential decoder
4.2.2 M-to-byte conversion
4.3 Convolutional de-interleaver
4.4 Reed-Solomon decoder
4.4.1 Syndrome calculation
4.4.2 Error locator polynomial
4.4.3 Error magnitude polynomial calculation and Chien search
4.4.4 Error value generation
4.4.4.1 Low latency power-sum circuit in GF(2^8)
4.4.4.2 Low latency exponential circuit in GF(2^8)
4.4.4.3 Low latency inversion and division circuits in GF(2^8)
4.4.5 Correction
4.4.6 High-speed RS decoder design summary
4.5 De-randomization
5. MMDS SYNCHRONIZATION USING GPS CLOCK
5.1 The need for synchronization
5.2 GPS clock derivation and application in a MMDS system
5.2.1 GPS clock versus crystal oscillators
5.2.2 GPS clock
5.2.3 Using GPS clock in MMDS transceiver prototype
5.3 MMDS system synchronization
5.3.1 Clock synchronization
5.3.2 Frame synchronization
6. RESULTS
6.1 System simulation
6.2 Transceiver FPGA implementation
6.3 GPS clock testing
7. CONCLUSIONS, CONTRIBUTIONS AND FUTURE WORK
7.1 Conclusions
7.2 Contributions
7.3 Future work
REFERENCES
APPENDIX A: QAM constellations
A.1 16 QAM
A.2 64 QAM
A.3 256 QAM
APPENDIX B: Galois Field
B.1 Galois field
B.2 Construction of GF(2^m)
B.3 Vector representation of GF(2^m)
APPENDIX C: Algorithm to Find Error Locator Polynomial for RS Decoder
C.1 The Berlekamp-Massey algorithm
C.2 The Euclidean algorithm
LIST OF FIGURES
Figure 1.1 A typical MMDS installation
Figure 1.2 High-speed MMDS system
Figure 2.1 Transceiver block diagram
Figure 2.2 Framing structure of MPEG2-TS
Figure 2.3 Conceptual diagram of the convolutional interleaver/de-interleaver
Figure 2.4 Byte to m-tuple conversion for 64 QAM
Figure 2.5 Implementation of the differential encoding of the two MSBs
Figure 2.6 Detail reference model of the TCM encoder
Figure 3.1 Interface, synchronization and synchronization inversion
Figure 3.2 Parallel to serial conversion and Serial to parallel conversion
Figure 3.3 Randomization and De-randomization
Figure 3.4 Reed-Solomon encoder architecture
Figure 3.5 Interleaving 64 QAM
Figure 3.6 Differential encoding of the two MSB's
Figure 3.7 Mapping and de-mapping of 256 QAM
Figure 4.1 Schematic diagram for the differential decoder
Figure 4.2 De-interleaving for 64 QAM
Figure 4.3 Reed-Solomon decoder block diagram
Figure 4.4 Syndrome calculation
Figure 4.5 Multiplier in GF(2^8)
Figure 4.6 Chien search algorithm
Figure 4.7 Schematic diagram for p4 circuit in GF(2^8)
Figure 4.8 Low latency inversion and division architecture in GF(2^8)
Figure 5.1 Model of a communication system
Figure 5.2 Generic GPS receiver block diagram
Figure 5.3 Frequency based GPS clock
Figure 5.4 Time based GPS clock generation
Figure 5.5 GPS clock TTL output
Figure 5.6 MMDS system synchronization using GPS clock
Figure 5.7 Early/late-gate data symbol synchronization
Figure 6.1 Equipment set up
Figure 6.2 Matlab and Simulink simulation set up
Figure 6.3 Input and output waveforms of the Simulink simulation
Figure 6.4 RS encoder waveform
Figure A.1 Variety of QAM constellations
Figure A.2 16 QAM constellation diagram
Figure A.3 64 QAM constellation diagram
Figure A.4 256 QAM constellation diagram
LIST OF TABLES
Table 2.1 Conversion of QAM constellations
Table 6.1 Prototype resources and operation frequency of the transceiver
Table 6.2 Comparison of the GF(2^8) arithmetic implementation
Table 6.3 Comparison of the GF(2^m) inversion implementation
Table 6.4 GF(2^m) field generating polynomials p(x)
Table 6.5 Delay time of the inversion circuit in GF(2^m)
Table B.1 Table of GF(2^8) generated by p(x) = x^8 + x^4 + x^3 + x^2 + 1
LIST OF ACRONYMS
ADC         Analog to Digital Converter
AM          Amplitude Modulation
ASIC        Application Specific Integrated Circuit
AWGN        Additive White Gaussian Noise
BER         Bit Error Rate
B-M         Berlekamp-Massey
C/A         Coarse Acquisition
CATV        Conventional Cable TV
DAVIC       Digital Audio-Video Council
dB          decibel
DBS         Direct Broadcast Satellite
DSP         Digital Signal Processing
EDAC        Error Detection And Correction
FCC         Federal Communications Commission
FEC         Forward Error Correction
FET         Field Effect Transistor
FIFO        First-in First-out
FPGA        Field Programmable Gate Array
GF          Galois Field
GHz         GigaHertz
GPS         Global Positioning System
HDL         Hardware Description Language
HRM         High Reliability Marker
IF          Intermediate Frequency
IP          Intellectual Property
ITFS        Instructional Television Fixed Services
JPEG        Joint Photographic Expert Group
LFSR        Linear Feedback Shift Register
LSB         Least Significant Bit
Mbs         Mega bit per second
MDS         Multi-point Distribution Services
MHz         MegaHertz
MMDS        Multi-channel Multi-point Distribution Services
MPEG        Motion Picture Expert Group
MPEG2-TS    MPEG2 Transport Stream
MSB         Most Significant Bit
MUX         Multiplexer
NRZ         Non-Return-to-Zero
PID         Program Identification
PLL         Phase Lock Loop
PON         Passive Optical Network
ppm         pulse per million
PPS         Pulse Per Second
PN          Pseudorandom Noise
PRBS        Pseudo Random Binary Sequence
PSK         Phase Shift Keying
QAM         Quadrature Amplitude Modulation
RAM         Random Access Memory
RF          Radio Frequency
ROM         Read Only Memory
RMS         Root Mean Squared
RS          Reed-Solomon
SA          Selection Availability
SDH         Synchronous Digital Hierarchy
SNR         Signal to Noise Ratio
SONET       Synchronous Optical Network
TCM         Trellis Code Modulation
USNO        US Naval Observatory
UTC         Coordinate Universal Time
VCO         Voltage Control Oscillator
VHDL        Very Large Scale Integrated Circuit HDL
VHF         Very High Frequency
VLSI        Very Large Scale Integration
CHAPTER 1
INTRODUCTION
1.1 The MMDS technology
High speed wireless data communication in general, and Multi-channel Multi-point
Distribution Services (MMDS) in particular, are becoming increasingly important
as a means of delivering high speed data services. This is due to the prohibitive cost of
wired solutions in many environments. MMDS provides cable television and data
services transmitted through the air via microwave frequencies.
The MMDS uses a band which is close to the Instructional Television Fixed
Service (ITFS) (i.e., frequency band of 2500-2686 MHz) and has a configuration similar
to ITFS. The antenna is usually omnidirectional to reach all subscribers in a "coverage
circle", hence the name: Multi-point Distribution Services (MDS). This service was the
beginning of what is known as "wireless cable". While ITFS was a distance learning tool
that was especially suited for delivery of education to business and industry, individuals,
organizations and to other institutions, MDS was conceived as an alternate or supplement
to conventional cable TV (CATV). The Federal Communications Commission (FCC)
reallocated eight of the lightly used ITFS channels for use by commercial over-the-air
pay-TV operations. This allocation allowed for simultaneous broadcast of many more
channels in addition to MDS. The practice of using these new channels became MMDS:
Multi-channel Multi-point Distribution Services.
Using a network consisting of a main transmitter and multiple repeaters,
microwave signals of the MMDS systems are transmitted to small receiving antennae on
subscribers' rooftops as shown in Figure 1.1. The transmitter site is centered in the
coverage area on a pre-existing tower or on the top of a tall building. The transmitter
antenna is located at the highest feasible elevation for greater effective coverage. A
typical MMDS installation requires that line-of-sight be maintained between the
transmitter antenna and receiver antenna. The RF signal is then routed into the home
from the antenna through a coaxial cable into a receiver set-top.
Figure 1.1 A typical MMDS installation [1]
The cable studio, along with the headend, receives programming from a variety of
sources. These sources include the broadcasts of local TV stations, playback of video
tapes, direct "live" feeds from various locations, multiple satellite dishes receiving TV
signals from around the world and digital data from other sources. Each source is
assigned a channel number, processed to improve quality and then encoded. These
programs are up-converted from regular VHF channel frequencies to the RF band and
supplied to the wireless channel transmitter. The transmitter amplifies and broadcasts the
information on microwave channels.
MMDS utilizes either analog or digital technology to deliver information to
subscribers. Digital signals yield laser disc picture quality as well as CD quality sound for
cable television. Digital technology increases the number of channels that can be
transmitted over the existing wireless spectrum to at least six times the current analog
capacity by using data compression techniques. MMDS can also offer internet access and
data services to the customers. Depending on the terrain, a single MMDS transmitter can
provide services to the surrounding subscribers within 50 kilometers.
Wireless MMDS has many advantages over wired cable systems: no expensive,
time-consuming cable installation, higher signal reliability, higher channel capacity,
higher availability, lower costs of operation, lower capital requirements, and market
competitiveness. Given that signals are transmitted in the Gigahertz range (S band),
picture reception is not materially affected by weather interference such as rain fade [2].
One distinct advantage of MMDS is speed. MMDS is capable of handling the
accelerating demand for high-speed data and helps alleviate the bandwidth-starved
last mile bottleneck. However, to remain competitive with Direct Broadcast Satellite (DBS),
MMDS has to provide a cheaper set-top box at the subscriber site and higher speed for
data services.
A frequency band has been allocated for MMDS. In March 1997 the FCC in the
United States licensed thirty channels, each with a 6 MHz bandwidth, in the frequency
range of 2150-2162 MHz and 2500-2689.875 MHz. This licensing allowed for more
extensive research, development and marketing for the services provided by MMDS.
However, the early research was mostly in the area of fast conversion from analog to
digital MMDS to accommodate more channels for the customers connected to the
MMDS networks. The data rate is still in the lower range (less than 60 Mbs for downlink
and 20 Mbs for uplink).
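As a quick arithmetic check on the allocation, the number of whole 6 MHz channels that fit in the two quoted frequency ranges can be counted as below. This is only a minimal sketch: the function name and the assumption that channels are packed contiguously from the lower edge of each range are illustrative and are not taken from the FCC allocation itself.

```python
import math

CHANNEL_WIDTH_MHZ = 6.0  # channel bandwidth stated in the text

# Frequency ranges quoted in the text, as (start, stop) in MHz.
BANDS_MHZ = [(2150.0, 2162.0), (2500.0, 2689.875)]


def whole_channels(start_mhz, stop_mhz, width_mhz=CHANNEL_WIDTH_MHZ):
    """Count whole channels of width_mhz that fit between start and stop.

    Assumes contiguous packing from the lower band edge (an illustrative
    simplification, not the actual channel plan).
    """
    return math.floor((stop_mhz - start_mhz) / width_mhz)


per_band = [whole_channels(lo, hi) for lo, hi in BANDS_MHZ]
total = sum(per_band)
print(per_band, total)  # prints [2, 31] 33
```

Under these assumptions the two ranges hold 2 + 31 = 33 channels, which is the total of 33 channels, each 6 MHz wide, quoted in the paragraph that follows.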
Using the designated frequency, a total of 33 channels are available, each 6 MHz
wide. By utilizing video compression and spectrally efficient digital modulation, the
number of programs delivered in this bandwidth can be dramatically increased [3]. Audio
and video information can be digitized and compressed and many programs can be
multiplexed into a single MPEG2 transport stream. Forward Error Correction (FEC) is
used to decrease the system bit error rate (BER) and obtain higher data reliability. The
output data stream is modulated onto a 44 MHz IF carrier by a 64 or 256 Quadrature
Amplitude Modulator (QAM). The use of a higher degree of QAM raises the data rate by
increasing the number of bits per symbol. The IF signal is required to be upconverted into
the GHz range before transmission. Mixers and amplifiers with a flat response are
required in the RF stage, as well as a low phase-noise local oscillator. The best way to
achieve a low phase-noise performance is to multiply up a voltage control oscillator
(VXCO) phase locked to a GPS clock reference [3]. At the transmitter stage, a GaAs FET
power amplifier should be used since these devices have lower distortion at these
frequencies [3].
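The effect of the QAM order on the data rate can be made concrete with a short calculation. The sketch below shows that an M-point constellation carries log2(M) bits per symbol, so the raw bit rate scales with that factor at a fixed symbol rate. The symbol rate used here is a hypothetical value chosen only for illustration; it is not a figure from this thesis, and FEC and framing overhead are ignored.

```python
import math


def bits_per_symbol(constellation_size):
    """Bits carried by one symbol of an M-point constellation (M a power of two)."""
    bits = int(math.log2(constellation_size))
    if 2 ** bits != constellation_size:
        raise ValueError("constellation size must be a power of two")
    return bits


def raw_bit_rate(symbol_rate_hz, constellation_size):
    """Uncoded bit rate in bit/s, before FEC and framing overhead."""
    return symbol_rate_hz * bits_per_symbol(constellation_size)


SYMBOL_RATE_HZ = 5.0e6  # hypothetical symbol rate, for illustration only

for m in (64, 256):
    rate = raw_bit_rate(SYMBOL_RATE_HZ, m)
    print(f"{m}-QAM: {bits_per_symbol(m)} bits/symbol, {rate / 1e6:.0f} Mbit/s raw")
```

Moving from 64 QAM to 256 QAM raises the bits per symbol from 6 to 8, a one-third increase in raw rate for the same symbol rate, at the price of a denser constellation that is more sensitive to noise.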
The proposed MMDS system is shown in Figure 1.2. A base station broadcasts
signals through the air. Receivers at the terminals pick up the signals and process them to
recover digital data in different designated channels. The system RF link is in the S band
(GHz range). The microwave frequency is 2.5 GHz, which is allocated for MMDS use by
the FCC. The modulation technique used in the system is either 64 or 256 QAM. A
fundamental aspect of the system will be the use of software radio techniques with
wideband sampling and multi-channel digital synchronization.
[Figure 1.2: base station and terminals; link labeled 200 Mbits/s, 64/256 QAM, DAVIC 1.2 Specification]
Figure 1.2 High-speed MMDS system
The design is based on the Digital Audio-Visual Council (DAVIC) Version 1.2
(1997) specification [4]. The system is intended to carry Motion Picture Expert Group-
Transport Stream (MPEG2-TS) data packets. The designed hardware of the transceiver is
located in the base station and at the receiving terminals. The transceiver includes both a
transmitter and a receiver co-existing in the same ASIC. The transmitter at
the receiver site allows a two-way communications link to be established if required. Both
uplink and downlink require synchronization in order to ensure the integrity of the
transmitted data. Synchronization implies that all system elements must run at the same
clock rate. If the MMDS system is connected to global communication networks, its
clock must be synchronized to a standard time. The use of the standard time is necessary
because global timing must be maintained for the network to communicate within and to
other networks effectively.
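The requirement that all elements run at the same clock rate can be illustrated numerically. The sketch below estimates how quickly two nominally identical free-running clocks drift apart by one full clock period. The 10 MHz value echoes the reference clock mentioned in the abstract; the frequency offsets are hypothetical tolerances chosen only for illustration and are not measurements from this work.

```python
def cycle_slip_rate(nominal_hz, offset_ppm):
    """Clock cycles gained or lost per second for a fractional offset in ppm."""
    return nominal_hz * offset_ppm * 1e-6


def seconds_per_slip(nominal_hz, offset_ppm):
    """Time for two clocks differing by offset_ppm to drift apart by one cycle."""
    return 1.0 / cycle_slip_rate(nominal_hz, offset_ppm)


NOMINAL_HZ = 10.0e6  # 10 MHz reference clock, as in the abstract

# Hypothetical offsets: a loose crystal tolerance and a much tighter reference.
for offset_ppm in (50.0, 0.001):
    rate = cycle_slip_rate(NOMINAL_HZ, offset_ppm)
    period = seconds_per_slip(NOMINAL_HZ, offset_ppm)
    print(f"{offset_ppm} ppm: {rate:.3f} cycles/s, one cycle every {period:.3f} s")
```

With a 50 ppm offset the two clocks slip by roughly 500 cycles every second, that is, one cycle about every 2 ms, so a receiver cannot simply count on a local oscillator and must recover or share a common reference.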
1.2 Research objectives and methodology
The main objective of this research is to develop a low cost, reliable, high-speed
MMDS system. The low cost solution will provide subscribed customers with high-speed
data services without installing fiber or cables. This would be especially attractive in
sparsely populated areas. The means to achieve the high-speed MMDS system is based on
the investigation of a sophisticated VLSI transceiver and system synchronization. The
system will provide customers with small set-top boxes with a highly integrated receiver.
This research fulfills the need for the implementation of a high-speed MMDS system and
has two objectives:
1) To design a high-speed transceiver for the wireless MMDS data link
with an intended bit rate of 200 Mbs. A high level of integration is required to reduce the
cost of the terminals. Of primary concern is the potential for a low cost VLSI
implementation of the baseband portion of the transceiver into a small number of ASICs.
2) To design a framework for the use of GPS clock and timing to perform
digital network synchronization for the communications link in the MMDS system.
Presently, the speed of MMDS is 40 Mbs due to technology and design
limitations. Higher speed requires a faster operating frequency of the transceiver building
blocks. Among the blocks, FEC is the major speed limitation due to the low bit rate of
error correction decoders. Therefore, increasing the throughput of the FEC block and
successfully integrating it into the system allows for a higher system data rate.
One of the major topics of concentration in this research is the development of a
fast FEC block and the integration of this block into the high-speed transceiver. This
development requires a theoretical and architectural improvement of the abstract algebra
and its application into the design of error correction integrated circuits. New clocking
schemes, low latency circuits and better algorithms to perform calculations in Galois
fields must be developed and implemented.
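To give a feel for the Galois field calculations involved, the following software sketch implements multiplication and inversion in GF(2^8) using the field polynomial p(x) = x^8 + x^4 + x^3 + x^2 + 1 listed for Table B.1. It is a textbook shift-and-add multiplier with inversion by exponentiation, written here only for illustration; it is not the low latency circuit architecture developed in this thesis, and the function names are assumptions.

```python
FIELD_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Multiply two GF(2^8) elements: shift-and-add, reduced modulo FIELD_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is bitwise XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0x100:
            a ^= FIELD_POLY      # reduce when the degree reaches 8
    return result


def gf_pow(a, exponent):
    """Raise a to a non-negative integer power by square-and-multiply."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        exponent >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse: a^(2^8 - 2) = a^254 for any nonzero a."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^8)")
    return gf_pow(a, 254)


def gf_div(a, b):
    """Divide a by b in GF(2^8)."""
    return gf_mul(a, gf_inv(b))


print(hex(gf_mul(0x02, 0x80)))  # prints 0x1d
```

Inversion by exponentiation needs a chain of squarings and multiplications, which is why inversion and division tend to dominate the delay of an RS decoder's error value stage and are a natural target for low latency circuit design.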
Another major topic of concentration in this research is the system synchronization.
A high-speed system places more stringent demands on the system's synchronization to
provide data integrity. While ensuring the services, a low cost system must be
maintained. Therefore, an inexpensive and reliable synchronization scheme must be
sought and applied to the MMDS system. In addition, the system must have the ability to
connect synchronously to the global communications network.
To develop the low-cost, high-speed, robust, reliable MMDS system, three main
areas have been studied:
1) Modeling and simulation of the MMDS system,
2) Implementation of the transceiver using synthesizable Hardware Description
Language (HDL) code for various building blocks into FPGA,
3) Developing a synchronization scheme using Synchronous Optical Network
(SONET), Synchronous Digital Hierarchy (SDH) or a GPS clock.
System simulation was carried out to determine functionality of the building
blocks and the complete system. Basic building blocks of the baseband circuitry were
developed using FPGA prototypes. HDL Verilog code of each block was written,
compiled, simulated and fitted into the designated FPGA. In the future, major
components of the system will be integrated into a high density mixed signal ASIC (i.e.,
system on chip), using the hardware code on the best available technology.
Synchronization was studied to find a suitable synchronization scheme, in particular, the
use of the GPS clock at the base station and terminals in the absence of SONET and
SDH. Research methodologies and procedures are described as follows:
1.2.1 System modeling and simulation
The simulation tool available is Simulink®, which runs with MATLAB®. This is a
tool for modeling, simulating and analyzing dynamic systems. System blocks can be built
in a hardware-like fashion using gates and flip-flops or using math functions. These math
functions can be written using available MATLAB or C programming languages.
Hardware-like building blocks simulate much slower in Simulink® than equivalent
math functions. However, this simulation method is preferred because it is comparable to
hardware implementation since the simulation uses basic logic gates. A computer with a
large amount of memory (256 MB) is required to simulate these hardware compatible
blocks. The complete system may be too large to put into a single file for simulation if
the computer has limited memory. Therefore, a combination of hardware blocks and
math functions was used to build and to run simulations at the system level.
The system was built using bottom-up approaches (i.e., hierarchical and reusable
models). Each unit of the system was built and simulated separately to determine its
functionality. All the units were then connected into a complete system and simulated
with both digital and analog signal processing. Different levels of white Gaussian or
Rayleigh noise were added into the channel to determine SNR, BER, etc. Using
MATLAB scopes and display blocks, results could be viewed even while the simulation
was running. Parameters could be changed and the results viewed immediately
to make 'what-if' exploration easy.
1.2.2 Hardware description language for basic building blocks
Baseband circuitry was developed using FPGA prototypes. Based on the
functionality of each circuit in the system, Verilog HDL code was written to implement
the block with appropriate input, output and clocking requirements. Verilog HDL is one
of the two most common HDLs used by integrated circuit designers; the other is VHDL.
HDL allows the design to be simulated earlier in the design cycle in order to correct
errors or experiment with different architectures. Synthesizable Verilog code is used as
the input of the synthesis program that will generate a gate-level description (i.e., a
netlist) of the circuit for simulation or fabrication purposes. The designs described in
HDL are technology independent, easy to design and debug, and are usually more
readable than schematics, particularly for large circuits. The code can be written in either
structural or behavioral description. Verilog has many language constructs that can be used
to describe the design at four levels of abstraction: algorithm level, register transfer level,
gate level, and switch level. Verilog constructs that are not synthesizable were not used in
the design. Hierarchical and reusable designs were emphasized when the modules were
built.
To develop the FPGA prototype, Altera's MaxPlus II® software was used.
The written Verilog HDL code of each block was imported, compiled, simulated, fitted and
programmed into designated FPGAs. Using the simulation and waveform editor, these
blocks were simulated to determine both functional and timing operations. Symbol and
bit rates were determined in the prototype using timing simulation and register
performance analysis. Once compiled, block symbols were created. A graphic editor was
used to connect these block symbols into a multi-block system before recompiling,
partitioning, fitting, simulating, and programming into the FPGA devices.
Since the HDLs are synthesizable and technology independent, an ASIC can be
developed without rewriting the code. SYNOPSYS® is the synthesis software that is used
to synthesize the code to generate a netlist. CADENCE® is an integrated circuit design
and layout package. CADENCE imports the netlist generated from SYNOPSYS and
performs the necessary steps to lay out the circuit for fabrication. Major components of
the MMDS system will be integrated into high density, mixed signal integrated circuits.
Available technologies will allow cost effective devices with higher speed, higher circuit
density, smaller size and lower power consumption. Depending on the technology used to
fabricate the IC in the future, the system operating frequency will be much higher than in the
current FPGA prototypes. All of the written Verilog code in the thesis is contained in [5].
1.2.3 Synchronization
In addition to hardware implementation, synchronization was emphasized in this
research to provide a robust system with data integrity. Synchronization involves the
estimation of both time and frequency. In an MMDS system, a common clock is used in
both transmitter and receiver. All the building blocks operate at the same frequency.
Time synchronization is emphasized in the system for data recovery.
There are three levels of time-synchronization: symbol synchronization, frame
synchronization and network synchronization. The fundamental time-synchronization
process is symbol synchronization. The time to start and end the symbol detection
procedure must be recognized by the demodulator. A timing error degrades the detection
performance of the receiver. The next time-synchronization level, frame
synchronization, allows the reconstruction of the received message. Finally, network
synchronization allows coordination with other users in order to use the
communications resource efficiently.
In network synchronization, every clock can be traced back to a highly stable
reference supply. All the major telecommunications networks have set up national
synchronization networks in order to distribute a common timing reference to all of the
equipment. This timing reference is traceable to a national Primary Reference Clock or
PRC. The PRC is a Time-Server clock that maintains Coordinated Universal Time (UTC).
Since MMDS is connected to larger networks for data transmission, a clock traceable to
UTC is required for network synchronization. The GPS clock was chosen to clock the
MMDS system instead of crystal oscillators due to its performance and UTC
traceability.
1.3 Contributions
Some of the contributions provided by this work greatly improve the efficiency of
any wireless communication system in both design and implementation.
1. The thesis provides system simulation source code for the MMDS system using
Matlab and Simulink. The simulation code can be used to simulate any wireless
communications system with a few minor modifications to individual blocks
because the MMDS system is a standard communication system.
2. This thesis provides technology independent synthesizable HDL source code
for communication industries to fabricate a high-speed baseband MMDS transceiver.
Technology transfer is available through TRLabs with all the source code [5].
3. The thesis proposes new algorithms for fast calculation of multiplication and
inversion in Galois fields. High-speed Galois Field arithmetic HDL source code is also
provided for different degrees of Galois fields [5].
4. The thesis shows a novel implementation of a convolutional interleaver and
deinterleaver accomplished by using a new clocking scheme [5].
5. The thesis also provides synthesizable HDL code for a high-speed, low
complexity (255,239) RS decoder core [5].
6. The thesis proposes a novel application of the Global Positioning System: a
cost effective synchronization scheme for the MMDS system. The system uses a single
precision GPS clock at the transmitter and transfers this standard time traceable clock to
all of the receivers.
1.4 Dissertation outline
This thesis outlines the proposed high-speed MMDS system development in two
sections. The first section is on the hardware implementation of the transceiver. The RF
circuit, equalizer and matched filter are excluded from this thesis. The excluded circuits are
being studied at Carleton University in Ottawa, Canada. The second section is on the
integration of a GPS clock into the MMDS system for synchronization purposes.
The implementation of a high speed MMDS system that meets standard
specifications is dealt with in the first section. The intended speed of 200 Mb/s is much higher
than the DAVIC specified speed of 60 Mb/s [4]. Specifications of the building blocks are
used as they are or modified to suit high-speed operation. Although DAVIC
specifications were used, all of the synthesizable Verilog HDL code was manually generated
without using any outside source code.
In this research, an extensive effort was dedicated to the development of the high
performance FEC for high-speed operation. The design of the convolutional interleaver
and de-interleaver lowers the FEC hardware complexity. The work on the development
of the low latency VLSI architectures in $GF(2^m)$ arithmetic makes it possible to design the
high-speed RS decoder used in the system. Three chapters are devoted to the first section
on implementation. Chapter 2 describes the details of the MMDS system and the DAVIC
standards used to build the high-speed transceiver. Hardware implementation of the
transmitter and the receiver is described in Chapter 3 and Chapter 4 respectively.
The second section deals with system synchronization. The DAVIC specification
does not specify the use of a GPS clock for system synchronization. The proposed GPS
synchronization architecture was developed as part of the work carried out during
the research and it is TRLabs property. The proposed synchronization scheme takes
advantage of new developments in a wide variety of GPS applications. A precision
GPS clock, which is traceable to UTC, is used for system synchronization instead of
crystal oscillators. Chapter 5 addresses synchronization issues and the integration of a
GPS clock into the high-speed MMDS system.
Chapter 6 presents the results achieved in the system simulation, the hardware
implementation and the GPS clock synchronization. Finally, Chapter 7 draws some
conclusions reached in the research and presents the areas where future research may be
conducted.
CHAPTER 2
DAVIC VERSION 1.2 SPECIFICATIONS AND MMDS SYSTEM
The high speed MMDS system was developed based on the standards and
specifications set by the Digital Audio-Visual Council (DAVIC) Version 1.2. In this chapter,
the specifications from this standards body are introduced and the building blocks of the
baseband transceiver are described.
2.1 Digital Audio-Visual Council Version 1.2 specifications
DAVIC is a non-profit association based in Switzerland, with a membership of
over 219 companies from more than 20 countries. It represents all sectors of the
audio-visual industry: manufacturing (computer, consumer electronics and telecommunications
equipment) and service (broadcasting, telecommunications and CATV) as well as a
number of government agencies and research organizations. DAVIC was established in
1994 "with the aim of promoting the success of interactive digital audio-visual
applications and services by promulgating specifications of open interfaces and protocols
that maximize interoperability, not only across geographical boundaries but also across
diverse applications, services and industries" [4].
The purpose of DAVIC is to advance the success of emerging digital audio-visual
applications and services, initially of the broadcast and interactive type. This is done by
providing internationally-agreed specifications of open interfaces and protocols that
maximize interoperability across countries and applications for services.
The goals of DAVIC are to identify, select, augment, develop and obtain the
endorsement by formal standard bodies for specifications of interfaces, protocols and
architectures of digital audio-visual applications and services. The DAVIC 1.2
specification has been developed by participating DAVIC members. Submissions for this
specification came from both members and non-members in response to "Calls For
Proposals" which were issued in March 1996. The DAVIC 1.2 specification is a super-set
of DAVIC 1.1 and was released in 1997. This specification includes 13 parts. Parts 1 and
2 describe the DAVIC functionalities and system reference models as well as scenarios in
which it is to operate. Parts 3 and 4 provide specifications for the service provider system
architecture and delivery system architecture interfaces. The service consumer system
architecture and high level application program interface are described in Part 5. Part 6 is
reserved for future use. Part 7 describes high and mid-layer protocols. Part 8 provides the
lower-layer protocols and physical interface specifications. The remainder of the
specification describes the information representation, the basic security, the usage
information protocols, the reference points, the interfaces and dynamics, and the
conformance and interoperability.
The high speed MMDS transceiver was built based on the Part 8 specifications.
One section in this part, the physical layer interface, describes the complete physical
layer structure. This structure includes framing, channel coding and modulation for the
carriage of content-information flow from a source to a destination through MPEG2-TS.
This physical layer interface supports unidirectional transmission over radio frequencies
up to 10 GHz. This is referred to as QAM-link on MMDS. QAM is specified due to its
performance in spectral efficiency. Three levels of modulation, 16 QAM, 64 QAM and
256 QAM, are defined to allow for flexible implementation of the MMDS system.
2.2 MMDS system
The system block diagram of the baseband transceiver is depicted in Figure 2.1.
To ensure a Bit-Error-Rate (BER) of less than $10^{-12}$ at the receiver end, FEC is employed
using a Reed-Solomon (RS) codec and Trellis Code Modulation (TCM). Protection
against burst errors is achieved through the use of byte interleaving. A differential
encoder/decoder on the two most significant bits of each symbol is used to provide
rotation invariance on QAM constellations. Randomization is also used for spectrum
shaping and synchronization purposes. End-to-end network synchronization is performed
using a common clock and a frame synchronization technique.
Brief function descriptions of the various blocks in the system are as follows:
Baseband interfacing and synchronization: This block adapts the data structure to
the format of the signal source. The framing structure will be in accordance with
MPEG2-TS (including synchronization bytes).
Sync 1 inversion and randomization: This unit inverts the Sync 1 byte
according to the MPEG2 framing structure, and randomizes the data stream for spectrum
shaping purposes.
Reed-Solomon (RS) coder: This block applies a shortened RS code to each
randomized transport packet to generate an error-protected packet. This code is also
applied to the synchronization byte itself.
Convolutional interleaver: This block performs a convolutional interleaving of
the error-protected packets with I=12, M=17 (for 16 and 64 QAM) and I=204, M=1 (for
256 QAM). The periodicity of the synchronization bytes remains unchanged.
Byte to m-tuple conversion: This block performs a conversion of the bytes
generated by the interleaver into QAM symbols.
Differential encoding: In order to get a rotation-invariant constellation, this block
applies a differential encoding of the two MSBs of each symbol.
Trellis Code Modulation (TCM) encoder/decoder: When used, the TCM replaces
the 'Byte to m-tuple conversion' and 'Differential encoding' blocks. The TCM's purpose
is to convolutionally encode the bits into the modulation and perform the differential
encoding. When it is not used, it will be bypassed.
Baseband shaping: This block performs mapping from differentially encoded
m-tuples to I and Q signals and a square-root raised cosine filtering of the I and Q signals
prior to QAM modulation.
QAM modulation and physical interface: This block performs QAM modulation.
It is followed by interfacing the QAM modulated signal to the Radio Frequency (RF)
MMDS channel.
MMDS receiver: A system receiver performs the inverse signal processing, as
described for the modulation process above, in order to recover the baseband signal.
The complete block descriptions are as follows:
2.2.1 Baseband interface, Synchronization and Randomization
This unit adapts the data structure to the format of the signal source. Data coming
in to the transmitter will be formed into frames. The framing structure is in accordance
with the Moving Picture Experts Group standard (i.e., MPEG2).
MPEG2 is a video compression standard developed in the mid 90s for digital
television. MPEG2 is based on the discrete cosine transform and is an evolutionary
extension of earlier video compression standards (H.261, JPEG, and MPEG1). Audio and
video information can be digitized and compressed, and many programs can be multiplexed
into a single MPEG2-TS.
The MPEG2-TS is defined by the International Organization for Standardization
(ISO) [7]. It is comprised of packets of 188 bytes, with one byte for synchronization
purposes, three bytes of header containing service identification, scrambling and control
information, followed by 184 bytes of MPEG2 or auxiliary data. The system framing
structure is shown in Figure 2.2.
The total length of the MPEG2-TS packet is 188 bytes as shown in Figure
2.2a. This includes 1 synchronization byte (i.e., 0x47). The processing order at the
transmitting side always starts from the MSB of the synchronization byte (i.e., the MSB
of 01000111). In order to ensure adequate binary transitions for clock recovery, the data
at the output of the MPEG2-TS multiplexer is randomized in accordance with the
configuration depicted in Figure 2.2c. Randomization is a process of removing
auto-correlation from a signal, i.e., white noise spectrum shaping at the transmitter side to ease
symbol or bit timing recovery at the receiver side. The randomization process uses a
generator polynomial to generate a random sequence and then applies the sequence to a
data stream.
Figure 2.2 Framing structure of MPEG2-TS: a) MPEG2 transport stream MUX packet (sync byte 01000111 followed by 187 bytes); b) randomized TS packets (sync bytes and randomized sequence R, PRBS period = 1503 bytes); c) PRBS generator with initialization sequence, enable, data input and clear/randomized data output
The polynomial used for the Pseudo Random Binary Sequence (PRBS) generator
in this process is $1 + x^{14} + x^{15}$. Loading the sequence '100101010000000' into the PRBS
registers, as indicated in Figure 2.2c, is initiated at the start of every 8 transport packets. To
provide an initialization signal for the derandomizer at the receiver, the MPEG2
synchronization byte of the first transport packet in a group of eight packets is bit-wise
inverted from 0x47 to 0xB8 (i.e., 10111000).
The first bit at the output of the PRBS generator will be applied to the first bit of
the first byte following the inverted MPEG2 synchronization byte (i.e., 0xB8). To aid
other synchronization functions, during the MPEG2 synchronization bytes of the
subsequent 7 transport packets, the PRBS generation continues, but its output is disabled,
leaving these bytes unrandomized. The period of the PRBS sequence will be 1503 bytes
as shown in Figure 2.2b. The randomization process is also active when the modulator
input bit-stream is non-existent, or when it is non-compliant with the MPEG2-TS format
(i.e., 1 synchronization byte + 187 packet bytes). This is to avoid the emission of an
unmodulated carrier from the modulator [4]. After frame randomization, the packets are
then sent to the FEC block that includes an RS encoder and a convolutional interleaver.
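The randomization rules above can be sketched in a few lines of Python. This is a minimal illustration and not the thesis's Verilog: the function names are hypothetical, and the register ordering (stage 1 first, taps at stages 14 and 15, MSB-first byte packing) is an assumption taken from the usual reading of the $1 + x^{14} + x^{15}$ generator.

```python
# Sketch of the randomizer described above (all names are illustrative assumptions).
SYNC, INV_SYNC = 0x47, 0xB8
PACKET_LEN = 188                 # bytes per MPEG2-TS packet
GROUP_LEN = 8 * PACKET_LEN       # the PRBS is re-initialised every 8 packets
INIT_BITS = "100101010000000"    # initialization sequence, stage 1 first (assumed order)


def prbs_bytes(count):
    """Return `count` bytes of the 1 + x^14 + x^15 PRBS, packed MSB first."""
    reg = [int(b) for b in INIT_BITS]    # reg[0] = stage 1 ... reg[14] = stage 15
    out = []
    for _ in range(count):
        byte = 0
        for _ in range(8):
            bit = reg[13] ^ reg[14]      # taps at stages 14 and 15
            byte = (byte << 1) | bit
            reg = [bit] + reg[:-1]       # feed the output bit back into stage 1
        out.append(byte)
    return out


def randomize_group(group):
    """Randomize (or de-randomize) one 1504-byte group of eight TS packets."""
    if len(group) != GROUP_LEN:
        raise ValueError("expected eight 188-byte packets")
    data = bytearray(group)
    prbs = prbs_bytes(GROUP_LEN - 1)     # 1503 bytes, the period quoted in the text
    data[0] ^= 0xFF                      # 0x47 <-> 0xB8 on the first sync byte
    for i in range(1, GROUP_LEN):
        if i % PACKET_LEN:               # sync bytes of packets 2..8 stay untouched,
            data[i] ^= prbs[i - 1]       # but the PRBS keeps running underneath them
    return bytes(data)
```

Because the operation is a bit-wise inversion plus an XOR with a fixed sequence, applying `randomize_group` a second time restores the original group, which is how a derandomizer at the receiver could reuse the same logic.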
2.2.2 Reed-Solomon codec
To achieve the appropriate level of error protection required for MMDS
transmission of digital data (i.e., BER = $10^{-12}$), a FEC based on Reed-Solomon (RS) [7]
coding is used. FEC means that a digital system can detect and reconstruct an erroneous
transmitted message at the receiver, without requesting a retransmission. In this type of
Error Detection and Correction (EDAC) strategy, the FEC system accomplishes this by
analyzing the redundant data transmitted along with the message. One of the means to
obtain the required system BER is the use of RS coding. The RS error
correction codes have an extremely pronounced effect on the efficiency of digital
communication channels [8]. For example, the (255,239) RS code achieves a BER of
$10^{-12}$ from a considerably higher uncoded BER [9,10]. In the development of the high-speed MMDS
system, the RS codes are chosen over the other codes because the basic unit of
information of these codes is symbol based (i.e., the codes are based on byte wide
symbols). The MMDS system uses 8-bit symbols in the MPEG2-TS which fit into the
category of RS codes. An encoder and decoder are used to realize RS codes in the
system.
RS codes are well suited for correcting both random and burst errors
caused by an over-the-air transmission channel such as the MMDS system [11,12]. The codec
(encoder/decoder) can be implemented using either software or hardware. So far,
software implementations have not been able to raise the operating frequency of the codec
to the level needed for a higher system data rate. Hardware implementation, such as in
an ASIC, has the highest throughput. The theoretical architecture and implementation of
the RS codec have been the subject of a number of research papers [12-14] and are also one of the
subjects of this research. Today, RS codes remain among the most efficient codes that
can be implemented using state-of-the-art software and hardware technology.
Each MPEG2 transport data packet consists of 188 bytes (i.e., 8-bit symbols); the closest
code that can be used is the (255,239) RS code. This specific code has the ability to
correct up to any 8 symbol errors in a codeword of 255 symbols. Each 8-bit symbol is an
element of the 256 elements in the Galois Field $GF(2^8)$. A GF or finite field is a finite
set of elements in which one can do arithmetic functions without leaving the set. The
field is generated by a primitive field polynomial [11]. Appendix B provides a complete
description of the Galois Field.
The primitive field polynomial $p(x) = x^8 + x^4 + x^3 + x^2 + 1$ is used to generate the
finite field for the (255,239) RS codes. The RS codec also uses a code generator
polynomial to generate codes in the encoder and to find and correct errors in the decoder.
The (255,239) RS codes use the following polynomial to generate the codes in $GF(2^8)$:
$$g(x) = (x+\alpha)(x+\alpha^{2})(x+\alpha^{3})\cdots(x+\alpha^{16}),$$
or in terms of polynomial coefficients:
$$g(x) = g_0 + g_1 x + g_2 x^2 + g_3 x^3 + \cdots + g_{15} x^{15} + x^{16},$$
and in terms of the primitive element $\alpha$ of the finite field $GF(2^8)$:
$$g(x) = \alpha^{136} + \alpha^{240}x + \alpha^{208}x^{2} + \alpha^{195}x^{3} + \alpha^{181}x^{4} + \alpha^{158}x^{5} + \alpha^{201}x^{6} + \alpha^{100}x^{7} + \alpha^{11}x^{8} + \alpha^{83}x^{9} + \alpha^{167}x^{10} + \alpha^{107}x^{11} + \alpha^{113}x^{12} + \alpha^{110}x^{13} + \alpha^{106}x^{14} + \alpha^{121}x^{15} + x^{16}.$$
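As a cross-check of the coefficient list, the field and the generator polynomial can be rebuilt numerically. The sketch below is illustrative only (table-based software arithmetic, not the low-latency hardware architectures developed in the thesis). It assumes that the roots of g(x) are the sixteen consecutive powers of the primitive element starting at the first power, which is the choice that reproduces the listed exponents; all function names are hypothetical.

```python
PRIM_POLY = 0x11D   # p(x) = x^8 + x^4 + x^3 + x^2 + 1


def build_tables():
    """Antilog (EXP) and log (LOG) tables for GF(2^8) generated by PRIM_POLY."""
    exp, log = [0] * 510, [0] * 256
    x = 1
    for i in range(255):
        exp[i], log[x] = x, i
        x <<= 1                  # multiply by alpha
        if x & 0x100:
            x ^= PRIM_POLY       # reduce modulo p(x)
    for i in range(255, 510):
        exp[i] = exp[i - 255]    # duplicate so exponent sums need no modulo
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements (addition in GF(2^8) is plain XOR)."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a non-zero field element."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    return EXP[255 - LOG[a]]


def generator_poly(first_root=1, n_roots=16):
    """Coefficients of g(x), lowest degree first; the root range is an assumption."""
    g = [1]
    for i in range(first_root, first_root + n_roots):
        nxt = [0] * (len(g) + 1)
        for k, c in enumerate(g):          # multiply g(x) by (x + alpha^i)
            nxt[k] ^= gf_mul(c, EXP[i])
            nxt[k + 1] ^= c
        g = nxt
    return g


G = generator_poly()
G_LOGS = [LOG[c] for c in G]   # exponents of alpha for g_0 ... g_16
```

Under these assumptions `G_LOGS` evaluates to 136, 240, 208, 195, 181, 158, 201, 100, 11, 83, 167, 107, 113, 110, 106, 121, 0, i.e. the exponents of the expanded polynomial from the constant term up to the monic $x^{16}$ term.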
RS codes are linear block codes and belong to a group called systematic codes.
Such codes leave data unchanged and append parity symbols to the data stream. These
parity symbols are generated by encoding the data stream using a code generator.
The MPEG2-TS packet is 188 symbols in length and is encoded using the (255,239) RS
code. The code must be shortened to become a (204,188) RS code. The shortened RS
code is implemented by appending 51 symbols, all set to zero, before the 188 MPEG2
symbols at the input of the (255,239) encoder. After the coding procedure, these zero
symbols are discarded, leaving 204 symbols to be transmitted. At the decoder, the 51 zero
symbols are inserted again before the decoding process begins.
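The shortening procedure can be illustrated with a small systematic encoder. This is a software sketch under stated assumptions, not the thesis's encoder core: the generator roots are taken as the first sixteen powers of the primitive element, the parity is computed as the remainder of a polynomial division, and every name is hypothetical.

```python
PRIM_POLY = 0x11D                      # p(x) = x^8 + x^4 + x^3 + x^2 + 1
N_FULL, K_FULL = 255, 239              # mother code
K_SHORT = 188                          # MPEG2-TS packet length
PAD = K_FULL - K_SHORT                 # 51 zero symbols
N_PARITY = N_FULL - K_FULL             # 16 parity symbols

EXP, LOG = [0] * 510, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly():
    """g(x), highest degree first, with assumed roots alpha^1 .. alpha^16."""
    g = [1]
    for i in range(1, N_PARITY + 1):
        nxt = g + [0]                          # g(x) * x
        for k, c in enumerate(g):
            nxt[k + 1] ^= gf_mul(c, EXP[i])    # + g(x) * alpha^i
        g = nxt
    return g


G = generator_poly()


def rs_parity(message):
    """Remainder of message(x) * x^16 divided by g(x) (LFSR-style division)."""
    rem = [0] * N_PARITY
    for sym in message:
        feedback = sym ^ rem[0]
        rem = rem[1:] + [0]
        for j in range(N_PARITY):
            rem[j] ^= gf_mul(feedback, G[j + 1])
    return rem


def encode_204_188(packet):
    """Shortened systematic encoding: pad, encode with (255,239), drop the pad."""
    if len(packet) != K_SHORT:
        raise ValueError("expected a 188-byte packet")
    padded = [0] * PAD + list(packet)          # 239 symbols into the mother encoder
    codeword = padded + rs_parity(padded)      # 255 symbols
    return bytes(codeword[PAD:])               # 204 symbols are transmitted


def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest degree first) at x using Horner's rule."""
    acc = 0
    for c in coeffs:
        acc = gf_mul(acc, x) ^ c
    return acc
```

The leading zero symbols leave the division register at zero, so the parity computed over the padded 239 symbols equals the parity computed over the 188 data symbols alone. That is why the zeros never need to be transmitted and can simply be re-inserted at the decoder.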
To increase the efficacy of the RS code against burst errors of the transmitting
channel, the RS codec is followed by a convolutional interleaver.
2.2.3 Convolutional interleaver/de-interleaver
The errors caused by noise sometimes are not random bits but a long series of bits
that affects many symbols or a large group of short errors. This results in more error
symbols than can be corrected in a single block (i.e., beyond the correcting capacity of
the codes). One way to overcome this burst error problem is to add interleaving.
Interleaving data in the system enhances the random-error correcting capabilities of a
code to the point that it can also become useful in a burst-noise environment. The overall
effect of interleaving is to spread out the long burst errors so that they appear to the
decoder as independent random errors or shorter, more manageable burst errors.
Interleaving in error correction coding has the benefit of increasing system
robustness by making the system more immune to bursty errors, typical in over-the-air
transmission. This interleaving function is essential for transport channels that require a
low BER. It improves the efficiency of the RS encoder/decoder by spreading burst errors
across several codewords [15]. In the MMDS system, a block of 204 symbols is either entirely
corrected by the RS decoder (i.e., less than or equal to 8 symbol errors) or not at all (i.e.,
more than 8 symbol errors). By adding interleaving, burst errors of more than 8
symbols will be spread out over many codewords.
The basic operation of an interleaving subsystem is to rearrange the encoded data
over a span of several block lengths. The amount of error protection, based on the length
of burst encountered on the channel, determines the span length of the interleaver. The
de-interleaver must be given the details of the data arrangement so that the data stream can
be de-interleaved before it is decoded.
The interleaver is composed of I branches, cyclically connected to the input byte-stream
by an input switch on the left of Figure 2.3. The number of branches, I, depends
on the order of QAM modulation used, I=12 (for 16 and 64 QAM) and I=204 (for 256
QAM). Each branch j will be a First In First Out (FIFO) shift register with a depth of j×M
cells (branch 0 is a direct connection), where M=N/I and N=204 is the error protected frame
length (i.e., 188 bytes MPEG2 and 16 parity bytes from the RS encoder). The cells of the FIFO
will contain 1 byte, and the input and output switches will be synchronized.
Figure 2.3 Conceptual diagram of the convolutional interleaver/de-interleaver (sync word routed through branch 0; switches advance one byte per position; I=12 for 16 and 64 QAM, I=204 for 256 QAM; each M block is an M-stage FIFO shift register)
The interleaved frame will be composed of overlapping error protected packets
(from the RS encoder) and will be delimited by MPEG2 synchronization bytes
(preserving the periodicity of 204 bytes). For synchronization purposes, the
synchronization bytes (0x47) and the inverted synchronization bytes (0xB8) are always
routed into branch "0" of the interleaver (corresponding to a null delay, see Figure 2.3).
The de-interleaver is similar in principle to the interleaver, but the branch indexes
are reversed (i.e., branch "0" corresponds to the largest delay). The de-interleaver
synchronization is achieved by routing the first recognized synchronization byte into
branch "0". Figure 2.3 depicts the convolutional interleaver and de-interleaver to be used
in the MMDS system depending on the QAM levels (i.e., the number of branches I and
the cell depth M). The total delay for the operation is equal to the frame
length N, which is 204 bytes in length.
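The branch structure can be modelled in software with one FIFO per branch. The sketch below assumes the textbook arrangement (branch j of the interleaver delays by j×M cells and the de-interleaver mirrors it); it does not reproduce the clocking scheme of the thesis's hardware, and the class and function names are hypothetical.

```python
from collections import deque


class BranchCommutator:
    """Cyclic switch feeding one FIFO per branch; a depth of 0 is a direct wire."""

    def __init__(self, depths):
        self.fifos = [deque([0] * d) for d in depths]
        self.branch = 0

    def push(self, byte):
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)   # advance both switches
        if not fifo:                 # zero-depth branch passes the byte straight through
            return byte
        fifo.append(byte)
        return fifo.popleft()        # byte that entered this branch `depth` visits ago


def make_interleaver(branches=12, m=17):
    # Assumed depths: branch j holds j * M cells, so branch 0 has no delay.
    return BranchCommutator([j * m for j in range(branches)])


def make_deinterleaver(branches=12, m=17):
    # Branch indexes reversed: branch 0 carries the largest delay.
    return BranchCommutator([(branches - 1 - j) * m for j in range(branches)])
```

In this FIFO model every byte sees the same combined delay of (I-1)×M visits to its branch, i.e. I×(I-1)×M byte periods end to end, after which the original order is restored. The latency of an actual hardware implementation with its own clocking scheme may be accounted for differently.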
2.2.4 Byte to symbol mapping
After convolutional interleaving, an exact mapping of bytes into symbols is
performed. The mapping relies on the use of byte boundaries in the modulation system.
This mapping in the MMDS system is necessary when 64 QAM is used. The mapping
converts 8-bit bytes into 6-bit symbols before QAM modulation. In this case, the MSB
of the symbol Z is taken from the MSB of byte V as shown in Figure 2.4 for 64 QAM.
Notes: b0 is understood as being the Least Significant Bit (LSB) of each byte or m-tuple. In this conversion, each byte results in more than one m-tuple, labeled Z, Z+1, etc., with Z being transmitted before Z+1.
Figure 2.4 Byte to m-tuple conversion for 64 QAM
Correspondingly, the next significant bit of the symbol is taken from the next
significant bit of the byte. For the case of $2^m$ QAM modulation, the process will map k
bytes into n symbols, such that $8k = n \cdot m$. The process in Figure 2.4 illustrates the
conversion for a 64 QAM system where m=6, k=3 and n=4.
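The regrouping of bits can be sketched as a simple MSB-first bit accumulator. The function names are hypothetical, and the code only illustrates the 8k = n·m relationship, not the hardware block.

```python
def bytes_to_symbols(data, m=6):
    """Regroup 8-bit bytes into m-bit symbols, MSB first (m = 6 for 64 QAM)."""
    acc, nbits, out = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= m:
            nbits -= m
            out.append((acc >> nbits) & ((1 << m) - 1))
        acc &= (1 << nbits) - 1          # keep only the bits not yet emitted
    if nbits:
        raise ValueError("input length does not satisfy 8k = n*m")
    return out


def symbols_to_bytes(symbols, m=6):
    """Inverse regrouping, as the receiver would perform it."""
    acc, nbits, out = 0, 0, bytearray()
    for sym in symbols:
        acc = (acc << m) | sym
        nbits += m
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    if nbits:
        raise ValueError("symbol count does not satisfy 8k = n*m")
    return bytes(out)
```

For example, the three bytes 10110100, 11100011 and 00011111 regroup into the four 6-bit symbols 101101, 001110, 001100 and 011111, matching k=3 and n=4 for 64 QAM.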
2.2.5 Differential encoder/decoder
Differential coding provides protection against 180° phase ambiguity in the
channel. This feature is essential when QAM is used in the transmission since the rotation
of the symbol determines its position in the QAM constellation. The differential coding
obtains a π/2 rotation-invariant QAM constellation. The DAVIC 1.2 specification
requires that the two most significant bits of each symbol be coded differentially before
symbol mapping is done as shown in Figure 2.5.
Figure 2.5 Implementation of the differential encoding of the two MSBs (input from the convolutional interleaver; the remaining q bits $b_{q-1} \ldots b_0$ accompany the two MSBs; q=2 for 16 QAM, q=4 for 64 QAM, q=6 for 256 QAM)
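The differential encoding of the two MSBs is given by the following Boolean
expressions:
$$I_k = \overline{(A_k \oplus B_k)} \cdot (A_k \oplus I_{k-1}) + (A_k \oplus B_k) \cdot (A_k \oplus Q_{k-1})$$
$$Q_k = \overline{(A_k \oplus B_k)} \cdot (B_k \oplus Q_{k-1}) + (A_k \oplus B_k) \cdot (B_k \oplus I_{k-1})$$
where $A_k$ and $B_k$ are the two input MSBs, $I_k$ and $Q_k$ are the differentially encoded
bits, k indicates present state and (k-1) indicates previous state.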
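A short sketch can show why this encoding makes the two MSBs insensitive to quarter-turn rotations. It assumes the standard form of the expressions (the non-inverted term of Q uses B XOR the previous I) and a quadrant labelling in which a 90° rotation maps (I, Q) to (NOT Q, I). The decoder and the `rotate` helper are illustrative and not taken from the thesis.

```python
def diff_encode(pairs, state=(0, 0)):
    """Differentially encode (A, B) MSB pairs into (I, Q) pairs."""
    i_prev, q_prev = state
    out = []
    for a, b in pairs:
        if a ^ b:                                  # A xor B = 1: cross-coupled terms
            i, q = a ^ q_prev, b ^ i_prev
        else:                                      # A xor B = 0: straight terms
            i, q = a ^ i_prev, b ^ q_prev
        out.append((i, q))
        i_prev, q_prev = i, q
    return out


def diff_decode(symbols, state=(0, 0)):
    """Recover (A, B) from consecutive (I, Q) pairs (assumed inverse of the encoder)."""
    i_prev, q_prev = state
    out = []
    for i, q in symbols:
        if (i ^ i_prev) == (q ^ q_prev):           # encoder used the straight terms
            a = b = i ^ i_prev
        else:                                      # encoder used the cross terms
            a, b = i ^ q_prev, q ^ i_prev
        out.append((a, b))
        i_prev, q_prev = i, q
    return out


def rotate(symbol, quarter_turns):
    """Rotate a quadrant label by multiples of 90 degrees (assumed labelling)."""
    i, q = symbol
    for _ in range(quarter_turns % 4):
        i, q = q ^ 1, i
    return (i, q)
```

If every received symbol is rotated by the same multiple of 90°, only the first decoded pair (whose reference state is unknown) can be wrong; all later pairs decode correctly because the decoder looks only at the relation between consecutive symbols.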
2.2.6 TCM encoder/decoder
Trellis-coded modulation (TCM) is a combined modulation and coding technique
for band-limited channels with which coding gains relative to uncoded modulation are
achieved without bandwidth expansion [16,17]. With simple TCM schemes, coding gains
of 3 dB are obtained easily and gains of up to 6 dB can be achieved using more complex
schemes [16]. This gain comes from the efficiency of TCM codes [11]. TCM differs from
traditional error-control coding in that the coding is used to make transmission
errors less likely rather than to detect/correct errors, because the TCM codes follow a
pre-determined trellis diagram. The TCM scheme in MMDS does not modify the shape of the
constellation nor the spectrum shape, but only provides additional coding gain that will
increase the coverage area of the transmitter [4]. The trade-off is that TCM achieves its coding gains
without sacrificing bandwidth at the expense of greater receiver complexity [18].
The TCM coding scheme consists of a combination of a differential encoder, a
convolutional encoder and a mapper to the signal QAM constellation. This process takes
place in the MMDS transceiver of Figure 2.1 after the RS encoder and convolutional
interleaver. Figure 2.6 shows the detailed reference model of the TCM encoder.
Figure 2.6 Detailed reference model of the TCM encoder (differential encoder, convolutional encoder, mapping)
The differential encoder has the same function as the one described in Section
2.2.5. The convolutional encoder is a 16-state, rate 2/3 encoder (i.e., one parity
bit is added to every two data bits). The encoder implementation is a 3-bit shift register
interconnected by XOR and AND logic.
The operation of the convolutional encoder is as follows. The output of the three
encoder memory bits in Figure 2.6 is called the delay state and the set of output bits is
known as the path state. The size of the encoder memory is referred to as its constraint
length. The path taken by the coded data follows a trellis structure [11,19]. The particular
path chosen at a time interval depends on the current path state of the encoder. The
three-bit path state is part of the output of the encoder. The other input bits are not encoded and
are passed to the output directly from the input stage. The output symbol is larger than
the input symbol since it contains error-correction information in addition to the message
data. The output bits are mapped into a QAM constellation and modulated for
transmission.
In the receiver, a maximum-likelihood Viterbi decoding algorithm is used to
estimate and reconstruct the original transmitted data [11,20-24]. The Viterbi algorithm
maximizes the correlation between the received vector and a table of possible codewords
while sequentially performing the opposite operation of the encoder [25]. It makes use of
the past history and reliability information to decode incoming data. A necessary
ingredient of the decision decoder is a suitable distance (or cost) function. The decoder
keeps track of all the possible states until it decides which one to select. The actual
decision is delayed until sufficient information is available. The length of the past history
analyzed by the decoder (i.e., tracking length) is one of the key factors affecting the
performance of the Viterbi decoder. In general, the tracking length should be four or five
times the encoder constraint length. Any further increase in tracking length provides only
a small increase in performance [11,19,26].
The complexity and the delay of the Viterbi decoder depend on the number of
states and the tracking length of the codes [25,27,28]. The tracking length of this Viterbi
TCM decoder is 32. In the implementation, the TCM block has been excluded due to the
complexity of the Viterbi decoder. It is reserved for future study, which would aim to increase
system coverage area by investigating the trade-off between receiver complexity and coding
gain.
2.2.7 Quadrature amplitude modulation
QAM is used as a means of encoding digital information over communication
links. It is a form of digital modulation where the digital information is contained in both
the amplitude and phase of the transmitted carrier [29]. Therefore, this method is a
combination of amplitude and phase modulation techniques. QAM is an extension of
multiphase Phase Shift Keying (PSK) which is a type of phase modulation. The primary
difference between the two is the lack of a constant envelope in QAM versus the
presence of a constant envelope in PSK techniques. In general, the use of QAM is to
serve the bandwidth conservation function since information signals are sent in the same
bandwidth [30]. The QAM technique is used as a result of its performance with respect to
spectral efficiency [17].
QAM is closely related to the original non-return-to-zero (NRZ) baseband
transmission. All QAM versions can be formed by generating two multilevel pulse
sequences from the initial NRZ sequence, and applying these to two carriers that are
offset by a phase shift of 90 degrees. Each modulated carrier then yields an AM signal
with suppressed carrier. Since carrier multiplication in the time domain corresponds to a
shift in the frequency domain, the modulated spectrum maintains the shape of the two-
sided baseband signal spectrum.
The spectrum of a QAM system is determined by the spectrum of the baseband
signals applied to the quadrature channels. Since these signals have the same basic
structure as the baseband PSK signals, QAM spectrum shapes are identical to PSK
spectrum shapes with equal numbers of signal points. Even though the spectrum shapes
are identical, the error performances of the two systems are quite different. With large
numbers of signal points, QAM systems outperform their PSK
counterparts in error performance [29-31]. The basic reason is that the distance between signal points in a PSK
system is smaller than the distance between points in a comparable QAM system [29].
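The distance argument can be checked numerically. The following Python sketch is an illustration only and is not part of the implementation described in this thesis; it assumes square QAM constellations with odd-integer levels and PSK points on a circle, both normalized to unit average symbol energy. The function names are hypothetical.

```python
import math

def psk_min_distance(m: int) -> float:
    """Minimum distance between adjacent M-PSK points at unit average energy."""
    return 2.0 * math.sin(math.pi / m)

def square_qam_min_distance(m: int) -> float:
    """Minimum distance between adjacent square M-QAM points at unit average energy."""
    side = math.isqrt(m)
    if side * side != m:
        raise ValueError("square QAM needs M to be a perfect square")
    # Odd-integer levels ..., -3, -1, +1, +3, ... on each axis (spacing of 2).
    levels = [2 * i - (side - 1) for i in range(side)]
    avg_energy = sum(i * i + q * q for i in levels for q in levels) / m
    return 2.0 / math.sqrt(avg_energy)

if __name__ == "__main__":
    for points in (16, 64, 256):
        print(points, psk_min_distance(points), square_qam_min_distance(points))
```

For 16 points the normalized minimum distance is about 0.39 for PSK and about 0.63 for QAM, and the gap widens as the number of points grows.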
QAM can have any number of discrete digital levels. Common levels are 4 QAM,
16 QAM, 64 QAM and 256 QAM. It is based on amplitude modulation of "quadrature"
carriers, 90 degrees out of phase with each other. For the DAVIC 1.2 specifications, 16
QAM, 64 QAM and 256 QAM levels are defined. Two grades of QAM level are
specified for MMDS in DAVIC 1.2:
Grade    QAM Level
A        16 and 64
B        16 and 64 and 256
A QAM modulator (transmitter) will support at least one of the QAM levels: 16,
64 or 256. A QAM demodulator (receiver) will support A or B grade of QAM level. The
modulation of the MMDS system will be QAM with 16, 64 or 256 points in the
constellation diagram. The 16 QAM uses 4-bit symbols, the 64 QAM uses 6-bit symbols
and the 256 QAM uses 8-bit symbols in the mapping and the modulation process. The
MPEG2-TS uses 8-bit symbols. If 64 QAM is to be used, the conversion from 8-bit
symbols to 6-bit symbols is required as shown in Section 2.2.4.
2.2.8 QAM constellation mapping
Prior to modulation, symbols are mapped into positions depending on their
values. These position maps are called QAM constellations. The system constellation
diagrams for 16 QAM, 64 QAM and 256 QAM are defined and shown in Figure A.2,
Figure A.3 and Figure A.4 respectively (see Appendix A). As shown in the diagrams, the
constellation points in Quadrant 1 are converted to Quadrants 2, 3 and 4 by changing the
two MSBs (i.e., Ik and Qk in Figure 2.5) and by rotating the q LSBs according to the
rule given in Table 2.1 below.
Table 2.1 Conversion of constellation of quadrant 1 to other quadrants of the constellation diagrams given in Figure A.2, Figure A.3 and Figure A.4 in Appendix A.
Quadrant | MSBs | LSBs rotation
2.2.9 Baseband filtering
This filtering is required for baseband shaping purposes. The baseband shaping
controls the shape of the pulses used to transmit the sample values [30]. This pulse shape
determines the degree of intersymbol interference during transmission. The ideal shape,
which has no cross talk between the symbols, cannot be achieved in the filtering process
(i.e., the ideal filter cannot be built in the real world). The desirable compromise for the
ideal shape is the raised-cosine characteristic since it is possible to build a raised-cosine
filter but not an ideal filter. The roll-off of the pulse shape of the filter depends on a roll-
off factor. Filters using a high roll-off factor yield a better approximation to the ideal
shape.
In the MMDS system, the I and Q signals from the QAM mapping are square-root
raised-cosine filtered before modulation. The square-root raised-cosine has the square-
root of the raised-cosine characteristics. The roll-off factor of the filter, $\alpha$, is 0.13 or 0.15
depending on the channel bandwidth (i.e., 6 MHz or 8 MHz). The square-root raised-
cosine filter has a theoretical function defined by the following expression:
where $f_N = \frac{1}{2T_s} = \frac{R_s}{2}$ is the Nyquist frequency.
The impulse response of the transmitter filter characteristic is given in the
following section. The time-domain response of a square-root raised-cosine pulse with
excess bandwidth parameter $\alpha$ is given by:
where T is the symbol period.
The output signal is defined as
where T is the symbol period ($T = 1/f_s$), and $\omega_c$ is the modulator's carrier frequency. The
values of $I_n$ and $Q_n$ are as follows: For 16 QAM, $I_n$ and $Q_n$ are equal to ±1 or ±3,
independent of each other. For 64 QAM, $I_n$ and $Q_n$ are equal to ±1 or ±3 or ±5 or ±7,
independent of each other. For 256 QAM, $I_n$ and $Q_n$ are equal to ±1 or ±3 or ±5 or ±7 or
±9 or ±11 or ±13 or ±15, independent of each other.
The convolution of the transmitter filter impulse response with itself will have
intersymbol interference of less than -40 dB (RMS) [4].
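The behaviour of the cascaded filter can be explored with a short numerical sketch. The Python code below is an illustration under stated assumptions and is not the filter used in this work: it uses the textbook closed-form square-root raised-cosine pulse with the symbol period normalized to T = 1, truncates it to a hypothetical span of 32 symbols at 8 samples per symbol, and uses floating-point arithmetic. The function names and the truncation parameters are assumptions.

```python
import math

def srrc(t: float, alpha: float) -> float:
    """Unit-energy square-root raised-cosine pulse; t is in symbol periods (T = 1)."""
    if abs(t) < 1e-9:                                   # removable singularity at t = 0
        return 1.0 - alpha + 4.0 * alpha / math.pi
    if abs(abs(t) - 1.0 / (4.0 * alpha)) < 1e-9:        # singularity at t = +/- 1/(4*alpha)
        a = math.pi / (4.0 * alpha)
        return (alpha / math.sqrt(2.0)) * ((1 + 2 / math.pi) * math.sin(a)
                                           + (1 - 2 / math.pi) * math.cos(a))
    num = (math.sin(math.pi * t * (1 - alpha))
           + 4 * alpha * t * math.cos(math.pi * t * (1 + alpha)))
    den = math.pi * t * (1 - (4 * alpha * t) ** 2)
    return num / den

def cascade_isi_db(alpha: float, span: int = 32, sps: int = 8) -> float:
    """RMS ISI, in dB relative to the main tap, of the pulse convolved with itself."""
    n = span * sps
    taps = [srrc(k / sps, alpha) for k in range(-n, n + 1)]

    def cascade(lag: int) -> float:
        # The taps are symmetric, so the convolution at `lag` equals this correlation sum.
        return sum(taps[i] * taps[i + lag] for i in range(len(taps) - lag)) / sps

    main = cascade(0)
    # Lags close to the truncation edge are dominated by truncation and are left out.
    isi_power = 2.0 * sum(cascade(k * sps) ** 2 for k in range(1, span // 2 + 1))
    return 10.0 * math.log10(isi_power / main ** 2)

if __name__ == "__main__":
    for roll_off in (0.13, 0.15):
        print(roll_off, cascade_isi_db(roll_off))
```

The figure returned here reflects only the truncation of an ideal pulse. A hardware filter with quantized coefficients and fewer taps would have to be evaluated separately against the -40 dB requirement.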
2.2.10 Radio frequency interface
After filtering, the digital I and Q signals are converted to analog and modulated
in quadrature onto a 44 MHz IF carrier according to the equation:
$s(t) = I\cos(\omega t) + Q\sin(\omega t)$. (2.5)
The IF signal is then transported and fed to the MMDS main transmitter. The
transmitter upconverts the 44 MHz signal to an MMDS channel in the GHz range by mixing
the signal with a local oscillator. The mixer output is then amplified to a power level
ranging from 15 to 100 watts of average power before being sent to the antenna for
transmission. At the receiving end, the signal is downconverted to a 44 MHz IF and
demodulated by a QAM demodulator.
The use of QAM puts very stringent specifications on the MMDS transmitters for
local oscillator (LO) phase noise and amplifier linearity. The higher the order of QAM
used in the system, the better the performance required from the LO and the
amplifier [3].
2.2.11 System synchronization
Figure 2.1 indicates that all of the transceiver building blocks must operate using
a common clock and must be synchronized within the system. The reason for using the
same clock and system synchronization becomes clear by examining the clock and sync
generation in the transmitter and the clock and sync regeneration in the receiver in Figure
2.1. This also implies that the system must be synchronized in both frequency and time.
Several measures are used to achieve system synchronization.
These include clock synchronization and frame synchronization within the
system. The clock synchronization requirement is much more stringent since the
transmitter would be connected to the infrastructure networks. These networks have the
ability to use the standard clocks of SONET or SDH. These clocks adhere to a global
standard time called UTC. In order for the MMDS system to synchronize properly to the
global network, the clock must have a time that can be traced to a universal standard
time.
The DAVIC 1.2 specifies that, in the absence of SONET or SDH, a locally
generated clock can be used as a system clock. This clock must meet certain
requirements. These requirements are discussed in Section 5.3.1. In addition, the clock
used in the receivers must be re-generated from the transmitter clock in order to be traced
back. This feature is necessary, specifically when the receivers transmit the data back to
the transmitter through the two-way communications.
In general, the synchronization compliance of the high-speed MMDS system
ensures data integrity of the communication link throughout the hierarchy. The
randomization process ensures binary transitions for clock recovery at the receiver.
The insertion of a synchronization byte in every 188 bytes and the inversion of the
synchronization byte in every eight MPEG2-TS packets provide frame synchronization. The
MMDS system also utilizes another tool to provide improved packet synchronization
robustness, the High Reliability Marker (HRM). The HRM is
carried as a field in the normal payload area of a standard MPEG2-TS null
packet. The HRM packet is inserted into the MPEG2-TS prior to the framing operations of
randomization and interleaving.
CHAPTER 3
MMDS TRANSMITTER IMPLEMENTATION
The transmitter blocks, shown in Figure 2.1, were implemented using FPGA
prototypes with the intention of development into an ASIC. Modules in Verilog code
were used to describe the blocks. These modules were designed separately and were
connected into the system with appropriate clocking. There are five main modules to be
implemented in the transmitter: the baseband interface, the Reed-Solomon encoder, the
Convolutional Interleaver, the m-tuple conversion and differential encoder, and the QAM
mapping. This Chapter describes these modules in detail. The trellis code modulation
encoder is omitted as stated in Chapter 2.
3.1 Baseband interface, Synchronization byte inversion and Randomization
There are four modules to implement these three functions into VLSI. The first
module generates a synchronization byte, inverts a synchronization byte and converts 187
bytes of MPEG to MPEG2-TS packets. As shown in Figure 2.2, the inverted
synchronization byte is at the beginning of every 8 packets and the other 7
synchronization bytes are in the subsequent packets. These synchronization bytes are
added to the MPEG 187 data bytes to form 188 byte packets before being randomized. A
simplified schematic diagram for this module is shown in Figure 3.1.
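The framing performed by the first module can be sketched behaviourally as follows. This Python fragment is an illustration and not the Verilog module itself. It assumes the usual MPEG-2 transport stream synchronization byte value of 0x47, which is not stated in this section, and the function name is hypothetical.

```python
SYNC = 0x47              # MPEG-2 TS sync byte (assumed value, not given in this section)
SYNC_INV = SYNC ^ 0xFF   # bit-wise inverted sync byte

def frame_packets(payloads):
    """Prefix each 187-byte payload with a sync byte, inverted on every 8th packet."""
    packets = []
    for index, payload in enumerate(payloads):
        if len(payload) != 187:
            raise ValueError("each payload must be 187 bytes")
        # The first packet of every group of 8 carries the inverted sync byte.
        sync = SYNC_INV if index % 8 == 0 else SYNC
        packets.append(bytes([sync]) + bytes(payload))
    return packets

if __name__ == "__main__":
    framed = frame_packets([bytes(187)] * 16)
    print([hex(p[0]) for p in framed])
```

In hardware the modulo-8 packet count corresponds to the counter shown in Figure 3.1.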
Figure 3.1 Interface, synchronization and synchronization inversion
The second module converts the MPEG2 packets into a serial data bit stream to be
randomized. This is a simple parallel to serial conversion where the input is an 8-bit
symbol and the output is a serial bit stream. It consists of a 3-bit counter to control an 8-
to-1 multiplexer. The diagram for this module is shown in Figure 3.2a.
Figure 3.2 a) Parallel to serial conversion and b) Serial to parallel conversion
The third module randomizes the data stream using PRBS as described in Section
2.2.1. The initialization sequence is generated in the module at the beginning of every 8
packets. The generation of the sequence is started by the inverted synchronization
byte. This signal is derived from the first module where the inverted synchronization byte is
generated. The schematic diagram for this module is shown in Figure 3.3.
Figure 3.3 Randomization and De-randomization
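A behavioural sketch of such a randomizer is given below in Python. It is an assumption-laden illustration: the generator polynomial 1 + x^14 + x^15 and the initialization sequence "100101010000000" are those commonly used for MPEG2-TS energy dispersal and are assumed here, since Section 2.2.1 is not reproduced in this chapter. The sketch also omits the gating that leaves the synchronization bytes unrandomized, and the function names are hypothetical.

```python
def prbs_bits(count, seed="100101010000000"):
    """Yield `count` bits of the assumed 1 + x^14 + x^15 PRBS.

    `seed` lists the contents of shift-register stages 1 to 15.
    """
    reg = [int(c) for c in seed]
    for _ in range(count):
        bit = reg[13] ^ reg[14]      # XOR of stages 14 and 15
        reg = [bit] + reg[:-1]       # feed back into stage 1 and shift
        yield bit

def randomize(data: bytes) -> bytes:
    """XOR the data, taken MSB first, with the PRBS. Applying it twice restores the data."""
    bits = prbs_bits(8 * len(data))
    out = bytearray()
    for byte in data:
        mask = 0
        for _ in range(8):
            mask = (mask << 1) | next(bits)
        out.append(byte ^ mask)
    return bytes(out)

if __name__ == "__main__":
    payload = bytes(range(187))
    scrambled = randomize(payload)
    print(randomize(scrambled) == payload)
```

Because randomization is an XOR with a fixed sequence, the same routine serves as the de-randomizer in the receiver, provided both ends restart the sequence at the same point.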
The last module is a simple serial to parallel conversion. Its function is the
inverse of the second module. The input is serially randomized data and the output is an
8-bit randomized symbol ready to be sent to the Reed-Solomon encoder. There are two
clocks to operate the modules. One clock is 8 times faster than the other for the serial to
parallel conversion. Figure 3.2b depicts the diagram for the serial to parallel module.
3.2 Reed-Solomon encoder
Reed-Solomon codes are based on a special area of mathematics known as Galois
Field or finite field. The RS encoder needs to carry out the arithmetic operations (i.e.,
add, subtract, multiplication) in this field to perform its function. These arithmetic
operations require special hardware functions to implement. The addition and subtraction
operations are similar and use only bit-wise XOR gates while the multiplication and
division operations are more complex and require the use of fast calculation algorithms.
The implementation of these two operations is described in detail in Section 4.4.4.
Appendix B provides an explanation of GF arithmetic.
In this (255,239) RS codec, each codeword consists of 239 message symbols and
16 redundant check symbols. The encoder generates 16 parity symbols from the 239
message symbols it receives. The parity symbols are then appended to the end of the
message to generate a 255-symbol codeword, c(x). All valid codewords are exactly
divisible by the generator polynomial, g(x). In systematic form, the 16 parity check
symbols are the remainder, r(x), resulting from dividing the message polynomial, a(x), by
the generator polynomial, g(x) [11]. In general, the systematic RS encoder performs the
following:
$x^{16}a(x) = q(x)g(x) + r(x)$ (3.1)
or: $c(x) = r(x) + x^{16}a(x)$, (3.2)
where q(x) is a quotient and g(x) is the code generator. The generator polynomial
is repeated here for convenience:
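$g(x) = \alpha^{136} + \alpha^{240}x + \alpha^{208}x^2 + \alpha^{195}x^3 + \alpha^{181}x^4 + \alpha^{158}x^5 + \alpha^{201}x^6 + \alpha^{100}x^7 + \alpha^{11}x^8 + \alpha^{83}x^9 + \alpha^{167}x^{10} + \alpha^{107}x^{11} + \alpha^{113}x^{12} + \alpha^{110}x^{13} + \alpha^{106}x^{14} + \alpha^{121}x^{15} + x^{16}$. (3.3)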
In a hardware implementation, the polynomial division to find the remainder, r(x),
is accomplished using a 16-stage Linear Feedback Shift Register (LFSR) depicted in
Figure 3.4. After 239 symbols are passed through the RS encoder, 16 parity check
symbols are generated and sent out to form a codeword. As shown in Figure 3.4, between
any two consecutive shift registers, there is an 8-bit XOR to perform finite field $GF(2^8)$
addition. The feedback path, containing the quotient from each division step, is broadcast
to 16 constant finite field multipliers.
Figure 3.4 Reed-Solomon encoder architecture (legend: the multiplier multiplies 2 elements from $GF(2^8)$; the adder adds 2 elements from $GF(2^8)$; the storage device stores a field element from $GF(2^8)$)
The first 239 bytes of the output of the encoder are the same as the message input.
As soon as the message enters the circuit, the parity-check symbols are in the registers.
The gate turns off when the message ends and the parity symbols are emptied from the
registers to the codeword. The cycle repeats when a new message block, a(x), enters the
encoder.
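The LFSR division can be modelled in software to check Equations 3.1 to 3.3. The Python sketch below is a behavioural illustration, not the Verilog design. It assumes the field generator $p(x) = x^8 + x^4 + x^3 + x^2 + 1$ and that the roots of g(x) are $\alpha^1$ to $\alpha^{16}$, which is consistent with the constant term $\alpha^{136}$ in Equation 3.3. The table-based multiplier and all function names are hypothetical.

```python
PRIM_POLY = 0x11D            # p(x) = x^8 + x^4 + x^3 + x^2 + 1
EXP = [0] * 510              # EXP[i] = alpha^i, doubled so products need no modulo
LOG = [0] * 256
_value = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _value
    LOG[_value] = _i
    _value <<= 1
    if _value & 0x100:
        _value ^= PRIM_POLY

def gf_mul(a, b):
    """Multiply two GF(2^8) elements using log/antilog tables."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def generator_poly(first_root=1, parity=16):
    """Coefficients g[0..parity], low order first, of the product of (x + alpha^i)."""
    g = [1]
    for i in range(first_root, first_root + parity):
        nxt = [0] * (len(g) + 1)
        for j, coeff in enumerate(g):
            nxt[j] ^= gf_mul(coeff, EXP[i])   # coeff * alpha^i
            nxt[j + 1] ^= coeff               # coeff * x
        g = nxt
    return g

def rs_encode(message, g):
    """Systematic LFSR encoding: returns the message followed by the parity symbols."""
    parity = len(g) - 1
    reg = [0] * parity                        # reg[j] holds the coefficient of x^j
    for symbol in message:                    # highest-order message symbol first
        feedback = symbol ^ reg[-1]           # quotient symbol on the feedback path
        for j in range(parity - 1, 0, -1):
            reg[j] = reg[j - 1] ^ gf_mul(feedback, g[j])
        reg[0] = gf_mul(feedback, g[0])
    return list(message) + reg[::-1]

def poly_eval(codeword, x):
    """Evaluate a codeword (highest-order symbol first) at x by Horner's rule."""
    acc = 0
    for symbol in codeword:
        acc = gf_mul(acc, x) ^ symbol
    return acc

if __name__ == "__main__":
    g = generator_poly()
    print([LOG[c] for c in g[:16]])           # exponents of g0 .. g15
    codeword = rs_encode(list(range(239)), g)
    print(all(poly_eval(codeword, EXP[i]) == 0 for i in range(1, 17)))
```

A valid codeword evaluates to zero at every root of g(x), which is the same property the syndrome calculation in the decoder relies on.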
In the design, a combination of XOR gates is used to multiply the generator
polynomial coefficients, $g_i$, by the quotient, q(x), instead of a general multiplier in $GF(2^8)$
since these coefficients are known. The code generator coefficients, $g_0$ to $g_{15}$, are given in
Equation 3.3. This implementation technique reduces both the hardware and the latency of the
multiplier. An example of $GF(2^8)$ multiplication using XOR gates between a constant and
an element in the field is described in the following. Let
$A(x) = a_0 + a_1\alpha + a_2\alpha^2 + a_3\alpha^3 + \cdots + a_7\alpha^7$
be an arbitrary element in $GF(2^8)$ to be multiplied by the coefficient $g_0 = \alpha^{136}$. The
multiplication is as follows:
$\alpha^{136}A(x) = a_0\alpha^{136} + a_1\alpha^{137} + a_2\alpha^{138} + a_3\alpha^{139} + \cdots + a_7\alpha^{143}$.
Replace $\alpha^{136} = (1 + \alpha + \alpha^2 + \alpha^3 + \alpha^6)$, $\alpha^{137} = (\alpha + \alpha^2 + \alpha^3 + \alpha^4 + \alpha^7)$, $\alpha^{138} = (1 + \alpha^5)$, ...,
$\alpha^{143} = (\alpha^2 + \alpha^4 + \alpha^6)$ in the above equation. These vector representations, $\alpha^i$, can be found
in Table B.1. The elements of this $GF(2^8)$ are generated using the field generator given
in Section 2.2.2, $p(x) = x^8 + x^4 + x^3 + x^2 + 1$.
Combining the elements $\alpha^i$ in terms of $a_i$, the multiplication $\alpha^{136}A(x)$ becomes:
$\alpha^{136}A(x) = (a_0 \oplus a_2 \oplus a_5) + (a_0 \oplus a_1 \oplus a_3 \oplus a_6)\alpha + (a_0 \oplus a_1 \oplus a_4 \oplus a_5 \oplus a_7)\alpha^2 + (a_0 \oplus a_1 \oplus a_6)\alpha^3 + (a_1 \oplus a_5 \oplus a_7)\alpha^4 + (a_2 \oplus a_6)\alpha^5 + (a_0 \oplus a_3 \oplus a_7)\alpha^6 + (a_1 \oplus a_4)\alpha^7$.
As shown, the multiplication of the generator coefficients, $g_i$, by an arbitrary
element in $GF(2^8)$ can be implemented using a combination of XOR gates. This
procedure to implement the multiplication is much simpler and faster than using the
multiplier shown in Section 4.4.4. The implementation of $\alpha^{136}A(x)$ using this procedure
saves 88% of the hardware compared to the use of a $GF(2^8)$ multiplier (17 gates versus 141
gates). This is another reason why the RS encoder implementation is much simpler than
the RS decoder implementation.
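The XOR network of this worked example can be verified exhaustively in software. The Python sketch below is an illustration only: it compares the fixed network for $\alpha^{136}$ against a general shift-and-add $GF(2^8)$ multiplier for all 256 field elements, assuming the field generator $p(x) = x^8 + x^4 + x^3 + x^2 + 1$. The function names are hypothetical and the reference multiplier is not the one described in Section 4.4.4.

```python
PRIM_POLY = 0x11D   # p(x) = x^8 + x^4 + x^3 + x^2 + 1

def gf_mul(a: int, b: int) -> int:
    """General GF(2^8) multiply (shift-and-add), used here only as a reference."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM_POLY
        b >>= 1
    return result

def alpha_pow(n: int) -> int:
    """Return alpha^n, where alpha is the field element x (binary 00000010)."""
    value = 1
    for _ in range(n):
        value = gf_mul(value, 2)
    return value

def mul_alpha136(a: int) -> int:
    """Multiply by alpha^136 using the fixed network of 17 two-input XOR gates."""
    a0, a1, a2, a3, a4, a5, a6, a7 = ((a >> i) & 1 for i in range(8))
    bits = [
        a0 ^ a2 ^ a5,              # coefficient of 1
        a0 ^ a1 ^ a3 ^ a6,         # coefficient of alpha
        a0 ^ a1 ^ a4 ^ a5 ^ a7,    # coefficient of alpha^2
        a0 ^ a1 ^ a6,              # coefficient of alpha^3
        a1 ^ a5 ^ a7,              # coefficient of alpha^4
        a2 ^ a6,                   # coefficient of alpha^5
        a0 ^ a3 ^ a7,              # coefficient of alpha^6
        a1 ^ a4,                   # coefficient of alpha^7
    ]
    return sum(bit << i for i, bit in enumerate(bits))

if __name__ == "__main__":
    constant = alpha_pow(136)
    print(all(mul_alpha136(a) == gf_mul(a, constant) for a in range(256)))
```

The same procedure, applied to each of the other coefficients in Equation 3.3, would yield one fixed XOR network per constant multiplier.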
In the encoder implementation, addition of 2 elements in $GF(2^8)$ is also
required. This finite field arithmetic is implemented using a simple bit-wise XOR as
follows:
$A(x) \oplus B(x) = (a_0 \oplus b_0) + (a_1 \oplus b_1)\alpha + (a_2 \oplus b_2)\alpha^2 + (a_3 \oplus b_3)\alpha^3 + (a_4 \oplus b_4)\alpha^4 + (a_5 \oplus b_5)\alpha^5 + (a_6 \oplus b_6)\alpha^6 + (a_7 \oplus b_7)\alpha^7$.
3.3 Convolutional Interleaver
Traditionally, the convolutional interleaver and deinterleaver are implemented
using external RAM to store data as shift registers. The use of RAM can limit the speed
of the system due to off-chip operation. One way to increase the speed of the
convolutional interleaver is to have the shift registers on-chip. The on-chip convolutional
interleaver requires a substantial number of flip-flops to implement the shift registers. For
example, 8976 flip-flops are required to implement (66x17 = 1122) shift registers for 8-bit
symbols of a 64 QAM interleaver with I=12, M=17.
Work was carried out in this research to reduce the hardware requirement and
to increase the speed of the on-chip interleaver. By changing the clocking scheme of the
flip-flops, the amount of hardware can be reduced significantly [32]. The reason is that
the registers are only needed at certain times when data is required to be input to or output
from the FIFO shift-register blocks. There is no memory or shifting of data required
between the times of input and output. Timing to clock these shift registers is derived
from data distributor signals which are available at the front end (the distributor) of the
interleaver as shown in Figure 3.5. This clocking technique reduces the number of
flip-flops required by more than 85% for the 64 QAM interleaver. In the implementation,
only 1056 flip-flops were used as compared to 8976. A small number of logic gates are
needed to interface these shift registers. The drawback of this implementation is the
routing complexity of the design due to the new clocking architecture. The description of
the convolutional interleaver below is for 64 QAM where I=12, N=204 and M=N/I=17
and it is shown in Figure 3.5.
Figure 3.5 Interleaving 64 QAM (I = 12 for 16 and 64 QAM, M = 204/12 = 17; each delay element is a 17-stage FIFO shift register)
A counter generates a count from 0 to 11 to distribute incoming symbols to the 12
branches of the interleaver at the distributor. Starting with the synchronization byte, the
204 bytes of the MPEG2-TS packet are routed through. The first byte goes through
branch 0 and has no delay. The counter advances and the second byte is routed to branch
1. The counter keeps advancing to distribute the incoming data to other branches until it
resets to 0 after 12 counts. The process is then repeated. Branch 1 of the interleaver has
one 17-stage FIFO to delay data and the last branch has a total of eleven 17-stage FIFOs.
FIFOs are constructed using the new clocking scheme to reduce hardware instead
of using 17-stage registers. In this implementation, a 17-stage shift register (M=17) uses
only 2 D-type flip-flops per bit. An active high control signal enables each flip-flop. Data is
clocked into the first register and passed to the second register. The output from the second
register is the output of the 17-stage FIFO. This output is enabled and clocked out when it
is needed. Timing for these enable signals comes from the counter generated at the
distributor. The clock used to clock data is the global clock and therefore no extra clock
is required in this operation. As incoming data contains 8-bit symbols, only 16 D-type
flip-flops are used to implement an M=17 stage FIFO shift register as shown in Figure
3.5.
At the output of the interleaver, a commutator picks up and delivers data from the
12 branches of the interleaver to the output. The commutator at the output reverses the
function of the distributor. That is, it collects data from the branches to reconstruct the
original MPEG2-TS packet. The counter used in the distributor is also used to
control the switching of the commutator in order to synchronize the input and output of
the interleaver. After being delayed through the FIFOs of the branches, the data is combined
in the correct time order and sent to the byte to m-tuple conversion shown in Figure 2.1.
Verilog code was written to describe the interleaver. The code was compiled, fitted, and
simulated using an Altera FPGA prototype. An operation speed of 160 Mbs was obtained
when the interleaver was implemented with the Altera FLEX10K FPGA. The results are
shown in Table 6.1.
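The branch structure and the flip-flop counts quoted above can be reproduced with a behavioural model. The Python sketch below is an illustration only: it models each branch as an ordinary FIFO and therefore does not represent the two-flip-flop clocking scheme, only the input/output behaviour of the interleaver and de-interleaver for I=12 and M=17. The class and function names are hypothetical.

```python
from collections import deque

class ConvolutionalInterleaver:
    """Branch j delays its symbols by delays[j] visits to that branch."""

    def __init__(self, delays, fill=0):
        self.fifos = [deque([fill] * d) for d in delays]
        self.branch = 0                      # shared distributor/commutator counter

    def push(self, symbol):
        fifo = self.fifos[self.branch]
        self.branch = (self.branch + 1) % len(self.fifos)
        if not fifo:                         # zero-delay branch passes straight through
            return symbol
        fifo.append(symbol)
        return fifo.popleft()

def make_interleaver(i=12, m=17):
    return ConvolutionalInterleaver([j * m for j in range(i)])

def make_deinterleaver(i=12, m=17):
    # Branch indexes are reversed: branch 0 has the largest delay.
    return ConvolutionalInterleaver([(i - 1 - j) * m for j in range(i)])

def flip_flop_counts(i=12, m=17, width=8):
    """Return (full shift-register count, count with 2 flip-flops per FIFO bit)."""
    fifos = sum(range(i))                    # 0 + 1 + ... + 11 = 66 FIFOs of M stages
    return fifos * m * width, fifos * 2 * width

if __name__ == "__main__":
    interleaver, deinterleaver = make_interleaver(), make_deinterleaver()
    data = list(range(1, 5001))
    restored = [deinterleaver.push(interleaver.push(s)) for s in data]
    latency = 12 * 11 * 17                   # end-to-end delay in symbols
    print(restored[latency:] == data[:-latency], flip_flop_counts())
```

The end-to-end delay of the interleaver and de-interleaver pair in this model is I x (I-1) x M = 2244 symbols, after which the original symbol order is restored.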
3.4 Byte-to-m tuple conversion and differential encoding
Figure 2.1 shows that these two blocks can be bypassed if the TCM is used in the
transceiver. The conversion and coding functions of these blocks are for QAM
constellation mapping and de-mapping purposes. The conversion from 8-bit symbols to
6-bit symbols is required if a 64 QAM system is used. The differential code is required
for both 64 and 256 QAM to protect against the phase rotation of the two MSBs. This
protection is necessary since these two bits determine the positions of the symbols in their
QAM constellations.
3.4.1 Byte-to-m tuple conversion
As shown in Figure 2.4, the input of this block is an 8-bit byte from the
interleaver and the output is a 6-bit symbol to the differential encoder. This conversion is
a simple bit arrangement of every 3 bytes input (i.e., 24 bits). There are four symbols
output corresponding to every three bytes input and the cycle repeats. The
implementation of this block simply uses a counter to control the bit arrangement. A
behavioral description is used in the synthesizable Verilog code to describe this block.
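The regrouping of three bytes into four 6-bit symbols can be sketched as follows. This Python fragment is an illustration, not the Verilog block. It assumes that bits are taken most significant bit first. The exact bit ordering is defined by Figure 2.4, which is not reproduced here, so the ordering and the function names are assumptions.

```python
def bytes_to_symbols(data: bytes, m: int = 6):
    """Regroup a byte stream (MSB first, assumed) into m-bit symbols."""
    if (8 * len(data)) % m:
        raise ValueError("input length must give a whole number of symbols")
    acc = bits = 0
    symbols = []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= m:                     # emit every complete m-bit symbol
            bits -= m
            symbols.append((acc >> bits) & ((1 << m) - 1))
        acc &= (1 << bits) - 1               # keep only the bits not yet emitted
    return symbols

def symbols_to_bytes(symbols, m: int = 6) -> bytes:
    """Inverse regrouping, as needed by the m-to-byte conversion in the receiver."""
    acc = bits = 0
    out = bytearray()
    for symbol in symbols:
        acc = (acc << m) | symbol
        bits += m
        while bits >= 8:
            bits -= 8
            out.append((acc >> bits) & 0xFF)
        acc &= (1 << bits) - 1
    return bytes(out)

if __name__ == "__main__":
    print(bytes_to_symbols(b"\xff\x00\xaa"))
```

With these assumptions the three bytes 0xFF, 0x00, 0xAA become the four symbols 63, 48, 2 and 42, and the inverse function restores the original bytes.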
3.4.2 Differential encoding
Using the Boolean expression described in Chapter 2 for differential encoding
(Equation 2.1), the differential encoding can be described in words as:
a). If both inputs are 1, change both outputs.
b). If one input is 1, change an output as follows:
-If the previous outputs are equal, change the output whose input is 1.
-If the previous outputs are unequal, change the output whose input is 0.
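The verbal rules above translate directly into a small behavioural model. The Python sketch below is an illustration and not the circuit of Figure 3.6. It assumes that the first input of each pair is associated with the first output, that both outputs start at 0, and that the outputs are left unchanged when both inputs are 0, a case the rules do not spell out. The function name is hypothetical.

```python
def diff_encode(pairs, start=(0, 0)):
    """Differentially encode a sequence of (a, b) MSB pairs using the verbal rules."""
    i_prev, q_prev = start
    encoded = []
    for a, b in pairs:
        flip_i = flip_q = 0                  # both inputs 0: outputs unchanged (assumed)
        if a and b:                          # rule a): change both outputs
            flip_i = flip_q = 1
        elif a or b:                         # rule b): exactly one input is 1
            if i_prev == q_prev:             # equal: change the output whose input is 1
                flip_i, flip_q = a, b
            else:                            # unequal: change the output whose input is 0
                flip_i, flip_q = b, a
        i_prev, q_prev = i_prev ^ flip_i, q_prev ^ flip_q
        encoded.append((i_prev, q_prev))
    return encoded

if __name__ == "__main__":
    print(diff_encode([(1, 0), (1, 0), (1, 1), (0, 0)]))
```

Starting from (0, 0), the input sequence (1,0), (1,0), (1,1), (0,0) produces the outputs (1,0), (1,1), (0,0), (0,0).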
Figure 3.6 depicts the hardware implementation of the differential encoding
algorithm. Only the two MSBs are encoded (i.e., y[6], y[7]); the other lower bits pass through
without encoding as shown in Figure 2.5 (Section 2.2.5). Since this is a simple circuit,
only a few gates and flip-flops are required to implement it. The output of the encoder
(i.e., z[6], z[7]) is pipelined along with the other lower bits (i.e., bit 0 to 5) to present all
symbols to the QAM mapping block simultaneously.
Figure 3.6 Differential encoding of the two MSBs.
3.5 QAM mapping
The 16, 64 and 256 QAM constellations are shown in Appendix A. The points on
the constellation are arranged such that adjacent points are as far apart as possible. This is
one of the factors that make QAM outperform PSK. Depending on the value of the input
symbol, the two outputs of the QAM mapping have different levels. These levels are used
in the mixer to form QAM IF signals. There are 16 levels (4-bit word) for both I and Q
outputs for each 8-bit byte input if the system is using 256 QAM.
The QAM mapping is implemented using ROM since this is simple and
does not require a large amount of memory (256x8 for 256 QAM). In the FPGA
prototype, it is simple to implement ROM in Altera FLEX devices by using megacores in
the MAX+PLUS II package. A megacore is a pre-verified HDL design file that performs a
specific task for complex system-level functions. The content of the ROM is stored in a
file and it is used to compile and program into the FPGA. In Verilog, ROM can be
designed using "IF" statements but this approach uses more resources than using a memory
IP core. Figure 3.7 shows the mapping and de-mapping of a 256 QAM using ROM.
Figure 3.7 Mapping and de-mapping of 256 QAM (each implemented with a 256x8 ROM)
The two signals I and Q are sent out for pulse shaping filtering with a raised-
cosine filter before being modulated with an RF signal. The receiver receives the signal
from the channel and reverses the signal processing of the transmitter as discussed in the
next chapter.
CHAPTER 4
MMDS RECEIVER IMPLEMENTATION
The receiver blocks shown in Figure 2.1 were implemented using FPGA
prototypes with the intention of developing an ASIC as the final product. The blocks
were described in modules using Verilog HDL code. These modules were implemented
separately and then connected into the complete system with appropriate clocking. There
were five main modules implemented in the receiver: the QAM de-mapping, the
differential decoder and m-tuple conversion, the convolutional de-interleaver, the Reed-
Solomon decoder and the baseband interface. The TCM decoder is omitted as stated in
Chapter 2.
4.1 QAM de-mapping
The implementation of this block is the inverse of the QAM mapping in the
transmitter. The block now has two 4-bit inputs (I and Q) and the output is an 8-bit
symbol. The QAM de-mapping is also implemented using ROM. The two inputs are
combined to form the address of the ROM as shown in Figure 3.7 (see Section 3.5). The
conversion is done by recalculating the contents of the ROM and storing them in a file for
compiling.
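The relationship between the mapping ROM and the de-mapping ROM can be sketched in software. The Python code below is an illustration only. The actual ROM contents follow the 256 QAM constellation of Figure A.4 and the quadrant rule of Table 2.1, which are not reproduced here, so a hypothetical natural-binary layout is used in which the upper nibble selects the I level and the lower nibble the Q level. All names are assumptions.

```python
LEVELS = [2 * n - 15 for n in range(16)]     # -15, -13, ..., +13, +15

# Hypothetical mapping ROM: byte -> (I level, Q level).
MAP_ROM = [(LEVELS[byte >> 4], LEVELS[byte & 0x0F]) for byte in range(256)]

# De-mapping ROM: the two 4-bit level indexes are combined to form the address.
DEMAP_ROM = [0] * 256
for _byte, (_i, _q) in enumerate(MAP_ROM):
    DEMAP_ROM[(LEVELS.index(_i) << 4) | LEVELS.index(_q)] = _byte

def qam256_map(byte: int):
    """Look up the (I, Q) levels for an 8-bit symbol."""
    return MAP_ROM[byte]

def qam256_demap(i_level: int, q_level: int) -> int:
    """Look up the 8-bit symbol for a pair of ideal (noise-free) levels."""
    address = (LEVELS.index(i_level) << 4) | LEVELS.index(q_level)
    return DEMAP_ROM[address]

if __name__ == "__main__":
    print(all(qam256_demap(*qam256_map(b)) == b for b in range(256)))
```

Whatever constellation is stored in the mapping ROM, the de-mapping ROM can be derived from it in this way, which corresponds to recalculating the ROM contents as described above.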
4.2 Differential decoder and m-to-byte conversion
As shown at the MMDS receiver side in Figure 2.1, these two blocks are by-
passed if the TCM decoder is used. The blocks decode the differential code and rearrange
the bits from 6-bit symbols to 8-bit symbols if the transceiver uses 64 QAM. In all cases
of QAM, the differential decoder is necessary.
4.2.1 Differential decoder
The differential decoder performs the reverse function of the differential encoder
in the transmitter. From the description of the differential encoder in Section 2.2.5 and
Section 3.4.2, the differential decoder is described in words as:
a) If an input changes, make an output 1; if an input is constant, make an output 0.
b) If the new inputs are equal, set the output of the ones that change to 1.
c) If the new inputs are not equal, set the output opposite the ones that change to 1.
The differential decoding can be written by the following Boolean expression:
where k indicates the present state and (k-1) indicates the previous state. Figure 4.1
depicts the schematic diagram implementing the differential decoder algorithm above.
Only the two MSBs (i.e., z[6], z[7]) are decoded. The other lower bits, which are
not shown in the diagram, are passed through without decoding. The decoder is
always more complex than the encoder and requires more gates to implement. The
decoder output is also pipelined for high-speed operation and synchronized with the
undecoded bits before being delivered to the m-to-byte conversion block.
Figure 4.1 Schematic diagram for the differential decoder
4.2.2 M-to-byte conversion
This module is the inverse of the byte-to-m tuple conversion of the transmitter
and it is used only for 64 QAM. This block takes in four 6-bit symbols and sends out
three 8-bit bytes. The module arranges the order of the 24 input bits to produce the three
outputs as shown in Figure 2.4 (Section 2.2.4). The symbols are converted into a bit
stream and the bits are held and combined as the required output. The cycle repeats with
every four symbols input and three 8-bit symbols output. The implementation of this
module using Verilog code is the same as the counterpart byte to m-tuple conversion in
the transmitter (i.e., using a behavioral description in Verilog).
4.3 Convolutional de-interleaver
The de-interleaver is implemented similarly to the interleaver but the branch
indexes are reversed (i.e., branch "0" corresponds to the largest delay). The de-interleaver
synchronization is achieved by routing the first recognized synchronization byte into
branch "0". The synchronization byte now has the longest delay. The 17-stage shift
register is replaced by the FIFO constructed in the interleaver. The clocking scheme for
the flip-flops is the same as in the interleaver. The same amount of hardware is needed to
implement the de-interleaver and the interleaver since one is the inverse of the other.
Figure 4.2 shows the hardware construction of the 64 QAM de-interleaver. The operating
principle and clocking description of the de-interleaver are similar to those of the interleaver,
which are described in Section 3.3.
Figure 4.2 De-interleaving for 64 QAM (M = 204/12 = 17; each delay element is a 17-stage FIFO shift register)
4.4 Reed-Solomon decoder
The RS decoder is the most complex block to implement in the transceiver since it
involves Galois Field calculations and a complicated decoding algorithm. The slow
calculations in $GF(2^8)$ arithmetic and the complexity of the decoding algorithms hinder
the decoder throughput. Many RS decoder cores in both hardware and software have
been studied and developed in the past [12,33-41], but they are still low in correction
capacity and cannot satisfy the required bit rate of 200 Mbs. An operation frequency of at
least 25 MHz is required for the decoder to be used in the high-speed MMDS system
[32,42]. The high data rate of the MMDS transceiver depends entirely on the theoretical
and architectural improvement of this RS decoder [32]. The theoretical and VLSI
architecture developments are essential to design a high-speed decoder and they are
subjects of the research. The developments not only increase system speed, they also
reduce the hardware requirement of the codec since the ultimate goal is to design a low
cost, high-speed transceiver.
The transmitted codeword is corrupted by the channel due to noise or other
disturbances. The received codeword, r(x), at the decoder input is the sum of the
codeword, c(x), and the errors, e(x):
$r(x) = c(x) + e(x)$ (4.2)
The purpose of the decoder is to find the locations and values of the errors in
vector e(x) which is concealed in the received codeword, r(x). The correction is made by
adding the errors, e(x), to the received codeword, r(x), to recover the original codeword,
c(x):
$c(x) = r(x) + e(x)$ (4.3)
Compared to the encoder, the decoding of Reed-Solomon code is much more
involved and the process is shown in Figure 4.3.
Figure 4.3 Reed-Solomon decoder block diagram
There are four steps that must be evaluated in the decoding of RS codes. The
syndrome calculation computes 16 syndromes which represent the error pattern of the
received codeword, r(x). The division-free Berlekamp-Massey algorithm evaluates the
error location polynomial, $\sigma(x)$. This polynomial contains the error locations and the
error magnitudes according to error values in r(x). The error evaluator polynomial and the
Chien search find the error magnitude polynomial, Z(x), and the error locations. The last
step generates errors, e(x), and error corrections, r(x)+e(x).
The decoder core introduces latency since it takes time to generate the errors. The
issue of decoder latency is not critical in this implementation since the DAVIC 1.2
specification does not specify RS decoder latency. Latency is the time that is required for
the data to flow through the decoder and is measured in symbol clock cycles. A certain
delay time is needed for the received codeword, r(x), to be aligned with and corrected by
the errors, because the errors are applied at their locations in r(x). This delay is shown in
the delay block of Figure 4.3. This delay (or correction time of the decoder) is the same
regardless of the number of errors in r(x) because all of the received codewords must go
through the same number of steps in the decoding process. The input and output rates of
the decoder are one byte per clock cycle. The implementation of the four decoding steps is
now described in detail.
4.4.1 Syndrome calculation
The syndrome calculator is similar to the encoder and its outputs take a value of
zero if there is no error in the received codeword. There are 16 syndromes that have to be
computed to correct 8 symbol errors. The syndrome, S_i, is found by substituting the root,
α^i, of the generator polynomial, g(x), into the received polynomial, r(x), or S_i = r(α^i). The
syndrome, S_i, can also be computed by dividing r(x) by x + α^i [11]. This division results in
the equality:
$$r(x) = c(x)(x + \alpha^i) + b_i, \tag{4.4}$$
where c(x) is here the quotient of the division and the remainder, b_i, is a constant in
GF(2^8). Substituting x = α^i into Equation (4.4) makes the factor (x + α^i) vanish, so the
syndrome, S_i, is the remainder, b_i. This division is accomplished using the circuit shown
in Figure 4.4.
The multiplication in the syndrome evaluation is implemented using XOR gates,
which is the same as in the encoder because the roots of the generator, α^i, are known (see
Section 3.2). Since the codeword length is 255, the syndrome calculation requires 255
clock cycles to complete. This is the first latency of the decoder core. The delay
introduced by this evaluation is the longest delay in the decoding process even though
this is the simplest step in the process. The syndromes are sent out and held during
255 clock cycles at the end of each received codeword, r(x). The syndrome evaluation
continues at the beginning of the next available r(x).
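The register-per-root evaluation described above can be sketched in software. This is a minimal sketch, assuming the field generator polynomial x^8 + x^4 + x^3 + x^2 + 1 quoted in Section 4.4.4.1, α = 2 as the primitive element, roots α^0 to α^15, and the highest-degree symbol arriving first. The root range, the symbol order and the names `syndromes` and `gf_mul` are illustrative assumptions, not taken from the thesis.

```python
PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed field generator polynomial)

# Antilog/log tables for GF(2^8) with alpha = 2 (assumption).
EXP = [0] * 510
LOG = [0] * 256
value = 1
for n in range(255):
    EXP[n] = EXP[n + 255] = value
    LOG[value] = n
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(received, n_syn=16, first_root=0):
    """Return [r(alpha^first_root), ..., r(alpha^(first_root + n_syn - 1))]."""
    regs = [0] * n_syn
    for symbol in received:  # one symbol per clock, highest degree first
        for i in range(n_syn):
            root = EXP[(first_root + i) % 255]
            # Register i is multiplied by its fixed root, then the input is added.
            regs[i] = gf_mul(regs[i], root) ^ symbol
    return regs


clean = [0] * 255            # the all-zero word is a valid codeword
corrupted = list(clean)
corrupted[10] = 0x53         # hypothetical single error at degree 254 - 10 = 244
S_clean = syndromes(clean)
S_bad = syndromes(corrupted)
```

In this sketch each pass of the outer loop corresponds to one symbol clock, so a 255-symbol word takes 255 iterations, which mirrors the 255-cycle latency stated in the text.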
Figure 4.4 Syndrome calculation (legend: constant multiplier by α^i; D flip-flop, clock not shown)
4.4.2 Error locator polynomial calculation
There are two main algorithms to evaluate the error location polynomial: the
Berlekamp-Massey algorithm [11] and the Euclidean algorithm [34]. Appendix C describes
the algorithms in detail. Both of the algorithms are designed to solve a set of 16 equations
set by the syndromes. The solutions yield the error pattern, in which the smallest number
of errors is the right solution [11]. The Euclidean algorithm tends to be more widely used
in practice because it is easier to implement. However, the B-M algorithm tends to lead to
more efficient hardware and software implementations.
This implementation utilizes a modified algorithm, the division-free B-M
algorithm [43], to find the error location polynomial, σ(x). This modified algorithm
avoids the 16 divisions needed to evaluate σ(x) in the traditional B-M algorithm. It reduces
the decoder complexity because the implementation of a division in GF(2^8) is hardware
intensive. The details of this division-free algorithm are described in [44] and are briefly
stated as follows:
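Set σ^(0)(x) = 1, [two further initial values, illegible in the source], and L_0 = 0.
For r = 1 to 16, compute the update matrix of iteration r [illegible in the source], in which the selector takes the value
1, if (d_r ≠ 0) and 2L_(r-1) ≤ (r - 1),
0, otherwise.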
After 16 iterations, the error locator polynomial σ(x) = σ_0 + σ_1 x + σ_2 x^2 + ... + σ_8 x^8 is
obtained. In each iteration, the matrix in the algorithm has to be solved to find σ_r(x) and
its auxiliary polynomial. The iteration yields the coefficients of these two polynomials and
passes them to the next iteration. The degree of these two polynomials increases as the
iteration proceeds. As a result, the complexity of the implementation depends on the
iteration index (i.e., the first step is much simpler than the sixteenth step).
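The iteration can be illustrated with one widely published inversion-free form of the Berlekamp-Massey recursion. The sketch below is an assumption about the general shape of such an algorithm and is not claimed to be identical to the formulation in [43,44]; the field polynomial 0x11D, α = 2, syndromes indexed from α^0, and all function names and test data are likewise assumptions. The returned polynomial is a nonzero scalar multiple of σ(x), which leaves its roots unchanged.

```python
PRIM = 0x11D  # assumed field generator polynomial x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 510
LOG = [0] * 256
value = 1
for n in range(255):
    EXP[n] = EXP[n + 255] = value
    LOG[value] = n
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_eval(poly, x):
    """Evaluate poly[0] + poly[1] x + ... by Horner's rule."""
    acc = 0
    for coeff in reversed(poly):
        acc = gf_mul(acc, x) ^ coeff
    return acc


def division_free_bm(S):
    """Return (sigma, L): a scaled error locator and its degree, with no field division."""
    n = len(S)
    sigma = [1] + [0] * n        # current locator estimate
    aux = [1] + [0] * n          # auxiliary polynomial
    L, gamma = 0, 1              # register length and last nonzero discrepancy
    for r in range(1, n + 1):
        d = 0                    # discrepancy for iteration r
        for j in range(L + 1):
            d ^= gf_mul(sigma[j], S[r - 1 - j])
        shifted = [0] + aux[:-1]                     # x * aux(x)
        new_sigma = [gf_mul(gamma, s) ^ gf_mul(d, b)
                     for s, b in zip(sigma, shifted)]
        if d != 0 and 2 * L <= r - 1:
            aux, L, gamma = sigma, r - L, d          # length change
        else:
            aux = shifted
        sigma = new_sigma
    return sigma[:L + 1], L


# Hypothetical test pattern: symbol degree -> error value.
errors = {3: 0x11, 100: 0xA5, 200: 0x07}
S = [0] * 16
for k in range(16):
    for pos, val in errors.items():
        S[k] ^= gf_mul(val, EXP[(k * pos) % 255])
sigma, L = division_free_bm(S)
```

Each pass multiplies and adds only; the division of the classical recursion is replaced by scaling the current estimate with the previously stored discrepancy.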
In this step of the decoding process, a number of multipliers in GF(2^8) are required
to implement the algorithm. Many efficient multipliers have been reported [45-49].
During the research, a low latency multiplier was developed and used in the decoder.
This GF(2^8) multiplication circuitry has the same complexity as the LSB-first multiplier
described in [45] but the circuit latency is lower (see Table 6.2). The algorithm for this
multiplier is described in the following example.
The multiplication process is carried out in two steps for two arbitrary elements
A(x) and B(x) in GF(2^8). First the product D(x) = A(x)B(x) is computed and then the
modular reduction, P(x) = D(x) mod G(x), is performed. The reduction is taken modulo
G(x), the field generator polynomial of GF(2^8). The polynomial D(x) has a
degree of 14 and its coefficients can be found using:
$$d_k = \sum_{i+j=k} a_i b_j, \quad 0 \le k \le 14, \tag{4.5}$$
and the coefficients of the product P(x) are:
$$p_i = \sum_{k=0}^{14} g_{i,k}\, d_k, \quad 0 \le i \le 7, \tag{4.6}$$
where g_{i,k} = 0 or 1 is the coefficient generated by the field generator polynomial, i.e.,
α^k = g_{0,k} + g_{1,k} α + ... + g_{7,k} α^7. These coefficients are in the first 14 rows of
Table B.1 in Appendix B, where i is the column number and k is the row number.
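A complete description of the GF(2^8) multiplier implementation is as follows. Let
A(x) = a_0 + a_1 α + a_2 α^2 + a_3 α^3 + ... + a_7 α^7
and B(x) = b_0 + b_1 α + b_2 α^2 + b_3 α^3 + ... + b_7 α^7
be two elements to be multiplied in GF(2^8), with product P(x) = A(x)B(x).
Using Equation (4.5), the coefficients d_0 to d_14 of the polynomial D(x) can be
found as:
$$\begin{aligned}
d_0 &= a_0 b_0 \\
d_1 &= a_0 b_1 + a_1 b_0 \\
&\;\;\vdots \\
d_7 &= a_0 b_7 + a_1 b_6 + \cdots + a_7 b_0 \\
d_8 &= a_1 b_7 + a_2 b_6 + \cdots + a_7 b_1 \\
&\;\;\vdots \\
d_{14} &= a_7 b_7
\end{aligned}$$
and the modulus product P(x) = p_0 + p_1 α + p_2 α^2 + p_3 α^3 + ... + p_7 α^7 has the coefficients, p_i,
according to Equation (4.6) with the field generator polynomial x^8 + x^4 + x^3 + x^2 + 1:
$$\begin{aligned}
p_0 &= d_0 + d_8 + d_{12} + d_{13} + d_{14} \\
p_1 &= d_1 + d_9 + d_{13} + d_{14} \\
p_2 &= d_2 + d_8 + d_{10} + d_{12} + d_{13} \\
p_3 &= d_3 + d_8 + d_9 + d_{11} + d_{12} \\
p_4 &= d_4 + d_8 + d_9 + d_{10} + d_{14} \\
p_5 &= d_5 + d_9 + d_{10} + d_{11} \\
p_6 &= d_6 + d_{10} + d_{11} + d_{12} \\
p_7 &= d_7 + d_{11} + d_{12} + d_{13}
\end{aligned}$$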
Figure 4.5 depicts a module of the multiplier. In this multiplier, the total gate
count is 64 AND gates (one per partial product a_i b_j) and 77 XOR gates. However, the
number of XOR gates in the VLSI implementation is reduced due to the gate combination
repetition. Total delay for this multiplier is D_M = D_A + 6D_X, where D_A and D_X are the
delays of the AND gate and the XOR gate, respectively. The longest delay in the first step
comes from the calculation of the coefficient with the most partial products, d_7.
Figure 4.5 Multiplier in GF(2^8)
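The two-step multiplication and the quoted gate count can be checked with a short bit-level model. This is a sketch under the assumption that the field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1 (as quoted in Section 4.4.4.1) and that no XOR sharing is applied; `gf_mul_two_step`, `gf_mul_ref` and `gate_counts` are hypothetical names introduced for illustration.

```python
PRIM = 0x11D  # assumed field generator polynomial x^8 + x^4 + x^3 + x^2 + 1

# ALPHA[k] holds alpha^k for k = 0..14; bit i of ALPHA[k] plays the role of g_{i,k}.
ALPHA = []
value = 1
for k in range(15):
    ALPHA.append(value)
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def gf_mul_two_step(a, b):
    """Step 1: d_k = XOR of a_i AND b_j over i + j = k. Step 2: reduce with g_{i,k}."""
    d = [0] * 15
    for i in range(8):
        for j in range(8):
            d[i + j] ^= ((a >> i) & 1) & ((b >> j) & 1)
    p = 0
    for k in range(15):
        if d[k]:
            p ^= ALPHA[k]
    return p


def gf_mul_ref(a, b):
    """Reference shift-and-add multiplier used only to cross-check the model."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return result


def gate_counts():
    """Count AND and XOR gates of the unoptimized two-step structure."""
    and_gates = 8 * 8
    # d_k sums min(k, 14 - k) + 1 partial products, needing one XOR fewer.
    xor_step1 = sum(min(k, 14 - k) for k in range(15))
    # p_i sums every d_k whose alpha^k has bit i set, again one XOR fewer.
    xor_step2 = sum(sum((ALPHA[k] >> i) & 1 for k in range(15)) - 1
                    for i in range(8))
    return and_gates, xor_step1 + xor_step2
```

Under these assumptions the count comes to 49 XOR gates in the first step and 28 in the second, which agrees with the 64 AND and 77 XOR gates stated in the text. The deepest paths are an 8-input XOR tree for d_7 and a 5-input tree in the reduction, three XOR levels each.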
The evaluation of the error locator polynomial, σ(x), accounts for most of the
hardware in the decoder. An extra latency of the decoder core comes from this step
because of the 16 iterations, which involve many multiplications. A total of 20 clock
cycles was needed to complete this computation when the decoder was implemented in
the FPGA prototype.
The error locator polynomial, σ(x), has a degree of eight or less depending on the
number of errors in r(x). All of its eight coefficients are elements of the finite field
GF(2^8), including zero. These coefficients are used to compute the error magnitude
polynomial, Z(x), and to find the error locations in the next step.
4.4.3 Error magnitude polynomial calculation and Chien search
The error magnitude polynomial, Z(x), is defined as:
$$Z(x) = \sigma(x)\, S(x) \bmod x^{16},$$
and is of degree 7. This polynomial conveys the values of the errors, e(x), and it is the
product of two polynomials over GF(2^8), as stated in its definition. The
coefficients of Z(x) are computed by the convolution of the coefficients of σ(x) and S(x)
using:
$$Z_i = \sum_{j=0}^{i} \sigma_j S_{i-j}, \quad \text{for } 0 \le i \le 7,$$
$$Z(x) = \sigma_0 S_0 + (\sigma_0 S_1 + \sigma_1 S_0)x + (\sigma_0 S_2 + \sigma_1 S_1 + \sigma_2 S_0)x^2 + (\sigma_0 S_3 + \sigma_1 S_2 + \sigma_2 S_1 + \sigma_3 S_0)x^3 + \cdots .$$
This calculation requires several multiplications. These multiplications are carried
out using the multiplier circuitry described previously. The evaluation of the
coefficients, Z_i, is performed in separate circuits as soon as the error locator polynomial
coefficients, σ_i, are available. The higher the index i of Z(x), the longer it takes
to evaluate the coefficient, Z_i, because more products are summed. Additional decoder
latency is added in this step due to the time required to evaluate the coefficients of Z(x).
In the FPGA implementation, only one clock delay was added to the decoder latency.
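The convolution for Z_i can be written out directly. The following is a minimal sketch, assuming the field polynomial 0x11D with α = 2 and syndromes indexed from S_0; the function name `error_magnitude_poly` and the single-error test values are illustrative assumptions.

```python
PRIM = 0x11D  # assumed field generator polynomial x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 510
LOG = [0] * 256
value = 1
for n in range(255):
    EXP[n] = EXP[n + 255] = value
    LOG[value] = n
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def error_magnitude_poly(sigma, S, t=8):
    """Z_i = sum over j = 0..i of sigma_j * S_(i-j), for 0 <= i <= t - 1."""
    Z = []
    for i in range(t):
        acc = 0
        for j in range(min(i, len(sigma) - 1) + 1):
            acc ^= gf_mul(sigma[j], S[i - j])
        Z.append(acc)
    return Z


# Hypothetical single error of value e at locator X = alpha^37.
e, X = 0x53, EXP[37]
S = [gf_mul(e, EXP[(37 * k) % 255]) for k in range(16)]
sigma = [1, X]                      # sigma(x) = 1 + X x
Z = error_magnitude_poly(sigma, S)  # collapses to the constant e
```

Coefficient Z_i needs i + 1 products, which is why the higher-index coefficients take longer to form in hardware.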
The roots of the error locator polynomial are found by exhaustively evaluating
σ(x) at x = α^i for i = 1 to 254. This technique is referred to as a Chien search [11]. Figure
4.6 depicts the block diagram for the Chien search. In this search, the error locator
polynomial, σ(x), is evaluated in every clock cycle and tested to see if it is zero. The zero
detector generates the roots of the polynomial.
Figure 4.6 Chien search algorithm (recovered block labels: Zero; Generator)
In this search, the eight coefficients of the error locator polynomial are cyclically
multiplied and summed to find the zero. If there is a zero, there is a root of the
polynomial, σ(x), at that position of the received codeword, r(x). The output of the sum
and zero block is sent to the root generator to generate one of the roots of the polynomial
when the root is detected. The multiplications and the zero test of the sum are evaluated
within one clock cycle. The results of the multiplication are fed back and multiplied to the
next degree of α^i (i.e., α^(i+1)) through a shift register. There are eight multipliers in the
search and these multipliers are implemented using XOR gates as described in Section 3.2
since these constants are known (i.e., α, α^2, α^3, ..., α^8). The roots are elements of GF(2^8)
and are sent to the error generator block of the decoding process. The inverses of the
roots in the field are the error locations in the received codeword, r(x).
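The feedback structure of the search can be modelled with one register per coefficient. This is a minimal sketch, assuming the field polynomial 0x11D with α = 2; it also tests i = 0 so that an error in the lowest-degree symbol is covered, which is a choice made here for completeness. The name `chien_search` and the example locators are assumptions.

```python
PRIM = 0x11D  # assumed field generator polynomial x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 510
LOG = [0] * 256
value = 1
for n in range(255):
    EXP[n] = EXP[n + 255] = value
    LOG[value] = n
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def chien_search(sigma, n=255):
    """Return every exponent i in 0..n-1 for which sigma(alpha^i) == 0."""
    regs = list(sigma[1:])       # register j holds sigma_j * alpha^(j * i)
    roots = []
    for i in range(n):           # one candidate per clock cycle
        total = sigma[0]
        for reg in regs:
            total ^= reg         # sum block
        if total == 0:           # zero detect
            roots.append(i)
        # Feedback: register j is multiplied by the constant alpha^j.
        regs = [gf_mul(reg, EXP[j + 1]) for j, reg in enumerate(regs)]
    return roots


# Hypothetical two-error locator sigma(x) = (1 + X1 x)(1 + X2 x).
X1, X2 = EXP[5], EXP[200]
sigma = [1, X1 ^ X2, gf_mul(X1, X2)]
roots = chien_search(sigma)
locations = sorted((255 - i) % 255 for i in roots)  # inverses of the roots
```

The last line reflects the statement in the text that the error locations are the inverses of the roots.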
4.4.4 Error value generation
The error value generator takes the error magnitude polynomial, Z(x), and the roots
(if any) of the error location polynomial to generate the errors. The corrected codeword,
c(x), is computed in the decoder using the following algorithm:
For i = 0 to 254
    if (σ(α^(-i)) == 0) then c_i = r_i + Z(α^(-i)) / σ'(α^(-i))
    else c_i = r_i
in which r(x) is the decoder received vector, Z(x) is the error magnitude polynomial,
σ'(x) is the derivative of σ(x) and α^(-i) identifies the error location.
In the algorithm, the condition is set and satisfied by the Chien search when it finds a
root of σ(x) at position α^(-i). The division results in the error values, e(x). These are the
errors that were added by the channel and that corrupt the received codeword at that
position of r(x). In the calculation, the derivative, σ'(x), is simply the odd terms of σ(x), i.e.,
$$\sigma'(x) = \sigma_1 + \sigma_3 x^2 + \sigma_5 x^4 + \sigma_7 x^6 .$$
The error is generated by multiplying Z(α^(-i)) and the inverse of σ'(α^(-i)). The
decoder has to generate each error within one clock cycle before receiving another root
from the Chien search. One way to increase the throughput of the decoder is to reduce
the latency of this division. The faster the calculation, the shorter
the symbol clock of the decoder can be. A low latency circuit to perform this division is
necessary for a successful implementation of the high-speed RS decoder used in
MMDS systems.
The error generation includes the evaluation of Z(α^(-i)) and σ'(α^(-i)) and the division
between the two, as shown in the algorithm. The evaluations of Z(α^(-i)) and the odd term
σ'(α^(-i)) take place simultaneously as soon as the root, α^(-i), is available from the Chien
search. These evaluations are realized using the low latency power-sum circuit described
in Section 4.4.4.1 below. After the evaluation at α^(-i), the division of the two is performed.
The division between two elements in the finite field is the multiplication of the
first one by the inverse of the second. The delay of the error evaluation includes the time to
invert the odd term, σ'(α^(-i)), and multiply it by Z(α^(-i)). Since the error is generated
within one clock, this evaluation must be complete in one clock period.
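The division step can be sketched as follows. The sketch assumes the field polynomial 0x11D with α = 2 and makes the first root of the generator polynomial a parameter: with roots starting at α^1 the error value is exactly Z(α^(-i)) / σ'(α^(-i)) as in the text, while with roots starting at α^0 (assumed as the default here) the same quotient is scaled by α^i. The function names and the test pattern are assumptions.

```python
PRIM = 0x11D  # assumed field generator polynomial x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 510
LOG = [0] * 256
value = 1
for n in range(255):
    EXP[n] = EXP[n + 255] = value
    LOG[value] = n
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def poly_eval(poly, x):
    acc = 0
    for coeff in reversed(poly):
        acc = gf_mul(acc, x) ^ coeff
    return acc


def error_values(sigma, Z, locations, first_root=0):
    """Return {location: error value} from Z and the odd terms of sigma."""
    # Formal derivative in characteristic 2: only odd-index coefficients survive.
    deriv = [sigma[j] if j % 2 == 1 else 0 for j in range(1, len(sigma))]
    out = {}
    for loc in locations:
        x_inv = EXP[(255 - loc) % 255]               # root alpha^(-loc)
        quotient = gf_div(poly_eval(Z, x_inv), poly_eval(deriv, x_inv))
        scale = EXP[(loc * (1 - first_root)) % 255]  # equals 1 when first_root == 1
        out[loc] = gf_mul(scale, quotient)
    return out


# Hypothetical two-error pattern: symbol degree -> error value.
errors = {5: 0x53, 200: 0x0A}
X = [EXP[loc] for loc in errors]
S = [0] * 16
for k in range(16):
    for loc, val in errors.items():
        S[k] ^= gf_mul(val, EXP[(k * loc) % 255])
sigma = [1, X[0] ^ X[1], gf_mul(X[0], X[1])]
Z = [0] * 8
for i in range(8):
    for j in range(min(i, 2) + 1):
        Z[i] ^= gf_mul(sigma[j], S[i - j])
recovered = error_values(sigma, Z, errors.keys())
```

A software model can afford the table-based `gf_div`; the hardware described in the text instead inverts the denominator with the circuit of Section 4.4.4.3 and multiplies.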
Inversion in Galois Fields is always more complex to implement and takes
much longer to compute than multiplication. The theoretical and VLSI architecture
developments of the inversion circuit in GF(2^8) are described in Section 4.4.4.3. Since
the error generation requires inversion, the latency of the inversion circuit sets the error
generation time. This time is critical in the decoding process because it determines the
decoder clock cycle. Therefore the lower the latency of the inversion circuit, the faster the
decoder can operate.
The evaluations of Z(α^(-i)) and σ'(α^(-i)) are implemented using a newly developed
power-sum circuit (P = C + AB^2) in GF(2^8) [50]. The use of this circuit significantly
reduces the cost of the decoder core in both hardware and latency. To evaluate
Z(x) at the root α^(-i) (or σ'(x) at the root α^(-i)), expand
$$Z(x) = Z_0 + Z_1 x + Z_2 x^2 + Z_3 x^3 + Z_4 x^4 + Z_5 x^5 + Z_6 x^6 + \cdots$$
as
$$Z(x) = (Z_0 + Z_2 x^2) + x(Z_1 + Z_3 x^2) + x^4(Z_4 + Z_6 x^2) + \cdots$$
Replace x = α^(-i) and apply the power-sum circuits, P = C + AB^2. The number of
multipliers and exponentiations in the computation is also reduced. In addition, the
computation of the powers 2^n of an element in GF(2^8) can be implemented using very few
XOR gates with minimum delay. The power-sum and exponential circuits are described
in the next two sections.
4.4.4.1 Low latency power-sum circuit in GF(2^8)
The power-sum circuit used in the decoder core has low complexity and low
latency according to [50]. The algorithm to compute the power-sum can be summarized
as follows:
Let A(x), B(x) and C(x) be three arbitrary elements for which the power-sum
P(x) = C(x) + A(x)B^2(x) is to be found.
To obtain P(x), first compute the power sum operation D(x) = C(x) + A(x)B^2(x) and
then perform the modular reduction operation P(x) = D(x) mod G(x). In GF(2^8),
B^2(x) = B(x^2) and the power sum operation becomes the task of finding the coefficients,
d_k, of D(x) using:
$$d_k = c_k + \sum_{i+2j=k} a_i b_j, \quad 0 \le k \le 21, \quad c_k = 0 \text{ for } k > 7. \tag{4.8}$$
Then the coefficients, p_i, of P(x) can be computed using:
$$p_i = \sum_{k=0}^{21} g_{i,k}\, d_k, \quad 0 \le i \le 7, \tag{4.9}$$
where g_{i,k} = 0 or 1 is the coefficient generated by the field generator polynomial, p(x),
and p(x) = x^8 + x^4 + x^3 + x^2 + 1. The coefficients, g_{i,k}, can be seen in the first 14 rows of
Table B.1. The algorithm for the power sum circuit is closely related to the one used in
the multiplier circuit in Section 4.4.2.
Using Equation (4.8), the coefficients d_0 to d_21 of D(x) are obtained term by term, and
the coefficients of P(x) then follow from Equation (4.9).
The schematic diagram for this circuit is similar to Figure 4.5 in Section 4.4.2, except
that D(x) now has 22 coefficients and the product, P(x), has 8 coefficients of 0 or 1.
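The power-sum operation can be modelled at bit level in the same way as the multiplier. This is a sketch assuming the field generator polynomial p(x) = x^8 + x^4 + x^3 + x^2 + 1 given above; `power_sum`, `gf_mul_ref` and `ALPHA` are hypothetical names, and the reference multiplier exists only to cross-check the result.

```python
PRIM = 0x11D  # p(x) = x^8 + x^4 + x^3 + x^2 + 1

# ALPHA[k] holds alpha^k for k = 0..21; bit i of ALPHA[k] plays the role of g_{i,k}.
ALPHA = []
value = 1
for k in range(22):
    ALPHA.append(value)
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def power_sum(a, b, c):
    """Compute P = C + A * B^2 in two steps: form D(x), then reduce it."""
    # Step 1: B^2(x) = B(x^2), so bit b_j contributes at degree 2j.
    d = [(c >> k) & 1 for k in range(8)] + [0] * 14   # 22 coefficients
    for i in range(8):
        for j in range(8):
            d[i + 2 * j] ^= ((a >> i) & 1) & ((b >> j) & 1)
    # Step 2: modular reduction using the g_{i,k} columns.
    p = 0
    for k in range(22):
        if d[k]:
            p ^= ALPHA[k]
    return p


def gf_mul_ref(a, b):
    """Reference shift-and-add multiplier used only for checking."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return result
```

The highest degree reached in step 1 is 7 + 2 × 7 = 21, which matches the 22 coefficients of D(x) mentioned in the text.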
4.4.4.2 Low latency exponential circuit in GF(2^8)
The following is a simple computation for the powers 2^n of an element in the finite
field GF(2^8), which was developed in the research. Let
β = b_0 + b_1 α + ... + b_6 α^6 + b_7 α^7
be an element in GF(2^8). The coefficients of β^(2^n) can be found using:
$$\beta^{2^n} = \sum_{k=0}^{7} b_k\, \alpha^{k 2^n}. \tag{4.10}$$
For higher values of n, the powers repeat (i.e., β^(2^8) = β, β^(2^9) = β^2, β^(2^10) = β^4, β^(2^11) = β^8, ...).
These powers are uniquely expressed in terms of the coefficients, b_k, of the element
β. Fast calculations of the coefficients using Equation (4.10) are implemented in VLSI
using only XOR gates. The number of XOR gates is reduced due to gate combination
repetitions. The maximum delay in this computation is 3D_X, where D_X is the XOR gate delay.
The example below shows the implementation to calculate β^4 in GF(2^8).
According to Equation (4.10), the expression for β^4 is:
$$\beta^4 = b_0 + b_1\alpha^4 + b_2\alpha^8 + b_3\alpha^{12} + b_4\alpha^{16} + b_5\alpha^{20} + b_6\alpha^{24} + b_7\alpha^{28}.$$
Replace α^8 = (1 + α^2 + α^3 + α^4), α^12 = (1 + α^2 + α^3 + α^6 + α^7), α^16 = (α^2 + α^3 + α^6), ... from the
table of GF(2^8) in β^4 and combine terms with the same powers of α:
β^4 = (b_0⊕b_2⊕b_3⊕b_6) + (b_6)α + (b_2⊕b_3⊕b_4⊕b_5⊕b_6)α^2 + (b_2⊕b_3⊕b_4⊕b_6⊕b_7)α^3 +
(b_1⊕b_2⊕b_5⊕b_7)α^4 + (b_5)α^5 + (b_3⊕b_4)α^6 + (b_3⊕b_5⊕b_6)α^7
The schematic diagram for this circuit is shown in Figure 4.7.
Figure 4.7 Schematic diagram for the β^4 circuit in GF(2^8).
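Because raising to a power 2^n is linear over GF(2), the XOR network for any such power can be derived mechanically from Equation (4.10). The sketch below assumes the field polynomial x^8 + x^4 + x^3 + x^2 + 1 with α = 2; `exp_xor_network` and `apply_network` are hypothetical helper names.

```python
PRIM = 0x11D  # assumed field generator polynomial x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 255
value = 1
for n in range(255):
    EXP[n] = value
    value <<= 1
    if value & 0x100:
        value ^= PRIM


def gf_mul_ref(a, b):
    """Reference shift-and-add multiplier used only for checking."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return result


def exp_xor_network(n):
    """For beta^(2^n): list, per output bit i, the input bits b_k that are XORed."""
    columns = [EXP[(k * 2 ** n) % 255] for k in range(8)]   # alpha^(k * 2^n)
    return [[k for k in range(8) if (columns[k] >> i) & 1] for i in range(8)]


def apply_network(network, beta):
    """Evaluate the XOR network on the bits of beta."""
    out = 0
    for i, taps in enumerate(network):
        bit = 0
        for k in taps:
            bit ^= (beta >> k) & 1
        out |= bit << i
    return out


net4 = exp_xor_network(2)                       # network for beta^4
max_inputs = max(len(taps) for taps in net4)    # widest XOR in the network
```

For β^4 the widest XOR has five inputs, which a tree of two-input gates resolves in three levels, consistent with the 3D_X delay quoted in the text.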
4.4.4.3 Low latency inversion and division circuits in GF(2^8)
The high-speed operation of the decoder is obtained partly due to the
development of a low latency inversion circuit during the research. In the
past, considerable effort has been made to develop efficient schemes for finite field
inversion and division [49,51-55]. This low latency circuit was developed to increase the
throughput of the designed RS decoder. It is lower in both latency and complexity
compared to others [56]. The architecture of this circuit is described as follows.
The inversion of an element, B, in GF(2^8) can be expressed as:
$$B^{-1} = B^{254} = B^{2}\, B^{4}\, B^{8}\, B^{16}\, B^{32}\, B^{64}\, B^{128},$$
since B^255 = 1 for every nonzero B. This equation shows that the inverse computation can
be realized using exponentiation (B^(2^n)) and multiplication circuits. The fast
implementation of these two circuits results in the low latency of the inversion circuit. The
exponentiation of an element to a power of the form 2^n is easily implemented using only
XOR gates in VLSI, as shown in the previous section. This implementation yields a very
fast calculation and all the exponentiations can be computed simultaneously before they
are multiplied. Figure 4.8 shows the two steps of the low latency inversion circuit
architecture: the exponentiation calculation and the multiplication.
Figure 4.8 Low latency inversion and division architectures in GF(2^8) (recovered stage labels: Exponentiation; Multiplication)
The inversion process begins with the calculation of the exponentiations of
element B. Seven exponentiations (B^2, B^4, B^8, B^16, B^32, B^64, and B^128) must be evaluated
first. These exponentiations are computed using Equation (4.10), for example:
$$B^{128} = b_0 + b_1\alpha^{128} + b_2\alpha + b_3\alpha^{129} + b_4\alpha^{2} + b_5\alpha^{130} + b_6\alpha^{3} + b_7\alpha^{131}.$$
Replacing α^8 = (1 + α^2 + α^3 + α^4), α^10 = (α^2 + α^4 + α^5 + α^6), α^12 = (1 + α^2 + α^3 + α^6 + α^7), and
so on from Table B.1 for GF(2^8) in the expressions for B^(2^n), these exponentiations
eventually become combinations of XOR gates.
The maximum delay in this step is 3D_X, where D_X is the XOR gate delay. Using
this architecture, all of the exponentiation terms of the inverted element B are evaluated
simultaneously before the multiplication begins. In an actual VLSI design, the number of
XOR gates is reduced significantly (by 12%) due to repetition of the combinations of b_i's.
In the multiplication stage, six multipliers are required and they are in a three-step
consecutive arrangement as shown in Figure 4.8. The multiplication uses the multiplier
circuit described in Section 4.4.2. In this low latency architecture for the inversion, there
are only 3 consecutive multiplications instead of the 6 power-sum calculations of the
architecture in [52]. As a result, the latency of the inversion circuit is reduced
significantly while low complexity is maintained. Compared to the architecture proposed
in [52], there is a reduction of 25% in latency and 10% in hardware to implement the
inversion in GF(2^8) [56].
The total delay for this inversion circuit is t_I = t_1 + t_2, where t_1 is the delay for the
exponential calculations (3D_X) and t_2 is the delay of the multiplication stage, which is
3D_M (D_M is the delay of the multiplier described in Section 4.4.2). The total latency of
the inversion circuit is D_I = 3D_X + 3D_M.
Another multiplier performs the division between two elements C
and B in GF(2^8), as shown in Figure 4.8, since C/B = C · B^(-1). The delay for this division is
D_D = 3D_X + 4D_M. The division circuit is used to generate the error in the decoder as shown
in the error value generation algorithm.
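The exponentiate-then-multiply structure can be sketched as below. The pairing of the six multipliers into three levels is an assumption about the arrangement in Figure 4.8, repeated squaring stands in for the parallel XOR-only exponentiation circuits, and the field polynomial 0x11D and all function names are likewise assumptions.

```python
PRIM = 0x11D  # assumed field generator polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= PRIM
        b >>= 1
    return result


def gf_inv_low_latency(b):
    """B^(-1) = B^2 * B^4 * ... * B^128 using six multipliers in three levels."""
    powers = []
    x = b
    for _ in range(7):
        x = gf_mul(x, x)       # software stand-in for the XOR-only 2^n circuits
        powers.append(x)       # B^2, B^4, B^8, B^16, B^32, B^64, B^128
    p2, p4, p8, p16, p32, p64, p128 = powers
    # Level 1: three multipliers operate in parallel.
    m1, m2, m3 = gf_mul(p2, p4), gf_mul(p8, p16), gf_mul(p32, p64)
    # Level 2: two multipliers.
    m4, m5 = gf_mul(m1, m2), gf_mul(m3, p128)
    # Level 3: one multiplier gives B^254.
    return gf_mul(m4, m5)


def gf_div(c, b):
    """C / B = C * B^(-1): one additional multiplier after the inversion."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    return gf_mul(c, gf_inv_low_latency(b))
```

With this pairing the critical path is three multiplier delays for the inversion and four for the division, in line with D_I = 3D_X + 3D_M and D_D = 3D_X + 4D_M.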
4.4.5 Correction
The errors generated by the decoder core are used to correct the corrupted
received codeword, r(x). The codeword is delayed by the decoder latency using RAM.
The correction step in Figure 4.3 is very simple since the addition of two elements in
GF(2^8) is simply a bit-wise XOR operation between the two.
4.4.6 High-speed RS decoder design summary
In the decoding process, the time to generate errors in the last step is critical since
it determines decoder throughput. The computations of the error locator polynomial, σ(x),
and the error magnitude polynomial, Z(x), introduce delay because no clocking is required in
these steps. These two computations are complex and hardware intensive but they do not
determine the operating speed of the decoder. The multiplication and addition in the Chien
search are fast due to the implementation using only XOR gates. The simulation and
implementation results showed that this step takes only 60% of the time required for error
generation. Therefore the RS codec operating frequency depends on the error generation
time of the decoder core.
The majority of the time in the error generation is devoted to the evaluation of
Z(α^(-i)) and the inversion of the odd term σ'(x), as shown in the error generation algorithm.
Of the two, the inversion time is more critical than the evaluation of Z(α^(-i)). Using fast
power-sum, multiplication, exponentiation and division circuits, the error generation time
was 80 ns when it was implemented in the Altera FPGA prototypes.
The designed RS decoder achieved a data rate of 96 Mbs and had a latency of 278
clock cycles when it was implemented in an Altera FPGA. The decoder core applied
appropriate pipelining when the modules were connected. This pipelining added only two
more clock delays to the decoder. A data rate of 200 Mbs is expected when the codec is
implemented in an ASIC. For an operating frequency of 25 MHz, this ASIC will have a
latency of less than 12 µs.
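The figures quoted in this summary can be cross-checked with simple arithmetic. The sketch below assumes that the 278-cycle latency is the sum of the per-step latencies reported earlier (255 for the syndromes, 20 for the error locator, 1 for Z(x) and 2 for pipelining) and that one 8-bit symbol is processed per clock; the decomposition is an inference from the text and the names are illustrative.

```python
# Per-step latencies in symbol clock cycles, as reported in Sections 4.4.1 to 4.4.6.
STEP_CYCLES = {
    "syndrome calculation": 255,
    "error locator polynomial": 20,
    "error magnitude polynomial": 1,
    "pipelining between modules": 2,
}
BITS_PER_SYMBOL = 8


def latency_us(cycles, clock_hz):
    """Latency in microseconds for a given cycle count and clock frequency."""
    return cycles / clock_hz * 1e6


def data_rate_mbs(clock_hz):
    """Data rate in Mbs when one byte is processed per clock cycle."""
    return clock_hz * BITS_PER_SYMBOL / 1e6


total_cycles = sum(STEP_CYCLES.values())            # assumed decomposition of 278
asic_latency = latency_us(total_cycles, 25e6)       # about 11.12 microseconds
asic_rate = data_rate_mbs(25e6)                     # 200 Mbs
fpga_clock_hz = 96e6 / BITS_PER_SYMBOL              # 12 MHz symbol clock
fpga_period_ns = 1e9 / fpga_clock_hz                # about 83.3 ns per symbol
```

Under these assumptions the 80 ns error generation time fits inside the roughly 83 ns symbol period of the 96 Mbs FPGA result, and 278 cycles at 25 MHz stays below the 12 µs bound.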
The process of de-randomization in the receiver is the same as the randomization
in the transmitter. The PRBS generator is identical; therefore it is not necessary to develop
a separate de-randomizer. In the de-randomization process, the eight-bit symbol data is
serialized and sent to the input of a randomizer. A serial-to-parallel conversion is required
to convert the de-randomized bit stream back into an 8-bit symbol output.
Chapter 5
MMDS SYNCHRONIZATION USING GPS CLOCK
Timing and synchronization are critical in the design of any digital
communication system. Synchronization plays an important role since it ensures the
smooth transfer of information. The goal in synchronization is to align the time and
frequency scales of the clock so that every piece of equipment of the communication
network operates synchronously. This chapter describes the need for communication
system synchronization and the proposed architecture for the MMDS system
synchronization.
5.1 The need for synchronization
The topic of synchronization was introduced with the evolution of digital
communication and becomes more important when a higher transmission speed is
required. Synchronization is a serious challenge to modern communication systems to
ensure integrity of the transmitted data. Synchronization has been discussed in the
literature and in recent years the topic has become popular [19,29,57-61]. A
communication system can be classified as synchronous if there exists a time reference
common to both the transmitter and the receiver [58]. Analog systems are generally not
synchronous. If synchronization is required in an analog system, it is a requirement
imposed by the source, as in television transmission, not by the communication system
itself.
In digital communications, synchronization is required because the system has to
run at the same clock frequency and the symbols must be recognized and synchronized
for the various elements in the system to function
properly. Any multiplexing scheme must be truly synchronous throughout the network
with a single master clock defining the slot intervals for all frames. The frames are
constructed by interleaving at rates derived from this single clock, with all frames
aligned. When a digital communication system is to be operated in a large geographic
area, it is usually set up in a hierarchical arrangement (i.e., a network) and synchronization
becomes even more important. Network synchronization has recently become a popular
topic because standards must be set for the network to operate smoothly and to be cost
effective [57].
The block diagram in Figure 5.1 depicts a generalized communication system
model and the designed MMDS system uses the same structure. The source simply
represents the source of information to be transmitted. The data sampler converts the
random process to a random sequence by a sampling operation. The source encoder
serves as a device for mapping from data samples onto data words, that is, onto
sequences of digits or data symbols. The channel encoder is designed to add redundancy
to the digital sequence presented at its input for error correction purposes. The function
of the modulator is to convert the sequence of symbols at the encoder output into a
sequence of waveforms suitable for transmission over the communication channel. The
nature of the channel is generally assumed to be both wideband and time-invariant and
perturbed by noise. The channel noise is assumed to be a sample function of a white
Gaussian process. Each block on the receiver side of Figure 5.1 performs the inverse
operation of the corresponding block on the transmitter side.
Figure 5.1 Model of a communication system (recovered labels: Noise; Data Regenerator; Destination; RECEIVER)
In the communication model above, two sequences of events are said to be
synchronous if corresponding events in the two sequences occur simultaneously.
Synchronization is defined simply as the process of bringing about, or retaining, a
synchronous situation [29,58]. It is only necessary to identify one of the two sequences of
events to be synchronized, with one taking place at the transmitter and the other one
taking place at the receiver. In order for the two events to be synchronized, there must
exist a common time reference between the transmitter and receiver. Each block shown in
Figure 5.1 represents a specific synchronization constraint, i.e., a specific requirement
that the common time reference must satisfy.
The synchronization process can be divided into two modes. In the first
mode, the clock synchronization mode, the clocks that regulate the two sequences being
synchronized (i.e., the transmitter and the receiver clocks) are forced to run at the same
rate. In the second mode, the higher order synchronization mode, a corresponding pair of
events in the two sequences is identified and made to occur simultaneously. Clearly, if
the same event occurs in two identical sequences simultaneously, and if the sequences are
proceeding at the same rate, the sequences are, and will remain, synchronized.
If the transmitter and receiver clocks are both sufficiently stable relative to the
required synchronization accuracy, the clock synchronization mode may be bypassed.
However, when this is not the case, techniques must be devised to provide the needed
clock synchronization. Traditional methods of clock synchronization include
transmitting the transmitter clock signal along with the information, using the carrier
itself as a clock, and deriving the carrier frequency and phase from the data signal [60].
Once the transmitter and receiver clocks have been synchronized, the second
mode of the synchronization process begins. Events taking place in each of the blocks in
the receiver portion of Figure 2.1 must be synchronized with the corresponding events
taking place in the analogous block in the transmitter portion. Efficient demodulation
requires the demodulator to be synchronized with the modulator so as to know when the
waveform representing one sequence of digits (or symbol) ceases and the next one
begins. This is called symbol synchronization. The channel decoder cannot decode
correctly unless it can identify the beginning of each code word; this is called code word
synchronization. Similarly, the source decoder is useless unless the digits appearing at its
input can be separated into groups, one group corresponding to each data sample. This is
called data word synchronization. Since the significance of a particular data sample may
be defined only in terms of its position in a sequence of samples, sometimes called the
frame, the data regenerator must frequently be synchronized with the data samples;
this is called frame synchronization.
The higher synchronization mode is implemented in MMDS by inserting
synchronization and inverted synchronization bytes into the frame structure of the
MPEG2-TS. This synchronization scheme is sufficient for the second degree of
synchronization as specified in the DAVIC 1.2 specification. The clock synchronization is
more critical in the MMDS system, especially for high-speed operation. The fast
clock imposes a stringent condition on clock synchronization. As a result, a stable clock
with a high degree of precision is desired for the system clock synchronization. Finding
an efficient synchronization scheme for the high-speed MMDS system is the topic of this
part of the thesis. The use of a GPS clock was investigated to replace the crystal oscillator
for clock synchronization in the absence of SONET and SDH in the network. The system
clock frequency must be stable and traceable to a standard time as stated in Section
2.2.11. Receiving and using this precision GPS clock in synchronization are discussed
in the next section.
5.2 GPS clock derivation and application in a MMDS system
Imperfect clock tuning in the receiver of the system will degrade the performance
of the synchronization loop and, hence, the overall system's reliability and data
transmission quality. This section describes why the GPS clock is selected and how it is
used as an MMDS system clock.
5.2.1 GPS clock versus crystal oscillators
Traditionally, low-cost crystal oscillators have been used to generate reference
frequencies for synchronization. The use of crystal oscillators only works well for low
speed data transmission. The main disadvantage of using crystal oscillators is that the
frequency drifts due to temperature fluctuations, age and inaccuracy [62].
Even expensive crystal oscillators drift by a small amount each day and they must
be adjusted to maintain long-term reliability and accurate time. Maintaining this
reliability is a major problem [62]. To solve this dilemma, telecommunications
companies use a frequency reference distribution system, which is linked to an
atomic-reference source, to continuously steer the crystal oscillators to the reference time.
At low data speeds, crystal oscillators with reference steering provide adequate
synchronization accuracy and reliability.
An alternate solution is to install very high precision clocks at each terminal
location, but the cost of those atomic or cesium tube clocks is too high. In addition to the
long-term reliability problem, this is an expensive option, especially if redundant clocks
are needed. In addition to the drifting problem of the crystals, other solutions must be
sought for the independent crystal to be synchronized with the standard time of the higher
hierarchy in the network. This hierarchy synchronization requirement makes the use of a
crystal clock even more expensive. In contrast, an inexpensive GPS receiver may be
available at each base station to generate a stable local clock.
The use of a GPS clock has many advantages. It has high accuracy, high
reliability, high stability, worldwide access, precise time, low calibration cost, small size,
low power, low unit cost, and low installation and maintenance costs [62]. Receiving the
GPS clock becomes less expensive as the technology matures. In addition, the GPS clock
satisfies the clock tolerance of 50 ppm required in the DAVIC 1.2 specifications
[63]. As a result, instead of using a crystal oscillator to generate a system clock, a GPS
clock could be used in the high-speed MMDS system for clocking and synchronization
purposes.
The choice of using a GPS clock over the crystal oscillator in system
synchronization is also based on the issue of the reference time to which the clock is set.
The obvious choice of this reference time is UTC [64,65]. This feature is essential for the
MMDS system since it is connected to the higher hierarchy of the global communication
link. Therefore, another advantage of using a GPS clock is the traceability of this
clock to the international standard time as used by SONET and SDH. GPS signals are
available, but how they are received and converted into a useful clock for time
synchronization in the MMDS system is described in the next section.
5.2.2 GPS clock
GPS is a worldwide resource of unprecedented accuracy and precision for time
and position. Precise measurement of time and time intervals is at the heart of the GPS.
The entire system is based upon very accurate time as kept by atomic standards on board
each of the satellites, which are monitored and controlled by the US Naval Observatory
(USNO) [62,66]. The USNO Master Clock is the time and frequency standard for all of
these systems. Thus, this clock system must be at least one step ahead of the demands
made on its accuracy, and developments planned for the years ahead must be anticipated
and supported.
The Master Clock system now incorporates hydrogen masers, which in the short
term are more stable than cesium beam atomic clocks, and mercury ion frequency
standards [64]. These represent the most advanced technologies available to date. Highly
accurate portable atomic clocks have been transported aboard GPS satellites in order to
synchronize the time at Naval Bases and other Department of Defense facilities around
the world with the Master Clock. Accurate time synchronization with the Master Clock is
now beginning to be carried out through the use of atomic clocks in GPS satellites, which
will provide the primary means of time synchronization and worldwide time distribution
[67]. As a result, GPS receivers which are locked to the satellites can provide
the user with very accurate, inexpensive and traceable time. The received GPS frequency
exceeds Stratum 1 level requirements in the communications industry (0.3 µs in time and
10^-11 in frequency) [65]. With this precision, a GPS clock can be used for clocking
purposes in digital communication circuitry.
The GPS system consists of three parts: the space segment, the operational control
segment and the user equipment. The GPS constellation includes 24 satellites which are
in polar orbits. The clocks, or more appropriately, the frequency references, are carried
aboard the satellites and are used to generate signals with precise and synchronized
timing marks. Each satellite carries a pair of cesium and rubidium atomic standards
[64,68]. The frequency stability of these clocks over a day is about one part in 10^14 and
one part in 10^13 respectively. The satellite clocks are maintained in synchronism by
monitoring the signals from a network of tracking stations. These stations are operated by
the Department of Defense in the United States as part of the GPS operational control segment.
Each GPS satellite transmits continuously at two frequencies in the L band: 1575.42 MHz
(L1) and 1227.6 MHz (L2). These signals are modulated by a pseudorandom noise (PN)
code called the Coarse Acquisition (CA) code. The GPS signal format is known as direct
sequence spread spectrum [69]. The user equipment receives GPS signals for navigation
and timing purposes. A general GPS receiver block diagram is shown in Figure 5.2.
Figure 5.2 Generic GPS receiver block diagram [69]
The antenna normally is right-hand circularly polarized to match the incoming
signal, and the pattern is hemispherical. A well-designed GPS antenna must have a good
multipath-rejection characteristic [70]. The analog front end of the receiver involves
filtering, amplification and down conversion. After analog-to-digital conversion (ADC),
the baseband processor processes the digitized signal to provide the navigation and
timing information.
For timing purposes, a GPS clock receives signals, locks onto the GPS
frequency and locally regenerates a stable clock as shown in Figure 5.3. The GPS clock
used in this research to study the MMDS system synchronization was manufactured by
Absolute Time (GPS CLOCK(TM) MODEL 100A) [71].
Figure 5.3 Frequency based GPS clock [71]
The architecture of this GPS clock is different from the standard time-based GPS
receivers shown in Figure 5.4. The architecture is optimized for frequency applications,
which improves the timing performance of the clock. This GPS clock architecture is
frequency-based, not time-based. Instead of slaving an oscillator to the 1 Pulse
Per Second (PPS) output of a GPS receiver, the GPS clock slaves the oscillator of the
GPS receiver to the satellites and derives the 1 PPS from the locked oscillator frequency
output. The result is that the GPS clock output is more stable, more accurate and more
precise.
The operation of the GPS clock is as follows. A circularly polarized antenna
receives the CA code signals from the GPS satellites. The antenna module consists of an
L1 frequency antenna element and a preamplifier, and interfaces to the receiver via an
antenna cable. The GPS clock contains a processor, frequency generation hardware and
RF/IF circuits. The reference frequency for the system is a 10 MHz crystal oscillator.
The reference signal drives the PLL at 44.456 MHz. The RF frequency is downconverted
by the RF/IF circuitry with appropriate filtering and automatic gain control circuitry. The
analog IF signal is then digitized with a sample and hold circuit and a 3 bit ADC. Digital
data is sent to a DSP for processing. The DSP is implemented using Codelator ASICs. Each
Codelator performs all the correlation, signal processing and tracking of an individual
satellite. The DSP interfaces to the microprocessor for control and to output data.
Figure 5.4 Time based GPS clock generation
To lock onto a GPS satellite's frequency and time, the GPS clock operates in time
transfer mode. In this mode, the clock first surveys its location by tracking at least four
satellites. After the location is determined, only one satellite is required for timing tracking.
The clock measures the satellite frequency and adjusts its internal oscillator. Once the clock
jam-sets to the GPS satellite's time and frequency, the phase of the 1 PPS coincides with the
satellite time; the clock then closes the Phase Lock Loop and continues to lock the
oscillator to the satellite's frequency. In normal operation, the GPS reference signal
determines the long-term stability of the GPS clock frequency output. In the unlikely
event of satellite signal loss or interruption, the GPS clock enters an intelligent holdover
mode to maintain accuracy until the GPS signal reference is regained. In this mode, the
internal oscillator remains set to the last known frequency until the satellites are once
again acquired.
The Absolute Time GPS clock unit has a frequency accuracy of 1 part in 10^11
over a one day average, and 5 parts in 10^11 over a one week average [71]. The time
accuracy relative to the Coordinated Universal Time (UTC) is 300 ns (Selective
Availability (SA) on) and 100 ns (SA off). The stability of the frequency output (10 MHz)
is 1 part in 10^11 for averaging times from 0.1 to 100 seconds, and the time stability (1 PPS) is
less than 1 ns of pulse-to-pulse jitter, rms [71]. The clock now has real-time direct
traceability to the USNO and thereby ultimately to the internationally defined frequency
and time.
5.2.3 Using GPS clock in MMDS transceiver prototype
A 2 Vp-p sinewave output from the unit can be converted into a 10 MHz TTL
output to clock the MMDS system components using the comparator shown in Figure 5.5.
This simple circuit uses a zero-crossing detector to convert a sinewave into a TTL
squarewave. The LT1720 comparator made by Linear Technology is used for zero-crossing
detection. This high-speed comparator (4.5 ns) operates on a single +5 V supply and
provides a rail-to-rail output [72]. The internal design of this device minimizes
oscillations due to feedback because the sensitive inverting input is placed away from the
output and shielded by the power rail. In addition to the high stability of the device itself,
care has been taken in the layout of the PC board. A double-sided PC board was used for
the circuit of Figure 5.5 with appropriate grounding. The circuit was checked and tested to
show its performance and stability. In addition, the manufacturer's testing results show that this
device has a high degree of reliability [72].
The voltage divider (R1, R2) shown in the schematic diagram of Figure 5.5 is
required at the input since the maximum negative input of the comparator is -0.2 V. The
TTL output provided by the comparator is connected directly to clock the FPGA
prototype for system development.
Figure 5.5 GPS clock TTL output
5.3 MMDS system synchronization
Careful synchronization planning is necessary in both the wired and the wireless
worlds because the robustness of any communication network depends on its
synchronization. The two objectives in designing a synchronous system are clock and
word synchronization. All the elements in the network must run at the same clock rate
and words must be synchronized to ensure the integrity of transmitted data. The
synchronous system has to be simple, low cost, robust and reliable. It must also meet the
specification for synchronization time. In the MMDS system, the receiver uses the
transmit clock, which is derived from the received data clock, to clock all components.
To ensure adequate binary transitions for the clock recovery, the system data packet
(MPEG2-TS) is randomized as shown in Section 2.2.1. The synchronization byte and its
inversion are added into the transport stream for frame synchronization and they provide
the initialization signal for the de-randomization process. These synchronization bytes
provide the required system frame synchronization.
5.3.1 Clock synchronization
In the continuous-time world, establishing a common time base at physically
separated locations presents some serious challenges. Typical systems use independent
time bases, frequently derived from crystal oscillators. Although crystal oscillators
provide accurate time references at low cost, "accurate" is not adequate to maintain the
integrity of discrete-time data [60]. In addition, time references often must be identical, at
least in the sense of long term averages, within the system and in the communication link
hierarchy. In other words, a system must be synchronized within itself and with others. The
first step in the synchronization procedure is usually to slave the receiver and transmitter
clocks, thereby establishing a common clock reference throughout the system. Since the
receiver clock is derived from the transmitted signal, which carries clocking information,
clock recovery is required at the receiver. This recovered clock is
called "loop timed" since it comes from the transmitter. Many clock recovery schemes
have been developed in the past and described in the literature [19,58-61,73-76]. A few
are:
1) Carrier synchronization: This approach is used when coherent detection is
used; knowledge of both frequency and phase of the carrier signal is necessary. The
optimum receiver is a PLL using either a Costas loop or an m-th power loop [74].
2) Symbol synchronization: The information needed to establish symbol
synchronization is actually present in the message-bearing signal itself. Either a
clock is transmitted along with the data and then extracted at the receiver end, or the clock is
extracted by processing the demodulated baseband waveforms. This second approach avoids
wasting transmitter power.
3) Maximum-likelihood symbol synchronization: The maximum-likelihood
decision with respect to the symbol epoch is to accept the epoch that maximizes its
density function [74]. Depending on the modulation technique used in the transmission,
different symbol synchronizers will be used.
4) Tracking symbol synchronization: The maximum-likelihood method assumes
that the symbol period known to the receiver is stable, or else that the clock is slaved to
that of the transmitter. Any subsequent fluctuation in the symbol epoch will be reflected
in the receiver clock. Nevertheless, it is often advantageous to be able to track variations
in the symbol epoch directly without relying on the auxiliary clock. This is the scheme
for clock recovery used in the MMDS system.
DAVIC 1.2 specifies the timing for the MMDS network:
"The transmitter in the network device will use a transmit clock which is derived from the network clock (e.g. SONET clock, SDH clock, PON clock, ...) to allow end-to-end network synchronization. In the absence of a network clock, the network device will use a locally generated clock with a maximum tolerance of 50ppm. The transmitter in the user device will use a transmitter clock that is derived from its received data clock, i.e., the user device is loop timed. In the absence of a valid clock derived from the received data clock, the user will not perform any upstream access on the media" [4].
The use of a GPS clock meets the standard set above to provide the timing
required for the MMDS system as described in Section 5.2. Figure 5.6 depicts the system
clock synchronization configuration in which the GPS clock is generated at the base
station and transmitted to the multi-receivers. The clock is recovered at the receivers for
clock synchronization.
Figure 5.6 MMDS system synchronization using GPS clock
The clock recovery attempts to synchronize the receiver clock with the baseband
symbol rate transmitter clock [17]. The MMDS receivers use an early/late gate technique
to recover the clock from transmitted data. The clock is extracted by processing
demodulated baseband waveforms. Since the symbols must be distinguishable, it should
be possible to determine directly from the received sequence exactly when the transition
from one symbol to the next can take place. The use of baseband signals for clock
recovery avoids wasting transmitted power on a separate clock. Figure 5.7 shows the
early/late gate clock recovery.
Figure 5.7 Early/late-gate data symbol synchronizer [17]
In this clock recovery, correlators are used instead of equivalent matched
filters. Both correlators integrate over a full symbol interval T, with one starting ΔT
early relative to the estimated transition time and the other starting ΔT late. The error
signal, which is the difference of the absolute values of the two correlator outputs, is low-pass filtered.
The output of the low-pass filter is applied to a VCO that controls the charging and
discharging instants of the correlators. The closed loop design of the recovery circuit is
narrow band relative to the symbol rate 1/T. The instantaneous frequency of the local clock is
advanced or retarded in an iterative manner until the equilibrium point is reached, and
symbol synchronization is thereby established.
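The early/late-gate principle can be illustrated with a short numerical sketch. It is not the MMDS receiver design: the rectangular symbols, the 16 samples per symbol, the offset ΔT of 4 samples, the sign-only correction step and all function names are assumptions chosen for illustration. The error is taken as |early| minus |late|, so a positive value means the timing estimate is late.

```python
def make_samples(symbols, sps):
    """Rectangular baseband waveform: each symbol held for sps samples."""
    return [s for s in symbols for _ in range(sps)]


def early_late_error(samples, sps, tau, delta):
    """Mean of |early| - |late| for a timing estimate offset by tau samples.

    Both correlators integrate over one full symbol (sps samples); one starts
    delta samples before the estimated transition, the other delta after it.
    """
    total, count = 0.0, 0
    k = 1  # start at the second symbol so the early window stays in range
    while k * sps + tau + delta + sps <= len(samples):
        start = k * sps + tau
        early = sum(samples[start - delta:start - delta + sps])
        late = sum(samples[start + delta:start + delta + sps])
        total += abs(early) - abs(late)
        count += 1
        k += 1
    return total / count


def track(samples, sps, tau, delta, iterations=10):
    """Advance or retard the timing estimate one sample at a time."""
    for _ in range(iterations):
        err = early_late_error(samples, sps, tau, delta)
        if err > 0:
            tau -= 1  # estimate is late: advance the clock
        elif err < 0:
            tau += 1  # estimate is early: retard the clock
    return tau


# Alternating +1/-1 symbols give a transition at every symbol boundary.
samples = make_samples([1, -1] * 32, sps=16)
print(early_late_error(samples, 16, 0, 4))   # 0.0 at correct timing
print(early_late_error(samples, 16, 2, 4))   # positive when the estimate is late
print(track(samples, 16, 3, 4))              # settles at 0
```

With random data the same error signal is obtained on average, since only symbol boundaries that carry a transition contribute to it.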
For FPGA implementation, an early/late gate synchronizer Altera Megafunction(R)
has been developed [77]. The synchronizer is fundamentally a digital phase locked loop.
It provides phase lock between an internally generated data clock and an input data
stream. The synchronizer includes a phase detector, an up-down counter loop filter, and a
digitally controlled oscillator. The phase detector provides the error between the data
clock and the input data stream. The up-down counter accumulates the phase error output
according to its sign and magnitude. The digitally controlled oscillator advances or
retards the phase of the locally generated data clock whenever the error accumulator
exceeds a specific error threshold. This threshold is programmable, which allows the
synchronizer to change the acquisition time and data clock jitter.
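The effect of the programmable threshold can be seen in a small behavioural sketch of an up-down counter loop filter. This is not the Altera Megafunction: the sign-only phase detector, the one-step phase correction, the noise-free input and the function name are all assumptions made for illustration.

```python
def acquire(initial_offset, threshold, max_symbols=10_000):
    """Return the number of symbols needed to remove a phase offset.

    initial_offset is the clock phase error in oscillator steps. The up-down
    counter accumulates the sign of the error, and the phase is moved by one
    step each time the counter magnitude exceeds the threshold.
    """
    offset = initial_offset
    counter = 0
    for n in range(1, max_symbols + 1):
        # Phase detector (assumed sign-only): +1 late, -1 early, 0 aligned.
        error = (offset > 0) - (offset < 0)
        counter += error
        if counter > threshold:
            offset -= 1      # advance the local clock by one step
            counter = 0
        elif counter < -threshold:
            offset += 1      # retard the local clock by one step
            counter = 0
        if offset == 0:
            return n
    return None


print(acquire(3, threshold=1))   # 6 symbols
print(acquire(3, threshold=4))   # 15 symbols
```

A low threshold acquires quickly but lets individual noisy decisions move the clock, which shows up as jitter; a high threshold averages more decisions per correction at the cost of a longer acquisition time.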
The clock at the receiver is re-generated from the symbol clock. This clock is loop
timed to the transmitter clock, which is a UTC traceable GPS frequency and time. The use
of a GPS clock is much simpler and less expensive than the use of expensive independent
clocks. One GPS clock unit at the base station provides clocking for all of the receivers
within the coverage area of the MMDS system.
5.3.2 Frame synchronization
After the clock rate is recovered from the received signal, the higher order of
synchronization begins, providing the information necessary for the various components in
the system to operate synchronously. This synchronization includes coding and inserting
special symbols for word synchronization. The DAVIC 1.2 specification indicates the use of
synchronization and inverted synchronization bytes in the MPEG2-TS. This is discussed in
more detail in Section 2.2.
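A minimal sketch of how a receiver might search for the synchronization and inverted synchronization bytes is given below. The specific values are assumptions taken from the common MPEG-2 transport stream convention rather than from this chapter: 188-byte packets, a sync byte of 0x47, and its bitwise inverse 0xB8 on the first packet of every group of eight. The function name and the 16-packet confirmation window are hypothetical.

```python
SYNC = 0x47          # assumed MPEG-2 TS sync byte
INV_SYNC = 0xB8      # its bitwise inverse (0x47 XOR 0xFF)


def find_frame_start(stream, packet_len=188, group=8, packets_to_check=16):
    """Return the offset of the first inverted sync byte, or None.

    A candidate offset is accepted only if the expected sync pattern repeats
    at packet spacing over packets_to_check consecutive packets.
    """
    span = packet_len * packets_to_check
    for offset in range(len(stream) - span + 1):
        if all(
            stream[offset + j * packet_len]
            == (INV_SYNC if j % group == 0 else SYNC)
            for j in range(packets_to_check)
        ):
            return offset
    return None


# Build 24 packets with zero payloads, preceded by 5 bytes of junk that
# happen to contain sync-like values.
packets = b"".join(
    bytes([INV_SYNC if j % 8 == 0 else SYNC]) + bytes(187) for j in range(24)
)
stream = bytes([0x47, 0x00, 0xB8, 0x11, 0x22]) + packets
print(find_frame_start(stream))   # 5
```

Requiring the pattern over several packets is what rejects the isolated sync-like bytes in the junk prefix or in payload data.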
CHAPTER 6
RESULTS
This chapter presents the results of the system simulation using Matlab as well as
the hardware implementation of Figure 2.1 into Altera FPGA devices and the GPS clock
testing. Figure 6.1 shows the configuration set up for the MMDS system development
and testing. Descriptions of the equipment shown in the figure are as follows.
Figure 6.1 Equipment set up
Matlab and Simulink were installed in the computer for system simulation. The
Altera software (MaxPlus II(R) Version 9.3) was installed in the computer to compile
Verilog HDL code into FPGA program files. These files were used to program the FPGA
device through the Altera ByteBlaster cable connected between the computer and the
prototype board. Once programmed, the FPGA was configured as the function block
described in the Verilog code. The logic analyzer captured input and output waveforms of
the FPGA to verify the functionality of the building blocks at a specific speed.
The GPS antenna was mounted on the building's roof to receive GPS signals. The
RG-213 antenna cable carried the signals to the GPS clock unit. The GPS clock generated
a stable local clock for system clocking as described in Section 5.2.1. An RS-232 cable
was connected between the GPS clock unit and the computer for control and operation
monitoring. The TTL interface circuit board converted the clock into a TTL level output
to clock the Altera FPGA prototype board. The universal counter was used to measure
the GPS clock frequency and its stability. The following sections show the development
results obtained during the research.
6.1 System simulation
For system simulation, all of the system building blocks were built using a
combination of standard blocks in the Simulink library, basic logic gates and/or math
functions. These math functions could be in the Matlab or C programming language. Figure
6.2 shows the transceiver with an appropriate RF interface. This simulation set-up
corresponds to the communication system model of Figure 5.1 and the proposed MMDS
system. The simulation files are in Simulink (i.e., Matlab) format and they are included in
[SI-
The system starts with an information source. The source comes from an ADC
that digitizes an analog signal generated by the analog source. Digital data is encoded
with an RS encoder, a convolutional interleaver and a differential encoder. The encoded
data is then mapped into QAM constellations (64 or 256) to generate two quadrature
signals: I and Q. The I and Q outputs of the QAM mapping block are then sent to the
raised-cosine filters with an appropriate roll-off factor, α, and data interpolation for filtering.
The QAM modulator modulates the filtered I and Q signals before sending them to the
transmission channel.
Figure 6.2 Matlab and Simulink simulation setup
The receiver reverses the transmitter process as shown in Figure 6.2. The RF
signal is QAM demodulated to recover the quadrature I and Q signals. These signals are
then filtered before being sent to the QAM de-mapping. The last process of the
receiver is the FEC decoding to correct any data corrupted by the channel.
As shown in the simulation set-up, a common clock (Tsample) is used for both
transmitter and receiver. This indicates system clock synchronization must be established.
In addition to clocking various building blocks, the clock also enters a delay block. This
block generates a delay signal equal to the delay of the convolutional interleaver. This
delayed signal establishes the time synchronization between the transmitter and the
receiver.
In the simulation, Simulink scopes were connected to various points to verify the
functions of the transceiver building blocks. For simplicity, only a few scopes are shown
in Figure 6.2. For example, scopes 1-5 captured the output waveforms of the transmitter
building blocks, scopes 6 and 8 displayed waveforms of the channel without noise and
with AWGN added, and scopes 9-14 showed the output waveforms of the receiver
building blocks. The noise was adjusted and injected into the channel to an SNR level of
20 dB. This SNR is required for a raw BER of 10^-5 at the receiver without FEC [3]. The
FEC should correct the errors to achieve a BER of 10^-12 at the output. Figure 6.3 shows
the input (scope 1) and output (scope 14) waveforms of the system simulation.
Figure 6.3 Input (Scope 1) and output (Scope 14) waveforms of the Simulink simulation
The receiver recovered the input signal that was properly encoded, modulated and
sent over the channel. The delay between the input scope and the output scope came from
the convolutional interleaving process and the RS decoder latency. The result of this
simulation demonstrated a functional system. The next task was the hardware design and
implementation of the high-speed transceiver.
6.2 Transceiver FPGA implementation
Using manually generated Verilog code, various blocks of the baseband
transceiver were implemented in an FPGA. The blocks are described in Chapter 3 and
Chapter 4. Synthesis and simulation tools were used to simulate the blocks to verify their
functional and timing operations. The test bed for simulation requirements was provided
by the Altera MaxPlus II and the test bench was incorporated into the Verilog code. The
input and output ports of the FPGA prototype board were also used for testing. The
hardware simulation results were compared to the theoretical Matlab simulation results in
Section 6.1 for all of the building blocks. All of the blocks implemented in the Altera
FPGA devices were checked to verify that they worked correctly.
Hardware requirements and operating frequencies of these blocks are summarized
in Table 6.1. In this table, the hardware requirement is indicated by the number of Logic
Cells (LC) of the FPGA in the designated Altera devices and the operating frequency
indicates the speed obtained during testing of the individual blocks.
Table 6.1 Prototype resources and operation frequency of the transceiver
Building block                              Altera FPGA                Number of LCs   Speed (MHz)
Baseband interface, randomization           EPF10K20RC240-4            40              40
Convolutional interleaver/deinterleaver     EPF10K30RC208-3            1356/1356       20
Differential codec, QAM mapping             EPF10K20RC240-4            35              40
RS encoder                                  EPF10K20RC240-4            196             40
RS decoder                                  EPF10K200A                 12745           12
Baseband transceiver                        EPF10K100B and EP20K400    15728           10
Due to the complexity of the RS encoder/decoder, the transceiver was
implemented using two FPGA devices. The first device (the Altera EPF10K100B)
consisted of the baseband interface, the convolutional interleaver/deinterleaver and the
differential codec along with the QAM mapping/de-mapping. The second device (the
Altera EP20K400) implemented the RS codec, which included the encoder and the
decoder. Only a small portion (2%) of the device was required for the encoder compared
to the large amount of hardware dedicated to the decoder.
The number of LCs in the results clearly shows that the main complexity of the
transceiver is in the implementation of the FEC (i.e., RS codec and data interleaving to
correct random and burst errors). The other building blocks are quite simple since they
require only basic logic gates to implement without using any complex algorithm.
Among the blocks, the RS decoder is the most complex and requires a substantial amount
of hardware to implement. As mentioned in Section 4.4, the complexity of the decoder
core is in the implementation of the algorithm to find the error locator polynomial, σ(x).
The algorithm required 7875 LCs to implement and occupied 62% of the decoder core
hardware.
Table 6.1 also shows the operating frequency of the various building blocks of the
transceiver. For the low speed Altera FPGA devices, most of the blocks operated at a data rate
of 320 Mb/s (i.e., a clock rate of 40 MHz) except for the FEC (20 MHz for the interleaver
and 12 MHz for the RS decoder).
Following is an example of the results of the system block implementation using an
FPGA. The RS encoder was implemented in the Altera EPF20RC240-4 device on the
prototype board. The device was clocked with the GPS clock and connected to the logic
analyzer as shown in Figure 6.1. The waveform captured by the logic analyzer was
compared with the theoretical values to verify the functionality of the encoder. The
captured waveform was identical to the corresponding simulation waveforms using either
Simulink or Altera MaxPlus II simulation. The output waveform of the RS encoder is
shown in Figure 6.4.
In this test, an 8-bit counter generated an input data stream of 0x00-0xEE (0-238
decimal). The data stream entered the encoder and 16 parity symbols were generated. For
the data stream of 0x00-0xEE, the parity symbols had the values of 0x3A, 0xEC, 0x98,
OxX, 0x58, 0x1F, 0x14, 0xA8, 0x79, 0x3C, 0x20, 0x0A, 0xBF, 0xA6, 0x04 and 0x65.
The parities were appended at the end of the data stream to form a 255-symbol codeword.
The cycle repeated when another 239 symbols from the data stream entered the encoder.
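The encoding step can be sketched in software as follows. This is a minimal illustration, not the Verilog design of Chapter 4: the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) and the generator roots α^0 to α^15 are assumptions based on the usual DVB-style RS(255,239) convention, so the parity bytes produced here are not claimed to match the values captured above. The sketch checks itself by confirming that all 16 syndromes of the codeword are zero.

```python
PRIM_POLY = 0x11D   # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1
N_PARITY = 16

# Antilog (EXP) and log (LOG) tables for GF(2^8) with alpha = 2.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    """Multiply two GF(2^8) elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(n_parity):
    """g(x) = product of (x + alpha^i), i = 0..n_parity-1, highest degree first."""
    g = [1]
    for i in range(n_parity):
        nxt = g + [0]                      # g(x) * x
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[i])  # + g(x) * alpha^i
        g = nxt
    return g


def rs_encode(data, n_parity=N_PARITY):
    """Systematic encoding: append the remainder of data(x) * x^n_parity mod g(x)."""
    g = generator_poly(n_parity)
    rem = [0] * n_parity
    for byte in data:
        feedback = byte ^ rem[0]
        rem = rem[1:] + [0]
        if feedback:
            for j in range(n_parity):
                rem[j] ^= gf_mul(g[j + 1], feedback)
    return list(data) + rem


def syndromes(codeword, n_parity=N_PARITY):
    """Evaluate the codeword polynomial at each generator root (Horner's rule)."""
    result = []
    for i in range(n_parity):
        acc = 0
        for c in codeword:
            acc = gf_mul(acc, EXP[i]) ^ c
        result.append(acc)
    return result


data = list(range(0xEF))                 # counter data 0x00-0xEE, 239 symbols
codeword = rs_encode(data)
print(len(codeword))                     # 255
print(any(syndromes(codeword)))          # False: all syndromes are zero
```

A single corrupted symbol makes at least one syndrome nonzero, which is the condition the decoder uses to start the error locator computation.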
Figure 6.4 RS encoder waveform
The encoder output waveform captured by the logic analyzer in Figure 6.4 is
zoomed in to show the 8-bit code symbols, centered on the 16 parity symbols. Eight
channels of the logic analyzer captured the 8 bits of each symbol (i.e., code 0-7) at the
output pins of the EPF20RC240-4 Altera FPGA device. The waveform also shows that the
10 MHz GPS clock was used. Code 0 in the waveform from 0x00-0xEE indicates the
clock. The 16 parity symbols are shown at the center of Figure 6.4. For simplicity, the
waveforms of the other transceiver building blocks are not shown.
As shown in Table 6.1, the RS decoder is the block that limits the system data rate.
Therefore, higher throughput of the RS decoder core is essential for the implementation
of the high-speed transceiver. The most significant achievement in the implementation of
the high bit rate transceiver was the successful design of the RS decoder. Without this
high-speed codec, the desired bit rate of 200 Mb/s for the MMDS system could not be
achieved.
The speed of the designed RS decoder could not be increased without the new
architectures for GF(2^8) arithmetic developed during the research, including the
multiplication and the inversion circuits. Of the two, the low latency inversion circuit was
the most important circuit and was used to increase the RS decoder core operating
frequency. The use of the new inversion circuit architecture increased the RS decoder
throughput by 25% compared to the use of the proven low latency inversion circuit.
The improved multiplication circuit was also used throughout the decoder core,
mostly to reduce the hardware required to implement the algorithm finding the error
locator and the error magnitude polynomials. The multiplication circuit mitigates the RS
codec complexity (i.e., smaller, low cost transceiver) while the inversion circuit increases
the codec symbol rate (i.e., higher system speed).
A comparison between previous test circuits and the new architectures for
multiplication and inversion in GF(2^8) has been performed and is shown in [56]. For
the previous test circuits, the LSB-first multiplication circuit described in [45] and the
architecture for the inversion circuit described in [52] were used. The multiplication and
inversion for the new GF(2^8) arithmetic architectures are described in Section 4.4.2 and
Section 4.4.4. Verilog code was written to describe the circuits. The code was
synthesized in MaxPlus II(R) to implement the circuits in the EPF10K20RC240-4 Altera
FPGA device and synthesized to 0.5 µm CMOS ASIC for comparison. Simulations were
performed to verify the function and the delay of the circuits. The delay was measured as
the time between the input and the output. Table 6.2 shows the comparison results
between the previous test circuits and the new circuits for GF(2^8) in hardware
requirement and circuit latency.
Table 6.2 Comparison of the GF(2^8) arithmetic implementation
GF(2^8) Arithmetic    New circuit [56]                 Previous test circuit [52]
                      Number of LCs    Delay (ns)      Number of LCs    Delay (ns)
Multiplication        53               22              54               23
Inversion             370              71              381              110
In this table, the number of LCs indicates the hardware required to implement the
circuits. Results show the new inversion circuit outperforms the previous test circuit by
3% in hardware complexity and by 30% in delay. The improved multiplication circuit is
also simpler to implement with less delay.
For the multipliers, even with only a small difference in hardware requirements
between the two architectures (see Table 6.2), a substantial amount of hardware has been
saved in the RS decoder, as a large number of multipliers were used to implement the
core. The results clearly indicate that the use of the new architectures reduces the complexity
and latency of the GF arithmetic circuitry, which in turn makes the design of the high-speed
MMDS system possible.
The comparison of the inversion circuits for other GF(2^m) was also performed and
the results were presented in [56]. Table 6.3 shows the comparison of the hardware
complexity and latency of the inversion circuits for different degrees of the finite fields.
The degree, m, of the fields is from 3 to 10.
Table 6.3 Comparison of the GF(2^m) inversion implementation
Degree    Proposed Low Latency Inversion Circuit [56]    Previous Test Inversion Circuit [52]
m         Logic Cells                                     Logic Cells    Delay (ns)
3         3                                               3              12
4         18                                              14             19
5         75                                              61             39
6         110                                             118            61
7         196                                             209            81
8         370                                             381            110
9         501                                             493            120
10        678                                             655            134
Substantial improvement in both complexity and latency can be obtained using
the proposed inversion circuits when the degree m of the finite field increases. The results
show an increase of at least 25% in operation frequency using the low latency inversion
circuits for m=7 to 10. For low degrees of the GF, m=4 and 5, the proposed circuit
requires a small amount of additional hardware in order to gain lower latency. This is due
to the implementation complexity of the multipliers in these fields. The hardware
required to implement these multipliers depends on the field generating polynomials.
Table 6.4 lists the standard field generating polynomials used in the GF(2^m) for
comparison. These finite fields can be found in [11].
Further investigation in the irnplementation of the transceiver revealed that the
speed of an ASIC would exceed 3 times the speed of the FPGA prototypes.
1. The RS encoder was synthesizsd into 0 . 5 ~ CMOS technology by Mr. Neil
McLeod at TRLabs Saskatoon (September 1999). This ASIC simulation operated at a
data rate of 920 Mbs (i.e., an operating fiequency of 1 1 5 MHz). The speed increased 2.8
times using a relatively old technology (the 0 . 5 ~ CMOS) compareci to the FPGA
irnplernentation.
2. Al1 of the HDL code of the GF(2") inversions was also synthesized into 0Spn
CMOS. Table 6.5 shows the speed comparison beîween the FPGA prototypes and the
ASIC simulation. The results fiom this table clearly show that an ASIC conversion can
m 3
Primitive Polynomial p(X) 1+x+x3
increase the speed by 2.4 up to 5 times over the FPGA prototype. In particular, the
inversion of G F ( ~ ~ ) increased the speed by 4.7 times with the conversion (highlighted in
Table 6.5)-
Table 6.5 Delay time of the inversion circuit in GF(Zm)
As shown in Section 4.4.4, the inversion time of GF(~') is critical for the
operation speed of the RS decoder core in the MMDS transceiver. The hi&-speed RS
GF degree
decoder depends solely on this inversion time. As explained in Chapter 4, the
ASIC delay
FPGA delay
improvement in the speed of the RS decoder determines the final speed of the complete
Speed incrernent
Results of the partial synthesis of the HDL code indicate that an ASIC conversion
shouId operate at least 3 times faster than the current FGPA clock rate of 10MHz. A
speed improvement only 2.5 tunes the FPGA prototype is required to have an ASIC
transceiver operate at 2OOMbs (i.e., 25MHz). Therefore, a data rate of 2OOMbs can be
achieved easily using current technology (the 0 . 1 8 ~ CMOS) when the synthesizable
HDL code is implemented into an ASIC. nie ASIC implementation of the transceiver is
out of the scope of this thesis and is suggested for future work.
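The clock and data-rate figures quoted above are consistent with a datapath that moves one 8-bit symbol per clock cycle, which is what the pair 920 Mbs and 115 MHz implies. The short Python sketch below reproduces the arithmetic under that assumption; the function names are illustrative and do not come from the thesis.

```python
# Assumption: a byte-wide datapath, i.e. 8 bits transferred per clock cycle,
# as implied by the quoted pair 920 Mbs <-> 115 MHz.
BITS_PER_CLOCK = 8


def data_rate_mbs(clock_mhz, bits_per_clock=BITS_PER_CLOCK):
    """Data rate in Mbs for a given clock frequency in MHz."""
    return clock_mhz * bits_per_clock


def required_speedup(target_mbs, current_clock_mhz):
    """Factor by which the current clock must rise to reach target_mbs."""
    return target_mbs / data_rate_mbs(current_clock_mhz)


if __name__ == "__main__":
    print(data_rate_mbs(115))         # RS encoder ASIC simulation clock
    print(data_rate_mbs(10))          # FPGA prototype clock of 10 MHz
    print(required_speedup(200, 10))  # speed-up needed to reach 200 Mbs
```

Under this assumption the 10 MHz FPGA clock corresponds to the 80 Mbs prototype rate, and the 200 Mbs target corresponds to 25 MHz.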
6.3 GPS clock testing
The GPS clock unit was set up and connected in the lab as shown in the
configuration of Figure 6.1. The GPS antenna was connected to the unit through a 30 m
antenna cable. The clock output from the unit was connected to the TTL interface circuit
to convert the 10 MHz sinewave to a squarewave TTL output clock. The clock was used
to clock the FPGA device to implement the transceiver building blocks.
The precision and stability of the clock were measured using the Hewlett Packard
Model 53132A Universal Counter. This counter is a 12-digit, 150 ps time interval
resolution counter which provides a very accurate frequency count. Once locked, the GPS
clock frequency was stable at 10 MHz with an accuracy of ±5·10^-9 according to the
frequency counter (i.e., ±0.05 Hz out of 10 MHz). The frequency remained constant
provided that the unit was locked to the GPS satellites.
The antenna was disconnected from the clock unit to simulate a system failure. The
unit was unlocked from the GPS and the output frequency remained at the last value before
failure. However, over a 48 hour period while being unlocked, the frequency drifted by
10^-8 (i.e., 0.1 Hz out of 10 MHz). This was the frequency drift of the internal crystal of
the GPS clock while it remained unlocked from the GPS. The antenna was then
re-connected to the unit, the GPS clock re-locked to the satellites and continued to operate in
locked mode. The output frequency returned to the value before the failure occurred. The
re-locking time after failure was only 3 minutes since the unit required locking to only
one satellite to acquire time transfer mode (i.e., the location is known and the unit
needs to obtain only the unknown time). There was no change in the clock frequency if
the unit lost satellite signals for a period of less than 2 hours and started the re-locking
process. This test is for the case of a short disruption of the GPS.
The clock system has operated for seven months since it was installed in the lab.
The TTL GPS clock output was used to clock the FPGA prototype board for system
development as shown in Figure 6.1. The stability testing and the clock operation
indicated that an accurate, stable and reliable frequency can be obtained from the GPS for
system clocking. The accuracy of the clock exceeds the Stratum 2 level used at
second-level nodes of communication networks, such as base stations.
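The measured offsets can be put on the same scale as a Stratum level by expressing them as fractional frequency offsets. The Python sketch below does this for the two parenthetical figures given in this section (0.05 Hz and 0.1 Hz out of 10 MHz). The Stratum 2 limit of 1.6·10^-8 is the commonly quoted free-run accuracy and is an assumption brought in from outside the thesis text; the function names are illustrative.

```python
NOMINAL_HZ = 10e6           # GPS clock output frequency stated in the text
# Assumption (not from the thesis): commonly quoted Stratum 2 free-run accuracy.
STRATUM2_ACCURACY = 1.6e-8


def fractional_offset(offset_hz, nominal_hz=NOMINAL_HZ):
    """Frequency offset expressed as a fraction of the nominal frequency."""
    return offset_hz / nominal_hz


def within_stratum2(offset_hz, nominal_hz=NOMINAL_HZ):
    """True if the offset is no worse than the assumed Stratum 2 limit."""
    return abs(fractional_offset(offset_hz, nominal_hz)) <= STRATUM2_ACCURACY


if __name__ == "__main__":
    for label, offset in (("locked", 0.05), ("unlocked for 48 h", 0.1)):
        print(label, fractional_offset(offset), within_stratum2(offset))
```

With these inputs both the locked accuracy and the 48 hour unlocked drift stay inside the assumed Stratum 2 limit.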
As a result, the precision of the GPS clock is sufficient and it can be used for the
MMDS system clock reference. The clock is used in the transmitter, sent and then
received at the receivers. The receivers recover the clock for system synchronization
purposes (i.e., clock and frame synchronization). With this loop timed GPS clock, the
system clock is traceable to the UTC and the MMDS system can be connected directly to
the global communications network.
CHAPTER 7
CONCLUSIONS, CONTRIBUTIONS AND FUTURE WORK
This final chapter presents some conclusions drawn from the research and
proposes areas in which future work can be conducted to improve the system.
7.1 Conclusions
The demand for faster data delivery services puts pressure on the development of
high-speed communication systems and MMDS is no exception. High-speed systems are
in demand to deliver data services through the air using microwave signals. The use of
MMDS systems has many benefits over wired networks. This is due to the advantages
of fast and low cost system installation, higher signal reliability, and higher channel
capacity. In addition, an MMDS system is capable of handling the accelerating demands
of high-speed data services and bandwidth limitation. To be competitive with existing
video and data services over the air, the high-speed MMDS system must be low cost and
reliable. To develop such systems, two objectives were set for the research:
1. To investigate the implementation of a high bit rate (200 Mbs) transceiver
in a compact high-speed ASIC.
2. To find a suitable system synchronization scheme for low cost MMDS
systems.
System simulation is the first step to ensure that the designed system is functional and
can be implemented in hardware.
The results from system simulation using Simulink and Matlab have demonstrated
a functional system. The simulation system building blocks were built using basic gates,
which are analogous to a hardware implementation. This simulation method ensures that
hardware can be built to realize the math functions and algorithms used in the system
realization.
Based on DAVIC Version 1.2 specifications, various building blocks of the
high-speed MMDS transceiver were implemented using FPGA prototypes. The design uses
manually written synthesizable Verilog HDL to describe all of the transceiver building
blocks. The code can be used to implement the transceiver in either an FPGA or an ASIC.
It has been found that the system data integrity protection, namely the forward error
correction (FEC) scheme of the transceiver, is very expensive to implement. This includes
the Reed-Solomon codec and the byte interleaving to correct both random and burst
errors caused by the channel. Besides requiring extensive hardware to implement, this
FEC is also the system speed limitation. In general, the most difficult task in the
high-speed MMDS system implementation is to ensure a system BER of 10^-12. Much of the
effort and resources were allocated to the FEC of the transceiver to ensure data integrity
and to increase the system data rate. Extensive hardware resources were used to implement
the algorithm of the FEC, especially the RS decoder. The main limitation of the speed
came from the implementation of the GF(2^8) arithmetic used in the RS decoder, namely
the multiplication and the division of the elements in this field.
As the problems were identified, the research concentrated on the development of
a low cost, efficient, high-speed FEC, in particular the high-speed (255,239) RS codec.
This included a theoretical investigation of RS error correction and an efficient
implementation of the chosen architecture. Operational speed of the RS decoder core was
critical because this decoder throughput limited the overall transceiver data rate. The
other system building blocks had speeds far beyond the system requirement, even though
they were implemented using relatively slow FPGA devices.
The speed improvement of the RS decoder is obtained from the new VLSI
implementations of the GF(2^8) arithmetic. The most complex calculation in this field is
the division. This calculation includes an inversion of an element and a multiplication of
the two elements in the field. Between the two calculations, the inversion speed is more
important since the decoder throughput relies on this inversion time. A low latency
inversion circuit for GF(2^8) has been developed during the research, which results in the
speed increase of the transceiver. This low latency inversion circuit had an improvement
of 3% in hardware and 28% in latency over the previously tested circuits. In addition to
the speed improvement, a new decoding algorithm for the RS decoder (i.e., the division-free
B-M algorithm) was used to reduce the hardware complexity of the core. Using written
Verilog code, the designed transceiver was implemented into Altera FPGA devices. The
prototypes have achieved a system bit rate of 80 Mbs.
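The statement that a division consists of an inversion followed by a multiplication can be illustrated in software. The Python sketch below works in the small field GF(2^3) with p(X) = 1 + X + X^3, the polynomial listed for m = 3 in Table 6.4, so that every result can be checked by hand. The inversion is computed by the textbook exponentiation a^(2^m - 2); this is only an illustration of the arithmetic and is not the low latency circuit developed in the thesis. All function names are hypothetical.

```python
M = 3
POLY = 0b1011  # p(X) = 1 + X + X^3, bit i holds the coefficient of X^i


def gf_mul(a, b, m=M, poly=POLY):
    """Multiply two field elements (shift-and-add with modular reduction)."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by X
        if a & (1 << m):
            a ^= poly            # reduce modulo p(X)
    return result


def gf_inv(a, m=M, poly=POLY):
    """Inverse as a^(2^m - 2) = a^2 * a^4 * ... * a^(2^(m-1))."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse in GF(2^m)")
    result = 1
    power = gf_mul(a, a, m, poly)             # a^2
    for _ in range(m - 1):
        result = gf_mul(result, power, m, poly)
        power = gf_mul(power, power, m, poly)  # next squaring
    return result


def gf_div(a, b, m=M, poly=POLY):
    """Division = inversion of b followed by one multiplication."""
    return gf_mul(a, gf_inv(b, m, poly), m, poly)


if __name__ == "__main__":
    for a in range(1, 1 << M):
        print(a, gf_inv(a), gf_mul(a, gf_inv(a)))
```

The exponentiation chain shows why the inversion dominates the division time: it needs a sequence of dependent squarings and multiplications, while the final step is a single multiplication.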
The research continued with the investigation of using a GPS clock for system
synchronization. Timing and synchronization are critical in the design of any digital
communications system. The synchronization process ensures the integrity of transmitted
data by clocking all the elements in the network at the same rate. The MMDS system
requires robust synchronization in order to be able to transmit a high-speed data stream
and to ensure its data integrity. In the absence of SONET and SDH, a precise frequency
derived from the Global Positioning System is to be used for the MMDS system
reference clock instead of crystals. The GPS clock frequency and time have a very high
accuracy and they are directly and continuously traceable to the UTC. This GPS clock is
a cost-effective way to equip the system with a precise reference frequency for
high-speed data transmission since it has many advantages over conventional crystal
oscillators. These include precision, stability, reliability, availability and low cost set-up.
A GPS clock was set up and tested during the research. The clock locked onto
GPS satellites and generated a local clock. Testing results demonstrated a high precision
(IO-'') and a stable clock generated using the GPS frequency and time. This precision is
adequate for the high-speed MMDS system clock requirement. An interface circuit was
built to convert the GPS clock into a TTL output level. This TTL output clock was used
to clock the FPGA prototype board and the board was used to design and to develop
various blocks of the transceiver.
To synchronize the high-speed MMDS system, a GPS clock is to be installed at
the transmitter and the clock is received and recovered at the surrounding receivers using
an early/late gate synchronization technique. This synchronization scheme ensures that both
transmitter and receiver are synchronized in both frequency and time. In addition to
system simplicity, the proposal of using a single GPS clock at the base station makes
more sense from the cost point of view. Another important feature of the GPS clock is its
traceability to the UTC (which is used as a standard time for the global
communication networks). This property of the GPS clock allows the MMDS system to
synchronize properly to other hierarchical communications nodes.
In addition to clock synchronization, the MMDS system also utilizes frame
synchronization. This includes the insertion of synchronization and inverse
synchronization bytes into the MPEG2-TS data packets. A highly reliable marker is also
used to increase data integrity at the receiver end. In conclusion, the combination of GPS
clock and frame synchronization provides the robustness and reliability for the MMDS
synchronization requirements while maintaining a simple, low-cost system.
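The insertion of synchronization and inverse synchronization bytes can be sketched as follows. The Python example relies on two standard MPEG-2 transport stream facts, the 188-byte packet length and the 0x47 sync byte. The choice of inverting the sync byte of every eighth packet follows DVB-style framing and is an assumption here, since this chapter does not state the period; the function names are hypothetical.

```python
SYNC = 0x47               # MPEG-2 TS sync byte
INV_SYNC = SYNC ^ 0xFF    # bit-wise inverted sync byte, 0xB8
PACKET_LEN = 188          # MPEG-2 TS packet length in bytes
GROUP = 8                 # assumption: one inverted sync per 8 packets


def mark_sync(packets):
    """Replace the sync byte of every GROUP-th packet by its inverse."""
    marked = []
    for index, packet in enumerate(packets):
        if len(packet) != PACKET_LEN or packet[0] != SYNC:
            raise ValueError("not a valid transport packet")
        first = INV_SYNC if index % GROUP == 0 else SYNC
        marked.append(bytes([first]) + packet[1:])
    return marked


def find_group_start(stream):
    """Return the first offset where an inverted sync byte is followed by
    GROUP - 1 normal sync bytes at packet spacing, or -1 if none is found."""
    last = len(stream) - (GROUP - 1) * PACKET_LEN
    for offset in range(max(last, 0)):
        if stream[offset] != INV_SYNC:
            continue
        if all(stream[offset + k * PACKET_LEN] == SYNC for k in range(1, GROUP)):
            return offset
    return -1


if __name__ == "__main__":
    packets = [bytes([SYNC]) + bytes(PACKET_LEN - 1) for _ in range(16)]
    stream = bytes(5) + b"".join(mark_sync(packets))
    print(find_group_start(stream))
```

Checking several sync positions in a row, rather than a single byte, is what makes the marker robust against payload bytes that happen to equal the sync value.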
7.2 Contributions
The results show that the Matlab and Simulink source code simulates the entire
MMDS system successfully. The code is available in a TRLabs report [5] and can be
used to simulate other wired or wireless communication systems. The code can be
changed easily to simulate any particular system (i.e., different decoding scheme,
modulation technique, degree of QAM, filter, RF frequency, channel characteristics,
etc.).
The synthesizable HDL code provided in [5] will benefit both industry and other
researchers in further study or implementation of any baseband wireless transceiver. A
system data rate of 200 Mbs with a BER of 10^-12 over a wireless channel can be achieved
using this transceiver provided that the HDL code is implemented into an ASIC. The use
of this transceiver is not limited to MPEG-2 data. The designed transceiver will be able to
transmit and receive any data stream.
The novel clocking scheme of the convolutional interleaver and deinterleaver has
a great impact on the complexity of the block. Data transfer in the block is clocked using
its own counter to reduce the number of storage devices (i.e., flip-flops). A reduction of over
80% in hardware implementation has been obtained.
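For readers unfamiliar with the block, a generic convolutional (Forney-type) interleaver and its matching deinterleaver can be modelled in a few lines of Python. This is a behavioural sketch only: it does not reproduce the clocking scheme of the thesis, and the class name, the branch count and the delay depth used below are hypothetical values chosen to keep the example small.

```python
from collections import deque


class ConvolutionalInterleaver:
    """Commutated bank of FIFOs; branch j delays by j * depth symbols.

    With reverse=True the delays are mirrored, which gives the matching
    deinterleaver.
    """

    def __init__(self, branches, depth, reverse=False):
        delays = [
            (branches - 1 - j if reverse else j) * depth
            for j in range(branches)
        ]
        self.fifos = [deque([0] * d) for d in delays]  # zero-filled at start
        self.index = 0

    def push(self, symbol):
        """Feed one symbol into the current branch and return its output."""
        fifo = self.fifos[self.index]
        self.index = (self.index + 1) % len(self.fifos)
        if not fifo:             # zero-delay branch passes straight through
            return symbol
        fifo.append(symbol)
        return fifo.popleft()


if __name__ == "__main__":
    branches, depth = 4, 2       # hypothetical small parameters
    inter = ConvolutionalInterleaver(branches, depth)
    deint = ConvolutionalInterleaver(branches, depth, reverse=True)
    data = list(range(1, 41))
    print([deint.push(inter.push(s)) for s in data])
```

The cascade returns the input stream delayed by branches * (branches - 1) * depth symbols, and the FIFOs are the storage whose size an efficient hardware implementation tries to minimize.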
The designed (255,239) RS decoder core runs at least 25% faster and is 35%
smaller in size compared to other implementations. The reduction in hardware and the
gain in speed are obtained by using a division-free B-M algorithm and efficient GF
arithmetic circuits. The GF(2^m) arithmetic VLSI circuits can also be used in other areas
that use abstract algebra, such as cryptography. The new algorithms reduce hardware
complexity while increasing the operating speed of these circuits. The most important
improvement is the parallel architecture of the inversion circuit which was developed in
the thesis. All the HDL code for the tested GF(2^m) arithmetic circuits is included in [5].
The use of a precision GPS clock for synchronization is a novel application. The
two important characteristics of the GPS clock are its precision (10"' in frequency) and
its time reference (i.e., universal standard time). This makes the synchronization of the
MMDS system with any global communication network simple and robust. The use of a
single GPS clock and the replication of the clock at all of the receivers provides a
time-loop clock in a very cost effective manner compared with the use of expensive crystal
oscillators.
7.3 Future work
There are some areas in which work and research can be done to improve
the system performance. The immediate work required is the fabrication of the
transceiver in an ASIC using currently available technology. This is straightforward since
the Verilog code is synthesizable and therefore code modification is not required. The desired
speed of 200 Mbs should be achieved when the final ASIC is built. In the FPGA
implementation, the transceiver was implemented using 2 devices: one is dedicated to
the RS codec and the other is for the other transceiver building blocks. However, when
the ASIC is made, the transceiver should be in one ASIC if possible due to the advantages
of a one-chip solution. The system-on-a-chip solution not only reduces the cost of the
ASIC, it also eliminates the need for any interface which is required for data transfer
between the chips.
The GPS clock synchronization will be complete with the verification of data
transfer. A complete system set-up with the RF/IF circuits is required to perform the test.
Models of the transmitter and the receiver must be built in order to send and receive test
data.
Another direction of the future work is the investigation of the use of TCM in the
system. TCM can be used to increase system coverage given its 3 dB code gain as shown
in Section 2.2.6. The TCM encoder can be represented as a finite-state machine since the
codes follow a trellis structure. It is quite simple to implement the encoder in VLSI using
HDL. In contrast, designing the TCM decoder requires extensive work, as in the case of the RS
decoder. The decoder is very complex because it requires the use of a special decoding
algorithm. There are many available decoding algorithms that can be used to decode the
TCM convolutional code [11]. Among them, the Viterbi algorithm used in the maximum
likelihood method has the best performance in both hardware implementation and speed.
Many studies and implementations of the Viterbi algorithm used in TCM for various
constraint lengths have been published [18-28].
The challenge in the design of this TCM decoder lies in its long tracking length
because the complexity of the TCM decoder is proportional to this tracking length. More
details on the implementation of the Viterbi decoder can be found in [25,27]. The
integration of this TCM decoder into the system eventually increases the transceiver
complexity. As a result, the system operation speed and cost of the ASIC have to be
considered.
Further study is needed on the MMDS transmitter. Since the designed system used 256
QAM for data transmission, the transmitter requires higher SNR and more stringent
linearity specifications. The geometry of the signal constellation reveals that the distance
between an ideal symbol and a decision boundary decreases with increasing bits per
symbol (i.e., from 64 QAM to 256 QAM as shown in Figure A.3 and Figure A.4). As a
result, Gaussian noise, local oscillator phase noise, and mixer and amplifier non-linearity
from the transmitter are more likely to create symbol errors in 256 QAM. Among the
components of the transmitter, the IF signal processing circuitry is a critical part of the
transmitter performance. In particular, frequency response, delay correction and linear
correction of the circuits should be fully understood. The up-conversion from IF to RF
also requires a high phase noise performance of the local oscillator. For 256 QAM, a
phase shift of 3.7 degrees will destroy the signal. One suggestion for better phase noise
performance is to multiply up a voltage controlled crystal oscillator phase locked to a
GPS reference frequency which is available at the transmitter site. The non-linearity
characteristic of the amplifier can be improved using a feedforward linearization technique.
The principle of the technique is to sample the input and output of a high power amplifier
and subtract the two signals. This distortion result is amplified and injected out of phase
with the final high power output. In this manner, the distortion of the high power
amplifier is canceled.
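The 3.7 degree figure quoted above can be recovered from the constellation geometry. The Python sketch below assumes an ideal, noise-free square constellation with points on the odd integers and decision boundaries on the even integers, and asks how far the outermost corner point, which moves the most under rotation, can be rotated before it crosses its nearest boundary. The function name and the normalization are assumptions made for illustration.

```python
import math


def max_phase_error_deg(order):
    """Rotation (degrees) that puts the corner point of a square QAM
    constellation exactly on its nearest decision boundary."""
    side = math.isqrt(order)
    if side < 2 or side * side != order:
        raise ValueError("order must be a square number such as 16, 64, 256")
    corner = side - 1       # outermost coordinate on the odd-integer grid
    boundary = corner - 1   # nearest decision boundary (an even integer)
    # Rotating (corner, corner) by theta gives
    #   x' = corner * (cos(theta) - sin(theta))
    #      = corner * sqrt(2) * cos(theta + 45 deg).
    # The symbol is lost when x' falls to the boundary value.
    angle = math.acos(boundary / (corner * math.sqrt(2)))
    return math.degrees(angle) - 45.0


if __name__ == "__main__":
    for order in (16, 64, 256):
        print(order, round(max_phase_error_deg(order), 2))
```

Under these assumptions the tolerance falls from roughly 17 degrees for 16 QAM to about 3.7 degrees for 256 QAM, which is why the local oscillator phase noise becomes critical at the highest modulation level.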
Channel frequency response of the system is also a topic for future research. A
response other than the ideal of unity will disrupt the raised cosine response and create
intersymbol interference. In addition, the effects of multipath delay and fading are
interesting subjects to investigate. The multipath effect can cause the waveform to no longer
cross zero at every symbol time. The study of the adaptive equalizer to eliminate
intersymbol interference due to multipath propagation in this radio channel is essential.
REFERENCES
[1] Lawrence Behr Associates, "Wireless Evolution, Definition and Current Practice," Technical Note 115, Lawrence Behr Associates, Inc., Greenville, North Carolina 27835, USA, 1999.
[2] CAI Wireless Systems Inc., "MMDS Wireless Cable Backgrounder," CAI Technology Updates, CAI Wireless Systems Inc., June 1998.
[3] David Urban, "MMDS Transmitter for High Data Rate Digital Video Delivery," ADC Technical papers, ADC Telecommunications, Microwave Systems Division, PA, USA, July 1997.
[4] Digital Audio-Visual Council (DAVIC), "The DAVIC 1.2 Specifications," DAVIC, Geneva, Switzerland, 1997.
[5] A. Dinh and R. J. Bolton, "TRLabs Research Report," Regina TRLabs, 108-2 Research Drive, Regina, Saskatchewan, Canada, July 2000.
[6] The International Organization for Standardization, "Coding of audio, picture, multimedia, and hypermedia information prior to framing in the multirate structure," [ISO/IEC 13818-1], ISO/IEC Document 13818-1.
[7] I. S. Reed and G. Solomon, "Polynomial Codes over Certain Finite Fields," J. Soc. Ind. Appl. Math., Vol. 8, pp. 300-304, June 1960.
[8] Robert J. McEliece, The Theory of Information and Coding, A Mathematical Framework for Communication, Addison-Wesley Publishing Company, 1997.
[9] Advanced Hardware Architectures, Inc., "Primer: Reed-Solomon Error Correction Codes (ECC)," AHA Application Note, Doc. # ANMOI-0395, AHA Inc., Pullman, WA, USA, 1996.
[10] Co-Optic Inc., "COic5130A Specifications, Programmable Reed-Solomon Error Correction Encoder and Decoder," Co-Optic Inc., Palo Alto, CA, USA, 1998.
[11] Shu Lin and Daniel J. Costello, Jr., Error Control Coding, Fundamentals and Applications, Prentice Hall, New Jersey, 1983.
[12] Bernard Sklar, Digital Communications, Fundamentals and Applications, Prentice Hall, Englewood Cliffs, New Jersey, 1988.
[13] J. M. Hsu and C. L. Wang, "An Area-Efficient Pipelined VLSI Architecture for Decoding of Reed-Solomon Codes Based on Time-Domain Algorithm," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 6, pp. 864-871, December 1997.
[14] C. C. Hsu, I. S. Reed and T. K. Truong, "Use of the RS Decoder as an RS Encoder for Two-Way Digital Communications and Storage Systems," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 4, No. 1, pp. 91-92, February 1994.
[15] Altera Corporation, "Interleaver/Deinterleaver Megacore Function," Solution Brief 42, Altera Corporation, 101 Innovation Drive, San Jose, CA, USA, 95134, June 1999.
[16] G. Ungerboeck, "The state of the art in Trellis Coded Modulation," Coded Modulation and Bandwidth-Efficient Transmission, Edited by E. Biglieri, M. Luise, Elsevier Science Publishers B. V., N.Y., USA, pp. 3-14, 1992.
[17] William Webb and Lajos Hanzo, Modern Quadrature Amplitude Modulation, Principles and Applications for Fixed and Wireless Communications, Pentech Press Publishers, London, England, 1994.
[18] S. Benedetto, C. Guerra, M. Mondin, A. Pincetti and F. Pasello, "Receiver Design for 8-PSK Trellis Coded modulation in a TDMA Burst Mode Satellite Link," Coded Modulation and Bandwidth-Efficient Transmission, Edited by E. Biglieri, M. Luise, Elsevier Science Publishers B. V., N.Y., USA, pp. 103-116, 1992.
[19] Simon Haykin, Communications System, 3rd Edition, John Wiley & Son, New York, 1994.
[20] Chang C. Y. and Yao K., "Systolic Array Processing of the Viterbi Algorithm," IEEE Trans. on Information Theory, Vol. 35, No. 1, pp. 76-86, January 1989.
[21] B. A. Harvey, "Adaptive Viterbi Decoding for ARQ and Reduced Complexity Decoding," Proceedings of The 10th International Conference on Wireless Communications, Calgary, Canada, Vol. 1, pp. 239-250, July 6-8, 1998.
[22] Y. Savaria, F. El-Hassan, H. Khali and M. Sawan, "An Effective Hardware Software Implementation of a Viterbi Decoder Using an FPGA-based Reconfigurable Computing Platform," The 5th Canadian Workshop on Field-Programmable Devices (FPD'98): Technology, Tools and Applications, École Polytechnique de Montréal, June 7-10, 1998, Montréal, Québec, Canada.
[23] Olaf J. Joeressen and Heinrich Meyr, "A 40 Mb/s Soft-Output Viterbi Decoder," IEEE Journal on Solid-State Circuits, Vol. 30, No. 7, pp. 812-818, July 1995.
[24] A. M. Michelson and A. H. Levesque, Error Control Techniques for Digital Communications, John Wiley & Sons, New York, 1985.
[25] A. Dinh, R. Mason and J. Toth, "High-speed V.32 Trellis Encoder/Decoder Implementation using FPGA," IEEE International Symposium on Circuits and Systems (ISCAS '99) Proceedings, Orlando, Florida, pp. IV-295 to IV-298, May 30-June 2, 1999.
[26] Mansoor A. Christie, "Viterbi Implementation on the TMS320C5x for V.32 Modems," Digital Signal Processing Applications, Semiconductor Group, Document # SPR4099.pdf, Texas Instruments Incorporated, Texas, 1996.
[27] Lihong Jia, Yonghong Gao and Jouni Isoaho, "Design of a super-pipelined Viterbi Decoder," ISCAS '99 Proceedings, Orlando, Florida, pp. I-132 to I-136, May 30-June 2, 1999.
[28] Chi-Young Tsui, Roger S. K. Cheng and Curtis Ling, "Low Power ACS Unit Design for the Viterbi Decoder," ISCAS '99 Proceedings, Orlando, Florida, pp. I-137 to I-141, May 30-June 2, 1999.
[29] Wayne Tomasi, Advanced Electronic Communications Systems, 3rd Edition, Prentice Hall, Englewood Cliffs, N. J., USA, 1994.
[30] Martin S. Roden, Analog and Digital Communications Systems, Prentice Hall, Englewood Cliffs, N. J., USA, 1991.
[31] Kamilo Feher, Advanced Digital Communications, System and Signal Processing Techniques, Prentice Hall Inc, Englewood Cliffs, N. J., USA, 1987.
[32] A. Dinh, R. J. Bolton, R. Mason, and R. Palmer, "Multi-channel Multi-point Distribution Services System Transceiver Implementation," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Proceeding, Victoria, B.C., Canada, pp. 242-245, August 22-24, 1999.
[33] Shunghoon Kwon and Hyunchul Shin, "An Area Efficient VLSI Architecture of a Reed-Solomon Decoder/Encoder for Digital VCRs," IEEE Trans. on Consumer Electronics, Vol. 43, No. 4, pp. 1019-1027, November 1997.
[34] Dariush Dabiri and Ian F. Blake, "Fast Parallel Algorithms for Decoding Reed-Solomon Codes Based on Remainder Polynomials," IEEE Trans. on Information Theory, Vol. 41, No. 4, pp. 873-885, July 1995.
[35] Tetsuo Iwaki, Toshihisa Tamaka, Eiji Yamada, Tohru Okuda and Taizoth Sasada, "Architecture of a High Speed Reed-Solomon Decoder," IEEE Trans. on Consumer Electronics, Vol. 40, No. 1, pp. 75-81, February 1994.
[36] H. M. Shao, T. K. Truong, L. J. Deusch, J. H. Yuen and I. S. Reed, "A VLSI Design of a Pipeline Reed-Solomon Decoder," IEEE Trans. on Computers, Vol. C-34, No. 5, pp. 393-403, May 1985.
[37] Keeichi Iwamura, Yasumori Dohi and Hideki Imai, "A Design of Reed-Solomon Decoder with Systolic-Array Structure," IEEE Trans. on Computers, Vol. 44, No. 1, pp. 118-122, January 1995.
[38] Y. R. Shayan and Tho Le-Ngoc, "A Cellular Structure for a Versatile Reed-Solomon Decoder," IEEE Transactions on Computers, Vol. 46, No. 1, pp. 80-85, January 1997.
[39] S. R. Whitaker and J. A. Canaris, "Reed-Solomon VLSI Codec for Advanced Television," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 1, No. 2, pp. 230-236, June 1991.
[40] J. C. Huang, et al., "An Area-Efficient Versatile Reed-Solomon Decoder for ADSL," ISCAS '99 Proceeding, Orlando, Florida, USA, pp. I-517 to I-520, May 30-June 2, 1999.
[41] Hynman Chang and Myung H. Sunwoo, "A Low Complexity Reed-Solomon Architecture Using the Euclid's Algorithm," ISCAS '99 Proceeding, Orlando, Florida, USA, pp. I-513 to I-516, May 30-June 2, 1999.
[42] A. Dinh and R. J. Bolton, "Design of a High Speed (255,239) Reed-Solomon Codec," IEEE Canada Wescanex '99 Proceeding, Calgary, Alberta, Canada, October 29-30, 1999.
[43] Yousef R. Shayan and Tho Le-Ngoc, "Modified Time-Domain Algorithm for Decoding Reed-Solomon Codes," IEEE Trans. on Communications, Vol. 41, No. 7, pp. 1036-1038, July 1993.
[44] Leilei Song and Keshab K. Parhi, "Low-Energy Software Reed-Solomon Codecs Using Specialized Finite Field Datapath and Division-Free Berlekamp-Massey Algorithm," ISCAS '99 Proceeding, Orlando, Florida, USA, pp. I-84 to I-89, May 30-June 2, 1999.
[45] S. K. Jain, L. Song and K. K. Parhi, "Efficient Semi-Systolic Architecture for Finite Field Arithmetic," IEEE Trans. on VLSI Systems, Vol. 6, No. 1, pp. 101-113, March 1998.
[46] C. Paar, P. Fleischmann and P. Roelse, "Efficient Multiplier Architecture for Galois Fields GF(2^4n)," IEEE Trans. on Computers, Vol. 47, No. 2, pp. 162-169, February 1998.
[47] L. Song and K. K. Parhi, "Low-Complexity Modified Mastrovito Multipliers Over Finite Field GF(2^m)," ISCAS '99 Proceeding, Orlando, Florida, USA, pp. I-508 to I-512, May 30-June 2, 1999.
[48] C. C. Wang, T. K. Truong, et al., "VLSI Architectures for Computing Multiplications and Inverses in GF(2^m)," IEEE Transactions on Computer, Vol. C-34, No. 8, pp. 709-717, October 1998.
[49] Sebastien T. J. Fenn, Mohammed Benaissa and David Taylor, "GF(2^m) Multiplication and Division Over the Dual Basis," IEEE Trans. on Computers, Vol. 45, No. 3, pp. 319-227, March 1996.
[50] J. H. Guo and C. L. Wang, "A Low Time-Complexity, Hardware-Efficient Bit-Parallel Power-Sum Circuit for Finite Field GF(2^m)," ISCAS '99 Proceeding, Orlando, Florida, USA, pp. I-521 to I-524, May 30-June 2, 1999.
[51] M. A. Hasan, "Double-Basis Multiplicative Inversion Over GF(2^m)," IEEE Trans. on Computer, Vol. 47, No. 9, pp. 960-970, September 1998.
[52] Shyue-Win Wei, "VLSI Architectures for Computing Exponentiations, Multiplicative Inverses, and Divisions in GF(2^m)," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 44, No. 10, pp. 847-855, October 1997.
[53] J. H. Guo and C. L. Wang, "Systolic Array Implementation of Euclid's Algorithm for Inversion and Division in GF(2^m)," IEEE Transactions on Computer, Vol. 47, No. 10, pp. 1161-1167, October 1998.
[54] C. C. Wang, S. K. Truong, et al., "VLSI Architectures for Computing Multiplications and Inverses in GF(2^m)," IEEE Trans. on Computer, Vol. C-34, No. 8, pp. 709-717, October 1998.
[55] Yong-Jin Jeong and Wayne Burleson, "VLSI Array Synthesis for Polynomial GCD Computation and Application to Finite Field Division," IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications, Vol. 41, No. 12, pp. 891-897, December 1994.
[56] A. V. Dinh and R. J. Bolton, "A Low Latency Architecture for Computing Multiplicative Inverses and Divisions in GF(2^m)," Canadian Conference on Electrical and Computer Engineering (CCECE 2000) Proceeding, Halifax, Nova Scotia, Canada, pp. 43-47, May 7-10, 2000.
[57] John C. Bellamy, "Digital Network Synchronization," IEEE Communications Magazine, Vol. 33, No. 4, pp. 70-83, April 1995.
[Sa] J. J. Stiffler, Theory of Synchronous Communications, Prentice Hall, USA, 197 1.
[59] William C. Lindsey, Synchronization Systems in Communication and Control, Prentice Hall, USA, 1972.
[60] Jack Smith, Modem Communication Circuits, McGraw-Hill International Editions, 1996.
1611 J. Das, S. K. Mullick and P. K. Chaîîerjee, Princ@les of Digital Communications, John Wiley & Sons, 1986.
[62] G. Smith and J. Kates, "GPS precise time for VME bus," VîME Bus Systems, pp. 27- 48, ApriVMay 1996.
[63] A. V. Dinh, R. J. Bolton, R. J. Palmer and R. Mason, "Multichannel Multipoint Distribution Services System Synchronization Using Global Positioning System Clock" Canadian Conference on Electrical and Compter Engineering (CCECE 2000) Proceeding, Halifax, Nova Scotia, Canada, pp. 875-879, May 740,2000.
[64] P. Enge and P. Misra, "Scanning the Issue/Technology, Special lssue on Global Positioning System," Proceeding of the IEEE, Vol. 87, No. 1, pp. 3- 15, January 1999.
[65] W. Lewandowski, J- Azoubib, and W. Klepczynski, "GPS: Primary Tool for Time Transfer," Proceeding of the LEEE, Vol. 87, No. 1, pp. 1 63- 1 72, January 1 999.
[66] Elliot D. Kaplan, Understanding GPS- Principles and Applications, Mobile Communications Senes, Artech House Publisbers, Boston, USA, 1996.
[67] J. B. Bullock, et al., 'Test results and analysis of a low cost core GPS receiver for tirne transfer applications," Motorola Position and Navigation Systems Business, Presented at the 1997 IEEE Frequency Control Consortium in Orlando, Flonda, USA, 1997.
[68] Steven C. Fisher and Kamran Ghassemi, "GPS IIF-The Next Generation," Proceeding of the E E E , Vol. 87, No. 1, pp. 24-32, January 1999.
[69] M. S. Braasch and A.J. Dierendonck, "GPS Receiver Architectures and Measurements," Proceeding of the IEEE, Vol. 87, No. 1, pp. 163- 172, January 1999.
[70] Charles C. Counselman, "Multipaih-Rejecting GP S htenna", Proceeding of the IEEE, Vol. 87, No- 1, pp. 86-9 1, January 1999.
[71] Absolute Tirne, "Mode1 100A/B GPS Clock User Manual," Absolute Time Corporation, San Jose, California, October 1996.
[72] Linear Technology, "LTI 704 Data sheet ", Linear Technology Inc, USA, 1998-
[73] Boaz P. Shamir and Sergio Rajsbaum, MIT, "A Theory of Clock Synchronization," Proceeding 26th Symp. on Theory of Computing, May 1994.
[74] E. A. Lee, D. G. Messerschmitt, Digital Communication, 2nd Edition, Kluwer Academic Pub, 1997.
[75] Aldo Nunzio D'Andrea and Marco Luise, "Optimization of Symbol Timing Recovery for QAM Data Demodulators," IEEE Trans. on Communications, Vol. 44, No. 3, pp. 399-406, March 1996.
[76] Daeyoung Kim, Madihally J. Narasimha and Donald C. Cox, "Design of Optimal Interpolation Filter for Symbol Timing Recovery," IEEE Trans. on Communications, Vol. 45, No. 7, pp. 877-884, July 1997.
[77] Altera Corporation, "Early/Late Gate Synchronizer Megafunction," Solution Brief 17, Altera Corporation, 101 Innovation Drive, San Jose, CA, USA, June 1997.
[78] C. R. Cahn, "Performance of digital phase modulation communication systems," IRE Trans. on Communications, Vol. CS-7, pp. 3-6, May 1959.
[79] J. G. Proakis, Digital Communications, McGraw-Hill, New York, New York, 1983.
[80] David C. Buchthal and Douglas E. Cameron, Modern Abstract Algebra, Prindle, Weber & Schmidt Publishers, Boston, USA, 1987.
[81] J. H. van Lint, Introduction to Coding Theory, Second Edition, Springer-Verlag, Berlin, Germany, 1992.
[82] Man Young Rhee, Error Correcting Coding Theory, McGraw-Hill, New York, New York, 1989.
Appendix A: QAM Constellations
Since its discovery in the early 1960s, QAM has continued to gain interest and
practical applications. In recent years, many new ideas and techniques have been
proposed, allowing QAM deployment. A large number of constellations have been
proposed for QAM transmission over Gaussian channels. The idea began with Cahn [78]
and evolved through the years. The three constellations shown in Figure A.1 are often
referred to. The essential problem is to maintain a high minimum distance, $d_{\min}$, between the
points while keeping the average power required for the constellation to a minimum [17].
Figure A.1 Variety of QAM constellation (panels: Type I, Type II and Type III QAM constellations)
Calculation of $d_{\min}$ and the average power is a geometric procedure and has been
performed for a range of constellations [79]. The results show that the square
constellation (Type III) is optimal for Gaussian channels. The other two types require a
higher energy to achieve the same $d_{\min}$ as the square constellation and are generally not
preferred. The following figures show the QAM constellations used in the MMDS
system. There are three levels of QAM as defined in Section 2.2.7: 16 QAM, 64 QAM
and 256 QAM. The constellation points in the 2nd, 3rd and 4th quadrants are located by
changing the two MSBs and rotating the LSBs according to the rule in Section 2.2.8.
$I_k$ and $Q_k$ are the two MSBs in each quadrant and should be prepended to the
constellation values to complete the m-bit value.
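The geometric calculation of $d_{\min}$ and average power can be reproduced numerically for the square (Type III) case. The following is a minimal sketch, assuming the points lie on a grid of odd-integer coordinates; the function names are illustrative only, and the bit-to-point mapping of Sections 2.2.7 and 2.2.8 is not modelled.

```python
import itertools

def square_qam(m_bits):
    """Points of a square 2^m_bits-QAM grid on odd-integer coordinates (m_bits even)."""
    side = 2 ** (m_bits // 2)                            # points per axis
    levels = [2 * k - (side - 1) for k in range(side)]   # ..., -3, -1, 1, 3, ...
    return [complex(i, q) for i in levels for q in levels]

def min_distance(points):
    """Smallest Euclidean distance between any two constellation points."""
    return min(abs(a - b) for a, b in itertools.combinations(points, 2))

def average_power(points):
    """Mean squared magnitude, assuming all points are equally likely."""
    return sum(p.real ** 2 + p.imag ** 2 for p in points) / len(points)

for bits in (4, 6, 8):
    pts = square_qam(bits)
    print(len(pts), min_distance(pts), average_power(pts))
# 16 2.0 10.0
# 64 2.0 42.0
# 256 2.0 170.0
```

With the spacing fixed at 2, the average power grows as 2(M - 1)/3 for an M-point square grid, which is the quantity a competing constellation would have to beat at the same $d_{\min}$.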
A.1 16 QAM
The 16 QAM has 16 points in the constellation as shown in Figure A.2. The
symbols are 4-bit words.
Figure A.2 16 QAM Constellation diagram
A.2 64 QAM
The 64 QAM has 64 points in the constellation as shown in Figure A.3. The symbols are 6-bit words.
Figure A.3 64 QAM Constellation Diagram
A.3 256 QAM
The 256 QAM has 256 points in the constellation as shown in Figure A.4. The
symbols are 8-bit words.
Figure A.4 256 QAM Constellation Diagram
Appendix B: Galois Fields
B.1 Galois field
A Galois Field (or finite field) is a set of symbols which obey a set of restrictions
that allow addition, subtraction, multiplication, and division upon them [11,80,81]. The
reason finite field mathematics is so important in FEC digital circuitry is that it allows
mathematics to be performed on binary vectors (i.e., bytes) without expanding their size.
For instance, adding 2 bytes together must result in another byte, instead of a 9-bit word.
It turns out that the restrictions that need to be imposed on the symbols are the same
restrictions that define a finite field. These rules were discovered by Evariste Galois, and
as a result, these fields are called Galois Fields (GF) [80].
A GF with "Q" symbols in it is referred to as GF(Q). "Q" is called the order of the
field. For example, GF($2^8$) has order 256. A GF obeys the following simple
rules:
1) The elements of the field must form a commutative group under addition. If
two elements are added together, the result is another element in the same field.
Furthermore, they must commute (i.e., $a + b = b + a$).
2) As with addition, multiplying one element by another results in
another element in the field. The multiplication also commutes (i.e., $a \cdot b = b \cdot a$).
3) The addition and multiplication operations must distribute. This means that:
$a \cdot (b + c) = (a \cdot b) + (a \cdot c)$.
4) The number of elements "Q" in the field must be equal to $q^m$, where "q" is a
prime number and "m" is a positive integer. For example, GF($2^8$) has q = 2 and m = 8.
B.2 Construction of GF($2^m$)
The construction of the elements of GF($2^m$) is based on the binary field,
GF(2). It begins with the two elements 0 and 1, a new symbol $\alpha$, and the definition of a
multiplication "$\cdot$". A sequence of powers of $\alpha$ is introduced as follows:
$$\alpha^j = \alpha \cdot \alpha \cdots \alpha \quad (j \text{ times}).$$
It follows from the definition of the multiplication that
$$0 \cdot \alpha^j = \alpha^j \cdot 0 = 0, \qquad 1 \cdot \alpha^j = \alpha^j \cdot 1 = \alpha^j, \qquad \alpha^i \cdot \alpha^j = \alpha^j \cdot \alpha^i = \alpha^{i+j}.$$
From the restriction on the multiplication operation above, the following set of
elements is defined:
$$F = \{0, 1, \alpha, \alpha^2, \ldots, \alpha^j, \ldots\}.$$
The element 1 is denoted $\alpha^0$. The condition imposed on the element $\alpha$ is that the
set F contains only $2^m$ elements and that F is closed under the multiplication definition
(i.e., the multiplication of 2 elements in the set F results in an element in F).
Let p(x) be a primitive polynomial of degree m over GF(2) (i.e., the coefficients
of p(x) are either 0 or 1). This polynomial is called the field generator polynomial. The
condition imposed on this polynomial is $p(\alpha) = 0$. Under this condition, the set F becomes
finite and contains the following elements:
$$F = \{0, 1, \alpha, \alpha^2, \ldots, \alpha^{2^m-2}\},$$
and the nonzero elements of F are closed under the multiplication operation, "$\cdot$". The
nonzero elements also form a commutative group under "$\cdot$". Under the addition operation,
"+", all of the elements in the set F are closed and the set is also a commutative group.
The set F of GF($2^8$) elements for a given p(x) is shown in Table B.1. This
particular field is used in the designed RS codec. The table shows two representations of
the elements: the power representation, $\alpha^i$, and the polynomial representation (the
coefficients of $1, \alpha, \alpha^2, \ldots, \alpha^7$, which are either 0 or 1). The first representation is convenient
for multiplication and the second is convenient for addition.
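The construction and the two representations can be sketched in a few lines of Python. This is only an illustration: it assumes m = 8 and the field generator polynomial $p(x) = x^8 + x^4 + x^3 + x^2 + 1$ (bit mask 0x11D), which is an assumption made here and not a value read from Table B.1. The names `build_field` and `as_tuple` are hypothetical helpers.

```python
def build_field(m, poly):
    """Return [alpha^0, alpha^1, ..., alpha^(2^m - 2)] as integers.

    Bit i of each integer is the coefficient of alpha^i. `poly` is p(x) as a
    bit mask that includes the x^m term, e.g. 0x11D for x^8+x^4+x^3+x^2+1.
    """
    powers = []
    value = 1                       # alpha^0
    for _ in range(2 ** m - 1):
        powers.append(value)
        value <<= 1                 # multiply by alpha
        if value >> m:              # a term alpha^m appeared
            value ^= poly           # eliminate it using p(alpha) = 0
    return powers

def as_tuple(value, m):
    """Polynomial representation (a_0, a_1, ..., a_{m-1}) of a field element."""
    return tuple((value >> i) & 1 for i in range(m))

powers = build_field(8, 0x11D)      # assumed generator polynomial
print(len(set(powers)))             # 255: every nonzero element appears once
for i in (0, 1, 7, 8):
    print(i, as_tuple(powers[i], 8))
```

Because the assumed p(x) is primitive, the 255 powers are all distinct; a non-primitive polynomial would make the list repeat early, which gives a quick sanity check on the choice of p(x).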
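To multiply two elements $\alpha^i$ and $\alpha^j$ in the field, one simply adds their exponents
and uses the fact that $\alpha^{255} = 1$. For example, $\alpha^{15} \cdot \alpha^{198} = \alpha^{213}$ and $\alpha^{213} \cdot \alpha^{76} = \alpha^{289-255} = \alpha^{34}$.
To divide $\alpha^i$ by $\alpha^j$, one simply multiplies $\alpha^i$ by the multiplicative inverse, $\alpha^{255-j}$, of $\alpha^j$. For
example, $\alpha^{213}/\alpha^{76} = \alpha^{213} \cdot \alpha^{179} = \alpha^{392} = \alpha^{137}$. To add $\alpha^{213}$ and $\alpha^{76}$, one uses their
polynomial representations in the table. For example,
$$\alpha^{213} + \alpha^{76} = (\alpha + \alpha^4 + \alpha^5 + \alpha^6 + \alpha^7) + (\alpha + \alpha^2 + \alpha^3 + \alpha^4) = \alpha^2 + \alpha^3 + \alpha^5 + \alpha^6 + \alpha^7 = \alpha^{122}.$$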
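The exponent arithmetic above maps naturally onto a pair of lookup tables. The sketch below is a hedged illustration, not the codec's implementation: it assumes the field generator polynomial 0x11D ($x^8 + x^4 + x^3 + x^2 + 1$), and the names `EXP`, `LOG`, `gf_mul`, `gf_div` and `gf_add` are hypothetical. Under that assumption it reproduces the numerical examples of this section.

```python
PRIM_POLY = 0x11D        # assumed p(x) = x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 255          # EXP[i] is alpha^i in polynomial (byte) form
LOG = [0] * 256          # LOG[x] is i with alpha^i == x (LOG[0] is unused)
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1                  # multiply by alpha
    if value & 0x100:
        value ^= PRIM_POLY       # reduce with p(alpha) = 0

def gf_mul(a, b):
    """Multiply by adding exponents modulo 255 (alpha^255 = 1)."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]

def gf_div(a, b):
    """Divide by multiplying with the inverse alpha^(255 - j)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] + 255 - LOG[b]) % 255]

def gf_add(a, b):
    """Add the polynomial representations coefficient by coefficient (XOR)."""
    return a ^ b

print(LOG[gf_mul(EXP[15], EXP[198])])    # 213
print(LOG[gf_mul(EXP[213], EXP[76])])    # 34
print(LOG[gf_div(EXP[213], EXP[76])])    # 137
print(LOG[gf_add(EXP[213], EXP[76])])    # 122
```

The two tables cost 511 bytes in total, so multiplication and division reduce to a few lookups and one modular addition.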
B.3 Vector representation of GF($2^m$)
Another useful representation for the field elements in GF($2^m$) is the use of an
m-dimensional vector. Let $(a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{m-1}\alpha^{m-1})$ be the polynomial representation
of a field element $\beta$ in GF($2^m$), where $a_i$ = 0 or 1. Then this element is represented by an
ordered sequence of m components, called an m-tuple, as follows:
$$(a_0, a_1, a_2, \ldots, a_{m-1}),$$
where the m components are simply the coefficients of the polynomial representation of
$\beta$. The 8-tuples of GF($2^8$) are shown in the second column of Table B.1.
Using this representation, addition is easy to define. The addition is performed
element by element according to the rules of GF(2) math. To add $\beta$ and $\gamma$, one simply
adds the corresponding components of their m-tuple representations:
$$(a_0 + b_0, a_1 + b_1, \ldots, a_{m-1} + b_{m-1}),$$
where the $b_i$ are the components of $\gamma$ and $a_i + b_i$ is carried out in modulo-2 addition. This addition is simply an
XOR logic operation. Obviously, the components of the resultant m-tuple are the coefficients of the
polynomial representation of $(\beta + \gamma)$, which is an element in the set (i.e., the set is closed
under the addition operation). For the given GF($2^8$) example, the addition of $\alpha^{213}$ and
$\alpha^{76}$ is:
$$\alpha^{213} + \alpha^{76} = (0\,1\,0\,0\,1\,1\,1\,1) + (0\,1\,1\,1\,1\,0\,0\,0) = (0\,0\,1\,1\,0\,1\,1\,1) = \alpha^{122}.$$
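The component-wise addition can be checked with a short sketch. The tuples are those of the example above, written lowest-order coefficient first as $(a_0, \ldots, a_7)$; the helper names are hypothetical, and the packing of a tuple into a byte (bit i holds $a_i$) is an assumed convention.

```python
def to_tuple(value, m=8):
    """(a_0, a_1, ..., a_{m-1}) where a_i is bit i of `value`."""
    return tuple((value >> i) & 1 for i in range(m))

def from_tuple(bits):
    """Pack an m-tuple back into an integer, a_0 in the least significant bit."""
    return sum(bit << i for i, bit in enumerate(bits))

def add_tuples(u, v):
    """Component-wise modulo-2 addition, i.e. a bitwise XOR."""
    return tuple(a ^ b for a, b in zip(u, v))

beta = (0, 1, 0, 0, 1, 1, 1, 1)     # 8-tuple of alpha^213 from the example
gamma = (0, 1, 1, 1, 1, 0, 0, 0)    # 8-tuple of alpha^76 from the example

total = add_tuples(beta, gamma)
print(total)                                        # (0, 0, 1, 1, 0, 1, 1, 1)
print(from_tuple(total) == from_tuple(beta) ^ from_tuple(gamma))   # True
```

In hardware the same operation is m two-input XOR gates, with no carry chain, which is why the sum of two bytes always fits in a byte.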
Appendix C: Algorithms to Find Error Locator Polynomial for RS Decoder
In decoding an RS code, it is necessary to determine both the location and the
magnitude of the errors in the received vector. The information required to determine
these two elements is in the syndrome polynomial, S(x). This polynomial is found by
substituting the code generator roots into the received codewords. There are two
polynomials derived from S(x). The error locator polynomial, $\sigma(x)$, holds the locations of
the errors, and the error magnitude polynomial, Z(x), contains the error values. The
minimum-degree $\sigma(x)$ sought must satisfy the following key equation,
$$S(x)\,\sigma(x) = Z(x) \bmod x^{2t},$$
where t is the number of error symbols that can be corrected by the code.
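As an illustration of how the syndromes arise, the sketch below evaluates a received word at 2t consecutive powers of $\alpha$. Several details are assumptions made only for this example: the field generator polynomial 0x11D, generator roots $\alpha^0, \ldots, \alpha^{2t-1}$ (the `first_root` parameter), a word stored lowest-order coefficient first, and the word length of 204 symbols with t = 8. The function names are hypothetical.

```python
PRIM_POLY = 0x11D                      # assumed field generator polynomial
EXP, LOG = [0] * 510, [0] * 256        # EXP is doubled to avoid a modulo in gf_mul
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= PRIM_POLY

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coeffs[k] multiplies x^k) by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def syndromes(received, t, first_root=0):
    """S_j = r(alpha^(first_root + j)) for j = 0 .. 2t-1 (root set is assumed)."""
    return [poly_eval(received, EXP[(first_root + j) % 255]) for j in range(2 * t)]

t = 8                                  # hypothetical error-correcting capability
word = [0] * 204                       # the all-zero word belongs to any linear code
print(any(syndromes(word, t)))         # False: no errors, all syndromes are zero
word[10] ^= 0x5A                       # inject a single symbol error
print(any(syndromes(word, t)))         # True: the syndromes now depend on the error
```

The syndromes depend only on the error pattern, not on the transmitted codeword, which is why the key equation can be solved from S(x) alone.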
Evaluating these polynomials is the most complex step in decoding an RS code.
The complexity of the RS decoder lies in finding a minimum-degree $\sigma(x)$. Berlekamp
was the first to develop a computationally efficient method of solving the key equation.
Since then several different methods have been developed. Sugiyama showed that
Euclid's algorithm for finding the greatest common divisor of the polynomials could also
be adapted to this purpose [39]. There are two main algorithms to find $\sigma(x)$: the
Berlekamp-Massey and the Euclidean algorithms. Both algorithms have roughly the same
computational complexity. The algorithms have been revised or modified to adapt to
different decoding architectures [13,14,34,35,41,43].
1. The Berlekamp-Massey algorithm
Berlekamp devised the algorithm in 1967 and Massey discussed it in 1968. The
algorithm is described as follows [82] :
1. Start at n = 0 with the initial conditions:
3. If $d_n \neq 0$, we have
4. For either $d_n = 0$ or $d_n \neq 0$, the next discrepancies are
(km+[ ) ( k " 4 )
d kt "-1) = s(kn+, , + i s ( ~ n + l )-i and
5. The iteration stops at n = 2t-1.
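For orientation, the following is a sketch of the common textbook (shift-register synthesis) form of the Berlekamp-Massey iteration. It is not a transcription of the steps listed above, whose notation follows [82]. It assumes the field generator polynomial 0x11D and syndromes $S_j = \sum_k e_k \alpha^{j p_k}$ for j = 0 .. 2t-1; all function and variable names are hypothetical.

```python
PRIM_POLY = 0x11D                      # assumed field generator polynomial
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= PRIM_POLY

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):                      # b must be nonzero
    return 0 if a == 0 else EXP[LOG[a] + 255 - LOG[b]]

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):              # p[k] multiplies x^k
        acc = gf_mul(acc, x) ^ c
    return acc

def berlekamp_massey(S):
    """Return the shortest sigma(x), sigma[0] = 1, that generates the syndromes S."""
    N = len(S)
    C = [1] + [0] * N                  # current estimate of sigma(x)
    B = [1] + [0] * N                  # estimate saved at the last length change
    L, m, b = 0, 1, 1                  # register length, shift, last discrepancy
    for n in range(N):
        d = S[n]                       # discrepancy d_n
        for i in range(1, L + 1):
            d ^= gf_mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = gf_div(d, b)
        for i in range(N + 1 - m):     # C(x) += (d/b) * x^m * B(x)
            C[i + m] ^= gf_mul(coef, B[i])
        if 2 * L <= n:                 # length change
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1]

def syndromes_of(errors, t):
    """Syndromes of an error pattern {position: value}, roots alpha^0..alpha^(2t-1)."""
    S = [0] * (2 * t)
    for pos, val in errors.items():
        for j in range(2 * t):
            S[j] ^= gf_mul(val, EXP[(pos * j) % 255])
    return S

errors = {3: 0x1F, 17: 0xA4, 100: 0x01}            # hypothetical error pattern
sigma = berlekamp_massey(syndromes_of(errors, t=8))
located = [p for p in range(255) if poly_eval(sigma, EXP[(255 - p) % 255]) == 0]
print(located)                                     # [3, 17, 100]
```

The last lines perform an exhaustive root search: position p is in error when $\sigma(\alpha^{-p}) = 0$.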
2. The Euclidean algorithm
In general, the Euclidean algorithm which evaluates the error locator polynomial,
$\sigma(x)$, and the error magnitude polynomial, Z(x), is expressed as follows:
1. Four polynomials are initialized:
The algorithm iteratively updates the 2 polynomials $\sigma(x)$ and Z(x) as follows.
a. Divide $Z_{i-2}(x)$ by $Z_{i-1}(x)$ to obtain the quotient $Q_i(x)$ and the remainder $Z_i(x)$.
The iteration is continued until the degree of $Z_i(x)$ is less than t. Then set $\sigma(x) = \sigma_i(x)$ and $Z(x) = Z_i(x)$.
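A minimal sketch of the Euclidean iteration follows. It assumes the usual initialization for this formulation, $Z_{-1}(x) = x^{2t}$, $Z_0(x) = S(x)$, $\sigma_{-1}(x) = 0$, $\sigma_0(x) = 1$, the update $\sigma_i(x) = \sigma_{i-2}(x) + Q_i(x)\sigma_{i-1}(x)$, and a stop once the remainder degree drops below t. It also assumes the field generator polynomial 0x11D and generator roots $\alpha^0, \ldots, \alpha^{2t-1}$. All names are hypothetical, and the final scaling to $\sigma(0) = 1$ is a convenience added here.

```python
PRIM_POLY = 0x11D                      # assumed field generator polynomial
EXP, LOG = [0] * 510, [0] * 256
v = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = v
    LOG[v] = i
    v <<= 1
    if v & 0x100:
        v ^= PRIM_POLY

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_div(a, b):                      # b must be nonzero
    return 0 if a == 0 else EXP[LOG[a] + 255 - LOG[b]]

def deg(p):
    """Degree of p (p[k] multiplies x^k); -1 for the zero polynomial."""
    d = len(p) - 1
    while d >= 0 and p[d] == 0:
        d -= 1
    return d

def poly_divmod(num, den):
    """Quotient and remainder of num / den; den must be nonzero."""
    num, dd = num[:], deg(den)
    q = [0] * max(deg(num) - dd + 1, 1)
    while deg(num) >= dd:
        shift = deg(num) - dd
        c = gf_div(num[deg(num)], den[dd])
        q[shift] = c
        for i in range(dd + 1):        # cancel the leading term of num
            num[i + shift] ^= gf_mul(c, den[i])
    return q, num

def poly_mul_add(acc, a, b):
    """Return acc(x) + a(x) * b(x)."""
    out = acc[:] + [0] * max(0, len(a) + len(b) - 1 - len(acc))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out

def euclid_key_equation(S, t):
    Z_prev, Z_cur = [0] * (2 * t) + [1], S[:]      # x^(2t) and S(x)
    s_prev, s_cur = [0], [1]                       # sigma_{-1} and sigma_0
    while deg(Z_cur) >= t:
        Q, R = poly_divmod(Z_prev, Z_cur)
        Z_prev, Z_cur = Z_cur, R
        s_prev, s_cur = s_cur, poly_mul_add(s_prev, Q, s_cur)
    lead = s_cur[0] or 1                           # scale so that sigma(0) = 1
    sigma = [gf_div(c, lead) for c in s_cur[:deg(s_cur) + 1]]
    Z = [gf_div(c, lead) for c in Z_cur[:max(deg(Z_cur), 0) + 1]]
    return sigma, Z

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):
        acc = gf_mul(acc, x) ^ c
    return acc

t, errors = 8, {3: 0x1F, 17: 0xA4, 100: 0x01}      # hypothetical error pattern
S = [0] * (2 * t)
for pos, val in errors.items():
    for j in range(2 * t):
        S[j] ^= gf_mul(val, EXP[(pos * j) % 255])

sigma, Z = euclid_key_equation(S, t)
print([p for p in range(255) if poly_eval(sigma, EXP[(255 - p) % 255]) == 0])
# [3, 17, 100]
```

Unlike the Berlekamp-Massey iteration, this procedure returns Z(x) together with $\sigma(x)$, at the cost of a polynomial division in every step.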