IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007 151
A 380 MHz Direct Digital Synthesizer/Mixer With Hybrid CORDIC Architecture in 0.25 µm CMOS
Davide De Caro, Member, IEEE, Nicola Petra, Student Member, IEEE, and Antonio Giuseppe Maria Strollo, Senior Member, IEEE
Abstract: The paper describes the implementation of a 380 MHz, 13 bit, direct digital synthesizer/mixer IC in 0.25 µm CMOS technology. The circuit employs an innovative architecture which splits the rotation required in quadrature synthesizer/mixers into three rotations. The first two rotations are implemented with a CORDIC datapath realized entirely in carry-save arithmetic. The directions of the CORDIC rotations are computed in parallel, using a small lookup table for the first rotation and a multiply-by-constant and addition circuit for the second rotation. The final (third) rotation is multiplier-based, in order to reduce the circuit latency and increase the circuit performance.
The CORDIC datapath is implemented with a novel approach both at the algorithmic level and at the transistor level. At the algorithmic level, the combined use of sign-extension prevention, overflow prevention and a novel rounding scheme is presented. At the transistor level, a design style that jointly uses full-CMOS and DPL to improve the circuit latency is described.
The synthesizer/mixer IC, realized in a 0.25 µm CMOS technology, occupies an area of 0.22 mm² and dissipates 152 mW at 380 MHz with a supply voltage of 2.5 V.
Index Terms: Angle rotator, carry-save arithmetic, CORDIC algorithm, digital downconverter, direct digital frequency synthesizer, digital mixer, digital synthesizer, digital tuner, digital upconverter, mixer, modulator, overflow prevention, quadrature modulator, rounding.
I. INTRODUCTION
IN RECENT years, there has been a growing trend in the
communication technologies to shift from analog toward
digital techniques. The use of digital techniques, in fact, over-
comes many analog hardware limitations (like high sensitivity
to process and temperature variations, difficult portability as
the VLSI technology scales down, etc.). Moreover, the pro-
grammability offered by digital techniques provides flexibility
that is especially important in the context of rapidly evolving
communication standards.
Owing to advances in CMOS circuit performance, digital
techniques are nowadays capable of handling intermediate-frequency (IF)
or even low radio-frequency (RF) tasks. One of the
most basic building blocks in this context is the direct digital
Manuscript received April 3, 2006; revised June 30, 2006. Chip fabrication was supported by MOSIS Research Educational Program.
The authors are with the Department of Electronics and Telecommunication Engineering, University of Napoli Federico II, Napoli 80125, Italy (e-mail: [email protected]).
Digital Object Identifier 10.1109/JSSC.2006.886527
Fig. 1. Synthesizer/mixer nonoptimal architectures: a) DDFS-based architecture; b) CORDIC-based architecture.
frequency synthesizer/mixer (DDFSM), which is widely used in
many communication subsystems such as tuners,
derotators, and up and down frequency converters (see [8], [35]).
In addition, the quadrature mixer is the front-end of various
modulation/demodulation schemes, such as binary phase shift
keying (BPSK), quadrature phase shift keying (QPSK), and
quadrature amplitude modulation (QAM) (see [35]).
The inputs of a DDFSM are two signals and , and
a frequency control word . The outputs of the system are computed according to the following equations:
(1)
where
(2)
Equations (1) and (2) correspond to a complex multiplication
between an input vector in the complex plane, with coordinates , and a unit-modulus vector
.
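As a floating-point point of reference, the complex-multiplication view of (1) and (2) can be sketched as follows. This is only an illustrative model: the names `x_in`, `y_in` and `fcw`, the counter-clockwise rotation sign, and the mapping of the accumulator content onto [0, 2π) are assumptions, while the 32 bit accumulator width is the one quoted in Section II.

```python
import math

ACC_BITS = 32  # phase accumulator width quoted in Section II


def ddfsm_reference(x_in, y_in, fcw, acc_bits=ACC_BITS):
    """Ideal (floating-point) synthesizer/mixer: rotate each input sample
    by the phase produced by an overflowing accumulator."""
    modulus = 1 << acc_bits
    acc = 0
    out = []
    for x, y in zip(x_in, y_in):
        # Accumulator content mapped onto [0, 2*pi) (assumed convention)
        phi = 2.0 * math.pi * acc / modulus
        c, s = math.cos(phi), math.sin(phi)
        # Complex product (x + j*y) * (c + j*s):
        # four real multiplies, one subtraction and one addition
        out.append((x * c - y * s, x * s + y * c))
        acc = (acc + fcw) % modulus  # overflow simply wraps the phase
    return out
```

With constant inputs (1, 0) the model behaves as a quadrature oscillator; for instance, `fcw = 1 << 30` advances the phase by a quarter turn per sample.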
A first implementation of the DDFSM includes two distinct functional units [1]; see Fig. 1(a). The first one is a direct digital
frequency synthesizer (DDFS) [2], [3] that generates the sequences and . The second one is
a complex multiplier, which uses four real multipliers, one adder
and one subtractor to generate the outputs and
according to (1). This implementation is generally nonoptimal
[4], [8]. The DDFS is in fact a cumbersome circuit itself. Moreover,
the complex multiplier does not exploit the property that
the modulus of one of its inputs is unitary.
A second possible implementation [5], [6] employs a simple
overflowing accumulator that generates the angle and a rotator
using the CORDIC algorithm [7] to implement (1); see
0018-9200/$20.00 2007 IEEE
Fig. 1(b). Unfortunately, the CORDIC algorithm in its standard
implementation is inherently slow, since it uses many cascaded
computation stages.
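The serial nature of the standard algorithm can be seen in a minimal floating-point sketch of the textbook CORDIC rotation [7]. The function name, the iteration count and the starting index 0 are illustrative assumptions, not the datapath of this paper. Each direction `sigma` depends on the residual angle left by the previous micro-rotation, so the stages cannot be evaluated independently.

```python
import math


def cordic_rotate(x, y, angle, n_iter=16):
    """Textbook CORDIC rotation of (x, y) by `angle` radians.

    Returns the rotated vector (scaled by the CORDIC gain) and the
    residual angle. Converges for |angle| up to roughly 1.74 rad.
    """
    z = angle
    for i in range(n_iter):
        # The direction is decided from the residual of the previous stage
        sigma = 1.0 if z >= 0 else -1.0
        t = 2.0 ** -i
        # Shift-and-add micro-rotation by atan(2**-i)
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= sigma * math.atan(t)
    return x, y, z
```

The outputs are amplified by the product of the factors sqrt(1 + 2^(-2i)), which has to be divided out when comparing against an ideal rotation.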
Recent approaches [8]–[11] overcome the limitations of
the simple architectures of Fig. 1 by implementing the synthesizer/mixer
as the cascade of two stages: a coarse rotation
followed by a fine rotation. In [8]–[10] both the coarse rotation and the fine rotation employ a multiplier-based
architecture, while the approach of [11] uses a CORDIC architecture
for the coarse rotation and a multiplier-based fine rotation.
The IC implementations [8], [10] are very effective, with
high-speed operation and reduced hardware complexity. Until
now, no IC implementation of the mixed approach of [11] has been reported.
This paper [12] introduces a novel combined approach,
named Hybrid CORDIC, to realize a synthesizer/mixer. This
approach splits the rotation required in the synthesizer/mixer
circuit into three rotations. The first rotation is performed by
a CORDIC datapath in which the rotation directions are
computed in parallel by means of a lookup table. The second
rotation is also CORDIC-based, with rotation directions computed in parallel analytically. The final (third) rotation is
multiplier-based.
The parallel evaluation of the rotation directions allows an
efficient use of carry-save arithmetic in the CORDIC datapath
of the first two rotation blocks, without requiring additional
carry-propagate adders (as in [19], [20]) or the introduction of
additional CORDIC sub-rotations (as in [21]). The final multiplier-based
rotation reduces the overall number of
pipelining levels as well as the circuit complexity.
At the transistor level, a novel approach, which combines
full-CMOS and double-pass-transistor logic (DPL) [30] design
styles, is presented to implement the CORDIC datapath.
The paper is organized as follows. Section II introduces the
top-level implementation of the synthesizer/mixer. Section III
discusses the algorithmic aspects of the novel Hybrid CORDIC
architecture. Section IV highlights the main advantages
of the Hybrid CORDIC architecture in comparison to
state-of-the-art architectures. The carry-save implementation
of the CORDIC stages is discussed in Section V, while the
mixed CMOS-DPL design style is presented in Section VI. In
Section VII the prototype IC, realized in 0.25 µm CMOS technology,
is presented, and the experimental results are compared
to state-of-the-art implementations.
II. SYNTHESIZER/MIXER BASIC ARCHITECTURE
The top-level architecture of the designed DDFSM IC is
shown in Fig. 2. The circuit is sized to exhibit a 90 dBc
spurious-free dynamic range (SFDR). The input word length
is 12 bit while the output word length is 13 bit. The 32 bit
phase accumulator generates the rotation angle ,
represented with a binary fractional value in [0, 1]. The rotation
angle is truncated to 16 bit, introducing output spurs that are
below the 90 dBc SFDR constraint. The heart of the circuit is the Hybrid CORDIC rotator block. This block is able to
Fig. 2. Top-level architecture of the designed DDFSM IC.
is given by
1 .
perform a rotation by an angle represented with a
binary fractional value in [0, 1]:
(3)
The least significant bit of has a weight that will be indicated
in the following as .
The other minor subsystems in Fig. 2 (1's complementer,
swappers and 2's complementers driven by the control logic)
exploit the symmetries of the sine and cosine functions [8], [10]
to perform the complete rotation in the full interval. It is
worth highlighting that, by introducing a phase shift in
the rotator block, it is possible to completely eliminate the error
due to the use of a simple 1's complementer to evaluate the
angle (see [2], [3], [16]).
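The use of the sine and cosine symmetries can be illustrated with a floating-point sketch in which a rotator valid only on the first octant is extended to the full circle by negations and swaps. The function names are hypothetical, `core_rotate` is an ideal stand-in for the rotator block, and the angle is complemented exactly here, so the 1's complement approximation and the phase-shift correction mentioned above do not appear in this model.

```python
import math


def core_rotate(x, y, theta):
    """Ideal stand-in for the rotator block; only used on [0, pi/4]."""
    assert -1e-12 <= theta <= math.pi / 4 + 1e-12
    c, s = math.cos(theta), math.sin(theta)
    return x * c - y * s, x * s + y * c


def full_circle_rotate(x, y, phi):
    """Rotate (x, y) by phi in [0, 2*pi) using first-octant rotations only."""
    octant = int(phi // (math.pi / 4)) % 8
    quadrant, upper_half = divmod(octant, 2)
    alpha = phi - quadrant * (math.pi / 2)  # angle inside the quadrant
    if upper_half:
        # alpha = pi/2 - beta: rotate the mirrored vector by beta,
        # then swap the two components
        beta = math.pi / 2 - alpha
        u, w = core_rotate(x, -y, beta)
        x, y = w, u
    else:
        x, y = core_rotate(x, y, alpha)
    for _ in range(quadrant):
        x, y = -y, x  # each quarter turn maps (x, y) to (-y, x)
    return x, y
```

Only negations (2's complement in hardware) and component swaps are needed outside the core rotator, which is why these subsystems are minor compared to the rotator itself.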
III. HYBRID CORDIC ROTATOR ALGORITHM
The architecture of the Hybrid CORDIC rotator is shown in
Fig. 3. The circuit rotates its input vector by the angle
. The rotation is performed in three steps. The first
two steps are performed with a CORDIC datapath, while the
final step is realized by using two multipliers.
A. First Rotation
In the first step, the angle is divided into two sub-words
, where
(4)
(5)
and is the complement of .
The goal of the first stage is to perform a rotation by an angle
close to . To that purpose, the first rotation block
uses the CORDIC algorithm, described by the following equations:
(6)
Fig. 3. Hybrid CORDIC rotator architecture.
where is equal to . The algorithm starts with
, and . To simplify the hardware implementation,
only four CORDIC sub-rotations are performed
in (6), resulting in a rotation by an angle .
From the properties of the CORDIC algorithm, it can easily be shown
that the absolute value of the residual angle
is upper bounded by . Therefore, by choosing four rotations in the first stage, we have about the same maximum
absolute value for both and [see (5)].
The direction of the first rotation in (6) is fixed
since . The directions of the remaining rotations
depend only on . These directions, therefore, can
be precomputed by using (6) and stored in the lookup table
shown in Fig. 3. The lookup table is very small, having three
address bits . The residual angle , similarly
to the values, depends only on , , . Also , therefore,
can be stored in the lookup table.
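The precomputation of such a table can be sketched as follows. Several details are assumptions made only for illustration, since the corresponding symbols are not legible in this transcript: the coarse angle is taken as the midpoint of each of the eight address intervals, the rotator range is taken as [0, π/4], and the four sub-rotations are assumed to use the shifts 2^-1 to 2^-4.

```python
import math

N_SUB = 4        # sub-rotations in the first stage (from the text)
FIRST_INDEX = 1  # assumed index of the first sub-rotation
ANGLES = [math.atan(2.0 ** -(FIRST_INDEX + k)) for k in range(N_SUB)]


def build_direction_lut(addr_bits=3, full_range=math.pi / 4):
    """For each address, store the CORDIC directions and the residual angle."""
    n = 1 << addr_bits
    lut = []
    for addr in range(n):
        # Assumed coarse angle: midpoint of the address interval
        z = (addr + 0.5) * full_range / n
        dirs = []
        for a in ANGLES:
            d = 1 if z >= 0 else -1
            dirs.append(d)
            z -= d * a
        lut.append((tuple(dirs), z))
    return lut
```

Under these assumptions the first direction is +1 for every address, in line with the fixed first rotation, and each stored residual is no larger in magnitude than the last micro-rotation angle atan(2^-4).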
Finally, let us note that the four CORDIC sub-rotations (6)
amplify the modulus of the input vector by a factor
(7)
The amplification is inconsequential in many applications
[4]–[6], [11] and is not compensated in the proposed approach.
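The amplification follows from the fact that each CORDIC sub-rotation with shift 2^-i scales the vector modulus by sqrt(1 + 2^(-2i)), independently of its direction. A short sketch of the resulting product is given below; the index range 1 to 4 for the first stage is an assumption, as the actual indices are not legible in this transcript.

```python
import math


def cordic_gain(indices):
    """Modulus amplification of CORDIC sub-rotations with shifts 2**-i."""
    gain = 1.0
    for i in indices:
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


# Four sub-rotations of the first stage, indices 1..4 assumed
k_first = cordic_gain(range(1, 5))
```

Because the factor does not depend on the rotation directions, it is a constant of the architecture and can be absorbed elsewhere in the signal chain when it matters.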
B. Second Rotation
In order to complete the operation, the second and third
stages of the Hybrid CORDIC architecture rotate the vector
(the output of the first stage) by an angle
(8)
The angle is computed by using the multiplier and the
adder shown in Fig. 3. The multiplier is needed to calculate
from its scaled representation; see (5). Since, as
observed before, the absolute values of and are both
lower than , the absolute value of is lower than . By
representing with 11 bits, we have
(9)
A phase quantization error in the range is introduced
in (9). This results in a maximum error at the ,
outputs of the DDFSM equal to . This
value is much lower than the weight of the least significant bit at the outputs of the DDFSM .
The angle is then split as the sum of two sub-angles
, where
(10)
(11)
The second rotation block performs the rotation by the angle , whereas the rotation by the angle is assigned to
the final rotation block.
In the second rotation we employ a CORDIC algorithm
without the computation. The rotation directions are
obtained directly from the bits of as follows:
for (12)
The corresponding CORDIC equations are
(13)
where is the output of the first rotation
stage; see Fig. 3.
The actual rotation angle obtained with (13) is not exactly the
required angle but is instead an angle , given by
(14)
From (10), (12) the angle can be written as
(15)
As a consequence, the second rotation block introduces a phase
error, :
(16)
With simple manipulations, it is possible to show that is
upper bounded by
(17)
The phase error of the second rotation introduces an error on
each component of the DDFSM output. From (17), is much lower than the weight of the output LSB .
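The idea of reading the rotation directions directly from the angle bits, and the small phase error it causes, can be sketched as follows. The bit-to-direction mapping (1 gives +1, 0 gives -1), the number of digits and the starting index 5 are assumptions for illustration only; the exact forms of (10) to (17) are not legible in this transcript.

```python
import math


def directions_from_bits(bits):
    """Map each bit to a rotation direction: 1 -> +1, 0 -> -1."""
    return [2 * b - 1 for b in bits]


def recoded_angle(bits, first_index):
    """Angle encoded by the +/-1 digits, digit k having weight 2**-(first_index + k)."""
    return sum(d * 2.0 ** -(first_index + k)
               for k, d in enumerate(directions_from_bits(bits)))


def cordic_angle(bits, first_index):
    """Angle actually rotated when the same digits drive CORDIC sub-rotations."""
    return sum(d * math.atan(2.0 ** -(first_index + k))
               for k, d in enumerate(directions_from_bits(bits)))
```

Since atan(w) differs from w by less than w^3/3 for small positive w, the mismatch between the two angles is bounded by the sum of w^3/3 over the sub-rotations used, which is tiny when the first shift is already small.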
TABLE I
PERFORMANCES OF THE PROPOSED ARCHITECTURE
Like the first rotation block, the CORDIC rotations (13) also
amplify the modulus of the input vector, by a factor
(18)
Therefore, the total amplification factor is
(19)
C. Final (Third) Rotation
The final rotation block in Fig. 3 implements the rotation by
. The operation to be performed by this block can be written
as
(20)
This final rotation could also be computed by using the
CORDIC algorithm. However, as observed in [17] and [18],
when the rotation angle is small a complex multiplier
reduces the latency and improves the performance.
In our case, the absolute value of is lower than . Therefore,
we can approximate the sine and cosine functions in (20) as
(21)
In this way, the final rotation is realized without the need for
lookup tables to store sine and cosine values.
The approximation (21) introduces an error on the DDFSM
outputs and . It can easily be shown that this error
component is upper bounded by .
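A floating-point sketch of a table-free small-angle rotation is given below. It assumes that (21) is the usual first-order approximation (cosine replaced by 1, sine replaced by the angle itself), which is consistent with the two multipliers of the final stage but cannot be confirmed from this transcript; the function names are illustrative.

```python
import math


def small_angle_rotate(x, y, theta):
    """Rotation with cos(theta) ~ 1 and sin(theta) ~ theta: two multiplies, no tables."""
    return x - y * theta, y + x * theta


def approximation_error(x, y, theta):
    """Distance between the approximate and the exact rotated vectors."""
    c, s = math.cos(theta), math.sin(theta)
    exact_x, exact_y = x * c - y * s, x * s + y * c
    approx_x, approx_y = small_angle_rotate(x, y, theta)
    return math.hypot(approx_x - exact_x, approx_y - exact_y)
```

Under this assumption the error vector has modulus at most |v|·θ²/2, so it shrinks quadratically as the residual angle left to the final stage becomes smaller.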
As shown in Fig. 3, we have introduced two rounders in the
final rotation stage, to reduce the word length of the multiplier
inputs. The two rounders introduce an additional error at the
output. We have .
An analytical derivation of the joint effect of all algorithmic
and quantization errors is not easy. We therefore performed bit-level
simulations, operating the DDFSM in two modes. In DDS mode,
and so that the circuit generates two quadrature
sinewave outputs. In SSB mode, a sinusoidal input is applied
to the DDFSM, which operates as a digital upconverter with image
rejection. Table I summarizes the performance of the developed
architecture.
IV. COMPARISON WITH STATE-OF-THE-ART APPROACHES
The main advantage of the proposed Hybrid CORDIC architecture
is the parallel computation of the rotation directions and . This computation is performed with a small lookup table,
a multiplier by a constant and an adder. Therefore, a simple and effective
carry-save [31] implementation of the datapaths can be
used, avoiding the speed penalties due to carry propagation [5].
Previously proposed carry-save CORDIC architectures require
a datapath, and also additional carry-propagate adders
to determine the rotation directions [19], [20]. Other techniques do
not include carry-propagate adders, but require the introduction of extra rotations [21].
The first two CORDIC rotation blocks in our architecture resemble
the algorithms proposed in [22]. However, in the partitioned
Hybrid CORDIC algorithm of [22] the partitioning and
the handling of the rotation angle would require a huge lookup
table for its implementation. On the other hand, the mixed Hybrid
CORDIC algorithm, also proposed in [22], does not partition
the rotation angle. Therefore, its implementation requires
in the first stage either a full datapath or a lookup table
addressed by all the bits of the rotation angle.
The solution of [11] uses two rotation stages. The first one is
a CORDIC rotator, while the second one is multiplier-based (as
originally proposed in [17]). The CORDIC rotator of [11] uses a number of stages comparable to the overall number of stages used in the
first and second blocks of our architecture. The use in [11] of a
single CORDIC rotator, however, requires a lookup table much
larger than the one used in our architecture.
The recently proposed DDFSM implementations [8]–[10]
use an architecture composed of two multiplier-based rotation
stages. These architectures require a total of 8 small-width
multipliers. The experimental results shown in the following
demonstrate that the Hybrid CORDIC architecture is more
effective, especially in terms of power and area occupation.
V. HYBRID CORDIC IMPLEMENTATION
The most critical subsystems in the Hybrid CORDIC architecture
of Fig. 3 are the CORDIC stages. In fact, the lookup table
is very small and can be effectively synthesized as random
logic. The multiplier requires only the sum of a few partial
products, which can easily be merged with the adder needed to
compute in a single summation tree.
The final rotation of the Hybrid CORDIC architecture of
Fig. 3 uses multiply-accumulate circuits, also realized with a
single summation tree. The sign-extension prevention technique
[23] has been used to realize the subtraction needed to
compute .
A. Carry-Save Implementation of the CORDIC Stages
An innovative architecture has been devised to implement the
first and second CORDIC rotation stages. The basic equation to
implement the CORDIC stages is
(22)
where is the direction of the CORDIC sub-rotation, while
represents the order of the sub-rotation. Equation (22) implements
the computation in (6), (13). The computation can easily be
obtained by swapping and in (22) and changing the sign
of .
Since, in our architecture, the directions of the CORDIC rotations
are efficiently evaluated in parallel, the implementation was performed by using carry-save arithmetic. Rewriting (22)
Fig. 4. Detailed implementation of the first and second rotation blocks with carry-save arithmetic. The datapath is built from one wiring block and six CORDIC sub-rotations driven by the directions , .
Fig. 5. Optimized bit-level implementation in carry-save arithmetic of the elementary stage (23); is the order of the elementary stage.
in carry-save [31] form, we obtain the main equation to be
implemented:
(23)
Fig. 4 shows the detailed carry-save datapath of the seven
CORDIC stages needed in the architecture of Fig. 3.
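For readers less familiar with carry-save arithmetic, the following sketch shows the basic principle on integers: three operands are compressed into a sum word and a carry word with purely bitwise operations, and a single carry-propagate addition is deferred to the end (the role played by the vector merging adders). The function names and the 16 bit width are illustrative assumptions, not the word lengths of the chip.

```python
def carry_save_add(a, b, c, width=16):
    """3:2 compression of three words into a (sum, carry) pair.

    No carry is propagated between bit positions. Operands are treated
    as `width`-bit two's complement words.
    """
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask
    s = a ^ b ^ c                                     # bitwise sum
    cy = (((a & b) | (a & c) | (b & c)) << 1) & mask  # bitwise majority, weight 2
    return s, cy


def resolve(s, cy, width=16):
    """Final carry-propagate addition, returning a signed integer."""
    mask = (1 << width) - 1
    v = (s + cy) & mask
    return v - (1 << width) if v >> (width - 1) else v
```

For example, `resolve(*carry_save_add(100, -37, 5))` gives 68. The delay of `carry_save_add` is that of one full adder regardless of the word length, which is what makes long chains of sub-rotations fast.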
The , inputs of the circuit of Fig. 4 are in two's complement representation. The first two blocks in Fig. 4 (labeled
wiring) implement the first CORDIC sub-rotation with a fixed
direction ( , see Section III). These blocks are also in
charge of the conversion from two's complement to carry-save
representation and therefore can be realized by simple wiring
and complementations, without additional logic.
The six remaining CORDIC sub-rotations are implemented
by using the elementary stages in Fig. 4. Each elementary stage
implements (23). The word length of the , signals inside the
datapath of Fig. 4 is increased by 2 LSBs (in order to reduce the
overall error introduced by the CORDIC computation) and by 1
MSB (to avoid overflow).
The two final vector merging adders (VMAs) in Fig. 4 convert the result to two's complement representation. Rounding is
also performed in the VMAs to provide the final , signals
with a word length of 13 bits.
Fig. 5 shows the terms to be added to implement (23) at the bit
level. In this figure, is the binary value associated to (
if and for ). Fig. 5 highlights the use
of both the sign-extension prevention of [23] and the overflow
prevention of [19]. Both techniques reduce the circuit
complexity with respect to simpler carry-save approaches [6].
In order to implement the two subtractions of (23), the bits of and are XORed with . Moreover, a two's complement
constant (the bit equal to s in the column of weight LSB) is
also added.
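The conditional subtraction described above relies on the standard two's complement identity -b = NOT(b) + 1. A minimal integer sketch is shown below; the function name, the control bit `subtract` and the 16 bit width are illustrative assumptions.

```python
def add_or_subtract(a, b, subtract, width=16):
    """Return a + b (subtract=0) or a - b (subtract=1) modulo 2**width.

    Every bit of b is XORed with the control bit, and the control bit
    itself is added at the LSB position to complete the negation.
    """
    mask = (1 << width) - 1
    s = 1 if subtract else 0
    b_cond = (b ^ (mask if s else 0)) & mask  # conditional bitwise complement
    return (a + b_cond + s) & mask            # +s completes the two's complement
```

In a carry-save tree the extra LSB constant costs nothing more than one additional term in the column of weight LSB, which is why it can be merged with the rounding constant.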
The rounding constant has been computed in order to minimize
the rounding error. For all elementary stages except the one
marked with a star in Fig. 4, the rounding error is minimized
when if and if . Therefore,
the sum of the two's complement constant and is equal to
LSB. For the elementary stage marked with a star in Fig. 4,
the optimal rounding constant is zero.
B. Elementary Stage Implementation
Fig. 6 shows that the terms of Fig. 5 can be summed with a single row of 4-2 compressors [24]. Besides these blocks, the
Fig. 6. Implementation of the -th order elementary stage by using 4-2 compressors and half-adders. For the elementary stage marked with a star in Fig. 4, and . For the other elementary stages and .
Fig. 7. Implementation of (a) and (b) half-adder circuits.
circuit requires half adders ( and for the compression
of the MSBs) and XOR gates (for conditional complementing).
The circuits are the traditional half adders which compute
. The circuits, instead, compute . These
blocks allow the summation of the sign-extension prevention
constant (see Fig. 5) without requiring additional hardware.
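The arithmetic function of a 4-2 compressor can be checked exhaustively with a small sketch. The structure below is the textbook one, built from two cascaded full adders; it is not the optimized cell of [25]–[28] nor the circuit of Fig. 8, and the signal names are assumptions.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) of three bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Textbook 4-2 compressor: returns (sum, carry, cout).

    Both carry and cout have weight 2; cout does not depend on cin,
    so no carry ripples along a row of compressors.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def check_compressor():
    """Verify x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout) for all inputs."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(x1, x2, x3, x4, cin)
        if x1 + x2 + x3 + x4 + cin != s + 2 * (carry + cout):
            return False
    return True
```

Because the arithmetic identity can be satisfied by more than one assignment of `carry` and `cout`, different Boolean equation sets implement the same compressor, which is the redundancy exploited later in this section.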
The circuit is well known [34]. An effective implementation
in CMOS logic (derived from the 28T full-adder [34]) is
shown in Fig. 7(a). The circuit is described by the following
equations:
(24)
and is implemented as shown in Fig. 7(b).
It is interesting to observe, in Fig. 6, that the use of
sign-extension prevention allows a pair of half
adder circuits to be used in place of a single 4-2 compressor to compute the
most significant bits. The most efficient realizations of the 4-2
compressor [25]–[28] require about 60 MOS transistors, while
the pair of half adder circuits, realized as shown in Fig. 7, requires
only a total of 28 transistors. The sign-extension prevention
technique is, therefore, able to provide a very low circuit
complexity. The number of 4-2 compressors decreases as the
order of the stage increases and, in our approach, this results in a substantial gain in area.
The timing performance of the elementary stage shown in
Fig. 6 is limited by two critical paths.
The first timing-critical circuit, shown in Fig. 8(a), is composed
of a 4-2 compressor with two inputs conditionally complemented.
The best available implementations of the 4-2 compressor
[27], [28] provide a delay of three XOR gates, and include
a total of four XOR gates plus two multiplexers. Therefore,
a straightforward implementation of the circuit of Fig. 8(a)
would exhibit a maximum delay of four XOR gates.
An optimized implementation of this first timing-critical circuit
can be obtained by embedding the two XOR gates driven by
in the 4-2 compressor. This is not straightforward, since (due
to the redundancy of carry-save arithmetic) different sets of Boolean
equations exist which provide the same arithmetic function
of a 4-2 compressor. We have found that an optimal solution can
be obtained starting from the Boolean equation set of the 4-2 compressor introduced by Ghosh et al. [29], and embedding the
XOR gates in the circuit, as shown in the following equations:
(25)
Fig. 8(b) shows the gate-level implementation of (25). Our
circuit exhibits only three XOR
gates on the critical path, a clear advantage in terms of speed with
respect to the implementation of Fig. 8(a). Moreover, the
circuit of Fig. 8(b), requiring a total of five XOR gates plus
two multiplexers, uses one fewer XOR gate than
the implementation of Fig. 8(a) using the state-of-the-art 4-2
compressor of Hsiao et al. [28].
Let us now consider the second timing-critical circuit of
Fig. 9(a), corresponding to the overflow prevention logic, on
the left-hand side of Fig. 6. A straightforward implementation
of the circuit would present four gates on the critical path (by
assuming the delay of a half adder comparable to the delay of
an XOR gate). An optimized implementation, with a delay of
three XOR gates, can be obtained by exploiting the redundancy of carry-save arithmetic. In fact, the two half adders surrounded
Fig. 8. Optimal implementation of the first timing-critical block in Fig. 6. (a) Logical function. (b) Detailed implementation with simple gates.
Fig. 9. Optimal implementation of the second timing-critical block in Fig. 6. (a) Logical function. (b) Detailed implementation with simple gates.
by the dashed line in Fig. 9(a) are described by the following
Boolean equations:
(26)
where . By exploiting the redundancy of the
carry-save arithmetic, we can rewrite the Boolean equations of
this block, preserving its arithmetic function, as follows:
(27)
Proceeding in a similar way for the second column of half
adders in Fig. 9(a), with some manipulations, we obtain for the
whole circuit of Fig. 9(a) the following equivalent equations:
(28)
where ; .
The resulting circuit is shown in Fig. 9(b), where the critical
path from all inputs to all outputs is composed of three gates
(two XOR and one multiplexer, or two XOR and one NAND gate).
The worst delay from to all outputs is two gates (one XOR
and one multiplexer). Since the input arrives with a delay
of one gate [see Fig. 6 and Fig. 7(b)], this path results again in
a total delay of three gates.
VI. MIXED CMOS-DPL IMPLEMENTATION
In order to simplify the IC design, the DDFSM has been implemented by using a standard cell approach, with automatic
placement and routing. To optimize performance, special-purpose
cells were designed to implement the timing-critical circuits of
Fig. 8(b) and Fig. 9(b). These circuits, being composed mainly
of XOR gates and multiplexers, are well suited for a pass-transistor logic implementation. Having high-speed operation as
our main target, we employed the double-pass-transistor logic (DPL)
style [30], as shown in Fig. 10.
DPL is a double-rail logic. In the developed cells, each
input is converted from single to dual rail by using a pair
of inverters. In this way passgate inputs, which are not suited
to the timing models used by the timing analysis tools, are
also avoided. The inverters 1–5 in the circuit of Fig. 10(b) and
the inverters 1–6 in Fig. 10(c) increase the circuit speed by
limiting the maximum number of series transistors. Moreover,
the inverters 1–2 in Fig. 10(b) make the propagation delay of
the Carry output independent of the capacitive load on the
Sum output. A similar consideration applies to the inverters 1–2 and 3–4 in Fig. 10(c). In this way the developed DPL circuits
are fully compatible with the other full-CMOS standard cells
of the library.
It is worth noting that not all gates have to be dual rail.
The gates which drive the outputs, both in Fig. 10(b) and in
Fig. 10(c), can be single rail. Also the XOR gate which drives
the single-rail multiplexer that calculates in Fig. 10(b) can
be implemented in a single-rail style.
The number of transistors, the propagation delay and
the power dissipation obtained by employing the proposed
DPL-CMOS design style are reported in Table II. For comparison,
the same table also reports the performance achievable by using a standard cell library with only full-CMOS logic,
Fig. 10. Transistor-level implementation of the special cells of Fig. 8(b) and Fig. 9(b). (a) Basic gates implementation. (b) DPL implementation of the circuit of Fig. 8(b). (c) DPL implementation of the circuit of Fig. 9(b).
TABLE II
SIMULATED PERFORMANCES OF DIFFERENT CORDIC STAGE CONFIGURATIONS BY EMPLOYING PROPOSED DPL-CMOS AND FULL-CMOS STYLES
TABLE III
EXPERIMENTAL PERFORMANCES OF THE SYNTHESIZER/MIXER
without special cells. All stages have been designed for a
0.25 µm, 2.5 V technology. The analysis of Table II reveals
that the proposed design style allows about a 35% reduction of
the propagation delay with about the same number of transistors and power dissipation as the full-CMOS realization.
VII. EXPERIMENTAL RESULTS
The DDFSM with the optimized carry-save CORDIC architecture
and the mixed CMOS-DPL design style has been fabricated
on a test chip (see Fig. 11) in a 2.5 V, 0.25 µm CMOS technology.
The DDFSM has been synthesized from a VHDL description, and has been automatically placed and routed by using
a commercial tool. The DDFSM accepts a 32 bit frequency control
word, resulting in a frequency resolution of about 0.088 Hz.
Table III summarizes the main characteristics of the circuit.
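The quoted frequency resolution follows directly from the accumulator width and the clock frequency. A short numerical check is sketched below, assuming the 380 MHz clock stated in the abstract; the function names are illustrative.

```python
def frequency_resolution(f_clk_hz, acc_bits=32):
    """Smallest output frequency step of a phase-accumulator synthesizer."""
    return f_clk_hz / (1 << acc_bits)


def output_frequency(fcw, f_clk_hz, acc_bits=32):
    """Output frequency generated by a given frequency control word."""
    return fcw * f_clk_hz / (1 << acc_bits)
```

With `f_clk_hz = 380e6` the step is 380e6 / 2^32, that is about 0.088 Hz, in agreement with the value reported above.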
Fig. 12 reports the experimental digital output spectrum of the
DDFSM when operated in DDS mode ( , ),
showing an SFDR larger than 93 dBc.
Besides the DDFSM, the chip includes a built-in self-test structure
(SA Mixer) and two programmable ring oscillators (RO
Fast and RO Slow) to make the measurement of the DDFSM maximum
clock frequency and power dissipation easier. Also included
in the chip is a DDFS which can provide
inputs to the synthesizer/mixer to test the single and double sideband modulation functionality of the circuit.
TABLE IV
COMPARISON WITH RECENTLY PROPOSED DESIGNS
Fig. 11. Test chip realized in CMOS 0.25 µm technology. The chip includes our optimized synthesizer/mixer (Synth/Mixer), a DDFS, two ring oscillators (RO Fast and RO Slow) and the built-in self-test logic (SA Mixer) for easy circuit testing.
Fig. 12. Output spectrum of the DDFSM in DDS mode ( 0 , ); MHz.
Table III also reports the experimental performance of the
developed DDFSM. The circuit exhibits a very low power
dissipation (0.40 mW/MHz) with a maximum clock frequency
slightly lower than 400 MHz.
The experimental performance of the proposed circuit
is compared in Table IV with the performance of the best
DDFSMs available in the literature that are based on a two-stage multiplier
architecture and implemented in the same 0.25 µm technology.
It can be observed that the developed architecture
allows a more than three-fold reduction of power dissipation,
together with a substantial reduction in silicon area, with respect to [8]. The circuit in [10], while able to reach an SFDR of 100
dBc, requires an area about 2.32 times larger than our
implementation. Table IV shows, moreover, that our circuit is
able to work correctly up to 385 MHz, whereas the best result
previously reported in the literature was 330 MHz.
VIII. CONCLUSION
The paper has described in detail the implementation of
a high-performance synthesizer/mixer IC which exploits
improvements at the algorithmic, architectural and transistor levels.
In the novel synthesizer/mixer architecture, the rotation
operation has been split into three rotations. The first two rotations
use a CORDIC datapath realized entirely in carry-save
arithmetic. This is possible since the directions of the CORDIC
rotations are computed in parallel by using a small lookup table
in the first rotation and a fast multiply-by-constant and addition
circuit in the second rotation. The final (third) rotation is multiplier-based,
in order to reduce the circuit latency and increase
the circuit performance.
In the carry-save CORDIC datapath, the combined use of sign-extension prevention, overflow prevention and a novel rounding scheme
has been presented. At the transistor level, a design style that
jointly uses full-CMOS and DPL to improve the circuit latency
has also been described.
The realized synthesizer/mixer IC shows very good performance
in terms of power dissipation, area and maximum clock
frequency.
REFERENCES
[1] L. K. Tan and H. Samueli, "A 200 MHz direct digital synthesizer/mixer in 0.8 µm CMOS," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 193–200, Mar. 1995.
[2] B. G. Goldberg, Direct Digital Frequency Synthesis Demystified. Eagle Rock, VA: LLH Technology Publishing, 1999.
[3] J. Vankka and K. Halonen, Direct Digital Synthesizers: Theory, Design and Applications. Norwell, MA: Kluwer Academic, 2001.
[4] S. Nahm, K. Han, and W. Sung, "A CORDIC-based digital quadrature mixer: comparison with a ROM-based architecture," in Proc. IEEE ISCAS, 1998, pp. 385–388.
[5] G. Gielis, R. Van de Plassche, and J. Van Valburg, "A 540 MHz 10-b polar to Cartesian converter," IEEE J. Solid-State Circuits, vol. 26, no. 11, pp. 1645–1650, Nov. 1991.
[6] Y. Ahn, S. Nahm, and W. Sung, "VLSI design of a CORDIC-based derotator," in Proc. IEEE ISCAS, 1998, pp. 449–452.
[7] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron. Comput., vol. EC-8, no. 3, pp. 330–334, Sep. 1959.
[8] A. Torosyan, D. Fu, and A. N. Wilson, "A 300 MHz direct digital synthesizer/mixer in 0.25 µm CMOS," IEEE J. Solid-State Circuits, vol. 38, no. 6, pp. 875–887, Jun. 2003.
[9] D. Fu and A. N. Wilson, "A high-speed processor for digital sine/cosine generation and angle rotation," in Proc. 42nd Asilomar Conf. Signal, Systems and Computers, 1998, vol. 1, pp. 177–181.
-
7/27/2019 Dds Cordic
10/10
160 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 42, NO. 1, JANUARY 2007
[10] Y. Song and B. Kim,A quadrature digital synthesizer/mixer architec-ture using fine/coarse coordinate rotation to achieve 14-b input, 15-boutput, and 100-dBc SFDR,IEEE J. Solid-State Circuits, vol. 39, no.11, pp. 18531861, Nov. 2004.
[11] F. Curticapean and J. Niittylahti,An improved digital quadrature fre-quency down-converter architecture,in 35th Asilomar Conf. Signals,Systems and Computers, Nov. 2001, pp. 13181321.
[12] D. De Caro, N. Petra, and A. G. M. Strollo,A 380 MHz, 150 mW
direct digital synthesizer/mixer in 0.25
m CMOS, in IEEE ISSCCDig. Tech. Papers, 2006, pp. 258259.[13] H. T. Nicholas and H. Samueli,An analysis of the output spectrum of
direct digital frequency synthesizers in the presence of phase accumu-lator truncation, in Proc. 41st Annu. Frequency Control Symp., May1987, pp. 495502.
[14] A. Torosyan and A. N. Willson, Jr.,Analysis of the output spectrumfor direct digital frequency synthesizers in the presence of phase trun-cation andfinite arithmetic precision,in Proc. 21th Symp. Image andSignal Processing and Analysis, 2001, pp. 458463.
[15] F. Curticapean and J. Niittylahti,Exact analysis of spurious signals indirectdigital frequency synthesizers due to phase truncation,Electron.
Lett., vol. 39, no. 6, pp. 499 501, Mar. 2003.[16] J. Vankka,Methods of mappingfrom phase to sine amplitude in direct
digital synthesis,IEEE Trans. Ultrason. Ferroelectr. Freq. Contr., vol.44, no. 2, pp. 526534, Mar. 1997.
[17] H. M. Ahmed,Signal processing algorithms and architectures,Ph.D.
dissertation, Dept. Electr. Eng., Stanford Univ., Stanford, CA, Dec.1981.
[18] , Efficient elementary function generation with multipliers, inProc. 19th Symp. Computer Arithmetic, Sep. 1989, pp. 5259.
[19] T. G. Noll, Carry-save arithmetic for high speed digital signal pro-cessing,in Proc. IEEE ISCAS, 1990, pp. 982986.
[20] N. Takagi, T. Asada, and S. Yajima, Redundant CORDIC methodswith a costant scale factor for sine and cosine computation, IEEETrans. Comput., vol. 40, no. 9, pp. 989995, Sep. 1991.
[21] T. B. Juang, S. F. Hsiao, and M. Y. Tsai, Para-CORDIC: parallelCORDIC rotation algorithm, IEEE Trans. Circuits Syst. I: Reg. Pa-
pers, vol. 51, no. 8, pp. 15151524, Aug. 2004.[22] S. Wang, V. Piuri, and E. E. Swartzlander,Hybrid CORDIC algo-
rithms, IEEE Trans. Comput., vol. 46, no. 11, pp. 12021207, Nov.1997.
[23] J. F. Ardekani,M 2 N booth encoded multiplier generator using op-timized wallace trees, IEEE Trans. Very Large Scale Integr. (VLSI)Syst., vol. 1, no. 2, pp. 120125, Jun. 1993.
[24] M. Nagamatsu, S. Tanaka, J. Mori, K. Hirano, T. Noguchi, and K.Hatanaka, A 15-ns 32 2 32-b CMOS multiplier with an improvedparallel structure, IEEE J. Solid-State Circuits, vol. 25, no. 2, pp.494497, Apr. 1990.
[25] J. Mori, M. Nagamatsu, M. Hirano,S. Tanaka,M. Noda, Y. Toyoshima,K. Hashimoto, H. Hayashida, and K. Maeguchi, A 10-ns 54 2 54-bparallel structured full array multiplier with 0.5-
m CMOS tech-
nology,IEEE J. Solid-State Circuits, vol. 26, no. 4, pp. 600606, Apr.1991.
[26] G. Goto, T. Sato, M. Nakajima, and T. Sukemura,A 5 42 54-b regu-larly structured tree multiplier, IEEE J. Solid-State Circuits, vol. 27,no. 9, pp. 12291236, Sep. 1992.
[27] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K.Sasaki, and Y. Nakagome,A 4.4-ns CMOS 54 2 54-b multiplier usingpass-transistor multiplexer, IEEE J. Solid-State Circuits, vol. 30, no.
3, pp. 251257, Mar. 1995.[28] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, Design of high speed low-
power 3-2 counter and 4-2 compressor for fast multipliers,Electron.Lett., vol. 34, no. 4, pp. 341 342, Feb. 1998.
[29] D. Ghosh, S. K. Nandy, and K. Parthasarathy,TWTXBB: a low la-tency, high throughput multiplier architecture using a new 4-2 com-pressor,in Proc. 7th Int. Conf. VLSI Design, Jan. 1994, pp. 7782.
[30] M. Suzuki, N. Ohkubo, T. Shinbo, T. Yamanaka, A. Shimizu, K.Sasaki, and Y. Nakagome, A 1.5-ns 32-b CMOS ALU in doublepass-transistor logic,IEEE J. Solid-State Circuits, vol. 28, no. 11, pp.11451151, Nov. 1993.
[31] B. Parhami, Computer Arithmetic: Algorithms and Hardware De-signs. Oxford, U.K.: Oxford Univ. Press, 1999.
[32] Y. H. Hu,The quantization effects of the CORDIC algorithm,IEEETrans. Signal Process., vol. 40, no. 4, pp. 834 844, Apr. 1992.
[33] S. Y. Park and N. I. Cho,Fixed-point error analysis of CORDIC pro-cessor based on the variance propagation formula, IEEE Trans. Cir-cuits Syst. I: Reg. Papers, vol. 51, no. 3, pp. 573584, Mar. 2004.
[34] N. H. E. Weste and K. Eshragian, Principles of CMOS VLSI Design .
Reading, MA: Addison-Wesley, Jan. 1993, 0201533766.[35] J. Proakis, Digital Communications, 4th ed. New York: McGraw-Hill, Aug. 2000.
Davide De Caro (M'05) was born in Naples, Italy, on February 9, 1973. He received the M.S. degree in electronic engineering with honors in 1999, and the Ph.D. degree in electronic engineering and computer science in 2003, both from the University of Naples Federico II, Italy.
He has worked in the area of digital integrated VLSI circuit design for the last eight years. Since March 2003, he has been a Researcher at the Department of Electronics and Telecommunication Engineering of the University of Naples, Italy, where he is working on high-performance flip-flops (including both low-power and high-speed structures), VLSI implementation of arithmetic circuits (squarers, fixed-width multipliers, Reed-Solomon decoders, Galois-field multipliers), direct digital frequency synthesizers and digital mixers.
Dr. De Caro is author or coauthor of more than 30 technical papers in international journals and refereed international conferences. He acted as a reviewer for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I and IEEE TRANSACTIONS ON VLSI SYSTEMS.
Nicola Petra (S'05) was born in 1974 in Naples, Italy. He received the M.S. degree in electronic engineering with honors in 2002 from the University of Naples Federico II. He is presently working towards the Ph.D. degree at the Department of Electronics Engineering of the University of Naples Federico II.
His research interests include design of digital VLSI circuits for telecommunications and high-performance arithmetic circuits.
Antonio Giuseppe Maria Strollo (M'05-SM'06) was born in 1963. He received the Laurea degree (with honors) in electronic engineering in 1988, and the Ph.D. degree in electronic engineering and computer science in 1992, both from the University of Naples Federico II, Italy.
From 1990 to 1998, he was a full-time Researcher at the Department of Electronic Engineering of the University of Naples. In November 1998, he was appointed Associate Professor at the University of Naples Federico II. Since November 2002, he has been Full Professor at the same University. Currently, he is the head of the Department of Electronic and Telecommunication Engineering of the University of Naples Federico II. His initial research activities covered the area of bipolar device modelling and power electronics. His current research interests include design and analysis of VLSI digital circuits. In particular, he is working on advanced architectures for direct digital frequency synthesis and for digital mixers, high-performance arithmetic circuits, and high-performance and low-power flip-flops. He has authored or coauthored more than 100 papers in international journals and refereed conferences.