Implementation platforms · communication and embedded electronics. The most suited platforms to...


Chapter 11

Implementation platforms

11.1. Introduction

The SWR is a concept from the work of Joe Mitola in 1995 [MIT 95], as explained in Chapter 6. He proposed a concept called Software Radio, which, ideally, allows the equipment to communicate with any radio standard without changing any physical component but only by changing the firmware (embedded software). This technology, though it may appear simple at first, raises several technological difficulties, especially in the contexts of increasing mobility and high throughput. In addition, a very flexible hardware architecture is needed to handle the various types of processing to be performed. Reconfigurability of the implementation platform is a crucial technological breakthrough for the SWR. The platform must adapt to different processing needs in order to handle the requests for changes in application contexts (here, the contexts related to radio applications).

11.2. Software radio platform

The SWR and CR are two very promising technologies for developing future mobile radio systems. They can be designed to work together on one device (a terminal or a base station). Thus, the equipment can analyze its environment and adapt its operation, for example depending on the different wireless networks available. It would be possible for a user to operate his/her terminal anywhere, anytime, and with any standard just by downloading a reconfiguration file. There is also another advantage: through a simple update, the SWR may accept communication modes that had not yet been invented at the time the equipment was manufactured. Hence, although the functionality of the equipment varies, the executing electronics remain as fixed at the time of manufacture.

Chapter written by Amor NAFKHA, Pierre LERAY and Christophe MOY.

302 SoftWare Radio to Cognitive Radio

The constraints on SWR architectures are numerous; they still require substantial research effort and are one of the main concerns when targeting a hardware platform. Indeed, such an architecture should be reconfigurable and must offer flexible resources that satisfy different constraints from the fields of radio communication and embedded electronics. The platforms best suited to this type of processing are heterogeneous, as they consist of several components with different computing capabilities, each with its particular benefits. We propose to establish a non-exhaustive state of the art of the major hardware architectures developed for the SWR.

11.3. Hardware architectures

Commonly used architectures are: general-purpose or specialized processors, coarse-grain or fine-grain reconfigurable architectures, and dedicated circuits. These hardware architectures have different characteristics in terms of flexibility, performance and energy efficiency [HAR 01]. Architectures like CPUs, based on the Von Neumann or Harvard model, tend to carry out processing in a sequential fashion, even those that incorporate a large number of functional units and long pipelines to extract the best instruction-level parallelism. In contrast, dedicated circuits have a spatial execution model that allocates processing to a set of functional units specialized for that particular type of processing. In the case of processors, each machine cycle specifies which functional units are active, what types of operations must be conducted and how data flows from the storage space to the functional units. In the case of dedicated circuits, the architecture is fixed: the data is processed according to a predetermined pattern and passes through a particular network of operators in an ultra-fast manner. Between these two extremes lie the reconfigurable architectures, which offer a good trade-off between flexibility and performance. Figure 11.1 shows the performance/flexibility trade-off for the major computing technologies available today. Besides their good trade-off between power and flexibility, reconfigurable architectures also allow more freedom, easier development, and faster prototyping than the design of a dedicated circuit. Development times are closer to those of programming for processors. In the following sections, we briefly introduce each type of architecture.

11.3.1. Dedicated circuits

Figure 11.1. Flexibility/Performance comparison for principal technologies

Dedicated circuits, called Application-Specific Integrated Circuits (ASICs), are integrated circuits designed specifically for a particular purpose or application. They are specialized circuits that perform a given function. Unlike processors, there is no code to execute in this case. The algorithm is physically wired in the form of an assembly of logic gates. ASICs are generally used for the rapid implementation of intensive calculations. For each application, a different circuit is created, either by building it entirely or by configuring a grid of pre-constructed components. The advantage of ASICs is obviously their speed, since the logic connections are physically created rather than being programmed. The drawback is a very high development and, especially, production cost: using ASICs only becomes profitable when they are produced in large quantities. In general, the use of an ASIC leads to several benefits, primarily related to the reduction in system size:

– A reduction in the number of components on the PCB. Power consumption and board congestion are significantly reduced;

– The concept of ASIC, by definition, ensures the maximum optimization of the designed circuit. Thus, we really have an integrated circuit that corresponds to our own needs;

– Customizing the system gives confidentiality to the design and, as a result, industrial protection;

– Increased circuit complexity, speed of operation and reliability.

ASICs are dedicated to a specific task and typically handle large amounts of data. Depending on the desired degree of customization, it is possible to choose between several techniques for developing an ASIC: Full Custom, Standard Cell and Gate Array.

11.3.2. Processors

Processors are sequential computers usually based on an architecture model proposed by John Von Neumann in 1945 [NEU 45]. The principle of such an architecture is illustrated in Figure 11.2. Following this block diagram, we can model all the components of a computer or a system on chip (SoC). The operation of processors is based on the sequential execution of instructions by the control unit, which dispatches the instruction processing to an arithmetic logic unit (ALU). The ALU then performs the requested processing on data that is read from memory.
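
As a rough illustration of the sequential principle described above, the following sketch simulates a toy machine in which instructions and data share one memory. The four opcodes (LOAD, ADD, STORE, HALT), the single accumulator and the memory layout are assumptions made for this example only; they do not correspond to any real instruction set.

```python
# Toy model of sequential execution: one memory holds both program and data.
# The opcodes and the memory layout are invented for this sketch.

def run(memory):
    """Fetch, decode and execute instructions one at a time until HALT."""
    pc = 0    # program counter kept by the control unit
    acc = 0   # accumulator register feeding the ALU
    while True:
        opcode, operand = memory[pc]      # fetch the next instruction
        pc += 1
        if opcode == "LOAD":              # decode, then execute
            acc = memory[operand]
        elif opcode == "ADD":
            acc = acc + memory[operand]   # ALU works on data read from memory
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


program = [
    ("LOAD", 4),   # address 0: acc <- memory[4]
    ("ADD", 5),    # address 1: acc <- acc + memory[5]
    ("STORE", 6),  # address 2: memory[6] <- acc
    ("HALT", 0),   # address 3: stop
    2,             # address 4: data
    3,             # address 5: data
    0,             # address 6: result
]

result = run(list(program))
print(result[6])  # prints 5
```

Each loop iteration handles exactly one instruction, which is the sequential behaviour that the architectural variants discussed in this section try to speed up.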


Figure 11.2. Internal architecture of a processor

The Harvard architecture contrasts with that of Von Neumann as it uses two separate structures to store the program and the data [HEA 95]. With two separate buses, the Harvard architecture allows simultaneous transfer of the data and of the instructions to be executed. Thus, the processor can access an instruction and its associated data simultaneously. This model is faster than that of Von Neumann. However, the performance gain is achieved at the expense of increased internal complexity of the system. Although still based on the Von Neumann or Harvard principle, processors have undergone several architectural changes to improve their computing capacity.

In the following sections, we present different architectures that exist in current processors. This list is non-exhaustive; the purpose is only to present the main architectural features and their resulting advantages and disadvantages. Note also that these characteristics are not all mutually exclusive and some can be combined to obtain new architectures.

11.3.2.1. CISC architecture

Complex Instruction Set Computer (CISC) architecture [EDE 90] was used in the early microprocessors and continues to be used. Processors based on the CISC architecture can handle complex instructions that are directly wired into their electronic circuits; that is, some instructions that are difficult to build from the basic instructions are implemented directly in silicon to gain execution speed for those commands. These architectures offer a comprehensive instruction set, but the execution of each instruction may require a large number of clock cycles. CISC architectures were developed until the late 1980s but now tend to disappear from new generations of processors. Well-known CISC processors include the Intel 8051 and its competitor, the Motorola 6805.

11.3.2.2. RISC architecture

Reduced Instruction Set Computer (RISC) architecture [MIR 92] has a reduced instruction set where each instruction performs a single elementary operation. The instruction set of a RISC architecture is more uniform than that of a CISC. All instructions are encoded with the same size and run in a single clock cycle. This simplicity and uniformity made it possible to “pipeline” the decoding and execution stages, which ultimately reduces the execution time.
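
The gain from pipelining can be made concrete with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes one clock cycle and no stall ever occurs (no hazards, no cache misses); the function names and the example figures are chosen for illustration and are not taken from the chapter.

```python
# Idealized cycle counts, assuming one cycle per stage and no stalls.

def cycles_unpipelined(n_instructions, n_stages):
    """Each instruction goes through all stages before the next one starts."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions, n_stages):
    """The first instruction fills the pipeline, then one completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


# Example: 100 instructions on a two-stage (decode, execute) pipeline.
print(cycles_unpipelined(100, 2))  # prints 200
print(cycles_pipelined(100, 2))    # prints 101
```

For long instruction sequences the pipelined count approaches one cycle per instruction, whatever the number of stages; real processors fall short of this bound because of the stalls ignored here.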


RISC architectures are often coupled with the Harvard architecture, and their effective use depends largely on the ability of their compilers to handle their internal registers efficiently. The performance of modern RISC processors is strongly dependent on the memory hierarchy. The RISC architecture is implemented in current processors and has gradually replaced the CISC architecture. A drawback of RISC is the bottleneck posed by data transfers with the memory, since they have to be carried out by two separate and specialized instructions. The most widely used RISC processors are the SPARC, the PowerPC, the MIPS and the ARM.

11.3.2.3. Superscalar architecture

The term superscalar corresponds more to a characteristic than to an architecture itself. This characteristic can be applied to both CISC and RISC architectures. In superscalar architectures [SMI 95], several instructions can be decoded and launched every cycle. Such hardware dynamically detects which instructions can be run simultaneously. Recent RISC architectures can launch more than two instructions per clock cycle, which results in less than one cycle per instruction. The bottleneck for data transfers with the memory is even more critical in this type of architecture. To reduce this problem, the register bank contains more registers than in a conventional architecture. The use of cache memories for instructions and data also reduces the cost of data transfers with the memory.

There are different models of superscalar execution, which can be classified according to how the allocation and reordering of instructions are carried out. Instruction allocation is the technique that assigns an instruction to a computing unit, and instruction reordering is the technique that executes an instruction before or after another.

These two techniques can be applied either statically at compiler level (which provides code whose consecutive instructions are more independent), or dynamically at processor level. Among the superscalar processors with dynamic allocation, we find modern processors such as the Pentium IV, the Alpha 21164, the Power, etc. The advantage of dynamic allocation is that the executable code of such a processor becomes independent of the number of functional units contained in the hardware.

11.3.2.4. VLIW architecture

Very Long Instruction Word (VLIW) architecture uses an instruction word of several hundred bits that contains the code for several instructions, which are decoded and then executed simultaneously. Unlike dynamic superscalar architectures, the detection of instruction-level parallelism is done statically at compile time. VLIW processors represent the class of superscalar processors that have static allocation.

The advantage of these architectures comes from the fact that a large number of functional units can be used in parallel without increasing the complexity of the control part. They have, however, two major drawbacks:


– They are very dependent on the performance of the compiler and on branching strategies;

– The compiler is closely coupled with the architecture. This means that a compiler is not compatible with all the hardware implementations of a single VLIW architecture, since it must take into account the peculiarities of the machine on which it works.
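
The static detection of parallelism described above can be sketched as a small scheduler that packs independent instructions into fixed-width bundles, as a VLIW compiler might do. This is a deliberately simplified greedy approach, not the method of any particular compiler: it assumes each register is written only once (so only read-after-write dependences exist) and that every functional unit can execute every instruction. The instruction format `(destination, sources)` is hypothetical.

```python
# Simplified static scheduling for a VLIW-like machine.
# Assumptions: each register is written once; any unit runs any instruction.

def schedule(instructions, width):
    """Pack instructions into bundles of at most `width` operations.

    instructions: list of (destination_register, [source_registers]).
    Returns a list of bundles, each a list of instruction indices.
    """
    bundles = []
    ready_at = {}  # register -> first bundle in which its value can be read
    for index, (dest, sources) in enumerate(instructions):
        # An instruction cannot issue before all of its operands are produced.
        slot = max((ready_at.get(src, 0) for src in sources), default=0)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(index)
        ready_at[dest] = slot + 1  # the result is visible from the next bundle
    return bundles


code = [
    ("a", []),          # 0: a = constant
    ("b", []),          # 1: b = constant
    ("c", ["a", "b"]),  # 2: c = a op b
    ("d", []),          # 3: d = constant
    ("e", ["c", "d"]),  # 4: e = c op d
]

bundles = schedule(code, width=2)
print(bundles)  # prints [[0, 1], [2, 3], [4]]
```

Note that the schedule depends on `width`: code bundled for a two-unit machine is not the best schedule for a four-unit one, which illustrates the coupling between compiler and hardware mentioned in the second drawback.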

Recently, various microprocessors have been built using VLIW concepts, such as the Philips Trimedia, the Transmeta Crusoe (128 bit) and the Itanium. Nowadays, it is impossible to consider a real-time signal processing application without resorting to a parallel machine or a dedicated hardware solution. VLIW architectures are booming for both Digital Signal Processor (DSP) and RISC types of processors. DSPs are processors specialized for computations related to signal processing. For example, it is not uncommon to see Fourier transforms implemented on a DSP. Manufacturers such as TI (Texas Instruments), Analog Devices, Motorola and ST (STMicroelectronics) offer their own DSP structures. Among the different types of DSPs from TI, we can distinguish fixed-point DSPs, floating-point DSPs, control-oriented DSPs (the TI C2x family), low-power-oriented DSPs (the TI C54, C55), high-performance-oriented DSPs (the TI C6x family), etc.

11.3.2.5. Vector architecture

Vector architectures can apply identical operations to all the elements of a vector. These architectures exploit data-level parallelism to the maximum. Vector processors achieve high computational power with instructions that operate on sets of numbers, called vectors, instead of isolated numbers, called scalars. Initially they were uniprocessor machines specially adapted to vector and matrix calculus, and they did not exceed a few GFLOPS. Their performance has since been largely surpassed by architectures consisting of multiple vector processors or microprocessors put in parallel. Currently, vector processors have become rarer and are used as multiprocessor cores.

11.3.3. Reconfigurable architecture

Reconfigurable architectures are differentiated by two distinct criteria: granularity and the reconfiguration method (static/dynamic and partial/global). The granularity of an architecture defines the size of the data processed by its computing resources. Architectures whose computing resources operate on single bits of data, like the configurable logic blocks of section 11.3.3.1, are characterized as fine-grain, while architectures with wider datapaths (for example 32 or 64 bits) are characterized as coarse-grain. Important work on this classification has been carried out in recent years [BOS 04, HAR 01, SCH 01]. In 2001, R. Hartenstein [HAR 01] gave a retrospective of a decade of projects in reconfigurable architectures. This study relates more particularly to fine-grain rather than coarse-grain reconfigurable architectures. J.P. David [DAV 02] analyzed, in his thesis, different reconfigurable architectures and classified them into three broad categories: coarse-grain reconfigurable architectures without processor, coarse-grain architectures with processor, and fine-grain reconfigurable architectures. Coarse-grain architectures are considered easier to reconfigure. In addition, for the same calculation, fine-grain architectures require more configuration bits than coarse-grain architectures, which extends the reconfiguration time. However, fine-grain architectures give more flexibility and a higher degree of low-level reconfigurability.

11.3.3.1. Fine-grain architecture

Components based on fine-grain logic are mostly known as FPGAs. Figure 11.3 shows the internal structure of these circuits. This structure allows the FPGA to emulate any circuit, provided that it is not so big as to exhaust the logic and routing resources available in the FPGA. However, this flexibility in programmability comes at the expense of a reduction in circuit performance. Although FPGAs are designed with the same technology as general-purpose processors running at several gigahertz, the clock frequency of FPGAs does not exceed a few hundred megahertz for newer models. Nevertheless, the great architectural freedom makes it possible to exploit the fine-grain parallelism of the implemented circuits, which largely compensates for the relatively low operating frequencies. FPGA cells are independent of each other, so they can perform their calculations in parallel. The FPGA allows fine application-level optimizations by varying this parallelism. Current FPGAs integrate tens of millions of gates. Originally designed to check the behavior of an application before its ASIC integration, the role of FPGAs has changed considerably since. FPGAs are directly integrated into final products to perform the role of glue logic. They can also be used in small-volume products whose development requires the ability to modify the contents of the circuit.

Figure 11.3. Example of the internal architecture of an FPGA

An FPGA is characterized by its routing network, whose topology is representative of the type of architecture deployed. Thus, the architecture of such “computing islands” includes various functional elements arranged in the form of a matrix (see Figure 11.3). The configurable elements include I/Os, logic modules (LUT, multiplexer, etc.), single or dual port memories and more complex arithmetic elements such as wired multipliers/accumulators and adders/subtractors. Programmable blocks provide the connection between configurable elements while the matrix connects these blocks together. Modern FPGAs are large enough and contain enough memory to be configured to accommodate a processor core and execute a software program. Such a processor core is called a softcore, as opposed to microprocessors wired in silicon (called hardcores). Today, FPGA vendors even include one or more hardcore processors on a single component in order to save the resources of the configurable logic (see Figure 11.4) for other uses. This does not preclude the use of softcore processors, which have many advantages, including that of a set of processing elements tailored to the intended application.

Figure 11.4. Architecture of Xilinx Virtex-6 family [XIL 10]
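
To make the role of the LUT logic modules mentioned above more tangible, the following sketch models a look-up table as a small memory of configuration bits addressed by its inputs: changing the bits changes the Boolean function without changing the structure. The class and helper names are hypothetical, and the model ignores routing, timing and everything else a real FPGA logic cell contains.

```python
# Minimal model of a look-up table (LUT): the "hardware" is fixed,
# the function is defined entirely by the configuration bits.

class LUT:
    def __init__(self, n_inputs, config_bits):
        if len(config_bits) != 2 ** n_inputs:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.n_inputs = n_inputs
        self.config_bits = list(config_bits)

    def __call__(self, *inputs):
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        address = 0
        for bit in inputs:                      # first input is the most significant bit
            address = (address << 1) | (1 if bit else 0)
        return self.config_bits[address]        # output is just a memory read


def configure(n_inputs, function):
    """Build the configuration bits by enumerating the truth table of `function`."""
    bits = []
    for address in range(2 ** n_inputs):
        inputs = [(address >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        bits.append(function(*inputs) & 1)
    return LUT(n_inputs, bits)


# The same 3-input structure configured as two different functions.
xor3 = configure(3, lambda a, b, c: a ^ b ^ c)
majority = configure(3, lambda a, b, c: (a & b) | (a & c) | (b & c))

print(xor3(1, 0, 1))      # prints 0
print(majority(1, 0, 1))  # prints 1
```

A 3-input LUT needs only 8 configuration bits, but a complete design uses many such cells plus routing configuration, which is why fine-grain configurations are comparatively large.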

To take full advantage of the inherent flexibility of FPGAs that incorporate SRAM technology, we can exploit the SRAM to store the different configurations. Indeed, nothing prevents the contents of the SRAM from being modified between powering-on and powering-off of the FPGA, so that its function may change during operation. Given the size of FPGAs, the reconfiguration time of the entire component can be prohibitive for highly constrained real-time applications. The Xilinx Virtex FPGA can dynamically reconfigure a portion of the configuration memory (partial reconfiguration) while maintaining, during the reconfiguration phase, the full functionality of the parts of the configuration memory that are not affected by the reconfiguration. We can also consider local dynamic reconfiguration for executing functions, without interrupting the complete application. The contributions of partial reconfiguration are:

– During the reconfiguration of a function, other functions remain active and can continue their tasks. However, one must pay close attention to the interaction between the region being reconfigured and the other active regions;

– The size of a partial reconfiguration file is reduced in proportion to the size of the area to be reconfigured. Thus, the reconfiguration time can be considerably reduced in reconfiguration scenarios where only a small subset of functions must be modified [DEL 09, MOY 08b];

– This mode of reconfiguration uses the transfer paths of the configuration data and specific physical structures embedded in the component. Thus, the use of integrated resources for partial reconfiguration allows a simplification of the overall processing architecture.
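
The proportionality between reconfigured area, file size and reconfiguration time stated above can be illustrated with a back-of-the-envelope estimate. All figures below (bitstream size, configuration port width and clock) are hypothetical placeholders, not data for any Virtex device, and the model assumes that the configuration port is the only bottleneck.

```python
# First-order estimate of reconfiguration time.
# Assumption: time = bits to load / configuration port throughput.

def partial_bitstream_bits(full_bitstream_bits, area_fraction):
    """File size assumed proportional to the fraction of the area reconfigured."""
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")
    return full_bitstream_bits * area_fraction


def reconfiguration_time_s(bitstream_bits, port_width_bits, port_clock_hz):
    """Time needed to push the bitstream through the configuration port."""
    return bitstream_bits / (port_width_bits * port_clock_hz)


# Hypothetical device: 64 Mbit full bitstream, 32-bit port clocked at 100 MHz.
FULL_BITS = 64_000_000
PORT_WIDTH = 32
PORT_CLOCK = 100e6

full_time = reconfiguration_time_s(FULL_BITS, PORT_WIDTH, PORT_CLOCK)
partial_time = reconfiguration_time_s(
    partial_bitstream_bits(FULL_BITS, 0.05), PORT_WIDTH, PORT_CLOCK
)

print(f"full: {full_time * 1e3:.1f} ms, partial (5% of area): {partial_time * 1e3:.1f} ms")
```

Under these assumptions, reconfiguring 5% of the area takes about 5% of the full reconfiguration time; in practice, fixed overheads and the way the bitstream is fetched from storage also matter.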

11.3.3.2. Coarse-grain architecture

The computing elements of a coarse-grain architecture are based on a Configurable Functional Block (CFB) or Reconfigurable Cell (RC), which consists of at least an ALU and one or more configuration registers. Several developments have been made on this model, such as adding a shift register to handle floating-point operations. For radio communication applications, we can cite the following architectures:

– SystolicRing [SAS 01] is an architecture developed at LIRMM Montpellier, made of a number of clusters interconnected by a ring-shaped configurable network. The architecture is shown in Figure 11.5. Each cluster includes a number of identical processor cores called “Dnodes”, each with a memory that can hold up to eight instructions. Each processor includes two operators (an ALU and a multiplier). Each cluster consists of two Dnodes. An instruction for each Dnode of a cluster can be loaded in one clock cycle, at an estimated execution frequency of 200 MHz. There are two configuration modes, described as static and dynamic. In the static mode, the instruction register of each Dnode remains fixed, and the whole architecture is seen as a static datapath after configuration. In the dynamic mode, the instruction memories of each Dnode are loaded during the configuration phase and the instructions contained in each of these memories are then executed sequentially;

– MorphoSys architecture [PAR 02] was developed at the University of California; this architecture is at the forefront of research in the field of coarse-grain reconfigurable systems. A RISC processor, a reconfigurable cell array and a sophisticated memory interface are combined on a single chip. The architecture is shown in Figure 11.6. The reconfigurable cell is the largest processing element in the architecture. Each cell is composed of four basic elements: an ALU for processing, a data memory, an I/O module for interconnection with other reconfigurable units and, finally, a block of fine-grain reconfigurable logic components. The processor controls the basic operations of the matrix through special instructions added to its instruction set. The memory is used to store the different possible configurations of the architecture;

– Pleiades architecture [ABN 01] was developed at Berkeley and has been designed to meet the constraint of low power consumption. The Pleiades architecture is composed of a host processor, multiple execution units of different levels of granularity, a memory and an interconnection network connecting the components of the architecture. Depending on the application domain, a new instance of the architecture can be created by adjusting the type or the number of execution units. The partitioning method associated with this architecture is purely manual. The implementation starts with a pure software description running on the host processor. If timing and power constraints are not met, the tasks are migrated manually to the execution units;

– DART architecture [PIL 08], developed at the National Institute for Research in Computer Science and Control (INRIA), is a dynamically reconfigurable structure. The highest level of the hierarchy in DART is shown in Figure 11.7. The adaptation of the platform is conducted in a partial, online and dynamic manner. This architecture consists of a task controller, storage resources, and computing resources called clusters. The task controller assigns the different computations to be executed to the different clusters.

Figure 11.5. Architecture of SystolicRing

Figure 11.6. The MorphoSys architecture


Figure 11.7. System-level view of DART

11.4. Characterization of the implementation platform

In this section, we present the main physical characteristics necessary for the implementation of SWR/CR equipment.

11.4.1. Flexibility/reconfiguration

From a conceptual standpoint, any system that is capable of changing its processing, or the manner in which the processing is carried out, upon an external order or an internal decision can be referred to as a reconfigurable system. Reconfiguration, therefore, mainly targets structural changes in the platform, or the allocation of resources. The implementation of an application, which can generally be broken down into control and scheduling functions, processing and storage functions, and communication functions, can follow three main approaches:

– The software implementation, which translates the required features into executable programs running on processors that can be more or less specialized;

– The hardware implementation, which consists of designing dedicated hardware architectures, given different names: hardware accelerator, coprocessor, etc.;

– The hardware/software implementation, which consists of merging the two previous approaches.

The first approach provides great ease for application-level reconfiguration by editing and loading new executable code, but suffers from limited performance and computing power due to sequential processing. An improvement is to have multiprocessor platforms sharing the tasks. The second approach is used to accelerate processing and reaches calculation speeds that are hardly possible with pure software solutions. It consists of building a very specific processing architecture, using strong task-level parallelism by duplicating computational resources and adopting an operation-level pipeline in data-flow processing.


Dynamic reconfiguration applies when data and configurations are distributed simultaneously. Thus, the phases of reconfiguration and execution are concurrent (e.g. systems like processors). Static reconfiguration characterizes systems in which the configurations and the data are distributed via different routes (e.g. programmable circuits like FPGAs).

To reconfigure as well as to implement different types of processing, it is necessary either to design configurable functions or to modify the physical structure that implements these functions. Although parameter-based reconfiguration seems much easier to implement, the technology of programmable components also offers the ability to change the embedded hardware resources. In all cases, the resources dedicated to managing the reconfiguration must be foreseen in the overall architecture of the platform.

In an effort to optimize complex applications, hardware and software approaches are often combined and lead to heterogeneous platforms involving processors and dedicated hardware blocks, sometimes collocated in a single component: such an approach is called a system on chip (SoC).

The term reconfigurability is used to denote the capacity of the circuits to switch processing (e.g. changing the physical layer processing to accommodate a change in radio access network). This term is strongly linked to the notion of flexibility introduced by the hardware or software part. J.P. Delahaye [DEL 07a] distinguishes three types of systems that provide flexibility in software or hardware:

– Systems with fixed hardware architecture: these systems are neither modifiableat design time nor at execution time. Nevertheless, they can provide software flexibil-ity (programmability in the form of one or more processors). Flexibility is only intro-duced by the software developed by the user that can be changed at runtime. Thesesystems can easily use the software layers such as the architecture called SoftwareCommunication Architecture (SCA) described in paragraph 11.6.1;

– Flexible systems at design time: these systems are flexible in both software and hardware at design time. They consist of standard hardware modules or dedicated modules defined at design time. For example, the University of Twente in the Netherlands proposes a heterogeneous SoC architecture consisting of a generic ARM processor core, an FPGA-based accelerator and a matrix of computing elements called a Field Programmable Function Array (FPFA). Each computing element, called a tile, receives its own instruction flow, which allows parallel computations. Each tile consists of a configurable ALU, a program memory, a local memory, a control unit and a communication unit;

– Flexible systems at runtime: the flexibility of these systems can be obtained via software or hardware. Two types of architectures can be distinguished:

- Architectures defined by parameterization (introduced by Jondral et al. [JON 02]): the objective here is to highlight the similarities between different standards. In the literature, Harada et al. [HAR 05] propose an implementation of a multi-standard, multimode receiver that is controlled by parameters. Parameterization techniques, from the point of view of design methodology, are presented in Chapter 10.

- Dynamically reconfigurable architecture: such an architecture is required for SWR implementation. It can handle very different applications in terms of grain, computing patterns and time constraints. It offers high performance, is dynamically reconfigurable and consumes minimal energy so that it can be embedded in a system. Compton [COM 02, COM 03] reviewed the principles and main features of reconfigurable architectures. These architectures have been classified by L. Bossuet [BOS 04] in the category of multigrain architectures, as processors rather than coprocessors. According to Cummings [CUM 99], the performance/flexibility compromise provided by FPGAs makes them a better candidate for SWR applications than ASICs and DSPs. The FPGA manufacturer Xilinx is particularly involved in the domain of SWR, for example through its participation in the SDR Forum.

11.4.2. Performance

The performance of an architecture is related to the number of basic computing operations it can perform and the data rate it can handle. Generally, improving the performance of an architecture comes with increased energy consumption. Performance is probably the most important criterion for a given architecture, but for embedded equipment it must be weighed against power consumption.

11.4.3. Power consumption

Power consumption is divided into two distinct parts: static power and dynamic power [PIG 04]. Static power consumption occurs mainly in the standby mode of the circuit and is due to transistor leakage currents. Ideally, it should be zero, since in the standby state of a CMOS circuit there is no path between the supply and ground. Dynamic consumption occurs at each transition of a logical node in a CMOS circuit. It is divided into two components: the power consumption due to short-circuit currents and that due to load currents. Dynamic power is usually by far the dominant fraction of the total power dissipated by a circuit (about 90% [TIW 98]). The transition to very fine geometries (< 35 nm), however, tends to reverse this trend as the static share grows [KIM 03].

The criteria just mentioned are not independent of one another, and no SWR platform will be optimal on each of them. The design of an SWR/CR is therefore primarily a matter of compromise at all design levels. The higher the abstraction level at which these compromises are made, the greater the resulting gains.
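
As a rough illustration of the static and dynamic contributions described above, the following Python sketch evaluates a first-order CMOS power model. The formulas (leakage power as supply voltage times leakage current, load power as activity × capacitance × voltage² × frequency) are the usual textbook approximation and are not taken from [PIG 04]. Modeling the short-circuit component as a fixed fraction of the load power is a simplifying assumption, and every name and numeric value below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CmosOperatingPoint:
    """Hypothetical first-order description of a CMOS circuit (illustrative only)."""
    vdd: float                   # supply voltage, in volts
    leakage_current: float       # total leakage current, in amperes
    switched_capacitance: float  # total load capacitance, in farads
    activity: float              # average fraction of nodes switching per cycle
    frequency: float             # clock frequency, in hertz
    short_circuit_fraction: float = 0.1  # assumed ratio of short-circuit to load power


def static_power(op: CmosOperatingPoint) -> float:
    """Power drawn by leakage currents, present even in standby."""
    return op.vdd * op.leakage_current


def load_power(op: CmosOperatingPoint) -> float:
    """Power spent charging and discharging load capacitances at each transition."""
    return op.activity * op.switched_capacitance * op.vdd ** 2 * op.frequency


def short_circuit_power(op: CmosOperatingPoint) -> float:
    """Short-circuit component, approximated as a fraction of the load power."""
    return op.short_circuit_fraction * load_power(op)


def dynamic_power(op: CmosOperatingPoint) -> float:
    """Sum of the two dynamic components named in the text."""
    return load_power(op) + short_circuit_power(op)


def dynamic_share(op: CmosOperatingPoint) -> float:
    """Fraction of total power that is dynamic."""
    dyn = dynamic_power(op)
    return dyn / (dyn + static_power(op))


# Invented operating point, chosen so that dynamic power dominates.
example = CmosOperatingPoint(
    vdd=1.2,
    leakage_current=5e-3,
    switched_capacitance=1e-9,
    activity=0.2,
    frequency=200e6,
)
print(f"static : {static_power(example) * 1e3:.2f} mW")
print(f"dynamic: {dynamic_power(example) * 1e3:.2f} mW")
print(f"dynamic share: {dynamic_share(example):.1%}")
```

Raising `leakage_current` while keeping the other fields fixed shows how the static share can overtake the dynamic one, which is the effect attributed above to very fine geometries.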

11.5. Qualitative assessment

It is important to note that the hardware components and architectures mentioned above are not necessarily competitors. Instead, they often complement each other: for example, one can use a general purpose processor core to manage the implementation platform, coprocessors in the form of dedicated circuits and FPGAs for specific signal processing operations, and DSPs for other types of processing. In the context of SWR, one accepts an overhead in order to gain flexibility/reconfigurability. In each system design context, the designer determines the acceptable additional cost. In all cases, this cost buys an improved service or an easier design compared with a conventional radio product. It is noteworthy that, in the context of SWR, flexibility can eventually bring significant savings. Table 11.1 presents a qualitative comparison of possible hardware solutions for implementing an SWR platform. Each type of solution has advantages and disadvantages.

Technology   Performance   Power consumption   Flexibility   Price
ASIC         Excellent     Low                 None          Depends on the quality
FPGA         Very good     High                Good          Average to low
DSP          Good          Average             Very good     Average to low
GPP          Average       Average             Very good     Low

Table 11.1. Qualitative comparison of different hardware solutions
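
To show how the qualitative grades of Table 11.1 might be used when shortlisting a technology, the sketch below transcribes the table into a Python dictionary and filters it by flexibility and power thresholds. The numeric ranks attached to each grade, the function name `candidates` and the choice to sort by performance are assumptions made for illustration. The price column is left out because its first entry is not an ordinal grade.

```python
# Grades transcribed from Table 11.1; "none" stands for the lowest flexibility grade.
TABLE_11_1 = {
    "ASIC": {"performance": "excellent", "power": "low", "flexibility": "none"},
    "FPGA": {"performance": "very good", "power": "high", "flexibility": "good"},
    "DSP": {"performance": "good", "power": "average", "flexibility": "very good"},
    "GPP": {"performance": "average", "power": "average", "flexibility": "very good"},
}

# Assumed ordinal scales: the table only gives words, not numbers.
PERFORMANCE_RANK = {"average": 1, "good": 2, "very good": 3, "excellent": 4}
POWER_RANK = {"low": 1, "average": 2, "high": 3}
FLEXIBILITY_RANK = {"none": 0, "good": 1, "very good": 2}


def candidates(min_flexibility="none", max_power="high"):
    """Return the technologies meeting both thresholds, best performance first."""
    kept = [
        name
        for name, grades in TABLE_11_1.items()
        if FLEXIBILITY_RANK[grades["flexibility"]] >= FLEXIBILITY_RANK[min_flexibility]
        and POWER_RANK[grades["power"]] <= POWER_RANK[max_power]
    ]
    return sorted(
        kept,
        key=lambda name: PERFORMANCE_RANK[TABLE_11_1[name]["performance"]],
        reverse=True,
    )


# A reconfigurable design rules out the ASIC; a power budget then rules out the FPGA.
print(candidates(min_flexibility="good"))
print(candidates(min_flexibility="good", max_power="average"))
```

In practice, as the text notes, these technologies complement each other, so such a filter is better read as a way of assigning each processing block to a resource than as a way of choosing one technology for the whole platform.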

An SWR is usually characterized by a variety of processing types. Functional analysis and factorization of these processing types have been widely studied in order to pool them and thereby reduce the number of configurations and resources. JP Delahaye [DEL 07a] proposed a classification of baseband processing for SWR. Three functional classes were defined:

– Modulation class: the functions of this class correspond to the processing performed prior to the frequency translation of the signal (e.g. shaping filter). The functional need for this class is the computation capacity of the executing architecture. The hardware architecture suitable for this class is therefore a dedicated hardware accelerator;

– Coding class: this class is characterized by the diversity of source coding and channel coding schemes to be managed. The functions, which deal with bit-type information, are often based on computation structures using shift registers and XOR logical operators. The main functional requirement for this class is the flexibility of the implementation architecture to accommodate the diverse coding schemes. An FPGA architecture can address this diversity;

– Data structure class: the functions of this class correspond to control-oriented processing that performs concatenation, segmentation or multiplexing of data packets of variable sizes. The functional need for this class is the memory capacity for data manipulation. The implementation architecture for this class requires maximum flexibility to handle data blocks of different sizes. A processor-type architecture associated with a fast memory can meet this need.

Based on this classification, the hardware platform of an SWR system must be flexible in order to handle the variety of processing to be performed. The hardware platform must therefore be heterogeneous, containing specialized and non-specialized processors, programmable logic circuits and hierarchical and/or distributed memory.
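
The mapping between the three functional classes and their preferred hardware resources can be sketched as a simple partitioning step. The class-to-target association below follows the classification above, but the function names in the example list (`"convolutional encoder"`, `"CRC"` and so on) and the `partition` helper are hypothetical and serve only as illustration.

```python
from enum import Enum


class FunctionalClass(Enum):
    MODULATION = "modulation"
    CODING = "coding"
    DATA_STRUCTURE = "data structure"


# Preferred resource per class, as stated in the classification above.
TARGET = {
    FunctionalClass.MODULATION: "dedicated hardware accelerator",
    FunctionalClass.CODING: "FPGA",
    FunctionalClass.DATA_STRUCTURE: "processor with fast memory",
}


def partition(functions):
    """Group baseband function names by the hardware resource of their class."""
    plan = {}
    for name, functional_class in functions:
        plan.setdefault(TARGET[functional_class], []).append(name)
    return plan


# Hypothetical baseband chain; the function names are examples, not from [DEL 07a].
chain = [
    ("shaping filter", FunctionalClass.MODULATION),
    ("convolutional encoder", FunctionalClass.CODING),
    ("CRC", FunctionalClass.CODING),
    ("frame segmentation", FunctionalClass.DATA_STRUCTURE),
]

for resource, names in partition(chain).items():
    print(f"{resource}: {', '.join(names)}")
```

A plan that ends up using all three resources for a single standard is a direct illustration of why the platform has to be heterogeneous.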

11.6. Architectures of software layers

The rapid evolution of hardware and software for embedded systems and SWR is gradually leading to a less centralized organization of resources. The increasing share of software design in the development of radio systems is explained by economic as well as technical reasons, described in [KOU 02]. The integration of intermediate software layers is a trend in the evolution of embedded systems. In general, intermediate software layers provide an abstraction of low-level hardware and software resources. They aim to provide services that facilitate the interoperability of components or applications, i.e. their ability to interact by using or driving one another. The issue of interoperability among a wide variety of hardware/software architectures gives great importance to middleware: a software layer inserted between the operating system and the applications themselves to carry out the exchanges between various applications. Middleware also ensures reusability. As with the OSI model for networks, the integration of intermediate software layers can be tailored to the desired services and abstraction levels.

11.6.1. The SCA software architecture

The heterogeneity of the hardware platform presented above is one of the primary concerns for a designer of restricted SWR equipment. An intermediate layer between hardware and software can abstract the hardware implementation constraints from the radio application. To this end, U.S. Army investments in academic and industrial projects have led to the definition of a unified software architecture called SCA. The SCA aims to encourage manufacturers of communication systems to use standard platforms and to find a way to make them all compatible.

The SCA, a genuine international reference, was designed to provide an environment facilitating the deployment of a radio architecture on any hardware platform. Some organizations offer services that help designers use the SCA software architecture. Examples include the Communications Research Centre Canada (CRC), whose SCARI-Open and SCARI software suite were the very first environments following the guidelines of the SCA. Other companies, such as Zeligsoft or PrismTech, provide either software components of the SCA architecture or tools to develop SCA-compatible radio elements.

As a result, using the SCA, radio applications are theoretically decomposed into entities deployed on heterogeneous and distributed radio platforms. These entities provide and require abstract software interfaces described in the Unified Modeling Language (UML) and/or the Interface Definition Language (IDL) of the CORBA middleware. However, its lack of efficiency, particularly in terms of real-time execution, means that only products subject to military constraints comply with this approach. Other solutions, more comprehensive and less restrictive, have also been studied, as shown in the following sections.

11.6.2. Intermediate software layer: ALOE

Revés et al. [REV 05] of the Polytechnic University of Catalonia propose another approach for the intermediate layer, called ALOE. This solution takes a more general approach than the SCA to the design of restricted SWR equipment. It not only provides an abstraction between hardware and software but also comes with a tool-supported design flow. The level of abstraction in this layer is lower than that of the SCA in order to minimize the overhead caused by the intermediate layers. To provide hardware abstraction services, the abstraction layer called P-HAL (cf. Figure 11.8) is dimensioned for heterogeneous hardware platforms and targets modem applications for SWR. This abstraction layer makes the hardware architecture abstract and transparent to the application. ALOE is intended to provide a common development environment for radio applications and to make them independent of the hardware layer. The application description tools supported by P-HAL are circuit-specific, with the sole exception of OMG IDL, which is used to describe applications on CORBA.

Figure 11.8. Software architecture of P-HAL

11.6.3. Software architecture for reconfiguration management: HDReM

JP Delahaye [DEL 05b, DEL 07a] at Supélec has defined a reconfiguration management architecture called Hierarchical and Distributed Reconfiguration Management (HDReM), adapted to the context of SWR. The HDReM architecture adds a reconfiguration management wrapper to the classical chain of operators performing the radio signal processing. This makes sense in SWR equipment, where the flexibility of the processing implementation is pushed to the maximum capacity of the hardware. One of the highlights of this architecture is the functional separation of the data path and the reconfiguration path. This also leads to an SWR application design in which the radio functions are separated from the reconfiguration management functions. In a classical architecture, reconfiguration data and processing data must be encapsulated in a single packet. This requires identifying, in the header of each packet, the data intended for each operator. In addition, each operator must be able to reformat the packet before passing it to the next operator. Such a solution results in a substantial overhead in the control part of each operator. The HDReM architecture reduces the time overhead of the reconfiguration management elements by separating the data flow from the control flow.

Another important feature of the HDReM architecture is the specialization of the reconfiguration managers, which makes it possible to take into account the specificities of each processing operator as well as the constraints of the target hardware platform. Each operator is controlled by a dedicated Reconfiguration Manager (ReM). Each ReM entity is thus optimized for a specific type of reconfigurable resource; this is called distributed management. Nevertheless, it remains difficult to manage a whole radio system through distributed management alone, because of the large number of resources and the heterogeneity of the implementation platforms. A three-level hierarchical structure (see Figure 11.9) is therefore adopted in order to centralize the reconfiguration of a set of resources and to maintain a global view of the system parameters.

Figure 11.9. The HDReM architecture

The HDReM architecture is defined as follows. At the highest level of the hierarchy lies a single reconfiguration manager called L1_ReM, which acts as the general reconfiguration supervisor. This central manager handles lower-level managers and is independent of their physical implementation. The designer is free to implement this mechanism as he/she sees fit. It should be noted that the hardware module on which this entity is implemented becomes the master module of the platform. L1_ReM handles the instantiation and control of one or more lower-level units called L2_ReMU. Each of these units in turn instantiates and controls one or more lower-level units called L3_ReMU. Located at the lowest hierarchical level, each L3_ReMU unit manages one operator, which implements part of the overall processing of a communication standard. Reconfiguration data thus flow from the L1_ReM entity to one or more L3_ReMU entities, which makes HDReM a top-down architecture in which reconfiguration information is sent from top to bottom. The L1_ReM is interfaced with the outside world and handles the requests for context change coming from the upper layers. Chapter 4 describes an extension of HDReM for SWR called HDCRAM.
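
The three-level, top-down organization described above can be sketched in a few lines of Python. This is only a structural illustration under stated assumptions: the class names `L1ReM`, `L2ReMU` and `L3ReMU`, their methods, the nested-dictionary format of a context change and the gain-only operator are all hypothetical and do not reproduce the implementation of [DEL 07a]. What the sketch does aim to show is that reconfiguration orders travel down the hierarchy through `reconfigure` calls, while samples go through `process` without carrying any reconfiguration header.

```python
class L3ReMU:
    """Lowest level: manages the configuration of a single processing operator."""

    def __init__(self, operator_name):
        self.operator_name = operator_name
        self.config = {}

    def reconfigure(self, params):
        # Reconfiguration path: only parameters arrive here, never samples.
        self.config.update(params)

    def process(self, samples):
        # Data path: placeholder operator that simply applies a gain.
        gain = self.config.get("gain", 1.0)
        return [gain * sample for sample in samples]


class L2ReMU:
    """Intermediate level: instantiates and controls a group of L3 units."""

    def __init__(self, name):
        self.name = name
        self.units = {}

    def instantiate(self, operator_name):
        unit = L3ReMU(operator_name)
        self.units[operator_name] = unit
        return unit

    def reconfigure(self, orders):
        # orders maps an operator name to its new parameters.
        for operator_name, params in orders.items():
            self.units[operator_name].reconfigure(params)


class L1ReM:
    """Top level: single supervisor that receives context-change requests."""

    def __init__(self):
        self.units = {}

    def instantiate(self, name):
        unit = L2ReMU(name)
        self.units[name] = unit
        return unit

    def change_context(self, context):
        # context maps an L2 unit name to the orders for its operators.
        for unit_name, orders in context.items():
            self.units[unit_name].reconfigure(orders)


# Build a tiny hierarchy and push one reconfiguration from top to bottom.
supervisor = L1ReM()
baseband = supervisor.instantiate("baseband")
shaping_filter = baseband.instantiate("shaping_filter")

supervisor.change_context({"baseband": {"shaping_filter": {"gain": 2.0}}})
print(shaping_filter.process([1.0, 0.5]))
```

Because each `L3ReMU` is a separate class instance, a real design could subclass it per resource type (FPGA region, DSP task, and so on), which corresponds to the specialization of managers mentioned in the text.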

11.7. Some platform examples

Currently, a large number of experimental platforms are being built to support research projects. A non-exhaustive selection of these platforms can be found in [SCH 01]. Manufacturers of SWR and CR platforms have made different choices in addressing issues of flexibility, partitioning, reconfigurability, etc. To highlight the variety of architectures, four platforms are briefly discussed.

11.7.1. The USRP platform

The Universal Software Radio Peripheral (USRP) platform (Figure 11.10) is a modular component for the prototyping of radio links. Different electronic designs are available, operating in the HF range (30 MHz) and in the SHF range (2.4 GHz). The full-featured platform is controlled by free software, available from the GNU Radio project mentioned in Section 10.3.1.3. Depending on the desired level of abstraction and access to the necessary equipment, development takes place in Python, C++ or Verilog. The great interest of the USRP is that it provides a simple prototyping platform at a low cost.

In conjunction with GNU Radio, the USRP has become the most widely used laboratory tool worldwide for teaching and for prototyping research in the domain of SWR. Even if it does not provide the complete set of features that one would expect from an SWR platform, it is an ideal solution for getting started in the SWR research domain.

Figure 11.10. USRP hardware platform sold by Ettus

In this context, the initiative known as Software Defined Radio for all (SDR4all) seeks to give simple access to the signal processing of a real radio link. The SDR4all interface is intended eventually to give all researchers, students and engineers access, via a radio USB key, to a plug-and-play radio link. The link will be driven by a high-level language (e.g. Matlab) to conduct wireless transmissions between portable computers. The tool allows any researcher to test signal processing, coding and compression algorithms in a real propagation environment. This low-cost solution is complementary to the expensive software radio platforms available for purchase (which are certainly much more efficient). The purpose of SDR4all is not to perform high-speed transmissions or to develop algorithms for dynamic reconfiguration but, in particular, to provide a complementary tool for research and education in the field of signal processing. Most algorithms are tested on models of propagation channels and noise, and their migration to real-life platforms introduces significant errors. SDR4all helps, among other things, to address this gap by allowing tests in a real context. SDR4all is aimed at researchers and students who are specialists in signal processing and have neither the financial means nor the time to deal with hardware constraints.

11.7.2. OpenAirInterface

OpenAirInterface, developed at EURECOM through several collaborative projects (IDROMEL, E2R, etc.), is a hardware/software development platform available as free software. The objective of the IDROMEL project (Impact of reconfigurable equipment for the roll-out of future mobile networks) was twofold: to study the feasibility of deploying reconfigurable equipment and its use in future mobile networks, and to offer the R&D community an open platform for reconfigurable experimental equipment. The hardware platform developed combines fully flexible baseband processing, a system on chip called MAGALI, a radio frequency stage ranging from 400 MHz to 7.5 GHz (Figure 11.11), multiple antennas and a completely flexible Medium Access Control (MAC) layer. The main function of the platform is to allow the exploration of various SWR and CR scenarios. The specifications, drawings and source code for the radio part, baseband processing and MAC layer are available on the website (www.openairinterface.org).

Figure 11.11. RF stage of OpenAirInterface platform

11.7.3. Kansas University Agile Radio (KUAR)

The Kansas University Agile Radio (KUAR) platform, shown in Figure 11.12, wascreated to target the frequency range of 5.25 to 5.85 GHz with a bandwidth of 30 MHz[MIN 07]. The platform includes:

– A general purpose processor (GPP) running at 1.4 GHz.

– A programmable gate array: a Virtex II FPGA.

– Two interfaces, Gigabit Ethernet and PCI-Express, to communicate with other entities such as a computer.

The main purpose of this platform is to allow a modular and reconfigurable implementation of digital signal processing algorithms in the areas of wireless radio networks, dynamic spectrum access and SWR/CR. The KUAR platform was designed to be battery-powered in order to allow autonomous operation.

Figure 11.12. KUAR hardware platform

11.7.4. Berkeley Cognitive Radio Platform

The Berkeley Emulation Engine (BEE2) platform [BRO 04] (see Figure 11.13), developed at the University of California, Berkeley, is intended for implementations of wireless communication systems. The platform is based on FPGA technology, which combines high computing power with great flexibility in terms of reconfiguration. BEE2 consists of a motherboard carrying five FPGAs (Virtex II Pro 70), one of which is reserved for communication while the remaining four are reserved for computation. Each of the five FPGAs has four independent channels to DDR2 memory, allowing a high bandwidth. The system design is based on Simulink, and the partitioning of the system across several FPGAs is manual. To allow the development of a heterogeneous system, each of the five FPGAs integrates an embedded PowerPC processor core. The radio boards were designed to support a bandwidth of up to 25 MHz, and the RF modules have very good sensitivity. BEE2 has only a slow connection to a PC, which also explains why all processing is performed on the platform itself.

Figure 11.13. BEE2 hardware platform

11.8. Conclusion

In this chapter, we explored the space of architectural solutions and technologies for implementing SWR and CR, with reconfiguration as the main criterion. Using this criterion, we first analyzed the flexibility of different hardware components according to their potential for reconfiguration at the physical level. We then examined reconfiguration at different levels of abstraction, which allows the SWR system to be designed independently of the physical layer. Finally, we presented, in a non-exhaustive manner, a few platforms for implementing SWR. Currently, however, dynamic reconfiguration is not exploited in the context of energy management. This constraint, although unanimously recognized as one of the most critical design constraints of embedded systems, is almost never taken into account during the design of reconfigurable architectures. Harnessing the potential of reconfigurable architectures for energy management, i.e. green communication, is at the moment among the most attractive research themes.