INTERNET OF THINGS RESEARCH LAB, DEPARTMENT OF COMPUTER ENGINEERING, SANTA CLARA UNIVERSITY, USA — MARCH 2018 1
Software-defined Radios: Architecture, State-of-the-art, and Challenges
Rami Akeela and Behnam Dezfouli
Internet of Things Research Lab, Department of Computer Engineering, Santa Clara University, USA
Email: [email protected], [email protected]
Abstract: Software-defined Radio (SDR) is a programmable transceiver with the capability of operating various wireless communication protocols without the need to change or update the hardware. Progress in the SDR field has led to the escalation of protocol development and a wide spectrum of applications, with more emphasis on programmability, flexibility, portability, and energy efficiency, in cellular, WiFi, and M2M communication. Consequently, SDR has earned a lot of attention and is of great significance to both academia and industry. SDR designers intend to simplify the realization of communication protocols while enabling researchers to experiment with prototypes on deployed networks. This paper is a survey of the state-of-the-art SDR platforms in the context of wireless communication protocols. We offer an overview of SDR architecture and its basic components, then discuss the significant design trends and development tools. In addition, we highlight key contrasts between SDR architectures with regards to energy, computing power, and area, based on a set of metrics. We also review existing SDR platforms and present an analytical comparison as a guide to developers. Finally, we recognize a few of the related research topics and summarize potential solutions.
Index Terms: SDR, Wireless Communication, Programmability, Co-design, LTE, WiFi, IoT.
I. INTRODUCTION
ADVANCES in wireless technologies have altered consumers' communication habits. Wireless technologies are an essential part of users' daily lives, and their effect will become even larger in the future. In a technical report, the World Wireless Research Forum (WWRF) has predicted that 7 trillion wireless devices for 7 billion people will be deployed by 2020 [1]. When these huge numbers of devices are connected to the Internet to form an Internet of Things (IoT) network, the first challenge is to adjust the basic connectivity and networking layers to handle the large numbers of end points. An increasing number of wireless protocols, such as ZigBee, BLE, LTE, and new WiFi protocols, have been developed for IoT and Machine-to-Machine (M2M) communication purposes due to different demanding requirements, one of which is high energy efficiency [2]. Wireless standards, in general, are adapting quickly in order to accommodate different user needs and hardware specifications [3], [4]. This has called for a transceiver design with the ability to handle several protocols, including the existing ones and those being developed. In order to accomplish this task, one needs to realize the protocols' need for a flexible, re-configurable, and programmable framework.
Both consumer enterprise and military frameworks have a need for programmable platforms. Programmability is of central significance for designers in the industry because wireless protocols advance rapidly and consistently, hence requiring hardware that can keep up with the evolution. For example, the authors in [5] proposed a platform, OpenRadio, for programming both PHY and MAC layers while offering a high level of abstraction. Rather than adding yet another piece of equipment to deal with a new standard or frequency band, the equipment of a previously deployed platform can adjust to the particulars of the new standard. In a military scenario, for example, the needs of these platforms change in light of adverse conditions that develop during a mission, which may not have been predicted at design time, leading to the development and utilization of new protocols.
Software-defined Radio (SDR) is a technology for radio communication. This technology is based on software-defined wireless protocols, as opposed to hardware-based solutions. This translates to supporting various features and functionalities, such as updating and upgrading through reprogramming, without the need to replace the hardware on which they are implemented. This opens the doors to the possibility of realizing multi-band and multi-functional wireless devices.
The driving factors for the high demand for SDR include network interoperability, readiness to adapt to future updates and new protocols, and more importantly, lower hardware and development costs. In a report [6], the SDR market is projected to be more than USD 29 billion by the year 2021. Global Industry Analysts [7] highlights some of the market trends for SDR as follows: (i) increasing interest from the military sector in building communication systems and large-scale deployment in developing countries, (ii) growing demand for public safety and disaster preparedness applications, and (iii) building virtualized base stations (BSs). SDRs are also ideal for developing future space communications [8]–[10], Global Navigation Satellite System (GNSS) sensors [11], Vehicle-to-Vehicle (V2V) communication [12]–[14], and IoT applications [15], [16], where relatively small and low-power SDRs can be utilized.
SDRs are implemented by employing various types of hardware platforms, such as General Purpose Processors (GPPs), Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), and Field Programmable Gate Arrays (FPGAs). Each of these platforms is associated with its own set of challenges. Some of these challenges are: utilizing the computational power of the selected hardware platform, keeping the power consumption at a minimum, ease of the design process, and cost of tools and equipment. The research community and industry have both developed SDRs that are based on the aforementioned hardware platforms. A few examples include USRP [17], Sora [18], Atomix [19], Airblue [20], and WARP [21]. Each SDR is unique with regard to the design methodology, development tools, performance, and end application.

arXiv:1804.06564v1 [cs.NI] 18 Apr 2018

TABLE I
KEY ABBREVIATIONS
ADC     Analog-to-Digital Converter
ASIC    Application-specific Integrated Circuit
BS      Base Station
CUDA    Compute Unified Device Architecture
DAC     Digital-to-Analog Converter
DSP     Digital Signal Processor
FFT     Fast Fourier Transform
FLOPS   Floating Point Operations Per Second
FPGA    Field Programmable Gate Array
GPP     General Purpose Processor
GPU     Graphics Processing Unit
HLS     High Level Synthesis
NFV     Network Function Virtualization
SDR     Software-defined Radio
SDN     Software-defined Network
SNR     Signal-to-noise Ratio
SoC     System on Chip
USRP    Universal Software Radio Peripheral
In this paper, we first present an overview of SDR architecture as well as the analog and digital divides of the system and interconnection of components. We then introduce the criteria based on which the different hardware platforms are classified. We thoroughly examine the architecture and design approaches employed by these hardware platforms and present their strengths and weaknesses in the context of SDR implementation. Furthermore, we provide an analytical comparison of hardware platforms as a guide for design decision making. Moreover, we discuss the use of respective development tools and present a summary to help explain their functionalities and the platforms they support. Afterwards, we review the SDR platforms developed by both industry and academia, and analyze and compare them using the criteria discussed before. Finally, we identify current challenges and the open research topics related to future SDR development.
This paper is organized as follows: Section II provides a description of SDR architecture, and Section III presents the classification process used to summarize the various design approaches adopted. Section IV lists some of the corresponding development tools and platforms. Section V presents an analysis and comparison of the commercially and academically developed SDR platforms. Research questions and future trends are highlighted in Section VI. Section VII presents an analysis of the existing literature on SDR surveys. We conclude the paper in Section VIII. A list of key abbreviations used in this paper can be found in Table I.
II. CONCEPTS AND ARCHITECTURE
In this section, we examine the general architecture of SDRs, their main components, and their processing requirements. As explained in the previous section, SDRs play a vital role in wireless standard development due to their flexibility and ease of programmability. This is because most digital signal processing and digital front end functions, which include channel selection and modulation/demodulation, take place in the digital domain. This is usually performed in software running on processors, such as GPPs and DSPs. However, it can also run on programmable hardware, i.e., FPGAs.

[Figure 1: block diagrams]
(a) Transmitter: Signal Processing -> Digital Up Converter (DUC: Interpolation Filter, Digital Mixer, Digital Local Oscillator) -> DAC -> RF Up-Converter -> Power Amplifier -> Antenna.
(b) Receiver: Antenna -> Power Amplifier -> RF Tuner -> ADC -> Digital Down Converter (DDC: Digital Mixer, Digital Local Oscillator, Lowpass Filter) -> Signal Processing.
Signal Processing covers FFT/IFFT, FIR/IIR, and decoding. In both panels, the DUC/DDC forms the Digital Front End, the blocks on the antenna side of the converter form the RF Front End, and the DAC/ADC marks the boundary between the digital and analog domains.
Fig. 1. SDR architecture. Sub-figure (a) shows SDR from a transmitter's point of view, and sub-figure (b) shows SDR from a receiver's point of view.
In general, from the transmitter point of view, a baseband waveform is produced first, then converted to an Intermediate Frequency (IF) waveform and then to an RF waveform, which is sent through the antenna. From the receiver point of view, this RF signal is sampled and demodulated, then decoded. To provide more detail on the process, we study the receiving end of the system as follows.
The RF signal from the antenna is amplified by a tuned RF stage that amplifies a range of the frequency band. This amplified RF signal is then converted to an analog IF signal. The Analog-to-Digital Converter (ADC) digitizes this IF signal into digital samples, which are then fed into a mixer stage. The mixer has another input coming from a local oscillator with a frequency set by the tuning control. The mixer then translates the input signal to baseband. The next stage is essentially an FIR filter that permits only one signal. This filter limits the signal bandwidth and acts as a decimating low-pass filter. The digital down-converter includes a large number of multipliers, adders, and shift registers in hardware in order to accomplish the aforementioned tasks. Next, the signal processing stage performs tasks such as demodulation and decoding. This stage is typically handled by dedicated hardware like an Application Specific Integrated Circuit (ASIC) or other programmable alternatives like an FPGA or a DSP [22].
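The digital part of this receive chain (mixer, local oscillator, and decimating low-pass filter) can be illustrated with a minimal Python sketch. It is not taken from any platform surveyed here: the function name `digital_down_convert`, the 1 MHz sampling rate, the 250 kHz IF, and the 8-tap moving-average filter are all assumptions chosen to keep the arithmetic easy to follow.

```python
import cmath
import math


def digital_down_convert(samples, f_if, f_s, taps, decimation):
    """Mix real IF samples to baseband, low-pass filter them, and decimate."""
    # Digital local oscillator: multiply sample n by exp(-j*2*pi*f_if*n/f_s).
    mixed = [
        x * cmath.exp(-2j * math.pi * f_if * n / f_s)
        for n, x in enumerate(samples)
    ]
    # FIR low-pass filter, evaluated only at the output instants that are kept.
    out = []
    for n in range(len(taps) - 1, len(mixed), decimation):
        acc = 0j
        for k, h in enumerate(taps):
            acc += h * mixed[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


# Hypothetical test signal: a unit-amplitude tone located exactly at the IF.
f_s, f_if = 1_000_000.0, 250_000.0
tone = [math.cos(2 * math.pi * f_if * n / f_s) for n in range(64)]
taps = [1 / 8] * 8  # crude moving-average low-pass filter (assumption)
baseband = digital_down_convert(tone, f_if, f_s, taps, decimation=4)
print(len(baseband), abs(baseband[0]))
```

Mixing a real tone with the complex oscillator leaves a component at 0 Hz with half the input amplitude plus an image at twice the IF; the moving average removes the image, so every output sample has magnitude close to 0.5. A practical DDC would use a properly designed filter (often a CIC stage followed by FIR stages) instead of a moving average.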
As shown in Figure 1 (a) and (b), at a high level, a typical SDR transceiver consists of the following components: Signal Processing, Digital Front End, Analog RF Front End, and an antenna.
1) Antenna: SDR platforms usually employ several antennas to cover a wide range of frequency bands [23]. Antennas are often referred to as "intelligent" or "smart" due to their ability to select a frequency band and adapt with mobile tracking or interference cancellation [22], [24]. In the case of SDRs, an antenna needs to meet a certain list of requirements such as self-adaptation (i.e., flexibility to tune to several bands), self-alignment (i.e., beamforming capability), and self-healing (i.e., interference rejection) [24].
2) RF Front End: This is the RF circuitry whose main function is to transmit and receive the signal at various operating frequencies. Its other function is to convert the signal to/from the Intermediate Frequency (IF). The operation is divided into two paths, depending on the direction of the signal (i.e., Tx or Rx mode):
– In the transmission path, digital samples are converted into an analog signal by the Digital-to-Analog Converter (DAC), which in turn feeds the RF Front End. This analog signal is mixed with a preset RF frequency, modulated, and then transmitted.
– In the receiving path, the antenna captures the RF signal. The antenna input is connected to the RF Front End using matching circuitry to guarantee optimum signal power transfer. The signal then passes through a Low Noise Amplifier (LNA), which resides in close proximity to the antenna, to amplify weak signals and minimize the noise level. This amplified signal and a signal from the Local Oscillator (LO) are fed into the mixer in order to down-convert it to the IF [25].
3) Analog-to-Digital and Digital-to-Analog Conversion: The DAC, as mentioned in the previous section, is responsible for producing the analog signal to be transmitted from the digital samples. The ADC resides on the receiver side and is an essential component in radio receivers. It is responsible for converting a continuous-time signal to a discrete-time, binary-coded signal. ADC performance can be described by various parameters [26], [27] including: (i) Signal-to-Noise Ratio (SNR): the ratio of signal power to noise power in the output, (ii) resolution: the number of bits per sample, (iii) Spurious-free Dynamic Range (SFDR): the strength ratio of the carrier signal to the next strongest noise component or spur, and (iv) power dissipation. Advances in SDR development have provided momentum for ADC performance improvements. For example, since an ADC's power consumption affects the lifetime of battery-powered SDRs, more energy-efficient ADCs have been developed [28].
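To make the link between resolution and SNR concrete, the sketch below evaluates the standard textbook bound for an ideal N-bit quantizer driven by a full-scale sine wave, SNR ≈ 6.02N + 1.76 dB. This relation is not stated in the text above; it is added as an assumption-laden illustration, and the function name `ideal_adc_snr_db` is hypothetical. Real converters fall below this bound because of thermal noise, jitter, and nonlinearity.

```python
import math


def ideal_adc_snr_db(bits):
    """Quantization-limited SNR (dB) of an ideal ADC with a full-scale sine input."""
    # Signal-to-quantization-noise power ratio is 1.5 * 2**(2 * bits).
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


for bits in (8, 12, 14, 16):
    print(f"{bits:2d} bits -> {ideal_adc_snr_db(bits):6.2f} dB")
```

Each additional bit of resolution adds about 6 dB to the bound, which is one reason resolution and power dissipation have to be traded off when selecting an ADC for a battery-powered SDR.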
4) Digital Front End: The Digital Front End has two main functions [27]:
– Sample Rate Conversion (SRC), which is a functionality to convert the sampling from one rate to another. This is necessary since the two communication parties must be synchronized.
– Channelization, which includes up/down conversion in the transmitter and receiver side, respectively. It also includes channel filtering, where channels that are divided by frequency are extracted. Examples include interpolation and low-pass filters, as can be observed in Figure 1.
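As a rough illustration of sample rate conversion, the following Python sketch resamples a sequence by linear interpolation. Linear interpolation is a deliberately simple stand-in chosen for readability; practical digital front ends typically use polyphase or CIC filter structures. The function name `resample_linear` and the example rates are assumptions.

```python
def resample_linear(samples, rate_in, rate_out):
    """Resample by linear interpolation between neighbouring input samples."""
    n_out = int((len(samples) - 1) * rate_out / rate_in) + 1
    out = []
    for m in range(n_out):
        pos = m * rate_in / rate_out  # output instant, in input-sample units
        i = int(pos)
        frac = pos - i
        if i + 1 < len(samples):
            out.append((1 - frac) * samples[i] + frac * samples[i + 1])
        else:
            out.append(samples[i])  # last input sample, nothing to interpolate
    return out


ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
up = resample_linear(ramp, rate_in=2, rate_out=4)    # doubles the rate
down = resample_linear(ramp, rate_in=4, rate_out=2)  # halves the rate
print(up)
print(down)
```

Note that when reducing the rate, a real implementation would low-pass filter before discarding samples to avoid aliasing; the sketch skips this step.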
In an SDR transceiver, the following tasks are executed in the digital front end:
– On the transmitting side (Figure 1(a)), the Digital Up Converter (DUC) translates the aforementioned baseband signal to IF. The DAC that is connected to the DUC then converts the digital IF samples into an analog IF signal. Afterwards, the RF up-converter converts the analog IF signal to RF frequencies.
– On the receiving side (Figure 1(b)), the ADC converts the IF signal into digital samples. These samples are subsequently fed into the next block, which is the Digital Down Converter (DDC). The DDC includes a digital mixer and a numerically-controlled oscillator. The DDC extracts the baseband digital signal from the ADC output, and after being processed by the Digital Front End, this digital baseband signal is forwarded to a high-speed digital signal processing block [29].
5) Signal Processing: Signal processing operations, such as encoding/decoding, interleaving/deinterleaving, modulation/demodulation, and scrambling/descrambling, are performed in this block. Encoding for the channel serves as an error correcting code. Specifically, the encoded signal includes redundancy that is utilized by the receiver's decoder to reconstruct the original signal from the corrupted received signal. Examples of error correcting codes include Convolutional Codes, Turbo Codes, and Low Density Parity Check (LDPC) codes [30]. The decoder constitutes the most computationally intensive part of the Signal Processing block, due to data transfer and memory schemes [31]. The second part that is regarded as highly complex and expensive (in terms of area and power) is the Fast Fourier Transform (FFT) and Inverse FFT (IFFT), as part of the modulation phase [32].
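The way channel encoding adds redundancy can be sketched with a small convolutional encoder. The code below assumes the widely used rate-1/2, constraint-length-3 code with generator polynomials 7 and 5 (octal); this particular code is our choice for illustration and is not prescribed by the text, and the function name `conv_encode` is hypothetical. No tail bits are appended, so the trellis is left unterminated.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_len=3):
    """Rate-1/len(generators) convolutional encoder (unterminated)."""
    mask = (1 << constraint_len) - 1
    state = 0  # shift register holding the most recent input bits
    out = []
    for b in bits:
        state = ((state << 1) | b) & mask
        for g in generators:
            # Each output bit is the parity of the register bits tapped by g.
            out.append(bin(state & g).count("1") % 2)
    return out


message = [1, 0, 1, 1]
print(conv_encode(message))
```

Every input bit produces two output bits, so the encoded stream is twice as long as the message. This redundancy is what a decoder at the receiver (for example, a Viterbi decoder) exploits to correct errors, at the computational cost noted above.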
The signal processing block is commonly referred to as the baseband processing block. When discussing SDRs, the baseband block is at the heart of the discussion, as it makes up the bulk of the digital domain of the implementation. This implementation runs on top of hardware circuitry that is capable of processing signals efficiently. Examples include ASICs, FPGAs, DSPs, GPPs, and GPUs. The second part of the implementation is the software, which provides the functionality and high-level abstractions to execute signal processing operations. In the next section we examine the aforementioned hardware platforms and analyze in detail the various design approaches.
III. DESIGN APPROACHES
In this section, we discuss the classification of the various SDR design methodologies of the baseband processing block, namely GPP, GPU, DSP, FPGA, and co-design based methodologies. In this classification, we analyze and compare SDR platforms based on a set of performance criteria, which we introduce here. These criteria include:
– Flexibility and Reconfigurability. The capability for the modulation and air-interface algorithms and protocols to evolve by merely loading new software onto the platform [10].
– Adaptability. The SDR platform can adjust its capabilities as network or traffic operational requirements change.
– Computational Power. The processing rate of the SDR platform, namely Giga Operations per Second (GOPS).
– Energy Efficiency. The total power consumption (typically within a few hundred milliwatts), especially for mobile [33] and IoT deployments.
– Cost. The total cost of the SDR platform, including time-to-market, development, and hardware costs.
A. GPP-based
One of the first approaches to realizing SDR platforms is using a General Purpose Processor (GPP), or the commonly known generic computer microprocessors such as x86/64 and ARM architectures. Examples of SDR platforms that utilize GPPs include Sora [18], KUAR [34], and USRP [17].
1) Definition and Uses: A GPP is a digital circuit that is clock-driven and register-based, is capable of processing different functions, and operates on data streams represented in the binary system [35]. GPPs can be used for several purposes, making them extremely useful for an unlimited number of applications, eliminating the need for building application-specific circuits, and thus reducing the overall cost of running applications. GPPs are generally a preferred hardware platform among researchers in academia due to their flexibility, abundance, and ease of programmability, which is one of the main requirements in SDR platforms [36]. In addition, researchers prefer GPPs since they are more familiar with them and their software frameworks, compared to DSPs and FPGAs. From the performance point of view, GPPs are being enhanced rapidly, credited not only to technological advances in terms of CMOS technology [37], but also to the increase in the average number of instructions processed per clock cycle. The latter is achieved through different means, in particular by utilizing parallelism within and between processors. This has led to the evolution of multi-core GPPs [38].
2) Adoption and GPUs: Architecturally, the instruction set of a GPP includes instructions for different operations such as Arithmetic and Logic Unit (ALU) operations, data transfer, and I/O. A GPP processes these instructions in sequential order. Because of sequential processing, GPPs are not well suited for high-throughput computing with real-time requirements (i.e., high throughput and low latency) [39]. For example, using GNU Radio [40] to implement the IEEE 802.11 standard, which requires a 20 MHz sampling rate, would be challenging, since GNU Radio is restricted by the limited processing capabilities of GPPs. As a result, the GPP cores (of the attached PC) reach saturation, and frames become corrupted and are discarded. Moreover, wireless protocols require predictable performance in order to guarantee meeting timing constraints. However, conditional branch instructions in GPPs' instruction sets lead to out-of-order execution, which makes it infeasible to achieve predictability.
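The pressure that a 20 MHz sampling rate puts on a sequential processor can be estimated with simple arithmetic. The sketch below computes the time and clock-cycle budget available per sample; the 3 GHz clock frequency is a hypothetical value chosen for illustration, and the function names are our own.

```python
def sample_period_ns(sample_rate_hz):
    """Time available to process one sample, in nanoseconds."""
    return 1e9 / sample_rate_hz


def cycles_per_sample(clock_hz, sample_rate_hz):
    """Clock cycles one core can spend on each sample while keeping up."""
    return clock_hz / sample_rate_hz


rate = 20e6   # 20 MHz sampling rate from the IEEE 802.11 example
clock = 3e9   # hypothetical 3 GHz GPP core (assumption)
print(sample_period_ns(rate), "ns per sample")
print(cycles_per_sample(clock, rate), "cycles per sample")
```

Under these assumptions, a single core has 50 ns, or about 150 cycles, to complete all filtering, synchronization, demodulation, and decoding work for each sample, before accounting for operating system scheduling or cache misses.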
To remedy the limitation of GPPs, researchers have proposed multiple solutions, one of which is the addition of co-processors, such as a Graphics Processing Unit (GPU) [41]. GPUs are processors specifically designed to handle graphics-related tasks, and they efficiently process large blocks of streaming data in parallel. SDR platforms comprised of both GPPs and GPUs are flexible and have a higher level of processing power. However, this results in a lower level of power efficiency (e.g., GPP's power efficiency is ∼9 GFLOPS/W for single-precision, compared to 20 GFLOPS/W for GPU [42]). GPUs act as co-processors to GPPs because a GPP is required to act as the control unit and transfer data from external memory. After a transfer is completed, signal processing algorithms are executed by the GPU.
While GPUs are typically used for processing graphics, they are also useful for signal processing algorithms. Over the past few years, theoretical peak performance for GPUs and GPPs for single and double precision processing has been growing [43]. For example, comparing Intel Haswell's 900 GFLOPS [44] with NVIDIA GTX TITAN's 4500 GFLOPS [45] for single precision, it is apparent that GPUs have a computational power that far exceeds their GPP counterparts [43]. Their multi-core architectures and parallel processors are the main attractive features, in addition to their relatively reasonable prices and credit-card sizes. These features make them good candidates as co-processors in GPP-based SDRs, where they can play a vital role in accelerating computing-intensive blocks [46]. Another advantage is their power efficiency, which keeps improving with every new model (e.g., it went from 0.5 to 20 GFLOPS/W for single-precision) [42]. To take full advantage of GPUs, algorithms must conform to their architecture. From an architectural perspective, GPUs have a number of advantages that make them preferable solutions for applications such as video processing. In particular, GPUs employ a concept called Single Program Multiple Data (SPMD) that allows multiple instruction streams to execute the same program. In addition, due to their multi-threading scheme, data load instructions are more efficient. GPUs also present a high computational density, where the cache to ALU ratio is low [47].
As shown in Table II, the authors of [46] confirmed that the signal detection algorithm (which includes intensive FFT computations) runs faster with parallel processing on a GPU than on a GPP, while operating in real time. This is due to the availability of the cuFFT library developed for NVIDIA GPUs for more efficient FFT processing [48]. With regard to the architectural advantage of GPUs, several hundred CUDA cores can perform a single operation at the same time, as opposed to a few cores in the case of multi-core GPPs.
An example of using GPUs alongside GPPs to build SDR platforms is the work in [49], where the authors built a framework on a desktop PC in addition to using a GPU to implement an FM receiver. The authors in [46] studied real-time signal detection using an SDR platform composed of a laptop computer and an NVIDIA Quadro M4000M [45]. Examples of GPUs available in the market can be found in Table III. In this table, we show two examples of high-performing GPUs (> 5500 GFLOPS) suitable for SDRs with strict timing and performance requirements. We also show two more examples of less powerful and less expensive GPUs suitable for prototyping SDRs in academic environments.
TABLE II
PERFORMANCE OF SIGNAL DETECTION ALGORITHM ON GPP AND GPU [46]
Processing platform of signal detection algorithm (all values in ms)
ADC Data Length   GPP Serial Processing   GPP Parallel Processing   GPU Parallel Processing
1                 13.487                  1.254                     0.278
10                135.852                 12.842                    2.846
100               1384.237                131.026                   29.358
1000              13946.218               1324.346                  321.254
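The speedups implied by Table II can be derived directly from its entries. The Python sketch below copies the table values and computes the GPU speedup over each GPP configuration. It also checks whether the GPU processing time stays below the ADC data length; treating that condition as "real time" is our assumption for illustration, not a definition taken from [46].

```python
# Values copied from Table II; all times in milliseconds.
TABLE_II = [
    # (ADC data length, GPP serial, GPP parallel, GPU parallel)
    (1, 13.487, 1.254, 0.278),
    (10, 135.852, 12.842, 2.846),
    (100, 1384.237, 131.026, 29.358),
    (1000, 13946.218, 1324.346, 321.254),
]


def summarize(rows):
    """Return (length, speedup vs serial GPP, speedup vs parallel GPP, GPU keeps up)."""
    summary = []
    for length, gpp_serial, gpp_parallel, gpu in rows:
        summary.append((
            length,
            gpp_serial / gpu,
            gpp_parallel / gpu,
            gpu < length,  # assumed real-time criterion: done before the next block arrives
        ))
    return summary


for length, vs_serial, vs_parallel, real_time in summarize(TABLE_II):
    print(f"{length:5d} ms: {vs_serial:5.1f}x vs serial GPP, "
          f"{vs_parallel:4.1f}x vs parallel GPP, GPU keeps up: {real_time}")
```

With the values as listed, the GPU is roughly 43x to 49x faster than serial GPP processing and a little over 4x faster than parallel GPP processing, and it is the only configuration whose processing time stays below the data length in every row.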
TABLE III
COMPARISON OF GPUS
                        NVIDIA GeForce GTX 980 Ti [45]   AMD Radeon R9 390X [50]   NVIDIA GeForce GTX 680 [45]   AMD Radeon RX 560 [50]
GFLOPS                  5632                             5913                      3090                          2611
Power Consumption (W)   250                              363                       356                           180
Frequency (MHz)         1000                             1050                      1006                          1175
Cost (USD)              870                              520                       300                           150
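Two derived figures of merit, performance per watt and performance per dollar, follow from the entries in Table III. The sketch below computes them from the values as listed; the dictionary layout and the function name `derived_metrics` are assumptions made for illustration.

```python
# Values copied from Table III.
GPUS = {
    "NVIDIA GeForce GTX 980 Ti": {"gflops": 5632, "watts": 250, "usd": 870},
    "AMD Radeon R9 390X": {"gflops": 5913, "watts": 363, "usd": 520},
    "NVIDIA GeForce GTX 680": {"gflops": 3090, "watts": 356, "usd": 300},
    "AMD Radeon RX 560": {"gflops": 2611, "watts": 180, "usd": 150},
}


def derived_metrics(gpus):
    """Map each GPU name to (GFLOPS per watt, GFLOPS per USD)."""
    return {
        name: (spec["gflops"] / spec["watts"], spec["gflops"] / spec["usd"])
        for name, spec in gpus.items()
    }


metrics = derived_metrics(GPUS)
for name, (per_watt, per_usd) in metrics.items():
    print(f"{name:28s} {per_watt:5.1f} GFLOPS/W  {per_usd:5.1f} GFLOPS/USD")
```

With these numbers, the GTX 980 Ti offers the highest performance per watt, whereas the RX 560 offers the highest performance per dollar, which is consistent with positioning the less expensive cards for academic prototyping.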
3) Shortcomings: State-of-the-art GPP and GPU-based platforms, such as Sora [18] and USRP [17], utilize desktop computers to realize the systems. However, these platforms consume a significant amount of power for a given performance goal, and their form factor is large, which makes them impractical for real-world deployments. It is worth noting that GPPs and GPUs alike present scaling limitations in keeping up with Koomey's Law. This law states that the energy efficiency of computers doubles roughly every 18 months [51]. This limitation calls for alternatives that provide higher computing power while keeping the energy efficiency the same. One alternative is the hybrid or co-design approach, where software and hardware implementations are combined. This will be discussed in more detail in Section III-D.
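The growth rate implied by Koomey's Law, as stated above, can be expressed as a one-line function. The sketch below assumes an exact 18-month doubling period; the function name `koomey_factor` is our own.

```python
def koomey_factor(years, doubling_months=18.0):
    """Energy-efficiency improvement factor after `years` under Koomey's Law."""
    return 2.0 ** (years * 12.0 / doubling_months)


for years in (1.5, 3, 6):
    print(f"after {years} years: {koomey_factor(years):.0f}x")
```

A platform that tracks this trend would become 4x more energy efficient in three years and 16x in six years; a platform that scales more slowly falls behind by the ratio of the two factors.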
When both a GPP and a GPU are used for an SDR design, data transfer operations between the GPP and the GPU can become bottlenecks and cause performance loss, especially when real-time requirements must be met [52]. However, there are continuous efforts to reduce or eliminate the time overhead of data transfers by introducing multi-stream scheduling for pipelining of the memory copy tasks. This would ensure no stalls in the pipeline, thus enhancing processing parallelism [53], [54]. Finally, although the processing power of microprocessors is being constantly improved, balancing sufficient computing power against a specific goal for energy consumption and cost remains a very difficult task, now and in the future. This is especially true with the growing need for more data to be processed and for blocks that can handle data processing in parallel.
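The benefit of pipelining memory copies with computation can be approximated with a simple two-stage timing model. The sketch below is an idealized illustration, not the scheduling scheme of [53] or [54]: it assumes identical chunks, one copy engine and one compute engine that can overlap perfectly, and it ignores copying results back to the host. The function names and the example timings are hypothetical.

```python
def serial_time(n_chunks, t_copy, t_compute):
    """Total time when each chunk is copied and then processed, with no overlap."""
    return n_chunks * (t_copy + t_compute)


def pipelined_time(n_chunks, t_copy, t_compute):
    """Total time when the copy of chunk i+1 overlaps the computation on chunk i."""
    # The first chunk pays both stages; afterwards the slower stage sets the pace.
    return t_copy + t_compute + (n_chunks - 1) * max(t_copy, t_compute)


n, t_copy, t_compute = 8, 1.0, 2.0  # hypothetical chunk count and stage times (ms)
print(serial_time(n, t_copy, t_compute))     # 24.0
print(pipelined_time(n, t_copy, t_compute))  # 17.0
```

In this model the transfer overhead is hidden whenever copying a chunk takes no longer than processing it; once the copy becomes the slower stage, the data transfer determines the throughput regardless of how fast the GPU computes.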
B. DSP-based
The DSP-based solution can be considered a special case of GPP-based solutions, but due to its popularity and unique processing features, it deserves a separate discussion. An example of a DSP-based SDR is the Atomix platform [19], which utilizes the TI TMS320C6670 DSP [55].
1) Definition and Uses: A DSP is a particular type of microprocessor that is optimized to process digital signals [56]. To help understand how DSPs are distinguished from GPPs, we should first note that both are capable of implementing and processing complex arithmetic tasks [57]. Tasks like modulation/demodulation, filtering, and encoding/decoding are commonly and frequently used in applications that include speech recognition, image processing, and communication systems. DSPs, however, implement them more quickly and efficiently due to their architecture (e.g., RISC-like architecture, parallel processing), which is specifically optimized to handle arithmetic operations, especially multiplications. Since DSPs are capable of delivering high performance with lower power, they are better candidates for SDR deployment [58], compared to GPPs. Examples of DSPs especially designed for SDR platforms are TI TMS320C6657 and TMS320C6655. These DSPs are both equipped with hardware accelerators for complex functions like the Viterbi and Turbo Decoders [59].
2) Adoption: As discussed in the previous section, GPPs provide an average performance for a wide range of applications. Needless to say, this performance level might be sufficient for research and academia, but if the system is to be deployed commercially, certain performance requirements must be met. To this end, compared to GPPs, DSPs are tailored for processing digital signals efficiently, utilizing features like combined multiply-accumulate operations (MAC units) and parallelism [60]. DSP manufacturers usually sell these products in two flavors: optimized for performance, and energy optimized. Therefore, when used in SDRs, high performance and energy efficient products can be employed in BSs and edge devices, respectively.
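To show why multiply-accumulate (MAC) units matter, the sketch below writes an FIR filter as an explicit loop in which every inner iteration is one MAC. A DSP with dedicated MAC hardware can typically issue one or more such operations per cycle, whereas this Python loop only illustrates the arithmetic pattern. The function name `fir_filter` and the two-tap averaging filter are assumptions.

```python
def fir_filter(samples, taps):
    """Direct-form FIR filter; each inner-loop step is one multiply-accumulate."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]  # MAC: multiply, then add to accumulator
        out.append(acc)
    return out


print(fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

For a filter with T taps, each output sample costs up to T MAC operations, so the required MAC rate is approximately T times the sample rate. This product is a convenient first estimate when sizing a DSP for a given channel filter.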
In terms of the instruction set, DSPs can be categorized into two groups: (i) Single Instruction Multiple Data (SIMD) architecture, and (ii) Multiple Instruction Multiple Data (MIMD) architecture, as described by Michael J. Flynn in what is known as Flynn's Taxonomy [61], [62]. This taxonomy is a method of classifying various architectures depending on the number of concurrent instructions and data streams, as follows:
– A SIMD-based DSP can execute an instruction on multiple data streams at the same time. This architecture can be very efficient when there is high data parallelism within the algorithm [63], that is, when similar operations can be performed on different datasets at the same time. Examples of SIMD-based DSPs include the Cell processor presented in [64], which supports 256 GFLOPS. More examples of DSPs that are optimized for low power are NXP CoolFlux DSP [65] and Icera Livanto [66]. An SDR employing a SIMD DSP is the SODA architecture [67]. It has been a common practice to add more cores in order to achieve a better trade-off between performance and power. With each extra core utilizing Very Long Instruction Word (VLIW), a higher level of parallelism can be accomplished as well.
– On the other hand, MIMD-based DSPs have the ability to operate on multiple data streams while executing multiple instructions at any point in time. This is essentially an extension of the SIMD architecture, where different instructions or programs run on multiple cores concurrently. This is especially useful in cases where parallelism is not uniform across different blocks, since the MIMD architecture still allows for parallel execution, which leads to speed improvements. Examples of MIMD-based DSPs include Texas Instruments SMJ320C80 and SM320C80 DSPs with 100 MFLOPS [59].
Since DSPs are customized to meet certain signal processing-related needs, it is crucial to clarify these customizations in order to understand how DSPs stand out and how they are successful at not only meeting the requirements, but also becoming a vital player in the wireless communication field. These customizations, which are mostly architecture-related, are as follows.
In [68], the authors discuss the energy efficiency of DSPs. In general, DSPs consume more power than ASICs; however, there exist DSPs that are optimized for low-power wireless implementations, such as the TI C674x DSP [59]. One of the methods to lower power consumption is using multiple data memory buses (e.g., one for writes and two for reads). This paves the way for higher memory bandwidth and allows for multiple-operand instructions, which in turn results in fewer cycles. Also, as discussed above, VLIW architectures along with specialized instructions can provide a higher level of efficiency, and hence lower energy consumption. These improvements can be seen in DSPs such as TI TMS320C6x [59] and ADI TigerSHARC [69]. These techniques, coupled with proven power saving techniques like clock gating and putting parts of or the entire system in sleep mode, further reduce the power consumption. Examples of DSPs available in the market can be found in Table IV. In this table, we present three examples of DSPs that do not include co-processors, and three DSP-based SoCs that, in addition to DSP cores, include extra soft cores as control processors.
3) Shortcomings: Although DSPs have been widely adopted for SDR implementations for decades [72], they present the following shortcomings. As more applications call for increasing parallelism and reconfigurability in order to handle computationally intensive tasks, DSPs can be insufficient. In addition, programming DSPs to achieve higher levels of parallelism and predictability can be challenging. This opened the door for parallel architectures like FPGAs, or multi-core GPPs, or even a hybrid of both, to be adopted for SDRs. In addition, the power consumption of DSPs is generally higher than that of FPGAs due to operating at high frequencies.
C. FPGA-based
Another approach towards realizing SDRs is to use programmable hardware such as FPGAs. Examples of FPGA-based SDR platforms are Airblue [20], a Xilinx Zynq-based implementation of IEEE 802.11ah [73], and [74], which used the same FPGA board to implement a complete communication system with channel coding.
1) Definition and Uses: An FPGA is an array of programmable logic blocks, such as general logic, memory, and multiplier blocks, that are surrounded by a routing fabric, which is also programmable [75]. This circuit has the capability of implementing any design or function, with ease of updating it. Although FPGAs consume more power and occupy more area than ASICs, the programmability feature is the reason behind their increasing adoption in a wide range of applications. Furthermore, when the reconfiguration delay is on the order of milliseconds, the SDR can switch between different modes and protocols seamlessly [76]. Another major difference is that ASIC fabrication is expensive (at least a few tens of thousands of dollars) and requires a few months, whereas FPGAs can be quickly reprogrammed, and their cost is within a few tens to a few thousands of dollars at most. The low end product cycle, along with attractive hardware processing advantages, such as high speed performance, low power consumption, and portability, compared to processors such as GPPs and DSPs, presents FPGAs as contenders that offer the best of both worlds [75].
In a study by [77], the authors compared the performance of Xilinx FPGAs [78] against 16-core GPPs. The peak performance of GPPs was calculated by multiplying the number of floating point function units on each core by the number of cores and by the clock frequency. For FPGAs, performance is calculated through picking a configuration, adding up the Lookup Tables (LUTs), flip-flops, and DSP slices needed, then multiplying them by the appropriate clock frequency. The authors calculated the theoretical peaks for 64-bit floating point arithmetic and showed that the Xilinx Virtex-7 FPGA is about 4.2 times faster than a 16-core GPP. This can be seen in Figure 2. Even with a one-to-one adder/multiplier configuration, the V7-2000T achieved 345.35 GFLOPS, which is better than a 16-core GPP. From Intel [44], Stratix 10 FPGAs can achieve a 10 TFLOPS peak floating point performance [79]. This is due to the fixed architecture of the GPP, where not all functional units can be fully utilized, and the inherent parallelism of FPGAs and their dynamic architecture. In addition, despite having lower clock frequencies (up to 300 MHz), FPGAs can achieve better performance due to their architectures, which allow higher levels of parallelism through custom design [80]. In a study by [81], the authors compared the performance and power efficiency of FPGAs to that of GPPs and GPUs using double-precision floating point matrix-vector multiplication. The results show that FPGAs are capable of outperforming the other platforms, while maintaining their flexibility. In another study by [47], the authors thoroughly analyzed and compared FPGAs against GPUs via the implementations of various algorithms. The authors concluded that although both architectures support a high level of parallelism,
TABLE IV
COMPARISON OF DSPS AND DSP-BASED SOCS

| | TI C66x (TMS320C6652) [59] | CEVA (XC-4500) [70] | Analog Devices (ADSP-21369) [69] | TI Keystone II (66AK2G02) [59] | Analog Devices (ADSP-SC573) [69] | Qualcomm Snapdragon 820 (Hexagon 680) [71] |
|---|---|---|---|---|---|---|
| Type | DSP only | DSP only | DSP only | SoC | SoC | SoC |
| GFLOPS | 9.6 | 40 | 2.4 | 28.8 | 5.4 | No Floating Point |
| Memory (Kb) | 1088 | No Info | 2000 | 1024 | 768 | No Info |
| Frequency (MHz) | 600 | 1300 | 400 | 600 | 450 | 2000 |
| Cost (USD) | ∼25 | No Info | ∼20 | ∼20 | ∼20 | ∼70 |
| Soft Core | N/A | N/A | N/A | ARM Cortex-A15 | ARM Cortex-A5 | Qualcomm Kryo 385 (CPU), Adreno 530 (GPU) |
[Figure 2: bar chart of peak GFLOPS (y-axis, 0 to 700) versus CMOS technology (x-axis: 90nm, 65nm, 40-45nm, 28-32nm). Series: GPP (dual-core, quad-core, six-core, sixteen-core), FPGA (1-to-1), and FPGA (Best), with the FPGAs being V4-LX200, V5-SX240T, V6-SX475T, and V7-2000T.]
Fig. 2. Peak performance of GPPs versus FPGAs when performing 64-bit floating point operations [77]. It can be observed that FPGAs increased their floating point performance by an order of magnitude compared to GPPs.
which is crucial to signal processing applications, FPGAs offer a larger increase in parallelism, whereas GPUs have a fixed parallelism due to their data path and memory system.
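The peak-performance calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure of [77]: the function names, the resource dictionary keys, and all numeric inputs are hypothetical, and the FPGA side is interpreted as "number of floating point operators that fit in the fabric, times the clock frequency".

```python
# Sketch of theoretical peak-performance estimates (hypothetical names and numbers).

def gpp_peak_gflops(fp_units_per_core, cores, clock_ghz):
    """GPP peak: floating point units per core x number of cores x clock (GHz)."""
    return fp_units_per_core * cores * clock_ghz


def max_operators(available, cost_per_operator):
    """Number of operators that fit, limited by the scarcest resource
    (e.g., LUTs, flip-flops, DSP slices)."""
    return min(available[r] // cost_per_operator[r] for r in cost_per_operator)


def fpga_peak_gflops(num_operators, clock_mhz):
    """FPGA peak: operators instantiated in the fabric x clock (MHz), in GFLOPS."""
    return num_operators * clock_mhz / 1000.0


# Illustrative inputs only; these are not figures from [77].
ops = max_operators(
    available={"lut": 1000, "ff": 2000, "dsp": 10},
    cost_per_operator={"lut": 100, "ff": 100, "dsp": 2},
)
print(gpp_peak_gflops(fp_units_per_core=4, cores=16, clock_ghz=2.5))
print(fpga_peak_gflops(ops, clock_mhz=300))
```

Both estimates are upper bounds: they assume every functional unit or operator produces one result per clock cycle, which real workloads rarely sustain.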
2) Adoption: Over the past decade, FPGAs have significantly advanced and become more powerful computationally, and now exist in many different versions such as Xilinx Kintex UltraScale [78] and Intel Arria 10 [44], [82], [83]. In addition, the availability of various toolsets gave FPGAs an advantage by making them more accessible. This is supported by the availability of compilers that can generate, from high-level programming languages, the Register-transfer Level (RTL) code, such as Verilog and VHDL, that is needed to run on FPGAs. This process is typically referred to as High Level Synthesis (HLS). Examples of such compilers include HDL Coder [84] for MATLAB code [85] and Xilinx HLS [86] or Altera Nios II C2H compiler [87] for C, C++, and SystemC. We will explain some of these tools in Section IV.
HLS allows software engineers to design and implement applications, such as SDRs, on FPGAs using a familiar programming language, namely C, C++, SystemC, and MATLAB, without the need to possess rich prior knowledge about the target hardware architecture (refer to Section IV-A). These compilers can also be used to accelerate parts of the software code running on a GPP or DSP that are causing slowdowns or setbacks to the overall performance. This will be further discussed in Section III-D. Further, FPGAs can achieve high performance while still consuming less energy than previously discussed processors [88] (e.g., Intel Stratix 10 FPGA can achieve up to 100 GFLOPS/W [89], compared to 23 GFLOPS/W for NVIDIA GeForce GTX 980 Ti [45]). In addition, power dissipation can be further lowered
through the implementation of several techniques discussed in [76]. These techniques can be at a system, device, and/or architecture level, such as clock gating and glitch reduction. Table V presents a summary of widely-used FPGA platforms.
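To make the GFLOPS/W comparison concrete, the following sketch converts the two efficiency figures quoted above (up to 100 GFLOPS/W for the Stratix 10 and 23 GFLOPS/W for the GTX 980 Ti) into an efficiency ratio and into the energy needed for a fixed workload. The function names and the 1000 GFLOP workload are illustrative assumptions. Note that 1 GFLOPS/W equals 1 GFLOP per joule, and that the FPGA figure is a quoted peak, so the result is an upper bound rather than a measurement.

```python
# Energy-efficiency arithmetic (hypothetical helper names; efficiencies from the text).

def efficiency_ratio(gflops_per_watt_a, gflops_per_watt_b):
    """How many times more work per joule platform A delivers than platform B."""
    return gflops_per_watt_a / gflops_per_watt_b


def energy_joules(workload_gflop, gflops_per_watt):
    """Energy for a workload: GFLOP / (GFLOP per joule) = joules."""
    return workload_gflop / gflops_per_watt


FPGA_EFF = 100.0  # GFLOPS/W, "up to" figure quoted for Intel Stratix 10 [89]
GPU_EFF = 23.0    # GFLOPS/W, figure quoted for NVIDIA GeForce GTX 980 Ti [45]

print(round(efficiency_ratio(FPGA_EFF, GPU_EFF), 2))  # about 4.35
print(energy_joules(1000.0, FPGA_EFF))                # 10.0 J for a 1000 GFLOP workload
print(round(energy_joules(1000.0, GPU_EFF), 2))       # about 43.48 J
```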
3) Shortcomings: One of the challenges of using FPGAs, however, can be the prior knowledge about target hardware architecture and resources that a developer needs to possess in order to design an application efficiently for FPGAs. In the SDR domain, designing the platform has typically been the job of software engineers, and thus incorporating this experience into hardware design can be time-consuming and non-trivial. However, as will be discussed in Section IV-A, adoption of FPGA solutions can be made more feasible through HLS tools.
D. Hybrid Design (a.k.a., co-design)
The fourth approach towards realizing SDRs is the hybrid approach, where both hardware and software-based techniques are combined into one platform. This is commonly referred to as the co-design or hybrid approach. Examples of SDRs that adopted the co-design approach include WARP [21] and CODIPHY [92].
1) Definition: Hardware/software co-design as a concept has been around for over a decade, and it has evolved at a faster rate in the past few years due to an increasing interest in solving integrated circuit design problems with a new and different approach. Even with GPPs becoming more powerful than ever, and with multi-core designs, it is clear that in order to achieve higher performance and realize applications that demand real-time processing, designers had to shift attention to new design schemes that utilize hardware solutions, namely, FPGAs and ASICs [93], [94]. Co-design indicates the use of hardware design methodology, represented by the FPGA fabric, and software methodology, represented by processors.
As more applications, such as automotive, communication, and medical, grow in complexity and size, it has become a common practice to design systems that integrate both software (like firmware and operating system) and hardware [95]. This has been made feasible in recent years thanks to the advances in high-level synthesis and development tools that not only have the capability to produce efficient RTL from software code, but also define the interface between both sides. The industry has realized the huge market for co-design, and provided various SoC boards that, in addition to the FPGA fabric, contain multiple processors. For example, the Xilinx Zynq board [78] includes an FPGA fabric as well as two ARM Cortex-A9 processors [96]. In addition to the aforementioned
TABLE V
COMPARISON OF FPGAS AND FPGA-BASED SOCS

| | Xilinx Kintex-7 (XC7K70T) [78] | Intel Cyclone V GX (C5) [44] | Lattice ECP3-70 (LFE3-70EA) [90] | Xilinx Zynq-7000 (Z-7020 XC7Z020) [78] | Intel Cyclone V SE SoC (A5) [44] | Microsemi SmartFusion2 (M2S090) [91] |
|---|---|---|---|---|---|---|
| Type | FPGA only | FPGA only | FPGA only | SoC | SoC | SoC |
| Logic Cells (K) | 65.6 | 77 | 67 | 85 | 85 | 86.31 |
| Memory (Mb) | 4.86 | 4.46 | 4.42 | 4.9 | 3.97 | 4.488 |
| DSP Slices | 240 | 150 | 128 | 220 | 87 | 84 |
| Cost (USD) | 130 | 185 | 80 | 110 | 110 | 155 |
| Soft Core | N/A | N/A | N/A | Dual-core ARM Cortex-A9 | Dual-core ARM Cortex-A9 | ARM Cortex-M3 |
advantages, there are other reasons that make co-design even more interesting, including faster time-to-market, lower power consumption (when optimized for this), flexibility, and higher processing speeds, as typically hardware in these systems is used to accelerate software bottlenecks [97].
Adopting the co-design methodology in essence is a matter of partitioning the system into synthesizable hardware and executable software blocks. This process depends on strict criteria that are developed by the designer [98], [99]. The authors in [100] and [101] discuss their partitioning methodologies and present the process of making the proper architectural decisions. Common methods typically provide useful information to the designer to help make the best decision of what to implement in hardware and what to keep in software. This information can include possible speedups, communication overheads, data dependencies, and locality and regularity of computations [101]. Examples of SoC boards available in the market can be found in Table V.
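As an illustration of how speedups and communication overheads can drive a partitioning decision, the sketch below uses a simple greedy heuristic. It is not the methodology of [100] or [101]; the Block fields, the area budget, and all timing numbers are hypothetical. A block is moved to hardware only if its hardware time plus the cost of moving data to and from the fabric beats its software time, and candidates are ranked by time saved per unit of area.

```python
# Greedy hardware/software partitioning sketch (all names and numbers are hypothetical).
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    sw_time_us: float        # estimated run time in software
    hw_time_us: float        # estimated run time in the FPGA fabric
    comm_overhead_us: float  # cost of moving data between processor and fabric
    area_units: int          # fabric resources the block would occupy (must be > 0)


def gain(block):
    """Time saved by moving the block to hardware, including communication cost."""
    return block.sw_time_us - (block.hw_time_us + block.comm_overhead_us)


def partition(blocks, area_budget):
    """Return (hardware_names, software_names) under an area budget."""
    hardware, software = [], []
    remaining = area_budget
    # Consider the blocks with the best gain per unit of area first.
    for b in sorted(blocks, key=lambda b: gain(b) / b.area_units, reverse=True):
        if gain(b) > 0 and b.area_units <= remaining:
            hardware.append(b.name)
            remaining -= b.area_units
        else:
            software.append(b.name)
    return hardware, software


blocks = [
    Block("fft", 100.0, 10.0, 5.0, 40),
    Block("viterbi", 200.0, 20.0, 10.0, 70),
    Block("mac_ctrl", 5.0, 4.0, 6.0, 10),   # communication cost outweighs the speedup
    Block("scrambler", 8.0, 1.0, 2.0, 5),
]
print(partition(blocks, area_budget=100))
```

With these inputs the decoder and scrambler go to hardware, the FFT stays in software because it no longer fits in the remaining area, and the MAC control block stays in software because its communication overhead exceeds its speedup. A real flow would also account for data dependencies and the locality of computations mentioned above, which this sketch ignores.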
2) Adoption: As mentioned in Section II, SDRs can be considered inherently hybrid or heterogeneous systems, thereby implying the need for both hardware and software blocks. This is due to the fact that the control part is usually taken care of by a general processor, while other functions, such as signal processing, are handled by a specialized processor like a DSP and are sometimes accelerated using dedicated hardware like FPGAs [102]. This design approach fits well with SDRs and can be fully utilized to meet certain requirements that pertain to their attractive features. For example, accelerating portions of a block or moving it entirely to the FPGA fabric can help to push the processing time to the limit in order to achieve real-time performance for real-life deployment. In addition, through careful implementation of RTL optimization techniques, the development of power efficient systems for mobile and IoT applications would be possible. On the other hand, running most of the MAC layer operations on a processor, or multi-processors, can be advantageous for easy reconfiguration. Therefore, different partitioning schemes can be adopted to meet the requirements of the application at hand.
It is worth noting that FPGA vendors, namely Intel [44] and Xilinx [78], are widening their product base with more SoCs and Multi-processor SoCs (MPSoCs) [103], due to the growing demand for such devices. An example of an SDR realized on an MPSoC is the work by [104]. In a white paper, National Instruments (the company that owns USRP [17]) predicts that the future of SDRs is essentially a co-design implementation [105], especially with the introduction of FPGAs that are equipped with a large number of DSP slices
[Figure 3: bar chart of the number of DSP slices (y-axis, 0 to 14000) per Xilinx FPGA series (x-axis), with the CMOS technology labeled on each bar: Virtex-4 (90nm), Virtex-5 (65nm), Virtex-6 (40nm), Kintex-7 (28nm), Virtex-7 (28nm), Zynq-7000 (28nm), Kintex UltraScale (20nm), Virtex UltraScale (20nm), Kintex UltraScale+ (16nm), Virtex UltraScale+ (16nm).]
Fig. 3. Number of DSP Slices in Xilinx FPGAs [78]. The values on top of the bars refer to the CMOS technology used.
for handling intensive signal processing tasks, as depicted in Figure 3. This also can be seen from the USRP E310 model, which incorporates a Xilinx Zynq SoC [78].
3) Shortcomings: A downside of adopting SoCs for co-design is that their prices are generally higher, compared to previously mentioned design approaches, due to having multiple components on the same board, i.e., processor and FPGA fabric. Other factors contributing to this include extra memory and sophisticated interfaces. Another challenge of co-design is shared memory access, e.g., external DDR memory, between the processor and FPGA fabric. The study of [106] shows that the number of memory read and write operations performed by a GPP is higher than that of FPGAs. This is due to the fact that processors perform operations on registers, while FPGAs operate on buffers. Since memory accesses add to the overall latency, this can cause a bottleneck to the overall performance. In addition, the authors have developed a methodology for predicting shared memory bandwidth by using functionally-equivalent software. This enables the designers to be aware of any bottlenecks before implementing the entire co-design.
E. Comparison
When we covered different design methodologies and hardware platforms for a wide selection of SDR platforms, we intended to compare them analytically one-on-one, using a cross-platform implementation of one of the wireless communication protocols. What was available in literature, instead, was a series of abstract comparisons using a set of benchmarks targeting High Performance Computing (HPC), and not necessarily SDR applications. It is somewhat difficult to draw conclusions from these numbers alone, since a performance comparison in the SDR field requires testing them in real-life.
In Table VI, we provide a high-level comparison between three major design approaches as a guideline for designers towards choosing the method that best meets their application specifications. In this comparison, we focus on the features that are important to SDR design. However, we do not make assumptions on what the best approach is and believe it is the developer’s responsibility to make the best judgment depending on the application area. Please note that in this table we did not include GPUs, as they typically act as co-processors to GPPs and their addition generally improves performance. We also did not include co-design since it combines GPPs with FPGAs.
As Table VI shows, while GPPs are easy to program and extremely flexible, they lack the computing power to meet specifications in real-time and are very inefficient in terms of power. To increase their performance, multiple cores with similar instruction sets are included in the same GPP platform to exploit parallelism and perform more operations per clock cycle. However, hardware replication (i.e., adding more cores to GPPs) may not necessarily translate to a higher performance. GPUs tackle this by offering the same control logic for several functional units. The sequential portion of the code runs on the GPP, which can be optimized on multi-core GPPs, while the computationally intensive portion runs on a several-hundred-core GPU, where the cores operate in parallel. Another example of a customized processor is the DSP, which performs significantly better than GPPs, while at the same time maintaining the ease-of-use feature that GPPs possess, making it a very attractive option. DSPs are also more power efficient and a better fit for signal processing applications. On the other hand, they are more expensive, which is the main trade-off. Finally, FPGAs combine the flexibility of processors and efficiency of hardware. FPGAs can achieve a high level of parallelism through dynamic reconfiguration, while yielding better power efficiency [42]. FPGAs are typically more suitable for fixed-point arithmetic, like signal processing tasks, but in recent years their floating-point performance has increased significantly [81], [107]. However, the designers are expected to know a lot more about the hardware, which is sometimes a deterring feature.
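The observation that adding cores does not necessarily translate to higher performance can be illustrated with Amdahl's law, a standard model that the text above does not name explicitly. In this sketch, the sequential portion of the code is assumed to stay on a single GPP core while the remaining fraction is spread evenly over a number of parallel workers; the function name and the example fractions are illustrative assumptions.

```python
# Amdahl's law sketch: speedup when only part of the code can run in parallel.

def amdahl_speedup(parallel_fraction, workers):
    """Overall speedup if `parallel_fraction` of the run time is spread evenly
    over `workers` units and the rest stays sequential."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


# With 90% of the work parallelizable, 16 cores give well under a 16x speedup,
# and even a several-hundred-core device cannot exceed 10x.
for n in (2, 16, 512):
    print(n, round(amdahl_speedup(0.9, n), 2))
```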
In a comparative analysis by [108], the authors studied the performance and energy efficiency of GPUs and FPGAs using a number of benchmarks that differ in terms of targeted applications, complexity, and data type. The authors concluded that GPUs perform better for streaming applications, whereas FPGAs are more suitable for applications that employ intensive FFT computations, due to their ability to handle non-sequential memory accesses in a faster and more energy efficient manner. Similarly, in [42], the authors review and report the sustainable performance and energy efficiency for different applications. One of their findings related to SDRs is that FPGAs should be used for signal processing without floating point, confirming the aforementioned results. In addition, the authors in [109] report that GPUs are ten times faster than FPGAs with regards to FFT processing, while the authors in [81] demonstrate that the power efficiency of FPGAs is always better than that of GPUs for matrix operations. Finally, the authors in [110] compare GPPs, GPUs, and FPGAs through the implementation of LDPC decoders,
and their results lead to the conclusion that GPUs and FPGAs perform better than GPPs. It is evident from the above studies that trade-offs are to be expected when a particular design methodology is adopted, hence careful analysis should be carried out beforehand. Other comparative studies include [111]–[113] with similar results and conclusions.
IV. DEVELOPMENT TOOLS
As we mentioned in Section III-C, HLS is an abstract method of designing hardware using a high-level programming language. Developers of FPGA and co-design based SDRs can benefit from HLS as it requires no prior experience with hardware design. Unlike the rest of the development tools, HLS tools share a common theme and offer similar features. Thus, we first discuss HLS in this section. Next, we review the common development tools that are typically used in the process of SDR design and implementation for different design approaches.
A. High Level Synthesis (HLS)
HLS has been a hot research topic for over a decade, with both academia and industry trying to make hardware design more accessible to every developer [114]. HLS is the process of converting an algorithmic specification of the design described by a high-level programming language to an RTL implementation. HLS provides a new level of design abstraction through exploring the micro-architecture and any hardware constraints. The resulting RTL is highly optimized, in terms of power, throughput and latency, and reasonably comparable to a hand-tuned code. Figure 4 depicts this process. The major difference between RTL and C is the absence of timing constraints in the high-level model, which is merely a behavioral description of the system with no details about the underlying hardware. The second difference is the processing architecture: while GPP architecture is fixed, the best possible processing architecture is built by the compiler for FPGA [115]. In addition, HLS can shorten the development cycle (time to market) to several weeks, down from several months [116]. This is because the task of producing an optimized RTL is handled by the HLS tool and the developer’s efforts are focused on describing the system’s algorithmic description.
In [117] the authors presented LegUP, an open-source HLS tool. This tool is capable of profiling a code to identify frequently executed sections of the code for hardware acceleration (i.e., moving them to the FPGA fabric). The authors in [118] survey HLS compilers and their capability to provide an accurate estimation of functional area and timing, comparable to results from hand-tuned hardware designs. In an effort to help the developer pick the HLS tool that yields the best results for their application, the authors in [116] present a study where they compared three of the industry tools, namely Vivado HLS [86], Intel FPGA SDK for OpenCL [119], and MaxCompiler [120], through developing LDPC decoders, which are often used as error correcting blocks in SDRs. All three tools successfully synthesized LDPC decoders and implemented them on Intel [44] and Xilinx [78]
TABLE VI
COMPARISON OF SDR DESIGN APPROACHES

| | GPP | DSP | FPGA |
|---|---|---|---|
| Computation | Fixed Arithmetic Engines | Fixed Arithmetic Engines | User Configurable Logic |
| Execution | Sequential | Partially Parallel | Highly Parallel |
| Throughput | Low | Medium | High |
| Data Rate | Low | Medium | High |
| Data Width | Limited by Bus Width | Limited by Bus Width | High |
| Programmability | Easy | Easy | Moderate |
| Complex Algorithms | Easy | Easy | Moderate |
| I/O | Dedicated Ports | Dedicated Ports | User Configurable Ports |
| Cost | Moderate | Low | Moderate |
| Power Efficiency | Low | Moderate | High |
| Form Factor | Large | Medium | Small |
[Figure 4: flow diagram. High-Level Language (C / C++ / MATLAB / OpenCL) → HLS Tool (pipelining / dataflow / unrolling) → Synthesizable HDL Code (Verilog / VHDL) → RTL Synthesis and Place-and-Route Tool (Vivado / Quartus) → Bit Stream (FPGA Implementation).]
Fig. 4. HLS design flow.
FPGA boards. The difference, however, was in the logic utilization and performance. Similarly, the authors in [121] compare the same aforementioned list of compilers quantitatively and qualitatively using several financial engineering problems (e.g., Monte Carlo-based Option Pricing) and compare the performance of several FPGA boards. Their results show that both Intel FPGA SDK for OpenCL and MaxCompiler performed better than Vivado HLS due to their ability to extract parallelism more effectively. In [122], the authors comprehensively review recent HLS tools and provide, through a careful analysis, a methodology based on C benchmarks to compare some of these tools and their optimization features. The various benchmarks implemented demonstrate that some tools are better suited for certain applications than the rest, with no specific tool dominating the HLS field. The authors also show that open-source HLS tools such as LegUP [123] can be as effective as their commercial counterparts. Other surveys and analyses include [124], [97], [125], which focused on open-source tools, and [126], which studied some of the trade-offs of HLS-generated designs and how reliable they are when errors are injected. All of the studies above demonstrate the feasibility and reliability of HLS tools to generate RTL code, despite having different development and optimization solutions.
Examples of HLS tools include Xilinx Vivado HLS [86] and SDSoC [127], Intel FPGA SDK for OpenCL [119], Cadence Stratus High-level Synthesis (combining Cadence C-to-Silicon and Forte Cynthesizer) [128], Synopsys Synphony C Compiler [129], Maxeler MaxCompiler [120], MATLAB HDL Coder [84], and LegUP [123], which unlike the rest of the tools is vendor-independent (works with all types of FPGA boards, such as Xilinx [78], Intel [44], Lattice [90], and Microsemi [91]).
Table VII presents a summary of commercial HLS tools.
While some of them are vendor-specific, some tools work with a variety of FPGA boards. Examples mentioned in the table all provide a set of area and timing optimizations such as resource sharing, scheduling, and pipelining. However, not all of them are capable of generating testbenches for the design.
B. Tools
In this section we review the existing software tools for SDR development. For each design methodology, we discuss a compatible development tool and list its features. We also provide an overall comparison between them to highlight the differences.
1) MATLAB and Simulink: Most designers start with modelling and simulating the system using Mathworks MATLAB [85] and Simulink [132]. Through the availability of a wide range of built-in functions and toolboxes, especially for signal processing and communication, developing and testing applications with these tools has become very common and widely adopted. However, in order to use these models for different platforms, developers would need to use MATLAB Coder [133] and Simulink Coder [134] to generate C/C++ code. The generated code can be used with Embedded Coder [135] to optimize it and generate software interfaces with AXI drivers for the sake of running on embedded processors and microprocessors, like the dual-core ARM Cortex-A9 MPCore [96] on the ZedBoard [136]. Alternatively, developers can use the HDL Coder [84] to generate synthesizable RTL (Verilog or VHDL) code to be implemented on FPGAs or ASICs. It also has support for Xilinx [78] and Intel [44] SoC devices by providing some information and optimizations pertaining to resource utilization and distributed pipelining. Figure 5 shows the design flow for SoC platforms that the aforementioned tools offer and how they are connected. Examples of using MATLAB and Simulink to develop an SDR are the works by [137] and [138], where the authors used the very low-cost RTL-SDR dongle [139] (∼$20) with a desktop computer to design an academic curriculum for teaching DSP and communications theory.
2) Vivado HLS and SDSoC: Xilinx Vivado HLS [86] is a design environment for high-level synthesis. This tool offers a variety of features to tweak and improve the RTL netlist output that is compatible and optimized for Xilinx FPGA boards. It accepts input specifications described in several languages (e.g., C, C++, SystemC, and OpenCL), and generates hardware
TABLE VII
HLS TOOLS

| | Xilinx Vivado HLS [86] | Intel FPGA SDK for OpenCL [119] | Cadence Stratus High-level Synthesis [128] | Synopsys Synphony C Compiler [129] | Maxeler MaxCompiler [120] |
|---|---|---|---|---|---|
| Input | C/C++/SystemC | C/C++/SystemC | C/C++/SystemC | C/C++ | MaxJ |
| Output | VHDL/Verilog/SystemC | VHDL/Verilog | VHDL/Verilog | VHDL/Verilog/SystemC | VHDL |
| Testbench | Yes | No | Yes | Yes | No |
| Optimizations | Yes | Yes | Yes | Yes | Yes |
| Compatibility | Xilinx FPGA | Intel FPGA | All | All | All |
TABLE VIII
DEVELOPMENT TOOLS AND PLATFORMS

| | MATLAB & Simulink [130] | Vivado HLS & SDSoC [78] | LegUP [123] | GNU Radio [40] | LabView [131] | CUDA [48] |
|---|---|---|---|---|---|---|
| Input | MATLAB/Graphical | C/C++/SystemC | C | Graphical/Python/C++ | Graphical | C/C++/Fortran/Python |
| Output | MATLAB/C++/RTL | C/RTL | C/RTL | C/RTL | C/RTL | Machine Code |
| Platform | GPP/GPU/DSP/FPGA | GPP/FPGA | GPP/FPGA | GPP/GPU/DSP/FPGA | GPP/GPU/DSP/FPGA | GPU |
| Licence | commercial | commercial | open-source | open-source | commercial | commercial |
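[Figure 5: flow diagram. MATLAB/Simulink Algorithm Design & Modeling feeds two paths, HDL Coder (producing the FPGA Bitstream) and Simulink Coder + Embedded Coder (producing the SW Code), which pass through Embedded System Tool Integration and target a Xilinx/Intel SoC Device.]
Fig. 5. Mathworks SoC design flow [130].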
modules in Verilog or VHDL. Developers have several options to optimize the solution in terms of area and timing through the use of directives (guidelines for the optimization process) and pragmas for RTL optimization. These optimizations include loop unrolling, loop pipelining, and operation chaining. SDSoC [127] is another tool by Xilinx [78]. The major difference between the two tools is that the latter has the capability to provide solutions for SoCs. SDSoC is built on top of Vivado HLS and has the same C-to-RTL conversion capability. The main advantage of using SDSoC is that it automatically generates data movers, which are responsible for transferring data between the software on the processor and the hardware on the FPGA.
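The effect of loop pipelining and loop unrolling on latency can be approximated with a simple cycle-count model. The sketch below is a textbook-style estimate, not output from Vivado HLS or any other tool; the function names and the example loop (1024 iterations, 4 cycles per iteration) are hypothetical. It assumes no loop-carried dependencies and, for unrolling, enough fabric resources to run the replicated loop bodies concurrently.

```python
# Simplified cycle-count model for HLS loop optimizations (hypothetical names).
import math


def sequential_cycles(trip_count, iteration_latency):
    """No optimization: each iteration starts after the previous one finishes."""
    return trip_count * iteration_latency


def pipelined_cycles(trip_count, iteration_latency, initiation_interval=1):
    """Pipelining: a new iteration starts every `initiation_interval` cycles."""
    if trip_count == 0:
        return 0
    return iteration_latency + initiation_interval * (trip_count - 1)


def unrolled_cycles(trip_count, iteration_latency, factor):
    """Unrolling by `factor`: that many loop bodies execute side by side."""
    return math.ceil(trip_count / factor) * iteration_latency


print(sequential_cycles(1024, 4))    # 4096
print(pipelined_cycles(1024, 4, 1))  # 1027
print(unrolled_cycles(1024, 4, 8))   # 512
```

The model makes the usual trade-off visible: pipelining improves throughput at little area cost, while unrolling buys latency by replicating hardware, which is why both are exposed as designer-controlled directives.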
A similar tool to SDSoC that is open-source is LegUP [123]. It was developed at the University of Toronto, as part of an academic research effort to design an HLS tool that is capable of taking in C code as an input and providing three possible outputs: a synthesizable RTL code for an FPGA, a pure software executable, and a hardware/software co-design solution for an SoC.
3) GNU Radio: GNU Radio is an open-source software development toolkit that provides signal processing blocks to implement SDRs [40], [140]. It runs on desktop or laptop computers, and with the addition of simple hardware such as USRP B200 [17], can build a basic SDR. It is often used by academia and the research community for simulation as well as quick setup of SDR platforms. Similar to System Generator tool [141] and Simulink [132], it includes different kinds of blocks
such as decoders, demodulators, and filters. It is also capable of connecting these blocks and managing data transfer in a reliable fashion. In addition, it supports USRP systems [17]. One of the attractive features of GNU Radio is the ability to define and add new blocks. This can be done via programming in C++ or Python. An example of using GNU Radio is the work by [142], where the author uses it with a USRP to realize different types of transceivers such as TDMA and CSMA, and showcases some of its capabilities. Similarly, the authors in [143] successfully achieve real-time communication between two computers using USRP [17] and RTL-SDR [139].
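The block-and-connection model described above can be sketched in plain Python. This is a conceptual illustration only and deliberately does not use the GNU Radio API: the functions source, moving_average, slicer, and connect are hypothetical stand-ins for a signal source, a filter block, a simple demodulator (hard-decision slicer), and the flowgraph wiring.

```python
# Conceptual flowgraph sketch in plain Python (not the GNU Radio API).
from collections import deque


def source(samples):
    """Source block: emits samples one at a time."""
    for s in samples:
        yield s


def moving_average(stream, taps):
    """Filter block: averages the last `taps` samples."""
    window = deque(maxlen=taps)
    for s in stream:
        window.append(s)
        yield sum(window) / len(window)


def slicer(stream, threshold=0.0):
    """Demodulator-like block: maps each sample to a bit by thresholding."""
    for s in stream:
        yield 1 if s > threshold else 0


def connect(src, *blocks):
    """Chain blocks so each one consumes the output of the previous one."""
    stream = src
    for block in blocks:
        stream = block(stream)
    return stream


received = [1.0, 0.8, -0.9, -1.1, 1.2, 0.9]
bits = list(connect(source(received),
                    lambda st: moving_average(st, taps=2),
                    slicer))
print(bits)
```

Adding a new block amounts to writing another function with the same stream-in, stream-out shape, which mirrors the extensibility that makes user-defined blocks attractive.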
4) LabVIEW: LabVIEW is a widely used tool from National Instruments [131] that offers a visual programming environment for test, automation, and control applications used by both industry and academia. It is similar to GNU Radio and Simulink, where the design can be realized schematically by connecting a chain of various blocks together, each of which performs a certain function. It also offers complete support for USRP [17], enabling rapid prototyping for communications systems. Designing different blocks of the system can be achieved using high-level languages such as C or MATLAB, or using a graphical dataflow. An example of SDR platform development using LabVIEW is the work by [144], where the author describes a wireless communication course design that incorporates USRP and LabVIEW, due to their ease of use, to help teach students basic concepts. Similarly, in [145] the authors designed an SDR platform, namely FRAMED-SOFT, that includes two types of USRPs and is intended for an academic environment.
5) CUDA: Developed by NVIDIA, CUDA is a computing platform and programming model for data-parallel computing on GPUs [48]. Developers typically use CUDA when GPUs are part of the processing architecture as co-processors and they want to take full advantage of the GPUs' power to speed up applications. As discussed in Section III-A2, in order to identify the application components that should run on the GPP and the parts that should be accelerated by the GPU, one needs to look at the tasks at hand. Programming languages that can be used in CUDA include C, C++, Python, Fortran, and MATLAB [85]. The toolkit includes, in addition to the rich library for GPU-related acceleration functions, a compiler, development tools, and the CUDA runtime, to develop applications and optimize them for systems that incorporate
[Figure 6: block diagram. Two Rx daughterboards, each feeding a pair of ADCs, and two Tx daughterboards, each fed by a pair of DACs, connect to the FPGA board, which connects to a USB controller.]
Fig. 6. USRP board architecture. RF daughterboard selection depends on the application specifications in terms of frequency coverage.
GPUs.
V. PLATFORMS
In this section, we list the different types of SDRs from the architecture and design point of view. We analyze them, examine their strengths and shortcomings, and discuss their impact on SDR development.
A. GPP-based
USRP N-Series. Universal Software Radio Peripheral (USRP) is the most common SDR platform known to the developers’ community [17]. The cost of this platform is around $4000-5000. It provides a hardware platform for the GNU Radio Project [140]. There are two generations available: USRP1 and USRP2. USRP1 (released in 2004) is connected to a generic computer through USB, with the addition of a small FPGA. The FPGA board has two roles: routing information, and limited signal processing. This generation was capable of supporting a ∼3 MHz bandwidth due to the USB 2.0 limitation. The second generation, USRP2, was released in 2008, and it supports 25 MHz bandwidth by utilizing gigabit Ethernet. It includes a Xilinx Spartan 3 FPGA [78] for local processing operations.
USRP, in general, is a board with an ADC and a DAC, an RF front end, a PC host interface, and an FPGA. The board consists of a motherboard and typically four daughterboards (two transmitters, Tx, and two receivers, Rx), as depicted in Figure 6. The daughterboards handle analog operations such as filtering and up/down conversion. They are modular, so they can serve applications operating at up to 6GHz. The FPGA board, depending on the USRP series, handles a few signal processing operations, and the majority of operations are offloaded to the connected host system. USRP platforms are easy to set up and use. However, while their performance is suitable for research experiments and quick prototyping, these platforms do not necessarily meet the requirements of communication standards. In fact, the minimum bandwidth among the RF, PC host, and FPGA components affects the throughput and timing characteristics of the platform.
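The bottleneck argument above can be illustrated with a small sketch. It assumes 16-bit I and Q components and ignores protocol overhead, so the link figures it produces are only raw upper bounds; the bandwidths quoted above for USRP1 and USRP2 are lower in practice. The function names and the stage values in the usage note are illustrative assumptions, not measurements.

```python
def max_complex_sample_rate(link_bits_per_s, bits_per_component=16, efficiency=1.0):
    """Upper bound on complex (I/Q) samples per second that a host link can carry.

    bits_per_component and efficiency are assumptions of this sketch.
    """
    bits_per_sample = 2 * bits_per_component  # one I value and one Q value
    return link_bits_per_s * efficiency / bits_per_sample


def bottleneck(stage_bandwidths_hz):
    """Return (name, bandwidth) of the slowest stage in the processing chain."""
    name = min(stage_bandwidths_hz, key=stage_bandwidths_hz.get)
    return name, stage_bandwidths_hz[name]
```

For instance, `max_complex_sample_rate(1e9)` gives a raw bound of 31.25 Msamples/s for Gigabit Ethernet, and `bottleneck({"rf_front_end": 40e6, "fpga": 100e6, "host_link": 25e6})` identifies the host link as the limiting stage for these hypothetical values.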
KUAR. Another platform that is similar to USRP is the Kansas University Agile Radio (KUAR) platform [34]. The basic architecture is composed of a generic computer and a Xilinx Virtex II Pro P30 FPGA, which has two PowerPC 405 cores [78]. The main advantage of this platform is the degree of flexibility that it offers. Developers can implement communication standards in three different ways: (i) a full hardware implementation, where the entire design is fitted onto the FPGA and only a few tasks are assigned to the host's GPP, (ii) a full software implementation, where the design is implemented entirely on the GPP with minimal FPGA involvement, and (iii) a hybrid hardware/software co-design implementation, where the developer can partition the design in any way that fits their criteria.
LimeSDR. Resembling the general architecture of USRP (e.g., USRP B200 [17]), Lime Microsystems [146] develops a series of SDRs based on its latest generation of field programmable RF transceiver (FPRF) technology, in addition to an Intel FPGA [44] and a microcontroller. The board is connected to a computer via USB 3.0. The SDR platform is responsible for delivering the wireless data, while the GPP (computer) processes the incoming signals and generates the data to be transmitted by the SDR. The GPP in this case is the source of computing power for baseband processing. LimeSDR supports a signal bandwidth of ∼30-60MHz. The developers of LimeSDR also developed the LimeSuite software, which is used to model SDRs. This tool, the source code, the firmware, and the schematics are open-source.
Ziria. Ziria is a programming platform that consists of a domain-specific language (DSL), also named Ziria, and an optimizing compiler [147]. It targets the implementation of the physical layer (PHY) of wireless protocols. Ziria has a 2-layer design:
– A lower layer that is an imperative language, which is a mixture of C and MATLAB [85] code, with features of the two languages carefully chosen to guarantee efficient compilation.
– A higher layer, which is the language used to specify and stage stream processors.
The Ziria optimizing compiler consists of two parts: the frontend and the backend. The frontend parses the Ziria code, puts it in an abstract representation, and then applies several optimizations to it. The backend compiles the resulting optimized code into an optimized low-level execution model.
The particular benefits of this platform are as follows. The first is easy and dynamic reconfiguration, due to the dynamic staging of the control graph. This is unlike GNU Radio [40] C++ templates, which allow limited reconfigurability. In addition, its code optimization can operate on data-flow components and often yields faster execution on GPPs (e.g., through the use of LUTs). In general, Ziria code is concise and easy to use. For example, an implementation of a WiFi scrambler in Ziria needs only thirteen lines. Ziria is an interesting research effort based on data-flow languages, which are typically used in embedded systems. It also builds upon the popular functional reactive programming framework.
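To make the LUT optimization concrete, the following Python sketch (not Ziria code, and not the output of the Ziria compiler) implements the IEEE 802.11 scrambler, whose generator polynomial is x^7 + x^4 + 1, in two ways: bit by bit, and one byte at a time using a precomputed lookup table. The function names, the MSB-first packing of bits into bytes, and the all-ones default seed are assumptions made for illustration.

```python
def scramble_bits(bits, state=0x7F):
    """Bit-serial reference: XOR each input bit with the LFSR feedback."""
    out = []
    for b in bits:
        fb = ((state >> 6) ^ (state >> 3)) & 1  # taps x^7 and x^4
        out.append(b ^ fb)
        state = ((state << 1) | fb) & 0x7F      # shift the feedback bit in
    return out, state


def build_lut():
    """For each 7-bit state, precompute 8 keystream bits and the next state."""
    lut = []
    for state in range(128):
        key_bits, next_state = scramble_bits([0] * 8, state)
        key = 0
        for bit in key_bits:                    # pack first bit into the MSB
            key = (key << 1) | bit
        lut.append((key, next_state))
    return lut


LUT = build_lut()


def scramble_bytes(data, state=0x7F):
    """LUT-based version: one table lookup and one XOR per byte."""
    out = bytearray()
    for byte in data:
        key, state = LUT[state]
        out.append(byte ^ key)
    return bytes(out), state
```

The table has only 128 entries, so it trades a small amount of memory for eight shift-and-XOR steps per byte. Because the keystream does not depend on the data, applying `scramble_bytes` twice with the same seed returns the original bytes.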
Sora. Sora is a fully programmable software radio platform on PC architecture. It requires C programming on a multi-core GPP, and yields high performance, including high processing speed and low latency. The Sora platform uses the Ziria language discussed above to write high-level SDR descriptions, and is tested for real-time operation [18]. Unlike WARP [21] (which we will explain in Section V-E), Sora can accommodate various RF front ends.
Since PC hardware is not intended for signal processing of wireless protocols, its performance can be limited. We discussed some of the limitations of GPPs in the context of SDRs in Section III-A. These limitations were the motivation behind the development of Sora. The overall setup of Sora includes a soft-radio stack that combines a multi-core GPP and a radio control board, which consists of a Xilinx Virtex-5 FPGA [78], a PCIe-x8 interface, and DDR2 SDRAM. Sora uses both hardware and software techniques to address the challenges of using PC architectures for high-speed SDR. It is the first SDR platform that enables users to develop high-speed wireless implementations entirely in software on a standard PC architecture.
In Sora, new techniques are proposed for efficient PHY implementation. These techniques include: (i) exploiting large high-speed cache memory to minimize memory accesses, (ii) extensive use of LUTs, which trade memory for computation and still fit well into the L2 cache, (iii) exploiting data parallelism in the PHY, and (iv) utilizing the wide-vector SIMD extensions of the GPP. One of the main novelties of Sora is efficiently partitioning and scheduling the PHY processing across the cores of the GPP. The second innovation is core dedication for real-time support, which is accomplished by exclusively allocating enough cores for SDR processing in multi-core systems. The authors showcased its effectiveness through SoftWiFi, which is a full implementation of the IEEE 802.11a/b/g PHY and CSMA MAC with 9000 lines of C code and real-time performance. They also successfully implemented a 3GPP LTE Physical Uplink Shared Channel (PUSCH), or Soft-LTE, with 5000 lines of C code and a peak rate of 43.8Mbps with a 20MHz bandwidth.
One critique of Sora is that its FPGA is not programmable and its capabilities are not fully utilized. In addition, the authors do not share the details of its internal routines. Finally, Sora only works on GPPs, with no clear mapping to DSPs.
Iris. Iris is a cross-platform SDR, developed to support highly reconfigurable radio networks [148]. It is built using a plugin architecture, whose plugins are called components, to achieve modularity. These components process data streams and perform different operations on them. An engine is used to load, run, and maintain the components. Similar to the System Generator [141] and Simulink [132] tools, the Iris engine can be used to link components together to build a complete radio system. XML is used to specify the components used in the program, their parameters, and how they should be linked. The main features of the Iris architecture include: (i) runtime configurability, (ii) support for the entire network stack (all layers), not just the PHY layer, and (iii) support for advanced processing platforms, including FPGAs.
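The idea of describing a radio in XML can be sketched as follows. The element and attribute names below (`radio`, `component`, `param`, `link`) are hypothetical and do not reproduce the actual Iris schema; the sketch only shows how such a description could be parsed into components, parameters, and a linear processing chain.

```python
import xml.etree.ElementTree as ET

# Hypothetical radio description; this is NOT the real Iris XML format.
RADIO_XML = """
<radio>
  <component name="source" type="bitsource"><param name="count" value="4"/></component>
  <component name="invert" type="inverter"/>
  <component name="sink" type="collector"/>
  <link from="source" to="invert"/>
  <link from="invert" to="sink"/>
</radio>
"""


def load_radio(xml_text):
    """Parse components (type and parameters) and the links between them."""
    root = ET.fromstring(xml_text.strip())
    components = {}
    for node in root.findall("component"):
        params = {p.get("name"): p.get("value") for p in node.findall("param")}
        components[node.get("name")] = (node.get("type"), params)
    links = {link.get("from"): link.get("to") for link in root.findall("link")}
    return components, links


def chain_order(components, links):
    """Order the components of a linear chain, starting at the one no link points to."""
    targets = set(links.values())
    current = next(name for name in components if name not in targets)
    order = [current]
    while current in links:
        current = links[current]
        order.append(current)
    return order
```

An engine in this style would walk the list returned by `chain_order` and instantiate each component from its type and parameters, so changing the radio amounts to editing the XML rather than the component code.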
There are multiple studies on using Iris. An example is the work in [149], where the authors discuss the process through which they implemented Iris on a Xilinx Zynq SoC [78]. The motivation behind this work is the fact that communication systems are in a constant state of development and they increase in complexity and sophistication. This calls for higher computational performance and, along with it, a higher level of flexibility. In order to realize Iris on the Zynq platform, the components are first translated into C++ using CMake tools, and then they are ported to the platform. HLS tools, such as Xilinx HLS [86], can be used to accelerate the parts of the design that are considered to be bottlenecks. These bottlenecks are identified by profiling the software components with a system profiler such as Linux Perf. Acceleration, in particular, refers to running parts of the software (after rewriting them in RTL) on the FPGA fabric in order to achieve higher performance.
Fig. 7. GPU-accelerated SDR Platform using WARP [52]. The GPU server is used for baseband processing, while WARP is used for radio control and interface. Offloading signal processing tasks to the GPU has significantly improved the overall performance. [Block labels: antenna; WARP (FPGA); packet transfer; GPU server (CPU, bus, buffers, GPU).]
B. GPU-based
WiMAX SDR. The authors in [150] built a GPU-based platform to realize a WiMAX system. In their study, they also compared the performance of a GeForce 9800GTX GPU [45] against a TMS320C6416 DSP [59] by implementing the Viterbi decoder algorithm. The results indicate that the throughput of the GPU is 181.6Mbps, considerably higher than the DSP's 2.07Mbps (a roughly 88-fold difference).
OFDM for WiFi Uplink SDR. In [52], the authors used the WARP framework [21] as a basis to realize their NVIDIA GPU-based SDR platform. They utilized the inherent parallelism of GPUs and, with the help of CUDA [48], were able to achieve real-time performance on WARP. They used this platform to design and implement a Single Input Single Output (SISO) OFDM system for WiFi uplink communication. Figure 7 depicts the architecture of this enhanced (accelerated) WARP SDR. For this platform, they used: (i) a WARP version 3, which consists of a Xilinx Virtex-6 FPGA [78] for radio control and interface, and (ii) a GPU server, which consists of an Intel i7-3930K six-core 3.2GHz CPU [44] for transceiver configuration, and four NVIDIA GTX TITAN graphics cards [45] for baseband processing. Each TITAN comprises a 2880-core Kepler GPU running at 889MHz. The accelerated WARP achieves less than 3ms latency and higher than 50Mbps over-the-air throughput.
Signal Detection SDR. In [46], the authors designed and studied real-time signal detection using an SDR platform composed of a laptop computer with an Intel Xeon E3-1535M processor [44] and an NVIDIA Quadro M4000M GPU [45]. For 1000ms-long samples, this design achieves around a 75% reduction in parallel processing time compared to GPPs.
Fig. 8. Imagine Processor Architecture. This platform employs a real-time stream processor for baseband processing. [Block labels: host processor; stream controller; microcontroller; ALU clusters 0 to 7; stream register file; network interface; streaming memory system; four SDRAM banks.]
C. DSP-based
Imagine Processor-based SDR. The authors of [151] proposed one of the earliest SDR solutions that is fully based on a DSP. This SDR employs the Imagine stream processor, developed at Stanford University in 2001 [152]. The Stanford Imagine project aimed at providing a signal and image processor that was C programmable and was able to match the high performance and density of an ASIC. It is based on stream processing [153]–[155], which is similar to dataflow programming in exploiting data parallelism and is suitable for signal processing applications. This work paved the way for the development of GPUs.
As Figure 8 shows, the Imagine processor uses VLIW-based ALU clusters arranged in a SIMD fashion to handle data streams. At the middle of the architecture is the Stream Register File, which stores data from other components, thereby minimizing memory accesses. The performance of this platform has been evaluated by implementing complex algorithms relevant to the W-CDMA cellular system. The results show higher performance compared to the TI C67 DSP [59]: channel estimation and detection are improved by 48x and 42x, respectively.
SODA. In [67], the authors introduce the Signal-processing On-Demand Architecture (SODA), which is an SDR platform based on multi-core DSPs. It offers full programmability and targets various radio standards. The SODA design achieves high performance, energy efficiency, and programmability. This is attributed to a combination of features that includes SIMD parallelism and hardware support for 16-bit computations, since most algorithms operate on small values. The basic processing element is an asymmetric processor consisting of a scalar and a SIMD pipeline, and a set of distributed scratchpad memories that are fully managed in software. SODA is a multi-core architecture, with one ARM Cortex-M3 processor [96] for control purposes and multiple processing elements for DSP operations. Four processing elements can meet the computational requirements of 802.11a and W-CDMA. Compared to WARP and Sora, SODA, as a single-chip implementation, is more appropriate for embedded scenarios. As with WARP, developers must learn the architecture in order to implement SDRs.
ARM Ardbeg. In [156], a commercial prototype based on the SODA architecture was presented. The main enhancements of Ardbeg compared to SODA are an optimized SIMD design, VLIW support, and a few special ASIC accelerators, which are dedicated to certain algorithms such as Turbo encoding/decoding, block floating point, and arithmetic operations.
Atomix. Typically, programmers need to perform one of three tasks on the software workflow of DSPs: tapping into a signal processing chain, tweaking a block, or inserting/deleting a block. To simplify these tasks, modularity is crucial. However, designing modular software for DSPs is challenging considering the particular requirements, such as latency sensitivity and high throughput, that must be supported. The main challenge is the need for programmers to define and manage everything manually and explicitly. In other words, it is necessary to use bare-metal features, such as moving data across cores, managing SRAMs, and parallelizing software.
In order to address these concerns, Atomix [19] describes the software in blocks, named atoms. An atom can be used to implement any operation, whether signal processing or system handling. Atoms can be used for realizing blocks, flowgraphs, and states in wireless stacks. In addition, simple control flow makes atoms composable. The ease of modification is due to the declarative language. It is important to note that an Atomix signal processing block implements a fixed algorithmic function, operates on fixed data lengths, is associated with a specific processor type, and uses only the memory buffers passed to it during invocation. The blocks run fixed sets of instructions that execute uninterrupted on fixed resources using fixed memories, which results in fixed execution times. Atoms can also be composed to build larger atoms. Using Atomix, radio software can be built entirely out of atoms and is easily modifiable. Atomix-based radio also meets throughput and latency requirements.
Developers define the atoms in C. Then, the blocks are composed into flowgraphs and states using the Atomix interface. The next step involves developing a parallelized schedule and resource assignment in order to meet latency and throughput requirements. The software is then compiled by Atomix into low-level C. The compiled code, along with the Atomix libraries, is compiled into a binary. Atomix was used to build an IEEE 802.11a receiver only, namely Atomix11a. Evaluation of Atomix11a shows that it exceeds receiver sensitivity requirements, operates robustly in indoor environments, and has low processing latency, and that its atoms have low timing variability. Although the reported power consumption is 7W, it does not include the power consumed by the front end, a USRP2, which is about 14W. A shortcoming of Atomix is that it is intended only for synthesis on a variety of DSPs, not for GPPs, GPUs, or FPGAs.
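Because atoms have fixed execution times, the timing of a composed design can be estimated by simple addition. The Python sketch below illustrates this reasoning only; it is not the Atomix interface (atoms are written in C), and the `Atom` class, the cycle counts, and the scheduling model, which ignores data dependencies between cores, are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str
    cycles: int  # fixed execution time in cycles (hypothetical values)


def compose(name, atoms):
    """Build a larger atom whose fixed cost is the sum of its sequential parts."""
    return Atom(name, sum(a.cycles for a in atoms))


def schedule_latency(core_assignments):
    """Latency of one iteration when each core runs its atoms back to back
    and all cores run in parallel: the busiest core sets the latency."""
    return max(sum(a.cycles for a in atoms) for atoms in core_assignments.values())
```

For example, composing an `fft` atom of 400 cycles with an `equalize` atom of 150 cycles yields an atom of 550 cycles, and a candidate schedule can be rejected early if `schedule_latency` exceeds the cycle budget of the protocol.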
BeagleBoard-X15. A collaborative project between Texas Instruments [59], Digi-Key [157], and Newark element14 [158], BeagleBoard is an open-source SoC computer [159]. It features the TI Sitara AM5728 [59], which includes two C66x DSPs [59], two ARM Cortex-A15 cores, two ARM M4 cores [96], and two PowerVR SGX544 GPUs [160]. At a relatively low price (∼$270), its DSPs and co-processors make it a powerful platform for implementing standalone SDRs. An example of using the BeagleBoard (an older model with the same general architecture) is the work in [161], where it was used to implement the Public Safety Cognitive Radio (PSCR) [162] through GNU Radio [40]. PSCR is FM radio-based and was developed by the Center for Wireless Telecommunications (CWT) at Virginia Tech.
D. FPGA-based
Airblue. This work [20] introduces an FPGA-based SDR platform for the PHY and MAC layers. Airblue is a method to implement radios on FPGAs to achieve configurability. This is done using an HDL called Bluespec, in which all hardware blocks of a radio transceiver are written. In Bluespec, a developer describes the execution semantics of the design through Term Rewriting Systems (TRS). TRS is a computational paradigm based on the repeated application of simplification rules [163]. The next step is compiling the TRS into RTL code. TRS is capable of generating efficient hardware designs. The main difference between a Verilog interface and a Bluespec interface is that the former is merely a collection of wires with no semantic meaning, while the latter includes handshake signals for communication between blocks. Therefore, Bluespec facilitates latency-insensitive designs, which are essential to system construction via modular composition. Using Airblue, developers may need to modify the building blocks, or modules, to add new features, make algorithmic modifications, tune the performance to meet throughput or timing requirements, or make FPGA-specific optimizations.
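The notion of computing by repeated application of guarded rules can be illustrated outside Bluespec. The following Python sketch is a toy rewriting engine, not Bluespec and not part of Airblue: each rule is a (guard, action) pair over a state, and the engine keeps firing the first enabled rule until none applies. Euclid's GCD is used as the example, assuming positive integer inputs; the names `rewrite` and `GCD_RULES` are illustrative.

```python
def rewrite(state, rules, max_steps=10_000):
    """Apply the first enabled rule repeatedly until no rule applies."""
    for _ in range(max_steps):
        for guard, action in rules:
            if guard(state):
                state = action(state)
                break
        else:
            return state  # normal form: no guard holds
    raise RuntimeError("no normal form reached within max_steps")


# Euclid's GCD as two guarded rules over a state (x, y), with x, y > 0 initially.
GCD_RULES = [
    # swap: bring the smaller value to the front
    (lambda s: s[0] > s[1] and s[1] != 0, lambda s: (s[1], s[0])),
    # subtract: reduce the larger value by the smaller one
    (lambda s: s[0] <= s[1] and s[1] != 0, lambda s: (s[0], s[1] - s[0])),
]
```

Starting from `(15, 6)`, the rules rewrite the state until it reaches `(3, 0)`, where no guard holds and the first element is the GCD. In hardware, each rule would correspond to a guarded atomic state update rather than a loop iteration.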
In order to support modular refinement, the design of a configurable radio must have two properties: latency-insensitivity and data-driven control. In addition to flexibility, Airblue meets tight latency requirements by implementing both the PHY and the MAC on the FPGA and connecting them with streaming interfaces, which allow data to be processed as soon as it is ready. Another advantage of Airblue is the implementation of highly reusable designs through parameterization (i.e., the same RTL block can be instantiated several times using different parameter values). The authors also developed several techniques that permit designers to reuse existing designs to implement run-time parameterized designs.
Airblue is essentially an HLS platform that offers modular refinement, where modifying one module does not affect the rest of the system, similar to the approach adopted by Atomix. This is why, compared to WARP, Airblue is more flexible, since WARP was not designed with the aforementioned principles in mind. The authors in [20] also studied HLS tools and compared them to Bluespec. They found HLS tools to be more effective than Bluespec in the early stages of the design; nevertheless, Bluespec is capable of yielding a more optimized final RTL. They argued that HLS will be of limited use in final FPGA implementations, especially for the high-performance blocks required by future wireless protocols; hence, Bluespec is the language of choice for Airblue. For performance evaluation, the authors implemented 802.11 and experimented with a set of protocol changes. Airblue proved to be easily modifiable while still meeting timing requirements. Airblue achieves a higher speed than Sora for cross-layer communication (between the MAC and PHY layers), which typically has a latency requirement of a few microseconds.
E. Hybrid Design
USRP Embedded (E) Series. This is the embedded version of USRP, which incorporates Xilinx SoCs [78] to build standalone SDRs. USRP E310, for example, is based on the Xilinx Zynq 7020, which yields high performance and is energy efficient. This USRP is standalone and suitable for mobile applications. It supports a frequency range of 70MHz to 6GHz and features a 2×2 MIMO RF front end.
WARP. The Wireless open-Access Research Platform is another example of a co-design specifically developed to prototype wireless protocols [21]. It is programmable and scalable; however, parts of the device are implemented in ASIC, which makes it less flexible. WARP v3 contains a Xilinx Virtex-6 FPGA [78], which includes two MicroBlaze processors and a Gigabit Ethernet peripheral. It requires the use of the Xilinx Embedded Development Kit (EDK) [78] to design SDRs. It is also open-source, with an 802.11 reference design made available to the research community. WARP has become widely used in the research community due to its effectiveness in implementing various wireless protocols (e.g., 802.11g/n PHY and MAC) and achieving real-time performance. It also supports MIMO since it has multiple RF interfaces.
PSoC 5LP. Developed by Cypress Semiconductor [164], this SoC is composed of an ARM Cortex-M3 GPP [96], an analog system, and a digital system that uses Universal Digital Blocks (UDBs). A UDB is a programmable digital block based on Programmable Logic Devices (PLDs) for realizing synchronous state machines; i.e., UDBs resemble an FPGA, but are smaller. All parts are reconfigurable and programmable using the PSoC Creator software IDE, which includes a graphical design editor. A few developers have used it to build standalone SDRs with relative ease [165]. Given its low price ($15), PSoC is a good candidate for quick prototyping and for getting familiar with SDRs.
Zynq-based SDR. The authors in [166] developed an SDR using the Xilinx Zynq ZC706 and ZedBoard SoCs to implement IEEE 802.11a. For their RF front end, they used the Analog Devices FMCOMMS3 AD9361 [69]. They used two Zynq boards to compare their performance. Both include a dual-core ARM Cortex-A9 [96], with the ZC706 containing a Kintex-7 FPGA and the ZedBoard containing an Artix-7 FPGA. They used MathWorks Simulink [132] to generate the model, and HDL Coder [84] and Embedded Coder [135] to generate the RTL and the software executable. They reported an average of 2W power consumption for Tx and Rx, compared to 5 to 8W reported by Atomix [19].
F. Comparison
Table IX compares the SDR platforms discussed above in an effort to provide a reference guide for developers. SDR platforms are compared according to the following criteria:
– Programmability: As protocols evolve, a platform is reprogrammed by simply adding or exchanging parts of the design. For example, when 3GPP issues an update to the LTE standard, instead of replacing the entire radio, an SDR is reprogrammed to accommodate the upgrade.
– Flexibility: A platform is capable of handling future wireless protocols as requirements become more demanding. This means an SDR should be able to support tighter timing requirements for next-generation protocols.
– Portability: A platform is standalone and readily deployed. This is generally a requirement for mobile and IoT applications.
– Modularity: A design's components are separated and recombined without internal module changes. For example, a developer may need to exchange a Viterbi decoder with a Turbo decoder without having to worry about the rest of the design.
– Computing Power: Since performance depends on the protocol used, we merely evaluate the capability of a platform to implement a given protocol in full, rather than being limited to a subset of the protocol (e.g., implementing the Viterbi decoder only).
– Energy Efficiency: Similar to computing power, we evaluate the capability to implement protocols efficiently, while keeping power consumption at a minimum.
– Cost: The cost of the hardware equipment. If a platform requires a PC, its cost is not included. Also, when the price of the entire setup is unknown, the price of the basic platform (not including the RF front end or interfaces) is shown.
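As a minimal illustration of how these criteria can be used as a selection guide, the sketch below encodes a few rows of Table IX as records and filters them by required properties. The `Platform` class and the `shortlist` helper are hypothetical names introduced here; the attribute values are copied from Table IX for the listed platforms only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    programmable: bool
    flexible: bool
    portable: bool
    modular: bool
    computing_power: str    # "Low", "Medium", or "High"
    energy_efficiency: str  # "Low", "Medium", or "High"


# A subset of Table IX.
PLATFORMS = [
    Platform("USRP X300", True, True, False, True, "High", "Low"),
    Platform("USRP E310", True, True, True, True, "High", "High"),
    Platform("Sora", True, True, False, False, "High", "Low"),
    Platform("WARP v3", True, False, True, True, "High", "High"),
    Platform("PSoC 5LP", True, False, True, True, "Low", "High"),
]


def shortlist(platforms, **required):
    """Names of the platforms whose attributes match every required value."""
    return [
        p.name
        for p in platforms
        if all(getattr(p, key) == value for key, value in required.items())
    ]
```

For instance, `shortlist(PLATFORMS, portable=True, computing_power="High")` returns the portable, high-performance entries of this subset, USRP E310 and WARP v3.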
VI. RELATED RESEARCH AND POTENTIAL SOLUTIONS
Although the previous sections highlighted some of the challenges of building SDRs, this section presents additional challenges that are still faced by the research and development community. These challenges are technical or practical.
A. Remote System Update
One of the main features and motivations of SDRs is their reconfigurability and flexibility. In order to take full advantage of this, the process of updating an SDR platform should be quick and easy. Remote standalone SDRs are usually FPGA- and DSP-based, with FPGAs being more common. Hence, most of the research has focused on remote updates of FPGAs. Since FPGAs are volatile and are configured during system power-on from their flash memories, updates are traditionally performed using the Joint Test Action Group (JTAG) method [167]. However, with more SDRs deployed and adopted in wireless and cellular networks, remote update becomes both necessary and challenging. With remote over-the-air (OTA) updates, it becomes possible to send patches to current designs, or even upload an entirely new and improved design to mobile networks and BSs [168], [169]. Some of the challenges faced by the research community include speed, reliability, cost, and security [170].
In [167], the authors introduce a method based on RS-422 serial communication and the High-Level Data Link Control (HDLC) protocol to update DSP and FPGA systems. Their method is fast, stable, and easy to implement. The authors in [171] show a new method for remotely updating Xilinx FPGAs [78] by storing the new design or code in Serial Peripheral Interface (SPI) flash memories. They utilized the Xilinx Quick Boot application, along with the KC705 Evaluation Kit from Xilinx, in order to develop their method. However, it is not always practical or feasible to use a download cable in order to update the system. The authors in [170] tackle this issue and the problem of downtime during an update or in the case of a failure. In their method, there exist two images: a factory mode configuration image, which acts as the baseline, and an application mode configuration image, which is used for some specific functions. They show the capability of updating and running a new application mode (design) image, in addition to rolling back to the factory mode image when no application mode image is available. Others have focused their efforts on improving the security of remote updates, such as [172] and [173]. While there exist some solutions for the remote update process, a few challenges, including partial reconfiguration of FPGAs, are not yet resolved.
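The two-image scheme can be sketched as a boot-time selection rule. The sketch below is an assumption-laden illustration, not the method of [170]: the image representation, the use of a CRC-32 checksum to decide whether an image is usable, and the function names are all hypothetical.

```python
import zlib


def make_image(payload: bytes) -> dict:
    """Package a configuration payload with a checksum (assumed CRC-32)."""
    return {"payload": payload, "crc": zlib.crc32(payload)}


def is_valid(image) -> bool:
    """An image is usable if it is present and its checksum matches."""
    return image is not None and zlib.crc32(image["payload"]) == image["crc"]


def select_boot_image(factory, application):
    """Prefer the application image; fall back to the factory image."""
    if is_valid(application):
        return "application", application
    if is_valid(factory):
        return "factory", factory
    raise RuntimeError("no valid configuration image")
```

With this rule, an interrupted or corrupted remote update leaves the device bootable from the factory image, which is the property that limits downtime during an update.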
B. Centralized Algorithms and Network Slicing
In order to simplify the design and provisioning of large-scale networks, software-defined networking (SDN) has been proposed to centralize network control. In this architecture, a controller (or multiple controllers) communicates with network devices (data plane) to collect their status information and configure their operation. Recent studies show that wireless networks can significantly benefit from the central network view established in the controller to design more efficient network control algorithms, such as channel assignment and mobility control.
Centralized control of network resources is the enabler of network slicing, which refers to the abstraction, slicing, and sharing of network resources. Three levels of slicing are applicable to wireless networks: (i) spectrum (a.k.a. link virtualization), which requires frequency, time, or space multiplexing; (ii) infrastructure, which is the slicing of physical devices such as BSs; and (iii) network, which refers to the slicing of the network infrastructure. Compared to wired networks, slicing the resources of wireless networks is significantly more challenging due to the variable nature of the wireless channel. Meanwhile, since SDRs enable the slicing of resources at both the spectrum and infrastructure levels, they can be used to augment SDNs towards a fine-grained allocation of resources. For example, compared to ASIC-based transceivers, SDRs provide a significantly higher level of control over the parameters of the physical and MAC layers.
TABLE IX
COMPARISON OF EXISTING SDR PLATFORMS
Platform | Programmability | Flexibility | Portability | Modularity | Computing Power | Energy Efficiency | Soft Core | FPGA | Cost (USD)
Imagine-based [151] | ✓ | × | × | × | Medium | Low | Imagine Stream Processor | N/A | N/A
USRP X300 [17] | ✓ | ✓ | × | ✓ | High | Low | PC | Xilinx Kintex-7 | ∼4-5K (Total)
USRP E310 [17] | ✓ | ✓ | ✓ | ✓ | High | High | Dual-core ARM Cortex-A9 | Xilinx Artix-4 | ∼3K (Total)
KUAR [34] | ✓ | × | × | × | Medium | Low | PC + 2× PowerPC cores | Xilinx Virtex II Pro | N/A
LimeSDR [146] | ✓ | ✓ | × | ✓ | High | Low | PC | Intel Cyclone IV | ∼300 (Board Only)
Ziria [147] | ✓ | ✓ | × | × | High | Low | PC | Depends on App | N/A
Sora [18] | ✓ | ✓ | × | × | High | Low | PC | Xilinx Virtex-5 | ∼900 (Board Only)
SODA [67] | ✓ | ✓ | ✓ | × | High | High | ARM Cortex-M3 + Processing Elements | N/A | N/A
Iris [148] | ✓ | ✓ | ✓ | ✓ | High | High | Dual-core ARM Cortex-A9 | Xilinx Kintex-4 | ∼1.2K (Total)
Atomix [19] | ✓ | ✓ | ✓ | ✓ | High | Medium | TI 6670 DSP | N/A | ∼200 (DSP Only)
BeagleBoard-X15 [159] | ✓ | ✓ | ✓ | ✓ | High | Medium | 2× TI C66x DSPs + 2× ARM Cortex-A15 & 2× M4 | N/A | ∼270 (Board Only)
Airblue [20] | ✓ | ✓ | ✓ | ✓ | High | High | N/A | Intel Cyclone IV | ∼1.3K (Board Only)
WARP v3 [21] | ✓ | × | ✓ | ✓ | High | High | 2× Xilinx MicroBlaze cores | Xilinx Virtex-6 | ∼7K (Total)
PSoC 5LP [164] | ✓ | × | ✓ | ✓ | Low | High | ARM Cortex-M3 | N/A | 10 (Board Only)
Zynq-based [166] | ✓ | ✓ | ✓ | ✓ | High | High | Dual-core ARM Cortex-A9 | Xilinx Kintex-4 | ∼1.2K (Total)
It should be noted that when centralized network control is employed, the delay of communication between the controller and the SDR platforms, as well as the delay of programming and applying new configurations, should be bounded within the specification requirements. For example, in a dense and mobile environment, the controller may employ a centralized channel and power control algorithm to instruct the nodes to adjust their channels based on the decisions made centrally. In this case, it is essential to ensure that the delay of sending configuration messages to multiple SDR platforms meets the requirements of the central algorithm. Furthermore, it is essential to ensure that all the SDRs apply the configuration at the same time; otherwise, serious interference and collisions might occur. Although protocols such as OpenFlow [174] and NETCONF [175] have been designed for interactions between the controller and the data plane, their implications for the performance of wireless networks have not been investigated. Specifically, it is essential to evaluate the effect of the hardware platform (i.e., GPP, DSP, FPGA) on the delay of applying configurations. These challenges have not been addressed yet.
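One simple way to reason about the simultaneous-application requirement is to let the controller pick a common activation time that covers the slowest node. The sketch below is a hypothetical illustration under the assumptions that per-node delivery and configuration delays are known upper bounds and that node clocks are synchronized; it is not part of OpenFlow or NETCONF, and the function names are introduced here for illustration.

```python
def common_apply_time(now_s, delivery_delay_s, apply_delay_s, margin_s=0.0):
    """Earliest instant at which every SDR can have received and prepared
    the new configuration, given per-node worst-case delays in seconds."""
    worst = max(delivery_delay_s[node] + apply_delay_s[node] for node in delivery_delay_s)
    return now_s + worst + margin_s


def meets_deadline(apply_time_s, now_s, deadline_s):
    """Check the activation time against the central algorithm's delay bound."""
    return apply_time_s - now_s <= deadline_s
```

If the resulting activation time violates the deadline of the central algorithm, the slowest platform (for example, one with a long reconfiguration delay) determines whether the centralized scheme is feasible at all, which is why the effect of the hardware platform on configuration delay matters.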
C. Network Functions Virtualization (NFV)
One of the new topics is the concept of Network Functions Virtualization (NFV), which offers an alternative way of designing and managing networking functions. The concept is very similar to SDRs, in the sense that various network functions can run in software on top of different hardware platforms. These platforms are typically high-volume servers, storage devices, and cloud-computing data centers [176]. Some of the functions that are virtualized via this method are load balancing, firewalls, intrusion detection devices, and WAN accelerators. This flexibility, in addition to cost reductions, is what makes NFV very attractive for network operators, carriers, and manufacturers [177]. From the SDR point of view, instead of performing signal processing operations on the platform, these operations are offloaded to a general computing platform. Such an architecture reduces the load of edge devices and BSs, as they can benefit from powerful processors to implement complex signal processing operations. For example, when multiple SDRs are connected to a computing platform, sophisticated signal processing algorithms could be developed to cope with challenges such as interference.
To leverage NFV for SDRs, developers have been attempting to tie them together to achieve complete flexibility across the platform [178], [179]. Although several wireless SDN architectures have been proposed to address the challenges of wireless communication [180]–[183], most of them do not benefit from the features of SDRs.
D. Energy Efficiency
Battery-powered devices in an IoT network face the challenge of minimizing power consumption in order to extend battery life before they are due for a replacement, which is a costly operation. Ready-to-deploy SDR systems may use high-performing platforms, such as FPGAs, without providing solutions or alternatives to batteries [184]. Even in the case of BSs with on-grid power sources, it has become crucial to lower power consumption in order to reduce CO2 emissions [185]. This is particularly important for cellular network operators, where BSs consume more than 50% of the total energy consumed in the network [186]. To address these concerns, energy harvesting mechanisms have been introduced. Energy harvesting or scavenging is the process of deriving power from external sources, such as solar and wind energies (also referred to as green energy), and storing it for consumption alongside internal sources (e.g., battery or electrical grid sources) [187]. As this green energy is a viable option for powering BSs [188], Ericsson (a major telecommunication company) has invested in and designed green-energy-powered BSs, motivated by environmental and financial reasons [189]. Therefore, hybrid power operation has been accepted as a solution to lower electrical grid energy consumption and cost [186].
In [187], the authors present a survey on energy harvesting technologies with regard to transducers, such as antennas and solar cells, that can utilize renewable energy sources, and cover some of the applications in the IoT and M2M world. The authors in [188] overview the cellular network operators that have adopted the hybrid solution and started using green energies to power their BSs. The authors in [190] discuss the issue of implementing an energy harvesting transmitter in a cognitive radio. They also derive the optimal spectrum sensing policy that maximizes the expected total throughput under energy causality and collision constraints. Energy causality states that the total consumed energy should be equal to or less than the total harvested energy, whereas the collision constraint states that the probability of accessing the occupied spectrum is equal to or less than the target probability of a collision. The authors conclude that a battery-operated radio can operate for a long time by optimizing both the energy and spectral efficiency. Energy harvesting is often associated with what is known as “Green IoT”, which is the new trend for IoT networks and devices, where the main focus is making them more energy efficient. Other relevant works are [191], [192], where the authors present an overview of the challenges and existing solutions of energy-efficient IoT systems.
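The two constraints described above for [190] can be checked numerically with a short sketch. It is not the formulation or the policy derived in [190]; it only encodes the two verbal definitions. As an assumption, energy causality is evaluated cumulatively per time slot (energy used up to a slot cannot exceed energy harvested up to that slot, plus any initial battery charge), which is a common way to state it. The function names and sample values are hypothetical.

```python
from itertools import accumulate


def energy_causal(harvested, consumed, initial_energy=0.0):
    """Check that cumulative consumption never exceeds cumulative harvest.

    harvested[t] and consumed[t] are energies in slot t (same unit).
    Assumption: energy harvested in slot t is usable within slot t.
    """
    if len(harvested) != len(consumed):
        raise ValueError("one harvested and one consumed value per slot")
    cum_h = accumulate(harvested, initial=initial_energy)
    cum_c = accumulate(consumed, initial=0.0)
    # Small tolerance guards against floating-point rounding.
    return all(c <= h + 1e-12 for h, c in zip(cum_h, cum_c))


def collision_ok(p_access_occupied, p_target):
    """Collision constraint: P(access | spectrum occupied) <= target."""
    return p_access_occupied <= p_target


# Hypothetical per-slot energies in arbitrary units.
harvest = [3.0, 0.0, 2.0, 1.0]
feasible_use = [1.0, 2.0, 1.0, 2.0]
infeasible_use = [1.0, 3.0, 0.0, 0.0]
```

A sensing and access policy is admissible in this sketch only when both `energy_causal` and `collision_ok` return `True`; `infeasible_use` fails because it spends more by the second slot than has been harvested by then.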
E. Co-Design
By definition, co-design is the process of realizing system-level goals through exploiting the trade-offs between software and hardware in an integrated design. This yields a higher design quality, and optimizes the cost and design cycle time, which in turn shortens the time-to-market. As co-design is adopted for more applications, developers typically face the problem of partitioning and scheduling. Finding the optimal design partition is not trivial. While system profilers can assist in providing an insight about the system’s behavior and help identify the parts of the code that can be accelerated on hardware, partitioning should be an automatic process that requires no external involvement. There are several algorithms proposed to address the challenges of partitioning, such as PSO, FCM, and FCMPSO [193]. These are optimization algorithms used to map embedded applications to Directed Acyclic Graphs (DAGs) for multi-core architectures. However, in SDRs, the problem is even more complicated due to having two layers of operation, namely PHY and MAC, and partitioning needs to take into account strict real-time requirements. It is a delicate equilibrium between performance, cost, and correct operation. Even with the process being more challenging than other design approaches, the benefits of co-design for complex systems outweigh the initial cost.
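To make the partitioning problem above concrete, the following minimal sketch assigns each task to software or hardware so that total execution time is minimized under a hardware area budget. It uses plain exhaustive search, not PSO, FCM, or FCMPSO from [193], and it ignores communication cost, task dependencies, and real-time deadlines, all of which a realistic SDR partitioner would need. The task names, timings, and area figures are hypothetical.

```python
from itertools import product

# Hypothetical task profile: (name, software time, hardware time, hardware area)
TASKS = [
    ("fft",        8.0, 1.0, 40),
    ("viterbi",   12.0, 2.0, 60),
    ("scrambler",  1.0, 0.5, 10),
    ("mac_ctrl",   3.0, 2.5, 30),
]


def best_partition(tasks, area_budget):
    """Try every HW/SW assignment and keep the fastest one that fits the area."""
    best = None
    for choice in product((False, True), repeat=len(tasks)):  # True = hardware
        area = sum(t[3] for t, hw in zip(tasks, choice) if hw)
        if area > area_budget:
            continue  # assignment does not fit on the device
        latency = sum(t[2] if hw else t[1] for t, hw in zip(tasks, choice))
        if best is None or latency < best[0]:
            best = (latency, area, choice)
    latency, area, choice = best
    hw_tasks = [t[0] for t, hw in zip(tasks, choice) if hw]
    return latency, area, hw_tasks


latency, area, hw_tasks = best_partition(TASKS, area_budget=100)
```

Exhaustive search doubles in cost with every added task, which is one reason heuristic optimizers are used in practice for larger designs.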
F. Security
SDRs simplify security provisioning. For example, when a new security mechanism does not require hardware replacement (e.g., 802.11i’s WPA), it can be implemented through reprogramming SDRs. On the other hand, the reprogrammability feature of SDRs introduces security threats, whether they are standalone or part of an SDN architecture. For example, assume an 802.11 network employs SDR-based BSs (i.e., access points). In this case, a denial of service (DoS) attack can be implemented by instructing a large number of clients to associate with a BS. Also, if the controller is compromised, then all the SDRs might be reprogrammed to be nonfunctional. Therefore, it is important to identify security threats and take them into account when designing SDR-based wireless networks.
Offloading SDR processing to general processing platforms through NFV enables the deployment of sophisticated central algorithms for detecting abnormal activities and network breaches. For example, by analyzing the signal strength received from a client at one or multiple BSs, a denial of service attack caused by generating excessive interference could be detected. Proposing security mechanisms that benefit from the features of SDRs is an important future direction, especially for large-scale and public networks.
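The signal-strength example above can be sketched as a simple outlier test run on a central platform. This is only an illustration under stated assumptions: the BS reports one received signal strength value (in dBm) per client, and a client is flagged when its value lies far above the median of all clients, measured in units of the median absolute deviation. The function name, the threshold `k`, and the readings are hypothetical; a practical detector would also combine reports from several BSs and track them over time.

```python
from statistics import median


def flag_excessive(rssi_dbm, k=5.0):
    """Return client ids whose RSSI is far above the median (robust outlier test)."""
    values = list(rssi_dbm.values())
    med = median(values)
    # Median absolute deviation; fall back to 1.0 dB to avoid dividing by zero.
    mad = median(abs(v - med) for v in values) or 1.0
    return sorted(c for c, v in rssi_dbm.items() if (v - med) / mad > k)


# Hypothetical readings at one BS, in dBm.
readings = {"c1": -70, "c2": -68, "c3": -72, "c4": -71, "c5": -30}
suspects = flag_excessive(readings)
```

The median-based statistics are chosen here so that a single very strong transmitter does not distort the baseline it is compared against.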
VII. EXISTING SURVEYS
Joseph Mitola III was the first to use the term “Software Radio” in 1993, and published an important work introducing and explaining the concept of using software rather than traditionally used hardware for designing radio systems [194]. In an interesting early survey (1999) [195] on the then-emerging technology, namely SDR, the authors made the case that SDRs have a great potential to advance and facilitate the development of communication standards such as WiFi and cellular networks. In a review published a few years later [22], the author paid close attention to the hardware components of SDRs, such as programmable RF modules and high-performance DACs and ADCs, as more technological advancement had been made.
Very few notable works survey and review different SDR platforms and testbeds. One such survey is the work by [38], where the author discusses the challenges SDR developers face. These challenges include size, weight, power, software architecture, security, certification, and business opportunities. While this work is important for compiling information about these challenges and presenting them in one place, it does not enumerate the different SDR platforms, implementation approaches, and their applications in the communication standards world.
In that regard, the authors of [196] compare the SDR platforms developed by the year 2012 and give a brief description of what each platform entails. It lacks, however, any detailed comparison based on the different categories of computational power, energy efficiency, flexibility, adaptability, and cost. It is through these comparisons that a designer is able to make an informed decision on what platform to adopt for their specific application. The authors of [197] attempt to list and review several DSP-based SDR platforms from both academia and industry, with more focus on commercial solutions, and then provide a simple comparison between them in terms of programmability, power, and flexibility. However, what this work lacks, in addition to being outdated, is the discussion of FPGAs (only one example is mentioned) and hardware/software co-design solutions, and a methodical analysis of different design approaches based on a set of performance metrics.
The authors of [198] presented a survey of a few SDR platforms that were developed more than a decade ago. The authors of [199] presented a paper that laid out the development and evolution of SDRs over the past several years and discussed the motivation behind the increased attention they have received recently. Another work is the comparison conducted by the authors in [200] between the Imec Bear platform [201] and a few multicore-based SDR platforms. Other attempts include the work by [202], where the authors focused on discussing the analog end of the SDR concerning signal sampling, processing, and the hardware/software responsible for handling these tasks. The work by [203] compares several SDR platforms to prove the feasibility and reliability of using SDRs in education, industry, and government. Another outdated survey is [204].
Another survey relevant to SDR platforms is the work by [205], where the author discusses and reviews the state-of-the-art in microwave technology in transceivers. The paper explores the development of SDRs using different technologies in radio frequency engineering. It is a comprehensive study of several research topics such as the design of tunable radio frequency components, linear and power efficient amplifiers, linear mixers, and interference rejection techniques. The authors in [206], in turn, provide a very thorough study on the several security threats and challenges in SDRs, and go over their certification process. As SDR platforms grow in popularity, and more communication protocols are realized using them, it becomes essential that developers are prepared for security issues and well-equipped with tools that protect their systems.
VIII. CONCLUSION
In this paper, we provided a comprehensive overview of the various design approaches and hardware platforms adopted for SDR solutions. This includes GPPs, GPUs, DSPs, FPGAs, and co-design. We explained the basic architectures and analyzed their advantages and disadvantages. Due to the different features of design approaches, it was important to compare them against each other in terms of computational power and power efficiency. We then reviewed the major current and past SDR platforms, whether they were developed by the industry or in academia. Finally, we discussed some of the research challenges and topics that are predicted to improve in the near future, helping to advance SDRs and their wide adoption.
We believe that SDR solutions are going to be mainstream and that their ability to implement different wireless communication standards with high levels of flexibility and re-programmability will be considered the norm. This paper serves as an exhaustive overview of this phenomenon, its enabling technologies, applications, and the current research tackling this issue from different angles.
REFERENCES
[1] “WWRF (2009) Visions and research direction for the wireless world,” Tech. Rep., 2009.
[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications,” IEEE Communications Surveys and Tutorials, 2015.
[3] IEEE, “IEEE 802.11, The Working Group Setting the Standards for Wireless LANs.” [Online]. Available: http://www.ieee802.org/11/
[4] 3GPP, “3GPP - Release 16.” [Online]. Available: http://www.3gpp.org/release-16
[5] M. Bansal, J. Mehlman, S. Katti, and P. Levis, “OpenRadio,” in Proceedings of the first workshop on Hot topics in software defined networks - HotSDN ’12, New York, USA, 2012, p. 109.
[6] ReportBuyer, “Software Defined Radio Market by Application, Component, End User, Type - Global Forecast to 2021,” Tech. Rep., 2016. [Online]. Available: https://www.reportbuyer.com/product/4364831/software-defined-radio-market-by-application-component-end-user-type-global-forecast-to-2021.html
[7] Global Industry Analysts Inc, “Software Defined Radio (SDR) - Global Strategic Business Report,” Tech. Rep., 2017. [Online]. Available: https://www.researchandmarkets.com/research/pc9x7g/software defined
[8] B. Paillassa and C. Morlet, “Flexible satellites: software radio in the sky,” in 10th International Conference on Telecommunications, 2003. ICT 2003., vol. 2, pp. 1596–1600.
[9] P. Angeletti, R. D. Gaudenzi, M. Lisi, I. Introduction, S. Division, and C. Scientist, “From Bent Pipes to Software Defined Payloads: Evolution and Trends of Satellite Communications Systems,” System, no. June, pp. 10–12, jun 2008.
[10] P. Angeletti, M. Lisi, and P. Tognolatti, “Software Defined Radio: A key technology for flexibility and reconfigurability in space applications,” in 2014 IEEE Metrology for Aerospace (MetroAeroSpace), may 2014, pp. 399–403.
[11] J. Seo, Y.-H. Chen, D. S. De Lorenzo, S. Lo, P. Enge, D. Akos, and J. Lee, “A real-time capable software-defined receiver using GPU for adaptive anti-jam GPS sensors.” Sensors (Basel, Switzerland), vol. 11, no. 9, pp. 8966–91, 2011.
[12] W. Xiang, F. Sotiropoulos, and S. Liu, “xRadio: An Novel Software Defined Radio (SDR) Platform and Its Exemplar Application to Vehicle-to-Vehicle Communications.” Springer, Cham, 2015, pp. 404–415.
[13] K. D. Singh, P. Rawat, and J.-M. Bonnin, “Cognitive radio for vehicular ad hoc networks (CR-VANETs): approaches and challenges,” EURASIP Journal on Wireless Communications and Networking, vol. 2014, no. 1, p. 49, dec 2014.
[14] M. Kloc, R. Weigel, and A. Koelpin, “SDR implementation of an adaptive low-latency IEEE 802.11p transmitter system for real-time wireless applications,” in 2017 IEEE Radio and Wireless Symposium (RWS), jan 2017, pp. 207–210.
[15] Y. Chen, S. Lu, H.-S. Kim, D. Blaauw, R. G. Dreslinski, and T. Mudge, “A low power software-defined-radio baseband processor for the Internet of Things,” in 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA), mar 2016, pp. 40–51.
[16] Y. Park, S. Kuk, I. Kang, and H. Kim, “Overcoming IoT Language Barriers Using Smartphone SDRs,” IEEE Transactions on Mobile Computing, vol. 16, no. 3, pp. 816–828, mar 2017.
[17] “Ettus Research - Networked Software Defined Radio (SDR).” [Online]. Available: https://www.ettus.com/
[18] K. Tan, H. Liu, J. Zhang, Y. Zhang, J. Fang, and G. M. Voelker, “Sora: high-performance software radio using general-purpose multi-core processors,” Communications of the ACM, vol. 54, no. 1, p. 99, jan 2011.
[19] M. Bansal, A. Schulman, and S. Katti, “Atomix: A Framework for Deploying Signal Processing Applications on Wireless Infrastructure,” in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), 2015, pp. 173–188.
[20] M.-C. Ng, K. Fleming, M. Vutukuru, S. Gross, Arvind, and H. Balakrishnan, “Airblue: A system for cross-layer wireless protocol development,” Symp. Architectures for Networking & Communications Syst. (ANCS), pp. 1–11, 2010.
[21] “WARP Project.” [Online]. Available: https://warpproject.org/trac
[22] A. Haghighat, “A review on essentials and technical challenges of software defined radio,” in MILCOM 2002. Proceedings, 2002, pp. 377–382.
[23] U. L. Rohde and T. T. N. Bucher, Communications Receivers: Principles and Design, 4th ed. McGraw-Hill Education, 1988.
[24] T. J. Rouphael, RF and digital signal processing for software-defined radio: a multi-standard multi-mode approach. Newnes, 2009.
[25] J. J. Carr, The technician’s radio receiver handbook: wireless and telecommunication technology. Newnes, 2001.
[26] R. Walden, “Analog-to-digital converter survey and analysis,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539–550, apr 1999.
[27] T. Hentschel, M. Henker, and G. Fettweis, “The digital front-end of software radio terminals,” IEEE Personal Communications, vol. 6, no. 4, pp. 40–46, 1999.
[28] C. Bowick, J. Blyler, and C. J. Ajluni, RF circuit design. Newnes/Elsevier, 2011.
[29] M. N. O. Sadiku and C. M. Akujuobi, “Software-defined radio: a brief overview,” IEEE Potentials, vol. 23, no. 4, pp. 14–15, oct 2004.
[30] L. C. Choo and Z. Lei, “CRC Codes for Short Control Frames in IEEE 802.11ah,” in 2014 IEEE 80th Vehicular Technology Conference (VTC2014-Fall), sep 2014, pp. 1–5.
[31] F. Berns, G. Kreiselmaier, and N. Wehn, “Channel decoder architecture for 3G mobile wireless terminals,” in Proceedings Design, Automation and Test in Europe Conference and Exhibition, pp. 192–197.
[32] T.-D. Chiueh and P.-Y. Tsai, OFDM baseband receiver design for wireless communications. John Wiley & Sons, 2008.
[33] B. Dezfouli, M. Radi, and O. Chipara, “REWIMO,” ACM Transactions on Sensor Networks, vol. 13, no. 3, pp. 1–42, aug 2017.
[34] G. J. Minden, J. B. Evans, L. Searl, D. DePardo, V. R. Petty, R. Rajbanshi, T. Newman, Q. Chen, F. Weidling, J. Guffey, D. Datla, B. Barker, M. Peck, B. Cordill, A. M. Wyglinski, and A. Agah, “KUAR: A Flexible Software-Defined Radio Development Platform,” in 2007 2nd IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks, apr 2007, pp. 428–439.
[35] K. Kant, Microprocessors and microcontrollers. Prentice-Hall Of India, 2014.
[36] T. Kazaz, C. Van Praet, M. Kulin, P. Willemen, and I. Moerman, “Hardware Accelerated SDR Platform for Adaptive Air Interfaces,” apr 2017.
[37] C.-H. Lin, B. Greene, S. Narasimha, and J. Cai, “High performance 14nm SOI FinFET CMOS technology with 0.0174µm2 embedded DRAM and 15 levels of Cu metallization,” in 2014 IEEE International Electron Devices Meeting, dec 2014, pp. 3.8.1–3.8.3.
[38] T. Ulversoy, “Software Defined Radio: Challenges and Opportunities,” IEEE Communications Surveys & Tutorials, vol. 12, no. 4, pp. 531–550, 2010.
[39] R. Kamal, Embedded systems: architecture, programming and design. New Delhi: Tata McGraw-Hill, 2003.
[40] “GNU Radio.” [Online]. Available: https://www.gnuradio.org/
[41] K. Vachhani, “Multiresolution analysis: An unified approach using Discrete Wavelet Transform on GNU radio,” in 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), oct 2015, pp. 887–892.
[42] M. Vestias and H. Neto, “Trends of CPU, GPU and FPGA for high-performance computing,” in 2014 24th International Conference on Field Programmable Logic and Applications (FPL), sep 2014, pp. 1–6.
[43] “CPU vs GPU performance - michaelgalloy.com.” [Online]. Available: http://michaelgalloy.com/2013/06/11/cpu-vs-gpu-performance.html
[44] “Intel Xeon Processors.” [Online]. Available: https://www.intel.com/content/www/us/en/products/processors/xeon.html
[45] “NVIDIA - Visual Computing Technologies.” [Online]. Available: http://www.nvidia.com/content/global/global.php
[46] A. Fisne and A. Ozsoy, “Grafik processor accelerated real time software defined radio applications,” in 2017 25th Signal Processing and Communications Applications Conference (SIU), may 2017, pp. 1–4.
[47] B. Cope, P. Y. Cheung, W. Luk, and L. Howes, “Performance Comparison of Graphics Processors to Reconfigurable Logic: A Case Study,” IEEE Transactions on Computers, vol. 59, no. 4, pp. 433–448, apr 2010.
[48] “CUDA Toolkit Documentation.” [Online]. Available: http://docs.nvidia.com/cuda/
[49] P. Szegvari and C. Hentschel, “Scalable Software Defined FM-radio receiver running on desktop computers,” in 2009 IEEE 13th International Symposium on Consumer Electronics, may 2009, pp. 535–539.
[50] “Amd | processors | graphics and technology.”
[51] J. Koomey, S. Berard, M. Sanchez, and H. Wong, “Implications of Historical Trends in the Electrical Efficiency of Computing,” IEEE Annals of the History of Computing, vol. 33, no. 3, pp. 46–54, mar 2011.
[52] K. Li, M. Wu, G. Wang, and J. R. Cavallaro, “A high performance GPU-based software-defined basestation,” in 2014 48th Asilomar Conference on Signals, Systems and Computers, nov 2014, pp. 2060–2064.
[53] K. Li, B. Yin, M. Wu, J. R. Cavallaro, and C. Studer, “Accelerating massive MIMO uplink detection on GPU for SDR systems,” in 2015 IEEE Dallas Circuits and Systems Conference (DCAS), oct 2015, pp. 1–4.
[54] J. G. Millage, “GPU Integration into a Software Defined Radio Framework,” Ph.D. dissertation, Iowa State University, 2010.
[55] “TMS320C6670 Multicore Fixed and Floating-Point System-on-Chip | TI.com.” [Online]. Available: https://www.ti.com/product/tms320c6670
[56] L. R. Rabiner and B. Gold, “Theory and application of digital signal processing,” Englewood Cliffs, N.J., Prentice-Hall, Inc., 1975. 777 p., 1975.
[57] S. W. others Smith, The scientist and engineer’s guide to digital signal processing. California Technical Pub. San Diego, 1997.
[58] S. A. Dyer and B. K. Harms, “Digital Signal Processing,” 1993, pp. 59–117.
[59] “SMJ320C80 Digital Signal Processor | TI.com.” [Online]. Available: http://www.ti.com/product/SMJ320C80
[60] A. Antoniou, Digital Signal Processing. Mcgraw-Hill, 2016.
[61] D. A. Patterson and J. L. Hennessy, “Computer organization and design,” Morgan Kaufmann, pp. 474–476, 2007.
[62] M. J. Flynn, “Some Computer Organizations and Their Effectiveness,” IEEE Transactions on Computers, vol. C-21, no. 9, pp. 948–960, sep 1972.
[63] R. Duncan, “A survey of parallel computer architectures,” Computer, vol. 23, no. 2, pp. 5–16, feb 1990.
[64] M. Gschwind, H. Hofstee, B. Flachs, M. Hopkins, Y. Watanabe, and T. Yamazaki, “Synergistic Processing in Cell’s Multicore Architecture,” IEEE Micro, vol. 26, no. 2, pp. 10–24, mar 2006.
[65] “nxp | Microcontrollers and Processors.” [Online]. Available: http://www.nxp.com/products/microcontrollers-and-processors:MICROCONTROLLERS-AND-PROCESSORS
[66] “Livanto ICE8060.” [Online]. Available: http://www.icerasemi.com/products/livanto chipsets/livanto soft baseband/Livanto ICE8060/index.html
[67] Y. Lin, H. Lee, M. Woh, Y. Harel, S. Mahlke, T. Mudge, C. Chakrabarti, and K. Flautner, “SODA: A low-power architecture for software radio,” in Proceedings - International Symposium on Computer Architecture, vol. 2006, no. 2, may 2006, pp. 89–100.
[68] A. Gatherer, T. Stetzler, M. McMahan, and E. Auslander, “DSP-based architectures for mobile communications: past, present and future,” IEEE Communications Magazine, vol. 38, no. 1, pp. 84–90, 2000.
[69] “TigerSHARC Processors | Analog Devices.”
[70] “CEVA- Leading Licensor of Signal Processing IP.” [Online]. Available: https://www.ceva-dsp.com/
[71] “Qualcomm | Wireless Technology and Innovation.” [Online]. Available: https://www.qualcomm.com/
[72] G. Frantz, “Digital signal processor trends,” IEEE Micro, vol. 20, no. 6, pp. 52–59, 2000.
[73] R. Akeela and Y. El Ziq, “Design and Verification of IEEE 802.11ah for IoT and M2M Applications,” 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pp. 491–496, mar 2017.
[74] X. Cai, M. Zhou, and X. Huang, “Model-Based Design for Software Defined Radio on an FPGA,” IEEE Access, vol. 5, pp. 8276–8283, 2017.
[75] I. Kuon, R. Tessier, and J. Rose, “FPGA Architecture: Survey and Challenges,” Foundations and Trends in Electronic Design Automation, vol. 2, no. 2, pp. 135–253, 2007.
[76] N. Grover and M. K. Soni, “Reduction of Power Consumption in FPGAs - An Overview,” Information Engineering and Electronic Business, vol. 5, no. 5, pp. 50–69, 2012.
[77] D. Strenski, C. Kulkarni, J. Cappello, O. Design, P. Sundararajan, F. Programmable, G. Arrays, B. High, and T. Graphical, “Latest FPGAs Show Big Gains in Floating Point Performance,” pp. 1–7, 2014. [Online]. Available: https://www.hpcwire.com/2012/04/16/latest fpgas show big gains in floating point performance/
[78] “Xilinx - All Programmable.” [Online]. Available: https://www.xilinx.com/
[79] D. Lewis, G. Chiu, J. Chromczak, D. Galloway, B. Gamsa, V. Manohararajah, I. Milton, T. Vanderhoek, and J. Van Dyken, “The Stratix 10 Highly Pipelined FPGA Architecture,” in Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA ’16, New York, USA, 2016, pp. 159–168.
[80] K. Sano and S. Yamamoto, “FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 10, pp. 2823–2837, oct 2017.
[81] S. Kestur, J. D. Davis, and O. Williams, “BLAS Comparison on FPGA, CPU and GPU,” in 2010 IEEE Computer Society Annual Symposium on VLSI, jul 2010, pp. 288–293.
[82] R. Woods and Wiley InterScience, FPGA-based implementation of signal processing systems. John Wiley & Sons, 2008.
[83] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays, ser. Signals and Communication Technology. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014.
[84] “HDL Coder.” [Online]. Available: https://www.mathworks.com/products/hdl-coder.html
[85] “MATLAB - MathWorks.” [Online]. Available: https://www.mathworks.com/products/matlab.html
[86] “Vivado High-Level Synthesis.” [Online]. Available: https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html
[87] “Intel FPGA and SoC.” [Online]. Available: https://www.altera.com/
[88] S. Choi, R. Scrofano, V. K. Prasanna, and J.-W. Jang, “Energy-efficient signal processing using FPGAs,” in Proceedings of the 2003 ACM/SIGDA eleventh international symposium on Field programmable gate arrays - FPGA ’03, New York, USA, 2003, p. 225.
[89] Altera, “Achieving One TeraFLOPS with 28-nm FPGAs,” Altera White Paper, 2010.
[90] “Lattice Semiconductor.” [Online]. Available: http://www.latticesemi.com/
[91] “Microsemi | Semiconductor & System Solutions | Power Matters.” [Online]. Available: https://www.microsemi.com/
[92] A. Dutta, D. Saha, D. Grunwald, and D. Sicker, “CODIPHY,” in Proceedings of the second workshop on Software radio implementation forum - SRIF ’13, New York, USA, 2013, p. 1.
[93] W. Wolf, “A decade of hardware/software codesign,” Computer, vol. 36, no. 4, pp. 38–43, apr 2003.
[94] G. De Micheli, R. Ernst, and M. Wolf, Readings in hardware/software co-design. Morgan Kaufmann Publishers, 2002.
[95] J. Teich, “Hardware/Software Codesign: The Past, the Present, and Predicting the Future,” Proceedings of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1411–1430, may 2012.
[96] “Architecting a Smarter World Arm.” [Online]. Available: https://www.arm.com/
[97] S. Windh, X. Ma, R. J. Halstead, P. Budhkar, Z. Luna, O. Hussaini, and W. A. Najjar, “High-level language tools for reconfigurable computing,” Proceedings of the IEEE, vol. 103, no. 3, 2015.
[98] K. B. Chehida and M. Auguin, “HW / SW partitioning approach for reconfigurable system design,” in Proceedings of the international conference on Compilers, architecture, and synthesis for embedded systems - CASES ’02, New York, USA, 2002, p. 247.
[99] M. Lopez-Vallejo and J. C. Lopez, “On the hardware-software partitioning problem,” ACM Transactions on Design Automation of Electronic Systems, vol. 8, no. 3, pp. 269–297, jul 2003.
[100] L. Zhuo and V. K. Prasanna, “Hardware/Software Co-Design for Matrix Computations on Reconfigurable Computing Systems,” in 2007 IEEE International Parallel and Distributed Processing Symposium, 2007, pp. 1–10.
[101] G. Sapienza, I. Crnkovic, and P. Potena, “Architectural Decisions for HW/SW Partitioning Based on Multiple Extra-Functional Properties,” in 2014 IEEE/IFIP Conference on Software Architecture, apr 2014, pp. 175–184.
[102] I. Bolsens, H. De Man, B. Lin, K. Van Rompaey, S. Vercauteren, and D. Verkest, “Hardware/software co-design of digital telecommunication systems,” Proceedings of the IEEE, vol. 85, no. 3, pp. 391–418, mar 1997.
[103] C. Zhang, Y. Ma, and W. Luk, “HW/SW Partitioning Algorithm Targeting MPSOC with Dynamic Partial Reconfigurable Fabric,” in 2015 14th International Conference on Computer-Aided Design and Computer Graphics (CAD/Graphics), aug 2015, pp. 240–241.
[104] D. K. Halim, S. W. Lee, M. S. Ng, Z. N. Lim, and C. M. Tang, “Exploring software-defined radio on Multi-Processor System-on-Chip,” in 2015 3rd International Conference on New Media (CONMEDIA), nov 2015, pp. 1–4.
[105] G. Snider, P. Kuekes, and R. S. Williams, “CMOS-like logic in defective, nanoscale crossbars,” Nanotechnology, vol. 15, no. 8, pp. 881–891, aug 2004.
[106] M. Gobel, A. Elhossini, and B. Juurlink, “A Methodology for Predicting Application-Specific Achievable Memory Bandwidth for HW/SW-Codesign,” in 2017 Euromicro Conference on Digital System Design (DSD), aug 2017, pp. 533–537.
[107] K. Underwood, “FPGAs vs. CPUs: trends in peak floating-point performance,” in Proceeding of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays - FPGA ’04, New York, USA, 2004, p. 171.
[108] B. Betkaoui, D. B. Thomas, and W. Luk, “Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing,” in 2010 International Conference on Field-Programmable Technology, dec 2010, pp. 94–101.
[109] B. Duan, W. Wang, X. Li, C. Zhang, P. Zhang, and N. Sun, “Floating-point mixed-radix fft core generation for fpga and comparison with gpu and cpu,” in 2011 International Conference on Field-Programmable Technology, dec 2011, pp. 1–6.
[110] M. Owaida, P. Ienne, G. Falcao, J. Andrade, C. Antonopoulos, N. Bellas, M. Purnaprajna, D. Novo, G. Karakonstantis, and A. Burg, “Enhancing Design Space Exploration by Extending CPU/GPU Specifications onto FPGAs,” ACM Transactions on Embedded Computing Systems, vol. 14, no. 2, pp. 1–23, feb 2015.
[111] X. Tian and K. Benkrid, “High-Performance Quasi-Monte Carlo Financial Simulation: FPGA vs. GPP vs. GPU,” ACM Transactions on Reconfigurable Technology and Systems, vol. 3, no. 4, pp. 1–22, nov 2010.
[112] Q. Wu, Y. Ha, A. Kumar, S. Luo, A. Li, and S. Mohamed, “A heterogeneous platform with GPU and FPGA for power efficient high performance computing,” in 2014 International Symposium on Integrated Circuits (ISIC), dec 2014, pp. 220–223.
[113] H. Giefers, P. Staar, C. Bekas, and C. Hagleitner, “Analyzing the energy-efficiency of sparse matrix multiplication on heterogeneous systems: A comparative study of GPU, Xeon Phi and FPGA,” in 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), apr 2016, pp. 46–56.
[114] J. M. P. Cardoso and M. Weinhardt, “High-Level Synthesis,” in FPGAs for Software Programmers. Cham: Springer International Publishing, 2016, pp. 23–47.
[115] R. Tessier, K. Pocek, and A. DeHon, “Reconfigurable Computing Architectures,” Proceedings of the IEEE, vol. 103, no. 3, pp. 332–354, mar 2015.
[116] J. Andrade, N. George, K. Karras, D. Novo, F. Pratas, L. Sousa, P. Ienne, G. Falcao, and V. Silva, “Design Space Exploration of LDPC Decoders using High-Level Synthesis,” IEEE Access, pp. 1–1, 2017.
[117] A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, T. Czajkowski, S. D. Brown, and J. H. Anderson, “LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems,” ACM Transactions on Embedded Computing Systems, vol. 13, no. 2, pp. 1–27, sep 2013.
[118] J. Cong, Bin Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Zhiru Zhang, “High-Level Synthesis for FPGAs: From Prototyping to Deployment,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 4, pp. 473–491, apr 2011.
[119] “Intel FPGA SDK for OpenCL.” [Online]. Available: https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html
[120] “MaxCompiler | Maxeler Technologies.” [Online]. Available: https://www.maxeler.com/products/software/maxcompiler/
[121] G. Inggs, S. Fleming, D. Thomas, and W. Luk, “Is high level synthesis ready for business? A computational finance case study,” in 2014 International Conference on Field-Programmable Technology (FPT), dec 2014, pp. 12–19.
[122] R. Nane, V.-M. Sima, C. Pilato, J. Choi, B. Fort, A. Canis, Y. T. Chen, H. Hsiao, S. Brown, F. Ferrandi, J. Anderson, and K. Bertels, “A Survey and Evaluation of FPGA High-Level Synthesis Tools,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 35, no. 10, pp. 1591–1604, oct 2016.
[123] “High-Level Synthesis For Any FPGA | LegUp Computing.” [Online]. Available: https://www.legupcomputing.com/
[124] W. Meeus, K. Van Beeck, T. Goedeme, J. Meel, and D. Stroobandt, “An overview of today’s high-level synthesis tools,” Design Automation for Embedded Systems, vol. 16, no. 3, pp. 31–51, sep 2012.
[125] S. Ravi and M. Joseph, “Open source HLS tools: A stepping stone for modern electronic CAD,” in 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), dec 2016, pp. 1–8.
[126] L. A. Tambara, J. Tonfat, A. Santos, F. Lima Kastensmidt, N. H. Medina, N. Added, V. A. P. Aguiar, F. Aguirre, and M. A. G. Silveira, “Analyzing Reliability and Performance Trade-Offs of HLS-Based Designs in SRAM-Based FPGAs Under Soft Errors,” IEEE Transactions on Nuclear Science, vol. 64, no. 2, pp. 874–881, feb 2017.
[127] “SDSoC Development Environment.” [Online]. Available: https://www.xilinx.com/products/design-tools/software-zone/sdsoc.html
[128] “Stratus High-Level Synthesis - Cadence Design Systems.” [Online]. Available: https://www.cadence.com/content/cadence-www/global/enUS/home/tools/digital-design-and-signoff/synthesis/stratus-high-level-synthesis.html
[129] “Synphony C Compiler - Synopsys.” [Online]. Available: https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/synphony-c-compiler.html
[130] “MathWorks - Makers of MATLAB and Simulink.” [Online]. Available: https://www.mathworks.com/
[131] “LabVIEW - National Instruments.” [Online]. Available: http://www.ni.com/en-us/shop/labview.html
[132] “Simulink - Simulation and Model-Based Design.” [Online]. Available: https://www.mathworks.com/products/simulink.html
[133] “MATLAB Coder.” [Online]. Available: https://www.mathworks.com/products/matlab-coder.html
[134] “Simulink Coder - MATLAB & Simulink.” [Online]. Available: https://www.mathworks.com/products/simulink-coder.html
[135] “Embedded Coder - MATLAB & Simulink.” [Online]. Available: https://www.mathworks.com/products/embedded-coder.html
[136] “ZedBoard Zynq-7000 ARM/FPGA SoC Development Board.” [Online]. Available: https://www.xilinx.com/products/boards-and-kits/1-elhabt.html
[137] R. W. Stewart, L. Crockett, D. Atkinson, K. Barlee, D. Crawford, I. Chalmers, M. Mclernon, and E. Sozer, “A low-cost desktop software defined radio design environment using MATLAB, Simulink, and the RTL-SDR,” IEEE Communications Magazine, vol. 53, no. 9, pp. 64–71, sep 2015.
[138] R. W. Stewart, K. W. Barlee, D. S. W. Atkinson, and L. H. Crockett, Software Defined Radio using MATLAB & Simulink and the RTL-SDR. Strathclyde Academic Media, 2015.
[139] “RTL-SDR (RTL2832U).” [Online]. Available: https://www.rtl-sdr.com/
[140] E. Blossom, “GNU radio: tools for exploring the radio frequency spectrum,” Linux Journal, vol. 2004, no. 122, p. 4, 2004.
[141] “Vivado System Generator for DSP.” [Online]. Available: https://www.xilinx.com/products/design-tools/vivado/integration/sysgen.html
[142] J. Malsbury, “Modular, open-source software transceiver for PHY/MAC research,” Proceedings of the second workshop on Software radio implementation forum - SRIF ’13, 2013.
[143] M. Abirami, V. Hariharan, M. B. Sruthi, R. Gandhiraj, and K. P. Soman, “Exploiting GNU radio and USRP: An economical test bed for real time communication systems,” in 2013 4th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2013, 2013.
[144] K. Kucuk, “RTWiFi-Lab: A real-time Wi-Fi laboratory platform on USRP and LabVIEW for wireless communications education and research,” Computer Applications in Engineering Education, aug 2017.
[145] V. P. G. Jimenez, A. L. Serrano, B. G. Guzman, and A. G. Armada, “Learning Mobile Communications Standards through Flexible Software Defined Radio Base Stations,” IEEE Communications Magazine, vol. 55, no. 5, pp. 116–123, may 2017.
[146] “Software Defined Radio - Lime Micro.” [Online]. Available: http://www.limemicro.com/products/software-defined-radio/
[147] G. Stewart, M. Gowda, G. Mainland, B. Radunovic, D. Vytiniotis, and C. L. Agullo, “Ziria,” in Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS ’15, vol. 50, no. 4, New York, USA, 2015, pp. 415–428.
[148] P. D. Sutton, J. Lotze, H. Lahlou, S. A. Fahmy, K. E. Nolan, B. Özgül, T. W. Rondeau, J. Noguera, and L. E. Doyle, “Iris: An architecture for cognitive radio networking testbeds,” IEEE Communications Magazine, vol. 48, no. 9, pp. 114–122, sep 2010.
[149] J. van de Belt, P. D. Sutton, and L. E. Doyle, “Accelerating software radio: Iris on the Zynq SoC,” in 2013 IFIP/IEEE 21st International Conference on Very Large Scale Integration (VLSI-SoC), oct 2013, pp. 294–295.
[150] J. Kim, S. Hyeon, and S. Choi, “Implementation of an SDR system using graphics processing unit,” IEEE Communications Magazine, vol. 48, no. 3, pp. 156–162, mar 2010.
[151] S. Rajagopal, S. Rixner, and J. R. Cavallaro, “A programmable baseband processor design for software defined radios,” in The 2002 45th Midwest Symposium on Circuits and Systems, 2002. MWSCAS-2002, vol. 3, 2002, pp. 413–416.
[152] B. Khailany, W. J. Dally, U. J. Kapasi, P. Mattson, J. Namkoong, J. D. Owens, B. Towles, A. Chang, and S. Rixner, “Imagine: media processing with streams,” IEEE Micro, vol. 21, no. 2, pp. 35–46, 2001.
[153] B. Khailany, W. J. Dally, S. Rixner, U. J. Kapasi, J. D. Owens, and B. Towles, “Exploring the VLSI scalability of stream processors,” in The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings, 2003, pp. 153–164.
[154] J. Gummaraju and M. Rosenblum, “Stream Programming on General-Purpose Processors,” in 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’05), 2005, pp. 343–354.
[155] B. K. Khailany, T. Williams, J. Lin, E. P. Long, M. Rygh, D. W. Tovey, and W. J. Dally, “A Programmable 512 GOPS Stream Processor for Signal, Image, and Video Processing,” IEEE Journal of Solid-State Circuits, vol. 43, no. 1, pp. 202–213, jan 2008.
[156] M. Woh, Y. Lin, S. Seo, S. Mahlke, T. Mudge, C. Chakrabarti, R. Bruce, D. Kershaw, A. Reid, M. Wilder, and K. Flautner, “From SODA to scotch: The evolution of a wireless baseband processor,” in 2008 41st IEEE/ACM International Symposium on Microarchitecture, nov 2008, pp. 152–163.
[157] “DigiKey Electronics - Electronic Components Distributor.” [Online]. Available: https://www.digikey.com/
[158] “Newark element14 Electronics - Electronic Components Distributor.” [Online]. Available: https://www.newark.com
[159] “BeagleBoard.org - community supported open hardware computers for making.” [Online]. Available: http://beagleboard.org/
[160] “Imagination Technologies - Developing and Licensing IP cores.” [Online]. Available: https://www.imgtec.com/
[161] A. S. Fayez, “Designing a Software Defined Radio to Run on a Heterogeneous Processor,” Ph.D. dissertation, Virginia Tech, apr 2011.
[162] B. Le, F. A. Rodriguez, Q. Chen, B. P. Li, F. Ge, M. ElNainay, T. W. Rondeau, and C. W. Bostian, “A public safety cognitive radio node,” in Proceedings of the 2007 SDR Forum Technical Conference, SDRF, 2007.
[163] M. M. Bezem, J. W. Klop, R. de Vrijer, and Terese (Group), Term rewriting systems. Cambridge University Press, 2003.
[164] “32-bit ARM Cortex-M3 PSoC 5LP - Cypress Semiconductor.” [Online]. Available: http://www.cypress.com/products/32-bit-arm-cortex-m3-psoc-5lp
[165] “PSoC SDR PAøRWE.” [Online]. Available: http://pa0rwe.nl/?pageid=32
[166] B. Drozdenko, M. Zimmermann, T. Dao, K. Chowdhury, and M. Leeser, “Hardware-Software Codesign of Wireless Transceivers on Zynq Heterogeneous Systems,” IEEE Transactions on Emerging Topics in Computing, pp. 1–1, 2017.
[167] Zhang Xin, Tang Lin-bo, and Ji Mei-ping, “Remote updating for DSP and FPGA programs,” in IET International Radar Conference 2015. Institution of Engineering and Technology, 2015.
[168] J. Hoffmeyer, I.-P. Park, M. Majmundar, and S. Blust, “Radio software download for commercial wireless reconfigurable devices,” IEEE Communications Magazine, vol. 42, no. 3, pp. S26–S32, mar 2004.
[169] B. Bing, “A fast and secure framework for over-the-air wireless software download using reconfigurable mobile devices,” IEEE Communications Magazine, vol. 44, no. 6, pp. 58–63, jun 2006.
[170] L. Zhou, Q. Liu, B. Wang, P. Yang, X. Li, and J. Zhang, “Remote System Update for System on Programmable Chip Based on Controller Area Network,” Electronics, vol. 6, no. 2, p. 45, jun 2017.
[171] A. Fernandes, R. C. Pereira, J. Sousa, P. F. Carvalho, M. Correia, A. P. Rodrigues, B. B. Carvalho, C. M. B. A. Correia, and B. Goncalves, “FPGA Remote Update for Nuclear Environments,” IEEE Transactions on Nuclear Science, vol. 63, no. 3, pp. 1645–1649, jun 2016.
[172] J. Vliegen, N. Mentens, and I. Verbauwhede, “Secure, Remote, Dynamic Reconfiguration of FPGAs,” ACM Transactions on Reconfigurable Technology and Systems, vol. 7, no. 4, pp. 1–19, dec 2014.
[173] H. Kashyap and R. Chaves, “Compact and On-the-Fly Secure Dynamic Reconfiguration for Volatile FPGAs,” ACM Transactions on Reconfigurable Technology and Systems, vol. 9, no. 2, pp. 1–22, jan 2016.
[174] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: Enabling innovation in campus networks,” ACM SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 69–74, 2008.
[175] R. Enns, M. Bjorklund, and J. Schoenwaelder, “Network configuration protocol (NETCONF),” 2011.
[176] R. Mijumbi, J. Serrat, J.-L. Gorricho, N. Bouten, F. De Turck, and R. Boutaba, “Network Function Virtualization: State-of-the-Art and Research Challenges,” IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 236–262, 2016.
[177] J. Gil Herrera and J. F. Botero, “Resource Allocation in NFV: A Comprehensive Survey,” IEEE Transactions on Network and Service Management, vol. 13, no. 3, pp. 518–532, sep 2016.
[178] S. Sun, M. Kadoch, L. Gong, and B. Rong, “Integrating network function virtualization with SDR and SDN for 4G/5G networks,” IEEE Network, vol. 29, no. 3, pp. 54–59, may 2015.
[179] P. Shome, M. Yan, S. M. Najafabad, N. Mastronarde, and A. Sprintson, “CrossFlow: A cross-layer architecture for SDR using SDN principles,” in 2015 IEEE Conference on Network Function Virtualization and Software Defined Network (NFV-SDN), nov 2015, pp. 37–39.
[180] J. Schulz-Zander, C. Mayer, B. Ciobotaru, S. Schmid, and A. Feldmann, “OpenSDWN: Programmatic control over home and enterprise WiFi,” in Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research. ACM, 2015, p. 16.
[181] A. Zubow, S. Zehl, and A. Wolisz, “BIGAP - Seamless handover in high performance enterprise IEEE 802.11 networks,” in IEEE/IFIP Network Operations and Management Symposium (NOMS), apr 2016, pp. 445–453.
[182] R. Riggio, M. K. Marina, J. Schulz-Zander, S. Kuklinski, and T. Rasheed, “Programming abstractions for software-defined wireless networks,” IEEE Transactions on Network and Service Management, vol. 12, no. 2, pp. 146–162, jun 2015.
[183] J. Vestin and A. Kassler, “QoS enabled WiFi MAC layer processing as an example of a NFV service,” in 1st IEEE Conference on Network Softwarization (NetSoft), apr 2015, pp. 1–9.
[184] L.-m. Ang, K. P. Seng, L. W. Chew, L. S. Yeong, and W. C. Chia, Wireless Multimedia Sensor Networks on Reconfigurable Hardware. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013.
[185] T. Han and N. Ansari, “A Traffic Load Balancing Framework for Software-Defined Radio Access Networks Powered by Hybrid Energy Sources,” IEEE/ACM Transactions on Networking, vol. 24, no. 2, pp. 1038–1051, apr 2016.
[186] T. Han and N. Ansari, “On Optimizing Green Energy Utilization for Cellular Networks with Hybrid Energy Supplies,” IEEE Transactions on Wireless Communications, vol. 12, no. 8, pp. 3872–3882, aug 2013.
[187] M. M. Tentzeris, A. Georgiadis, and L. Roselli, “Energy Harvesting and Scavenging [Scanning the Issue],” Proceedings of the IEEE, vol. 102, no. 11, pp. 1644–1648, nov 2014.
[188] T. Han and N. Ansari, “Powering mobile networks with green energy,” IEEE Wireless Communications, vol. 21, no. 1, pp. 90–96, feb 2014.
[189] Ericsson Inc., “Sustainable Energy Use in Mobile Communications,” Tech. Rep., 2017.
[190] S. Park, H. Kim, and D. Hong, “Cognitive Radio Networks with Energy Harvesting,” IEEE Transactions on Wireless Communications, vol. 12, no. 3, pp. 1386–1397, mar 2013.
[191] F. K. Shaikh, S. Zeadally, and E. Exposito, “Enabling technologies for green internet of things,” IEEE Systems Journal, 2017.
[192] R. Arshad, S. Zahoor, M. A. Shah, A. Wahid, and H. Yu, “Green IoT: An Investigation on Energy Saving Practices for 2020 and Beyond,” IEEE Access, vol. 5, pp. 15 667–15 681, 2017.
[193] I. Mhadhbi, S. Ben Othman, and S. Ben Saoud, “An Efficient Technique for Hardware/Software Partitioning Process in Codesign,” Scientific Programming, pp. 1–11, jul 2016.
[194] J. M. Mitola, “Software Radios Survey, Critical Evaluation and Future Directions,” IEEE Aerospace and Electronic Systems Magazine, vol. 8, no. 4, pp. 25–36, apr 1993.
[195] W. H. W. Tuttlebee, “Software-defined radio: facets of a developing technology,” IEEE Personal Communications, vol. 6, no. 2, pp. 38–44, apr 1999.
[196] M. Dardaillon, K. Marquet, T. Risset, and A. Scherrer, “Software defined radio architecture survey for cognitive testbeds,” in IWCMC 2012 - 8th International Wireless Communications and Mobile Computing Conference, aug 2012, pp. 189–194.
[197] O. Anjum, T. Ahonen, F. Garzia, J. Nurmi, C. Brunelli, and H. Berg, “State of the art baseband DSP platforms for Software Defined Radio: A survey,” EURASIP Journal on Wireless Communications and Networking, vol. 2011, no. 1, p. 5, dec 2011.
[198] M. Cummings and T. Cooklev, “Tutorial: Software-defined radio technology,” in 2007 25th International Conference on Computer Design, oct 2007, pp. 103–104.
[199] C. Moy and J. Palicot, “Software radio: a catalyst for wireless innovation,” IEEE Communications Magazine, vol. 53, no. 9, pp. 24–30, sep 2015.
[200] M. Palkovic, P. Raghavan, M. Li, A. Dejonghe, L. Van der Perre, and F. Catthoor, “Future Software-Defined Radio Platforms and Mapping Flows,” IEEE Signal Processing Magazine, vol. 27, no. 2, pp. 22–33, mar 2010.
[201] V. Derudder, B. Bougard, A. Couvreur, A. Dewilde, S. Dupont, L. Folens, L. Hollevoet, F. Naessens, D. Novo, P. Raghavan, T. Schuster, K. Stinkens, J. W. Weijers, and L. Van der Perre, “A 200Mbps+ 2.14nJ/b digital baseband multi processor system-on-chip for SDRs,” in 2009 Symposium on VLSI Circuits, 2009, pp. 292–293.
[202] R. G. Machado and A. M. Wyglinski, “Software-defined radio: Bridging the analog-digital divide,” Proceedings of the IEEE, vol. 103, no. 3, pp. 409–423, mar 2015.
[203] A. M. Wyglinski, D. P. Orofino, M. N. Ettus, and T. W. Rondeau, “Revolutionizing software defined radio: case studies in hardware, software, and education,” IEEE Communications Magazine, vol. 54, no. 1, pp. 68–75, jan 2016.
[204] R. Farrell, M. Sanchez, and G. Corley, “Software-Defined Radio Demonstrators: An Example and Future Trends,” International Journal of Digital Multimedia Broadcasting, pp. 1–12, mar 2009.
[205] T. Nesimoglu, “A review of Software Defined Radio enabling technologies,” in 2010 10th Mediterranean Microwave Symposium, aug 2010, pp. 87–90.
[206] G. Baldini, T. Sturman, A. R. Biswas, R. Leschhorn, G. Godor, and M. Street, “Security Aspects in Software Defined Radio and Cognitive Radio Networks: A Survey and A Way Ahead,” IEEE Communications Surveys & Tutorials, vol. 14, no. 2, pp. 355–379, 2012.