A Tutorial for Key Problems in the Design of Hybrid Hierarchical NoC Architectures with Wireless/RF


Smart Computing Review, vol. 3, no. 6, December 2013

This research was supported by the Beijing Municipal Natural Science Foundation (No.4122010, 2012.1 - 2014.12).

DOI: 10.6029/smartcr.2013.06.004


Chunhua Xiao, Zhangqin Huang, and Da Li

Embedded Software and System Institution, Beijing University of Technology / 100022, Beijing, CHINA /[email protected]

*Corresponding Author: Chunhua Xiao

Received August 15, 2013; Revised October 31, 2013; Accepted November 8, 2013; Published December 19, 2013

Abstract: As the number of processing nodes scales up, traditional electronic networks struggle to provide efficient on-chip communication because of unacceptable latency, power, and area consumption. Alternative interconnects, such as radio frequency interconnect (RF-I) and optical interconnect, have been explored as interconnection backbones. Hybrid hierarchical architectures that combine traditional and emerging interconnects have been widely adopted to achieve a good trade-off between latency and power. Compared with other alternative interconnects, the hybrid hierarchical architecture with a wireless/RF-I backbone is more cost-efficient and feasible because of its compatibility with complementary metal oxide semiconductor technology, and it has become one of the mainstream approaches for chip multiprocessor systems. However, how to utilize the wireless/RF-I backbone efficiently is a new challenge for designers. Based on an analysis of typical existing hybrid hierarchical wireless/RF-I architectures (HHWAs), the key problems in the design of HHWAs are identified here, and potential solutions are provided. In particular, strategies for wireless/RF-I resource management are explored in detail, and different solutions are discussed. This work is expected to serve as a basis for future HHWA designs.

Keywords: Network-on-chip, radio frequency interconnect, wireless interconnect



Introduction

As we enter the era of multi-core processors and beyond, the number of cores, coprocessors, and on-chip accelerators is growing rapidly. The dramatic increase in these processing elements (PEs) poses a tremendous challenge for on-chip communication, which must deliver not only high performance (low latency and high bandwidth) but also low energy and area cost. According to the International Technology Roadmap for Semiconductors (ITRS) [1], improving the characteristics of metal wires will no longer satisfy performance requirements, and new interconnect paradigms are needed. Different revolutionary approaches, such as optical interconnect [2][3], radio frequency interconnect (RF-I) [4][5][6], and wireless interconnect with complementary metal oxide semiconductor (CMOS) ultra-wideband (UWB) technology [7][8], have been explored. However, these emerging interconnects carry antenna and transceiver area, extra integrated components, and power overheads, and thus need to be placed and used optimally to achieve the best performance without undue overhead [9][10]. Although traditional planar metal interconnects suffer from the limitations of multi-hop communication, which result in high latency and power consumption, they are still highly effective and suitable for short distances. The vast improvements in CMOS technology have led to wires with only 0.18 pJ/bit of energy consumption at 1 mm for a 32 nm technology design [11]. For these reasons, many researchers have adopted hybrid hierarchical wireless/RF-I architectures (HHWAs) to obtain a good trade-off between latency and power at limited extra cost [12][13][14][15][16]. An HHWA is characterized by local traditional wired interconnection and global wireless/RF-I interconnection, and provides some unique benefits, including the following: (1) Instead of the multiple hops of traditional interconnection, wireless/RF-I needs only one hop for long-distance communication, which reduces power consumption while providing high bandwidth and low latency without excessive overhead. (2) An HHWA exploits the respective merits of traditional networks-on-chip (NoCs) and emerging interconnects. (3) Compared with optical interconnects in hybrid architectures, using wireless/RF-I as the global communication backbone is more feasible and cost-efficient because of its CMOS compatibility.

Because the architecture combines emerging technologies with traditional interconnects, new design challenges arise that might become bottlenecks to performance improvement. This work explores the key problems in HHWA design and provides potential solutions, and it is expected to serve as a basis for future HHWA designs. The rest of the paper is organized as follows. In Section 2, we provide a brief overview of the new alternative interconnect technologies (wireless and RF-I) and how they can be leveraged for on-chip communication. Based on the availability of these two interconnect technologies, we discuss the topology of HHWAs and explore typical existing HHWAs in Section 3. Because wireless/RF-I resource management is so important in HHWAs, we survey and analyze the resource arbitration mechanisms in depth in Section 4. In Section 5, we summarize the key problems in HHWA design and provide feasible solutions. Finally, we conclude our work in Section 6.

RF-I/Wireless

RF-I

Radio frequency interconnect has been proposed as a high-aggregate bandwidth, low-latency alternative to traditional interconnect [4][5][19]. Its benefits have been demonstrated for off-chip, on-board communication, as well as for on-chip interconnection networks [20][21][22].

Unlike conventional metallic wires that require charging and discharging the whole wire to signify either 0 or 1, RF-I modulates information on an electromagnetic carrier wave that is continuously sent along the transmission line (Figure 1). RF-I has been projected to scale better than traditional RC wires in terms of delay and power consumption; it can allow signal transmission across a 400 mm² die in 0.3 ns via propagation at the effective speed of light [5] as opposed to less than, or equal to, 4 ns on a repeated bus.

Instead of trying to aggressively expand baseband bandwidth (which often involves power-hungry compensation techniques to achieve a flat channel frequency response), RF-I divides the bandwidth into multiple frequency bands, each carrying a narrow-band signal, which saves power. By doing this, RF-I also improves bandwidth efficiency by sending many simultaneous streams of data over a single transmission line. This particular technique is referred to as multi-band RF-I [6]. As shown in Figure 2, there are N mixers on the transmitting (Tx) side in multi-band RF-I, where N is the number of senders sharing the transmission line. Each mixer up-converts an individual data stream into a specific channel (or frequency band). On the receiving (Rx) side, N additional mixers are employed to down-convert each signal back to the original data, and N low-pass filters (LPFs) are used to isolate the data from residual high-frequency components. Based on shortcut selection, each transmitter or receiver in the topology is tuned to a particular frequency (or disabled entirely) to implement the selected shortcuts [5][6].
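The mixer and low-pass filter chain described above can be illustrated with a minimal numerical sketch. It is not a circuit model: the sample rate, the carrier choices (whole cycles per bit period, so that carriers are mutually orthogonal), the ±1 symbol mapping, and the function names `transmit` and `receive` are all assumptions made for illustration, and the low-pass filter is approximated by summing over one bit period.

```python
import math

SAMPLES_PER_BIT = 64  # assumed number of samples per bit period

def carrier(k, n):
    """Sample n of a carrier that completes k whole cycles per bit period."""
    return math.cos(2 * math.pi * k * n / SAMPLES_PER_BIT)

def transmit(streams, cycles):
    """Tx side: each mixer up-converts one bit stream; the line carries the sum."""
    line = []
    for b in range(len(streams[0])):
        for n in range(SAMPLES_PER_BIT):
            # Map bit 1 -> +1 and bit 0 -> -1, then multiply by the carrier.
            line.append(sum((1 if s[b] else -1) * carrier(k, n)
                            for s, k in zip(streams, cycles)))
    return line

def receive(line, k):
    """Rx side: down-convert with carrier k, then low-pass by summing per bit."""
    bits = []
    for start in range(0, len(line), SAMPLES_PER_BIT):
        acc = sum(line[start + n] * carrier(k, n)
                  for n in range(SAMPLES_PER_BIT))
        bits.append(1 if acc > 0 else 0)
    return bits

# Three senders share one transmission line, each on its own band.
streams = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
cycles = [4, 5, 6]
line = transmit(streams, cycles)
recovered = [receive(line, k) for k in cycles]
```

Each receiver recovers only the stream whose carrier it is tuned to, because the contributions of the other carriers sum to zero over a bit period.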

Figure 1. RF-I transmission line in a chip multiprocessor system (legend: C = core, $ = L2 cache bank, M = off-chip memory controller; the figure also marks routers, RF-I nodes, and the RF-I transmission line)

Figure 2. A ten-carrier RF-I and corresponding waveform at the transmission line

Wireless

Unlike RF-I, wireless interconnection does not need a physically laid-out transmission channel; the communication medium is free space [23]. Wireless communication can use different frequency ranges, from a few gigahertz to thousands of gigahertz [24].

The on-chip antenna is one of the most important, and most difficult, components to integrate on-chip for HHWAs, because passive devices such as inductors consume the dominant portion of the transceiver area. Fortunately, as CMOS technology improves, both the size and the cost of the antenna and its required circuits will decrease dramatically, which makes it feasible to integrate multiple on-chip antennas [12]. An example of the necessary components of wireless transceivers for millimeter wave (mm-wave) links in a chip multiprocessor system is shown in Figure 3. A metal zigzag antenna was demonstrated to support a wireless network-on-a-chip (WiNoC) [25] and was used to design an mm-wave wireless NoC by Deb et al. [26]. As the transmission frequency increased to the terahertz range, carbon nanotubes (CNTs) were explored for the on-chip antenna [27], and the feasibility of designing a WiNoC was demonstrated by Ganguly et al. [15]. Compared with RF-I, which needs the transmission line to span the entire chip area, communication routing in wireless interconnection is not limited by a physical channel. However, wireless interconnection faces interference challenges and cost problems, which are proportional to the communication distance.


Figure 3. An example of mm-wave links in a chip multiprocessor system (blocks shown: antenna, switch, driver amplifier, LNA, modulator, demodulator, carrier frequency, serializer, and deserializer on the transmitter and receiver sides)

Hybrid Hierarchical Wireless/RF-I Architectures

Topology

Topology defines how channels and routers are connected in an interconnection network and determines the performance bounds, including zero-load latency and network throughput [17]. As shown in Figure 4, a hybrid hierarchical wireless/RF-I network consists of two types of network: a local network, which uses traditional wire interconnects, and a global/express network, which uses wireless/RF-I. As in a conventional NoC, the local network can use various topologies, such as mesh, centralized mesh, ring, or star. Each local network forms a subnet and is equipped with a wireless/RF-I access point (WAP). As long as the antennas are placed within communication range (or the RF-I is enabled between them), only a single hop is needed for inter-subnet communication. The WAPs of all subnets are connected as a second-level network that forms the global/express network. This upper level of the hierarchy can have various designs with different characteristics to achieve the full benefit of on-chip express networks.

An important problem when creating an efficient global/wireless network is the placement of WAPs, which greatly influences the trade-off between system performance and cost. If each PE is equipped with a WAP (each local network consists of only one node) and can communicate with any other node through the express wireless/RF-I, we get the best system performance, with low latency and high throughput, but the area cost of the equipment (antennas, transceivers, etc.) may be unacceptable. If too many PEs share a WAP, or if the WAP is placed improperly, the performance improvement is offset by the induced overhead. Ganguly et al. introduced small-world theory to create an HHWA, and inserted wireless links through a simulated annealing-based algorithm to minimize the average distance (measured in hops) between all source and destination hubs [15]. Chang et al. used RF-I as an express shortcut between heavily communicating nodes, identified by profiling the communication of the application, to accelerate and optimize region-to-region communication. They placed the RF-enabled routers in a staggered fashion to minimize the distance any given component would need to travel to reach the RF-I [16]. In contrast, Lee [12] and Di Tomaso et al. [13] placed the WAPs at the center of concentrated mesh-based clusters to provide distributed wireless express pathways for inter-cluster, long-haul communication to support hundreds of PEs.
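To make the single-hop benefit of the global/express network concrete, the following sketch compares average hop counts on a plain mesh and on a two-level network. The sizes (an 8x8 mesh split into four 4x4 subnets), the placement of one WAP near each subnet center, the assumption that any two WAPs are one wireless hop apart, and the assumption that the wired mesh still spans the whole chip are all illustrative choices, not taken from any of the cited designs.

```python
from itertools import product

SIZE, SUB = 8, 4  # assumed: 8x8 mesh split into four 4x4 subnets

def mesh_hops(a, b):
    """Hop count between two mesh nodes under minimal wired routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def subnet(n):
    """Index of the subnet that node n belongs to."""
    return (n[0] // SUB, n[1] // SUB)

def wap_of(n):
    """Assumed WAP position: one node near the center of n's subnet."""
    sx, sy = subnet(n)
    return (sx * SUB + SUB // 2, sy * SUB + SUB // 2)

def hybrid_hops(a, b, subnet, wap_of):
    """Hops when inter-subnet traffic may take one wireless hop between WAPs."""
    if subnet(a) == subnet(b):
        return mesh_hops(a, b)
    via_wireless = mesh_hops(a, wap_of(a)) + 1 + mesh_hops(wap_of(b), b)
    # Use the wireless path only when it is shorter than the wired one.
    return min(mesh_hops(a, b), via_wireless)

nodes = list(product(range(SIZE), repeat=2))
pairs = [(a, b) for a in nodes for b in nodes if a != b]
avg_mesh = sum(mesh_hops(a, b) for a, b in pairs) / len(pairs)
avg_hybrid = sum(hybrid_hops(a, b, subnet, wap_of) for a, b in pairs) / len(pairs)
```

Changing `SUB` shows the trade-off discussed above: smaller subnets mean more WAPs (higher area cost) but fewer wired hops to reach one.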

Existing Typical Architectures

Chang et al. [16] exploited dynamic RF-I bandwidth allocation to realize a reconfigurable hierarchical network-on-a-chip architecture. As shown in Figure 5, this architecture uses a mesh topology as the baseline and places adaptive shortcuts as an RF-overlaid topology to match the different communication demands of the applications. This approach selects shortcuts according to a cost function derived from application communication statistics. The selected shortcuts are implemented through RF-I enabled routers (standard routers with an extra port that serves as an RF-I interface). Each transmitter or receiver in the topology is tuned to a particular frequency (or disabled entirely) to offer a shortcut. To enable the newly available paths (RF-I shortcuts) and also reduce the reconfiguration cost, the routing tables in all network routers are updated before the application executes. A shortest-path routing strategy is adopted with RF-I shortcuts to transmit packets. This dynamic allocation approach enables reconfiguring the topology via frequency band reassignment, thereby providing the benefits of adaptive routing without having to pay the cost of traversing extra channels [23].
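The routing-table update that follows a shortcut reconfiguration can be sketched as a shortest-path computation over the mesh plus the shortcut links. The breadth-first search below is a generic stand-in, not the procedure of [16]; the function names, the 4x4 mesh, and the single one-way shortcut are hypothetical.

```python
from collections import deque

def mesh_links(w, h):
    """Adjacency lists of a w x h mesh with bidirectional wired links."""
    adj = {(x, y): [] for x in range(w) for y in range(h)}
    for (x, y) in adj:
        for nx, ny in ((x + 1, y), (x, y + 1)):
            if (nx, ny) in adj:
                adj[(x, y)].append((nx, ny))
                adj[(nx, ny)].append((x, y))
    return adj

def routing_table(adj, dst):
    """Next hop and hop distance from every router to dst (BFS from dst)."""
    # Reverse the graph so the search can walk links backwards from dst.
    rev = {n: [] for n in adj}
    for u, outs in adj.items():
        for v in outs:
            rev[v].append(u)
    dist, next_hop = {dst: 0}, {}
    queue = deque([dst])
    while queue:
        v = queue.popleft()
        for u in rev[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                next_hop[u] = v
                queue.append(u)
    return next_hop, dist

adj = mesh_links(4, 4)
_, before = routing_table(adj, (3, 3))
adj[(0, 0)].append((3, 3))          # hypothetical one-way RF-I shortcut
table, after = routing_table(adj, (3, 3))
```

After the shortcut is added, routers near (0, 0) are redirected towards it, while routers already close to the destination keep their wired next hop.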


Figure 4. Hybrid hierarchical wireless/RF-I network (a global express network above several local networks)

Figure 5. An example of adaptive RF-I shortcuts in a chip multiprocessor system (legend: base router, traditional wired link, shortcuts)

Modern complex network theory provides powerful methods to analyze network topologies. Small-world theory [28] has been incorporated into HHWAs to simultaneously address the latency, power consumption, and interconnect routing problems by minimizing the hop count in inter-core communication; we refer to these designs as small-world-based architectures, as shown in Figure 6 [15][18]. In a small-world-based architecture, the whole system is divided into multiple small clusters called subnets, and all PEs within each subnet are connected to a centrally located hub through direct links. These hubs are connected to form a second-level hierarchical structure, or global network. Given the number of wireless interfaces (WIs), the assignment of WIs to these hubs is optimized through a simulated annealing-based algorithm. The routing strategy combines dimension-order routing for the hubs without WIs and a south-east routing algorithm for the hubs with WIs. For inter-subnet communication, the routing path involving the wireless medium is chosen if it reduces the total path length compared to the wired path [18]. A token flow control strategy is adopted to alleviate the potential hotspot problem at the WIs, which arises from multiple simultaneous access requests for the wireless links, while a separate token-passing protocol ensures that only one hub uses the wireless medium at a given instant, avoiding interference and contention.

An example of a two-level WCube structure is shown in Figure 7. WCube is a multi-level, two-dimensional structure for interconnecting hundreds to thousands of cores in chip multiprocessors [12]. Two types of routers are included in this network: base routers that make up the baseline concentrated mesh, and wireless routers with wireless interfaces that form a wireless backbone. Each wireless router is responsible for a cluster of n base routers, while each base router serves k PEs because a k-way concentrated mesh is adopted. The wireless routers, base routers, and PEs are assigned exclusive addresses in WCube to identify their exact positions in the network, and the whole architecture can be described recursively. Every wireless router is assigned a single, distinct frequency band and is equipped with one wireless transmitter and multiple receivers to allow parallel transmission. WCube uses wormhole-based delivery and latency-oriented routing to minimize communication latency. The wireless link is chosen only if it clearly reduces latency compared with using the baseline mesh alone. WCube offers scalable performance in terms of latency and connectivity compared with other HHWAs, and the architecture has proven cost-efficient with 1024 nodes.

Figure 6. Small-world-based hybrid hierarchical wireless architecture (legend: wireless links, traditional wired links, hub, processing node, switch)

Figure 7. A two-level WCube structure with a cluster of 16 base routers (i.e., 64 nodes) (legend: wireless router, base router, core, L2 cache)

Unlike WCube, which uses a centralized wireless hub for each group of 64 nodes, in the iWISE architecture every router within each group of routers has its own transmitter and receiver. As shown in Figure 8, the iWISE architecture reduces the hop count by distributing these transceivers at each router, as opposed to the centralized hub found in WCube [13]. A token scheme is adopted for the wireless routers to share the limited bandwidth, while frequency division multiplexing (FDM) and time division multiplexing (TDM) are introduced to avoid transmission interference.

Wireless/RF-I Resource Management

The wireless/RF access points act as bridges in the hybrid hierarchical wireless/RF-I architecture, connecting the local network and the global network. If multiple packets try to access the same wireless/RF node at once, the access point can become a bottleneck and be overloaded, resulting in higher latency, so a reasonable control strategy is needed to alleviate potential congestion among the multiple wireless/RF requests at each access point. Similarly, another arbitration scheme is needed to decide which access point may use a particular wireless medium (or RF-I channel) in a given period, because all wireless/RF-I access points can tune to the same channel and can send data to, or receive data from, any other wireless/RF-I access point in the network. Therefore, two important problems in wireless/RF-I resource management are how to allocate the wireless/RF resource of a specific access point among multiple transmission requests from the PEs (or the base routers in the local network), and how to allocate a specific wireless medium or RF-I channel among multiple wireless/RF-I access points in a given period. The solutions to these two problems explored so far by different research groups can be broadly classified into three classes, depending on the specific implementation of the HHWA.

Figure 8. An iWISE architecture showing wireless communication between four sets (legend: traditional wired link, wireless link, router, core)

The first class is a fixed static allocation strategy with a coarse-grain arbitration mechanism, which assigns the wireless/RF-I to predetermined communication pairs for the entire duration of an application's execution [6][16][12][29]. The chosen pairs are allocated a specific wireless link (or RF channel), and each transmitter or receiver in the topology is tuned to a particular frequency; thus the specific bandwidth is exclusive to the transmitter, and contention is avoided [16]. Another frequency band is added to act as a multicast channel, with multiple receivers tuned to that frequency band to receive multicast messages. A certain processing node is chosen as the only transmitter of the multicast channel, and other PEs that want to send a multicast must first send the multicast message over conventional mesh links to the designated transmitter. The destination bit vector (DBV) is used to distinguish multicast transmissions from other network communication. To improve scalability and connectivity, Lee et al. [12] adopted wireless links instead of RF-I to support thousands of cores. A single, distinct frequency band is assigned to every wireless router and is used exclusively for its transmissions. Every micro wireless router is equipped with a single transmit antenna and multiple receive antennas, and the receivers are statically tuned to the frequency bands of the router's logical neighbors (whose addresses differ from that of the router in only one bit) to implement parallel transmission without frequency interference. However, this approach does not provide a congestion control mechanism to alleviate the potential bottleneck if too many packets try to use the wireless backbone at once.
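The "logical neighbor" rule above (addresses that differ in exactly one bit) can be written down directly. In this sketch the address width and the mapping from router address to frequency band (here simply the address itself) are assumptions for illustration; the actual WCube addressing scheme is described in [12].

```python
def logical_neighbors(addr, bits):
    """Addresses that differ from addr in exactly one of `bits` bit positions."""
    return [addr ^ (1 << i) for i in range(bits)]

def receiver_bands(addr, bits, band_of=lambda a: a):
    """Bands a router's receivers are statically tuned to.

    band_of is a hypothetical mapping from a router address to the single
    frequency band that router transmits on; by default band == address.
    """
    return [band_of(n) for n in logical_neighbors(addr, bits)]

# A router with 4-bit address 0101 listens to four neighbors.
neighbors = logical_neighbors(0b0101, 4)
bands = receiver_bands(0b0101, 4)
```

Because the relation is symmetric, every router that listens to a neighbor's band is also heard by that neighbor, and no two routers transmit on the same band.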

The second class adopts a token-based arbitration mechanism [30] to resolve access contention for the wireless/RF-I resource [13][15][18]. To address contention among multiple requests to transmit packets through the express pathway, token flow control along with a distributed routing strategy is adopted to alleviate congestion [18]. If taking the wireless link reduces the total hop count, and if the token of this wireless link to the destination is available, the transmission is allowed. To address contention between wireless routers for a specific wireless medium, a different wireless token-passing protocol can be used [18]. The wireless router possessing the wireless token can broadcast flits into the wireless medium, and the wireless token is forwarded to the next wireless router after all flits belonging to a packet at the current token-holding router are transmitted. Unlike other HHWAs that centralize the wireless routers, iWISE distributes the transceivers at each router to avoid hotspots and reduce the hop count. In the iWISE architecture, a token-based sharing scheme is used to share the limited bandwidth, along with FDM and TDM mechanisms to avoid interference. In this token-based arbitration scheme, possession of a token represents the right to transmit on a certain frequency to a set [13]. Two sharing schemes, token-partial and token-full, were explored with different workloads, demonstrating how the design of token-based arbitration influences arbitration cost (latency) and channel utilization for different traffic patterns, and thus communication performance.
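The wireless token-passing rule described above (the holder sends all flits of one packet, then forwards the token) can be sketched as a round-robin loop. This is a simplified illustration rather than the protocol of [18]: the round-robin token order, the one-packet-per-token-visit policy, and the name `token_passing` are assumptions.

```python
from collections import deque

def token_passing(queues):
    """Simulate a shared wireless medium arbitrated by a circulating token.

    queues: one deque per wireless router; each entry is a packet (list of flits).
    Returns the (router, flit) sequence observed on the medium.
    """
    on_air = []
    holder = 0
    while any(queues):
        if queues[holder]:
            # The token holder broadcasts every flit of one packet...
            packet = queues[holder].popleft()
            on_air.extend((holder, flit) for flit in packet)
        # ...and then forwards the token to the next wireless router.
        holder = (holder + 1) % len(queues)
    return on_air

queues = [
    deque([["a0", "a1"], ["a2"]]),   # router 0 has two packets
    deque(),                          # router 1 is idle
    deque([["c0", "c1", "c2"]]),      # router 2 has one packet
]
order = token_passing(queues)
```

Flits of different packets never interleave on the medium, but an idle router still receives the token, which is one source of the arbitration latency discussed in this section.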

Although the fixed static allocation strategy can choose different shortcuts for different applications, the shortcuts cannot be adjusted according to real-time workload requirements. Token-based dynamic arbitration allocates the channels in real time to communicating pairs on demand with low arbitration latency, power, and hardware cost, but it suffers from poor channel utilization and long arbitration latency under non-uniform communication, and modern and future CMPs tend not to exhibit uniform traffic because of spatial communication heterogeneity. Stream arbitration, the third class, was therefore proposed by Xiao et al. [31] as an efficient dynamic bandwidth utilization scheme that can deal with both spatial and temporal communication heterogeneity. Unlike token arbitration, where channels are coupled to receivers, a channel in stream arbitration can be used to send packets from any sender to any receiver, which efficiently addresses the problem of spatial communication heterogeneity. Since stream arbitration is inherently a dynamic arbitration scheme, it also efficiently handles temporal communication heterogeneity. Stream arbitration partitions the aggregate bandwidth into arbitration channels and data channels. Active sources (nodes that want to send flits through wireless/RF-I) compete in the arbitration channel for the data channels in order to talk to their desired destination nodes. Stream arbitration is a distributed mechanism without a centralized arbitrator and is simple to implement independently at each node. In a case study with a modeled RF-I network, stream arbitration proved to be an efficient scheme for resource arbitration in emerging network technologies.

Key Problems in HHWA Design

Wireless or RF-I?

Both wireless and RF-I offer better CMOS compatibility than other technologies, such as optical interconnects, and perform well as an expressway for long and critical communication in an HHWA compared to traditional NoCs with only wired connections, but each has its own merits and characteristics. When we design an HHWA, which emerging interconnect should we choose: wireless, RF-I, or both? As discussed in Section 2, the biggest difference between RF-I and wireless is the transmission medium: no channel needs to be physically laid out for wireless interconnects, whereas a transmission line (TL) is needed for electromagnetic carrier wave transmission in RF-I. The area cost of RF-I is therefore a challenge for the design of very large scale integrated circuits, since a long TL needs to span the whole chip for remote transmission, and crosstalk (or inter-channel interference) between adjacent TLs may also pose problems for long TLs at very high frequencies [12]. Because no physical channel is needed, wireless interconnects provide better scalability and connectivity than RF-I. However, the on-chip antenna is one of the most difficult components to integrate for large CMPs [12][15]. In addition, because of the induced cost, wireless is not as efficient for very short distance communication. A comparative analysis of the energy dissipation per bit of wireless and wired communication channels was carried out by Chang et al. [18], which showed that, compared to traditional wired links, mm-wave wireless shortcuts are always energy-efficient when the link length is 7 mm or more, but inefficient below 7 mm [24].

Why not exploit the respective merits and complementary strengths of wireless and RF-I? For mid-sized networks, in the range of tens to the low hundreds of PEs, we can adopt RF-I, which is more feasible for reducing latency and energy consumption. For very large scale networks with thousands of cores, wireless interconnect can be adopted to provide better scalability. An alternative approach is a combination of wireless links and RF-I, which uses RF-I to bridge the gap between the baseline mesh and the wireless interconnect for midrange messages, using wireless interconnects only for long-range communication [12]. This three-level hierarchical architecture provides a better trade-off between cost and performance, but the design of relay nodes for inter-level transmission might be a problem, which should be explored in depth to minimize the extra cost and potential bottlenecks.

Placement of the wireless/RF-I access points

The placement of wireless/RF-I access points is crucial for optimum performance gain because it establishes high-speed, low-energy interconnects on the network. The aim is to minimize the number of cycles between distant or critical endpoints so as to obtain the optimal architecture design with minimal average latency or hop count. Existing optimization techniques, such as evolutionary algorithms (EAs) [32], coevolutionary algorithms [33], and the simulated annealing (SA) algorithm [34], offer powerful methods to help with architecture construction. The choice of optimization algorithm is a trade-off between better results and faster speed for a large search space. EAs are generally believed to give better results but require long run times. SA reaches comparably good solutions with acceptable search time [34][18]. Whichever heuristic is adopted, a cost metric is needed to evaluate candidate solutions, and it should include the distance (in hops) and the probability of communication between sources and destinations. It is a good approach to introduce application communication statistics into the cost metric to find the optimum positions for the wireless/RF-I access points, so as to accelerate communication on the paths that are most frequently used by the application [16].
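A minimal simulated annealing sketch for access-point placement is given below. The cost metric follows the description above (communication probability times hop distance), but everything else is an assumption made for illustration: the mesh model, the rule that any two access points are one wireless hop apart, the cooling schedule, and the traffic profile are hypothetical and do not reproduce the algorithms of [15], [16], or [18].

```python
import math
import random

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def cost(waps, traffic):
    """Traffic-weighted hop count; a pair may use the wired mesh or one wireless hop."""
    total = 0.0
    for (s, d), prob in traffic.items():
        wired = manhattan(s, d)
        to_wap = min(manhattan(s, w) for w in waps)
        from_wap = min(manhattan(w, d) for w in waps)
        total += prob * min(wired, to_wap + 1 + from_wap)
    return total

def anneal(nodes, n_waps, traffic, steps=2000, t0=2.0, alpha=0.995, seed=0):
    """Search for a low-cost placement of n_waps access points on the given nodes."""
    rng = random.Random(seed)
    current = rng.sample(nodes, n_waps)
    cur_cost = cost(current, traffic)
    best, best_cost = list(current), cur_cost
    temp = t0
    for _ in range(steps):
        # Neighbor move: relocate one access point to a node that has none.
        candidate = list(current)
        free = [n for n in nodes if n not in candidate]
        candidate[rng.randrange(n_waps)] = rng.choice(free)
        delta = cost(candidate, traffic) - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = candidate, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        temp *= alpha
    return best, best_cost

nodes = [(x, y) for x in range(6) for y in range(6)]
# Hypothetical communication profile: two heavy corner-to-corner flows.
traffic = {((0, 0), (5, 5)): 0.4, ((0, 5), (5, 0)): 0.4, ((2, 2), (3, 3)): 0.2}
wired_only = sum(p * manhattan(s, d) for (s, d), p in traffic.items())
best, best_cost = anneal(nodes, 4, traffic)
```

Comparing `best_cost` with `wired_only` shows how much of the profiled traffic the chosen placement accelerates; an evolutionary algorithm could be substituted for `anneal` with the same `cost` function.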

Routing




The routing strategy determines the path a packet takes from its source to its destination. Because of the different transmission characteristics of RF-I/wireless compared with traditional wired interconnects, and the harsh requirements for on-chip design of a hierarchical architecture, the routing mechanism in an HHWA should be simple and reliable, without incurring too much power, area, and latency overhead. We divide the routing mechanism into local routing and global routing according to whether wireless/RF-I is used. Local routing depends on the topology of the subnets. For example, if the PEs within a subnet are connected in a mesh, then data routing within the subnet follows dimension-order routing. Global routing concerns whether and how to use the RF/wireless interconnects. Flow control, deadlock avoidance, and the RF-I/wireless resource management strategy are key problems in global routing design. Kim et al. [23] and Deb et al. [24] analyzed the different strategies adopted by existing HHWAs, and they provide very good references and guidance for future HHWA designs. A comprehensive study quantifying the merits and limitations of the different strategies and their implementation challenges still needs to be carried out, with an informative comparative analysis [24].
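For the local-routing case mentioned above, dimension-order routing in a mesh can be sketched in a few lines. The XY variant (resolve the X offset first, then Y) and the coordinate convention are assumptions; the text does not specify which dimension is traversed first.

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing: move along X until aligned, then along Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

route = xy_route((0, 0), (2, 1))
```

Because every packet finishes its X movement before turning into Y, the routing is deterministic and needs no routing table in the local routers.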

Wireless/RF-I resource allocation

According to the ITRS [36], the unity current gain frequency f_T and the maximum available power gain frequency f_max will be 600 GHz and 1 THz, respectively, in 16 nm CMOS technology. With the advances in CMOS circuits, tens to hundreds of gigahertz of bandwidth will be available in the near future [26][12][15][24]. How to utilize the available bandwidth efficiently is one of the important problems in HHWA design. The arbitration mechanisms for wireless/RF-I resource contention were discussed in Section 4, which showed that, in non-uniform traffic, sharing the bandwidth among all the wireless/RF-I access points with stream arbitration (referred to as a bandwidth sharing scheme) performs better than token arbitration with a specific exclusive allocation for every wireless/RF-I access point (referred to as a bandwidth distributed scheme). If we partition the aggregate bandwidth into a set of communication channels (the aggregate bandwidth is the number of channels multiplied by the bandwidth of each channel), each wireless/RF-I access point can obtain only a small proportion of the total bandwidth in the distributed allocation strategy. Because every access point occupies a specific channel, this mechanism is very efficient for uniform traffic patterns with high access contention. In a sharing mechanism, all the available bandwidth is a public resource, and only the winners occupy the channels for a fixed period, so the resource is allocated dynamically on demand in real time with better bandwidth utilization.
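The difference between the two schemes can be illustrated with a back-of-the-envelope model. It assumes an idealized setting with no arbitration overhead, equal channel sizes in the distributed scheme, and hypothetical demand figures in arbitrary bandwidth units; it is a sketch of the argument above, not a reproduction of the simulations in [31].

```python
def distributed_throughput(demand, aggregate_bw):
    """Each access point owns an exclusive, equal slice of the aggregate bandwidth."""
    share = aggregate_bw / len(demand)
    return sum(min(d, share) for d in demand)

def shared_throughput(demand, aggregate_bw):
    """All bandwidth is a common pool granted on demand (arbitration cost ignored)."""
    return min(sum(demand), aggregate_bw)

AGGREGATE = 64                      # assumed aggregate bandwidth, arbitrary units
uniform = [16, 16, 16, 16]          # every access point asks for the same amount
hotspot = [40, 8, 8, 8]             # one access point dominates the traffic

uniform_result = (distributed_throughput(uniform, AGGREGATE),
                  shared_throughput(uniform, AGGREGATE))
hotspot_result = (distributed_throughput(hotspot, AGGREGATE),
                  shared_throughput(hotspot, AGGREGATE))
```

Under uniform demand both schemes deliver the full aggregate bandwidth, while under the hotspot pattern the distributed scheme leaves the idle access points' slices unused.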

To further explore the influence of bandwidth allocation, Xiao et al. [31] ran an experiment with a fixed aggregate bandwidth, using stream arbitration and a bandwidth sharing scheme. The number of channels and the bandwidth of each channel were adjusted together so that their product stayed at that aggregate bandwidth. The simulation results showed that a compromise needs to be found between fewer high-bandwidth channels and additional, narrower channels. Dynamic bandwidth partitioning is a potential optimization for bandwidth allocation [31].
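The shape of that trade-off can be made concrete with simple arithmetic. The sketch below is only an illustration, not the experimental setup of [31]: the aggregate of 96 Gbps, the 512-bit packet, and the function names are assumed values chosen for the example.

```python
def partitions(aggregate_gbps):
    """All integer (num_channels, channel_bw) pairs whose product equals
    the fixed aggregate bandwidth."""
    return [(n, aggregate_gbps // n)
            for n in range(1, aggregate_gbps + 1)
            if aggregate_gbps % n == 0]


def serialization_ns(packet_bits, channel_bw_gbps):
    """Time to push one packet onto one channel (1 Gbps = 1 bit per ns)."""
    return packet_bits / channel_bw_gbps


# Hypothetical sweep: 96 Gbps aggregate, 512-bit packets.
for num_channels, channel_bw in partitions(96):
    print(num_channels, "channels x", channel_bw, "Gbps:",
          serialization_ns(512, channel_bw), "ns per packet,",
          num_channels, "concurrent senders at most")
```

Fewer, wider channels shorten the serialization time of each packet, while more, narrower channels let more access points transmit at the same time. Where the best point lies depends on the traffic, which is consistent with the compromise reported above.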

Transmission reliability

Although wireless/RF-I performs well for long-distance transmission, with high bandwidth, low latency and low energy consumption, bit errors are a challenge to reliable message transmission. Within the maximum communication distance of future CMPs, 1.5 cm, the bit-error rate (BER) of the on-chip wireless channel is less than 10^-9, which is far higher than that of RC wires. (Current RC wires have an extremely low BER of approximately 10^-14 [12].) Error control coding (ECC) was explored by Ganguly et al. [37], who showed that implementing joint crosstalk avoidance triple error correction and simultaneous quadruple error detection codes [38] in the wireline links and Hamming code based product codes (H-PCs) in the wireless links of a hierarchical wireless NoC with CNT antennas [37] can improve the overall reliability of the wireless NoC manyfold. However, ECC introduces timing and area overhead and also incurs a fixed overhead on every packet [12][15]. The WCube work devised a simple loss management solution with zero signaling overhead, overhearing-and-retransmission (OAR), which relies on overhearing at intermediate hops and uses an on-demand, checksum-based error-detection and retransmission scheme at the last hop [12]. OAR detects and recovers from packet losses with a buffer-based mechanism and without extra signaling overhead. The packet is verified against its checksum at the destination and is retransmitted if the checksum does not match. This solution is simple and incurs less extra cost than ECC, but the forwarding order of packets must be preserved to ensure correct transmission.
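The last-hop part of this scheme, checksum verification followed by retransmission on mismatch, can be sketched as follows. This is a minimal illustration and not the WCube implementation: the use of CRC-32 as the checksum, the 4-byte trailer, the retry limit, and all function names are assumptions, and the overhearing at intermediate hops is not modeled.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload (assumed format)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(packet: bytes) -> bool:
    """Recompute the checksum at the destination and compare."""
    payload, tag = packet[:-4], packet[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tag


def flip_bit(packet: bytes, bit_index: int) -> bytes:
    """Model a single bit error on the wireless link."""
    corrupted = bytearray(packet)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def deliver(payload: bytes, channel, max_attempts: int = 4):
    """Send until the checksum matches; `channel(packet, attempt)` returns
    the packet as received. Returns (payload, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        received = channel(make_packet(payload), attempt)
        if verify(received):
            return received[:-4], attempt
    raise RuntimeError("packet lost after max_attempts")


def noisy_once(packet: bytes, attempt: int) -> bytes:
    """Hypothetical channel that corrupts only the first transmission."""
    return flip_bit(packet, 3) if attempt == 1 else packet


print(deliver(b"flit", noisy_once))  # (b'flit', 2)
```

The sender pays for a retransmission only when an error actually occurs, whereas ECC pays its coding overhead on every packet. That is the cost difference the text describes.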

Scalability

To target future large-scale CMPs, scalability is one of the most important problems in the design of an on-chip hybrid hierarchical architecture. Lee et al. [12] proposed the WCube recursive wireless interconnect structure, which offers connectivity to thousands of cores in CMPs. In a case study of a network with 1024 PEs, WCube proved efficient and reduced the observed latency by 20% to 45% compared to current 2-D wired mesh designs. Since future communication patterns tend towards the non-uniform and heterogeneous, Xiao et al. [31] proposed a cluster-based hierarchical architecture that uses a local transmission line (TL) for each core cluster and a global TL to connect the local TLs. A network with 16x16 RF nodes for a 32x32 router NoC (each 2x2 group of routers shares one RF node) proved efficient in average network latency and energy consumption with a hierarchical TL architecture and hierarchical stream arbitration, compared to an architecture with a single TL spanning the whole chip [31]. A three-level architecture with traditional RC interconnects, RF-I and wireless links is also a potential solution for scalability, and its detailed implementation needs to be proposed in future designs.
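The node counts in the 32x32 example follow from simple index arithmetic, sketched below. Only the 2x2 sharing of one RF node comes from the text. The cluster size of 4x4 RF nodes per local TL, the rule that routers sharing an RF node communicate over the wired mesh, and all function names are assumptions made for illustration.

```python
def rf_node_of(router_x, router_y, group=2):
    """RF node shared by the group x group block containing this router."""
    return (router_x // group, router_y // group)


def rf_node_count(mesh_dim=32, group=2):
    """Number of RF nodes in a mesh_dim x mesh_dim router mesh."""
    return (mesh_dim // group) ** 2


def local_tl_of(rf_node, cluster=4):
    """Local TL serving this RF node; cluster size is an assumed value."""
    return (rf_node[0] // cluster, rf_node[1] // cluster)


def path_level(src_router, dst_router, group=2, cluster=4):
    """Highest level of the hierarchy a packet needs (assumed routing rule)."""
    src_rf = rf_node_of(*src_router, group)
    dst_rf = rf_node_of(*dst_router, group)
    if src_rf == dst_rf:
        return "mesh"
    if local_tl_of(src_rf, cluster) == local_tl_of(dst_rf, cluster):
        return "local TL"
    return "global TL"


print(rf_node_count())                  # 256, i.e. 16x16 RF nodes
print(path_level((0, 0), (1, 1)))       # mesh
print(path_level((0, 0), (7, 7)))       # local TL
print(path_level((0, 0), (31, 31)))     # global TL
```

Under these assumptions only traffic that crosses cluster boundaries contends for the global TL, which is the intuition behind preferring a hierarchical TL over a single TL spanning the chip.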

Conclusion

As a new kind of architecture built with emerging interconnects, hybrid hierarchical wireless/RF-I architectures raise new design challenges. Based on analysis of the existing typical HHWAs, we explored strategies for wireless/RF-I resource management for the first time and discussed the strengths and weaknesses of different solutions. The key problems in hybrid hierarchical wireless/RF-I architecture design were explored, and related potential solutions were provided, which we expect to serve as a basis for future HHWA designs. Quantitative analysis of the performance benefits of different HHWAs needs to be benchmarked in future work, and physical implementations need detailed investigation.

References

[1] International Technology Roadmap for Semiconductors (ITRS), 2012.

[2] A. Shacham, K. Bergman, L. P. Carloni, "Photonic networks-on-chip for future generations of chip multiprocessors," IEEE Transactions on Computers, vol. 57, no. 9, pp. 1246-1260, 2008.

[3] D. Vantrease, R. Schreiber, M. Monchiero, M. McLaren, N. P. Jouppi, M. Fiorentino, A. Davis, N. Binkert, R. G. Beausoleil, J. H. Ahn, "Corona: System Implications of Emerging Nanophotonic Technology," in Proc. of the 35th Annual International Symposium on Computer Architecture (ISCA08), Washington, DC, USA, pp. 153-164, 2008.

[4] M. F. Chang, I. Verbauwhede, C. Chien, Z. Xu, J. Kim, J. Ko, Q. Gu, B. Lai, "Advanced RF/baseband interconnect schemes for inter- and intra-ULSI communications," IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1271-1285, 2005.

[5] M. F. Chang, E. Socher, R. Tam, J. Cong, G. Reinman, "RF interconnects for communications on-chip," in Proc. of the 2008 International Symposium on Physical Design (ISPD08), ACM New York, NY, pp. 78-83, 2008.

[6] M. F. Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, S.-W. Tam, "CMP Network-on-Chip Overlaid with Multi-Band RF-Interconnect," in Proc. of the IEEE Int'l Symposium on High-Performance Computer Architecture (HPCA), Salt Lake City, UT, February, pp. 191-202, 2008.

[7] D. Zhao, Y. Wang, "SD-MAC: Design and Synthesis of A Hardware-Efficient Collision-Free QoS-Aware MAC Protocol for Wireless Network-on-Chip," IEEE Transactions on Computers, vol. 57, no. 9, pp. 1230-1245, Sep. 2008.

[8] Y. Wang, D. Zhao, "The Design and Synthesis of a Synchronous and Distributed MAC Protocol for Wireless Network-on-Chip," in Proc. IEEE Int'l Conf. Computer-Aided Design, Nov. 2007.

[9] S. Deb, K. Chang, et al., "Design of an Efficient NoC Architecture using Millimeter-Wave Wireless Links," in Proc. of 13th Int'l Symposium on Quality Electronic Design, pp. 165-172, Mar. 2012.

[10] L. P. Carloni, P. Pande, Y. Xie, "Networks-on-chip in emerging interconnect paradigms: Advantages and challenges," in Proc. of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip, pp. 93-102, 2009.

[11] H. S. Wang, X. Zhu, L. S. Peh, S. Malik, "Orion: A power-performance simulator for interconnection networks," in Proc. of the 35th Annual ACM/IEEE International Symposium on Microarchitecture, pp. 294-305, Nov. 2002.

[12] S. B. Lee et al., "A scalable micro wireless interconnect structure for CMPs," in Proc. ACM Annu. Int. Conf. Mobile Comput. Network. (MobiCom), pp. 20-25, 2009.

[13] D. D. Tomaso et al., "iWise: Inter-router wireless scalable express channels for Network-on-Chips (NoCs) architecture," in Proc. Annu. Symp. High Performance Interconnects, pp. 11-18, 2011.

[14] W. J. Dally, "Express cubes: Improving the performance of k-ary n-cube interconnection networks," IEEE Trans. Computers, vol. 40, no. 9, pp. 1016-1023, Sep. 1991.

[15] A. Ganguly, K. Chang, S. Deb, P. Pande, B. Belzer, C. Teuscher, "Scalable hybrid wireless network-on-chip architectures for multicore systems," IEEE Trans. Computers, vol. 60, no. 10, pp. 1485-1502, Oct. 2011.

[16] M. F. Chang, J. Cong, A. Kaplan, C. Liu, M. Naik, J. Premkumar, G. Reinman, E. Socher, S.-W. Tam, "Power reduction of CMP communication networks via RF-interconnects," in Proc. of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 41), Washington, DC, USA, pp. 376-387, 2008.

[17] W. J. Dally, T. B, Principles and Practices of Interconnection Networks. Waltham, MA: Morgan Kaufmann, 2004.

[18] K. Chang, S. Deb, et al., "Performance Evaluation and Design Trade-offs for Wireless Network-on-Chip Architecture," ACM Journal on Emerging Technologies in Computing Systems, vol. 8, no. 8, 2012.

[19] M. F. Chang, V. P. Roychowdhury, L. Zhang, H. Shin, Y. Qian, "RF/wireless interconnect for inter- and intra-chip communications," Proceedings of the IEEE, vol. 89, no. 4, Apr. 2001.

[20] J. Ko, J. Kim, Z. Xu, Q. Gu, C. Chien, M. Chang, "An RF/baseband FDMA-interconnect transceiver for reconfigurable multiple access chip-to-chip communication," in Proc. of Dig. Tech. Papers Int. Solid-State Circuits Conf., vol. 1, pp. 338-602, Feb. 2005.

[21] H. Wu, L. Nan, S.-W. Tam, et al., "A 60GHz on-chip RF-Interconnect with λ/4 coupler for 5Gbps bi-directional communication and multi-drop arbitration," in Proc. of Custom Integrated Circuits Conference (CICC), pp. 1-4, 2012.

[22] Y. Kim, G.-S. Byun, A. Tang, C.-P. Jou, H.-H. Hsien, G. Reinman, J. Cong, M. F. Chang, "An 8Gb/s/pin 4pJ/b/pin single-t-line dual (Base+RF) band simultaneous bidirectional mobile memory I/O interface," in Proc. of the IEEE International Solid-State Circuits Conference (ISSCC), pp. 50-51, 2012.

[23] J. Kim, K. Choi, et al., "Exploiting New Interconnect Technologies in On-Chip Communication," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 2, pp. 124-136, June 2012.

[24] S. Deb, A. Ganguly, P. Pande, D. Heo, B. Belzer, "Wireless NOC as interconnection backbone for multicore chips: Promises and challenges," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 2, pp. 228-239, June 2012.

[25] J. Lin et al., "Communication using antennas fabricated in silicon integrated circuits," IEEE J. Solid-State Circuits, vol. 42, no. 8, pp. 1678-1687, Aug. 2007.

[26] S. Deb et al., "Enhancing performance of Network-on-Chip architectures with millimeter-wave wireless interconnects," in Proc. IEEE Int. Conf. ASAP, pp. 73-80, 2010.

[27] K. Kempa et al., "Carbon nanotubes as optical antennae," Adv. Mater., vol. 19, pp. 421-426, 2007.

[28] D. J. Watts, S. H. Strogatz, "Collective dynamics of small-world networks," Nature, vol. 393, pp. 440-442, 1998.

[29] M. F. Chang, J. Cong, A. Kaplan, M. Naik, G. Reinman, E. Socher, S.-W. Tam, "CMP Network-on-Chip Overlaid with Multi-Band RF-Interconnect," UCLA Computer Science Department Technical Report UCLA/CSD-TR-07-0032, Dec. 2007.

[30] A. Kumar, L.-S. Peh, N. K. Jha, "Token flow control," in Proc. of the 41st IEEE/ACM International Symposium on Microarchitecture (MICRO 08), pp. 342-353, 2008.

[31] C. Xiao, M.-C. Frank Chang, J. Cong, M. Gill, Z. Huang, C. Liu, G. Reinman, H. Wu, "Stream Arbitration: Towards Efficient Bandwidth Utilization for Emerging On-Chip Interconnects," ACM Transactions on Architecture and Code Optimization, vol. 9, no. 4, Jan. 2013.

[32] A. E. Eiben, J. E. Smith, Introduction to Evolutionary Computing, Springer Berlin, 2003.

[33] M. Sipper, Evolution of Parallel Cellular Machines: The Cellular Programming Approach, Springer Berlin, 1997.

[34] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.

[35] T. Jansen, I. Wegener, "A comparison of simulated annealing with a simple evolutionary algorithm on pseudo-boolean functions of unitation," Theor. Comput. Sci., vol. 386, pp. 73-93, 2007.

[36] International Technology Roadmap for Semiconductors, 2007 edition.

[37] A. Ganguly et al., "A unified error control coding scheme to enhance the reliability of a hybrid wireless Network-on-Chip," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Nanotechnol. Syst., pp. 277-285, 2011.

[38] A. Ganguly et al., "Crosstalk-aware channel coding schemes for energy efficient and reliable NoC interconnects," IEEE Trans. Very Large Scale (VLSI) Syst., vol. 17, no. 11, pp. 1626-1639, Nov. 2009.

[39] N. Hardavellas, M. Ferdman, B. Falsafi, A. Ailamaki, "Reactive NUCA: near-optimal block placement and replication in distributed caches," in Proc. of the 36th Annual International Symposium on Computer Architecture (ISCA '09), ACM, New York, NY, USA, pp. 184-195, 2009.

[40] H. Lee, S. Cho, R. C. Bruce, "StimulusCache: Boosting Performance of Chip Multiprocessors with Excess Cache," Proc. of the IEEE Int'l Symposium on High-Performance Computer Architecture (HPCA), Bangalore, India, Jan. 2010.


Chunhua Xiao received her B.S. in Electronic Information Engineering from Shijiazhuang

Tiedao University, Hebei Province, China, in 2007, and her M.S. in Computer Science from

Beijing University of Technology, Beijing, China, in 2010. She is currently a PhD student in the

Department of Computer Science and Technology, Beijing University of Technology. Her

research interests include embedded system co-design, Multi-processor system-on-chip, and

Network-on-Chip.

Zhangqin Huang received his B.S., M.S., and PhD in Computer Science from Xian Jiaotong

University, China, in 1986, 1989 and 2000, respectively. He is currently the Deputy Director of

the Embedded Software and Systems Institute (ESSI), Beijing University of Technology (BJUT),

China. His current research interests include co-design for embedded software and hardware,

human-computer interaction based on the Internet, multi-processor system-on-chip, mass data storage, and network information security.

Da Li received his B.S., M.S., and PhD in Computer Science from Xian Jiaotong University,

China, in 2002, 2006 and 2012, respectively. He is currently an instructor at the Embedded Software

and Systems Institute (ESSI), Beijing University of Technology (BJUT). His research interests

include embedded FPGA system design and multi-core processors.

Copyright 2013 KAIS
