Noc Router1PPT

RKGIT, GHAZIABAD

PRESENTED BY

ZIYAUL HUQUE (1103331177)

YADVENDRA SINGH (1103331176)

SONY CHAURASIYA (1203331904)

B.TECH (ECE)

8th SEMESTER

RECONFIGURABLE ROUTERS FOR LOW POWER AND HIGH PERFORMANCE USING NoC TECHNOLOGY

INTRODUCTION

SOFTWARE REQUIREMENTS

WORK DONE

REFERENCES

INTRODUCTION

ABSTRACT

• Network-on-chip (NoC) designs are based on a compromise among latency, power dissipation, and energy, and the balance is defined at design time. However, fixing all parameters, such as buffer size, at design time can cause either excessive power dissipation or higher latency. The situation worsens whenever the application changes its communication pattern, e.g., when a portable phone downloads a new service. Large buffers can ensure performance during the execution of such applications, but they are the main source of power dissipation. This project therefore uses a reconfigurable router in which the buffer slots are dynamically allocated to increase the router's efficiency, even under different communication loads.

• In the proposed architecture, the depth of each buffer in the input channels of the router can be reconfigured at run time. This allows up to 52% power savings while maintaining the same performance as a homogeneous router, but with a smaller buffer size.

NETWORK ON CHIP: A QUICK INTRODUCTION

A variety of interconnection schemes are currently in use, including crossbars, buses, and NoCs. However, buses suffer from poor scalability: as the number of processing elements increases, performance degrades dramatically. Hence they are not considered for systems with many processing elements. To overcome this limitation, attention has shifted to packet-based on-chip communication networks, known as networks-on-chip (NoCs). A typical NoC consists of computational processing elements (PEs), network interfaces (NIs), and routers.

CONTINUED

The NI packetizes data before it traverses the NoC over the router backbone. Each PE is attached to an NI, which connects the PE to a local router. When a packet is sent from a source PE to a destination PE, it is forwarded hop by hop through the network according to the decision made by each router. At each router, the packet is first received and stored in an input buffer. The control logic in the router then makes the routing decision and performs channel arbitration. Finally, the granted packet traverses the crossbar to the next router, and the process repeats until the packet arrives at its destination.

NOC ADVANTAGES

• Independent implementation and optimization of layers.

• Simplified customization per application.

• Support for multiple topologies and options for different parts of the network.

PROBLEMS

• Internal network contention causes (often unpredictable) latency.

• The network occupies a significant silicon area.

• Bus-oriented IPs need smart wrappers.

• Software needs clean synchronization in multiprocessor systems.

• System designers need re-education for the new concepts.

The router design consists mainly of three parts: (1) FIFO buffer, (2) arbiter, and (3) crossbar.

1. FIFO BUFFER

Buffering is essential in an NoC for flow control and congestion control. Choosing the proper buffer size is the key issue in obtaining optimal performance.

In this design, a buffer of depth eight provides the input buffering. The length of the buffer is equal to the packet size.

Since the depth of the buffer is eight, a minimum of eight clock cycles is required for the first packet to come out of the buffer.
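The eight-cycle figure above can be illustrated with a small behavioural model. This is only a sketch in Python, not the VHDL of the design: it assumes the buffer behaves like a shift register in which a word advances one stage per clock cycle, and the names `ShiftFifo` and `cycles_until_first_output` are hypothetical.

```python
from collections import deque


class ShiftFifo:
    """Assumed model: a shift-register FIFO, one stage per clock cycle."""

    def __init__(self, depth=8):
        # Every stage starts empty (None).
        self.stages = deque([None] * depth, maxlen=depth)

    def tick(self, word_in=None):
        """Advance one clock cycle and return the word now at the output stage."""
        # appendleft on a full deque discards the word in the last stage.
        self.stages.appendleft(word_in)
        return self.stages[-1]


def cycles_until_first_output(depth=8):
    """Count the clock cycles until the first word reaches the output stage."""
    fifo = ShiftFifo(depth)
    cycle = 1
    out = fifo.tick("flit0")
    while out is None:
        cycle += 1
        out = fifo.tick()
    return cycle
```

Under this assumption, `cycles_until_first_output(8)` counts eight cycles, which matches the depth of the buffer.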

2. ARBITER

The arbiter controls the arbitration of the ports and resolves contention. It keeps the updated status of all the ports and knows which ports are free and which ports are communicating with each other.

In the proposed work, we use a fixed-priority arbiter. Packets with the same priority that are destined for the same output port are scheduled by this fixed-priority arbiter.

If several input ports request the same output port or resource during a given period of time, the arbiter decides among the competing requests according to their priorities.

The arbiter releases the output port, which is connected to the crossbar, once the last packet has finished transmission, so that other waiting packets can use the output through arbitration.

Depending on the control logic, the arbiter generates the select lines for the mux-based crossbar and the read or write signals for the FIFO buffers.
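A fixed-priority scheme as described above can be sketched in a few lines of Python. The priority order used here (Local highest, then North, South, West, East) is an assumption made for illustration, and `fixed_priority_arbiter` and `grant_vector` are hypothetical names, not part of the design.

```python
# Assumed priority order, highest priority first (not stated in the slides).
PORTS = ["Local", "North", "South", "West", "East"]


def fixed_priority_arbiter(requests):
    """Return the index of the granted input port, or None if nobody requests.

    requests[i] is True when input port i wants the output port.
    The lowest index always wins, which is what makes the priority fixed.
    """
    for index, requesting in enumerate(requests):
        if requesting:
            return index
    return None


def grant_vector(requests):
    """Return a one-hot grant vector (all False when there is no request)."""
    granted = fixed_priority_arbiter(requests)
    return [i == granted for i in range(len(requests))]
```

For example, if North and West request the same output, `fixed_priority_arbiter([False, True, False, True, False])` grants index 1 (North), and West has to wait until the port is released.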

3. CROSSBAR

A crossbar switch (also known as a cross-point switch or matrix switch) is a switch that connects multiple inputs to multiple outputs in a matrix manner. The crossbar switch in this design has 5 inputs and 5 outputs. According to the select lines generated by the arbiter, the crossbar establishes the connections between input ports and output ports.
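The mux-based behaviour can be sketched as one multiplexer per output port, each driven by its own select value. This Python model is illustrative only; the function name `crossbar` and the use of `None` for an idle output are assumptions.

```python
def crossbar(inputs, selects):
    """Model a mux-based crossbar with one multiplexer per output port.

    inputs  -- the word present at each input port
    selects -- selects[o] is the index of the input routed to output o,
               or None when output o is idle
    """
    return [None if sel is None else inputs[sel] for sel in selects]
```

With five inputs `["L", "N", "S", "W", "E"]` and select values `[1, None, 0, 4, 3]`, output 0 carries the North word, output 1 is idle, output 2 carries the Local word, and so on.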

LOOK UP TABLE (LUT)

A k-input LUT can implement any Boolean function of k variables. The inputs are used as an address into a 2^k by 1-bit memory that stores the truth table of the Boolean function. Since the size of the memory increases with the number of inputs, k, a variety of algorithms exist to optimize this mapping and reduce the memory size by mapping a Boolean network, derived from a given equation, onto a circuit of k-input LUTs. These algorithms minimize either the total number of LUTs or the number of levels of LUTs in the final circuit. Minimizing the total number of LUTs reduces the CLB requirements, while minimizing the number of LUT levels improves the delay.
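The idea that the inputs address a 2^k by 1-bit truth-table memory can be sketched as follows. The function names `build_lut` and `lut_read` are hypothetical, and treating the first input as the most significant address bit is an assumption of this sketch.

```python
from itertools import product


def build_lut(func, k):
    """Store the truth table of a k-input Boolean function as 2**k bits."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_read(table, *inputs):
    """Use the inputs as an address; the first input is the most significant bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]


def majority(a, b, c):
    """Example 3-input function: 1 when at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)


MAJORITY_LUT = build_lut(majority, 3)
```

Here `MAJORITY_LUT` holds 2^3 = 8 bits, and `lut_read(MAJORITY_LUT, 1, 0, 1)` looks up address 5 instead of evaluating any logic.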

SOFTWARE REQUIREMENTS

(a) ModelSim version 10.4

(b) Xilinx ISE version 14.7

WORK DONE

Proposed Router Architecture

A. Original Router Architecture

The original router architecture was embedded in the SoCIN NoC. SoCIN has a regular 2-D mesh topology and a parametric router architecture. The router architecture used is RaSoC, a routing switch with up to five bidirectional ports (Local, North, South, West, and East). Each port has two unidirectional channels, and each router is connected to four neighbouring routers (North, South, West, and East). This router is a VHDL soft core, parameterized in three dimensions: communication channel width, input buffer depth, and routing information width.

Fig. 3(a) shows the original input FIFO. Fig. 3(b) shows the proposed input FIFO, using the South Channel as an example. The proposed architecture uses more multiplexers to allow the reconfiguration process.

Proposed Router Architecture

If an NoC's router has a larger FIFO buffer, the throughput will be higher and the latency in the network lower, since fewer flits will be stalled in the network. Nevertheless, there is a limit to how far the FIFO depth can be increased. Since each communication has its own peculiarities, sizing the FIFO for the worst-case communication scenario compromises not only the routing area but power as well. However, if the router has a small FIFO depth, the latency will be higher, and quality of service (QoS) can be compromised. The proposed solution is a heterogeneous router, in which each channel can have a different buffer size. In this situation, if a channel has a lower communication rate than its neighbour, it may lend some of its unused buffer slots. Under a different communication pattern, the roles may be reversed or changed at run time, without a redesign step.

South Channel

In our architecture it is possible to dynamically reconfigure a different buffer depth for each channel. A channel can lend part or all of its buffer slots according to the requirements of the neighbouring buffers. To reduce connection costs, each channel may only use the available buffer slots of its right and left neighbour channels. In this way, each channel may have up to three times as many buffer slots as its original buffer, whose size is defined at design time. The previous figure shows the original and the proposed input FIFO; part (b) presents the South Channel, which we took as the example for our initial work.

CONTINUED

In accordance with this figure, each channel has five multiplexers. Two of these multiplexers control the input and output of data; they have a fixed size, independent of the buffer size. The other three multiplexers are needed to control the read and write processes of the FIFO. The size of the multiplexers that control the buffer slots increases with the depth of the buffer. These multiplexers are controlled by the FSM of the FIFO. To reduce routing and extra multiplexers, we adopted the strategy of changing the control part of each channel.

CONTINUED

When a channel fills its entire FIFO, it can borrow more buffer words from its neighbours. First the channel asks the right neighbour for buffer words, and if it still needs more, it tries to borrow from the left neighbour's FIFO. For this to work, some signals of each channel must be sent to the neighbouring channels in order to control its stored flits. As a result, each channel needs to know how many buffer words it uses in its own channel and in the neighbouring channels, and also how much of its own buffer set the neighbouring channels occupy. A control block provides these numbers.
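The borrowing order described above (own slots first, then the right neighbour, then the left neighbour) can be sketched as a simple allocation function. This is a behavioural illustration in Python under stated assumptions, not the control block of the design; the name `allocate_slots` and its parameters are hypothetical.

```python
def allocate_slots(needed, own_depth, right_free, left_free):
    """Split a demand for buffer words across own, right, and left buffers.

    needed     -- number of buffer words the channel wants to hold
    own_depth  -- slots in the channel's own FIFO
    right_free -- unused slots the right neighbour can lend
    left_free  -- unused slots the left neighbour can lend
    Returns (own, from_right, from_left); any demand beyond the total is not served.
    """
    own = min(needed, own_depth)
    from_right = min(needed - own, right_free)
    from_left = min(needed - own - from_right, left_free)
    return own, from_right, from_left
```

With a hypothetical own depth of 4 and 4 free slots in each neighbour, a demand of 10 words gives `(4, 4, 2)`. A demand of 20 saturates at `(4, 4, 4)`, which is three times the original depth.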

CONTINUED

Then, based on this information, each channel controls the storage of its flits. These flits can be stored in its own buffer slots or in the neighbouring channels' buffer slots. Each input port has a control that stores the flits, and this control is based on pointers. Each input channel needs six pointers to control the read and write processes: two pointers for its own buffer slots, two pointers for the left neighbour's buffer slots, and two more pointers for the right neighbour's buffer slots (in each case, one pointer for the read operation and one for the write operation).

RTL SCHEMATIC OF SOUTH CHANNEL

LUT REPRESENTATION OF REFERENCE CROSSBAR

CROSSBAR WITH FIXED PRIORITY ARBITER

REFERENCES

[1] L. Benini and G. De Micheli, “Networks on Chips: a New SOC Paradigm,” IEEE Computer, Jan. 2002, pp. 70–78.

[2] P. Guerrier and A. Greiner, “A Generic Architecture for on-Chip Packet-Switched Interconnections,” DATE 2000, IEEE CS Press, 2000, pp. 250–256.

[3] S. Vangal et al., “An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS,” ISSCC Dig. Tech. Papers, pp. 98–589, Feb. 2007.

[4] M. Azimi, N. Cherukuri, D. N. Jayasimha, A. Kumar, P. Kundu, S. Park, I. Schoinas, and A. S. Vaidya, “Integration challenges and tradeoff for tera-scale architectures,” Intel Technol. J., vol. 11, no. 3, Aug. 2007.

[5] L. Manferdelli, N. K. Govindaraju, and C. Crall, “Challenges and opportunities in many-core computing,” Proc. IEEE, vol. 96, no. 5, pp. 808–815, May 2008.

[6] F. Gilabert et al., “Improved Utilization of NoC Channel Bandwidth by Switch Replication for Cost-Effective Multi-Processor Systems-on-Chip,” NOCS 2010.

[7] Deming Chen, Jason Cong, Chen Dong, Lei He, Fei Li, and Chi-Chen Peng, “Technology Mapping and Clustering for FPGA Architectures with Dual Supply Voltages,” IEEE Trans. CAD, vol. 29, no. 11, Nov. 2010.

[8] S. Vangal et al., “A 5.1GHz 0.34mm2 Router for Network-on-Chip Applications,” Symp. VLSI Circuits Dig. Tech. Papers, pp. 42–43, 2007.

[9] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Oberg, M. Millberg, et al., “Network on a chip: an architecture for billion transistor era,” Proc. IEEE NorChip, 2000.

[10] W. Dally and B. Towles, “Route packets, not wires: on-chip interconnection networks,” Proc. Design Automation Conference, Jun. 2001, pp. 684–689.

[11] S. Kumar, A. Jantsch, J. Soininen, M. Forsell, M. Millberg, J. Oberg, et al., “A network on chip architecture and design methodology,” Proc. IEEE Computer Society Annual Symposium on VLSI, Apr. 2002, pp. 105–112, doi: 10.1109/ISVLSI.2002.1016885.

[12] J. Duato et al., Interconnection Networks: An Engineering Approach, Morgan Kaufmann Publishers, 2003.

Thank You
