Design of a core router using the SoCBUS on-chip network

Master's thesis in Computer Engineering at Linköping Institute of Technology

by Jimmy Svensson

Reg nr: LiTH-ISY-EX--04/3562--SE

Linköping 2004

Supervisor: Daniel Wiklund

Examiner: Dake Liu

Linköping, 2nd December 2004.

Division, Department: Institutionen för systemteknik, 581 83 Linköping

Date: 2004-12-02

Language: English

Report category: Examensarbete (Master's thesis)

ISRN: LITH-ISY-EX--04/3562--SE

URL for electronic version: http://www.ep.liu.se/exjobb/isy/2004/3562/

Title: Design of a core router using the SoCBUS on-chip network

Author: Jimmy Svensson

Keywords: OCN, SoCBUS, intellectual property, router, switch, Internet

Abstract

The evolving technology has over the past decade contributed to a bandwidth explosion on the Internet. This makes it interesting to look at the development of the workhorses of the Internet, the core routers. The main objective of this project is to develop a 16 port gigabit core router architecture using intellectual property (IP) blocks and a SoCBUS on-chip interconnection network.

The router architecture will be evaluated by making simulations using the SoCBUS simulation environment. Some changes will be made to the current simulator to make the simulations of the core router more realistic. By studying the SoCBUS network load the bottlenecks of the architecture can be found. Changes to the router design and SoCBUS architecture will be made in order to boost the performance of the router.

The router developed in this project can under normal traffic conditions handle a throughput of 16x10Gbit/s without dropping packets. This core router is good enough to compete with the top of the line single-chip core routers on the market today. The advantage of this architecture compared to others is that it is very flexible when it comes to adding new functionality. The general on-chip network also reduces the design time of this system.

Abbreviations

ACL     Access Control List
BGP     Border Gateway Protocol
CAM     Content Addressable Memory
CRC     Cyclic Redundancy Check
FT      Forwarding Table
IPP     Input Packet Processor
IP      Intellectual Property
IP      Internet Protocol
IPv4    Internet Protocol version 4
LAN     Local Area Network
LPM     Longest Prefix Match
MAN     Metropolitan Area Network
MLPS    Mega Lookups Per Second
MPLS    Multi Protocol Label Switching
MPPS    Mega Packets Per Second
MU      Multicast Unit
NP      Network Processor
OC-3    SONET Optical Carrier at 155Mbit/s
OC-48   SONET Optical Carrier at 2.5Gbit/s
OC-192  SONET Optical Carrier at 10Gbit/s
OPP     Output Packet Processor
OSI     Open Systems Interconnection
OSPF    Open Shortest Path First
PB      Packet Buffer
QoS     Quality of Service
SNMP    Simple Network Management Protocol
SONET   Synchronous Optical Network
TCP     Transmission Control Protocol
TTL     Time To Live
UDP     User Datagram Protocol
WAN     Wide Area Network

Contents

1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Method
  1.4 Thesis outline

2 Computer Networks
  2.1 Protocol Layers
    2.1.1 OSI Reference Model
    2.1.2 Internet Reference Model (TCP/IP)
  2.2 Network entities
    2.2.1 Routers
  2.3 Network Processing Tasks
    2.3.1 Classification
    2.3.2 Lookup
    2.3.3 Computation
    2.3.4 Data manipulation
    2.3.5 Queue management
    2.3.6 Control processing

3 SoCBUS on-chip network
  3.1 SoCBUS overview
  3.2 Packet connected circuit (PCC)
  3.3 Behavioral simulation environment
  3.4 Specification of the current implementation

4 Core router design
  4.1 Design flow
  4.2 Partitioning of functionality
    4.2.1 Input packet processor (IPP)
    4.2.2 Output packet processor (OPP)
    4.2.3 Forwarding table (FT)
    4.2.4 Packet buffer (PB)
    4.2.5 Multicast unit (MU)
    4.2.6 Central processing unit (CPU)
  4.3 On-chip traffic model
    4.3.1 Data path
    4.3.2 Control path
  4.4 Block interconnection using SoCBUS
    4.4.1 Number of IPP/OPPs and PBs
    4.4.2 General SoCBUS network structure
    4.4.3 Motivation of block placement
  4.5 Extraction of execution time for the functional blocks
    4.5.1 Input and Output Packet Processor (IPP/OPP)
    4.5.2 Forwarding Table (FT)
    4.5.3 Packet Buffer (PB)
    4.5.4 Multicast Unit (MU)
    4.5.5 Central Processing Unit (CPU)

5 Traffic simulations of the initial design
  5.1 Internet traffic model
    5.1.1 Minimum size packets
    5.1.2 Evenly distributed (RFC2544)
    5.1.3 Internet Mix
  5.2 Improvements in the SoCBUS simulator
    5.2.1 Implementation of a model for discrete distributions
    5.2.2 Implementation of dependencies between SoCBUS traffic
  5.3 Simulation results
    5.3.1 Throughput
    5.3.2 SoCBUS router lock
    5.3.3 SoCBUS wrapper send lock
    5.3.4 Conclusions

6 Second router design
  6.1 Improvements in the design
    6.1.1 Several forward tables
    6.1.2 More SoCBUS switches
    6.1.3 SoCBUS bus width
  6.2 Complete design after improvements
  6.3 Simulation results
    6.3.1 Throughput
    6.3.2 SoCBUS router lock
    6.3.3 SoCBUS wrapper send lock
    6.3.4 SoCBUS transfer overhead
    6.3.5 Conclusions

7 Final router design
  7.1 Improvements in the design
    7.1.1 More packet buffers and forward tables
    7.1.2 Changes in PCC
  7.2 Complete design after improvements
  7.3 Simulation results
    7.3.1 Throughput
    7.3.2 SoCBUS router lock
    7.3.3 SoCBUS wrapper send lock
    7.3.4 Conclusions
  7.4 New requirements on the functional blocks
    7.4.1 Input and output packet processors (IPP/OPP)
    7.4.2 Packet buffer (PB)
    7.4.3 Forwarding table (FT)

8 Conclusions
  8.1 Results
  8.2 Further work

A Dependency support in the SoCBUS simulator

B Supplementary results
  B.1 Initial design
  B.2 Second design
  B.3 Final design

Chapter 1

Introduction

1.1 Background

The evolving technology has over the past decade contributed to a bandwidth explosion on the Internet. Today the Internet traffic doubles every six months [6]. This rapid change forces the actors in the networking business to act fast and reduce the time to market in order to be first with a new generation of products. One way of decreasing the time to market is to increase the (re-)use of Intellectual Property (IP) blocks.

SoCBUS is a research project at Linköping University that started in 1999. The aim of this project is to develop a bus system that provides the data and control connections between different IP blocks on one chip. Designing chips with IP blocks and SoCBUS gives engineers a tool to reduce the time to market even further.

The core routers are among the most important building blocks of the Internet. Today the bandwidth achieved using fiber optics is much higher than the speed achieved by routers, so the routers are the limiting factor in the Internet. This makes the development of routers very interesting to look at.

The task of this project is to design a router chip on a system level using IP blocks and a SoCBUS on-chip interconnection network to connect the different blocks. Several benchmarks will be performed to find out the performance of the router and to evaluate the SoCBUS architecture.

1.2 Objectives

The main objective of this thesis is to design and evaluate a 16 port IP version 4 core router architecture developed using IP blocks and the SoCBUS on-chip interconnection network. The router should fulfill the requirements specified in RFC1812 [2]. A number of goals were defined.

• Evaluate different router architectures.

• Divide the router functionality into functional blocks that can be implemented as IP blocks.

• Use SoCBUS to make the interconnection between the different IP blocks.

• Perform benchmarks of the router to find bottlenecks in the design of the router and in the present SoCBUS architecture.

• Refine the router design and SoCBUS architecture to boost the performance of the router in terms of higher throughput and lower latency.

1.3 Method

The first thing to do is to study present router architectures to find an architecture that fits this type of implementation, or come up with a new type of architecture. When the theoretical design of the router is finished it has to be mapped into SoCBUS. This is done by describing the design in the SoCBUS simulator environment. To make realistic benchmarks of the design typical Internet traffic is used as input to the simulator. The result from the simulations will be analyzed and used to boost the performance of the router in terms of throughput and latency. This will be achieved by changing the design of the router and by making some changes to the current SoCBUS architecture.

1.4 Thesis outline

Chapters 2 and 3 will give the reader some background information about the technologies used in this project. Chapter 4 will describe the development of the initial router architecture. The results from simulations of this architecture are presented in chapter 5. Chapters 6 and 7 describe the process of refining the router model to improve the performance. Conclusions from the work are drawn in chapter 8.

Chapter 2

Computer Networks

This chapter will give a brief introduction to packet based computer networks. The most common network protocols and network entities will be described briefly. If you seek further knowledge in this field the book Computer Networks [5] is a good starting point.

2.1 Protocol Layers

To make computer networks easier to understand, most networks are organized as a stack of layers. Each layer can only send information to the next higher or lower layer. To exchange information between peer layers at different network nodes a header can be added to the data on the sending side. When the packet is received on the receiver side the corresponding layer examines the header.
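The header mechanism described above can be illustrated with a minimal sketch. The three layer names and the bracketed header strings are hypothetical and chosen only for illustration; real protocols use binary headers with many fields.

```python
# Hypothetical three-layer stack; the header format "[layer]" is invented
# for illustration and does not correspond to any real protocol.
LAYERS = ["transport", "network", "link"]


def encapsulate(payload: bytes) -> bytes:
    """Sending side: each layer prepends its own header on the way down."""
    for layer in LAYERS:
        payload = f"[{layer}]".encode() + payload
    return payload


def decapsulate(frame: bytes) -> bytes:
    """Receiving side: each layer checks and strips its peer's header on the way up."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"{layer} header missing")
        frame = frame[len(header):]
    return frame


frame = encapsulate(b"hello")
print(frame)               # the outermost header belongs to the lowest layer
print(decapsulate(frame))  # the original payload is recovered
```

Each layer only looks at its own header, which is what allows a layer to be replaced without changing the layers above it.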

2.1.1 OSI Reference Model

The Open Systems Interconnection (OSI) model is a standardized way of describing protocol layers. It is composed of seven abstract layers as illustrated in figure 2.1.

[Figure: Host A and Host B each implement the seven OSI layers, from top to bottom: 7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Data Link, 1 Physical. The two hosts are joined by a physical link.]

Figure 2.1. The OSI reference model.

2.1.2 Internet Reference Model (TCP/IP)

Although the OSI model is widely used and often referred to, the explosive development of the Internet has made the TCP/IP protocol stack totally dominant. TCP/IP uses a four layer scheme composed of the layers presented below. Some examples of protocols and networks used in the TCP/IP model are shown in figure 2.2.

[Figure: protocols and networks by OSI layer. Layer 7: DNS, FTP, SMTP. Layer 4: TCP, UDP. Layer 3: IP. Layers 1+2: SATNET, ARPANET, LAN.]

Figure 2.2. Protocols and networks in the TCP/IP model.

Link Layer

This layer defines the network hardware and device drivers. In this layer different protocols are used depending on the size of the network. When the size of a computer network is described one often refers to LAN (Local Area Network), MAN (Metropolitan Area Network) and WAN (Wide Area Network). Table 2.1 shows the protocols typically used in the different networks.

Network size   Link layer protocols
LAN            Ethernet
MAN            Ethernet or packet over SONET
WAN            Packet over SONET

Table 2.1. Data link layer protocols used at different network sizes.

Typical speeds for Ethernet are 100Mbit/s or 1Gbit/s while packet over SONET typically runs at 155Mbit/s (OC-3), 2.5Gbit/s (OC-48) or 10Gbit/s (OC-192). These speeds are worth keeping in mind because the router design in this project will have to support one or several of these standards.
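To give a feeling for what these line rates mean for a router, the sketch below converts each rate into a packet rate and a per-packet time budget. The 40 byte packet size is an assumption made here for illustration (it is not taken from this chapter), and link layer framing overhead is ignored.

```python
# Line rates quoted in the text, in bit/s.
LINE_RATES_BPS = {
    "Ethernet 100Mbit/s": 100e6,
    "Ethernet 1Gbit/s": 1e9,
    "OC-3": 155e6,
    "OC-48": 2.5e9,
    "OC-192": 10e9,
}

# Assumption for illustration only: small 40 byte packets, no framing overhead.
ASSUMED_PACKET_BYTES = 40


def packets_per_second(rate_bps: float, packet_bytes: int = ASSUMED_PACKET_BYTES) -> float:
    """Number of back-to-back packets of the given size that fit in one second."""
    return rate_bps / (packet_bytes * 8)


def time_budget_ns(rate_bps: float, packet_bytes: int = ASSUMED_PACKET_BYTES) -> float:
    """Time available to process one packet before the next one arrives."""
    return 1e9 / packets_per_second(rate_bps, packet_bytes)


for name, rate in LINE_RATES_BPS.items():
    mpps = packets_per_second(rate) / 1e6
    print(f"{name:20s} {mpps:8.3f} Mpackets/s  {time_budget_ns(rate):8.1f} ns per packet")
```

Under these assumptions a single OC-192 port leaves only a few tens of nanoseconds per packet, which is why the packet rate rather than the raw bit rate tends to drive the design.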

Network Layer

This layer handles simple communication with neighbors on the local network. IP is a typical network layer protocol that is used in the Internet. This protocol makes it possible to identify each computer on the network using an arbitrary identifier called an IP address.

Transport Layer

The transport layer handles communication between the actual source and destination even if the computers are not on the same local network. Typical transport layer protocols used on the Internet are TCP and UDP.

Application Layer

This is the end-user application encapsulation. Typical application layer protocols are DNS, HTTP and SMTP.

2.2 Network entities

The entities present on the Internet can be divided into two major groups, routers and terminals. Furthermore routers that connect other routers with high bandwidth in backbone networks are normally denoted core routers. At the edge of the Internet edge routers provide an access point between the local network and the Internet. The structure used on the Internet is shown in figure 2.3. Network terminals can be anything from desktop computers to file server systems. Wireless applications like mobile phones are also classified as network terminals.

[Figure: core routers form the Internet core; edge routers connect terminals to the core.]

Figure 2.3. Typical network structure used on the Internet.

2.2.1 Routers

The Internet is a packet-switched communication network based on the IP protocol. This means that no dedicated communication channels from source to destination are created. Each network entity receives, stores, and forwards the packet to the closest entity along the path towards the destination host.

Routers can operate at different layers in the protocol stack. Layer 2 routers are commonly denoted as switches and make all routing decisions based on information in the OSI layer 2 protocol header, while layer 3 routers are commonly denoted as routers and make all routing decisions based on information in the OSI layer 3 protocol header. The basic functionality of Internet routers can be divided into the following parts.

Packet forwarding

Packet forwarding can be divided into two different parts, unicast and multicast. With multicast the incoming packet is forwarded to one or several output ports, while a unicast packet can only be forwarded to one output port. The packet should be forwarded so that it eventually reaches its destination(s). To decide the output port(s) to which the packet will be forwarded the router examines the incoming packet header. By using the destination address as an index into the routing table the router can find the most appropriate output port(s). Because of the time limits of this work the multicast standard will not be discussed in more detail.

Route Processing

Because of the dynamic properties of the Internet, routers implement different routing protocols to share connectivity information and maintain routing tables. This information is needed to make correct decisions when a packet is forwarded.

The routing on the Internet is divided into two layers. On each local network an interior gateway routing protocol is used by the routers to determine the best way to the destination. These routing algorithms can be grouped into two major classes: nonadaptive and adaptive. Nonadaptive algorithms do not base their routing decisions on measurements or estimations of the current traffic and network topology. Instead the choice of route is static and is downloaded to the routers when the network is booted. Adaptive algorithms change their routing decisions to reflect the changes in the network and traffic. One of the most widely used adaptive routing algorithms is OSPF (Open Shortest Path First).

Between the different networks, in the core of the Internet, an exterior gateway routing protocol is used. This protocol connects the different service providers using a unique ID for each network, called an Autonomous System (AS). Because this traffic involves for example crossing international borders or being forwarded through another service provider the exterior gateway protocol needs to be very flexible when it comes to routing policies. The protocol used on the Internet today is called BGP (Border Gateway Protocol) and is designed to allow many different kinds of routing policies.
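As an illustration of the interior routing described above, OSPF routers compute routes with a shortest path first calculation (Dijkstra's algorithm) over a map of the network. The sketch below shows only that calculation; the four-node topology and its link costs are hypothetical, and the OSPF protocol itself (flooding of link state, areas, timers) is not modelled.

```python
import heapq


def shortest_paths(graph: dict, source: str) -> dict:
    """Dijkstra's algorithm: lowest total link cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Hypothetical topology: router name -> {neighbor: link cost}.
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

print(shortest_paths(topology, "A"))
```

When a link cost changes, rerunning the calculation yields new routes, which is the sense in which such an algorithm is adaptive.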

Router special services

Tasks that fall into this category are filtering, traffic prioritizing, authentication and network management using for example SNMP (Simple Network Management Protocol). These services are not critical for the basic router functionality and will not be further discussed in this thesis.

2.3 Network Processing Tasks

Processors used in network entities are generally called network processors. The tasks generally performed by network processors are described below.

2.3.1 Classification

To make a decision on how to process the incoming packets each packet has to be classified. The classification consists of both pattern matching and field value extraction. In routers you typically want to match the destination address against the access control list (ACL) to see if the packet should be forwarded or not. The pattern matching is performed either by calculation or lookup tables.
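The two steps named above, field value extraction and pattern matching, can be sketched as follows. The ACL entries and the default action are hypothetical; the only fact relied on is that the destination address occupies bytes 16 to 19 of an IPv4 header.

```python
import ipaddress

# Hypothetical access control list: the first matching entry decides the action.
ACL = [
    (ipaddress.ip_network("10.0.0.0/8"), "deny"),
    (ipaddress.ip_network("0.0.0.0/0"), "permit"),
]


def extract_destination(header: bytes) -> ipaddress.IPv4Address:
    """Field value extraction: bytes 16..19 of an IPv4 header hold the destination."""
    return ipaddress.IPv4Address(header[16:20])


def classify(header: bytes) -> str:
    """Pattern matching: compare the destination against each ACL entry in order."""
    destination = extract_destination(header)
    for network, action in ACL:
        if destination in network:
            return action
    return "deny"  # assumed default when nothing matches


def dummy_header(destination: str) -> bytes:
    """Build a 20 byte header that is all zero except for the destination field."""
    return bytes(16) + ipaddress.IPv4Address(destination).packed


print(classify(dummy_header("10.1.2.3")))
print(classify(dummy_header("192.0.2.1")))
```

A linear scan like this is only practical for short lists; hardware classifiers typically use lookup tables or content addressable memory instead.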

2.3.2 Lookup

The lookup consists of looking up data based on a key, but is often used in conjunction with pattern matching to find one unique entry in the table. The most common application of lookup in the network processor domain is the route lookup. Based on data in the packet header the destination port and/or address is calculated. For MPLS and ATM the mapping is often one to one and only one lookup is required, but IPv4 and IPv6 require Longest Prefix Matching (LPM). Tree like data structures are often used to efficiently store the table and to speed up the lookup.
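A minimal sketch of longest prefix matching with a tree-like structure is given below, using a binary trie with one level per address bit. The class name, the routes and the port numbers are hypothetical; production routers use compressed tries or CAM based solutions for speed.

```python
import ipaddress


class PrefixTrie:
    """Binary trie for IPv4 longest prefix match (illustrative, not optimized)."""

    def __init__(self):
        # Each node is a dict with optional children "0" and "1" and an optional "port".
        self.root = {}

    def insert(self, prefix: str, port: int) -> None:
        network = ipaddress.ip_network(prefix)
        bits = format(int(network.network_address), "032b")[: network.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["port"] = port

    def lookup(self, address: str):
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("port")  # default route, if one was inserted
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("port", best)  # remember the longest match so far
        return best


table = PrefixTrie()
table.insert("0.0.0.0/0", 0)
table.insert("10.0.0.0/8", 1)
table.insert("10.1.0.0/16", 2)

print(table.lookup("10.1.2.3"))   # matches /0, /8 and /16; the /16 entry wins
print(table.lookup("10.2.0.1"))   # matches /0 and /8
print(table.lookup("192.0.2.1"))  # only the default route matches
```

The lookup walks at most 32 levels for IPv4, which is why the depth of the tree, and not the number of routes, bounds the lookup time.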

2.3.3 Computation

The most common calculation performed in a network processor is the calculation and/or updating of the header checksum or Cyclic Redundancy Check (CRC). With the new support for authentication, encryption and decryption algorithms sometimes need to be applied to the entire packet.
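As an example of such a calculation, the IPv4 header checksum is the one's complement of the one's complement sum of all 16-bit words in the header, with the checksum field itself taken as zero. The sketch below follows that definition; the sample header is only an illustration.

```python
def ipv4_checksum(header: bytes) -> int:
    """One's complement checksum over a header whose checksum field is zero."""
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


# Sample 20 byte header with the checksum field (bytes 10..11) set to zero.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ipv4_checksum(sample)
print(hex(checksum))

# A receiver runs the same sum over the header including the checksum;
# a result of zero means the header is intact.
filled = sample[:10] + checksum.to_bytes(2, "big") + sample[12:]
print(ipv4_checksum(filled))
```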

2.3.4 Data manipulation

Any modification of the packet header is classified as data manipulation. This could for example be the TTL-field in IPv4 that for every hop has to be decremented by one.
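Because the TTL field is covered by the IPv4 header checksum, decrementing it also requires a checksum update. The sketch below patches the checksum incrementally from the old and new value of the 16-bit word that holds the TTL, instead of summing the whole header again. The function name is hypothetical, and raising an error for a TTL of zero is an assumption made here; the drop policy of a real router is not described in this chapter.

```python
def decrement_ttl(header: bytes) -> bytes:
    """Return a copy of an IPv4 header with TTL - 1 and an updated checksum."""
    if header[8] == 0:
        raise ValueError("TTL is already zero; the packet should not be forwarded")
    hdr = bytearray(header)
    old_word = (hdr[8] << 8) | hdr[9]  # TTL and protocol share one 16-bit word
    hdr[8] -= 1
    new_word = (hdr[8] << 8) | hdr[9]
    old_sum = (hdr[10] << 8) | hdr[11]
    # Incremental update: new = ~(~old_sum + ~old_word + new_word), one's complement.
    total = (~old_sum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    new_sum = ~total & 0xFFFF
    hdr[10], hdr[11] = new_sum >> 8, new_sum & 0xFF
    return bytes(hdr)


# Illustrative header with TTL 0x40 and a valid checksum 0xb861.
before = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
after = decrement_ttl(before)
print(after[8], after[10:12].hex())
```

Only three 16-bit words are touched, so the cost of the update does not depend on the header length.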

2.3.5 Queue management

The queue management is the scheduling and storage of the packets inside the network processor. The queue management kernel is responsible for traffic priorities, traffic shaping and other Quality of Service (QoS) applications.
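One simple way to realise traffic priorities is strict priority scheduling, sketched below. This is only one of several possible policies and is not claimed to be the one used in any particular network processor; the class name and the number of priority levels are hypothetical.

```python
from collections import deque


class StrictPriorityScheduler:
    """One FIFO queue per priority level; level 0 is served first."""

    def __init__(self, levels: int):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, packet, priority: int) -> None:
        self.queues[priority].append(packet)

    def dequeue(self):
        """Return the oldest packet of the highest non-empty priority, or None."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


scheduler = StrictPriorityScheduler(levels=2)
scheduler.enqueue("bulk-1", priority=1)
scheduler.enqueue("voice-1", priority=0)
scheduler.enqueue("bulk-2", priority=1)

order = [scheduler.dequeue() for _ in range(4)]
print(order)
```

Strict priority can starve the lower levels under sustained high priority load, which is one reason traffic shaping is usually combined with it.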

2.3.6 Control processing

Control processing consists of several tasks, for example synchronization of the different parts of the design, the gathering of statistics and routing table updates. These tasks are generally performed by a general purpose processor.

Chapter 3

SoCBUS on-chip network

This chapter will give an introduction to the SoCBUS on-chip interconnection network developed at Linköping University [11]. To clarify the SoCBUS concept the terms System on Chip (SoC) and Network on Chip (NoC) will first be introduced.

The continuing development in modern electronics enables increasingly larger systems to be integrated on single chips. This is called Systems-on-Chip (SoC). When more and more different functionalities are added to a chip the need for a flexible on-chip bus system with support for multiple and simultaneous connections is vital. One way of achieving this is to add a general network that enables the different functional blocks to communicate with each other. This feature is called Network-on-Chip (NoC).

3.1 SoCBUS overview

The SoCBUS on-chip network consists of a number of switches that can be connected to each other using any network topology. Each switch is connected to one functional block or IP core through a wrapper and an arbitrary number of neighboring switches. Figure 3.1 shows an example of the switch interconnection when a 2D mesh network is used. An example of a 2D mesh network is shown in figure 3.2.

[Figure: a switch connected to an IP core through a wrapper.]

Figure 3.1. Network connected processing tile.

[Figure: sixteen IP cores, each with its own wrapper, connected in a 2D mesh.]

Figure 3.2. Overview of a SoCBUS network.

The routing decisions are made in the switches and are implemented using a static routing table. Each entry in the table shows the possible outputs that will take the route closer to the destination. This information is combined with the dynamic state of the output ports to select the appropriate output for a routing request.
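The combination of a static table and dynamic port state can be sketched as below. The table contents, the port names and the rule of taking the first free candidate are assumptions made for illustration; the actual selection logic of the SoCBUS switch is not specified in this section.

```python
# Hypothetical static routing table of one switch:
# destination -> output ports that lead closer to it, in order of preference.
ROUTING_TABLE = {
    "ip3": ["east", "south"],
    "ip7": ["south"],
}


def select_output(destination: str, locked_ports: set):
    """Pick the first candidate output that is not currently locked, else None."""
    for port in ROUTING_TABLE.get(destination, []):
        if port not in locked_ports:
            return port
    return None  # no usable output: the routing request cannot proceed here


print(select_output("ip3", locked_ports=set()))
print(select_output("ip3", locked_ports={"east"}))
print(select_output("ip3", locked_ports={"east", "south"}))
```

The table never changes at run time; only the set of locked ports does, which keeps the decision in the switch simple.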

The wrappers form the interface between the network and the IP blocks. The wrappers handle format conversion, necessary buffering, asynchronous clock domain bridging and network signaling.

3.2 Packet connected circuit (PCC)

The SoCBUS network uses a novel style of circuit switching called Packet Connected Circuit (PCC). When data is sent from one IP block to another a request packet first traverses the network to find the way to the destination. While doing this the packet path is locked and used as a circuit connection for the packet payload. If the route cannot be established the request is sent back to the source and all switches in that path will be unlocked. Once the connection is established the data is sent. The last data packet will unlock the switches on the way to the destination. Figure 3.3 shows the PCC connection scheme.

[Figure: message sequences between source and destination. (a) First try successful: Request, Ack, Transfer, Cancel. (b) Second try successful: Request, nAck, Retry, Ack, Transfer, Cancel.]

Figure 3.3. PCC transfer.
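
The PCC handshake can be illustrated with a small model. This is only a sketch under simplifying assumptions, not the SoCBUS implementation: the `Switch` class, the `request` and `pcc_transfer` functions and the `max_tries` parameter are hypothetical names, the path is given up front instead of being chosen from a routing table, and a retry happens immediately.

```python
class Switch:
    """Hypothetical switch model: either free or locked by one circuit."""

    def __init__(self, name):
        self.name = name
        self.locked = False


def request(path):
    """Send a request along the path, locking each free switch.

    Returns True (Ack) if the whole path was locked. If a switch is already
    locked, the switches locked so far are released and False (nAck) is
    returned.
    """
    taken = []
    for sw in path:
        if sw.locked:
            for t in taken:
                t.locked = False
            return False
        sw.locked = True
        taken.append(sw)
    return True


def pcc_transfer(path, payload, max_tries=3):
    """Try to set up a circuit, send the payload, then release the path."""
    for attempt in range(1, max_tries + 1):
        if request(path):
            delivered = list(payload)   # data flows over the locked circuit
            for sw in path:             # the last data packet unlocks the switches
                sw.locked = False
            return attempt, delivered
    return None, []
```

A free path gives a successful first try, while a path with a switch held by another circuit is refused and leaves no switch locked by the failed request.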

3.3 Behavioral simulation environment

A complete simulation environment has been developed for making simulations of the SoCBUS on-chip interconnection network. This simulation environment consists of two parts, a stimuli generator and the actual simulator.

The stimuli generator is a tool for creating interesting traffic patterns used as input to the simulator. Several mathematical models can be used to specify different traffic properties like start time and size of packet. The input to the stimuli generator is described using XML.

The simulator performs the actual simulation of the network. The output of the stimuli generator together with a description of the network structure is given as input. The simulator is event based and all components like routers, links, sources and destinations are implemented as compiled-in behavioral models. The output of the simulator consists of different measurements of the network. This could for example be the lock time of each SoCBUS switch. More information about the simulation environment can be found in the SoCBUS Simulator manual [12]. The general SoCBUS simulation flow is shown in figure 3.4.


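[Figure: the traffic model is the input to the stimuli generator, which produces a stimuli file; the simulator takes the stimuli file, the network model and the component models and produces the results.]

Figure 3.4. SoCBUS simulation flow.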

3.4 Specification of the current implementation

The properties of the current simulation environment are based on a real implementation of the switches used in the SoCBUS on-chip network. The bus width is set to 16 bits in each direction and is clocked at 1.2 GHz. The latency in the switches is 6 cycles during the route setup and 1 cycle for data transfer.
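
As a rough illustration of what these numbers mean, the raw link bandwidth and the latency added by the switches can be computed directly. The function names and the four-switch path in the usage note are assumptions made for this example only.

```python
BUS_WIDTH_BITS = 16           # per direction, from the specification above
CLOCK_HZ = 1.2e9              # 1.2 GHz
SETUP_CYCLES_PER_SWITCH = 6   # latency during route setup
DATA_CYCLES_PER_SWITCH = 1    # latency during data transfer


def link_bandwidth_gbit():
    """Raw bandwidth of one link direction in Gbit/s."""
    return BUS_WIDTH_BITS * CLOCK_HZ / 1e9


def switch_latency_ns(hops, cycles_per_switch):
    """Latency contributed by `hops` switches, in nanoseconds."""
    return hops * cycles_per_switch / CLOCK_HZ * 1e9
```

With these values one link direction carries 19.2 Gbit/s, and a request crossing four switches spends about 20 ns in the switches during setup, compared with roughly 3.3 ns for data on an established circuit.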


Chapter 4

Core router design

In this chapter the initial router design will be described. The different functional blocks used in the design are introduced and one mapping of functional blocks onto a SoCBUS network will be presented.

4.1 Design flow

The design flow used in this project can be described in the following six points.

1. Partition the router functionality into several functional blocks.

2. Write down the specifications of each block and find a way of implementing them that fulfills the requirements on time and functionality.

3. Extract the execution time of each function in each functional block to establish a database for system timing and scheduling.

4. Connect the different blocks using the SoCBUS on-chip network.

5. Make simulations of the SoCBUS network using realistic traffic.

6. Make modifications in the design until the design fulfills the requirements on time and functionality.

Steps five and six may have to be iterated several times to fulfill the requirements on performance or to boost the performance beyond the requirements.

4.2 Partitioning of functionality

This router will be implemented as one main chip that performs the actual routing and one or several other chips that will be needed for the physical interfaces and for processing of the media dependent protocols. The latter chips will not be discussed further since this thesis only focuses on the development of a core routing kernel.


To make use of the SoCBUS on-chip network architecture the functionality has to be divided into different functional blocks. A number of functional blocks have been identified and their main functionality is described below.

4.2.1 Input packet processor (IPP)

There is one input packet processor (IPP) for each network input port. The IPP is responsible for the identification and verification of the incoming packet. Each IPP contains an access control list (ACL) with rules about which packets are allowed to be forwarded from this input port. Routing or control packets will be directly forwarded to the CPU and multicast packets will be forwarded to the multicast unit. The processing of unicast packets consists of the following parts:

• Receive packet from input network interface.

• Associate the packet with a unique ID (32 bits).

• Validate IPv4 header (TTL and header checksum).

• Filter packets via the Access Control List (ACL).

• Send the destination IP address together with the packet ID to the forward table.

• Send packet data and ID to the corresponding packet buffer.
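
The header validation step in the list above can be sketched as follows. This is an illustrative software model, not the IPP hardware: the function names are hypothetical, and the rule that a packet arriving with a TTL of 0 or 1 is rejected is an assumption about the forwarding policy.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """Check version, length, TTL and header checksum of an IPv4 header."""
    if len(header) < 20:
        return False
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4 or ihl < 5 or len(header) < ihl * 4:
        return False
    ttl = header[8]
    if ttl <= 1:   # assumed policy: the packet cannot be forwarded any further
        return False
    # A correct header, checksum field included, gives zero after complement.
    return ipv4_checksum(header[: ihl * 4]) == 0
```

Computing the checksum over a header whose checksum field is zero yields the value to store; computing it over a complete, correct header yields zero, which is what the validity check relies on.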

4.2.2 Output packet processor (OPP)

There is one output packet processor (OPP) for each network output port. The OPP is responsible for updating the packet headers, including calculation of checksum and CRC. The processing of a packet consists of these parts:

• Receive packet from buffer.

• Update IPv4 header (TTL and header checksum).

• Send packet to the output network interface.
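
The header update can be done without summing the whole header again. The sketch below uses the incremental checksum update from RFC 1624, HC' = ~(~HC + ~m + m'), where m is the 16-bit word holding TTL and protocol. The function names are hypothetical, and the thesis does not state which method the OPP uses.

```python
def ones_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def decrement_ttl(header: bytes) -> bytes:
    """Return a copy of the IPv4 header with TTL - 1 and an updated checksum."""
    hdr = bytearray(header)
    if hdr[8] == 0:
        raise ValueError("TTL already zero")
    old_word = (hdr[8] << 8) | hdr[9]      # TTL and protocol before the update
    hdr[8] -= 1
    new_word = (hdr[8] << 8) | hdr[9]      # TTL and protocol after the update
    old_sum = (hdr[10] << 8) | hdr[11]     # stored header checksum
    s = ones_add(~old_sum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_add(s, new_word)
    new_sum = ~s & 0xFFFF
    hdr[10], hdr[11] = new_sum >> 8, new_sum & 0xFF
    return bytes(hdr)
```

Only one header word changes when the TTL is decremented, so the incremental form needs two additions regardless of header length.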

4.2.3 Forwarding table (FT)

The forward table (FT) block is responsible for the address lookups. A typical lookup consists of the following parts:

• Receive packet destination IP address and ID from the IPP.

• Perform lookup.

• Send the result of the lookup (next-hop and output port) together with the packet ID to the corresponding buffer.
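
The lookup step can be illustrated with a longest-prefix match over a small route list. This is a sketch for clarity only: the `ForwardTable` class and its methods are hypothetical, the linear search stands in for whatever lookup structure a real forward table would use, and the addresses in the usage note are example values.

```python
import ipaddress


class ForwardTable:
    """Hypothetical forward table using a linear longest-prefix match."""

    def __init__(self):
        self.routes = []   # list of (network, next_hop, output_port)

    def add_route(self, prefix, next_hop, port):
        self.routes.append((ipaddress.ip_network(prefix),
                            ipaddress.ip_address(next_hop), port))

    def lookup(self, packet_id, dst):
        """Return (packet_id, next_hop, port) for the most specific match."""
        addr = ipaddress.ip_address(dst)
        best = None
        for net, hop, port in self.routes:
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, hop, port)
        if best is None:
            return None    # no route; handling of this case is not modeled
        return packet_id, str(best[1]), best[2]
```

With routes for 10.0.0.0/8 and 10.1.0.0/16, a lookup of 10.1.2.3 selects the /16 entry because it is the more specific match. The packet ID is passed through unchanged so that the packet buffer can pair the result with the stored packet.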


4.2.4 Packet buffer (PB)

The packet buffer stores the packet until the lookup result from the FT is received and the corresponding output port is available. The general data flow is shown in figure 4.1. Because of the bandwidth limit in the SoCBUS network there will be no central packet buffer; instead each packet buffer will be responsible for a number of input packet processors. These are the tasks of the PB:

• Receive packet from the IPP.

• Save packet data to the buffer memory.

• When the next-hop address and output port have been received from the FT, the next-hop is bundled with the packet and sent to the OPP which is responsible for the given output port.

[Figure: data flow between the IPP, FT, PB and OPP blocks.]

Figure 4.1. Traffic flow.

4.2.5 Multicast unit (MU)

The multicast unit (MU) is responsible for delivering multicast packets. These are the tasks of the MU:

• Receive multicast packets from the IPP.

• Send routing requests to the FT.

• When the next-hop addresses and output ports have been received from the FT, the next-hop address(es) are bundled with the packet(s) and sent to the OPP(s) which are responsible for the given output port(s).

4.2.6 Central processing unit (CPU)

The CPU will be responsible for the tasks that have no dedicated block. These are the tasks of the CPU:

• Configure and synchronize the functional blocks.

• Handle routing packets (BGP and OSPF) and distribute route information to the forward table(s).


• Handle control packets dedicated for the router. This could for example be SNMP requests.

• Gather statistics from the different blocks.

4.3 On-chip traffic model

In this section the general SoCBUS traffic flow, independent of the Internet traffic model, will be described. The traffic flow is divided into two different types, data path and control path.

4.3.1 Data path

The data path consists of transferring the incoming packets to the correct output port depending on the packet header. In the typical case the packet is forwarded to only one output port, but it could also be forwarded to several. This feature is called multicast. Multicast packets will not be taken into consideration at this point of the study.

The data path traffic flow is critical and determines the speed of the router. A more detailed description of the data path is given below. Each transfer is associated with a traffic number.

1. Send packet payload from IPP-i to PB-j.

2. Send destination IP address and packet ID from IPP-i to FT.

3. Send next-hop address and output port from FT to PB-j.

4. Send packet payload and next-hop from PB-j to OPP-k.

By looking at the description of the data path one realizes that there are dependencies between the different transfers. These dependencies are described below.

• Traffic number 3 depends on the completion of number 2 and on the route lookup being finished.

• Traffic number 4 depends on number 1 and 3.
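
The four transfers and their dependencies form a small graph, and the earliest finish time of each transfer follows from it. The sketch below is only an illustration: the `TRANSFERS` table restates the lists above, while `finish_times`, its parameters and the durations in the usage note are hypothetical.

```python
# Transfer number -> (description, transfers it depends on).
TRANSFERS = {
    1: ("IPP-i -> PB-j: packet payload", []),
    2: ("IPP-i -> FT: destination address and packet ID", []),
    3: ("FT -> PB-j: next-hop and output port", [2]),
    4: ("PB-j -> OPP-k: payload and next-hop", [1, 3]),
}


def finish_times(durations, lookup_time=0.0):
    """Earliest finish time of each transfer when dependencies are respected.

    `durations` maps transfer number to its duration and `lookup_time` is the
    route lookup delay inserted before transfer 3. Units are arbitrary.
    """
    done = {}
    for n in sorted(TRANSFERS):   # the numbering is already a valid order
        deps = TRANSFERS[n][1]
        start = max((done[d] for d in deps), default=0.0)
        if n == 3:
            start += lookup_time
        done[n] = start + durations[n]
    return done
```

With durations of 5, 1, 1 and 5 time units and a lookup time of 2, transfer 4 is held back by the payload transfer and finishes at time 10. If the lookup takes 10 units instead, the lookup chain dominates and transfer 4 finishes at time 17.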

To simplify the understanding of the traffic flow it can be described using flow graphs. Figure 4.2 describes the data path traffic flow of an 8 port router. The numbers present on the arrows are traffic numbers.


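[Figure: flow graph in which IPP1-4 and IPP5-8 send transfer 1 to the packet buffers PB1 and PB2 and transfer 2 to the FT, the FT sends transfer 3 to the packet buffers, and the packet buffers send transfer 4 to the OPPs.]

Figure 4.2. Traffic flow, 8 port router.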

4.3.2 Control path

Traffic belonging to the control path is not critical for the router functionality in the same direct way as the data path traffic. The control path traffic is given below.

• Routing packets dedicated for the router. These packets are processed by the CPU.

• Routing updates from the CPU to the forward table.

• Configuration of the different blocks. This is done by the CPU.

• Gathering of statistics from the different blocks.

4.4 Block interconnection using SoCBUS

In this section the task of connecting the different functional blocks using the SoCBUS interconnection network will be described.

A general view of the core router SoCBUS network is given in figure 4.3. In this figure there can be several instances of the input/output packet processors and packet buffers.

[Figure: the IPP, OPP, PB, MU, CPU and FT blocks connected to the SoCBUS interconnect.]

Figure 4.3. Initial core router block diagram.


4.4.1 Number of IPP/OPPs and PBs

In the current architecture the number of input and output packet processors is simply determined by the number of ports on the actual router. If the line speed of the router gets very high, one possibility to boost the performance is to let several input/output packet processors serve one port. The number of packet buffers needed is more difficult to determine. If one global packet buffer is used, the throughput in the packet buffer will be equal to the aggregate bandwidth of the router. In the case of a 16 port gigabit router this would be 16 Gbit/s in each direction. This can be compared with the maximum data throughput in one SoCBUS node, which is 19 Gbit/s. This throughput is the theoretical maximum and will in reality never be achieved because of the overhead in the PCC protocol. This will be discussed further in the section covering Internet traffic models. The number of packet buffers in the initial design was determined by simulations to be four.
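
The comparison can be made concrete by computing how much of a node's theoretical throughput each packet buffer would need. The function names are hypothetical, and the even split of ports over the buffers is an assumption of this sketch.

```python
NODE_MAX_GBIT = 19.0   # theoretical maximum throughput of one node, from the text


def buffer_load_gbit(ports, line_rate_gbit, buffers):
    """Traffic in each direction through one packet buffer node, in Gbit/s."""
    return ports * line_rate_gbit / buffers


def node_utilization(ports, line_rate_gbit, buffers):
    """Fraction of the theoretical node throughput used by one packet buffer."""
    return buffer_load_gbit(ports, line_rate_gbit, buffers) / NODE_MAX_GBIT
```

For 16 gigabit ports, a single global buffer would need about 84% of the theoretical maximum in each direction, which leaves little room for the PCC overhead. Four buffers bring this down to about 21% per buffer node.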

4.4.2 General SoCBUS network structure

In SoCBUS you have the freedom to choose any kind of network structure to connect the different blocks. In this design a 2D mesh network is used. A 2D network is easy to understand and to get an overview of. It is also easy to physically implement the wiring and switches when using a 2D mesh structure. The SoCBUS network is shown in figure 4.4.

[Figure: 2D mesh holding IPP 1-16, OPP 1-16, four PBs, the MU, the FT and the CPU.]

Figure 4.4. SoCBUS network for the initial design.


4.4.3 Motivation of block placement

The current simulation environment does not provide any optimization of block placement, so the task of placement optimization is very much based on “trial and error”. There are however some rules of thumb when designing the network. Blocks that communicate a lot with each other should be placed close to each other. By doing this the transfer will finish faster and fewer SoCBUS nodes will be locked during the transfer. Another important thing is to try to distribute the network load over the whole network. Several placement strategies had to be tested in the traffic simulator to find a good one. The final mapping is shown in figure 4.4.

4.5 Extraction of execution time for the functional blocks

This thesis does not focus on the details of the implementation of the different functional blocks; modeling of the execution time and of the dependencies between the different blocks is more interesting. Still, the requirements on the blocks have to be realistic. In this section the requirements in terms of execution time on the functional blocks will be discussed. In some cases a reference design will be given to ensure that the requirements are realistic.

4.5.1 Input and Output Packet Processor (IPP/OPP)

To handle a speed of 1 Gbit/s per port the execution time for this block has to be less than 672 ns, assuming the worst case scenario of only minimum size packets. The functionality of the port processors is very much the same as that of the packet processor for terminals implemented by Ulf Nordqvist [8].
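
The thesis does not spell out how the 672 ns figure is obtained. One way to arrive at it, assuming standard Ethernet framing where every frame is accompanied by 8 bytes of preamble and a 12 byte interframe gap, is sketched below; the constant and function names are chosen for this example.

```python
PREAMBLE_BYTES = 8          # preamble and start-of-frame delimiter (assumed framing)
MIN_FRAME_BYTES = 64        # minimum size Ethernet frame
INTERFRAME_GAP_BYTES = 12   # idle time between frames, expressed in bytes


def packet_time_ns(frame_bytes, line_rate_gbit):
    """Time one frame occupies the line, including preamble and gap."""
    bits = (PREAMBLE_BYTES + frame_bytes + INTERFRAME_GAP_BYTES) * 8
    return bits / line_rate_gbit   # bits divided by Gbit/s gives ns
```

A minimum size frame then occupies 84 bytes, or 672 bits, on the line, which at 1 Gbit/s is 672 ns per packet.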

4.5.2 Forwarding Table (FT)

The worst case scenario for this block is also when the input consists of only minimum size packets. Because this block supplies all traffic streams with information about next-hop and output port, the execution time for this block is 16 times lower than for the input and output port processors, thus 42 ns. This corresponds to a lookup rate of approximately 24 MLPS. In these calculations the routing updates triggered by the CPU, which processes the routing protocol requests, are neglected. These updates take a long time compared to the lookups.
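
The budget for a block shared by several ports follows from dividing the per-port packet time by the number of ports served. The function names below are hypothetical; the numbers come from the text.

```python
PACKET_TIME_NS = 672.0   # per-port budget for minimum size packets


def shared_block_budget_ns(ports_served):
    """Time available per packet for a block that serves several ports."""
    return PACKET_TIME_NS / ports_served


def lookups_per_second_millions(budget_ns):
    """Lookup rate in millions of lookups per second (MLPS)."""
    return 1e3 / budget_ns
```

For 16 ports this gives 42 ns per lookup, or about 23.8 MLPS, which the text rounds to 24 MLPS.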

4.5.3 Packet Buffer (PB)

This block consists of a control block, an index structure and a buffer memory. In this design four packet buffers will be used to serve the 16 input ports. The total execution time for both saving and fetching data from the memory is approximately 168 ns. In this case it might be more interesting to look at the bandwidth needed to see if any memories fulfill the requirements. The average bandwidth needed for the buffer memory in this block is approximately 8 Gbit/s because data will have to be both saved to the memory and later fetched to be sent to the output port.
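
The memory bandwidth figure can be checked with a short calculation. The function name is hypothetical, and the sketch assumes that each packet buffer serves an equal share of the ports and that every packet is written once and read once.

```python
def buffer_memory_bandwidth_gbit(ports_served, line_rate_gbit):
    """Average memory bandwidth when each packet is written once and read once."""
    return 2 * ports_served * line_rate_gbit
```

With 16 ports spread over four buffers, each buffer serves four 1 Gbit/s ports, which gives 8 Gbit/s.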

4.5.4 Multicast Unit (MU)

The multicast unit has basically the same functionality as the packet buffer. The only difference is that the multicast unit forwards the packet to several output packet processors instead of one. For simplicity the other requirements on the multicast unit are the same as for the packet buffer.

4.5.5 Central Processing Unit (CPU)

The CPU is not part of the data path so its functions are not critical for the actual routing. This makes the timing constraints of the CPU uninteresting at this moment. A general purpose processor like ARM940 [1] could be used to fulfill the requirements for this block.


Chapter 5

Traffic simulations of the initial design

This chapter will first introduce the reader to different Internet traffic models. To be able to describe these traffic models in the SoCBUS simulation environment some improvements had to be made to the simulator. These changes will be briefly described in this chapter and more details can be found in appendix A. The simulation results will be presented and discussed. A deeper discussion about possible improvements of the design can be found in the next two chapters.

5.1 Internet traffic model

To be able to make realistic simulations of the proposed router architecture it is important to use traffic patterns that reflect the actual traffic in the core of the Internet today. A good way of describing traffic patterns on the Internet is as a discrete distribution of the most common packet sizes. Today this is also the most common way of describing Internet traffic during core router benchmarks. The different traffic distributions used in this project are described below.

5.1.1 Minimum size packets

In general a traffic pattern consisting of only minimum size packets generates the highest load on a router. The reason for this is that for every packet that arrives at the input port of the router a lookup has to be performed to determine the output port to which the packet should be forwarded. In reality this traffic pattern is not realistic but it is still a good way of measuring the performance of the router.

For ordinary IP packets over Ethernet the minimum size Ethernet packet is 64 bytes. Because this core router chip only works at OSI protocol layer 3 and higher the actual packet data shrinks to 40 bytes. This is the actual amount of data that will be sent through the SoCBUS network.


5.1.2 Evenly distributed (RFC 2544)

RFC 2544 defines a number of tests used to measure the performance of a network device [3]. In this memo different packet distributions for different physical media are proposed. Benchmarks for Ethernet network devices should use packet sizes evenly distributed across 64, 128, 256, 512, 1024, 1280 and 1518 bytes.

5.1.3 Internet Mix

The most realistic way of benchmarking the router would of course be to use real traffic captured on the Internet. Newman [4] observed live packet flows from the core of the Merit network. The result from the observations was a distribution of packet sizes that he called the Internet mix. This traffic pattern has more or less become standard in benchmarking of core Internet devices. The following table shows the probability for different packet sizes according to Internet mix.

Probability   Packet size
56%           40 bytes
23%           1500 bytes
17%           576 bytes
5%            52 bytes

Table 5.1. Distributions of packet sizes using the Internet mix.
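
A distribution like this can be used directly as a set of weights. The sketch below uses Python's standard `random` module rather than the GNU Scientific Library used in the stimuli generator, and the function names are hypothetical. The listed percentages add up to 101, so they are treated as weights and normalized.

```python
import random

# (packet size in bytes, weight in percent) as listed in Table 5.1.
INTERNET_MIX = [(40, 56), (1500, 23), (576, 17), (52, 5)]


def mean_packet_size(mix):
    """Weighted mean packet size in bytes; weights need not sum to 100."""
    total = sum(w for _, w in mix)
    return sum(size * w for size, w in mix) / total


def sample_sizes(mix, n, seed=0):
    """Draw n packet sizes according to the weights in `mix`."""
    rng = random.Random(seed)
    sizes = [s for s, _ in mix]
    weights = [w for _, w in mix]
    return rng.choices(sizes, weights=weights, k=n)
```

The weighted mean is roughly 463 bytes, more than ten times the 40 byte minimum, which is why the same line rate produces far fewer packets per second under the Internet mix than under minimum size traffic.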

5.2 Improvements in the SoCBUS simulator

The current implementation of the SoCBUS simulation environment has no support for describing traffic using a discrete distribution. There is no support for dependencies between different SoCBUS transfers either. To be able to make an accurate description of the traffic flow in the SoCBUS network and by that get more realistic simulation results, these features were implemented. A short description of the improvements will be given in the following section. More implementation details will be given in appendix A.

5.2.1 Implementation of a model for discrete distributions

The typical way of describing a traffic pattern is to associate different values with different probabilities. The model implemented is called “discrete” and is completely implemented in the stimuli generator. The new model was implemented using the GNU Scientific Library [7]. Figure 5.1 shows how the implemented model could be used when specifying the network stimuli using the XML format. This example defines that the value 40 is associated with the weight 56 and the value 1500 is associated with the weight 23. This is actually the beginning of a definition of the Internet mix described above, where the value specifies the packet size in bytes.


[Figure: XML tree of a MATH_MODEL with MODEL_NAME DISCRETE and two PARAM groups, VALUE (40 and 1500) and WEIGHT (56 and 23), where each entry is a PARAM with a PARAM_NAME and a PARAM_VALUE.]

Figure 5.1. Example of discrete mathematical model.

5.2.2 Implementation of dependencies between SoCBUS traffic

To be able to describe the internal traffic as a data flow with dependent traffic some new functionality had to be added to the SoCBUS simulation environment. Before this change the only way of defining the start of a transfer between two blocks in the SoCBUS network was to define the start time. With this new feature a transfer can also be triggered by the completion of another transfer. This feature makes it possible to describe the traffic flow shown in figure 4.2.

To make it possible to describe the dependencies in the input to the stimuli generator some changes had to be made to the XML format. A new kind of task that defines the dependencies was added. The new XML format is described in figure 5.2 and 5.3. To get a deeper understanding of the stimuli generator it is recommended to read the thesis by Joakim Wallin [9].

[Figure: XML tree of a WORKING task with the elements EVENT_NAME, EVENT_POSITION and EVENT_LENGTH, given as VALUE or MATH_MODEL.]

Figure 5.2. Stimuli task working.

[Figure: XML tree of a DEPENDENCY task with the elements EVENT_NAME, EVENT_DEPENDENCY, EVENT_DELAY and EVENT_LENGTH, given as VALUE or MATH_MODEL.]

Figure 5.3. Stimuli task dependency.


5.3 Simulation results

Now that all the tools for making accurate simulations are available it is time to perform the actual simulations and determine the performance of this router architecture. It is also important to find possible bottlenecks to make it possible to refine the design. First of all the general conditions for the simulations are defined.

The simulation time is set to 1 ms and for consistency this value will be used during all simulations. All simulations made in this chapter will be based on the network shown in figure 4.4 and with the specifications of SoCBUS shown in chapter three. The throughput shown in the graphs is the actual throughput of each line port connected to the router. During all simulations the throughput is the same at all input ports. No delays are specified in the different functional blocks, which implies that the latency from input port to output port is only related to the SoCBUS network latency.

5.3.1 Throughput

To determine the maximum throughput the router can handle without dropping or delaying packets too much, a series of simulations is performed at different throughputs. A good way of determining the maximum speed the network can handle is to measure the time it takes for a packet to be sent from the input port to the correct output port. A sudden rise of the network latency shows that the network is getting highly loaded.

Minimum size packets usually generate the highest load on routers so it is no surprise that the performance in terms of throughput is much lower than for other distributions of packet sizes. Figure 5.4, 5.5 and 5.6 show the SoCBUS network latency for minimum size packets, evenly distributed packets and for Internet mix. The latency is measured from the input packet processor to the correct output packet processor.

[Figure: latency (ns) plotted against throughput per port (Gbit/s), for throughputs from 0.1 to 0.6 Gbit/s.]

Figure 5.4. Latency using minimum size packets.


[Figure: latency (ns) plotted against throughput per port (Gbit/s), for throughputs from 0.2 to 2 Gbit/s.]

Figure 5.5. Latency using even distribution.

[Figure: latency (ns) plotted against throughput per port (Gbit/s), for throughputs from 0.2 to 1.8 Gbit/s.]

Figure 5.6. Latency using Internet mix.


5.3.2 SoCBUS router lock

In this section the SoCBUS network activity for different traffic distributions will be presented using different 3D plots. In all these 3D graphs the x axis represents the horizontal count of SoCBUS nodes beginning from the left. The y axis represents the vertical count of SoCBUS nodes starting from the top. This means that the upper left corner of the SoCBUS network will appear down to the left in the 3D graph.

The SoCBUS router lock describes the amount of time that each SoCBUS switch, or node as it is sometimes called, has been locked for transfers. This measure takes all five ports (up, down, left, right and wrapper) into consideration and is calculated as the mean value of the lock times associated with the different ports. Because the network load is very similar for even distribution and Internet mix, only the results for Internet mix will be presented from now on. The rest of the results are given in appendix B. All results in this section are at the maximum throughput that the router can handle using that specific traffic model.
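
The router lock measure can be written down as a small function. This is a sketch of the definition above and not code from the simulator: the function name, the dictionary layout and the lock times in the usage note are hypothetical.

```python
PORTS = ("up", "down", "left", "right", "wrapper")


def router_lock_percent(port_lock_ns, sim_time_ns):
    """Mean lock time over the five ports, as a percentage of simulated time."""
    mean_lock = sum(port_lock_ns[p] for p in PORTS) / len(PORTS)
    return 100.0 * mean_lock / sim_time_ns
```

For a 1 ms simulation in which the five ports have been locked for 100, 200, 0, 300 and 400 µs, the router lock is 20%.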

Figure 5.7 and 5.8 show the router lock using minimum size packets and Internet mix at 0.6 Gbit/s and 1.8 Gbit/s respectively. The two graphs look very much the same, though there are some interesting differences to notice. In both cases the traffic is very much concentrated to the center of the network. In the case of minimum size packets the forward table has the highest router lock, while in the case of Internet mix the packet buffers have the highest lock. This is what can be expected because smaller packet size implies that more packets have to be processed by the forward table at the same speed.

[Figure: 3D plot of lock time (%) for each SoCBUS node, with node position on the x axis (1 to 7) and y axis (1 to 6).]

Figure 5.7. Router lock using minimum size packet.


[Figure: 3D plot of lock time (%) for each SoCBUS node, with node position on the x axis (1 to 7) and y axis (1 to 6).]

Figure 5.8. Router lock using Internet mix.

5.3.3 SoCBUS wrapper send lock

The wrapper send lock is the amount of time that the wrapper associated with each SoCBUS switch has been locked sending data to the SoCBUS switch. In other words the 3D graphs presenting the wrapper send lock give an indication of the amount of time that each switch is sending data from the local IP block. The switches that have no value in these graphs do not send anything to the SoCBUS network. This is the case for example for the output packet processors.

Figure 5.9 and 5.10 show the wrapper send lock time for minimum size packets and Internet mix at 0.6 Gbit/s and 1.8 Gbit/s respectively.

By looking at the graphs it is easy to see that the forward table has the highest lock time when using minimum size packets and that the packet buffers have the highest lock time when using the Internet mix. One other interesting thing to point out is that the send lock time for the packet buffers is not four times larger than the lock time for each input port, even though the packet buffers send exactly four times the amount of data that the input ports do. This is because of the overhead in the PCC protocol. The input ports send two packets, one containing data to the packet buffer and one lookup request to the forward table. This means that the input port has to set up two SoCBUS links using the PCC protocol. The packet buffer only sends data to the output port and only has to set up one PCC link.
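
The effect of the setup overhead can be illustrated with a very simple cost model in which every transfer pays a fixed setup cost before its data is sent. The model is not taken from the simulator: the function names, the setup cost of 30 cycles and the transfer sizes in the usage note (a 40 byte packet, an 8 byte lookup request and a 4 byte next-hop) are all assumptions made for this example.

```python
import math

BUS_WIDTH_BITS = 16   # from the SoCBUS specification


def data_cycles(num_bytes):
    """Cycles needed to move the data over an established circuit."""
    return math.ceil(num_bytes * 8 / BUS_WIDTH_BITS)


def send_lock_cycles(transfers, setup_cycles):
    """Cycles a wrapper is locked: one setup per transfer plus the data cycles.

    `transfers` is a list of transfer sizes in bytes and `setup_cycles` is a
    hypothetical fixed cost for establishing one PCC circuit.
    """
    return sum(setup_cycles + data_cycles(b) for b in transfers)
```

With a setup cost of 30 cycles, an input port handling one packet is locked for `send_lock_cycles([40, 8], 30)`, which is 84 cycles, while a packet buffer forwarding four packets with their next-hop is locked for `send_lock_cycles([44] * 4, 30)`, which is 208 cycles. The ratio is about 2.5 rather than 4, because the input port pays the setup cost twice per packet. The same model shows why small packets are expensive: 20 data cycles are dwarfed by the setup cost.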


[Figure: 3D plot of wrapper send lock time (%) for each SoCBUS node, with node position on the x axis (1 to 7) and y axis (1 to 6).]

Figure 5.9. Wrapper send lock using minimum size packets.

[Figure: 3D plot of wrapper send lock time (%) for each SoCBUS node, with node position on the x axis (1 to 7) and y axis (1 to 6).]

Figure 5.10. Wrapper send lock using Internet mix.

5.3.4 Conclusions

It is now time to sum up the results from the initial simulations. The results show that the performance of the SoCBUS network using Internet mix and the even distribution is quite similar, but that the throughput for minimum size packets is much lower. The reason for this is that the overhead in the PCC protocol is much larger for small packets than for large packets. In other words, when sending small packets a larger part of the time is consumed by handshaking between the different functional blocks. The throughput together with the average latency from IPP to OPP at 1 Gbit/s is shown in table 5.2.

The router lock has shown that the network load looks different depending on the traffic model. For small packets the network is heavily loaded by the forward table, while for the other packet distributions the packet buffers carry the highest load. Generally the network load is higher in the middle of the network.

The wrapper send lock time gives us a clear view of the bottlenecks in the current design. Figure 5.9 shows that the forward table has the highest send lock time for minimum size packets. This means that the links between the forward table and the packet buffers are the limiting factor using this particular traffic model. When the Internet mix is used the packet buffers have the highest send lock time. This means that the links between the packet buffers and the output packet processors are the limiting factor. Based on the results from these simulations the SoCBUS architecture and router model will now be refined to boost the performance of the router.

Traffic model       Maximum throughput   Avg latency at 1 Gbit/s
Minimum packets     0.6 Gbit/s           -
Even distribution   2.0 Gbit/s           925 ns
Internet mix        1.8 Gbit/s           822 ns

Table 5.2. Results from the simulation of the initial design.


Chapter 6

Second router design

This and the next chapter will describe the process of refining the router model andSoCBUS architecture to boost the performance of the router. During this process twodifferent designs will be developed and evaluated. First of all the bottlenecks from theinitial design will be identified. The router design and SoCBUS architecture will then bechanged in a way that hopefully will increase the performance of the router. The newrouter design will then be simulated to evaluate the changes.

6.1 Improvements in the design

In this section the bottlenecks of the initial design are identified and improvements are made to boost the performance of the router.

6.1.1 Several forward tables

A big bottleneck of the initial design appears when the network is populated with small packets. The throughput achieved using minimum size packets is only one third of the throughput achieved when using any of the two other packet distributions. The bottleneck in this case is the forward table.

To improve the performance for small packets one more forward table is added to the network. Each forward table is now responsible for 8 packet processors instead of 16. Adding one more forward table of course also introduces the problem of inconsistency between the different forward tables. The CPU will be responsible for making updates to the forward tables and for keeping the data consistent.

6.1.2 More SoCBUS switches

By looking at the simulation results from the initial design it is clear that the traffic is very much concentrated to the middle of the network. This could lead to congestion that may lead to delayed or even discarded packets. To avoid this problem the SoCBUS network size is increased from 7x6 to 8x7. The most network intensive blocks were also moved apart from each other to distribute the network load even further over a larger part of the SoCBUS network.

6.1.3 SoCBUS bus width

When looking at the current implementation of the SoCBUS switches one realizes that the implementation is more or less independent of the bus width. This means that the delay in the switches will remain almost the same regardless of the bus width. This is because the delay introduced in the switches mainly comes from the control block that, for example, determines the route. Because of the limited time of this project the implementation of the switch will not be further described. The details of this implementation can be found in [10].

By increasing the SoCBUS bus width from the current 16 bits to 32 or 64 bits it would be possible to increase the router throughput dramatically. It may even be possible to reach the magic line speed of 10Gbit/s defined by the standard packet over SONET - OC-192. This of course assumes that the IP blocks in the design can operate at the same speed. One important thing to take into consideration when deciding the new bus width is therefore that the IP blocks must be able to handle the bus bandwidth. The bus bandwidth assuming a 64 bit bus is 64 bits · 1.2 GHz = 76.8 Gbit/s full duplex. Because of the overhead in the PCC protocol the actual data bandwidth will be much lower. Within a few years memories with these requirements on speed will be possible to implement. From now on the SoCBUS bus width will be set to 64 bits and the delays in the SoCBUS switch will be the same as for the 16 bit bus.
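The bandwidth arithmetic above can be repeated for the other bus widths that were considered. The following is only a small sketch: the 1.2 GHz clock is the figure used in the text, while the function name is chosen here for illustration. PCC protocol overhead is not modelled, so the numbers are upper bounds on the raw link bandwidth.

```python
# Raw SoCBUS link bandwidth for the bus widths discussed in the text.
# The helper name is illustrative; PCC overhead is deliberately ignored.

CLOCK_GHZ = 1.2  # clock frequency used in the bandwidth calculation


def raw_bandwidth_gbit(bus_width_bits: int, clock_ghz: float = CLOCK_GHZ) -> float:
    """Return the raw link bandwidth in Gbit/s (one direction)."""
    if bus_width_bits <= 0:
        raise ValueError("bus width must be positive")
    return bus_width_bits * clock_ghz


if __name__ == "__main__":
    for width in (16, 32, 64):
        print(f"{width:2d} bit bus: {raw_bandwidth_gbit(width):.1f} Gbit/s")
```

Since the switch delay is assumed to be independent of the bus width, the raw bandwidth in this sketch scales linearly with the width.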

6.2 Complete design after improvements

Figure 6.1 shows the SoCBUS network for the second design of the router.

6.3 Simulation results

In this section the results from simulations of the second router design will be presented. All simulations are performed under the same conditions as described for the initial design, except for the bus width. The maximum throughput is determined by looking at the latency from IPP to OPP and the SoCBUS router lock is analyzed to find the bottlenecks of the current design.

[Figure: mesh layout showing the placement of the IPP, OPP, FT, PB, MU and CPU blocks.]

Figure 6.1. SoCBUS network for the second design.

6.3.1 Throughput

Figures 6.2 and 6.3 show the latency using minimum size packets and Internet mix respectively. The graph showing even distribution can be found in appendix B.

[Plot: latency (ns) versus throughput per port (Gbit/s).]

Figure 6.2. Latency using minimum size packets.

[Plot: latency (ns) versus throughput per port (Gbit/s).]

Figure 6.3. Latency using Internet mix.

6.3.2 SoCBUS router lock

To get a view of the SoCBUS network activity we look at the router lock time. To make it easy to identify the bottlenecks of the network the router lock is shown at the maximum throughput that the design can handle. Figure 6.4 shows the router lock time for minimum size packets at 1.6Gbit/s and figure 6.5 shows the router lock time for Internet mix at 8Gbit/s.

[Plot: router lock (%) for each switch position (x, y) in the network.]

Figure 6.4. Router lock using minimum size packets.

[Plot: router lock (%) for each switch position (x, y) in the network.]

Figure 6.5. Router lock using Internet mix.

6.3.3 SoCBUS wrapper send lock

The wrapper send lock is the amount of time that the wrapper has been busy sending data to the current SoCBUS node. By looking at this graph it is easy to find the bottlenecks in the SoCBUS network. Figure 6.6 shows the wrapper send lock for minimum size packets at 1.6Gbit/s and figure 6.7 shows the lock for Internet mix at 8Gbit/s. In the case of minimum size packets the forward tables and the packet buffers are approximately evenly loaded, while in the case of Internet mix the packet buffers have the highest load.
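As an illustration of the measure defined above, the send lock can be expressed as the share of the simulated time during which a wrapper was busy sending. The sketch below is not taken from the SoCBUS simulator; the function name and the interval representation are assumptions made for this example.

```python
# Hypothetical helper: wrapper send lock as a percentage of simulated time.
# The interval format is an assumption, not the simulator's own data model.

def send_lock_percent(busy_intervals, sim_time_ns):
    """Return the percentage of sim_time_ns covered by busy_intervals.

    busy_intervals is a list of (start_ns, end_ns) tuples that are assumed
    not to overlap, one per transfer sent by the wrapper.
    """
    if sim_time_ns <= 0:
        raise ValueError("simulation time must be positive")
    busy_ns = sum(end - start for start, end in busy_intervals)
    return 100.0 * busy_ns / sim_time_ns


if __name__ == "__main__":
    # Two transfers of 200 ns and 300 ns during a 1000 ns simulation.
    print(send_lock_percent([(0, 200), (500, 800)], 1000))
```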

[Plot: wrapper send lock time (%) for each switch position (x, y) in the network.]

Figure 6.6. Wrapper send lock using minimum size packets

[Plot: wrapper send lock time (%) for each switch position (x, y) in the network.]

Figure 6.7. Wrapper send lock using Internet mix

6.3.4 SoCBUS transfer overhead

Because of the big difference in maximum throughput between minimum size packets and the other packet size distributions, some measures concerning overhead have been studied. Figure 6.8 illustrates the ratio between the time that the packet buffers send data and the time that is spent waiting to send data. The overhead has been measured at the maximum throughput that each packet size distribution can handle.
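The overhead measure described above can be written down as a simple ratio. This is a sketch under the assumption that the total time of a packet buffer is split into waiting time (overhead) and transfer time only; the function name and the example numbers are illustrative and are not simulation results.

```python
# Hypothetical helper: share of overhead in the total send time of a block.

def overhead_share_percent(transfer_ns, waiting_ns):
    """Return the overhead (waiting) share of the total time in percent."""
    if transfer_ns < 0 or waiting_ns < 0:
        raise ValueError("times must be non-negative")
    total_ns = transfer_ns + waiting_ns
    if total_ns == 0:
        raise ValueError("total time must be positive")
    return 100.0 * waiting_ns / total_ns


if __name__ == "__main__":
    # Illustrative numbers only: 250 ns sending, 750 ns waiting.
    print(overhead_share_percent(250, 750))
```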

[Bar chart: share of overhead and transfer time (0% to 100%) for minimum size packets, even distribution and Internet mix.]

Figure 6.8. Packet buffer overhead.


6.3.5 Conclusions

The simulations of the second design have shown that the changes made to the router design increased the performance of the router in the form of increased throughput and decreased network latency. It is also clear that the changes in network size, the new positions of the IP blocks and the addition of one more forward table have contributed to a more even load over the entire SoCBUS network. The most important measures from the simulations can be found in table 6.1.

Traffic model        Maximum throughput   Avg latency at 1Gbit/s
Minimum packets      1.6 Gbit/s           122 ns
Even distribution    9.5 Gbit/s           280 ns
Internet mix         8.0 Gbit/s           224 ns

Table 6.1. Results from simulation of the second design.


Chapter 7

Final router design

The task of refining this router model to find “the best” design without any automated optimization tools is of course very hard or even impossible to accomplish. The router design described in this chapter is the final design developed during this final year project. First the changes made to the second design will be described, then the simulation results will be presented and discussed and finally the final implementation of the functional blocks will be described.

7.1 Improvements in the design

In this section the bottlenecks of the second design are identified and improvements are made to boost the performance of the router.

7.1.1 More packet buffers and forward tables

By looking at the wrapper send lock from the second design it is obvious that the bottleneck of the design still is the packet buffer and forward table. Increasing the SoCBUS bus width could be a solution to the problem, but by looking at the bandwidth of the packet buffers using for example a 128 bit bus one realizes that this is not possible with today's memory technologies. Another solution to the problem is to add more packet buffers and forward tables to distribute the network load better. The problem with this solution is that the buffer memory will not be efficiently used. Despite this problem it was decided to double the number of packet buffers and forward tables. The router now consists of 4 forward tables and 8 packet buffers.

7.1.2 Changes in PCC

By looking at the overhead introduced by the PCC protocol, shown in figure 6.8, it is clear that the protocol overhead accounts for a very large part of the actual router lock. The reason for this overhead is that it takes time to set up the route between two blocks. The reason for the large overhead in the case of minimum size packets is that the ratio between the time it takes to set up the route and the time it takes to send the data is very big.

A new feature in the SoCBUS nodes that will decrease the overhead is proposed. This change will affect the way small packets are treated. All packets that are 64 bits or smaller will be sent using speculative sending. This means that no route has to be set up in advance to send the data. Instead the data is included in the first request packet.
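The rule for choosing between the two sending modes can be sketched as follows. The 64 bit threshold is the one given in the text; the function name and the mode labels are hypothetical and do not come from the SoCBUS implementation.

```python
# Sketch of the proposed rule for small packets. The mode labels
# "speculative" and "route_setup" are illustrative names only.

BUS_WIDTH_BITS = 64  # bus width used in the second and final designs


def choose_send_mode(payload_bits, bus_width_bits=BUS_WIDTH_BITS):
    """Return how a SoCBUS packet with the given payload size would be sent."""
    if payload_bits <= 0:
        raise ValueError("payload size must be positive")
    if payload_bits <= bus_width_bits:
        # The data fits in the first request packet, so no route is set up first.
        return "speculative"
    # Larger packets still set up the route before the data is sent.
    return "route_setup"


if __name__ == "__main__":
    for bits in (32, 64, 65, 512):
        print(bits, choose_send_mode(bits))
```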

7.2 Complete design after improvements

Figure 7.1 shows the SoCBUS network for the final design of the router. Each packet buffer is responsible for two IPPs and each forward table for four IPPs. The SoCBUS bus width is set to 64 bits.

[Figure: mesh layout showing the placement of the IPP, OPP, FT, PB, MU and CPU blocks.]

Figure 7.1. SoCBUS network for the final design.
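The division of responsibility in the final design (each packet buffer serving two IPPs and each forward table serving four) can be illustrated with a small mapping. Note that the text does not state which IPPs share a block; the sketch below simply assumes that consecutively numbered IPPs are grouped together, and the function name is made up for this example.

```python
# Hypothetical IPP-to-block assignment for the final design.
# Assumption: IPPs are numbered 1..16 and consecutive IPPs share a block.

NUM_IPPS = 16
IPPS_PER_PACKET_BUFFER = 2   # from the text: each PB serves two IPPs
IPPS_PER_FORWARD_TABLE = 4   # from the text: each FT serves four IPPs


def assigned_blocks(ipp):
    """Return the (assumed) packet buffer and forward table for an IPP."""
    if not 1 <= ipp <= NUM_IPPS:
        raise ValueError("IPP number out of range")
    return {
        "packet_buffer": (ipp - 1) // IPPS_PER_PACKET_BUFFER + 1,
        "forward_table": (ipp - 1) // IPPS_PER_FORWARD_TABLE + 1,
    }


if __name__ == "__main__":
    for ipp in range(1, NUM_IPPS + 1):
        print(ipp, assigned_blocks(ipp))
```

With 16 IPPs this grouping gives the 8 packet buffers and 4 forward tables of the final design.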

7.3 Simulation results

In this section the results from simulations of the final router design will be presented. All simulations are performed under the same conditions as described for the second design, except for the short packet PCC implementation. The maximum throughput is determined by looking at the latency from IPP to OPP and the SoCBUS router lock is analyzed to find the bottlenecks of the current design.


7.3.1 Throughput

Figures 7.2 and 7.3 show the latency using minimum size packets and Internet mix respectively. The graph showing the even distribution can be found in appendix B.

[Plot: latency (ns) versus throughput per port (Gbit/s).]

Figure 7.2. Latency using minimum size packets.

[Plot: latency (ns) versus throughput per port (Gbit/s).]

Figure 7.3. Latency using Internet mix.


7.3.2 SoCBUS router lock

Figure 7.4 shows the router lock time using minimum size packets at 2.6Gbit/s and figure 7.5 shows the router lock time using Internet mix at 14Gbit/s.

The router lock graphs show that the network is much more evenly loaded than before. Around the output port processors it is also easy to see that the router lock is higher in the center of the network than along the edge of the network. This is expected because of the randomized properties of the traffic between the packet buffers and the output port processors.

[Plot: router lock time (%) for each switch position (x, y) in the network.]

Figure 7.4. Router lock using minimum size packets.

[Plot: router lock time (%) for each switch position (x, y) in the network.]

Figure 7.5. Router lock using Internet mix.


7.3.3 SoCBUS wrapper send lock

As described before, the wrapper send lock is the amount of time that the wrapper has been busy sending data to the current SoCBUS node. Figure 7.6 shows the wrapper send lock for minimum size packets at 2.6Gbit/s and figure 7.7 shows the lock for Internet mix at 14Gbit/s. These graphs show that the packet buffer now has the highest wrapper send lock for both minimum size packets and for Internet mix. It is interesting to notice that the wrapper send lock for the packet buffers is smaller in the middle of the network. This is because those packet buffers on average have a shorter way to the output ports.

[Plot: wrapper send lock time (%) for each switch position (x, y) in the network.]

Figure 7.6. Wrapper send lock using minimum size packets.

[Plot: wrapper send lock time (%) for each switch position (x, y) in the network.]

Figure 7.7. Wrapper send lock using Internet mix.


7.3.4 Conclusions

The graphs describing the router lock show that the traffic is better distributed over the whole network than before. In the wrapper send lock graphs it is easy to see that the current bottleneck is the link between the packet buffers and the output ports. The big difference in network load between the minimum size packets and the other packet distributions shows that the overhead is still very big for minimum size packets despite the change made in the PCC protocol.

These simulations have shown that the SoCBUS network used in this router design can handle speeds up to, and even above, the magic speed of 10Gbit/s used in packet over SONET OC-192. The SoCBUS network still has problems achieving high speeds when it comes to small packets. Some possible solutions to this problem are described under further work in chapter 8. A summary of the results from the simulations can be found in table 7.1.

Traffic model        Maximum throughput   Avg latency at 1Gbit/s
Minimum packets      2.6 Gbit/s           120 ns
Even distribution    18.0 Gbit/s          280 ns
Internet mix         15.0 Gbit/s          226 ns

Table 7.1. Results from simulation of the final design.
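To put the three designs side by side, the maximum throughput values from tables 5.2, 6.1 and 7.1 can be compared directly. The sketch below only restates those table values and computes the improvement factors; the variable and function names are chosen for this example.

```python
# Maximum throughput per port in Gbit/s, copied from tables 5.2, 6.1 and 7.1.
# Tuple order: (initial design, second design, final design).
MAX_THROUGHPUT_GBIT = {
    "Minimum packets": (0.6, 1.6, 2.6),
    "Even distribution": (2.0, 9.5, 18.0),
    "Internet mix": (1.8, 8.0, 15.0),
}


def improvement_factor(traffic_model, from_design=0, to_design=2):
    """Return the throughput ratio between two designs (0, 1 or 2)."""
    values = MAX_THROUGHPUT_GBIT[traffic_model]
    return values[to_design] / values[from_design]


if __name__ == "__main__":
    for model in MAX_THROUGHPUT_GBIT:
        total = improvement_factor(model)
        last_step = improvement_factor(model, from_design=1)
        print(f"{model}: x{total:.1f} overall, x{last_step:.1f} from second to final")
```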

7.4 New requirements on the functional blocks

So far during this iterative process of increasing the performance of the router the requirements on the functional blocks have not been discussed very much. The initial design and requirements on the functional blocks were defined with Gigabit Ethernet in mind. Now that we have a SoCBUS network that can handle speeds beyond 10Gbit/s it is time to look at the new requirements on the functional blocks. Because of the current standards used on the Internet today the line speed is set to 10Gbit/s. The properties of the IP blocks will now have to be changed to fit this new line speed. The new requirements in terms of execution time and/or bandwidth are described below.

7.4.1 Input and output packet processors (IPP/OPP)

The big difference for the input and output packet processors is the time constraint for processing a packet. The worst case packet rate using SONET OC-192 at 10Gbit/s is 25MPPS, using minimum size packets. This corresponds to a maximum execution time for this block of 40ns.
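The execution time follows directly from the packet rate, as the short calculation below shows. The 25MPPS figure is taken from the text; the function name is only illustrative.

```python
# Time budget per packet for a block that must keep up with a packet rate.

def time_budget_ns(packet_rate_mpps):
    """Return the maximum processing time per packet in nanoseconds."""
    if packet_rate_mpps <= 0:
        raise ValueError("packet rate must be positive")
    # 1 MPPS corresponds to one packet every 1000 ns.
    return 1000.0 / packet_rate_mpps


if __name__ == "__main__":
    print(time_budget_ns(25))  # worst case packet rate at 10Gbit/s
```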

7.4.2 Packet buffer (PB)

The packet buffer is a critical part of the router. In the final design of the core router each packet buffer is responsible for two IPPs. The maximum bandwidth in the packet buffer will be 20Gbit/s in each direction.

7.4.3 Forwarding table (FT)

In the final design each forward table is responsible for four IPPs. Assuming the worst case consisting of only minimum size packets the lookup rate at 10Gbit/s will be 100MLPS. With the current design the maximum speed using only minimum size packets is 2.6Gbit/s. This line speed corresponds to a lookup rate of 26MLPS. In these calculations the routing table updates have not been taken into consideration.
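The packet buffer bandwidth and the lookup rates above can be reproduced with a few lines of arithmetic. This sketch assumes that the worst case packet rate scales linearly with the line speed, starting from the 25MPPS at 10Gbit/s given in section 7.4.1; the function names are chosen for this example.

```python
# Requirements on the packet buffers and forward tables in the final design.
# Assumption: the packet rate scales linearly with the line speed.

WORST_CASE_MPPS_AT_10G = 25.0  # minimum size packets at 10Gbit/s
IPPS_PER_PACKET_BUFFER = 2
IPPS_PER_FORWARD_TABLE = 4


def port_packet_rate_mpps(line_speed_gbit):
    """Worst case packet rate of one port in MPPS."""
    return WORST_CASE_MPPS_AT_10G * line_speed_gbit / 10.0


def lookup_rate_mlps(line_speed_gbit):
    """Lookups per second (in millions) that one forward table must handle."""
    return IPPS_PER_FORWARD_TABLE * port_packet_rate_mpps(line_speed_gbit)


def buffer_bandwidth_gbit(line_speed_gbit):
    """Bandwidth in each direction that one packet buffer must handle."""
    return IPPS_PER_PACKET_BUFFER * line_speed_gbit


if __name__ == "__main__":
    print(buffer_bandwidth_gbit(10))   # packet buffer at full line speed
    print(lookup_rate_mlps(10))        # forward table at full line speed
    print(lookup_rate_mlps(2.6))       # forward table at the achieved speed
```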


Chapter 8

Conclusions

This chapter presents the final results and conclusions of this project and some ideas for further work.

8.1 Results

In this thesis a 16 port gigabit core router architecture has been developed using the SoCBUS on-chip interconnection network. The functionality of a core router has been divided into functional blocks and these blocks have been placed in a SoCBUS 2D mesh network. During the design several different architectures have been developed and evaluated to get the final router design and to find the current bottlenecks of the SoCBUS.

The final design of the core router can operate at speeds up to 16x10Gbit/s full duplex under normal traffic conditions. For the special case of only minimum size packets the maximum speed achieved was 16x2.6Gbit/s. During the iterative process of increasing the performance of the router architecture two changes to the present SoCBUS were proposed.

• Because of the high requirements on bandwidth the bus width of the SoCBUS bus has been changed from 16 bits to 64 bits.

• To speed up the sending of small packets in the SoCBUS network, packets with a size less than or equal to the bus width are sent using speculative sending. This means that the actual data is embedded in the first request packet.

The router architecture developed in this final year project is good enough to compete with the top of the line single-chip core routers on the market today. Even though the performance of this router is not better than current architectures there are several advantages of this design.

One of the big benefits of this router architecture is the flexibility when it comes to upgrading and making changes to the current design. Because of the general on-chip interconnection network you could simply add another IP block without affecting the other parts of the design and without having to redesign the interconnection between blocks. If you for example want to add MPLS support to this architecture you could simply add a forward table for MPLS and implement an MPLS classification engine in the input packet processor.

By the use of a general on-chip interconnection network the design time of the complete system will be much less than for a router chip design with full custom design. As the range of functions implemented as IP blocks is getting better it might even be possible to buy all the functional blocks and just put them together to create the final system.

8.2 Further work

Despite the iterative process used to develop this router architecture several things could still be done to increase the performance even further.

• To overcome the bottleneck between the packet buffers and output port processors when sending small packets the packet buffer could collect a number of packets destined for the same output port and send them together as one big SoCBUS packet. This would decrease the overhead introduced when the connection is set up by the PCC protocol. The drawback of this solution is that the packet latency will increase dramatically. Still, under extreme traffic conditions it might be better to forward the packet with a delay than to discard the packet.

• Static or semi static routing could be used in the parts of the network that have very regular traffic patterns. In the final design this is the case between the input port processors, the forward tables and the packet buffers. This would simplify the routing decision and reduce the route setup time in the SoCBUS nodes.
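The first idea above, collecting small packets for the same output port, could look roughly like the following sketch. It is not part of the router model in this thesis: the class name, the batch size limit and the flush operation are assumptions made to illustrate the principle. A real packet buffer would also need a timeout, so that the added latency stays bounded.

```python
# Hypothetical sketch of per-output-port batching in a packet buffer.
from collections import defaultdict


class OutputBatcher:
    """Collects packets per output port and releases them in batches."""

    def __init__(self, max_batch=4):
        if max_batch < 1:
            raise ValueError("max_batch must be at least 1")
        self.max_batch = max_batch
        self._queues = defaultdict(list)

    def add(self, output_port, packet):
        """Queue a packet; return a full batch once max_batch is reached."""
        queue = self._queues[output_port]
        queue.append(packet)
        if len(queue) >= self.max_batch:
            # One route setup is now shared by all packets in the batch.
            return self._queues.pop(output_port)
        return None

    def flush(self, output_port):
        """Return whatever is queued for the port, for example on a timeout."""
        return self._queues.pop(output_port, [])


if __name__ == "__main__":
    batcher = OutputBatcher(max_batch=2)
    print(batcher.add(3, b"first"))    # batch not yet full
    print(batcher.add(3, b"second"))   # batch of two packets is released
```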

Several more different types of benchmarks could be made to further evaluate the performance of the router. Especially it would be interesting to evaluate the performance for multicast packets.

The router design developed in this project is only a specification on the system level. It would be interesting to develop a real prototype of the system. This would involve development of the different functional blocks and making some changes to the current implementation of the SoCBUS switches.


Bibliography

[1] ARM: ARM homepage: http://www.arm.com, 2004

[2] F. Baker: Requirements for IP Version 4 Routers, RFC1812, Network Working Group, 1995

[3] S. Bradner, J. McQuaid: Benchmarking Methodology for Network Interconnect Devices, RFC2544, Network Working Group, 1999

[4] D. Newman: Internet Core Router Test, Light Reading (http://www.lightreading.com/), 2001

[5] A. Tanenbaum: Computer Networks, Fourth Edition, 2003

[6] Andrew M. Odlyzko: Internet traffic growth: Sources and implications, University of Minnesota, Minneapolis, MN, USA

[7] GNU: GSL - GNU Scientific Library, GSL - GNU Scientific Library (http://www.gnu.org/software/gsl/), 2004

[8] Ulf Nordqvist: Protocol Processing in Network Terminals, Department of Electrical Engineering, Linköping University, 2004

[9] Joakim Wallin: Design and Implementation of a Traffic Model and a Stimuli Generator for OCN SoCBUS Architecture, Department of Electrical Engineering, Linköping University, 2004, LITH-ISY-EX-3531-2004

[10] Sumant Sathe, Daniel Wiklund, Dake Liu: Design of a Switching Node (Router) for On-Chip Networks, Department of Electrical Engineering, Linköping University

[11] Daniel Wiklund: An on-chip network architecture for hard real time systems, Department of Electrical Engineering, Linköping University, 2003

[12] Daniel Wiklund: SoCBUS Simulator: Users Manual, Department of Electrical Engineering, Linköping University, 2004


Appendix A

Dependency support in the SoCBUS simulator

In this appendix details about the implementation of dependency support in the SoCBUS simulator will be given. This description is intended for people who will use the simulator or make further developments of it.

The change that is visible to the user is the new XML stimuli format described in figures A.1 and A.2. The task working has the same function as before, except that you can now define a name for each task. The dependency task is triggered by the completion of the task specified as event-dependency.

[Diagram: structure of the WORKING task with the elements EVENT_NAME, EVENT_POSITION and EVENT_LENGTH, specified through VALUE or MATH_MODEL entries.]

Figure A.1. Stimuli task working.

[Diagram: structure of the DEPENDENCY task with the elements EVENT_NAME, EVENT_DEPENDENCY, EVENT_DELAY and EVENT_LENGTH, specified through VALUE or MATH_MODEL entries.]

Figure A.2. Stimuli task dependency.


The format of the stimuli file created by the stimuli generator and used as input to the simulator is also changed. In this format each transfer is associated with an ID. This ID is used as a reference between different tasks to specify the dependencies. It is also possible to specify a dependency delay that simulates the execution time of the block. The new raw stimuli format is shown below.

<STARTTIME> <ID> <LENGTH> <SOURCE BLOCK> <DEST BLOCK> <DEPENDENCY DELAY> <DEPENDENCY 1> ... <DEPENDENCY N>

Several changes had to be made to the simulator to enable the support for dependencies. First the system model had to be changed to enable support for different source and destination models. This made it possible to choose the type of source and destination model you wanted to use in the network specification. With this functionality present in the simulator it was possible to implement the dependency model without having to change the current model.
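For readers who want to process the raw stimuli format described above with their own tools, a line could be parsed along the following lines. This is only a sketch and not code from the SoCBUS simulator. It assumes that the fields are separated by whitespace, that the times, ID, length and dependencies are integers and that each block is given as a single token; the class and function names are made up for this example.

```python
# Hypothetical parser for one line of the raw stimuli format.
# Field types and separators are assumptions, see the text above.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transfer:
    start_time: int
    transfer_id: int
    length: int
    source_block: str
    dest_block: str
    dependency_delay: int
    dependencies: List[int] = field(default_factory=list)


def parse_transfer(line: str) -> Transfer:
    """Parse one whitespace separated stimuli line into a Transfer."""
    parts = line.split()
    if len(parts) < 6:
        raise ValueError("expected at least six fields")
    return Transfer(
        start_time=int(parts[0]),
        transfer_id=int(parts[1]),
        length=int(parts[2]),
        source_block=parts[3],
        dest_block=parts[4],
        dependency_delay=int(parts[5]),
        # Any remaining fields are IDs of transfers this one depends on.
        dependencies=[int(p) for p in parts[6:]],
    )


if __name__ == "__main__":
    # Made-up example line: transfer 7 depends on transfers 5 and 6.
    print(parse_transfer("100 7 64 ipp1 ft1 40 5 6"))
```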


Appendix B

Supplementary results

In this appendix some additional graphs from the simulations of the different router designs will be presented.

B.1 Initial design

Figure B.1 shows the router lock using the even distribution of packet sizes at 2.2Gbit/s. A description of the router lock can be found in section 5.3.2. Figure B.2 shows the wrapper send lock using the even distribution of packet sizes at 2.2Gbit/s. A description of the wrapper send lock can be found in section 5.3.3.

[Plot: router lock time (%) for each switch position (x, y) in the network.]

Figure B.1. Router lock using even distribution.

[Plot: wrapper send lock time (%) for each switch position (x, y) in the network.]

Figure B.2. Wrapper send lock using even distribution.

B.2 Second design

Figure B.3 shows the latency versus throughput using the even distribution of packet sizes. Figures B.4 and B.5 show the router lock and wrapper send lock respectively using even distribution at 9.5Gbit/s.

[Plot: latency (ns) versus throughput per port (Gbit/s).]

Figure B.3. Latency using even distribution.

[Plot: router lock (%) for each switch position (x, y) in the network.]

Figure B.4. Router lock using even distribution.

[Plot: wrapper send lock time (%) for each switch position (x, y) in the network.]

Figure B.5. Wrapper send lock using even distribution.

B.3 Final design

Figure B.6 shows the latency using even distribution of packet sizes. In this design the packet loss has also been studied to determine the performance of the router in more detail. The packet loss is measured from the IPP to the OPP. Because the packet loss is measured during the simulations, the packets that are currently inside the SoCBUS network are classified as missing. This explains why packets are lost even at low throughputs. Figures B.7, B.8 and B.9 show the packet loss for minimum size packets, even distribution and Internet mix respectively.
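The effect of counting in-flight packets as missing can be illustrated with a small calculation. The function name and the numbers below are made up for this example and are not simulation results.

```python
# Hypothetical helper: packet loss measured while the simulation is running.

def packet_loss_percent(sent_from_ipp, received_at_opp):
    """Return the share of packets not yet seen at the OPP in percent.

    Packets that are still inside the network when the measurement is
    taken count as missing, so the result can be above zero even when
    no packet has been discarded.
    """
    if sent_from_ipp <= 0:
        raise ValueError("at least one packet must have been sent")
    if not 0 <= received_at_opp <= sent_from_ipp:
        raise ValueError("received count out of range")
    return 100.0 * (sent_from_ipp - received_at_opp) / sent_from_ipp


if __name__ == "__main__":
    # 100000 packets sent, 5 of them still in flight at the end.
    print(packet_loss_percent(100000, 99995))
```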


Figures B.10 and B.11 show the router lock and wrapper send lock respectively using even distribution at 18Gbit/s.

[Plot: latency (ns) versus throughput per port (Gbit/s).]

Figure B.6. Latency using even distribution.

[Plot: packet loss (%) versus throughput per port (Gbit/s); the packet loss axis is scaled by 10^-3.]

Figure B.7. Packet loss using minimum size packets.

[Plot: packet loss (%) versus throughput per port (Gbit/s).]

Figure B.8. Packet loss using even distribution.

[Plot: packet loss (%) versus throughput per port (Gbit/s).]

Figure B.9. Packet loss using Internet mix.

[Plot: router lock time (%) for each switch position (x, y) in the network.]

Figure B.10. Router lock using even distribution.

[Plot: wrapper send lock time (%) for each switch position (x, y) in the network.]

Figure B.11. Wrapper send lock using even distribution.


LINKÖPING UNIVERSITY ELECTRONIC PRESS

Copyright

Svenska

This document is kept available on the Internet - or its future replacement - for a considerable time from the date of publication, provided that no extraordinary circumstances arise.

Access to the document implies permission for anyone to read, download and print single copies for personal use, and to use it unchanged for non-commercial research and for teaching. A later transfer of copyright cannot revoke this permission. All other use of the document requires the consent of the copyright holder. Technical and administrative measures are in place to guarantee authenticity, security and accessibility.

The author's moral rights include the right to be named as the author to the extent required by good practice when the document is used as described above, as well as protection against the document being altered or presented in a form or context that is offensive to the author's literary or artistic reputation or distinctive character.

For further information about Linköping University Electronic Press, see the publisher's home page: http://www.ep.liu.se/

English

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

© Jimmy Svensson, Linköping, 2nd December 2004