Network Layer Dr. Yingwu Zhu 4-1. Network Layer r Introduction r Datagram networks r IP: Internet...
-
Upload
ralf-mcgee -
Category
Documents
-
view
217 -
download
0
Transcript of Network Layer Dr. Yingwu Zhu 4-1. Network Layer r Introduction r Datagram networks r IP: Internet...
Network Layer
Dr. Yingwu Zhu
4-1
Network Layer
Introduction Datagram networks IP: Internet Protocol
Datagram format IPv4 addressing ICMP
What’s inside a router
Routing algorithms Link state Distance Vector Hierarchical routing
Routing in the Internet RIP OSPF BGP
Broadcast and multicast routing
4-2
Network layer transport segment from sending to receiving
host on sending side encapsulates segments into
datagrams on receiving side, delivers segments to
transport layer network layer protocols in every host, router Router examines header fields in all IP
datagrams passing through it
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
networkdata linkphysical
application
transportnetworkdata linkphysical
application
transportnetworkdata linkphysical
4-3
Key Network-Layer Functions
forwarding: move packets from router’s input to appropriate router output
routing: determine route taken by packets from source to dest.
Routing algorithms
analogy:
routing: process of planning trip from source to dest
forwarding: process of getting through single interchange
4-4
1
23
0111
value in arrivingpacket’s header
routing algorithm
local forwarding tableheader value output link
0100010101111001
3221
Interplay between routing and forwarding
4-5
Network Layer
Introduction Datagram networks IP: Internet Protocol
Datagram format IPv4 addressing ICMP
What’s inside a router
Routing algorithms Link state Distance Vector Hierarchical routing
Routing in the Internet RIP OSPF BGP
Broadcast and multicast routing
4-6
Datagram networks Connection-less service; no call setup at network layer routers: no state about end-to-end connections (why?)
no network-level concept of “connection”
packets forwarded using destination host address packets between same source-dest pair may take different paths
application
transportnetworkdata linkphysical
application
transportnetworkdata linkphysical
1. Send data 2. Receive data
4-7
Network Layer
Introduction Datagram networks IP: Internet Protocol
Datagram format IPv4 addressing ICMP
What’s inside a router
Routing algorithms Link state Distance Vector Hierarchical routing
Routing in the Internet RIP OSPF BGP
Broadcast and multicast routing
4-8
The Internet Network layer
forwardingtable
Host, router network layer functions:
Routing protocols•path selection•RIP, OSPF, BGP
IP protocol•addressing conventions•datagram format•packet handling conventions
ICMP protocol•error reporting•router “signaling”
Transport layer: TCP, UDP
Link layer
physical layer
Networklayer
4-9
The Internet Protocol (IP)
App
Transport
Network
Link
TCP / UDP
IP
Data Hdr
Data Hdr
TCP Segment
IP Datagram
Protocol Stack
4-10
The Internet Protocol (IP)
Characteristics of IP CONNECTIONLESS: mis-sequencing
UNRELIABLE: may drop packets…
BEST EFFORT: … but only if necessary
DATAGRAM: individually routed
A
R1
R2
R4
R3
BSource Destination
D H
D H
•Architecture•Links•Topology
Transparent
4-11
IP datagram format
ver length
32 bits
data (variable length,typically a TCP
or UDP segment)
16-bit identifier
Internet checksum
time tolive
32 bit source IP address
IP protocol versionnumber
header length (bytes)
max numberremaining hops
(decremented at each router)
forfragmentation/reassembly
total datagramlength (bytes)
upper layer protocolto deliver payload to
head.len
type ofservice
“type” of data flgsfragment
offsetupper layer
32 bit destination IP address
Options (if any) E.g. timestamp,record routetaken, specifylist of routers to visit.
how much overhead with TCP?
20 bytes of TCP 20 bytes of IP = 40 bytes +
app layer overhead
6 for TCP
4-12
IP Fragmentation & Reassembly Problem: A router may
receive a packet larger than the maximum transmission unit (MTU) of the outgoing link. different link types,
different MTUs Solution: large IP datagram
divided (“fragmented”) within net one datagram becomes
several datagrams “reassembled” only at
final destination, why? IP header bits used to
identify, order related fragments
fragmentation: in: one large datagramout: 3 smaller datagrams
reassembly
E.g., Ethernet frames carry up to 1500 bytes, frames for some wide-area links carry no more than 576 bytes. (MTU: the max of data a link-layer frame can carry) 4-13
IP Fragmentation and Reassembly
ID=x
offset=0
fragflag=0
length=4000
ID=x
offset=0
fragflag=1
length=1500
ID=x
offset=185
fragflag=1
length=1500
ID=x
offset=370
fragflag=0
length=1040
One large datagram becomesseveral smaller datagrams
Example 4000 byte
datagram MTU = 1500
bytes
1480 bytes in data field
offset =1480/8
Note: the offset value is specified in units of 8-byte chunks!!! 4-14
Fragmentation
Fragments are re-assembled by the destination host; not by intermediate routers.
To avoid fragmentation, hosts commonly use path MTU discovery to find the smallest MTU along the path.
Path MTU discovery involves sending various size datagrams until they do not require fragmentation along the path.
Most links use MTU>=1500bytes today. Try:
traceroute www.berkeley.edu 500 –F andtraceroute www.berkeley.edu 1501
-F: Set the "don't fragment" bit, return error it is too long Bonus: Can you find a destination for which the path
MTU < 1500 bytes?
4-15
Network Layer
Introduction Virtual circuit and
datagram networks What’s inside a
router IP: Internet Protocol
Datagram format IPv4 addressing ICMP IPv6
Routing algorithms Link state Distance Vector Hierarchical routing
Routing in the Internet RIP OSPF BGP
Broadcast and multicast routing
4-16
IP Addresses IP (Version 4) addresses are 32 bits long Every interface has a unique IP address:
A computer might have two or more IP addresses A router has many IP addresses
IP addresses are hierarchical They contain a network ID and a host ID E.g. SeattleU addresses start with: 172.17…
IP addresses are assigned statically or dynamically (e.g. DHCP)
IP (Version 6) addresses are 128 bits long
4-17
IP AddressesOriginally there were 5 classes:
CLASS “A” 1 7
0 Net ID Host-ID
CLASS “B” 10 Net ID Host-ID
24
2 14 16
CLASS “C” 110
Net ID Host-ID
3 21 8
CLASS “D”
1110
Multicast Group ID
4 28
CLASS “E” 11110 Reserved
5 27
A B C D0 232-1
4-18
IP AddressesExamples
Class “A” address: www.mit.edu18.181.0.31
(18<128 => Class A)
Class “B” address: www.seattleu.edu172.17.72.14
(128<171<128+64 => Class B)
4-19
IP Addressing
Problem: Address classes were too “rigid”. For most
organizations, Class C were too small and Class B too big. Led to inefficient use of address space, and a shortage of addresses.
Organizations with internal routers needed to have a separate (Class C) network ID for each link.
And then every other router in the Internet had to know about every network ID in every organization, which led to large address tables.
Small organizations wanted Class B in case they grew to more than 255 hosts. But there were only about 16,000 Class B network IDs.
4-20
IP Addressing
Two solutions were introduced: Subnetting within an organization to subdivide
the organization’s network ID. Classless Interdomain Routing (CIDR) in the
Internet backbone was introduced in 1993 to provide more efficient and flexible use of IP address space.
CIDR is also known as “supernetting” because subnetting and CIDR are basically the same idea.
4-21
Subnetting
CLASS “B”e.g.
Company
10 Net ID Host-ID
2 14 16
10 Net ID Host-ID
2 14 16
0000
Subnet ID (20) SubnetHost ID (12)
10 Net ID Host-ID
2 14 16
1111
Subnet ID (20) SubnetHost ID (12)
10 Net ID Host-ID
2 14 16
000000
Subnet ID (22) SubnetHost ID (10)
10 Net ID Host-ID
2 14 16
1111011011
Subnet ID (26) SubnetHost ID (6)
e.g. Site
e.g. Dept
4-22
Subnetting
Subnetting is a form of hierarchical routing. Subnets are usually represented via an address
plus a subnet mask or “netmask”. e.g. [zhuy@cs1 ~]$ /sbin/ifconfig eth0 eth0 Link encap:Ethernet HWaddr 00:21:9B:8F:64:6D inet addr:10.124.72.20 Bcast:10.124.72.255
Mask:255.255.255.0
Netmask 255.255.255.0: the first 24 bits are the subnet ID, and the last 8 bits are the host ID.
Can also be represented by a “prefix + length”, e.g. 171.64.15.0/24, or just 171.64.15/24.
4-23
IP Addressing IP address: 32-bit
identifier for host, router interface
interface: connection between host/router and physical link router’s typically have
multiple interfaces host typically has one
interface IP addresses
associated with each interface
223.1.1.1
223.1.1.2
223.1.1.3
223.1.1.4 223.1.2.9
223.1.2.2
223.1.2.1
223.1.3.2223.1.3.1
223.1.3.27
223.1.1.1 = 11011111 00000001 00000001 00000001
223 1 11
4-24
Subnets IP address:
subnet part (high order bits)
host part (low order bits)
What’s a subnet ? device interfaces
with same subnet part of IP address
can physically reach each other without intervening router
223.1.1.1
223.1.1.2
223.1.1.3
223.1.1.4 223.1.2.9
223.1.2.2
223.1.2.1
223.1.3.2223.1.3.1
223.1.3.27
network consisting of 3 subnets
subnet
4-25
Subnets 223.1.1.0/24223.1.2.0/24
223.1.3.0/24
Recipe To determine the
subnets, detach each interface from its host or router, creating islands of isolated networks. Each isolated network is called a subnet. Subnet mask: /24
4-26
SubnetsHow many? 223.1.1.1
223.1.1.3
223.1.1.4
223.1.2.2223.1.2.1
223.1.2.6
223.1.3.2223.1.3.1
223.1.3.27
223.1.1.2
223.1.7.0
223.1.7.1223.1.8.0223.1.8.1
223.1.9.1
223.1.9.2
4-27
Classless Interdomain Routing (CIDR)Addressing
The IP address space is broken into line segments. Subnet portion of address of arbitrary length Each line segment is described by a prefix. A prefix is of the form x/y where x indicates the prefix of all
addresses in the line segment, and y indicates the length of the segment.
e.g. The prefix 128.9/16 represents the line segment containing addresses in the range: 128.9.0.0 … 128.9.255.255.
0 232-1
128.9/16
128.9.0.0
216
142.12/19
65/8
128.9.16.14
4-28
Classless Interdomain Routing (CIDR)Addressing
0 232-1
128.9/16
128.9.16.14
128.9.16/20128.9.176/20
128.9.19/24
128.9.25/24
Most specific route = “longest matching prefix”
4-29
Classless Interdomain Routing (CIDR)Addressing
Prefix aggregation: If a service provider serves two organizations
with prefixes, it can (sometimes) aggregate them to form a shorter prefix. Other routers can refer to this shorter prefix, and so reduce the size of their address table.
E.g. ISP serves 128.9.14.0/24 and 128.9.15.0/24, it can tell other routers to send it all packets belonging to the prefix 128.9.14.0/23.
ISP Choice: In principle, an organization can keep its prefix
if it changes service providers. 4-30
Size of the Routing Table at the core of the Internet
Source: http://www.cidr-report.org/
31
IP addresses: how to get one?
Q: How does host get IP address?
hard-coded by system admin in a file Wintel: control-panel->network->configuration->tcp/ip-
>properties UNIX: /etc/rc.config
DHCP: Dynamic Host Configuration Protocol: dynamically get address from as server “plug-and-play”
4-32
IP addresses: how to get one?
Q: How does network get subnet part of IP addr?
A: gets allocated portion of its provider ISP’s address space
ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20
Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23 Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23 Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23 ... ….. …. ….
Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23
4-33
Hierarchical addressing: route aggregation
“Send me anythingwith addresses beginning 200.23.16.0/20”
200.23.16.0/23
200.23.18.0/23
200.23.30.0/23
Fly-By-Night-ISP
Organization 0
Organization 7Internet
Organization 1
ISPs-R-Us“Send me anythingwith addresses beginning 199.31.0.0/16”
200.23.20.0/23Organization 2
...
...
Hierarchical addressing allows efficient advertisement of routing information:
4-34
Hierarchical addressing: more specific routes
ISPs-R-Us has a more specific route to Organization 1
“Send me anythingwith addresses beginning 200.23.16.0/20”
200.23.16.0/23
200.23.18.0/23
200.23.30.0/23
Fly-By-Night-ISP
Organization 0
Organization 7Internet
Organization 1
ISPs-R-Us“Send me anythingwith addresses beginning 199.31.0.0/16or 200.23.18.0/23”
200.23.20.0/23Organization 2
...
...
Longest prefix match 4-35
IP addressing: the last word...
Q: How does an ISP get block of addresses?
A: ICANN: Internet Corporation for Assigned
Names and Numbers allocates addresses manages DNS assigns domain names, resolves disputes
4-36
NAT: Network Address Translation
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4
138.76.29.7
local network(e.g., home network)
10.0.0/24
rest ofInternet
Datagrams with source or destination in this networkhave 10.0.0/24 address for
source, destination (as usual)
All datagrams leaving localnetwork have same single source
NAT IP address: 138.76.29.7,different source port numbers
4-37
NAT: Network Address Translation
Motivation: local network uses just one IP address as far as outside world is concerned: no need to be allocated range of addresses from
ISP: - just one IP address is used for all devices can change addresses of devices in local network
without notifying outside world can change ISP without changing addresses of
devices in local network devices inside local net not explicitly
addressable, visible by outside world (a security plus).
4-38
NAT: Network Address Translation
Implementation: NAT router must:
outgoing datagrams: replace (source IP address, port #) of every outgoing datagram to (NAT IP address, new port #). . . remote clients/servers will respond using (NAT IP
address, new port #) as destination addr.
remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table
4-39
NAT: Network Address Translation
10.0.0.1
10.0.0.2
10.0.0.3
S: 10.0.0.1, 3345D: 128.119.40.186, 80
1
10.0.0.4
138.76.29.7
1: host 10.0.0.1 sends datagram to 128.119.40.186, 80
NAT translation tableWAN side addr LAN side addr
138.76.29.7, 5001 10.0.0.1, 3345…… ……
S: 128.119.40.186, 80 D: 10.0.0.1, 3345
4
S: 138.76.29.7, 5001D: 128.119.40.186, 80
2
2: NAT routerchanges datagramsource addr from10.0.0.1, 3345 to138.76.29.7, 5001,updates table
S: 128.119.40.186, 80 D: 138.76.29.7, 5001
3
3: Reply arrives dest. address: 138.76.29.7, 5001
4: NAT routerchanges datagramdest addr from138.76.29.7, 5001 to 10.0.0.1, 3345
4-40
NAT: Network Address Translation
16-bit port-number field: 60,000 simultaneous connections with a
single LAN-side address! NAT is controversial:
routers should only process up to layer 3 violates end-to-end argument
• NAT possibility must be taken into account by app designers, eg, P2P applications
address shortage should instead be solved by IPv6
4-41
Network Layer
Introduction Virtual circuit and
datagram networks What’s inside a
router IP: Internet Protocol
Datagram format IPv4 addressing ICMP IPv6
Routing algorithms Link state Distance Vector Hierarchical routing
Routing in the Internet RIP OSPF BGP
Broadcast and multicast routing
4-42
ICMP: Internet Control Message Protocol
used by hosts & routers to communicate network-level information error reporting:
unreachable host, network, port, protocol
echo request/reply (used by ping)
network-layer “above” IP: ICMP msgs carried in IP
datagrams ICMP message: type, code
plus first 8 bytes of IP datagram causing error
Type Code description0 0 echo reply (ping)3 0 dest. network unreachable3 1 dest host unreachable3 2 dest protocol unreachable3 3 dest port unreachable3 6 dest network unknown3 7 dest host unknown4 0 source quench (congestion control - not used)8 0 echo request (ping)9 0 route advertisement10 0 router discovery11 0 TTL expired12 0 bad IP header
4-43
An aside:Error Reporting (ICMP) and traceroute
Internet Control Message Protocol: Used by a router/end-host to report some
types of error: E.g. Destination Unreachable: packet can’t
be forwarded to/towards its destination. E.g. Time Exceeded: TTL reached zero, or
fragment didn’t arrive in time. Traceroute uses this error to its advantage.
An ICMP message is an IP datagram, and is sent back to the source of the packet that caused the error.
4-44
Traceroute and ICMP
Source sends series of UDP segments to dest First has TTL =1 Second has TTL=2, etc. Unlikely port number
When nth datagram arrives to nth router: Router discards
datagram And sends to source an
ICMP message (type 11, code 0)
Message includes name of router& IP address
When ICMP message arrives, source calculates RTT
Traceroute does this 3 times
Stopping criterion UDP segment eventually
arrives at destination host
Destination returns ICMP “host unreachable” packet (type 3, code 3)
When source gets this ICMP, stops.It would be fun if you design a traceroute tool on your own!
4-45
Network Layer
Introduction Datagram networks IP: Internet Protocol
Datagram format IPv4 addressing ICMP
What’s inside a router
Routing algorithms Link state Distance Vector Hierarchical routing
Routing in the Internet RIP OSPF BGP
Broadcast and multicast routing
4-46
Router Architecture Overview
Two key router functions: run routing algorithms/protocol (RIP, OSPF, BGP) forwarding datagrams from incoming to outgoing link
4-47
Input Port Functions
Decentralized switching: given datagram dest., lookup output
port using forwarding table in input port memory
goal: complete input port processing at ‘line speed’
queuing: if datagrams arrive faster than forwarding rate into switch fabric
Physical layer:bit-level reception
Data link layer:e.g., Ethernet
4-48
Three types of switching fabrics
4-49
Switching Via MemoryFirst generation routers: traditional computers with switching under direct control of CPUpacket copied to system’s memory speed limited by memory bandwidth (2 bus crossings per datagram)
InputPort
OutputPort
Memory
System Bus
4-50
Switching Via a Bus
datagram from input port memory
to output port memory via a shared bus
bus contention: switching speed limited by bus bandwidth
1 Gbps bus, Cisco 1900: sufficient speed for access and enterprise routers (not regional or backbone)
4-51
Switching Via An Interconnection Network
overcome bus bandwidth limitations Banyan networks, other interconnection nets
initially developed to connect processors in multiprocessor
Advanced design: fragmenting datagram into fixed length cells, switch cells through the fabric.
Cisco 12000: switches Gbps through the interconnection network
4-52
Output Ports
Buffering required when datagrams arrive from fabric faster than the transmission rate
Scheduling discipline chooses among queued datagrams for transmission
4-53
Output port queueing
buffering when arrival rate via switch exceeds output line speed
queueing (delay) and loss due to output port buffer overflow!
4-54
Input Port Queuing
Fabric slower than input ports combined -> queueing may occur at input queues
Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward
queueing delay and loss due to input buffer overflow!
4-55
How a Router Forwards Datagrams
128.9/16128.9.16/20
128.9.176/20
128.9.19/24128.9.25/24
142.12/19
65/8
Prefix Port
3227213
128.17.14.1128.17.14.1
128.17.20.1
128.17.10.1128.17.14.1
128.17.16.1
128.17.16.1
Next-hop
R1
R2
R3
R4
12
3
128.17.20.1
128.17.16.1
e.g. 128.9.16.14 => Port 2
Forwarding/routing table
4-56
How a Router Forwards Datagrams Every datagram contains a destination
address. The router determines the prefix to
which the address belongs, and routes it to the“Network ID” that uniquely identifies a physical network.
All hosts and routers sharing a Network ID share same physical network.
4-57
Forwarding Datagrams
Is the datagram for a host on a directly attached network?
If no, consult forwarding table to find next-hop.
4-58
Inside a router
Link 1, ingress Link 1, egress
Link 2, ingress Link 2, egress
Link 3, ingress Link 3, egress
Link 4, ingress Link 4, egress
ChooseEgress
ChooseEgress
ChooseEgress
ChooseEgress
4-59
Inside a router
Link 1, ingress Link 1, egress
Link 2, ingress Link 2, egress
Link 3, ingress Link 3, egress
Link 4, ingress Link 4, egress
ChooseEgress
ChooseEgress
ChooseEgress
ForwardingDecision
ForwardingTable
4-60
Forwarding in an IP Router
• Lookup packet DA in forwarding table.– If known, forward to correct port.– If unknown, drop packet.
• Decrement TTL, update header Checksum.
• Forward packet to outgoing interface.• Transmit packet onto link.
Question: How is the address looked up in a real router?
4-61
Making a Forwarding DecisionClass-based addressing
Class A Class B Class C D
212.17.9.4
Class A
Class B
Class C212.17.9.0 Port 4
Exact Match
Routing Table:
IP Address Space
212.17.9.0
Exact Match: There are many well-known ways to find an exact match in a table.
4-62
Direct Lookup
IP Address
Ad
dre
ss
MemoryD
ata
Next-hop, Port
Problem: With 232 addresses, the memory would require 4 billion entries.
4-63
Associative Lookups“Contents addressable memory” (CAM)
NetworkAddress
PortNumber
AssociativeMemory or CAM
Search Data
32
PortNumber
Hit?
Advantages:• Simple
Disadvantages• Slow• High Power• Small• Expensive
Search data is compared with every entry in parallel
4-64
Hashed Lookups
HashingFunction
Memory
Addre
ss
Data
Search Data
log2N
AssociatedData
Hit?
Address{1632
4-65
Lookups Using HashingAn example
Hashing Function16
#1 #2 #3 #4
#1 #2
#1 #2 #3Linked list of entrieswith same hash key.
Memory
Search Data
AssociatedData
Hit?32
4-66
Lookups Using Hashing
Advantages:
• Simple
• Expected lookup time can be small
Disadvantages• Non-deterministic lookup time
• Inefficient use of memory
4-67
Trees and Tries
Binary Search Tree:
< >
< > < >
log
2 NN entries
Binary Search Trie: (“reTRIEval”)
0 1
0 1 0 1
111010
Requires 32 memory references, regardless of number of addresses.
4-68
Search TriesMultiway tries reduce the number of memory references
16-ary Search Trie
0000, ptr 1111, ptr
0000, 0 1111, ptr
000011110000
0000, 0 1111, ptr
111111111111
Question: Why not just keep increasing the degree of the trie?
4-69
Classless AddressingCIDR
0 232-1
128.9/16
128.9.16.14
128.9.16/20128.9.176/20
128.9.19/24
128.9.25/24
Most specific route = “longest matching prefix”
Question: How can we look up addresses if they are not an exact match?
4-70
Ternary CAMs
10.1.1.32 1
10.1.1.0 2
10.1.3.0 3
10.1.0.0 4
255.255.255.255
255.255.255.0
255.255.255.0
255.255.0.0
255.0.0.010.0.0.0 4
Value Mask
Priority Encoder
Port
Associative Memory
Port
Note: Most specific routes appear closest to top of table
4-71
Longest prefix matches using Binary Tries
Example Prefixes:a) 00001b) 00010c) 00011d) 001e) 0101f) 011g) 10h) 1010i) 111j) 111100
e
f
g
h
i
j
0 1
a b c
d
kk) 11110001
4-72
Lookup Performance Required
Line Line Rate Pkt size=40B Pkt size=240B
T1 1.5Mbps 4.68 Kpps 0.78 Kpps
OC3 155Mbps 480 Kpps 80 Kpps
OC12 622Mbps 1.94 Mpps 323 Kpps
OC48 2.5Gbps 7.81 Mpps 1.3 Mpps
OC192 10 Gbps 31.25 Mpps 5.21 Mpps
4-73
Discussion
Why was the Internet Protocol designed this way? Why connectionless, datagram, best-effort? Why not automatic retransmissions? Why fragmentation in the network?
Must the Internet address be hierarchical? What address does a mobile host have? Are there other ways to design networks?
4-74