TECHNISCHE UNIVERSITÄT ILMENAU
Integrated Hard- and Software Systems
http://www.tu-ilmenau.de/ihs

Review of Internet Protocol Suite

Internet Protocol Suite
  Link Layer: Ethernet, PPP, ARP, MAC Addressing
  Network Layer: IP, ICMP, Routing
  Transport Layer: TCP, UDP, Port Numbers, Sockets
  Application Layer: FTP, Telnet & Rlogin, HTTP, RTP
TCP
  Basic Properties
  TCP Datagram Format
  Connection Setup and Release
  MTU and MSS
  Cumulative, Delayed and Duplicate Acknowledgements
  Sliding Window Mechanism
  Flow and Error Control
Wireless Internet, Andreas Mitschele-Thiel, 6-Apr-06
Internet Protocol Suite
TCP/IP = the “Internet protocol suite” = a family of protocols for the “Internet”
Internet guesstimates 2003:
  800 million users (doubling every two years), 200 million permanent hosts
Standardisation:
  ISOC: Internet Society
  IAB: Internet Architecture Board
  IETF: Internet Engineering Task Force: http://www.ietf.org
    Standards and other information are published as RFCs: Requests for Comments
  IRTF: Internet Research Task Force
Implementations:
  De-facto standard: BSD 4.x implementations (Berkeley Software Distribution)
  Subsequent versions come with new TCP features, e.g.
    4.3 BSD Tahoe (1988): slow start, congestion avoidance, fast retransmit
    4.3 BSD Reno (1990): fast recovery
  Other TCP/IP stacks derived from BSD
  Implemented mechanisms, default parameter settings, and bugs differ between operating systems (e.g. versions of MS Windows)!
TCP/IP Layer Overview
TCP/IP Layers (OSI model*)   Tasks                                                         Protocol Examples
Application (7)              Application specific                                          Telnet, rlogin, FTP, SMTP, SNMP, ...
Transport (4)                End-to-end flow of data between application processes         TCP, UDP
Network (3)                  Routing of packets between hosts                              IP, ICMP
Link (2)                     Hardware interface, packet transfer between network nodes     PPP, Ethernet, IEEE 802.x, ARP
* Mapping between TCP/IP and OSI layers is not always exact.
TCP/IP Encapsulation
[Figure: encapsulation of user data on its way down the protocol stack]
Application:     appl. header + user data
TCP:             TCP header (20 bytes) + application data = TCP segment
IP:              IP header (20 bytes) + TCP header (20 bytes) + application data = IP datagram (20...65535 bytes)
Ethernet driver: eth header (14 bytes) + IP header (20) + TCP header (20) + application data + eth trailer (4 bytes) = Ethernet frame
                 Ethernet payload: 46...1500 bytes
Example: Application data transfer using TCP
TCP/IP Basics: Link Layer
[Figure: TCP/IP protocol stack. Application layer: user processes; transport layer: TCP, UDP; network layer: IP, ICMP; link layer: ARP, hardware interface.]
Link Layer Protocols
Examples:
  Ethernet (encapsulation of higher layer packets is defined in RFC 894)
  PPP: Point-to-Point Protocol for serial lines (RFCs 1332, 1548)
MTU: Maximum Transmission Unit (also called Maximum Transfer Unit)
  Maximum IP packet size in bytes (e.g. for Ethernet: 1500, X.25 Frame Relay: 576)
Path MTU:
  Smallest MTU of any data link in the path between two hosts
  Used to avoid IP fragmentation
  TCP option: path MTU discovery (RFC 1191)
  Example: modem link (MTU = 576) followed by Ethernet link (MTU = 1500) => path MTU = 576
Loopback Interface:
  A client application can connect to the corresponding server application on the same host by using the loopback IP address “localhost” = 127.0.0.1
  Implemented at the link layer, i.e. full processing of transport and IP layers
ARP: Address Resolution Protocol (RFC 826)
  Address resolution from 32-bit IP addresses to hardware addresses (e.g. 48-bit)
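The path MTU rule can be written down in a few lines. The following is a minimal sketch, not taken from the slides; the function names `path_mtu` and `needs_fragmentation` are hypothetical and chosen only for illustration.

```python
def path_mtu(link_mtus):
    """Return the path MTU: the smallest MTU of any link on the path."""
    mtus = list(link_mtus)
    if not mtus:
        raise ValueError("path must contain at least one link")
    return min(mtus)


def needs_fragmentation(datagram_len, link_mtus):
    """True if an IP datagram of this length exceeds the path MTU."""
    return datagram_len > path_mtu(link_mtus)


# Example from the slide: a modem link followed by an Ethernet link.
links = {"modem": 576, "eth": 1500}
print(path_mtu(links.values()))                   # 576
print(needs_fragmentation(1500, links.values()))  # True
```

A sender that keeps its datagrams at or below `path_mtu(...)` never triggers IP fragmentation on that path.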
TCP/IP Basics: Network Layer
[Figure: TCP/IP protocol stack. Application layer: user processes; transport layer: TCP, UDP; network layer: IP, ICMP; link layer: ARP, hardware interface.]
IP: Internet Protocol
IP provides routing (forwarding) between hosts:
  Based on 32-bit IP addresses *
  Hop-by-hop using routing tables
Unreliable, connectionless datagram delivery service:
  packet loss, out-of-order delivery, duplication
IP fragmentation: used on any link with MTU < original datagram length:
  Duplicates IP header for each fragment and sets flags for re-assembly
  Re-assembly at the receiving host only, never in the network
RFC 791
* Applications use the Domain Name System (DNS) to convert hostnames (e.g. “www.lucent.com”) into IP addresses (135.112.22.95) and vice versa. IPv6 uses 128-bit addresses.
IP Datagram Format (IPv4)
Field                    Size      Notes
version                  4 bits    IPv4
header length            4 bits    Number of 32-bit words
type of service          8 bits    QoS requirements; rarely used and supported
total length             16 bits   IP datagram length in bytes (limit = 65535)
identification           16 bits   Unique identifier (counter)
flags                    3 bits    (reserved), don’t fragment, more fragments
fragment offset          13 bits   “Real” fragment offset / 8
time to live             8 bits    Limit on the number of routers (countdown)
protocol                 8 bits    Higher layer identifier, e.g.: ICMP=1, TCP=6, UDP=17
IP header checksum       16 bits   16-bit one’s complement sum of the IP header only
source IP address        32 bits
destination IP address   32 bits
options (if any)
data
Fixed header size: 20 bytes
checksum error => discard datagram + try to send ICMP message
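The header layout and the checksum rule can be tried out with Python's `struct` module. This is a minimal sketch under stated assumptions: the helper names (`internet_checksum`, `build_ipv4_header`, `parse_ipv4_header`) are hypothetical, options are not handled when building, and the example addresses are arbitrary.

```python
import socket
import struct

IPV4_FIXED = "!BBHHHBBH4s4s"  # the 20-byte fixed header in network byte order


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a multiple of 16 bits
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src, dst, payload_len, proto=6, ttl=64, ident=1):
    """Build a fixed 20-byte header (no options) with the DF flag set."""
    ver_ihl = (4 << 4) | 5  # version 4, header length 5 words
    hdr = struct.pack(IPV4_FIXED, ver_ihl, 0, 20 + payload_len, ident,
                      0x4000, ttl, proto, 0,
                      socket.inet_aton(src), socket.inet_aton(dst))
    csum = internet_checksum(hdr)  # computed with checksum field = 0
    return hdr[:10] + struct.pack("!H", csum) + hdr[12:]


def parse_ipv4_header(hdr: bytes) -> dict:
    """Decode the fixed header fields and verify the header checksum."""
    (ver_ihl, _tos, total_len, ident, flags_frag,
     ttl, proto, _csum, src, dst) = struct.unpack(IPV4_FIXED, hdr[:20])
    header_len = (ver_ihl & 0x0F) * 4  # 32-bit words -> bytes
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "total_len": total_len,
        "ident": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # stored as offset / 8
        "ttl": ttl,
        "protocol": proto,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        # Summing a valid header including its checksum field yields 0.
        "checksum_ok": internet_checksum(hdr[:header_len]) == 0,
    }


header = build_ipv4_header("135.112.22.95", "127.0.0.1", payload_len=100)
print(parse_ipv4_header(header))
```

The checksum covers the header only, so a receiver can validate it without touching the payload.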
ICMP: Internet Control Message Protocol
ICMP packet consists of IP header + ICMP message
Used for queries and to communicate error messages back to the sender, e.g.:
  “IP header bad”
  “echo request” (or reply)
  “host unreachable”
  Mobile IP messages
Messages are used by higher layers, e.g.: ping, traceroute, TCP, ... HTTP
RFC 792
TCP/IP Basics: Transport Layer
[Figure: TCP/IP protocol stack. Application layer: user processes; transport layer: TCP, UDP; network layer: IP, ICMP; link layer: ARP, hardware interface.]
UDP vs. TCP
UDP: User Datagram Protocol (RFC 768)
  Simple, unreliable, datagram-oriented transport of application data blocks
TCP: Transmission Control Protocol (RFC 793 + others)
  Connection-oriented, reliable byte stream service
  Details: see section on TCP
Port numbers are used for application multiplexing:
  Unique address = IP address + port number = “socket”
  Concept of well-known ports, e.g. TCP port 21 for FTP (RFC 1340)
Popular API for TCP and UDP connections: Socket API
  “Stream sockets” use TCP
  “Datagram sockets” use UDP
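The difference between the two socket types can be seen with Python's standard `socket` module. The sketch below is illustrative only: it assumes a working loopback interface, lets the OS pick free ports (port 0), and the helper names `udp_roundtrip` and `tcp_roundtrip` are hypothetical.

```python
import socket

LOOPBACK = "127.0.0.1"


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over a datagram socket (UDP) and receive it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((LOOPBACK, 0))  # port 0: let the OS choose a port
        receiver.settimeout(2.0)
        # No connection setup: each datagram carries the destination address.
        sender.sendto(payload, receiver.getsockname())
        data, _peer = receiver.recvfrom(65535)
    return data


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over a stream socket (TCP) and read until end of stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((LOOPBACK, 0))
        listener.listen(1)
        listener.settimeout(2.0)
        # create_connection performs the active open (three-way handshake).
        with socket.create_connection(listener.getsockname(),
                                      timeout=2.0) as client:
            conn, _peer = listener.accept()
            with conn:
                conn.settimeout(2.0)
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # half-close: sends FIN
                chunks = []
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:  # empty read = EOF from the peer
                        break
                    chunks.append(chunk)
    return b"".join(chunks)


print(udp_roundtrip(b"hello via UDP"))
print(tcp_roundtrip(b"hello via TCP"))
```

The TCP receiver loops until end of stream because a byte stream has no message boundaries, whereas the UDP receiver gets the whole datagram in one call.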
UDP Datagram Format
Field                     Size      Notes
source port number        16 bits   Used for application multiplexing
destination port number   16 bits   Used for application multiplexing
UDP length                16 bits   UDP datagram length in bytes (redundant)
UDP checksum              16 bits   Optional 16-bit one’s complement sum of UDP pseudo-header (12 bytes of the IP header) + UDP header + data (padded to 16-bit multiple)
data (if any)
Header size: 8 bytes
checksum error => discard datagram silently
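The pseudo-header checksum is easier to follow in code. The sketch below assumes IPv4 and uses hypothetical helper names (`udp_datagram`, `udp_checksum_ok`); the addresses and ports are arbitrary example values.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a 16-bit multiple
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """12 bytes taken from the IP header: addresses, zero, protocol, length."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 17, udp_length)


def udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Build the 8-byte UDP header plus payload, with checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, length)
                             + header + payload)
    if csum == 0:
        csum = 0xFFFF  # 0 on the wire means "no checksum computed"
    return struct.pack("!HHHH", src_port, dst_port, length, csum) + payload


def udp_checksum_ok(src_ip, dst_ip, datagram: bytes) -> bool:
    """Verify a received datagram; a zero checksum field means 'not used'."""
    _sport, _dport, length, csum = struct.unpack("!HHHH", datagram[:8])
    if csum == 0:
        return True
    return internet_checksum(pseudo_header(src_ip, dst_ip, length)
                             + datagram) == 0


dgram = udp_datagram("10.0.0.1", "10.0.0.2", 5000, 53, b"abc")
print(len(dgram), udp_checksum_ok("10.0.0.1", "10.0.0.2", dgram))
```

Because the pseudo-header contains the IP addresses, a datagram delivered to the wrong host fails the check even if its bytes are intact.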
TCP/IP Basics: Selected Applications
[Figure: TCP/IP protocol stack. Application layer: user processes; transport layer: TCP, UDP; network layer: IP, ICMP; link layer: ARP, hardware interface.]
FTP: File Transfer Protocol
File transfer based on TCP
TCP control connection:
  To well-known server port 21
  ASCII commands
TCP data connection
QoS requirements:
  High throughput (optimise TCP bulk data flow)
RFC 959
Telnet and Rlogin
Used for remote login based on TCP
Rlogin (RFC 1282):
  Simple protocol designed for UNIX hosts
Telnet (RFC 854):
  Any OS
  Option negotiation
  More flexible and better performance
Client operation principle:
  Send each keystroke to the server
    Option: TCP’s Nagle algorithm groups multiple bytes into one segment
  Display every response from the server
QoS requirements:
  Low-RTT transport of small packets (optimise TCP interactive data flow)
  RTT = round-trip time (sender - receiver - sender)
HTTP: Hypertext Transfer Protocol
Transfer of webpages based on TCP:
  Webpage typically consists of an HTML (Hyper Text Markup Language) document + various embedded objects, e.g. pictures
HTTP/1.0:
  Objects are requested and received serially
  For each object, a new TCP connection is established, used and released
  Multiple connections: several TCP connections can be used in parallel
HTTP/1.1: performance improvements by:
  Persistent connections:
    TCP connections are not released after each object, but used for the next one
    - avoids TCP connection establishment and termination
    - avoids slow start for each new connection
  Pipelining:
    Multiple objects can be requested in one packet
    Requested objects are sent sequentially over one TCP connection
  Together with multiple connections (HTTP/1.0 feature), these options result in significant performance improvements
RTP: Real-time Transport Protocol
Transfer of real-time data based on UDP
RTP:
  for media with real-time characteristics (audio/video)
  services: payload type specification, sequence numbering, timestamping, source identification & synchronization, delivery monitoring
  no guaranteed quality of service (QoS)
RTCP (Real-time Transport Control Protocol): QoS monitoring & periodic feedback:
  Sender report (synchronisation, expected rates, distance)
  Receiver report (loss ratios, jitter)
Network independent: on top of unreliable, low-delay transport service
RFC 1889
ITU-T H.225.0 Annex A => H.323 => e.g. MS Netmeeting, VoIP
Summary: Internet Protocol Suite
The TCP/IP protocol suite is a heterogeneous family of protocols for the global Internet
At the center and always used: IP
  Routing between hosts
Application data transport by
  UDP: unreliable datagram service
  TCP: reliable byte-stream service
TCP/IP stack is part of each operating system:
  Numerous different implementations and bugs exist
TCP performance is extremely important!
  TCP carries 62% of the flows, 85% of the packets, and 96% of the bytes of Internet traffic (http://www.cs.columbia.edu/~hgs/internet/traffic.html)
TCP’s complex error control mechanisms are designed for wired networks
  => special problems for wireless transport
TCP (Transmission Control Protocol)
Properties
Connection-oriented, reliable byte-stream service:
  Reliability by ARQ (Automatic Repeat reQuest):
    TCP receiver sends acknowledgements (acks) back to TCP sender to confirm delivery of received data
    Cumulative, positive acks for all contiguously received data
    Timeout-based retransmission of segments
  TCP transfers a byte stream:
    Segmentation into TCP segments, based on MTU
    Header contains byte sequence numbers
  Congestion avoidance + flow control mechanism
In the following examples:
  Packet sequence numbers (instead of byte sequence numbers)
  ack i acknowledges receipt of packets through packet i (instead of bytes)
TCP Segment Format
Field                     Size      Notes
source port number        16 bits
destination port number   16 bits
sequence number           32 bits   Identifies the number of the first data byte in this segment within the byte stream
acknowledgment number     32 bits   Ack for the reverse link: next sequence number that is expected to be received
header length             4 bits    Number of 32-bit words
reserved                  6 bits
flags                     6 bits    URG, ACK, PSH, RST, SYN, FIN
window size               16 bits   Advertised window size: number of bytes the receiver is willing to accept
TCP checksum              16 bits   16-bit one’s complement sum of TCP pseudo-header (12 bytes of the IP header) + TCP header + data (padded to 16-bit multiple)
urgent pointer            16 bits
options (if any)
data (if any)
Fixed header size: 20 bytes
checksum error => discard datagram silently!
  => using an erroneous header is dangerous; loss will be detected by other mechanisms
TCP is full duplex:
  Each segment contains an ack for the reverse link
  A “pure” ack is a segment with empty data
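To make the field layout concrete, the fixed 20-byte header can be decoded with `struct`. This is a sketch only; `parse_tcp_header` is a hypothetical helper, and options are skipped rather than interpreted.

```python
import struct

# Flag bits from least to most significant, as in the 6-bit flags field.
FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header and locate the payload."""
    (src_port, dst_port, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4  # 4-bit length in 32-bit words
    flags = {name for bit, name in enumerate(FLAG_NAMES)
             if off_flags & (1 << bit)}
    payload = segment[header_len:]
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "payload": payload,
        # A "pure" ack carries the ACK flag and no data.
        "pure_ack": flags == {"ACK"} and len(payload) == 0,
    }


# A SYN segment to the FTP control port, header only (checksum left at 0).
syn = struct.pack("!HHIIHHHH", 1025, 21, 1000, 0, (5 << 12) | 0x02, 8192, 0, 0)
print(parse_tcp_header(syn)["flags"])  # {'SYN'}
```

The header length field is what lets a receiver find the payload when options are present.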
TCP Connection Establishment and Termination
Connection establishment (three-way handshake) between client and server:
  Segment 1 (active open, client): SYN + ISN* + options, e.g. MSS
  Segment 2 (passive open, server): SYN, ACK + ISN + options, e.g. MSS
  Segment 3 (client): ACK
  * ISN: initial sequence number (RFC 793)
Connection termination:
  Active close: application close => Segment 1: FIN
  Passive close: => send EOF to application
  Segment 2: ACK; application can still send data (half-close #1)
  Application close => Segment 3: FIN
  Segment 4: ACK (half-close #2)
=> Connection establishment & termination take at least 1 RTT
MTU and MSS: Maximum Segment Size
[Figure: MSS announcement during TCP connection establishment. Both hosts show the layers Application, TCP, IP, Link Layer.]
Client (application requests to connect to server; TCP finds the network interface):
  MTU = 576 (e.g. modem)
  - Fixed IP header = 20
  - Fixed TCP header = 20
  MSS = 536
  => sends SYN, MSS=536
Server:
  MTU = 1500 (e.g. ethernet)
  - Fixed IP header = 20
  - Fixed TCP header = 20
  MSS = 1460
  => replies SYN, ACK, MSS=1460
MSS is optionally announced (not negotiated) by each host at TCP connection establishment. The smaller value is used by both ends, i.e. 536 in the above example.
Note that “real” TCP payload is smaller if TCP options are used.
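The MSS arithmetic from the example can be checked with a short sketch. The function names and the `option_bytes` parameter are hypothetical and only serve the illustration.

```python
IP_HEADER = 20   # fixed IP header, bytes
TCP_HEADER = 20  # fixed TCP header, bytes


def announced_mss(mtu: int) -> int:
    """MSS a host announces for an outgoing interface with the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def effective_mss(local_mtu: int, remote_mtu: int) -> int:
    """Both ends use the smaller of the two announced values."""
    return min(announced_mss(local_mtu), announced_mss(remote_mtu))


def payload_per_segment(mss: int, option_bytes: int = 0) -> int:
    """'Real' payload shrinks by the bytes taken up by TCP options."""
    return mss - option_bytes


print(announced_mss(576))        # 536  (modem)
print(announced_mss(1500))       # 1460 (ethernet)
print(effective_mss(576, 1500))  # 536
```

With, say, 12 bytes of TCP options per segment, `payload_per_segment(536, 12)` gives 524 bytes of application data.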
Cumulative Acknowledgements
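A new cumulative ack is generated only on receipt of a new in-sequence segment
Notation: i = data segment i, ack i = acknowledgement for segments through i
[Figure: TCP sender -> router -> TCP receiver. Segments 37...40 are in flight, the receiver has received ...35 36, and acks 33...36 travel back to the sender. One timestep later: segments 38...41 are in flight, the receiver has received ...35 36 37, and acks 34...37 travel back.]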
Delayed Acknowledgements
Delaying acks reduces ack traffic
An ack is delayed until
  another segment is received, or
  delayed ack timer expires (200 ms typical)
[Figure: segments 37...40 in flight, receiver has received ...35 36. A new ack is not produced on receipt of segment 36, but on receipt of 37; one timestep later the receiver has received ...35 36 37 and only acks 33, 35, 37 travel back.]
Duplicate Acknowledgements 1
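A dupack is generated whenever an out-of-order segment arrives at the receiver (in the example, packet 37 gets lost)
[Figure: segments 37...40 in flight, segment 37 is lost (packet loss); receiver has received ...36. Two timesteps later: segments 39...42 in flight, receiver has received ...36 x 38 and sends a dupack (ack 36) on receipt of 38.]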
Duplicate Acknowledgements 2
Dupacks are not delayed
Dupacks may be generated when
  a segment is lost (see previous slide), or
  a segment is delivered out-of-order:
[Figure: segments 37 and 38 are swapped in flight (arrival order 38, 37, 39, 40); receiver has received ...36. One timestep later the receiver has received ...36 x 38 and sends a dupack (ack 36) on receipt of 38.]
Duplicate Acknowledgements 3
[Figure: segment 37 is delayed behind segments 38 and 39 (arrival order 38, 39, 37, 40); receiver has received ...36.
  Timestep 1: 38 arrives, receiver has ...36 x 38 => dupack (ack 36)
  Timestep 2: 39 arrives, receiver has ...36 x 38 39 => second dupack (ack 36)
  Timestep 3: 37 arrives, receiver has ...36 37 38 39 => new ack 39]
Number of dupacks depends on how much out-of-order a packet is
A series of dupacks allows the sender to guess that a single packet has been lost
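The receiver behaviour on the last few slides can be condensed into a small simulation. It is a sketch under simplifying assumptions: packet numbers instead of byte sequence numbers (as in the slides), no delayed acks, and a hypothetical function name `receiver_acks`.

```python
def receiver_acks(arrivals, last_in_order):
    """Return a list of (ack, is_dupack) pairs, one per arriving packet.

    'ack i' acknowledges all packets through i. A packet that does not
    advance the in-order point triggers an immediate duplicate ack.
    """
    expected = last_in_order + 1  # next packet needed in sequence
    buffered = set()              # out-of-order packets held by the receiver
    acks = []
    for pkt in arrivals:
        if pkt == expected:
            expected += 1
            # Deliver any buffered packets that are now in sequence.
            while expected in buffered:
                buffered.remove(expected)
                expected += 1
            acks.append((expected - 1, False))  # new cumulative ack
        else:
            if pkt > expected:
                buffered.add(pkt)  # hole before this packet
            acks.append((expected - 1, True))   # dupack
    return acks


# Packet 37 delivered two places late, as on the slide above.
print(receiver_acks([38, 39, 37, 40], last_in_order=36))
# Packet 37 lost: every later packet produces another dupack for 36.
print(receiver_acks([38, 39, 40], last_in_order=36))
```

In the reordering case the hole is filled by the third arrival and the cumulative ack jumps straight to 39; in the loss case the acks stay at 36 until a retransmission arrives.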
Window Based Flow Control 1
Sliding window protocol
Window size W is the minimum of
  receiver’s advertised window: determined by available buffer space at the receiver and signalled with each ack
  congestion window: determined by the sender, based on received acks
TCP’s window based flow control is “self-clocking”:
  New segments are sent when outstanding segments are ack’d
[Figure: segments 1...13 at the sender, divided into segments for which acks have been received, the sender’s window, and segments not yet transmitted.]
Window Based Flow Control 2
Optimum window size: W = data rate * RTT = “bandwidth-delay product”
  (optimum use of link capacity: “pipe is full”)
[Figure: TCP sender -> router -> TCP receiver with W = 8 segments (33...40) filling the pipe: segments 37...40 on the forward path, acks 33...36 on the return path. Packet dimensions: width = transmit time, height = rate, area = size.]
What if window size is too large?
  Queuing at intermediate routers (e.g. at wireless access point)
  => increased RTT due to queuing delays
  => potential of packet loss
What if window size is too small?
  Inefficiency: unused link capacity
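A worked example helps to get a feeling for the bandwidth-delay product. The numbers below (2 Mbit/s, 100 ms RTT, MSS of 1460 bytes) are assumed for illustration and do not come from the slides; the function names are hypothetical.

```python
import math


def bandwidth_delay_product(rate_bps: int, rtt_ms: int) -> float:
    """Bytes that fit 'into the pipe': data rate * RTT."""
    return rate_bps * rtt_ms / 8000  # bits -> bytes, ms -> s


def optimum_window_segments(rate_bps: int, rtt_ms: int, mss: int) -> int:
    """Smallest whole number of segments that keeps the pipe full."""
    return math.ceil(bandwidth_delay_product(rate_bps, rtt_ms) / mss)


def link_utilisation(window_bytes: float, rate_bps: int, rtt_ms: int) -> float:
    """Fraction of link capacity used by a given window (at most 1.0)."""
    return min(1.0, window_bytes / bandwidth_delay_product(rate_bps, rtt_ms))


print(bandwidth_delay_product(2_000_000, 100))        # 25000.0 bytes
print(optimum_window_segments(2_000_000, 100, 1460))  # 18 segments
print(link_utilisation(12_500, 2_000_000, 100))       # 0.5
```

A window of half the bandwidth-delay product leaves half of the link idle; a window above it only adds queuing at the bottleneck.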
Packet Loss Detection Based on Timeout
TCP sender starts a timer for a segment (only one segment at a time)
If the ack for the timed segment is not received before the timer expires, outstanding data are assumed to be lost and retransmitted => go-back-N ARQ
Retransmission timeout (RTO) is calculated dynamically based on measured RTT:
  RTO = mean RTT + 4 * mean deviation of RTT
  Mean deviation δ = average of |sample - mean| is easier to calculate than standard deviation (and larger, i.e. more conservative)
  Large variations in the RTT increase the deviation, leading to larger RTO
RTT is measured as a discrete variable, in multiples of a “tick”:
  1 tick = 500 ms in many implementations
  smaller tick sizes in more recent implementations (e.g. Solaris)
RTO is at least 2 clock ticks
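The RTO rule can be sketched as a small estimator. Several details are assumptions that the slide does not state: the smoothing gains (1/8 for the mean, 1/4 for the deviation), the initial deviation of half the first sample, and rounding the RTO up to whole ticks. The class name `RtoEstimator` is hypothetical.

```python
import math


class RtoEstimator:
    """Running estimate of mean RTT and mean deviation, in seconds."""

    def __init__(self, first_sample, tick=0.5, gain_mean=1 / 8, gain_dev=1 / 4):
        self.mean = first_sample
        self.dev = first_sample / 2  # assumed initial deviation
        self.tick = tick             # timer granularity (500 ms on the slide)
        self.gain_mean = gain_mean   # assumed smoothing gain for the mean
        self.gain_dev = gain_dev     # assumed smoothing gain for the deviation

    def update(self, sample):
        """Fold one new RTT measurement into the running estimates."""
        error = sample - self.mean
        self.mean += self.gain_mean * error
        self.dev += self.gain_dev * (abs(error) - self.dev)

    def rto(self):
        """RTO = mean + 4 * deviation, in whole ticks, at least 2 ticks."""
        raw = self.mean + 4 * self.dev
        ticks = max(2, math.ceil(raw / self.tick))
        return ticks * self.tick


est = RtoEstimator(first_sample=1.0)
print(est.rto())   # 3.0
est.update(1.0)    # a steady RTT shrinks the deviation
print(est.rto())   # 2.5
est.update(3.0)    # one outlier raises the deviation and the RTO
print(est.rto())   # 4.5
```

The single outlier raises the RTO well above the mean RTT, which is the conservative behaviour the slide describes.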
Exponential Backoff
Double RTO on successive timeouts:
  [Figure: timeline. Segment transmitted; timeout occurs before ack received, segment retransmitted; timeout interval doubled: T1 = RTO, T2 = 2 * T1]
  Total time until TCP gives up is up to 9 min
  Rationale: Allow an intermediate, congested router to recover
  Problem: If ack is lost, TCP just waits for the next timeout
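The doubling rule produces the following timeout schedule. The cap of 64 s and the limit of 12 retransmissions are assumed values for illustration (the slide only gives the total of up to 9 min), and `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(initial_rto, max_rto=64.0, max_retransmissions=12):
    """Return the successive timeout intervals T1, T2, ... in seconds.

    Each timeout doubles the previous interval, up to an assumed cap.
    """
    timeouts = []
    rto = initial_rto
    for _ in range(max_retransmissions):
        timeouts.append(rto)
        rto = min(2 * rto, max_rto)  # T(n+1) = 2 * T(n), capped
    return timeouts


schedule = backoff_schedule(1.5)
print(schedule)             # 1.5, 3.0, 6.0, ... then 64.0 repeated
print(sum(schedule) / 60)   # total waiting time in minutes
```

Under these assumed parameters the sender waits about 8 minutes in total before giving up, the same order of magnitude as the figure quoted on the slide.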
Packet Loss Detection Based on Dupacks: Fast Retransmit Mechanism
TCP sender considers timeout as a strong indication that there is a severe link problem
On the other hand, continuous reception of dupacks indicates that following segments are delivered, and the link is ok
=> TCP sender assumes that a (single) packet loss has occurred if it receives three dupacks consecutively
=> Only the (single) missing segment is retransmitted => selective-repeat ARQ
Note: 3 dupacks are also generated if a segment is delivered at least 3 places out-of-order
  => Fast retransmit useful only if lower layers deliver packets “almost ordered”; otherwise, unnecessary fast retransmit
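The sender side of fast retransmit is essentially a counter. The sketch below uses packet numbers as in the slides (ack i covers packets through i) and a hypothetical function name `fast_retransmits`.

```python
def fast_retransmits(acks, threshold=3):
    """Return the packets a sender would fast-retransmit for an ack stream.

    An ack equal to the previous one is a dupack. On the third consecutive
    dupack the sender retransmits the packet right after the acked one.
    """
    retransmitted = []
    last_ack = None
    dupacks = 0
    for ack in acks:
        if ack == last_ack:
            dupacks += 1
            if dupacks == threshold:
                retransmitted.append(ack + 1)  # the single missing packet
        else:
            last_ack = ack  # new ack: restart the dupack count
            dupacks = 0
    return retransmitted


# Packet 37 lost: ack 36 arrives once as a new ack, then three times again.
print(fast_retransmits([35, 36, 36, 36, 36]))  # [37]
# Packet 37 only two places out of order: two dupacks, no retransmission.
print(fast_retransmits([35, 36, 36, 36, 39]))  # []
```

The second case shows why the threshold is three: mild reordering produces one or two dupacks and should not trigger a retransmission.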
Flow Control by the Sender
Slow Start
  Initially, congestion window size (cwnd) = 1 MSS
  Increment cwnd by 1 MSS on each new ack
  Slow start phase ends when cwnd reaches ssthresh (slow-start threshold)
  => cwnd grows exponentially with time during slow start (in theory)
    Factor of 1.5 per RTT if every other segment is ack’d
    Factor of 2 per RTT if every segment is ack’d
    In practice: increase is slower because of network delays (see next slide)
Congestion Avoidance
  On each new ack, increase cwnd by 1/cwnd segments
  => cwnd grows linearly with time during congestion avoidance (in theory)
    1/2 MSS per RTT if every other segment is ack’d
    1 MSS per RTT if every segment is ack’d
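The theoretical growth pattern can be reproduced with an idealised per-RTT model. It assumes that every segment is ack'd and that all acks of a window arrive within one RTT, so slow start doubles cwnd and congestion avoidance adds one segment per RTT. The function name and the example parameters (ssthresh = 8, advertised window = 12) are assumptions for illustration.

```python
def window_per_rtt(rtts, ssthresh, advertised_window, cwnd=1):
    """Return the effective window (in segments) at the start of each RTT."""
    history = []
    for _ in range(rtts):
        # The sender may never exceed the receiver's advertised window.
        history.append(min(cwnd, advertised_window))
        if cwnd < ssthresh:
            # Slow start: +1 per ack, i.e. doubling per RTT, up to ssthresh.
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: +1/cwnd per ack, i.e. +1 per RTT.
            cwnd += 1
    return history


print(window_per_rtt(10, ssthresh=8, advertised_window=12))
# [1, 2, 4, 8, 9, 10, 11, 12, 12, 12]
```

The output shows the exponential phase up to ssthresh, the linear phase after it, and the plateau at the receiver's advertised window.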
Slow Start & Congestion Avoidance: Theory
[Figure: cwnd (segments, 0...14) over time / RTT (0...9). Slow Start up to ssthresh, then Congestion Avoidance, limited by the receiver’s advertised window = 12.]
Theoretical assumption: after sending n segments, n acks arrive within one RTT.
Note that Slow Start starts slowly, but speeds up quickly.
Slow Start: Reality (Including Network Delay)
Taking network delay into account, “cwnd increases exponentially” turns into:
  cwnd increases sub-exponentially
  pairs of segments are sent while pipe fills
Simple example:
  one-way delay = 1 timestep
  data rate = 1 segment / timestep

Timestep  Sender action          cwnd  #segments  #segments    #segments recv'd  Receiver action
                                       sent       outstanding  and ack'd
0         initial values         1                0
          send segment 1               1          1
1                                                              1                 receive and ack segment 1
2         receive ack 1          2                0
          send segments 2 and 3        2          2
3                                                              1                 receive and ack segment 2
4         receive ack 2          3                1            1                 receive and ack segment 3
          send segments 4 and 5        2          3
5         receive ack 3          4                2            1                 receive and ack segment 4
          send segments 6 and 7        2          4
6         receive ack 4          5                3            1                 receive and ack segment 5
          send segments 8 and 9        2          5

sending rate > data rate (cwnd > 2) (timestep 4 onwards)
=> at some point in time there will be a packet loss, causing TCP to slow down
Congestion Control after Packet Loss
Packet loss detected by timeout (=> severe link problem):
  Retransmit lost segments
  Go back to Slow Start:
    Reduce cwnd to initial value of 1 MSS
    Set ssthresh to half of window size before packet loss:
      ssthresh = max((min(cwnd, receiver’s advertised window)/2), 2 MSS)
Packet loss detected by ≥3 dupacks (=> single packet loss, but link is ok):
  Fast Retransmit single missing segment
  Initiate Fast Recovery:
    Set ssthresh and cwnd to half of window size before packet loss:
      ssthresh = max((min(cwnd, receiver’s advertised window)/2), 2 MSS)
      cwnd = ssthresh + number of dupacks
    When a new ack arrives: continue with Congestion Avoidance:
      cwnd = ssthresh
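The two reactions to packet loss can be expressed as state transitions. This sketch counts windows in units of MSS; the `CongestionState` class, the function names, and the advertised window of 64 segments are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CongestionState:
    cwnd: float      # congestion window, in MSS
    ssthresh: float  # slow-start threshold, in MSS


def half_window(cwnd, advertised_window):
    """ssthresh = max(min(cwnd, advertised window) / 2, 2 MSS)."""
    return max(min(cwnd, advertised_window) / 2, 2)


def on_timeout(state, advertised_window):
    """Timeout: back to Slow Start with cwnd = 1 MSS."""
    return CongestionState(cwnd=1,
                           ssthresh=half_window(state.cwnd, advertised_window))


def on_dupack_threshold(state, advertised_window, dupacks=3):
    """>= 3 dupacks: Fast Retransmit, then enter Fast Recovery."""
    ssthresh = half_window(state.cwnd, advertised_window)
    return CongestionState(cwnd=ssthresh + dupacks, ssthresh=ssthresh)


def on_new_ack_in_fast_recovery(state):
    """New ack ends Fast Recovery: continue with Congestion Avoidance."""
    return CongestionState(cwnd=state.ssthresh, ssthresh=state.ssthresh)


print(on_timeout(CongestionState(cwnd=20, ssthresh=8), 64))
recovering = on_dupack_threshold(CongestionState(cwnd=8, ssthresh=8), 64)
print(recovering)
print(on_new_ack_in_fast_recovery(recovering))
```

A timeout collapses the window to one segment, whereas three dupacks only halve it, because the arriving dupacks show that the link still delivers data.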
Packet Loss Detected by Timeout
[Figure: cwnd (segments, 0...25) over time / RTT (0...25). With ssthresh = 8, cwnd grows until a timeout occurs at cwnd = 20; afterwards cwnd = 1 and ssthresh = 10.]
Packet Loss Detected by ≥3 Dupacks
[Figure: cwnd (segments, 0...10) over time / RTT (0...14). ≥3 dupacks arrive at cwnd = 8; after Fast Recovery cwnd = 4 and ssthresh = 4.]
After fast retransmit and fast recovery, window size is reduced by half
Multiple packet losses within one RTT can result in timeout
Summary: TCP
TCP provides a connection-oriented, reliable byte-stream service:
  application data stream is transferred in segments based on lower layer MTU
  receiver sends back cumulative acknowledgements (acks)
  sliding window mechanism with flow control based on
    receiver’s advertised window,
    sender’s Slow Start and Congestion Avoidance mechanisms
Error control & packet loss detection based on
  adaptive retransmission timeout => back to Slow Start,
  duplicate acknowledgments (dupacks) => Fast Retransmit & Fast Recovery
References
The bible: W. Richard Stevens, “TCP/IP Illustrated, Volume 1: The Protocols“
Douglas E. Comer: Computernetzwerke und Internets. 3rd edition, Pearson Studium, Prentice Hall, 2002
The Internet...
Standards (RFCs): http://www.ietf.org/