Lecture 7: File Systems, Networking (labe.felk.cvut.cz/~stepan/AE3B33OSD/OSD-Lecture-7.pdf)
Lecture 7 / Page 2 AE3B33OSD 2011
Contents
Files & File System Interface
Directories & their Organization
File System Implementation
Disk Space Allocation
File System Efficiency & Reliability
Networking
OSI - Open System Interconnection Model
Ethernet
IP
UDP
TCP
File Systems Interface
Concept of the file: a contiguous logical address space
Types:
  Data – numeric, character, binary
  Program
File structure
  None – sequence of words, bytes
  Simple record structure – lines, fixed-length records, variable-length records
  Complex structures – formatted documents, relocatable load files
Complex structures can be simulated by simple record structures
  by inserting appropriate control characters
  by having special control blocks in the file (e.g., section table at the file beginning)
File Systems Interface (2)
File Attributes
  Name – the only information kept in human-readable form
  Identifier – unique tag (number) that identifies the file within the file system
  Type – needed for systems that support different types
  Location – information on file location on a device
  Size – current file size
  Protection – controls who can read, write, and execute the file
  Time, date, and user identification – data for protection, security, and usage monitoring
Information about files is kept in the file-system structures, which are stored and maintained on the disk
File Operations – exported by the OS API (cf. e.g., POSIX)
  Open(Fi) – search the directory structure on disk for entry Fi, and move the content of the entry to memory
  Write, Read, Reposition within file
  Close(Fi) – move the content of entry Fi in memory to the directory structure on disk
  Delete, Truncate, etc.
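The file operations listed above map closely onto the POSIX-style calls that Python exposes in its `os` module. The following is only a minimal sketch; the file name `demo.txt` and the helper `demo_file_operations` are made up for illustration.

```python
import os
import tempfile

def demo_file_operations(directory: str) -> bytes:
    """Open, write, reposition, read, truncate, close and delete one file."""
    path = os.path.join(directory, "demo.txt")          # hypothetical file name
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)   # Open (create if missing)
    try:
        os.write(fd, b"hello, file system")             # Write
        os.lseek(fd, 7, os.SEEK_SET)                    # Reposition to byte offset 7
        data = os.read(fd, 4)                           # Read 4 bytes from there
        os.ftruncate(fd, 5)                             # Truncate the file to 5 bytes
        size = os.fstat(fd).st_size                     # Size attribute after truncation
    finally:
        os.close(fd)                                    # Close
    os.remove(path)                                     # Delete
    assert size == 5
    return data

with tempfile.TemporaryDirectory() as tmp:
    print(demo_file_operations(tmp))                    # b'file'
```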
Directory Structure
Directory is a collection of nodes containing information about files
Both the directory structure and the files reside on disk
[Figure: A typical file-system organization – a directory whose entries refer to the files F1, F2, F3, F4, Fn]
Logical Organization of the Directories
Operations performed on a directory
  Search for a file
  Create a file
  Delete a file
  List a directory
  Rename a file
  Traverse the file system
Organize directories to get
  Efficiency – locating a file quickly
  Naming – convenient to users
    Two users can have the same name for different files
    The same file can have several different names
  Grouping – logical grouping of files by properties (e.g., all Java programs, all games, …)
Single-Level Directory
A single directory for all users
Easy, but
  Naming problem
  Grouping problem
  Sharing problem
Two-Level Directory
Separate directory for each user
Path name
Different users can have files with the same name
Efficient searching
No grouping capability
Tree-Structured Directories
Efficient searching
Grouping capability
Current directory (working directory)
  cd /spell/mail/prog
  type list
Acyclic-Graph Directories
Have shared subdirectories and files
Aliasing – an object can have different names
Problem: When ‘joe’ deletes file ‘test’, the directory item ‘joetst’ no longer points to a valid object
Solution: Each object has a counter containing a count of references. The counter is incremented when a new reference is created and decremented when a reference is deleted. The object is erased when the counter drops to zero
[Figure: /home with the user directories jeff (text, joetst, mail) and joe (test, mail, prg.c) and a mail subdirectory (sent, inbox); ‘joetst’ is an alias of ‘test’]
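On POSIX-like file systems this reference counter is visible as the hard-link count of a file. The sketch below reuses the names `test` and `joetst` from the slide purely for illustration; the helper `link_count_demo` is hypothetical and assumes the underlying file system supports hard links.

```python
import os
import tempfile

def link_count_demo(directory: str) -> list:
    """Return the reference (hard-link) count observed after each step."""
    test = os.path.join(directory, "test")        # names borrowed from the slide
    joetst = os.path.join(directory, "joetst")
    counts = []
    with open(test, "w") as f:
        f.write("shared data")
    counts.append(os.stat(test).st_nlink)         # one name refers to the object
    os.link(test, joetst)                         # new reference: counter goes up
    counts.append(os.stat(test).st_nlink)
    os.remove(test)                               # one reference deleted: counter goes down
    counts.append(os.stat(joetst).st_nlink)
    with open(joetst) as f:                       # the object itself was not erased
        assert f.read() == "shared data"
    return counts

with tempfile.TemporaryDirectory() as tmp:
    print(link_count_demo(tmp))                   # [1, 2, 1]
```

The data is released by the file system only when the last name is removed, which is the "counter drops to zero" case.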
File System Mounting
A file system must be mounted before it can be accessed
  E.g., a file system on removable media must be ‘announced’ to the OS, i.e., it must be mounted
Have prepared a mount point – a directory
  Anything referenced from the mount point before mounting will be hidden after mounting
[Figure: a file system (/, joe, progs, work, docs) mounted at the mount point mnt of another directory tree (/, home, adam, mnt, submnt)]
File Sharing
Sharing of files on multi-user systems is desirable
Sharing may be done through a protection scheme
On distributed systems, files may be shared across a network
  Network File System (NFS) is a common distributed file-sharing method
User IDs identify users, allowing permissions and protections to be per-user
Group IDs allow users to be in groups, permitting group access rights
  POSIX rwx|rwx|rwx scheme (U | G | O = user, group, others)
  ACL – Access Control Lists (Windows, some UNIXes)
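The rwx|rwx|rwx scheme is just nine bits in the file mode. A small sketch of how they decode, using the constants from Python's standard `stat` module (the helper `describe_mode` is made up for this example):

```python
import stat

def describe_mode(mode: int) -> dict:
    """Split the nine permission bits into user / group / others strings."""
    def triple(r, w, x):
        return (("r" if mode & r else "-") +
                ("w" if mode & w else "-") +
                ("x" if mode & x else "-"))
    return {
        "user": triple(stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": triple(stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "others": triple(stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }

print(describe_mode(0o754))   # {'user': 'rwx', 'group': 'r-x', 'others': 'r--'}
```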
File System Implementation Objectives
Implementation possibilities of local file systems and directory structures
File block allocation and free-block strategies, algorithms and trade-offs
File structure
  Logical storage unit
  Collection of related information
File system resides on secondary storage (disks)
File system is organized into layers
File control block – storage structure consisting of information about a file
  Size, ownership, allocation info, time stamps, ...
In-Memory File System Structures
The necessary file system structures provided by the operating system are illustrated for two cases:
[Figure: opening a file]
[Figure: reading a file]
Virtual File Systems
Virtual File Systems (VFS) provide an object-oriented way of implementing file systems.
VFS allows the same system call interface (the API) to be used for different types of file systems.
The API is bound to the VFS interface rather than to any specific type of file system.
Directory Implementation
Linear list of file names with pointers to the data blocks
  simple to program
  time-consuming to execute
Hash table – linear list with a hash data structure
  decreases directory search time
  collisions – situations where two file names hash to the same location
  fixed size
Complex data structure – e.g., B+ tree
  NTFS in MS Windows
Allocation Methods for Files
An allocation method refers to how disk blocks are allocated for files:
  Contiguous allocation
  Linked allocation
  Indexed allocation
Contiguous allocation – simple to implement
  Each file occupies a set of contiguous blocks on the disk
  Simple – only starting location (block #) and length (number of blocks) are required
  Random access
  Wasteful of space (dynamic storage-allocation problem)
  Files cannot grow
Extent-Based Systems
Many newer file systems (e.g., Veritas File System) use a modified contiguous allocation scheme
Extent-based file systems allocate disk blocks in extents
An extent is a contiguous set of disk blocks
  Extents are allocated for file growth
A file consists of one or more extents
Linked Allocation
Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk.
Simple – need only starting address
Free-space management system – no waste of space
Difficult random access – must go through the whole chain
[Figure: each block holds a pointer to the next block]
Linked Allocation with FAT
Allocation chains stored separately
  File-allocation table (FAT)
  Disk-space allocation used by MS-DOS and OS/2.
Problems:
  Size of the table
  Access speed
  Reliability
    All file info is concentrated in one place
    FAT duplicates
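Following an allocation chain through the table can be sketched in a few lines. This is a toy model, not the on-disk FAT format: the table is a plain Python list, and the end-of-chain marker `EOF = -1` is an assumption made for the example (real FATs reserve special entry values).

```python
EOF = -1  # hypothetical end-of-chain marker

def file_blocks(fat: list, first_block: int) -> list:
    """Follow the allocation chain of one file through the FAT."""
    blocks, seen = [], set()
    block = first_block
    while block != EOF:
        if block in seen:                       # guard against a damaged table
            raise ValueError("corrupted FAT: cycle in allocation chain")
        seen.add(block)
        blocks.append(block)
        block = fat[block]                      # entry i names the block that follows block i
    return blocks

# Toy FAT with 8 entries; a file starting in block 2 occupies 2 -> 5 -> 3
fat = [EOF, EOF, 5, EOF, EOF, 3, EOF, EOF]
print(file_blocks(fat, 2))                      # [2, 5, 3]
```

Reaching the k-th block of a file still means walking k table entries, but the walk happens in the table rather than across the data blocks themselves.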
Allocation block size with FAT
Allocation block (cluster) – a group of adjacent disk sectors
Fixed size of FAT on disk
Different FAT types
  FAT item has 12, 16 or 32 bits
Directory entry (MSDOS, FAT-16):
  Field   Name  Extension  Attrs  Reserved  Date and time  1st block  File size
  Bytes   8     3          1      10        4              2          4
Addressing capability of different FAT types
  Block size            FAT-12   FAT-16   FAT-32
  0.5 KB = 1 sector     2 MB     a)       a)
  1 KB = 2 sectors      4 MB     a)       a)
  2 KB = 4 sectors      8 MB     128 MB   a)
  4 KB = 8 sectors      16 MB    256 MB   1 TB
  8 KB = 16 sectors     b)       512 MB   2 TB
  16 KB = 32 sectors    b)       1 GB     2 TB
  32 KB = 64 sectors    b)       2 GB     2 TB
Entries marked a) or b) are unused because:
  a) FAT is too large compared to the disk capacity
  b) losses due to internal fragmentation are too high
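The capacities in the table follow from multiplying the number of addressable blocks by the block size. A rough sketch under three assumptions that the slide does not state explicitly: sectors are 512 bytes, FAT-32 uses only 28 of its 32 bits for block numbers, and partitions are capped at 2 TB because sector numbers are 32-bit.

```python
SECTOR = 512                                            # bytes per sector (assumed)
FAT_BITS = {"FAT-12": 12, "FAT-16": 16, "FAT-32": 28}   # usable bits per FAT item (assumed for FAT-32)
LIMIT = 2**32 * SECTOR                                  # 2 TB cap from 32-bit sector numbers (assumed)

def max_partition(fat_type: str, sectors_per_block: int) -> int:
    """Largest addressable partition in bytes for a FAT type and block size."""
    blocks = 2 ** FAT_BITS[fat_type]
    return min(blocks * sectors_per_block * SECTOR, LIMIT)

MB, GB, TB = 2**20, 2**30, 2**40
print(max_partition("FAT-12", 1) // MB)    # 2   (MB, 0.5 KB blocks)
print(max_partition("FAT-16", 64) // GB)   # 2   (GB, 32 KB blocks)
print(max_partition("FAT-32", 8) // TB)    # 1   (TB, 4 KB blocks)
```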
Indexed Allocation
Brings all pointers for one file together into an index block.
Need index table
Random access
Dynamic access without external fragmentation, but have overhead of index block.
Mapping from logical to physical in a file of maximum size of 256K words and block size of 512 words.
  We need only 1 block for index table
  Only "small" files
[Figure: logical view – the index table holds the pointers to the data blocks of the file]
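The mapping itself is a division with remainder: the quotient selects the index-table entry and the remainder is the offset inside the block. A small sketch with the numbers from the slide; it assumes one index entry occupies one word, and the block numbers in the sample table are invented.

```python
BLOCK_SIZE = 512            # words per block (from the slide)
MAX_FILE = 256 * 1024       # maximum file size in words (from the slide)

def logical_to_physical(index_table: list, logical_address: int) -> tuple:
    """Map a logical word address to (physical block number, offset in block)."""
    q, r = divmod(logical_address, BLOCK_SIZE)   # q = index entry, r = displacement
    return index_table[q], r

# 256K / 512 = 512 entries, which fit into a single 512-word index block
entries_needed = MAX_FILE // BLOCK_SIZE
print(entries_needed)                             # 512

index_table = list(range(1000, 1000 + entries_needed))   # made-up block numbers
print(logical_to_physical(index_table, 1030))     # (1002, 6)
```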
Free-Space Management
Bit vector (n blocks, numbered 0, 1, 2, …, n-1) – one bit per block
  bit[i] = 0: block[i] free
  bit[i] = 1: block[i] occupied
  Bit map requires extra space
  Easy to get contiguous files
Linked list (free list)
  Cannot get contiguous space easily
  No waste of space
Need to protect:
  Pointer to free list
  Bit map
    Must be kept on disk
    Copy in memory and disk may differ
    Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk
  Solution:
    Set bit[i] = 1 on disk
    Allocate block[i]
    Set bit[i] = 1 in memory
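The update order in the solution above can be made concrete with a toy allocator. In this sketch the "disk" copy of the bit map is just a second Python list, which is an assumption for illustration; a real implementation would write the bit to stable storage at that step.

```python
class BitmapAllocator:
    """Toy free-space bit map: 0 = free, 1 = occupied (as on the slide)."""

    def __init__(self, n_blocks: int):
        self.disk_bits = [0] * n_blocks      # stand-in for the on-disk bit map
        self.memory_bits = [0] * n_blocks    # cached copy in memory

    def allocate(self) -> int:
        for i, bit in enumerate(self.memory_bits):
            if bit == 0:
                self.disk_bits[i] = 1        # 1. set bit[i] = 1 on disk
                                             # 2. block[i] is handed to the file here
                self.memory_bits[i] = 1      # 3. set bit[i] = 1 in memory
                return i
        raise RuntimeError("no free block")

    def find_contiguous(self, count: int):
        """Return the first index of `count` adjacent free blocks, or None."""
        run = 0
        for i, bit in enumerate(self.memory_bits):
            run = run + 1 if bit == 0 else 0
            if run == count:
                return i - count + 1
        return None

alloc = BitmapAllocator(8)
print(alloc.allocate(), alloc.allocate())    # 0 1
print(alloc.find_contiguous(3))              # 2
```

Because the disk bit is set first, a crash between the steps can at worst leave a block marked occupied that is actually unused; it can never leave a block in use while the disk still records it as free.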
Directory Implementation
Linear list of file names with pointers to the data blocks
  simple to implement
  time-consuming to execute
  directory can grow and shrink
Hash table – linear list with a hash data structure
  decreases directory search time
  collisions – situations where two file names hash to the same location
  fixed size
File System Efficiency and Performance
Efficiency dependent on:
  disk allocation and directory algorithms
  types of data kept in file’s directory entry
Performance
  disk cache – separate section of main memory for frequently used blocks
  free-behind and read-ahead – techniques to optimize sequential access
  improve PC performance by dedicating section of memory as virtual disk, or RAM disk
Recovery from a Crash
Consistency checking – compares data in directory structure with data blocks on disk, and tries to fix inconsistencies
Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape, other magnetic disk, optical)
Recover lost file or disk by restoring data from backup
Log Structured File Systems
Log structured (or journaling) file systems record each update to the file system as a transaction similar to database systems
All transactions are written to a log
  A transaction is considered committed once it is written to the log
  However, the file system may not yet be updated
The transactions in the log are asynchronously written to the file system
  When the file system is modified, the transaction is removed from the log
If the file system crashes, all remaining transactions in the log must still be performed
Used by NTFS file system
Network Structure - LAN
Local-Area Network (LAN) – designed to cover small geographical area.
  Multiaccess bus, ring, or star network
  Speed 10 megabits/second, or higher
  Broadcast is fast and cheap
  Nodes:
    usually workstations and/or personal computers
    a few (usually one or two) mainframes
Network Structure - WAN
Wide-Area Network (WAN) – links geographically separated sites
  Point-to-point connections over long-haul lines (often leased from a phone company)
  Broadcast usually requires multiple messages
  Nodes:
    often a high percentage of mainframes
Network Topology
Sites in the system can be physically connected in a variety of ways; they are compared with respect to the following criteria:
  Basic cost – How expensive is it to link various sites in the system?
  Communication cost – How long does it take to send a message from site A to site B?
  Reliability – If a link or a site in the system fails, can the remaining sites still communicate with each other?
The various topologies are depicted as graphs whose nodes correspond to sites
  An edge from node A to node B corresponds to a direct connection between the two sites
Communication Structure
The design of a communication network must address four basic issues:
Naming and name resolution
  How do two processes locate each other to communicate?
Routing strategies
  How are messages sent through the network?
Connection strategies
  How do two processes send a sequence of messages?
Contention
  The network is a shared resource, so how do we resolve conflicting demands for its use?
Communication Protocol
The communication network is partitioned into the following multiple layers:
Physical layer – handles the mechanical and electrical details of the physical transmission of a bit stream
Data-link layer – handles the frames, or fixed-length parts of packets, including any error detection and recovery that occurred in the physical layer
Network layer – provides connections and routes packets in the communication network, including handling the address of outgoing packets, decoding the address of incoming packets, and maintaining routing information for proper response to changing load levels
Communication Protocol (Cont.)
Transport layer – responsible for low-level network access and for message transfer between clients, including partitioning messages into packets, maintaining packet order, controlling flow, and generating physical addresses
Session layer – implements sessions, or process-to-process communications protocols
Presentation layer – resolves the differences in formats among the various sites in the network, including character conversions, and half duplex/full duplex (echoing)
Application layer – interacts directly with the users; deals with file transfer, remote-login protocols and electronic mail, as well as schemas for distributed databases, etc.
Design Issues
Transparency – the distributed system should appear as a conventional, centralized system to the user
Fault tolerance – the distributed system should continue to function in the face of failure
Scalability – as demands increase, the system should easily accept the addition of new resources to accommodate the increased demand
Clusters – a collection of semi-autonomous machines that acts as a single system
Example: Networking
The transmission of a network packet between hosts on an Ethernet network
Every host has a unique IP address and a corresponding Ethernet (MAC) address
Communication requires both addresses
Domain Name Service (DNS) can be used to acquire IP addresses
Address Resolution Protocol (ARP) is used to map IP addresses to MAC addresses
  If the hosts are on the same network, ARP can be used
  If the hosts are on different networks, the sending host will send the packet to a router which routes the packet to the destination network
An Ethernet Packet
Frame layout:
  Preamble              64 bits
  Destination address   48 bits
  Source address        48 bits
  Frame type            16 bits
  Message data          368-12000 bits
  Checksum              32 bits
Preamble: 7 bytes [10101010] + 1 byte [10101011], used to synchronize the transfer rate
6-byte addresses (MAC addresses)
  World-unique addresses
  If destination address = 0xFF-FF-FF-FF-FF-FF then broadcast
Frame type – indicates different standards:
  0x0800 indicates that the packet contains an IPv4 datagram
  0x86DD indicates an IPv6 frame
  0x0806 indicates an ARP frame
Message data: usually up to 1500 bytes (12000 bits)
  Encapsulates the packet with the transferred data
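Decoding the fields that follow the preamble can be sketched with Python's `struct` module. The sample frame below is fabricated for illustration (including the source address `02-00-00-00-00-01`), and the trailing checksum is not handled.

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x86DD: "IPv6", 0x0806: "ARP"}   # values from the slide
BROADCAST = b"\xff" * 6

def mac_to_str(mac: bytes) -> str:
    return "-".join(f"{b:02X}" for b in mac)

def parse_ethernet_header(frame: bytes) -> dict:
    """Decode destination, source and frame type (the 14 bytes after the preamble)."""
    dst, src, frame_type = struct.unpack("!6s6sH", frame[:14])
    return {
        "destination": mac_to_str(dst),
        "source": mac_to_str(src),
        "broadcast": dst == BROADCAST,
        "type": ETHERTYPES.get(frame_type, hex(frame_type)),
        "data": frame[14:],
    }

# Fabricated ARP broadcast frame (no preamble, no checksum)
sample = BROADCAST + bytes.fromhex("020000000001") + struct.pack("!H", 0x0806) + b"who-has"
print(parse_ethernet_header(sample)["type"])   # ARP
```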
Internet Architecture
Basic Internet properties
  Each host has its unique identification: the IP address
  IP address is composed of the network address and the host address within the network
  The behavior of the applications and of the network API is independent of the LAN technology
Basic architecture of the Internet (and general internetworking)
  Gateways and routers connect LANs
  LANs may be based on different technologies
  Gateways keep info on hosts belonging to the LANs they connect
  Routers maintain knowledge about networks
  Routers forward packets based on the network part of the IP address (the host part is ignored when forwarding)
  Units connecting LANs usually merge gateway and router functionality
  IP protocols consider all LANs as equivalent regardless of the technology
[Figure: LAN 1, LAN 2 and LAN 3 interconnected by gateways or routers]
Internet addresses
Current Internet – v. 4 uses 32-bit addresses
  Convention: decimal numbers per 8 bits each – 147.32.85.1
Internet v. 6 uses 128-bit addresses
  Not treated here as still under experimental development
IP address
  Identifies each single network adaptor
    Host can have several adaptors ("multi-homed" host)
    One adaptor can even have more addresses
  Is composed of two parts
    Identification (address) of the network – netid (leftmost bits)
    Identification (address) of the host within the network – hostid (rightmost bits)
Internet addresses (2)
Primary IP address classes (bit positions 0–31)
  Class    Leading bits  netid      hostid      Net mask       Address range
  Class A  0             bits 1–7   bits 8–31   255.0.0.0      0.0.0.0 – 127.255.255.255
  Class B  1 0           bits 2–15  bits 16–31  255.255.0.0    128.0.0.0 – 191.255.255.255
  Class C  1 1 0         bits 3–23  bits 24–31  255.255.255.0  192.0.0.0 – 255.255.255.255
Reserved ranges:
  In A: 0.0.0.0 – 0.255.255.255, 127.0.0.0 – 127.255.255.255
  In B: 128.0.0.0 – 128.0.255.255, 191.255.0.0 – 191.255.255.255
  In C: 192.0.0.0 – 192.0.0.255, 255.255.255.0 – 255.255.255.255
Internet addresses (3)
Convention: Network address netid is the full IP address with hostid = 0
The address composed of netid and a hostid part full of "1" bits addresses all hosts in the network (network broadcast address)
Net mask: address "arithmetic" (bitwise operations)
  $\mathrm{netid} = \mathrm{IP\_Address} \wedge \mathrm{Net\_Mask}$
  $\mathrm{hostid} = \mathrm{IP\_Address} \wedge \neg(\mathrm{Net\_Mask})$
Special addresses
  127.0.0.1 – loopback address – a host speaks to itself
  Private addresses – may not spread over Internet – routers must not forward datagrams containing these addresses
    1 class A network: 10.0.0.0 – 10.255.255.255
    16 class B networks: 172.16.0.0 – 172.31.255.255
    256 class C networks: 192.168.0.0 – 192.168.255.255
  Multicast addresses – one host sends info to many "subscribed" hosts (e.g. Internet TV)
    range 224.0.0.0 – 239.255.255.255
Internet addresses (4)
CIDR addressing (= Classless Inter-Domain Routing)
  Address arithmetic enables more efficient splitting of netid|hostid – the border between netid and hostid may be anywhere
  Net mask may be any number composed of n (n = 0 ... 32) leftmost "1" bits
  CIDR notation: IP_Address/n; e.g.: 147.32.85.128 – 147.32.85.191 = 147.32.85.128/26
  Reserved ranges in CIDR notation: 0.0.0.0/8, 127.0.0.0/8, 128.0.0.0/16, 191.255.0.0/16, 192.0.0.0/24, 255.255.255.0/24
  LAN 192.168.200.64/30 contains 4 addresses: 192.168.200.64 = netid, 192.168.200.65 = host1, 192.168.200.66 = host2, 192.168.200.67 = LAN broadcast
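The two CIDR examples above can be checked with Python's standard `ipaddress` module, and the net-mask arithmetic can be written out with bitwise operators. The helper `split_address` and the sample host 147.32.85.150 are assumptions introduced only for this sketch.

```python
import ipaddress

def split_address(ip: str, prefix_len: int) -> tuple:
    """Apply an n-bit net mask with bitwise AND; return (netid, hostid)."""
    address = int(ipaddress.IPv4Address(ip))
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF   # n leftmost "1" bits
    netid = address & mask
    hostid = address & ~mask & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(netid)), hostid

net = ipaddress.ip_network("147.32.85.128/26")
print(net.network_address, net.broadcast_address, net.num_addresses)
# 147.32.85.128 147.32.85.191 64

lan = ipaddress.ip_network("192.168.200.64/30")
print([str(h) for h in lan.hosts()])          # ['192.168.200.65', '192.168.200.66']

print(split_address("147.32.85.150", 26))     # ('147.32.85.128', 22)
```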
Saving IP addresses
  Using private addresses and their translation to "public" addresses (NAT = Network Address Translation)
  Many private addresses are translated into 1 public address
  Problem with publicly available servers on the private-address LAN (behind the NAT router)
  The principle of NAT is connected to IP protocols
[Figure: Internet – router with NAT (public address 147.32.85.27, private address 192.168.100.1) – LAN with private IP addresses]
Internet datagrams
Internet creates a virtual network and carries IP datagrams
The network is a best effort delivery system
Datagrams travel through Internet over different physical LANs
Datagrams must not depend on the LAN technology
Format of an IP datagram: datagram header | datagram data part
Encapsulating an IP datagram into a physical network frame: frame header | frame data part (the frame data part carries the whole datagram)
IP datagram header
Every IP datagram has a header carrying information necessary for datagram delivery
Fields
VERS: IP protocol version – for IP v. 4 VERS = 4
HLEN: Header length in 32-bit words (standard 5)
TOTAL LENGTH: of the datagram in bytes (header+data) – max. 64 kB
SOURCE IP ADDRESS: sender’s IP address
DESTINATION IP ADDRESS: IP address where to deliver
IDENTIFICATION: usually sequential or a random number generated by the datagram sender
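IP datagram format (one row = 32 bits, bit positions 0–31):
  bits 0–3 VERS | bits 4–7 HLEN | bits 8–15 SERVICE TYPE | bits 16–31 TOTAL LENGTH
  bits 0–15 IDENTIFICATION | bits 16–18 FLAGS | bits 19–31 FRAGMENT OFFSET
  bits 0–7 TIME TO LIVE | bits 8–15 PROTOCOL | bits 16–31 HEADER CHECKSUM
  SOURCE IP ADDRESS
  DESTINATION IP ADDRESS
  IP OPTIONS (IF ANY) | PADDING
  DATA
  …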
IP datagram header (cont.)
PROTOCOL: Identification of the protocol carried in the IP datagram (UDP=17, ICMP=1, TCP=6, ...). Defined in RFC 1060
FLAGS, FRAGMENT OFFSET: Information on datagram fragmentation
TIME TO LIVE (TTL): Determines how long the datagram may travel through Internet. Every router decrements this value; if TTL==0 the datagram is discarded and the router sends an ICMP message to the sender
SERVICE TYPE: 8-bit field commanding the datagram routing
  bits 0–2: PRECEDENCE – datagram precedence: 0 = normal, 7 = network control
  bit 3: D – low delay
  bit 4: T – high throughput
  bit 5: R – high reliability
  bits 6–7: UNUSED
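The header layout described above can be unpacked with the standard `struct` module. The following is a sketch for a 20-byte header without options; the sample header is fabricated, its checksum is left at 0 rather than computed, and the addresses are taken from earlier slides only as example values. Note that on the wire FRAGMENT OFFSET is counted in units of 8 bytes, so the sketch multiplies it by 8.

```python
import socket
import struct

IP_FORMAT = "!BBHHHBBH4s4s"   # the five fixed 32-bit rows of the header

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    (vers_hlen, service, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack(IP_FORMAT, raw[:20])
    return {
        "VERS": vers_hlen >> 4,
        "HLEN": vers_hlen & 0x0F,                       # in 32-bit words
        "PRECEDENCE": service >> 5,
        "TOTAL LENGTH": total_length,
        "IDENTIFICATION": ident,
        "MORE FRAGMENTS": bool(flags_frag & 0x2000),
        "FRAGMENT OFFSET": (flags_frag & 0x1FFF) * 8,   # converted to bytes
        "TTL": ttl,
        "PROTOCOL": protocol,
        "SOURCE": socket.inet_ntoa(src),
        "DESTINATION": socket.inet_ntoa(dst),
    }

# Fabricated header: a 620-byte UDP fragment at byte offset 600, more fragments follow
sample_header = struct.pack(IP_FORMAT, 0x45, 0, 620, 7, 0x2000 | 75, 64, 17, 0,
                            socket.inet_aton("147.32.85.27"),
                            socket.inet_aton("147.32.85.1"))
print(parse_ipv4_header(sample_header))
```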
Datagram fragmentation (1)
MTU (Maximum Transmission Unit): defines the maximum datagram size that can be transferred in a LAN
  Net type          Implicit MTU
  PPP               296
  X.25              576
  Ethernet          1 500
  FDDI              4 352
  TokenRing 4Mb     4 464
  TokenRing 16Mb    17 914
Internet – a set of LANs with different MTUs
  Whenever the datagram is larger than MTU it must be fragmented
[Figure: Host A – Net 1 (MTU=1500) – G1 – Net 2 (MTU=620) – G2 – Net 3 (MTU=1500) – Host B]
Datagram fragmentation (2)
Fragmentation may occur anywhere during the datagram travel
If the datagram is fragmented, it is not defragmented on the way; the reconstruction is the task of the destination host
Every fragment travels as a separate datagram:
  The following fields are copied from the original datagram header: VERS, HLEN, SERVICE TYPE, IDENTIFICATION, PROTOCOL, SOURCE IP ADDRESS, DESTINATION IP ADDRESS
  TOTAL LENGTH is changed to the fragment size, and the field FRAGMENT OFFSET determines the offset of the fragment in the original datagram
  Field FLAGS contains a "more fragments" bit. If this bit is 0, the target host knows that the last fragment has been obtained. The target host reconstructs the original datagram using the FRAGMENT OFFSET and TOTAL LENGTH fields
[Figure: fragmentation of a 1400-byte datagram]
  Original: DATAGRAM HEADER | Data_1 (600 bytes) | Data_2 (600 bytes) | Data_3 (200 bytes)
  Fragment 1 (offset=0): FRAGMENT 1 HEADER | Data_1
  Fragment 2 (offset=600): FRAGMENT 2 HEADER | Data_2
  Fragment 3 (offset=1200): FRAGMENT 3 HEADER | Data_3
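The split shown in the figure can be reproduced with a short calculation. This sketch assumes a 20-byte header without options (HLEN = 5) and rounds the fragment payload down to a multiple of 8 bytes, since the header stores the offset in 8-byte units; the function `fragment` is hypothetical.

```python
IP_HEADER = 20   # bytes; assumes a header without options (HLEN = 5)

def fragment(data_length: int, mtu: int) -> list:
    """Return (offset, length, more_fragments) for each fragment of the data part."""
    payload = (mtu - IP_HEADER) // 8 * 8      # data bytes per fragment, multiple of 8
    fragments = []
    offset = 0
    while offset < data_length:
        length = min(payload, data_length - offset)
        more = offset + length < data_length  # "more fragments" bit
        fragments.append((offset, length, more))
        offset += length
    return fragments

# 1400 data bytes crossing a network with MTU = 620, as in the figure
print(fragment(1400, 620))
# [(0, 600, True), (600, 600, True), (1200, 200, False)]
```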
UDP Protocol
UDP (= User Datagram Protocol), IP protocol number = 17
  Very simple transport protocol defined in RFC 768
  Provides connectionless and unreliable transport of user datagrams.
  If any kind of transmission reliability is needed, it must be implemented in the user's application.
  Compared to pure IP datagrams, UDP has the ability to address target processes on the destination host using the port field
UDP encapsulated in an IP datagram:
  FRAME HEADER | FRAME DATA AREA
  FRAME DATA AREA = IP HEADER | IP DATA AREA
  IP DATA AREA = UDP HEADER | UDP DATA AREA
UDP Header (bits 0–15 | bits 16–31):
  UDP SOURCE PORT | UDP DESTINATION PORT
  UDP MESSAGE LENGTH | UDP CHECKSUM
PORT fields are used to distinguish computing processes awaiting a UDP datagram on the destination host.
  SOURCE PORT field is optional; must be 0 if unused.
  Otherwise it contains the port number to which a possible reply should be sent.
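The four 16-bit header fields can be packed and unpacked as follows. This is only an illustration of the layout, not a working network stack: the checksum is left at 0, which over IPv4 means "not computed", and both helper functions are made up for the example.

```python
import struct

UDP_FORMAT = "!HHHH"   # source port, destination port, message length, checksum

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)          # the length covers the 8-byte header plus data
    checksum = 0                       # 0 = checksum not computed (simplification)
    return struct.pack(UDP_FORMAT, src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram: bytes) -> dict:
    src, dst, length, checksum = struct.unpack(UDP_FORMAT, datagram[:8])
    return {"source port": src, "destination port": dst,
            "length": length, "checksum": checksum, "data": datagram[8:length]}

# Source port 0: the sender does not expect a reply; 53 is the DNS port
print(parse_udp_datagram(build_udp_datagram(0, 53, b"query")))
```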
UDP Protocol (cont.)
To ensure that different hosts on Internet will understand each other, IANA (= Internet Assigned Numbers Authority) issues the obligatory list of “Well Known Port Numbers”
Selected UDP port numbers are:
For the complete list see http://www.iana.org/assignments/port-numbers
Port Keyword Meaning
0 - Reserved
7 echo Echo the datagram
9 discard Discard the datagram
13 daytime Time of the day
53 dns Domain Name Service
67 bootps Bootstrap Protocol Server, DHCP Server
68 bootpc Bootstrap Protocol Client, DHCP Client
69 tftp Trivial File Transfer
137 netbios-ns NETBIOS Name Service
…
TCP Protocol for reliable data streaming
TCP is the most important general reliable transport service providing a virtual bidirectional communication channel between two hosts
  TCP/IP is the IP implementation of this service
TCP properties
  Data stream
    Applications communicating through a TCP connection consider the communication channel as a byte stream, similarly to a file
  Virtual connection
    Before the data transmission can start, the communicating applications have to negotiate the connection by means of the network components of their local operating systems – create and connect the sockets.
    The protocol software (transport layer) in the operating systems of both hosts agrees on the connection using messages passed over the network. The hosts also verify that the connection can be reliably established and that both end-point systems are ready to communicate – open the sockets.
    Afterwards the end-point applications are informed about the established connection and the data communication can start.
    If the connection breaks, both communicating applications are informed
    The term virtual connection is used because it creates an illusion that the applications are interconnected through a dedicated line.
    The reliability is ensured by full hand-shake communication (everything must be acknowledged by the other party)
TCP/IP – TCP Protocol IP Implementation
Besides the general TCP properties, the TCP/IP implementation provides:
  Buffered transport (streaming)
    To improve the efficiency, the TCP/IP module in the OS assembles bytes from the stream into packets (datagrams) of reasonable size. If this is not desirable (e.g., TELNET ctrl-C), TCP/IP is equipped with a mechanism enforcing the priority transfer of a short datagram "out-of-order".
  Full duplex connection
    Application processes can see the TCP/IP link as two independent data streams running in opposite directions without an apparent interaction. The protocol software actually acknowledges data running in one direction in the packets sent together with the data in the opposite direction.
TCP/IP encapsulated in an IP datagram on a physical network:
  FRAME HEADER | FRAME DATA AREA
  FRAME DATA AREA = IP HEADER | IP DATA AREA
  IP DATA AREA = TCP HEADER | TCP DATA AREA
TCP reliability solution
The reliable transfer is ensured by positive acknowledging of the received data together with repeated packet transfer
[Figure: normal exchange – events on the sender side, datagrams on network, events on the receiver side]
  Sender: Send packet 1; Receiver: Get packet 1, Send ACK1; Sender: Get ACK1
  Sender: Send packet 2; Receiver: Get packet 2, Send ACK2; Sender: Get ACK2
Lost packets are repeated based on suitable timeouts
[Figure: lost packet – events on the sender side, datagrams on network, events on the receiver side]
  Sender: Send packet 1, Start timer; the packet is lost (packet should have arrived, ACK should have been sent)
  Sender: ACK should have arrived now, but it timed out; Resend packet 1, Start timer
  Receiver: Get packet 1, Send ACK1; Sender: Get ACK1, Stop timer
TCP reliability solution (cont.)
Datagrams may get duplicated during the travel (both data and ACK).
  This problem is solved by sequential numbering of datagrams
The problem of positive acknowledgement efficiency
  The moving window method
[Figure: moving window over packets 1 2 3 4 5 6 7 8 – the starting window position, and the position after the window moves]
[Figure: messages on the net – events on the sender side and on the receiver side]
  Sender side: Send packet 1, Send packet 2, Send packet 3, then Get ACK1, Get ACK2, Get ACK3
  Receiver side: Get packet 1, Send ACK1; Get packet 2, Send ACK2; Get packet 3, Send ACK3
Retransmission timeouts
A constant timeout for resending a TCP/IP segment is inappropriate
  Internet is too heterogeneous and is composed of a huge number of LANs based on different HW technologies
    Gigabit Ethernet, 33 kbit serial line, intercontinental satellite link, etc.
TCP/IP adapts to changing timing parameters of the virtual connection
  A simple adaptive algorithm to adjust the timeout is used
  The algorithm is based on continuous monitoring of the "round trip time" (RTT)
    Time between dispatching the packet and its acknowledgement.
  The real timeout is then computed as a weighted average of RTT measured in the recent history.
  This strategy quickly accommodates to the speed and load changes on the intermediate networks and routers
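One common way to form such a weighted average is exponential smoothing of the measured RTT, with the timeout set to a multiple of the smoothed value. The sketch below follows that classic scheme; the weights `alpha = 0.875` and `beta = 2.0` are assumed example values, not figures taken from the lecture, and real TCP stacks use more refined estimators.

```python
class RetransmissionTimer:
    """Adaptive timeout: smoothed RTT times a safety factor (illustrative only)."""

    def __init__(self, alpha: float = 0.875, beta: float = 2.0):
        self.alpha = alpha        # weight of the history (assumed value)
        self.beta = beta          # safety factor applied to the smoothed RTT (assumed value)
        self.srtt = None          # smoothed round trip time

    def observe(self, rtt: float) -> float:
        """Feed one measured RTT and return the updated timeout."""
        if self.srtt is None:
            self.srtt = rtt       # the first sample initializes the average
        else:
            self.srtt = self.alpha * self.srtt + (1 - self.alpha) * rtt
        return self.beta * self.srtt

timer = RetransmissionTimer()
print(timer.observe(100.0))       # 200.0  (RTT values in milliseconds)
print(timer.observe(200.0))       # 225.0  (smoothed RTT is now 112.5)
```

A larger `alpha` makes the timeout steadier but slower to follow load changes; a smaller one reacts faster at the cost of more spurious retransmissions.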
Establishing the TCP connection
TCP uses a three-stage procedure to establish the virtual connection:
1. In the first step, the connection initiator (client) sends the other party (server) a segment containing SYN bit = 1, randomly generated SEQUENCE NUMBER = x and an empty data section.
2. The server responds by a segment with SYN and ACK set to 1, random SEQUENCE NUMBER = y and ACKNOWLEDGMENT NUMBER = x+1.
3. After receiving this segment, the client acknowledges it by sending a segment with ACK set to 1, SYN bit = 0 and the ACKNOWLEDGMENT NUMBER = y+1.
This way the initial values of the SEQUENCE NUMBER and ACKNOWLEDGMENT NUMBER fields are synchronized for the future lifetime of the virtual connection and are used to number the following TCP/IP segments
The initial sequence numbers are random
  This makes it possible to detect a failure or restart of the hosts on the connection ends that can happen during a longer timeout
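The three segments of the procedure can be modelled with a tiny data class. This is a simulation of the numbering only, with no real networking; the `Segment` class and the function `three_way_handshake` are hypothetical, and the sequence number of the final ACK (x+1) follows standard TCP behaviour rather than anything stated on the slide.

```python
import random
from dataclasses import dataclass

@dataclass
class Segment:
    syn: bool
    ack: bool
    seq: int          # SEQUENCE NUMBER
    ack_no: int = 0   # ACKNOWLEDGMENT NUMBER (meaningful only if ack is set)

def three_way_handshake(x: int, y: int) -> list:
    """Return the three segments exchanged for initial sequence numbers x and y."""
    syn = Segment(syn=True, ack=False, seq=x)                         # 1. client -> server
    syn_ack = Segment(syn=True, ack=True, seq=y, ack_no=syn.seq + 1)  # 2. server -> client
    final_ack = Segment(syn=False, ack=True,
                        seq=syn_ack.ack_no, ack_no=syn_ack.seq + 1)   # 3. client -> server
    return [syn, syn_ack, final_ack]

# In practice both sides pick random 32-bit initial sequence numbers
x, y = random.getrandbits(32), random.getrandbits(32)
for segment in three_way_handshake(x, y):
    print(segment)
```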
TCP connection termination
Connection is normally terminated on request of one of the connected applications
  Application tells TCP that there are no more data to be exchanged
    If server – passive close
    If client – active close
  TCP software closes the connection by sending a segment with the FIN bit set
  For the final termination of the virtual connection, it is also necessary to close the opposite direction. The party that received the segment with the FIN bit reacts by sending a segment with the FIN bit, too.
The TCP connection can be terminated forcibly using the RST bit
Well known TCP ports
Examples of some TCP/IP ports
and many others (see http://www.iana.org/assignments/port-numbers)
Port Keyword Meaning
0 - Reserved
7 echo Echo the datagram
13 daytime Time of the day
20 ftp-data File Transfer Protocol (data stream)
21 ftp File Transfer Protocol (controls)
22 ssh Secure Terminal Connection
23 telnet Terminal Connection
25 smtp Simple Mail Transfer Protocol
53 domain Domain Name Server Query
79 finger Finger service (who is logged on?)
80 http World-wide web
110 pop3 Post Office Protocol - V3 – get incoming mail from server
443 https Secured world-wide web
TCP finite state machine
Dashed arrows – server side state transitions
Solid arrows – client side state transitions
Lower left frame – active connection close
Lower right frame – passive connection close
appl: transition caused by system call from application
recv: transition caused by received segment
send: shows what is sent in reaction to state change