TCP/IP Notes


TCP: Transmission Control Protocol

- TCP provides a connection-oriented, reliable, byte-stream service

- TCP communication is reliable but not guaranteed

TCP provides reliable communication only by detecting failed transmissions and resending them. It cannot guarantee any particular transmission, because it relies on IP, which is unreliable. All it can do is keep trying if an initial delivery attempt fails.

-TCP Byte Streaming Service
TCP is designed to have applications send data to it as a stream of bytes, rather than requiring fixed-size messages. This provides maximum flexibility for a wide variety of uses, because applications don't need to worry about data packaging and can send files or messages of any size. TCP takes care of packaging these bytes into messages called segments.

-TCP Connection Establishment Terminology

Transmission Control Block (TCB)

For each TCP session, both devices create a data structure, called a transmission control block (TCB) that is used to hold important data related to the connection.

The TCB contains all the important information about the connection, such as the two socket numbers that identify it, pointers to buffers that hold incoming and outgoing data, variables that keep track of the number of bytes received and acknowledged, bytes received and not yet acknowledged, current window size, and so forth. The TCB for a connection is maintained throughout the connection and destroyed when the connection is completely terminated.

Active and Passive Opens
A client process using TCP takes the active role and initiates the connection by sending a TCP message to start the connection (a SYN message).

A server process using TCP prepares for an incoming connection request by performing a passive open.

-TCP Connection Establishment Process: The Three-Way Handshake
The normal process of establishing a connection between a TCP client and server involves three steps: the client sends a SYN message; the server sends a message that combines an ACK for the client's SYN with its own SYN; and the client sends an ACK for the server's SYN. This is called the TCP three-way handshake.
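As a concrete illustration, here is a minimal sketch of a passive open and an active open using Python's standard socket module; the kernel performs the actual SYN, SYN+ACK, ACK exchange when listen()/accept() and connect() are called. The loopback address and port 12345 are illustrative.

```python
import socket
import threading

ready = threading.Event()

def server_side():
    # Passive open: bind, listen (LISTEN state), then wait for a client's SYN.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 12345))     # illustrative address and port
    srv.listen(1)
    ready.set()                        # tell the client the server is listening
    conn, peer = srv.accept()          # returns once the three-way handshake completes
    print("server: connection established with", peer)
    conn.close()
    srv.close()

threading.Thread(target=server_side, daemon=True).start()
ready.wait()

# Active open: connect() makes the kernel send a SYN, receive the server's
# SYN+ACK, and reply with an ACK.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", 12345))
print("client: connection established")
cli.close()
```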

-TCP Connection Establishment: Sequence Number Synchronization and Parameter Exchange

Sequence Number Synchronization

At the time a connection is initiated, each TCP device chooses a 32-bit initial sequence number (ISN) for the connection. Each device has its own ISN, and the two will normally not be the same; each device chooses its ISN using a (pseudo) random number. Once each device chooses its ISN, it sends this value to the other device in the Sequence Number field of its initial SYN message. The device receiving the SYN responds with an ACK message that acknowledges the SYN (which may also contain its own SYN, as in step 2 of the three-way handshake). In the ACK message, the Acknowledgment Number field is set to the value of the ISN received from the other device plus one. This represents the next sequence number the device expects to receive from its peer for the first data transmission. This process is called sequence number synchronization.

TCP Parameter Exchange
In addition to the initial sequence numbers, SYN messages also convey important parameters about how the connection should operate. The variable-length Options field in the TCP segment is used to carry these parameters. Some of them include the following:

Maximum Segment Size (MSS) The maximum size of segment that each end of the TCP connection is prepared to receive from the other.

Selective Acknowledgment Permitted Allows a pair of devices to use the optional selective acknowledgment feature, so that only the segments actually lost need to be retransmitted.

Alternate Checksum Method Lets devices specify an alternative method of computing checksums instead of the standard TCP checksum mechanism.
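The MSS agreed for a connection can be inspected, and on some systems requested, through the TCP_MAXSEG socket option. A minimal sketch, assuming a Linux-style stack where Python's socket module exposes TCP_MAXSEG (availability varies by platform); the host, port, and the value 1200 are illustrative.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Request a smaller MSS before the SYN is sent (advisory; the kernel may adjust it).
# TCP_MAXSEG is not available on every platform, hence the guard.
if hasattr(socket, "TCP_MAXSEG"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 1200)

s.connect(("example.com", 80))   # the MSS option is exchanged in the SYN segments

if hasattr(socket, "TCP_MAXSEG"):
    mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
    print("effective MSS for this connection:", mss)

s.close()
```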

-Simultaneous Open Connection Establishment
It is possible, although improbable, for two applications to perform an active open to each other at the same time. Each end must transmit a SYN, and the SYNs must pass each other on the network. It also requires each end to have a local port number that is well known to the other end. This is called a simultaneous open.

TCP was purposely designed to handle simultaneous opens, and the rule is that only one connection results, not two. We don't call either end a client or a server, because both ends act as client and server. Both ends send a SYN at about the same time, entering the SYN_SENT state. When each end receives the SYN, the state changes to SYN_RCVD, and each end resends the SYN and acknowledges the received SYN. When each end receives the SYN plus the ACK, the state changes to ESTABLISHED.

-TCP Connection Termination
A TCP connection is terminated using a special procedure by which each side independently closes its end of the link. The termination normally begins with one of the application processes signaling to its TCP layer that the session is no longer needed. That device sends a FIN message to tell the other device that it wants to end the connection, which the other device acknowledges. When the responding device is ready, it too sends a FIN, which the other device acknowledges; after waiting for a period of time (2 MSL) to ensure that the other device received the ACK, the device closes the session.

During the CLOSE-WAIT state the server may continue sending data, and the client will receive it; however, the client will not send data to the server.

-The TIME-WAIT State
The TIME-WAIT state is also called the 2MSL wait state. Every implementation must choose a value for the maximum segment lifetime (MSL): the maximum amount of time any segment can exist in the network before being discarded. We know this time limit is bounded, since TCP segments are transmitted as IP datagrams, and the IP datagram has a TTL field that limits its lifetime. The TCP standard defines the MSL as 120 seconds (2 minutes); common implementation values, however, are 30 seconds, 1 minute, or 2 minutes.

The TIME-WAIT state is required for two main reasons:
- To provide enough time to ensure that the other device receives the final ACK, and to retransmit it if it is lost.
- While the TCP connection is in the 2MSL wait, the socket pair defining that connection (client IP address, client port number, server IP address, and server port number) cannot be reused. This prevents packets from different connections being mixed.

-Quiet Time Concept
If a host with ports in the 2MSL wait crashes, reboots within MSL seconds, and immediately establishes new connections using the same local and foreign IP addresses and port numbers that were in the 2MSL wait before the crash, delayed segments from the connections that existed before the crash can be misinterpreted as belonging to the new connections created after the reboot. This can happen regardless of how the initial sequence number is chosen after the reboot. To protect against this scenario, RFC 793 states that TCP should not create any connections for MSL seconds after rebooting. This is called the quiet time.

-Simultaneous Connection Termination
Two devices can simultaneously terminate a TCP connection. In this case, a different state sequence is followed, with each device responding to the other's FIN with an ACK, then waiting for receipt of its own ACK, and pausing for a period of time to ensure that the other device received its ACK before ending the connection.
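A practical consequence of the TIME-WAIT (2MSL) state is that a server restarted immediately may fail to bind() its old port while earlier connections are still sitting in TIME-WAIT. A minimal sketch of the usual workaround, the SO_REUSEADDR socket option; the port is illustrative.

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Allow the listening port to be re-bound even if earlier connections on it
# are still in the TIME-WAIT (2MSL) state.
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

srv.bind(("0.0.0.0", 8080))   # may raise "Address already in use" without the option
srv.listen(5)
```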

-TCP Finite State Machine (FSM): States, Events and Transitions
For each state below, the description is followed by the events that can occur in that state and the resulting transitions.

CLOSED
This is the default state that each connection starts in before the process of establishing it begins. The state is called fictional in the standard, because it represents the situation where there is no connection between the devices: it either hasn't been created yet or has just been destroyed.
Passive Open: A server begins the process of connection setup by doing a passive open on a TCP port. At the same time, it sets up the data structure (transmission control block, or TCB) needed to manage the connection. It then transitions to the LISTEN state.

Active Open, Send SYN: A client begins connection setup by sending a SYN message, and also sets up a TCB for this connection. It then transitions to the SYN-SENT state.

LISTEN
A device (normally a server) is waiting to receive a synchronize (SYN) message from a client. It has not yet sent its own SYN message.
Receive Client SYN, Send SYN+ACK: The server device receives a SYN from a client. It sends back a message that contains its own SYN and also acknowledges the one it received. The server moves to the SYN-RECEIVED state.

SYN-SENT
The device (normally a client) has sent a synchronize (SYN) message and is waiting for a matching SYN from the other device (usually a server).
Receive SYN, Send ACK: If the device that has sent its SYN message receives a SYN from the other device but not an ACK for its own SYN, it acknowledges the SYN it receives and then transitions to SYN-RECEIVED to wait for the acknowledgment to its SYN.

Receive SYN+ACK, Send ACK: If the device that sent the SYN receives both an acknowledgment to its SYN and also a SYN from the other device, it acknowledges the SYN received and then moves straight to the ESTABLISHED state.

SYN-RECEIVED
The device has both received a SYN (connection request) from its partner and sent its own SYN. It is now waiting for an ACK to its SYN to finish connection setup.
Receive ACK: When the device receives the ACK to the SYN it sent, it transitions to the ESTABLISHED state.

ESTABLISHED
The steady state of an open TCP connection. Data can be exchanged freely once both devices in the connection enter this state. This will continue until the connection is closed for one reason or another.
Close, Send FIN: A device can close the connection by sending a message with the FIN (finish) bit set; it then transitions to the FIN-WAIT-1 state.

Receive FIN: A device may receive a FIN message from its connection partner asking that the connection be closed. It will acknowledge this message and transition to the CLOSE-WAIT state.

CLOSE-WAIT
The device has received a close request (FIN) from the other device. It must now wait for the application on the local device to acknowledge this request and generate a matching request.
Close, Send FIN: The application using TCP, having been informed the other process wants to shut down, sends a close request to the TCP layer on the machine upon which it is running. TCP then sends a FIN to the remote device that already asked to terminate the connection. This device now transitions to LAST-ACK.

LAST-ACK
A device that has already received a close request and acknowledged it has sent its own FIN and is waiting for an ACK to this request.
Receive ACK for FIN: The device receives an acknowledgment for its close request. We have now sent our FIN and had it acknowledged, and received the other device's FIN and acknowledged it, so we go straight to the CLOSED state.

FIN-WAIT-1
A device in this state is waiting for an ACK for a FIN it has sent, or is waiting for a connection termination request from the other device.
Receive ACK for FIN: The device receives an acknowledgment for its close request. It transitions to the FIN-WAIT-2 state.

Receive FIN, Send ACK: The device does not receive an ACK for its own FIN, but receives a FIN from the other device. It acknowledges it, and moves to the CLOSING state.

FIN-WAIT-2
A device in this state has received an ACK for its request to terminate the connection and is now waiting for a matching FIN from the other device.
Receive FIN, Send ACK: The device receives a FIN from the other device. It acknowledges it and moves to the TIME-WAIT state.

CLOSING
The device has received a FIN from the other device and sent an ACK for it, but not yet received an ACK for its own FIN message.
Receive ACK for FIN: The device receives an acknowledgment for its close request. It transitions to the TIME-WAIT state.

TIME-WAIT
The device has now received a FIN from the other device and acknowledged it, and sent its own FIN and received an ACK for it. It is done, except for waiting to ensure the final ACK is received and to prevent potential overlap with new connections.
Timer Expiration: After a designated wait period (2MSL), the device transitions to the CLOSED state.
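The states and transitions above can be summarized as a small transition table. A minimal sketch in Python; the event names are informal labels used only for this sketch, not part of any standard API, and only the transitions listed above are modeled.

```python
# Simplified TCP finite state machine: (current_state, event) -> next_state.
TCP_FSM = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open/send_SYN"): "SYN-SENT",
    ("LISTEN", "recv_SYN/send_SYN+ACK"): "SYN-RECEIVED",
    ("SYN-SENT", "recv_SYN/send_ACK"): "SYN-RECEIVED",
    ("SYN-SENT", "recv_SYN+ACK/send_ACK"): "ESTABLISHED",
    ("SYN-RECEIVED", "recv_ACK"): "ESTABLISHED",
    ("ESTABLISHED", "close/send_FIN"): "FIN-WAIT-1",
    ("ESTABLISHED", "recv_FIN/send_ACK"): "CLOSE-WAIT",
    ("CLOSE-WAIT", "close/send_FIN"): "LAST-ACK",
    ("LAST-ACK", "recv_ACK_for_FIN"): "CLOSED",
    ("FIN-WAIT-1", "recv_ACK_for_FIN"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "recv_FIN/send_ACK"): "CLOSING",
    ("FIN-WAIT-2", "recv_FIN/send_ACK"): "TIME-WAIT",
    ("CLOSING", "recv_ACK_for_FIN"): "TIME-WAIT",
    ("TIME-WAIT", "2MSL_timer_expires"): "CLOSED",
}

def next_state(state, event):
    """Return the next state, or raise KeyError if the event is invalid here."""
    return TCP_FSM[(state, event)]

# Walk through a normal client lifetime: active open, then active close.
state = "CLOSED"
for event in ("active_open/send_SYN", "recv_SYN+ACK/send_ACK",
              "close/send_FIN", "recv_ACK_for_FIN",
              "recv_FIN/send_ACK", "2MSL_timer_expires"):
    state = next_state(state, event)
    print(event, "->", state)
```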

-TCP Message (Segment) Format

Source Port and Destination Port
The source and destination port numbers identify the sending and receiving applications. These two values, along with the source and destination IP addresses in the IP header, uniquely identify each TCP connection.

Sequence Number

For normal transmissions, this is the sequence number of the first byte of data in this segment. In a connection request (SYN) message, this carries the initial sequence number (ISN) chosen by this host for this connection; the sequence number of the first byte of data sent by this host will be the ISN plus one.

Acknowledgment Number

The acknowledgment number contains the next sequence number that the sender of the acknowledgment expects to receive. This is therefore the sequence number of the last successfully received byte of data plus one.

Header Length

The header length gives the length of the header in 32-bit words. This is required because the length of the options field is variable. With a 4-bit field, TCP is limited to a 60-byte header. Without options, however, the normal size is 20 bytes

Reserved

This field is 6 bits reserved for future use; sent as zero.

Six Flag Bits
There are six flag bits in the TCP header. One or more of them can be turned on at the same time.

URG The urgent pointer is valid.
ACK The acknowledgment number is valid.

PSH The receiver should pass this data to the application as soon as possible

RST Reset the connection

SYN Synchronize sequence numbers to initiate a connection

FIN The sender is finished sending data.

Window Size

This is the number of bytes, starting with the one specified by the acknowledgment number field, that the receiver is willing to accept. This is a 16-bit field, limiting the window to 65535 bytes

Checksum

The checksum covers the TCP segment: the TCP header and the TCP data. This is a mandatory field that must be calculated and stored by the sender, and then verified by the receiver. The TCP checksum is calculated similarly to the UDP checksum, using a pseudo header.

Urgent Pointer

The urgent pointer is valid only if the URG flag is set. This pointer is a positive offset that must be added to the sequence number field of the segment to yield the sequence number of the last byte of urgent data. TCP's urgent mode is a way for the sender to transmit emergency data to the other end.

Options

Specifies one or more options that control how the TCP connection operates. Common options include Maximum Segment Size (MSS), Alternate Checksum, and so on.

-The TCP Reset Function
TCP uses reset segments, with the RST flag set, to handle problems that happen during an established connection. The device detecting the problem sends a TCP segment with the RST (reset) flag set to 1. The receiving device either returns to the LISTEN state, if it was in the process of connection establishment, or closes the connection and returns to the CLOSED state. The following are some of the most common cases in which the TCP software generates a reset:

a) Half-Open Connection
A TCP connection is said to be half-open if one end has closed or aborted the connection without the knowledge of the other end. This can happen any time one of the two hosts crashes. As long as there is no attempt to transfer data across a half-open connection, the end that's still up won't detect that the other end has crashed.

Another common cause of a half-open connection is when a client host is powered off, instead of terminating the client application and then shutting down the client host.

b) Connection Request to a Nonexistent Port
When a connection request arrives and no process is listening on the destination port.

c) Receipt of any TCP segment from any device with which the device receiving the segment does not currently have a connection (other than a SYN requesting a new connection).

d) Receipt of a message with an invalid or incorrect Sequence Number or Acknowledgment Number field, indicating that the message may belong to a prior connection or is spurious in some other way.

-TCP Checksum Calculation and the TCP Pseudo Header
To provide basic protection against errors in transmission, TCP includes a 16-bit Checksum field in its header. Instead of computing the checksum over only the actual data fields of the TCP segment, a 12-byte TCP pseudo header is created prior to checksum calculation.

[Figure: TCP pseudo header for checksum calculation]

Once this 96-bit pseudo header has been formed, it is placed in a buffer, followed by the TCP segment itself. Then the checksum is computed over the entire set of data (pseudo header plus TCP segment). The value of the checksum is placed in the Checksum field of the TCP header, and the pseudo header is discarded.

The Checksum field is itself part of the TCP header and thus one of the fields over which the checksum is calculated. This field is assumed to be all zeros during calculation of the checksum.
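A minimal sketch of the calculation just described: build the 12-byte pseudo header, prepend it to a TCP segment whose Checksum field is zero, and take the 16-bit one's complement sum. The addresses, ports, and payload are illustrative.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's complement sum over data (padded with a zero byte if odd)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def tcp_checksum(src_ip: bytes, dst_ip: bytes, tcp_segment: bytes) -> int:
    """Checksum over pseudo header + TCP segment (Checksum field assumed zero)."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(tcp_segment))
    return ~ones_complement_sum16(pseudo + tcp_segment) & 0xFFFF

# Illustrative 20-byte TCP header (checksum field zeroed) plus 4 bytes of data.
header = struct.pack("!HHIIBBHHH",
                     12345, 80,     # source and destination ports
                     1000, 0,       # sequence and acknowledgment numbers
                     5 << 4,        # header length = 5 words, no options
                     0x02,          # flags: SYN
                     65535, 0, 0)   # window, checksum (0 for now), urgent pointer
payload = b"data"
csum = tcp_checksum(bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]), header + payload)
print(hex(csum))
```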

When the TCP segment arrives at its destination, the receiving TCP software performs the same calculation. It forms the pseudo header, prepends it to the actual TCP segment, and then performs the checksum (setting the Checksum field to zero for the calculation, as before). If there is a mismatch between its calculation and the value the source device put in the Checksum field, this indicates that an error of some sort occurred, and the segment is normally discarded.

Advantages of the Pseudo Header Method
The checksum protects against not just errors in the TCP segment fields, but also against the following problems:

Incorrect Segment Delivery If there is a mismatch in the destination or source address between what the source specified and what the destination that received the segment used, the checksum will fail.

Incorrect Protocol If a datagram is routed to TCP that actually belongs to a different protocol for whatever reason, this can be immediately detected.

Incorrect Segment Length If part of the TCP segment has been omitted by accident, the lengths the source and destination used won't match, and the checksum will fail.

TCP also supports an optional method of having two devices agree on an alternative checksum algorithm; this must be negotiated during connection establishment.

-TCP Immediate Data Transfer: Push Function
TCP includes a special push function to handle cases where data given to TCP needs to be sent immediately. An application can send data to its TCP software and indicate that it should be pushed. The segment will be sent right away rather than being buffered. The pushed segment's PSH control bit will be set to 1 to tell the receiving TCP that it should immediately pass the data up to the receiving application. There is no API to set the PSH flag; typically it is set by the kernel when it empties the send buffer.

Example Scenarios
- Telnet session

- HTTP request to a web server

-TCP Priority Data Transfer: Urgent Function
To deal with situations where a certain part of a data stream needs to be sent with a higher priority than the rest, TCP incorporates an urgent function. When critical data needs to be sent, the application signals this to its TCP layer, which transmits it with the URG bit set in the TCP segment, bypassing any lower-priority data that may have already been queued for transmission. TCP also sets the Urgent Pointer field to an offset value that points to the last byte of urgent data in the segment. So, for example, if the segment contained 400 bytes of urgent data followed by 200 bytes of regular data, the URG bit would be set and the Urgent Pointer field would have a value of 400. The URG flag causes the receiving TCP to forward the urgent data on a separate channel to the application (for instance, on Unix the process gets a SIGURG signal). This allows the application to process the data out of band. As a side note, it's important to be aware that urgent data is rarely used today and not very well implemented.

Example Scenario
- Aborting a file transfer: the abort command should be transferred urgently, not after the file data is sent.

-TCP Sliding Window Data Transfer and Acknowledgment Mechanics
The TCP sliding window system forms the basis of TCP data transfer and flow control. Each of the two devices on a connection must keep track of the data it is sending, as well as the data it is receiving from the other device. This is done by conceptually dividing the bytes into categories.

Sliding Window Transmit Categories
For data being transmitted, there are four transmit categories:

Transmit Category 1 Bytes sent and acknowledged

Transmit Category 2 Bytes sent but not yet acknowledged

Transmit Category 3 Bytes not yet sent for which recipient is ready

Transmit Category 4 Bytes not yet sent for which recipient is not ready

The Send Window and Usable Window
The send window represents the maximum number of unacknowledged bytes that a device is allowed to have outstanding at one time; it is often called simply the window. The usable window is the amount of the send window that the sender is still allowed to send at any point in time; it is equal to the size of the send window minus the number of unacknowledged bytes already transmitted.

Basic Sliding Window Mechanism

When a device gets an acknowledgment for a range of bytes, it knows the destination has successfully received them. It moves them from the sent-but-unacknowledged category to the sent-and-acknowledged category. This causes the send window to slide to the right, allowing the device to send more data.

Example: Sliding Window

Now let's suppose that the sender transmits all the bytes in its usable window (6 bytes).

Now let's suppose that the sender receives an acknowledgment for bytes 32 to 36; the send window slides right by 5 bytes, as below.

Three terms are used to describe the movement of the right and left edges of the window.

1. The window closes as the left edge advances to the right. This happens when data is sent and acknowledged.
2. The window opens when the right edge moves to the right, allowing more data to be sent. This happens when the receiving process on the other end reads acknowledged data, freeing up space in its TCP receive buffer.
3. The window shrinks when the right edge moves to the left. The TCP standard strongly discourages this, but TCP must be able to cope with a peer that does this.

-Send (SND) Pointers
The four transmit categories are divided using three send (SND) pointers:

Send Unacknowledged (SND.UNA) The sequence number of the first byte of data that has been sent but not yet acknowledged. This marks the first byte of Transmit Category 2; all previous sequence numbers refer to bytes in Transmit Category 1.

Send Next (SND.NXT) The sequence number of the next byte of data to be sent to the other device (the server, in this case). This marks the first byte of Transmit Category 3.

Send Window (SND.WND) The size of the send window. Recall that the window specifies the total number of bytes that any device may have outstanding (unacknowledged) at any one time. Thus, adding the sequence number of the first unacknowledged byte (SND.UNA) and the send window (SND.WND) marks the first byte of Transmit Category 4.

The usable window is given by the following formula:

Usable Window = SND.UNA + SND.WND - SND.NXT
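A small sketch of the pointer bookkeeping and the usable-window formula above; the numbers are illustrative and follow the earlier 6-byte example.

```python
# Illustrative send-side pointers (sequence numbers and window size in bytes).
SND_UNA = 32      # first byte sent but not yet acknowledged
SND_NXT = 46      # next byte to be sent
SND_WND = 20      # send window advertised by the peer

bytes_in_flight = SND_NXT - SND_UNA              # Transmit Category 2
usable_window = SND_UNA + SND_WND - SND_NXT      # room still available to send

print("unacknowledged bytes:", bytes_in_flight)  # 14
print("usable window:", usable_window)           # 6
```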

-Receive Categories and Pointers
For data being received, there are three receive categories:

Receive Category 1+2 Bytes received and acknowledged. This is the receiver's complement to Transmit Categories 1 and 2.

Receive Category 3 Bytes not yet received for which the recipient is ready. This is the receiver's complement to Transmit Category 3.

Receive Category 4 Bytes not yet received for which the recipient is not ready. This is the receiver's complement to Transmit Category 4.

-Receive (RCV) Pointers
The three receive categories are divided using two pointers:

Receive Next (RCV.NXT) The sequence number of the next byte of data that is expected from the other device. This marks the first byte in Receive Category 3. All previous sequence numbers refer to bytes already received and acknowledged, in Receive Categories 1 and 2.

Receive Window (RCV.WND) The size of the receive window advertised to the other device. This refers to the number of bytes the device is willing to accept at one time from its peer, which is usually the size of the buffer allocated for receiving data for this connection. When added to the RCV.NXT pointer, this pointer marks the first byte of Receive Category 4.

Both the client and server keep track of both streams being sent over the connection. This is done using a set of special variables called pointers. A device's send pointers keep track of its outgoing data, and its receive pointers keep track of the incoming data. The receive pointers are the complement of the send (SND) pointers. The RCV.WND of one device equals the SND.WND of the other device on the connection.

-TCP Segment Fields Used to Exchange Pointer Information
Sequence Number This will normally be equal to the value of the SND.UNA pointer at the time that data is sent.

Acknowledgment Number This field will normally be equal to the RCV.NXT pointer of the device that sends it.

Window The size of the receive window of the device sending the segment (and thus, the send window of the device receiving the segment).

-TCP Segment Retransmission Timers and the Retransmission Queue

To retransmit lost segments, TCP employs one retransmission timer (for the whole connection period) that handles the retransmission time-out (RTO), the waiting time for an acknowledgment of a segment. We can define the following rules for the retransmission timer:

1. When TCP sends the segment in front of the sending queue, it starts the timer.

2. When the timer expires, TCP resends the first segment in front of the queue and restarts the timer.

3. When a segment (or segments) is cumulatively acknowledged, the segment (or segments) is purged from the queue.

4. If the queue is empty, TCP stops the timer; otherwise, TCP restarts the timer.

The retransmission times and the number of attempts aren't enforced by the standard; they are implemented differently by different operating systems, but the methodology is fixed. The retransmission timeout (RTO) is measured in terms of the round-trip time (RTT).

Round-Trip Time (RTT)

The typical time it takes to send a segment from a client to a server and for the server to send an acknowledgment back to the client.

Measured RTT The measured round-trip time for a segment is the time required for the segment to reach the destination and be acknowledged, although the acknowledgment may include other segments. Note that in TCP, only one RTT measurement can be in progress at any time: if an RTT measurement is started, no other measurement starts until the value of this RTT is finalized. We use the notation RTTM to stand for measured RTT.

Smoothed RTT The measured RTT, RTTM, is likely to change for each round trip.

The fluctuation is so high in today's Internet that a single measurement alone cannot be used for retransmission time-out purposes. Most implementations use a smoothed RTT, called RTTS, which is a weighted average of RTTM and the previous RTTS, as shown below:

Initially: no value
After the first measurement: RTTS = RTTM
After each subsequent measurement: RTTS = (1 - α) × RTTS + α × RTTM, where α is a smoothing factor (commonly 1/8)

RTT Deviation Most implementations do not just use RTTS; they also calculate the RTT deviation, called RTTD, based on RTTS and RTTM (commonly RTTD = (1 - β) × RTTD + β × |RTTS - RTTM|, with β = 1/4).

Retransmission Time-out (RTO)

The value of RTO is based on the smoothed round-trip time and its deviation. Most implementations use the following formula to calculate the RTO

Initially: an initial (implementation-dependent) value
After any measurement: RTO = RTTS + 4 × RTTD

-Retransmission Ambiguity and Karn's Algorithm

Suppose a packet is transmitted and a timeout occurs; the RTO is backed off and the packet is retransmitted with the longer RTO. Now an acknowledgment is received. Is the ACK for the first transmission or the second? This is called the retransmission ambiguity problem.

This ambiguity was solved by Karn's algorithm, which is simple: do not consider the round-trip time of a retransmitted segment in the calculation of RTTS, and do not update the value of RTTS until you send a segment and receive an acknowledgment without the need for retransmission.

Exponential Backoff

What is the value of the RTO if a retransmission occurs? Most TCP implementations use an exponential backoff strategy: the value of the RTO is doubled for each retransmission. So if the segment is retransmitted once, the value is two times the RTO; if it is retransmitted twice, the value is four times the RTO, and so on.
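A toy sketch pulling the pieces above together: the smoothed RTT and deviation updates (with the common weights of 1/8 and 1/4), Karn's rule of ignoring samples from retransmitted segments, and exponential backoff of the RTO. The initial RTO of 1 second is an assumption for the sketch, not something stated in these notes.

```python
class RtoEstimator:
    """Toy RTO estimator following the formulas above (alpha = 1/8, beta = 1/4)."""

    def __init__(self, initial_rto=1.0):
        self.rtts = None          # smoothed RTT (RTTS)
        self.rttd = None          # RTT deviation (RTTD)
        self.rto = initial_rto    # current retransmission timeout, in seconds

    def on_measurement(self, rttm, was_retransmitted=False):
        # Karn's algorithm: ignore samples for segments that were retransmitted.
        if was_retransmitted:
            return
        if self.rtts is None:                     # first valid measurement
            self.rtts = rttm
            self.rttd = rttm / 2
        else:
            self.rtts = 0.875 * self.rtts + 0.125 * rttm
            self.rttd = 0.75 * self.rttd + 0.25 * abs(self.rtts - rttm)
        self.rto = self.rtts + 4 * self.rttd

    def on_timeout(self):
        # Exponential backoff: double the RTO for each retransmission.
        self.rto *= 2

est = RtoEstimator()
est.on_measurement(0.100)      # 100 ms round trip
est.on_measurement(0.140)
est.on_timeout()               # a retransmission occurred; the RTO doubles
print(round(est.rto, 3))
```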

-TCP Acknowledgment Handling and Selective Acknowledgment (SACK)

TCP uses a cumulative acknowledgment system. The Acknowledgment Number field in a segment received by a device indicates that all bytes of data with sequence numbers less than that value have been successfully received by the other device.

TCP's acknowledgment system is cumulative. This means that if a segment is lost in transit, no subsequent segments can be acknowledged until the missing one is retransmitted and successfully received.

There are two approaches to handling retransmission in TCP. In the more conservative approach, only the segments whose timers expire are retransmitted. This saves bandwidth, but it may cause performance degradation if many segments in a row are lost. The alternative is that when a segment's retransmission timer expires, both it and all subsequent unacknowledged segments are retransmitted. This provides better performance if many segments are lost, but it may waste bandwidth on unnecessary retransmissions.

The optional TCP selective acknowledgment feature provides a more elegant way of handling subsequent segments when a retransmission timer expires. When a device receives a noncontiguous segment, it includes a special Selective Acknowledgment (SACK) option in its duplicate acknowledgment (duplicate because of the missing segment) that identifies noncontiguous segments that have already been received, even if they are not yet acknowledged. This saves the original sender from needing to retransmit them.

To use SACK, the two devices on the connection must both support the feature and must enable it by negotiating the Selective Acknowledgment Permitted (SACK-Permitted) option in the SYN segments they use to establish the connection.

Example Scenario

Step 1 Response segment #2 is lost.

Step 2 The client realizes it is missing a segment between segments #1 and #3. It sends a duplicate acknowledgment for segment #1, and attaches a SACK option indicating that it has received segment #3.

Step 3 The client receives segment #4 and sends another duplicate acknowledgment for segment #1, but this time expands the SACK option to show that it has received segments #3 through #4.

Step 4 The server receives the client's duplicate ACK for segment #1 and SACK for segment #3 (both in the same TCP packet). From this, the server deduces that the client is missing segment #2, so segment #2 is retransmitted. The next SACK received by the server indicates that the client has also received segment #4 successfully, so no more segments need to be transmitted.

Step 5 The client receives segment #2 and sends an acknowledgment to indicate that it has received all data up to and including segment #4.
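A small sketch of the reasoning the server performs in step 4: given the receiver's cumulative ACK and its SACK blocks, work out which byte ranges are still missing and need retransmission. The sequence numbers and segment boundaries are illustrative.

```python
def missing_ranges(cumulative_ack, sack_blocks):
    """Return byte ranges (start, end) not yet received, given a cumulative ACK
    and a list of SACK blocks [(left_edge, right_edge), ...]."""
    gaps = []
    expected = cumulative_ack
    for left, right in sorted(sack_blocks):
        if left > expected:
            gaps.append((expected, left))   # bytes between what is expected and this block
        expected = max(expected, right)
    return gaps

# Receiver has everything up to byte 1000 (segment #1) plus bytes 2000-4000
# (segments #3 and #4), but segment #2 (bytes 1000-2000) was lost.
print(missing_ranges(1000, [(2000, 4000)]))   # -> [(1000, 2000)]
```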

-TCP Window Size Adjustment and Flow Control
The TCP sliding window system is used not just for ensuring reliability through acknowledgments and retransmissions; it is also the basis for TCP's flow control mechanism. By increasing or reducing the size of its receive window, a device can raise or lower the rate at which its connection partner sends it data. In the case where a device becomes extremely busy, it can even reduce the receive window to zero; this will close the window and halt any further transmissions of data until the window is reopened.

Example: Closing the Send Window and a Zero Window
This diagram shows three message cycles, each of which results in the server reducing its receive window. In the first cycle, the server reduces it from 360 to 260 bytes, so the client's usable window can increase by only 40 bytes when it gets the server's acknowledgment. In the second and third cycles, the server reduces the window size by the amount of data it receives, which temporarily freezes the client's send window size, halting it from sending new data.

Handling a Closed Window and Sending Probe Segments
A device that reduces its receive window to zero is said to have closed the window. The other device's send window is thus closed; it may not send regular data segments. It may, however, send probe segments to check the status of the window, thus making sure it does not miss notification when the window reopens.

-TCP Window Management Issues

Shrinking the TCP Window
A phenomenon called shrinking the window occurs when a device reduces its receive window so much that its partner device's usable transmit window shrinks in size (meaning that the right edge of its send window moves to the left).

Example

Diagram Description

The client begins with a usable window size of 360 bytes. It sends a 140-byte segment and then, a short time thereafter, sends one of 180 bytes. The server is busy, however, and when it receives the first transmission, it decides to reduce its buffer to 240 bytes. It holds the 140 bytes just received and reduces its receive window all the way down to 100 bytes. When the client's 180-byte segment arrives, there is room for only 100 of the 180 bytes in the server's buffer. When the client gets the new window size advertisement of 100, it will have a problem, because it already has 180 bytes sent but not acknowledged.

Handling Shrinking Issues

Shrinking occurs whenever the server sends back a window size advertisement smaller than what the client considers its usable window size to be at that time.

Shrinking can result in data already in transit needing to be discarded.

To prevent the problems associated with shrinking windows, TCP adds a simple rule to the basic sliding window mechanism: a device is not allowed to shrink the window; devices must instead reduce their receive window size more gradually. Of course, there may be cases where we do need to reduce a buffer, so how should this be handled? Instead of shrinking the window, the server must be more patient. In the previous example, where the buffer needs to be reduced to 240 bytes, the server must send back a window size of 220, freezing the right edge of the client's send window. The client can still fill the 360-byte buffer, but it cannot send more than that. As soon as 120 bytes are removed from the server's receive buffer, the buffer can then be reduced in size to 240 bytes with no data loss. Then the server can resume normal operation, increasing the window size as bytes are taken from the receive buffer.

-TCP Silly Window Syndrome
The basic TCP sliding window system sets no minimum size on transmitted

segments. Under certain circumstances, this can result in a situation where many small, inefficient segments are sent, rather than a smaller number of large ones. Affectionately termed silly window syndrome (SWS), this phenomenon can occur either as a result of a recipient advertising window sizes that are too small or a transmitter being too aggressive in immediately sending out very small amounts of data.

Example
This diagram shows one example of how the phenomenon known as TCP silly window syndrome can arise. The client is trying to send data as fast as possible to the server, which is very busy and cannot clear its buffers promptly. Each time the client sends data, the server reduces its receive window. The size of the messages the client sends shrinks until it is sending only very small, inefficient segments.

-Silly Window Syndrome Avoidance Algorithms
Since both the sender and the recipient of data contribute to SWS, changes are made to the behavior of both to avoid it. These changes are collectively termed SWS avoidance algorithms.

Receiver SWS Avoidance
The receiver contributes to SWS by reducing the size of its receive window to smaller and smaller values. This causes the right edge of the sender's send window to move by ever-smaller increments, leading to smaller and smaller segments. To avoid SWS, we restrict the receiver from moving the right edge of the window by too small an amount. The usual minimum that the edge may be moved is either the value of the MSS parameter or one-half the buffer size, whichever is less.

Sender SWS Avoidance and Nagle's Algorithm

Instead of trying to send data immediately as soon as we can, we wait to send it until we have a segment of a reasonable size. The specific method for doing this is called Nagle's algorithm. Simplified, this algorithm works as follows:

- As long as there is no unacknowledged data outstanding on the connection, data can be sent as soon as the application wants. For example, in the case of an interactive application like Telnet, a single keystroke can be pushed in a segment.
- While there is unacknowledged data, all subsequent data to be sent is held in the transmit buffer and not transmitted until either all the unacknowledged data is acknowledged or we have accumulated enough data to send a full-sized (MSS-sized) segment. This applies even if a push is requested by the user.
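Nagle's algorithm is enabled by default on most TCP stacks. Latency-sensitive applications that issue many small writes sometimes disable it with the TCP_NODELAY socket option; a minimal sketch (the host and request are illustrative):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small writes are sent immediately instead of
# being held back while earlier data is still unacknowledged.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

s.connect(("example.com", 80))
s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
s.close()
```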

-TCP Congestion Handling and Congestion Avoidance Algorithms

Congestion in the network layer is a situation in which too many datagrams are present in an area of the Internet. Congestion may occur if the load on the network (the number of packets sent to the network) is greater than the capacity of the network (the number of packets the network can handle).

During congestion, routers may drop some packets.

To deal with congestion and avoid contributing to it unnecessarily, modern TCP implementations include a set of congestion avoidance algorithms that alter the normal operation of the sliding window system to ensure more efficient overall operation. The four algorithms (Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery) are described below.

Slow Start

Slow Start, a requirement for TCP software implementations, is a mechanism used by the sender to control the transmission rate, otherwise known as sender-based flow control. This is accomplished through the return rate of acknowledgments from the receiver: the rate of acknowledgments returned by the receiver determines the rate at which the sender can transmit data.

When a TCP connection first begins, the Slow Start algorithm initializes the congestion window to one segment, whose size is the maximum segment size (MSS) announced by the receiver during the connection establishment phase. When acknowledgments are returned by the receiver, the congestion window increases by one segment for each acknowledgment returned. Thus, the sender can transmit up to the minimum of the congestion window and the advertised window of the receiver, which is simply called the transmission window.

Slow Start is actually not very slow when the network is not congested and network response time is good. For example, the first successful transmission and acknowledgment of a TCP segment increases the window to two segments. After successful transmission and acknowledgment of these two segments, the window is increased to four segments, then eight segments, then sixteen, and so on, doubling up to the maximum window size advertised by the receiver or until congestion finally does occur.

At some point the congestion window may become too large for the network or network conditions may change such that packets may be dropped. Packets lost will trigger a timeout at the sender. When this happens, the sender goes into congestion avoidance mode as described in the next section.

Congestion Avoidance

During the initial data transfer phase of a TCP connection the Slow Start algorithm is used. However, there may be a point during Slow Start where the network is forced to drop one or more packets due to overload or congestion. If this happens, Congestion Avoidance is used to slow the transmission rate. However, Slow Start is used in conjunction with Congestion Avoidance as the means to get the data transfer going again so it doesn't slow down and stay slow.

In the Congestion Avoidance algorithm, a retransmission timer expiring or the reception of duplicate ACKs can implicitly signal the sender that a network congestion situation is occurring. The sender immediately sets its transmission window to one half of the current window size (the minimum of the congestion window and the receiver's advertised window size), but to at least two segments. If congestion was indicated by a timeout, the congestion window is reset to one segment, which automatically puts the sender into Slow Start mode. If congestion was indicated by duplicate ACKs, the Fast Retransmit and Fast Recovery algorithms are invoked. As data is received during Congestion Avoidance, the congestion window is increased; however, Slow Start is used only up to the halfway point where congestion originally occurred. This halfway point was recorded earlier as the new transmission window. After this halfway point, the congestion window is increased by one segment for all segments in the transmission window that are acknowledged. This mechanism forces the sender to grow its transmission rate more slowly as it approaches the point where congestion had previously been detected.

Fast Retransmit

When a duplicate ACK is received, the sender does not know if it is because a TCP segment was lost or simply because a segment was delayed and received out of order at the receiver. If the receiver can reorder segments, it should not be long before the receiver sends the latest expected acknowledgment. Typically no more than one or two duplicate ACKs should be received when simple out-of-order conditions exist. If, however, more than two duplicate ACKs are received by the sender, it is a strong indication that at least one segment has been lost. The TCP sender assumes enough time has elapsed for all segments to be properly reordered, given that the receiver had enough time to send three duplicate ACKs.

When three or more duplicate ACKs are received, the sender does not even wait for a retransmission timer to expire before retransmitting the segment (as indicated by the position of the duplicate ACK in the byte stream). This process is called the Fast Retransmit algorithm. Immediately following Fast Retransmit is the Fast Recovery algorithm.

Fast Recovery

Since the Fast Retransmit algorithm is used when duplicate ACKs are being received, the TCP sender has implicit knowledge that there is data still flowing to the receiver. Why? The reason is because duplicate ACKs can only be generated when a segment is received. This is a strong indication that serious network congestion may not exist and that the lost segment was a rare event. So instead of reducing the flow of data abruptly by going all the way into Slow Start, the sender only enters Congestion Avoidance mode.

Rather than starting at a window of one segment as in Slow Start mode, the sender resumes transmission with a larger window, incrementing as if in Congestion Avoidance mode. This allows for higher throughput under conditions of only moderate congestion.
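A toy simulation of how these four algorithms adjust the congestion window (cwnd), counted in whole segments for simplicity. The threshold and the loss events are illustrative; real implementations work in bytes and track considerably more state.

```python
class TcpCongestionControl:
    """Toy model: slow start, congestion avoidance, fast retransmit/recovery."""

    def __init__(self, ssthresh=16):
        self.cwnd = 1              # congestion window, in segments
        self.ssthresh = ssthresh   # slow start threshold, in segments

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                     # slow start: exponential growth per RTT
        else:
            self.cwnd += 1 / self.cwnd         # congestion avoidance: about +1 per RTT

    def on_timeout(self):
        self.ssthresh = max(self.cwnd / 2, 2)  # remember half of the current window
        self.cwnd = 1                          # fall all the way back to slow start

    def on_triple_dup_ack(self):
        # Fast retransmit happened; fast recovery resumes from the halved window
        # instead of dropping back to one segment.
        self.ssthresh = max(self.cwnd / 2, 2)
        self.cwnd = self.ssthresh

cc = TcpCongestionControl()
for _ in range(20):
    cc.on_ack()
cc.on_triple_dup_ack()          # loss detected via duplicate ACKs
print(round(cc.cwnd, 2), cc.ssthresh)
```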

-TCP Maximum Segment Size (MSS)
TCP is designed to restrict the size of the segments it sends to a certain maximum limit, to reduce the likelihood that segments will need to be fragmented for transmission at the IP level. The TCP maximum segment size (MSS) specifies the maximum number of bytes in the TCP segment's Data field, regardless of any other factors that influence segment size. The default MSS for TCP is 536 bytes, which is calculated by starting with the minimum IP MTU of 576 bytes and subtracting 20 bytes each for the IP and TCP headers.

Devices can indicate that they wish to use a different MSS value from the default by including a Maximum Segment Size option in the SYN message they use to establish a connection. Each device in the connection may use a different MSS value.

IP: Internet Protocol

The primary purpose of IP is the delivery of datagrams across an internetwork of connected networks.

-IP Characteristics
- Underlying Protocol Independence: IP is designed to allow the transmission of data across any type of underlying (layer 2) network that is designed to work with a TCP/IP stack.
- Connectionless Delivery: IP is a connectionless protocol. This means that when point A wants to send data to point B, it doesn't first set up a connection to point B and then send the data; it just creates the datagram and sends it.

- Unreliable Delivery: IP does not provide reliability or service-quality capabilities such as error protection for the data it sends (though it does protect the IP header), flow control, or retransmission of lost datagrams. For this reason, IP is sometimes called a best-effort protocol.

-IP Functions
- Addressing: IP defines the addressing mechanism for the network.

- Data Encapsulation and Formatting/Packaging: As the TCP/IP network layer protocol, IP accepts data from the transport layer protocols UDP and TCP. It then encapsulates this data into an IP datagram using a special format prior to transmission.
- Fragmentation and Reassembly

- Routing

-IP Address Classes
The classful IP addressing scheme divides the IP address space into five classes, A through E, of differing sizes. Classes A, B, and C are the most important ones, designated for conventional unicast addresses. Class D is reserved for IP multicasting, and Class E is reserved for experimental use.

For each class: first-octet bit pattern, binary and decimal range of the first octet, network/host octet split, and theoretical address range.

Class A: first octet 0xxx xxxx (0000 0001 to 0111 1110, decimal 1 to 126); 1 network octet / 3 host octets; 1.0.0.0 to 126.255.255.255
Class B: first octet 10xx xxxx (1000 0000 to 1011 1111, decimal 128 to 191); 2 network octets / 2 host octets; 128.0.0.0 to 191.255.255.255
Class C: first octet 110x xxxx (1100 0000 to 1101 1111, decimal 192 to 223); 3 network octets / 1 host octet; 192.0.0.0 to 223.255.255.255
Class D: first octet 1110 xxxx (1110 0000 to 1110 1111, decimal 224 to 239); 224.0.0.0 to 239.255.255.255
Class E: first octet 1111 xxxx (1111 0000 to 1111 1111, decimal 240 to 255); 240.0.0.0 to 255.255.255.255
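A small sketch that classifies an address by its first octet, following the table above; it assumes a well-formed dotted-quad IPv4 address.

```python
def ip_class(address: str) -> str:
    """Return the classful category (A-E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:
        return "A"      # leading bit 0
    if first_octet < 192:
        return "B"      # leading bits 10
    if first_octet < 224:
        return "C"      # leading bits 110
    if first_octet < 240:
        return "D"      # leading bits 1110 (multicast)
    return "E"          # leading bits 1111 (experimental)

for addr in ("10.0.0.1", "154.3.99.6", "192.168.1.1", "224.0.0.5"):
    print(addr, "-> Class", ip_class(addr))
```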

-IP Address Patterns With Special Meanings

Network ID + host ID (Class A example 77.91.215.5, Class B 154.3.99.6, Class C 227.82.157.160): Normal meaning; refers to a specific device.

Network ID + all-zeroes host ID (77.0.0.0, 154.3.0.0, 227.82.157.0): The specified network. This notation, with a 0 at the end of the address, refers to an entire network.

All-zeroes network ID + host ID (0.91.215.5, 0.0.99.6, 0.0.0.160): Specified host on this network. Addresses a host on the current or default network when the network ID is not known, or when it doesn't need to be explicitly stated.

All zeroes (0.0.0.0): "Me" (alternately, this host, or the current/default host). Used by a device to refer to itself when it doesn't know its own IP address. The most common use is when a device attempts to determine its address using a host-configuration protocol like DHCP. May also be used to indicate that any address of a multihomed host may be used.

Network ID + all-ones host ID (77.255.255.255, 154.3.255.255, 227.82.157.255): All hosts on the specified network. Used for a directed broadcast to all hosts on that network.

All ones (255.255.255.255): All hosts on the local network. Specifies a limited broadcast to all hosts on the directly connected network. Note that there is no address that would imply sending to all hosts everywhere on the global Internet, since this would be very inefficient and costly.

-Reserved, Loopback and Private IP Addresses
For each range: start address, end address, classful equivalent, classless equivalent, description.

0.0.0.0 to 0.255.255.255 (Class A network 0.x.x.x, 0/8): Reserved.
10.0.0.0 to 10.255.255.255 (Class A network 10.x.x.x, 10/8): Class A private address block.
127.0.0.0 to 127.255.255.255 (Class A network 127.x.x.x, 127/8): Loopback address block.
128.0.0.0 to 128.0.255.255 (Class B network 128.0.x.x, 128.0/16): Reserved.
169.254.0.0 to 169.254.255.255 (Class B network 169.254.x.x, 169.254/16): Class B private address block reserved for automatic private address allocation. See the section on DHCP for details.
172.16.0.0 to 172.31.255.255 (16 contiguous Class B networks, 172.16.x.x through 172.31.x.x, 172.16/12): Class B private address blocks.
191.255.0.0 to 191.255.255.255 (Class B network 191.255.x.x, 191.255/16): Reserved.
192.0.0.0 to 192.0.0.255 (Class C network 192.0.0.x, 192.0.0/24): Reserved.
192.168.0.0 to 192.168.255.255 (256 contiguous Class C networks, 192.168.0.x through 192.168.255.x, 192.168/16): Class C private address blocks.
223.255.255.0 to 223.255.255.255 (Class C network 223.255.255.x, 223.255.255/24): Reserved.

-IP Datagram General Format

Version
This 4-bit field defines the version of the IP protocol.

Header Length
This 4-bit field defines the total length of the datagram header in 4-byte words. This field is needed because the length of the header is variable (between 20 and 60 bytes). When there are no options, the header length is 20 bytes, and the value of this field is 5 (5 × 4 = 20). When the Options field is at its maximum size, the value of this field is 15 (15 × 4 = 60).

Type of Service (TOS) Field

In the original design of the IP header, TOS defined how the datagram should be handled. Part of the field was used to define the precedence of the datagram; the rest defined the type of service (low delay, high throughput, maximize reliability). The IETF has since changed the interpretation of this 8-bit field. It now defines a set of differentiated services: the first 6 bits make up the codepoint subfield, and the last 2 bits are not used. The codepoint subfield can be used in two different ways:

a. When the 3 rightmost bits are 0s, the 3 leftmost bits are interpreted the same as the precedence bits in the service type interpretation; in other words, it is compatible with the old interpretation. The precedence defines the eight-level priority of the datagram (0 to 7) in issues such as congestion. If a router is congested and needs to discard some datagrams, those datagrams with the lowest precedence are discarded first.

b. When the 3 rightmost bits are not all 0s, the 6 bits define 56 (64 - 8) services based on priority assignment by the Internet or local authorities.

Total Length
The total length field defines the total length of the datagram, including the header, in bytes.

Length of data = total length - header length

Why do we need this field anyway? There are occasions in which the datagram is not the only thing encapsulated in a frame; it may be that padding has been added. For example, the Ethernet protocol has a minimum and maximum restriction on the size of data that can be encapsulated in a frame (46 to 1500 bytes). If the size of an IP datagram is less than 46 bytes, some padding will be added to meet this requirement. In this case, when a machine decapsulates the datagram, it needs to check the total length field to determine how much is really data and how much is padding.

Identification
This field uniquely identifies each datagram (fragment) during fragmentation. When a datagram is fragmented, the value in the identification field is copied into all fragments; thus all fragments belonging to a fragmented datagram are identified during reassembly by the destination.

Flags
This is a three-bit field. The first bit is reserved (not used). The second bit is called the "do not fragment" bit. If its value is 1, the machine must not fragment the datagram; if it cannot pass the datagram through any available physical network, it discards the datagram and sends an ICMP error message to the source host. If its value is 0, the datagram can be fragmented if necessary. The third bit is called the "more fragments" bit. If its value is 1, the datagram is not the last fragment; there are more fragments after this one. If its value is 0, this is the last or only fragment.

Fragmentation Offset
This 13-bit field shows the relative position of this fragment with respect to the whole datagram. It is the offset of the data in the original datagram, measured in units of 8 bytes (64 bits). The figure below shows a datagram with a data size of 4000 bytes fragmented into three fragments. The bytes in the original datagram are numbered 0 to 3999. The first fragment carries bytes 0 to 1399; the offset for this fragment is 0/8 = 0. The second fragment carries bytes 1400 to 2799; the offset for this fragment is 1400/8 = 175. Finally, the third fragment carries bytes 2800 to 3999; the offset for this fragment is 2800/8 = 350.

Time to Live (TTL) Field
The TTL is used as a maximum hop count for the datagram. Each time a router processes a datagram, it reduces the value of the TTL field by one. Once the TTL value becomes zero, the datagram is said to have expired, at which point it is dropped and usually an Internet Control Message Protocol (ICMP) Time Exceeded message is sent to inform the originator that it has expired. The TTL field is one of the primary mechanisms used to prevent routing loops.

Protocol
The Protocol field identifies the higher-layer protocol (generally either a transport layer protocol or an encapsulated network layer protocol) carried in the datagram. Common values include 1 (ICMP), 6 (TCP), and 17 (UDP).

Header Checksum
A checksum is computed over the header to provide basic protection against corruption in transmission. To compute the IP checksum for an outgoing datagram, the value of the checksum field is first set to 0. Then the 16-bit one's complement sum of the header is calculated (i.e., the entire header is treated as a sequence of 16-bit words), and the 16-bit one's complement of this sum is stored in the checksum field. At each hop, the device receiving the datagram does the same checksum calculation, and if there is a mismatch, it discards the datagram as damaged.

Source Address
This 32-bit field defines the IP address of the source. This field must remain unchanged during the time the IP datagram travels from the source host to the destination host.

Destination Address
This 32-bit field defines the IP address of the destination. This field must remain unchanged during the time the IP datagram travels from the source host to the destination host.

Options
The Options field is a variable-length list of optional information for the datagram. Options can be used for network testing and debugging. Some of the options include record route and timestamp.
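A minimal sketch of the header checksum computation described above: build a 20-byte IPv4 header with the checksum field zeroed, take the 16-bit one's complement sum, and store its complement. The field values are illustrative.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF

# 20-byte IPv4 header with the checksum field set to zero (values illustrative):
# version/IHL, TOS, total length, identification, flags/fragment offset,
# TTL, protocol (6 = TCP), checksum, source address, destination address.
header = struct.pack("!BBHHHBBH4s4s",
                     (4 << 4) | 5, 0, 40, 0x1C46, 0x4000,
                     64, 6, 0,
                     bytes([192, 0, 2, 1]), bytes([192, 0, 2, 10]))
print(hex(ipv4_checksum(header)))
```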

-Fragmentation

MTU
The size of the largest IP datagram that can be transmitted over a physical network is called that network's maximum transmission unit (MTU). The Ethernet MTU is 1,500 bytes.

If a datagram is passed from a network with a high MTU to one with a low MTU, it must be fragmented to fit the other network's smaller MTU.

Since some physical networks on the path between devices may have a smaller MTU than others, it may be necessary to fragment the datagram more than once.

Internet Minimum MTU: 576 bytes. The IP standard requires every host to be able to accept datagrams of at least 576 bytes, either whole or in fragments to be reassembled. For this reason, 576 bytes is commonly used as a safe default size for IP datagrams when the path MTU is unknown.

-MTU Path Discovery
MTU path discovery is used to determine the optimal MTU to use for a route between two devices. It uses the ICMP error-reporting mechanism and the Don't Fragment (DF) bit of the IP header.

MTU Path Discovery Mechanism
The source node sends a datagram sized to the MTU of its local physical link, with the Don't Fragment (DF) bit set. If this packet goes through without any errors, the devices can use that size for future packets to that destination. If the packet encounters a router whose outgoing link MTU is smaller than the packet, the router discards the packet and sends an ICMP Destination Unreachable ("Fragmentation Needed and DF Set") message back to the originating host. This ICMP error message includes the MTU of the link that necessitates fragmentation. The source node then tries again with a packet no larger than the MTU reported in the error message. This continues until it finds the largest MTU that can be used for the path.

-The IP Fragmentation Process
When an MTU requirement forces a datagram to be fragmented, it is split into several smaller IP datagrams, each containing part of the original. The header of the original datagram becomes the header of the first fragment (with a few fields modified), and new headers are created for the other fragments. Each fragment carries the same Identification value, marking it as part of the same original datagram. The Fragment Offset of each fragment is set to the location where the fragment's data belongs in the original datagram. The More Fragments bit is set to 1 for all fragments except the last, so that the recipient knows when it has received all of the fragments.

-Fragmentation Example
Consider a 12,000-byte datagram (a 20-byte header plus 11,980 bytes of data) that must cross a link with an MTU of 3,300 bytes. The four fragments are created as follows (a short sketch of the arithmetic follows the list):

The first fragment is created by taking the first 3,300 bytes of the 12,000-byte IP datagram. This includes the original header, which becomes the IP header of the first fragment (with certain fields changed, as described in the next section). So 3,280 bytes of data are carried in the first fragment, which leaves 8,700 bytes (11,980 - 3,280) to encapsulate.

The next 3,280 bytes of data are taken from the 8,700 bytes that remain after the first fragment is built and paired with a new header to create the second fragment. This leaves 5,420 bytes.

The third fragment is created from the next 3,280 bytes of data, with a 20-byte header. This leaves 2,140 bytes of data.

The remaining 2,140 bytes are placed into the fourth fragment, with a 20-byte header.
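The arithmetic of this example can be sketched in Python as follows (plan_fragments is a hypothetical helper name; it only plans offsets and sizes, it does not build real headers):

```python
def plan_fragments(data_len, mtu, header_len=20):
    """Return (offset_field, data_bytes, more_fragments) for each fragment."""
    # Data per fragment must fit under the MTU and, except in the last
    # fragment, be a multiple of 8 bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    fragments, offset = [], 0
    while offset < data_len:
        size = min(per_fragment, data_len - offset)
        more = offset + size < data_len
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

# plan_fragments(11980, 3300) yields four fragments of 3,280, 3,280, 3,280 and
# 2,140 data bytes, with offset fields 0, 410, 820 and 1230.
print(plan_fragments(11980, 3300))
```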

-Fragmentation-Related IP Datagram Header Fields
The following IP header fields participate in IP fragmentation.

Total Length: After fragmenting, the Total Length field of each fragment gives the length of that fragment (its header plus its share of the data), not the length of the original datagram.

Identification, More Fragments and Fragment Offset fields are used as described above in the IP header description.

-IP Message ReassemblyWhen a datagram is fragmented, it becomes multiple fragment datagrams. The destination of the overall message must collect these fragments and reassemble them into the original message.

In IP version 4 (IPv4), fragmentation can be performed by a router between the source and destination of an IP datagram, but reassembly is done only by the destination device.

Reasons/Advantages for Reassembly at the End
-Fragments can take different routes from the source to the destination, so any given router on the path may not see all of the fragments in a message.
-Reassembly at intermediate routers would increase their complexity.
-A router doing reassembly would need to wait for all of the fragments before sending the reassembled message, which would slow down routing.

Disadvantages of Reassembly at the End

-Reassembly at the end results in more, smaller fragments traveling over longer routes than if intermediate reassembly occurred. This increases the chance of a fragment being lost and the entire message being discarded.
-It can also lead to inefficient use of data link layer frame capacity. Where some links later on the path have a higher MTU, reassembly only at the destination means those links' capacity may be under-utilized.

-Reassembly Process
-The receiving device initializes a buffer where it can store the fragments of the message as they are received. It keeps track of which portions of this buffer have been filled with received fragments, perhaps using a special table.

-The recipient knows it has received a message fragment the first time it sees a datagram with the More Fragments bit set to 1 or with a non-zero Fragment Offset. It identifies the message by the source and destination IP addresses, the protocol specified in the header, and the Identification field generated by the sender.

-The receiving device sets up a timer for reassembly of the message. Since it is possible that some fragments may never show up, this timer ensures that the device will not wait an infinite time trying to reassemble the message

-Reassembly is complete when the entire buffer has been filled and the fragment with the More Fragments bit set to 0 is received, indicating that it is the last fragment of the datagram.

-On the other hand, if the reassembly timer expires with any of the fragments still missing, the message cannot be reconstructed. The fragments are discarded and an ICMP Time Exceeded message is generated. Since IP is unreliable, it relies on higher-layer protocols such as TCP to determine that the message was not properly received and to retransmit it.
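Before moving on, here is a minimal Python sketch of the reassembly bookkeeping described in the steps above; it tracks one datagram only, ignores overlapping fragments, and uses a simple timer (the class name and timeout value are illustrative):

```python
import time

class ReassemblyBuffer:
    """Tracks the fragments of a single datagram (keyed in a real stack by
    source, destination, protocol and Identification)."""

    def __init__(self, timeout=30.0):
        self.deadline = time.monotonic() + timeout   # reassembly timer
        self.pieces = {}                             # byte offset -> fragment data
        self.total_len = None                        # learned from the last fragment

    def add(self, offset_field, more_fragments, data):
        if time.monotonic() > self.deadline:
            raise TimeoutError("reassembly timer expired; fragments discarded")
        start = offset_field * 8                     # offsets count 8-byte units
        self.pieces[start] = data
        if not more_fragments:                       # MF == 0 marks the final piece
            self.total_len = start + len(data)

    def complete(self):
        """Return the reassembled payload, or None while pieces are missing."""
        if self.total_len is None:
            return None
        if sum(len(d) for d in self.pieces.values()) < self.total_len:
            return None                              # holes remain in the buffer
        return b"".join(data for _, data in sorted(self.pieces.items()))
```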

ICMP: Internet Control Message Protocol
-In TCP/IP, diagnostic, test, and error-reporting functions at the internetwork layer are performed by the Internet Control Message Protocol (ICMP), which acts like IP's administrative assistant.

-ICMP is unlike most other TCP/IP protocols in that it does not perform one specific task. It defines a mechanism by which various control messages can be transmitted and received to implement a variety of functions.

-ICMP messages are transmitted within IP datagrams.

-ICMP Message format

-The first 4 bytes have the same format for all messages, but the remainder differs from one message to the next.

-The Type field identifies the particular ICMP message. There are 15 different values for the Type field.

-The Code field identifies the subtype of message within each ICMP message Type value.

-The Checksum field covers the entire ICMP message. The algorithm used is the same as for the IP header checksum.

-ICMP Message Classes
ICMP messages are divided into two classes:

Error Messages: These messages are used to provide feedback to a source device about an error that has occurred. They are typically generated in response to some sort of action, usually the transmission of a datagram. Errors are usually related to the structure or content of a datagram, or to problem situations encountered on the internetwork during datagram routing.

Informational (or Query) Messages: These messages are used to let devices exchange information, implement certain IP-related features, and perform testing. They are generated either when directed by an application or on a regular basis to provide information to other devices.

-ICMP Messages with their Type
The messages discussed below include Echo Reply (Type 0), Destination Unreachable (Type 3), Source Quench (Type 4), Echo Request (Type 8), Time Exceeded (Type 11), and Parameter Problem (Type 12).

-ICMP Error Messages
-ICMP error messages always contain the IP header and the first 8 bytes of the IP datagram that caused the error to be generated. This lets the receiving ICMP module associate the message with one particular protocol (TCP or UDP, from the Protocol field in the IP header) and one particular user process (from the TCP or UDP port numbers, which are in the first 8 bytes of the offending datagram's TCP or UDP header).

-ICMP error-reporting messages sent in response to a problem seen in an IP datagram can be sent back only to the originating device. Intermediate devices cannot be the recipients of such a message because their addresses are normally not carried in the IP datagram's header.

-An ICMP error message must not be generated in response to any of the following:

An ICMP Error Message: Responding to an ICMP error message with another error message could create message loops. (An ICMP error message can, however, be generated in response to an ICMP informational message.)

A Broadcast or Multicast Datagram.

IP Datagram Fragments Except the First: In many cases, the same situation that might cause a device to generate an error for one fragment would also apply to each successive one, causing unnecessary ICMP traffic. For this reason, when a datagram is fragmented, a device may send an error message only in response to a problem in the first fragment.

Datagrams with a Non-Unicast Source Address: If a datagram's source address does not define a unique, unicast device address, an error message cannot be sent back to that source. This prevents ICMP messages from being sent to broadcast, multicast, or non-routable special addresses such as the loopback address.

-ICMPv4 Destination Unreachable Messages
ICMPv4 Destination Unreachable messages are used to inform a sending device of a failure to deliver an IP datagram.

Example Scenarios

Code Value - Message Subtype: Description

0 - Network Unreachable: The datagram could not be delivered to the network specified in the network ID portion of the IP address. This usually means a routing problem, but it could also be caused by a bad address.

1 - Host Unreachable: The datagram was delivered to the network specified in the network ID portion of the IP address but could not be sent to the specific host indicated in the address. Again, this usually implies a routing issue.

2 - Protocol Unreachable: The protocol specified in the Protocol field was invalid for the host to which the datagram was delivered.

3 - Port Unreachable: The destination port specified in the UDP or TCP header was invalid.

4 - Fragmentation Needed and DF Set: A packet with the DF bit set reached a router whose outgoing link MTU is smaller than the packet; the router drops the packet and sends this ICMP error message.

-ICMPv4 Source Quench Messages

A source-quench message informs the source that a datagram has been discarded due to congestion at a router or at the destination host. The source must slow down its sending of datagrams until the congestion is relieved. One source-quench message is sent for each datagram that is discarded due to congestion.

-ICMPv4 Time Exceeded Messages
The time-exceeded message is generated in two cases.
-The first is when a router decrements a datagram's time-to-live value to zero; it discards the datagram and sends a time-exceeded message to the original source.

-The second is when the final destination does not receive all of the fragments of a datagram within a set time; it discards the fragments it has received and sends a time-exceeded message to the original source.

-Parameter Problem Messages
Any ambiguity in the header of a datagram can create serious problems as the datagram travels through the internet. If a router or the destination host discovers an ambiguous or missing value in any field of the datagram, it discards the datagram and sends a parameter-problem message back to the source.

-ICMP Query Messages

-Echo (Request) and Echo Reply Messages
ICMPv4 Echo (Request) and Echo Reply messages are used to facilitate network reachability testing. A device can test its ability to perform basic communication with another device by sending an Echo message and waiting for an Echo Reply to be returned. The ping utility, a widely used diagnostic tool in TCP/IP internetworks, makes use of these messages.

-Ping Program
The TCP/IP ping utility is used to verify the ability of two devices on a TCP/IP internetwork to communicate. It operates by having one device send ICMP Echo (Request) messages to another, which responds with Echo Reply messages. The program can be helpful in diagnosing network connectivity issues.
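As an illustration, the following Python sketch builds an ICMPv4 Echo Request of the kind ping sends (Type 8, Code 0, checksum over the whole ICMP message). Actually transmitting it requires a raw socket and, usually, administrator privileges:

```python
import struct

def icmp_checksum(message: bytes) -> int:
    """Same one's-complement algorithm as the IP header checksum."""
    if len(message) % 2:
        message += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", message):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """ICMP Echo Request: Type 8, Code 0, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

# Sending it would look roughly like this (raw socket, needs privileges):
#   s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
#   s.sendto(build_echo_request(0x1234, 1), ("192.0.2.1", 0))
```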

-Methods of Diagnosing Connectivity Problems Using ping

Internal Device TCP/IP Stack Operation: By pinging the device's own address, you can verify that its internal TCP/IP stack is working. This can also be done using the standard IP loopback address, 127.0.0.1.

Local Network Connectivity: If the internal test succeeds, it's a good idea to ping another device on the local network to verify that local communication is possible.

Local Router Operation: If there is no problem on the local network, it makes sense to ping whatever local router the device is using, to make sure it is operating and reachable.

Domain Name Resolution Functionality: If a ping performed on a DNS domain name fails, try it with the device's IP address instead. If that works, it implies a problem with domain name configuration or resolution.

Remote Host Operation: If all the preceding checks succeed, you can ping a remote host to see if it responds. If it does not, try a different remote host. If that one works, the problem may actually be with the first remote device itself and not with your local device.

-Traceroute Program/Utility
-The traceroute program sends a dummy UDP datagram to an invalid port, one that cannot be in use by an application at the destination. The TTL field of the IP datagram is set to 1.
-The first router to handle the datagram decrements the TTL value to 0, discards the datagram, and sends back an ICMP Time Exceeded message. The IP datagram containing this ICMP message has the router's IP address as its source address, which identifies the first router on the path.
-Traceroute then sends a datagram with a TTL of 2, revealing the IP address of the second router. This continues until a datagram reaches the destination host.
-Once a datagram reaches the destination, the destination host's UDP module generates an ICMP Port Unreachable error, because the UDP datagram arrived on an invalid port. On seeing this message, the traceroute program concludes its operation.
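A minimal Python sketch of the sending side of a traceroute probe (it assumes a platform where socket.IP_TTL is available; reading the ICMP replies that come back would additionally require a raw ICMP socket):

```python
import socket

def send_traceroute_probe(dest_ip: str, ttl: int, port: int = 33434) -> None:
    """Send one probe: an empty UDP datagram to an unlikely port with a forced TTL."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    s.sendto(b"", (dest_ip, port))
    s.close()

# A traceroute loop simply repeats this with ttl = 1, 2, 3, ... and listens
# (on a raw ICMP socket) for Time Exceeded replies from each router, stopping
# when the destination answers with Port Unreachable.
```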

ARP: Address Resolution Protocol
- Address resolution is required because internetworked devices communicate logically using layer 3 addresses, but the actual transmissions between devices take place using layer 2 (hardware) addresses.

- ARP is a full-featured, dynamic resolution protocol used to match IP addresses to underlying data link layer addresses. Originally developed for Ethernet, it has now been generalized to allow IP to operate over a wide variety of layer 2 technologies.

-ARP General Operation

1. Source Device Checks Cache The source device will first check its cache to determine if it already has a resolution of the destination device. If so, it can skip to step 9.

2. Source Device Generates ARP Request Message The source device generates an ARP Request message. It puts its own data link layer address as the Sender Hardware Address and its own IP address as the Sender Protocol Address. It fills in the IP address of the destination as the Target Protocol Address. (It must leave the Target Hardware Address blank, since that is what it is trying to determine!)

3. Source Device Broadcasts ARP Request Message The source broadcasts the ARP Request message on the local network.

4. Local Devices Process ARP Request Message The message is received by each device on the local network. It is processed, with each device looking for a match on the Target Protocol Address. Those that do not match will drop the message and take no further action.

5. Destination Device Generates ARP Reply Message The one device whose IP address matches the contents of the Target Protocol Address of the message will generate an ARP Reply message. It takes the Sender Hardware Address and Sender Protocol Address fields from the ARP Request message and uses these as the values for the Target Hardware Address and Target Protocol Address of the reply. It then fills in its own layer 2 address as the Sender Hardware Address and its IP address as the Sender Protocol Address. Other fields are filled in, as explained in the description of the ARP message format in the following section.

6. Destination Device Updates ARP Cache Next, as an optimization, the destination device will add an entry to its own ARP cache that contains the hardware and IP addresses of the source that sent the ARP Request. This saves the destination from needing to do an unnecessary resolution cycle later on.

7. Destination Device Sends ARP Reply Message The destination device sends the ARP Reply message. This reply is, however, sent unicast to the source device, because there is no need to broadcast it.

8. Source Device Processes ARP Reply Message The source device processes the reply from the destination. It stores the Sender Hardware Address as the layer 2 address of the destination and uses that address for sending its IP datagram.

9. Source Device Updates ARP Cache The source device uses the Sender Protocol Address and Sender Hardware Address to update its ARP cache for use in the future when transmitting to this device.

-ARP Message Format
An ARP packet is encapsulated directly into a data link frame.

The type field in the frame indicates that the data carried by the frame is an ARP packet (for Ethernet, the EtherType value 0x0806 identifies ARP).

Field Name (Size in bytes): Description

HRD (2) - Hardware Type: Specifies the type of layer two (hardware) technology carrying the message; for Ethernet this value is 1.

PRO (2) - Protocol Type: The complement of the Hardware Type field, specifying the type of layer three addresses used in the message. For IPv4 addresses, this value is 2048 (0x0800), which corresponds to the EtherType code for the Internet Protocol.

HLN (1) - Hardware Address Length: Specifies how long hardware addresses are in this message. For Ethernet or other networks using IEEE 802 MAC addresses, the value is 6.

PLN (1) - Protocol Address Length: Again, the complement of the preceding field; specifies how long protocol (layer three) addresses are in this message. For IP(v4) addresses this value is of course 4.

OP (2) - Opcode: Specifies the nature of the ARP message: 1 for an ARP Request, 2 for an ARP Reply.

SHA (variable, equal to the value in the HLN field) - Sender Hardware Address: The hardware (layer two) address of the device sending this message (which is the IP datagram source device on a request, and the IP datagram destination on a reply, as discussed in the topic on ARP operation).

SPA (variable, equal to the value in the PLN field) - Sender Protocol Address: The IP address of the device sending this message.

THA (variable, equal to the value in the HLN field) - Target Hardware Address: The hardware (layer two) address of the device this message is being sent to (the IP datagram destination device on a request, and the IP datagram source on a reply).

TPA (variable, equal to the value in the PLN field) - Target Protocol Address: The IP address of the device this message is being sent to.
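Putting the table together, here is a hedged Python sketch that packs an ARP Request for Ethernet and IPv4 (build_arp_request is a hypothetical helper; the 28-byte result would then be carried in an Ethernet frame with EtherType 0x0806 and a broadcast destination MAC):

```python
import struct

def build_arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Pack a 28-byte ARP Request for Ethernet/IPv4.

    sender_mac is 6 bytes; sender_ip and target_ip are 4 bytes each
    (for example socket.inet_aton("192.0.2.1")).
    """
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # HRD: hardware type 1 = Ethernet
        0x0800,       # PRO: protocol type = IPv4
        6,            # HLN: hardware addresses are 6 bytes
        4,            # PLN: protocol addresses are 4 bytes
        1,            # OP:  1 = ARP Request (2 = ARP Reply)
        sender_mac,   # SHA
        sender_ip,    # SPA
        b"\x00" * 6,  # THA: unknown -- this is what the request is asking for
        target_ip,    # TPA
    )
```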

Four Different Cases

The following are four different cases in which the services of ARP can be used.

Case 1: The sender is a host and wants to send a packet to another host on the same network. In this case, the logical address that must be mapped to a physical address is the destination IP address in the datagram header.

Case 2: The sender is a host and wants to send a packet to another host on another network. In this case, the host looks at its routing table and finds the IP address of the next hop (router) for this destination. If it does not have a routing table, it looks for the IP address of the default router. The IP address of the router becomes the logical address that must be mapped to a physical address.

Case 3: The sender is a router that has received a datagram destined for a host on another network. It checks its routing table and finds the IP address of the next router. The IP address of the next router becomes the logical address that must be mapped to a physical address.

Case 4: The sender is a router that has received a datagram destined for a host in the same network. The destination IP address of the datagram becomes the logical address that must be mapped to a physical address.

-ARP Caching
A sender usually has more than one IP datagram to send to the same destination, and it is inefficient to run the ARP protocol for each one. The solution is a cache table. When a host or router obtains the physical address corresponding to an IP address, it saves it in the cache table and can reuse it for datagrams destined for the same receiver over the next few minutes.

The ARP cache takes the form of a table containing matched sets of hardware and IP addresses. Each device on the network manages its own ARP cache table. There are two different ways that cache entries can be put into the ARP cache:

Static ARP Cache Entries: These are address resolutions that are manually added to the cache table for a device and are kept in the cache on a permanent basis.

Dynamic ARP Cache Entries: These are hardware and IP address pairs that are added to the cache by the software itself as a result of past ARP resolutions that were successfully completed. They are kept in the cache for only a period of time and are then removed.

Cache Entry Expiration
Dynamic entries cannot be added to the cache and left there forever, because entries left in place for a long time can become stale. Consider Device A's ARP cache, which contains a dynamic mapping for Device B, another host on the network. If dynamic entries stayed in the cache forever, the following situations might arise.

Device Hardware Changes: Device B might experience a hardware failure that requires its network interface card to be replaced. The mapping in Device A's cache would become invalid, since the hardware address in the entry is no longer on the network.

Device IP Address Changes: Similarly, the mapping in Device A's cache would become invalid if Device B's IP address changed.

Device Removal: Suppose Device B is removed from the local network. Device A would never need to send to it again at the data link layer, but the mapping would remain in Device A's cache, wasting space and possibly adding to search time.

To avoid these problems, dynamic cache entries are set to expire automatically after a period of time. This is handled by the ARP implementation, with typical timeout values being 10 or 20 minutes. After a particular entry times out, it is removed from the cache; the next time that address mapping is needed, a fresh resolution is performed to update the cache.
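A small Python sketch of such a dynamic cache with expiring entries (the class name, structure and timeout value are illustrative only):

```python
import time

class ArpCache:
    """Dynamic ARP cache in which entries expire after a fixed lifetime."""

    def __init__(self, lifetime=20 * 60):        # typical timeouts are 10-20 minutes
        self.lifetime = lifetime
        self.entries = {}                        # ip -> (mac, time the entry was added)

    def add_dynamic(self, ip, mac):
        self.entries[ip] = (mac, time.monotonic())

    def lookup(self, ip):
        """Return the cached MAC address, or None if absent or expired."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, added = entry
        if time.monotonic() - added > self.lifetime:
            del self.entries[ip]                 # stale entry: force a fresh ARP resolution
            return None
        return mac
```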

-Proxy ARP
Since ARP relies on broadcasts for address resolution, and broadcasts are not propagated beyond a physical network, ARP cannot function between devices on different physical networks. When such operation is required, a device such as a router can be configured as an ARP proxy, responding to ARP requests on behalf of a device on a different network.

In effect, a router acting as an ARP proxy returns its own hardware address in response to requests by one device for the address of a device on the other network.

Consider a small internetwork in which a single router connects two LANs that are on the same IP network or subnet. The router will not pass ARP broadcasts, but has been configured to act as an ARP proxy. Device A and Device D are each trying to send an IP datagram to the other, and so each broadcasts an ARP request. The router responds to the request sent by Device A as if it were Device D, giving Device A its own hardware address (without propagating Device A's broadcast); it then forwards what Device A sends to Device D on Device D's network. Similarly, it responds to Device D as if it were Device A, giving its own address, and then forwards what Device D sends to it over to the network where Device A is located.

Proxy ARP Pros and Cons
The main advantage of proxying is that it is transparent to the hosts on the different physical network segments. The technique has some drawbacks, however.

First, it introduces added complexity. Second, if more than one router connects two physical networks using the same network ID, problems may arise. Third, it introduces potential security risks: since it essentially means that a router impersonates devices by acting as a proxy for them, the potential for one device spoofing another is real.

-Gratuitous ARP
A gratuitous ARP request is a broadcast request by a host (router, switch, or other device) for its own IP address. If a host sends an ARP request for its own IP address and no ARP reply is received, the host's assigned IP address is not being used by other nodes. If a reply is received, the host's assigned IP address is already in use by another node. In a gratuitous ARP request, the sender and target IP addresses are both set to the IP address of the machine issuing the packet, and the frame's destination MAC is the broadcast address ff:ff:ff:ff:ff:ff. Ordinarily, no reply packet occurs. (A gratuitous ARP reply is a reply for which no request was made.) Gratuitous ARPs are useful for three reasons:

-They can help detect IP conflicts. When a machine receives an ARP request containing a source IP that matches its own, it knows there is an IP conflict.
-They assist in updating other machines' ARP tables. If the host sending the gratuitous ARP has just changed its hardware address (perhaps the host was shut down, its interface card replaced, and the host then rebooted), the packet causes any other host that has a cache entry for the old hardware address to update its ARP cache entry accordingly.
-They inform switches of the MAC address of the machine on a given switch port, so that the switch knows it should transmit packets addressed to that MAC address on that port.

-Summary Comparison of UDP and TCP

General Description
  UDP: Simple, high-speed, low-functionality wrapper that interfaces applications to the network layer and does little else.
  TCP: Full-featured protocol that allows applications to send data reliably without worrying about network layer issues.

Protocol Connection Setup
  UDP: Connectionless; data is sent without setup.
  TCP: Connection-oriented; a connection must be established prior to transmission.

Data Interface to Application
  UDP: Message-based; data is sent in discrete packages by the application.
  TCP: Stream-based; data is sent by the application with no particular structure.

Reliability and Acknowledgments
  UDP: Unreliable, best-effort delivery without acknowledgments.
  TCP: Reliable delivery of messages; all data is acknowledged.

Retransmissions
  UDP: Not performed. The application must detect lost data and retransmit if needed.
  TCP: Delivery of all data is managed, and lost data is retransmitted automatically.

Features Provided to Manage Flow of Data
  UDP: None.
  TCP: Flow control using sliding windows; window size adjustment heuristics; congestion avoidance algorithms.

Overhead
  UDP: Very low.
  TCP: Low, but higher than UDP.

Transmission Speed
  UDP: Very high.
  TCP: High, but not as high as UDP.

Data Quantity Suitability
  UDP: Small to moderate amounts of data (up to a few hundred bytes).
  TCP: Small to very large amounts of data (up to gigabytes).

Types of Applications That Use the Protocol
  UDP: Applications where data delivery speed matters more than completeness, where small amounts of data are sent, or where multicast/broadcast is used.
  TCP: Most protocols and applications sending data that must be received reliably, including most file and message transfer protocols.

Well-Known Applications and Protocols
  UDP: Multimedia applications, DNS, BOOTP, DHCP, TFTP, SNMP, RIP, NFS (early versions).
  TCP: FTP, Telnet, SMTP, DNS, HTTP, POP, NNTP, IMAP, BGP, IRC, NFS (later versions).

-Examples of protocols that use both TCP and UDP are DNS and NFS.

UDP: User Datagram Protocol
- The User Datagram Protocol (UDP) was developed for use by application protocols that do not require reliability, acknowledgment, or flow control features at the transport layer. It is designed to be simple and fast. It provides only transport layer addressing (in the form of UDP ports), an optional checksum capability, and little else.

- UDP is probably the simplest protocol in all of TCP/IP. It takes application layer data that has been passed to it, packages it in a simplified message format, and sends it to IP for transmission.

-A protocol uses UDP instead of TCP in two situations. The first is when an application values timely delivery over reliable delivery, and TCP's retransmission of lost data would be of limited or even no value. The second is when a simple protocol can handle the potential loss of an IP datagram itself at the application layer, using a timer/retransmit strategy, and the other features of TCP are not required. Applications that require multicast or broadcast transmissions also use UDP, because TCP does not support those transmissions.

-UDP Message Format

Length: The length of the entire UDP datagram, including both the header and Data fields.

Checksum: An optional 16-bit checksum computed over the entire UDP datagram plus a special pseudo header of fields. The method is the same as that used by TCP.
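To tie the two fields together, here is a hedged Python sketch that builds a UDP header and computes its checksum over the pseudo header (source IP, destination IP, a zero byte, protocol 17, and the UDP length); the function names are illustrative:

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total

def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Build a UDP header; the checksum covers pseudo header + header + data."""
    length = 8 + len(payload)                    # the UDP header itself is 8 bytes
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                         0, socket.IPPROTO_UDP, length)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    if checksum == 0:
        checksum = 0xFFFF                        # 0 means "no checksum" in UDP
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

# Example: build_udp_datagram("192.0.2.1", "192.0.2.2", 12345, 53, b"query")
```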