Transport Layer: UDP COMS W6998 Spring 2010 Erich Nahum.

Post on 21-Dec-2015


Transport Layer: UDP

COMS W6998

Spring 2010

Erich Nahum

Outline

UDP Layer Architecture
Receive Path
Send Path

Recall what UDP Does

RFC 768
IP Proto 17
Connectionless
Unreliable
Datagram
Supports multicast
Optional checksum
Nice and simple. Yet still 2187 lines of code!

UDP packet format (field widths in bits, 32 bits per row):

Source Port (16) | Destination Port (16)
Length (16) | Checksum (16)
Data

UDP Header

The udp header: include/linux/udp.h

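struct udphdr {
    __be16  source;   /* source port, network byte order */
    __be16  dest;     /* destination port, network byte order */
    __be16  len;      /* length of UDP header plus data, in bytes */
    __sum16 check;    /* checksum; zero means none was computed (IPv4) */
};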
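To make the header layout concrete, here is a minimal user-space sketch that reads the four 16-bit fields from a raw buffer and converts them to host byte order. The names `udp_header`, `udp_info`, and `parse_udp_header` are hypothetical and are not kernel code; the struct only mirrors the field order of `struct udphdr` shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs */

/* User-space mirror of struct udphdr; all fields are in network byte order. */
struct udp_header {
    uint16_t source;
    uint16_t dest;
    uint16_t len;
    uint16_t check;
};

/* Host-byte-order view of the same fields (hypothetical helper type). */
struct udp_info {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;
    uint16_t checksum;
};

/* Parse the 8-byte UDP header at the start of buf.
 * Returns 0 on success, -1 if the buffer is too short or the
 * length field is inconsistent with the buffer size. */
int parse_udp_header(const uint8_t *buf, size_t buflen, struct udp_info *out)
{
    struct udp_header h;

    assert(buf != NULL && out != NULL);
    if (buflen < sizeof(h))
        return -1;
    memcpy(&h, buf, sizeof(h));            /* avoids unaligned access */
    out->src_port = ntohs(h.source);
    out->dst_port = ntohs(h.dest);
    out->length   = ntohs(h.len);
    out->checksum = ntohs(h.check);
    if (out->length < sizeof(h) || out->length > buflen)
        return -1;
    return 0;
}
```

A caller would pass the bytes that follow the IP header, then use `length - 8` as the payload size.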

Sidebar: UDP-Lite

RFC 3828
Very similar to UDP
Difference is that the checksum covers part of the packet rather than all
Checksum coverage says how many bytes (starting from the header) are covered by the checksum
Idea is that certain apps would rather have a damaged packet than none
Examples are audio and video codecs
IP Protocol 136
Linux UDP-Lite implementation shares most code with UDP

UDP-Lite packet format (field widths in bits, 32 bits per row):

Source Port (16) | Destination Port (16)
Checksum Coverage (16) | Checksum (16)
Data
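The effect of partial coverage can be sketched in a few lines of C. This is an illustration only: it applies the RFC 1071 Internet checksum to the first `coverage` bytes of a packet and leaves out the pseudo-header that the real protocol also sums. The function names are hypothetical, and clamping an oversized coverage value is a simplification made here; a real receiver would reject such a packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over the first len bytes of data. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                            /* pad a trailing odd byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                       /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over only the covered prefix of a UDP-Lite packet.
 * A coverage of 0 means "the whole packet", as in RFC 3828.
 * Simplification: coverage larger than the packet is clamped. */
uint16_t udplite_partial_checksum(const uint8_t *pkt, size_t pktlen,
                                  size_t coverage)
{
    assert(pkt != NULL);
    if (coverage == 0 || coverage > pktlen)
        coverage = pktlen;
    return inet_checksum(pkt, coverage);
}
```

With a coverage of 8, two packets that differ only in their payload produce the same checksum, which is exactly why a codec can still receive a payload damaged in transit.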

Sources of UDP Packets

1. Packets arrive on an interface and are passed to the udp_rcv() function.

2. UDP packets are packed into an IP packet and passed down to IP via ip_append_data() and ip_push_pending_frames().

UDP Implementation Design

[Figure: UDP call graph between the IP layer and higher layers.]

Receive side: ip_local_deliver_finish (ip_input.c), udp_rcv, __udp4_lib_rcv, __udp4_lib_lookup_skb, __udp4_lib_mcast_deliver (multicast), icmp_send (ICMP), __udp_queue_rcv_skb (udp.c), sock_queue_rcv_skb (sock.c), higher layers.

Send side: higher layers, sock_sendmsg (socket.c), udp_sendmsg (udp.c), ip_route_output_flow (routing), ip_append_data (ip_output.c), udp_push_pending_frames (udp.c), ip_push_pending_frames (ip_output.c).

UDP Proto

struct proto udp_prot = {
    .name             = "UDP",
    .owner            = THIS_MODULE,
    .close            = udp_lib_close,
    .connect          = ip4_datagram_connect,
    .disconnect       = udp_disconnect,
    .ioctl            = udp_ioctl,
    .destroy          = udp_destroy_sock,
    .setsockopt       = udp_setsockopt,
    .getsockopt       = udp_getsockopt,
    .sendmsg          = udp_sendmsg,
    .recvmsg          = udp_recvmsg,
    .sendpage         = udp_sendpage,
    .backlog_rcv      = __udp_queue_rcv_skb,
    .hash             = udp_lib_hash,
    .unhash           = udp_lib_unhash,
    .get_port         = udp_v4_get_port,
    .memory_allocated = &udp_memory_allocated,
    .sysctl_mem       = sysctl_udp_mem,
    .sysctl_wmem      = &sysctl_udp_wmem_min,
    .sysctl_rmem      = &sysctl_udp_rmem_min,
    .obj_size         = sizeof(struct udp_sock),
    .slab_flags       = SLAB_DESTROY_BY_RCU,
    .h.udp_table      = &udp_table,
};

udp_table

/**
 * struct udp_table - UDP table
 *
 * @hash:  hash table, sockets are hashed on (local port)
 * @hash2: hash table, sockets are hashed on (local port, local address)
 * @mask:  number of slots in hash tables, minus 1
 * @log:   log2(number of slots in hash table)
 */
struct udp_table {
    struct udp_hslot *hash;
    struct udp_hslot *hash2;
    unsigned int mask;
    unsigned int log;
};

udp_table_init() allocates the hash tables and initializes them:

for (i = 0; i <= table->mask; i++) {
    INIT_HLIST_NULLS_HEAD(&table->hash[i].head, i);
    table->hash[i].count = 0;
    spin_lock_init(&table->hash[i].lock);
}
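The role of mask and log can be seen in a toy user-space version of the port-keyed table. This is a sketch under simplifying assumptions: the `toy_` names are hypothetical, each slot keeps only a counter (the kernel slot also holds a list head and a spinlock), and the slot is chosen from the port alone, whereas the real hash function may mix in other values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified slot: only the socket count is kept. */
struct toy_hslot {
    int count;
};

struct toy_table {
    struct toy_hslot *hash;
    unsigned int mask;   /* number of slots minus 1 */
    unsigned int log;    /* log2(number of slots) */
};

/* Allocate 2^log slots and zero their counters. Returns 0 or -1. */
int toy_table_init(struct toy_table *t, unsigned int log)
{
    unsigned int i;

    assert(t != NULL && log < 16);
    t->log = log;
    t->mask = (1u << log) - 1;
    t->hash = malloc(((size_t)t->mask + 1) * sizeof(*t->hash));
    if (t->hash == NULL)
        return -1;
    for (i = 0; i <= t->mask; i++)
        t->hash[i].count = 0;
    return 0;
}

/* Because the slot count is a power of two, "& mask" acts as a modulo. */
unsigned int toy_slot(const struct toy_table *t, uint16_t port)
{
    return port & t->mask;
}

/* Record a socket bound to the given local port. */
void toy_bind(struct toy_table *t, uint16_t port)
{
    t->hash[toy_slot(t, port)].count++;
}

void toy_table_free(struct toy_table *t)
{
    free(t->hash);
    t->hash = NULL;
}
```

With log = 7 the table has 128 slots, so ports 53 and 181 land in the same slot and would share one chain.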

Outline

UDP Layer Architecture
Receive Path
Send Path

Receiving packets in UDP

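From user space, you can receive UDP traffic with three system calls: recv() (when the socket is connected), recvfrom(), and recvmsg().

All three are handled by udp_recvmsg() in the kernel (the .recvmsg entry of udp_prot). udp_rcv() is the handler for packets arriving from the IP layer.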
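For reference, the user-space side of this path looks roughly as follows. The sketch sends one datagram over the loopback interface with sendto() and reads it back with recvfrom(). The function name `udp_loopback_roundtrip` is hypothetical, and the two-second receive timeout is an arbitrary choice made so the example cannot block forever.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Sends msg to a freshly bound loopback socket and reads it back.
 * Returns the number of bytes received, or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr, from;
    socklen_t alen = sizeof(addr), flen = sizeof(from);
    struct timeval tv = { 2, 0 };
    ssize_t n = -1;
    int rx, tx;

    assert(msg != NULL && out != NULL);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a free port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto done;

    /* Unconnected socket: the destination is given on every call. */
    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    /* recvfrom() also reports who sent the datagram. */
    n = recvfrom(rx, out, outlen, 0, (struct sockaddr *)&from, &flen);
done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Calling `udp_loopback_roundtrip("hello", buf, sizeof(buf))` should return the length of the message when loopback networking is available.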

Recall IP's inet_protos

[Figure: inet_protos[MAX_INET_PROTOS] is an array of pointers to net_protocol structures, indexed by IP protocol number. Each net_protocol holds handler, err_handler, gso_send_check, gso_segment, gro_receive, and gro_complete. The UDP entry has handler udp_rcv() and err_handler udp_err(). The IGMP entry has handler igmp_rcv() and a NULL err_handler.]
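The dispatch idea behind inet_protos can be illustrated with a small table of function pointers. Everything named `toy_` below is hypothetical: this is a user-space sketch of the pattern (register a handler under a protocol number, then deliver by indexing the table), not the kernel's registration code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PROTOS 256

struct toy_packet {
    int proto;          /* IP protocol number, e.g. 17 for UDP */
    int delivered_to;   /* set by whichever handler receives the packet */
};

/* Cut-down analogue of net_protocol: receive and error handlers only. */
struct toy_net_protocol {
    int  (*handler)(struct toy_packet *pkt);
    void (*err_handler)(struct toy_packet *pkt, int info);
};

const struct toy_net_protocol *toy_protos[TOY_MAX_PROTOS];

/* Register p under proto. Fails if the slot is taken or out of range. */
int toy_add_protocol(const struct toy_net_protocol *p, int proto)
{
    assert(p != NULL);
    if (proto < 0 || proto >= TOY_MAX_PROTOS || toy_protos[proto] != NULL)
        return -1;
    toy_protos[proto] = p;
    return 0;
}

int toy_udp_rcv(struct toy_packet *pkt)
{
    pkt->delivered_to = 17;
    return 0;
}

void toy_udp_err(struct toy_packet *pkt, int info)
{
    (void)pkt;
    (void)info;
}

const struct toy_net_protocol toy_udp_protocol = {
    .handler     = toy_udp_rcv,
    .err_handler = toy_udp_err,
};

/* Index the table by protocol number and call the handler, if any. */
int toy_local_deliver(struct toy_packet *pkt)
{
    const struct toy_net_protocol *p;

    if (pkt->proto < 0 || pkt->proto >= TOY_MAX_PROTOS)
        return -1;
    p = toy_protos[pkt->proto];
    if (p == NULL || p->handler == NULL)
        return -1;                      /* no transport registered */
    return p->handler(pkt);
}
```

Registering `toy_udp_protocol` under 17 and delivering a packet with `proto = 17` reaches `toy_udp_rcv()`, while a packet with an unregistered protocol number is rejected.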

Receive Path: udp_rcv

Calls __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
__udp4_lib_rcv() is used by both UDP and UDP-Lite

Receive: __udp4_lib_rcv

Looks up the route table from the skb
Checks that the skb has a header
Checks that the length is good
Calculates the checksum
Pulls out saddr, daddr
Checks if the address is multicast
    If so, calls __udp4_lib_mcast_deliver()
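The header, length, and checksum checks listed for __udp4_lib_rcv can be sketched in user space as follows. The function names are hypothetical and the code works on plain byte arrays rather than an skb; it follows RFC 768 (pseudo-header of source address, destination address, zero, protocol 17, and UDP length) and is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add len bytes to a running sum of big-endian 16-bit words. */
static uint32_t sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    return sum;
}

/* Ones'-complement checksum over pseudo-header plus UDP header and data. */
uint16_t udp4_checksum(const uint8_t saddr[4], const uint8_t daddr[4],
                       const uint8_t *udp, size_t udplen)
{
    uint8_t pseudo[12];
    uint32_t sum;

    memcpy(pseudo, saddr, 4);
    memcpy(pseudo + 4, daddr, 4);
    pseudo[8] = 0;
    pseudo[9] = 17;                         /* IP protocol number for UDP */
    pseudo[10] = (uint8_t)(udplen >> 8);
    pseudo[11] = (uint8_t)(udplen & 0xff);
    sum = sum_bytes(0, pseudo, sizeof(pseudo));
    sum = sum_bytes(sum, udp, udplen);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Sender side: write the checksum field (a result of 0 is sent as 0xffff). */
void udp4_fill_checksum(const uint8_t saddr[4], const uint8_t daddr[4],
                        uint8_t *udp, size_t udplen)
{
    uint16_t c;

    udp[6] = udp[7] = 0;
    c = udp4_checksum(saddr, daddr, udp, udplen);
    if (c == 0)
        c = 0xffff;
    udp[6] = (uint8_t)(c >> 8);
    udp[7] = (uint8_t)(c & 0xff);
}

/* Receiver side: returns 0 if the datagram passes all checks, else -1. */
int udp4_validate(const uint8_t saddr[4], const uint8_t daddr[4],
                  const uint8_t *udp, size_t avail)
{
    size_t ulen;

    assert(udp != NULL);
    if (avail < 8)                          /* is there a full header? */
        return -1;
    ulen = ((size_t)udp[4] << 8) | udp[5];
    if (ulen < 8 || ulen > avail)           /* is the length field sane? */
        return -1;
    if (udp[6] == 0 && udp[7] == 0)         /* sender used no checksum (IPv4) */
        return 0;
    /* Summing a packet that carries a correct checksum yields zero. */
    return udp4_checksum(saddr, daddr, udp, ulen) == 0 ? 0 : -1;
}
```

A datagram filled in with `udp4_fill_checksum()` validates cleanly, and flipping a single payload bit makes `udp4_validate()` fail.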

Receive: __udp4_lib_rcv (cont)

Looks up the socket in the udptable
    Via __udp4_lib_lookup_skb()
    Increases refcount on the sk (socket)
If socket is found
    Calls __udp_queue_rcv_skb()
    Decrements refcount with sock_put(sk)
If not,
    Send ICMP_UNREACHABLE
    Drop packet.
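The lookup, hold, queue, and put sequence described for the second half of __udp4_lib_rcv can be modelled with a toy socket array. All names here are hypothetical; a linear scan stands in for the hash lookup and a counter stands in for the receive queue, so this only illustrates the control flow and the reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_sock {
    uint16_t port;   /* local port the socket is bound to */
    int refcnt;      /* reference count */
    int queued;      /* number of datagrams queued for the application */
};

enum toy_verdict { TOY_DELIVERED, TOY_PORT_UNREACH };

/* Find the socket bound to dport and take a reference on it. */
struct toy_sock *toy_lookup(struct toy_sock *socks, size_t n, uint16_t dport)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (socks[i].port == dport) {
            socks[i].refcnt++;
            return &socks[i];
        }
    }
    return NULL;
}

/* Drop the reference taken by toy_lookup(). */
void toy_sock_put(struct toy_sock *sk)
{
    assert(sk != NULL && sk->refcnt > 0);
    sk->refcnt--;
}

/* Deliver one datagram addressed to dport. */
enum toy_verdict toy_rcv(struct toy_sock *socks, size_t n, uint16_t dport)
{
    struct toy_sock *sk = toy_lookup(socks, n, dport);

    if (sk != NULL) {
        sk->queued++;            /* stands in for queueing the skb */
        toy_sock_put(sk);        /* reference is balanced on every path */
        return TOY_DELIVERED;
    }
    /* No listener: the caller would send an ICMP error and drop the packet. */
    return TOY_PORT_UNREACH;
}
```

The point of the pattern is that the reference count returns to its starting value after delivery, so the socket cannot be freed while the packet is being queued.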

Recv: __udp_queue_rcv_skb

Calls sock_queue_rcv_skb
Increments some statistics

Outline

UDP Layer Architecture
Receive Path
Send Path

Sending packets in UDP

From user space, you can send UDP traffic with three system calls: send() (when the socket is connected), sendto(), and sendmsg().

All three are handled by udp_sendmsg() in the kernel.
udp_sendmsg() is much simpler than its TCP counterpart, tcp_sendmsg().
udp_sendpage() is called when user space calls sendfile() (to copy a file into a UDP socket).
sendfile() can also be used to copy data between one file descriptor and another.
udp_sendpage() invokes udp_sendmsg().

UDP Socket Options

For the IPPROTO_UDP/SOL_UDP level, there exists a socket option UDP_CORK

Added in Linux kernel 2.5.44.

int state = 1;
setsockopt(s, IPPROTO_UDP, UDP_CORK, &state, sizeof(state));
for (j = 0; j < 1000; j++)
    sendto(s, buf1, ...);
state = 0;
setsockopt(s, IPPROTO_UDP, UDP_CORK, &state, sizeof(state));

UDP_CORK (cont)

The above code fragment will call udp_sendmsg() 1000 times without actually sending anything on the wire (in the usual case, without the setsockopt() call that sets UDP_CORK, 1000 packets would be sent).

Only after the second setsockopt() is called, with UDP_CORK and state=0, is one packet sent on the wire.

Kernel implementation: when using UDP_CORK, udp_sendmsg() passes MSG_MORE to ip_append_data().

UDP_CORK was not in the glibc headers when these slides were written. If your headers lack it, add it to your program:
#define UDP_CORK 1
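A complete, Linux-only version of the corking fragment might look like the sketch below. It corks a connected UDP socket, calls send() several times, uncorks, and reports the size of the single datagram that arrives on a loopback receiver. The function name `corked_send_size` is hypothetical, the 2048-byte receive buffer and two-second timeout are arbitrary choices, and the fallback #define uses the value given in the text.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <arpa/inet.h>

#ifndef UDP_CORK
#define UDP_CORK 1   /* for headers that do not define it */
#endif

/* Sends `pieces` copies of `piece` while corked, then uncorks.
 * Returns the size of the first datagram received, or -1 on error.
 * The total payload is assumed to fit in the 2048-byte buffer. */
ssize_t corked_send_size(const char *piece, int pieces)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    struct timeval tv = { 2, 0 };
    char buf[2048];
    ssize_t n = -1;
    int rx, tx, state, i;

    assert(piece != NULL && pieces > 0);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto done;

    state = 1;                               /* cork: data is only appended */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &state, sizeof(state)) < 0)
        goto done;
    for (i = 0; i < pieces; i++)
        if (send(tx, piece, strlen(piece), 0) < 0)
            goto done;
    state = 0;                               /* uncork: pending data is pushed */
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &state, sizeof(state)) < 0)
        goto done;

    n = recv(rx, buf, sizeof(buf), 0);
done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Sending "abcd" three times while corked should produce one 12-byte datagram rather than three 4-byte ones.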

Send Path: udp_sendmsg()

Checks length, MSG_OOB
Checks if there are frames pending
    If so, jump to do_append_data
Gets the address
Checks if socket is connected
    If so, pull routing info out of sk
    Otherwise, look up via ip_route_output_flow()
Calls ip_append_data()
    Handles fragmentation
Calls udp_push_pending_frames()

udp_push_pending_frames()

Checks that there is a pending skb on the socket's write queue via skb_peek()
    If not, goto out and bail
Creates UDP header
Checksums if necessary (or partially for UDP-Lite)
Calls ip_push_pending_frames()
    Combines all pending IP fragments on the socket as one IP datagram and sends it out

UDP Backup

Recall the sk_buff structure

[Figure: sk_buff layout, from linux-2.6.31/include/linux/skbuff.h. sk_buffs are linked through next and prev on an sk_buff_head list. Fields shown: sk (points to struct sock), tstamp, dev (points to net_device), transport_header, network_header, mac_header, head, data, tail, end, truesize, users, and many others. The packet data area holds headroom, MAC-Header, IP-Header, UDP-Header, UDP-Data, and tailroom, followed by skb_shared_info (dataref: 1, nr_frags, ..., destructor_arg).]

Recall pkt_type in sk_buff

pkt_type: specifies the type of a packet
PACKET_HOST: a packet sent to the local host
PACKET_BROADCAST: a broadcast packet
PACKET_MULTICAST: a multicast packet
PACKET_OTHERHOST: a packet not destined for the local host, but received in promiscuous mode
PACKET_OUTGOING: a packet leaving the host
PACKET_LOOPBACK: a packet sent by the local host to itself
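As an illustration of how the first four values relate to the destination address of an Ethernet frame, here is a small user-space classifier. The function `classify_pkt_type` is hypothetical and only approximates what a driver-level routine decides; the PACKET_* constants come from the Linux header <linux/if_packet.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_packet.h>   /* PACKET_HOST, PACKET_BROADCAST, ... */

/* Classify a received frame by its destination MAC address,
 * given the MAC address of the receiving device. */
int classify_pkt_type(const uint8_t dst[6], const uint8_t dev_addr[6])
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

    assert(dst != NULL && dev_addr != NULL);
    if (memcmp(dst, bcast, 6) == 0)
        return PACKET_BROADCAST;
    if (dst[0] & 0x01)                    /* group bit set: multicast */
        return PACKET_MULTICAST;
    if (memcmp(dst, dev_addr, 6) == 0)
        return PACKET_HOST;
    return PACKET_OTHERHOST;              /* seen only in promiscuous mode */
}
```

The broadcast test comes first because the broadcast address also has the group bit set.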