Transport Layer: UDP COMS W6998 Spring 2010 Erich Nahum.


Page 1: Transport Layer: UDP COMS W6998 Spring 2010 Erich Nahum.

Transport Layer: UDP

COMS W6998

Spring 2010

Erich Nahum

Page 2

Outline

  • UDP Layer Architecture
  • Receive Path
  • Send Path

Page 3

Recall what UDP Does

  • RFC 768
  • IP Proto 17
  • Connectionless
  • Unreliable
  • Datagram
  • Supports multicast
  • Optional checksum
  • Nice and simple. Yet still 2187 lines of code!

UDP packet format (field widths in bits; the bit ruler on the slide reads 0 3 7 15 31):

    Source Port (16) | Destination Port (16)
    Length (16)      | Checksum (16)
    Data
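
To make the "optional checksum" concrete, here is a minimal user-space sketch of the RFC 768 checksum: a 16-bit one's-complement sum over an IPv4 pseudo-header (source address, destination address, zero, protocol 17, UDP length) followed by the UDP header and data. The names csum_add and udp_checksum are hypothetical and are not the kernel's own routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum, read as big-endian 16-bit words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len)                          /* odd trailing byte is zero-padded */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/*
 * udp points at the 8-byte UDP header (checksum field set to 0) followed
 * by the data; saddr and daddr are IPv4 addresses in host byte order.
 */
uint16_t udp_checksum(uint32_t saddr, uint32_t daddr,
                      const uint8_t *udp, uint16_t udp_len)
{
    uint8_t pseudo[12];
    uint32_t sum;
    uint16_t csum;

    assert(udp != NULL && udp_len >= 8);
    pseudo[0] = (uint8_t)(saddr >> 24); pseudo[1] = (uint8_t)(saddr >> 16);
    pseudo[2] = (uint8_t)(saddr >> 8);  pseudo[3] = (uint8_t)saddr;
    pseudo[4] = (uint8_t)(daddr >> 24); pseudo[5] = (uint8_t)(daddr >> 16);
    pseudo[6] = (uint8_t)(daddr >> 8);  pseudo[7] = (uint8_t)daddr;
    pseudo[8] = 0;                      pseudo[9] = 17;   /* IP proto 17 */
    pseudo[10] = (uint8_t)(udp_len >> 8);
    pseudo[11] = (uint8_t)udp_len;

    sum = csum_add(0, pseudo, sizeof pseudo);
    sum = csum_add(sum, udp, udp_len);
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);
    csum = (uint16_t)~sum;
    return csum ? csum : 0xFFFF;      /* 0 on the wire means "no checksum" */
}
```

A receiver can run the same routine over a packet whose checksum field is filled in; an intact packet then yields 0xFFFF from this sketch.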

Page 4

UDP Header

The udp header: include/linux/udp.h

    struct udphdr {
        __be16  source;  /* source port, network byte order */
        __be16  dest;    /* destination port, network byte order */
        __be16  len;     /* length of header plus data, in bytes */
        __sum16 check;   /* checksum */
    };
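
Because every field is stored in network byte order, code that reads the header has to convert each one. The following is a small user-space sketch of that step; struct udp_fields, load_be16 and udp_parse_header are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of the four header fields (illustrative type). */
struct udp_fields {
    uint16_t source;
    uint16_t dest;
    uint16_t len;
    uint16_t check;
};

static uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* Returns 0 on success, -1 if buf cannot hold a valid UDP datagram. */
int udp_parse_header(const uint8_t *buf, size_t buflen, struct udp_fields *out)
{
    assert(buf != NULL && out != NULL);
    if (buflen < 8)                       /* header is always 8 bytes */
        return -1;
    out->source = load_be16(buf + 0);
    out->dest   = load_be16(buf + 2);
    out->len    = load_be16(buf + 4);
    out->check  = load_be16(buf + 6);
    if (out->len < 8 || out->len > buflen) /* length must cover the header
                                              and fit in the buffer */
        return -1;
    return 0;
}
```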

Page 5

Sidebar: UDP-Lite

  • RFC 3828
  • Very similar to UDP
  • Difference is that the checksum covers part of the packet rather than all of it
  • Checksum coverage says how many bytes (starting from the header) are covered by the checksum
  • Idea is that certain apps would rather have a damaged packet than none
  • Examples are audio and video codecs
  • IP Protocol 136
  • Linux UDP-Lite implementation shares most code with UDP

UDP-Lite packet format (field widths in bits; the bit ruler on the slide reads 0 3 7 15 31):

    Source Port (16)       | Destination Port (16)
    Checksum Coverage (16) | Checksum (16)
    Data
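
RFC 3828 puts a few rules on the Checksum Coverage field: 0 means the whole packet is covered, values 1 to 7 are illegal because the 8-byte header must always be covered, and a value larger than the packet is also illegal. A minimal sketch of those rules follows; the function name udplite_covered_bytes is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define UDPLITE_HDR_LEN 8

/*
 * coverage: value of the Checksum Coverage field (host byte order).
 * pkt_len:  UDP-Lite header plus data, as derived from the IP length.
 * Returns the number of bytes the checksum covers, or -1 if the packet
 * must be discarded.
 */
int32_t udplite_covered_bytes(uint16_t coverage, uint32_t pkt_len)
{
    assert(pkt_len >= UDPLITE_HDR_LEN && pkt_len <= 0xFFFF);
    if (coverage == 0)                /* 0 means: cover the entire packet */
        return (int32_t)pkt_len;
    if (coverage < UDPLITE_HDR_LEN)   /* header must always be covered */
        return -1;
    if (coverage > pkt_len)           /* cannot cover more than was sent */
        return -1;
    return coverage;
}
```

An audio codec might, for example, ask for coverage of the header plus its own small frame header, so that bit errors in the rest of the payload do not cause the packet to be dropped.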

Page 6

Sources of UDP Packets

1. Packets arrive on an interface and are passed to the udp_rcv() function.

2. UDP packets are packed into an IP packet and passed down to IP via ip_append_data() and ip_push_pending_frames()

Page 7

UDP Implementation Design

[Figure: call graph of the UDP implementation in udp.c and its neighbours]

Receive path:

    ip_local_deliver_finish (ip_input.c)
      -> udp_rcv (udp.c)
      -> __udp4_lib_rcv
           -> __udp4_lib_lookup_skb
           -> __udp4_lib_mcast_deliver (MULTICAST)
           -> icmp_send (ICMP)
           -> __udp_queue_rcv_skb
                -> sock_queue_rcv_skb (sock.c)
                -> Higher Layers

Send path:

    Higher Layers
      -> sock_sendmsg (socket.c)
      -> udp_sendmsg (udp.c)
           -> ip_route_output_flow (ROUTING)
           -> ip_append_data (ip_output.c)
           -> udp_push_pending_frames
                -> ip_push_pending_frames (ip_output.c)

Page 8

UDP Proto

    struct proto udp_prot = {
        .name             = "UDP",
        .owner            = THIS_MODULE,
        .close            = udp_lib_close,
        .connect          = ip4_datagram_connect,
        .disconnect       = udp_disconnect,
        .ioctl            = udp_ioctl,
        .destroy          = udp_destroy_sock,
        .setsockopt       = udp_setsockopt,
        .getsockopt       = udp_getsockopt,
        .sendmsg          = udp_sendmsg,
        .recvmsg          = udp_recvmsg,
        .sendpage         = udp_sendpage,
        .backlog_rcv      = __udp_queue_rcv_skb,
        .hash             = udp_lib_hash,
        .unhash           = udp_lib_unhash,
        .get_port         = udp_v4_get_port,
        .memory_allocated = &udp_memory_allocated,
        .sysctl_mem       = sysctl_udp_mem,
        .sysctl_wmem      = &sysctl_udp_wmem_min,
        .sysctl_rmem      = &sysctl_udp_rmem_min,
        .obj_size         = sizeof(struct udp_sock),
        .slab_flags       = SLAB_DESTROY_BY_RCU,
        .h.udp_table      = &udp_table,
    };

Page 9

udp_table

    /**
     * struct udp_table - UDP table
     *
     * @hash:  hash table, sockets are hashed on (local port)
     * @hash2: hash table, sockets are hashed on (local port, local address)
     * @mask:  number of slots in hash tables, minus 1
     * @log:   log2(number of slots in hash table)
     */
    struct udp_table {
        struct udp_hslot *hash;
        struct udp_hslot *hash2;
        unsigned int      mask;
        unsigned int      log;
    };

udp_table_init() allocates the hash tables and initializes them:

    for (i = 0; i <= table->mask; i++) {
        INIT_HLIST_NULLS_HEAD(&table->hash[i].head, i);
        table->hash[i].count = 0;
        spin_lock_init(&table->hash[i].lock);
    }
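
The roles of mask and log are easier to see in a toy user-space version of the port-keyed table. This is only a sketch: every toy_ name is hypothetical, the slot index is simply port & mask (the kernel's hash function may mix in more than the port), and there is no locking or RCU.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_sock  { uint16_t port; struct toy_sock *next; };
struct toy_hslot { struct toy_sock *head; unsigned int count; };
struct toy_table { struct toy_hslot *hash; unsigned int mask; unsigned int log; };

/* Allocate 2^log slots and initialize each one, as in the loop above. */
int toy_table_init(struct toy_table *t, unsigned int log)
{
    unsigned int i;

    assert(t != NULL && log <= 16);
    t->log  = log;
    t->mask = (1u << log) - 1;            /* number of slots, minus 1 */
    t->hash = malloc(((size_t)t->mask + 1) * sizeof *t->hash);
    if (t->hash == NULL)
        return -1;
    for (i = 0; i <= t->mask; i++) {
        t->hash[i].head  = NULL;
        t->hash[i].count = 0;
    }
    return 0;
}

static struct toy_hslot *toy_slot(const struct toy_table *t, uint16_t port)
{
    return &t->hash[port & t->mask];      /* simplified hash: low bits of port */
}

void toy_table_insert(struct toy_table *t, struct toy_sock *sk)
{
    struct toy_hslot *slot = toy_slot(t, sk->port);

    sk->next   = slot->head;              /* push onto the slot's chain */
    slot->head = sk;
    slot->count++;
}

struct toy_sock *toy_table_lookup(const struct toy_table *t, uint16_t port)
{
    struct toy_sock *sk;

    for (sk = toy_slot(t, port)->head; sk != NULL; sk = sk->next)
        if (sk->port == port)
            return sk;
    return NULL;
}
```

With log = 7 the table has 128 slots, so ports 53 and 181 share slot 53 and are told apart by walking the chain.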

Page 10

Outline

  • UDP Layer Architecture
  • Receive Path
  • Send Path

Page 11

Receiving packets in UDP

From user space, you can receive udp traffic with three system calls:

  • recv() (when the socket is connected)
  • recvfrom()
  • recvmsg()

All three are handled by udp_recvmsg() in the kernel (the .recvmsg entry of udp_prot). udp_rcv() is the handler for packets arriving from the IP layer.
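
For reference, this is roughly what the user-space side looks like. The sketch below sends one datagram to a socket bound on the loopback address and reads it back with recvfrom(); the helper name udp_loopback_roundtrip and the one-second receive timeout are choices made for the example, not something taken from the slides.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns the number of bytes received, or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr, from;
    socklen_t alen = sizeof addr, flen = sizeof from;
    struct timeval tmo = { .tv_sec = 1, .tv_usec = 0 };
    ssize_t n = -1;
    int rx, tx;

    assert(msg != NULL && out != NULL);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto done;
    /* Do not block forever if the datagram never shows up. */
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof tmo);

    if (sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    /* recvfrom() also reports who sent the datagram (in from). */
    n = recvfrom(rx, out, outlen, 0, (struct sockaddr *)&from, &flen);
done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```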

Page 12

Recall IP’s inet_protos

[Figure: inet_protos[MAX_INET_PROTOS], an array with entries labelled 0, 1, ..., MAX_INET_PROTOS. Each entry is either Null or points to a net_protocol.]

Fields of each net_protocol:

  • handler
  • err_handler
  • gso_send_check
  • gso_segment
  • gro_receive
  • gro_complete

Handlers shown in the figure: udp_rcv(), udp_err(), igmp_rcv()
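
The idea behind inet_protos is a table of handler pointers indexed by IP protocol number. A toy user-space sketch of that dispatch pattern follows; all toy_ names are hypothetical, only the handler and err_handler fields are modelled, and the table size of 256 is an assumption of the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PROTOS 256

struct toy_net_protocol {
    int  (*handler)(const void *pkt);
    void (*err_handler)(const void *pkt, int info);
};

/* One slot per IP protocol number; empty slots stay NULL. */
static const struct toy_net_protocol *toy_protos[TOY_MAX_PROTOS];

/* Register a protocol; fails with -1 if the slot is already taken. */
int toy_add_protocol(const struct toy_net_protocol *p, unsigned char num)
{
    assert(p != NULL);
    if (toy_protos[num] != NULL)
        return -1;
    toy_protos[num] = p;
    return 0;
}

/* Hand a packet to the registered handler; -1 if nobody is registered. */
int toy_deliver(unsigned char num, const void *pkt)
{
    const struct toy_net_protocol *p = toy_protos[num];

    if (p == NULL || p->handler == NULL)
        return -1;
    return p->handler(pkt);
}

/* Stand-in for udp_rcv(): just reports which protocol it serves. */
static int toy_udp_rcv(const void *pkt)
{
    (void)pkt;
    return 17;
}

static const struct toy_net_protocol toy_udp_protocol = {
    .handler = toy_udp_rcv,
};
```

Registering toy_udp_protocol under number 17 and calling toy_deliver(17, pkt) then reaches toy_udp_rcv(), which mirrors how the IP layer reaches udp_rcv().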

Page 13

Receive Path: udp_rcv

  • Calls __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
  • __udp4_lib_rcv() is used by both UDP and UDP-Lite

[Figure: receive-path call graph, as on page 7]

Page 14

Receive: __udp4_lib_rcv

  • Gets the routing table entry from the skb
  • Checks that skb has a header
  • Checks that length is good
  • Calcs the checksum
  • Pulls out saddr, daddr
  • Checks if address is multicast
      • If so, calls __udp4_lib_mcast_deliver()

[Figure: receive-path call graph, as on page 7]

Page 15

Receive: __udp4_lib_rcv (cont)

  • Looks up the socket in the udptable via __udp4_lib_lookup_skb()
      • Increases refcount on the sk (socket)
  • If socket is found
      • Calls __udp_queue_rcv_skb()
      • Decrements refcount with sock_put(sk)
  • If not,
      • Sends ICMP destination unreachable (ICMP_DEST_UNREACH, code ICMP_PORT_UNREACH)
      • Drops packet.

[Figure: receive-path call graph, as on page 7]
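
The decision sequence in __udp4_lib_rcv can be summarized as a small pure function. This is a simplified model, not the kernel code: the toy_ names are hypothetical, the checksum result is passed in as a flag and checked up front, and the socket lookup is a caller-supplied callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_verdict {
    TOY_DROP_SHORT,      /* header missing or bad length */
    TOY_DROP_CSUM,       /* checksum failed */
    TOY_MCAST_DELIVER,   /* hand to the multicast delivery routine */
    TOY_QUEUE,           /* socket found: queue on its receive queue */
    TOY_ICMP_UNREACH     /* no socket: ICMP port unreachable, then drop */
};

struct toy_pkt {
    size_t   skb_len;       /* bytes present, starting at the UDP header */
    uint16_t ulen;          /* UDP length field, host byte order */
    bool     csum_ok;       /* result of checksum verification */
    bool     dst_is_mcast;  /* destination is multicast or broadcast */
    uint16_t dport;         /* destination port, host byte order */
};

typedef bool (*toy_lookup_fn)(uint16_t dport);

enum toy_verdict toy_udp_rcv(const struct toy_pkt *p, toy_lookup_fn lookup)
{
    assert(p != NULL && lookup != NULL);
    if (p->skb_len < 8)                        /* no room for a header */
        return TOY_DROP_SHORT;
    if (p->ulen > p->skb_len || p->ulen < 8)   /* length field is bogus */
        return TOY_DROP_SHORT;
    if (!p->csum_ok)
        return TOY_DROP_CSUM;
    if (p->dst_is_mcast)
        return TOY_MCAST_DELIVER;
    if (lookup(p->dport))
        return TOY_QUEUE;
    return TOY_ICMP_UNREACH;
}

/* Sample lookup: pretend only port 53 has a bound socket. */
static bool toy_only_port_53(uint16_t dport)
{
    return dport == 53;
}
```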

Page 16

Recv: __udp_queue_rcv_skb

  • Calls sock_queue_rcv_skb
  • Increments some statistics

[Figure: receive-path call graph, as on page 7]
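
A rough model of what "queue and count" means: the packet is accepted only if it fits within the socket's receive buffer limit, and a rejected packet bumps error counters. The sketch below is an assumption-laden simplification; struct toy_rxq, its field names and the exact comparison are illustrative rather than copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rxq {
    size_t rmem_alloc;            /* bytes currently queued */
    size_t rcvbuf;                /* receive buffer limit */
    unsigned long rcvbuf_errors;  /* drops because the buffer was full */
    unsigned long in_errors;      /* all receive errors */
};

/* Returns 0 if the packet was queued, -1 if it was dropped. */
int toy_queue_rcv(struct toy_rxq *q, size_t truesize)
{
    assert(q != NULL);
    if (q->rmem_alloc + truesize >= q->rcvbuf) {
        q->rcvbuf_errors++;       /* statistics for the drop */
        q->in_errors++;
        return -1;
    }
    q->rmem_alloc += truesize;    /* charge the packet to the socket */
    return 0;
}
```

This is why a slow reader on a busy UDP socket loses datagrams silently: the sender gets no error, and only the receiver's counters show the drops.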

Page 17

Outline

  • UDP Layer Architecture
  • Receive Path
  • Send Path

Page 18

Sending packets in UDP

From user space, you can send udp traffic with three system calls:

  • send() (when the socket is connected)
  • sendto()
  • sendmsg()

All three are handled by udp_sendmsg() in the kernel.

  • udp_sendmsg() is much simpler than its TCP counterpart, tcp_sendmsg().
  • udp_sendpage() is called when user space calls sendfile() (to copy a file into a udp socket).
  • sendfile() can also be used to copy data between one file descriptor and another.
  • udp_sendpage() invokes udp_sendmsg().
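
The "when the socket is connected" condition is worth seeing once. In the sketch below the sender calls connect() to fix the destination, after which plain send() works without an address argument. The helper name udp_connected_send is hypothetical, and the loopback receiver exists only so the example can observe the result.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns the number of bytes the receiver got, or -1 on error. */
ssize_t udp_connected_send(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval tmo = { .tv_sec = 1, .tv_usec = 0 };
    ssize_t n = -1;
    int rx, tx;

    assert(msg != NULL && out != NULL);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0)
        goto done;
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof tmo);

    /* connect() on a UDP socket only records the default destination. */
    if (connect(tx, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    if (send(tx, msg, strlen(msg), 0) < 0)   /* no address needed now */
        goto done;
    n = recv(rx, out, outlen, 0);
done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```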

Page 19

UDP Socket Options

  • For the IPPROTO_UDP/SOL_UDP level, there exists a socket option UDP_CORK
  • Added in Linux kernel 2.5.44.

    int state = 1;
    setsockopt(s, IPPROTO_UDP, UDP_CORK, &state, sizeof(state));  /* cork */
    for (j = 0; j < 1000; j++)      /* 1000 calls, nothing sent yet */
        sendto(s, buf1, ...);
    state = 0;
    setsockopt(s, IPPROTO_UDP, UDP_CORK, &state, sizeof(state));  /* uncork */

Page 20

UDP_CORK (cont)

  • The above code fragment will call udp_sendmsg() 1000 times without actually sending anything on the wire (in the usual case, without the setsockopt() with UDP_CORK, 1000 packets would be sent).
  • Only when the second setsockopt() is called, with UDP_CORK and state=0, is a single packet sent on the wire.
  • Kernel implementation: when using UDP_CORK, udp_sendmsg() passes MSG_MORE to ip_append_data().
  • If your glibc headers do not define UDP_CORK, you need to add it to your program:

    #define UDP_CORK 1
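
A complete version of the corking experiment can be run over loopback on Linux. The sketch below corks a connected socket, sends several small pieces, uncorks, and returns the size of the first datagram the receiver sees, which should be the sum of the pieces. The helper name udp_cork_demo, the 64-byte piece size and the fallback #define are assumptions of this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#ifndef UDP_CORK
#define UDP_CORK 1        /* value used by Linux; only if headers lack it */
#endif

/* Returns the size of the first datagram received, or -1 on error. */
ssize_t udp_cork_demo(int chunks, size_t chunk_len)
{
    char piece[64], big[2048];
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval tmo = { .tv_sec = 1, .tv_usec = 0 };
    int on = 1, off = 0, rx, tx, i;
    ssize_t n = -1;

    assert(chunks > 0 && chunk_len > 0 && chunk_len <= sizeof piece);
    assert((size_t)chunks * chunk_len <= sizeof big);
    memset(piece, 'x', sizeof piece);

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto done;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) < 0 ||
        connect(tx, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tmo, sizeof tmo);

    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &on, sizeof on) < 0)
        goto done;                        /* cork: start accumulating */
    for (i = 0; i < chunks; i++)
        if (send(tx, piece, chunk_len, 0) < 0)
            goto done;
    if (setsockopt(tx, IPPROTO_UDP, UDP_CORK, &off, sizeof off) < 0)
        goto done;                        /* uncork: flush one datagram */
    n = recv(rx, big, sizeof big, 0);
done:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Note that the accumulated data still has to fit in one UDP datagram, so this technique suits gathering a few small buffers rather than bulk transfer.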

Page 21

Send Path: udp_sendmsg()

  • Checks length, MSG_OOB
  • Checks if there are frames pending
      • If so, jump to do_append_data
  • Gets the address
  • Checks if socket is connected
      • If so, pull routing info out of sk
      • Otherwise, look up via ip_route_output_flow()
  • Calls ip_append_data()
      • Handles fragmentation
  • Calls udp_push_pending_frames()

[Figure: send-path call graph, as on page 7]
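
The first step of udp_sendmsg(), "checks length, MSG_OOB", amounts to two early rejections. A user-space sketch of that step is shown here; the function name toy_udp_sendmsg_check is hypothetical, and the rest of udp_sendmsg() (pending frames, routing, appending data) is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>   /* MSG_OOB */

#define UDP_HDR_LEN 8

/*
 * len:   payload length requested by the caller.
 * flags: message flags from the send call.
 * On success returns 0 and stores header plus payload length in *ulen;
 * otherwise returns a negative errno value.
 */
int toy_udp_sendmsg_check(size_t len, int flags, size_t *ulen)
{
    assert(ulen != NULL);
    if (len > 0xFFFF)             /* cannot be described by a 16-bit length */
        return -EMSGSIZE;
    if (flags & MSG_OOB)          /* UDP has no out-of-band data */
        return -EOPNOTSUPP;
    *ulen = len + UDP_HDR_LEN;
    return 0;
}
```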

Page 22

udp_push_pending_frames()

  • Grabs the skb that has room for the UDP header via skb_peek() on the socket's write queue
      • If there is none, goto out and bail
  • Creates UDP header
  • Checksums if necessary (or partially for UDP-Lite)
  • Calls ip_push_pending_frames()
      • Combines all pending IP fragments on the socket as one IP datagram and sends it out

[Figure: send-path call graph, as on page 7]
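
"Creates UDP header" means filling in the four 16-bit fields in network byte order once the total length is known. The user-space sketch below does that for a single buffer; udp_build_datagram and store_be16 are hypothetical names, and the checksum is left at 0 (which RFC 768 allows over IPv4) to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UDP_HDR_LEN 8

static void store_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/*
 * Writes header plus data into out. Returns the datagram length,
 * or 0 if it does not fit in out or in a 16-bit length field.
 */
size_t udp_build_datagram(uint8_t *out, size_t outlen,
                          uint16_t sport, uint16_t dport,
                          const uint8_t *data, size_t data_len)
{
    size_t total = UDP_HDR_LEN + data_len;

    assert(out != NULL && (data != NULL || data_len == 0));
    if (total > 0xFFFF || total > outlen)
        return 0;
    store_be16(out + 0, sport);            /* source port */
    store_be16(out + 2, dport);            /* destination port */
    store_be16(out + 4, (uint16_t)total);  /* length: header + data */
    store_be16(out + 6, 0);                /* checksum not computed here */
    if (data_len)
        memcpy(out + UDP_HDR_LEN, data, data_len);
    return total;
}
```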

Page 23

UDP Backup

Page 24

Recall the sk_buff structure

[Figure: layout of struct sk_buff, from linux-2.6.31/include/linux/skbuff.h]

  • sk_buff fields shown: next, prev, sk, tstamp, dev, ...lots of stuff..., transport_header, network_header, mac_header, head, data, tail, end, truesize, users
  • next and prev link the sk_buff into an sk_buff_head list; sk points to the struct sock and dev to the net_device
  • Packet data buffer, from head to end: "headroom", MAC-Header, IP-Header, UDP-Header, UDP-Data, "tailroom"
  • skb_shared_info follows the packet data; fields shown: dataref: 1, nr_frags, ..., destructor_arg

Page 25

Recall pkt_type in sk_buff

pkt_type: specifies the type of a packet

  • PACKET_HOST: a packet sent to the local host
  • PACKET_BROADCAST: a broadcast packet
  • PACKET_MULTICAST: a multicast packet
  • PACKET_OTHERHOST: a packet not destined for the local host, but received in promiscuous mode
  • PACKET_OUTGOING: a packet leaving the host
  • PACKET_LOOPBACK: a packet sent by the local host to itself
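
For received Ethernet frames, the first four packet types follow from the destination MAC address. A minimal sketch of that classification is given below; the toy_ names are hypothetical and the function is a user-space illustration, not the driver-side kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum toy_pkt_type {
    TOY_PACKET_HOST      = 0,  /* addressed to this interface */
    TOY_PACKET_BROADCAST = 1,  /* ff:ff:ff:ff:ff:ff */
    TOY_PACKET_MULTICAST = 2,  /* group bit set, not broadcast */
    TOY_PACKET_OTHERHOST = 3   /* unicast to some other station */
};

/* dst: destination MAC of the frame; dev_addr: MAC of the interface. */
enum toy_pkt_type toy_classify(const uint8_t dst[6], const uint8_t dev_addr[6])
{
    static const uint8_t bcast[6] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};

    assert(dst != NULL && dev_addr != NULL);
    if (dst[0] & 1) {             /* group bit: broadcast or multicast */
        if (memcmp(dst, bcast, 6) == 0)
            return TOY_PACKET_BROADCAST;
        return TOY_PACKET_MULTICAST;
    }
    if (memcmp(dst, dev_addr, 6) != 0)
        return TOY_PACKET_OTHERHOST;   /* seen only in promiscuous mode */
    return TOY_PACKET_HOST;
}
```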