
Page 1: Network


Network

Principles of Computer System (2012 Fall)

End-to-end Layer

Page 2: Network


Review

• System Complexity
• Modularity & Naming
• Enforced Modularity
  – C/S
  – Virtualization: C/S on one Host
• Network
  – Layers & Protocols
  – P2P & Congestion control

Page 3: Network


Network is a system too

• Network As a System
  – A network consists of many networks, many links, and many switches
  – The Internet is a case study of a successful system

Page 4: Network

Delay (transit time)

• Propagation delay
  – Depends on the speed of light in the transmission medium
• Transmission delay
  – Depends on the data rate of the link and the length of the frame
  – Incurred each time the packet is transmitted over a link
• Processing delay
  – E.g., examining the guidance info, checksum, etc.
  – And copying to/from memory
• Queuing delay
  – Waiting in a buffer
  – Depends on the amount of other traffic
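To see how the four components combine on one link, here is a back-of-the-envelope Python sketch; all numbers are assumed for illustration, not taken from the slides:

    # One-link delay = propagation + transmission + processing + queuing.
    link_length_m = 100e3     # assumed: 100 km of fiber
    signal_speed = 2e8        # m/s, roughly the speed of light in glass
    data_rate = 100e6         # assumed: 100 Mbps link
    frame_bits = 1500 * 8     # assumed: one 1500-byte frame
    processing_s = 50e-6      # assumed: header checks and memory copies
    queuing_s = 200e-6        # assumed: waiting behind other traffic

    propagation_s = link_length_m / signal_speed   # 0.5 ms
    transmission_s = frame_bits / data_rate        # 0.12 ms
    total_s = propagation_s + transmission_s + processing_s + queuing_s
    print(f"one-link delay: {total_s * 1e3:.2f} ms")   # 0.87 ms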

Page 5: Network

Recursive network composition


• Gnutella is a large decentralized P2P network
• The link layer itself is a network

Page 6: Network

The Internet “Hour Glass”


Page 7: Network

Packet routing/forwarding


• Packet switching
  – Routing: choosing a particular path (control plane)
  – Forwarding: choosing an outgoing link (data plane)
• Usually by table lookup (see the sketch below)
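To make "by table lookup" concrete, here is a minimal longest-prefix-match forwarding sketch; the table entries are invented for illustration:

    import ipaddress

    # Assumed forwarding table: (prefix, outgoing link).
    table = [
        (ipaddress.ip_network("10.0.0.0/8"), "link0"),
        (ipaddress.ip_network("10.1.0.0/16"), "link1"),
        (ipaddress.ip_network("0.0.0.0/0"), "default"),
    ]

    def forward(dst: str) -> str:
        addr = ipaddress.ip_address(dst)
        # The most specific (longest) matching prefix wins.
        best = max((net.prefixlen, link) for net, link in table if addr in net)
        return best[1]

    print(forward("10.1.2.3"))   # -> link1 (the /16 beats the /8)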

Page 8: Network

Best-effort network

• Best-effort network
  – If it cannot dispatch a packet, it may discard it
• Guaranteed-delivery network
  – Also called a store-and-forward network; no discarding of data
  – Works with complete messages rather than packets
  – Uses disk for buffering to handle peaks
  – Tracks individual messages to make sure none are lost
• In the real world
  – No absolute guarantees
  – Guaranteed delivery: higher layers; best effort: lower layers

Page 9: Network

NAT (Network Address Translation)

• Private network
  – Public routers don't accept routes to network 10
• NAT router: bridges the private network to the public one (sketched below)
  – A router between the private & public networks
  – Send: modify the source address to a temporary public address
  – Receive: modify it back by looking up the mapping table
• Limitations
  – Some end-to-end protocols place addresses in payloads
  – The translator may become a bottleneck
  – What if two private networks merge?
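A toy sketch of the send/receive rewriting above; the public address and port numbers are assumed, and a real NAT tracks full connection tuples rather than this simplified pair:

    PUBLIC_IP = "203.0.113.1"   # assumed public address of the NAT router
    fwd, rev = {}, {}           # mapping table and its inverse
    next_port = 40000           # assumed start of the temporary port range

    def translate_out(src_ip, src_port):
        global next_port
        if (src_ip, src_port) not in fwd:       # allocate a public port once
            fwd[(src_ip, src_port)] = next_port
            rev[next_port] = (src_ip, src_port)
            next_port += 1
        return PUBLIC_IP, fwd[(src_ip, src_port)]   # rewritten source address

    def translate_in(dst_port):
        return rev[dst_port]    # on receive, look up the mapping table

    print(translate_out("10.0.0.5", 5353))      # ('203.0.113.1', 40000)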

Page 10: Network


CASE STUDY: MAPPING INTERNET TO ETHERNET

Page 11: Network

Case study: mapping Internet to Ethernet

• Listen-before-sending rule, collisions
• Ethernet: CSMA/CD
  – Carrier Sense Multiple Access with Collision Detection
• Ethernet types
  – Experimental Ethernet, 3 Mbps
  – Standard Ethernet, 10 Mbps
  – Fast Ethernet, 100 Mbps
  – Gigabit Ethernet, 1000 Mbps

Page 12: Network

Overview of Ethernet

• A half-duplex Ethernet
  – The max propagation time is less than 576 bit times, the length of the shortest allowable packet
  – So that two parties can detect a collision together
• Collision: wait a random time first, with exponential backoff on repeats (sketched below)
• A full-duplex & point-to-point Ethernet
  – No collisions & the max length of the link is determined by the physical medium
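A sketch of the binary exponential backoff rule; the 512-bit slot time and the cap at 10 doublings are the standard Ethernet parameters:

    import random

    SLOT_BIT_TIMES = 512   # one contention slot

    def backoff_bit_times(nth_collision: int) -> int:
        # After the n-th collision, wait k slots, k uniform in [0, 2^min(n,10) - 1].
        n = min(nth_collision, 10)
        return random.randint(0, 2 ** n - 1) * SLOT_BIT_TIMES

    for n in range(1, 5):
        print(n, backoff_bit_times(n))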

Page 13: Network

Broadcast aspects of Ethernet

• Broadcast network
  – Every frame is delivered to every station
  – (Compare with a forwarding network)
• ETHERNET_SEND
  – Passes the call along to the link layer
• ETHERNET_HANDLE
  – Simple; can even be implemented in hardware

Page 14: Network

Layer mapping

• The internet network layer
  – NETWORK_SEND (data, length, RPC, INTERNET, N)
  – NETWORK_SEND (data, length, RPC, ENET, 18)
• L must maintain a table

Page 15: Network

ARP (Address Resolution Protocol)

• NETWORK_SEND ("where is M?", 11, ARP, ENET, BROADCAST)
• NETWORK_SEND ("M is at station 15", 18, ARP, ENET, BROADCAST)
• L asks for E's Ethernet address; E does not hear the Ethernet broadcast, but the router at station 19 does, and it sends a suitable ARP response instead
• Manage the forwarding table as a cache (see the sketch below)
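A sketch of "manage the table as a cache"; `broadcast_query` is an assumed stand-in for the Ethernet broadcast:

    arp_cache = {}   # protocol address -> Ethernet station

    def resolve(addr, broadcast_query):
        if addr not in arp_cache:                    # miss: ask the network
            arp_cache[addr] = broadcast_query(addr)  # "where is M?" broadcast
        return arp_cache[addr]                       # hit: answer locally

    # Example with a canned reply standing in for the wire:
    print(resolve("M", lambda a: "station 15"))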

Page 16: Network

ARP & RARP protocol


• Name mapping: IP address <-> MAC address

Page 17: Network

ARP spoofing


Page 18: Network


END-TO-END LAYER

Page 19: Network


E2E Transport

• Reliability: "At Least Once Delivery"
  – Lock-step
  – Sliding Window
• Congestion Control
  – Flow Control
  – Additive Increase Multiplicative Decrease

Page 20: Network

The end-to-end layer

• Network layer is not enough
  – No guarantees about delay
  – Order of arrival
  – Certainty of arrival
  – Accuracy of content
  – Right place to deliver
• End-to-end layer
  – No single design is likely to suffice
  – Transport protocol for each class of application

Page 21: Network

Famous transport protocols

• UDP (User Datagram Protocol)
  – Used directly by some simple applications
  – Also used as a component of other protocols
• TCP (Transmission Control Protocol)
  – Keeps order, no missing, no duplication
  – Provision for flow control
• RTP (Real-time Transport Protocol)
  – Built on UDP
  – Used for streaming video or voice, etc.
• Other protocols, e.g., presentation protocols

Page 22: Network

Assurance of end-to-end protocol

• Seven Assurances
  1. Assurance of at-least-once delivery
  2. Assurance of at-most-once delivery
  3. Assurance of data integrity
  4. Assurance of end-to-end performance
  5. Assurance of stream order & closing of connections
  6. Assurance of jitter control
  7. Assurance of authenticity and privacy

Page 23: Network

Assurance of at-least-once delivery

• RTT (Round-trip time)
  – to_time + process_time + back_time (ack)
• At least once on a best-effort network
  – Send the packet with a nonce
  – Sender keeps a copy of the packet
  – Resend if a timeout occurs before the acknowledgment arrives
  – Receiver acknowledges a packet with its nonce
• Try a limited number of times before returning an error to the app (see the sketch below)
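The resend loop, sketched in Python; the send and receive primitives are assumed placeholders, with `recv_ack` returning the acknowledged nonce or None on timeout:

    import itertools

    _nonces = itertools.count()

    def send_at_least_once(data, send_packet, recv_ack, tries=5):
        nonce = next(_nonces)            # fresh nonce identifies this packet
        for _ in range(tries):           # try a limited number of times
            send_packet(data, nonce)     # (re)send; sender keeps the copy
            if recv_ack() == nonce:      # receiver echoes the nonce
                return                   # delivered at least once
        raise TimeoutError("could not confirm delivery")   # report to the app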

Page 24: Network

Assurance of at-least-once delivery

• Dilemma
  – 1. The data was not delivered
  – 2. The data was delivered, but no ACK was received
  – No way to know which situation occurred
• At-least-once delivery
  – No absolute assurance of at-least-once
  – Ensures that if it is possible to get through, the message will get through eventually
  – Ensures that if it's not possible to confirm delivery, the app will know
  – No assurance of no-duplication

Page 25: Network

Timeout

• Fixed timer: the dilemma of fixed timers
  – Too short: unnecessary resends
  – Too long: takes a long time to discover lost packets
• Adaptive timer
  – E.g., adjust by the currently observed RTT; set the timer to 150%
  – Exponential backoff: wait 1, 2, 4, 8, 16, ... seconds
• NAK (Negative Acknowledgment)
  – Receiver sends a message that lists missing items
  – Receiver can count arriving segments rather than using a timer
  – Receiver can have no timer (only once per stream)

Page 26: Network

Congestion collapse in NFS

• Congestion Collapse in NFS
  – Using at-least-once with a stateless interface
  – Persistent client: repeats resending forever
  – Server: FIFO
  – Client times out while its request is still queued, and resends
  – Server re-executes the resent request, wasting time, so the queue becomes even longer
• Lesson: Fixed timers are always a source of trouble, sometimes catastrophic trouble

Page 27: Network

Emergent phase synchronization of periodic protocols

• Periodic polling
  – E.g., picking up mail, sending "are-you-there?"
  – A workstation sends a broadcast packet every 5 minutes
  – All workstations try to broadcast at the same time
• Each workstation
  – Sends a broadcast
  – Sets a fixed timer
• Lesson: Fixed timers have many evils. Don't assume that unsynchronized periodic activities will stay that way

Page 28: Network

Wisconsin time server meltdown

• NETGEAR added a feature to a wireless router
  – Logging packets -> timestamp -> time server (SNTP) -> name discovery -> 128.105.39.11
  – Once per second until a response is received
  – Once per minute or per day after that
• Wisconsin Univ.
  – On May 14, 2003, at about 8:00 a.m.
  – From 20,000 to 60,000 requests per second, even after filtering on port 23457
  – After one week, 270,000 requests per second, 150 Mbps

Page 29: Network

Wisconsin time server meltdown

• Lesson(s)
  – Fixed timers again
  – Fixed Internet address
  – The client implements only part of a protocol
    • There is a reason for features such as the "go away" response in SNTP

Page 30: Network


Timeout

• RTT
  – Timeout should depend on RTT
  – Sender measures the time between transmitting a packet and receiving its ack, which gives one sample of the RTT

Page 31: Network


RTT Could be Highly Variable

Page 32: Network


Calculating RTT and Timeout (in TCP)

• Exponentially Weighted Moving Average
  – Estimate both the average rtt_avg and the deviation rtt_dev

    # rtt_avg and rtt_dev are module-level state, seeded from the first sample;
    # the gains a = 1/8 and b = 1/4 are TCP's standard values.
    def calc_rtt(rtt_sample):
        global rtt_avg, rtt_dev
        a, b = 1/8, 1/4
        rtt_avg = a * rtt_sample + (1 - a) * rtt_avg
        rtt_dev = b * abs(rtt_sample - rtt_avg) + (1 - b) * rtt_dev

    def calc_timeout(rtt_avg, rtt_dev):
        return rtt_avg + 4 * rtt_dev

Page 33: Network

End-to-end performance

• Multi-segment message questions
  – Trade-off between complexity and performance
  – Lock-step protocol

Page 34: Network

Overlapping transmissions

• Pipelining technique


Page 35: Network


Fixed Window

• Receiver tells the sender a window size
• Sender sends a window
• Receiver acks each packet as before
• Window advances when all packets in the previous window are acked
  – E.g., packets 4-6 sent after 1-3 are ack'd
• If a packet times out -> rxmit packets
• Still much idle time

Page 36: Network


Sliding Window

• Sender advances the window by 1 for each in-sequence ack it receives
  – Reduces idle periods
  – Pipelining idea (a minimal sender is sketched below)
• But what's the correct value for the window?
  – We'll revisit this question
  – First, we need to understand windows
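A minimal sliding-window sender sketch; it assumes reliable, cumulative in-sequence acks (loss handling and timers are omitted), and the channel primitives are placeholders:

    def sliding_window_send(segments, window, send, recv_ack):
        base = 0       # oldest unacknowledged segment
        next_seq = 0   # next segment to transmit
        while base < len(segments):
            # Keep the pipe full: send everything the window allows.
            while next_seq < len(segments) and next_seq < base + window:
                send(next_seq, segments[next_seq])
                next_seq += 1
            # Each in-sequence ack slides the window forward.
            base = max(base, recv_ack() + 1)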

Page 37: Network

Overlapping transmissions

• Problems
  – Packets or ACKs may be lost
• Sender holds a list of segments sent, checking each off when its ACK arrives
• Set a timer (a little more than RTT) for the last segment
  – If the list of missing ACKs is empty, OK
  – If the timer expires, resend the packets and set another timer

Page 38: Network


Handling Packet Loss

Page 39: Network


Choosing the right window size

• Window too small
  – Long idle time
  – Underutilized network
• Window too large
  – Congestion

Page 40: Network

Sliding window size

window size ≥ round-trip time × bottleneck data rate

• Sliding window one segment in size
  – Data rate is window size / RTT
• Enlarge the window size up to the bottleneck data rate
  – Data rate is still window size / RTT
• Enlarge the window size further
  – Data rate stays at the bottleneck rate
  – A larger window makes no sense

Page 41: Network

Self-pacing

• Sliding window size
  – Although the sender doesn't know the bottleneck, it ends up sending at exactly that rate
  – Once the sender fills a sliding window, it cannot send new data until it receives the ACK of the oldest data in the window
  – The receiver cannot generate ACKs faster than the network can deliver data elements
  – E.g., receiver at 500 KB/s, sender at 1 MB/s, RTT 70 ms, each segment carries 512 bytes: sliding window size = 70 segments (35 KB; arithmetic sketched below)
  – RTT estimation is still needed
• Needs to err on the side of being too small
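Plugging the slide's numbers into window size ≥ round-trip time × bottleneck data rate (reading 500 KB/s as 500 × 1024 bytes/s):

    bottleneck = 500 * 1024   # receiver's rate, bytes/s
    rtt = 0.070               # 70 ms
    segment = 512             # bytes per segment

    window_bytes = bottleneck * rtt            # 35840 bytes = 35 KB
    window_segments = window_bytes / segment   # exactly 70 segments
    print(window_bytes, window_segments)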

Page 42: Network

Congestion control

• Requires the cooperation of more than one layer
• With a shared resource and demands from several statistically independent sources, there will be fluctuations in the arrival of load, and thus in the length of the queue and the time spent waiting

Page 43: Network

Managing shared resources

• Overload is inevitable, but how long does it last?
  – A queue handles short bursts by time-averaging them with adjacent periods that have excess capacity
  – If overload persists for longer than that, delay climbs toward its maximum; this is called congestion
• Congestion
  – May be temporary or chronic
  – Stability of offered load
    • Large number of small sources vs. small number of large sources
  – Congestion collapse
    • Competition for a resource sometimes leads to wasting that resource (sales & checkout clerks)

Page 44: Network

Congestion collapse


Page 45: Network


Setting Window Size: Congestion

Page 46: Network


Setting Window Size: Congestion

Page 47: Network

Congestion Control

• Basic Idea:
  – Increase cwnd slowly
  – If no drops -> no congestion yet
  – If a drop occurs -> decrease cwnd quickly
• Use the idea in a distributed protocol that achieves
  – Efficiency, i.e., uses the bottleneck capacity efficiently
  – Fairness, i.e., senders sharing a bottleneck get equal throughput (if they have demands)
• Every RTT (sketched below):
  – No drop: cwnd = cwnd + 1
  – A drop: cwnd = cwnd / 2
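The per-RTT rule as a one-function sketch:

    def aimd_update(cwnd: float, drop: bool) -> float:
        if drop:
            return max(1.0, cwnd / 2)   # multiplicative decrease
        return cwnd + 1.0               # additive increase

    cwnd = 1.0
    for drop in [False, False, False, True, False]:
        cwnd = aimd_update(cwnd, drop)
        print(cwnd)                     # 2.0, 3.0, 4.0, 2.0, 3.0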

Page 48: Network


Additive Increase

Page 49: Network


AIMD Leads to Efficiency and Fairness

Page 50: Network

Retrofitting TCP

• 1. Slow start (sketched below): one packet at first, then double until
  – Sender reaches the window size suggested by the receiver
  – All the available data has been dispatched
  – Sender detects that a packet it sent has been discarded
• 2. Duplicate ACK
  – When the receiver gets an out-of-order packet, it sends back a duplicate of the latest ACK
• 3. Equilibrium
  – Additive increase & multiplicative decrease
• 4. Restart, after waiting a short time
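A per-RTT sketch of slow start handing off to AIMD. Real TCP grows cwnd per ACK and has more states (e.g., fast retransmit), so treat this as a simplification:

    def tcp_window_update(cwnd, ssthresh, drop):
        if drop:                            # discard detected: back off, restart
            return 1.0, max(2.0, cwnd / 2)
        if cwnd < ssthresh:                 # slow start: double every RTT
            return cwnd * 2, ssthresh
        return cwnd + 1, ssthresh           # equilibrium: additive increase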

Page 51: Network

Retrofitting TCP


Page 52: Network


Summary of E2E Transport

• Reliability Using Sliding Window
  – Tx Rate = W / RTT
• Congestion Control
  – W = min(Receiver_buffer, cwnd)
  – cwnd is adapted by the congestion control protocol to ensure efficiency and fairness
  – TCP congestion control uses AIMD, which provides fairness and efficiency in a distributed way

Page 53: Network


P2P NETWORK

Page 54: Network


Downsides of C/S

• Centralized Infrastructure
  – Centralized point of failure
  – High management costs
    • If one org has to host millions of files, etc.
  – Not suitable for many scenarios
    • E.g., cooperation between you and me
  – Lacks the ability to aggregate clients

Page 55: Network


P2P: Peer-to-peer

• No central servers!
• Questions
  – How to track nodes and objects in the system?
  – How do you find other nodes in the system?
  – How should data be split up between nodes?
  – How to prevent data from being lost?
    • How to keep it available?
  – How to provide consistency?
  – How to provide security? Anonymity?

Page 56: Network


BitTorrent

• Usage Model: Cooperative
  – User downloads a file from someone using a simple user interface
  – While downloading, BitTorrent also serves the file to others
  – BitTorrent keeps running for a little while after the download completes
• 3 Roles
  – Tracker: knows which peer serves which parts of a file
  – Seeder: owns the whole file
  – Peer: becomes a seeder once it has 100% of the file

Page 57: Network


BitTorrent

• Publisher posts a .torrent file on a Web server (e.g., suprnova.org)
  – URL of the tracker
  – File name, length
  – SHA1s of data blocks (64-512 KByte)
• Tracker
  – Organizes a swarm of peers (who has which block?)
• Seed posts the URL for the .torrent with the tracker
  – Seed must have a complete copy of the file
  – Every peer that is online and has a complete copy of the file becomes a seed
• Peer asks the tracker for a list of peers to download from
  – Tracker returns a list with a random selection of peers
• Peers contact peers to learn which parts of the file they have, etc.
  – Download from other peers

Page 58: Network


A torrent file

{
  'announce': 'http://bttracker.debian.org:6969/announce',
  'info': {
    'name': 'debian-503-amd64-CD-1.iso',
    'piece length': 262144,
    'length': 678301696,
    'pieces': '841ae846bc5b6d7bd6e9aa3dd9e551559c82abc1...d14f1631d776008f83772ee170c42411618190a4'
  }
}

Page 59: Network


Which piece to download?

• Order of parts to download
  – Strict?
  – Rarest first?
  – Random?
  – Parallel?
• BitTorrent
  – Random for the first one
  – Rarest first for the rest
  – Parallel for the last one

Page 60: Network


Drawback of BitTorrent

• Rely on Tracker
  – The tracker is a central component
  – Cannot scale to a large number of torrents

Page 61: Network


Scalable Lookup

• Interface
  – Provide an abstract interface to store and find data
• Typical DHT interface:
  – put(key, value)
  – get(key) -> value
  – Loose guarantees about keeping data alive
• For BitTorrent trackers (sketched below):
  – Announce tracker: put(SHA1(URL), my-ip-address)
  – Find tracker: get(SHA1(URL)) -> IP address of tracker
• Some DHT-based trackers exist
  – Many other usages of DHTs
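The tracker mapping expressed against a hypothetical `dht` object with the put/get interface above (`hashlib` is standard Python):

    import hashlib

    def sha1(s: str) -> bytes:
        return hashlib.sha1(s.encode()).digest()

    def announce_tracker(dht, url, my_ip):
        dht.put(sha1(url), my_ip)     # put(SHA1(URL), my-ip-address)

    def find_tracker(dht, url):
        return dht.get(sha1(url))     # -> IP address of a tracker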

Page 62: Network


P2P Implementation of DHT

• Overlay Network
  – Partition the hash table over n nodes
  – Not every node knows about all other n nodes
  – Route to find the right hash table
• Goals
  – log(n) hops
  – Guarantees about load balance

Page 63: Network


A DHT in Operation: put()

Page 64: Network


A DHT in Operation: get()

Page 65: Network


Chord Properties

• Efficient: O(log(N)) messages per lookup
  – N is the total number of servers
• Scalable: O(log(N)) state per node
• Robust: survives massive failures

Page 66: Network


Chord IDs

• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• Both are uniformly distributed
• Both exist in the same ID space

• How to map key IDs to node IDs?
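The mapping is done with consistent hashing: a key is stored on its successor, the first node whose ID follows the key's ID around the ring. A small sketch (the node address is an assumed example):

    import hashlib

    def chord_id(s: str) -> int:
        # SHA-1 places names uniformly on the 2**160 ID ring.
        return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")

    key_id = chord_id("my-file.txt")
    node_id = chord_id("192.0.2.7")   # assumed node IP address
    # The key's home is the first node clockwise from key_id on the ring.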

Page 67: Network


Consistent Hashing

Page 68: Network


Basic lookup

Page 69: Network


Simple lookup algorithm

Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id
    call Lookup(n, key-id) on node n   // next hop
  else
    return my successor                // done

• Correctness depends only on successors

Page 70: Network


“Finger Table” allows log(N) lookups

Page 71: Network


Finger i points to the successor of n+2^i

Page 72: Network


Lookup with fingers

Lookup(my-id, key-id)
  look in the local finger table for the
    highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(n, key-id) on node n   // next hop
  else
    return my successor                // done

Page 73: Network


Lookups take O(log(N)) hops

Page 74: Network


Join (1)

[Diagram: a new node N36 joins a ring containing N25 and N40; N40 stores keys K30 and K38. Step 1: N36 calls Lookup(36) to find its successor.]

Page 75: Network


Join (2)

[Diagram: the Lookup(36) returns N40, and N36 sets its successor pointer to N40; K30 and K38 are still at N40.]

Page 76: Network


Join (3)

[Diagram: N36 copies key K30, which it is now responsible for, from its successor N40.]

Page 77: Network


Join (4)

4. Set N25's successor pointer to N36

[Diagram: the ring is now N25 -> N36 -> N40, with K30 at N36 and K38 at N40.]

Update finger pointers in the background. Correct successors produce correct lookups.

Page 78: Network


Failures might cause incorrect lookup

[Diagram: a ring with nodes N10, N80, N85, N102, N113, N120. N80 doesn't know its correct successor, so Lookup(90) returns an incorrect result.]

Page 79: Network


Solution: successor lists

• Successor Lists
  – Each node knows its r immediate successors
  – After a failure, it will know the first live successor
  – Correct successors guarantee correct lookups
• The guarantee holds with some probability

Page 80: Network


Solution: successor lists

• Successor List Length
  – Assume 1/2 of the nodes fail
  – P(successor list all dead) = (1/2)^r
    • I.e., P(this node breaks the Chord ring)
    • Depends on independent failures
  – P(no broken nodes) = (1 − (1/2)^r)^N
  – r = 2 log(N) makes this prob. ≈ 1 − 1/N (checked below)
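Checking the arithmetic for an assumed N = 1024 (logs base 2):

    import math

    N = 1024
    r = int(2 * math.log2(N))                # 20 successors per node
    p_list_dead = 0.5 ** r                   # this node breaks the ring
    p_no_broken = (1 - p_list_dead) ** N     # no node breaks the ring
    print(p_no_broken, 1 - 1 / N)            # both ~0.999023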

Page 81: Network


Lookup with fault tolerance

Lookup(my-id, key-id)
  look in the local finger table and successor list
    for the highest node n s.t. my-id < n < key-id
  if n exists
    call Lookup(n, key-id) on node n   // next hop
    if the call failed,
      remove n from the finger table
      return Lookup(my-id, key-id)
  else
    return my successor                // done

Page 82: Network


Other design issues

• Concurrent joins
• Locality
• Heterogeneous nodes
• Dishonest nodes
• ...

Page 83: Network


WAR STORIES

Page 84: Network

War stories: surprises in protocol design

• Fixed Timers Lead to Congestion Collapse in NFS
• Autonet Broadcast Storms
• Emergent Phase Synchronization of Periodic Protocols
• Wisconsin Time Server Meltdown

Page 85: Network

Congestion collapse in NFS

• Congestion Collapse in NFS
  – Using at-least-once with a stateless interface
  – Persistent client: repeats resending forever
  – Server: FIFO
  – Client times out while its request is still queued, and resends
  – Server re-executes the resent request, wasting time, so the queue becomes even longer
• Lesson: Fixed timers are always a source of trouble, sometimes catastrophic trouble

Page 86: Network

Autonet broadcast storms

• Designed by DEC to handle broadcast elegantly
  – Physical layer: point-to-point coaxial cables
• Network as a tree
  – Broadcast packets go first to the root, then down
  – Nodes accept only packets going downward
• No duplicate broadcasts
  – Problem: every once in a while, the network collapsed with a storm of repeated broadcast packets
• Lesson: Emergent properties often arise from the interaction of apparently unrelated system features operating at different system layers; in this case, link-layer reflections and network-layer broadcasts.

Page 87: Network

Emergent phase synchronization of periodic protocols

• Periodic polling
  – E.g., picking up mail, sending "are-you-there?"
  – A workstation sends a broadcast packet every 5 minutes
  – All workstations try to broadcast at the same time
• Each workstation
  – Sends a broadcast
  – Sets a fixed timer
• Lesson: Fixed timers have many evils. Don't assume that unsynchronized periodic activities will stay that way

Page 88: Network

Wisconsin time server meltdown

• NETGEAR added a feature to a wireless router
  – Logging packets -> timestamp -> time server (SNTP) -> name discovery -> 128.105.39.11
  – Once per second until a response is received
  – Once per minute or per day after that
• Wisconsin Univ.
  – On May 14, 2003, at about 8:00 a.m.
  – From 20,000 to 60,000 requests per second, even after filtering on port 23457
  – After one week, 270,000 requests per second, 150 Mbps

Page 89: Network

Wisconsin time server meltdown

• Lesson(s)
  – Fixed timers again
  – Fixed Internet address
  – The client implements only part of a protocol
    • There is a reason for features such as the "go away" response in SNTP