Computer Networks and Applications - University of New … · 2016-03-07 · Introduction(Protocol...

Introduction (Protocol Layering, Security) & Application Layer (Principles, Web, FTP)
Computer Networks and Applications
Week 2
COMP 3331/COMP 9331
Reading Guide: Chapter 1, Sections 1.5–1.7; Chapter 2, Sections 2.1–2.3


Course Introduction

• Web: http://www.cse.unsw.edu.au/~cs3331
• Read the course outline on the webpage
• Labs begin in Week 2
  – Please attend your allocated slot
  – Please read the “Tools of the Trade” introduction document
  – You get one week to work on your reports. Lab reports are due in Week 3 before your next lab; e.g., for students attending the Monday 12 noon lab, the report is due at 11:59am on Monday.

RECAP

1. Introduction: roadmap
1.1 what is the Internet?
1.2 network edge
  – end systems, access networks, links
1.3 network core
  – packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Three (networking) design steps

• Break down the problem into tasks
• Organize these tasks
• Decide who does what

Tasks in Networking

• What does it take to send packets across the country?
• Simplistic decomposition:
  – Task 1: send along a single wire
  – Task 2: stitch these together to go across the country
• This gives an idea of what I mean by decomposition

Tasks in Networking (bottom up)

• Bits on wire
• Packets on wire
• Deliver packets within local network
• Deliver packets across global network
• Ensure that packets get to the destination
• Do something with the data

Resulting Modules

• Bits on wire (Physical)
• Packets on wire (Physical)
• Deliver packets within local network (Datalink)
• Deliver packets across global network (Network)
• Ensure that packets get to the destination (Transport)
• Do something with the data (Application)

This is decomposition… Now, how do we organize these tasks?

Dear John, Your days are numbered.

--Pat

Inspiration…
• CEO A writes letter to CEO B
  – Folds letter and hands it to administrative aide
    » Aide:
      » Puts letter in envelope with CEO B’s full name
      » Takes to FedEx
• FedEx Office
  – Puts letter in larger envelope
  – Puts name and street address on FedEx envelope
  – Puts package on FedEx delivery truck
• FedEx delivers to other company

The Path of the Letter
[Figure: a CEO / Aide / FedEx stack on each side. The letter carries the semantic content, the envelope carries the identity, and the FedEx Envelope (FE) is added at the FedEx location.]
• “Peers” on each side understand the same things; no one else needs to (abstraction)
• Lowest level has most packaging

The Path Through FedEx
[Figure: the FedEx Envelope (FE) moves between trucks, sorting offices and airports, packed in a crate and repacked into a new crate along the way.]
• Higher “stack” at ends
• Partial “stack” during transit
• Deepest packaging (Envelope + FE + Crate) at the lowest level of transport
• Highest level of “transit stack” is routing

In the context of the Internet

Applications
…built on…
Reliable (or unreliable) transport
…built on…
Best-effort global packet delivery
…built on…
Best-effort local packet delivery
…built on…
Physical transfer of bits

Internet protocol stack
• application: supporting network applications
  – FTP, SMTP, HTTP, Skype, …
• transport: process-process data transfer
  – TCP, UDP
• network: routing of datagrams from source to destination
  – IP, routing protocols
• link: data transfer between neighboring network elements
  – Ethernet, 802.11 (WiFi), PPP
• physical: bits “on the wire”

Quiz: Layering

This model is many years old. If you had to pick a layer to “update” which would you choose? Why?

A. Application: the application (e.g., the Web, Email)
B. Transport: end-to-end connections, reliability
C. Network: routing
D. Link (data-link): framing, error detection
E. Physical: 1’s and 0’s/bits across a medium (copper, the air, fiber)

Three Observations
• Each layer:
  – Depends on layer below
  – Supports layer above
  – Independent of others
• Multiple versions in layer
  – Interfaces differ somewhat
  – Components pick which lower-level protocol to use
• But only one IP layer
  – Unifying protocol

Layering Crucial to Internet’s Success
• Reuse
• Hides underlying detail
• Innovation at each level can proceed in parallel
• Modularization eases maintenance, updating of system
  – change of implementation of layer’s service transparent to rest of system
• Pursued by very different communities

An Example: No Layering

• No layering: each new application has to be re-implemented for every network technology!
[Figure: each application (Skype, ssh, HTTP) tied directly to every transmission medium (Wireless, Ethernet, Fiber optic).]

An Example: Benefit of Layering

• Introducing an intermediate layer provides a common abstraction for various network technologies
[Figure: applications (Skype, ssh, HTTP) sit on a single Transport & Network layer, which runs over each transmission medium (Wireless, Ethernet, Fiber optic).]

Is Layering Harmful?
• Layer N may duplicate lower-level functionality
  – E.g., error recovery to retransmit lost data
• Information hiding may hurt performance
  – E.g., TCP cannot tell packet loss due to corruption from loss due to congestion
• Headers start to get really big
  – E.g., typically TCP + IP + Ethernet headers add up to 54 bytes
• Layer violations when the gains are too great to resist
  – E.g., TCP-over-wireless
• Layer violations when the network doesn’t trust the ends
  – E.g., firewalls
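A quick check of the 54-byte figure, assuming option-free headers: TCP and IPv4 each use a 20-byte header and Ethernet adds a 14-byte header (preamble and frame check sequence not counted). The Python sketch below (constant and function names are illustrative) shows how much of each frame that overhead consumes for a few payload sizes.

```python
# Assumed minimal header sizes in bytes: no TCP or IP options,
# Ethernet preamble and frame check sequence not counted.
TCP_HEADER = 20
IPV4_HEADER = 20
ETHERNET_HEADER = 14


def header_overhead(payload_bytes: int) -> float:
    """Fraction of the frame (headers + payload) that is header."""
    headers = TCP_HEADER + IPV4_HEADER + ETHERNET_HEADER
    return headers / (headers + payload_bytes)


print(TCP_HEADER + IPV4_HEADER + ETHERNET_HEADER)  # 54
for payload in (1, 100, 1460):
    print(f"{payload:>5}-byte payload: {header_overhead(payload):.1%} header")
```

The smaller the payload, the larger the share taken by headers: a one-byte payload, such as a single keystroke in a remote-login session, is almost entirely header.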

Distributing Layers Across Network

• Layers are simple if only on a single machine
  – Just a stack of modules interacting with those above/below
• But we need to implement layers across machines
  – Hosts
  – Routers
  – Switches
• What gets implemented where?

“End-to-End” Argument

• Don’t provide a function at a lower layer of abstraction if you have to do it at a higher layer anyway, unless there is a very good performance reason to do so
• Examples: error control, quality of service

J. Saltzer, D. Reed and D. Clark, “End-to-end Arguments in System Design”, ACM Transactions on Computer Systems, vol. 2, no. 4, pp. 277-288, 1984. http://web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf
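The error-control example can be made concrete with a small Python simulation (all names are hypothetical). Each hop checks its own link, yet a bit flipped while the data sits in a router’s memory slips past every per-hop check; only a checksum computed by the sending application and verified by the receiving application catches it. This is the sense in which the endpoints have to do the check anyway.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hop(data: bytes, corrupt_in_buffer: bool = False) -> bytes:
    """One link plus the router that forwards the data onwards.

    The link-level check compares what was sent with what arrived, so it
    passes. A faulty router may then damage the bytes while they sit in its
    memory, where no link-level check can see it.
    """
    received = bytes(data)                    # simulated error-free link
    if digest(received) != digest(data):      # per-hop check
        raise ValueError("link error detected")
    if corrupt_in_buffer:
        damaged = bytearray(received)
        damaged[0] ^= 0x01                    # flip one bit in router memory
        return bytes(damaged)
    return received


def transfer(data: bytes, faulty_hops: set, n_hops: int = 3) -> bool:
    """Send data over n_hops hops; return True if the end-to-end check passes."""
    expected = digest(data)                   # computed by the sending application
    for i in range(n_hops):
        data = hop(data, corrupt_in_buffer=(i in faulty_hops))
    return digest(data) == expected           # verified by the receiving application


print(transfer(b"hello", faulty_hops=set()))  # True
print(transfer(b"hello", faulty_hops={1}))    # False
```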

What Gets Implemented on Host?

• Bits arrive on wire, must make it up to application
• Therefore, all layers must exist at host!

What Gets Implemented on Router?

• Bits arrive on wire
  – Physical layer necessary
• Packets must be delivered to next hop
  – Datalink layer necessary
• Routers participate in global delivery
  – Network layer necessary
• Routers don’t support reliable delivery
  – Transport layer (and above) not supported

Internet Layered Architecture

[Figure: two hosts, each running HTTP over TCP over IP over an Ethernet interface, connected through two routers that run only IP, each over an Ethernet interface and a SONET interface. The HTTP message and the TCP segment travel end to end between the hosts; a separate IP packet is carried on each hop.]

Logical Communication

• Each layer interacts with its peer’s corresponding layer
[Figure: Host A and Host B each run Application, Transport, Network, Datalink and Physical layers; the Router between them runs only Network, Datalink and Physical.]

Physical Communication
• Communication goes down to physical network
• Then from network peer to peer
• Then up to relevant layer
[Figure: Host A and Host B each run Application, Transport, Network, Datalink and Physical layers; the Router between them runs Network, Datalink and Physical. Data passes down Host A’s stack, through the router, and up Host B’s stack.]

Organization of air travel

• a series of steps
  ticket (purchase)      ticket (complain)
  baggage (check)        baggage (claim)
  gates (load)           gates (unload)
  runway takeoff         runway landing
  airplane routing       airplane routing
            airplane routing

Self Study

Layering of airline functionality
[Figure: layers ticket, baggage, gate, takeoff/landing and airplane routing. The departure airport and the arrival airport implement all of them (ticket purchase/complain, baggage check/claim, gates load/unload, runway takeoff/land); intermediate air-traffic control centers implement only airplane routing.]
layers: each layer implements a service
  – via its own internal-layer actions
  – relying on services provided by layer below

Encapsulation
[Figure: at the source, message M passes down the stack; transport adds Ht to form a segment (Ht M), network adds Hn to form a datagram (Hn Ht M), link adds Hl to form a frame (Hl Hn Ht M). A switch processes the frame up to the link layer; a router processes it up to the network layer. At the destination each layer removes its header until M reaches the application.]
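A toy Python model of the encapsulation figure, with headers as plain strings (a deliberate simplification; real headers are structured binary fields, and all function names are hypothetical). It shows the order in which headers are added at the source, how far a switch and a router look into the frame, and how the destination peels headers off in reverse.

```python
def encapsulate(message: str) -> tuple:
    """Source host: each layer prepends its header on the way down."""
    pdu = (message,)                          # application layer: M
    for header in ("Ht", "Hn", "Hl"):         # transport, network, link
        pdu = (header,) + pdu
    return pdu                                # frame: Hl Hn Ht M


def strip(pdu: tuple, expected: str) -> tuple:
    """Remove the outermost header, checking that it belongs to this layer."""
    if pdu[0] != expected:
        raise ValueError(f"expected {expected}, got {pdu[0]}")
    return pdu[1:]


def switch_forward(frame: tuple) -> tuple:
    """Switch (link layer only): reads Hl, forwards the frame unchanged."""
    strip(frame, "Hl")
    return frame


def router_forward(frame: tuple) -> tuple:
    """Router (up to network layer): removes Hl, reads Hn, adds a new Hl."""
    datagram = strip(frame, "Hl")             # Hn Ht M
    strip(datagram, "Hn")                     # routing decision would use Hn
    return ("Hl",) + datagram                 # fresh link header for next link


def decapsulate(frame: tuple) -> str:
    """Destination host: each layer removes its header on the way up."""
    pdu = frame
    for header in ("Hl", "Hn", "Ht"):
        pdu = strip(pdu, header)
    return pdu[0]


frame = encapsulate("M")
print(frame)                                               # ('Hl', 'Hn', 'Ht', 'M')
print(decapsulate(router_forward(switch_forward(frame))))  # M
```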

1. Introduction: roadmap
1.1 what is the Internet?
1.2 network edge
  – end systems, access networks, links
1.3 network core
  – packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Self Study

Network security

• field of network security:
  – how bad guys can attack computer networks
  – how we can defend networks against attacks
  – how to design architectures that are immune to attacks
• Internet not originally designed with (much) security in mind
  – original vision: “a group of mutually trusting users attached to a transparent network” ☺
  – Internet protocol designers playing “catch-up”
  – security considerations in all layers!

Disclaimer: This is a high-level view; details will be covered later

Self Study

Bad guys: put malware into hosts via Internet
• malware can get in host from:
  – virus: self-replicating infection by receiving/executing object (e.g., e-mail attachment)
  – worm: self-replicating infection by passively receiving object that gets itself executed
• spyware malware can record keystrokes, web sites visited, upload info to collection site
• infected host can be enrolled in botnet, used for spam, DDoS attacks

Self Study

Bad guys: attack server, network infrastructure
Denial of Service (DoS): attackers make resources (server, bandwidth) unavailable to legitimate traffic by overwhelming resource with bogus traffic
1. select target
2. break into hosts around the network (see botnet)
3. send packets to target from compromised hosts

Self Study

Bad guys can sniff packets
packet “sniffing”:
  – broadcast media (shared Ethernet, wireless)
  – promiscuous network interface reads/records all packets (e.g., including passwords!) passing by
[Figure: hosts A, B and C on a shared medium; C records the packet “src:B dest:A payload” that B sends to A.]
• Wireshark software used for end-of-chapter labs is a (free) packet-sniffer

Self Study

Bad guys can use fake addresses
IP spoofing: send packet with false source address
[Figure: hosts A, B and C; C sends A a packet labelled “src:B dest:A payload”, pretending to be B.]
… lots more on security (throughout, Chapter 8)

Self Study

[Cartoon. Source: www.dilbert.com]

1. Introduction: roadmap
1.1 what is the Internet?
1.2 network edge
  – end systems, access networks, links
1.3 network core
  – packet switching, circuit switching, network structure
1.4 delay, loss, throughput in networks
1.5 protocol layers, service models
1.6 networks under attack: security
1.7 history

Self Study

Hobbes’ Internet Timeline: http://www.zakon.org/robert/internet/timeline/

Internet history
1961-1972: Early packet-switching principles
• 1961: Kleinrock – queueing theory shows effectiveness of packet-switching
• 1964: Baran – packet-switching in military nets
• 1967: ARPAnet conceived by Advanced Research Projects Agency
• 1969: first ARPAnet node operational
• 1972:
  – ARPAnet public demo
  – NCP (Network Control Protocol) first host-host protocol
  – first e-mail program
  – ARPAnet has 15 nodes

Self Study

Internet history
1972-1980: Internetworking, new and proprietary nets
• 1970: ALOHAnet satellite network in Hawaii
• 1974: Cerf and Kahn – architecture for interconnecting networks
• 1976: Ethernet at Xerox PARC
• late 70’s: proprietary architectures: DECnet, SNA, XNS
• late 70’s: switching fixed length packets (ATM precursor)
• 1979: ARPAnet has 200 nodes
Cerf and Kahn’s internetworking principles:
  – minimalism, autonomy – no internal changes required to interconnect networks
  – best effort service model
  – stateless routers
  – decentralized control
define today’s Internet architecture

Self Study

Internet history
1980-1990: new protocols, a proliferation of networks
• 1983: deployment of TCP/IP
• 1982: SMTP e-mail protocol defined
• 1983: DNS defined for name-to-IP-address translation
• 1985: FTP protocol defined
• 1988: TCP congestion control
• new national networks: CSnet, BITnet, NSFnet, Minitel
• 100,000 hosts connected to confederation of networks

Self Study

Internet history
1990, 2000’s: commercialization, the Web, new apps
• early 1990’s: ARPAnet decommissioned
• 1991: NSF lifts restrictions on commercial use of NSFnet (decommissioned, 1995)
• early 1990s: Web
  – hypertext [Bush 1945, Nelson 1960’s]
  – HTML, HTTP: Berners-Lee
  – 1994: Mosaic, later Netscape
  – late 1990’s: commercialization of the Web
late 1990’s – 2000’s:
• more killer apps: instant messaging, P2P file sharing
• network security to forefront
• est. 50 million hosts, 100 million+ users
• backbone links running at Gbps

Self Study

Internet history
2005-present
• ~750 million hosts
  – Smartphones and tablets
• Aggressive deployment of broadband access
• Increasing ubiquity of high-speed wireless access
• Emergence of online social networks:
  – Facebook: soon one billion users
• Service providers (Google, Microsoft) create their own networks
  – Bypass Internet, providing “instantaneous” access to search, email, etc.
• E-commerce, universities, enterprises running their services in “cloud” (e.g., Amazon EC2)

Self Study

Introduction: summary
covered a “ton” of material!
• Internet overview
• what’s a protocol?
• network edge, core, access network
  – packet-switching versus circuit-switching
  – Internet structure
• performance: loss, delay, throughput
• layering, service models
• security
• history
you now have:
• context, overview, “feel” of networking
• more depth, detail to follow!

2. Application Layer: outline
2.1 principles of network applications
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
  – SMTP, POP3, IMAP
2.5 DNS
2.6 P2P applications
2.7 socket programming with UDP and TCP

2. Application layer
our goals:
• conceptual, implementation aspects of network application protocols
  – transport-layer service models
  – client-server paradigm
  – peer-to-peer paradigm
• learn about protocols by examining popular application-level protocols
  – HTTP
  – FTP
  – SMTP / POP3 / IMAP
  – DNS
• creating network applications
  – socket API

Some network apps

• e-mail
• web
• text messaging
• remote login
• P2P file sharing
• multi-user network games
• streaming stored video and audio (YouTube, NetFlix, Spotify)
• voice over IP (e.g., Skype)
• real-time video conferencing
• social networking
• Search
• virtual reality
• …

Creating a network app
write programs that:
• run on (different) end systems
• communicate over network
• e.g., web server software communicates with browser software

Varying degrees of integration
• Loose: email, web browsing
• Medium: chat, Skype, remote file systems
• Tight: process migration, distributed file systems

no need to write software for network-core devices
• network-core devices do not run user applications
• applications on end systems allow for rapid app development, propagation

[Figure: three end systems, each running the full application / transport / network / data link / physical stack.]

Interprocess Communication (IPC)
• In order to cooperate, processes need to communicate
• Processes talk to each other through inter-process communication (IPC), the ability of one process to communicate with another
• On a single machine:
  – Shared memory
• Across machines:
  – We need other abstractions (message passing)
[Figure: processes P1 and P2, each with its own Text, Data and Stack segments, both attached to a shared segment.]

Sockets
• process sends/receives messages to/from its socket
• socket analogous to door
  – sending process shoves message out door
  – sending process relies on transport infrastructure on other side of door to deliver message to socket at receiving process
• Application has a few options, OS handles the details
[Figure: two hosts connected through the Internet, each with a process above a socket and the transport / network / link / physical layers below it. The application (process) is controlled by the app developer; everything below the socket is controlled by the OS.]
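A minimal Python illustration of the socket-as-door idea. It uses socket.socketpair(), which returns two already-connected sockets inside one process, so nothing here touches the Internet; the point is only that the application writes into one door and reads from the other while the OS moves the bytes.

```python
import socket

# Two connected sockets in one process stand in for the "doors" of a
# sending and a receiving process; the OS carries the bytes between them.
sender, receiver = socket.socketpair()
try:
    sender.sendall(b"hello through the door")
    data = receiver.recv(1024)
    print(data.decode())
finally:
    sender.close()
    receiver.close()
```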

Addressing processes
• to receive messages, a process must have an identifier
• host device has unique 32-bit IP address
• Q: does the IP address of the host on which the process runs suffice for identifying the process?
  – A: no, many processes can be running on the same host
• identifier includes both IP address and port numbers associated with process on host
• example port numbers:
  – HTTP server: 80
  – mail server: 25
• to send HTTP message to cse.unsw.edu.au web server:
  – IP address: 129.94.242.51
  – port number: 80
• more shortly…
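The (IP address, port) pair can be pictured as a lookup key. The Python sketch below uses a hypothetical table of listening processes on the cse.unsw.edu.au host from the slide (the mail server entry on the same host is an assumption for illustration) and also confirms that an IPv4 address is just a 32-bit number.

```python
import ipaddress

# Hypothetical table of listening processes, keyed by (IP address, port):
# the IP address selects the host, the port selects the process on it.
LISTENERS = {
    ("129.94.242.51", 80): "web server process",
    ("129.94.242.51", 25): "mail server process",   # assumed, for illustration
}


def deliver(dst_ip: str, dst_port: int) -> str:
    """Which process receives a message addressed to (dst_ip, dst_port)?"""
    return LISTENERS.get((dst_ip, dst_port), "no process listening")


addr = ipaddress.IPv4Address("129.94.242.51")
print(int(addr), int(addr).bit_length() <= 32)   # 2170483251 True
print(deliver("129.94.242.51", 80))              # web server process
print(deliver("129.94.242.51", 25))              # mail server process
print(deliver("129.94.242.51", 22))              # no process listening
```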

Client-server architecture
server:
• Exports well-defined request/response interface
• Long-lived process that waits for requests
• Upon receiving a request, carries it out
clients:
• Short-lived process that makes requests
• “User-side” of application
• Initiates the communication
[Figure: client/server]
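A minimal client-server exchange over the loopback interface in Python, assuming a hypothetical “upper-case the text” service. Unlike a real server, which stays up as a long-lived process, this one handles a single request so the example terminates; port 0 asks the OS for any free port.

```python
import socket
import threading


def serve_one(listener: socket.socket) -> None:
    """Server side: wait for a request, carry it out, send the response."""
    conn, _addr = listener.accept()           # blocks until a client connects
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())          # the hypothetical "service"


def make_request(port: int, text: bytes) -> bytes:
    """Client side: initiate the connection, send one request, read the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text)
        return sock.recv(1024)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))               # port 0: the OS picks a free port
listener.listen()
port = listener.getsockname()[1]

server = threading.Thread(target=serve_one, args=(listener,))
server.start()
reply = make_request(port, b"hello server")
server.join()
listener.close()
print(reply)                                   # b'HELLO SERVER'
```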

Client versus Server
• Server
  – Always-on host
  – Permanent IP address (rendezvous location)
  – Static port conventions (http: 80, email: 25, ssh: 22)
  – Data centres for scaling
  – May communicate with other servers to respond
• Client
  – May be intermittently connected
  – May have dynamic IP addresses
  – Do not communicate directly with each other

P2P architecture
• no always-on server
  – No permanent rendezvous involved
• arbitrary end systems (peers) directly communicate
• Symmetric responsibility (unlike client/server)
• Often used for:
  – File sharing (BitTorrent)
  – Games
  – Video distribution, video chat
  – In general: “distributed systems”
[Figure: peer-peer]

Quiz: Peer-to-Peer
• In P2P architecture are there clients and servers?
  A. Yes
  B. No

P2P architecture: Pros and Cons
+ peers request service from other peers, provide service in return to other peers
  – self scalability: new peers bring new service capacity, as well as new service demands
+ Speed: parallelism, less contention
+ Reliability: redundancy, fault tolerance
+ Geographic distribution
- Fundamental problems of decentralized control
  – State uncertainty: no shared memory or clock
  – Action uncertainty: mutually conflicting decisions
- Distributed algorithms are complex
[Figure: peer-peer]

App-layer protocol defines
• types of messages exchanged
  – e.g., request, response
• message syntax:
  – what fields in messages & how fields are delineated
• message semantics
  – meaning of information in fields
• rules for when and how processes send & respond to messages
open protocols:
• defined in RFCs
• allows for interoperability
• e.g., HTTP, SMTP
proprietary protocols:
• e.g., Skype

What transport service does an app need?
data integrity
• some apps (e.g., file transfer, web transactions) require 100% reliable data transfer
• other apps (e.g., audio) can tolerate some loss
timing
• some apps (e.g., Internet telephony, interactive games) require low delay to be “effective”
throughput
• some apps (e.g., multimedia) require minimum amount of throughput to be “effective”
• other apps (“elastic apps”) make use of whatever throughput they get
security
• encryption, data integrity, …

Self Study

Transport service requirements: common apps

application | data loss | throughput | time sensitive
file transfer | no loss | elastic | no
e-mail | no loss | elastic | no
Web documents | no loss | elastic | no
real-time audio/video | loss-tolerant | audio: 5kbps-1Mbps; video: 10kbps-5Mbps | yes, 100’s msec
stored audio/video | loss-tolerant | same as above | yes, few secs
interactive games | loss-tolerant | few kbps up | yes, 100’s msec
text messaging | no loss | elastic | yes and no

Self Study

Internet transport protocols services
TCP service:
• reliable transport between sending and receiving process
• flow control: sender won’t overwhelm receiver
• congestion control: throttle sender when network overloaded
• does not provide: timing, minimum throughput guarantee, security
• connection-oriented: setup required between client and server processes
UDP service:
• unreliable data transfer between sending and receiving process
• does not provide: reliability, flow control, congestion control, timing, throughput guarantee, security, or connection setup
Q: why bother? Why is there a UDP?

NOTE: More on transport in Weeks 4 and 5
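To see what “no connection setup” means in practice, here is a Python sketch over the loopback interface: the sender never calls connect() and the receiver never calls accept(); each datagram simply carries its destination address. Delivery happens to be dependable on loopback, but over a real network a UDP datagram may be lost without any notice.

```python
import socket

# Receiver: a UDP socket bound to a local port (0 lets the OS choose one).
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)          # don't wait forever if the datagram is lost
dest = receiver.getsockname()

# Sender: no connect()/accept() handshake; the destination travels with
# each datagram passed to sendto().
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"datagram 1", dest)

data, src = receiver.recvfrom(2048)
print(data, src[0])               # b'datagram 1' 127.0.0.1
sender.close()
receiver.close()
```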

Self Study

Internet apps: application, transport protocols

application | application layer protocol | underlying transport protocol
e-mail | SMTP [RFC 2821] | TCP
remote terminal access | Telnet [RFC 854] | TCP
Web | HTTP [RFC 2616] | TCP
file transfer | FTP [RFC 959] | TCP
streaming multimedia | HTTP (e.g., YouTube), RTP [RFC 1889] | TCP or UDP
Internet telephony | SIP, RTP, proprietary (e.g., Skype) | TCP or UDP

Self Study

Securing TCP
TCP & UDP
• no encryption
• cleartext passwords sent into socket traverse Internet in cleartext
SSL
• provides encrypted TCP connection
• data integrity
• end-point authentication
SSL is at app layer
• Apps use SSL libraries, which “talk” to TCP
SSL socket API
• cleartext passwords sent into socket traverse Internet encrypted
• See Chapter 7
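Python’s standard ssl module is one example of such an SSL library. The sketch below only builds a client-side context (TLS, the successor to SSL, is what modern libraries actually negotiate); it shows that the default settings require a verified server certificate and check the server’s hostname, which is how end-point authentication is enforced. The commented lines show how a connected TCP socket would be wrapped; `host` is a placeholder and they are not executed here.

```python
import ssl

# Client-side context with the library's recommended defaults.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True: server certificate must verify
print(context.check_hostname)                    # True: name must match the certificate

# Wrapping a connected TCP socket (needs network access, not run here;
# `host` is a placeholder):
#   import socket
#   with socket.create_connection((host, 443)) as tcp:
#       with context.wrap_socket(tcp, server_hostname=host) as tls:
#           tls.sendall(b"...")  # the app writes cleartext; TLS encrypts it on the wire
```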

Self Study

2. Application Layer: outline

2.1 principles of network applications
  – app architectures
  – app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
  – SMTP, POP3, IMAP
2.5 DNS
2.6 P2P applications
2.7 socket programming with UDP and TCP

Note: Some of the material here, particularly the descriptive details of various protocols, is for self-study. Lectures will focus on design principles.

The Web – Precursor
• 1967, Ted Nelson, Xanadu:
  – A world-wide publishing network that would allow information to be stored not as separate files but as connected literature
  – Owners of documents would be automatically paid via electronic means for the virtual copying of their documents
• Coined the term “Hypertext”
[Photo: Ted Nelson]

The Web – History
• World Wide Web (WWW): a distributed database of “pages” linked through Hypertext Transfer Protocol (HTTP)
  – First HTTP implementation – 1990
    • Tim Berners-Lee at CERN
  – HTTP/0.9 – 1991
    • Simple GET command for the Web
  – HTTP/1.0 – 1992
    • Client/Server information, simple caching
  – HTTP/1.1 – 1996
[Photo: Tim Berners-Lee]

Web and HTTP

First, a review…
• web page consists of objects
• object can be HTML file, JPEG image, Java applet, audio file, …
• web page consists of base HTML-file which includes several referenced objects
• each object is addressable by a URL, e.g.,
  www.someschool.edu/someDept/pic.gif
  (host name: www.someschool.edu; path name: /someDept/pic.gif)

Uniform Resource Locator (URL)
protocol://host-name[:port]/directory-path/resource
• Extend the idea of hierarchical hostnames to include anything in a file system
  – http://www.cse.unsw.edu.au/~salilk/papers/journals/TMC2012.pdf
• Extend to program executions as well…
  – http://us.f413.mail.yahoo.com/ym/ShowLetter?box=%40B%40Bulk&MsgId=2604_1744106_29699_1123_1261_0_28917_3552_1289957100&Search=&Nhead=f&YY=31454&order=down&sort=date&pos=0&view=a&head=b
  – Server side processing can be incorporated in the name

Uniform Resource Locator (URL)
protocol://host-name[:port]/directory-path/resource
• protocol: http, ftp, https, smtp, rtsp, etc.
• hostname: DNS name, IP address
• port: defaults to protocol’s standard port; e.g., http: 80, https: 443
• directory path: hierarchical, reflecting file system
• resource: identifies the desired resource
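Python’s urllib.parse.urlsplit can pull apart the protocol://host-name[:port]/directory-path/resource form. In this sketch the default-port table holds only the two defaults listed on the slide, and splitting the path at its last “/” into directory and resource is a simplification; url_parts is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Assumed default ports: only the two defaults quoted on the slide.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_parts(url: str) -> tuple:
    """Split protocol://host-name[:port]/directory-path/resource."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    directory, _, resource = parts.path.rpartition("/")
    return parts.scheme, parts.hostname, port, directory + "/", resource


print(url_parts("http://www.cse.unsw.edu.au/~salilk/papers/journals/TMC2012.pdf"))
# ('http', 'www.cse.unsw.edu.au', 80, '/~salilk/papers/journals/', 'TMC2012.pdf')
```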

HTTP overview

HTTP: hypertext transfer protocol
• Web’s application layer protocol
• client/server model
  – client: browser that requests, receives (using HTTP protocol) and “displays” Web objects
  – server: Web server sends (using HTTP protocol) objects in response to requests
[Figure: a PC running Firefox browser and an iPhone running Safari browser, each exchanging HTTP requests and responses with a server running Apache Web server.]

HTTP overview (continued)

uses TCP:
• client initiates TCP connection (creates socket) to server, port 80
• server accepts TCP connection from client
• HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
• TCP connection closed

HTTP is “stateless”
• server maintains no information about past client requests

aside: protocols that maintain “state” are complex!
• past history (state) must be maintained
• if server/client crashes, their views of “state” may be inconsistent, must be reconciled

HTTP request message
• two types of HTTP messages: request, response
• HTTP request message:
  – ASCII (human-readable format)

request line (GET, POST, HEAD commands):
  GET /index.html HTTP/1.1\r\n
header lines:
  Host: www-net.cs.umass.edu\r\n
  User-Agent: Firefox/3.6.10\r\n
  Accept: text/html,application/xhtml+xml\r\n
  Accept-Language: en-us,en;q=0.5\r\n
  Accept-Encoding: gzip,deflate\r\n
  Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
  Keep-Alive: 115\r\n
  Connection: keep-alive\r\n
carriage return, line feed at start of line indicates end of header lines:
  \r\n

(\r is the carriage return character; \n is the line-feed character)
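The framing rules of the request message (request line, header lines, and an empty line, each ending in \r\n) can be sketched in Python. build_request and parse_request are hypothetical helpers; header names are lower-cased on parsing because HTTP header field names are case-insensitive, and a real parser would also need to cope with malformed input.

```python
CRLF = "\r\n"


def build_request(method: str, path: str, headers: dict) -> bytes:
    """Request line, one line per header, then an empty line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def parse_request(raw: bytes):
    """Split at the first empty line, then split the head into lines."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split(CRLF)
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


raw = build_request("GET", "/index.html",
                    {"Host": "www-net.cs.umass.edu", "Connection": "keep-alive"})
print(raw)
print(parse_request(raw)[:4])
```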

Quiz: Header delineation
• How else (i.e., other than \r\n) might we delineate headers?
  A: There’s not much else we can do
  B: Force all messages to be the same size
  C: Send the message size prior to the message
  D: Some other way (discuss)
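Option C is a common alternative to delimiter-based framing (HTTP itself uses it for the message body, through the Content-Length header). A minimal Python sketch, assuming a 4-byte big-endian length field, an arbitrary choice for illustration; frame and unframe are hypothetical helpers, and unframe assumes the buffer holds only complete messages. Note that the payload may now contain \r\n without confusing the receiver.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix the message with its length as a 4-byte big-endian integer."""
    return struct.pack("!I", len(message)) + message


def unframe(stream: bytes) -> list:
    """Split a buffer of complete length-prefixed messages back apart."""
    messages, offset = [], 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)  # read the prefix
        offset += 4
        messages.append(stream[offset:offset + length])       # then the payload
        offset += length
    return messages


stream = frame(b"Host: a\r\nX: b") + frame(b"second message")
print(unframe(stream))
```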

HTTP response message
status line (protocol, status code, status phrase):
  HTTP/1.1 200 OK\r\n
header lines:
  Date: Sun, 26 Sep 2010 20:09:20 GMT\r\n
  Server: Apache/2.0.52 (CentOS)\r\n
  Last-Modified: Tue, 30 Oct 2007 17:00:02 GMT\r\n
  ETag: "17dc6-a5c-bf716880"\r\n
  Accept-Ranges: bytes\r\n
  Content-Length: 2652\r\n
  Keep-Alive: timeout=10, max=100\r\n
  Connection: Keep-Alive\r\n
  Content-Type: text/html; charset=ISO-8859-1\r\n
  \r\n
data, e.g., requested HTML file:
  data data data data data ...
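A simplified Python parser for a response shaped like the one on this slide. It splits the head at the first empty line, reads the status line and header lines, and uses Content-Length to know where the body ends; parse_response is a hypothetical helper, and it ignores other body encodings (such as chunked transfer) that a real client must handle.

```python
def parse_response(raw: bytes):
    """Status line, header lines, then a body of Content-Length bytes."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Simplification: without Content-Length, take whatever bytes follow.
    length = int(headers.get("content-length", len(rest)))
    return version, int(code), phrase, headers, rest[:length]


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=ISO-8859-1\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello<next message starts here>")
version, code, phrase, headers, body = parse_response(raw)
print(code, phrase, body)   # 200 OK b'hello'
```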

HTTP response status codes
• status code appears in 1st line in server-to-client response message
• some sample codes:
200 OK
  – request succeeded, requested object later in this msg
301 Moved Permanently
  – requested object moved, new location specified later in this msg (Location:)
400 Bad Request
  – request msg not understood by server
404 Not Found
  – requested document not found on this server
505 HTTP Version Not Supported
451 Unavailable for Legal Reasons
429 Too Many Requests
418 I’m a Teapot
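Python’s http.HTTPStatus enumeration maps codes to their standard reason phrases, and the first digit of a code gives the class of the response. A short sketch; status_class is a hypothetical helper, and only long-established codes are looked up because newer additions to the enumeration depend on the Python version.

```python
from http import HTTPStatus

# Class of a response, given by the first digit of the status code.
CLASSES = {1: "informational", 2: "success", 3: "redirection",
           4: "client error", 5: "server error"}


def status_class(code: int) -> str:
    return CLASSES[code // 100]


for code in (200, 301, 400, 404, 429, 505):
    print(code, HTTPStatus(code).phrase, "-", status_class(code))
```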

HTTP is all text

• Makes the protocol simple
  – Easy to delineate messages (\r\n)
  – (relatively) human-readable
  – No issues about encoding or formatting data
  – Variable length data
• Not the most efficient
  – Many protocols use binary fields
    • Sending “12345678” as a string is 8 bytes
    • As an integer, 12345678 needs only 4 bytes
  – Headers may come in any order
  – Requires string parsing/processing
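The size difference can be checked with Python’s struct module, packing the number as a 32-bit unsigned integer in network (big-endian) byte order:

```python
import struct

as_text = "12345678".encode("ascii")      # one byte per digit
as_binary = struct.pack("!I", 12345678)   # 32-bit unsigned, network byte order
print(len(as_text), len(as_binary))        # 8 4
print(as_binary.hex())                     # 00bc614e
print(struct.unpack("!I", as_binary)[0])   # 12345678
```

The catch is that both ends must agree in advance on the field’s width and byte order, which text-based HTTP avoids.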

Request Method types (“verbs”)

HTTP/1.0:
• GET
  – Request page
• POST
  – Uploads user response to a form
• HEAD
  – asks server to leave requested object out of response
HTTP/1.1:
• GET, POST, HEAD
• PUT
  – uploads file in entity body to path specified in URL field
• DELETE
  – deletes file specified in the URL field
• TRACE, OPTIONS, CONNECT, PATCH
  – For persistent connections

Uploading form input

POST method:
• web page often includes form input
• input is uploaded to server in entity body
GET (in-URL) method:
• uses GET method
• input is uploaded in URL field of request line:
  www.somesite.com/animalsearch?monkeys&banana
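The two upload styles can be compared with Python’s urllib.parse. The field names animal and food are hypothetical (HTML forms normally send name=value pairs rather than the bare values in the example URL on this slide); the same encoded string goes into the request line for GET and into the entity body for POST.

```python
from urllib.parse import parse_qs, urlencode

form = {"animal": "monkeys", "food": "banana"}   # hypothetical form fields

# GET (in-URL): the input travels in the request line.
query = urlencode(form)
request_line = f"GET /animalsearch?{query} HTTP/1.1"
print(request_line)   # GET /animalsearch?animal=monkeys&food=banana HTTP/1.1

# POST: the same encoded input travels in the entity body instead.
body = query.encode("ascii")
post_head = ("POST /animalsearch HTTP/1.1\r\n"
             "Host: www.somesite.com\r\n"
             "Content-Type: application/x-www-form-urlencoded\r\n"
             f"Content-Length: {len(body)}\r\n\r\n")
print(post_head.encode("ascii") + body)
print(parse_qs(query))   # {'animal': ['monkeys'], 'food': ['banana']}
```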

GET vs. POST

• GET can be used for idempotent requests
  – Idempotence: an operation can be applied multiple times without changing the result (the final state is the same)
• POST should be used when…
  – A request changes the state of the session or server or DB
  – Sending a request twice would be harmful
    • (Some) browsers warn about sending multiple POST requests
  – Users are inputting non-ASCII characters
  – Input may be very large
  – You want to hide how the form works/user input
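Idempotence in a few lines of Python, using two hypothetical server-side operations on a shopping basket. Repeating the first leaves the final state unchanged; repeating the second does not, which is why a duplicated “add one pizza” request is harmful and belongs in POST rather than GET.

```python
def set_quantity(basket: dict, item: str, qty: int) -> None:
    """Idempotent: applying it twice leaves the same final state as once."""
    basket[item] = qty


def add_one(basket: dict, item: str) -> None:
    """Not idempotent: every repeat changes the state again."""
    basket[item] = basket.get(item, 0) + 1


a, b = {}, {}
for _ in range(2):            # e.g., the user resubmits the same request
    set_quantity(a, "pizza", 1)
    add_one(b, "pizza")
print(a, b)                   # {'pizza': 1} {'pizza': 2}
```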

Quiz: When might you use GET vs. POST?
A. GET: Forum post; POST: Search terms, Pizza order
B. GET: Search terms, Pizza order; POST: Forum post
C. GET: Search terms; POST: Forum post, Pizza order
D. GET: Forum post, Search terms, Pizza order
E. POST: Forum post, Search terms, Pizza order

Trying out HTTP (client side) for yourself
1. Telnet to your favorite Web server:
  telnet cis.poly.edu 80
  This opens a TCP connection to port 80 (default HTTP server port) at cis.poly.edu; anything typed in is sent to port 80 at cis.poly.edu.
2. Type in a GET HTTP request:
  GET /~ross/ HTTP/1.1
  Host: cis.poly.edu
  By typing this in (hit carriage return twice), you send this minimal (but complete) GET request to the HTTP server.
3. Look at the response message sent by the HTTP server! (or use Wireshark to look at the captured HTTP request/response)

Web-based sniffer: http://web-sniffer.net/

Your 2nd lab

URL Shortening
• Some URLs can get really long
  – http://en.wikipedia.org/w/index.php?title=TinyURL&diff=283621022&oldid=283308287
  – Inconvenient for sharing, prone to errors, hard to recall
• URL shortening services
  – Produce short and easy to remember URLs
  – A key is associated with each long URL
    • http://tinyurl.com/m3q2xt (the key is m3q2xt)
    • Keys: hash functions or random number generator or user proposed
    • URL redirection: 3XX status codes
    • Warning: also used extensively in phishing attacks

Example redirect response:
  HTTP/1.1 301 Moved Permanently
  Location: http://www.abc.org/
  Content-Type: text/html
  Content-Length: 174

  <html><head><title>Moved</title></head><body><h1>Moved</h1><p>This page has moved to <a href="http://www.abc.org/">http://www.abc.org/</a></p></body></html>
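A toy URL shortener in Python, assuming the hash-function option for keys: the key is the first six base-62 digits of a SHA-256 hash of the long URL, a dictionary stands in for the service’s database, and collisions are not handled. All names are hypothetical, and the keys it produces will not match the tinyurl.com example.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters   # 62 URL-safe characters
table = {}                                        # key -> long URL


def short_key(long_url: str, length: int = 6) -> str:
    """Derive a key from a hash of the long URL."""
    n = int.from_bytes(hashlib.sha256(long_url.encode()).digest(), "big")
    key = ""
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        key += ALPHABET[r]
    return key


def shorten(long_url: str) -> str:
    key = short_key(long_url)
    table[key] = long_url                         # collisions ignored in this sketch
    return key


def redirect(key: str) -> str:
    """Build the 3XX response that sends the browser on to the long URL."""
    if key not in table:
        return "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    return ("HTTP/1.1 301 Moved Permanently\r\n"
            f"Location: {table[key]}\r\n"
            "Content-Length: 0\r\n\r\n")


key = shorten("http://en.wikipedia.org/w/index.php?title=TinyURL&diff=283621022&oldid=283308287")
print(key)
print(redirect(key))
```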

State(less)
[Cartoon: XKCD #869, “Server Attention Span”]

User-server state: cookies

many Web sites use cookies
four components:
1) cookie header line of HTTP response message
2) cookie header line in next HTTP request message
3) cookie file kept on user’s host, managed by user’s browser
4) back-end database at Web site

example:
• Susan always accesses the Internet from her PC
• visits specific e-commerce site for first time
• when initial HTTP request arrives at site, site creates:
  – unique ID
  – entry in backend database for ID

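Cookies: keeping “state” (cont.)
[Figure: client–server timeline with the server’s backend database.]
• Client’s cookie file initially holds: ebay 8734
• Client → Amazon server: usual http request msg
• Amazon server creates ID 1678 for user and creates an entry in the backend database
• Server → client: usual http response, set-cookie: 1678; cookie file now holds: ebay 8734, amazon 1678
• Client → server: usual http request msg, cookie: 1678; server takes cookie-specific action (accesses backend database); usual http response msg
• One week later, client → server: usual http request msg, cookie: 1678; server again takes cookie-specific action; usual http response msg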

Cookies (continued)
what cookies can be used for:
• authorization
• shopping carts
• recommendations
• user session state (Web e-mail)

aside: cookies and privacy:
• cookies permit sites to learn a lot about you
• you may supply name and e-mail to sites

how to keep “state”:
• protocol endpoints: maintain state at sender/receiver over multiple transactions
• cookies: http messages carry state
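A minimal Python sketch of the four cookie components, using the standard http.cookies.SimpleCookie to build the Set-Cookie header and parse the Cookie header. The cookie name id, the dictionary standing in for the back-end database, the handle function, and the starting ID 1678 (chosen to mirror the Amazon example) are all assumptions.

```python
import itertools
from http.cookies import SimpleCookie

backend_db = {}                    # hypothetical back-end database: ID -> state
next_id = itertools.count(1678)    # mirrors the ID in the Amazon example


def handle(cookie_header):
    """Server side: return (Set-Cookie header or None, this user's state)."""
    cookie = SimpleCookie(cookie_header or "")
    if "id" in cookie and cookie["id"].value in backend_db:
        user_id = cookie["id"].value             # returning user
        backend_db[user_id]["visits"] += 1       # cookie-specific action
        return None, backend_db[user_id]
    user_id = str(next(next_id))                 # first visit: create ID + entry
    backend_db[user_id] = {"visits": 1}
    out = SimpleCookie()
    out["id"] = user_id
    return out.output(), backend_db[user_id]


set_cookie, state = handle(None)
print(set_cookie)                  # Set-Cookie: id=1678
_, state = handle("id=1678")       # the browser sends the cookie back later
print(state)                       # {'visits': 2}
```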

The Dark Side of Cookies

v  Cookies permit sites to learn a lot about you

v  You may supply name and e-mail to sites (and more)

v  3rd-party cookies (from ad networks, etc.) can follow you across multiple sites
§  Ever visited a website, and the next day ALL your ads are from them?
•  Check your browser’s cookie file (cookies.txt, cookies.plist)
•  Do you see a website that you have never visited?

v  You COULD turn them off
§  But good luck doing anything on the Internet!

84

Performance Goals

v  User §  fast downloads §  high availability

v  Content provider §  happy users (hence, above) §  cost-effective infrastructure

v  Network (secondary) §  avoid overload

85

Solutions?

v  User §  fast downloads §  high availability

v  Content provider §  happy users (hence, above) §  cost-effective infrastructure

v  Network (secondary) §  avoid overload

Improve HTTP to achieve faster downloads

86

Solutions?

v  User §  fast downloads §  high availability

v  Content provider §  happy users (hence, above) §  cost-effective delivery infrastructure

v  Network (secondary) §  avoid overload

Caching and Replication

87

Improve HTTP to achieve faster downloads

Solutions?

v  User §  fast downloads §  high availability

v  Content provider §  happy users (hence, above) §  cost-effective delivery infrastructure

v  Network (secondary) §  avoid overload

Caching and Replication

Exploit economies of scale (Webhosting, CDNs, datacenters)

Improve HTTP to compensate for TCP’s weak spots

88

HTTP Performance

v  Most Web pages have multiple objects
§  e.g., an HTML file and a bunch of embedded images
v  How do you retrieve those objects (naively)?
§  One item at a time
v  New TCP connection per (small) object!

non-persistent HTTP
v  at most one object sent over a TCP connection
§  connection then closed
v  downloading multiple objects requires multiple connections

89

Non-persistent HTTP

suppose user enters URL: www.someSchool.edu/someDepartment/home.index (contains text, references to 10 jpeg images)

1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80

1b. HTTP server at host www.someSchool.edu, waiting for TCP connection at port 80, “accepts” connection, notifying client

2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index

3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket

90

Non-persistent HTTP (cont.)

4. HTTP server closes TCP connection.

5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects

6. Steps 1-5 repeated for each of 10 jpeg objects

91

Non-persistent HTTP: response time

RTT (definition): time for a small packet to travel from client to server and back

HTTP response time:
v  one RTT to initiate TCP connection
v  one RTT for HTTP request and first few bytes of HTTP response to return
v  file transmission time
v  non-persistent HTTP response time = 2RTT + file transmission time

[Figure: client/server timing diagram: initiate TCP connection (RTT), request file (RTT), time to transmit file, file received]

92
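The response-time formula above can be checked numerically. This sketch assumes the file transmission time is L/R (file size in bits over link rate) and ignores queueing and processing delays; the function names and the example object sizes are hypothetical.

```python
def nonpersistent_response_time(rtt_s, file_bits, rate_bps):
    """One object over a fresh TCP connection: 2 RTT + transmission time L/R (seconds)."""
    return 2 * rtt_s + file_bits / rate_bps


def nonpersistent_page_time(rtt_s, object_bits, rate_bps):
    """Base file plus each referenced object, fetched one at a time (steps 1-5 repeated)."""
    return sum(nonpersistent_response_time(rtt_s, bits, rate_bps) for bits in object_bits)


# Hypothetical page: 80 kbit HTML file plus 10 jpegs of 400 kbit each,
# RTT = 100 ms, link rate = 10 Mbps.
page = [80_000] + [400_000] * 10
total = nonpersistent_page_time(0.1, page, 10_000_000)
```

Here `total` is 0.208 s for the base file plus 10 × 0.24 s for the images, about 2.6 s, of which 2.2 s is pure RTT overhead.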

Improving HTTP Performance: Concurrent Requests & Responses

v  Use multiple connections in parallel
v  Does not necessarily maintain order of responses

•  Client = ☺  •  Content provider = ☺
•  Network = ☹  Why?

[Figure: requests R1, R2, R3 and responses T1, T2, T3 on parallel connections]

93

Persistent HTTP

Nonpersistent HTTP issues:
v  requires 2 RTTs per object
v  OS must do work and allocate host resources for each TCP connection
v  but browsers often open parallel TCP connections to fetch referenced objects

Persistent HTTP:
v  server leaves connection open after sending response
v  subsequent HTTP messages between same client/server are sent over the open connection
v  allows TCP to learn a more accurate RTT estimate (APPARENT LATER)
v  allows TCP congestion window to increase (APPARENT LATER)
v  i.e., leverage previously discovered bandwidth (APPARENT LATER)

Persistent without pipelining:
v  client issues new request only when previous response has been received
v  one RTT for each referenced object

Persistent with pipelining:
v  permitted by HTTP/1.1, whose connections are persistent by default
v  client sends requests as soon as it encounters a referenced object
v  as little as one RTT for all the referenced objects

94
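A small local experiment showing persistent HTTP without pipelining, using only Python's standard library: a throwaway `ThreadingHTTPServer` on 127.0.0.1 speaks HTTP/1.1 (so connections stay open), and `http.client.HTTPConnection` sends one request at a time over a single connection. The handler records the client's TCP port for each request; a single port means every object came over the same TCP connection. The paths and handler are hypothetical stand-ins for a page and its embedded objects.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

client_ports_seen = set()  # client-side TCP ports observed by the server


class ObjectHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        client_ports_seen.add(self.client_address[1])
        body = f"object {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # lets the client find the end of each response
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def fetch_over_one_connection(paths):
    """Fetch each path in turn over one persistent TCP connection (no pipelining)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=5)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)  # next request only after the previous response
            response = conn.getresponse()
            bodies.append(response.read())  # must be fully read before reusing the connection
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies
```

`http.client` does not pipeline, so this corresponds to "persistent without pipelining": one TCP handshake in total, but still one RTT per object.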

HTTP 1.1: response time

[Figure: timing diagram for a website with one index page and three embedded objects: initiate TCP connection (RTT), request file (RTT), time to transmit file, file received]

95

Scorecard: Getting n Small Objects

Time dominated by latency (i.e., RTT)

v  One-at-a-time: ~2n RTT
v  m concurrent: ~2⌈n/m⌉ RTT
v  Persistent: ~(n+1) RTT
v  Pipelined + Persistent: ~2 RTT

96
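The scorecard can be written as a function of n (number of small objects) and m (number of concurrent connections). Like the slide, this sketch counts only RTTs and assumes transmission time is negligible for small objects; the scheme names are hypothetical labels.

```python
import math


def rtts_for_small_objects(n, scheme, m=1):
    """Approximate number of RTTs to fetch n small objects under each scheme."""
    if scheme == "one-at-a-time":
        return 2 * n  # handshake + request/response per object
    if scheme == "concurrent":
        return 2 * math.ceil(n / m)  # m connections in parallel, in rounds of m objects
    if scheme == "persistent":
        return n + 1  # one handshake, then one RTT per object
    if scheme == "pipelined-persistent":
        return 2  # one handshake, then all requests in flight together
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("one-at-a-time", "concurrent", "persistent", "pipelined-persistent"):
    print(scheme, rtts_for_small_objects(10, scheme, m=4))
```

For n = 10 and m = 4 this gives 20, 6, 11 and 2 RTTs respectively.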

Quiz: Persistent vs. Non-persistent HTTP

v  Among the following, in which case would you get the greatest improvement in performance with persistent HTTP compared to non-persistent HTTP?

A.  Low throughput network paths (irrespective of distance)

B.  High throughput network paths (irrespective of distance)

C.  Long-distance network paths (irrespective of throughput)

D.  High throughput, short-distance network paths

E.  High throughput, long-distance network paths

97

Quiz: Pipelining

v  Pipelining allows the client to send multiple HTTP requests on a single TCP connection without waiting for the corresponding responses. What could be a potential bottleneck despite using pipelining?

98

Improving HTTP Performance: Caching

v Why does caching work? § Exploits locality of reference

v How well does caching work? § Very well, up to a limit § Large overlap in content § But many unique requests

99

Web caches (proxy server)

goal: satisfy client request without involving origin server

v  user sets browser: Web accesses via cache
v  browser sends all HTTP requests to cache
§  object in cache: cache returns object
§  else cache requests object from origin server, then returns object to client

[Figure: clients send requests to a proxy server, which fetches objects from origin servers]

100

More about Web caching

v  cache acts as both client and server
§  server for original requesting client
§  client to origin server
v  typically cache is installed by ISP (university, company, residential ISP)

why Web caching?
v  reduce response time for client request
v  reduce traffic on an institution’s access link
v  Internet dense with caches: enables “poor” content providers to effectively deliver content

101

Caching example:

[Figure: institutional network (1 Gbps LAN) connected over a 1.54 Mbps access link to origin servers in the public Internet]

assumptions:
v  avg object size: 100K bits
v  avg request rate from browsers to origin servers: 15/sec
v  avg data rate to browsers: 1.50 Mbps (15 requests/sec × 100K bits)
v  RTT from institutional router to any origin server: 2 sec
v  access link rate: 1.54 Mbps

consequences:
v  LAN utilization: 1.50 Mbps / 1 Gbps = 0.15%
v  access link utilization = 1.50/1.54 ≈ 97%
v  total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + usecs

problem!

102

Caching example: fatter access link

[Figure: same institutional network (1 Gbps LAN), with the access link to the public Internet upgraded from 1.54 Mbps to 154 Mbps]

assumptions:
v  avg object size: 100K bits
v  avg request rate from browsers to origin servers: 15/sec
v  avg data rate to browsers: 1.50 Mbps
v  RTT from institutional router to any origin server: 2 sec
v  access link rate: 1.54 Mbps → 154 Mbps

consequences:
v  LAN utilization: 0.15%
v  access link utilization = 1.50/154 ≈ 0.97% (was ≈ 97%)
v  total delay = Internet delay + access delay + LAN delay = 2 sec + msecs (was minutes) + usecs

Cost: increased access link speed (not cheap!)

103

Caching example: install local cache

[Figure: institutional network (1 Gbps LAN) with a local web cache, connected over a 1.54 Mbps access link to origin servers in the public Internet]

assumptions:
v  avg object size: 100K bits
v  avg request rate from browsers to origin servers: 15/sec
v  avg data rate to browsers: 1.50 Mbps
v  RTT from institutional router to any origin server: 2 sec
v  access link rate: 1.54 Mbps

consequences:
v  LAN utilization: ?
v  access link utilization = ?
v  total delay = ?

How to compute link utilization, delay?

Cost: web cache (cheap!)

104

Caching example: install local cache

Calculating access link utilization, delay with cache:
v  suppose cache hit rate is 0.4
§  40% of requests satisfied at cache, 60% of requests satisfied at origin
v  access link utilization:
§  60% of requests use access link
§  data rate to browsers over access link = 0.6 × 1.50 Mbps = 0.9 Mbps
§  utilization = 0.9/1.54 ≈ 0.58
v  total delay
§  = 0.6 × (delay from origin servers) + 0.4 × (delay when satisfied at cache)
§  = 0.6 × (2.01) + 0.4 × (~msecs)
§  ≈ 1.2 secs
§  less than with 154 Mbps link (and cheaper too!)

[Figure: institutional network (1 Gbps LAN) with a local web cache, 1.54 Mbps access link to origin servers in the public Internet]

105
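The utilization and delay arithmetic above, as a small function. It takes the per-request delays as given inputs (2.01 s from the origin, treating a cache hit as ~0 s) rather than modelling queueing, exactly as the slide does; the function and parameter names are hypothetical.

```python
def access_link_stats(hit_rate, request_rate, object_bits, access_bps,
                      origin_delay_s, cache_delay_s=0.0):
    """Return (access link utilization, average delay in seconds) for a given cache hit rate."""
    offered_bps = request_rate * object_bits          # traffic the browsers ask for
    miss_bps = (1 - hit_rate) * offered_bps           # only misses cross the access link
    utilization = miss_bps / access_bps
    avg_delay_s = (1 - hit_rate) * origin_delay_s + hit_rate * cache_delay_s
    return utilization, avg_delay_s


with_cache = access_link_stats(0.4, 15, 100_000, 1.54e6, origin_delay_s=2.01)
no_cache = access_link_stats(0.0, 15, 100_000, 1.54e6, origin_delay_s=2.01)
fat_link = access_link_stats(0.0, 15, 100_000, 154e6, origin_delay_s=2.0)
```

`with_cache` comes out at about (0.58, 1.21 s), `no_cache` at about 97% utilization, and `fat_link` at about 0.97%.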

But what is the likelihood of cache hits?

v  Distribution of web object requests generally follows a Zipf-like distribution
v  The probability that a document will be referenced k requests after it was last referenced is roughly proportional to 1/k. That is, web traces exhibit excellent temporal locality.

Paper – “Web Caching and Zipf-like Distributions: Evidence and Implications” http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.8742&rep=rep1&type=pdf

Video content exhibits similar properties: the top 10% most popular videos account for nearly 80% of views, while the remaining 90% of videos account for only 20% of requests. Paper – http://yongyeol.com/papers/cha-video-2009.pdf

106
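To see why a Zipf-like popularity distribution makes caching effective, this sketch computes the hit ratio of an idealised cache that always holds the `cache_size` most popular of `n_objects` objects. It assumes independent requests with probability proportional to 1/rank^alpha; a real cache (LRU, finite lifetimes, one-timers) will do worse.

```python
def zipf_hit_ratio(n_objects, cache_size, alpha=1.0):
    """Fraction of requests served by an ideal cache holding the top cache_size objects.

    Assumes the object of popularity rank i is requested with probability
    proportional to 1 / i**alpha (a Zipf-like distribution).
    """
    weights = [1.0 / rank ** alpha for rank in range(1, n_objects + 1)]
    return sum(weights[:cache_size]) / sum(weights)


for size in (1, 10, 50, 100):
    print(size, round(zipf_hit_ratio(100, size), 3))
```

With 100 objects and alpha = 1, caching the top 10% already serves about 56% of requests, while alpha = 0 (uniform popularity) gives exactly the proportional 10%.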

Conditional GET

v  Goal: don’t send object if cache has up-to-date cached version
§  no object transmission delay
§  lower link utilization
v  cache: specify date of cached copy in HTTP request
If-modified-since: <date>
v  server: response contains no object if cached copy is up-to-date:
HTTP/1.0 304 Not Modified

Client/server exchanges:
§  object not modified before <date>: HTTP request msg carries If-modified-since: <date>; HTTP response is HTTP/1.0 304 Not Modified
§  object modified after <date>: HTTP request msg carries If-modified-since: <date>; HTTP response is HTTP/1.0 200 OK followed by <data>

107
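The server's side of a conditional GET can be sketched as a date comparison. This assumes the server tracks a timezone-aware UTC `last_modified` with whole-second precision (HTTP dates have one-second resolution) and returns only a status code and body; `conditional_get` is a hypothetical helper. It relies on the standard `email.utils` functions for parsing and formatting HTTP-style dates.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(if_modified_since, last_modified, body):
    """Return (status_code, body) for a GET that may carry If-Modified-Since."""
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed date: ignore the header and send the object
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, b""  # cached copy is up to date: response carries no object
    return 200, body


if __name__ == "__main__":
    changed = datetime(2025, 1, 1, tzinfo=timezone.utc)
    cached_at = format_datetime(changed, usegmt=True)  # e.g. the cache's copy date
    print(conditional_get(cached_at, changed, b"<html>...</html>"))
```

If the object changes after the cache's date, the comparison fails and the full object is returned with 200 OK, matching the second exchange above.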

Example Cache Check Request

108

Example Cache Check Response

109

Improving HTTP Performance: Caching: Where?

v Options

§ Client

§ Forward proxies

§ Reverse proxies

§ Content Distribution Network

110

Improving HTTP Performance: Caching: Where?

v  Baseline: Many clients transfer same information
§  Generate unnecessary server and network load
§  Clients experience unnecessary latency

[Figure: clients in ISP-1 and ISP-2 fetch from one server across a Tier-1 ISP]

111

Improving HTTP Performance: Reverse Proxies

v  Cache documents close to server → decrease server load
v  Typically done by content provider

[Figure: reverse proxies placed next to the server; clients in ISP-1 and ISP-2 reach them through a backbone ISP]

112

Improving HTTP Performance: Forward Proxies

v  Cache documents close to clients → reduce network traffic and decrease latency
v  Typically done by ISPs or enterprises

[Figure: forward proxies in ISP-1 and ISP-2 near the clients, reverse proxies near the server, connected through a backbone ISP]

113

Improving HTTP Performance: Replication

v  Replicate popular Web site across many machines
§  Spreads load on servers
§  Places content closer to clients
§  Helps when content isn’t cacheable
v  Problem:
§  Want to direct client to particular replica
•  Balance load across server replicas
•  Pair clients with nearby servers
§  Expensive
v  Common solution:
§  DNS returns different addresses based on client’s geolocation, server load, etc.

114
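A toy version of the DNS-based replica selection described above: answer with a lightly loaded replica in the client's region, otherwise fall back to the least-loaded replica anywhere. The `Replica` record, region labels, load threshold and addresses (drawn from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are all hypothetical; real CDNs use far richer measurements.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    address: str  # IP address a DNS answer would return
    region: str   # coarse location label, e.g. derived from the client's resolver
    load: float   # fraction of capacity in use, 0.0 to 1.0


def pick_replica(client_region, replicas, max_load=0.8):
    """Prefer a nearby replica below max_load; otherwise the least-loaded replica overall."""
    nearby = [r for r in replicas if r.region == client_region and r.load < max_load]
    candidates = nearby or replicas
    return min(candidates, key=lambda r: r.load).address


replicas = [
    Replica("192.0.2.10", "au", 0.90),
    Replica("192.0.2.11", "au", 0.30),
    Replica("198.51.100.7", "us", 0.10),
]
answer = pick_replica("au", replicas)
```

Here `answer` is "192.0.2.11": the nearby, lightly loaded replica wins even though a less loaded one exists further away.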

Improving HTTP Performance: CDN

v  Caching and replication as a service
v  Integrate forward and reverse caching functionality
v  Large-scale distributed storage infrastructure (usually) administered by one entity
§  e.g., Akamai has servers in 20,000+ locations
v  Combination of (pull) caching and (push) replication
§  Pull: direct result of clients’ requests
§  Push: expectation of high access rate
v  Also do some processing
§  Handle dynamic web pages
§  Transcoding
§  Maybe do some security function – watermark IP

115

Improving HTTP Performance: CDN

v  Akamai creates new domain names for each client
§  e.g., a128.g.akamai.net for cnn.com
v  The CDN’s DNS servers are authoritative for the new domains (will become apparent when we study DNS)
v  The client content provider modifies its content so that embedded URLs reference the new domains
§  “Akamaize” content
§  e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif
v  Requests now sent to CDN’s infrastructure…

116
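A sketch of “Akamaizing” a page: embedded URLs that point at the origin host are rewritten to the CDN hostname while the path stays the same. The two hostnames come from the slide's example; the helper names are hypothetical, only `src="..."` attributes are rewritten, and any port in the original URL is dropped, which is a simplification.

```python
import re
from urllib.parse import urlsplit, urlunsplit

ORIGIN_HOST = "www.cnn.com"     # content provider's host (slide example)
CDN_HOST = "a128.g.akamai.net"  # CDN-assigned name (slide example)


def akamaize_url(url):
    """Point a URL on the origin host at the CDN host, keeping path, query and fragment."""
    parts = urlsplit(url)
    if parts.hostname == ORIGIN_HOST:
        return urlunsplit((parts.scheme, CDN_HOST, parts.path, parts.query, parts.fragment))
    return url


def akamaize_html(html):
    """Rewrite embedded-object references (src attributes only) in an HTML string."""
    return re.sub(r'src="([^"]+)"', lambda m: f'src="{akamaize_url(m.group(1))}"', html)


page = '<img src="http://www.cnn.com/image-of-the-day.gif"><a href="http://www.cnn.com/">home</a>'
rewritten = akamaize_html(page)
```

Only the embedded image moves to the CDN name; the base page link keeps pointing at the origin, so the browser fetches the page from cnn.com and the heavy objects from the CDN.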

Cost-Effective Content Delivery

v  General theme: multiple sites hosted on shared physical infrastructure §  efficiency of statistical multiplexing §  economies of scale (volume pricing, etc.) §  amortization of human operator costs

v  Examples:

§  Web hosting companies §  CDNs §  Cloud infrastructure

117

What about HTTPS?

v  HTTP is insecure
v  HTTP basic authentication: password sent using base64 encoding (can readily be converted to plaintext)
v  HTTPS: HTTP over a connection encrypted by Transport Layer Security (TLS)
v  Provides:
§  Authentication
§  Bidirectional encryption
v  Widely used in place of plain vanilla HTTP

118
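Why base64 is not protection: this sketch builds a basic-authentication header value and then recovers the password from it with nothing more than the standard `base64` module, as any on-path observer of plain HTTP could. The user name and password are hypothetical.

```python
import base64


def basic_auth_header(user, password):
    """Value of the Authorization header for HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value):
    """Recover (user, password) from a sniffed Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a basic-auth header")
    user, _, password = base64.b64decode(token).decode().partition(":")
    return user, password


header = basic_auth_header("alice", "secret")
```

`header` is "Basic YWxpY2U6c2VjcmV0", and decoding it needs no key at all; over HTTPS the same header travels inside the TLS-encrypted connection.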

What’s on the horizon: HTTP/2
v  Standardised in May 2015: RFC 7540
v  Improvements
§  Servers can push content and thus reduce the overhead of an additional request cycle
§  Fully multiplexed
•  Requests and responses are sliced into smaller chunks called frames; frames are tagged with an ID that connects the data to its request/response
•  Overcomes the HTTP-level head-of-line blocking of HTTP/1.1
§  Prioritisation of the order in which objects should be sent (e.g., CSS files may be given higher priority)
§  Data compression of HTTP headers
•  Some headers, such as cookies, can be very long
•  Repetitive information

119

More details: https://http2.github.io/faq/ Demo: https://http2.akamai.com/demo
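A sketch of HTTP/2-style multiplexing: two responses are sliced into DATA frames, interleaved on one byte stream, and reassembled by stream ID. The 9-byte frame header layout (24-bit length, 8-bit type, 8-bit flags, reserved bit + 31-bit stream identifier) and the DATA type / END_STREAM flag values follow RFC 7540; everything else (HEADERS frames, HPACK, flow control, priorities) is omitted, and the tiny chunk size and round-robin order are arbitrary choices for illustration. Stream IDs 1 and 3 are odd because client-initiated streams use odd numbers.

```python
import struct

DATA_FRAME = 0x0  # frame type for DATA (RFC 7540)
END_STREAM = 0x1  # flag on the last DATA frame of a stream


def encode_frame(stream_id, payload, frame_type=DATA_FRAME, flags=0):
    """9-byte header (length, type, flags, stream ID) followed by the payload."""
    if len(payload) >= 1 << 24:
        raise ValueError("payload too large for one frame")
    header = len(payload).to_bytes(3, "big") + struct.pack(
        ">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def decode_frames(wire):
    """Split a byte string into (stream_id, type, flags, payload) tuples."""
    frames, offset = [], 0
    while offset < len(wire):
        length = int.from_bytes(wire[offset:offset + 3], "big")
        frame_type, flags, stream_id = struct.unpack(">BBI", wire[offset + 3:offset + 9])
        payload = wire[offset + 9:offset + 9 + length]
        frames.append((stream_id & 0x7FFFFFFF, frame_type, flags, payload))
        offset += 9 + length
    return frames


def multiplex(responses, chunk_size=4):
    """Slice each response body into DATA frames and interleave them round-robin."""
    pending = {sid: [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)]
               for sid, body in responses.items()}
    wire = b""
    while any(pending.values()):
        for sid, chunks in pending.items():
            if chunks:
                piece = chunks.pop(0)
                wire += encode_frame(sid, piece, flags=END_STREAM if not chunks else 0)
    return wire


def demultiplex(wire):
    """Reassemble each stream's body from interleaved DATA frames."""
    bodies = {}
    for sid, frame_type, _flags, payload in decode_frames(wire):
        if frame_type == DATA_FRAME:
            bodies[sid] = bodies.get(sid, b"") + payload
    return bodies


responses = {1: b"<html>index</html>", 3: b"body{color:red}"}
wire = multiplex(responses)
```

Because every frame names its stream, a slow or large response no longer holds the others back at the HTTP layer; the receiver simply appends each payload to the right stream.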

2. Application Layer: outline

2.1 principles of network applications
§  app architectures
§  app requirements
2.2 Web and HTTP
2.3 FTP
2.4 electronic mail
§  SMTP, POP3, IMAP
2.5 DNS
2.6 P2P applications
2.7 socket programming with UDP and TCP

Self Study

120

FTP: the file transfer protocol

[Figure: user at host works through the FTP user interface and FTP client (local file system) to transfer files to/from the FTP server (remote file system)]

v  transfer file to/from remote host
v  client/server model
§  client: side that initiates transfer (either to/from remote)
§  server: remote host
v  ftp: RFC 959
v  ftp server: port 21

Self Study

121

FTP: separate control, data connections

[Figure: FTP client and FTP server linked by a TCP control connection (server port 21) and a TCP data connection (server port 20)]

v  FTP client contacts FTP server at port 21, using TCP
v  client authorized over control connection
v  client browses remote directory, sends commands over control connection
v  when server receives file transfer command, server opens 2nd TCP data connection (for file) to client
v  after transferring one file, server closes data connection
v  server opens another TCP data connection to transfer another file
v  control connection: “out of band”
v  FTP server maintains “state”: current directory, earlier authentication

Self Study

122

FTP commands, responses

sample commands:
v  sent as ASCII text over control channel
v  USER username
v  PASS password
v  LIST: return list of files in current directory
v  RETR filename: retrieves (gets) file
v  STOR filename: stores (puts) file onto remote host

sample return codes:
v  status code and phrase (as in HTTP)
v  331 Username OK, password required
v  125 data connection already open; transfer starting
v  425 Can’t open data connection
v  452 Error writing file

Self Study

123
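As in HTTP, the first digit of an FTP reply code carries the broad outcome; RFC 959 defines the five classes used below. This sketch handles single-line replies only (multi-line replies, which use a hyphen after the code, are out of scope), and `classify_reply` is a hypothetical helper.

```python
REPLY_CLASSES = {
    "1": "positive preliminary",          # action started, expect another reply
    "2": "positive completion",           # action completed
    "3": "positive intermediate",         # more information needed (e.g. PASS after USER)
    "4": "transient negative completion", # failed, may succeed if retried
    "5": "permanent negative completion", # failed, do not retry as-is
}


def classify_reply(line):
    """Split a single-line FTP reply into (code, class, phrase)."""
    code, _, phrase = line.partition(" ")
    if len(code) != 3 or not code.isdigit() or code[0] not in REPLY_CLASSES:
        raise ValueError(f"not an FTP reply line: {line!r}")
    return int(code), REPLY_CLASSES[code[0]], phrase


for reply in ("331 Username OK, password required",
              "125 data connection already open; transfer starting",
              "425 Can't open data connection",
              "452 Error writing file"):
    print(classify_reply(reply))
```

So 331 asks the client to continue (send PASS), 125 signals a transfer in progress, and both 425 and 452 are temporary failures the client may retry.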

Active FTP

v  Client connects from port N (N>1023) to the FTP server’s control port 21
v  Sends a command “PORT N+1” to the FTP server
v  Server sends back ACK
v  FTP server’s port 20 opens a TCP data connection to port N+1 on the client’s host
v  Client sends back ACK
v  Issues with firewalls: the client’s sysadmin may block incoming TCP connections

Self Study

124
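The slide abbreviates the command as “PORT N+1”; on the wire, RFC 959 encodes the client's IPv4 address and data port as six comma-separated decimal numbers h1,h2,h3,h4,p1,p2, with port = p1 × 256 + p2. A minimal formatter, using a hypothetical documentation-range address:

```python
import ipaddress


def port_command(client_ip, data_port):
    """Format an active-mode PORT command (RFC 959): PORT h1,h2,h3,h4,p1,p2."""
    addr = ipaddress.IPv4Address(client_ip)  # validates the dotted-quad address
    if not 0 < data_port < 65536:
        raise ValueError("port out of range")
    octets = str(addr).split(".")
    return f"PORT {','.join(octets)},{data_port // 256},{data_port % 256}"


command = port_command("192.0.2.10", 5001)
```

`command` is "PORT 192,0,2,10,19,137" (19 × 256 + 137 = 5001). Embedding the client's own address in the command is also what makes active FTP awkward behind NAT.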

Passive FTP

v  Client initiates both connections, hence OK with firewalls
v  Client connects from port N (N>1023) to the FTP server’s control port 21
v  Sends a command “PASV” to the FTP server
v  FTP server opens a listening socket on some port X (not 20) and replies to the client with “X”
v  Client connects to port X
v  Server sends back ACK

Self Study

125
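In passive mode the server's reply to PASV carries the listening address and port X in the same h1,h2,h3,h4,p1,p2 encoding, as a 227 reply (RFC 959). A sketch of how a client could extract where to open its data connection; the reply text uses a hypothetical documentation-range address.

```python
import re

PASV_ARGS = re.compile(r"\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)")


def parse_pasv_reply(line):
    """Return (host, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' reply."""
    if not line.startswith("227"):
        raise ValueError(f"not a PASV reply: {line!r}")
    match = PASV_ARGS.search(line)
    if match is None:
        raise ValueError(f"no address in reply: {line!r}")
    h1, h2, h3, h4, p1, p2 = (int(x) for x in match.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


host, port_x = parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,19,137)")
```

The client then opens the data connection itself to `host`:`port_x` (192.0.2.10:5001 here), which is why passive mode works when the client's firewall blocks incoming connections.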

Summary
v  Completed Introduction (Chapter 1)
§  Solve Sample Problem Set
§  Check questions on website
v  Application Layer (Chapter 2)
§  Principles of Network Applications
§  HTTP
§  FTP
v  Next Week
§  Application Layer (contd.)
•  E-mail
•  DNS
•  P2P
§  First programming assignment will be released

126

Solve all sample problems
Reading exercise: Chapter 2, Sections 2.4 – 2.7