URIs and RFC 3986

50
Copyright: 2012, 2013 & 2015 – Noah Mendelsohn URIs and RFC 3986 Noah Mendelsohn Tufts University Email: [email protected] Web: http://www.cs.tufts.edu/~noah COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)

description

COMP 150-IDS: Internet Scale Distributed Systems (Fall 2012). URIs and RFC 3986. Noah Mendelsohn Tufts University Email: [email protected] Web: http://www.cs.tufts.edu/~noah. Goals. Learn the detailed design of URIs See how the naming principles we’ve explored are reflected in URIs - PowerPoint PPT Presentation

Transcript of URIs and RFC 3986

Page 1: URIs and RFC 3986

Copyright: 2012, 2013 & 2015 – Noah Mendelsohn

URIsand

RFC 3986

Noah MendelsohnTufts UniversityEmail: [email protected]: http://www.cs.tufts.edu/~noah

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)

Page 2: URIs and RFC 3986

© 2010 Noah Mendelsohn2

Goals

What is named when we use the Web Learn the detailed design of URIs See how the naming principles we’ve explored are reflected

in Web architecture and URIs Learn to read RFCs and to study the art of writing

specifications Understand why grammars are important

Page 3: URIs and RFC 3986

© 2010 Noah Mendelsohn3

Review:Naming Questions

Page 4: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of names

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too few names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Page 5: URIs and RFC 3986

© 2010 Noah Mendelsohn

ReviewWeb Architecture Basics

Page 6: URIs and RFC 3986

© 2010 Noah Mendelsohn

Architecting a universal Web

Identification: URIs

Interaction: HTTP

Data formats: HTML, JPEG, etc.

Page 7: URIs and RFC 3986

© 2010 Noah Mendelsohn

What Happens When We Browse a Web Page?

Consider:What are all the things that are

“named”in the interaction between browser

and Web Server?

Page 8: URIs and RFC 3986

© 2010 Noah Mendelsohn

The user clicks on a link

URI is http://webarch.noahdemo.com/demo1/test.htmlURI is http://webarch.noahdemo.com/demo1/test.html

Page 9: URIs and RFC 3986

© 2010 Noah Mendelsohn

The http “scheme” tells client to send HTTP GET msg

HTTP GET

URI is http://webarch.noahdemo.com/demo1/test.htmlURI is http://webarch.noahdemo.com/demo1/test.html

Page 10: URIs and RFC 3986

© 2010 Noah Mendelsohn

HTTP GET

Host: webarch.noahdemo.com

GET /demo1/test.html HTTP/1.0Host: webarch.noahdemo.comUser-Agent: Noahs Demo HttpClient v1.0Accept: */*Accept-language: en-us

URI is http://webarch.noahdemo.com/demo1/test.html

demo1/test.html

The client sends an HTTP GET

Page 11: URIs and RFC 3986

© 2010 Noah Mendelsohn

The server sends an HTTP Response

HTTP GET

HTTP RESPONSE

Host: webarch.noahdemo.com

demo1/test.html

HTTP/1.1 200 OKDate: Tue, 28 Aug 2007 01:49:33 GMTServer: ApacheTransfer-Encoding: chunkedContent-Type: text/html

<html><head><title>Demo #1</title></head><body><h1>A very simple Web page</h1></body></html>

HTTP Status Code200

Means Success!

Page 12: URIs and RFC 3986

© 2010 Noah Mendelsohn

The server sends an HTTP Response

HTTP GET

HTTP RESPONSE

Host: webarch.noahdemo.com

demo1/test.html

HTTP/1.1 200 OKDate: Tue, 28 Aug 2007 01:49:33 GMTServer: ApacheTransfer-Encoding: chunkedContent-Type: text/html

<html><head><title>Demo #1</title></head><body><h1>A very simple Web page</h1></body></html>

The “representation” returned is an HTML

document

Page 13: URIs and RFC 3986

© 2010 Noah Mendelsohn

Architecting a universal Web

Identification: URIs

Interaction: HTTP

Data formats: HTML, JPEG, etc.

Page 14: URIs and RFC 3986

© 2010 Noah Mendelsohn

Assign URIs for all Resources

A resource is something that has information (e.g. a Web page)

If a resource doesn’t have a URI, you can’t link to it…it’s not part of the Web.

Page 15: URIs and RFC 3986

© 2010 Noah Mendelsohn16

The Structure of URIs

Page 16: URIs and RFC 3986

© 2010 Noah Mendelsohn

A simple URI

http://uss.tufts.edu/stuserv/acadcal/

Page 17: URIs and RFC 3986

© 2010 Noah Mendelsohn

A simple URI

http://uss.tufts.edu/stuserv/acadcal/

Page 18: URIs and RFC 3986

© 2010 Noah Mendelsohn

A simple URI

http://uss.tufts.edu/stuserv/acadcal/

Scheme

Page 19: URIs and RFC 3986

© 2010 Noah Mendelsohn

Schemes

http://uss.tufts.edu/stuserv/acadcal/

Scheme

mailto:[email protected]

Schemes let us name different kinds of things, accessed in different ways.

Page 20: URIs and RFC 3986

© 2010 Noah Mendelsohn

A simple URI

http://uss.tufts.edu/stuserv/acadcal/

Authority

Authority: who controls allocation of this name?

Page 21: URIs and RFC 3986

© 2010 Noah Mendelsohn

A simple URI

http://uss.tufts.edu/stuserv/acadcal/

// Fixed in grammar to indicate authority

follows

Page 22: URIs and RFC 3986

© 2010 Noah Mendelsohn

A simple URI

http://uss.tufts.edu/stuserv/acadcal/

Path

Path: provides for hierarchical naming…… also supports “../xxx” relative syntax

Page 23: URIs and RFC 3986

© 2010 Noah Mendelsohn

A simple URI

http://uss.tufts.edu/stuserv/acadcal/

Path

Path: provides for hierarchical naming…… maps well to heirarchical information systems

Page 24: URIs and RFC 3986

© 2010 Noah Mendelsohn

A more complex URI

http://www.tufts.edu?student=smith

Page 25: URIs and RFC 3986

© 2010 Noah Mendelsohn

A more complex URI

http://www.tufts.edu?student=smith

Query component

The query is part of the URI... However, in many cases, all URIs with a common path are processed by the same server-side code

Also…HTML forms are useful for filling in the query components

Page 26: URIs and RFC 3986

© 2010 Noah Mendelsohn

Fragments

http://tools.ietf.org/html/rfc3986#section-3.5

Fragments identify parts of

documents

Fragment interpretation depends on the media type of the returned representation (text/html)…this is useful

but tricky and causes a variety of problems.

Page 27: URIs and RFC 3986

© 2010 Noah Mendelsohn28

Characteristics of URIs

Page 28: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Page 29: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Both supported

Page 30: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Depends on scheme

Page 31: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Yes…URIs “on the side of a bus”

is an important goal…but some URIs are complex

Page 32: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Allowed but not required

Page 33: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

With most schemes, absolute URIs are global

Page 34: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

FILE: scheme is not global!

Page 35: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

NO!!Status code 404 is key to

Web scalability

Page 36: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Some aliases requirede.g.: http vs. HTTP…

worst cases depend on users

Page 37: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Depends on scheme and user…see Metadata in URI finding

Page 38: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

URIs are the structuringmechanism for the Web as a whole

Page 39: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Designed to allow mappings to hierarchical systems

Page 40: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Decentralized allocation except:

Page 41: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

scheme names centrally registered with IANA

Page 42: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

For http and mailto schemes:central Domain Name (DNS)

registration required for authority

Page 43: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

Yes. E.g.: ASCII-only, spaces and some punctuation must be %encoded

Page 44: URIs and RFC 3986

© 2010 Noah Mendelsohn

Some characteristics of URIs

Absolute vs. relative Address (locator)? Human readable? Short/convenient? Global (context independent)? Ensures referent exists? Aliases? (too many names)

Opaque vs. data-carrying? Reflect structure of system?

– Supports navigation: e.g. “..”?

Who can generate them? Constraints from environment

– E.g. no “-” in C/C++ variable names

Indirect identification allowed?

URIs are silent on this…but HTTP redirection provides for indirect

identification

Page 45: URIs and RFC 3986

© 2010 Noah Mendelsohn46

Grammars

Page 46: URIs and RFC 3986

© 2010 Noah Mendelsohn

Grammars are formal languages for specifying other languages A grammar allows you to:

– Always: determine whether a given string is “in” the specified language– Often: associate structures in the grammar with parts of the string

The Chomsky hierarchy: – Different grammars have different expressive power– Regular expressions recognize “regular languages” (ab*) a, ab, abb, abbb– Context-free grammars are more powerful: typically used for programming languages– The ABNF used in RFC’s is a context-free grammar– Context-free grammars can be recognized (parsed) by a finite-state pushdown

automaton

What are formal grammars?

Tutorial at: http://en.wikipedia.org/wiki/Chomsky_hierarchy

Page 47: URIs and RFC 3986

© 2010 Noah Mendelsohn

Why use formal grammars for specifying languages?

Precise and rigorous– Less ambiguous than an explanation in English

Membership of a string in a language can be checked automatically

Tools to process the language can often be constructed automatically from the grammar

Page 48: URIs and RFC 3986

© 2010 Noah Mendelsohn

ABNF: the grammar for IETF RFCs

ABNF example from RFC 3986:

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

ABNF is itself specified in RFC 2234 ABNF is convenient to use in fixed-font specification

documents like RFCs

Page 49: URIs and RFC 3986

© 2010 Noah Mendelsohn50

Summary

Page 50: URIs and RFC 3986

© 2010 Noah Mendelsohn

Summary

The structure and interpretation of URIs is set out in RFC 3986 URIs embody many of the principles we have studied Formal grammars are powerful tools for specifying names The design decisions embodied in URIs are keys to the success

of the Web!!