UFCEKG-20-2 Data, Schemas & Applications

22
UFCEKG-20-2 Data, Schemas & Applications Lecture 2 Introduction to the WWW, URLs. HTTP, Services and Mashups

description

UFCEKG-20-2 Data, Schemas & Applications. Lecture 2 Introduction to the WWW, URLs. HTTP, Services and Mashups. - PowerPoint PPT Presentation

Transcript of UFCEKG-20-2 Data, Schemas & Applications

Page 1: UFCEKG-20-2  Data, Schemas & Applications

UFCEKG-20-2 Data, Schemas & Applications

Lecture 2Introduction to the WWW, URLs. HTTP, Services and

Mashups

Page 2: UFCEKG-20-2  Data, Schemas & Applications

Suppose all the information stored on computers everywhere were linked, I thought. Suppose I could program my computer to create a space in which anything could be linked to anything. All the bits of information in every computer at CERN, and on the planet, would be available to me and to anyone else. There would be a single, global information space.

Tim Berners-Lee, Weaving the Web

Page 3: UFCEKG-20-2  Data, Schemas & Applications

WWW : definition

The World Wide Web (abbreviated as WWW or W3, commonly known as the Web), is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks.

Wikipedia : World Wide Web

Concept originally proposed by Sir Tim Berners-Lee (1989) based on earlier hypertext systems. Berners-Lee and Belgian computer scientist Robert Cailliau proposed in 1990 to use hypertext "to link and access information of various kinds as a web of nodes in which the user can browse at will", and they publicly introduced the project in December of the same year.

Page 4: UFCEKG-20-2  Data, Schemas & Applications

Mashups

o Originally a term for the sampling & mixing of two pieces of music together. Here the term refers to web applications which combine data from multiple sources to create added value sites. o on Wikipediao Programmable Web run by John Musser tracks the

emerging collection of mashups.o Here we review the basic mechanisms for integration.

Next week we will cover the basics of XML, one of the data formats widely used for integration and configuration.

Page 5: UFCEKG-20-2  Data, Schemas & Applications

Mashup pre-requisites

o HTTP protocolo client - server interactiono URI Schema o HTML/HTML Forms - the simplest Mashup

techniqueo media type (Mime-type, content-type)o URL Encodingo Character encoding

Page 6: UFCEKG-20-2  Data, Schemas & Applications

HTTPo Request : query string, attached files and information about

the client. Server can access all this data to determine the appropriate response.

o Response : document formated by the server, with a wrapper which identiies the kind of content Content-type

o GET: query string appended to URI - limited length, exposes the parameter names, easy to edit, use for development, formally only for requests which only read data and don't update.

o POST: query string passed in HTTP request body, unlimited size, hides the interface, use for sending data to server for update

o PUT: add a resource to the remote store o DELETE: delete a resource Often authentication is required - username/password passed in the HTTP header.

Page 7: UFCEKG-20-2  Data, Schemas & Applications

HTTP interactiono This sequence diagram explains the main processes in the HTTP

Protocol. It is the foundation for much of the interaction on the web Client-server interaction with HTTP

o We can think of an HTTP request/response as a remote procedure call (RPC). There are other, more low-level mechanisms for RPC which are useful in special circumstances but the web is built around HTTP

o Applications built on HTTP interaction are often called RESTful. REST is an abbreviation which stands for Representational State Transfer.

o Strictly this is a well-defined architectural style in which the HTTP operations are used in a specific restricted sense, and unique URIs identify each resource in the application.

o Informally it refers to any interface to a site in which all data is requested and transmitted via HTTP without any additional layers such as is found in SOAP and Web Services, and the state of the interaction is passed in the request.

Page 8: UFCEKG-20-2  Data, Schemas & Applications

Three essential technologies : uri, html & http

1. a system of globally unique identifiers for resources on the Web and elsewhere, the Universal Document Identifier (UDI), later known as Uniform Resource Locator (URL) and Uniform Resource Identifier (URI);

2. the publishing language HyperText Markup Language (HTML);

3. the Hypertext Transfer Protocol (HTTP).

Page 9: UFCEKG-20-2  Data, Schemas & Applications

Anatomy of a URI

o Uniform Resource Identifier (more general than URL). The structure of a URI is defined by the URI scheme. URIs are case-sensitive

http://www.example.com/modules/dsa/index.html?year=2012 < scheme > : < hierarchical part > [ ? < query > ] [ # < fragment > ]

o httpo //www.example.com/modules/dsa/index.html

o user info - terminated by @ o hostname - www.cems.uwe.ac.uk o port - :80 o path - /modules/dsa/index.html

o year=2012 (query parameter)

Page 10: UFCEKG-20-2  Data, Schemas & Applications

URI Scheme names

o http - The most common scheme name - Hypertext Transfer Protocol . Typically web pages are requested and delivered using this protocol.

o https - secure HTTP o mailto - an email address - usually handled by the

browser handing responsibility to another applicationo file - read a local file (but do not execute it)o ftp - file transfer

any others?

Page 11: UFCEKG-20-2  Data, Schemas & Applications

URI hierarchical part

o user info - e.g. [email protected]

o hostname - www.uwe.ac.uk - converted by DNS to an IP address - 164.11.132.21

o port - e.g. : 80 - default http port

o path - /modules/dsa/index.html

Page 12: UFCEKG-20-2  Data, Schemas & Applications

Query String

o Parameters passed to the script. Multiple parameters are passed in several common forms

o delimited values are positional and delimited by a special character such as ";"

o /modules/dsa/index.html;2o where the two parameters are /modules/dsa/index.html

and 2o keyword/value pairs each parameter value is passed as a

keyword=value pair, with pairs separated by & This is the form used by HTML forms. The order of the parameters is not significant

Page 13: UFCEKG-20-2  Data, Schemas & Applications

Fragment

o address a place within a document - place marked as

<a name="fragid">

Page 14: UFCEKG-20-2  Data, Schemas & Applications

Uses of URIso Destination of HTTP request

o a link in an HTML document body <a href="http://en.wikipedia.org/wiki/URI">URI<a>

o a link in an HTML document head <link rel="alternate" media-type="application/rss+xml“ href="news.rss"/>

o typed into the location bar in a browser - or editing an existing URIo created in a browser by javascript

document.location = "http://en.wikipedia.org/wiki/" + term

o used by the Javascript AJAX technique to add interactivity to a web page

o created by a server script e.g. PHP$x = file("http://en.wikipedia.org/wiki/$term")

o Unique id for a resource - o XML namespaces - http://www.w3.org/1999/xhtml

o semantic web resource id -http://www.cems.uwe.ac.uk/rdffold/moduleRun/UFIEKG-20-2

Page 15: UFCEKG-20-2  Data, Schemas & Applications

URI re-write

o URIs are often re-written by the server e.g. using Apache mod-rewrite to map to a different internal location.

o http://www.cems.uwe.ac.uk/rdffold/module/UFIEKG-20-2o re-written to

http://fold.cems.uwe.ac.uk:8080/exist/servlet/db/fold1/rdf/rdf.xq?p=module/UFIEKG-20-2

o This allows the actual server, file locations and script languages to be changed while providing a stable resource identifier.

o “Any software problem can be solved by adding another layer of indirection.” Steve Bellovin of AT&T Labs

Page 16: UFCEKG-20-2  Data, Schemas & Applications

Form interface to create URI

o The simplest way to reuse another application is to create a new form to create the appropriate URIs. This form also documents the interface. To understand the application is to understand the interface, the scripts, the parameters to scripts and the range and meaning of parameter values.

o Here the example is a site in the US run by NOAA which gathers data on Weather observation stations at sea.

o UK buoysBuoy near PembrokeWind speed http://www.ndbc.noaa.gov/show_plot.php?station=62303&meas=wspd&uom=E&time_diff=0&time_label=GMT

Page 17: UFCEKG-20-2  Data, Schemas & Applications

Hypertext Markup Language (HTML)

o Hypertext Markup Language (HTML) is the language of the Webo Hypertext because the Web is a hypermedia systemo Markup because documents are encoded using texto Language because HTML is used for communications

o Markup Languages are different from most file formatso many computer formats are binary encoded and not just texto markup allows structured documents to be encoded as just text

o Web data formats use markup as well as other encodingso HTML and XML are markup languageso JavaScript is also exchanged textually (but it's not markup)o images and other multimedia content is encoded as binary files

Page 18: UFCEKG-20-2  Data, Schemas & Applications

Texto <h1>-<h6> are different levels of headingso <p> contains paragraph text

o whitespace and line wrapping are ignoredo paragraphs are set as boxes containing a number of lines

o Text inside paragraphs can use additional markup (phrase markup)o <em> for emphasized texto <strong> for text with a strong emphasiso <sub> for subscript text

o <sup> for superscript text

o <q> for quoted text (try nesting quotes)o <code> for code examples

o rendering of all these elements is built into the browsero more sophisticated issues probably are more browser-dependent

Page 19: UFCEKG-20-2  Data, Schemas & Applications

More Advanced Text

o Quotations can be explicitly marked up as suchoblockquote for block-level quotationsoq for inline quotations (part of a block)o cite provides support for pointing to the source

o Preformatted text allows text formatting in the HTML sourceopre leaves whitespace intact and usually uses

monospaced fontsoword wrapping may be turned off by default

Page 20: UFCEKG-20-2  Data, Schemas & Applications

Lists and Tables

o HTML supports three kinds of listso <ul> for unordered lists containing lio <ol> for ordered lists containing lio <dl> for definition lists containing dt/dd

o Tables are the most complex visual structure in HTMLo <table> represents a table as a sequence of rowso <tr> represents a table row as a sequence of cellso <td> represents a table cell containing table datao <th> is a special cell containing header data

Page 21: UFCEKG-20-2  Data, Schemas & Applications

Imageso The Web is an open hypermedia system

o hyper refers to the term hypertext for linked contento media refers to the fact that multiple media types are

supportedo For a long time, the Web only supported text and images

o images can be used in a variety of formats (GIF, JPEG, PNG)o audio and video are possible today, but not part of the Web

o Images are not part of a Web page, they are included by markupo img is an empty element for including imageso src is a URI pointing to the image (often a relative URI)

<img src="../img/portrait.png" alt="Portrait">

Page 22: UFCEKG-20-2  Data, Schemas & Applications

Linkso Links are the most important feature of the Web

o conceptually, the Web is one large hypermedia documento links are based on Web identifiers, the Uniform Resource Identifier

(URI)o <a> is a link anchor and links to a URI (the link target)<a

href="http://www.cems.uwe.ac.uk" title=“CSCT UWE">CSCT</a>o URIs can have various forms

o http: points to resources available on Web serverso https: is the same but uses encrypted connectionso URIs can use a variety of other URI Schemeso URIs can be relative (in the same was as file names)o relative URIs are evaluated relative to the URI of their occurrenceo relative URIs can use path segments such as / and ..