Life after HTML

Life after HTML an introduction to the future of electronic publication Lou Burnard Humanities Computing Unit Oxford University http://users.ox.ac.uk/~lou


Page 1: Life after HTML

Life after HTML

an introduction to the future of electronic publication

Lou Burnard

Humanities Computing Unit

Oxford University

http://users.ox.ac.uk/~lou

Page 2: Life after HTML

What went wrong?

Page 3: Life after HTML

who cares?

• application developers and maintainers (the desperate perl hacker)

• tools builders (the mythical CS grad student)

• document creators and conservators

• document managers

• you and me, anxious to communicate

Page 4: Life after HTML

Information Interchange (1)

[Diagram: five nodes labelled A, B, C, D and E, each connected directly to every other]

20 translations required (n² - n, for n = 5)

Page 5: Life after HTML

Information Interchange (2)

[Diagram: nodes A, B, C, D and E, each connected only to a central Common Interchange Standard]

10 translations required (2n, for n = 5)
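The two counts on these slides follow from simple arithmetic, which the sketch below spells out. The function names are illustrative only; the formulas are the ones quoted on the slides (n² - n for direct translation, 2n via a common interchange standard).

```python
def pairwise_translators(n):
    """Translators needed when each of n formats converts directly to every other."""
    return n * n - n  # same as n * (n - 1): one translator per ordered pair


def hub_translators(n):
    """Translators needed when each format converts only to and from one common standard."""
    return 2 * n  # one "to" and one "from" translator per format


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, pairwise_translators(n), hub_translators(n))
```

For the five formats in the diagrams this gives 20 against 10; the gap widens quickly as n grows, because one count is quadratic and the other linear.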

Page 6: Life after HTML

What is XML? 

• eXtensible Markup Language

• An activity of the World Wide Web Consortium (W3C)
– original goal: delivering SGML on the web
– new goals: refocus web development

• Rewriting the rules of the game?

Adding intelligence to data
Database exchange
Client-side processing
Access to richer data
Better data management

http://www.w3.org/pub/WWW/Markup/Activity

Page 7: Life after HTML

The XML WG Hall of Fame

Jon Bosak, Sun (Chair)
Paula Angerstein, Texcel
Tim Bray, Textuality & Netscape
James Clark
Dan Connolly, W3C
Steve DeRose, INSO
Dave Hollander, HP
Eliot Kimber, Isogen
Tom Magliery, NCSA
Eve Maler, ArborText
Murray Maloney, Muzmo & Veo Systems
Makoto Murata, Fuji Xerox
Joel Nava, Adobe
Conleth O'Connell, Vignette
Jean Paoli, Microsoft
Peter Sharpe, SoftQuad
C. M. Sperberg-McQueen, UIC
John Tigue, DataChannel

(plus a cast of hundreds on the SIG)

Page 8: Life after HTML

making data into information

Page 9: Life after HTML

What is a document?

• content: the components (words, images, etc.) which make up a document

• structure: the organization and inter-relationship of the components

• presentation: how a document looks and what processes are applied to it

Page 10: Life after HTML

Separating these things means...

• the content can be re-used

• the structure can be formally validated

• the presentation can be customized for
– different media
– different audiences

• … in short, the information can be uncoupled from its processing

• This is not a new idea! But it’s a good one...

Page 11: Life after HTML

The XML family

• XML (Extensible Markup Language):
– A subset of SGML (ISO 8879) designed for easy implementation

• XLink (Extensible Linking Language):
– A set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI)

• XSL (Extensible Stylesheet Language):
– A standard stylesheet language for structured information derived from DSSSL (ISO/IEC 10179) and key CSS concepts

Page 12: Life after HTML

like HTML, XML must...

• be usable on the net (but not restricted to it!)

• support a wide variety of applications

• be compatible with SGML

• be easy to process

• have few optional features (ideally none)

• be human-legible and reasonably clear

• be specified in a way that is both formal and concise

Page 13: Life after HTML

unlike HTML...

• XML is an extensible markup language

• XML markup can be verified

• XML markup reflects the meaning of your data, not its appearance

Page 14: Life after HTML

Some intelligent questions...

• what’s the author’s name?

• what titles have the classification …?

• what authors have the name …?

• what translators are there?

• which books have more than 400 pages?

Perec, Georges Life - a users manual. Collins, 1988. Translated from the French [La vie mode d’emploi] by David Bellos. xviii+581 pp. 841.941 Literature - French - 20th century


Page 15: Life after HTML

… which non-extensible markup doesn’t help us answer

<p><b>Perec, Georges</b> <I>Life - a users manual. Collins, 1988. Translated from the French </I>[La vie mode d’emploi] <I> by David Bellos. xviii+581 pp. 841.941</I> Literature - French - 20th century

Perec, Georges Life - a users manual. Collins, 1988. Translated from the French [La vie mode d’emploi] by David Bellos. xviii+581 pp. 841.941 Literature - French - 20th century


Page 16: Life after HTML

Extensible (user-defined) markup

<author>Perec, Georges</author>
<title>Life - a users manual</title>
<publisher>Collins</publisher>
<publDate>1988</publDate>
<note>Translated from the French [<title>La vie mode d’emploi</title>] by <translator>David Bellos</translator></note>
<pages>xviii+581</pages>
<ddc>841.941</ddc>
<keywords>
  <term>Literature</term>
  <term>French</term>
  <term>20th century</term>
</keywords>
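Once the record is tagged this way, the "intelligent questions" from the earlier slide become simple lookups. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the wrapping `<book>` element and the variable names are assumptions added for illustration, since the slide shows only a fragment.

```python
import re
import xml.etree.ElementTree as ET

# The slide's record, wrapped in a hypothetical <book> root so it parses as one document.
RECORD = """<book>
<author>Perec, Georges</author>
<title>Life - a users manual</title>
<publisher>Collins</publisher>
<publDate>1988</publDate>
<note>Translated from the French [<title>La vie mode d'emploi</title>] by
<translator>David Bellos</translator></note>
<pages>xviii+581</pages>
<ddc>841.941</ddc>
<keywords><term>Literature</term><term>French</term><term>20th century</term></keywords>
</book>"""

book = ET.fromstring(RECORD)

# What's the author's name?
author = book.findtext("author")

# What translators are there? (iter() also finds elements nested inside <note>)
translators = [t.text for t in book.iter("translator")]

# What is the classification?
classification = book.findtext("ddc")

# Does the book have more than 400 pages? Take the arabic-numbered page count,
# ignoring the roman-numbered front matter ("xviii").
main_pages = int(re.search(r"\d+", book.findtext("pages")).group())
is_long = main_pages > 400

if __name__ == "__main__":
    print(author, translators, classification, main_pages, is_long)
```

None of these questions can be answered reliably from the `<b>` and `<I>` version, because nothing there says which run of text is the author, the translator, or the page count.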

Page 17: Life after HTML

Verifiable markup

• well-formed XML markup
– tags (etc.) are syntactically correct
– every start-tag has a matching end-tag
– tags are properly nested

• valid XML markup
– only declared tags are used
– all tag occurrences conform to specified positional constraints

Page 18: Life after HTML

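Well-formedness

<?xml version="1.0" standalone="yes"?>

<greeting>hello world!</greeting>
(well-formed)

<greeting>hello world!</Greeting>
(not well-formed: tag names are case-sensitive, so the end-tag does not match)

<grunting> <greeting>hello</greeting> world!</grunting>
(well-formed)

<greeting><grunting>hello</greeting> world!</grunting>
(not well-formed: the tags are improperly nested)

<greeting type="loud">ho!</greeting>
(well-formed)

<greeting type=loud>ho!</greeting>
(not well-formed: the attribute value is not quoted)

<greeting file="ho.wav"/>
(well-formed: an empty element)

<greeting file="ho.wav">
(not well-formed: there is no end-tag)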
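The examples on this slide can be checked mechanically, since any conforming XML parser must reject input that is not well-formed. A minimal sketch with Python's standard `xml.etree.ElementTree` follows; the helper name `is_well_formed` is an assumption for illustration, and straight quotes are used in place of the slide's typographic ones.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = [
    '<?xml version="1.0" standalone="yes"?><greeting>hello world!</greeting>',
    "<greeting>hello world!</Greeting>",                         # end-tag case mismatch
    "<grunting> <greeting>hello</greeting> world!</grunting>",
    "<greeting><grunting>hello</greeting> world!</grunting>",    # improper nesting
    '<greeting type="loud">ho!</greeting>',
    "<greeting type=loud>ho!</greeting>",                        # unquoted attribute value
    '<greeting file="ho.wav"/>',
    '<greeting file="ho.wav">',                                  # missing end-tag
]

if __name__ == "__main__":
    for sample in SAMPLES:
        print("ok " if is_well_formed(sample) else "BAD", sample)
```

This only tests well-formedness; validity against a dtd is a separate, stricter check.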

Page 19: Life after HTML

A Valid XML Document

• contains a document type declaration, which invokes a Document Type Definition (dtd)

• a dtd specifies
– names for all your tags
– names and default values for their attributes
– rules about how tags can nest
– names for re-usable pieces of data (entities)
– and a few other things

• XML dtds are much simpler than SGML dtds

Page 20: Life after HTML

A simple dtd

<!ELEMENT greeting (#PCDATA)>
a greeting consists of character data...

<!ELEMENT name (#PCDATA)>
<!ATTLIST name reg CDATA #IMPLIED>
as does a name, which can also have an attribute called reg

<!ELEMENT grunting (#PCDATA|greeting|name)* >
a grunting contains zero or more of the other things, possibly mixed up with some character data
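To make the idea of validity concrete, here is a rough sketch of the check that these three declarations imply. It is not a DTD parser or a real validator: the `CONTENT_MODEL` table is transcribed by hand from the declarations above, and the function name `violations` is hypothetical. Because the only non-trivial content model here is mixed content with `*`, checking which child elements appear is enough; order and count do not matter.

```python
import xml.etree.ElementTree as ET

# element name -> (child elements allowed, attributes allowed),
# hand-transcribed from the three declarations in the simple dtd.
CONTENT_MODEL = {
    "greeting": (set(), set()),                # (#PCDATA)
    "name": (set(), {"reg"}),                  # (#PCDATA), optional reg attribute
    "grunting": ({"greeting", "name"}, set()), # (#PCDATA|greeting|name)*
}


def violations(elem):
    """Return a list of messages describing where `elem` breaks the content model."""
    if elem.tag not in CONTENT_MODEL:
        return [f"undeclared element <{elem.tag}>"]
    allowed_children, allowed_attrs = CONTENT_MODEL[elem.tag]
    problems = []
    for attr in elem.attrib:
        if attr not in allowed_attrs:
            problems.append(f"attribute {attr} not declared for <{elem.tag}>")
    for child in elem:
        if child.tag not in allowed_children:
            problems.append(f"<{child.tag}> not allowed inside <{elem.tag}>")
        problems.extend(violations(child))  # check the child's own content too
    return problems


if __name__ == "__main__":
    good = '<grunting>hi <greeting>hello</greeting> <name reg="x">Lou</name></grunting>'
    bad = "<greeting><grunting>hello</grunting></greeting>"
    print(violations(ET.fromstring(good)))
    print(violations(ET.fromstring(bad)))
```

Note that the second sample is perfectly well-formed; it fails only because a greeting is declared to contain character data and nothing else.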

Page 21: Life after HTML

When do you need a dtd?

• at document preparation time (definitely)
– validation, checking, consistency

• at document processing time (probably)
– simplifies generic/specific processing
– may clarify intended semantics

• at document delivery time (possibly)
– strictly unnecessary for wf docs
– but reduces processing effort

Page 22: Life after HTML

Where do I get a dtd?

• flood of industry announcements

• some recent examples
– Resource Description Framework (for metadata)
– Channel Definition Format (for push technologies)
– Electronic Data Interchange (banking etc.)
– Handheld Device Markup Language (sic)
– Chemical Markup Language (chemical modelling)
– Math Markup Language (maths!)
– Text Encoding Initiative (scholarly texts)

Page 23: Life after HTML

The meaning of markup

• ontologically speaking…
– markup may be performative or descriptive
– markup asserts an intention or interpretation which cannot be formally defined

• tags have no predefined meaning

• presentation or behaviour of an XML document is specified elsewhere

Page 24: Life after HTML

Where is the behaviour of an XML document defined?

• in a stylesheet
– using XSL or CSS

• possibly embedded in a program, applet, script, or Java bean
– defined for that particular dtd, tagset, or tag

• by reference to pre-existing mutual agreement amongst user communities
– aka “namespaces”

• by reference to a Document Object Model

Page 25: Life after HTML

XLink: the future of hypertext

We believe in the interconnectedness of all things

F. Braudel

Page 26: Life after HTML

Some linking terminology

• a link asserts a relationship between linkends

• links may be typed

• link behaviour is what happens when a link is activated
– transclusion: new content appears without displacing current content

• linkends may be single or multiple resources

• linkends may be target or source with respect to each other

Page 27: Life after HTML

Linking in HTML

• link behaviour is tied to particular tags

• only two types
– <A> replace in same (or new) window
– <IMG> transclude inline (usually)

• link targets are always whole documents
– cannot reassemble fragments

• cannot add links to read-only documents

• linkends are inherently fragile

Page 28: Life after HTML

XLink aims to do better

• formerly XLL, formerly XML-Link

• two components
– XLink
– XPointer

• working drafts at
http://www.w3.org/TR/WD-xlink
http://www.w3.org/TR/WD-xptr

• WARNING: This is all subject to change!

Page 29: Life after HTML

XLink goals (1)

• Provide advanced linking constructs within XML documents (XLink)
– To anything

[Diagram: an XML document, <?xml version="1.0"?><toplevel>...</toplevel>, containing a link to an HTML document (<html>....</html>) and a link to a GIF graphic]

Page 30: Life after HTML

XLink goals (2)

• Provide advanced addressing into XML document structure (XPointer)
– From anything

[Diagram: an HTML document (<html>....</html>) pointing into the tree of an XML document; node labels: doc, title, descrip, modelnumber, partcode, status, revisioncode]

Page 31: Life after HTML

XPointer is…

• for pointing to subparts of XML resources (even if they don’t have IDs)

• based on the Text Encoding Initiative (TEI) “extended pointer” notation

• usable in association with URLs/URIs

<a href="http://some.url.com/Thing/foo.xml#id(foo)">

<!ENTITY bar SYSTEM "http://some.url.com/Thing/foo.xml#id(foo)">

Page 32: Life after HTML

An XPointer consists of

• a series of location terms in the form termname(parms)

• terms are separated by a dot, e.g. id(foo).child(3,SEC).child(4,LIST)

• each term is the location source for the next

• you can also use terms which point at strings, attributes, etc.
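The termname(parms) syntax is easy to take apart, which is part of its appeal. The sketch below splits an XPointer of the working-draft form described here into its location terms; the function names `split_terms` and `parse_term` are hypothetical, and `parse_term` deliberately handles only simple terms whose parameters contain no nested parentheses.

```python
def split_terms(xpointer):
    """Split on dots that lie outside parentheses and outside double quotes."""
    terms, current = [], []
    depth, in_quotes = 0, False
    for ch in xpointer:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "." and depth == 0:
                terms.append("".join(current))  # a top-level dot ends the term
                current = []
                continue
        current.append(ch)
    terms.append("".join(current))
    return terms


def parse_term(term):
    """Turn 'child(3,SEC)' into ('child', ['3', 'SEC']). Simple terms only."""
    name, _, rest = term.partition("(")
    params = rest[:-1].split(",")  # drop the closing parenthesis
    return name, params


if __name__ == "__main__":
    for term in split_terms("id(foo).child(3,SEC).child(4,LIST)"):
        print(parse_term(term))
```

Tracking parenthesis depth matters for terms such as span(...), whose parameters are themselves location terms and may contain dots.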

Page 33: Life after HTML

XPointer advantages

• a compact syntax which scales well

• as robust as possible
– any changes “off the path” won’t (necessarily) break the link
– IDs are as safe as it gets...

• if there’s an ID nearby, point to it and walk down/up

• if not, walk down from the root

Page 34: Life after HTML

XPointers: a flavour

• An XPointer addresses the tree that the markup represents, not the markup itself

• Location terms address particular nodes in the tree
– absolutely, e.g. id(), html()
– relatively, e.g. child(), descendant(), ancestor(), psibling(), fsibling()
– by string and attribute matches

• can also specify spans

Page 35: Life after HTML

id() and html()

id(concepts) html(baz)

[Tree diagram: a doc element containing abstract and chapter elements, carrying ID="intro", ID="concepts" and ID="summary"; chapters contain title, section, p and list elements; a p with ID="p37" contains an xref with href="#id(intro)" and an a with name="baz"]

Page 36: Life after HTML

child() and descendant()

child(1,chapter).child(2,section)

descendant(1,abstract)

[Tree diagram: the same doc tree as on the previous slide, with abstract and chapter elements (ID="intro", ID="concepts", ID="summary"), titles, sections, p and list elements, and a p with ID="p37" containing an xref with href="#id(intro)" and an a with name="baz"]
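To show how id(), child() and descendant() walk the tree, here is a toy resolver over Python's standard `xml.etree.ElementTree`. It is a sketch under several assumptions: the sample document is invented (loosely following the element names in the diagram), any attribute named ID is treated as an identifier because no dtd is read, and only these three location terms in their simplest (number, element type) form are supported.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, loosely modelled on the tree in the slides.
DOC = """<doc>
<abstract ID="intro"><p>...</p></abstract>
<chapter ID="concepts"><title>Concepts</title>
  <section><title>One</title></section>
  <section><title>Two</title><p ID="p37">text</p></section>
</chapter>
<chapter ID="summary"><title>Summary</title></chapter>
</doc>"""


def resolve(root, xpointer):
    """Follow each location term in turn; each result is the source for the next."""
    node = root  # with no absolute term, start from the root element
    for term in xpointer.split("."):
        name, _, rest = term.partition("(")
        params = rest[:-1].split(",")
        if name == "id":
            node = next(e for e in root.iter() if e.get("ID") == params[0])
        elif name == "child":
            # n-th direct child of the given element type (counting from 1)
            matches = [c for c in node if c.tag == params[1]]
            node = matches[int(params[0]) - 1]
        elif name == "descendant":
            # n-th descendant of the given type in document order, excluding the source itself
            matches = [d for d in node.iter(params[1]) if d is not node]
            node = matches[int(params[0]) - 1]
        else:
            raise ValueError(f"unsupported location term: {name}")
    return node


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(resolve(root, "child(1,chapter).child(2,section)").findtext("title"))
    print(resolve(root, "descendant(1,abstract)").get("ID"))
    print(resolve(root, "id(concepts).child(2,section).child(1,p)").get("ID"))
```

The last call illustrates the robustness argument: because it starts from an ID, edits elsewhere in the document (for example a new chapter before "concepts") would not break it.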

Page 37: Life after HTML

XPointer examples

id(intro).child(3,div1)
the third <div1> within the element with identifier INTRO

html(foo).child(2,div1).(4,p).child(1,quote,lang,"LAT")
the first <quote> whose LANG attribute is set to "LAT" within the fourth <p> of the second <div1> of whatever element contains an HTML <A NAME="foo">

descendant(#all,para)
every <para> within the current location source

span(child(1,pb,n,"14"),child(1,pb,n,"23"))
everything between the first <pb> whose N attribute is "14" and the first one whose N attribute is "23"

Page 38: Life after HTML

XLink proper

• allows you to invent your own linking elements and define their behaviour
– the xml:link attribute is used to specify the linking properties of your element

• allows you to create link databases
– “standoff” markup allows you to link to non-modifiable documents

• inline vs out-of-line links

Page 39: Life after HTML

Link behaviours

• show attribute
– new/replace/embed

• actuate attribute
– user/auto

• behavior attribute
– “for other instructions”
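As a rough illustration of how an application might read these attributes, the sketch below parses a user-invented linking element and checks its show and actuate values against the options listed above. The element name `citation`, its content and the helper `link_behaviour` are assumptions made up for this example, and the attribute syntax follows the working drafts referred to in these slides, which were explicitly subject to change.

```python
import xml.etree.ElementTree as ET

# ElementTree reports attributes with the reserved xml: prefix under this namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

SHOW_VALUES = {"new", "replace", "embed"}
ACTUATE_VALUES = {"user", "auto"}

# A hypothetical user-defined linking element in the working-draft style.
SAMPLE = (
    '<citation xml:link="simple" href="http://users.ox.ac.uk/~lou" '
    'show="new" actuate="user">Lou Burnard</citation>'
)


def link_behaviour(elem):
    """Return the link properties of `elem`, or None if it is not a linking element."""
    link_type = elem.get(XML_NS + "link")
    if link_type is None:
        return None
    show, actuate = elem.get("show"), elem.get("actuate")
    if show is not None and show not in SHOW_VALUES:
        raise ValueError(f"unexpected show value: {show}")
    if actuate is not None and actuate not in ACTUATE_VALUES:
        raise ValueError(f"unexpected actuate value: {actuate}")
    return {"type": link_type, "href": elem.get("href"), "show": show, "actuate": actuate}


if __name__ == "__main__":
    print(link_behaviour(ET.fromstring(SAMPLE)))
```

The point of the design is that the element name carries no linking semantics at all; any element becomes a link by carrying the xml:link attribute.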

Page 40: Life after HTML

The importance of XLink

• Not just about fancy capabilities and new ways of associating information

• Promotes the creation of advanced information structures and site management

• Makes possible an industry devoted to knowledge management (that's us!)

• For example: OED + LION

Page 41: Life after HTML

XSL: bringing it all back home

Page 42: Life after HTML

transforming xml documents

[Diagram: valid XML documents and a script are fed to a Transformation Tool, which outputs XML or non-XML documents]

Page 43: Life after HTML

XSL: the final piece

• Standard Style Sheet Language

• Combines DSSSL “flow objects” and CSS objects

• Uses XML syntax (rather than Scheme)

• Also uses ECMAScript for extensions

• Automatic conversion from CSS

Page 44: Life after HTML

DSSSL components

Page 45: Life after HTML

XSL is the next step for publishing

• XSL is not just about translation
– user-configurability
– enhanced clients

• Single source for print and online delivery

• XSL is intended to complete the internationalization of publishing

Page 46: Life after HTML

Tools you can use now

• Editing/creating documents
– emacs + psgml; XED; any SGML editor

• Parsers
– free standing: SP
– java applets: (many)
– embedded in applications

http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html

Page 47: Life after HTML

Tools you can use now

• Browsers and viewers
– HyBrick; IE5; Netscape 4; Amaya; XMetaL…

• Toolkits
– DOM support now in Perl, Tcl…

• Transformers
– Jade

Page 48: Life after HTML

The big picture: empowerment

Page 49: Life after HTML

The wider picture

• XML is not just about exchanging data between machines
– It's also about communication between humans

• XML is not just about the web
– It's about information in general

• XML is not just about technology
– It's also about the relationship between content creators and software vendors

Page 50: Life after HTML

How we will use XML (1)

Heterogeneous clients interfacing with a single database

Page 51: Life after HTML

How we will use XML (2)

A single client interfacing with heterogeneous databases

Page 52: Life after HTML

The social agenda of XML 

• The social agenda of SGML has always been about user ownership of content.

• Freedom from proprietary data formats
– Vendor neutrality
– Platform neutrality
– Language neutrality

Page 53: Life after HTML

The good news

• XML is human-readable

• XML data needs no special tools
– Perl is being optimized for XML support

• XML is an Open standard
– In theory, XML users can't be held hostage to vendor control

• XML is easy
– witness the ever-growing set of free XML tools (almost all of them Java-based)
– There will be many powerful, cheap, off-the-shelf commercial XML tools

Page 54: Life after HTML

Economic and political implications of XML/XSL

• The combination of XML and XSL could replace all existing word-processing and publishing formats.

• What would this mean?
– Users no longer tied to a proprietary format
– An end to domination of the market by a few big companies
– An end to domination of the market by a few big countries

Page 55: Life after HTML

What can go wrong? 

• The XML agenda is one of user-empowerment at the expense of businesses based on control through proprietary formats.

• Companies that have built their business models on such formats can be expected to resist this.

• We must be on our guard!
– accept nothing less than the whole!

Page 56: Life after HTML

How can we keep the XML/SGML dream alive?

• XML is not just middleware

• Define DTDs and namespaces

• Avoid non-standard extensions

• Insist on real XSL -- an XSL that outputs true formatting objects, not just HTML tags

• Build your own tools

Page 57: Life after HTML

If XML is So Great Why Has It Not Taken Over the World Already?

• the world isn’t worth taking over

• the world is not yet ready for the truth

• it’s those bad guys at IBM (or Microsoft or Netscape or the CIA or ...)

• …watch this space

Page 58: Life after HTML

Further reading...

• The Horse’s Mouth: www.w3.org/XML/
– XML spec: www.w3.org/TR/REC-xml
– XLink spec: www.w3.org/TR/WD-xlink
– XPointer spec: www.w3.org/TR/WD-xptr

• Other useful resources
– XML FAQ: www.ucc.ie/xml/
– www.xml.com/xml/pub
– www.lists.ic.ac.uk/hypermail/xml-dev/
– All you want to know about everything: http://www.oasis-open.org/cover/xml.html

Page 59: Life after HTML

Today’s lesson...

And the LORD said, Behold the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do. Genesis xi.6