Source: tei.oucs.ox.ac.uk/Talks/2011-07-dhox/presentations/xsl-01.pdf
Introduction to XSLT
TEI@Oxford
July 2011
Summer School 2011 1/59
Publishing XML files using XSLT
Our work will be divided into four parts:
1. Basic XSLT. Target: make HTML from TEI documents
2. More complex XSLT. Target: make more complex HTML, sorting and summarizing
3. Using the TEI XSL family. Target: customize the existing library of stylesheets
4. TEI, ODT, DOCX, and ePub. Target: transform TEI to and from word-processor and ePub formats
… depending on how fast or slow we go, and what the class wants to talk about …
It is assumed that we are working on TEI XML documents.
What is the XSL family?
XPath: a language for expressing paths through XML trees
XSLT: a programming language for transforming XML
XSL FO: an XML vocabulary for describing formatted pages
XSLT
The XSLT language is expressed in XML, uses namespaces to distinguish output from instructions, is purely functional, and reads and writes XML trees.
It was designed to generate XSL FO, but is now widely used to generate HTML.
What is a transformation?
Take this:
<persName><forename>Milo</forename><surname>Casagrande</surname></persName>
<persName><forename>Corey</forename><surname>Burger</surname></persName>
<persName><forename>Naaman</forename><surname>Campbell</surname></persName>
and make this (the surnames, sorted alphabetically and numbered):
<item n="1"><name>Burger</name></item>
<item n="2"><name>Campbell</name></item>
<item n="3"><name>Casagrande</name></item>
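The effect of this transformation can be modelled in Python with the standard xml.etree.ElementTree module. This is a minimal sketch of the expected output, not XSLT. The wrapper element <listPerson> and the function name transform are assumptions, added so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# The input fragment, wrapped in a hypothetical <listPerson> root so it is one well-formed document.
SOURCE = (
    "<listPerson>"
    "<persName><forename>Milo</forename><surname>Casagrande</surname></persName>"
    "<persName><forename>Corey</forename><surname>Burger</surname></persName>"
    "<persName><forename>Naaman</forename><surname>Campbell</surname></persName>"
    "</listPerson>"
)


def transform(source):
    """Return one <item> string per person, sorted by surname and numbered from 1."""
    root = ET.fromstring(source)
    # Keep only the surnames (forenames are dropped, as in the target output) and sort them.
    surnames = sorted(p.findtext("surname") for p in root.findall("persName"))
    return [f'<item n="{i}"><name>{name}</name></item>'
            for i, name in enumerate(surnames, start=1)]


for item in transform(SOURCE):
    print(item)
```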
A text example
Take this:
<div type="recipe" n="34">
  <head>Pasta for beginners</head>
  <list>
    <item>Pasta</item>
    <item>Grated cheese</item>
  </list>
  <p>Cook the pasta and mix with the cheese</p>
</div>
and make this:
<html>
  <h1>34: Pasta for beginners</h1>
  <p>Ingredients: Pasta Grated cheese</p>
  <p>Cook the pasta and mix with the cheese</p>
</html>
How do you express that in XSL?
<xsl:stylesheet xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
  <xsl:template match="div">
    <html>
      <h1><xsl:value-of select="@n"/>: <xsl:value-of select="head"/></h1>
      <p>Ingredients: <xsl:value-of select="list/item"/></p>
      <p><xsl:value-of select="p"/></p>
    </html>
  </xsl:template>
</xsl:stylesheet>
Note: the namespace declaration linking xsl: to http://www.w3.org/1999/XSL/Transform is not shown in these examples.
Structure of an XSL file
<xsl:stylesheet xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
  <xsl:template match="div">
    <!-- .... do something with div elements.... -->
  </xsl:template>
  <xsl:template match="p">
    <!-- .... do something with p elements.... -->
  </xsl:template>
</xsl:stylesheet>
The div and p are XPath expressions, which specify which part of the document is matched by the template.
Any element in a template body whose name does not start with xsl: is put into the output.
The Golden Rules of XSLT
1. If there is no template matching an element, we go on and process the elements inside it
2. If there are no elements to process by Rule 1, any text inside the element is output
3. Child elements are not processed by a template unless you explicitly say so
4. xsl:apply-templates select="XX" looks for templates which match element "XX"; xsl:value-of select="XX" simply gets any text from that element
5. The order of templates in your program file is immaterial
6. You can process any part of the document from any template
7. Everything is well-formed XML. Everything!
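Rules 1 to 3 can be approximated with a minimal Python sketch built on xml.etree.ElementTree. This is a deliberate simplification and not how an XSLT processor works. Templates are looked up by element name only, whereas real XSLT matches XPath patterns, and attributes and the root node are ignored. apply_templates and process_children are hypothetical names, not an XSLT API.

```python
import xml.etree.ElementTree as ET


def process_children(elem, templates):
    """Like <xsl:apply-templates/>: copy text nodes, process child elements in order."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")  # text that follows the child inside elem
    return "".join(parts)


def apply_templates(elem, templates):
    template = templates.get(elem.tag)
    if template is None:
        # Rules 1 and 2: no template, so carry on into the children and output any text.
        return process_children(elem, templates)
    # Rule 3: a matching template decides for itself whether the children are processed.
    return template(elem, templates)


doc = ET.fromstring("<div><head>Title</head><p>One <hi>two</hi></p></div>")

# With no templates at all, only the text comes out.
print(apply_templates(doc, {}))  # TitleOne two

templates = {
    "p": lambda e, t: "<p>" + process_children(e, t) + "</p>",
    "head": lambda e, t: "",  # an empty template: <head> and its text are dropped
}
print(apply_templates(doc, templates))  # <p>One two</p>
```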
Important magic
Our examples and exercises all start with two important attributes on <stylesheet>:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="2.0">....
which indicate:
1. In our XPath expressions, any element name without a namespace is assumed to be in the TEI namespace.
2. We want to use version 2.0 of the XSLT specification. This means that we must use the Saxon processor for our work.
A simple test file
<text>
  <front>
    <div><p>Material up front</p></div>
  </front>
  <body>
    <div>
      <head>Introduction</head>
      <p rend="it">Some sane words</p>
      <p>Rather more surprising words</p>
    </div>
  </body>
  <back>
    <div><p>Material in the back</p></div>
  </back>
</text>
Feature: apply-templates
<xsl:stylesheet version="2.0" xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
</xsl:stylesheet>

<xsl:template match="TEI">
  <xsl:apply-templates select="text"/>
</xsl:template>

<xsl:template match="text">
  <h1>FRONT MATTER</h1>
  <xsl:apply-templates select="front"/>
  <h1>BODY MATTER</h1>
  <xsl:apply-templates select="body"/>
</xsl:template>
Feature: value-of
Templates for paragraphs and headings:
<xsl:template match="p">
  <p><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="div">
  <h2><xsl:value-of select="head"/></h2>
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="div/head"/>
Notice how we avoid getting the heading text twice. Why did we need to qualify it to deal with just <head> inside <div>?
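The effect of the empty div/head template can be checked with a small Python model of these three templates. The model assumes that xsl:value-of yields the element's string value, that is, all its text with the tags removed. render_div and its suppress_head switch are hypothetical names, and only <head> and <p> children are modelled.

```python
import xml.etree.ElementTree as ET


def string_value(elem):
    """What <xsl:value-of select="..."/> gives for one element: all its text, tags removed."""
    return "".join(elem.itertext())


def render_div(div, suppress_head=True):
    # <h2><xsl:value-of select="head"/></h2>
    parts = [f"<h2>{string_value(div.find('head'))}</h2>"]
    # <xsl:apply-templates/>: each child of the <div> in turn (only head and p are modelled)
    for child in div:
        if child.tag == "head":
            if not suppress_head:
                parts.append(string_value(child))  # built-in rule: the heading text appears again
            # with <xsl:template match="div/head"/> the head produces nothing
        elif child.tag == "p":
            # the p template; apply-templates equals the string value for a text-only <p>
            parts.append(f"<p>{string_value(child)}</p>")
    return "".join(parts)


div = ET.fromstring("<div><head>Introduction</head><p>Some sane words</p></div>")
print(render_div(div))                       # <h2>Introduction</h2><p>Some sane words</p>
print(render_div(div, suppress_head=False))  # <h2>Introduction</h2>Introduction<p>Some sane words</p>
```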
More complex patterns
The select attribute can point to any part of the document. Using XPath expressions, we can find:
/        the root of the document (outside the root element)
*        any element
text()   any text node
name     an element called ‘name’
@name    an attribute called ‘name’
Example of a complete path in <value-of>:
<xsl:value-of select="/TEI/teiHeader/fileDesc/titleStmt/title"/>
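The same path can be tried in Python's ElementTree, which supports only a subset of XPath. Its find methods work relative to an element, so /TEI/teiHeader/... becomes teiHeader/... when starting from the root <TEI> element. The '' entry in the namespaces mapping (Python 3.8 or later) does roughly what xpath-default-namespace does in the stylesheet. The tiny TEI document used here is invented for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A minimal, made-up TEI document; every element is in the TEI namespace.
DOC = (
    f'<TEI xmlns="{TEI_NS}">'
    "<teiHeader><fileDesc><titleStmt><title>A test file</title></titleStmt></fileDesc></teiHeader>"
    "<text><body><p>Hello</p></body></text>"
    "</TEI>"
)

root = ET.fromstring(DOC)  # the <TEI> element

# Unprefixed names in the path are looked up in the TEI namespace.
ns = {"": TEI_NS}
title = root.findtext("teiHeader/fileDesc/titleStmt/title", namespaces=ns)
print(title)  # A test file

# Without the namespace mapping, the unqualified path finds nothing.
print(root.findtext("teiHeader/fileDesc/titleStmt/title"))  # None
```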
XPath
XPath is the basis of most other XML querying and transformationlanguages.
It is a syntax for accessing parts of an XML document.
It uses a path structure to define XML elements.
It has a library of standard functions.
It is a W3C Standard and one of the main components of XQuery and XSLT.
Example text
<body n="anthology">
  <div type="poem">
    <head>The SICK ROSE </head>
    <lg type="stanza">
      <l n="1">O Rose thou art sick.</l>
      <l n="2">The invisible worm,</l>
      <l n="3">That flies in the night </l>
      <l n="4">In the howling storm:</l>
    </lg>
    <lg type="stanza">
      <l n="5">Has found out thy bed </l>
      <l n="6">Of crimson joy:</l>
      <l n="7">And his dark secret love </l>
      <l n="8">Does thy life destroy.</l>
    </lg>
  </div>
</body>
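To try out the paths on the following slides, the example text can be loaded into Python's standard xml.etree.ElementTree. ElementTree implements only a subset of XPath. Paths are written relative to an element, so /body/div/head becomes div/head starting from <body>. Attribute nodes cannot be selected directly either, so in this sketch /body/div/@type is approximated with .get().

```python
import xml.etree.ElementTree as ET

# The example text above, joined into one string.
POEM = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg>'
    '</div></body>'
)
body = ET.fromstring(POEM)  # <body> is the root element

heads = body.findall("div/head")    # /body/div/head
stanzas = body.findall("div/lg")    # /body/div/lg
lines = body.findall("div/lg/l")    # /body/div/lg/l
div_types = [div.get("type") for div in body.findall("div")]  # /body/div/@type

print([h.text for h in heads])   # ['The SICK ROSE ']
print(len(stanzas), len(lines))  # 2 8
print(div_types)                 # ['poem']
```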
XML Structure
[Tree diagram of an anthology document containing the poem above and a second, shorter poem:]
body type="anthology"
  div type="poem"
    head
    lg type="stanza"
      l n="1"  l n="2"  l n="3"  l n="4"
    lg type="stanza"
      l n="5"  l n="6"  l n="7"  l n="8"
  div type="shortpoem"
    head
    lg type="couplet"
      l n="1"  l n="2"
Really attributes (and text) are separate nodes!
/body/div/head
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
XPath locates any matching nodes
/body/div/lg
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
/body/div/@type
@ = attributes
[Tree diagram: the document tree, with the attribute nodes type="poem" and type="shortpoem" highlighted]
/body/div/lg/l
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
/body/div/lg/l[@n="2"]
Square Brackets Filter Selection
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
/body/div[@type="poem"]/head
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
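The two filtered paths above can be written in ElementTree form as follows. To mirror the tree diagram, this sketch appends a hypothetical <div type="shortpoem"> with placeholder text after the Sick Rose poem. That second poem is not part of the original example text.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg></div>'
    # Hypothetical second poem with placeholder text, standing in for "shortpoem" in the diagram.
    '<div type="shortpoem"><head>Short poem</head>'
    '<lg type="couplet"><l n="1">First line</l><l n="2">Second line</l></lg></div>'
    '</body>'
)
body = ET.fromstring(DOC)

# /body/div/lg/l[@n="2"]: the square brackets keep only <l> elements whose n is "2".
second_lines = body.findall('div/lg/l[@n="2"]')
print([line.text for line in second_lines])  # ['The invisible worm,', 'Second line']

# /body/div[@type="poem"]/head: here the filter applies to the <div> step, not to <head>.
poem_heads = body.findall('div[@type="poem"]/head')
print([h.text for h in poem_heads])  # ['The SICK ROSE ']
```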
//lg[@type="stanza"]
// = any descendant
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
//div[@type="poem"]//l
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
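In ElementTree, a leading .// searches all descendants of the element it is called on. Starting from <body> this covers the same nodes as // does here, because <body> is the root and is neither an <lg> nor a <div>. As before, the <div type="shortpoem"> in this sketch is a hypothetical stand-in with placeholder text.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg></div>'
    # Hypothetical second poem with placeholder text.
    '<div type="shortpoem"><head>Short poem</head>'
    '<lg type="couplet"><l n="1">First line</l><l n="2">Second line</l></lg></div>'
    '</body>'
)
body = ET.fromstring(DOC)

# //lg[@type="stanza"]: every <lg> at any depth whose type is "stanza" (the couplet is excluded).
stanzas = body.findall('.//lg[@type="stanza"]')
print(len(stanzas))  # 2

# //div[@type="poem"]//l: every <l> anywhere inside the poem <div>, whatever lies in between.
poem_lines = body.findall('.//div[@type="poem"]//l')
print([line.get("n") for line in poem_lines])  # ['1', '2', '3', '4', '5', '6', '7', '8']
```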
//l[5]
Square brackets can also filter by counting. The predicate belongs to the last step, so //l[5] selects every <l> that is the fifth <l> child of its parent; to get the fifth <l> in the whole document, write (//l)[5].
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
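ElementTree applies a numeric predicate the same way: [5] keeps an element only if it is the fifth child with that name under its own parent. In the example poem each stanza has four lines, so //l[5] finds nothing there. The sketch below contrasts this with taking the fifth <l> in document order, which is what (//l)[5] means in XPath.

```python
import xml.etree.ElementTree as ET

POEM = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg>'
    '</div></body>'
)
body = ET.fromstring(POEM)

# //l[5]: each <l> that is the fifth <l> child of its own parent.
print(body.findall(".//l[5]"))  # [] (no <lg> here has five lines)

# (//l)[5]: the fifth <l> in document order across the whole document.
all_lines = body.findall(".//l")
fifth = all_lines[4]  # Python lists count from 0, XPath positions from 1
print(fifth.get("n"), fifth.text.strip())  # 5 Has found out thy bed
```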
//lg/../@type
Paths are relative: .. = parent
[Tree diagram: the document tree, with the attribute nodes type="poem" and type="shortpoem" highlighted]
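ElementTree understands .. as a step, and it returns each parent only once even when several of its children matched. It cannot step onto attributes, so this sketch reads @type with .get(). The <div type="shortpoem"> is again a hypothetical stand-in with placeholder text.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg></div>'
    # Hypothetical second poem with placeholder text.
    '<div type="shortpoem"><head>Short poem</head>'
    '<lg type="couplet"><l n="1">First line</l><l n="2">Second line</l></lg></div>'
    '</body>'
)
body = ET.fromstring(DOC)

# //lg/..: the parent of every <lg>; the poem <div> has two <lg>s but appears only once.
parents = body.findall(".//lg/..")
print([p.tag for p in parents])  # ['div', 'div']

# .../@type: read the attribute from each parent.
print([p.get("type") for p in parents])  # ['poem', 'shortpoem']
```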
//l[@n > 5]
Numerical operations can be useful.
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
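ElementTree has no comparison operators such as >, so in this sketch the filter is written in Python. Converting n with int() mirrors XPath, which compares @n > 5 as numbers. A plain string comparison would go wrong once line numbers reach 10.

```python
import xml.etree.ElementTree as ET

POEM = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg>'
    '</div></body>'
)
body = ET.fromstring(POEM)

# //l[@n > 5]: iter("l") walks all <l> descendants in document order.
later = [line for line in body.iter("l") if int(line.get("n")) > 5]
print([line.get("n") for line in later])  # ['6', '7', '8']

# Comparing the attribute as text instead of as a number gives the wrong answer for 10.
print("10" > "5", 10 > 5)  # False True
```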
//div[head]/lg/l[@n="2"]
Notice the deleted <head>!
[Tree diagram: the document tree with one <head> deleted, and the nodes matched by this path highlighted]
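A bare name inside square brackets, as in [head], tests whether such a child exists, and ElementTree supports this form. As on the slide, one <head> is missing in this sketch. Here it is the head of the hypothetical <div type="shortpoem">, which holds placeholder text only.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg></div>'
    # Hypothetical second poem, placeholder text, with its <head> deleted.
    '<div type="shortpoem">'
    '<lg type="couplet"><l n="1">First line</l><l n="2">Second line</l></lg></div>'
    '</body>'
)
body = ET.fromstring(DOC)

# //div[head]/lg/l[@n="2"]: only <div>s that have a <head> child take part.
matches = body.findall('.//div[head]/lg/l[@n="2"]')
print([line.text for line in matches])  # ['The invisible worm,']

# Without the [head] test, the headless <div> contributes its line too.
everywhere = body.findall('.//div/lg/l[@n="2"]')
print([line.text for line in everywhere])  # ['The invisible worm,', 'Second line']
```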
//l[ancestor::div/@type="shortpoem"]
ancestor:: is an unabbreviated axis name
[Tree diagram: the document tree, with the nodes matched by this path highlighted]
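ElementTree has no ancestor:: axis. A common workaround, sketched below, is to build a child-to-parent map once and walk upwards from each element. The ancestors helper is a hypothetical name, and the <div type="shortpoem"> is the same placeholder stand-in used earlier.

```python
import xml.etree.ElementTree as ET

DOC = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg></div>'
    # Hypothetical second poem with placeholder text.
    '<div type="shortpoem"><head>Short poem</head>'
    '<lg type="couplet"><l n="1">First line</l><l n="2">Second line</l></lg></div>'
    '</body>'
)
body = ET.fromstring(DOC)

# Map every element to its parent (iterating an Element yields its children).
parent_of = {child: parent for parent in body.iter() for child in parent}


def ancestors(elem):
    """Yield elem's ancestors, nearest first, like the ancestor:: axis."""
    while elem in parent_of:
        elem = parent_of[elem]
        yield elem


# //l[ancestor::div/@type="shortpoem"]
short_lines = [
    line for line in body.iter("l")
    if any(a.tag == "div" and a.get("type") == "shortpoem" for a in ancestors(line))
]
print([line.text for line in short_lines])  # ['First line', 'Second line']
```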
XPath: More About Paths
A location path results in a node-set.
Paths can be absolute (/div/lg[1]/l).
Paths can be relative (l/../../head).
Formal syntax: axisname::nodetest[predicate]
For example: child::div[contains(head, 'ROSE')]
XPath: Axes
ancestor:: Contains all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self:: Contains the current node plus all its ancestors (parent, grandparent, etc.)
attribute:: Contains all attributes of the current node
child:: Contains all children of the current node
descendant:: Contains all descendants (children, grandchildren, etc.) of the current node
descendant-or-self:: Contains the current node plus all its descendants (children, grandchildren, etc.)
XPath: Axes (2)
following:: Contains everything in the document after the closing tag of the current node
following-sibling:: Contains all siblings after the current node
parent:: Contains the parent of the current node
preceding:: Contains everything in the document that is before the starting tag of the current node
preceding-sibling:: Contains all siblings before the current node
self:: Contains the current node
Axis examples
ancestor::lg = all <lg> ancestors
ancestor-or-self::div = all <div> ancestors, plus the current node if it is a <div>
attribute::n = n attribute of current node
child::l = <l> elements directly under current node
descendant::l = <l> elements anywhere under current node
descendant-or-self::div = all <div> descendants, plus the current node if it is a <div>
following-sibling::l[1] = next <l> element at this level
preceding-sibling::l[1] = previous <l> element at this level
self::head = the current node, if it is a <head> element
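The two sibling examples can be modelled in Python with a child-to-parent map. preceding-sibling:: is a reverse axis, so position 1 means the nearest earlier sibling, not the first one in the document. The sketch models this by reversing the list. following_siblings and preceding_siblings are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

POEM = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg>'
    '</div></body>'
)
body = ET.fromstring(POEM)
parent_of = {child: parent for parent in body.iter() for child in parent}


def following_siblings(elem):
    """following-sibling:: in document order (nearest first)."""
    siblings = list(parent_of[elem])
    return siblings[siblings.index(elem) + 1:]


def preceding_siblings(elem):
    """preceding-sibling:: as a reverse axis: nearest sibling first."""
    siblings = list(parent_of[elem])
    return siblings[:siblings.index(elem)][::-1]


line3 = body.find('.//l[@n="3"]')
next_l = [s for s in following_siblings(line3) if s.tag == "l"][0]  # following-sibling::l[1]
prev_l = [s for s in preceding_siblings(line3) if s.tag == "l"][0]  # preceding-sibling::l[1]
print(next_l.get("n"), prev_l.get("n"))  # 4 2
```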
XPath: Predicates
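child::lg[attribute::type='stanza'] = <lg> children whose type attribute is 'stanza'
child::l[@n='4'] = <l> children whose n attribute is '4'
child::div[position()=3] = the third <div> child
child::div[4] = the fourth <div> child (short for position()=4)
child::l[last()] = the last <l> child
child::lg[last()-1] = the last-but-one <lg> child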
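Several of these predicates also work in Python's ElementTree. It accepts integer positions, last(), last()-N (written without spaces, as here) and attribute tests, but not position() or other XPath functions. The sketch below applies them to the poem <div> of the example text.

```python
import xml.etree.ElementTree as ET

POEM = (
    '<body n="anthology"><div type="poem"><head>The SICK ROSE </head>'
    '<lg type="stanza"><l n="1">O Rose thou art sick.</l><l n="2">The invisible worm,</l>'
    '<l n="3">That flies in the night </l><l n="4">In the howling storm:</l></lg>'
    '<lg type="stanza"><l n="5">Has found out thy bed </l><l n="6">Of crimson joy:</l>'
    '<l n="7">And his dark secret love </l><l n="8">Does thy life destroy.</l></lg>'
    '</div></body>'
)
div = ET.fromstring(POEM).find("div")  # the poem <div> is the context node

last_lg = div.find("lg[last()]")         # child::lg[last()]
penultimate = div.find("lg[last()-1]")   # child::lg[last()-1]
first_stanza = div.find("lg[1]")         # child::lg[1], i.e. child::lg[position()=1]
fourth = div.find("lg/l[@n='4']")        # child::lg/child::l[@n='4']

# The first <l> of each selected stanza identifies it.
print(last_lg[0].get("n"), penultimate[0].get("n"), first_stanza is penultimate, fourth.text)
# 5 1 True In the howling storm:
```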
XPath: Abbreviated Syntax
nothing is the same as child::, so lg is short for child::lg
@ is the same as attribute::, so @type is short for attribute::type
. is the same as self::node(), so ./head is short for self::node()/child::head
.. is the same as parent::node(), so ../lg is short for parent::node()/child::lg
// is the same as /descendant-or-self::node()/, so div//l is short for child::div/descendant-or-self::node()/child::l
Example of context-dependent matches
Compare
<xsl:template match="head"> ....</xsl:template>
with
<xsl:template match="div/head"> ...</xsl:template>
<xsl:template match="figure/head"> ....</xsl:template>
Priorities when templates conflict
It is possible for it to be ambiguous which template is to be used:
<xsl:template match="person/name">…</xsl:template>
<xsl:template match="name">…</xsl:template>
When the processor meets a <name>, which template is used?
Solving priorities
There is a priority attribute on <template>; when several templates match the same node, the one with the highest priority is used:
<xsl:template match="name" priority="1">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="person/name" priority="2"> A name</xsl:template>
Template priority generally
Without explicit priorities, the general rule is that the most specific template wins:
<xsl:template match="*"><!-- ... --></xsl:template>
<xsl:template match="tei:*"><!-- ... --></xsl:template>
<xsl:template match="p"><!-- ... --></xsl:template>
<xsl:template match="div/p"><!-- ... --></xsl:template>
<xsl:template match="div/p/@n"><!-- ... --></xsl:template>
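"Most specific wins" comes from the default priorities the XSLT 1.0 and 2.0 specifications assign when a template has no priority attribute. A bare node test such as * gets -0.5, prefix:* gets -0.25, a single name gets 0, and anything with more than one step or a predicate gets 0.5. The hypothetical helper below encodes only those cases, for simple patterns without | unions. It also shows why person/name (0.5) beats name (0) on the earlier slide even without explicit priorities.

```python
import re


def default_priority(pattern):
    """Default priority of a single, simple match pattern (no '|' unions).

    Covers only the pattern shapes used on these slides; not a full pattern parser."""
    if "/" in pattern or "[" in pattern:
        return 0.5    # several steps, or a predicate
    step = pattern[1:] if pattern.startswith("@") else pattern
    if step in ("*", "node()", "text()"):
        return -0.5   # a bare node test matches any node of its kind
    if re.fullmatch(r"[\w.-]+:\*", step):
        return -0.25  # prefix:* matches any name in one namespace
    return 0.0        # a single name such as p, tei:p or @n


for p in ["*", "tei:*", "p", "div/p", "div/p/@n", "name", "person/name"]:
    print(f"{p:12} {default_priority(p)}")
```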
Pushing and pulling
XSLT stylesheets can be characterized as being of two types:
push: In this type of stylesheet, there is a different template for every element, communication is via <xsl:apply-templates>, and the overall result is assembled from bits in each template. It is sometimes hard to visualize the final design. Common for document-oriented processing where the input document structure varies.
pull: In this type, there is a master template (usually matching /) with the main structure of the output, and specific <xsl:for-each> or <xsl:value-of> commands to grab what is needed for each part. The templates tend to get large and unwieldy. Common for data-oriented processing where the structure is fixed.
Attribute value template
What if we want to turn
<ref target="http://www.oucs.ox.ac.uk/">OUCS</ref>
into
<a href="http://www.oucs.ox.ac.uk/">OUCS</a>?
What we cannot do is:
<xsl:template match="ref">
  <a href="@target"><xsl:apply-templates/></a>
</xsl:template>
This would give the @href attribute the value ‘@target’.
For example
Instead, we use {} to indicate that the expression must be evaluated:
<xsl:template match="ref">
  <a href="{@target}"><xsl:apply-templates/></a>
</xsl:template>
This gives the @href attribute whatever value the @target attribute has!
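The difference between href="@target" and href="{@target}" can be imitated with a toy Python function. This is a minimal sketch under a narrow assumption: only {@attribute} references are expanded, whereas real attribute value templates accept any XPath expression inside the braces. As in XSLT, {{ and }} stand for literal braces. expand_avt is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET


def expand_avt(template, context):
    """Expand {@name} using context's attributes; text outside braces is copied literally."""
    def replace(match):
        if match.group(0) in ("{{", "}}"):
            return match.group(0)[0]  # doubled brace: a literal { or }
        return context.get(match.group(1), "")  # a missing attribute gives ""
    return re.sub(r"\{\{|\}\}|\{@([\w.-]+)\}", replace, template)


ref = ET.fromstring('<ref target="http://www.oucs.ox.ac.uk/">OUCS</ref>')

print(expand_avt("@target", ref))    # @target  (no braces: taken literally)
print(expand_avt("{@target}", ref))  # http://www.oucs.ox.ac.uk/

a = ET.Element("a", href=expand_avt("{@target}", ref))
a.text = ref.text  # stands in for <xsl:apply-templates/>, which outputs the text "OUCS"
print(ET.tostring(a, encoding="unicode"))  # <a href="http://www.oucs.ox.ac.uk/">OUCS</a>
```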
Feature: for-each
If we want to avoid lots of templates, we can do in-line looping over a set of elements. For example:
<xsl:template match="listPerson">
  <ul>
    <xsl:for-each select="person">
      <li><xsl:value-of select="persName"/></li>
    </xsl:for-each>
  </ul>
</xsl:template>
contrast with
<xsl:template match="listPerson">
  <ul>
    <xsl:apply-templates select="person"/>
  </ul>
</xsl:template>
<xsl:template match="person">
  <li><xsl:value-of select="persName"/></li>
</xsl:template>
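In Python terms, the contrast is between writing the loop body inline and dispatching each element to a separate function chosen by its name. Both functions in this sketch produce the same list. The sample people, the function names and the TEMPLATES dispatch table are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up list of people, for illustration only.
LIST = ET.fromstring(
    "<listPerson>"
    "<person><persName>Ann</persName></person>"
    "<person><persName>Bob</persName></person>"
    "</listPerson>"
)


def with_for_each(list_person):
    # <xsl:for-each select="person">: the loop body is written inline, in one place.
    items = [f"<li>{p.findtext('persName')}</li>" for p in list_person.findall("person")]
    return "<ul>" + "".join(items) + "</ul>"


def person_template(person):
    # <xsl:template match="person">: a separate rule for one kind of element.
    return f"<li>{person.findtext('persName')}</li>"


TEMPLATES = {"person": person_template}


def with_apply_templates(list_person):
    # <xsl:apply-templates select="person"/>: each element goes to the rule for its name.
    return "<ul>" + "".join(TEMPLATES[p.tag](p) for p in list_person.findall("person")) + "</ul>"


print(with_for_each(LIST))         # <ul><li>Ann</li><li>Bob</li></ul>
print(with_apply_templates(LIST))  # <ul><li>Ann</li><li>Bob</li></ul>
```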
Feature: if
We can make code conditional on a test being passed:
<xsl:template match="person">
  <xsl:if test="@sex='1'">
    <li><xsl:value-of select="persName"/></li>
  </xsl:if>
</xsl:template>
contrast with
<xsl:template match="person[@sex='1']">
  <li><xsl:value-of select="persName"/></li>
</xsl:template>
<xsl:template match="person"/>
The @test can use any XPath facilities.
Feature: choose
We can make a multi-value choice conditional on what we find in the text:
<xsl:template match="person">
  <xsl:apply-templates/>
  <xsl:choose>
    <xsl:when test="@sex='1'">(male)</xsl:when>
    <xsl:when test="@sex='2'">(female)</xsl:when>
    <xsl:when test="not(@sex)">(no sex specified)</xsl:when>
    <xsl:otherwise>(unknown sex)</xsl:otherwise>
  </xsl:choose>
</xsl:template>
Summary
Now you can:
1. Write templates which match any element or attribute
2. Pick out text from anywhere
3. Write code conditional on something in the text