Post on 26-Jul-2020
Introduction to XPATH
Adapted from "XML How To Program" by Deitel
Chapter 5 – XML Path Language (XPath)
– Readings: XML Path Language (XPath)
http://www.w3.org/TR/xpath
Introduction
• XPath is a language for specifying navigation within an XML document
• It also provides basic facilities for manipulating strings, numbers, and booleans
• XPath models an XML document as a tree of nodes
• The most common nodes are document (d), element (e), attribute (a), and text (t) nodes
• XPath defines a way to compute a string value for each type of node
• Used by other XML technologies
  – e.g. XSLT, XPointer, XQuery
Nodes
• XML document
  – XML documents are treated as trees of nodes
  – The root of the tree is called the document node (or root node)
  – Each node represents part of the XML document
• Seven node types
  – Root (document node)
  – Element
  – Attribute
  – Text
  – Comment
  – Processing instruction
  – Namespace
• Attributes and namespaces are not children of their parent node
  – They describe their parent node
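The point that attributes describe an element rather than sit among its children can be seen in most XML APIs. The sketch below uses Python's standard xml.etree.ElementTree, which only approximates the XPath data model (it has no separate text-node or attribute-node objects); the one-course fragment is a hypothetical sample modelled on the faculty example.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-course fragment, modelled on the faculty example in these slides.
course = ET.fromstring('<course cid="c2313" year="2009"><name>ADA</name></course>')

child_tags = [child.tag for child in course]  # only element children are iterated
attributes = dict(course.attrib)              # attributes are kept apart from the children
name_text = course.find("name").text          # the text node is exposed as a plain string

print(child_tags)
print(attributes)
print(name_text)
```

Iterating over the element yields only the name child; cid and year are reached through the attribute mapping instead.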
Partial Faculty.xml
Node tree (d = document node, e = element node, a = attribute node, t = text node)
d101 faculty.xml
  e101 faculty
    a101 name = "Computing"
    e102 course
      a102 cid = "c2313"
      a103 year = "2009"
      e103 name
        t101 "ADA"
      e104 student
        e105 sid
          t102 "x000123"
        e106 grade
          t103 "A"
      e107 student
        e108 sid
          t104 "x0008787"
        e109 grade
          t105 "B+"
    e110 course
      a104 cid = "c2314"
      a105 year = "2009"
      e111 name
        t106 "CIS"
      e112 student
        e113 sid
          t107 "x000123"
        e114 grade
          t108 "A"
      e115 student
        e116 sid
          t109 "x0008787"
        e117 grade
          t110 "C"
    e118 course
      a106 cid = "c2313"
      a107 year = "2008"
      e119 name
        t111 "IMD"
      e120 lecturer
        t112 "Jones"
      e121 student
        e122 sid
          t113 "x0008787"
        e123 grade
          t114 "C+"
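For experimenting with the queries in the rest of these slides, the partial tree above can be written back out as XML. The serialization below is an assumption reconstructed from the diagram (the slides do not show the file itself), and the node labels such as e101 or a102 are diagram identifiers, not part of the data. Python's standard xml.etree.ElementTree is used only to check that the text parses.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the partial Faculty.xml tree; reconstructed, not the original file.
FACULTY_XML = """\
<faculty name="Computing">
  <course cid="c2313" year="2009">
    <name>ADA</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>B+</grade></student>
  </course>
  <course cid="c2314" year="2009">
    <name>CIS</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>C</grade></student>
  </course>
  <course cid="c2313" year="2008">
    <name>IMD</name>
    <lecturer>Jones</lecturer>
    <student><sid>x0008787</sid><grade>C+</grade></student>
  </course>
</faculty>
"""

faculty = ET.fromstring(FACULTY_XML)
course_ids = [course.get("cid") for course in faculty.findall("course")]
student_count = len(faculty.findall(".//student"))

print(course_ids)
print(student_count)
```

Note that the indentation adds whitespace-only text nodes that the diagram does not show; a strict XPath processor would see them under node().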
XPath Axis
• Within an XPath step, the axis specifies the "direction" in which to navigate through a document
  – For example, the step child::student, where Axis = child:: and Node-test = student, would select all child nodes (of a context node) that have the name student
• XPath 1.0 defines 13 different axes for navigation
• The child:: axis is the most commonly used
• Some of the others are:
  – attribute:: (access attributes of a context node),
  – descendant:: (access descendant nodes of a context node),
  – self:: (access the context node itself),
  – descendant-or-self:: (access the context node and its descendants, returning the nodes that satisfy the node test),
  – parent:: (access the parent node of a context node)
XPath axes.
Axis Name             Ordering  Description
self::                none      The context node itself. See Note
parent::              reverse   The context node's parent, if one exists. See Note
child::               forward   The context node's children, if they exist. The default if no axis is provided. See Note
ancestor::            reverse   The context node's ancestors (parents, grandparents, etc.), if they exist.
ancestor-or-self::    reverse   The context node's ancestors and also itself.
descendant::          forward   The context node's descendants, i.e. children, grandchildren, etc.
descendant-or-self::  forward   The context node's descendants and also itself.
following::           forward   Everything in the document after the closing tag of the context node.
following-sibling::   forward   The sibling nodes following the context node.
preceding::           reverse   Everything in the document before the start tag of the context node.
preceding-sibling::   reverse   The sibling nodes preceding the context node.
attribute::           forward   The attribute nodes of the context node. See Note
namespace::           forward   The namespace nodes of the context node.
NOTE: These axes can be used in an abbreviated form
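Each axis corresponds to a simple walk over the node tree. The sketch below spells out four of them as plain Python functions over xml.etree.ElementTree elements; it is an illustration of the idea, not how an XPath processor is implemented. The function names and the small sample document are hypothetical, and only element nodes are considered.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like one course of the faculty example.
faculty = ET.fromstring(
    '<faculty><course cid="c2313"><name>IMD</name>'
    '<student><sid>x0008787</sid></student></course></faculty>'
)

# ElementTree elements have no parent pointer, so build a child -> parent map once.
parent_of = {child: parent for parent in faculty.iter() for child in parent}

def child(node):
    """child:: axis, in document order."""
    return list(node)

def descendant(node):
    """descendant:: axis, in document order (the node itself is skipped)."""
    walker = node.iter()
    next(walker)
    return list(walker)

def ancestor(node):
    """ancestor:: axis, a reverse axis: the nearest ancestor comes first."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def following_sibling(node):
    """following-sibling:: axis, in document order."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

course = faculty.find("course")
name = faculty.find("course/name")
sid = faculty.find(".//sid")

child_tags = [n.tag for n in child(course)]
descendant_tags = [n.tag for n in descendant(course)]
ancestor_tags = [n.tag for n in ancestor(sid)]
sibling_tags = [n.tag for n in following_sibling(name)]
print(child_tags, descendant_tags, ancestor_tags, sibling_tags)
```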
Some location-path abbreviations.
Location Path                 Description
child::                       This location path is used by default if no axis is supplied and may therefore be omitted.
attribute::                   The attribute axis may be abbreviated as @.
/descendant-or-self::node()/  This location path is abbreviated as two slashes (//).
self::node()                  The context node is abbreviated with a period (.).
parent::node()                The context node's parent is abbreviated with two periods (..).
Node-set operators.
Operator              Description
pipe (|)              Performs the union of two node-sets.
slash (/)             Separates location steps.
double-slash (//)     Abbreviation for the location path /descendant-or-self::node()/
div                   Division
=, !=, <, <=, >, >=   Comparison operators (also supported)
and                   Logical AND
or                    Logical OR
Some node-set and string functions
Function        Description
last            Returns the number of nodes in the node-set being tested (that is, the position of its last node).
position        Returns the position number of the current node in the node-set being tested.
count           Returns the number of nodes in the node-set argument.
name            Returns a string containing the name of the node in the node-set argument that is first in document order.
string-length   Returns the number of characters in the string.
starts-with     Returns true if the first argument string starts with the second argument string; otherwise returns false.
contains        Returns true if the first argument string contains the second argument string; otherwise returns false.
See http://www.w3.org/TR/xpath for more functions
The Node-test of an XPath Step
• A node-test specifies a simple test on XML nodes found along the step's axis
• Nodes that pass that test are candidates for the next step
• The node test can be based on the
  – Node name, or
  – Node kind
Node-Test Based on Names
• Each axis has a main node kind
  – the attribute:: axis has attribute as the main node kind
  – the namespace:: axis has namespace as the main node kind
  – all other axes (child::, descendant::, parent::, ...) have element as the main node kind
• Only node name tests on nodes of the main node kind can be true
• Suppose course e118 is the context node
  – descendant::sid returns (e122),
  – child::* returns (e119, e120, e121),
  – attribute::year returns (a107), whose value is 2008,
  – attribute::name returns () (an empty sequence of nodes)
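The four name tests on course e118 can be tried out with Python's standard xml.etree.ElementTree, which supports only the abbreviated syntax and a subset of XPath. The fragment below is an assumed serialization of the e118 subtree, and the ElementTree calls are close equivalents of the unabbreviated steps rather than exact translations.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the subtree rooted at course e118.
e118 = ET.fromstring(
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    '<student><sid>x0008787</sid><grade>C+</grade></student></course>'
)

descendant_sid = e118.findall(".//sid")  # plays the role of descendant::sid
children = e118.findall("*")             # child::*
year = e118.get("year")                  # attribute::year
name_attr = e118.get("name")             # attribute::name, no such attribute on a course

print([node.text for node in descendant_sid])
print([node.tag for node in children])
print(year, name_attr)
```

The last call returns None because name exists here only as a child element, and a name test on the attribute axis can match attributes only.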
Node-Test Based on the Node Kind
• The most common node-tests that are based on the node kind are:
  – node() that selects each node, regardless of the kind,
  – text() that selects each t-node,
  – element() that selects each e-node,
  – attribute() that selects each a-node, and
  – comment() that selects each comment node
  (element() and attribute() are kind tests from XPath 2.0; XPath 1.0 offers node(), text(), comment(), and processing-instruction())
• Suppose student node e121 is the context node, then
child::grade/child::text()
returns the sequence (t114) whose string value is C+ (in practice, a query processor returns only the string C+)
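As a rough check of the text() example, Python's standard xml.etree.ElementTree can fetch the same string. ElementTree does not support the text() node test in paths and has no text-node objects, so findtext is used here as the nearest equivalent; the student fragment is an assumed serialization of e121.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the subtree rooted at student e121.
e121 = ET.fromstring("<student><sid>x0008787</sid><grade>C+</grade></student>")

# Nearest ElementTree equivalent of child::grade/child::text()
grade_text = e121.findtext("grade")
print(grade_text)
```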
XPath Location Paths
• Navigation through an XML document is declared using location path expressions
• Location paths can be expressed using either an unabbreviated or an abbreviated syntax
• Location paths are made up of steps
Evaluation of a Location Path
• A location path is evaluated step by step, from left to right
• A step is applied to a single node, the so-called context node
• The application of a step on a context node selects a sequence of result nodes
• Each node of a result sequence is then used as a context node in the following step
• The result of an expression is a concatenation of the node sequences selected by the last step
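The evaluation rule above can be sketched in a few lines of Python. This is a deliberately reduced model that assumes every step uses the child:: axis with a plain name test and no predicates; the evaluate function and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def evaluate(context_node, step_names):
    """Evaluate a path of child::name steps left to right."""
    contexts = [context_node]
    for name in step_names:
        results = []
        for node in contexts:
            # Apply the step to one context node and collect its result nodes.
            results.extend(child for child in node if child.tag == name)
        # Every result node becomes a context node for the following step.
        contexts = results
    # The concatenation of the sequences selected by the last step.
    return contexts

faculty = ET.fromstring(
    "<faculty>"
    "<course><student><grade>A</grade></student><student><grade>B+</grade></student></course>"
    "<course><student><grade>C+</grade></student></course>"
    "</faculty>"
)

grades = [g.text for g in evaluate(faculty, ["course", "student", "grade"])]
print(grades)
```

Here evaluate(faculty, ["course", "student", "grade"]) plays the role of child::course/child::student/child::grade with the faculty element as the context node.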
Unabbreviated Syntax of Location Paths
• A location path has the following syntax:
Path ::= Step1/…/Stepn
where each Step is a triple (Axis, Node-test, Predicate) and is defined as follows:
Step ::= Axis::Node-test Predicate*
  – The axis specifies the direction to move in the document tree
  – The node test selects nodes along the specified axis, and
  – The predicates (if any) filter the nodes selected
• Separators "/" between two subsequent steps indicate a direct superior-subordinate relationship between the nodes involved in the steps
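To make the (Axis, Node-test, Predicate) triple concrete, the sketch below splits an unabbreviated location path into its steps with a regular expression. It is a toy parser under stated assumptions: predicates contain neither "/" nor nested brackets, and the names STEP_RE, parse_step and parse_path are hypothetical.

```python
import re

# Axis "::" node-test, followed by zero or more [predicate] groups.
STEP_RE = re.compile(r"^(?P<axis>[a-z-]+)::(?P<test>[^\[\]]+?)(?P<preds>(\[[^\]]*\])*)$")

def parse_step(step):
    """Split one unabbreviated step into (axis, node_test, predicates)."""
    match = STEP_RE.match(step)
    if match is None:
        raise ValueError(f"not an unabbreviated step: {step!r}")
    predicates = re.findall(r"\[([^\]]*)\]", match.group("preds"))
    return match.group("axis"), match.group("test"), predicates

def parse_path(path):
    """Split a location path on '/' and parse each step."""
    return [parse_step(step) for step in path.strip("/").split("/")]

steps = parse_path('child::course[attribute::cid="c2313"][position()=1]/child::student')
for axis, node_test, predicates in steps:
    print(axis, node_test, predicates)
```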
What Does an XPath Expression Return?
• A location path expression returns a sequence of result nodes with their contents in the form of an XML document
• This XML document does not have to be well formed
• XPath expression:
/child::faculty/child::course[attribute::cid="c2313"]/child::student[child::sid="x0008787"]
• Result:
<student>
  <sid>x0008787</sid>
  <grade>B+</grade>
</student>
<student>
  <sid>x0008787</sid>
  <grade>C+</grade>
</student>
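The same query can be run with Python's standard xml.etree.ElementTree, written in abbreviated form and evaluated relative to the faculty element (ElementTree does not accept the unabbreviated axis syntax). The trimmed document and the helper is_well_formed are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Trimmed, assumed version of Faculty.xml: one student per course is enough here.
faculty = ET.fromstring(
    '<faculty name="Computing">'
    '<course cid="c2313" year="2009"><student><sid>x0008787</sid><grade>B+</grade></student></course>'
    '<course cid="c2314" year="2009"><student><sid>x0008787</sid><grade>C</grade></student></course>'
    '<course cid="c2313" year="2008"><student><sid>x0008787</sid><grade>C+</grade></student></course>'
    '</faculty>'
)

# Abbreviated form of the query, with the faculty element as the context node.
matches = faculty.findall("course[@cid='c2313']/student[sid='x0008787']")
fragment = "".join(ET.tostring(node, encoding="unicode") for node in matches)

def is_well_formed(text):
    """Return True if text parses as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(fragment)
print(is_well_formed(fragment))
```

The serialized result has two top-level student elements and no single root, which is why it does not parse as a well-formed document on its own.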
Predicates of a Step
• An XPath step can also include a sequence of predicates in square brackets
[<predicate>]
• Predicates are applied to nodes selected by a node-test
• Only nodes that evaluate to true for all predicates belong to the result of a step
• A predicate compares a node property with a value using operators from the set {=, !=, <, <=, >, >=}
• A node property can be:
  – The value of an attribute,
  – The value of the #PCDATA of an element, or
  – The sibling order value of a node (returned by the function position())
Examples of XPath Predicates
• Let faculty e101 be the context node
  – child::course[position()=2] selects the second child element of the context node that has the name course, and returns e110
  – child::course[attribute::cid="c2313"] selects all course children of the context node that have the attribute cid="c2313", and returns (e102, e118)
  – descendant::student[child::sid="x000123"] selects the student descendants of the context node that have a sid child with a string value equal to "x000123", and returns (e104, e112)
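These three predicates have close equivalents in the XPath subset of Python's standard xml.etree.ElementTree. The sketch below uses an assumed serialization of the partial Faculty.xml and the abbreviated forms course[2], course[@cid='c2313'] and .//student[sid='x000123'], evaluated with the faculty element as the context node.

```python
import xml.etree.ElementTree as ET

# Assumed serialization of the partial Faculty.xml tree.
faculty = ET.fromstring("""
<faculty name="Computing">
  <course cid="c2313" year="2009"><name>ADA</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>B+</grade></student>
  </course>
  <course cid="c2314" year="2009"><name>CIS</name>
    <student><sid>x000123</sid><grade>A</grade></student>
    <student><sid>x0008787</sid><grade>C</grade></student>
  </course>
  <course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>
    <student><sid>x0008787</sid><grade>C+</grade></student>
  </course>
</faculty>
""")

second_course = faculty.findall("course[2]")               # child::course[position()=2]
c2313_courses = faculty.findall("course[@cid='c2313']")    # child::course[attribute::cid="c2313"]
students = faculty.findall(".//student[sid='x000123']")    # descendant::student[child::sid="x000123"]

print([c.findtext("name") for c in second_course])
print([c.findtext("name") for c in c2313_courses])
print([s.findtext("grade") for s in students])
```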
Abbreviated Syntax of Location Path (1)
• The most important abbreviation is that the child:: axis can be omitted from a location step
• In fact, child:: is the default axis
• For example,
  – student/sid is short for
  – child::student/child::sid
• There is also an abbreviation for attributes: attribute:: can be abbreviated to @
• For example,
  – course[@year="2009"] is short for
  – child::course[attribute::year="2009"] and will select all course children of the context node whose year is "2009" (e102, e110)
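Python's standard xml.etree.ElementTree accepts exactly these abbreviated forms, so both examples can be tried directly. The document below is a trimmed, assumed version of Faculty.xml; the first query uses the faculty element as context node and the second uses its first course.

```python
import xml.etree.ElementTree as ET

# Trimmed, assumed version of Faculty.xml.
faculty = ET.fromstring(
    "<faculty>"
    '<course cid="c2313" year="2009"><student><sid>x000123</sid></student></course>'
    '<course cid="c2314" year="2009"><student><sid>x000123</sid></student></course>'
    '<course cid="c2313" year="2008"><student><sid>x0008787</sid></student></course>'
    "</faculty>"
)

courses_2009 = faculty.findall("course[@year='2009']")  # child::course[attribute::year="2009"]
first_course = faculty.find("course")
sids = first_course.findall("student/sid")              # child::student/child::sid

print([c.get("cid") for c in courses_2009])
print([s.text for s in sids])
```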
Abbreviated Syntax of Location Path (2)
• If a predicate expression evaluates to an integer value, that value is considered to be the position of the node selected
  – For example, the step student[2] would select the second student child of the context node
• An empty step '//' is also a frequently used abbreviation; it specifies that the element that follows may be nested at any depth
  – //student would select all student nodes anywhere within the document
  – course[@cid="c2313"][@year="2008"]//grade will select all grade elements subordinated to the course element with cid="c2313" and year="2008"
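A short sketch of the positional and '//' abbreviations, again with Python's standard xml.etree.ElementTree and an assumed, trimmed Faculty.xml. ElementTree evaluates paths relative to an element, so .//student stands in for //student here.

```python
import xml.etree.ElementTree as ET

# Trimmed, assumed version of Faculty.xml.
faculty = ET.fromstring(
    "<faculty>"
    '<course cid="c2313" year="2009"><name>ADA</name>'
    "<student><sid>x000123</sid><grade>A</grade></student>"
    "<student><sid>x0008787</sid><grade>B+</grade></student></course>"
    '<course cid="c2313" year="2008"><name>IMD</name>'
    "<student><sid>x0008787</sid><grade>C+</grade></student></course>"
    "</faculty>"
)

first_course = faculty.find("course")
second_student = first_course.findall("student[2]")  # second student child of the context node
all_students = faculty.findall(".//student")         # students at any depth below faculty
grades = faculty.findall("course[@cid='c2313'][@year='2008']//grade")

print([s.findtext("sid") for s in second_student])
print(len(all_students))
print([g.text for g in grades])
```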
Abbreviated Syntax of Location Path (3)
• A location step of "." is short for self::node(), where self:: refers to the context node and node() returns nodes of any type
• For example:
  – .//student is short for
  – self::node()/descendant-or-self::node()/child::student and will select all student elements that are children of the context node itself or of any of its descendants
• A location step of .. is short for parent::node()
  – For example, ../lecturer is short for parent::node()/child::lecturer and will select all lecturer children of the parent of the context node
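Both abbreviations are available in Python's standard xml.etree.ElementTree, with one restriction worth knowing: a ".." step cannot climb above the element the search started from. The sketch therefore starts at the faculty element and uses .//student/../lecturer, which reaches each student and then steps back up to its parent course; the document is a trimmed, assumed version of Faculty.xml.

```python
import xml.etree.ElementTree as ET

# Trimmed, assumed version of Faculty.xml.
faculty = ET.fromstring(
    "<faculty>"
    '<course cid="c2314" year="2009"><name>CIS</name>'
    "<student><sid>x000123</sid></student><student><sid>x0008787</sid></student></course>"
    '<course cid="c2313" year="2008"><name>IMD</name><lecturer>Jones</lecturer>'
    "<student><sid>x0008787</sid></student></course>"
    "</faculty>"
)

last_course = faculty.findall("course")[-1]
students_below = last_course.findall(".//student")       # self::node()/descendant-or-self::node()/child::student
lecturers = faculty.findall(".//student/../lecturer")    # lecturer children of each student's parent

print(len(students_below))
print([lecturer.text for lecturer in lecturers])
```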
<?xml version = "1.0"?>

<!-- books.xml -->
<books>
   <!-- XML book list -->
   <book>
      <title>Java How to Program</title>
      <translation edition = "1">Spanish</translation>
      <translation edition = "1">Chinese</translation>
      <translation edition = "1">Japanese</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Japanese</translation>
      <price>75</price>
   </book>

   <book>
      <title>C++ How to Program</title>
      <translation edition = "1">Korean</translation>
      <translation edition = "2">French</translation>
      <translation edition = "2">Spanish</translation>
      <translation edition = "3">Italian</translation>
      <translation edition = "3">Japanese</translation>
      <price>65</price>
   </book>
</books>
Predicate Exercises for books.xml
• Examine the XPath expressions and
  – In your own words explain what will be returned
  – Execute them. Did you get it right?
1. /books/book[2]
2. /child::books/child::book[position()=2]
3. /books/book[price>70]
4. /books/book[price>70]/title/text()
5. /books/book[last()]
6. /books/book/translation[@edition="1" and text() ="Chinese"]/preceding-sibling::title/text()
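One possible way to execute some of these expressions is Python's standard xml.etree.ElementTree. It covers only part of XPath: positional and last() predicates work, but numeric comparisons such as price>70 and axes such as preceding-sibling:: do not, so exercise 3 is emulated with a Python filter and exercise 6 needs a full XPath processor. Paths are written relative to the books root element, and the helper titles is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """\
<books>
   <book>
      <title>Java How to Program</title>
      <translation edition="1">Spanish</translation>
      <translation edition="1">Chinese</translation>
      <translation edition="1">Japanese</translation>
      <translation edition="2">French</translation>
      <translation edition="2">Japanese</translation>
      <price>75</price>
   </book>
   <book>
      <title>C++ How to Program</title>
      <translation edition="1">Korean</translation>
      <translation edition="2">French</translation>
      <translation edition="2">Spanish</translation>
      <translation edition="3">Italian</translation>
      <translation edition="3">Japanese</translation>
      <price>65</price>
   </book>
</books>
"""

books = ET.fromstring(BOOKS_XML)

def titles(book_elements):
    """Return the title text of each book element, for compact printing."""
    return [book.findtext("title") for book in book_elements]

exercise_1 = books.findall("book[2]")        # /books/book[2]
exercise_5 = books.findall("book[last()]")   # /books/book[last()]
# No numeric comparison in ElementTree paths: emulate /books/book[price>70] in Python.
exercise_3 = [b for b in books.findall("book") if float(b.findtext("price")) > 70]

for label, result in [("1", exercise_1), ("3", exercise_3), ("5", exercise_5)]:
    print(label, titles(result))
```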
Write some XPATH Expressions
• Which books have Japanese translations?
  – Hint: use a predicate
    • Boolean expression for filtering nodes from the search
    • Compare the string value of the current node to the string 'Japanese'
• Find the textbook name that has a 3rd edition and an Italian translation
• What translations of the C++ How To Program textbook are on the first and second editions?
Summary
• XPath is a language for specifying navigation through an XML document
• XPath models an XML document as a tree of nodes
• A location path has the following syntax:
Path ::= Step1/…/Stepn
where each Step is a triple (Axis, Node-test, Predicate):
  – The axis specifies the direction to move in the document tree
  – The node test selects nodes along the specified axis, and
  – The predicates (if any) filter the nodes selected
• A location path can be either relative or absolute
• A relative location path is declared with regard to a context node and its evaluation starts from this node