Overview of XPath
Author: Dan McCreary
Date: October, 2008
Version: 0.2 with TEI Examples
Copyright 2008 Dan McCreary & Associates
Objectives
• Provide a short overview of XPath
• Describe why knowledge of XPath is important for:
– Business analysts
– Project managers
– Report developers
– Programmers
• Demonstrate XPath using a TEI document
Definition of XPath
• XPath: "A language for addressing parts of an XML document"
• Similar to a DOS or UNIX "file system path" but with powerful expressions
• XPath is to XML roughly what the SQL "select" statement is to relational databases
• XPath by itself is neither a full programming language nor a complete query language.
XPath-Related Standards
• XSLT – XPath tells XSLT templates which nodes to match and select
• XLink – similar to HTML <a> links, but more powerful
• XPointer – a standard way of identifying document fragments
• XQuery – a newer, more comprehensive standard that includes XPath 2.0 and supports more complex searches and data types, including searches of relational databases. Includes the clauses "for, let, where, order by, return"
Versions
• Version 1.0
– W3C Recommendation, November 16, 1999
– http://www.w3.org/TR/xpath
• Version 2.0
– W3C Recommendation, January 23, 2007
– http://www.w3.org/TR/xpath20/
Other Familiar Path Names
• DOS
– C:\Program Files\Altova\XMLSPY2004\Examples\Tutorial
• Web
– http://www.google.com/search?hl=en&lr=lang_en&&q=XPath
• Unix
– /usr/local/lib/mylib/myprogram.jar
• Similarities
– Absolute paths start with "/" (or with a drive letter in DOS)
– Relative paths do not start with "/"
XPath Has Relative Expressions
[Diagram: the XPath axes relative to the context node (self): parent, ancestor, child, descendant, preceding-sibling, following-sibling, preceding, following]
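The axes in the diagram can be tried out in Python. One caveat: the standard-library xml.etree.ElementTree module implements only a small XPath subset (child steps, //, . and ..), so the ancestor and sibling axes below are computed by hand. This is a minimal sketch; the sample document and the function names are hypothetical, and the parent map is needed because ElementTree elements do not keep links to their parents.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample document (not taken from the slides).
DOC = """<body>
  <div type="compilation">
    <head>China</head>
    <pb/>
    <div type="document"><head>Memo 1</head></div>
    <pb/>
    <div type="document"><head>Memo 2</head></div>
  </div>
</body>"""

root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so map every child to its parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestor_axis(node):
    """Ancestors of node, nearest first (ancestor:: is a reverse axis)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result

def preceding_sibling_axis(node):
    """Siblings before node, nearest first (reverse document order)."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return list(reversed(siblings[:siblings.index(node)]))

def following_sibling_axis(node):
    """Siblings after node, in document order."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]

# Context node ("self"): the first <div type="document">.
ctx = root.find(".//div[@type='document']")
print([e.tag for e in ancestor_axis(ctx)])           # ['div', 'body']
print([e.tag for e in preceding_sibling_axis(ctx)])  # ['pb', 'head']
print([e.tag for e in following_sibling_axis(ctx)])  # ['pb', 'div']
```

Full XPath engines provide these axes directly (for example ancestor::div or following-sibling::pb); the hand-written versions only show what each axis selects.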
Location Steps
• An XPath location path contains one or more "location steps", separated by slashes. Each location step has the following unabbreviated form:
– axis-name::node-test[predicate]*
• The abbreviated form omits the axis name and uses the child axis by default. It is more commonly used, so it is the form used in the examples that follow:
– node-test[predicate]*
• For example, /TEI/teiHeader is short for /child::TEI/child::teiHeader
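To make step-by-step evaluation concrete, here is a minimal hand-written evaluator. It is a sketch under stated assumptions, not a full XPath engine: it handles only absolute paths, the child axis, name tests (or *), and predicates of the form [@name='value']; the function names and the sample document are hypothetical. Each step is applied to every node produced by the previous step, and a step without an axis name is read as child::.

```python
import re
import xml.etree.ElementTree as ET

# One location step: optional "axis::", a node test, then zero or more predicates.
STEP = re.compile(r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[\w*]+)(?P<preds>(?:\[[^\]]*\])*)$")
# The only predicate shape this sketch understands: [@name='value'].
PRED = re.compile(r"\[@(\w+)='([^']*)'\]")

def parse_step(step):
    """Split one step into (axis, node test, [(attribute, value), ...])."""
    m = STEP.match(step)
    if m is None or PRED.sub("", m.group("preds")) != "":
        raise ValueError(f"unsupported step: {step!r}")
    axis = m.group("axis") or "child"  # a step without an axis uses the child axis
    return axis, m.group("test"), PRED.findall(m.group("preds"))

def evaluate(root, path):
    """Evaluate an absolute, child-axis-only location path against root."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    document = ET.Element("document-node")  # stand-in for the XPath document node
    document.append(root)
    context = [document]
    for step in path[1:].split("/"):  # naive split: values must not contain "/"
        axis, test, preds = parse_step(step)
        if axis != "child":
            raise ValueError(f"axis {axis!r} is not handled in this sketch")
        # Apply the step to every node selected by the previous step.
        context = [
            child
            for node in context
            for child in node
            if (test == "*" or child.tag == test)
            and all(child.get(name) == value for name, value in preds)
        ]
    return context

DOC = """<TEI><teiHeader/><text><body>
  <div type="compilation"><head>China</head>
    <div type="document"><head>Memo 1</head></div>
  </div>
</body></text></TEI>"""
root = ET.fromstring(DOC)

short = evaluate(root, "/TEI/text/body/div/div[@type='document']/head")
full = evaluate(root, "/child::TEI/child::text/child::body/child::div"
                      "/child::div[@type='document']/child::head")
print([h.text for h in short])  # ['Memo 1']
print(short == full)            # True
```

Because both spellings parse to the same (axis, node test, predicates) triples, the abbreviated and unabbreviated paths select the same nodes.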
XPath Also Provides
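• XPath has functions for manipulating:
– Nodes and sets of nodes (selected with name tests and kind tests; functions such as count(), position(), last())
– Strings (substring(), contains(), etc.)
– Numbers (sum(), round(), etc.)
– Booleans (not(), true(), false(), used with the and/or operators)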
Disclaimer
• The example was chosen for teaching purposes only
• It does not necessarily reflect the full structure of an actual TEI document
oXygen XPath Builder
Perspective > Show View > XPath Builder
Sample XML Document
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:frus="http://history.state.gov/2008/frus">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title type="complete">Foreign Relations of the United States, 1969-1976, Volume E-13: Documents on China, 1969-1972</title>
        <title type="series">Foreign Relations of the United States</title>
        <title type="subseries">1969-1976</title>
        <title type="volumenumber">Volume E-13</title>
        <title type="volume">Documents on China, 1969-1972</title>
        <editor role="primary">Steven E. Phillips</editor>
        <editor role="general">Edward C. Keefer</editor>
      </titleStmt>
      <!-- rest of the fileDesc omitted -->
    </fileDesc>
  </teiHeader>
  <!-- text element omitted -->
</TEI>
TEI Tree
TEI (in the TEI namespace)
   teiHeader
   text
      front
      body
         div type="compilation"
            head
            pb
            div type="document"
            pb
            div type="document"
90/10 rule for XPath
• 90% of the time your XPath expressions will be very simple.
• They will simply select subsets of the data in the document.
• They will use the default child axis.
Sample XPath Child Expressions
• Select the document (root) node of any document
– /
• Find the root element
– /TEI
• Find the TEI header
– /TEI/teiHeader
• Find the text body
– /TEI/text/body
• Find the text body headers
– /TEI/text/body/div/head
• Find the text body subheaders
– /TEI/text/body/div/div/head
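The child expressions above can be tried with Python's standard-library xml.etree.ElementTree, which supports a limited subset of XPath. This is a minimal sketch, assuming a hypothetical, heavily trimmed TEI document and Python 3.8 or later. Two adjustments are needed: ElementTree paths are evaluated relative to an element, so /TEI/... is emulated by starting at the root element; and because the TEI namespace is declared as the default namespace, unprefixed names only match if the evaluator treats it as the default element namespace (here via the '' key of the namespaces mapping). The subheader path is written as div/div/head, since in this sample the nested divs are children of the outer div, not of its head.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed TEI document; the namespace matches the sample slide.
TEI_NS = "http://www.tei-c.org/ns/1.0"
DOC = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <text>
    <body>
      <div type="compilation">
        <head>Documents on China</head>
        <div type="document"><head>Memorandum 1</head></div>
        <div type="document"><head>Memorandum 2</head></div>
      </div>
    </body>
  </text>
</TEI>"""

# The '' key makes unprefixed names in paths refer to the TEI namespace (Python 3.8+).
NS = {"": TEI_NS}
root = ET.fromstring(DOC)  # the root element, i.e. what /TEI selects

# Paths are relative to the element they are called on, so /TEI/teiHeader
# becomes root.find("teiHeader", NS), and so on.
header = root.find("teiHeader", NS)                      # /TEI/teiHeader
body = root.find("text/body", NS)                        # /TEI/text/body
heads = root.findall("text/body/div/head", NS)           # /TEI/text/body/div/head
subheads = root.findall("text/body/div/div/head", NS)    # /TEI/text/body/div/div/head

print(root.tag)                    # {http://www.tei-c.org/ns/1.0}TEI
print([h.text for h in heads])     # ['Documents on China']
print([h.text for h in subheads])  # ['Memorandum 1', 'Memorandum 2']
```

Without the namespace mapping, root.find("teiHeader") would return None, which is the most common surprise when running these expressions against real TEI files.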
Selecting Attributes
• To select an attribute, prefix its name with the @ symbol (short for the attribute:: axis)
• Example: select only the divs that have the attribute type="document"
– //div[@type="document"]
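Attribute predicates can be sketched with ElementTree as well, here on a trimmed copy of the titleStmt from the sample document (Python 3.8 or later for the '' namespace key). ElementTree supports [@name] and [@name='value'] predicates but cannot return attribute nodes themselves, so this sketch reads attribute values with Element.get().

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the titleStmt from the sample document, closed so it parses.
DOC = """<titleStmt xmlns="http://www.tei-c.org/ns/1.0">
  <title type="series">Foreign Relations of the United States</title>
  <title type="subseries">1969-1976</title>
  <title type="volumenumber">Volume E-13</title>
  <editor role="primary">Steven E. Phillips</editor>
  <editor role="general">Edward C. Keefer</editor>
</titleStmt>"""
NS = {"": "http://www.tei-c.org/ns/1.0"}
stmt = ET.fromstring(DOC)

# Predicate on an attribute value, as in //title[@type="series"]
series = stmt.find("title[@type='series']", NS)
print(series.text)  # Foreign Relations of the United States

# Predicate on attribute presence only, as in //editor[@role]
editors = stmt.findall("editor[@role]", NS)

# ElementTree cannot return attribute nodes (e.g. //editor/@role),
# so the values are read from the selected elements instead.
print([e.get("role") for e in editors])  # ['primary', 'general']
```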
Getting Data with //
• Get all the person names in the document
– //persName
• Get all the dates in the document
– //date
• Get all the notes of type summary
– //note[@type='summary']
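The // searches can be sketched the same way. In ElementTree, a path starting with .// searches all descendants of the element it is called on; since the root element here is TEI, that gives the same result as // from the document node. The fragment below, including its person names and dates, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment (invented names and dates, for illustration only).
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div type="document">
    <note type="summary">Summary of a meeting.</note>
    <p>Meeting with <persName>Person A</persName> on <date>1969-02-01</date>.</p>
    <note type="source">Source note.</note>
  </div>
  <div type="document">
    <p><persName>Person B</persName> replied on <date>1969-02-05</date>.</p>
    <note type="summary">Summary of a reply.</note>
  </div>
</body></text></TEI>"""
NS = {"": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(DOC)

# Each search returns matches at any depth, in document order.
names = root.findall(".//persName", NS)                 # //persName
dates = root.findall(".//date", NS)                     # //date
summaries = root.findall(".//note[@type='summary']", NS)  # //note[@type='summary']

print([n.text for n in names])      # ['Person A', 'Person B']
print([d.text for d in dates])      # ['1969-02-01', '1969-02-05']
print([s.text for s in summaries])  # ['Summary of a meeting.', 'Summary of a reply.']
```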
Sample Counts
• Count the paragraphs in the document
– count(//p)
• Count the divs
– count(//div)
• Count the person names
– count(//persName)
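ElementTree's path language has no count() function, so in this sketch len() of the selected list stands in for it. The fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment used only to illustrate counting.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div type="compilation">
    <div type="document"><p>One.</p><p>Two, from <persName>Person A</persName>.</p></div>
    <div type="document"><p>Three.</p></div>
  </div>
</body></text></TEI>"""
NS = {"": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(DOC)

# len() of the selected elements plays the role of XPath's count().
paragraphs = len(root.findall(".//p", NS))           # count(//p)
divs = len(root.findall(".//div", NS))               # count(//div), nested divs included
person_names = len(root.findall(".//persName", NS))  # count(//persName)

print(paragraphs, divs, person_names)  # 3 3 1
```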
What’s the Context?
• XPath evaluates expressions relative to a context.
– The context is usually specified by one of the technologies that extend XPath, such as XSLT and XPointer.
– It includes the context node, context size, context position, etc.
• In XSLT, the context node is the node currently being processed.
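The idea of a context node can be illustrated with ElementTree, where the element a search method is called on acts as the context for a relative path. This is a minimal sketch with a hypothetical, namespace-free fragment; the loop plays roughly the role that xsl:for-each plays in XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment to keep the paths short.
DOC = """<body>
  <div type="document" n="1"><head>Memo 1</head><p>First.</p></div>
  <div type="document" n="2"><head>Memo 2</head><p>Second.</p></div>
</body>"""
body = ET.fromstring(DOC)

# Each div in turn becomes the context node; the same relative path "head"
# gives a different answer for each one.
for div in body.findall("div"):
    print(div.get("n"), div.findtext("head"))
# 1 Memo 1
# 2 Memo 2

# "." is the context node itself; ".." steps back up to the parent, although
# ElementTree only allows this within the subtree the search started from.
first = body.find("div")
print(first.find(".") is first)           # True
print(body.find("div/head/..") is first)  # True
```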
Sample XPath Syntax
• Test for equality – node='value'
• [ ] – predicate, e.g. first in sequence: [1]
• last() – last in sequence, e.g. [last()]
• * – wildcard (any element)
• @ – attributes
• // – descendants at any depth, not just children (short for /descendant-or-self::node()/)
• . – the context node itself
• .. – the parent of the context node
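Most of the syntax above falls within the XPath subset that ElementTree implements, so it can be tried directly. This is a sketch assuming a hypothetical namespace-free fragment, which keeps the positional predicates simple. One difference from full XPath: ElementTree cannot select an attribute node such as //pb/@n, so the value is read with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment.
DOC = """<body>
  <div type="document"><head>Memo 1</head><pb n="2"/><p>First.</p></div>
  <div type="document"><head>Memo 2</head><p>Second.</p></div>
  <div type="index"><head>Index</head></div>
</body>"""
body = ET.fromstring(DOC)

print(body.find("div[1]/head").text)              # div[1]/head -> Memo 1
print(body.find("div[last()]/head").text)         # div[last()]/head -> Index
print([e.tag for e in body.findall("div[1]/*")])  # div[1]/* -> ['head', 'pb', 'p']
print(body.find(".//pb").get("n"))                # //pb/@n -> 2
print(body.find("div[head='Memo 2']/p").text)     # equality test -> Second.
```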
XPath String Operations
concat
starts-with
ends-with
contains
substring
string-length
substring-before
substring-after
normalize-space
normalize-unicode
upper-case
lower-case
translate
matches
replace
tokenize
encode-for-uri
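Several functions in this list, such as upper-case(), matches() and tokenize(), exist only in XPath 2.0, while normalize-space(), translate(), substring-before() and substring-after() are also part of XPath 1.0. To illustrate the semantics of those four, here is a Python re-implementation; the Python function names are hypothetical, and this is a sketch of the specified behavior, not a library API.

```python
import re

def normalize_space(s: str) -> str:
    """normalize-space(): strip, then collapse each run of XML whitespace to one space."""
    # XML whitespace is only space, tab, CR and LF; str.split() would also split
    # on other Unicode spaces, so an explicit character class is used instead.
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

def translate(s: str, src: str, dst: str) -> str:
    """translate(): map src[i] to dst[i]; src characters beyond len(dst) are deleted."""
    table = {}
    for i, ch in enumerate(src):
        if ch not in table:  # if a character repeats in src, its first position wins
            table[ch] = dst[i] if i < len(dst) else ""
    return "".join(table.get(ch, ch) for ch in s)

def substring_before(s: str, t: str) -> str:
    """substring-before(): text before the first t, or "" if t does not occur."""
    i = s.find(t)
    return s[:i] if i >= 0 else ""

def substring_after(s: str, t: str) -> str:
    """substring-after(): text after the first t, or "" if t does not occur."""
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""

print(normalize_space("  Foreign   Relations\n of the United States "))
# Foreign Relations of the United States
print(translate("Volume E-13", "-", ""))          # Volume E13
print(translate("bar", "abc", "ABC"))             # BAr
print(substring_before("1969-1976", "-"))         # 1969
print(substring_after("1969-1976", "-"))          # 1976
print(repr(substring_after("1969-1976", "/")))    # ''
```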
Boolean Operators
• boolean(1 and 1) returns true
• (1 and 0) returns false
• (1 or 0) returns true
• (1 and 1 and (1 or 0)) returns true
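The examples above rely on XPath's conversion of values to booleans. Here is a minimal Python sketch of the XPath 1.0 boolean() rules, with hypothetical function names and node-sets modelled as Python lists: numbers are true unless zero or NaN, strings are true unless empty, and node-sets are true unless empty. This sketch evaluates both operands eagerly, so it does not model XPath's rule that the right operand of and/or is skipped when the left one already decides the result.

```python
import math

def xpath_boolean(value) -> bool:
    """Sketch of the XPath 1.0 boolean() conversion rules."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is zero (positive or negative) or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # A string is true if its length is non-zero, so "0" is true.
        return len(value) > 0
    if isinstance(value, list):
        # A node-set (modelled here as a list) is true if it is non-empty.
        return len(value) > 0
    raise TypeError(f"no XPath boolean conversion for {type(value).__name__}")

def xpath_and(a, b) -> bool:
    return xpath_boolean(a) and xpath_boolean(b)

def xpath_or(a, b) -> bool:
    return xpath_boolean(a) or xpath_boolean(b)

print(xpath_and(1, 1))                             # True   (1 and 1)
print(xpath_and(1, 0))                             # False  (1 and 0)
print(xpath_or(1, 0))                              # True   (1 or 0)
print(xpath_and(xpath_and(1, 1), xpath_or(1, 0)))  # True   (1 and 1 and (1 or 0))
print(xpath_boolean("0"), xpath_boolean(""))       # True False
```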
Contains()
• Returns true if the first string passed contains the second string passed
• Example: select the paragraphs whose text contains "President"
– //p[contains(.,'President')]
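ElementTree's path language has no contains(), so this sketch selects the paragraphs with a path and applies the test in Python. It assumes a hypothetical fragment; string_value() is a hypothetical helper that mimics the XPath string value of an element (the concatenation of all its descendant text), which is what . means inside the predicate. Like contains(), the test is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI fragment for illustration only.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <p>The <hi>President</hi> met the delegation.</p>
  <p>The Secretary replied.</p>
  <p>Notes for the President's file.</p>
</body></text></TEI>"""
NS = {"": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(DOC)

def string_value(elem):
    """XPath-style string value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())

# Equivalent of //p[contains(., 'President')]
matches = [p for p in root.findall(".//p", NS) if "President" in string_value(p)]
print(len(matches))              # 2
print(string_value(matches[0]))  # The President met the delegation.
```

Using the full string value rather than p.text matters here: in the first paragraph the word sits inside a nested hi element, so p.text alone is just "The ".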
Benefits of XPath
• Business analysts
– Precise specification of constraints, support for graphical representation of rules, easy to change.
• Project managers
– Shorter development cycles; business logic is easier to change.
• Report developers
– A pattern-matching language for routing documents based on rules; platform and language neutral.
• Programmers
– Functions for manipulating nodes, sets of nodes, strings, numbers, and booleans. Regular-expression support (in XPath 2.0) and a formal data model. Good for mapping and transforming documents from one type to another.
Thank You!
Please contact me for more information:
• XML Development
• Metadata Management Services
• Web Services
• Service Oriented Architectures
• Business Intelligence and Data Warehouse
• Metadata Registries
• Semantic Web
Dan McCreary, President
Dan McCreary & Associates
Metadata Strategy
(952) 931-9198