Reading XML: Introducing XPath - Earth &...

29
Reading XML: Introducing XPath Andrew Walker

Transcript of Reading XML: Introducing XPath - Earth &...

Page 1: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Reading XML:Introducing XPath

Andrew Walker

Page 2: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Why?

• Makes writing analysis scripts for your XML encoded results easy.

• Is used by many other XML technologies. XSLT, XPointer etc.

• A Fortran XPath API is in development.

• It helps you design your XML documents.

Page 3: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

#! /usr/bin/perl -w

$num = ‘[+-]?\d+\.?\d*’;

while (<>) {

if (/temperature = ($num)/) {

$temp = $1;

} elsif (/pressure = ($num)/) {

$pres = $1;

}

}

...

Page 4: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

#! /usr/bin/perl -w

use XML::XPath;

my $xp = XML::XPath->new(filename => $ARGV[0]);

$temp = $xp -> find(“//temperature”);

$pres = $xp -> find(“//pressure”);

...

Page 5: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

/XPath/queries/are/like/paths

Page 6: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

kml

Document

Style

Style

Folder

Folder

Placemark

Placemark

Placemark

Placemark

<kml>

</kml>

<Document>

</Document>

<Style/>

<Style/>

<Folder>

<Folder>

<Placemark/>

</Folder>

</Folder>

<Placemark/>

<Placemark/>

<Placemark/>

Elements are nodes

Page 7: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

/path/to/node

path

to

node

<path><to>

<node/></to>

<path>

Page 8: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Everything is a node

path

to

node

<path><to type=’foo’>

<node>Some text</node></to>

<path>

type

Some text

Page 9: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Types of node

attributeselements

text nodesprocessing instructions

the root node

comments

namespace nodes

Page 10: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

• / -- the root node

• elementName -- an element

• @attributeName -- an attribute

• text() -- a text node

/text/to/node/text()

/path/to/node/@attribute

Page 11: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Data types - What can be used or returned?

number: -24.451

string: ‘some text in quotes’

boolean: true or false

nodeset: one or more nodes

Page 12: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

/path/to/node/text()gives string: ‘Some text’

path

to

node

<path><to type=’foo’>

<node>Some text</node></to>

<path>

type

Some text

Page 13: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

/path/to/@typegives string: ‘foo’

path

to

node

<path><to type=’foo’>

<node>Some text</node></to>

<path>

type

Some text

Page 14: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

/path/togives: a nodeset

path

to

node

<path><to type=’foo’>

<node>Some text</node></to>

<path>

type

Some text

Page 15: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

(Some) functions

count(nodeset)

namespace-uri(nodeset)

string(anytype)

normalize-space(string)

Page 16: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

count(/path/to)gives: number 2

path

to

node<path>

<to type=’foo’><node>Some text</node>

</to><to> type=’bar’>the end</to>

<path>

type

Some text

to type

the end

Page 17: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Predicates

Path matches query if predicate is true

Included in square brackets in query

/path/to[@type=’foo’]

Page 18: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

/path/to[@type=’bar’]/text()gives: string ‘the end’

path

to

node<path>

<to type=’foo’><node>Some text</node>

</to><to> type=’bar’>the end</to>

<path>

type

Some text

to type

the end

Page 19: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

XPath in python

import lxml.etree

docRoot = lxml.etree.parse(source="monty.xml")

answer = docRoot.xpath("/film/@name")

print answer

Page 20: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Namespaces

• Create a dictionary relating namespace URIs to local namespace prefix.

• Pass dictionary to XPath function.

• Include prefix before all elements separated by a colon.

• No inheritance in query!

Page 21: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

count(/ns:path/ns:to)gives: number 2

path

to

node<path xmlns=’name1’>

<to type=’foo’><node>Some text</node>

</to><to> type=’bar’>the end</to>

<path>

type

Some text

to type

the end

Page 22: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

count(/ns:path/ns2:to)gives: number 1

path

to

node<path xmlns=’name1’>

<to type=’foo’ xmlns=’name2’><node>Some text</node>

</to><to> type=’bar’>the end</to>

<path>

type

Some text

to type

the end

Page 23: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Namespaces in python

import lxml.etree

namespaces = {'q':'http://www.example.com/quotes', 'f':'http://www.example.com/films'}

docRoot = lxml.etree.parse(source="monty.xml")

answer = docRoot.xpath ("/q:film/@name", namespaces)

print answer

Page 24: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Delimiters

• / -- separates location steps

• // -- descendent or self

• .. -- parent

• . -- self

• [ ] -- predicate

• @ -- attribute

Page 25: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

count(//ns:to)gives: number 2

path

to

node<path xmlns=’name1’>

<to type=’foo’><node>Some text</node>

</to><to> type=’bar’>the end</to>

<path>

type

Some text

to type

the end

Page 26: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

//ns:to/ns:node/text()gives: string ‘Some text’

path

to

node<path xmlns=’name1’>

<to type=’foo’><node>Some text</node>

</to><to> type=’bar’>the end</to>

<path>

type

Some text

to type

the end

Page 27: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

Relative paths

import lxml.etree

docRoot = lxml.etree.parse(source="films.xml")

nodes = docRoot.xpath("//film[@type=’comedy’]")

for node in nodes:

answer = node.xpath("name/text()")

print answer

Page 28: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

What have I not covered

more functions

the axis

unusual nodes

numeric valued predicates and the document order

Page 29: Reading XML: Introducing XPath - Earth & Environmenthomepages.see.leeds.ac.uk/~earawa/FoX/iFaX/Docs/... · 2017-10-25 · Why? • Makes writing analysis scripts for your XML encoded

XPath APIs

• Java: standard API provided by javax.xml.xpath as of Java 5

• Perl: CPAN module XML::XPath

• Python: lxml.xpath http://codespeak.net/lxml

• C and bindings to other languages: libxml at http://xmlsoft.org/