Tutorial: Introduction to XML and Java: XML, dom4j and XPath

Eran Toch

Methodologies in the Development of Information Systems

December 2003

Sources

• Major sources:
  – http://www.cis.upenn.edu/~cis550/slides/xml.ppt (CIS550 Course Notes, U. Penn, source for many slides)
  – http://www.cs.technion.ac.il/~oshmu/236804 - Seminar in Computer Science 4: XML - Technology, Systems and Theory
  – http://dom4j.org

Agenda

• Short Introduction to XML
  – What is XML
  – Structure and Terminology
  – Java APIs for XML: an Overview
• dom4j
  – Parsing an XML document
  – Writing to an XML document
• XPath
  – XPath Queries
  – XPath in dom4j
• References

The Structure of XML

• XML consists of tags and text

• Tags come in pairs <date> ...</date>

• They must be properly nested

<date> <day> ... </day> ... </date> --- good

<date> <day> ... </date>... </day> --- bad
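The nesting rule above can be checked mechanically. The following is a minimal sketch that uses the JDK's built-in JAXP parser rather than dom4j, so that it runs without extra jars; the class name NestingCheck and the helper isWellFormed are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: asks the JDK parser whether a string is well-formed XML.
class NestingCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser setup or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<date><day>12</day></date>")); // true: properly nested
        System.out.println(isWellFormed("<date><day>12</date></day>")); // false: tags cross
    }
}
```

The second string is rejected because the closing date tag arrives while day is still open.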

XML text

• XML has only one “basic” type: text. It is bounded by tags, e.g.

<title> The Big Sleep </title>
<year> 1935 </year>    --- 1935 is still text

• XML text is called PCDATA (parsed character data). It is Unicode text; a character can also be written as a numeric character reference, e.g. &#x05DE; for the Hebrew letter Mem.
• Later we shall see how new types can be specified, e.g. with XML Schema.

XML structure

• Nested tags can be used to express various structures, e.g. a tuple (record):

<person>
  <name> Jeff Cohen</name>
  <tel> 04-828-1345 </tel>
  <tel> 054-470-778 </tel>
  <email> [email protected] </email>
</person>

XML structure (cont.)

• We can represent a list by using the same tag repeatedly:

<addresses>
  <person> ... </person>
  <person> ... </person>
  <person> ... </person>
  ...
</addresses>

XML structure (cont.)

• Nested tags can be part of a list too:

<addresses>
  <person>
    <name> Yossi Orr</name>
    <tel> 04-828-1345 </tel>
    <email> [email protected] </email>
  </person>
  <person>
    <name> Irma Levy</name>
    <tel> 03-426-1142 </tel>
    <email>[email protected]</email>
  </person>
</addresses>

Terminology

• The segment of an XML document between an opening tag and its corresponding closing tag is called an element.
• Metadata about an element can appear in an attribute.

<person type="Friend">
  <name>Ortal Derech</name>
  <tel>04-8732122</tel>
  <tel>054-646888</tel>
  <email>[email protected]</email>
</person>

Callouts in the example: element (person); element, a sub-element of person (e.g. name); attribute (type); text (e.g. Ortal Derech).
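The three terms map directly onto API calls. A minimal sketch, assuming the JDK's built-in DOM parser (not dom4j) and a shortened copy of the person record; the class name TerminologyDemo and the helper root are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: reads an element, an attribute and text from one record.
class TerminologyDemo {

    static final String XML =
        "<person type=\"Friend\"><name>Ortal Derech</name>"
        + "<tel>04-8732122</tel><tel>054-646888</tel></person>";

    // Parses XML and returns the root element (person).
    static Element root() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element person = root();
        System.out.println(person.getTagName());          // element name: person
        System.out.println(person.getAttribute("type"));  // attribute value: Friend
        Element name = (Element) person.getElementsByTagName("name").item(0);
        System.out.println(name.getTextContent());        // text: Ortal Derech
        System.out.println(person.getElementsByTagName("tel").getLength()); // 2 tel sub-elements
    }
}
```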

XML is tree-like

person
  name: Malcolm Atchison
  tel: (215) 898 4321
  tel: (215) 898 4321
  email: [email protected]

A Complete XML Document

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE addresses SYSTEM "http://www.technion.ac.il/~erant/addresses.dtd">
<addresses>
  <person>
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
    <email> [email protected] </email>
  </person>
</addresses>

The standalone attribute tells whether or not this document references an external entity or an external data type specification.

XML Structure Definitions

• DTD
  – Document Type Definition: defines structure constraints for XML documents.
• XML Schema
  – Serves the same purpose as a DTD, but is more powerful: it includes facilities to specify the data type of elements, and it is itself based on XML.
• Namespaces
  – A way of preventing name clashes among elements from more than one source within the same XML document.

More Standards

• XPath
  – XML Path Language, a language for locating parts of an XML document.
• XQuery
  – A query language for XML documents (like SQL…).
• XSLT
  – XSL Transformations, a language for transforming XML documents into other XML documents.
• RDF
  – Resource Description Framework, a formal knowledge model from the World Wide Web Consortium (W3C).

Why Is XML Important?

• Because it exists, and everybody uses it.

• Plain Text - you can create and edit files with anything.

• Data Identification - XML tells you what kind of data you have, not how to display it.

• Separation from style.

• Hierarchical, and easily processed.

An Overview of the APIs

• JAXP: Java API for XML Processing
  – Provides a common interface for creating and using the standard SAX, DOM, and XSLT APIs.
• JAXB: Java Architecture for XML Binding
  – Defines a mechanism for writing out Java objects as XML.
• JDOM
  – Represents an XML file as a tree of objects (sophisticated version of DOM).
• dom4j
  – Lightweight version of JDOM.

dom4j

• An Open Source XML framework for Java.

• Allows you to read, write, navigate, create and modify XML documents.

• Integrates with DOM and SAX.

• Full XPath support.

• XSLT Support.

Download and Use

• Go to: http://dom4j.org.

• Go to http://dom4j.org/download.html, and download the latest release (current = 1.4).

• Unzip.

• Don’t forget the classpath. When working in an IDE, don’t forget to add the log4j.jar library.

• Javadoc: http://dom4j.org/apidocs/index.html.

• Quick start guide: http://dom4j.org/guide.html.

Opening an XML Document

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader; // SAXReader lives in org.dom4j.io, not org.dom4j

public class Foo {

    // Parses the XML document identified by id (a file name or URL).
    public Document parse(String id) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = reader.read(id);
        return document;
    }
}

We can read from a file, a URL, an InputStream or a String.

Example XML File

<?xml version="1.0" encoding="UTF-8" ?>
<salesdata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C:\Documents and Settings\eran\My Documents\Academic\Courses\XML\xpath_ass_schema.xsd">
  <year>
    <theyear>1997</theyear>
    <region><name>central</name><sales unit="millions">34</sales></region>
    <region><name>east</name><sales unit="millions">34</sales></region>
    <region><name>west</name><sales unit="millions">32</sales></region>
  </year>
  <year>
    <theyear>1998</theyear>
    <region><name>east</name><sales unit="millions">35</sales></region>
    <region><name>west</name><sales unit="millions">42</sales></region>
  </year>
</salesdata>

Accessing XML Elements

// A method of Foo; also needs: import java.util.Iterator; import org.dom4j.Element;
public void dump(Document document) throws DocumentException {
    // Accessing the root element
    Element root = document.getRootElement();
    // Retrieving the child elements
    for (Iterator i = root.elementIterator(); i.hasNext(); ) {
        Element element = (Element) i.next();
        // Retrieving the element name
        System.out.println(element.getQualifiedName());
        // Retrieving the element text
        System.out.println(element.getTextTrim());
        // Retrieving the text of the child element "theyear"
        System.out.println(element.elementText("theyear"));
    }
}

Accessing XML Elements – cont’d

• What will be the output of dump()?

year
(empty line)
1997
year
(empty line)
1998

Why?
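One way to see the answer: the root has only two child elements (the year elements), and a year element contains no text of its own, only whitespace between its children, so its trimmed text is empty. The sketch below reproduces the effect with the JDK's built-in DOM parser instead of dom4j; the class DumpDemo, the helper directText and the shortened sample data are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo: direct text of an element versus text of its children.
class DumpDemo {

    static final String XML =
        "<salesdata>\n"
        + "  <year>\n"
        + "    <theyear>1997</theyear>\n"
        + "    <region><name>west</name><sales unit=\"millions\">32</sales></region>\n"
        + "  </year>\n"
        + "  <year>\n"
        + "    <theyear>1998</theyear>\n"
        + "    <region><name>west</name><sales unit=\"millions\">42</sales></region>\n"
        + "  </year>\n"
        + "</salesdata>";

    // Text that sits directly inside the element, ignoring text of nested elements.
    static String directText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Collects, for each child element of the root: name, direct text, text of "theyear".
    static List<String> dump() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            List<String> lines = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip the whitespace between elements
                }
                Element year = (Element) n;
                lines.add(year.getTagName());
                lines.add(directText(year));
                lines.add(year.getElementsByTagName("theyear").item(0).getTextContent());
            }
            return lines;
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(dump()); // [year, , 1997, year, , 1998]
    }
}
```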

Accessing XML Elements Recursively

public void go(Element element, int depth) {
    // Indent according to the depth in the tree
    for (int d = 0; d < depth; d++) {
        System.out.print(" ");
    }
    System.out.print(element.getQualifiedName());
    System.out.println(" " + element.getTextTrim());
    // Visit the child elements recursively
    for (Iterator i = element.elementIterator(); i.hasNext(); ) {
        Element son = (Element) i.next();
        go(son, depth + 1);
    }
}

What will be the output?

Accessing Recursively – cont’d

salesdata
 year
  theyear 1997
  region
   name central
   sales 34
  region
   name east
   sales 34
  region
   name west
   sales 32
 year
  theyear 1998
  region
   name east
   sales 35
  region
   name west
   sales 42

The whole XML tree, element names + values
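The same depth-first walk can be sketched without dom4j. The version below is an illustration only: it uses the JDK's org.w3c.dom API, collects the lines into a string instead of printing them, and works on a shortened sample; the class TreeWalk and the method walk are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo: recursive traversal that prints element names and their own text.
class TreeWalk {

    static final String XML =
        "<salesdata><year><theyear>1997</theyear>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region>"
        + "</year></salesdata>";

    // Appends one line per element: indentation, name, a space, the element's own text.
    static void go(Element element, int depth, StringBuilder out) {
        for (int d = 0; d < depth; d++) {
            out.append(' ');
        }
        out.append(element.getTagName());
        StringBuilder own = new StringBuilder();
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                own.append(n.getNodeValue());
            }
        }
        out.append(' ').append(own.toString().trim()).append('\n');
        // Recurse into the child elements, one level deeper.
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                go((Element) n, depth + 1, out);
            }
        }
    }

    static String walk() {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
            StringBuilder out = new StringBuilder();
            go(root, 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(walk());
    }
}
```

Elements that only contain other elements (salesdata, year, region) print their name followed by an empty text, which is why only the leaves show values.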

Creating an XML document

public Document createDocument() {
    Document document = DocumentHelper.createDocument();
    // Creating the root element
    Element root = document.addElement("phonebook");

    // Adding elements
    Element address1 = root.addElement("address")
        .addAttribute("name", "Yuval")
        .addAttribute("category", "family")
        .addText("Ehud 3, Jerusalem");

    Element address2 = root.addElement("address")
        .addAttribute("name", "Ortal")
        .addAttribute("category", "friends")
        .addText("Kibbutz Givaat Haim");
    return document;
}

What will we get when running go()?

Creating an XML document – cont’d

phonebook
 address Ehud 3, Jerusalem
 address Kibbutz Givaat Haim

XML tree structure of the new document

// Writing the XML document to a file
FileWriter out = new FileWriter("C:\\addresses.xml");
document.write(out);
out.close(); // flush the writer and release the file

// Retrieving the XML itself as a string
String xml = document.asXML();
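For comparison, here is a rough sketch of the same build-and-serialize steps using only the JDK (org.w3c.dom plus a javax.xml.transform identity transformer) rather than dom4j. It writes to a string instead of a file so that it has no side effects; the class name PhonebookWriter and the method build are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo: create a small document in memory and serialize it.
class PhonebookWriter {

    static String build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

            // Creating the root element
            Element root = doc.createElement("phonebook");
            doc.appendChild(root);

            // Adding an element with two attributes and some text
            Element address = doc.createElement("address");
            address.setAttribute("name", "Yuval");
            address.setAttribute("category", "family");
            address.setTextContent("Ehud 3, Jerusalem");
            root.appendChild(address);

            // Serializing: an identity transform from the DOM tree to a character stream
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

The dom4j version is noticeably shorter because addElement, addAttribute and addText can be chained; the DOM API needs a separate call for each step.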

Client Program

// Also needs: import java.io.FileWriter;
public static void main(String[] args) {
    Foo foo = new Foo();
    try {
        // Opening the file
        Document doc = foo.parse("C:\\Documents and Settings\\eran\\My Documents\\Academic\\Courses\\XML\\sales.xml");
        // Dumping and printing recursively
        foo.dump(doc);
        foo.go(doc.getRootElement(), 0);
        foo.xpath(doc);
        // Creating a new document
        Document newDoc = foo.createDocument();
        foo.go(newDoc.getRootElement(), 0);
        FileWriter out = new FileWriter("C:\\addresses.xml");
        newDoc.write(out);
        out.close(); // flush the writer and release the file
    } catch (Exception e) {
        System.out.println(e);
    }
}

XPath - Introduction

• XML Path Language: XPath is a language for addressing parts of an XML document.
• Enables locating and retrieving nodes, very much like accessing directories in a file system.
• Limited (but not bad) filtering and querying abilities.
• Retrieves the actual PCDATA or node sets.

XPath – Simple Path Selection

XPath expression: /salesdata/year/theyear
(“/” signifies child-of)

<theyear>1997</theyear>
<theyear>1998</theyear>

XPath expression: /salesdata/year[2]/theyear
(Filtering the level: getting only the second year element)

<theyear>1998</theyear>
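These two selections can be tried out with a few lines of code. The sketch below assumes the JDK's javax.xml.xpath package instead of dom4j, and a copy of the example file typed inline without the schema attributes; the class SimplePathDemo and the helper select are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: evaluates an XPath expression and returns the text of each selected node.
class SimplePathDemo {

    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/salesdata/year/theyear"));    // [1997, 1998]
        System.out.println(select("/salesdata/year[2]/theyear")); // [1998]
    }
}
```

Note that positions in XPath predicates are 1-based, so year[2] is the second year element.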

XPath – Conditions

/salesdata/year/region[sales > 34]

Going down to region, and filtering according to the sales element:

<region>
  <name>east</name>
  <sales unit="millions">35</sales>
</region>
<region>
  <name>west</name>
  <sales unit="millions">42</sales>
</region>

/salesdata/year/region[sales > 34]/name

?
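The open question can be answered by running the expression. A minimal sketch, again assuming the JDK's javax.xml.xpath package rather than dom4j and an inline copy of the example data; the class ConditionDemo and the helper select are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: a predicate filters regions, then the path continues to name.
class ConditionDemo {

    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only regions whose sales value is strictly greater than 34 survive the predicate.
        System.out.println(select("/salesdata/year/region[sales > 34]/name")); // [east, west]
    }
}
```

The result is the two name elements of the 1998 regions: the 1997 values (34, 34, 32) all fail the test sales > 34.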

XPath – Traveling Up the Tree

/salesdata/year/region[sales > 34]/parent::year/theyear

Going up the XML tree (and then down again):

<theyear>1998</theyear>
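A short sketch of the parent axis in action, assuming the JDK's javax.xml.xpath package rather than dom4j and an inline copy of the example data; the class ParentAxisDemo and the helper select are hypothetical. It also tries the abbreviated form "..", which XPath defines as shorthand for parent::node().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: stepping from a region back up to its year.
class ParentAxisDemo {

    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two regions match, but both share one parent, so the year appears once.
        System.out.println(select("/salesdata/year/region[sales > 34]/parent::year/theyear")); // [1998]
        System.out.println(select("/salesdata/year/region[sales > 34]/../theyear"));           // [1998]
    }
}
```

The result is a node set, so the 1998 year is not duplicated even though two of its regions satisfy the predicate.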

XPath – Traveling Down Fast

/descendant::sales

Going all the way down, until the sales element:

<sales unit="millions">34</sales>
<sales unit="millions">34</sales>
<sales unit="millions">32</sales>
<sales unit="millions">35</sales>
<sales unit="millions">42</sales>

//sales

The abbreviated form selects the same nodes.
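To confirm that the two forms select the same nodes, both can be evaluated side by side. This is a sketch under the same assumptions as before: the JDK's javax.xml.xpath package instead of dom4j, an inline copy of the example data, and hypothetical names (DescendantDemo, select).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: the descendant axis and its abbreviation.
class DescendantDemo {

    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/descendant::sales")); // [34, 34, 32, 35, 42]
        System.out.println(select("//sales"));            // [34, 34, 32, 35, 42]
    }
}
```

Strictly speaking, // abbreviates /descendant-or-self::node()/, but for this query the selected nodes are identical.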

XPath – Advanced Queries

• The years (theyear elements) in which the west region sold more than 32 million:

//region[name="west" and sales > 32]/sales[@unit='millions']/ancestor::year/theyear

<theyear>1998</theyear>

• "and" is a logical operator.
• "@unit" accesses an attribute.
• ancestor works like parent, but goes all the way up the tree; here it selects the enclosing year.
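The combined query can be checked the same way. The sketch assumes the JDK's javax.xml.xpath package rather than dom4j and an inline copy of the example data; the class AdvancedQueryDemo and the helper select are hypothetical. Single quotes are used inside the expression so that it needs no escaping in a Java string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: logical operator, attribute test and ancestor axis in one query.
class AdvancedQueryDemo {

    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>central</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>east</name><sales unit=\"millions\">34</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">42</sales></region></year>"
        + "</salesdata>";

    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String query = "//region[name='west' and sales > 32]"
            + "/sales[@unit='millions']/ancestor::year/theyear";
        System.out.println(select(query)); // [1998]
        // Attribute nodes can be selected directly as well.
        System.out.println(select("/salesdata/year[1]/region[1]/sales/@unit")); // [millions]
    }
}
```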

XPath – Advanced Queries (cont’d)

• The years (text nodes) in which the west region sales were higher than the east region sales; sales may be expressed in thousands or in millions. One way to write this in XPath 1.0 is to convert both sides to thousands before comparing:

/salesdata/year[(sum(region[name="west"]/sales[@unit='millions']) * 1000 + sum(region[name="west"]/sales[@unit='thousands'])) > (sum(region[name="east"]/sales[@unit='millions']) * 1000 + sum(region[name="east"]/sales[@unit='thousands']))]/theyear/text()
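A sketch of one way to compare sales recorded in different units: each side is converted to thousands with sum(), relying on the fact that the sum of an empty node set is 0. The data below is made up for illustration (the example file only uses millions), the JDK's javax.xml.xpath package stands in for dom4j, and the names UnitsDemo, inThousands and select are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo: comparing values that carry different unit attributes.
class UnitsDemo {

    // Made-up sample data mixing "millions" and "thousands".
    static final String XML =
        "<salesdata>"
        + "<year><theyear>1997</theyear>"
        + "<region><name>east</name><sales unit=\"thousands\">34000</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">32</sales></region></year>"
        + "<year><theyear>1998</theyear>"
        + "<region><name>east</name><sales unit=\"millions\">35</sales></region>"
        + "<region><name>west</name><sales unit=\"thousands\">42000</sales></region></year>"
        + "<year><theyear>1999</theyear>"
        + "<region><name>east</name><sales unit=\"thousands\">900</sales></region>"
        + "<region><name>west</name><sales unit=\"millions\">1</sales></region></year>"
        + "</salesdata>";

    // XPath fragment giving a region's sales in thousands, relative to a year element.
    static String inThousands(String region) {
        String sales = "region[name='" + region + "']/sales";
        return "(sum(" + sales + "[@unit='millions']) * 1000 + sum(" + sales + "[@unit='thousands']))";
    }

    static List<String> select(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String normalized = "/salesdata/year[" + inThousands("west") + " > "
            + inThousands("east") + "]/theyear/text()";
        String naive = "/salesdata/year[region[name='west']/sales > "
            + "region[name='east']/sales]/theyear/text()";
        System.out.println(select(normalized)); // [1998, 1999]
        System.out.println(select(naive));      // [1998]: misses 1999 (1 million vs. 900 thousand)
    }
}
```

The naive comparison ignores the unit attribute and therefore misses the year in which 1 million is compared with 900 thousand.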

XPath in dom4j

• XPath queries can be used in dom4j:

public void xpath(Document document) {
    // The XPath expression is fed to the xpathSelector
    XPath xpathSelector = DocumentHelper.createXPath("/salesdata/year/theyear");
    // The nodes are selected from the document, according to the XPath query
    List results = xpathSelector.selectNodes(document);
    for (Iterator iter = results.iterator(); iter.hasNext(); ) {
        Element element = (Element) iter.next();
        System.out.println(element.asXML());
    }
}

References - XML

• XML tutorial:
  – http://www.w3schools.com/xml/default.asp
• XML Specification from W3C:
  – http://www.w3.org/XML/
• The Java/XML Tutorial:
  – http://java.sun.com/xml/tutorial_intro.html
• DTD Tutorial:
  – http://www.xmlfiles.com/dtd/
• XML Schema Tutorial:
  – http://www.w3schools.com/schema/default.asp
• XML Schema Resource Page:
  – http://www.w3.org/XML/Schema

References - dom4j

• Web site:
  – http://dom4j.org/
• Javadocs:
  – http://dom4j.org/apidocs/index.html
• Quick Start:
  – http://dom4j.org/guide.html
• Cookbook (main functionality):
  – http://dom4j.org/cookbook.html

References - XPath

• XPath specification:
  – http://www.w3.org/TR/xpath
• XPath tutorial:
  – http://www.w3schools.com/xpath/default.asp
• XPath tutorial (extended):
  – http://www.zvon.org/xxl/XPathTutorial/General/examples.html
• XPath reference:
  – http://www.vbxml.com/xsl/XPathRef.asp