XJ: Facilitating XML Processing in Java Matthew Harren Mukund Raghavachari Oded Shmueli Michael Burke Rajesh Bordawekar Igor Pechtchanski Vivek Sarkar Presented By: Tamar Aizikowitz Winter 2006/2007 14th World Wide Web Conference (WWW2005), Chiba, Japan


Page 1:

XJ: Facilitating XML Processing in Java

Matthew Harren, Mukund Raghavachari, Oded Shmueli, Michael Burke, Rajesh Bordawekar, Igor Pechtchanski, Vivek Sarkar

Presented by: Tamar Aizikowitz
Winter 2006/2007

14th World Wide Web Conference (WWW2005), Chiba, Japan

Page 2:

XML
• Markup language
• Tags define elements
• Elements contain other elements
• Elements contain data

Syntax:
    <person>
        <first>John</first>
        <last>Lennon</last>
    </person>

Semantics (tree diagram): person has two children, first ("John") and last ("Lennon").

Applications: The future web? XHTML? RSS?
Problem: Supposedly human readable and writable, but not really…
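The tree reading of the person document can be tried out in plain Java. The following is a minimal sketch using the standard DOM parser from the JDK, not XJ itself; the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: parses the slide's <person> document into a DOM tree.
final class PersonTreeDemo {
    static final String PERSON_XML =
        "<person><first>John</first><last>Lennon</last></person>";

    /** Returns "first last", read from the children of the root element. */
    static String fullName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element person = doc.getDocumentElement();
            // Each tag becomes an element node; the data sits in the text below it.
            String first = person.getElementsByTagName("first").item(0).getTextContent();
            String last = person.getElementsByTagName("last").item(0).getTextContent();
            return first + " " + last;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fullName(PERSON_XML));
    }
}
```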

Page 3:

XML Schema
• XML based alternative to DTDs.
• Describes the structure of an XML document.
• The programmer defines the valid structure of the data by defining element types.
• Support for standard and user defined types.

    <xs:element name="person" type="personInfo"/>

    <xs:complexType name="personInfo">
        <xs:sequence>
            <xs:element name="first" type="xs:string"/>
            <xs:element name="last" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
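To see the schema act as a definition of valid structure, one can validate documents against it with the standard javax.xml.validation API. This is a sketch only: the xs:schema wrapper around the slide's two declarations and the class name are assumptions, and the check happens at run time rather than at compile time as in XJ.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class: validates documents against the person schema.
final class PersonSchemaDemo {
    // The slide's declarations, wrapped in an xs:schema root (assumed wrapper).
    static final String SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"person\" type=\"personInfo\"/>"
        + "<xs:complexType name=\"personInfo\"><xs:sequence>"
        + "<xs:element name=\"first\" type=\"xs:string\"/>"
        + "<xs:element name=\"last\" type=\"xs:string\"/>"
        + "</xs:sequence></xs:complexType>"
        + "</xs:schema>";

    /** Returns true if the document conforms to the schema, false otherwise. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException e) {
                return false; // the document violates the schema or is malformed
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("schema could not be loaded", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<person><first>John</first><last>Lennon</last></person>"));
        System.out.println(isValid("<person><last>Lennon</last></person>"));
    }
}
```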

Page 4:

XPath
• Query language for selecting a sequence of nodes from an XML document.
• Filtering of result nodes using predicates.
• Example: //person[last="Lennon"]/first

Diagram: an XML tree and an XPath query are fed to the XPath query processor, which outputs an XML node sequence.
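The example query can be run with the javax.xml.xpath API that ships with the JDK. The sketch below is illustrative: the people wrapper element and the second person are made-up test data, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates an XPath query and returns the text of each result node.
final class XPathDemo {
    // Made-up sample data; only the first person appears on the slides.
    static final String PEOPLE = "<people>"
        + "<person><first>John</first><last>Lennon</last></person>"
        + "<person><first>Paul</first><last>McCartney</last></person>"
        + "</people>";

    static List<String> select(String xml, String query) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath query: " + query, e);
        }
    }

    public static void main(String[] args) {
        // The predicate [last="Lennon"] filters the person nodes before /first is applied.
        System.out.println(select(PEOPLE, "//person[last=\"Lennon\"]/first"));
    }
}
```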

Page 5:

XJ Introduction
• Developed at the IBM Watson Research Center.
• More information: http://www.research.ibm.com/xj/.

Diagram labels: Java 1.0, Java 1.1, Java 1.4, Java 1.5, XJ.
Components: xjc compiler, xj runtime environment.

Page 6:

XJ Holy Grail: Smooth Java/XML integration
• XML Trees
  • Just like 3, "Hello" and other values.
• XML Schema
  • Just like Java classes.
• XPath Queries
  • Just like [], ?: and other Java operators.
• Smart Compiler
  • Optimization… Improved efficiency.

Page 7:

Example: Music Library

Diagram: a musicLibrary element contains album elements; each album contains title (string), stars ([1-5]) and artist (string) elements.

Page 8:

Music Library Schema

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="musicLibrary">
            <xs:complexType>
                <xs:sequence>
                    <xs:element name="album" maxOccurs="unbounded">
                        <!-- album content model: see "Music Library Schema - Album" -->
                    </xs:element>
                </xs:sequence>
            </xs:complexType>
        </xs:element>
    </xs:schema>

Page 9:

Music Library Schema - Album

    <xs:complexType>
        <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="stars">
                <xs:simpleType>
                    <xs:restriction base="xs:integer">
                        <xs:pattern value="[1-5]"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="artist" type="xs:string" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>

Page 10:

Music Library Data

    <?xml version="1.0" encoding="UTF-8"?>
    <musicLibrary>
        <album>
            <title>Abbey Road</title>
            <stars>4</stars>
            <artist>The Beatles</artist>
        </album>
        <album>
            <title>Sounds of Silence</title>
            <stars>4</stars>
            <artist>Paul Simon</artist>
        </album>
    </musicLibrary>

Page 11:

The XJ Type Hierarchy

Diagram (class hierarchy rooted at java.lang.Object):
• com.ibm.xj.Sequence
• com.ibm.xj.XMLObject
  • com.ibm.xj.XMLElement (all element classes)
  • com.ibm.xj.XMLAtomic (all atomic classes)
• com.ibm.xj.XMLCursor
• com.ibm.xj.io.XMLOutputStream
• com.ibm.xj.io.XMLDocumentOutputStream

Page 12:

The XMLObject Class and Subclasses
• XMLObject corresponds to an XML node.
• Schema import creates subclasses of XMLElement and XMLAtomic for every element declaration.
• XPath expressions are evaluated on instances of these classes.

Diagram: com.ibm.xj.XMLObject has the subclasses com.ibm.xj.XMLElement (all element classes) and com.ibm.xj.XMLAtomic (all atomic classes).

Page 13:

XMLSequence and XMLCursor
• An instance of Sequence is an ordered list of XMLObject.
• The result of an XPath expression is an instance of Sequence.
• XMLCursor implements java.util.Iterator and is used to iterate over instances of Sequence.
• Both support limited genericity (as defined in Java 5.0) for type checking.

Diagram: com.ibm.xj.Sequence and com.ibm.xj.XMLCursor extend java.lang.Object.

Page 14:

Importing Schema Definitions
The integration of XML Schema in XJ is built on the following correspondence:
• XML Schema ~ Java Package
• XML Element ~ Logical Class
• Nested (local) Element ~ Nested Class
• Atomic types ~ Class + Auto Unboxing

Page 15:

Schema ~ Package
• Element declarations are integrated into the Java type system as "logical classes".
• XML documents are well typed XML values that are instances of these classes.
• Syntax:
    import musicLibrary.*;

Page 16:

XML Element ~ Class
• Elements are represented as subclasses of XMLObject.
• May be used wherever a class type is expected.
• Constructed with the new operator.
• Nested elements are represented as nested classes.
• Syntax:
    musicLibrary ml = new musicLibrary(...);
    musicLibrary.album a = new musicLibrary.album(...);

Page 17:

Atomic Types
• Support for XML Schema built-in atomic types such as xsd:integer and xsd:string.
• Represented as subclasses of XMLAtomic.
• Syntax: xsd.integer
• Subtyping:
    xsd.short s = ...;
    xsd.integer i = s;
• Automatic unboxing:
    xsd.string xstr = ...;
    String s = xstr;

Page 18:

Creating XML Objects
• Mechanisms for constructing XML:
  • External source
  • Literal XML embedded in an XJ program
• XMLElement constructors:
  • XMLElement(java.io.InputStream)
  • XMLElement(java.io.File)
  • XMLElement(java.net.URL)
  • XMLElement(literal XML)

Page 19:

Inline Construction of XML
• XML data is constructed using literal XML.
• Any well formed XML block can be used.
• Example:
    title a = new title(<title>Greatest Hits</title>);
• { and } are used to insert runtime values:
    title buildTitle(String t) {
        title newT = new title(<title>{t}</title>);
        return newT;
    }
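For comparison, here is roughly what buildTitle looks like in plain Java with the standard DOM API, where the element has to be assembled through method calls instead of a literal. This is a sketch for contrast, not XJ output; the class name is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: the DOM counterpart of new title(<title>{t}</title>).
final class BuildTitleDemo {
    static Element buildTitle(String t) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
            Element title = doc.createElement("title");
            title.setTextContent(t); // runtime value inserted as the element's text
            return title;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Element title = buildTitle("Greatest Hits");
        System.out.println(title.getTagName() + ": " + title.getTextContent());
    }
}
```

Unlike the XJ literal, nothing here ties the result to the schema: the returned value has the generic type Element, not a title type.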

Page 20:

XML Type Validation
• Example:
    album a = new album(<album>
        <title>Let It Be</title>
        <stars>4</stars>
        <band>The Beatles</band>
    </album>);
• To construct untyped XML, use the literal XML constructor for XMLElement.

Diagram: literal XML goes to the XML parser (is it XML?) and then to the schema validator (is it valid XML?). A "no" at either step yields a compilation error; a "yes" at both yields a typed XMLObject.
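The two checks in the diagram (well-formedness first, then schema validity) can be imitated at run time with the JDK's parser and validator. The sketch below assumes the album type from the schema slide is declared as a top-level album element; the class, enum and method names are hypothetical, and XJ performs the equivalent checks at compile time.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: classifies a literal as malformed, invalid or valid.
final class AlbumCheckDemo {
    enum Result { NOT_WELL_FORMED, INVALID, VALID }

    // Album content model from the schema slide, as a standalone schema (assumption).
    static final String ALBUM_SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"album\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"title\" type=\"xs:string\"/>"
        + "<xs:element name=\"stars\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:integer\"><xs:pattern value=\"[1-5]\"/></xs:restriction>"
        + "</xs:simpleType></xs:element>"
        + "<xs:element name=\"artist\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static Result check(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for validating a DOM tree
            Document doc;
            try {
                // Step 1: the parser decides whether the text is XML at all.
                doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            } catch (SAXException e) {
                return Result.NOT_WELL_FORMED;
            }
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(ALBUM_SCHEMA)));
            try {
                // Step 2: the validator decides whether the tree matches the schema.
                schema.newValidator().validate(new DOMSource(doc));
                return Result.VALID;
            } catch (SAXException e) {
                return Result.INVALID;
            }
        } catch (Exception e) {
            throw new IllegalStateException("parser or schema setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check(
            "<album><title>Let It Be</title><stars>4</stars><band>The Beatles</band></album>"));
    }
}
```

The slide's literal is rejected in step 2 because the schema expects an artist element where band appears.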

Page 21:

Executing XPath Queries
• Syntax: context[|query|]
  • query = valid XPath 1.0 expression.
  • context = XML element. Specifies the context for query evaluation.
• XPath expressions evaluate to Sequence<T>.
• Example ($ refers to variables):
    String band = "The Beatles";
    musicLibrary m = new musicLibrary(...);
    Sequence<album> b = m[|/album[artist[1]=$band]|];
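Standard Java offers a similar $variable mechanism through XPathVariableResolver, although the binding is set up by hand and the result is untyped. In this sketch the query is evaluated from the document root, so the path is spelled /musicLibrary/album rather than /album as in the XJ example; the class and method names are hypothetical, and the data is taken from the Music Library Data slide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: XPath query with a $band variable bound from Java.
final class XPathVariableDemo {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars><artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars><artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    /** Returns the titles of all albums whose first artist equals the given band. */
    static List<String> titlesBy(String xml, String band) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Resolve $band to the Java value; any other variable name stays unresolved.
        xpath.setXPathVariableResolver(
            name -> "band".equals(name.getLocalPart()) ? band : null);
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                "/musicLibrary/album[artist[1]=$band]/title",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(titlesBy(LIBRARY, "The Beatles"));
    }
}
```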

Page 22:

XPath Static Semantics
• XPath expressions evaluate to Sequence<T>.
  • T is the most specific subtype of XMLObject that the compiler can determine.
  • Worst case: Sequence<XMLObject> is returned.
• If the query result is always empty, a static error is generated.
  • Identified using the Schema definition.
• Example (title has no album children):
    title t = ...;
    Sequence<album> a = t[|/album|];

Page 23:

XPath Runtime Semantics
• Evaluated with respect to the value of the context specifier.
• If the context specifier is a Sequence, each member is used as a context node in turn. The value is the union of the results.
    musicLibrary m = new musicLibrary(...);
    Sequence<album> albums = m[|/album|];
    Sequence<artist> artists = albums[|/artist|];
• If the result is not a node set, a sequence of the appropriate type is returned. For example: Sequence<xsd.boolean>.
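The "each member as context node, then union" rule can be spelled out by hand with the JDK's XPath API. This is a sketch of the idea, not of XJ's implementation: the class name is hypothetical, and the relative path artist stands in for the XJ query /artist applied to each album.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: evaluates a query once per context node and unions the results.
final class ContextSequenceDemo {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars><artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars><artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    static List<String> artists(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // First query: the sequence of albums.
            NodeList albums = (NodeList) xpath.evaluate(
                "/musicLibrary/album", doc, XPathConstants.NODESET);
            // Second query: run once per album; the set keeps each node only once.
            Set<Node> union = new LinkedHashSet<>();
            for (int i = 0; i < albums.getLength(); i++) {
                NodeList found = (NodeList) xpath.evaluate(
                    "artist", albums.item(i), XPathConstants.NODESET);
                for (int j = 0; j < found.getLength(); j++) {
                    union.add(found.item(j));
                }
            }
            List<String> names = new ArrayList<>();
            for (Node artist : union) {
                names.add(artist.getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(artists(LIBRARY));
    }
}
```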

Page 24:

Updating XML Data
• Reference semantics
  • Although more difficult to implement…
  • Result: in-place updates, as opposed to copy based ones.
• Two types of updates are supported:
  • Value assignments, including complex types
  • Tree structure updates

Page 25:

Value Assignments
• XPath expressions are used as lvalues for assignment:
    album a = new album(...);
    a[|/title|] = "New Title";
• Bulk assignments:
    musicLibrary m = new musicLibrary(...);
    m[|/album[artist[1]="The Beatles"]/stars|] = 5;
• Bulk assignment advantages:
  • Possible optimizations, hence efficient updates.
  • Clear, concise code.
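To show what the one-line bulk assignment saves, here is a hand-written equivalent using the JDK's DOM and XPath APIs: select the stars nodes, then overwrite each in place. It is a sketch under the usual caveats: the class and method names are hypothetical, the path starts at the document root, and no schema check is made on the new value.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: in-place update of every node selected by a query.
final class BulkUpdateDemo {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars><artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars><artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Sets stars on every album whose first artist matches; returns the number of updates. */
    static int setStars(Document library, String artist, int stars) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(
            name -> "artist".equals(name.getLocalPart()) ? artist : null);
        try {
            NodeList targets = (NodeList) xpath.evaluate(
                "/musicLibrary/album[artist[1]=$artist]/stars",
                library, XPathConstants.NODESET);
            for (int i = 0; i < targets.getLength(); i++) {
                targets.item(i).setTextContent(Integer.toString(stars)); // reference semantics
            }
            return targets.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    /** Returns the text of every stars element in document order. */
    static List<String> allStars(Document library) {
        NodeList nodes = library.getElementsByTagName("stars");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) {
        Document library = parse(LIBRARY);
        setStars(library, "The Beatles", 5);
        System.out.println(allStars(library));
    }
}
```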

Page 26:

Tree Structure Update
• Methods for structural changes:
  • insertAfter()
  • insertBefore()
  • insertAsFirst()
  • insertAsLast()
• Example:
    artist currArtist = m[|/album[title="Sounds of Silence"]/artist[1]|];
    artist newArtist = new artist(<artist>Art Garfunkel</artist>);
    currArtist.insertAfter(newArtist);
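The standard DOM API has insertBefore but no insertAfter, so the same update takes a small helper in plain Java. The sketch below builds one on Node.insertBefore, which appends when the reference sibling is null; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: adds a second artist after the first one of a given album.
final class InsertAfterDemo {
    static final String LIBRARY = "<musicLibrary>"
        + "<album><title>Abbey Road</title><stars>4</stars><artist>The Beatles</artist></album>"
        + "<album><title>Sounds of Silence</title><stars>4</stars><artist>Paul Simon</artist></album>"
        + "</musicLibrary>";

    /** Inserts newNode as the next sibling of reference. */
    static void insertAfter(Node reference, Node newNode) {
        // insertBefore with a null sibling appends at the end of the child list.
        reference.getParentNode().insertBefore(newNode, reference.getNextSibling());
    }

    /** Returns the album's artists after the insertion, or an empty list if no album matches. */
    static List<String> addArtist(String xml, String albumTitle, String artistName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList albums = doc.getElementsByTagName("album");
            for (int i = 0; i < albums.getLength(); i++) {
                Element album = (Element) albums.item(i);
                String title = album.getElementsByTagName("title").item(0).getTextContent();
                if (!title.equals(albumTitle)) {
                    continue;
                }
                Element added = doc.createElement("artist");
                added.setTextContent(artistName);
                insertAfter(album.getElementsByTagName("artist").item(0), added);
                List<String> artists = new ArrayList<>();
                NodeList nodes = album.getElementsByTagName("artist");
                for (int j = 0; j < nodes.getLength(); j++) {
                    artists.add(nodes.item(j).getTextContent());
                }
                return artists;
            }
            return Collections.emptyList();
        } catch (Exception e) {
            throw new IllegalStateException("update failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addArtist(LIBRARY, "Sounds of Silence", "Art Garfunkel"));
    }
}
```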

Page 27:

Update Issues – Tree Structure
• Duplicate parents and acyclicity
  • After performing tree structure updates, the resulting graph must remain a tree.
  • Example: attaching an element that already has a parent.
• A problematic XJ update will result in a runtime exception.
• Can be avoided by always detaching before attaching nodes.
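The single-parent and acyclicity rules can be illustrated with a toy node class. Everything below is hypothetical (it is not XJ's runtime and the names are invented); it only shows the kind of check that would raise the runtime exception, and why detaching first avoids it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tree node that refuses updates which would break the tree shape.
final class XmlNodeSketch {
    private final String name;
    private XmlNodeSketch parent;
    private final List<XmlNodeSketch> children = new ArrayList<>();

    XmlNodeSketch(String name) {
        this.name = name;
    }

    /** Attaches child below this node, or throws if the result would not be a tree. */
    void attach(XmlNodeSketch child) {
        if (child.parent != null) {
            throw new IllegalStateException(child.name + " already has a parent");
        }
        // Walking up from this node must never reach the child, or a cycle would form.
        for (XmlNodeSketch n = this; n != null; n = n.parent) {
            if (n == child) {
                throw new IllegalStateException("attaching " + child.name + " would create a cycle");
            }
        }
        child.parent = this;
        children.add(child);
    }

    /** Removes this node from its parent, if it has one. */
    void detach() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
    }

    XmlNodeSketch parent() {
        return parent;
    }

    int childCount() {
        return children.size();
    }

    public static void main(String[] args) {
        XmlNodeSketch first = new XmlNodeSketch("album1");
        XmlNodeSketch second = new XmlNodeSketch("album2");
        XmlNodeSketch title = new XmlNodeSketch("title");
        first.attach(title);
        title.detach();       // detach before attaching elsewhere
        second.attach(title); // succeeds because title no longer has a parent
        System.out.println(second.childCount());
    }
}
```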

Page 28:

Update Issues – Complex Types
• Need to validate that the new value is still well typed after the update.
• Problem: cannot always be done statically.
• Example:
  • The Schema states that element a can contain between 2 and 5 instances of element b.
  • What happens after attach() or detach()?
• Solution:
  • Runtime check inserted at compile time.
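A runtime check of this kind might look like the following toy container, which guards attach and detach with the occurrence bounds from the example. The class and its methods are invented for illustration and say nothing about the code the XJ compiler actually generates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element "a" whose "b" children must stay within [min, max] occurrences.
final class OccurrenceCheckedElement {
    private final int min;
    private final int max;
    private final List<String> children;

    OccurrenceCheckedElement(int min, int max, List<String> initial) {
        if (initial.size() < min || initial.size() > max) {
            throw new IllegalArgumentException("initial content violates occurrence bounds");
        }
        this.min = min;
        this.max = max;
        this.children = new ArrayList<>(initial);
    }

    /** Adds a child; the check runs before the update so the value stays well typed. */
    void attach(String child) {
        if (children.size() + 1 > max) {
            throw new IllegalStateException("more than " + max + " children");
        }
        children.add(child);
    }

    /** Removes the child at the given index, unless that would drop below the minimum. */
    void detach(int index) {
        if (children.size() - 1 < min) {
            throw new IllegalStateException("fewer than " + min + " children");
        }
        children.remove(index);
    }

    int size() {
        return children.size();
    }

    public static void main(String[] args) {
        List<String> initial = new ArrayList<>();
        initial.add("b1");
        initial.add("b2");
        OccurrenceCheckedElement a = new OccurrenceCheckedElement(2, 5, initial);
        a.attach("b3");
        a.detach(0);
        System.out.println(a.size());
    }
}
```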

Page 29:

Update Issues – Covariant Subtyping
• XML Schema allows declaration of subtypes by restriction.
• Causes problems when updating subtype values through the base class interface.
• Example:
    xsd.integer i;
    stars s = m[|//stars[1]|];
    i = s;
    i = 10;    // illegal value for stars element
• Covariant subtyping already exists in Java arrays.
• The problem would arise in any language attempting to support updates on XML Schema types.
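The Java array case mentioned on the slide is easy to reproduce. In the sketch below (class and method names are hypothetical), an Integer[] is viewed through the base type Object[]; the store type-checks statically and is only rejected at run time, which mirrors writing 10 into a stars value through an xsd.integer reference.

```java
// Hypothetical demo class: covariant arrays accept an ill-typed store at compile time.
final class CovariantArrayDemo {
    /** Returns true if the store through the base-typed reference fails at run time. */
    static boolean storeFails() {
        Integer[] ints = new Integer[1];
        Object[] objects = ints; // allowed: Integer[] is a subtype of Object[]
        try {
            objects[0] = "not an integer"; // compiles, but the array only holds Integer
            return false;
        } catch (ArrayStoreException e) {
            return true; // the JVM's runtime check caught the illegal value
        }
    }

    public static void main(String[] args) {
        System.out.println(storeFails());
    }
}
```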

Page 30:

Summary – XJ Benefits
• XML objects as typed values
• XML Schema integration
• Static type checking
• Typed XPath
• Compiler optimizations

Page 31:

XJ - The Future?
• Full support for Schema types
• XPath expressions as independent values
  • Not tied to a context specifier
• Operators on XPath values
  • Composition, conjunction, disjunction…
• Typed methods and fields
    musicLibrary m = new musicLibrary(…);
    m.album[2].title = "New Title";