XPath Processor MQP Presentation April 15, 2003 Tammy Worthington Advisor: Elke Rundensteiner Computer Science Department Worcester Polytechnic Institute


XPath Processor

MQP Presentation
April 15, 2003

Tammy Worthington
Advisor: Elke Rundensteiner

Computer Science Department
Worcester Polytechnic Institute


Introduction

[Figure: system architecture. Components: XQuery Engine (future development), XPath Processor, VAMANA (XPath Query Engine), and MASS (A Multi-Axis Storage Structure for Large XML Documents). Data exchanged between them: input XQuery, XML document, XPath expression, execution tree, and node set (through the MASS interface).]


Processor Requirements

• Must conform to the XPath grammar as specified by the W3C
• Implementation in C++
  – Performance is needed since an XQuery can contain several XPath expressions
  – MASS and VAMANA are written in C++
• Parser interface that is independent of the query engine
  – Existing parsers are coupled to their query engines (Xalan, Pathan, LibXPath, Sablotron)
  – Facilitates efficient development and testing
• Extensibility
  – Allow for user transformations of the parse tree
  – Allow for XPath grammar changes
• Error handling
  – Show a useful error message
  – Parse tree should be destroyed automatically if a parse error occurs
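One common way to meet the automatic-destruction requirement in C++ is to let every node own its children through smart pointers, so that a partially built tree is released when a parse error is thrown. The following is a minimal sketch of that idea, not the processor's actual classes: `ParseNode`, `parseOrThrow`, and the `liveNodes` counter are hypothetical names, and the "parser" only pretends to parse.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical counter so that node destruction can be observed.
static int liveNodes = 0;

// Hypothetical parse tree node that owns its children.
struct ParseNode {
    std::string label;
    std::vector<std::unique_ptr<ParseNode>> children;

    explicit ParseNode(std::string l) : label(std::move(l)) { ++liveNodes; }
    ~ParseNode() { --liveNodes; }
    ParseNode(const ParseNode&) = delete;
    ParseNode& operator=(const ParseNode&) = delete;

    // Creates a child node, stores it, and returns a non-owning pointer.
    ParseNode* add(std::string l) {
        children.push_back(std::make_unique<ParseNode>(std::move(l)));
        return children.back().get();
    }
};

// Stand-in for a real parser: builds part of a tree, then fails if the
// expression contains a backslash. The partial tree is freed during
// stack unwinding because 'root' owns every node created so far.
std::unique_ptr<ParseNode> parseOrThrow(const std::string& expr) {
    auto root = std::make_unique<ParseNode>("Expr");
    root->add("Step");
    assert(liveNodes >= 2);  // the root and one step exist at this point
    if (expr.find('\\') != std::string::npos)
        throw std::runtime_error("Parse Error!");
    return root;
}
```

With this ownership scheme no explicit cleanup code is needed on the error path; the caller only has to catch the exception.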


Processor Overview

• Generates independent parse tree
  – Parse tree is a structure of in-memory C++ objects
  – Small footprint due to compact parse tree representation
• Built-in transformations
  – Predicate transformation needed to facilitate data flow in query engines
• User-implemented callback interface
  – Completely separate from query execution
  – Generates execution tree from parse tree
    • User derives from parser class and provides implementation
    • Implementation is a simple 1-to-1 transformation

[Figure: processing pipeline. XPath Expression → Parser (Productions) → Parse Tree → Transformer → Transformed Parse Tree → Generator (Callbacks) → Execution Tree]


Parse Tree Node Hierarchy

Parse Node

Step Node

Axis Type Node

Name Test Node

Function Node

Function Arg Node

Filter Node

Negative Node

Predicate Node

Unary Predicate Node

Binary Predicate Node

Set Operator Node

Number Operator Node

Literal Node


Running Example

• XPath syntax
  – AxisName::NodeTest[predicate]
  – /child::company/descendant-or-self::employee[position() = 1]
    • Abbreviated notation: /company//employee[1]

XML Document example:

<company>
  <branch>
    <location>Boston</location>
    <employee>
      <name>JohnDoe</name>
      <id>471</id>
      <salary>54250</salary>
    </employee>
    <employee>
      <name>JaneDoe</name>
    </employee>
  </branch>
</company>


Productions to Parse Tree
/child::company/descendant-or-self::employee[position() = 1]

[Figure: the grammar productions applied to the expression (Expr, LocPath, Step, Axis, Nodetest, Pred, Function, Literal) shown beside the resulting parse tree.]

Parse Tree:
  Step employee
    Step company
    Binary Predicate Equals
      Function Position
      Literal 1


Abbreviated Notation

• Abbreviated expression
  – /company//employee[1]
• Normalized expression
  – /child::company/descendant-or-self::employee[position() = 1]

[Figure: parse tree and normalized parse tree. Nodes shown: Step company, Step employee, Binary Predicate Equals, Function Position, Literal 1.]


Transformed Parse Tree

• Data flow query engines require data to flow from the step node up through its predicate node, which filters the data
  – Predicate nodes must be placed above their corresponding step node
• When could this be done?
  – While parsing
    • Too complicated because productions expect certain inputs
  – While generating execution tree
    • Would make user interface too complex
  – In post-processing of parse tree
    • Simple recursive transformation
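The post-processing option can be illustrated with a short recursive function that lifts each predicate above the step it filters. This is a sketch under stated assumptions rather than the processor's real transformer: the `Node` layout, `makeNode`, `hoistPredicates`, and `exampleParseTree` are all hypothetical, and each step is assumed to carry at most one predicate.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse tree node used only for this sketch.
struct Node {
    std::string kind;                             // "Step", "BinaryPredicate", "Function", "Literal"
    std::string value;                            // e.g. "employee", "equals", "position", "1"
    std::unique_ptr<Node> context;                // node that supplies this node's input data
    std::unique_ptr<Node> predicate;              // filter attached to a step by the parser
    std::vector<std::unique_ptr<Node>> operands;  // operands of a predicate
};

std::unique_ptr<Node> makeNode(std::string kind, std::string value) {
    auto n = std::make_unique<Node>();
    n->kind = std::move(kind);
    n->value = std::move(value);
    return n;
}

// Recursively moves each predicate above the step it filters, so that
// data can flow from the step up through the predicate.
std::unique_ptr<Node> hoistPredicates(std::unique_ptr<Node> node) {
    if (!node) return node;
    node->context = hoistPredicates(std::move(node->context));
    for (auto& operand : node->operands)
        operand = hoistPredicates(std::move(operand));
    if (!node->predicate) return node;

    std::unique_ptr<Node> pred = hoistPredicates(std::move(node->predicate));
    assert(!pred->context);           // assumed: a predicate has no input before hoisting
    pred->context = std::move(node);  // the step now feeds the predicate
    return pred;
}

// Parse tree of /child::company/descendant-or-self::employee[position() = 1]
// as produced by the parser, with the predicate hanging below its step.
std::unique_ptr<Node> exampleParseTree() {
    auto employee = makeNode("Step", "employee");
    employee->context = makeNode("Step", "company");
    auto equals = makeNode("BinaryPredicate", "equals");
    equals->operands.push_back(makeNode("Function", "position"));
    equals->operands.push_back(makeNode("Literal", "1"));
    employee->predicate = std::move(equals);
    return employee;
}
```

Calling `hoistPredicates(exampleParseTree())` returns the `equals` predicate as the new root, with the `employee` step (and `company` below it) as its input.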


Transformed Parse Tree
/child::company/descendant-or-self::employee[position() = 1]

Parse Tree:
  Step employee
    Step company
    Binary Predicate Equals
      Function Position
      Literal 1

Transformed Parse Tree:
  Binary Predicate Equals
    Step employee
      Step company
    Function Position
    Literal 1


User Implementation of Callbacks

• Interface defines a callback for each node type
  – Callbacks supply parse tree parameters
    • Value: axis, nodetest, literal, etc.
    • Role: relationship to parent
  – User implements callbacks to generate execution tree
• The tree is traversed and the appropriate callbacks are invoked for each node: a Start callback on entering the node and an End callback after its children
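A callback interface of this kind might look like the sketch below: one Start/End pair per node type, each Start receiving the node's value and role, and a traversal that calls Start on entering a node and End after its children. Every name here (`Kind`, `Node`, `Callbacks`, `walk`, `Recorder`, `exampleTree`) is an assumption made for illustration; the real interface is not shown in the slides, and only four node types are covered.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Kind { Step, Predicate, Function, Literal };

// Hypothetical parse tree node; value and role mirror the callback parameters.
struct Node {
    Kind kind;
    std::string value;  // axis/node test, operator, function name, or literal
    std::string role;   // relationship to parent: root, context, operand
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> makeNode(Kind kind, std::string value, std::string role) {
    auto n = std::make_unique<Node>();
    n->kind = kind;
    n->value = std::move(value);
    n->role = std::move(role);
    return n;
}

// One Start/End callback pair per node type; users override these.
class Callbacks {
public:
    virtual ~Callbacks() = default;
    virtual void startStep(const std::string& value, const std::string& role) = 0;
    virtual void endStep() = 0;
    virtual void startPredicate(const std::string& value, const std::string& role) = 0;
    virtual void endPredicate() = 0;
    virtual void startFunction(const std::string& value, const std::string& role) = 0;
    virtual void endFunction() = 0;
    virtual void startLiteral(const std::string& value, const std::string& role) = 0;
    virtual void endLiteral() = 0;
};

// Calls Start on entering a node, visits its children, then calls End.
void walk(const Node& n, Callbacks& cb) {
    assert(!n.role.empty());  // every node is expected to carry a role
    switch (n.kind) {
        case Kind::Step:      cb.startStep(n.value, n.role); break;
        case Kind::Predicate: cb.startPredicate(n.value, n.role); break;
        case Kind::Function:  cb.startFunction(n.value, n.role); break;
        case Kind::Literal:   cb.startLiteral(n.value, n.role); break;
    }
    for (const auto& child : n.children) walk(*child, cb);
    switch (n.kind) {
        case Kind::Step:      cb.endStep(); break;
        case Kind::Predicate: cb.endPredicate(); break;
        case Kind::Function:  cb.endFunction(); break;
        case Kind::Literal:   cb.endLiteral(); break;
    }
}

// Example user implementation: records each callback as text.
class Recorder : public Callbacks {
public:
    std::vector<std::string> calls;
    void startStep(const std::string& v, const std::string& r) override { add("Start Step", v, r); }
    void endStep() override { calls.push_back("End Step"); }
    void startPredicate(const std::string& v, const std::string& r) override { add("Start Predicate", v, r); }
    void endPredicate() override { calls.push_back("End Predicate"); }
    void startFunction(const std::string& v, const std::string& r) override { add("Start Function", v, r); }
    void endFunction() override { calls.push_back("End Function"); }
    void startLiteral(const std::string& v, const std::string& r) override { add("Start Literal", v, r); }
    void endLiteral() override { calls.push_back("End Literal"); }
private:
    void add(const std::string& name, const std::string& v, const std::string& r) {
        calls.push_back(name + " (" + v + ", " + r + ")");
    }
};

// Transformed parse tree of the running example.
std::unique_ptr<Node> exampleTree() {
    auto pred = makeNode(Kind::Predicate, "equals", "root");
    auto employee = makeNode(Kind::Step, "employee", "context");
    employee->children.push_back(makeNode(Kind::Step, "company", "context"));
    pred->children.push_back(std::move(employee));
    pred->children.push_back(makeNode(Kind::Function, "position", "operand"));
    pred->children.push_back(makeNode(Kind::Literal, "1", "operand"));
    return pred;
}
```

Walking `exampleTree()` with a `Recorder` yields ten calls, beginning with Start Predicate (equals, root) and ending with End Predicate.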


Callbacks
/child::company/descendant-or-self::employee[position() = 1]

Parse Tree:
  Binary Predicate equals
    Step employee
      Step company
    Function position
    Literal 1

Corresponding Callbacks Invoked:
  • Start Predicate (equals, root)
    • Start Step (employee, context)
      • Start Step (company, context)
      • End Step
    • End Step
    • Start Function (position, operand)
    • End Function
    • Start Literal (1, operand)
    • End Literal
  • End Predicate


Execution Tree Generation

• User derives from parser class and provides implementation of the callback interfaces
  – Implementation specific to execution tree stored in derived class
• Stack required to maintain context in execution tree
  – Each Start callback creates a node, attaches it to its parent (top of stack), and pushes it onto the stack
  – Each End callback pops a node off the stack
• Implemented parser interface for VAMANA query engine
  – 1-to-1 mapping makes execution tree generation simple
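The stack discipline described above can be sketched in a few lines. The code below is illustrative only and does not reproduce the VAMANA integration: `ExecNode`, `ExecTreeBuilder`, and `replayExample` are hypothetical names, and a single generic `start`/`end` pair stands in for the per-node-type callbacks.

```cpp
#include <cassert>
#include <memory>
#include <stack>
#include <string>
#include <utility>
#include <vector>

// Hypothetical execution tree node.
struct ExecNode {
    std::string label;
    std::vector<std::unique_ptr<ExecNode>> children;
};

// Builds an execution tree from a stream of Start/End callbacks.
class ExecTreeBuilder {
public:
    // Start: create a node, attach it to its parent (top of stack), push it.
    void start(std::string label) {
        auto node = std::make_unique<ExecNode>();
        node->label = std::move(label);
        ExecNode* raw = node.get();
        if (stack_.empty()) {
            assert(!root_);  // this sketch expects exactly one root
            root_ = std::move(node);
        } else {
            stack_.top()->children.push_back(std::move(node));
        }
        stack_.push(raw);
    }

    // End: the current node is finished, so pop it off the stack.
    void end() {
        assert(!stack_.empty());
        stack_.pop();
    }

    // True once a root exists and every Start has been matched by an End.
    bool complete() const { return root_ && stack_.empty(); }
    const ExecNode* root() const { return root_.get(); }

private:
    std::unique_ptr<ExecNode> root_;
    std::stack<ExecNode*> stack_;  // non-owning pointers to open nodes
};

// Replays the callback sequence of the running example.
ExecTreeBuilder replayExample() {
    ExecTreeBuilder b;
    b.start("Predicate (equals, root)");
    b.start("Step (employee, context)");
    b.start("Step (company, context)");
    b.end();
    b.end();
    b.start("Function (position, operand)");
    b.end();
    b.start("Literal (1, operand)");
    b.end();
    b.end();
    return b;
}
```

Because the stack always holds the chain of currently open nodes, each callback needs no knowledge of the tree beyond the node on top.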


Execution Tree Generation
/child::company/descendant-or-self::employee[position() = 1]

Callbacks:
  • Start Predicate (equals, root)
    • Start Step (employee, context)
      • Start Step (company, context)
      • End Step
    • End Step
    • Start Function (position, operand)
    • End Function
    • Start Literal (1, operand)
    • End Literal
  • End Predicate

Stack (entries shown as the callbacks are processed):
  Predicate (equals, root)
  Step (employee, context)
  Step (company, context)
  Function (position, expr)
  Literal (1, expr)

Execution Tree:
  V Binary Predicate equals
    Mass Node employee
      Mass Node company
    V Function position
    V Literal 1


Evaluation

• Numerous XPath expressions were tested, including all of the examples in the W3C XPath specification
• Each tree was printed, enabling visual evaluation (example to follow)
• Error messages are helpful in locating parse errors
  – Example (using \ instead of /)
      child::company\descendant-or-self::employee[position() =1]
  – Output
      child::company\descendant-or-self::employee[position() =1]
      -------------------^
      Parse Error!
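An error message in this style can be produced by echoing the expression and drawing a dashed pointer up to the offending position. The sketch below assumes the parser already knows that position; `formatParseError` and its `pos` parameter are hypothetical and not taken from the processor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the expression, a dashed pointer line ending in '^' under the
// character at index 'pos', and the "Parse Error!" message.
std::string formatParseError(const std::string& expr, std::size_t pos) {
    assert(pos <= expr.size());  // the position must lie within the expression
    return expr + "\n" + std::string(pos, '-') + "^\nParse Error!";
}
```

For example, `formatParseError("a\\b", 1)` places the caret under the backslash of the three-character expression `a\b`.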


Printed Trees
/child::company/descendant-or-self::employee[position() = 1]

Parse Tree:
  descendant-or-self employee (ROOT)
    child company (CONTEXT)
    = BIPRED (PRED)
      position FUNCTION (OPERAND)
      1 LITERAL (OPERAND)

Transformed Parse Tree:
  = BIPRED (ROOT)
    position FUNCTION (OPERAND)
    1 LITERAL (OPERAND)
    descendant-or-self employee (CONTEXT)
      child company (CONTEXT)

Execution Tree:
  = VBIPRED (ROOT)
    position VFUNCTION (OPERAND)
    1 VLITERAL (OPERAND)
    descendant-or-self employee (CONTEXT)
      child company (CONTEXT)


Conclusion

• Successful implementation of an XPath Processor that is completely independent of the query engine
• Successful integration with VAMANA and MASS
• Successful MQP overall


Thanks

I would like to thank Elke Rundensteiner, Kurt Deschler, and Venkatesh Raghavan. This XPath Processor was a contribution to both the MASS and VAMANA projects and is the result of a collaborative effort.


References

1. Tim Bray, Jean Paoli, C. M. Sperberg-McQueen and Eve Maler. Extensible Markup Language (XML), Version 1.0, Second Edition, W3C Recommendation, October 6, 2000. http://www.w3.org/TR/REC-xml

2. Jim Clark and Steve DeRose. XML Path Language (XPath), Version 1.0, W3C Recommendation, November 16, 1999. http://www.w3.org/TR/xpath.html

3. Don Chamberlin, Peter Fankhauser, Massimo Marchiori and Jonathan Robie. XML Query Requirements, W3C Working Draft, February 15, 2001. http://www.w3.org/TR/xmlquery-req

4. Kurt W. Deschler and Elke Rundensteiner. MASS: A Multi-Axis Storage Structure for Large XML Documents, 2002, Technical Report in progress.

5. Venkatesh Raghavan. VAMANA – Efficient XPath Query Engine Exploiting the MASS Index, October 23, 2002, Master’s Thesis Proposal.


Questions?