May 8, 2007 9:20 a.m. – 10:20 a.m. Platform: DB2 for Linux, UNIX and Windows DB2 9: XML Evolution and Revolution Philip K. Gunning Gunning Technology Solutions, LLC Session: E05


May 8, 2007 9:20 a.m. – 10:20 a.m.

Platform: DB2 for Linux, UNIX and Windows

DB2 9: XML Evolution and Revolution

Philip K. Gunning
Gunning Technology Solutions, LLC

Session: E05


Outline

• XML in DB2 LUW before DB2 9
  • Shredding
  • CLOBs

• XML-only databases
  • TIMBER, Niagara, Natix

• Followed by bliss for several years…

• Fundamental differences between XML databases and relational databases


Outline

• Then IBM shook up the database world with the DB2 9 hybrid data server

• Extensible optimizer and DB2 9
• Why a native XML data type?
• pureXML™


Outline

• pureXML implementation
• pureXML key enablers
• SQL/XML
• XPath/XDM
• XQuery
• Developer Workbench
• XQuery Builder

• Explain Facility and Visual Explain


Disclaimer

• DB2 9 is a registered trademark of IBM Corp.
• pureXML is a registered trademark of IBM Corp.
• DB2 9 sample queries and programs are copyrights of IBM Corp.

• DB2 for z/OS is a registered trademark of IBM Corp.

• Developer Workbench and Visual Explain are copyrights of IBM Corp.


Shredding

• Early implementations of XML support in databases shredded XML documents into columns of relational tables
• Mapping + parsing = overhead
• Retrieval of whole document or parts
• Entire document replaced if update required
• Lack of flexibility
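
The shredding approach described above can be sketched in a few lines. This is a minimal illustration using Python's standard `sqlite3` and `xml.etree.ElementTree` modules as stand-ins, not DB2 code; the order document, the table names (`orders`, `order_items`), and the function `shred_order` are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are illustrative only.
ORDER_XML = """
<order id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="5"/>
</order>
"""


def shred_order(conn, xml_text):
    """Parse one order document and map its parts to relational rows."""
    root = ET.fromstring(xml_text)  # parsing cost is paid on every insert
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (order_id, customer) VALUES (?, ?)",
        (order_id, root.findtext("customer")),
    )
    # The fixed mapping below is the inflexible part: a new element in the
    # document needs a table change and a new mapping rule.
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order_id, i.get("sku"), int(i.get("qty"))) for i in root.findall("item")],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
shred_order(conn, ORDER_XML)
rows = conn.execute("SELECT sku, qty FROM order_items ORDER BY sku").fetchall()
customer = conn.execute("SELECT customer FROM orders").fetchone()[0]
print(customer, rows)
```

Once shredded, the original document no longer exists as a unit; returning it means reassembling it from the rows.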


CLOBs

• Stored entire XML document as text

• High cost of retrieval
• Not buffered
• Poor search performance and parsing
• Lack of flexibility
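
The search cost of the CLOB approach can be illustrated as follows. This is a sketch under stated assumptions: SQLite `TEXT` columns stand in for CLOBs, and the `docs` table and `find_by_city` function are hypothetical. It shows that a search on an element value has to fetch and re-parse every stored document.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# The whole document is stored as opaque text, as with a CLOB column.
conn.execute("CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        (1, "<customer><name>Ann</name><city>Toronto</city></customer>"),
        (2, "<customer><name>Bob</name><city>Boston</city></customer>"),
    ],
)


def find_by_city(conn, city):
    """Return the ids of documents whose <city> element equals `city`."""
    matches = []
    # The database cannot see inside the text, so every document is
    # retrieved and parsed again for each search.
    for doc_id, body in conn.execute("SELECT doc_id, body FROM docs ORDER BY doc_id"):
        root = ET.fromstring(body)
        if root.findtext("city") == city:
            matches.append(doc_id)
    return matches


print(find_by_city(conn, "Boston"))
```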


Key Factors in IBM Approach

• “XML and Relational data coexist and complement each other in enterprise solutions”

• “A successful XML repository requires much of the same infrastructure that already exists in a RDBMS system”

• “XML query languages have considerable conceptual and functional overlap with SQL”

Source: Beyer et al., “DB2 goes hybrid: Integrating native XML and XQuery with relational data and SQL,” IBM Systems Journal, Vol. 45, No. 2, 2006


Revolutionary Approach: DB2 9 pureXML Framework

• DB2 Optimizer was extensible

• Native XML data type

• Enables XML data to be treated natively

• The native XML data type enables better performance (less overhead than legacy methods) through optimization and XML indexes

• Industry schemas supported


Fundamental Differences

• DB2 9 native XML data type takes advantage of years of relational database research
  • 20+ years of optimization advancements

• Extensive query rewrite plus new rewrites

• Uses underlying optimization and storage components

• Same or enhanced APIs


pureXML Framework Implementation

• Key Enablers
  • Extensible Optimizer
  • XML and SQL Integration
  • XQuery, XDM, XPath, SQL/XML
  • Development Tooling
    • Developer Workbench
    • XQuery Builder
    • Explain Support, including Visual Explain


Hybrid SQL/XQuery Compiler

[Figure: compiler components shown are SQL/XML Parser, XQuery Parser, Semantics Checking, Rewrite Phase, Optimizer Phase, Code Generation, Query Plan, and QGMX]


DB2 9 Hybrid Data Server Architecture

[Figure: components shown are DB2 Client Application, SQL/XML, XQuery, Relational Interface, XML Interface, XSR/Catalogs, DB2 Engine, and DB2 Storage (XML and Relational)]


Tight Integration


XQuery Defined

• SQL is the query language for relational databases

• XQuery is the query language for XML, as defined by the W3C

• Built-in support provided in DB2 9 by query compiler and built-in XQuery functions


INPUT FUNCTIONS


DB2 9 XML Input

• SQL INSERT Statement

• Input to the XML column must be a well-formed XML document
  • As defined in the XML specification

• Clients send XML documents in textual representation, and DB2 uses a Simple API for XML (SAX) parser for
  • Well-formedness checking
  • Validation

• If the input is of the XML data type, DB2 parses it implicitly

• Use the XMLPARSE function for input of a non-XML data type
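
A well-formedness check with a SAX parser can be sketched as follows. This uses Python's standard `xml.sax` module as a stand-in for the parser inside DB2; the function name `is_well_formed` is hypothetical, and the sketch covers well-formedness only, not schema validation.

```python
import xml.sax


def is_well_formed(xml_text):
    """Return True if xml_text parses as a well-formed XML document."""
    try:
        # A bare ContentHandler ignores all events; only parse errors matter here.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


print(is_well_formed("<a><b/></a>"))
print(is_well_formed("<a><b></a>"))
```

A document with a mismatched tag or with more than one root element is rejected before any data would be stored.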


DB2 9 Annotated XML Schema Decomposition

• Data from XML documents is decomposed into relational and XML columns using annotated XML schema decomposition
  • Stores data in columns according to annotations contained in XML schema documents

• XML Schema Repository (XSR) registration

• Schemas are registered with DB2-supplied stored procedures or through the Command Line Processor


DB2 9 XML Input -- IMPORT

• Import utility enhanced to support import of XML documents

• Validation optional

• Schema must be registered in DB2 XML Schema Repository (XSR) if validation performed


OUTPUT FUNCTIONS


DB2 9 XML Output Functions

• db2-fn:xmlcolumn function
  • Takes a string literal that identifies an XML column and returns an XML sequence consisting of all document nodes in the specified column


DB2 9 XML Output Functions

• db2-fn:sqlquery function
  • Used to restrict input to an XQuery by conditions placed on relational columns in the same or related tables
  • Returns a single column
  • Based on an SQL fullselect
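
The difference between the two functions can be illustrated with a small emulation. This is not DB2 syntax: SQLite and `xml.etree.ElementTree` are stand-ins, and the `customer` table, its columns, and the helper names `xml_column` and `sql_query` are assumptions made for the sketch. The first helper returns every document in the column; the second narrows the input with a predicate on a relational column before any XML is examined.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cid INTEGER PRIMARY KEY, region TEXT, info TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1000, "EAST", "<customerinfo><name>Ann</name></customerinfo>"),
        (1001, "WEST", "<customerinfo><name>Bob</name></customerinfo>"),
        (1002, "EAST", "<customerinfo><name>Cy</name></customerinfo>"),
    ],
)


def xml_column(conn):
    """Analogue of db2-fn:xmlcolumn: all documents in the XML column."""
    rows = conn.execute("SELECT info FROM customer ORDER BY cid")
    return [ET.fromstring(info) for (info,) in rows]


def sql_query(conn, region):
    """Analogue of db2-fn:sqlquery: a select that returns one XML column,
    filtered by a condition on a relational column."""
    rows = conn.execute(
        "SELECT info FROM customer WHERE region = ? ORDER BY cid", (region,)
    )
    return [ET.fromstring(info) for (info,) in rows]


all_names = [doc.findtext("name") for doc in xml_column(conn)]
east_names = [doc.findtext("name") for doc in sql_query(conn, "EAST")]
print(all_names, east_names)
```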


DB2 9 XML Output -- EXPORT

• EXPORT utility supports XML data type

• XML data stored separately from exported relational data

• Details about exported XML represented in main exported file by an XML data specifier (XDS)


XQuery Data Model (XDM)

• The XQuery Data Model (XDM) defines the values that XQuery expressions take as input and produce as output

• An instance of the XDM is a sequence
  • A sequence is an ordered collection of zero or more items
  • An item is either an atomic value or a node

• Examples of sequences: 48; <car/>; (6,7,8); (48,<car/>,(6,7,8)); () (an empty sequence); an XML document
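
One property of XDM sequences is worth making explicit: sequences never nest, so a sequence written inside another is spliced into it, and a single item is the same as a sequence of length one. The sketch below models this in Python; the helper `xdm_sequence` and the string stand-in for an element node are hypothetical.

```python
def xdm_sequence(*items):
    """Build a flat sequence; nested sequences are spliced in, as XDM requires."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(xdm_sequence(*item))
        else:
            flat.append(item)
    return flat


CAR = "<car/>"  # stand-in for an element node

# (48, <car/>, (6, 7, 8)) is a sequence of five items, not three.
print(xdm_sequence(48, CAR, (6, 7, 8)))
print(len(xdm_sequence()))
```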


DATABASE DESIGN


Relational – XML

• Relational is highly structured

• Represented by well defined entities and relationships

• XML is hierarchical in form, unstructured, and can be very complex
  • Represented in a tree format defined by the W3C XPath standard
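
The tree view of a document, and how path steps walk it, can be shown with a short example. Python's `xml.etree.ElementTree` supports only a limited subset of XPath, so this is a sketch of the idea rather than full XPath; the department document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <dept> root with two <employee> children.
DOC = ET.fromstring(
    "<dept name='Sales'>"
    "<employee id='1'><name>Ann</name></employee>"
    "<employee id='2'><name>Bob</name></employee>"
    "</dept>"
)

# Path steps walk the tree from the context node (here, the <dept> root).
names = [e.text for e in DOC.findall("./employee/name")]

# A predicate in brackets filters a step by attribute value.
bob = DOC.find("./employee[@id='2']/name").text

print(names, bob)
```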


Relational vs. XML Database Design

• Relational
  • Frequency of updates
  • Design is fixed
  • Maximum performance required
  • Stays relational
  • Meaning outside hierarchy
  • Specific attributes
  • Large fact and dimension tables
  • Referential integrity (RI) required

• XML
  • Design changes
  • Flexibility desired
  • Not used relationally downstream
  • Only hierarchical
  • Many attributes and only a subset applicable
  • Small dimensions in star schema


XML Indexes

• Value indexes
  • Path-specific value indexes on XML columns
  • Elements and attributes used in predicates and cross-document joins

• Full-text indexes
  • Indexes can be defined on any native XML column
  • Documents can be fully or partially indexed
  • Enables just certain parts of documents to be subject to full-text search
  • Text index maintained asynchronously via “lazy” update

• Regions indexes
  • Connect documents that span multiple pages
  • Created automatically by DB2
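
The idea behind a path-specific value index can be sketched with a dictionary: for one chosen path, map each value found there to the documents that contain it, so a predicate on that path becomes a lookup instead of a scan. This is an in-memory illustration only and says nothing about how DB2 lays out its indexes; `DOCS` and `build_value_index` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOCS = {
    1: "<customer><name>Ann</name><city>Toronto</city></customer>",
    2: "<customer><name>Bob</name><city>Boston</city></customer>",
    3: "<customer><name>Cy</name><city>Boston</city></customer>",
}


def build_value_index(docs, path):
    """Map each value found at `path` to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for node in ET.fromstring(text).findall(path):
            index[node.text].add(doc_id)
    return index


# Only the <city> path is indexed; values at other paths are not in the index.
city_index = build_value_index(DOCS, "./city")
print(sorted(city_index["Boston"]))
```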


XML Storage

• Relational data stored in tables and columns

• XML data stored in hierarchical type-annotated tree format

• XML document stored separately outside of table

• XML Data Specifier (XDS) stored in table describes XML document


XML Storage

• Documents must be able to span disk pages
  • A single text node may be larger than a page

• Direct node access
  • Not feasible to traverse every node (a document could be several gigabytes)

• Must support existing isolation levels, logging and recovery mechanisms


XML Storage

• DB2 uses a structured, type-annotated tree

• Stored in binary representation to avoid repeated parsing and validating of the document

• Digital signatures preserved

• Each node contains its type information

• Type information at the document level enables schema evolution
  • Each document in a column can conform to a different schema or to different versions of an evolving schema


XML Storage

• Each node contains pointers to its parent and children
  • Supports efficient navigational queries

• Path expressions are evaluated directly on the native format on buffered pages without copying or transforming the data

• Extra information stored with each node
  • Type annotation if validated
  • Each element node has a set of child slots for associated attributes and ordered children


XML Storage

• Child slots have hints within them
  • Give an indication of what the child represents
  • Enable fast navigation across a context node’s set of children without actually visiting each child node
  • Child node may be on a different page and require I/O

• A unique identifier gives each node logical and physical addressability
  • Can be used in indexing and query evaluation

• Large document trees may not fit on one page
  • Can be split into regions via the regions index
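
The role of child slots and hints can be sketched with a small in-memory tree. This is an illustration of the idea only, not DB2's on-disk format: the classes `Node` and `ChildSlot`, the use of the element name as the hint, and the `visits` counter are all assumptions made for the example. The point is that navigation reads the hint in the parent's slot and dereferences a child only when the hint matches.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    text: str = ""
    parent: Optional["Node"] = None
    slots: List["ChildSlot"] = field(default_factory=list)
    visits: int = 0  # counts how often this node was actually dereferenced


@dataclass(eq=False)
class ChildSlot:
    hint: str    # what the child represents (here: its element name)
    child: Node  # pointer to the child, which may live on another page


def add_child(parent, child):
    """Link child to parent and record a hinted slot in the parent."""
    child.parent = parent
    parent.slots.append(ChildSlot(hint=child.name, child=child))
    return child


def children_named(context, name):
    """Follow only slots whose hint matches; other children are never touched."""
    result = []
    for slot in context.slots:
        if slot.hint == name:
            slot.child.visits += 1
            result.append(slot.child)
    return result


root = Node("order")
customer = add_child(root, Node("customer", "Acme"))
item1 = add_child(root, Node("item", "A-1"))
item2 = add_child(root, Node("item", "B-7"))
items = children_named(root, "item")
print([n.text for n in items], customer.visits)
```

In this sketch the `customer` node is never visited when looking for `item` children, which is the saving that matters when the skipped child sits on a page that would otherwise need I/O.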


BUILDING APPLICATIONS


Key DB2 9 XML Enablers

• Build with Developer Workbench

• Test with Developer Workbench

• Deploy and Maintain with Developer Workbench

• Replaces the former Development Center
  • Migration support for existing documents

• Eclipse Framework based tool


Key DB2 9 XML Enablers

• Developer Workbench

• Separate download at http://www-306.ibm.com/software/data/db2/ad/


XML Sample Schema Definition


XML-XQuery SP


Visual Explain Support


XML Schema Definition


XPath Example


Summary

• pureXML™ Framework

• SQL/XML

• XQuery/XPath

• XDM and XSR

• XML Storage and XML Indexes

• Developer Workbench
  • Build, Test, Deploy and Maintain!

• Additional Features coming in DB2 9 for z/OS


Thanks!

Philip K. Gunning

Gunning Technology Solutions, LLC

[email protected]

Session: E05
DB2 9: XML Evolution and Revolution