Native XML Databases

80
Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com Native XML Databases Ronald Bourret [email protected] http://www.rpbourret.com

description

Native XML Databases. Ronald Bourret [email protected] http://www.rpbourret.com. Overview. What is a native XML database? Use cases Code examples Implementation and performance Normalization and referential integrity Common database features. What is a Native XML Database?. - PowerPoint PPT Presentation

Transcript of Native XML Databases

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Native XML Databases

Ronald [email protected]://www.rpbourret.com

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Overview

• What is a native XML database?

• Use cases

• Code examples

• Implementation and performance

• Normalization and referential integrity

• Common database features

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

What is aNative XML Database?

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Storing XML in a database

1. Build a model» Model the data in the XML document, or...» Model the XML document itself

2. Map the model to the database

3. Transfer data according to the model

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Sample XML document

<Order> <Number>1234</Number> <Customer>Gallagher Co.</Customer> <Date>29.10.00</Date> <Item Number="1"> <Part>A-10</Part> <Quantity>12</Quantity> <Price>10.95</Price> </Item> <Item Number="2"> <Part>B-43</Part> <Quantity>600</Quantity> <Price>3.99</Price> </Item></Order>

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Modeling data

• Objects in model are specific to XML schema

object Order { number=1234; customer="Gallagher Corp."; date=29.10.00; items={ptrs to Items};}

object Item { number=1; part="A-10"; quantity=12; price=10.95;}

object Item { number=2; part="B-43"; quantity=600; price=3.99;}

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Storing data

• Database schema specific to XML schema

OrdersNumber Customer Date1234 Gallagher Co. 291000... ... ...... ... ...

ItemsSONum Item Part Qty Price1234 1 A-10 12 10.951234 2 B-43 600 3.99... ... ... ... ...

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Modeling documents

• Objects in model independent of XML schema

Element (Order)

Element Element Element Element Element(Number) (Customer) (Date) (Item) (Item)

Text Text Text ... Element Element Element Attr 1234 Gallagher Co. 29.10.00 (Part) (Quantity) (Price) (Number)

Text Text Text Text

A-10 12 10.95 1

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Storing documents

• Database schema independent of XML schema(order columns not shown)

Elements Attributes TextID Name Parent ID Name Parent Parent Value 1 Order -- 13 Number 5 2 12342 Number 1 14 Number 6 3 Gallagher Co.3 Customer 1 4 29.10.004 Date 1 7 A-105 Item 1 8 126 Item 1 9 10.957 Part 5 10 B-438 Quantity 5 11 6009 Price 5 12 3.9910 Part 6 13 111 Quantity 6 14 212 Price 6

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Types of databases (take 1)

• Categorize databases based on what they model

• Model data => XML-enabled database

• Model document => Native XML database

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Types of databases (take 2)

• XML adds a new data model to the world» In addition to relational, hierarchical, OO, ...

• The “XML” data model is» A tree of ordered nodes» Nodes have different types (element, attribute, etc.)» Some nodes are labeled (Date, Quantity, Price, etc.)» Data stored in leaf nodes

• Modeling language is an XML schema language» DTD, XML Schemas, RELAX NG, etc.

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Types of databases (take 2)

• XML-enabled databases» Have their own data model (relational, OO, etc.)» Map their data model to XML data model

• Native XML databases» Use XML data model directly

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Formal definition¹

• A native XML database:» Defines an XML data model» Uses a document as its fundamental unit of (logical)

storage» Can have any physical storage

¹ Definition from XML:DB Initiative

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

XML data model

• XML 1.0 did not define a model

• Native XML databases define their own model» Model must include elements, attributes, text, and

document order» Examples are XPath, Info Set, DOM, and SAX» XQuery model will be de facto standard in future?

• Data transferred according to the model

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Fundamental unit of storage

• Document is fundamental unit of logical storage» Equivalent structure in RDBMS is a row

• Document usually contains single set of data

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Physical storage

• Can use any physical storage

• Examples» Tables for SAX “objects” in relational database» DOM objects in object-oriented database» Binary file format optimized for XPath data model» Hashtables of XPaths and values» Compressed, indexed XML documents in file system

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use Cases

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use case: Tabular data

• Manufacturing data

• Accounting data

• Employee data

• Scientific data

• ...

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use a native XML DB only if:

• You want to lose your job.

• Relational databases excel at storing tabular data

• Systems are mature, fast

• Few reasons to change existing code or DBMS

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use case: Documents

• Product documentation

• Static Web pages

• Presentations

• Advertisements

• ....

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use a native XML DB because:

• Structure too irregular for relational model

• Physical info (entities, CDATA, ...) is important

• Need document-centric queries» Get first chapter with a paragraph containing “XML”» Get headings of all chapters that contain figures

• Want to retain document identity

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use case: Semi-structured data• Some structure, but not regular

» Structured data: white pages» Semi-structured data: yellow pages

• Examples» Health data» Result of EAI (Enterprise Application Integration)» Geneological data» ...

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use a native XML DB because:• Difficult to store in a relational database

» Use single, sparsely populated table or...» Many tables (one per subject area)

• Documents might contain arbitrary extensions

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use case:“Natural” format is XML

• Only format is XML» XSLT stylesheets

• Data stored temporarily as XML» Long-running transactions» Enterprise application integration (EAI)» Documents in a message queue

• Schema-less documents» No schema or schema not known

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use a native XML DB because:

• Want query language, security, transactions, etc.

• No reason to use an XML-enabled database» Natural format is XML» Mapping arbitrary documents at run-time is

inefficient, error-prone

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use case: Large documents

• Need entire document in memory» DOM tree» XSLT transformation» etc.

• Document too large for memory

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use a native XML DB because:

• In-memory tree lazily instantiated» Unused nodes kept on disk

• Tree size limited only by:» Disk size» Database address space

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use case: Archiving

• Long-term document storage

• Often required by law» Pharmaceutical industry» Contracts» Financial transactions

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use a native XML DB because:

• Want query language, security, transactions, etc.» Better than file system or BLOBs

• Retains complete document» Retaining original document might be a problem

• May support versioning through updates or diffs

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use case: Performance

• We all want better performance, right?

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Use a native XML DB because:

• May be faster than XML-enabled databases

• Speed depends on:» Actual query» Query and storage engines» Output format (DOM, SAX, string)

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Code Examples

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

XML:DB API

• Database-neutral API from XML:DB Initiative

• Similar to JDBC, OLE DB, etc.» Implemented by database drivers» Methods to connect to database, execute queries, etc.» Uses separate query language(s)

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

JDBC example: INSERT

// Instantiate the Sun JDBC-ODBC bridge driver. The driver registers// itself with the driver manager.

Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

// Get a connection to the Orders database and turn auto-commit off.

Connection conn = DriverManager.getConnection("jdbc:odbc:Orders", "rpbourret", "my_password");conn.setAutoCommit(false);

// Get a Statement.

Statement s = conn.createStatement();

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

JDBC example: INSERT (cont.)

// Execute INSERT statements to insert new rows into the Orders// and Items tables.

s.executeUpdate("INSERT INTO Orders " + "VALUES ('1234', 'Gallagher Co.', #10/29/00#)");s.executeUpdate("INSERT INTO Items " + "VALUES ('1234', 'A-10', 12)");s.executeUpdate("INSERT INTO Items " + "VALUES ('1234', 'B-43', 600)");

// Commit the transaction. Close the statement and the connection.

conn.commit();s.close();conn.close();

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

XML:DB example: INSERT// Get the database (driver) for dbXML/Xindice. Register it// explicitly with the database manager.

Class c = Class.forName("org.dbxml.client.xmldb.DatabaseImpl");Database db = (Database) c.newInstance();DatabaseManager.registerDatabase(db);

// Get the Orders collection.

Collection coll = DatabaseManager.getCollection("xmldb:dbxml:///db/Orders", "rpbourret", "my_password");

// Get a transaction service. XML:DB is designed to use "services",// such as transaction, collection management, and query services.

TransactionService txn = (TransactionService)coll.getService("TransactionService", "1.0");

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

XML:DB example: INSERT (cont.)

// Start a new transaction.

txn.begin();

// Create a new (empty) XMLResource in the Orders collection.

XMLResource res = (XMLResource)coll.createResource("Order1234", "XMLResource");

// Read the contents of the Order1234.xml file into a string.

int b;FileReader r = new FileReader("Order1234.xml");StringWriter w = new StringWriter();

while ((b = r.read()) != -1){ w.write(b);}

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

XML:DB example: INSERT (cont.)

// Set the contents of the new resource to the string.

res.setContent(w.toString());

// Commit the transaction.

txn.commit();

// Close the collection and deregister the database.

coll.close();DatabaseManager.deregisterDatabase(db);

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

JDBC example: SELECT

// Instantiate the Sun JDBC-ODBC bridge driver. The driver registers// itself with the driver manager.

Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

// Get a connection to the Orders database.

Connection conn = DriverManager.getConnection("jdbc:odbc:Orders", "rpbourret", "my_password");

// Execute a SELECT statement to get all orders from 10/29/00.

Statement s = conn.createStatement();ResultSet rs = s.executeQuery("SELECT Number, Date, Customer " + "FROM Orders WHERE Date=#10/29/00#");

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

JDBC example: SELECT (cont.)

// Get each row in the result set and print it.

while (rs.next()){ System.out.println("Order: " + rs.getString(1)); System.out.println("Customer: " + rs.getString(2)); System.out.println("Date: " + rs.getDate(3)); System.out.println();}

// Close the result set, the statement, and the connection.

rs.close();s.close();conn.close();

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

XML:DB example: SELECT// Get the database (driver) for dbXML/Xindice. Register it// explicitly with the database manager.

Class c = Class.forName("org.dbxml.client.xmldb.DatabaseImpl");Database db = (Database) c.newInstance();DatabaseManager.registerDatabase(db);

// Get the Orders collection.

Collection coll = DatabaseManager.getCollection("xmldb:dbxml:///db/Orders", "rpbourret", "my_password");

// Get an XPath query service over the Orders collection. XML:DB// is designed to use a variety of "services", such as different// query engines.

XPathQueryService service = (XPathQueryService)coll.getService("XPathQueryService", "1.0");

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

XML:DB example: SELECT (cont.)

// Execute an XPath query to get all orders from 29.10.00.

ResourceSet docs = service.query("/Order[Date=\"29.10.00\"]");

// Iterate over the returned documents and print each one. We do// not need to know any metadata (such as data types) here. This// is because metadata is contained in the XML document itself.

ResourceIterator iterator = docs.getIterator();while (iterator.hasMoreResources()){ XMLResource doc = (XMLResource)iterator.nextResource(); System.out.println((String)doc.getContent()); System.out.println();}

// Close the collection and deregister the database.

coll.close();DatabaseManager.deregisterDatabase(db);

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Implementationand Performance

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Text-based storage

• Stores documents as text

• Uses file system, CLOB, etc.» Includes XML-aware text in RDBMS

• May need to parse documents at run time

• Uses indexes to avoid extra parsing, increase search speed

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Text-based storage

<Address> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> <Country>USA</Country></Address>

<Address> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> <Country>USA</Country></Address>

<Address> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> <Country>USA</Country></Address>

<Address> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> <Country>USA</Country></Address>

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Text-based databases

• Indexed files» TextML

• CLOBs» Oracle 9i release 2, DB2

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Model-based storage

• Stores documents in “object” form

• Documents parsed when inserted

• For example, store DOM objects in OODBMS

• Underlying storage can be relational, object-oriented, hierarchical, proprietary

• Uses indexes to speed searches

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Model-based storage

<Address> <Street>123 Main St.</Street> <City>Chicago</City> <State>IL</State> <PostCode>60609</PostCode> <Country>USA</Country></Address>

Element

Element Element Element Element Element

Text Text Text Text Text

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Model-based databases

• Proprietary» Tamino, Xindice, Neocore, Ipedo, XStream DB,

XYZFind, Infonyte, Virtuoso, Coherity, Luci, TeraText, Sekaiju, Cerisent, DOM-Safe, XDBM, ...

• Relational» Xfinity, eXist, Sybase, DBDOM

• Object-oriented» eXcelon, X-Hive, Ozone/Prowler, 4Suite, Birdstep

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Performance

• No public benchmark data?

• Native vendors say native is faster

• Relational vendors say relational is faster

• Native XML databases rely heavily on indexing» Slows update times

• Some guesses are possible

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Whole documents and fragments(Text-based databases)

• Should be very fast» Data is contiguous on disk» Retrieval requires index lookup and single disk read

1. Index lookup2. Position disk head3. Read to here

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Whole documents and fragments (Model-based databases)

• Databases with proprietary stores should be fast» Can use physical pointers between nodes

• Databases built on other DBs may be fast or slow» Depends on underlying database and implementation

Node Node Node Node NodeNode

1. Index lookup2. Position disk head3. Follow pointers to end

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Unindexed data

• Slow for model-based databases» Must read many elements, not just particular type» Comparisons may be slower due to converting text

• Very slow for text-based databases» Must parse document as well as comparing values

Element

Element Element Element Element

Text Text Text Attr Element ...

Task:Find date 29.10.00

Relational database:1. Search this column

Model-based native XML database:1. Search all elements for Date elements2. Search text for all Date elements

Orders... ... ... 1234 29.10.00 Gallagher Industries ... ... ... ... ... ...

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Unindexed data: Optimization

• /Order[Date="29.10.00"]» Only need to search children of Order» Schema may help locate Date element

• //Order[Date="29.10.00"]» Schema may help locate Order and Date elements» Without schema, must search entire document

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Query return types

• Text-based databases» Very fast returning strings» Slow returning DOM or SAX due to parsing

• Model-based databases» Probably similar to XML-enabled databases

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Queries not following storage hierarchy

• The $64,000 question ...

• Slower than hierarchical queries

• Pointers to parents may help model-based DBs

<Order> <Number>1234</Number> <Customer>Gallagher Co.</Customer> <Date>29.10.00</Date> <Item Number="1"> <Part>A-10</Part> <Quantity>12</Quantity> <Price>10.95</Price> </Item> <Item Number="2"> <Part>B-43</Part> <Quantity>600</Quantity> <Price>3.99</Price> </Item></Order>

Get the dates and numbers ofall sales orders for part “A-10”

1. Index lookup for part “A-10”2. Follow pointers or index to Order3. Search children for Number, Date

<Part Number="A-10" <Order> <Number>1234</Number> <Date>29.10.00</Date> </Order> ...</Part>

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Normalization andReferential Integrity

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Normalization

• Defined with respect to the relational model

• Performed on schema

• Determines minimal logical structures

• Eliminates duplicate data

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Example: Normalization

ORDERS:OrderNum, Date, CustNum, Address,ItemNum, Qty, PartNum, Price

ORDERS:OrderNum, Date, CustNum

CUSTOMERS:CustNum, Address

ITEMS:OrderNum, ItemNum, Qty, PartNum

PARTS:PartNum, Price

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

First normal form

Relational schema• One value per column

• Define primary key» Means one set of data per row

XML schema• Does not apply; multiple

children allowed

• Root element not a “wrapper”» Means one set of data per doc

» Reduces concurrency problems

» Reduces parse time in text-based databases

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

First normal form (cont.)

Relational schema XML schema

ORDERS:OrderNum, Date, CustNum, Address,ItemNum, Qty, PartNum, Price

ORDERS:OrderNum, Date, CustNum, Address,ItemNum, Qty, PartNum, Price

<!ELEMENT Orders (Order+)><!ELEMENT Order (Number, Date, CustNum, Address, (ItemNum, Qty, PartNum, Price)+)>

<!ELEMENT Order (Number, Date, CustNum, Address, (ItemNum, Qty, PartNum, Price)+)>

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Second normal form

Relational schema• Create separate table for

one-to-many fields» Eliminates duplicate "parent"

data

XML schema• Create separate elements for

repeated groups

• Separate document not needed» Hierarchical structure allows

multiple children per parent

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Second normal form (cont.)

Relational schema XML schemaORDERS:OrderNum, Date, CustNum, Address,ItemNum, Qty, PartNum, Price

<!ELEMENT Order (Number, Date, CustNum, Address, (ItemNum, Qty, PartNum, Price)+)>

ORDERS:OrderNum, Date, CustNum, AddressITEMS:OrderNum, ItemNum, Qty, PartNum, Price

<!ELEMENT Order (Number, Date, CustNum, Address, Item+)><!ELEMENT Item (ItemNum, Qty, PartNum, Price)>

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Third normal form

Relational schema• Create separate table for

many-to-one fields» Eliminates duplicate "child"

data

XML schema• Primary information store:

» Move data to separate document

» Too many docs => use RDBMS

• Secondary information store:» Create separate elements

» Duplicate data often not an issue

» Historic data is read-only

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Third normal form (cont.)

Relational schema XML schema (1)ORDERS:OrderNum, Date, CustNum, AddressITEMS:OrderNum, ItemNum, Qty, PartNum, Price

<!ELEMENT Order (Number, Date, CustNum, Address, Item+)><!ELEMENT Item (ItemNum, Qty, PartNum, Price)>

ORDERS:OrderNum, Date, CustNumITEMS:OrderNum, ItemNum, Qty, PartNum CUSTOMERS:CustNum, AddressPARTS:PartNum, Price

<!ELEMENT Order (Number, Date, Customer, Item+)><!ELEMENT Item (ItemNum, Qty, Part)><!ELEMENT Customer EMPTY><!ATTLIST Customer xlink:href CDATA #REQUIRED><!ELEMENT Part EMPTY><!ATTLIST Part xlink:href CDATA #REQUIRED>

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Third normal form (cont.)

XML schema (2)<!ELEMENT Order (Number, Date, CustNum, Address, Item+)><!ELEMENT Item (ItemNum, Qty, PartNum, Price)>

<!ELEMENT Order (Number, Date, Customer, Item+)><!ELEMENT Item (ItemNum, Qty, Part)><!ELEMENT Customer (CustNum, Address)><!ELEMENT Part (PartNum, Price)>

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Referential integrity• Means logical pointers point to valid data

Case Referential integrityID/IDREF Enforced during validation

Key/keyref Enforced during validation

XLinks, proprietary links [1] Not enforced (problem!) [2]

External entity refs Not enforced (problem!) [2]

[1] Not widely supported[2] Not enforceable for entities outside database

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Common Database Features

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Round-tripping

• All databases can round-trip documents

• Round-trip level depends on database

• Text-based DBs should do exact round-tripping

• Model-based DBs round-trip at level of model» Minimum level is elements, attributes, PCDATA, and

document order» May be less than canonical XML (comments and

processing instructions may be discarded)

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Document collections

• Contain related documents

• Similar to directory in file system ...» Documents can have any schema» Some databases allow nested collections

• ... or similar to table in relational database» Documents restricted to a single schema» Allows indexes and queries to be optimized

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Indexes

• All databases use indexes» Structural indexes index element and attribute names» Value indexes index text and attribute values» Full text indexes index tokens in values

• Some databases index everything

• Other databases allow user to specify index fields

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Query languages

• Many databases have proprietary languages

• XPath is most common standard language» Includes extensions for multi-document queries

• XQuery will be standard in the future

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Updates

• Many databases only replace existing document

• Common update strategies:» Updates through live DOM» Update language. Use XPath to identify node, then:

• Insert data before/after node

• Modify node

• Delete node

» Extensions to XQuery

• Validity of updates often not enforced

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

APIs

• Similar to ODBC» Query language is separate from API» Connect, execute query, get results, commit/rollback» Results returned as string, DOM tree, or SAX events

• Often available over HTTP

• Most databases have proprietary APIs» XML:DB is database-neutral API» Oracle, IBM have submitted API through Sun JCP

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Transactions, locking,and concurrency

• Most databases support transactions

• Locking usually at document (not fragment) level

• Whether this is an issue depends on» What is stored in a single document» Number of concurrent users

• Fragment locking more common in the future?

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Links

• Few databases support XLink or proprietary links

• Some links are live (i.e. allow updates)

• Referential integrity not supported

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

External data

• Some databases can merge external data» Relational data (ODBC, JDBC, OLE DB)» Application data (SAP, PeopleSoft, Excel, etc.)» ...

• Whether data is live depends on database

• In the future, many databases will probably support live external data

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

External entity storage

• Not clear whether to store entity or URI» Storing entity incorrect if URI points to live data» Storing URI may be incorrect if entity is a snapshot

• Correct answer is probably to let user decide

• Not sure what is currently stored» All databases store URI of external entities?» Some databases also store BLOBs (unparsed entities)

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Resources

• Ronald Bourret’s Papers Page» http://www.rpbourret.com/xml/index.htm

• XML / Database Links» http://www.rpbourret.com/xml/XMLDBLinks.htm

• XML:DB Initiative» http://www.xmldb.org/

• The XML Cover Pages: XML and Databases» http://xml.coverpages.org/xmlAndDatabases.html

Copyright 2002-2003, Ronald Bourret, http://www.rpbourret.com

Questions?

Ronald [email protected]://www.rpbourret.com