© 2006 AG DBIS

DASMOD2006

DASMOD Project A3XDB: XML Databases

Christian Mathis
[email protected]

Databases and Information Systems Group

1st DASMOD Summer School
July 31st – August 13th
University of Kaiserslautern

A3XDB Project Members

Joint project of the Information Systems Group with the Software Technology Group

A3XDB is part of XTC (XML Transaction Coordinator)

Chairs
• Theo Härder (Information Systems)
• Arnd Poetzsch-Heffter (Software Technology)

Scientific Staff
• Michael Haustein (Locking and Recovery; Project Founder)
• Christian Mathis (Query Processing)
• Jose de Aguiar Moraes Filho (Cost Model)
• Karsten Schmidt (Adaptivity)
• Patrick Michel (Adaptivity)

Outline

Why XML database systems? And what do they look like?

Let's (sky-)dive into XTC
• L5: 33,000 ft. (XML Management)
  - XML, XQuery, DOM, SAX
• L4: 15,000 ft. (Node Management)
  - XML Tree
• L3: 10,000 ft. (Record Management)
  - Mapping onto Records, Pages
• L2: 5,000 ft. (Buffer Management)
  - DB Buffer
• L1: 0 ft. (I/O Management)
  - Containers, Blocks

Adaptivity Aspects

Why XML Database Systems (XDBMS)?

Q: When do I need an XML Database System?
A: When you have a lot of XML data.
• … and if you also need some of these nice DBMS features
  - ACID transactions
  - high-level data handling (declarative query processing)
  - efficient and parallel processing of large data volumes
  - high availability and fault tolerance
  - scalability w.r.t. transaction workload and data volumes
  - adaptive tuning

Examples:
• Document-centric view: document collections
  - books, articles, web pages, …
  → application: structure-sensitive information retrieval
• Data-centric view: semistructured data model
  - messages, configuration files, semistructured data per se
  → application: healthcare information management

What do XDBMS look like?

[Figure: two architectures side by side. One stack: XQuery rewriter (XQuery / XML) on top of an SQL DBMS (SQL / tuples) storing tables. Other stack: SQL rewriter (SQL / tuples) on top of an XQuery DBMS (XQuery / XML) with a native XML store.]

• XOR: "XML over Relational"
  - "Shredding" XML -> tables
• ROX: "Relational over XML"
  - "Native" XML storage
  - SQL systems become legacy

XML Transaction Coordinator (XTC)

[Figure: XTC architecture, components as labeled in the diagram]
• Clients: Browser, FTP Client, XTCdriver (DOM, SAX, XTCconnection)
• XTC server
  - Interface Services: Http Agent, Ftp Agent, DOM RMI, SAX RMI, API RMI
  - L5 XML Services: XML Manager, XSLT Processor, XQuery Processor
  - L4 Node Services: Node Manager
  - L3 Access Services: Record Mgr, Index Mgr, Catalog Mgr
  - L2 Propagation Control: Buffer Manager
  - L1 File Services: I/O Manager, Temp File Mgr
  - Transaction Services: Transaction Manager, Lock Manager, Deadlock Detector
• OS File System: Transaction Log, Container Files, Container Logs, Temporary Files

L5 (33,000 ft.): Example XML Document

<bib>
  <book year="1994" id="1">
    <title>TCP/IP Illustrated</title>
    <author>
      <first>W.</first>
      <last>Stevens</last>
    </author>
    <price>65.95</price>
  </book>
  <book year="2000" id="2">
    <title>Data on the Web</title>
    <author>
      <last>Abiteboul</last>
      <first>Serge</first>
    </author>
    <author>
      <last>Buneman</last>
      <first>Peter</first>
    </author>
    <author>
      <last>Suciu</last>
      <first>Dan</first>
    </author>
    <price>39.95</price>
  </book>
  <book year="1999" id="3">
    <title>The Economics of . . . </title>
    <editor>
      <last>Gerbarg</last>
      <first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <price>129.95</price>
  </book>
</bib>

L5 (33,000 ft.): Example API Access

XQuery

  <result>{
    for $b in //book[@year=2000]
    where count($b/author) > 2
    return $b/title
  }</result>

DOM

  Node contextNode = document.getDocumentElement();
  // navigate to first book element
  contextNode = contextNode.getFirstChild();
  // navigate to next sibling book element
  contextNode = contextNode.getNextSibling();

SAX

  public void startElement(String namespaceURI, String lName, ...) {}
  public void endElement(String namespaceURI, String lName, ...) {}
  public void characters(char ch[], int start, int length) {}
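
To make the three access styles easier to compare, here is a minimal sketch that answers the XQuery example with plain DOM navigation. It uses the standard JDK parser (javax.xml.parsers), not the XTC driver; the class name DomQuerySketch, the method titlesOf, and the shortened document string are assumptions made for illustration. Unlike //book, it only looks at direct children of the document element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of XTC.
final class DomQuerySketch {
    // Shortened copy of the example bib document (prices and some first names omitted).
    static final String BIB =
        "<bib>"
      + "<book year=\"1994\" id=\"1\"><title>TCP/IP Illustrated</title>"
      + "<author><first>W.</first><last>Stevens</last></author></book>"
      + "<book year=\"2000\" id=\"2\"><title>Data on the Web</title>"
      + "<author><last>Abiteboul</last></author>"
      + "<author><last>Buneman</last></author>"
      + "<author><last>Suciu</last></author></book>"
      + "</bib>";

    /** Titles of child book elements with the given year and more than minAuthors authors. */
    static List<String> titlesOf(String xml, String year, int minAuthors) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        List<String> result = new ArrayList<>();
        // Same navigation calls as on the slide: first child, then next siblings.
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE || !n.getNodeName().equals("book")) {
                continue;
            }
            Element book = (Element) n;
            int authors = 0;
            String title = null;
            for (Node c = book.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeName().equals("author")) authors++;
                if (c.getNodeName().equals("title")) title = c.getTextContent();
            }
            if (year.equals(book.getAttribute("year")) && authors > minAuthors) {
                result.add(title);
            }
        }
        return result;
    }
}
```

Calling DomQuerySketch.titlesOf(DomQuerySketch.BIB, "2000", 2) selects the same book as the XQuery expression, but the navigation and filtering logic has to be written by hand, which is the point of the declarative interface.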

L5 (33,000 ft.): XTC Command Center

document handling
• store/delete documents

document navigation/modification/querying in transactional context
• DOM, SAX, XQuery

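L4 (15,000 ft.) taDOM data model

<?xml version="1.0"?>
<bib>
  <book year="2004" id="book1">
    <title>The Title</title>
    <author>
      <first>FirstName</first>
      <last>LastName</last>
    </author>
    <price>49,99</price>
  </book>
</bib>

[Figure: taDOM tree of this document. bib contains book; book has an attribute root with the attribute nodes id and year, and the element children title, author (with first and last), and price. Each attribute node carries a string node with its value (book1, 2004); each leaf element has a text node (T) with a string node below it (The Title, FirstName, LastName, 49,99).]

Legend: element node, attribute root node, attribute node, text node, string node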

L4 (15,000 ft.) SPLID node addressing scheme

[Figure: the taDOM tree of the example document, each node labeled with its SPLID]

1            bib
1.3          book
1.3.1        (attribute root)
1.3.1.3      id
1.3.1.3.1    "book1"
1.3.1.5      year
1.3.1.5.1    "2004"
1.3.3        title
1.3.3.3      (text node)
1.3.3.3.1    "The Title"
1.3.5        author
1.3.5.3      first
1.3.5.3.3    (text node)
1.3.5.3.3.1  "FirstName"
1.3.5.5      last
1.3.5.5.3    (text node)
1.3.5.5.3.1  "LastName"
1.3.7        price
1.3.7.3      (text node)
1.3.7.3.1    "49,99"

Stable Path Labeling IDentifiers (SPLIDs)
• for document storage
• for query processing
• for locking support
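
The labels make structural questions answerable without touching the document: the parent label is a prefix, and document order is the lexicographic order of the divisions. The following is a minimal sketch of these operations, not the XTC implementation. The class name Splid and its methods are hypothetical, and it assumes that every division stands for exactly one tree level, as in the labels shown on the slide; any special division values a full scheme may reserve for later insertions are not handled.

```java
import java.util.Arrays;

// Hypothetical, simplified SPLID: a dot-separated list of integer divisions.
final class Splid implements Comparable<Splid> {
    private final int[] divisions;

    Splid(String label) {
        this.divisions = Arrays.stream(label.split("\\."))
                               .mapToInt(Integer::parseInt)
                               .toArray();
    }

    private Splid(int[] divisions) {
        this.divisions = divisions;
    }

    /** Tree level, assuming one division per level. */
    int level() {
        return divisions.length;
    }

    /** Label of the parent node, or null for the root label. */
    Splid parent() {
        if (divisions.length == 1) {
            return null;
        }
        return new Splid(Arrays.copyOf(divisions, divisions.length - 1));
    }

    /** True if this label is a proper prefix of the other label. */
    boolean isAncestorOf(Splid other) {
        if (divisions.length >= other.divisions.length) {
            return false;
        }
        for (int i = 0; i < divisions.length; i++) {
            if (divisions[i] != other.divisions[i]) {
                return false;
            }
        }
        return true;
    }

    /** Document order: lexicographic by division, ancestors before descendants. */
    @Override
    public int compareTo(Splid other) {
        return Arrays.compare(divisions, other.divisions);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < divisions.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(divisions[i]);
        }
        return sb.toString();
    }
}
```

For example, new Splid("1.3.5.3").parent() yields the label of the author element, and comparing 1.3.3.3.1 with 1.3.5 shows that the title string precedes the author element in document order. The same prefix test is what makes the labels useful for locking: all ancestors of a node are known from its label alone.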

L4 (15,000 ft.) Simple Locking Example

Access to an object
• modify: needs exclusive access, requests X lock
• read: needs shared access, requests R lock

Protocol: Compatibility Matrix

      -    R    X
R     +    +    -
X     +    -    -

On a tree: hierarchical locking!

[Figure: the SPLID-labeled taDOM tree of the example document with lock annotations "T1: X", "T2: R", and "T2: R OK!"]
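
The R/X protocol on a tree can be sketched in a few lines. This is a deliberately simplified illustration, not the taDOM protocol: it assumes that a lock on a node covers the whole subtree below it, decides overlap by SPLID prefix, and rejects conflicting requests instead of queuing them. The class SimpleLockTable, its methods, and the node labels used in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lock table for the two-mode protocol on the slide.
final class SimpleLockTable {
    enum Mode { R, X }

    private static final class Held {
        final String tx;
        final String splid;
        final Mode mode;

        Held(String tx, String splid, Mode mode) {
            this.tx = tx;
            this.splid = splid;
            this.mode = mode;
        }
    }

    private final List<Held> held = new ArrayList<>();

    /** Compatibility matrix from the slide: a held R admits R, everything else conflicts. */
    static boolean compatible(Mode requested, Mode heldMode) {
        return requested == Mode.R && heldMode == Mode.R;
    }

    /** True if a equals b or a is an ancestor of b, decided by SPLID prefix. */
    static boolean ancestorOrSelf(String a, String b) {
        return b.equals(a) || b.startsWith(a + ".");
    }

    /** Grants the lock and returns true, or returns false on a conflict (no waiting). */
    boolean tryLock(String tx, String splid, Mode mode) {
        for (Held h : held) {
            // Assumption: a lock covers its subtree, so locks on the same path overlap.
            boolean overlap = ancestorOrSelf(h.splid, splid) || ancestorOrSelf(splid, h.splid);
            if (!h.tx.equals(tx) && overlap && !compatible(mode, h.mode)) {
                return false;
            }
        }
        held.add(new Held(tx, splid, mode));
        return true;
    }
}
```

If T1 holds X on 1.3.5 (the author subtree), a request by T2 for R on 1.3.5.3 is rejected, while R on 1.3.3 (the title subtree) is granted. Checking every held lock on every request is only workable for a toy example, which motivates dedicated lock modes that announce locks further down the tree.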

L4 (15,000 ft.) taDOM3+ Compatibility Matrix

      -    IR   NR   LR   SR   IX   NRIX LRIX SRIX CX   NRCX LRCX SRCX NU   LRNU SRNU NX   LRNX SRNX SU   SX
IR    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    +    -    -
NR    +    +    +    +    +    +    +    +    +    +    +    +    +    -    -    -    -    -    -    -    -
LR    +    +    +    +    +    +    +    +    +    -    -    -    -    -    -    -    -    -    -    -    -
SR    +    +    +    +    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
IX    +    +    +    +    -    +    +    +    -    +    +    +    -    +    +    -    +    +    -    -    -
NRIX  +    +    +    +    -    +    +    +    -    +    +    +    -    -    -    -    -    -    -    -    -
LRIX  +    +    +    +    -    +    +    +    -    -    -    -    -    -    -    -    -    -    -    -    -
SRIX  +    +    +    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
CX    +    +    +    -    -    +    +    -    -    +    +    -    -    +    -    -    +    -    -    -    -
NRCX  +    +    +    -    -    +    +    -    -    +    +    -    -    -    -    -    -    -    -    -    -
LRCX  +    +    +    -    -    +    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -
SRCX  +    +    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
NU    +    +    +    +    +    +    +    +    +    +    +    +    +    -    -    -    -    -    -    -    -
LRNU  +    +    +    +    +    +    +    +    +    -    -    -    -    -    -    -    -    -    -    -    -
SRNU  +    +    +    +    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
NX    +    +    -    -    -    +    -    -    -    +    -    -    -    -    -    -    -    -    -    -    -
LRNX  +    +    -    -    -    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
SRNX  +    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
SU    +    +    +    +    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
SX    +    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -    -

L3 (10,000 ft.) XTC Document Index

• document mapped to records and distributed across fixed-size pages
• efficient DOM navigation
• prefix compression of SPLIDs works

[Figure: document index on top of the document container. Each record consists of a SPLID and the node data (byte representation).]

Document container pages, records in document order:
  page 1: 1, 1.3, 1.3.1, 1.3.1.3
  page 2: 1.3.1.3.1, 1.3.1.5, 1.3.1.5.1, 1.3.3
  page 3: 1.3.3.3, 1.3.3.3.1, 1.3.5, 1.3.5.3
  page 4: 1.3.5.3.3, 1.3.5.3.3.1, 1.3.5.5, 1.3.5.5.3
  page 5: 1.3.5.5.3.1, 1.3.7, 1.3.7.3, 1.3.7.3.1

Document index keys (first SPLID of each page): 1, 1.3.1.3.1, 1.3.3.3, 1.3.5.3.3, 1.3.5.5.3.1
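
Because the records are stored in document order, neighboring SPLIDs share long prefixes, which is why prefix compression pays off. The sketch below illustrates the idea on the textual labels: each key is stored as the number of leading characters it shares with its predecessor plus the remaining suffix. The class and method names are hypothetical, and XTC works on a byte representation of the SPLIDs rather than on strings, so this only shows the principle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of prefix compression over sorted keys.
final class SplidPrefixCompression {
    /** Encodes each key as "<shared prefix length>|<suffix>" relative to the previous key. */
    static List<String> compress(List<String> sortedKeys) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(shared + "|" + key.substring(shared));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the full keys by reusing the prefix of the previously decoded key. */
    static List<String> decompress(List<String> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String entry : entries) {
            int bar = entry.indexOf('|');
            int shared = Integer.parseInt(entry.substring(0, bar));
            String key = prev.substring(0, shared) + entry.substring(bar + 1);
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For the first page of the example (1, 1.3, 1.3.1, 1.3.1.3), only the newly added division has to be stored per record. The price is that a key inside a page can only be reconstructed by decoding from the start of the page, so lookups within a page become sequential.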

Buffer Management

• Buffer = main memory area with a fixed number of frames for pages
• Exploits reference locality
• Typical BufferManager operations
  - fetch page, allocate page, clear page, fix page, unfix page
• Page replacement strategy: LRU or LRD-V2
• Page addressing by 4-byte page number (external memory address)

[Figure: database buffer with frames holding data pages; page fields PageNumber (4 bytes) and PageType (1 byte)]
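
The interplay of fix, unfix, and LRU replacement can be sketched as follows. This is not the XTC BufferManager: the class LruBufferSketch and its methods are hypothetical, pages are empty byte arrays instead of blocks read from a container, and dirty pages are not written back. It uses the access-order mode of java.util.LinkedHashMap to keep pages sorted from least to most recently used.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical buffer with a fixed number of frames and LRU replacement.
final class LruBufferSketch {
    private static final class Frame {
        final byte[] page;
        int fixCount;

        Frame(int pageSize) {
            this.page = new byte[pageSize];
        }
    }

    private final int frameCount;
    private final int pageSize;
    // accessOrder = true: iteration runs from least to most recently accessed entry.
    private final LinkedHashMap<Integer, Frame> frames = new LinkedHashMap<>(16, 0.75f, true);

    LruBufferSketch(int frameCount, int pageSize) {
        this.frameCount = frameCount;
        this.pageSize = pageSize;
    }

    /** Pins the page in a frame, loading it first if necessary, and returns its bytes. */
    byte[] fix(int pageNumber) {
        Frame frame = frames.get(pageNumber); // get() also marks the page as recently used
        if (frame == null) {
            if (frames.size() >= frameCount) {
                evictOne();
            }
            frame = new Frame(pageSize); // a real buffer would read the block from disk here
            frames.put(pageNumber, frame);
        }
        frame.fixCount++;
        return frame.page;
    }

    /** Releases one pin; an unpinned page becomes a candidate for replacement. */
    void unfix(int pageNumber) {
        Frame frame = frames.get(pageNumber); // counts as an access in this sketch
        if (frame == null || frame.fixCount == 0) {
            throw new IllegalStateException("page is not fixed: " + pageNumber);
        }
        frame.fixCount--;
    }

    boolean isBuffered(int pageNumber) {
        return frames.containsKey(pageNumber); // containsKey() does not change the LRU order
    }

    /** Removes the least recently used page that is not pinned. */
    private void evictOne() {
        Iterator<Map.Entry<Integer, Frame>> it = frames.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().fixCount == 0) {
                it.remove(); // a real buffer would write a dirty page back first
                return;
            }
        }
        throw new IllegalStateException("all frames are fixed");
    }
}
```

The fix counter is what separates a database buffer from a plain cache: a page that a component is currently working on must not be replaced, even if it is the least recently used one.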

I/O Manager (1)

• Container file is sliced into fixed-size blocks (blockSize == pageSize)
• I/O Manager handles the container file
  - read block, write block, allocate block, release block
• Dynamic allocation of new external memory space if the container is full
• Index block at position 0 manages block size and extent size
• Before-image block at position 1 for update-in-place with write-ahead log
• Block addressing with 3-byte block number

[Figure: container layout. Block 0: index (block size, extent size); Block 1: before image; Blocks 2 to n: data blocks]
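
With fixed-size blocks, addressing reduces to arithmetic: the byte offset of a block is its number times the block size, and a 3-byte block number limits a container to 2^24 blocks. The sketch below spells this out. The class ContainerAddressing and its methods are hypothetical, and the big-endian byte order of the encoded block number is an assumption; the slides do not say how XTC lays out the three bytes.

```java
// Hypothetical helper for block arithmetic in a container file.
final class ContainerAddressing {
    /** Number of distinct block numbers that fit into 3 bytes. */
    static final int MAX_BLOCKS = 1 << 24;

    /** Byte offset of a block inside the container file. */
    static long byteOffset(int blockNumber, int blockSize) {
        return (long) blockNumber * blockSize;
    }

    /** Upper bound for the container size implied by 3-byte block numbers. */
    static long maxContainerBytes(int blockSize) {
        return (long) MAX_BLOCKS * blockSize;
    }

    /** Encodes a block number into 3 bytes (big-endian order is an assumption). */
    static byte[] encodeBlockNumber(int blockNumber) {
        if (blockNumber < 0 || blockNumber >= MAX_BLOCKS) {
            throw new IllegalArgumentException("block number out of range: " + blockNumber);
        }
        return new byte[] {
            (byte) (blockNumber >>> 16),
            (byte) (blockNumber >>> 8),
            (byte) blockNumber
        };
    }

    /** Decodes a 3-byte block number written by encodeBlockNumber. */
    static int decodeBlockNumber(byte[] bytes) {
        return ((bytes[0] & 0xFF) << 16) | ((bytes[1] & 0xFF) << 8) | (bytes[2] & 0xFF);
    }
}
```

As a worked example with an assumed block size of 8 KiB: block 2, the first data block, starts at byte 16,384, and a single container can grow to at most 2^24 × 8 KiB = 128 GiB.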

Approaches to Adaptivity of System Behavior

DBS have a large number of tuning parameters

Choose default values for tuning: rules of thumb
• OK for workload-independent parameters: page size, striping unit, minimal buffer size
• insufficient for load balancing aspects: MPL (multiprogramming level) limit, etc.

Hardware is cheap: the KIWI ("kill it with iron") principle
• OK if applied with care
• however, it often implies a waste of resources

Autonomic computing: online feedback control loop
• OK, but requires additional resources (cycles, memory, ...)

Automate some Tasks of the DBA

Process the loop automatically
• monitor – analyze – plan – react
• prediction needs quantitative models!
• additional information flow within / between layers
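
One pass through such a loop might look like the sketch below. Everything in it is invented for illustration: the tuned parameter (number of buffer frames), the monitored value (buffer hit ratio), the fixed step size, and the class FeedbackLoopSketch itself. In particular, the "plan" step is a trivial rule standing in for the quantitative model the slide asks for.

```java
// Hypothetical feedback loop that tunes the number of buffer frames.
final class FeedbackLoopSketch {
    private final double targetHitRatio;
    private final int step;
    private final int maxFrames;
    private int frames;

    FeedbackLoopSketch(double targetHitRatio, int initialFrames, int step, int maxFrames) {
        this.targetHitRatio = targetHitRatio;
        this.frames = initialFrames;
        this.step = step;
        this.maxFrames = maxFrames;
    }

    /** Runs monitor, analyze, plan, react once and returns the frame count now in effect. */
    int runOnce(long hits, long requests) {
        // monitor: derive the observed hit ratio from raw counters
        double hitRatio = requests == 0 ? 1.0 : (double) hits / requests;
        // analyze: is the observed value below the goal?
        boolean tooLow = hitRatio < targetHitRatio;
        // plan: placeholder rule, grow by a fixed step up to a hard limit
        int planned = tooLow ? Math.min(frames + step, maxFrames) : frames;
        // react: apply the new setting
        frames = planned;
        return frames;
    }
}
```

With a target of 0.9, 100 initial frames, a step of 50, and a limit of 200, a pass with 50 hits out of 100 requests raises the setting to 150, and a later pass with 95 hits leaves it unchanged. The counters consumed by the monitor step are an instance of the additional information flow mentioned above, since they have to be supplied by the layer that owns the buffer.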

Local Self-Tuning – Index Selection

Automatic creation of indexes in L3

Analogy:
[Figure: counting traffic locally (planning new resources?) versus global traffic observation (better solution!)]

Global self-tuning requires distributed knowledge
• Workload statistics collected in L5
• Use of path processing algorithms in L4
• Availability of alternative indexes in L3

Conclusions

XTC is a real database system
• Try it: www.xtc-project.de

We dived through the 5 XTC layers
• XML Management
• Node Management
• Record Management
• Buffer Management
• I/O Management

Adaptivity
• We are only at the beginning
• Central concept: online feedback control loop
• First step in XTC: let the components talk to each other

Questions?