Management of XML Documents in Object-Relational Databases

28
Management of XML Documents in Object-Relational Databases Thomas Kudrass Matthias Conrad HTWK Leipzig EDBT-Workshop XML-Based Data Management Prague, 24 March 2002

description

Management of XML Documents in Object-Relational Databases. Thomas Kudrass Matthias Conrad HTWK Leipzig. EDBT-Workshop XML-Based Data Management Prague, 24 March 2002. Overview. Motivation Object-Relational Database Concepts Parsing XML Documents XML-to-ORDB Mapping Meta-Data - PowerPoint PPT Presentation

Transcript of Management of XML Documents in Object-Relational Databases

Page 1: Management of XML Documents in Object-Relational Databases

Management of XML Documents in Object-Relational Databases

Thomas Kudrass Matthias Conrad

HTWK Leipzig

EDBT-WorkshopXML-Based Data ManagementPrague, 24 March 2002

Page 2: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Overview

Motivation Object-Relational Database Concepts Parsing XML Documents XML-to-ORDB Mapping Meta-Data Special Issues Conclusions

Page 3: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Motivation

Storing of XML documents in DBMS Use existing database technology Dealing with complex objects:

– XML documents = complex objects– avoid any decomposition– object-relational database technology good choice

to represent complex objects

Page 4: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

User-Defined Types in ORDB

Complex Data Types– Object Type– Collection Type

Object References Object Views

Page 5: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Example: Object Types

CREATE TYPE Type_Professor AS OBJECT (PName VARCHAR(80),Subject VARCHAR(120) );

object-valued object table attribute

CREATE TYPE Type_Course AS OBJECT ( CREATE TABLE TabProfessor OF Name VARCHAR(100), Type_Professor;Professor Type_Professor );

Page 6: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Example: Collection Types

CREATE TYPE Type_Professor AS OBJECT (PName VARCHAR(80),Subject VARCHAR(120) );

Array Nested Table

CREATE TYPE TypeVa_ Professor AS CREATE TYPE Type_TabProfessor AS VARRAY(5) OF Type_Professor; TABLE OF Type_Professor; 

CREATE TABLE TabDept ( DName VARCHAR(80),

Professor Type_TabProfessor ) NESTED TABLE Professor STORE AS TabProfessor_List;

Page 7: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Example: Object References

CREATE TYPE Type_Professor AS OBJECT (PName VARCHAR(80),Dept VARCHAR(120) );

 CREATE TABLE TabProfessor OF Type_Professor;

 CREATE TYPE Type_Course AS OBJECT (

Name VARCHAR(200),Prof_Ref REF Type_Professor );

 CREATE TABLE TabCourse OF Type_Course;

Referenceto objects of object table TabProfessor

Page 8: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Parsing DTD and XML

XML V2 Parser DTD Parser

XML Document DTD

Schema Definition

Well-FormednessValidity Check

XML2 Oracle

XML DOM Tree DTD DOM Tree

--------------------------------------------------------------------------------

-----------------------------------------------------------------------------------

DBMS Oracle

JDBC / ODBC

Syntax Check

Page 9: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig1 <!ELEMENT University (StudyCourse,Student*)>2 <!ELEMENT Student (LName,FName,Course*)>3 <!ATTLIST Student StudNr CDATA #REQUIRED>4 <!ELEMENT Course (Name,Professor*,CreditPts?)>5 <!ELEMENT Professor (PName,Subject+,Dept)>6 <!ENTITY cs “Computer Science“>7 <!ELEMENT LName (#PCDATA)>8 <!ELEMENT FName (#PCDATA)>9 <!ELEMENT Name (#PCDATA)>10 <!ELEMENT CreditPts (#PCDATA)>11 <!ELEMENT PName (#PCDATA)>12 <!ELEMENT Subject (#PCDATA)>13 <!ELEMENT Dept (#PCDATA)>14 <!ELEMENT StudyCourse (#PCDATA)>

Page 10: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Object–Based–Mapping

DTD Classes Tables <!ELEMENT A (B,C)> CLASS A { CREATE TABLE A (<!ELEMENT C (D)> STRING b; a_pk INTEGER NOT NULL,<!ELEMENT D (#PCDATA)> C c; b VARCHAR(30) NOT NULL);<!ELEMENT B (#PCDATA)> CLASS C { CREATE TABLE C (

STRING d;} c_pk INTEGER NOT NULL, a_fk INTEGER NOT NULL,

d VARCHAR(10) NOT NULL);

• Modification of the Mapping Algorithm [Bourret]

No class definitions

Use objects of the DTD tree

Page 11: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

• Each Complex Element Table

• Each Set-Valued Element Table

• Primary Key in each Table

1 <!ELEMENT University (StudyCourse,Student*)>2 <!ELEMENT Student (LName,FName,Course*)>3 <!ATTLIST Student StudNr CDATA #REQUIRED>4 <!ELEMENT Course (Name,Professor*,CreditPts?)>5 <!ELEMENT Professor (PName,Subject+,Dept)>6 <!ENTITY cs “Computer Science“>7 <!ELEMENT LName (#PCDATA)>8 <!ELEMENT FName (#PCDATA)>9 <!ELEMENT Name (#PCDATA)>10 <!ELEMENT CreditPts (#PCDATA)>11 <!ELEMENT PName (#PCDATA)>12 <!ELEMENT Subject (#PCDATA)>13 <!ELEMENT Dept (#PCDATA)>14 <!ELEMENT StudyCourse (#PCDATA)>

1 <!ELEMENT University (StudyCourse,Student*)> CREATE TABLE TabUniversity (IDUniversity

 

2 <!ELEMENT Student (LName,FName,Course*)> CREATE TABLE TabStudent ( IDStudent

 

4 <!ELEMENT Course (Name,Professor*,CreditPts?)> CREATE TABLE TabCourse ( IDCourse

 

5 <!ELEMENT Professor (PName,Subject+,Dept)> CREATE TABLE TabProfessor ( IDProfessor

CREATE TABLE TabSubject (

IDSubject

Step 1

Page 12: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Other Elements & Attributes Table Columns

CREATE TABLE TabUniversity (IDUniversity,attrStudyCourse,

 CREATE TABLE TabStudent (

IDStudent,attrStudNr,attrLName,attrFName,

 CREATE TABLE TblMatrikelNr (

IDMatrikelNr,attrMNummer,

CREATE TABLE TabCourse (IDCourse,attrName,attrCreditPts,

CREATE TABLE TabProfessor (IDProfessor,attrPName,attrDept,

CREATE TABLE TabSubject (IDSubject,attrSubject,

Step 2

1 <!ELEMENT University (StudyCourse,Student*)>2 <!ELEMENT Student (LName,FName,Course*)>3 <!ATTLIST Student StudNr CDATA #REQUIRED>4 <!ELEMENT Course (Name,Professor*,CreditPts?)>5 <!ELEMENT Professor (PName,Subject+,Dept)>6 <!ENTITY cs “Computer Science“>7 <!ELEMENT LName (#PCDATA)>8 <!ELEMENT FName (#PCDATA)>9 <!ELEMENT Name (#PCDATA)>10 <!ELEMENT CreditPts (#PCDATA)>11 <!ELEMENT PName (#PCDATA)>12 <!ELEMENT Subject (#PCDATA)>13 <!ELEMENT Dept (#PCDATA)>14 <!ELEMENT StudyCourse (#PCDATA)>

Page 13: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Relationships between Elements Foreign Keys

CREATE TABLE TabUniversity (IDUniversity INTEGER NOT NULL,attrStudyCourse VARCHAR(4000) NOT NULL,PRIMARY KEY (IDUniversity));

 CREATE TABLE TabStudent (

IDStudent INTEGER NOT NULL,IDUniversity INTEGER NOT NULL,attrStudNr VARCHAR(4000) NOT NULL,attrLName VARCHAR(4000) NOT NULL,attrFName VARCHAR(4000) NOT NULL,PRIMARY KEY (IDStudent),CONSTRAINT conMatrikel FOREIGN KEY (IDUniversity) REFERENCES TabUniversity (IDUniversity));

 ...

Step 3

Page 14: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

ORDBS Oracle and XML

Basic Idea:– Generate an object-relational schema from the DTD – Natural representation of an XML document by

combining user-defined types Different Mapping Rules:

– Simple elements– Complex elements– Set-valued elements– Complex set-valued elements

Page 15: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

XML Attributes & Simple Elements

Elements of #PCDATA type and XML attributes Attributes of the object type

Domain of Simple Elements:– No type information in the DTD:

numeric vs. alphanumeric? length?

– Restrictions of the DBMS (e.g. VARCHAR [Oracle] 4000 characters)

Mapping of an XML attribute of a simple element Definition of an object type for both attribute and element

Page 16: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

CREATE TABLE TabProfessor OF Type_Professor;

CREATE TYPE Type_Professor AS OBJECT ( attr PAddress VARCHAR(4000), attrPName VARCHAR(4000), attrSubject VARCHAR(4000), attrDept Type_Dept);

CREATE TYPE Type_Dept AS OBJECT ( attrDept VARCHAR(4000), attrDAddress VARCHAR(4000)); 

<!ELEMENT Professor (PName,Subject,Dept)><!ATTLIST Professor PAddress CDATA #REQUIRED> <!ELEMENT PName (#PCDATA)><!ELEMENT Subject (#PCDATA)><!ELEMENT Dept (#PCDATA)><!ATTLIST Dept DAddress CDATA #REQUIRED>

XML Attributes & Simple Elements

Page 17: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Complex ElementsNesting of elements by composite DB object types

CREATE TABLE TabUniversity (attrStudyCourse VARCHAR(4000),attrStudent Type_Matrikel );

 CREATE TYPE Type_Student AS OBJECT (

attrStudNr VARCHAR(4000),attrLName VARCHAR(4000),attrFName VARCHAR(4000),attrCourse Type_Vorlesung );

 CREATE TYPE Type_Course AS OBJECT (

attrName VARCHAR(4000),attrProfessor Type_Professor,attrCreditPts

VARCHAR(4000)); CREATE TYPE Type_Professor AS OBJECT (

attrPName VARCHAR(4000),attrSubject VARCHAR(4000),attrDept VARCHAR(4000));

INSERT INTO TabUniversity VALUES ( ‘Computer Science' ,

Type_Student('23374','Conrad','Matthias', Type_Course(‘Databases II‘, Type_Professor(‘Kudrass‘ ,

‘Database Systems‘', ‘Computer Science‘), '4')));

SELECT u.attrStudent.attrLname FROM TabUniversity uWHERE u.attrStudent.attrCourse.attrProfessor.attrPName

= ‘Kudrass';

Page 18: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Set-Valued Elements

Multiple Occurrence (in DTD): marked by + or * DBMS Restrictions

– collection type applicable to set-valued elements with text-valued subelements, e.g. ARRAY OF VARCHAR

– collection type not applicable to set-valued elements with complex subelements

subelements may be set-valued again

Solutions– use newer DBMS releases (e.g. Oracle 9i)– model relationships with object references

Page 19: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Set-Valued Elements

CREATE TYPE Type_Student AS OBJECT ( attrJahrgang VARCHAR(4000) , attrUniversity REF Type_University ); CREATE TABLE TabStudent OF Type_Student; CREATE TYPE Type_University AS OBJECT( attrStudyCourse VARCHAR(4000)); CREATE TABLE TabUniversity OF Type_University;

• Set-valued element Student

• Modeling in object type Type_Student with a reference

to objects of the table TabUniversity

<!ELEMENT University (StudyCourse,Student*)>

Reference toUniversity Objects

Page 20: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Set-Valued ElementsCREATE TYPE TypeVA_Course AS VARRAY(100) OF Type_Course;CREATE TYPE TypeVA_Professor AS VARRAY(100) OF Type_Professor;CREATE TYPE TypeVA_Subject AS VARRAY(100) OF VARCHAR(4000);

CREATE TABLE TabUniversity (attrStudyCourse VARCHAR(4000),attrStudent Type_Matrikel );

 CREATE TYPE Type_Student AS OBJECT (

attrStudNr VARCHAR(4000),attrLName VARCHAR(4000),attrFName VARCHAR(4000),attrCourse Type_Vorlesung );

 CREATE TYPE Type_Course AS OBJECT (

attrName VARCHAR(4000),attrProfessor Type_Professor,attrCreditPts VARCHAR(4000));

 CREATE TYPE Type_Professor AS OBJECT (

attrPName VARCHAR(4000),attrSubject VARCHAR(4000),attrDept VARCHAR(4000));

Page 21: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Set-Valued ElementsExample

INSERT INTO TabUniversity VALUES ( ‘Computer Science' , TypeVA_Student (

Type_Student('23374','Conrad','Matthias', TypeVA_Course ( Type_Course(‘Databases II‘, TypeVA_Professor ( Type_Professor(‘Kudrass‘ ,

TypeVA_Subject (‘Database Systems,‘Operating Systems‘),‘Computer Science‘)),‘4‘),

Type_Course(‘CAD Intro‘, TypeVA_Professor ( Type_Professor(‘Jaeger‘ ,

TypeVA_Subject ( ‘CAD‘,‘CAE‘), ‘Computer Science‘)),‘4‘),

...)),Type_Student(‘00011',‘Meier',‘Ralf', … ) … )...);

Page 22: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Dealing with Null Values

Restrictions with NOT NULL constraints in object-relational DB schema– NOT NULL constraints in table - not in object type! – NOT NULL constraints not applicable to collection

types

Object-valued attributes: – use CHECK constraints for NOT NULL

Loss of DTD semantics DTD in the database

Page 23: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Dealing with CHECK Constraints<!ELEMENT Course (CName, Address?)><!ELEMENT Addresse (Street, City?)>   CREATE TYPE Type_Address AS OBJECT (

attrStreet VARCHAR(4000),attrCity VARCHAR(4000));

 CREATE TYPE Type_Course AS OBJECT (

attrName VARCHAR(4000),attrAddress Type_Address);  

CREATE TABLE TabCourse OF Type_Course (attrName NOT NULL,CHECK (attrAdresse.attrStrasse IS NOT NULL));

 // ORA-02290: Desired error message

1. INSERT INTO TabCourse ( VALUES (‘CAD Intro’,Type_Address (NULL,’Leipzig’);  // ORA-02290: Undesired error message

2. INSERT INTO TabCourse ( VALUES ('RN', NULL)

Page 24: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Meta-Data about XML Documents

Unique DocumentID for each Document Prolog Information Document Location (URL) Name Space Element vs. Attribute

Page 25: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Naming Conventions for DB Objects

Rules:– TabElementname Table Name– Type_Elementname Object Type Name – TypeVa_Elementname Array Name

No Conflicts with Keywords Introduction of a Schema ID Naming Rule:

SchemaID + Naming Convention + Name

CREATE TYPE DTD01_Type_University CREATE TYPE DTD02_Type_University AS OBJECT ( AS OBJECT ( attrStudyCourse VARCHAR(4000) ); attrRegister VARCHAR(4000) );

Page 26: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Conclusions: Advantages

Non-atomic domains possible– Natural representation of XML Documents– Nesting of any complexity possible

Simple queries by using dot notation Using object references to represent

relationships (OIDs)

Page 27: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Conclusions: Drawbacks

Mapping Deficiencies– Possible restrictions of element types in collections– No adequate mapping of NOT NULL constraints

Loss of Information– Prolog, Comments, Processing Instructions, Prolog– Entity References – Attribute vs. Element ?

Schema Evolution– Modification of DTD Modification of DB

Type Information– Target type: VARCHAR - not sufficient!

Page 28: Management of XML Documents in Object-Relational Databases

© T. Kudrass, HTWK Leipzig

Outlook

Graph-based creation of a schema Source: XML Schema Use CLOB datatype Enhance Meta-Schema

– Comments, Processing Instructions and their position in document

– Entity references and their substitution text