Object Relational Databases

Object Relational Databases

Ioan Despi

Motivation & Politics

In the early 80’s, it became clear that relational systems were not robust enough for non-administrative data-intensive applications of the day: CAD/CAM CASE GIS etc.

Two buzz-phrases began to emerge: "Object-Oriented" and "Extensible"

Much vision & politics ensued: Various data models (NF2, ER, Functional, Semantic) Object-Oriented DB System Manifesto (OO-ness) Third-Generation DB System Manifesto (Extensibility) Many query languages proposed

Systems were built, companies started, etc.

Today, the field has settled down into two arenas:

Persistent OO PL systems (e.g. EXODUS, ObjectStore, Objectivity, Versant, etc.)

Query-based systems with OO features (e.g. Starburst, Postgres, Illustra, Informix & ORacle "Universal Servers", DB/2 UDB)

Almost nobody does both well

Few people continue to argue in terms of the paradigms.

Market: $10 billion/year for RDBMS

$200 million/year for OODBMS

To extend an existing relational data base management system:

• keep the basic relational tables & query language

• add object flavours:

user - extensible type system

encapsulation

inheritance

polymorphism

dynamic binding of methods

complex objects (non-first normal form objects)

object identity

Obvious & common idea:

Terminology:

Extended Relational DBMS (ERDBMS) -- original term

Object - Relational DBMS (ORDBMS) -- more descriptive

Universal Server, Universal DBMS (UDBMS)-- recently

Oracle

Informix have all extended their systems to become ORDBMSs

IBM

SQL3 standardize extensions to the relational model and query language

Advantages:

resolves many of the weaknesses of the relational model

extends RDBMS with reuse and sharing

comes from the ability to extend the database server to perform standard functionality centrally, rather than

having it coded in each application

preserves the knowledge and experience from relational model

Disadvantages:

complexity and associated increased costs

simplicity and purity of the relational model are lost

object-orinted purists: the used terminology is wrong

The Third - Generation Database Manifestos

1989 Atkinson et al. : Object -Oriented Database System Manifesto:

1. Complex objects must be supported

2. Object identity must be supported

3. Encapsulation must be supported

4. Types or classes must be supported

5. Types or classes must be able to inherit from ancestors

6. Dynamic binding must be supported

7. The DML must be computationally complete

8. The set of data types must be extensible

9. Data persistence must be provided

10. Support for very large databases must be provided

11. DBMS must support concurrent users

12. DBMS must be capable of recovery

13. DBMS must provide a simple way of querying data

1990 Stonebraker et al. : Committee for Advanced DBMS Function

The Third-Generation Database Systems Manifeso

1. Athird-generation DBMS must have a rich type system

2. Inheritance is a good idea

3. Functions, including database procedures, methods and encapsulation, are a good idea

4. Unique identifiers for records should be assigned by the DBMS only if a user-defined primary key is not available

5. Rules (triggers, constraints) will become a major feature. They should not be associated with a specific function or collection.

6. The programmatic access to a database should be through a non-procedural high-level access language

7. At least two ways to specify collections:

using enumaration of members

using QL to specify membership

8. Updateable views are essential

9. Performance indicators must not appear in the data model

10. DBMS must be accessible from multiple languages

11. On top of the DBMS: a variety of high-level languages

12. For better or worse, SQL is ‘intergalactic dataspeak’

13. Queries and their resulting answers should be the lowest level of communication between a server and a client

1995, 1998: Darwen & Date: attempt to defend the relational data model, as described in their book (1992).

The Third Manifesto:

certain oo features are desirable but they must be orthogonal to the relational model

the realtional model needs no extension, no correction, no subsumtion, no perversion

SQL is the biggest perversion--> it is rejected from the model

instead: language D

a front-end layer is provided to D that allows SQL to be used (a migration path for existing SQL users)

The D language is subject to:

1. Prescriptions that arise from the relational model

(RM Prescriptions)

2. Prescriptions that do not arise from the realtional model, called Other Orthogonal prescriptions

(OO Prescriptions)

3. Proscriptions that arise from the relational model

(RM Proscriptions)

4. Proscriptions that do not arise from the relational model

(OO Proscriptions)

5. Very strong suggestions (RM and OO)

RM Prescriptions

1. Domanins

2. Typed scalars

3. Scalar operators

4. Actual representation

5. Truth values

6. Type constructor TUPLE

7. Type constructor RELATION

8. Equality operator

9. Tuples

10. Relations

11. Scalar variables

12. Tuple variables

13. Realtion variables (relvars)

14. Base vs. derived relvars

15. Database variables (dbvars)

16. Transactions and dbvars

17. Create/destroy operations

18. Relational algebra

19. Relvar names and explicit values

20. Relation functions

21. Relation and tuple assignement

22. Comparisons

23. Integrity constraints

24. Relation and database predicates

25. Catalog

26. Language design

RM Proscriptions

1. No attribute ordering

2. No tuple ordering

no duplicate tuples

4. No nulls

5. No nullogical mistakes

6. No internal-level constructs

7. No tuple-level operations

8. No composite columns

9. No domain check override

10. Not SQL

OO Proscriptions

1. Relvars are not domains

2. No object ids

3. No ‘public instance variables’

4. No ‘prottected instance variables’ or friends

OO Prescriptions1. Compile-time type checking

2. Single inheritance ( conditional)

3. Multiple inheritance (conditional)

4. Computational completeness

5. Explicit transactions boundaries

6. Nested transactions

7. Aggregates and empty sets

RM Very strong

suggestions

1. Candidate keys for derived relvars

2. System-generated keys

3. Referential integrity

4. Candidate key inference

5. Quota queries

6. Transitive closure of a relation

7. Tuple and relation parameters

8. Default values

9. SQL migration

OO Very strong

suggestions

1. Typew inheritance

2. Collection type constructors

3. Conversion to/from relations

4. Single- level store

The primary object in the propopsal is the domain:

a named set of encapsulated values, of arbitrary complexity

equivalent to a data type or object class

Domain values: scalars, can be manipulated only by means of operators defined for the domain.

The language D comes with some build-in domains(ex: truth values)

The equals (=) comparison operator is defined for every domain, returning a boolean value

Relations, tuples and tuple headings have their normal meaning with the introduction of RELATION and TUPLE type constructors for these objects

The following variables are defined:

•Scalar variable of type V--variable whose permitted values are scalars from a specified domain V

•Tuple variable of type H-- variable whose permitted values are tuples with a specified tuple heading H

•Relation variable (relvar) of type H -- variable whose permitted values are relations with a specified relation heading H

•Database variable (dbvar) -- a named set of relvars. Every dbvar is subject to a set of named integrity constraints and has an associated self-describing catalog

Query RDBMS ORDBMS

NoQuery File Sys. OODBMS Simple Complex Data Data

Stonebraker’s matrix: upper right is growing,

Systems History

Three influential research systems:

Starburst (IBM Almaden)

POSTGRES (Berkeley)

EXODUS (Wisconsin)

Others include O2 (Altair), ORION (MCC), Iris (HP), Genesis (Texas)

EXODUS

EXODUS was intended to be both a persistent PL system and a query system with "Toolkit" extensibility

Query processing engine never got built, though the EXODUS optimizer architecture was influential (Graefe & DeWitt)

Ended up focusing on OODB stuff SHORE (follow-on to EXODUS) is delivering on EXODUS

promises, but rather late. Persistent C++ Query Processing (w/ GIS features): Paradise Extensible Optimizer: Opt++ Parallelism We'll see Exodus/SHORE work on

pointer swizzling, client-server caching

POSTGRES

Stonebraker, Rowe, a few staff and many students, 1986-1994. Post-INGRES.

The Postgres Data Model

1.Co-opt the OO terminology class = relation instance = tuple object-id = tuple-id method = attribute or function of attributes 2.Support extensible ADTs extensible procedures using C functions binary operators, which interface to extensible AM

3.Support type constructors trick: use queries columns can be parameterized Postquel functions

(returns setof, or tuple) queries can live in fields of a tuple (returns setof or tuple) another exploitation of the view paradigm! these derived objects can optionally be cached

(never implemented) nested-dots used to traverse complex object structures leverages EXISTING techniques for relational processing added array support directly 4.Added class inheritance (gives method inheritance and

collection hierarchies)

Starburst

Original goal: build a nice playpen for whatever comes next. Extensible "in-house". Not by users! No one survey paper seems to capture the work they did. Best bet: "Starburst Mid-Flight: As The Dust Clears", Haas, et al., TKDE 1990

Plumbing: clean internal query representation (QGM).

Key to Query Rewrite! non-normalized catalogs for efficiency –

normalized view for users WAL instead of shadow pages

B+-tree compression Buffer Pool Manager accepts hints from optimizer

(a la DBMIN)

Extensibility features: • User-defined functions: table expressions: queries or C functions scalar functions no dynamic linking • Rule-based query rewrite engine a little rule system with QGM as "working memory" conditions and actions are C functions that check and

change QGM some nifty rule control mechanisms (rule classes, rule

budgets, multiple conflict res.) • Extensible access methods (as in POSTGRES) "Attachments": routines to be automatically called before/after

dealing with an access method used by Starburst Rule System to generate transition logs used to implement pre-computed joins• Complex objects implemented in a "wrapper" (SQL-XNF),

translated down to Starburst SQL

Hot Applications:

Web servers, full-text collections Time-series data "Asset Management" GIS image DB

Players:

Informix Universal Server (head of Illustra, body of Informix). Shipping now. IBM DB2 UDB (head of Starburst, body of DB2). Extensibility features coming along. UniSQL (Won Kim of ORION fame). Went out of business recently. Oracle Universal Server (marketing-ware). Shipping now. NCR (Teradata) bought Wisconsin's Paradise ORDBMS

(and DeWitt/Naughton/students) Other big R vendors are late (Sybase, Tandem, etc.)

Object -Relational Databases Concepts

I. Nested relations

II. Complex types and object orientation

III. Querying with complex types

IV. Creation of complex values and objects

V. Comparison of OO and OR databases

Object -Relational Data Models

1. Extend the relational data model by including object orientation and constructs to deal with added data types

2. Allow attributes of tuples to have complex types, including non-atomic values such as nested relations

3. Preserve relational foundations (e.g. declarative access to data) while extending modeling power

4. Upward compatibility with existing relational languges

I. Nested Relations

1. Permit non-atomic domains (e.g. set of integers, set of tuples,…)

--> violate 1NF (that all attributes have atomic (indivisible) domains)

2. Allow more intuitive modelling for applications with complex data

a complex object may be represented by a single tuple --->

1:1 correspondence between data items and objects

3. Retain mathematical foundation of relational model

Intuitive definition: Nested relations allow:

1. Relations wherever we allow atomic (scalar) values

2. Relations within relations

Example. Suppose the information to be stored consists of:

1. document title

2. author_list (set of authors)

3. date (day, month, year)

4. key_word_list (list of keywords)

stored in a non-1NF document relation, doc:

title author_list date keyword_list

day month year

salesplan

stat.report{Smith, Jones}

{Jones, Frick}

17 april 93

28 june 98

{profit, strategy}

{personnel, profit}

title author day month year keyword

salesplan Smith 17 april 93 profit

salesplan Jones 17 April 93 profit

salesplan Smith 17 April 93 strategy

salesplan Jones 17 April 93 strategy

stat report Jones 28 June 98 personnel

stat report Frick 28 June 98 Personnel

stat report Jones 28 June 98 Profit

stat report frick 28 June 98 Profit

The doc relation can be represented in 1NF as doc’ but awkward.

If we asssume the following MVDs hold:

title --> author

title --> keyword

title --> day month year

we can decompose the relation in the following 4NF relations:

R1(title, author)

R2(title, keyword)

R3(title, day, month, year)

A relation is in 5NF if no reamining nonloss projections are possible, except the trivial one in which the key appears in each projection.

A relation is in 4NF iff it is in BCNF and there are no nontrivial MVD

A relation is in BCNF iff every determinant is a candidate key.

A relation is in 3NF iff it is in 2NF and no nonkey attribute is transitively dependent on the key

A relation is 2NF iff it is in 1NF and all the non-key attributes are fully FD on the key

A relation is 1NF iff every attribute is single-valued for each tuple.

The non-1NF representation may be an easier-to-understand model (closer to user’s view)

The 4NF design would require users to include joins in their queries, thereby complicating interaction with the system

We could define a view, but we lose the 1:1 correspondence between tuples and documents.

II. Complex Types and Object Orientation

Again: extensions to relational model include:

nested relations

complex types

specialization (IS_A hierarchies)

inheritance

object identity

A. Structured and collection types

Define a relation doc with complex attributes: sets and structured attributes:

create type MyString char varying

create type MyDate

(day integer, month char(10), year char(10))

create type Document

(name MyString, author_list setof(MyString),

date MyDate, keyword_list setof (MyString))

create table doc of type Document

•Unlike table definitions in ordinary relational databases, the doc table definition allows attributes that are sets and structured attributes (see: MyDate)

•Allow composite attributes and multivalued attributes of ER diagrams to be represented directly.

•The types created using the above statements are recorded in the schema stored in the database

•Can create tables directly:

create table doc

(name MyString, author_list setof (MyString), date MyDate, keyword_list setof (MyString))Complex type systems usually support other collection types, as

arrays: author_array MyString[10] //presents an ordered list of authors

multisets: print_runs multiset(integer) //presents the number of copies in each //printing run

B. Inheritance

Inheritance can be at the level of types or at the level of tables

1. Inheritance of types

create type Person

(name MyString,

social_security_no integer)

create type Student

(degree MyString,

department MyString)

under Person

create type Teacher

(salary integer,


under Person

Since name and social_security_no are inherited from a common source, Person, there is no conflict by inheriting them .

However, department is defined separately in Student and Teacher and one can rename them to avoid conflict, by using an as clause:

1’. Multiple inheritance of types

Definiton of the type TeachingAssistant as a subtype of both Teacher andStudent.

create type TeachingAssistant

under Student with (department as student_dept),

under Teacher with (department as teacher_dept)

2. Inheritance of tables

To avoid creation of too many subtypes: one approach is to allow an object to have multiple types without having a most specific type

OR databases can model such a feature by using inheritance at the level of tables, rather than types and allowing an entity(object) to exist in more than one table at once.

create table people

(name MyString,

social_security_no integer)

create table students

(degree MyString,


under people

create table teachers

(salary integer,


under people

2’. Table inheritance: Roles

Table inheritance permits an object to have multiple types, without having a most-specific type (unlike type inheritance)

Example: an object can be in the students and teachers subtables simultaneously, without having to be in a subtable

student_teachers that is under both students and teachers

Object can gain/ lose roles: corresponds to inserting /deleting object from a subtable.2’’. Table inheritance: Consistency Requirements

1. Each tuple of supertable people can correspond to (i.e. having the same values for all inherited attributes as) at most one tuple in each of the tables studentsand teachers (WHY?)

2. Each tuple in students and teachers must have exactly one corresponding tuple in people. (WHY?)

Subtables can be stored in an efficient manner without replication of all inherited fields:

inherited attributes other than the primary key of the supertable need not be stored and can be derived by means of a join with the supertable, based on the primary key

As with types, multiple inheritance is possible with tables:

a TeachingAssistant can simply belong to the table students as well as to the table teachers. However, if ew want, we can create a table for TeachingAssistant entities.

Based on the consistency requirements for subtables, if an entity is present in the teaching_assistants table, it is also present in the teachers and in the students table.

3 . Inheritance: Conclusion

Inheritance:

makes schema definition natural

ensures referential and cardinality constraints

enables the use of functions defined for supertypes on objects belonging to subtypes

allows the orderly extension of a database system to incorporate new types

C. Reference Types

Object - oriented languages provide the ability to create and refer to objects.

1.To refer to objects = an attribute of a type can be a reference to an object of a specified type.

Example: redefine the author_list field of the type Document as:

author_list setof ( ref (Person))

Now author_list is a set of references to Person objects

2. Tuples of a table can also have references to them.

Example: ref(people)

It can be implemented using the primary key or tuple-id.

3. SQL3 uses identity (for tuples) and oid (for objects).

III. Quering with Complex Types

A. Relation-Valued Attributes

Extended SQL allows an expression evaluating to a relation to appear anywhere the relation name may appear.==>

can take advantage of the structure of nested relations

Example: consider the following relation pdoc:

create table pdoc

name MyString

author_list setof (ref people)),

date MyDate,

keyword_list setof (MyString))

1. Find all documents having the word “database” as one of their keywords:

select name

from pdoc

where “database” in keyword_list

2. Find all pairs of the form “doc_name, author_name” for each document and each author of the document:

select B.name, Y.name

from pdoc as B, B.author_list as Y

3. Find the name and the number of authors for each document (aggregate functions can be applied to any relation-valued expression)

select name, count(author_list)

from pdoc

B. Path Expressions

The dot notation for referring to composite attributes can be used with references.

Example: student.advisor.name

References can be used to hide join operations --> they simplify the query considerably.

Example: Consider the previous table people and a table phd_student:

create table phd_students

(advisor (ref(people))

under people

Find the names of the advisers of all PhD students:

select phd_student.advisor.name

from phd_students

In general, attributes used in a path expression can be a collection, such as a set or a multiset.

For example, to get the names of all authors of documents in pdoc relation:

select Y.name

from pdoc.author_list as Y

C. Nesting and Unnesting

Unnesting = the transformation of a nested relation into 1NF

converts a nested relation into a single flat relation with no nested relations or structured types as attributes

create table doc

(name MyString,

author_list setof (MyString),

date MyDate,

keyword_list setof (MyString))

author_list, keyword_list:

nested relations

name, date: are not nested

select name, A as author, date.day, date.month, date.year, K as keyword

from doc as B, B.author_list as A, B.keyword_list as K

B is declared to range over doc, A ranges over the authors in author_list for that document, K ranges over the keywords in the keyword_list of the document

Nesting = the reverse operation of transformation of a 1NF relation into a nested relation

can be carried out by an extension of grouping in SQL

Example: nest the relation flat_doc on the attribute keyword:

select title, author, (day, month, year) as date,

set (keyword) as keyword_list

from flat_doc

groupby title, author, date

title author_listdate keyword_list

day month year

salesplan

salesplan

stat.report

stat_report

Smith

Jones

Jones

Frick

17 april 93

17 april 93

28 june 98

28 june 98

{profit, strategy}

{profit, strategy}

{personnel, profit}

{personnel, profit}

D. Functions

Object - relational systems allow functions to be defined by users

Functions can be defined in a DML, such as extended SQL

Example: define a function that, given a document, return the count of the number of authors:

create function author_count (one_doc Document)

returns integer as

select count (author_list)

from one_doc

the function can be used in a query , to find the names of all documents that have more than one author

select name

from doc

where author_count (doc) > 1

Notes:

1.Although doc refers to a relation in the from clause, it is treated as tuple variable in the where clause, and can therefore be used as an argument to the author_count function.

2. A select statement can return a collection of values.

If the return type of a function is a collection type, the result of the function is the entire collection.

However, if the return type is not a collection type, the collection generated by SQL shoul contain only one tuple.

Otherwise, a system may have two choices: flag an error or select an arbitrary one from the collection.

Functions can also be defined in a programming language as C, C++, Java. It can be more efficient and handle more complex computations than that defined using SQL.

Since the code needs to be loaded and executed within the database system code, it may carry the risk of:

integrity: a bug in the program can corrupt the database internal structure

security: it can by-pass the access control functionality of the database management system

Embedded SQL is different from C++ code functions: in SQL the query is passed by the user program to the database system to run. User-written code never needs to access to the database itself. The operating system thus can protect the database from access by any user process.

IV. Creation of Complex Values and Objects

1. Create and update tuples with complex types (tuple-valued and set-valued):

insert into doc

values

(“salesplan”, set (“Smith”, “Jones”),

(19, “January”, 00), set (“profit”, “strategy”))

2. We can use complex values in queries: anywhere where a set is expected, we can enumerate a set:

select name, date

from doc

where name in set (“salesplan”, “opportunities”, “risks”)

3. Multiset values can be created by replacing set by multiset.

4. To create new objects, one can use constructor functions.

The constructor function for an object type T is T( ).

When it is invoked, it creates a new uninitialized object of type T, fills in its oid field, and returns the object.

The fields of the object must then be initialized.

V. Comparison of OO and OR databases

1. OR databases are OO databases built on top of the relational model.

2. Persistent programming language-based OODBs target applications that have high performance requirements( CAD dbs)

3. Relational systems: simple data types, powerful query languages, high protection

4. Persistent programming language-based OODBs: complex data types, integration with programming languages, high performance

5. Object- relational systems: complex data types, powerful query languages, high protection

Reference: A. Silberschatz, H.F. Korth, S. Sudarshan - Database System Concepts, McGraw-Hill, 3 rd ed., 1997

T. Connolly, C. Begg, A. Strachan - Database Systems: A Practical Approach to Design, Implementation and Management. 2 nd ed., Addison-Wesley, 1999

Object Relational Databases

Documents

Transcript of Object Relational Databases