Advanced Database CS-426 Week 13 – Object Relational Databases.
Object Relational Databases
description
Transcript of Object Relational Databases
Object Relational Databases
Ioan Despi
Motivation & Politics
In the early 80’s, it became clear that relational systems were not robust enough for non-administrative data-intensive applications of the day: CAD/CAM CASE GIS etc.
Two buzz-phrases began to emerge: "Object-Oriented" and "Extensible"
Much vision & politics ensued: Various data models (NF2, ER, Functional, Semantic) Object-Oriented DB System Manifesto (OO-ness) Third-Generation DB System Manifesto (Extensibility) Many query languages proposed
Systems were built, companies started, etc.
Today, the field has settled down into two arenas:
Persistent OO PL systems (e.g. EXODUS, ObjectStore, Objectivity, Versant, etc.)
Query-based systems with OO features (e.g. Starburst, Postgres, Illustra, Informix & ORacle "Universal Servers", DB/2 UDB)
Almost nobody does both well
Few people continue to argue in terms of the paradigms.
Market: $10 billion/year for RDBMS
$200 million/year for OODBMS
To extend an existing relational data base management system:
• keep the basic relational tables & query language
• add object flavours:
user - extensible type system
encapsulation
inheritance
polymorphism
dynamic binding of methods
complex objects (non-first normal form objects)
object identity
Obvious & common idea:
Terminology:
Extended Relational DBMS (ERDBMS) -- original term
Object - Relational DBMS (ORDBMS) -- more descriptive
Universal Server, Universal DBMS (UDBMS)-- recently
Oracle
Informix have all extended their systems to become ORDBMSs
IBM
SQL3 standardize extensions to the relational model and query language
Advantages:
resolves many of the weaknesses of the relational model
extends RDBMS with reuse and sharing
comes from the ability to extend the database server to perform standard functionality centrally, rather than
having it coded in each application
preserves the knowledge and experience from relational model
Disadvantages:
complexity and associated increased costs
simplicity and purity of the relational model are lost
object-orinted purists: the used terminology is wrong
The Third - Generation Database Manifestos
1989 Atkinson et al. : Object -Oriented Database System Manifesto:
1. Complex objects must be supported
2. Object identity must be supported
3. Encapsulation must be supported
4. Types or classes must be supported
5. Types or classes must be able to inherit from ancestors
6. Dynamic binding must be supported
7. The DML must be computationally complete
8. The set of data types must be extensible
9. Data persistence must be provided
10. Support for very large databases must be provided
11. DBMS must support concurrent users
12. DBMS must be capable of recovery
13. DBMS must provide a simple way of querying data
1990 Stonebraker et al. : Committee for Advanced DBMS Function
The Third-Generation Database Systems Manifeso
1. Athird-generation DBMS must have a rich type system
2. Inheritance is a good idea
3. Functions, including database procedures, methods and encapsulation, are a good idea
4. Unique identifiers for records should be assigned by the DBMS only if a user-defined primary key is not available
5. Rules (triggers, constraints) will become a major feature. They should not be associated with a specific function or collection.
6. The programmatic access to a database should be through a non-procedural high-level access language
7. At least two ways to specify collections:
using enumaration of members
using QL to specify membership
8. Updateable views are essential
9. Performance indicators must not appear in the data model
10. DBMS must be accessible from multiple languages
11. On top of the DBMS: a variety of high-level languages
12. For better or worse, SQL is ‘intergalactic dataspeak’
13. Queries and their resulting answers should be the lowest level of communication between a server and a client
1995, 1998: Darwen & Date: attempt to defend the relational data model, as described in their book (1992).
The Third Manifesto:
certain oo features are desirable but they must be orthogonal to the relational model
the realtional model needs no extension, no correction, no subsumtion, no perversion
SQL is the biggest perversion--> it is rejected from the model
instead: language D
a front-end layer is provided to D that allows SQL to be used (a migration path for existing SQL users)
The D language is subject to:
1. Prescriptions that arise from the relational model
(RM Prescriptions)
2. Prescriptions that do not arise from the realtional model, called Other Orthogonal prescriptions
(OO Prescriptions)
3. Proscriptions that arise from the relational model
(RM Proscriptions)
4. Proscriptions that do not arise from the relational model
(OO Proscriptions)
5. Very strong suggestions (RM and OO)
RM Prescriptions
1. Domanins
2. Typed scalars
3. Scalar operators
4. Actual representation
5. Truth values
6. Type constructor TUPLE
7. Type constructor RELATION
8. Equality operator
9. Tuples
10. Relations
11. Scalar variables
12. Tuple variables
13. Realtion variables (relvars)
14. Base vs. derived relvars
15. Database variables (dbvars)
16. Transactions and dbvars
17. Create/destroy operations
18. Relational algebra
19. Relvar names and explicit values
20. Relation functions
21. Relation and tuple assignement
22. Comparisons
23. Integrity constraints
24. Relation and database predicates
25. Catalog
26. Language design
RM Proscriptions
1. No attribute ordering
2. No tuple ordering
no duplicate tuples
4. No nulls
5. No nullogical mistakes
6. No internal-level constructs
7. No tuple-level operations
8. No composite columns
9. No domain check override
10. Not SQL
OO Proscriptions
1. Relvars are not domains
2. No object ids
3. No ‘public instance variables’
4. No ‘prottected instance variables’ or friends
OO Prescriptions1. Compile-time type checking
2. Single inheritance ( conditional)
3. Multiple inheritance (conditional)
4. Computational completeness
5. Explicit transactions boundaries
6. Nested transactions
7. Aggregates and empty sets
RM Very strong
suggestions
1. Candidate keys for derived relvars
2. System-generated keys
3. Referential integrity
4. Candidate key inference
5. Quota queries
6. Transitive closure of a relation
7. Tuple and relation parameters
8. Default values
9. SQL migration
OO Very strong
suggestions
1. Typew inheritance
2. Collection type constructors
3. Conversion to/from relations
4. Single- level store
The primary object in the propopsal is the domain:
a named set of encapsulated values, of arbitrary complexity
equivalent to a data type or object class
Domain values: scalars, can be manipulated only by means of operators defined for the domain.
The language D comes with some build-in domains(ex: truth values)
The equals (=) comparison operator is defined for every domain, returning a boolean value
Relations, tuples and tuple headings have their normal meaning with the introduction of RELATION and TUPLE type constructors for these objects
The following variables are defined:
•Scalar variable of type V--variable whose permitted values are scalars from a specified domain V
•Tuple variable of type H-- variable whose permitted values are tuples with a specified tuple heading H
•Relation variable (relvar) of type H -- variable whose permitted values are relations with a specified relation heading H
•Database variable (dbvar) -- a named set of relvars. Every dbvar is subject to a set of named integrity constraints and has an associated self-describing catalog
Query RDBMS ORDBMS
NoQuery File Sys. OODBMS Simple Complex Data Data
Stonebraker’s matrix: upper right is growing,
Systems History
Three influential research systems:
Starburst (IBM Almaden)
POSTGRES (Berkeley)
EXODUS (Wisconsin)
Others include O2 (Altair), ORION (MCC), Iris (HP), Genesis (Texas)
EXODUS
EXODUS was intended to be both a persistent PL system and a query system with "Toolkit" extensibility
Query processing engine never got built, though the EXODUS optimizer architecture was influential (Graefe & DeWitt)
Ended up focusing on OODB stuff SHORE (follow-on to EXODUS) is delivering on EXODUS
promises, but rather late. Persistent C++ Query Processing (w/ GIS features): Paradise Extensible Optimizer: Opt++ Parallelism We'll see Exodus/SHORE work on
pointer swizzling, client-server caching
POSTGRES
Stonebraker, Rowe, a few staff and many students, 1986-1994. Post-INGRES.
The Postgres Data Model
1.Co-opt the OO terminology class = relation instance = tuple object-id = tuple-id method = attribute or function of attributes 2.Support extensible ADTs extensible procedures using C functions binary operators, which interface to extensible AM
3.Support type constructors trick: use queries columns can be parameterized Postquel functions
(returns setof, or tuple) queries can live in fields of a tuple (returns setof or tuple) another exploitation of the view paradigm! these derived objects can optionally be cached
(never implemented) nested-dots used to traverse complex object structures leverages EXISTING techniques for relational processing added array support directly 4.Added class inheritance (gives method inheritance and
collection hierarchies)
Starburst
Original goal: build a nice playpen for whatever comes next. Extensible "in-house". Not by users! No one survey paper seems to capture the work they did. Best bet: "Starburst Mid-Flight: As The Dust Clears", Haas, et al., TKDE 1990
Plumbing: clean internal query representation (QGM).
Key to Query Rewrite! non-normalized catalogs for efficiency –
normalized view for users WAL instead of shadow pages
B+-tree compression Buffer Pool Manager accepts hints from optimizer
(a la DBMIN)
Extensibility features: • User-defined functions: table expressions: queries or C functions scalar functions no dynamic linking • Rule-based query rewrite engine a little rule system with QGM as "working memory" conditions and actions are C functions that check and
change QGM some nifty rule control mechanisms (rule classes, rule
budgets, multiple conflict res.) • Extensible access methods (as in POSTGRES) "Attachments": routines to be automatically called before/after
dealing with an access method used by Starburst Rule System to generate transition logs used to implement pre-computed joins• Complex objects implemented in a "wrapper" (SQL-XNF),
translated down to Starburst SQL
Hot Applications:
Web servers, full-text collections Time-series data "Asset Management" GIS image DB
Players:
Informix Universal Server (head of Illustra, body of Informix). Shipping now. IBM DB2 UDB (head of Starburst, body of DB2). Extensibility features coming along. UniSQL (Won Kim of ORION fame). Went out of business recently. Oracle Universal Server (marketing-ware). Shipping now. NCR (Teradata) bought Wisconsin's Paradise ORDBMS
(and DeWitt/Naughton/students) Other big R vendors are late (Sybase, Tandem, etc.)
Object -Relational Databases Concepts
I. Nested relations
II. Complex types and object orientation
III. Querying with complex types
IV. Creation of complex values and objects
V. Comparison of OO and OR databases
Object -Relational Data Models
1. Extend the relational data model by including object orientation and constructs to deal with added data types
2. Allow attributes of tuples to have complex types, including non-atomic values such as nested relations
3. Preserve relational foundations (e.g. declarative access to data) while extending modeling power
4. Upward compatibility with existing relational languges
I. Nested Relations
1. Permit non-atomic domains (e.g. set of integers, set of tuples,…)
--> violate 1NF (that all attributes have atomic (indivisible) domains)
2. Allow more intuitive modelling for applications with complex data
a complex object may be represented by a single tuple --->
1:1 correspondence between data items and objects
3. Retain mathematical foundation of relational model
Intuitive definition: Nested relations allow:
1. Relations wherever we allow atomic (scalar) values
2. Relations within relations
Example. Suppose the information to be stored consists of:
1. document title
2. author_list (set of authors)
3. date (day, month, year)
4. key_word_list (list of keywords)
stored in a non-1NF document relation, doc:
title author_list date keyword_list
day month year
salesplan
stat.report{Smith, Jones}
{Jones, Frick}
17 april 93
28 june 98
{profit, strategy}
{personnel, profit}
title author day month year keyword
salesplan Smith 17 april 93 profit
salesplan Jones 17 April 93 profit
salesplan Smith 17 April 93 strategy
salesplan Jones 17 April 93 strategy
stat report Jones 28 June 98 personnel
stat report Frick 28 June 98 Personnel
stat report Jones 28 June 98 Profit
stat report frick 28 June 98 Profit
The doc relation can be represented in 1NF as doc’ but awkward.
If we asssume the following MVDs hold:
title --> author
title --> keyword
title --> day month year
we can decompose the relation in the following 4NF relations:
R1(title, author)
R2(title, keyword)
R3(title, day, month, year)
A relation is in 5NF if no reamining nonloss projections are possible, except the trivial one in which the key appears in each projection.
A relation is in 4NF iff it is in BCNF and there are no nontrivial MVD
A relation is in BCNF iff every determinant is a candidate key.
A relation is in 3NF iff it is in 2NF and no nonkey attribute is transitively dependent on the key
A relation is 2NF iff it is in 1NF and all the non-key attributes are fully FD on the key
A relation is 1NF iff every attribute is single-valued for each tuple.
The non-1NF representation may be an easier-to-understand model (closer to user’s view)
The 4NF design would require users to include joins in their queries, thereby complicating interaction with the system
We could define a view, but we lose the 1:1 correspondence between tuples and documents.
II. Complex Types and Object Orientation
Again: extensions to relational model include:
nested relations
complex types
specialization (IS_A hierarchies)
inheritance
object identity
A. Structured and collection types
Define a relation doc with complex attributes: sets and structured attributes:
create type MyString char varying
create type MyDate
(day integer, month char(10), year char(10))
create type Document
(name MyString, author_list setof(MyString),
date MyDate, keyword_list setof (MyString))
create table doc of type Document
•Unlike table definitions in ordinary relational databases, the doc table definition allows attributes that are sets and structured attributes (see: MyDate)
•Allow composite attributes and multivalued attributes of ER diagrams to be represented directly.
•The types created using the above statements are recorded in the schema stored in the database
•Can create tables directly:
create table doc
(name MyString, author_list setof (MyString), date MyDate, keyword_list setof (MyString))Complex type systems usually support other collection types, as
arrays: author_array MyString[10] //presents an ordered list of authors
multisets: print_runs multiset(integer) //presents the number of copies in each //printing run
B. Inheritance
Inheritance can be at the level of types or at the level of tables
1. Inheritance of types
create type Person
(name MyString,
social_security_no integer)
create type Student
(degree MyString,
department MyString)
under Person
create type Teacher
(salary integer,
department MyString)
under Person
Since name and social_security_no are inherited from a common source, Person, there is no conflict by inheriting them .
However, department is defined separately in Student and Teacher and one can rename them to avoid conflict, by using an as clause:
1’. Multiple inheritance of types
Definiton of the type TeachingAssistant as a subtype of both Teacher andStudent.
create type TeachingAssistant
under Student with (department as student_dept),
under Teacher with (department as teacher_dept)
2. Inheritance of tables
To avoid creation of too many subtypes: one approach is to allow an object to have multiple types without having a most specific type
OR databases can model such a feature by using inheritance at the level of tables, rather than types and allowing an entity(object) to exist in more than one table at once.
create table people
(name MyString,
social_security_no integer)
create table students
(degree MyString,
department MyString)
under people
create table teachers
(salary integer,
department MyString)
under people
2’. Table inheritance: Roles
Table inheritance permits an object to have multiple types, without having a most-specific type (unlike type inheritance)
Example: an object can be in the students and teachers subtables simultaneously, without having to be in a subtable
student_teachers that is under both students and teachers
Object can gain/ lose roles: corresponds to inserting /deleting object from a subtable.2’’. Table inheritance: Consistency Requirements
1. Each tuple of supertable people can correspond to (i.e. having the same values for all inherited attributes as) at most one tuple in each of the tables studentsand teachers (WHY?)
2. Each tuple in students and teachers must have exactly one corresponding tuple in people. (WHY?)
Subtables can be stored in an efficient manner without replication of all inherited fields:
inherited attributes other than the primary key of the supertable need not be stored and can be derived by means of a join with the supertable, based on the primary key
As with types, multiple inheritance is possible with tables:
a TeachingAssistant can simply belong to the table students as well as to the table teachers. However, if ew want, we can create a table for TeachingAssistant entities.
Based on the consistency requirements for subtables, if an entity is present in the teaching_assistants table, it is also present in the teachers and in the students table.
3 . Inheritance: Conclusion
Inheritance:
makes schema definition natural
ensures referential and cardinality constraints
enables the use of functions defined for supertypes on objects belonging to subtypes
allows the orderly extension of a database system to incorporate new types
C. Reference Types
Object - oriented languages provide the ability to create and refer to objects.
1.To refer to objects = an attribute of a type can be a reference to an object of a specified type.
Example: redefine the author_list field of the type Document as:
author_list setof ( ref (Person))
Now author_list is a set of references to Person objects
2. Tuples of a table can also have references to them.
Example: ref(people)
It can be implemented using the primary key or tuple-id.
3. SQL3 uses identity (for tuples) and oid (for objects).
III. Quering with Complex Types
A. Relation-Valued Attributes
Extended SQL allows an expression evaluating to a relation to appear anywhere the relation name may appear.==>
can take advantage of the structure of nested relations
Example: consider the following relation pdoc:
create table pdoc
name MyString
author_list setof (ref people)),
date MyDate,
keyword_list setof (MyString))
1. Find all documents having the word “database” as one of their keywords:
select name
from pdoc
where “database” in keyword_list
2. Find all pairs of the form “doc_name, author_name” for each document and each author of the document:
select B.name, Y.name
from pdoc as B, B.author_list as Y
3. Find the name and the number of authors for each document (aggregate functions can be applied to any relation-valued expression)
select name, count(author_list)
from pdoc
B. Path Expressions
The dot notation for referring to composite attributes can be used with references.
Example: student.advisor.name
References can be used to hide join operations --> they simplify the query considerably.
Example: Consider the previous table people and a table phd_student:
create table phd_students
(advisor (ref(people))
under people
Find the names of the advisers of all PhD students:
select phd_student.advisor.name
from phd_students
In general, attributes used in a path expression can be a collection, such as a set or a multiset.
For example, to get the names of all authors of documents in pdoc relation:
select Y.name
from pdoc.author_list as Y
C. Nesting and Unnesting
Unnesting = the transformation of a nested relation into 1NF
converts a nested relation into a single flat relation with no nested relations or structured types as attributes
create table doc
(name MyString,
author_list setof (MyString),
date MyDate,
keyword_list setof (MyString))
author_list, keyword_list:
nested relations
name, date: are not nested
select name, A as author, date.day, date.month, date.year, K as keyword
from doc as B, B.author_list as A, B.keyword_list as K
B is declared to range over doc, A ranges over the authors in author_list for that document, K ranges over the keywords in the keyword_list of the document
Nesting = the reverse operation of transformation of a 1NF relation into a nested relation
can be carried out by an extension of grouping in SQL
Example: nest the relation flat_doc on the attribute keyword:
select title, author, (day, month, year) as date,
set (keyword) as keyword_list
from flat_doc
groupby title, author, date
title author_listdate keyword_list
day month year
salesplan
salesplan
stat.report
stat_report
Smith
Jones
Jones
Frick
17 april 93
17 april 93
28 june 98
28 june 98
{profit, strategy}
{profit, strategy}
{personnel, profit}
{personnel, profit}
D. Functions
Object - relational systems allow functions to be defined by users
Functions can be defined in a DML, such as extended SQL
Example: define a function that, given a document, return the count of the number of authors:
create function author_count (one_doc Document)
returns integer as
select count (author_list)
from one_doc
the function can be used in a query , to find the names of all documents that have more than one author
select name
from doc
where author_count (doc) > 1
Notes:
1.Although doc refers to a relation in the from clause, it is treated as tuple variable in the where clause, and can therefore be used as an argument to the author_count function.
2. A select statement can return a collection of values.
If the return type of a function is a collection type, the result of the function is the entire collection.
However, if the return type is not a collection type, the collection generated by SQL shoul contain only one tuple.
Otherwise, a system may have two choices: flag an error or select an arbitrary one from the collection.
Functions can also be defined in a programming language as C, C++, Java. It can be more efficient and handle more complex computations than that defined using SQL.
Since the code needs to be loaded and executed within the database system code, it may carry the risk of:
integrity: a bug in the program can corrupt the database internal structure
security: it can by-pass the access control functionality of the database management system
Embedded SQL is different from C++ code functions: in SQL the query is passed by the user program to the database system to run. User-written code never needs to access to the database itself. The operating system thus can protect the database from access by any user process.
IV. Creation of Complex Values and Objects
1. Create and update tuples with complex types (tuple-valued and set-valued):
insert into doc
values
(“salesplan”, set (“Smith”, “Jones”),
(19, “January”, 00), set (“profit”, “strategy”))
2. We can use complex values in queries: anywhere where a set is expected, we can enumerate a set:
select name, date
from doc
where name in set (“salesplan”, “opportunities”, “risks”)
3. Multiset values can be created by replacing set by multiset.
4. To create new objects, one can use constructor functions.
The constructor function for an object type T is T( ).
When it is invoked, it creates a new uninitialized object of type T, fills in its oid field, and returns the object.
The fields of the object must then be initialized.
V. Comparison of OO and OR databases
1. OR databases are OO databases built on top of the relational model.
2. Persistent programming language-based OODBs target applications that have high performance requirements( CAD dbs)
3. Relational systems: simple data types, powerful query languages, high protection
4. Persistent programming language-based OODBs: complex data types, integration with programming languages, high performance
5. Object- relational systems: complex data types, powerful query languages, high protection
Reference: A. Silberschatz, H.F. Korth, S. Sudarshan - Database System Concepts, McGraw-Hill, 3 rd ed., 1997
T. Connolly, C. Begg, A. Strachan - Database Systems: A Practical Approach to Design, Implementation and Management. 2 nd ed., Addison-Wesley, 1999