Page 1

Multidatabase Querying by Context Ramon Lawrence, Ken Barker

Ramon Lawrence, University of Manitoba, umlawren@cs.umanitoba.ca
Ken Barker, University of Calgary, [email protected]


Page 2

Outline

• Introduction, Motivation, and Background
• Integration architecture
• Integration architecture components
    • standard dictionary, X-Specs, query processor
• Context View as a Universal Relation
• Query Processor Algorithms
    • field/table mapping discovery, join selection
• Model extensions to simplify query construction
• Future work and conclusions

Page 3

Introduction and Motivation

Database integration requires conflict resolution during schema integration.

There are many integration environments:
• operational systems within an organization
• system integration during a company merger
• data warehouses, intranets, and the WWW

Our system automatically integrates relational database schemas and allows the user to transparently query the system.

This presentation is on the query system.

Page 4

Background

Current techniques for database interoperability have some of these problems:
• require the integrator to understand all databases
• integration process is manual
• do not hide system complexity from the user

SQL and multidatabase query languages such as MSQL require the user to query the integrated system by structure:
• not desirable when accessing numerous databases

Page 5

Previous Work

Research systems:
• integrating systems by logical rules (Sheth)
• defining global dictionaries (Castano)
• Carnot Project using the Cyc knowledge base
• wrapper and mediator systems:
    • Information Manifold, TSIMMIS, Infomaster

Industrial systems and standards:
• Metadata Interchange Specification (MDIS)
• XML, BizTalk, E-commerce portals

Query Languages: SQL, MSQL, IDL, DIRECT, SchemaSQL

Page 6

Integration Architecture

Architecture Components:
1) Integrated Context View
    • user's view of integration
2) X-Spec Editor
    • stores schema & metadata
3) Standard Dictionary
    • terms to express semantics
4) Integration Algorithm
    • combines X-Specs into integrated context view
5) Query Processor
    • accepts query on view
    • determines data source mappings and joins
    • executes queries and formats results

[Figure: architecture diagram. Labels: Client, Local Transactions, Subtransactions, Multidatabase Layer, X-Spec Editor, Standard Dictionary, Integration Algorithm, Integrated Context View, Query Processor and ODBC Manager, Database, X-Spec.]

Page 7

Integration Architecture

The integration architecture consists of three separate processes:

• Capture process: independently extracts database schema information and metadata into an XML document called an X-Spec.
• Integration process: combines X-Specs into a structurally neutral hierarchy of database concepts called an integrated context view.
• Query process: allows the user to formulate queries on the integrated view; the query processor maps them to structural queries, then integrates and formats the results.

Page 8

Integration Architecture: The Capture Process

[Figure: capture process. Labels: Relational Schema, Automatic Extraction, Specification Editor, Standard Dictionary, DBA Lookup of terms, X-Spec.]

The capture process involves:
• automatically extracting the schema information and metadata using a specification editor
• assigning semantic names to each schema element (tables and fields) to capture their semantics

Page 9

Architecture Discussion

The architecture automatically integrates relational schemas into a multidatabase.

Desirable properties:
• individual mappings - information sources integrated one-at-a-time and independently
• integrated view constructed for query transparency - user queries system by semantics instead of structure
• handles schema conflicts - including semantic, structural, and naming conflicts
• automated integration - integrated view constructed efficiently and automatically
• no wrapper or mediator software is required

Page 10

Architecture Components: The Standard Dictionary

A standard dictionary (SD) provides standardized terms to capture data semantics.
• Hierarchy of terms related by IS-A or HAS-A links
• Contains a base set of common database concepts, but new concepts can be added

An SD term is a single, unambiguous semantic definition.

Several SD entries for a single English word are required if the word has multiple definitions.
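
The dictionary described above can be pictured as a small data structure. The following is a minimal sketch only, not the authors' implementation: the class names, the "word#sense" key format, and the sample terms are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """One SD entry: a single, unambiguous definition."""
    key: str                        # unique key; hypothetical "word#sense" format
    definition: str
    is_a: Optional[str] = None      # key of the parent term (IS-A link)
    has_a: List[str] = field(default_factory=list)  # keys of HAS-A parts


class StandardDictionary:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def add(self, term: Term) -> None:
        """Add a new concept; links must point at terms already present."""
        if term.key in self.terms:
            raise ValueError(f"duplicate term: {term.key}")
        for linked in [term.is_a, *term.has_a]:
            if linked is not None and linked not in self.terms:
                raise ValueError(f"unknown linked term: {linked}")
        self.terms[term.key] = term

    def is_a(self, key: str, ancestor: str) -> bool:
        """True if `key` is `ancestor` or reaches it by following IS-A links."""
        current: Optional[str] = key
        while current is not None:
            if current == ancestor:
                return True
            current = self.terms[current].is_a
        return False


# Hypothetical sample terms, not taken from the actual dictionary.
sd = StandardDictionary()
sd.add(Term("person", "a human being"))
sd.add(Term("customer", "a person who buys goods", is_a="person"))
# Two entries for one English word that has two meanings.
sd.add(Term("order#purchase", "a request to buy goods", has_a=["customer"]))
sd.add(Term("order#sequence", "the arrangement of items in a series"))
```

Because every link must point at a term that already exists, the IS-A chain cannot contain a cycle, so the lookup loop always terminates.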

Page 11

Architecture Components: Using the Standard Dictionary

SD terms are used to build semantic names that describe the semantics of schema elements.

Semantic names have the form:
    semantic name = "[" CT [ [;CT] | [,CT] ] "]" CN
    • CT = context term, CN = concept name
    • each CT and CN is a single term from the SD

Semantic names are included in specifications describing a database.
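
To make the notation concrete, here is a hedged sketch of a parser for semantic names. It assumes that context terms are separated by ";" or "," (treated identically here, since the slides do not explain the difference) and that the concept name may be absent, as in the table-level names such as [Order] used in the examples. The function and type names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class SemanticName(NamedTuple):
    context_terms: List[str]
    concept_name: Optional[str]   # None for names such as "[Order]"


# "[" context terms "]" followed by an optional concept name.
_PATTERN = re.compile(r"^\[(?P<ctx>[^\[\]]+)\]\s*(?P<cn>.*)$")


def parse_semantic_name(text: str) -> SemanticName:
    """Split a semantic name into its context terms and concept name."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a semantic name: {text!r}")
    terms = [t.strip() for t in re.split(r"[;,]", match.group("ctx"))]
    if any(not t for t in terms):
        raise ValueError(f"empty context term in: {text!r}")
    concept = match.group("cn").strip() or None
    return SemanticName(terms, concept)


name = parse_semantic_name("[Order;Product] Id")
# name.context_terms is ["Order", "Product"]; name.concept_name is "Id"
```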

Page 12

Architecture Components: X-Specs

Database metadata and semantic names are combined into specifications called X-Specs:
• stored and transmitted using XML
• contains information on a relational schema organized into database, table, and field levels
• stores semantic names to describe and integrate schema elements

Page 13

Architecture Components: Integrating X-Specs

Each database to be integrated is described using an X-Spec.

Identical concepts in different databases are identified by similar semantic names.

Concepts with identical (or hierarchically related) semantic names are combined regardless of their physical representation in the individual databases.

The product of the integration algorithm is a structurally neutral integrated view of all database concepts.

Page 14

Integration Architecture: The Integration Process

The integration process involves:
• automatically identifying identical concepts by matching semantic names
• constructing a structurally neutral integrated view of database concepts
• resolving structural differences during query generation and submission (e.g. a concept may be represented as a table in one database and a field (attribute) in another)

Page 15

Context View as a Universal Relation

Definitions:
• dictionary term - unambiguous word phrase in dictionary
• semantic name - combination of dictionary terms to represent schema element semantics
• context - a semantic name is a context if it maps to a table
• concept - a semantic name is a concept if it maps to a field
• context closure - the context closure of semantic name Si, denoted Si*, is the set of semantic names produced by taking ordered subsets of the terms of Si = {T1, T2, ..., TN} starting with T1.

Example: If Si = [A;B;C] D then Si* = {[A], [A;B], [A;B;C], [A;B;C] D}.
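
The context closure amounts to taking every prefix of the term sequence. A minimal sketch follows, assuming the context terms and the optional concept name have already been separated; the function name is hypothetical.

```python
from typing import List, Optional


def context_closure(context_terms: List[str],
                    concept_name: Optional[str] = None) -> List[str]:
    """Return the context closure of a semantic name, shortest prefix first."""
    if not context_terms:
        raise ValueError("a semantic name needs at least one context term")
    # One context per prefix of the context terms: [T1], [T1;T2], ...
    closure = ["[" + ";".join(context_terms[:i]) + "]"
               for i in range(1, len(context_terms) + 1)]
    # The full name, including the concept, is the last member.
    if concept_name is not None:
        closure.append(closure[-1] + " " + concept_name)
    return closure


print(context_closure(["A", "B", "C"], "D"))
# prints ['[A]', '[A;B]', '[A;B;C]', '[A;B;C] D']
```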

Page 16

Context View as a Universal Relation (cont.)

Define a context view (CV) as follows:
• If a semantic name Si is in CV, then for any Sj in Si*, Sj is also in CV.
• For each semantic name Si in CV, there exists a set of zero or more mappings Mi that associate a schema element Ej with Si.
• A semantic name Si can only occur once in the CV.

A context view (CV) is a valid Universal Relation:
• Each field is assigned a semantic name which uniquely identifies its semantic connotation.
• This illustrates that the context view provides structural transparency similar to the Universal Relation.
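
The three conditions in the context view definition can be illustrated with a dictionary keyed by semantic name. This is a sketch under assumptions, not the system's integration algorithm: names are plain strings with ";"-separated context terms, schema elements are dotted strings, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def closure_of(name: str) -> List[str]:
    """Context closure of a name written like "[A;B] C"."""
    head, _, concept = name.partition("]")
    terms = [t.strip() for t in head.lstrip("[").split(";")]
    names = ["[" + ";".join(terms[:i]) + "]" for i in range(1, len(terms) + 1)]
    if concept.strip():
        names.append(names[-1] + " " + concept.strip())
    return names


def build_context_view(
        mappings: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build a view from (semantic name, schema element) pairs."""
    view: Dict[str, Set[str]] = {}
    for name, element in mappings:
        members = closure_of(name)
        for member in members:
            # Closure condition: every prefix is in the view, possibly
            # with zero mappings. Dict keys make each name occur once.
            view.setdefault(member, set())
        view[members[-1]].add(element)
    return view


view = build_context_view([
    ("[Order] Id", "NW.Orders.OrderID"),
    ("[Order] Id", "SS.Orders_tb.Order_num"),
    ("[Order;Customer] Name", "SS.Orders_tb.Cust_name"),
])
```

Here "[Order;Customer]" enters the view with no mappings, because only its field-level name was mapped.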

Page 17

Architecture Components: The Query Processor

The query processor:
• allows the user to formulate queries on the view
• translates from semantic names in the context view to structural queries (SQL) on databases
    • involves determining correct field and table mappings and discovery of join conditions and join paths
• retrieves query results and formats them for display to the user

Page 18

The Query Processor: Determining field/table mappings

For each database (D) in the context view
    For each semantic name (S) in query
        If S has only one semantic name mapping in D Then
            Add field mapping to query and its parent table
        Else If S has multiple mappings but all in one table Then
            Add each field mapping to query and the parent table
        Else (S has multiple mappings in more than one table)
            If any field mapping has a table already in query Then
                take that one
            Else If one field mapping has the best semantic name match Then
                take field mapping with best semantic name match
            Else
                take first mapping found
            End If
        End If
    Next
Next
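
The selection rules for a single database can be sketched in Python as follows. This is an illustration under assumptions rather than the system's actual code: the `Mapping` type, its `exact` flag (standing in for "best semantic name match"), and the function name are hypothetical, and the first two rules are merged because both add every mapping when only one table is involved.

```python
from typing import Dict, List, NamedTuple, Tuple


class Mapping(NamedTuple):
    table: str
    field: str
    exact: bool = True   # hypothetical stand-in for the "best match" test


def select_mappings(
    query_names: List[str],
    db_mappings: Dict[str, List[Mapping]],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Choose the tables and (table, field) pairs one database contributes."""
    tables: List[str] = []
    fields: List[Tuple[str, str]] = []

    def add(mapping: Mapping) -> None:
        if mapping.table not in tables:
            tables.append(mapping.table)
        fields.append((mapping.table, mapping.field))

    for name in query_names:
        candidates = db_mappings.get(name, [])
        if not candidates:
            continue   # this database holds nothing for the name
        if len({m.table for m in candidates}) == 1:
            # One mapping, or several mappings that share a table.
            for mapping in candidates:
                add(mapping)
        else:
            # Prefer a table already in the query, then an exact match,
            # then the first mapping found.
            in_query = [m for m in candidates if m.table in tables]
            exact = [m for m in candidates if m.exact]
            add((in_query or exact or candidates)[0])
    return tables, fields


northwind = {
    "[Order;Product] Id": [Mapping("OrderDetails", "ProductID")],
    "[Order] Id": [Mapping("Orders", "OrderID"),
                   Mapping("OrderDetails", "OrderID")],
}
tables, fields = select_mappings(["[Order;Product] Id", "[Order] Id"], northwind)
```

With this input only OrderDetails is selected, because the order id mapping in OrderDetails wins once that table is already in the query.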

Page 19

The Query Processor: Constructing Join Graphs

Given a set of fields (F) and tables (T) to access, joins are applied to connect the tables.

A join graph is an undirected graph where:
• each node Ni is a table in the database
• there is a link from node Ni to node Nj if there is a join between the two tables

A join path is a sequence of joins connecting two nodes in the graph.

A join tree is a set of joins connecting two or more nodes.

A join matrix M stores the shortest join paths between any two nodes (tables).
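
One way to fill a join matrix is a breadth-first search from every table. The sketch below assumes unweighted joins; the edge list is inferred from the key fields that the Northwind tables share, and all names in the code are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Join = Tuple[str, str]

# Assumed joins, inferred from shared key fields in the Northwind schema.
NORTHWIND_JOINS: List[Join] = [
    ("Orders", "Customers"), ("Orders", "Employees"), ("Orders", "Shippers"),
    ("OrderDetails", "Orders"), ("OrderDetails", "Products"),
    ("Products", "Suppliers"), ("Products", "Categories"),
]


def build_join_graph(joins: List[Join]) -> Dict[str, List[str]]:
    """Undirected adjacency lists: one node per table, one link per join."""
    graph: Dict[str, List[str]] = {}
    for a, b in joins:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def build_join_matrix(
        graph: Dict[str, List[str]]) -> Dict[Tuple[str, str], List[Join]]:
    """M[(a, b)] is a shortest join path from a to b, as a list of joins."""
    matrix: Dict[Tuple[str, str], List[Join]] = {}
    for start in graph:
        matrix[(start, start)] = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if (start, neighbor) not in matrix:
                    # First visit in BFS order gives a shortest path.
                    matrix[(start, neighbor)] = (
                        matrix[(start, node)] + [(node, neighbor)])
                    queue.append(neighbor)
    return matrix


M = build_join_matrix(build_join_graph(NORTHWIND_JOINS))
# M[("Customers", "Products")] goes through Orders and OrderDetails.
```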

Page 20

The Query Processor: Join Graph for Northwind

[Figure: join graph for the Northwind database. Nodes: Orders, OrderDetails, Employees, Customers, Categories, Products, Suppliers, Shippers. Links are labeled with 1:N cardinalities.]

Page 21

The Query Processor: Join Discovery Results

Join discovery in a database with a connected, acyclic join graph and a join matrix M:
• there exists only one join tree for any set of tables
• the joins required to connect a table set T are found by taking any Ti of T and unioning the join paths in M[Ni,N1], M[Ni,N2], ..., M[Ni,Nn], where N1, N2, ..., Nn are the nodes corresponding to the set of tables T

For a cyclic join graph:
• there may exist more than one join tree for a set of tables, and each tree may have different semantics
• the system can let the user uniquely determine the join tree by graphically displaying join conditions as they browse the context view
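
For the acyclic case, the union of join paths can be written in a few lines. This is a sketch only: the matrix is supplied as a hand-written dictionary holding just the entries the example needs, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Join = Tuple[str, str]


def join_tree(matrix: Dict[Tuple[str, str], List[Join]],
              tables: List[str]) -> Set[FrozenSet[str]]:
    """Union the join paths from the first table to every other table."""
    root = tables[0]
    joins: Set[FrozenSet[str]] = set()
    for other in tables[1:]:
        for a, b in matrix[(root, other)]:
            # Joins are undirected, so store each as an unordered pair.
            joins.add(frozenset((a, b)))
    return joins


# Hand-written excerpt of a join matrix for the Northwind tables.
matrix = {
    ("Customers", "Orders"): [("Customers", "Orders")],
    ("Customers", "Products"): [("Customers", "Orders"),
                                ("Orders", "OrderDetails"),
                                ("OrderDetails", "Products")],
}
tree = join_tree(matrix, ["Customers", "Orders", "Products"])
# tree holds three joins; the shared Customers-Orders join appears once.
```

Storing joins in a set is what makes the union work: a join that lies on several paths is kept only once.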

Page 22

Northwind & Southstorm Integration Example

Northwind Database Schema

Table         Fields
Categories    CategoryID, CategoryName
Customers     CustomerID, CompanyName
Employees     EmployeeID, LastName, FirstName
OrderDetails  OrderID, ProductID, UnitPrice, Quantity
Orders        OrderID, CustomerID, EmployeeID, OrderDate, Shipvia
Products      ProductID, ProductName, SupplierID, CategoryID
Shippers      ShipperID, CompanyName
Suppliers     SupplierID, CompanyName

Southstorm Database Schema

Table         Fields
Orders_tb     Order_num, Cust_name, Emp_name, Item1_id, Item1_qty, Item1_price, Item2_id, Item2_qty, Item2_price

Page 23

Northwind & Southstorm Integration Example (page 2)

Northwind Semantic Name Mappings

Type   Semantic Name             System Name
Table  [Category]                Categories
Field  [Category] Id             CategoryID
Field  [Category] Name           CategoryName
Table  [Customer]                Customers
Field  [Customer] Id             CustomerID
Field  [Customer] Name           CompanyName
Table  [Employee]                Employees
Field  [Employee] Id             EmployeeID
Field  [Employee] Last Name      LastName
Field  [Employee] First Name     FirstName
Table  [Order;Product]           OrderDetails
Field  [Order] Id                OrderID
Field  [Order;Product] Id        ProductID
Field  [Order;Product] Price     UnitPrice
Field  [Order;Product] Quantity  Quantity
Table  [Order]                   Orders
Field  [Order] Id                OrderID
Field  [Order;Customer] Id       CustomerID
Field  [Order;Employee] Id       EmployeeID
Field  [Order] Date              OrderDate
Field  [Order;Shipper] Id        Shipvia
Table  [Product]                 Products
Field  [Product] Id              ProductID
Field  [Product] Name            ProductName
Field  [Product;Supplier] Id     SupplierID
Field  [Product;Category] Id     CategoryID
Table  [Shipper]                 Shippers
Field  [Shipper] Id              ShipperID
Field  [Shipper] Name            ShipperName
Table  [Supplier]                Suppliers
Field  [Supplier] Id             SupplierID
Field  [Supplier] Name           SupplierName

Page 24

Northwind & Southstorm Integration Example (page 3)

Southstorm Semantic Name Mappings

Type   Semantic Name             System Name
Table  [Order]                   Orders_tb
Field  [Order] Id                Order_num
Field  [Order;Customer] Name     Cust_name
Field  [Order;Employee] Name     Emp_name
Field  [Order;Product] Id        Item1_id
Field  [Order;Product] Quantity  Item1_qty
Field  [Order;Product] Price     Item1_price
Field  [Order;Product] Id        Item2_id
Field  [Order;Product] Quantity  Item2_qty
Field  [Order;Product] Price     Item2_price

Page 25

Northwind & Southstorm Integration Example (page 4)

Integrated Context View

Integrated View Term    Data Source Mappings (not visible to user)
V (view root)           N/A
- [Category]            NW.Categories
  - Id                  NW.Categories.CategoryID
  - Name                NW.Categories.CategoryName
- [Customer]            NW.Customers
  - Id                  NW.Customers.CustomerID
  - Name                NW.Customers.CompanyName
- [Employee]            NW.Employees
  - Id                  NW.Employees.EmployeeID
  - [Name]
    - First Name        NW.Employees.FirstName
    - Last Name         NW.Employees.LastName
- [Product]             NW.Products
  - Id                  NW.Products.ProductID
  - Name                NW.Products.ProductName
  - [Supplier]
    - Id                NW.Products.SupplierID
  - [Category]
    - Id                NW.Products.CategoryID
- [Order]               NW.Orders, SS.Orders_tb
  - Id                  NW.[Orders,OrderDetails].OrderID, SS.Orders_tb.Order_num
  - [Customer]
    - Id                NW.Orders.CustomerID
    - Name              SS.Orders_tb.Cust_name
  - [Employee]
    - Id                NW.Orders.EmployeeID
    - Name              SS.Orders_tb.Emp_name
  - [Product]           NW.OrderDetails
    - Id                NW.OrderDetails.ProductID, SS.Orders_tb.Item[1,2]_id
    - Price             NW.OrderDetails.UnitPrice, SS.Orders_tb.Item[1,2]_price
    - Quantity          NW.OrderDetails.Quantity, SS.Orders_tb.Item[1,2]_qty
- [Shipper]             NW.Shippers
  - Id                  NW.Shippers.ShipperID
  - Name                NW.Shippers.ShipperName
- [Supplier]            NW.Suppliers
  - Id                  NW.Suppliers.SupplierID
  - Name                NW.Suppliers.SupplierName

Page 26

Northwind & Southstorm Query Examples

Example 1: Retrieve all order ids ([Order] Id) and customers ([Customer] Name):

SS: SELECT Order_num, Cust_name FROM Orders_tb
NW: SELECT OrderID, CompanyName FROM Orders, Customers WHERE Orders.CustomerID = Customers.CustomerID

Example 2: Retrieve all ordered products ([Order;Product] Id) and their order ids:

SS: SELECT Order_num, Item1_id, Item2_id FROM Orders_tb
NW: SELECT OrderID, ProductID FROM OrderDetails

Note: In NW, the order id is selected from one of two different order id mappings (Orders and OrderDetails). In SS, result normalization is required.
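
Result normalization for Southstorm means turning the repeated item columns into one row per ordered product, so the rows line up with the Northwind result. A minimal sketch, assuming rows arrive as dictionaries keyed by column name and that an unused item slot is NULL (None); the function name and the sample rows are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def normalize_items(rows: List[Dict[str, Any]],
                    item_count: int = 2) -> List[Tuple[Any, Any]]:
    """Unpivot Item1_id..ItemN_id into (order id, product id) rows."""
    result: List[Tuple[Any, Any]] = []
    for row in rows:
        for i in range(1, item_count + 1):
            item_id = row.get(f"Item{i}_id")
            if item_id is not None:   # skip empty item slots
                result.append((row["Order_num"], item_id))
    return result


# Hypothetical rows returned by the SS query in Example 2.
ss_rows = [
    {"Order_num": 1, "Item1_id": 10, "Item2_id": 11},
    {"Order_num": 2, "Item1_id": 12, "Item2_id": None},
]
print(normalize_items(ss_rows))
# prints [(1, 10), (1, 11), (2, 12)]
```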

Page 27

Integration Example: Discussion

Important points:
• system table and field names are not presented to the user, who queries based on semantic names
• database structure is not shown to the user
• field and table mappings are automatically determined based on X-Spec information
• join conditions are inserted as needed when available to join tables
• different physical representations for the same concept are combined
• hierarchically related concepts are combined based on their IS-A relationship in the standard dictionary

Page 28


Conclusions

Automatic integration of database schema is possible by using a standard dictionary of terms and constructing semantic names for schema elements.

Integration of data sources has applications to the WWW and construction of data warehouses.

Users are able to transparently query integrated systems by concept instead of structure.

Page 29


Future Work

The integration architecture is evolving with standards on XML and captures metadata information in XML documents.

The system is being tested on sample problems.

We are refining a prototype of the system called Unity.

The query processor is being extended to resolve more complex queries and conflicts.