Post on 13-Jan-2016
1
Multidatabase manipulations Part 2
Witold Litwin
http://ceria.dauphine.fr/witold.html
2
Multidatabase manipulations (Kandinsky: Ligne avec Accompagnement, 1937)
3
Multidatabase manipulations
4
MSQL(Litwin, Abdellatif, Nicolas, Zeroual, 1989
L. Suardi, M. Rusinkiewicz, 1992)
An extension to SQL
– Contains, by definition, every SQL-x
Allows for non-procedural multidatabase manipulations
Some MSQL queries are impossible to formulate in SQL
An MSQL query may replace several SQL queries
Developed in 1986-89
» INRIA, project B A BA, initially a subproject of the pilot project SIRIUS (J. Le Bihan, then W. Litwin)
» Doctoral theses of Abdellatif, Nicolas, Zeroual
Compiler implemented at Houston University
– Team of Prof. M. Rusinkiewicz, 1990-1993
5
MSQL(Litwin, Abdellatif, Nicolas, Zeroual, 1989
L. Suardi, M. Rusinkiewicz, 1992)
Research vehicle for functions for the MDB environment
– to address relations in different databases
– to manipulate semantically heterogeneous data
– to create MDB views
– to transfer data (and schemas) between DBs
– to define MDB dependencies
Present to a limited extent in most commercial DBMSs & DAMSs
6
MSQL(Basic new properties)
SQL Query
– Uses 1st order predicate calculus
– Is compiled for optimization into the relational algebra
– Result is a table
MSQL Query
– May use higher-order predicate calculus
– Is compiled for optimization into the multirelational algebra
– Result is a multitable
» A set of relations (tables)
» May consist of one table or none
7
MSQL(More on functions specific to MDB env.)
Addressing of tables in different DBs
– Implicitly or by qualification with (multi)database names
» Introduced around 1985 by the relational multidatabase system prototype MRDSM
– B A BA project at INRIA
» Unknown at that time to any relational language
– See the overview of relational DBMSs existing in 1987 (M. Brodie)
8
MSQL(More on functions specific to MDB env.)
Manipulation of semantically heterogeneous data
– Multiple Queries
» With multiple identifiers
» With semantic variables ranging over data names
– Scale and Precision
– Units of measure
– Implicit joins
Capabilities still unknown to SQL
Capabilities known at present to some dialects
– Limited with respect to MSQL
9
MSQL : example
[Figure: the databases SG, CIC and BNP, each with its SIL (Internal Logical Schema), and views defined over them]
10
Conceptual Schemas (the multischema)
DB bnp :
br (br#, brname, street, street#, city, zipcode, tel)
account (acc#, cl#, balance, br#)
client (cl#, clname, cltel, cltype, street, street#, city, zipcode)
spe-acc (acc#, br#, cl#, balance, curr)
DB sg :
branch (bra#, braname, street, s#, town, zip, t#, class)
acc (acc#, bra#, c#, balance)
client (c#, cname, ct#, ctype, street, s#, town, zip)
DB cic :
br (br#, brname, street, street#, city, zipcode, tel)
account (ac#, br#, cl#, balance, open_date)
client (cl#, clname, cltel, cltype, street, street#, city, zipcode)
11
Semantic Heterogeneity In Banks
Same names can designate different data
Different names can designate the same data
– same client, same town..
The value of a primary key is valid only in one DB
– how to identify the same client in different banks ?
12
MSQL Commands
CREATE TABLE
CREATE DATABASE
CREATE MULTIDATABASE
CREATE VIEW
ALTER TABLE
ALTER VIEW
ALTER MULTIDATABASE
DROP TABLE
DROP DATABASE
DROP MULTIDATABASE
DROP VIEW
13
MSQL CREATE DATABASE
Query scope
> MSQL
CREATE DATABASE boulogne ;
CREATE DB |.com.org.user.boulogne ;
CREATE MULTIDATABASE Banks (bnp cic sg );
USE Banks;
CREATE DATABASE boulogne FROM bnp ;
14
MSQL CREATE MULTIDATABASE
MSQL CREATE MDB EC-Banks (f-banks, i-banks, s-banks, g-banks, e-banks );
CREATE MULTIDATABASE can create :
– flat MDBs (only contain DBs)
– nested MDBs (DBs or MDBs)
» can potentially be any network of DBs or MDBs, like the one formed by links on the Web
» what about cycles ?
15
MSQL CREATE TABLE
use banks ;
CREATE TABLE boulogne.loan FROM bnp.loan ;
CREATE TABLE fake_checks (Chq# INT, Montant_Euro CURRENCY [EURO] .... );
One has created four (empty) tables : bnp.fake_checks, cic.fake_checks ... boulogne.fake_checks
CREATE TABLE boulogne.client (c#, cn, ct#)
FROM bnp.client (cl#, clname, cltel)
PRIMARY KEY (c#)
(cn, ct#) OUTER REFERENCES (clname, cltel);
Unit of measure
Import
16
MSQL CREATE TABLE with References
USE AuPrintemps /* MDB AuPrintemps */
CREATE TABLE MusicDep.Inventory ….
FOREIGN KEY (Item#) REFERENCES Central.Stock(I#);
No unauthorized Item# in the inventory of the Music Department
Other options
PRIMARY KEY (…) REFERENCES T(…) ;
[T1(A)] [LEFT|RIGHT] REFERENCES T2(B) ;
– Generates an implicit equijoin, or a left or right implicit outer join, when a query selects attributes A and B.
17
MSQL ALTER MULTIDATABASE
use banks ;
alter banks
include vernes
remove cic
Alter MDB can create
– flat MDBs (only contain DBs)
– nested MDBs
18
MSQL Elementary queries
USE bnp cic
SELECT bnp.br.brname, cic.br.brname, bnp.br.street
FROM bnp.br, cic.br
WHERE bnp.br.street = cic.br.street ;
bnp.br.brname cic.br.brname bnp.br.street
vaugirard 3 bd. montparnasse vaugirard
Prefixing with DB names was unknown to SQL, and has been in DB2 SQL since last year only
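The effect of prefixing table names with database names can be tried out with any engine that allows several databases in one query scope. The sketch below is not MSQL: it assumes SQLite's ATTACH DATABASE as a stand-in for USE bnp cic, uses aliases in place of full three-part column names, and the rows are made up for illustration.

```python
import sqlite3

def same_street_branches():
    """Join bnp.br and cic.br on street, naming each table by its database."""
    conn = sqlite3.connect(":memory:")
    # Assumption: two attached in-memory databases stand in for bnp and cic.
    conn.execute("ATTACH DATABASE ':memory:' AS bnp")
    conn.execute("ATTACH DATABASE ':memory:' AS cic")
    conn.execute("CREATE TABLE bnp.br (brname TEXT, street TEXT)")
    conn.execute("CREATE TABLE cic.br (brname TEXT, street TEXT)")
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO bnp.br VALUES (?, ?)",
                     [("sembat", "vaugirard"), ("sevres", "rivoli")])
    conn.executemany("INSERT INTO cic.br VALUES (?, ?)",
                     [("jaures", "vaugirard")])
    rows = conn.execute(
        "SELECT b.brname, c.brname, b.street "
        "FROM bnp.br AS b, cic.br AS c "
        "WHERE b.street = c.street"
    ).fetchall()
    conn.close()
    return rows

print(same_street_branches())  # [('sembat', 'jaures', 'vaugirard')]
```

Both tables are called br; only the database prefix in FROM tells them apart.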
19
MSQL Default DB
USE bnp
SELECT br.brname, cic.br.brname, br.street
FROM br, cic.br
WHERE br.street = cic.br.street ;
br.brname cic.br.brname br.street
vaugirard 3 bd. montparnasse vaugirard
Tables of the default database are not prefixed
20
MSQL Elementary queries without prefixed names
USE bnp sg
SELECT br.brname, branch.braname, br.street
FROM br, branch
WHERE br.street = branch.street ;
br.brname branch.braname br.street
vaugirard 3 bd. montparnasse vaugirard
Table names are unique within the query scope
21
Updates
USE (bnp b) sg ;
UPDATE account
SET account.balance = account.balance + 500
WHERE account.balance > acc.balance
AND b.client.clname = sg.client.cname AND b.client.street = sg.client.street ;
What does it mean ?
22
Multiple Queries
USE Banks
SELECT *
FROM br%
WHERE street = 'champs elysées' ;
23
Multiple Queries
USE Banks SELECT * FROM br% WHERE street = 'champs elysées' ;

USE bnp SELECT * FROM br WHERE street = 'champs elysées' ;
USE sg SELECT * FROM branch WHERE street = 'champs elysées' ;
USE cic SELECT * FROM br WHERE street = 'champs elysées' ;
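The replacement of one multiple query by several elementary queries can be sketched as a name-matching step over the multischema. This is only an illustration, not the MSQL compiler: the BANKS catalog is copied from the conceptual schemas of the example, while the helper names matches and expand_multiple_query are hypothetical.

```python
import re

# Table names per database, taken from the multischema of the Banks example.
BANKS = {
    "bnp": ["br", "account", "client", "spe-acc"],
    "sg": ["branch", "acc", "client"],
    "cic": ["br", "account", "client"],
}

def matches(pattern, name):
    """True if name matches a pattern in which % stands for any string."""
    regex = "".join(".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, name) is not None

def expand_multiple_query(table_pattern, where, multidatabase=BANKS):
    """Return one elementary query per table whose name matches the pattern."""
    queries = []
    for db, tables in multidatabase.items():
        for table in tables:
            if matches(table_pattern, table):
                queries.append(f"USE {db} SELECT * FROM {table} WHERE {where} ;")
    return queries

for query in expand_multiple_query("br%", "street = 'champs elysées'"):
    print(query)
```

With the catalog above, br% designates bnp.br, sg.branch and cic.br, so three elementary queries are produced.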
24
Results (a multitable)
bnp.br
br#  brname  street          street#  city      zipcode  tel
123  sembat  champs elysées  130      Boulogne  92100    12 34
456  sevres  champs elysées  120      Sevres    92105    12 56
cic.br
bra#  braname  street          st#  town      zip    t#    class
123   jaures   champs elysées  153  Boulogne  92100  3214  A
765   sevres   champs elysées  20   Sevres    92105  1243  B
sg.branch
br#  brname  street          s#   city      zipcode  tel
abc  sembat  champs elysées  110  Boulogne  92100    12.45
a1f  gare    champs elysées  30   Chaville  92110    34.56
25
Multiple Updates
Begin
Use Banks
Update cl*
set street = 'Charles de Gaulle'
where street = 'Etoile'
If SQLCODE <> 0 then Rollback ;
Commit

Use Banks vital cic
Update cl*
set street = 'Charles de Gaulle'
where street = 'Etoile'
MSQL transaction semantics is more general than ACID
– may include a COMP (compensation) statement, a list of acceptable states....
26
Semantic Variables in MSQL
use bnp sg
let x be town city
select * from b%
where x = 'Paris' and street = 'r. de Rivoli'
27
Semantic Variables in MSQL
use bnp sg
let x be town city
select * from b%
where x = 'Paris' and street = 'r. de Rivoli'

use bnp
select * from br
where city = 'Paris' and street = 'r. de Rivoli'

use sg
select * from branch
where town = 'Paris' and street = 'r. de Rivoli'
28
Semantic Variables in MSQL
use bnp sg
let x be town city
select * from b%
where x = 'Paris' and street = 'r. de Rivoli'
Alternatively:
use bnp sg
let x be to% city
select * from b%
where x = 'Paris' and street = 'r. de Rivoli'
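How a semantic variable gets bound in each database can be sketched as follows. The attribute lists come from the multischema of the example, where bnp.br has the attribute city and sg.branch has town. The helpers like, bind and expand are hypothetical, and skipping a table in which x is unbound or ambiguous is an assumption of this sketch, not a statement about the MSQL compiler.

```python
import re

# Attribute names from the Banks multischema (only the two branch tables).
SCHEMAS = {
    ("bnp", "br"): ["br#", "brname", "street", "street#", "city", "zipcode", "tel"],
    ("sg", "branch"): ["bra#", "braname", "street", "s#", "town", "zip", "t#", "class"],
}

def like(pattern, name):
    """True if name matches a pattern in which % stands for any string."""
    regex = "".join(".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, name) is not None

def bind(domain, columns):
    """Columns designated by some name or pattern in the variable's domain."""
    return [c for c in columns if any(like(p, c) for p in domain)]

def expand(table_pattern, domain, condition):
    """Build one query per matching table, with {x} replaced by its binding."""
    queries = []
    for (db, table), columns in SCHEMAS.items():
        if not like(table_pattern, table):
            continue
        bound = bind(domain, columns)
        if len(bound) != 1:
            continue  # assumption: skip tables where x is unbound or ambiguous
        queries.append(
            f"use {db} select * from {table} where " + condition.format(x=bound[0])
        )
    return queries

for query in expand("b%", ["to%", "city"], "{x} = 'Paris' and street = 'r. de Rivoli'"):
    print(query)
```

The domain to% city gives the same bindings as town city here, since town is the only attribute starting with "to".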
29
Semantic Variables in MSQL
use banks
let X be banks.*
select a%, balance, c%name
from X.a% a, X.c% c
where a.a% = c.c%
The query illustrates the multitable pair-wise join
Semantic variable a over the relation name account is not necessary, but simplifies the typing of the query
33
Semantic Variables in MSQL
Semantic variables can be compound, with values selected by queries from some dictionaries
use banks
let (x, y) be :
select X.attr Y.attr
from FD X, FD Y
where X.mean = tel and Y.mean = city
select * from client
where x = '123' and y = 'Paris'
Dictionary table FD in banks:
mean  attr
tel   t#
tel   tel
city  city
city  town
city  burgh
34
Semantic Variables in MSQL
Can be applied to MSQL DD statements
use banks
create database cic2 ;
let x be a% b% c%
create table cic2.x from cic.x ;
Copies the cic schema except for one table
35
Name homogenization
The labels
USE Banks ;
LET t BE tel t#
SELECT %name branch_name, t tel#, s%# street#
FROM br% br
WHERE street = 'Champs Elysées' ;
The result : multitable:
{( bnp.br.branch_name, bnp.br.tel#, bnp.br.street# ),
( sg.br.branch_name, sg.br.tel#, sg.br.street# ),
( cic.br.branch_name, cic.br.tel#, cic.br.street# )}
36
Multidatabase Views
USE my_bank bnp sg ;
CREATE VIEW my_bank.same_street_branches
(bnp_name, bnp_s#, sg_name, sg_s#, street, city)
AS SELECT brname, street#, braname, s#, street, city
FROM bnp.br b, sg.branch s
WHERE b.street = s.street AND b.city = s.town ;
A partial view of DBs bnp and sg in DB my_bank
[Figure: my_bank holding views over bnp and sg]
The views in my_bank can be considered Import Schemas
37
Multidatabase Union Views
Use Banks
Create View bnp.all-banks as
Use banks
let x be town city
let y be banks.*
Select y.br% ( y, br#, br%name branch, street, street#, x city, zip% zip, t% tel)
Union *
Union * unions all the tables of the selected multitable
It scales to all the tables named br% of Banks, if new banks enter the MDB Banks in the future
Current DBMSs, e.g., SQL Server, require altering the union view definition in such a case
38
Key words and Aggregate Functions in MSQL
Key words and Aggregate Functions of SQL
– by definition
» DISTINCT, GROUP BY, ORDER BY
» COUNT, AVG, SUM…
– operate on each table of a multitable
Their extensions to multitables
» MDISTINCT, MCOUNT, MGROUP BY, MORDER BY, MAVG, MSUM...
– operate on the whole multitable
– important for warehousing
39
Example
USE Banks
SELECT COUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
40
Example
USE Banks
SELECT COUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
bnp.br 2
cic.br 2
sg.br 2
41
Example
USE Banks
SELECT MCOUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
42
Example
USE Banks
SELECT MCOUNT (*)
FROM br% br
WHERE street = 'champs elysées' ;
br 6
Exercises in warehousing :
- Average balance per client in each bank
- Average balance per client in BANKS
- Sum of client assets per bank
- Sum of client assets in BANKS
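The difference between COUNT and MCOUNT in the two examples can be sketched on a multitable modelled as a plain dictionary. The representation and the rows are assumptions made for illustration; only the number of tuples per table matters.

```python
# A multitable modelled as a dict: table name -> list of selected tuples.
# Made-up rows; only their number per table matters here.
selected = {
    "bnp.br": [("sembat",), ("sevres",)],
    "cic.br": [("jaures",), ("sevres",)],
    "sg.branch": [("sembat",), ("gare",)],
}

def count(multitable):
    """SQL COUNT applied to each table of the multitable separately."""
    return {name: len(rows) for name, rows in multitable.items()}

def mcount(multitable):
    """MCOUNT: a single count over the whole multitable."""
    return sum(len(rows) for rows in multitable.values())

print(count(selected))   # {'bnp.br': 2, 'cic.br': 2, 'sg.branch': 2}
print(mcount(selected))  # 6
```

The warehousing exercises follow the same pattern: a per-bank aggregate is computed table by table, an aggregate over BANKS is computed once over all the tables.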
43
Aggregate Functions IMPLEMENTATION ISSUES
All-in-one (traditional computation)
– Possibly in parallel
– The computation can take a long time.
Successive approximations
– Some kind of sampling
» result1, from whichever DB answers first
» (result1 + result2) / 2
» …
» sampling within each database
– several ACM-Sigmod & VLDB papers dealt with query evaluation using sampling
Precomputing
– Incremental evaluation using interdatabase dependencies
– Common to warehousing
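The successive-approximation idea can be sketched as a running mean that is refined each time another database delivers its result. The slide only gives the first two steps; extending them to an unweighted mean over the first k results is an assumption of this sketch, and the function name is hypothetical.

```python
def successive_estimates(per_db_results):
    """Yield the running mean of the per-database results, in arrival order."""
    total = 0.0
    for k, result in enumerate(per_db_results, start=1):
        total += result
        yield total / k  # result1, then (result1 + result2) / 2, and so on

print(list(successive_estimates([10.0, 20.0, 60.0])))  # [10.0, 15.0, 30.0]
```

An unweighted mean treats every database alike. If the databases hold very different numbers of tuples, each partial result would have to carry its tuple count so that the estimate could be weighted.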
44
Aggregate Functions MERGE ON
Forms a single tuple from all the tuples of the same object in the multitable
– Uses outer joins
Find millionaires in Banks and form the tuple for each millionaire
USE Banks ;
LET x.y BE clname.cltel cname.ct#
LET z BE Banks.*
SELECT *
FROM z.a%
WHERE z.a%.c%# = z.client.c%#
AND z.a%.balance > 1 000 000
MERGE ON x y ;
45
Aggregate Functions MERGE ON
USE Banks ;
LET x.y BE clname.cltel cname.ct#
LET z BE Banks.*
SELECT *
FROM z.a%
WHERE z.a%.c%# = z.client.c%#
AND z.a%.balance > 1 000 000
MERGE ON x y ;
[Figure: the resulting merged tuples, with nulls for the missing values]
46
Aggregate Functions NAME
Transform a name (table, attribute..) into attribute value
USE Banks ;
LET x.y BE br.city branch.town
SELECT %name branch_name, NAME (.x) bank
FROM x
WHERE y = 'Nice'
UNION * ;
Note: Union * unions all the tables of the selected multitable
The result is the table :
branch_name  bank
Jaures       CIC
DeGaulle     BNP
47
Aggregate Functions CHOOSE
Chooses at most n tuples among the selected ones
– the first ones found, as the function TOP does (default), in any order or in the order specified by ORDER BY
– strictly random (RND)
– those that were not chosen by the previous execution of the query in the same transaction (NEW)
– preferably in the DBs listed, and in the listed order
– at most j per DB
– selecting at most m tuples sharing the values of the attributes in the list A, supposed to be a global key of some object.
CHOOSE (n, (m, <A>), [<B>] | j, [<B>], [RND | NEW]
<A> ::= <list of attr.> <B> ::= <list of DBs>
48
Aggregate Functions CHOOSE
Choose a millionaire randomly
USE Banks ;
SELECT c.*
FROM c% c, a% a
WHERE c.c%# = a.c%# AND a.ba% > 1.000.000
CHOOSE (1) RND ;
Function very important in the MDB environment
– information overload
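Part of the CHOOSE semantics can be sketched in a few lines: at most n tuples overall, optionally at most j per database, taken either in arrival order or at random. The function signature and the (db, tuple) representation are assumptions of this sketch; the NEW option, the preferred list of DBs and the (m, A) option are left out.

```python
import random

def choose(rows, n, per_db=None, rnd=False, seed=None):
    """rows: list of (db, tuple). Return at most n rows, at most per_db per db."""
    candidates = list(rows)
    if rnd:
        # RND: shuffle first; the seed exists only to make runs repeatable.
        random.Random(seed).shuffle(candidates)
    chosen, taken = [], {}
    for db, row in candidates:
        if len(chosen) == n:
            break
        if per_db is not None and taken.get(db, 0) >= per_db:
            continue  # this database has already contributed its quota
        chosen.append((db, row))
        taken[db] = taken.get(db, 0) + 1
    return chosen

millionaires = [("bnp", "a"), ("bnp", "b"), ("sg", "c"), ("cic", "d")]
print(choose(millionaires, 2))            # [('bnp', 'a'), ('bnp', 'b')]
print(choose(millionaires, 3, per_db=1))  # [('bnp', 'a'), ('sg', 'c'), ('cic', 'd')]
print(choose(millionaires, 1, rnd=True))  # one row picked at random
```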
49
Aggregate Functions TIMEOUT
Fixes the time limit of a query
– the system should possibly deliver all the relevant tuples
– however, any query reaching the timeout is considered executed successfully
TIMEOUT (t [unit]) ;
<unit> := ms | s | m | h | ds - seconds (default)
USE Banks
SELECT *
FROM br%
WHERE street = 'champs elysées'
TIMEOUT (10) ;
50
Aggregate Functions POST
Makes a query continuous
– One manipulates each tuple found during the lifetime of the query
– Even those created after the query start
– TIMEOUT may be used to limit the lifetime
USE Immo LaCentrale Orpi ;
SELECT *
FROM logem%
WHERE prix < 1,000,000 AND Ville = 'Paris'
POST ;
51
Aggregate Functions ESTIMATE
Computes the cost of a query before the execution and can start the execution after an authorization
ESTIMATE (type, price, time, count, size, report) [WITH EXEC_PROMPT]
type of estimate :
– exact (can be long to compute)
– approximate
price of the query (in $, FF...)
completion time
number of tuples
size of the result, in bytes
report on the estimate itself
– precision...
52
Privileges in MSQL
USE bnp sg cic ;
GRANT SELECT ON client TO Nicolas Abdellatif ;
client is a multitable :
client = (bnp.client, sg.client, cic.client)
GRANT ALL ON etoile.account TO Nicolas Abdellatif FROM bnp.account ;
GRANT ALL ON etoile.account TO Nicolas FROM Zeroual ON bnp.account ;
53
Interdatabase Queries
Transfer data between DBs
Source and target are multitables
INSERT...
54
Interdatabase Queries
INSERT
– insert selected tuples
» except those with the key already in the target
STORE
– insert selected tuples
» replacing those with the key already in the target
REPLACE
– insert selected tuples and delete the rest of the target
UPDATE
– update the tuples selected in the target with the values in the source tables
COPY
– copies tuples and the source schema
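The difference between INSERT, STORE and REPLACE lies only in how key collisions and leftover target tuples are handled. The sketch below assumes a table modelled as a dictionary from key value to tuple; the function names mirror the statements but are otherwise hypothetical. For REPLACE it assumes that the selected tuples also win on keys already present in the target.

```python
def insert(target, selected):
    """INSERT: add selected tuples, except those whose key is already in the target."""
    for key, row in selected.items():
        target.setdefault(key, row)
    return target

def store(target, selected):
    """STORE: add selected tuples, replacing those whose key is already in the target."""
    target.update(selected)
    return target

def replace(target, selected):
    """REPLACE: insert the selected tuples and delete the rest of the target."""
    target.clear()
    target.update(selected)
    return target

selected = {2: "B", 3: "c"}
print(insert({1: "a", 2: "b"}, selected))   # {1: 'a', 2: 'b', 3: 'c'}
print(store({1: "a", 2: "b"}, selected))    # {1: 'a', 2: 'B', 3: 'c'}
print(replace({1: "a", 2: "b"}, selected))  # {2: 'B', 3: 'c'}
```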
55
MSQL
There are more interesting capabilities
– e.g. Multidatabase Dependencies
» referential integrity & (outer) join links
» multidatabase triggers
local autonomy
» dynamic attributes for retrieval and updates
The language design will never be finished
– MSQL 1, 2, 3...
MSQL : A Multidatabase Language. Information Science Journal : Special Issue on Database Systems, 48, 2, (July 1989).
Execution of Extended Multidatabase SQL. Intl. IEEE Conf. on Data Eng. Vienna, 1992
56
O*SQL
For an OO or RO common model, consider in addition:
– MDB inheritance
– MDB type/subtype integration
» derived types
– OID heterogeneity & UUIDs
– Type / function value semantic heterogeneity
» dynamic type hierarchies
» higher-order OO languages
Relations with inherited attributes
57
Elements of MSQL in commercial DBMSs
Main DBMSs evolved to MDBSs
– still primitive, but better than nothing
– Sybase, Oracle, Informix, MsAccess, SQL Server,....
There are also MDBSs which are only access systems to DBMSs
– EDA-SQL, DEC DB Integrator, DBJoiner (IBM), Ingres*, UniSQL/M, Uniface, Q+E, OAdaptor (HP), Telebase...
"Data Warehouses"
58
MSQL in commercial DBMSs(Department Store Data Warehouse, using MsAccess, SQL Server...)
[Figure: the multidatabase Au Printemps: the stores Bd. Haussman and Parly 2 and the Central Warehouse, with the databases MusicDep, JeansDep, Home Appl.Dep, FoodDep, BooksDep, Payroll, Orders and Car]
59
MDB Manipulations in MsAccess
One can perform limited MDB operations between
– MsAccess DBs
– A MsAccess DB and
» any other DB under an ODBC-compatible DBMS
» Paradox, Btrieve, Dbase
» any OLE-compatible program: Excel...
60
MDB Manipulations in MsAccess
[Figure: MsAccess DB B1 attached (Attach) to Paradox, Excel, Oracle and Sybase sources through a Paradox gateway and ODBC drivers; Import, Export and Insert INTO link it with DBs B2 and B3]
61
MsAccess & MSQL
Open B <=> USE B
ATTACH table
Open B1 ; attach B2.T' as T <=> create view B1.T as select * from B2.T'
» DROP VIEW corresponds to Delete in the MsAccess menu
Clause IN <externalDB>
Open B1 ; Select a, b, c From D IN B2 <=> select a, b, c from B2.D
62
Examples MsAccess
Source DB: MsAccess
SELECT [Customer ID]
FROM Customers IN MYDATA.MDB
WHERE [Customer ID] Like "A*";
Source DB: Paradox
SELECT [CustomerID]
FROM Customers
IN "C:\PARADOX\DATA\SALES" "Paradox 4.x;"
WHERE CustomerID Like "A*";
Every data transfer from/to a non-MsAccess DB or OLE-compatible software involves data representation conversions
– Semantic heterogeneity requires it
63
Elementary Queries in MS-Access
Open a DB and query other DBs
– one has to define aliases in FROM
The DB open here is called s-p1.mdb
– but this name has no importance here
Joins of tables in other databases
SELECT TOP 10 C.[Contact Name], C.City
FROM [c:\access\nwind2.mdb].Customers AS C, [c:\access\ordentr2.mdb].customers AS O
WHERE (o.Id= C.[customer Id]);
64
Result
Contact Name       City
Pat Parkes         London
Gladys Lindsay     Seattle
Elizabeth Lincoln  Tsawassen
Olivia LaMont      San Francisco
Terry Hargreaves   London
Elizabeth Brown    London
Sylvia Dunn        London
Ann Devon          London
Ronald Merrick     London
Bill Lee           Pocatello
65
Join of a local and external table
SELECT TOP 10 S.SName, C.[Contact Name], C.City
FROM S, [nwind2.mdb].Customers AS C
WHERE ((S.City= C.City))
Order by [contact name];
Elementary Queries in MS-Access
66
Result
SName  Contact Name        City
Clark  Ann Devon           London
Clark  Archibald Langford  London
Clark  Cornelia Giles      London
Clark  David Bird          London
Clark  Elizabeth Brown     London
Clark  G.K.Chattergee      London
Clark  Gerald Pipps        London
Clark  Hari Kumar          London
Clark  Jane Austen         London
Clark  Jeffrey Jefferies   London
67
MsAccess & MSQL
Clause INTO <externalDB> in SELECT INTO or INSERT INTO
Open B1 ; Select a, b, c INTO T IN B2 From D <=> Use B1 ; copy into B2.T select a, b, c from D ;
» D can be a view or a subquery
» One cannot combine the clauses IN and INTO
– INSERT of MsAccess has the (sub)semantics of INSERT in MSQL
68
MsAccess & MSQL
IMPORT & EXPORT
– menu commands
– equivalent to the MSQL query
Use B1 ;
copy into T1 * from B2.T2 ;
69
MsAccess & MSQL Comparison
Formulation of MDB elementary queries and views
– first one has to define the ATTACHes
– then one formulates an SQL monodatabase query
– then, perhaps, one needs to delete the ATTACHes
Much more procedurality than under MSQL
– in Banks, one would need in practice that each DB attaches all the tables of any other DB
» Good luck DBA !
Multiple queries and other capabilities of MSQL
» still unknown to MsAccess
70
SQL Server, Sybase, Interbase
MDB architecture similar to that of MsAccess, but more powerful
– gateways to Oracle, IMS, DB2
– ODBC
Transact-SQL supports the following MSQL functions and is the least procedural MDB dialect in the industry
– elementary queries
» to Sybase DBs at the same site
USE B ;
select * from T where B1.T1.a = T.a ;
» Only one DB per USE
» some restrictions at the level of interdatabase queries
– multidatabase CREATE VIEW, and MDB triggers
71
Oracle, RDB, Informix
Have an operation similar to ATTACH, called Create link:
Create public database link bnp connect to bnp_unix
Create public database link cic connect to cic_vms
SELECT br.brname, b.brname, br.street
FROM br@bnp, br@cic b
WHERE br.street = b.street ;
72
Oracle, RDB, Informix
MDB queries are possible only once the links are defined
– Hence these DBMSs are more procedural than Sybase for MDB operations
Starting from V7, however, Oracle supports MDB queries without links
» by postfixing the DB name, as in the last example
73
EDA-SQL, DB Integrator, DBJoiner, Ingres* & al
SQL MDBSs for access to DBMSs
– in theory, without their own DBs
» but there is always one for the MDB catalogs
– auxiliary DB
One has to create links and logical DBs
» almost virtual DBs
Only DB Integrator supports elementary MDB queries
– called multischema queries
No other MSQL functions
[Figure: logical DBs over a gateway and ODBC, accessing DB IMS, DB RDB and DB Ingres]
74
UniSQL & O-Adaptor
Similar to the previous ones, except that for the logical DB
– UniSQL uses an RO model
– O-Adaptor uses an OO model
No MDB queries (other than link creations)
[Figure: logical DBs over a gateway and ODBC, accessing DB IMS, DB RDB and DB Ingres]
75
Telebase (USA)
MDBMS for access to information retrieval DBs
– 1000+ DBs at many sites
» with different local languages
– STAIRS, INSPEC, DIALOG...
Extended Common Command Set Language (CCS)
No joins, only Boolean clauses
Supports the MSQL functions
– multidatabase names
» called Categories
– multiple queries
» called Scans
[Figure: CCS over drivers accessing the INSPEC, STAIRS and DIALOG DBs]
76
Messidor : 1st Heterogeneous Multidatabase Information Retrieval Access System
Demonstration by C. Moulinoux (STERIA), INRIA, 1987
77
Meta-search engines
Metacrawler, BigHub.com, AskJeeves.com, Copernic…
Query simultaneously several search engines
– Altavista, Yahoo, Excite, Hotbot…
Boolean Manipulation Languages
Multiple Queries
– Apply the MDB aggregate functions Mdistinct Name, Mdistinct, Choose, Timeout…
78
Data Warehouses
Popular new concept for MDBSs
– data warehouse <=> an MDB or federation in an enterprise
– with elaborate management of interdatabase dependencies
– new ideas:
» elaborate DS implementation
» elaborate decision support functions
» incremental propagation
» an MDB view, or a DB redundant with respect to the existing ones, is created
[Figure: a data warehouse and a data mart over a gateway and ODBC, fed by DB IMS, DB RDB and DB Ingres]
79
International Journal of Cooperative Information Systems
Special Issue on Design and Management of Data Warehouses
Guest editors: Manfred A. Jeusfeld and Martin Staudt
Data Warehousing embraces technology and industrial practice to systematically collect data from the enterprise and to use that data in a highly aggregated form for managing the enterprise thru decisions. Little attention is currently paid to design and manage a data warehouse (DW) in such a way that it accomplishes its purpose, i.e. to support the management of the enterprise. Existing solutions are focusing on technical aspects like efficient source data extraction. Their parameters are however incomprehensible to the stakeholders who decide on the introduction of a data warehouse.
Data warehouses are important in managing large enterprises and in communicating highly aggregated information between the various departments. Interoperable tools and integrated methods to manage data warehouses in order to fulfill the enterprise goals are desperately needed. Such tools should cover all aspects of data warehousing:
- selection of data sources
- data cleaning
- conceptual/logical/physical data warehouse design
- enterprise modeling
- data warehouse quality monitoring
- data warehouse refreshment methods
- architecture design
- data mart customization, etc.
An Instructive Call for Papers
80
Example : DB2 Data Warehouse Architecture
81
Conclusion
MDB manipulations are among the most important R & D directions
Other key-words:
– Interoperability
– Integration
– Distributed Heterogeneous DBs
– Data Warehouses
MSQL is the most advanced research vehicle for relational MDBs
The root for further research proposals
– MSQL with Integrity Constraints, IDL, SchemaSQL…
» For the latter, see especially ACM-TODS journal, Dec. 2001
Basic MSQL capabilities are in commercial DBMSs, Information Retrieval Systems, MDB Access Systems, Data Warehouses, XML standard proposals…
– Others will follow
But there is still a lot to do
– in the industry and in research
82
Exercises and Research Problems
All those in the text ; MDB queries especially.
Express various elementary MDB queries using Amos, MsAccess (SQL and QBE), SQL Server, DB2, Oracle, Interbase…
Invent your own instructive queries to BANKS
Under MsAccess, design an MDB Form for an elementary query and for a multiple query. Explain how you did it in a short report.
Consider 3 attributes B1.T1.a, B2.T2.b, B3.T3.c. The attribute types are INT and the units of measure are KG, G, mG. Consider that the '=' operator has the usual mathematical meaning, with the usual rounding of values with a different precision. Prove or disprove that the usual associativity of equijoins (a JOIN b) JOIN c = a JOIN (b JOIN c) does not hold anymore. Comment on the consequences for the current relational query optimizers.
If you had to evaluate a JOIN b using manual unit conversion, would you rather convert a to b or vice versa ?
Propose and justify a reasonable algorithm for the multiple join evaluation.
A unit conversion algorithm A may be a long computation. Would you rather:
– apply A to every value V of the manipulated table
– project or order the table first, then apply A once for every different value
83
Exercises and Research Problems
Propose an execution tree expressed in MDB algebra for the query of slide 30
Add unit conversion to your favorite query optimizer (Ph. D Thesis)
Try to express the example queries to Banks using the SchemaSQL language. For each query, present also the result. Can it be a multirelation ?
Try to express the example queries to Banks using the IDL language. For each query, present also the result. Can it be a multirelation ?
Consider that bnp.balance is in US$ and cic.balance is in FF. Consider that the exchange rate is in some table ExRa in a DB called Currency. Is it possible to find accounts with the same balance in both DBs using a single MSQL query ?
Consider that to perform a multidatabase join A JOIN B one has to bring both tables into a database. What are your options for elementary MDB query processing, if there are selections, joins, and projections ?
Same question, if you consider distributed join processing ?
Consider a multitable R = (R1, R2, R3), a table T and the query Q
select * from R where R.a = T.a and T.b = '123' ;
What are your options for Q's optimization ?
Propose an implementation for your favorite MSQL aggregate function (Ph. D. Thesis or a part of it at least)
END