![Page 1: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/1.jpg)
Advanced Database Techniques
Martin.Kersten @ cwi.nl
Stefan Manegold @ cwi.nl
Sandor Heman @ cwi.nl
Jennie Zhang @ cwi.nl
Romulo Goncalves @ cwi.nl
![Page 2: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/2.jpg)
Administrative details
• The website evolves during the course
• Exam material is marked explicitly
• Lab work deadlines are strict
• Email is the preferred way to communicate
• Tomorrow the assistants will be available in person between 11:00-12:00, room REC-P.123
![Page 3: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/3.jpg)
Relational systems
• A database system should simplify the organization, validation, sharing, and bookkeeping of information
• Prerequisite knowledge
  – Relational data model and algebra
  – Data structures (B-tree, hash)
  – Operating system concepts
  – Using a SQL database system
• What is your practical experience? [Ruby on Rails expertise needed]
![Page 4: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/4.jpg)
Applications
• Bread-and-butter applications?
  – Web-shop
  – Banking systems
  – Inventory systems
  – Production systems
  – Shopping systems
  – Government systems
  – Health systems
  – Multimedia systems
  – Science systems …
![Page 5: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/5.jpg)
Advanced Applications
• Bread-and-butter applications?
  – Banking systems
    • What happens if you install a stock trading system that should handle >100K transactions/minute?
    • How to derive trading advice using compute-intensive applications?
    • How to warn thousands of users about their trading opportunity?
  – … need for parallel, distributed main-memory database technology …
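A back-of-the-envelope reading of the ">100K transactions/minute" requirement, as a small Python sketch. The serial-execution assumption is ours, not the slide's; it only serves to show how little time each transaction gets when nothing runs in parallel.

```python
# Lower bound quoted on the slide: more than 100K transactions per minute.
TX_PER_MINUTE = 100_000

# Sustained rate the system has to keep up with.
tx_per_second = TX_PER_MINUTE / 60

# Assumption: transactions run strictly one after another on one machine.
# Then this is the time budget per transaction, in milliseconds.
serial_budget_ms = 60_000 / TX_PER_MINUTE

print(f"{tx_per_second:.0f} transactions/second")
print(f"{serial_budget_ms:.1f} ms per transaction when executed serially")
```

A budget well under a millisecond leaves no room for even one random access on a rotating disk, which typically costs several milliseconds. That is one way to read the slide's call for parallel, distributed, main-memory technology.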
![Page 6: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/6.jpg)
![Page 7: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/7.jpg)
Advanced application requirements
• Bread-and-butter applications
  – Inventory applications
    • How to install a battlefield inventory system?
    • How to deliver goods just in time?
    • How to keep track of moving objects/persons?
• … need for sensor-based database support and RFID tags … need for a new DBMS? …
![Page 8: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/8.jpg)
![Page 9: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/9.jpg)
Advanced Applications
• Production systems
  – How to interact with component suppliers
  – How to manage the production workflow
  – How to avoid bad production steps
  – How to maintain a database with 12000 tables (SAP)
• … need for interoperability between autonomous systems … datamining and knowledge discovery …
![Page 10: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/10.jpg)
![Page 11: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/11.jpg)
Advanced Applications
• Health information systems
  – How to monitor your health over 30 years
  – How to enable quick response to a heart attack
• … need for interoperable database systems …
![Page 12: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/12.jpg)
HELP
The Ambient Home
![Page 13: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/13.jpg)
HELP
The Ambient Home
911 called
![Page 14: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/14.jpg)
MonetDB DataCell
![Page 15: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/15.jpg)
MonetDB DataCell
911 called
nucleus
A Shared Tuple Space using an SQL DBMS
![Page 16: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/16.jpg)
MonetDB DataCell
911 called
receptors, nucleus, emitters
![Page 17: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/17.jpg)
HELP
MonetDB DataCell
Recall
receptors, nucleus, emitters
![Page 18: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/18.jpg)
MonetDB DataCell
Keep
911 called
receptors, nucleus, emitters
![Page 19: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/19.jpg)
HELP
MonetDB DataCell
Forget
receptors, nucleus, emitters
![Page 20: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/20.jpg)
MonetDB DataCell
Aggregate
911 called
receptors, nucleus, emitters
![Page 21: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/21.jpg)
MonetDB DataCell
911 called
receptors, nucleus, emitters
Recall
Aggregate
Keep
Forget
![Page 22: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/22.jpg)
SQL workload
-- SQL-queries
insert into hospital select 'John', * from medic where temp > 40.0;
insert into epd select * from medic where temp >= 38.0;
delete from medic;
![Page 23: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/23.jpg)
SQL workload
insert into hospital select 'John', * from medic where temp > 40.0;
insert into epd select * from medic where temp >= 38.0;
delete from medic;
Start End
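The three statements can be tried out as one unit of work. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in for MonetDB DataCell, and the column layout of `medic`, `hospital` and `epd`, the sensor name, and the helper `process_basket` are all assumptions, since the slides show no schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the slides only name the tables and the temp column.
conn.executescript("""
    CREATE TABLE medic    (sensor TEXT, temp REAL);  -- basket filled by a receptor
    CREATE TABLE hospital (patient TEXT, sensor TEXT, temp REAL);
    CREATE TABLE epd      (sensor TEXT, temp REAL);
""")

def process_basket(conn, readings):
    """Load a batch of readings, run the slide's three statements, empty the basket."""
    with conn:  # commits on success, rolls back if any statement fails
        conn.executemany("INSERT INTO medic VALUES (?, ?)", readings)
        conn.execute("INSERT INTO hospital SELECT 'John', * FROM medic WHERE temp > 40.0")
        conn.execute("INSERT INTO epd SELECT * FROM medic WHERE temp >= 38.0")
        conn.execute("DELETE FROM medic")

process_basket(conn, [("thermo-1", 36.8), ("thermo-1", 38.5), ("thermo-1", 40.3)])

hospital_rows = conn.execute("SELECT * FROM hospital").fetchall()
epd_rows = conn.execute("SELECT count(*) FROM epd").fetchone()[0]
medic_rows = conn.execute("SELECT count(*) FROM medic").fetchone()[0]
print(hospital_rows, epd_rows, medic_rows)
```

Running the statements inside one transaction matters here: if the delete ran without the two inserts having succeeded, readings would be lost.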
![Page 24: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/24.jpg)
Query optimization
The queries in a datacell have
- a soft/hard deadline
- strong flow dependency
The operands to the queries are small tables:
- empty
- single value
- a few values
Traditional query optimizers are biased towards large operands.
![Page 25: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/25.jpg)
Query optimization
Challenges:
• How to optimize the individual SQL programs to select the proper QEP (query execution plan)?
• How to weave the collection of SQL programs to create an optimal multi-query version?
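To make "weaving" concrete, here is a hand-written Python sketch of what a multi-query version of the workload (the two inserts and the delete on `medic`) could look like. It is not DataCell's actual algorithm. The tuple layout `(sensor, temp)` and both function names are assumptions made for illustration.

```python
def run_separately(medic):
    """Evaluate each query on its own: the basket is scanned twice."""
    hospital = [("John", *row) for row in medic if row[1] > 40.0]  # scan 1
    epd = [row for row in medic if row[1] >= 38.0]                 # scan 2
    medic.clear()                                                  # delete from medic
    return hospital, epd

def run_weaved(medic):
    """Weaved version: one shared scan feeds both target tables."""
    hospital, epd = [], []
    for row in medic:
        temp = row[1]
        if temp >= 38.0:          # every temp > 40.0 also satisfies temp >= 38.0
            epd.append(row)
            if temp > 40.0:
                hospital.append(("John", *row))
    medic.clear()                 # delete from medic
    return hospital, epd

basket = [("thermo-1", 36.8), ("thermo-1", 38.5), ("thermo-1", 40.3)]
same_result = run_separately(list(basket)) == run_weaved(list(basket))
print(same_result)
```

The weaved version exploits that one predicate implies the other, so the stricter test is only evaluated for tuples that already passed the weaker one. With operands of a few tuples the saving per call is tiny, which is exactly why per-query overhead rather than scan cost dominates in this setting.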
![Page 26: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/26.jpg)
Advanced Applications
• Multimedia Systems
  – Narrow/broad casting, selective dissemination of volumetric information
  – Searching in multimedia storage
• … need for P2P infrastructure … search facilities over feature spaces …
![Page 27: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/27.jpg)
Advanced applications
• Government systems
  – Security
    • Biometric data management issues, finger/image matching
  – Public safety
    • Forensics, manipulate complex objects using proprietary algorithms
• … need for extensible database technology … need to support unstructured data …
![Page 28: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/28.jpg)
Advanced Applications
• Science systems
  – The new accelerator in CERN
    • How to handle >1 PByte files
  – The Sloan Digital Skyserver schema is 200 pages and the catalogued data 2.5 TByte
    • How to query this efficiently
  – … need for P2P and … a novel way to organize data …
![Page 29: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/29.jpg)
![Page 30: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/30.jpg)
![Page 31: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/31.jpg)
LOFAR central processor specs
• Streaming Data
  – Input: 320 Gbit/s
  – Internally within correlator: 20 Tbit/s
  – Into storage: 25 Gbit/s = 250 TByte/day
  – Final products: 1-3 TByte/day
• High Performance Computing
  – Correlation: 15 Tflops
  – Pre-processing and filtering: 5 Tflops
  – Off-line processing (calibration, analysis): 5-10 Tflops
  – Visualisation, control, scheduling etc.: 2 Tflops
• Storage
  – On-line temporal storage: 500 TByte
  – Archive: PByte range of data stored in Grid
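The rate-to-volume conversion on this slide can be checked with a few lines of Python. The helper name is made up for this sketch, and whether the slide's "TByte" means 10^12 or 2^40 bytes is not stated, so both readings are computed.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def gbit_per_s_to_bytes_per_day(gbit_per_s):
    """Convert a sustained rate in Gbit/s (10^9 bit/s) to bytes per day."""
    return gbit_per_s * 1e9 / 8 * SECONDS_PER_DAY

into_storage = gbit_per_s_to_bytes_per_day(25)
tb_decimal = into_storage / 1e12     # TByte read as 10^12 bytes
tib_binary = into_storage / 2**40    # TByte read as 2^40 bytes

# How long the 500 TByte on-line store lasts at that rate (decimal reading).
buffer_days = 500 / tb_decimal

print(f"{tb_decimal:.0f} TB/day, {tib_binary:.1f} TiB/day, buffer {buffer_days:.2f} days")
```

Under these assumptions 25 Gbit/s comes to about 270 decimal TByte per day (about 246 in binary units), so the slide's 250 TByte/day is a rounded figure. Either way the on-line store holds less than two days of the stream going into storage.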
![Page 32: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/32.jpg)
Technological challenges
• Data is often not structured as tables
  – XML and XQuery
• Data does not always fit on one system
  – Distributed and parallel databases
• Querying is more like world-wide searching
  – Continuous and streaming queries
• A database tells more than facts
  – Datamining and knowledge discovery
![Page 33: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/33.jpg)
Code bases
• Database management systems are BIG software systems
  – Oracle, SQL Server, DB2: >1 M lines
  – PostgreSQL: 300 K lines
  – MySQL: 500 K lines
  – MonetDB: 200-800 K lines
  – SQLite: 40 K lines
• Programmer teams for DBMS kernels range from a few to a few hundred
![Page 34: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/34.jpg)
Performance components
• Hardware platform
• Data structures
• Algebraic optimizer
• SQL parser
• Application code
  – What is the total cost of execution?
  – How many tasks can be performed/minute?
  – How good is the optimizer?
  – What is the overhead of the data structures?
![Page 35: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/35.jpg)
Not all are equal
| Test | MonetDB | PostgreSQL | MySQL | SQLite | SQLite nosync |
|---|---|---|---|---|---|
| 1000 inserts transactions | 0.27 | 4.30 | 0.15 | 13.06 | 0.22 |
| 25000 inserts 1 transaction | 6.71 | 4.91 | 2.18 | 0.94 | 1.42 |
| 100 range selects | 0.18 | 3.62 | 2.76 | 2.49 | 2.52 |
| 100 string range selects | 2.15 | 13.40 | 4.64 | 3.36 | 3.37 |
| 5000 range index selects | 5.22 | 4.61 | 1.27 | 1.12 | 1.16 |
| 1000 updates | 0.43 | 1.73 | 8.41 | 0.63 | 0.63 |
| 25000 updates with index | 8.33 | 18.79 | 8.13 | 3.52 | 3.10 |
| 25000 updates on text | 10.32 | 48.13 | 6.98 | 2.40 | 1.72 |
| Insert from select | 0.65 | 61.36 | 1.53 | 2.78 | 1.59 |
| Delete on text index | 0.32 | 1.50 | 0.97 | 4.00 | 0.56 |
| Delete with index | 0.22 | 1.31 | 2.26 | 2.06 | 0.75 |
| Big insert after delete | 0.36 | 13.16 | 1.81 | 3.21 | 1.48 |
| Big delete and small insert | 0.93 | 4.55 | 1.70 | 0.61 | 0.40 |
![Page 36: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/36.jpg)
Not all are equal
![Page 37: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/37.jpg)
Not all are equal
Why does it take so long to build a 10Mx2 table?
How long will it take to do 10Mx32 on SQL Server Beta 2?
![Page 38: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/38.jpg)
![Page 39: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/39.jpg)
Gaining insight
• Study the code base (inspection + profiling)
  – Often not accessible outside development lab
• Study individual techniques (data structures + simulation)
  – Focus of most PhD research in DBMS
• Detailed knowledge becomes available, but ignores the total cost of execution.
• Study as a functional black box
  – Analyse a small application framework
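A black-box study can start very small: time a fixed task from the outside and compare variants. The sketch below uses Python's built-in `sqlite3` module. The table `t`, the helper names, and the two workloads (modelled on the insert tests in the benchmark table) are assumptions for illustration, not the course's lab framework.

```python
import sqlite3
import time

def timed(label, work):
    """Run work() once and report wall-clock time, treating the DBMS as a black box."""
    start = time.perf_counter()
    work()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return elapsed

def fresh_db(path=":memory:"):
    # Pass a file path instead of ":memory:" to include the cost of syncing to disk.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
    conn.commit()
    return conn

def inserts_one_transaction_each(conn, n):
    for i in range(n):
        conn.execute("INSERT INTO t VALUES (?, ?)", (i, str(i)))
        conn.commit()  # every insert is its own transaction

def inserts_single_transaction(conn, n):
    with conn:  # one transaction around the whole batch
        conn.executemany("INSERT INTO t VALUES (?, ?)",
                         ((i, str(i)) for i in range(n)))

db1, db2 = fresh_db(), fresh_db()
timed("1000 inserts, 1000 transactions", lambda: inserts_one_transaction_each(db1, 1000))
timed("25000 inserts, 1 transaction", lambda: inserts_single_transaction(db2, 25000))
```

The measured times depend on the machine and are not comparable to the numbers on the slides. What the exercise shows is the method: keep the task fixed, vary one factor (here the transaction granularity), and observe the total cost of execution rather than a single component.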
![Page 40: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/40.jpg)
The Jack The Ripper Project
• Study the snippet of the database technology and design an XQuery and SQL application
• What is the schema?
• What are the queries?
• What are unorthodox solutions?
![Page 41: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/41.jpg)
Learning points
• Is your knowledge of relational databases poor? Read the chapters on SQL and relational algebra. Knowledge of data structures comes in handy.
• Database systems are much more than administrative bookkeeping systems
![Page 42: Advanced Database Techniqueshomepages.cwi.nl/~mk/onderwijs/adt2007/lectures/lecture...Advanced Database Techniques Martin.Kersten @ cwi.nl Stefan Manegold@cwi.nl Sandor Heman @ cwi.nl](https://reader033.fdocuments.in/reader033/viewer/2022050200/5f5405bc9a64c7534779d9a3/html5/thumbnails/42.jpg)
Learning points
– Advanced applications challenge the technology provided by a DBMS
– Many techniques do not easily scale in size, complexity, functionality
– Effectiveness of a DBMS is determined by many tightly interlocked components