2/8/11
Introduction to Database Systems
CISC437/637, Lecture #1
Ben Carterette
Copyright © Ben Carterette
Database Systems
• The overview in 5 Ws (and one H):
– What is a database? What is a database management system (DBMS)?
– Why use databases? Why study them?
– Who works with databases?
– How does a DBMS work?
– Where and when did databases originate?
What is a Database?
• A database is a collection of data
– Usually large quantities of interrelated data
– E.g. student records, faculty records, courses, classrooms, payrolls, …
• A database management system (DBMS) is a software system designed to store and manage data
Why Use a DBMS?
“So a bunch of text files on disk can be a database. I’ll just process them with Python. Why do I need to learn about design and DBMSs?”
• Data too large to fit in memory; files too big for random access on disk
• Arbitrarily complex queries that must be answered quickly
• Many users accessing data concurrently
• Some users need different access permissions
Why Use a DBMS?
• Data independence
• Efficient access
• Integrity and security
• Access administration
• Concurrent access
• Application development time
Why Not Use a DBMS?
• DBMSs are large, complex programs designed for very general data needs and workloads
– They are not always suitable for specialized tasks
• Application may need to manipulate data in ways not supported by DBMS
• Security, concurrent access, crash recovery may not be critical
• Example: web search
Why Study Databases?
• Multibillion-dollar industry, second only to operating systems
• Databases form backbone of many information-centric applications
– Using computation to create and understand information
• Implementing and understanding DBMS incorporates knowledge from every area of CS
– Systems, theory, artificial intelligence
Applications of Databases
• Electronic commerce and banking
– Amazon, eBay, PayPal
– Integrating vast catalogs and accounts, high security
• Social networking
– Facebook, Twitter
– Analyzing flow of information through large, tightly-connected networks
Applications of Databases
• Sensor networks
– GPS, RFID, …
– Often supports mission-critical applications
– Response to failures and trust are important
• Bioinformatics, health informatics
– Gene Ontology, PubMed, …
– Requires data integration, pattern matching, approximate matching, ranking, automatic inference
Who Works With Databases?
• DBMS programmers actually implement the DBMS software
• Database administrators design storage requirements, handle security, ensure graceful recovery, tune database performance
• Applications programmers write software that interacts with a database
• End users use the software written by applications programmers
How Does a DBMS Work?
• This is the focus of the course
• Today: a brief overview of the topics that will be covered
1. Data models; database design
2. Database queries
3. Transaction management
4. DBMS structure; scalability and efficiency
Data Models
• A data model is a collection of concepts for describing data
• A schema is a description of a particular collection of data using a given model
• The relational data model is most common
– Relations (tables of records) are the main concept
– Every relation has a schema that describes the record fields/table columns
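As a minimal sketch of a relation and its schema, the following uses Python's built-in sqlite3 module. The Students relation, its columns, and its rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the Students relation and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students (sid, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ann", 3.7), (2, "Bob", 3.1)],
)

# The schema describes the record fields. PRAGMA table_info returns one row
# per column, with the column name at index 1 and the declared type at index 2.
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]

# The relation itself is the table of records.
records = conn.execute(
    "SELECT sid, name, gpa FROM Students ORDER BY sid"
).fetchall()
```

Here `schema` holds the (field, type) pairs and `records` holds the tuples that conform to them.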
Levels of Abstraction
• Views or external schemas describe how users see the data
• Conceptual schema defines the logical structure of relations
• Physical schema describes the specific files used to store a relation on disk
[Figure: View 1, View 2, View 3; Conceptual Schema; Physical Schema]
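One way to see the difference between a conceptual schema and a view is the sketch below, again using sqlite3. The Employees table and the Directory view are hypothetical names, and the physical schema (the files on disk) is left entirely to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the logical structure of one (hypothetical) relation.
conn.execute(
    "CREATE TABLE Employees ("
    "eid INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Ann", "CS", 90000.0), (2, "Bob", "Math", 80000.0)],
)

# External schema: a view that shows some users only names and departments.
conn.execute("CREATE VIEW Directory AS SELECT name, dept FROM Employees")

# Users of the view query it like a table and never see the salary column.
directory = conn.execute(
    "SELECT name, dept FROM Directory ORDER BY name"
).fetchall()
```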
Database Design
• Designing a database:
– A user/client has data and requirements for how they need to access and modify it
• Design steps:
– requirements → views
– views → conceptual schema
– conceptual schema → physical schema
– Loop until it’s right: integrity maintained, consistent, fast, easy to use
Data Independence
• Using an external schema does not require knowledge of conceptual schema
– Logical data independence
• Using a conceptual schema does not require knowledge of physical schema
– Physical data independence
• Applications are insulated from how data is structured and stored
• End users are insulated from how data is organized and constrained
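Physical data independence can be illustrated with a small sketch: the same query text returns the same answer before and after a change to how the data is stored. Adding an index stands in for a physical schema change here. The Courses table, its rows, and the index name are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (cid TEXT, title TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",
    [
        ("C1", "Databases", "Room 120"),
        ("C2", "Operating Systems", "Room 120"),
        ("C3", "Algorithms", "Room 204"),
    ],
)

# The application's query mentions only the conceptual schema.
query = "SELECT title FROM Courses WHERE room = ? ORDER BY title"
before = conn.execute(query, ("Room 120",)).fetchall()

# Change the physical organization: add an index on the room column.
conn.execute("CREATE INDEX idx_courses_room ON Courses (room)")

# The query text is unchanged and so is its answer.
after = conn.execute(query, ("Room 120",)).fetchall()
```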
Database Queries
• Queries are questions asked of the data
• A query language specifies how queries are posed in a specific data model
– The language consists of keywords and operators for manipulating relations
– the data manipulation language (DML)
• Formulating a query does not require knowledge of physical schema
• Query languages allow fast application development
– Embed DML in high-level language like Java, C, Python
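A short sketch of DML embedded in a high-level language, using SQL through Python's sqlite3 module. The Enrolled table, its rows, and the average_grade helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrolled (student TEXT, course TEXT, grade REAL)")
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [("Ann", "CISC437", 4.0), ("Bob", "CISC437", 3.0), ("Ann", "CISC108", 3.0)],
)


def average_grade(course):
    """Pose a query from Python; the ? placeholder carries the parameter."""
    row = conn.execute(
        "SELECT AVG(grade) FROM Enrolled WHERE course = ?", (course,)
    ).fetchone()
    return row[0]


# DML also modifies data: change one grade, then ask the question again.
conn.execute(
    "UPDATE Enrolled SET grade = ? WHERE student = ? AND course = ?",
    (3.5, "Bob", "CISC437"),
)
```

Nothing in the Python code refers to files, pages, or indexes; the query is stated only in terms of the relation.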
Concurrency Control
• Many databases are used by multiple users concurrently
– Each user manipulating relations in different ways
– Simultaneous uses can result in inconsistencies
– E.g. one is looking up vacancies while another is making a reservation
• DBMSs ensure that such inconsistencies do not occur
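The vacancy/reservation problem can be sketched without a DBMS at all. In the first half, two users both check for vacancies before either one reserves, so the last seat is booked twice. In the second half, a threading.Lock stands in for the DBMS's concurrency control. The seat counts and user names are made up, and a real DBMS uses far more refined mechanisms than a single lock.

```python
import threading

# Unsafe interleaving, written out step by step: one seat left.
unsafe_seats = 1
seen_by_a = unsafe_seats          # user A looks up vacancies
seen_by_b = unsafe_seats          # user B looks up vacancies
unsafe_bookings = []
if seen_by_a > 0:
    unsafe_bookings.append("A")   # A reserves the seat
if seen_by_b > 0:
    unsafe_bookings.append("B")   # B reserves the same seat: inconsistency

# Safe version: checking and reserving happen as one step under a lock.
seats = {"flight_1": 1}
lock = threading.Lock()
safe_bookings = []


def reserve(user):
    with lock:
        if seats["flight_1"] > 0:
            seats["flight_1"] -= 1
            safe_bookings.append(user)


threads = [threading.Thread(target=reserve, args=(u,)) for u in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread acquires the lock first gets the seat; the other sees zero vacancies.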
Transaction Management
• A transaction is an atomic sequence of database actions (reads and writes)
• The complete execution of each transaction must leave the database in a consistent state if the database is consistent when it begins
– Consistency means no logical conflicts
• User/application formulates integrity constraints for the DBMS to enforce
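The sketch below shows an integrity constraint declared by the application and enforced by the DBMS, with a transaction that is rolled back when the constraint would be violated. It uses sqlite3; the Accounts table, the balances, and the transfer helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint: no balance may go negative.
conn.execute(
    "CREATE TABLE Accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("ann", 100), ("bob", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move money as one transaction; return False if it had to be undone."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM Accounts"))
```

The credit is deliberately applied before the debit, so a failed transfer shows that the earlier write is undone along with the rest of the transaction.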
Scheduling Transactions
• DBMS ensures that execution of {T1, …, Tn} is equivalent to serial execution T1’, …, Tn’
– Locks: before reading or writing, a transaction requests a lock on an object, and does nothing until DBMS grants lock. Locks are released after execution.
– Use locks to force ordering of unordered transactions.
– Deadlock: Ti has lock on object A and needs lock on object B. Tj has lock on object B and needs lock on object A.
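The deadlock described above can be detected by looking for a cycle among transactions that wait for each other. The following is a minimal sketch under simplifying assumptions: each object is locked by at most one transaction, and each transaction waits for at most one object. The function name and the dictionary layout are hypothetical and not taken from any particular DBMS.

```python
def find_deadlock(holds, wants):
    """Return a list of transactions forming a wait-for cycle, or None.

    holds maps a transaction to the set of objects it has locked.
    wants maps a transaction to the single object it is waiting for.
    """
    owner = {obj: txn for txn, objs in holds.items() for obj in objs}
    # Edge T -> U means T is waiting for a lock that U currently holds.
    waits_for = {
        txn: owner[obj]
        for txn, obj in wants.items()
        if obj in owner and owner[obj] != txn
    }
    for start in waits_for:
        path = [start]
        current = waits_for.get(start)
        # Follow the chain of waits until it ends or revisits a transaction.
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current == start:
            return path
    return None
```

For the situation in the last bullet, `find_deadlock({"Ti": {"A"}, "Tj": {"B"}}, {"Ti": "B", "Tj": "A"})` reports the cycle between Ti and Tj.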
Atomicity
• “All or nothing”: an atomic transaction is one that either completely finishes or does not happen at all
• DBMS needs to maintain atomicity even when it crashes in the middle of transactions
• Use a log to keep track of actions DBMS takes to execute transaction
– Write-ahead log (WAL) enables this
• Transaction isn’t done until all of its actions are done
Write-Ahead Log
• The log consists of the following:
– For write actions, the old data and the new data
– A flag indicating whether the transaction was committed or aborted
• Transactions can be undone when commit not present
• Deadlocks can be resolved by aborting one transaction and allowing the other to continue
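A toy version of the log described above might look like the following. It is only a sketch: the ToyDatabase class is hypothetical, an in-memory list stands in for a log file on disk, and None is assumed to mean that a key did not exist before the write.

```python
class ToyDatabase:
    def __init__(self):
        self.data = {}
        self.log = []  # append-only list standing in for a log file on disk

    def write(self, txn, key, new_value):
        old_value = self.data.get(key)
        # Write-ahead: the log record is appended before the data is changed.
        self.log.append(("write", txn, key, old_value, new_value))
        self.data[key] = new_value

    def commit(self, txn):
        # The commit record is the flag that marks the transaction as done.
        self.log.append(("commit", txn))

    def recover(self):
        """Undo, newest first, every write whose transaction never committed."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in reversed(self.log):
            if rec[0] == "write" and rec[1] not in committed:
                _, _, key, old_value, _ = rec
                if old_value is None:
                    self.data.pop(key, None)  # the key did not exist before
                else:
                    self.data[key] = old_value
```

If T1 writes x and commits, and T2 then writes x and y without committing, calling `recover()` restores x to T1's value and removes y.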
DBMS Structure
• Layered architecture, each layer only aware of layer below it
[Figure: layers, top to bottom: Query optimization & execution; Relational operators; Files and access methods; Buffer management; Disk space management; DB. Alongside the layers: Recovery manager; Concurrency control (Transaction manager, Lock manager)]
When and Where
• Charles Bachman designed the Integrated Data Store at General Electric in the 1960s
• IDS used the network data model, a graph-based representation designed for exploration rather than querying
• Turing Award 1973
When and Where
• Edgar Codd proposed the relational data model in 1970 at IBM
• Quickly became the basis of commercial systems; strong theoretical foundation developed
• Turing Award 1981
When and Where
• Jim Gray made fundamental contributions to transaction management in the 80s and 90s
• Allowed DBMSs to scale to huge applications with thousands or millions of users
• Turing Award 1998
Summary
• DBMSs are used to maintain and query large amounts of data
• They allow concurrent access, recovery from failure, fast application development, security
• Levels of abstraction mean that one can work on one subproblem without knowing about others
• Huge industry and huge research area in CS