CS 542 Introduction
CS 542 Database Management Systems
J Singh
January 13, 2011
2© J Singh, 2011 2
About the Instructor
• Please call me J (just one letter, no period).
• Office hours: 5:00 – 6:00 on the day of the class, in this room.
• In my spare time:
– President, Early Stage IT – a cloud-based consulting firm
– Co-founder and CTO, ConnectScholar – a cloud-based web service
– Co-chair of Software and Services SIG at TiE-Boston
• In the past:
– Director of Software Engineering, Fidelity Investments
– Software Architect, Computervision Corp
– Prof. of EE @ WPI
A SQL test
• Explain the difference between
SELECT b
FROM R
WHERE a < 10 OR a >= 10;
and
SELECT b
FROM R;

Table R:
a    b
5    20
10   30
20   40
…    …
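One way to explore this question is to run both queries against a small instance of R. The sketch below uses Python's built-in sqlite3 module. The first three rows come from the slide; the fourth row, whose a is NULL, is an assumption added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
# Rows 1-3 are from the slide; the NULL row is a hypothetical addition.
conn.executemany(
    "INSERT INTO R VALUES (?, ?)",
    [(5, 20), (10, 30), (20, 40), (None, 50)],
)

with_where = [row[0] for row in conn.execute(
    "SELECT b FROM R WHERE a < 10 OR a >= 10 ORDER BY b")]
without_where = [row[0] for row in conn.execute(
    "SELECT b FROM R ORDER BY b")]

print(with_where)     # the row whose a is NULL is filtered out
print(without_where)  # every row is returned
```

When a is NULL, both comparisons evaluate to unknown rather than true, so the WHERE clause rejects that row.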
Another SQL test
• Explain the difference between
SELECT a
FROM R, S
WHERE R.b = S.b;
• And
SELECT a
FROM R
WHERE b IN (SELECT b FROM S);
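A small experiment can make the difference visible. The sketch below assumes hypothetical contents for R and S (neither is given on this slide) and deliberately stores the same b value twice in S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER)")
# Hypothetical data: S contains the value 10 twice.
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 10), (2, 20)])
conn.executemany("INSERT INTO S VALUES (?)", [(10,), (10,)])

join_rows = [r[0] for r in conn.execute(
    "SELECT a FROM R, S WHERE R.b = S.b")]
in_rows = [r[0] for r in conn.execute(
    "SELECT a FROM R WHERE b IN (SELECT b FROM S)")]

print(join_rows)  # one output row per matching (R, S) pair
print(in_rows)    # at most one output row per row of R
```

With this data the join returns a = 1 twice, while the IN form returns it once.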
About CS 542
• CS 542 will
– Build on database concepts you already know
– Provide you with tools for separating hype from reality
– Help you develop skills in evaluating the tradeoffs involved in using and/or creating a database
• CS 542 may
– Train you to read technical journals and apply them
• CS 542 will not
– Cover the intricacies of SQL programming
– Spend much effort in
  • Dynamic SQL
  • Stored Procedures
  • Interfaces with application programming languages
  • Connectors, e.g., JDBC, ODBC
What’s so fun about databases?
• Traditional database courses talked about
– Employee records
– Bank records
• Now we talk about
– Web search
– Data mining
– The collective intelligence of tweets
– Scientific and medical databases
How much data can a database hold?
• The biggest OLTP databases
– 2001: 1.1 – 10.3 TB.
– 2003: 9.1 – 29.2 TB.
– 2005: 17.7 – 100.4 TB.
– 2010: ~2.5 PB.
• The trend will continue
• Very large databases bring new unique challenges
• CS 542 is about the challenges of big databases
DBMS Architecture
• Applications can be in any programming language
• DBMS presents a programmatic interface to the applications
– Typically SQL
– SQL, without recursive queries or procedural extensions, is not a Turing-complete programming language
  • Every such SQL statement is guaranteed to complete
Databases are a strategic asset
• The value of a company is defined by its data, for example:
– LinkedIn, Facebook, eBay, Amazon, Google
– Those who have the data have the power
• Some examples?
• The power of the data comes from
– Its quality
– Its consistency
– Its ease of retrieval
– What else?
• CS 542 is about creating & enhancing the power of data
Course Plan
• Course Plan
• Course Policies and Grading Rubric
• Other materials:
– Prof. Shivnath Babu, Duke Univ.
– Prof. Ullman, Stanford Univ.
– Prof. Ramakrishnan, U. of Wisconsin
– Published papers, cited in the notes.
Computing Resource Options
• On your laptop
– Download MySQL from the web
  • Eclipse IDE if desired
– Microsoft Access or SQL Server
• WPI Computer Science Resources
– MySQL or Oracle
• Amazon AWS
– $100 credit per student; email me to get an authorization code
  • For use with RDS or MapReduce
• Google App Engine
– BigTable is available under the name DataStore, free up to a limit
CS 542 Database Management Systems
Relational Model of Data
Overview of Data Models
Name             Structure      Operations  Constraints              Remarks
Relational       Set of Tuples  Queries     Type, Uniqueness, Value
Semi-structured  XML or JSON    Navigation  Type, Uniqueness, Value  Self-describing data
• A Data Model pertains to the structure, operations and any constraints on those structures
Basics of the Relational Model
Browser                    Engine   Platform          Engine Version
Internet Explorer 6        Trident  Win 98+           6
Internet Explorer 7        Trident  Win XP SP2+       7
AOL browser (AOL desktop)  Trident  Win XP            6
Firefox 3.0                Gecko    Win 2k+ / OSX.3+  1.9
Camino 1.5                 Gecko    OSX.3+            1.8
Netscape Browser 8         Gecko    Win 98SE+         1.7
Netscape Navigator 9       Gecko    Win 98+ / OSX.2+  1.8
• A Table is referred to as a Relation
• Each Row is a tuple; each Column is an attribute
– Each attribute is constrained to be a specific type
  • May also have value constraints
  • May also have uniqueness constraints
More on Relations
• A Relation is a set, not a list
– Order of tuples is irrelevant
• It’s common to add/modify/delete rows
• Not so common to add/delete columns
• When you modify a relation, the old version is replaced by the new version
– At any time, the relation only has “the current instance”
– Almost impossible to get the state back to prior versions
• Why is this so hard?
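The set semantics can be mimicked with Python sets of tuples. This is only an analogy, not how a DBMS stores relations; the (Browser, Engine) pairs are taken from the browser table on the "Basics of the Relational Model" slide.

```python
# The same tuples listed in two different orders.
r1 = {
    ("Firefox 3.0", "Gecko"),
    ("Camino 1.5", "Gecko"),
    ("Internet Explorer 7", "Trident"),
}
r2 = {
    ("Internet Explorer 7", "Trident"),
    ("Firefox 3.0", "Gecko"),
    ("Camino 1.5", "Gecko"),
}

# Order is irrelevant: both describe the same relation instance.
same_relation = (r1 == r2)

# A set cannot hold the same tuple twice, so re-adding one changes nothing.
r3 = r1 | {("Firefox 3.0", "Gecko")}
size_unchanged = (len(r3) == len(r1))

print(same_relation, size_unchanged)
```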
Keys of Relations
• A key uniquely identifies a tuple
– Single-column keys
– Multi-column keys
– Can have more than one key
  • (more than one way to identify a tuple)
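As a rough illustration, the sketch below checks which attribute combinations are unique in the browser table from the "Basics of the Relational Model" slide. The helper name `unique_in_instance` is hypothetical. Note the limitation: a key is a constraint on every possible instance, so a check on one instance can rule a candidate out but cannot prove that it is a key.

```python
COLUMNS = ("Browser", "Engine", "Platform", "Engine Version")
ROWS = [
    ("Internet Explorer 6", "Trident", "Win 98+", "6"),
    ("Internet Explorer 7", "Trident", "Win XP SP2+", "7"),
    ("AOL browser (AOL desktop)", "Trident", "Win XP", "6"),
    ("Firefox 3.0", "Gecko", "Win 2k+ / OSX.3+", "1.9"),
    ("Camino 1.5", "Gecko", "OSX.3+", "1.8"),
    ("Netscape Browser 8", "Gecko", "Win 98SE+", "1.7"),
    ("Netscape Navigator 9", "Gecko", "Win 98+ / OSX.2+", "1.8"),
]

def unique_in_instance(attrs):
    """Return True if no two rows agree on all of the given attributes."""
    idx = [COLUMNS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

print(unique_in_instance(["Browser"]))                   # single-column candidate
print(unique_in_instance(["Engine"]))                    # many rows share an engine
print(unique_in_instance(["Engine", "Engine Version"]))  # Trident 6 appears twice
print(unique_in_instance(["Engine", "Platform"]))        # multi-column candidate
```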
Defining a Relational Schema in SQL
• Data Definition Language
– Equivalent of declaration statements in C or Java
– Look these up for the database of your choice:
  CREATE TABLE
  DROP TABLE
  ALTER TABLE
• Data Manipulation Language
– Equivalent of programming constructs
– Will be covered next week
CREATE TABLE
CREATE TABLE Engine (
    name CHAR(30),
    maker CHAR(45),
    remarks VARCHAR(255)
);
CREATE TABLE
CREATE TABLE Engine (
    name CHAR(30),
    maker CHAR(45),
    remarks VARCHAR(255) DEFAULT ''
);
CREATE TABLE
CREATE TABLE Engine (
    name CHAR(30) PRIMARY KEY,
    maker CHAR(45),
    remarks VARCHAR(255) DEFAULT ''
);
CREATE TABLE
CREATE TABLE Engine (
    name CHAR(30),
    maker CHAR(45),
    remarks VARCHAR(255) DEFAULT '',
    PRIMARY KEY (name)
);
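To see what the DEFAULT and PRIMARY KEY clauses do in practice, the Engine table can be created in an in-memory SQLite database through Python's sqlite3 module. This is a sketch under that assumption; other database systems may report the violation differently, and the inserted values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Engine (
        name    CHAR(30),
        maker   CHAR(45),
        remarks VARCHAR(255) DEFAULT '',
        PRIMARY KEY (name)
    )
""")

# remarks is omitted here, so the DEFAULT '' applies.
conn.execute("INSERT INTO Engine (name, maker) VALUES ('Gecko', 'Mozilla')")
remarks = conn.execute(
    "SELECT remarks FROM Engine WHERE name = 'Gecko'").fetchone()[0]

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO Engine (name, maker) VALUES ('Gecko', 'Other')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(repr(remarks), duplicate_rejected)
```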
The Algebra of Data Manipulation (p1)
• Set Operations on Relations
– Union, Intersection, Difference
– The tuples must have the same schema
• Subsetting: selection, projection
– $\sigma_C(R)$: Selection of R for condition C
  • yields a subset of rows
– $\pi_{A_1, A_2, \ldots, A_n}(R)$: Projection of R for attributes A1, A2, … , An
  • yields a subset of columns
• Quasi-multiplication operators – also known as JOIN
• Renaming of tables or their attributes
– $\rho_{a/b}(R)$: Rename column b in R to a
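The subsetting and renaming operators can be sketched in a few lines of Python. The representation (a relation as a set of frozensets of attribute/value pairs) and the function names are assumptions chosen for illustration; the sample rows reuse the small R table from the first SQL test.

```python
def relation(rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(r, condition):
    """Selection: keep the tuples for which condition(tuple_as_dict) is true."""
    return {t for t in r if condition(dict(t))}

def project(r, attrs):
    """Projection: keep only the named attributes; duplicate results collapse."""
    return {frozenset((k, v) for k, v in t if k in attrs) for t in r}

def rename(r, new, old):
    """Renaming: rename attribute `old` to `new` in every tuple."""
    return {frozenset((new if k == old else k, v) for k, v in t) for t in r}

R = relation([{"a": 5, "b": 20}, {"a": 10, "b": 30}, {"a": 20, "b": 40}])

small = select(R, lambda t: t["a"] < 10)   # subset of rows
only_b = project(R, {"b"})                 # subset of columns
renamed = rename(R, "c", "b")              # column b becomes c

print(sorted(dict(t)["b"] for t in only_b))
```

Because relations are plain Python sets here, union, intersection and difference are the built-in `|`, `&` and `-` operators, provided both operands share a schema.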
The Algebra of Data Manipulation (p2)
• See treatment in Wikipedia. Focus on
– natural-joins,
– θ-joins,
– semi-joins and
– outer-joins
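As a starting point before reading further, here is a minimal sketch of two of these joins over relations represented as lists of dictionaries. The representation, function names and sample data are assumptions for illustration; lists are used for brevity, so duplicates are not removed as they would be under strict set semantics.

```python
def natural_join(r, s):
    """Combine every pair of tuples that agree on all attribute names they share."""
    out = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[k] == u[k] for k in shared):
                out.append({**t, **u})
    return out

def semi_join(r, s):
    """Keep the tuples of r that have at least one join partner in s."""
    return [
        t for t in r
        if any(all(t[k] == u[k] for k in t.keys() & u.keys()) for u in s)
    ]

# Hypothetical sample data.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
S = [{"b": 10, "c": "x"}, {"b": 10, "c": "y"}]

joined = natural_join(R, S)  # one result per matching pair, with columns of both
kept = semi_join(R, S)       # only columns of R, each R tuple at most once

print(joined)
print(kept)
```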
The Algebra of Data Manipulation (p3)
• Relational Algebra allows you to combine the primitives
SELECT b
FROM R
WHERE a < 10 OR a >= 10;
– $\pi_b(\sigma_{a < 10 \text{ OR } a \ge 10}(R))$
SELECT a
FROM R
WHERE b IN (SELECT b FROM S);
– $\pi_a(\sigma_{b \text{ IN } \pi_b(S)}(R))$
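The nesting in the second expression can be read inside-out, one primitive at a time. The sketch below follows that order in plain Python; the contents of R and S are hypothetical.

```python
# Hypothetical sample data.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 10}]
S = [{"b": 10}, {"b": 10}, {"b": 30}]

# Innermost step: project S onto b.
b_of_S = {u["b"] for u in S}

# Middle step: select the tuples of R whose b appears in that projection.
selected = [t for t in R if t["b"] in b_of_S]

# Outermost step: project the selected tuples onto a.
a_values = {t["a"] for t in selected}

print(sorted(a_values))
```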
Next meeting
• January 24
• Chapter 6
• Sections 5.3 and 5.4
• Due on 1/24: a proposal for your presentation topic
– No more than 1 page, no fewer than 300 words.
– Include an initial bibliography
– Will not be graded independently; feedback will be provided
– Will feed into your presentation grade