CS 542 Introduction

CS 542 Database Management Systems J Singh January 13, 2011


Transcript of CS 542 Introduction

Page 1: CS 542 Introduction

CS 542 Database Management Systems

J Singh

January 13, 2011

Page 2: CS 542 Introduction

2© J Singh, 2011 2

About the Instructor

• Please call me J (just one letter, no period).
• Office hours: 5:00 – 6:00 on the day of the class, in this room.
• In my spare time:
  – President, Early Stage IT – a cloud-based consulting firm
  – Co-founder and CTO, ConnectScholar – a cloud-based web service
  – Co-chair of Software and Services SIG at TiE-Boston
• In the past:
  – Director of Software Engineering, Fidelity Investments
  – Software Architect, Computervision Corp
  – Prof. of EE @ WPI

Page 3: CS 542 Introduction


A SQL test

• Explain the difference between

    SELECT b
    FROM R
    WHERE a<10 OR a>=10;

and

    SELECT b
    FROM R;

R
a    b
5    20
10   30
20   40
…    …
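One way to explore this question is to run both queries against a small copy of R. The sketch below uses Python's built-in sqlite3 module. The fourth row, with NULL in column a, is an assumption added for illustration; the slide's table elides its remaining rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO R (a, b) VALUES (?, ?)",
    # The first three rows come from the slide; the NULL row is hypothetical.
    [(5, 20), (10, 30), (20, 40), (None, 50)],
)

# Query 1: the WHERE clause looks like it is always true.
with_where = [b for (b,) in conn.execute(
    "SELECT b FROM R WHERE a < 10 OR a >= 10 ORDER BY b")]

# Query 2: no WHERE clause at all.
without_where = [b for (b,) in conn.execute(
    "SELECT b FROM R ORDER BY b")]

print(with_where)
print(without_where)
```

For the NULL row, both comparisons evaluate to unknown rather than true, so the WHERE clause filters that row out. The two queries agree only when a is never NULL.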

Page 4: CS 542 Introduction


Another SQL test

• Explain the difference between

    SELECT a
    FROM R, S
    WHERE R.b = S.b;

• And

    SELECT a
    FROM R
    WHERE b IN (SELECT b FROM S);
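A small experiment can make the difference visible. The sketch below assumes R has columns (a, b) and S has a single column b; the rows are made up for illustration, with the value 10 deliberately appearing twice in S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 10), (2, 20)])
# Hypothetical data: b = 10 occurs twice in S, b = 20 never does.
conn.executemany("INSERT INTO S VALUES (?)", [(10,), (10,), (30,)])

# The join produces one output row per matching (R row, S row) pair.
join_rows = [a for (a,) in conn.execute(
    "SELECT a FROM R, S WHERE R.b = S.b")]

# The IN form tests each R row once, however many matches S holds.
in_rows = [a for (a,) in conn.execute(
    "SELECT a FROM R WHERE b IN (SELECT b FROM S)")]

print(join_rows)
print(in_rows)
```

With this data the join returns a = 1 twice, while the IN query returns it once. The two forms agree when b values in S are unique.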

Page 5: CS 542 Introduction


About CS 542

• CS 542 will
  – Build on database concepts you already know
  – Provide you tools for separating hype from reality
  – Help you develop skills in evaluating the tradeoffs involved in using and/or creating a database
• CS 542 may
  – Train you to read technical journals and apply them
• CS 542 will not
  – Cover the intricacies of SQL programming
  – Spend much effort in
    • Dynamic SQL
    • Stored Procedures
    • Interfaces with application programming languages
    • Connectors, e.g., JDBC, ODBC

Page 6: CS 542 Introduction


What’s so fun about databases?

• Traditional database courses talked about
  – Employee records
  – Bank records
• Now we talk about
  – Web search
  – Data mining
  – The collective intelligence of tweets
  – Scientific and medical databases

Page 7: CS 542 Introduction


How much data can a database hold?

• The biggest OLTP databases
  – 2001: 1.1 – 10.3 TB.
  – 2003: 9.1 – 29.2 TB.
  – 2005: 17.7 – 100.4 TB.
  – 2010: ~2.5 PB.

• The trend will continue

• Very large databases bring new unique challenges

• CS 542 is about the challenges of big databases

Page 8: CS 542 Introduction


DBMS Architecture

• Applications can be in any programming language

• DBMS presents a programmatic interface to the applications

  – Typically SQL
  – SQL is not a Turing-complete programming language
    • Every SQL statement is guaranteed to complete

Page 9: CS 542 Introduction


Databases are a strategic asset

• The value of a company is defined by its data, for example:

  – LinkedIn, Facebook, eBay, Amazon, Google
  – They who have the data have the power

• Some examples?

• The power of the data comes from
  – Its quality
  – Its consistency
  – Its ease of retrieval
  – What else?

• CS 542 is about creating & enhancing the power of data

Page 10: CS 542 Introduction


Course Plan

• Course Plan

• Course Policies and Grading Rubric

• Other materials:
  – Prof. Shivnath Babu, Duke Univ.
  – Prof. Ullman, Stanford Univ.
  – Prof. Ramakrishnan, U. of Wisconsin
  – Published papers, cited in the notes.

Page 11: CS 542 Introduction


Computing Resource Options

• On your laptop
  – Download MySQL from the web
    • Eclipse IDE if desired
  – Microsoft Access or SQL Server
• WPI Computer Science Resources
  – MySQL or Oracle
• Amazon AWS
  – $100 credit per student, send me an email to get authorization code
    • For use with RDS or MapReduce
• Google App Engine
  – BigTable is available under the name DataStore, free up to a limit

Page 12: CS 542 Introduction

CS 542 Database Management Systems

Relational Model of Data

Page 13: CS 542 Introduction


Overview of Data Models

Name             Structure      Operations  Constraints              Remarks
Relational       Set of Tuples  Queries     Type, Uniqueness, Value
Semi-structured  XML or JSON    Navigation  Type, Uniqueness, Value  Self-describing data

• A Data Model pertains to the structure, operations and any constraints on those structures

Page 14: CS 542 Introduction


Basics of the Relational Model

Browser                    Engine   Platform          Engine Version
Internet Explorer 6        Trident  Win 98+           6
Internet Explorer 7        Trident  Win XP SP2+       7
AOL browser (AOL desktop)  Trident  Win XP            6
Firefox 3.0                Gecko    Win 2k+ / OSX.3+  1.9
Camino 1.5                 Gecko    OSX.3+            1.8
Netscape Browser 8         Gecko    Win 98SE+         1.7
Netscape Navigator 9       Gecko    Win 98+ / OSX.2+  1.8

• A Table is referred to as a Relation

• Each Row is a tuple; each Column is an attribute
  – Each attribute is constrained to be a specific type
    • May also have value constraints
    • May also have uniqueness constraints
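The value and uniqueness constraints above can be sketched with Python's built-in sqlite3 module. The Browser table, its column names, and the rejected engine name 'Presto' are assumptions chosen for illustration. Note that SQLite is lenient about declared column types, so this sketch demonstrates only the value (CHECK) and uniqueness (UNIQUE) constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Browser (
        name     TEXT NOT NULL UNIQUE,                                   -- uniqueness constraint
        engine   TEXT NOT NULL CHECK (engine IN ('Trident', 'Gecko')),   -- value constraint
        platform TEXT
    )
""")
conn.execute(
    "INSERT INTO Browser VALUES ('Firefox 3.0', 'Gecko', 'Win 2k+ / OSX.3+')")

def try_insert(row):
    """Return None if the row is accepted, else the database's error message."""
    try:
        conn.execute("INSERT INTO Browser VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

bad_value = try_insert(("Camino 1.5", "Presto", "OSX.3+"))    # engine not in the allowed set
duplicate = try_insert(("Firefox 3.0", "Gecko", "Win 2k+"))   # name already present
accepted = try_insert(("Camino 1.5", "Gecko", "OSX.3+"))      # satisfies both constraints

print(bad_value)
print(duplicate)
print(accepted)
```

The database itself rejects the first two rows, so every application writing to the table gets the same guarantees without re-implementing the checks.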

Page 15: CS 542 Introduction


More on Relations

• A Relation is a set, not a list
  – Order of tuples is irrelevant

• It’s common to add/modify/delete rows

• Not so common to add/delete columns

• When you modify a relation, the old version is replaced by the new version

  – At any time, the relation only has “the current instance”
  – Almost impossible to get the state back to prior versions
    • Why is this so hard?

Page 16: CS 542 Introduction


Keys of Relations

• A key uniquely identifies a tuple
  – Single-column keys
  – Multi-column keys
  – Can have more than one key
    • (more than one way to identify a tuple)
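As a rough illustration of multi-column keys and of a relation with more than one key, the sketch below declares a hypothetical Release table (the table, its columns, and the codenames are all invented for this example) with a two-column primary key and a second, single-column key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Release (
        browser  TEXT,
        version  TEXT,
        codename TEXT UNIQUE,            -- a second, single-column key
        PRIMARY KEY (browser, version)   -- a multi-column key
    )
""")
conn.execute("INSERT INTO Release VALUES ('Firefox', '3.0', 'alpha')")
# Same browser with a different version: the (browser, version) pair is still new.
conn.execute("INSERT INTO Release VALUES ('Firefox', '3.5', 'beta')")

def is_rejected(row):
    """Return True if the database refuses the row because it violates a key."""
    try:
        conn.execute("INSERT INTO Release VALUES (?, ?, ?)", row)
        return False
    except sqlite3.IntegrityError:
        return True

repeat_pair = is_rejected(("Firefox", "3.0", "gamma"))     # repeats (browser, version)
repeat_codename = is_rejected(("Camino", "1.5", "alpha"))  # repeats codename
row_count = conn.execute("SELECT COUNT(*) FROM Release").fetchone()[0]

print(repeat_pair, repeat_codename, row_count)
```

Neither browser nor version is unique on its own here; only the pair is. The codename column gives a second, independent way to identify the same tuple.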

Page 17: CS 542 Introduction


Defining a Relational Schema in SQL

• Data Definition Language
  – Equivalent of declaration statements in C or Java
  – Look these up for the database of your choice:
      CREATE TABLE
      DROP TABLE
      ALTER TABLE
• Data Manipulation Language
  – Equivalent of programming constructs
  – Will be covered next week

Page 18: CS 542 Introduction


CREATE TABLE

CREATE TABLE Engine (
    name CHAR(30),
    maker CHAR(45),
    remarks VARCHAR(255)
);

Page 19: CS 542 Introduction


CREATE TABLE

CREATE TABLE Engine (
    name CHAR(30),
    maker CHAR(45),
    remarks VARCHAR(255) DEFAULT ''
);

Page 20: CS 542 Introduction


CREATE TABLE

CREATE TABLE Engine (
    name CHAR(30) PRIMARY KEY,
    maker CHAR(45),
    remarks VARCHAR(255) DEFAULT ''
);

Page 21: CS 542 Introduction


CREATE TABLE

CREATE TABLE Engine (
    name CHAR(30),
    maker CHAR(45),
    remarks VARCHAR(255) DEFAULT '',
    PRIMARY KEY (name)
);
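To see what the DEFAULT and PRIMARY KEY clauses do in practice, the final version of the statement can be run as-is. The sketch below assumes SQLite (through Python's built-in sqlite3 module) as the database; the inserted rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Engine (
        name CHAR(30),
        maker CHAR(45),
        remarks VARCHAR(255) DEFAULT '',
        PRIMARY KEY (name)
    )
""")

# No value is supplied for remarks, so the DEFAULT clause fills it in.
conn.execute("INSERT INTO Engine (name, maker) VALUES ('Gecko', 'Mozilla')")
default_remarks = conn.execute(
    "SELECT remarks FROM Engine WHERE name = 'Gecko'").fetchone()[0]

# A second row with the same name violates the primary key.
try:
    conn.execute("INSERT INTO Engine (name, maker) VALUES ('Gecko', 'Someone else')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(repr(default_remarks))
print(duplicate_rejected)
```

Other database systems accept the same statement, though the exact error raised for the duplicate key differs between products.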

Page 22: CS 542 Introduction


The Algebra of Data Manipulation (p1)

• Set Operations on Relations
  – Union, Intersection, Difference
  – The tuples must have the same schema
• Subsetting: selection, projection
  – $\sigma_C(R)$: Selection of R for condition C
    • yields a subset of rows
  – $\pi_{A_1, A_2, \ldots, A_n}(R)$: Projection of R for attributes A1, A2, …, An
    • yields a subset of columns
• Quasi-multiplication operators – also known as JOIN
• Renaming of tables or their attributes
  – $\rho_{a/b}(R)$: Rename column b in R to a
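The subsetting and renaming operators can be sketched in a few lines of Python, treating a relation as a list of rows and each row as a dict from attribute name to value. This representation, the function names, and the sample data are assumptions made for illustration; a real DBMS implements these operators very differently.

```python
def select(relation, condition):
    """Selection: keep the rows for which condition(row) is true."""
    return [row for row in relation if condition(row)]

def project(relation, attributes):
    """Projection: keep only the named columns, dropping duplicate rows."""
    result = []
    for row in relation:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:  # a relation is a set, so no duplicates
            result.append(reduced)
    return result

def rename(relation, new, old):
    """Renaming: refer to column `old` by the name `new` instead."""
    return [
        {(new if name == old else name): value for name, value in row.items()}
        for row in relation
    ]

# Hypothetical relation with a repeated b value, so projection has work to do.
R = [{"a": 5, "b": 20}, {"a": 10, "b": 30}, {"a": 20, "b": 30}]

rows_subset = select(R, lambda row: row["a"] >= 10)  # a subset of rows
cols_subset = project(R, ["b"])                      # a subset of columns
renamed = rename(R, "c", "b")                        # column b is now called c

print(rows_subset)
print(cols_subset)
print(renamed)
```

Because each function takes a relation and returns a relation, the results can be fed straight into another operator.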

Page 23: CS 542 Introduction


The Algebra of Data Manipulation (p2)

• See treatment in Wikipedia. Focus on
  – natural-joins,
  – $\theta$-joins,
  – semi-joins and
  – outer-joins
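As a companion to that reading, here is a minimal sketch of two of the joins, the natural join and the semi-join, with relations represented as lists of dicts. The representation, function names, and sample data are assumptions for illustration; the nested loop is the simplest possible strategy, not how a DBMS would evaluate a join.

```python
def natural_join(left, right):
    """Pair up rows that agree on every column name the two relations share."""
    joined = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row) & set(right_row)
            if all(left_row[name] == right_row[name] for name in shared):
                joined.append({**left_row, **right_row})
    return joined

def semi_join(left, right):
    """Keep each left row that has at least one natural-join partner in right."""
    return [left_row for left_row in left if natural_join([left_row], right)]

# Hypothetical relations sharing the column b.
R = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
S = [{"b": 10, "c": "x"}, {"b": 10, "c": "y"}, {"b": 30, "c": "z"}]

joined = natural_join(R, S)
kept = semi_join(R, S)

print(joined)
print(kept)
```

The natural join widens the matching rows of R with the columns of S, once per match. The semi-join only filters R, so its result keeps R's schema and lists each surviving row once.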

Page 24: CS 542 Introduction


The Algebra of Data Manipulation (p3)

• Relational Algebra allows you to combine the primitives

    SELECT b
    FROM R
    WHERE a<10 OR a >= 10;

  – $\pi_b(\sigma_{a < 10 \text{ OR } a \geq 10}(R))$

    SELECT a
    FROM R
    WHERE b IN (SELECT b FROM S);

  – $\pi_a(\sigma_{b \text{ IN } \pi_b(S)}(R))$
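The second expression can be evaluated step by step and compared with what a database returns for the SQL form. The sketch below assumes R has columns (a, b) and S has a single column b, uses invented rows, and relies on SQLite through Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical data: R holds (a, b) tuples, S holds (b,) tuples.
R = [(1, 10), (2, 20), (3, 10)]
S = [(10,), (30,)]

# Innermost first: the projection of S onto b.
s_b = {b for (b,) in S}
# Selection over R: keep rows whose b appears in that projection.
selected = [(a, b) for (a, b) in R if b in s_b]
# Outermost: the projection of the selected rows onto a.
algebra_result = sorted({a for (a, _) in selected})

# The same question asked in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", R)
conn.executemany("INSERT INTO S VALUES (?)", S)
sql_result = sorted(a for (a,) in conn.execute(
    "SELECT a FROM R WHERE b IN (SELECT b FROM S)"))

print(algebra_result)
print(sql_result)
```

The two results match here because every a value in R is distinct. In general the algebra works on sets, while SQL keeps duplicate rows unless DISTINCT is requested.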

Page 25: CS 542 Introduction


Next meeting

• January 24

• Chapter 6
• Sections 5.3 and 5.4

• Due on 1/24: a proposal for your presentation topic
  – No more than 1 page, no less than 300 words.
  – Include an initial bibliography
  – Will not be graded independently, feedback will be provided
  – Will feed into your presentation grade