745: Advanced Database Systems - UMass Amherst, avid.cs.umass.edu/courses/745/f2013/notes/Lec1-Overview.pdf, Yanlei Diao, University of Massachusetts Amherst, 9/3/13

745: Advanced Database Systems

Yanlei Diao University of Massachusetts Amherst

Outline

•  Overview of course topics

•  Course requirements

Database Management Systems

1.  Online Analytical Processing (OLAP) vs. Online Transaction Processing (OLTP)
    –  Different data characteristics and query workloads
    –  Different architectural design and query processing techniques

2.  New Data Models and Related Systems
    –  Temporal DB; Sequence DB; Continuous Queries; Stream Systems

3.  Big Data Systems and Cluster Computing
    –  Traditional parallel databases
    –  Big data systems:
        •  Cluster computing
        •  New storage systems
        •  Low latency analytics
        •  Cloud computing

Databases and DBMS’s

•  A database is a large, integrated collection of data
•  A database management system (DBMS) is a software system designed to store and manage a large amount of data
    –  Declarative interface to define data stored, add data, update data, and query data
    –  Efficient querying
    –  Concurrent users
    –  Reliable storage and crash recovery
    –  Access control…

Early DBMS’s

•  Early DBMS’s (1960’s) evolved from file systems
•  Typical workloads
    –  Many small data items, many queries and updates
    –  Banking
    –  Airline reservations…
•  1960s Navigational DBMS
    –  Tree-based or graph-based data model
    –  Manual navigation to find what you want
    –  No support for “search” (“search” ≠ “program”)

Relational DBMS

•  Relational model (E.F. Codd, 1970)
    –  Data independence: hides details of physical storage from users
    –  Declarative query language: say what you want, not how to compute it
    –  Mathematical foundation: what queries mean, possible implementations
•  Query optimization (1970’s till now)
    –  Earliest: System R at IBM, INGRES at UC Berkeley
    –  Queries can be efficiently executed despite data independence and declarative queries!
•  Online Transaction Processing (OLTP)
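
To make the contrast between navigational access and a declarative query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The account data, table name, and function names are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
import sqlite3

# Hypothetical toy data: accounts grouped under branches (a small tree).
branches = {
    "Amherst": [("alice", 120), ("bob", 40)],
    "Boston": [("carol", 300)],
}

def navigational_lookup(min_balance):
    """Navigational style: the program walks branch -> account itself."""
    found = []
    for branch, accounts in branches.items():
        for owner, balance in accounts:
            if balance >= min_balance:
                found.append((owner, branch))
    return sorted(found)

# Declarative style: load the same data into a relation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT, branch TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(o, b, bal) for b, accts in branches.items() for o, bal in accts],
)

def declarative_lookup(min_balance):
    """State what is wanted; the DBMS decides how to find it."""
    return conn.execute(
        "SELECT owner, branch FROM account WHERE balance >= ? ORDER BY owner",
        (min_balance,),
    ).fetchall()

# Adding an index changes the physical design only; the query text above
# stays the same, which is the point of data independence.
conn.execute("CREATE INDEX idx_balance ON account (balance)")
```

Both functions return the same answer, but only the navigational version hard-codes the traversal order; the SQL version leaves the access path to the query optimizer.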

Commercial DBMS’s

[Figure: lineage of commercial DBMS’s, showing System R, INGRES, Sybase, Informix, Postgres, MS SQL Server, IBM DB2, Oracle, and MySQL. Material in this slide based on Wikipedia.]

Online Analytical Processing (OLAP)

•  Data warehouses
    –  Large amounts of data over years, complex queries, designed for analysis and reporting
    –  Sales data analysis, e.g., Walmart, Target, …
    –  Fraud analysis, e.g., credit card use, insurance
    –  Call record analysis, e.g., AT&T
    –  Changes: Schema design, data cleaning and loading, indexing, aggregation, materialized views, data mining, new storage layout (column-based)…
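
As a rough illustration of why a column-based layout suits analytical aggregation, the sketch below stores the same hypothetical sales records in a row layout and in a column layout. The records, field order, and function names are assumptions made for the example, not taken from the slides.

```python
# Hypothetical sales records: (store, product, amount).
rows = [
    ("Walmart", "tv", 400),
    ("Target", "tv", 380),
    ("Walmart", "phone", 250),
]

def total_amount_row_store(records):
    """Row layout: every full record is visited to read one field."""
    return sum(record[2] for record in records)

# Column layout: each attribute is stored contiguously in its own list.
columns = {
    "store": [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def total_amount_column_store(cols):
    """Column layout: the aggregate reads only the 'amount' column."""
    return sum(cols["amount"])

def sales_by_store(cols):
    """A typical OLAP-style aggregation touching just two columns."""
    totals = {}
    for store, amount in zip(cols["store"], cols["amount"]):
        totals[store] = totals.get(store, 0) + amount
    return totals
```

In a real system the saving comes from reading fewer disk pages and compressing each column; this in-memory sketch only shows which attributes each query has to touch.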

More Recent Application

•  Social networking
    –  E.g., facebook.com, myspace.com, with 100’s of millions of users at a popular site
    –  Need to store user profiles, friend info, photos uploaded, messages exchanged, page views/clicks
    –  100 terabytes of new data/day, 100 petabytes in total
    –  Question: OLTP or OLAP databases?

Need for New Models & Systems

•  Need to support loose and rich structures
    –  Extensible Markup Language (XML)
•  Need to support time related queries
    –  Temporal databases
•  Need to support sequence related queries
    –  Sequence databases
•  Need to support long running queries on continuous data streams
    –  Continuous query (CQ) systems
    –  Stream systems

Data Integration & Sharing

[Figure: the Amherst College Student Database and the UMass Student Database, connected over the Internet, store student records under different schemas.]

Sid   Name       Contact
107   J. Black   413-555-1223
109   F. James   513-123-0102
111   A. Wang    617-011-3789
…     …          …

Sid   FirstName   LastName   Contact
12    Joe         Smith      [email protected]
34    Anna        Lee        [email protected]
171   Mike        Levine     [email protected]
…     …           …          …
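
One way to picture the integration problem is to map both schemas onto a single target schema. The sketch below is an assumption-laden illustration: the `Student` type, the source labels `db_a` and `db_b`, and the mapping functions are hypothetical, and the email value is a placeholder because the addresses are elided in the source.

```python
from typing import NamedTuple

class Student(NamedTuple):
    """Hypothetical unified schema for records from either source."""
    source: str
    sid: int
    first_name: str
    last_name: str
    contact: str

def from_single_name_schema(source, row):
    """Map (Sid, Name, Contact) by splitting Name at the first space."""
    sid, name, contact = row
    first, _, last = name.partition(" ")
    return Student(source, sid, first, last, contact)

def from_split_name_schema(source, row):
    """Map (Sid, FirstName, LastName, Contact) directly."""
    sid, first, last, contact = row
    return Student(source, sid, first, last, contact)

db_a_rows = [(107, "J. Black", "413-555-1223"), (109, "F. James", "513-123-0102")]
db_b_rows = [(12, "Joe", "Smith", "email-elided")]  # placeholder contact

integrated = (
    [from_single_name_schema("db_a", r) for r in db_a_rows]
    + [from_split_name_schema("db_b", r) for r in db_b_rows]
)
```

Keeping the source label alongside `sid` matters because the two databases assign Sids independently, so a Sid alone does not identify a student after integration. The Contact column also mixes phone numbers and email addresses, which a fuller mapping would have to distinguish.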

Integration of Text & Structured Data

[Figure: the WWW spans structured data (databases), semistructured data, and unstructured text (documents).]

Need for A Rich, Flexible Data Model

•  Need to support loose and rich structures
    –  Evolving, unknown, irregular structures
    –  Integration of structured, but heterogeneous data sources
    –  Textual data with tags and links
•  XML was originally proposed for online publishing, and is becoming the wire format for data exchange.
    –  http://www.w3.org/TR/REC-xml/
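
A small sketch of what "irregular structure" can look like in XML, parsed with Python's standard xml.etree.ElementTree module. The document, element names, and the example.org link are hypothetical; the two student elements deliberately differ in shape.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the two <student> elements have different shapes.
doc = """
<students>
  <student sid="107">
    <name>J. Black</name>
    <phone>413-555-1223</phone>
  </student>
  <student sid="12">
    <name><first>Joe</first><last>Smith</last></name>
    <homepage href="http://example.org/joe"/>
  </student>
</students>
"""

root = ET.fromstring(doc)

def display_name(student):
    """Handle both a flat <name> and a nested <first>/<last> name."""
    name = student.find("name")
    first, last = name.find("first"), name.find("last")
    if first is not None and last is not None:
        return f"{first.text} {last.text}"
    return name.text.strip()

names = [display_name(s) for s in root.findall("student")]
# findtext returns None when an element is absent, so a missing phone is
# tolerated rather than forcing every record into one fixed schema.
phones = [s.findtext("phone") for s in root.findall("student")]
```

The same query code copes with elements that are present in one record and absent in another, which is awkward to express with a fixed relational schema.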

Data Stream Management

Two driving forces:

•  A collection of applications where data streams naturally exist but a DBMS doesn’t help much

•  Advances in sensor technologies

Financial Applications

•  Financial services
    –  Data feeds: stock tickers, foreign exchange transactions…
    –  Data rate: 10’s or 100’s of thousands of messages per second
    –  Applications:
        •  routing trade requests,
        •  automating trade strategies,
        •  market trend analysis…
    –  Stream systems: e.g.,
        •  http://www.streambase.com
        •  http://www.aleri.com
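
To give a feel for a continuous query over a stock-ticker stream, here is a minimal sketch of a count-based sliding-window average per symbol. The class name, window size, and tick values are hypothetical, and this is not how StreamBase or Aleri implement their engines; it only shows the idea of a query that stays registered while data flows past it.

```python
from collections import deque

class SlidingAverage:
    """Continuous query: average price over the last `size` ticks per symbol."""

    def __init__(self, size):
        self.size = size
        self.windows = {}  # symbol -> deque of the most recent prices

    def on_tick(self, symbol, price):
        # deque(maxlen=...) drops the oldest price automatically.
        window = self.windows.setdefault(symbol, deque(maxlen=self.size))
        window.append(price)
        return sum(window) / len(window)

# Hypothetical tick stream: (symbol, price).
ticks = [("IBM", 10.0), ("IBM", 20.0), ("ORCL", 5.0), ("IBM", 30.0), ("IBM", 40.0)]

query = SlidingAverage(size=3)
results = [(symbol, query.on_tick(symbol, price)) for symbol, price in ticks]
```

Unlike a one-shot database query, the result is re-emitted on every arriving tick, and only the current window is kept in memory rather than the whole history.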

Network and System Monitoring

•  Network monitoring
    –  Packet traces, network performance measurements…
    –  Data rate: gigabits per second
    –  Applications:
        •  traffic analysis, performance monitoring, router configuration, intrusion detection…
    –  Stream systems: e.g.,
        •  Gigascope at AT&T
•  System/Application monitoring…
    –  Data: system log, measurements
    –  Stream systems: e.g.,
        •  Ganglia http://ganglia.info/

Wireless Sensor Networks

•  Wireless sensor networks
    –  Sensor devices: temperature, light, pressure, acceleration, humidity, magnetic field, …
    –  A set of sensor devices auto-configure themselves into a communication network
    –  Applications:
        •  environment monitoring
        •  habitat monitoring
        •  structural monitoring
        •  vehicle tracking…

Data Warehouse of A Social Network

[Figure: three web servers in front of a data processing backend.]

•  Click streams: 1 billion rows/day, 5-10 TB/day
•  Data loading: high volume + transformation
•  Analysis queries: ad targeting, fraud detection, resource provisioning…
•  User profiles: 100 million users, each with profile, pics, postings,…
•  Quick lookups and updates: update your own profile, read friends’ profiles, write msgs,…

Fun Num. about Facebook (a bit old)

•  Stores >20 billion photos, and serves 1 million img/sec
•  Facebook software: PHP + MySQL cluster + Memcached
•  One of the largest MySQL clusters
•  500 million active users
•  9.5% of Internet traffic
•  >30,000 servers
•  >4.5 billion msg/day
•  >15 TB click log/day

Source: http://www.datacenterknowledge.com/archives/category/facebook/

Outline

•  Overview of course topics

•  Course requirements

Prerequisites

•  A graduate-level database course, equivalent to 645

•  Or consent of the instructor
    –  An undergraduate database course
    –  Prior research in the database area

Course Web Site

http://avid.cs.umass.edu/courses/745/f2013/

Or

Yanlei’s web page → Teaching → 745 Fall 2013

Textbook

Readings in Database Systems, 4th edition, edited by Joseph Hellerstein and Michael Stonebraker

Selected papers and lecture notes will be posted on the course website.

A Textbook on DB Basics

Database Management Systems, 3rd Edition, Ramakrishnan and Gehrke

Good for background knowledge on database systems

Grading

•  Paper reviews: 25%
•  In-class presentation: 15%
•  Midterm: 20%
•  Course project: 40%

How to Write Reviews

1. Paper reviews: 25%

•  25 selected papers
•  Posted on the readings page
•  Review submission: by email to instructor
    –  Due at 10 am on the day of class
    –  Email title “745 PAPER REVIEW”
    –  Please include the text, no attachments

Paper Review (1)

•  Problem Statement
    –  Is the problem important?
        •  Motivation often comes from applications
    –  Is the problem technically challenging?
        •  What is the state of the art?
        •  Is the work solvable by most people who think for a week?

Contributions

•  Can be a solution to a new problem, or a new solution to a known problem
•  Please outline the main approach and techniques
•  Technical contributions can include:
    –  New concepts
    –  New algorithms
    –  Thorough analysis (with a model, sometimes)
    –  Implementation & optimization
    –  Applying techniques from a different area to the problem
    –  Strong evaluation: data sets, benchmarks, other systems
    –  Strong, interesting results…

Limitations

•  Does it solve the right problem?
    –  E.g., a big data problem does not consider scalability
•  Is the assumption made realistic?
•  Is the solution correct or complete?
•  What is the novelty compared to prior work?
•  Is the evaluation strong?
    –  Workloads: Clear? Representative?
    –  Methodology: Scientific? Data collection? Query selection? Measurements chosen?
    –  Results: Meaningful? Significant?
    –  Sufficient explanation…

2. In Class Presentation: 15%

•  A group of students leads a lecture
    –  Present 1-2 papers on a given topic
    –  Papers will be provided on the schedule and readings pages
    –  Lead the class discussion
    –  Answer open-ended questions from the instructor

3. Midterm Exam: 20%

•  Midterm exam
    –  Take home exam
    –  Includes both course related material and open-ended questions
    –  No discussion with others
    –  In the middle of November
    –  No final exam!

4. Project: 40%

•  Groups of 2 or work individually
•  Research-oriented problem
    –  A new problem, or a new approach to an existing problem
    –  Scientific value
•  Milestones & deliverables: see the projects page
•  Submission: via email
    –  Proposal, status report: before class on due date
    –  Final report: 5 pm on the due date
•  In-class presentation