745: Advanced Database Systems
Yanlei Diao University of Massachusetts Amherst
9/3/13 Yanlei Diao, University of Massachusetts Amherst
Outline
• Overview of course topics
• Course requirements
Database Management Systems
1. Online Analytical Processing (OLAP) vs. Online Transaction Processing (OLTP)
  – Different data characteristics and query workloads
  – Different architectural design and query processing techniques
2. New Data Models and Related Systems
  – Temporal DB; Sequence DB; Continuous Queries; Stream Systems
3. Big Data Systems and Cluster Computing
  – Traditional parallel databases
  – Big data systems:
    • Cluster computing
    • New storage systems
    • Low latency analytics
    • Cloud computing
Databases and DBMS’s
• A database is a large, integrated collection of data
• A database management system (DBMS) is a software system designed to store and manage a large amount of data
  – Declarative interface to define data stored, add data, update data, and query data
  – Efficient querying
  – Concurrent users
  – Reliable storage and crash recovery
  – Access control…
Early DBMS’s
• Early DBMS’s (1960’s) evolved from file systems
• Typical workloads
  – Many small data items, many queries and updates
  – Banking
  – Airline reservations…
• 1960s Navigational DBMS
  – Tree-based or graph-based data model
  – Manual navigation to find what you want
  – No support for “search” (“search” ≠ “program”)
Relational DBMS
• Relational model (E.F. Codd, 1970)
  – Data independence: hides details of physical storage from users
  – Declarative query language: say what you want, not how to compute it
  – Mathematical foundation: what queries mean, possible implementations
• Query optimization (1970’s till now)
  – Earliest: System R at IBM, INGRES at UC Berkeley
  – Queries can be efficiently executed despite data independence and declarative queries!
• Online Transaction Processing (OLTP)
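The contrast between a declarative query and manual navigation can be sketched with Python's built-in sqlite3 module. This is only an illustration: the accounts table, its columns, and the sample rows are hypothetical and do not come from the course material.

```python
import sqlite3


def rich_owners_declarative(conn, min_balance):
    """Declarative: state WHAT is wanted; the DBMS decides how to compute it."""
    rows = conn.execute(
        "SELECT owner FROM accounts WHERE balance >= ? ORDER BY owner",
        (min_balance,),
    )
    return [owner for (owner,) in rows]


def rich_owners_manual(records, min_balance):
    """Navigational style: the program spells out HOW to scan and filter."""
    result = []
    for owner, balance in records:
        if balance >= min_balance:
            result.append(owner)
    return sorted(result)


# Hypothetical sample data, for illustration only.
SAMPLE = [("alice", 120.0), ("bob", 40.0), ("carol", 300.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", SAMPLE)

# Adding an index changes the physical design but not the query text,
# which is the point of data independence.
conn.execute("CREATE INDEX idx_balance ON accounts (balance)")

print(rich_owners_declarative(conn, 100.0))
print(rich_owners_manual(SAMPLE, 100.0))
```

Both functions return the same owners, but only the hand-written loop has to change if the storage layout changes.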
Commercial DBMS’s
[Figure: relational DBMS's shown on the slide: System R, INGRES, Sybase, Informix, Postgres, MS SQL Server, IBM DB2, Oracle, MySQL]
Material in this slide based on Wikipedia
Online Analytical Processing (OLAP)
• Data warehouses
  – Large amounts of data over years, complex queries, designed for analysis and reporting
  – Sales data analysis, e.g., Walmart, Target, …
  – Fraud analysis, e.g., credit card use, insurance
  – Call record analysis, e.g., AT&T
  – Changes: Schema design, data cleaning and loading, indexing, aggregation, materialized views, data mining, new storage layout (column-based)…
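As a rough illustration of why a column-based layout suits analytical aggregation, the sketch below computes the same per-store total over a row layout and a column layout. The sales facts and function names are hypothetical, and plain Python lists stand in for real storage formats.

```python
from collections import defaultdict

# Hypothetical sales facts: (store, year, amount).
ROWS = [
    ("amherst", 2012, 10.0),
    ("amherst", 2013, 15.0),
    ("boston", 2012, 20.0),
    ("boston", 2013, 5.0),
]


def to_columns(rows):
    """Column-based layout: one list per attribute instead of one tuple per row."""
    store, year, amount = zip(*rows)
    return {"store": list(store), "year": list(year), "amount": list(amount)}


def total_by_store_rows(rows):
    """Row layout: each row is read whole, including the unused year."""
    totals = defaultdict(float)
    for store, _year, amount in rows:
        totals[store] += amount
    return dict(totals)


def total_by_store_columns(cols):
    """Column layout: only the two columns the query needs are read."""
    totals = defaultdict(float)
    for store, amount in zip(cols["store"], cols["amount"]):
        totals[store] += amount
    return dict(totals)


COLS = to_columns(ROWS)
print(total_by_store_rows(ROWS))
print(total_by_store_columns(COLS))
```

The answers are identical; the difference is how much data each version has to touch, which matters when a warehouse table has many columns and the query uses only a few.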
More Recent Application
• Social networking
  – E.g., facebook.com, myspace.com, with hundreds of millions of users at a popular site
  – Need to store user profiles, friend info, photos uploaded, messages exchanged, page views/clicks
  – 100 terabytes of new data/day, 100 petabytes in total
  – Question: OLTP or OLAP databases?
Need for New Models & Systems
• Need to support loose and rich structures
  – Extensible Markup Language (XML)
• Need to support time related queries
  – Temporal databases
• Need to support sequence related queries
  – Sequence databases
• Need to support long running queries on continuous data streams
  – Continuous query (CQ) systems
  – Stream systems
Data Integration & Sharing
[Figure: the Amherst College Student Database and the UMass Student Database, shared over the Internet]

Sid   Name       Contact
107   J. Black   413-555-1223
109   F. James   513-123-0102
111   A. Wang    617-011-3789
…     …          …

Sid   FirstName   LastName   Contact
12    Joe         Smith      [email protected]
34    Anna        Lee        [email protected]
171   Mike        Levine     [email protected]
…     …           …          …
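One way to picture the integration problem is to map both student schemas into a single target schema. The sketch below is an assumption-laden illustration: the target field names, the labels "A" and "B" for the two sources, and the example email address are hypothetical and not taken from the slide.

```python
def from_source_a(record):
    """Source A stores a single Name field and a phone number as Contact."""
    return {
        "source": "A",
        "sid": record["Sid"],
        "name": record["Name"],
        "phone": record["Contact"],
        "email": None,
    }


def from_source_b(record):
    """Source B splits the name and stores an email address as Contact."""
    return {
        "source": "B",
        "sid": record["Sid"],
        "name": f'{record["FirstName"]} {record["LastName"]}',
        "phone": None,
        "email": record["Contact"],
    }


SOURCE_A = [{"Sid": 107, "Name": "J. Black", "Contact": "413-555-1223"}]
SOURCE_B = [
    # Placeholder address for illustration; the slide's addresses are not legible.
    {"Sid": 12, "FirstName": "Joe", "LastName": "Smith", "Contact": "joe@example.edu"}
]


def integrate(source_a, source_b):
    """Merge both sources; Sids are only unique per source, so key on (source, sid)."""
    merged = [from_source_a(r) for r in source_a]
    merged += [from_source_b(r) for r in source_b]
    return {(r["source"], r["sid"]): r for r in merged}


students = integrate(SOURCE_A, SOURCE_B)
print(students[("B", 12)]["name"])
```

Even this tiny case shows the typical mismatches: the same attribute name (Contact) carries different kinds of values, and a single field in one source corresponds to two fields in the other.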
Integration of Text & Structured Data
[Figure labels: WWW; Structured data - Databases; Unstructured Text - Documents; Semistructured Data]
Need for A Rich, Flexible Data Model
• Need to support loose and rich structures – Evolving, unknown, irregular structures
– Integration of structured, but heterogeneous data sources
– Textual data with tags and links
• XML was originally proposed for online publishing and is becoming the wire format for data exchange.
– http://www.w3.org/TR/REC-xml/
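A small sketch of what "irregular structure" means in practice, using Python's standard xml.etree.ElementTree module. The document below is made up for illustration: two student records with different shapes, one with a flat name and a phone number, the other with a nested name and a tagged note.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the two <student> elements do not share one schema.
DOC = """
<students>
  <student sid="107">
    <name>J. Black</name>
    <phone>413-555-1223</phone>
  </student>
  <student sid="12">
    <name><first>Joe</first><last>Smith</last></name>
    <note>Prefers <em>email</em> contact.</note>
  </student>
</students>
"""


def display_name(student):
    """Handle both shapes of <name>: flat text or nested <first>/<last>."""
    name = student.find("name")
    first, last = name.find("first"), name.find("last")
    if first is not None and last is not None:
        return f"{first.text} {last.text}"
    return name.text


root = ET.fromstring(DOC)
names = {s.get("sid"): display_name(s) for s in root.iter("student")}
# findtext returns None when an optional element is absent.
phones = {s.get("sid"): s.findtext("phone") for s in root.iter("student")}
print(names)
print(phones)
```

A relational table would force both records into the same columns; here each record carries its own structure and the query code has to tolerate the differences.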
Data Stream Management
Two driving forces:
• A collection of applications where data streams naturally exist but a DBMS doesn’t help much
• Advances in sensor technologies
Financial Applications
• Financial services
  – Data feeds: stock tickers, foreign exchange transactions…
  – Data rate: tens or hundreds of thousands of messages per second
  – Applications:
    • routing trade requests
    • automating trade strategies
    • market trend analysis…
  – Stream systems: e.g.,
    • http://www.streambase.com
    • http://www.aleri.com
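To give a feel for a long-running query over a ticker feed, here is a minimal sketch of a per-symbol moving average over a count-based sliding window. It is not how any of the systems named above work internally; the function name, symbols, and prices are hypothetical.

```python
from collections import deque


def moving_average(ticks, window):
    """Yield (symbol, average of the last `window` prices seen for that symbol).

    The query never "finishes": it emits one result per arriving tick.
    """
    recent = {}
    for symbol, price in ticks:
        buf = recent.setdefault(symbol, deque(maxlen=window))
        buf.append(price)  # the oldest price falls out once the window is full
        yield symbol, sum(buf) / len(buf)


# Hypothetical feed; a real one would be an unbounded iterator.
TICKS = [("IBM", 10.0), ("IBM", 12.0), ("ORCL", 30.0), ("IBM", 14.0), ("IBM", 16.0)]

for symbol, avg in moving_average(iter(TICKS), window=3):
    print(symbol, avg)
```

Unlike a one-shot database query, the state kept per symbol is bounded by the window size, which is what makes it feasible to keep up with high message rates.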
Network and System Monitoring
• Network monitoring
  – Packet traces, network performance measurements…
  – Data rate: gigabits per second
  – Applications:
    • traffic analysis, performance monitoring, router configuration, intrusion detection…
  – Stream systems: e.g.,
    • Gigascope at AT&T
• System/Application monitoring…
  – Data: system log, measurements
  – Stream systems: e.g.,
    • Ganglia http://ganglia.info/
Wireless Sensor Networks
• Wireless sensor networks
  – Sensor devices: temperature, light, pressure, acceleration, humidity, magnetic field, …
  – A set of sensor devices auto-configure themselves into a communication network
  – Applications:
    • environment monitoring
    • habitat monitoring
    • structural monitoring
    • vehicle tracking…
Data Warehouse of A Social Network
[Figure: several web servers connected to a data processing backend]
• Click streams: 1 billion rows/day, 5-10 TB/day
• Data loading: high volume + transformation
• Analysis queries: ad targeting, fraud detection, resource provisioning…
• User profiles: 100 million users, each with profile, pics, postings,…
• Quick lookups and updates: update your own profile, read friends’ profiles, write msgs,…
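A back-of-the-envelope calculation helps put the click-stream figures in perspective. The sketch below assumes decimal terabytes (10^12 bytes) and a load spread evenly over 24 hours; neither assumption is stated on the slide.

```python
TB = 10**12  # decimal terabyte; an assumption, the slide does not specify
ROWS_PER_DAY = 10**9  # 1 billion click-stream rows per day
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_row(tb_per_day):
    """Average size of one click-stream row at the given daily volume."""
    return tb_per_day * TB / ROWS_PER_DAY


def load_rate_mb_per_s(tb_per_day):
    """Sustained loading rate in MB/s if the load is spread evenly over a day."""
    return tb_per_day * TB / SECONDS_PER_DAY / 10**6


for tb in (5, 10):
    print(tb, "TB/day:", bytes_per_row(tb), "bytes/row,",
          round(load_rate_mb_per_s(tb), 1), "MB/s")
```

Under these assumptions each row averages roughly 5 to 10 KB, and the backend has to absorb on the order of 60 to 120 MB of new data every second before any transformation or analysis takes place.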
Fun Numbers about Facebook (a bit old)
http://www.datacenterknowledge.com/archives/category/facebook/
• Stores >20 billion photos, and serves 1 million images/sec
• Facebook software: PHP + MySQL cluster + Memcached
• One of the largest MySQL clusters
• 500 million active users
• 9.5% of Internet traffic
• >30,000 servers
• >4.5 billion msg/day
• >15 TB click log/day
Outline
• Overview of course topics
• Course requirements
Prerequisites
• A graduate-level database course, equivalent to 645
• Or consent of the instructor
  – An undergraduate database course
  – Prior research in the database area
Course Web Site
http://avid.cs.umass.edu/courses/745/f2013/
Or
Yanlei’s web page → Teaching → 745 Fall 2013
Textbook
Readings in Database Systems, 4th edition, edited by Joseph Hellerstein and Michael Stonebraker
Selected papers and lecture notes will be posted on the course website.
A Textbook on DB Basics
Database Management Systems, 3rd Edition, Ramakrishnan and Gehrke
Good for background knowledge on database systems
Grading
• Paper reviews: 25%
• In-class presentation: 15%
• Midterm: 20%
• Course project: 40%
How to Write Reviews
1. Paper reviews: 25%
• 25 selected papers
• Posted on the readings page
• Review submission: by email to instructor
  – Due at 10 am on the day of class
  – Email title “745 PAPER REVIEW”
  – Please include the text, no attachments
Paper Review (1)
• Problem Statement
  – Is the problem important?
    • Motivation often comes from applications
  – Is the problem technically challenging?
    • What is the state of the art?
    • Is the problem solvable by most people who think for a week?
Contributions
• Can be a solution to a new problem, or a new solution to a known problem
• Please outline the main approach and techniques
• Technical contributions can include:
  – New concepts
  – New algorithms
  – Thorough analysis (with a model, sometimes)
  – Implementation & optimization
  – Applying techniques from a different area to the problem
  – Strong evaluation: data sets, benchmarks, other systems
  – Strong, interesting results…
Limitations
• Does it solve the right problem?
  – E.g., a solution to a big data problem that does not consider scalability
• Is the assumption made realistic?
• Is the solution correct or complete?
• What is the novelty compared to prior work?
• Is the evaluation strong?
  – Workloads: Clear? Representative?
  – Methodology: Scientific? Data collection? Query selection? Measurements chosen?
  – Results: Meaningful? Significant?
  – Sufficient explanation…
2. In Class Presentation: 15%
• A group of students leads a lecture
  – Present 1-2 papers on a given topic
  – Papers will be provided on the schedule and readings pages
  – Lead the class discussion
  – Answer open-ended questions from the instructor
3. Midterm Exam: 20%
• Midterm exam
  – Take-home exam
  – Includes both course-related material and open-ended questions
  – No discussion with others
  – In the middle of November
  – No final exam!
4. Project: 40%
• Groups of 2 or work individually
• Research-oriented problem
  – A new problem, or a new approach to an existing problem
  – Scientific value
• Milestones & deliverables: see the projects page
• Submission: via email
  – Proposal, status report: before class on due date
  – Final report: 5 pm on the due date
• In-class presentation