Introduction to Databases - AURA Astronomy Intro. to Astronomical RDBs Introduction Definitions •...
Introduction to Databases
Second La Serena School for Data Science: Applied Tools for Astronomy
August 2014
Mauro San Martín
Universidad de La Serena
Contents
• Introduction: databases and scientific data management
• Part I. Relational Databases: defining, storing, updating, and querying data
• Part II. Non-Relational Databases: a short introduction to alternatives to relational databases
LSSDS2013: Intro. to Astronomical RDBs Introduction
Definitions
• Database: an organized and self-describing collection of data, with an intended meaning, maintained for a purpose.
• Database Management System (DBMS): a software system designed and implemented
  - to define, maintain, and share a database, and
  - to separate application logic from low-level data I/O.
Scientific Data Management
Scientific data management (collect, organize, store, transform, and query) can be supported by relational databases (SQL) and by NoSQL databases.
Part I. Relational Databases
Theory
RDBs at a glance
• E. F. Codd 1970 "A Relational Model of Data for Large Shared Data Banks"
• Main characteristics
  - One simple data structure: the relation
  - Solid mathematical foundations
  - Several comprehensive implementations available: PostgreSQL, MySQL, Oracle, SQL Server, etc.
• Industry standard since the 1980s
Modeling data
Capturing the world (or the universe)
• The relational data model
  - Data structure: relations/tables, i.e., collections of tuples
  - Operations (query + update): relational algebra and calculus / SQL
  - Integrity constraints: data type, not null, referential integrity
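The three parts of the model can be seen together in a small sketch. It assumes SQLite through Python's standard `sqlite3` module (the slides do not name a DBMS), and the `survey`/`source` tables and their rows are hypothetical. The tables are the data structure, `INSERT`/`SELECT` are the operations, and the DBMS itself rejects rows that break a `NOT NULL` or referential-integrity constraint.

```python
import sqlite3

# Assumption: SQLite stands in for "an RDBMS"; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Data structure: each table is a relation, i.e. a collection of tuples.
conn.execute("""
    CREATE TABLE survey (
        survey_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE source (
        source_id INTEGER PRIMARY KEY,
        survey_id INTEGER NOT NULL REFERENCES survey(survey_id),
        ra        REAL NOT NULL,
        dec       REAL NOT NULL
    )""")

# Operations: an update (INSERT) followed later by a query (SELECT).
conn.execute("INSERT INTO survey VALUES (1, 'demo survey')")
conn.execute("INSERT INTO source VALUES (10, 1, 150.1, 2.2)")

def violates(sql):
    """Return True if the DBMS rejects the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Integrity constraints: a missing coordinate and a dangling survey reference.
not_null_rejected = violates("INSERT INTO source VALUES (11, 1, NULL, 2.0)")
fk_rejected = violates("INSERT INTO source VALUES (12, 99, 10.0, 2.0)")
rows = conn.execute("SELECT source_id, ra, dec FROM source").fetchall()
print(not_null_rejected, fk_rejected, rows)  # True True [(10, 150.1, 2.2)]
```

Only the valid tuple is stored; the constraints are checked by the DBMS, not by application code.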
Schemas
• Schema: definition of relations (columns, types, and keys) and integrity constraints.
• A good schema avoids data duplication, null values, and update anomalies.
• Normalization: a procedure for building good schemas.
  Identify keys and decompose problematic relations (move their columns into separate relations).
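A minimal normalization sketch, assuming SQLite via Python's `sqlite3` and a hypothetical observation log. In the flat relation the instrument name is repeated for every observation, so renaming an instrument would require updating many rows. Since `instrument_id` is a key that determines `instrument_name`, those columns are moved to their own relation, and a join recovers the original rows.

```python
import sqlite3

# Hypothetical un-normalized relation: (obs_id, target, instrument_id, instrument_name).
# The instrument name is duplicated on every row that uses that instrument.
flat = [
    (1, "M31", 7, "imager-A"),
    (2, "M33", 7, "imager-A"),
    (3, "M31", 9, "spectro-B"),
]

conn = sqlite3.connect(":memory:")
# Decomposition: instrument_id -> instrument_name goes into its own relation.
conn.execute("CREATE TABLE instrument (instrument_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE observation (
    obs_id        INTEGER PRIMARY KEY,
    target        TEXT NOT NULL,
    instrument_id INTEGER NOT NULL REFERENCES instrument(instrument_id))""")

# The set removes duplicate (id, name) pairs, so each instrument is stored once.
conn.executemany("INSERT INTO instrument VALUES (?, ?)",
                 sorted({(iid, name) for _, _, iid, name in flat}))
conn.executemany("INSERT INTO observation VALUES (?, ?, ?)",
                 [(oid, target, iid) for oid, target, iid, _ in flat])

n_instruments = conn.execute("SELECT count(*) FROM instrument").fetchone()[0]
# Joining on the key rebuilds exactly the original rows (a lossless decomposition).
rebuilt = conn.execute("""
    SELECT o.obs_id, o.target, i.instrument_id, i.name
    FROM observation AS o JOIN instrument AS i ON o.instrument_id = i.instrument_id
    ORDER BY o.obs_id""").fetchall()
print(n_instruments, rebuilt == flat)  # 2 True
```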
Querying the DB
Map data from the DB to the information needed.
Query evaluation: database → data collection → intermediate result → sort / group → aggregate → result
Costs: reading data, storing temporary data, sorting data
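The pipeline above can be sketched by hand in plain Python for a query shaped like `SELECT class, count(*) FROM t WHERE g < 12 GROUP BY class`. The records are hypothetical, and this is only an illustration of the stages and where their costs arise; a real DBMS may, for example, group by hashing instead of sorting.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical stored records: (oid, class, g magnitude).
table = [
    (1, "GALAXY", 11.2), (2, "STAR", 10.5), (3, "GALAXY", 13.0),
    (4, "QSO", 11.9), (5, "STAR", 11.7), (6, "GALAXY", 10.1),
]

# 1. Data collection: read every record, keep those that pass the filter.
#    Cost: reading data (a full scan touches all n records).
intermediate = [row for row in table if row[2] < 12]

# 2. Sort: order the intermediate result by the grouping key.
#    Cost: storing temporary data and sorting it, O(m log m) for m rows.
intermediate.sort(key=itemgetter(1))

# 3. Group + aggregate: consecutive rows with equal keys form one group,
#    and each group is reduced to a single count.
result = [(cls, len(list(rows))) for cls, rows in groupby(intermediate, key=itemgetter(1))]
print(result)  # [('GALAXY', 2), ('QSO', 1), ('STAR', 2)]
```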
Updates
• Update: add and modify data.
  - Updates may render the database inconsistent.
• Transactions and ACID
  - Atomicity
  - Consistency
  - Isolation
  - Durability
Transaction: Operation 1, Operation 2, Operation 3, ..., Operation n, executed as a single unit.
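Atomicity can be illustrated with a small sketch, assuming SQLite via Python's `sqlite3` and a hypothetical table of telescope-time allocations. A transfer of hours is two updates; if the second one breaks a `CHECK` constraint, the whole transaction is rolled back, so the database never shows the half-finished state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical allocation table; the CHECK constraint forbids negative hours.
conn.execute("""CREATE TABLE allocation (
    program TEXT PRIMARY KEY,
    hours   INTEGER NOT NULL CHECK (hours >= 0))""")
conn.executemany("INSERT INTO allocation VALUES (?, ?)", [("P1", 10), ("P2", 5)])
conn.commit()

def transfer(src, dst, hours):
    """Move hours between programs as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE allocation SET hours = hours + ? WHERE program = ?", (hours, dst))
            conn.execute("UPDATE allocation SET hours = hours - ? WHERE program = ?", (hours, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT program, hours FROM allocation ORDER BY program"))

ok = transfer("P1", "P2", 4)       # succeeds: P1 = 6, P2 = 9
failed = transfer("P2", "P1", 50)  # second UPDATE violates CHECK; the first is undone too
print(ok, failed, balances())      # True False {'P1': 6, 'P2': 9}
```

The connection's context manager is what groups the two statements into one unit here; other drivers expose the same idea through explicit `BEGIN`/`COMMIT`/`ROLLBACK`.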
Part I. Relational Databases
Practice
What is an RDBMS?
A database management system (software) for the relational model.
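Main components: an SQL query passes through the query compiler and the query optimizer to the runtime query evaluator, which works on the stored database (instance) and the schema (metadata); concurrency control, DB backup, and DB recovery complete the system.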
RDBMS Objects
• Tables: represent data as a collection of records
  - Record: a set of attributes (columns)
• Views: named queries
• Indices: improve search and access time
• Functions: extend the query language

| ObjectID | A   | B |
|----------|-----|---|
| ID1      | 3.4 | a |
| ID2      | 4.0 | b |
| ID2      | 2.1 | c |
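The four kinds of objects can be created side by side in a short sketch, assuming SQLite via Python's `sqlite3` and the small ObjectID/A/B table from this slide. The names `obj`, `big_a`, `obj_by_id`, and `pylog10` are hypothetical; SQLite's `create_function` is used to register a Python function that queries can then call.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Table: a collection of records; each record is a set of attributes (columns).
conn.execute("CREATE TABLE obj (ObjectID TEXT, A REAL, B TEXT)")
conn.executemany("INSERT INTO obj VALUES (?, ?, ?)",
                 [("ID1", 3.4, "a"), ("ID2", 4.0, "b"), ("ID2", 2.1, "c")])

# View: a named query that can then be used like a table.
conn.execute("CREATE VIEW big_a AS SELECT ObjectID, A FROM obj WHERE A > 3")

# Index: an auxiliary structure that speeds up searches on ObjectID.
conn.execute("CREATE INDEX obj_by_id ON obj (ObjectID)")

# Function: extend the query language with a user-defined (here, Python) function.
conn.create_function("pylog10", 1, math.log10)

rows = conn.execute("SELECT ObjectID, A FROM big_a ORDER BY A").fetchall()
logs = conn.execute("SELECT round(pylog10(A), 3) FROM obj WHERE ObjectID = 'ID1'").fetchall()
print(rows, logs)  # [('ID1', 3.4), ('ID2', 4.0)] [(0.531,)]
```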
SQL
• Structured Query Language
• It actually includes
  - Data Definition Language (schema):
    create table myTable(number int, letter char);
  - Data Manipulation Language (updates):
    insert into myTable values(1, 'a');
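Both statements can be run as-is from a program. This sketch assumes SQLite through Python's `sqlite3` (any SQL DBMS would accept the same two statements); the extra `update` is an illustrative addition showing another DML statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption: SQLite)

# DDL: define the schema.
conn.execute("create table myTable(number int, letter char)")
# DML: change the data; string literals use straight single quotes.
conn.execute("insert into myTable values(1, 'a')")
conn.execute("update myTable set letter = 'b' where number = 1")

print(conn.execute("select * from myTable").fetchall())  # [(1, 'b')]
```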
Querying the DB
• Basic query structure
  - SELECT: definition of the output table (set of columns)
  - FROM: identification of the source tables
  - WHERE: optional condition (filter or join)
• Additional blocks
  - GROUP BY: criteria that define the groups
  - HAVING: optional condition on aggregate values
  - ORDER BY: sorting criteria for the result
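A query that uses every block listed above, run with SQLite via Python's `sqlite3` on a hypothetical toy catalogue (not survey data). Each clause is annotated with its role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy catalogue: (oid, class, g magnitude).
conn.execute("CREATE TABLE obs (oid INTEGER, class TEXT, g REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", [
    (1, "GALAXY", 11.2), (2, "STAR", 10.5), (3, "GALAXY", 13.0),
    (4, "QSO", 11.9), (5, "STAR", 11.7), (6, "GALAXY", 10.1),
])

query = """
    SELECT class, count(*) AS n, min(g) AS brightest  -- output columns
    FROM obs                                           -- source table
    WHERE g < 12                                       -- row filter
    GROUP BY class                                     -- grouping criterion
    HAVING count(*) >= 2                               -- condition on aggregates
    ORDER BY brightest                                 -- sort the result
"""
result = conn.execute(query).fetchall()
print(result)  # [('GALAXY', 2, 10.1), ('STAR', 2, 10.5)]
```

`WHERE` discards rows before grouping, while `HAVING` discards whole groups afterwards, which is why the QSO group (one row) disappears only at the `HAVING` step.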
Query Evaluation
Note that query results are themselves relations, so queries can be composed.
Query: database → data collection (SELECT, FROM, WHERE) → intermediate result → sort / group → aggregate → result
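Because a query result is a relation, it can appear in the `FROM` clause of another query. This sketch assumes SQLite via Python's `sqlite3` and the same hypothetical toy catalogue; the alias `per_class` is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy catalogue: (oid, class, g magnitude).
conn.execute("CREATE TABLE obs (oid INTEGER, class TEXT, g REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", [
    (1, "GALAXY", 11.2), (2, "STAR", 10.5), (3, "GALAXY", 13.0),
    (4, "QSO", 11.9), (5, "STAR", 11.7), (6, "GALAXY", 10.1),
])

# The inner query produces a relation (class, n); the outer query filters and
# sorts it exactly as if it were a stored table.
query = """
    SELECT class, n
    FROM (SELECT class, count(*) AS n FROM obs GROUP BY class) AS per_class
    WHERE n > 1
    ORDER BY n DESC, class
"""
result = conn.execute(query).fetchall()
print(result)  # [('GALAXY', 3), ('STAR', 2)]
```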
Executing Queries
• Parametric
• SQL
  - System console
  - Applications and web interfaces
• From code
  - Parametric from the programmer's perspective
  - Languages + libraries
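Running a parametric query from code might look like the following sketch, which assumes Python's `sqlite3` as the "language + library" and a hypothetical `specObj` table with made-up rows. The SQL text is fixed and the value arrives through the `?` placeholder, so the driver passes it separately instead of pasting it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specObj (oid INTEGER PRIMARY KEY, class TEXT, subclass TEXT)")
# Hypothetical rows for illustration only.
conn.executemany("INSERT INTO specObj VALUES (?, ?, ?)", [
    (1, "GALAXY", "STARBURST"), (2, "STAR", "G2"), (3, "GALAXY", "AGN"),
])

def objects_of_class(cls):
    """Fixed query, parameter supplied at call time through a '?' placeholder.

    Binding the value (instead of formatting it into the SQL text) avoids
    quoting mistakes and SQL injection.
    """
    cur = conn.execute("SELECT oid FROM specObj WHERE class = ? ORDER BY oid", (cls,))
    return [oid for (oid,) in cur]

print(objects_of_class("GALAXY"))  # [1, 3]
print(objects_of_class("QSO"))     # []
```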
Query Examples
• Example database
  - Source: Sloan Digital Sky Survey (SDSS) Data Release 10 (DR10)
• Schema: tables
  - photoObj(oid, ra, dec, g, r)
  - specObj(oid, class, subclass)
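The slide lists only column names. One possible DDL for this simplified schema is sketched below with SQLite via Python's `sqlite3`; the column types, the primary keys, and the foreign key from `specObj.oid` to `photoObj.oid` are assumptions made for illustration and are not taken from the DR10 schema itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed types and keys for the simplified two-table schema on the slide.
conn.executescript("""
    CREATE TABLE photoObj (
        oid INTEGER PRIMARY KEY,  -- object identifier
        ra  REAL NOT NULL,        -- right ascension (degrees)
        dec REAL NOT NULL,        -- declination (degrees)
        g   REAL,                 -- g-band magnitude
        r   REAL                  -- r-band magnitude
    );
    CREATE TABLE specObj (
        oid      INTEGER PRIMARY KEY REFERENCES photoObj(oid),  -- same object id
        class    TEXT,            -- e.g. 'GALAXY', 'STAR', 'QSO'
        subclass TEXT
    );
""")

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
columns = {t: [c[1] for c in conn.execute(f"PRAGMA table_info({t})")]
           for t in ("photoObj", "specObj")}
print(columns)  # {'photoObj': ['oid', 'ra', 'dec', 'g', 'r'], 'specObj': ['oid', 'class', 'subclass']}
```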
Basic Query
```sql
SELECT * FROM photoObj;

SELECT oid, class
FROM specObj
WHERE class = 'GALAXY';
```
Complex Conditions
```sql
SELECT oid, ra, dec
FROM photoObj
WHERE
    g < 12
    and r < 12
    and g - r < 0;
```
Joins
```sql
SELECT p.oid, p.ra, p.dec, s.subclass
FROM photoObj as p, specObj as s
WHERE
    p.oid = s.oid
    and p.g < 12 and p.r < 12
    and p.g - p.r < 0
    and s.class = 'GALAXY';
```
Groups and Aggregates
```sql
SELECT s.subclass, count(*)
FROM photoObj as p, specObj as s
WHERE
    p.oid = s.oid and p.g < 12
    and p.r < 12 and p.g - p.r < 0
    and s.class = 'GALAXY'
GROUP BY s.subclass;
```
Sub-Queries
```sql
SELECT oid, ra, dec
FROM photoObj
WHERE g < 12 and r < 12
    and g - r < 0
    and oid in (SELECT oid
                FROM specObj
                WHERE class = 'GALAXY');
```
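On toy data, the join and the sub-query forms select the same objects. This sketch assumes SQLite via Python's `sqlite3`; the rows are hypothetical and not taken from SDSS DR10. The equivalence relies on `oid` being a key of `specObj` (at most one spectrum per photometric object); with duplicate matches the join could repeat rows while the `IN` form would not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photoObj (oid INTEGER PRIMARY KEY, ra REAL, dec REAL, g REAL, r REAL);
    CREATE TABLE specObj  (oid INTEGER PRIMARY KEY, class TEXT, subclass TEXT);
""")
# Hypothetical toy rows, NOT taken from SDSS DR10.
conn.executemany("INSERT INTO photoObj VALUES (?, ?, ?, ?, ?)", [
    (1, 10.0, -5.0, 11.0, 11.5),  # bright, g - r < 0, galaxy -> selected
    (2, 20.0,  1.0, 11.8, 11.2),  # g - r > 0                 -> rejected
    (3, 30.0,  2.0, 11.1, 11.9),  # bright, g - r < 0, star   -> rejected
    (4, 40.0,  3.0, 13.0, 13.4),  # too faint                 -> rejected
])
conn.executemany("INSERT INTO specObj VALUES (?, ?, ?)", [
    (1, "GALAXY", "STARBURST"), (2, "GALAXY", "AGN"),
    (3, "STAR", "G2"), (4, "GALAXY", None),
])

join_sql = """
    SELECT p.oid, p.ra, p.dec
    FROM photoObj AS p, specObj AS s
    WHERE p.oid = s.oid
      AND p.g < 12 AND p.r < 12 AND p.g - p.r < 0
      AND s.class = 'GALAXY'
"""
subquery_sql = """
    SELECT oid, ra, dec
    FROM photoObj
    WHERE g < 12 AND r < 12 AND g - r < 0
      AND oid IN (SELECT oid FROM specObj WHERE class = 'GALAXY')
"""
via_join = sorted(conn.execute(join_sql).fetchall())
via_subquery = sorted(conn.execute(subquery_sql).fetchall())
print(via_join, via_subquery == via_join)  # [(1, 10.0, -5.0)] True
```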
Query Complexity
• Data volume
  - I/O-based cost model: number of reads from and writes to persistent storage
• Query complexity (table size: $n$, number of tables: $k$)
  - Search: $O(1)$ to $O(\log n)$ to $O(n)$
  - Joins: $O(n)$ to $O(n^k)$
  - Sort, group, and aggregates: $O(n \log n)$, where $n$ is the size of the intermediate result
  - Subqueries: hard for the optimizer
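The gap between $O(n)$ and $O(\log n)$ search can be made concrete with a small sketch. It counts in-memory key comparisons as a stand-in for the I/O-based cost model (an assumption: a real DBMS counts page reads, and a B-tree index lookup touches a handful of pages rather than doing a plain binary search).

```python
def linear_search(values, target):
    """Full scan: O(n) comparisons, like a query with no usable index."""
    comparisons = 0
    for v in values:
        comparisons += 1
        if v == target:
            return True, comparisons
    return False, comparisons

def binary_search(sorted_values, target):
    """Search over sorted keys: O(log n) comparisons, like an index lookup."""
    lo, hi, comparisons = 0, len(sorted_values), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == target
    return found, comparisons

n = 1_000_000
keys = list(range(n))               # already sorted, as an index would keep them
print(linear_search(keys, n - 1))   # (True, 1000000)
print(binary_search(keys, n - 1))   # (True, 20)
```

For a million keys, the scan needs a million comparisons in the worst case, while the sorted search needs about $\lceil \log_2 10^6 \rceil = 20$.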
Part II. NoSQL
Not only SQL
RDBMS Comfort Zone
An RDBMS performs better when …
• Data is complete, homogeneous and well defined.
• All data is together (in the same computer).
• Answers must be complete and fully consistent.
• Vertical scaling is possible.
NoSQL Comfort Zone
• Data is massive, heterogeneous, and distributed.
• Partial and eventually consistent answers are acceptable.
• Data must be always available.
• Horizontal scaling is preferred (or vertical scaling is not practical).
NoSQL Databases
• Aggregate
  - Key: identifies each aggregate
  - Data: a heterogeneous collection of attributes as name/value pairs
• Main types
  - Key-value stores: fast retrieval of data with unknown structure
  - Document databases: (mostly) tree-structured data
  - Column-family stores: complex structured data
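The aggregate idea can be sketched with an in-memory Python dict standing in for a key-value store. This is a hypothetical illustration: the keys, attributes, and the JSON serialization are assumptions, and a real store would also persist and distribute the aggregates across nodes. Each aggregate is stored and fetched whole under one key, and aggregates need not share a schema; the nested `mags` value shows the tree-shaped data typical of document databases.

```python
import json

# Hypothetical key-value store: key -> serialized aggregate (opaque to the store).
store: dict[str, str] = {}

def put(key, aggregate):
    """Store a whole aggregate under one key; the store does not inspect its structure."""
    store[key] = json.dumps(aggregate)

def get(key):
    """Retrieve the whole aggregate with a single key lookup."""
    return json.loads(store[key])

# Two aggregates with heterogeneous name/value pairs and no fixed schema.
put("obj:1", {"ra": 10.0, "dec": -5.0, "class": "GALAXY", "mags": {"g": 11.0, "r": 11.5}})
put("obj:2", {"ra": 20.0, "dec": 1.0, "variability": [12.1, 12.4, 12.0]})

galaxy = get("obj:1")
print(galaxy["mags"]["g"], sorted(get("obj:2")))  # 11.0 ['dec', 'ra', 'variability']
```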
CAP Theorem
• Consistency
• Availability
• Partition Tolerance
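Choose two! A distributed system cannot guarantee all three at once; when a network partition occurs, it must give up either consistency or availability.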
Query Evaluation
• Map-Reduce: a parallel (cluster) data-processing pattern.
• Two steps
  - Map: the input is an aggregate and the output is a set of key-value pairs. Each map is independent of the others (across aggregates throughout the cluster).
  - Reduce: map results are collected, sorted, and combined.
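The two steps can be sketched in a single Python process. This is an illustration only, assuming hypothetical per-night aggregates; a framework running on a cluster would execute the map calls on different nodes and move the pairs between them during the shuffle. The example counts observed objects per class.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical aggregates: one per observing night, listing (name, class) pairs.
nights = [
    {"night": "n1", "objects": [("M31", "GALAXY"), ("Vega", "STAR")]},
    {"night": "n2", "objects": [("M33", "GALAXY"), ("M31", "GALAXY")]},
    {"night": "n3", "objects": [("3C 273", "QSO")]},
]

def map_fn(aggregate):
    """Map: one aggregate in, a list of (key, value) pairs out; no shared state."""
    return [(cls, 1) for _, cls in aggregate["objects"]]

def reduce_fn(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def map_reduce(aggregates):
    # Map phase: every call is independent, so the calls could run in parallel.
    pairs = chain.from_iterable(map_fn(a) for a in aggregates)
    # Shuffle: collect the map outputs and group the values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: one call per key, with keys in sorted order.
    return [reduce_fn(k, groups[k]) for k in sorted(groups)]

print(map_reduce(nights))  # [('GALAXY', 3), ('QSO', 1), ('STAR', 1)]
```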
Map-Reduce: Map
Source: Fowler and Sadalage, NoSQL Distilled.
Map-Reduce: Reduce
Source: Fowler and Sadalage, NoSQL Distilled.
Summary
• RDBMS
  - Tables: collections of records with keys
  - SQL queries: basic, joins, groups and aggregates, subqueries
• NoSQL
  - Aggregates: collections of key-value pairs with one identifier
  - Map-Reduce