Car projects @ MBDS overview W3XM Project ___________ ETSI, 22/10/2004
Copyright Big Data Pr Serge Miranda, MBDS, Univ de Nice Sophia Antipolis (UCA)
www.mbds-fr.org
MBDS course :
From data bases to big data
(7 lectures)
Professor Serge Miranda
Dept of Computer Science
University of Nice Sophia Antipolis (member of UCA)
Director of MBDS Master degree
(www.mbds-fr.org)
DATA paradigms and Codd’s relational data model
(lecture 2)
Professor Serge Miranda
Dept of Computer Science
University of Nice Sophia Antipolis
Director of MBDS Master degree
(www.mbds-fr.org)
Contents
DATA PARADIGMS
➢Data Schema and data models
➢Data paradigms
➢TOP DOWN approach with fixed predefined schema
➢VALUE paradigm and TIPS (ACID) properties
➢POINTER-VALUE paradigm and RICE properties (Date’s manifesto)
➢PREDICATE-VALUE paradigm (RDF) with SPARQL
➢Bottom up approach
➢KEY-VALUE paradigm and WHAT properties with N.O.SQL and NewSQL
Introduction to Codd’s relational data model (VALUE paradigm)
➢ Underlying mathematical concepts : SETS and PREDICATES
➢ Value paradigm
➢ CODD’s relational data model
➢ Data Structures
➢ Integrity rules
➢ Relational algebra
➢ Codd’s theorem
➢ Codd’s model lessons
➢ Normalization theory (3NF) in a nutshell
➢ Short Seminars :
➢ Data base storage and access
➢ Codd & Date relational data-schema design
(BIG) DATA !
BIG DATA ? A scientific couple
1. DATA MANAGEMENT
SQL3, OQL, BigQuery SQL, N.O.SQL, CQL, HQL, SPARQL, N1QL, UnQL, CoQL, …, NewSQL
N.O.SQL with major Open Source reference :
HADOOP/MAP REDUCE & SPARK
2. DATA ANALYTICS
AI and mathematics with OPEN SOURCE reference :
R language (> 4000 packages), PYTHON, TENSORFLOW, CAFFE, etc.
Some visions of the future of big-data management
➢CLOUD COMPUTING
➢ INFRASTRUCTURE as a SERVICE (IaaS)
➢ PLATFORM as a SERVICE (PaaS)
➢ DATA as a Service (DaaS) from Oracle ; ANALYTICS as a SERVICE (AaaS) from Google, IBM, etc.
➢« CAMS » (IBM 2014)
➢CLOUD for servers
➢DaaS/AaaS : « (DATA) ANALYTICS as a service »
➢Mobility (smartphone applications)
➢Social Networks (for data integration)
➢« SMAC » stacks (Citigroup, Vikram Pandit) : « No business model in the future could succeed without the DATA »
Social, Mobile, Applications, Cloud
[Figure] ORACLE vision : CI-MBDS ☺ (Cloud, Big Data, IoT, Social, Mobile)
Parallelism and data management
1 TERABYTE (10**12 bytes) a second ?
➢Hard disk (HD) : 100 megabytes/sec
➢1 terabyte (10**12 bytes) a sec ➔ 10,000 HDs read in parallel
➢3 options :
➢DATA COMPRESSION
➢SCALE UP : (SMP, CLUSTER, MPP)
➢SCALE OUT
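The order of magnitude behind this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a sustained rate of 100 megabytes/sec per disk and perfect parallelism; the function name `disks_needed` is hypothetical.

```python
import math

def disks_needed(target_bytes_per_sec: float,
                 disk_bytes_per_sec: float = 100e6) -> int:
    """Number of disks to read in parallel to reach the target rate.

    Assumes (optimistically) that throughput adds up linearly.
    """
    return math.ceil(target_bytes_per_sec / disk_bytes_per_sec)

# 10**12 bytes/sec divided by 10**8 bytes/sec per disk
for_one_terabyte_per_sec = disks_needed(10**12)
# 10**15 bytes/sec divided by 10**8 bytes/sec per disk
for_one_petabyte_per_sec = disks_needed(10**15)
```

Under these assumptions a terabyte a second already needs ten thousand disks, which is why compression, scale up and scale out are the only realistic options.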
Programming paradigms
Imperative Declarative
Procedural Non-procedural FUNCTIONAL
Object(OQL)
SQL/NO SQL /NEWSQL
HOW ? WHAT ?
Programming paradigms [Manning2013]
(structured) DATA MANAGEMENT concepts
Real world, SCHEMA and data model
Structured approach of real-world abstractions named SCHEMAs
by applying a DATA MODEL
[Figure: Real World ➔ DATA MODEL ➔ Schema]
DATA MODEL?
DATA MODEL
DATA STRUCTURES
Data-structure operators (algebra)
Integrity rules
DATA BASES and DBMS (Data base management system)
➢DBMS?
➢DEFINITION
➢MANIPULATION
➢CONTROL
of data bases
[Figure: SCHEMA, DATA BASE and DBMS]
Exercise (Internet search)
1. Look at the ANSI/SPARC standard for the CONCEPTUAL SCHEMA of data bases and clarify the concepts of :
➢conceptual schema,
➢logical schema,
➢physical schema,
➢sub-schema
2. Define what a META MODEL is.
Underlying mathematical concepts for big data management
SET
➢A SET * is a well-defined collection of distinct elements with two basic properties :
➢Elements are UNIQUE
➢There is NO ORDERING
➢SET DEFINITION :
➢Intensional definition (giving properties of the elements)
➢Extensional definition (listing elements)
➢SUBSETS and POWER SETS (sets of all subsets)
➢SET operators
➢INTERSECTION, UNION, DIFFERENCE,
➢CARTESIAN PRODUCT
*G. CANTOR :
« A set is a gathering together into a whole of definite, distinct objects of our perception [Anschauung] or of our thought — which are
called elements of the set ».
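The set notions listed on this slide map directly onto Python's built-in `set` type. The sketch below is only an illustration; the sample values and the helper `power_set` are hypothetical.

```python
from itertools import combinations, product

# Extensional definition: list the elements (the duplicate collapses).
cities = {"Nice", "Paris", "Nice"}
# Intensional definition: give the property the elements satisfy.
evens = {n for n in range(10) if n % 2 == 0}

def power_set(s):
    """Set of all subsets of s (frozenset, because sets of sets need hashable members)."""
    items = list(s)
    return {frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)}

a, b = {1, 2, 3}, {3, 4}
union, intersection, difference = a | b, a & b, a - b
cartesian = set(product(a, b))   # all pairs (x, y) with x in a and y in b
```

Uniqueness and absence of ordering show up as `len(cities) == 2` and `{1, 2} == {2, 1}`.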
PROPOSITION and PREDICATE
PROPOSITION ?
Any sentence with boolean value : TRUE or FALSE
EX : « Socrates is mortal ! » « John loves Mary » etc
PREDICATE ?
« Any sentence containing VARIABLES which is transformed into a PROPOSITION when we replace variables by VALUES »
EX : LOVE (x, y) <or « x LOVES y »> : predicate with two variables
1st-Order predicate logic with well-formed formulas
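The difference between a predicate and a proposition can be sketched as a function versus a function call. The facts and names below (`love_facts`, `love`) are hypothetical; the quantifiers of 1st-order logic are approximated with `any` and `all` over a finite set of people.

```python
# Known facts: the pairs (x, y) for which « x LOVES y » is TRUE.
love_facts = {("John", "Mary"), ("Mary", "John"), ("Paul", "Mary")}
people = {"John", "Mary", "Paul"}

def love(x: str, y: str) -> bool:
    """Predicate with two variables: LOVE(x, y)."""
    return (x, y) in love_facts

# Replacing the variables by values yields propositions (TRUE or FALSE).
p1 = love("John", "Mary")
p2 = love("Mary", "Paul")

# Quantified formulas over the finite domain `people`.
someone_loves_mary = any(love(x, "Mary") for x in people)                      # ∃x LOVE(x, Mary)
everyone_loves_someone = all(any(love(x, y) for y in people) for x in people)  # ∀x ∃y LOVE(x, y)
```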
GRAPHS
➢A GRAPH is a SET of nodes (vertices) and a SET of edges, which could be directed (digraph) or undirected, labelled or not
➢Example : Category (in Maths) : a labelled directed graph
➢A MULTI–GRAPH with multiple edges
➢A CATEGORY is a directed multigraph
➢HYPER GRAPH : an edge can join a set of vertices
➢OPERATORS
➢Unary operations : dual graph, edge contraction, …
➢Binary operations : disjoint union, cartesian product, etc.
SCALARS and VECTORS
➢SCALAR
➢In MATHS : element of a VECTOR
➢In Computing : a VARIABLE (with an address to store a value)
➢SCALAR ➔ VECTOR (rank 1) ➔ MATRIX (rank 2) ➔ TENSOR (rank 3 and above)
➢TENSOR : multidimensional array
➢MATRIX : group of vectors
➢VECTORIZATION : converting DATA into vectors
➢GRADIENT : generalization of the derivative to a function f of several variables (vector of the n partial derivatives of f)
MATRICES and linear algebra (matrix algebra)
➢M by N Matrix : rectangular array of numbers or symbols arranged in M rows and N columns
➢Row vector (single-row matrix, M = 1) or column vector (single-column matrix, N = 1)
➢N by N SQUARE MATRIX (vector transformation, …)
➢VECTOR : particular case of a Matrix with N = 1
➢Major matrix OPERATIONS :
➢Matrix addition and Matrix multiplication
➢Matrix transposition (rows to columns and vice versa)
➢LINEAR TRANSFORMATION/mapping (linear algebra)
➢Other OPERATORS :
➢Tensor product
➢Hadamard product : element-wise product of 2 vectors (or matrices) of the same shape
➢Others : dot product, etc.
➢Linear algebra is the bedrock of Machine Learning (and Deep Learning) :
➢Ax = b in basic machine learning, with the matrix A and the parameter vector x giving the output column vector b
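The operations named on this slide fit in a few lines of plain Python (no external library). This is a sketch on hypothetical sample values; matrices are lists of rows and vectors are lists of numbers.

```python
def transpose(m):
    """Rows become columns and vice versa."""
    return [list(col) for col in zip(*m)]

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(ui * vi for ui, vi in zip(u, v))

def hadamard(u, v):
    """Element-wise product of two vectors of the same length."""
    return [ui * vi for ui, vi in zip(u, v)]

def matvec(a, x):
    """Matrix-vector product A x: one dot product per row of A."""
    return [dot(row, x) for row in a]

A = [[1, 2],
     [3, 4],
     [5, 6]]          # 3 by 2 matrix
x = [1, -1]           # parameter vector
b = matvec(A, x)      # output vector of A x = b
```

Here `b` has one component per row of `A`, which is the shape relation behind A x = b.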
« Value » vs « Pointer (variable/scalar) » in computer science ?
VALUE vs VARIABLES vs POINTERS
➢VALUE ?
data which cannot be modified
➢VARIABLE (SCALAR) ?
Every variable is a triptych : a NAME, a VALUE (updatable) and a (memory) ADDRESS
VARIABLE := ( NAME , VALUE, ADDRESS)
➢POINTER ?
Type of variable which contains the ADDRESS of another variable as VALUE
Variables and attached operators
Variables have addresses (values do not)
« ADDRESS type » with two basic operators :
* Referencing : v ➔ ADDR
in C : ptr = &v; (with char v; and char *ptr;)
in PL/1 : DECLARE N INTEGER;
DECLARE P POINTER;
P = ADDR (N);
* Dereferencing : ADDR ➔ v
in C : *ptr ; in PL/1 : P -> V
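Python hides addresses from the programmer, but the standard `ctypes` module exposes C-style variables and pointers, which is enough to sketch the two operators. The variable names are hypothetical.

```python
import ctypes

v = ctypes.c_int(42)             # a variable: a value stored at some address
ptr = ctypes.pointer(v)          # referencing, like ptr = &v in C
address = ctypes.addressof(v)    # the raw address as an integer

read_back = ptr.contents.value   # dereferencing, like *ptr in C
ptr[0] = 7                       # writing through the pointer, like *ptr = 7
updated = v.value                # v itself has changed: same address, new value
```

The pointer holds the ADDRESS of `v` as its own value, so an update made through the pointer is visible through the variable.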
The three data structures of CODD’s relational data model <VALUE paradigm>
➢VALUE (DATA)
➢DOMAINS (SET construct) : Domain = SET of values
➢RELATIONS (« tables » in SQL) (TUPLE construct, i.e. cartesian product)
A relation in Codd’s model is
➢either a predicate (with N variables) or
➢a SET (subset of the cartesian product of N domains)
Relation in Codd’s data model at a given time : Table of VALUES
{Nice, Paris, NEW YORK, DUBLIN}
Domain : CITY
Pilot PIL# PILNAME ADDR
100 Serge Nice
101 John New York
102 Joel DUBLIN
Line= « TUPLE »
COLUMN = « ATTRIBUTE »
Codd’s Relational algebra (SQL foundation)
➢« RELATION » : « Set » or « predicate »
➢2 dimensional arrays with 4 specific algebraic operators : Select, Project, Join, Division
➢COLUMN implementation for decision support
➢LINE (tuple) implementation for transaction support
➢Closure + Completeness + Orthogonality of the relational ALGEBRA
➢➔ QUERY INTERFACE without any programming to retrieve DATA
➢➔ NON-PROCEDURAL PROGRAMMING !
Plethora of BIG-DATA management Systems (Aslett, 2013)
➢https://blogs.the451group.com/information_management/2011/04/
Plethora of Big Data MANAGEMENT SYSTEMS : data paradigms
[Figure: Big Data management systems grouped by data paradigm]
➢TIPS : Codd’s relational data model (SET theory), VALUE paradigm (SQL2)
➢RICE : OBJECT data model (GRAPH THEORY), POINTER-VALUE paradigm (SQL3) and OBJECT-VALUE paradigm (ODMG)
➢PREDICATE-VALUE (RDF) paradigm (Semantic web, GRAPH THEORY) : SPARQL (OWL)
➢WHAT : KEY-VALUE paradigm (Map Reduce ; MATRICES and linear algebra) : N.O. SQL / NEW SQL
➢Other labels on the figure : NEW SQL, BigSQL, Predefined Schema, Meta data
TOP DOWN approach for structured and semi-structured DATA BASES
➢TOP DOWN approach with predefined SCHEMA and metadata
➢STRUCTURED DATA standards
➢SQL2 and VALUE paradigm
➢SQL3 and POINTER-VALUE paradigm
➢ODMG and OBJECT-VALUE paradigm
➢SEMI-STRUCTURED DATA standards
➢SPARQL and PREDICATE-VALUE (RDF) paradigm
A relational-schema example with three predicates
PILOT (PIL#, PILNAME, ADDR)
PIL# : Pilot ID, PILNAME : name, ADDR : address (city)
PLANE (P#, PNAME, CAP, LOC)
CAP : Capacity, LOC : location (city)
FLIGHT (FL#, PIL#, P#, DC, AC, DT, AT)
DC : Departure City, AC : Arrival City,
DT : Departure Time, AT : Arrival Time
Note : Primary keys are underlined on the slide (PIL#, P#, FL#)
Top Down approach with SQL/ODMG
[Figure: Real World ➔ DATA MODEL ➔ SCHEMA]
TOP DOWN approach for DATA STRUCTURATION
➔ pre-definition of a fixed schema
DB contribution to Computer Science : TIPS properties
T : Transactions (with ACID properties)
I : Non-procedural Interface (SQL)
P : Persistency (virtual paged memory)
S : Structuration (Schema)
SQL (with transactional focus)
➢TIPS properties with « T » referring to « TRANSACTIONS »
➢« ACID » properties for Transactions :
➢Atomicity
➢Consistency
➢Isolation
➢Durability
➢JIM GRAY’s theorem (on well-formed transactions with two phases)
➢OLTP (On line Transaction Processing)
➢Data Warehouse/data Mining (& OLCP : On Line Complex Processing)
Data Models, DB Manifesto and standards
➢Relational data model by Ted CODD (8/19/1969)
➔ theoretical support for SQL2 standard
➢Three Manifestos for « Future DB » by F. Bancilhon (1st), M. Stonebraker (2nd), and Chris DATE (3rd)
➢ 3rd Manifesto by C. DATE + 2nd Manifesto by M. Stonebraker
➔ SQL3 Standard (« OR Data Model » - Object - Relational DM -)
➢1st Manifesto by F.Bancilhon
➔ ODMG standard (« OO Data Model » – Object-Oriented )
Object contributions to the DB World : RICE properties
R : Reusability (Inheritance or polymorphism)
I : Identification (OID : Object Identifier)
C : Complex Object construct
E : Encapsulation (Methods)
Three approaches for OBJECT Data models
➢RICE properties on VALUES : 1st Manifesto (F. Bancilhon) ➔ ODMG
➢RICE properties on DOMAINS : 3rd Manifesto (C. Date) ➔ SQL3
➢RICE properties on RELATIONS : 2nd Manifesto (M. Stonebraker) ➔ SQL3
DATA BASE market & standards?
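[Figure (Stonebraker 96 & Gartner): DBMS market map, 2010 to 2020. Axes : DATA (simple vs complex : graphs, ..) and PROCESSING (SQL vs No SQL). Regions : File System ; R-DBMS (SQL2 : production, decision) ; OR-DBMS (SQL3 : mobiquitous & Big Data systems) ; OO-DBMS (ODMG : CAD). Market segments are numbered (1), (2), (3).]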
(1) 10 G dollars
20% of cont. Growing rate
(2) 2x (3) 2x (1)!
(3) 1/100 x (1)
1/100 x (1)
Exercise
➢ACID transactions solve 2 major problems in data management : explain ☺
Solution
AC (Atomicity and Consistency)
➢ALL or NOTHING
➢DB consistency in the face of any failure
ID (Isolation and Durability)
➢Every well-formed transaction (LOCK-Share before reading and LOCK-Exclusive before writing) is serializable with two-phase locking (Jim Gray’s theorem)
➢Isolation
➢No interference among concurrent transactions
ACID and 2 issues : CONCURRENCY and CRASH RECOVERY
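The « ALL or NOTHING » side of ACID can be observed with Python's built-in `sqlite3` module. This is a minimal sketch on a hypothetical `account` table (not part of the course schema): a transfer that would break a consistency rule is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"   # consistency rule
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction."""
    try:
        with conn:   # commits on success, rolls back on any exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)        # both updates are committed
refused = transfer(conn, 1, 2, 500)  # the credit is undone when the debit fails
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

After the refused transfer the balances are those left by the first one: the credit to account 2 did not survive on its own.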
Example : SQL2 (Relational)
Who are the pilots (PIL#, PILNAME) from Nice flying a plane from Nice ?
In SQL2 :
SELECT PILOT.PIL#, PILNAME
FROM PILOT, FLIGHT
WHERE PILOT.PIL# = FLIGHT.PIL# AND PILOT.ADDR = 'Nice' AND FLIGHT.DC = 'Nice';
In Codd’s relational algebra :
V1 = Join PILOT (PIL# = PIL#) FLIGHT
V2 = Select V1 (ADDR = 'Nice' and DC = 'Nice')
RESULT = Project V2 (PIL#, PILNAME)
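The SQL2 query can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the columns are renamed PILNO and FLNO because « # » is not allowed in unquoted SQLite identifiers, the table definitions are reduced to the attributes the query needs, and the FLIGHT rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PILOT  (PILNO INTEGER PRIMARY KEY, PILNAME TEXT, ADDR TEXT);
CREATE TABLE FLIGHT (FLNO  INTEGER PRIMARY KEY,
                     PILNO INTEGER REFERENCES PILOT (PILNO),
                     DC TEXT, AC TEXT);
INSERT INTO PILOT  VALUES (100, 'Serge', 'Nice'),
                          (101, 'John',  'New York'),
                          (102, 'Joel',  'Dublin');
INSERT INTO FLIGHT VALUES (1, 100, 'Nice',  'Paris'),
                          (2, 101, 'Nice',  'Dublin'),
                          (3, 100, 'Paris', 'Nice');
""")

# Same query as on the slide; DISTINCT keeps the result a set of tuples.
rows = conn.execute("""
    SELECT DISTINCT PILOT.PILNO, PILNAME
    FROM PILOT, FLIGHT
    WHERE PILOT.PILNO = FLIGHT.PILNO
      AND PILOT.ADDR = 'Nice'
      AND FLIGHT.DC  = 'Nice'
""").fetchall()
```

With this sample data only pilot 100 qualifies: pilot 101 departs from Nice but does not live there.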
Example : SQL3 (object relational)
Who are the pilots from Nice flying a plane from Nice ?
In SQL3 :
SELECT REFPIL ➔ PIL#, REFPIL ➔ PILNAME
FROM FLIGHT
WHERE DC = 'Nice' AND REFPIL ➔ ADDR = 'Nice';
Note : with
➢REFPIL : REF type attribute containing ROWID (OID) from PILOT and
➢« ➔ » : Dereferencing operator
Example : OQL (ODMG; object)
Who are the pilots from Nice flying a plane from Nice ?
In OQL :
SELECT p.PIL#, p.PILNAME
FROM
p in PILOT,
v in p.insureFLIGHT
WHERE
p.ADDR = 'Nice' and v.DC = 'Nice';
Note : with « insureFLIGHT », bidirectional persistent REF pointer from PILOT class towards FLIGHT class
TOP DOWN approach for semi-structured DATA stores
➢OPEN DATA
➢WEB DATA (Semantic web)
➢RDF (Resource Description Framework) paradigm
OPEN DATA formats
PDF for documents
For DATA :
➢CSV (Excel)
➢Web standards for publication and sharing
➢HTML (HTML5), XML, RDF
➢Web standards for syndication
➢RSS, Atom, JSON
OPEN DATA : CSV, JSON, XML
➢CSV (Comma-Separated Values) for flat files (1)
➢JSON (JavaScript Object Notation) for hierarchical documents (2)
➢XML (eXtensible Markup Language) for (1), (2), namespaces, …
CSV, JSON and XML (Examples)
# CSV example
First, Name, Course title, date
"Serge", "Miranda", "From data bases to Big Data", "2020"
-----------------------------------------------------------------
// JSON example
{"First": "Serge", "Name": "Miranda", "course": {"title": "From data bases to BIG DATA", "date": "2020"}}
------------------------------------------------------------------
<!-- XML example -->
<xml> <professor>Serge Miranda</professor><list><course>From data bases to BIG DATA</course><date>2020</date></list></xml>
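The three formats can be read with Python's standard library alone. The sketch below parses the same course record from each of them; the strings are cleaned-up versions of the slide examples (straight quotes, no spaces after commas in the CSV).

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

csv_text = ('First,Name,Course title,date\n'
            '"Serge","Miranda","From data bases to Big Data","2020"\n')
json_text = ('{"First": "Serge", "Name": "Miranda", '
             '"course": {"title": "From data bases to BIG DATA", "date": "2020"}}')
xml_text = ('<xml><professor>Serge Miranda</professor>'
            '<list><course>From data bases to BIG DATA</course>'
            '<date>2020</date></list></xml>')

csv_row = next(csv.DictReader(io.StringIO(csv_text)))  # flat: one dict per line
json_doc = json.loads(json_text)                       # hierarchical: nested dicts
xml_root = ET.fromstring(xml_text)                     # hierarchical: element tree

dates = (csv_row["date"],
         json_doc["course"]["date"],
         xml_root.findtext("list/date"))
```

The flat CSV record needs one lookup, whereas JSON and XML need a path through the hierarchy to reach the same value.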
DATA WEB (semantic web)
« I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web (the content, links, and transactions between people and computers). A « Semantic Web », which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The « intelligent agents » people have touted for ages will finally materialize »
Tim Berners-Lee (2001, Weaving the Web)
➢WEB evolution :
➢Network of PAGES ➔
➢Network of structured documents (XML) ➔
➢DATA WEB/Network of DATA (RDF) ➔
➢Semantic web (Linked RDF) <W3C>
« 5 star » LINKED OPEN DATA
➢In 2010, Tim Berners-Lee proposed a 5-star quality scale for (linked) open data.
SEMANTIC WEB stack (Manning2013)
A typical semantic web stack with common low-level standards like URI, XML, and RDF at the bottom of the stack. The middle layer includes standards for querying (SPARQL) and standards for rules (RIF/SWRL). At the top of the stack are user interface and application layers above abstract layers of logic, proof, and trust building.
SEMANTIC WEB?
A DATA MODEL :
➢DATA STRUCTURES
➢URI (Uniform Resource Identifier)
➢Unique Format : RDF (Resource Description Framework)
➢A schema : RDFS
➢ONTOLOGY = SCHEMA (RDFS) + INSTANCES
➢TWO manipulation languages :
➢SPARQL
➢OWL (Web Ontology Language)
RDF (Resource Description Framework)
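➢Defined by W3C (Recommendation in 1999, revised on February 10th, 2004 ; January 15th, 2008 is the date of the SPARQL Recommendation)
➢Commonly serialized in XML (RDF/XML)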
➢URI for resource identification
➢Web page (identified by URL)
➢Web Service
➢XML document fragment
➢Any object (even physical) having collected DATA
DATA in RDF
RDF triples to describe WEB resources
(:Serge :insureFLIGHT :AF100)
(:Peter :insureFLIGHT :AF110)
(:AIRBUSA320 :is-used-inFLIGHT :AF100)
(:Paul :is-passenger-inFLIGHT :AF100) …
Note :
An RDF triple <S, P, O>
- is a fact in 1st order predicate logic
- P(S, O) with P Predicate, S Subject and O Object
Example : INSUREFLIGHT (Serge, AF100)
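A set of triples and a pattern matcher are enough to sketch how such facts are stored and queried. This is an illustration only, not an RDF library: the function `match` is hypothetical and plays the role of a single triple pattern, with `None` standing for a free variable.

```python
# The four triples of the slide, as (subject, predicate, object) tuples.
triples = {
    ("Serge", "insureFLIGHT", "AF100"),
    ("Peter", "insureFLIGHT", "AF110"),
    ("AIRBUSA320", "is-used-inFLIGHT", "AF100"),
    ("Paul", "is-passenger-inFLIGHT", "AF100"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching the pattern; None matches anything."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# Who insures a flight ?  Pattern (?x, insureFLIGHT, ?y)
pilots = [s for s, _, _ in match(p="insureFLIGHT")]
# Everything known about flight AF100 : pattern (?x, ?p, AF100)
about_af100 = match(o="AF100")
# The fact INSUREFLIGHT (Serge, AF100) is TRUE when the triple is present.
fact_holds = bool(match("Serge", "insureFLIGHT", "AF100"))
```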
RDF graph (Example)
[Figure: RDF graph linking the nodes :Serge, :AF100, :AIRBUSA320 and :Paul through the edges :insureflight, :drivesplane, :isusedinflight and :ispassengerinflight]
SPARQL (queries on RDF graphs)
PREFIX dc: <http://purl.org/dc/elements/1.1/>   # prefix standing for a URI
SELECT ?X                                       # « ? » introduces a free variable
FROM <graph name RDF>                           # data source
WHERE { <http://../../> dc:Y ?X }               # triple list
SPARQL (Example)
Who are the pilots from Nice flying a plane from Nice ?
PREFIX rdf: <http:// www….>
SELECT ?PILOT
WHERE { GRAPH ?g
{ ?PILOT rdf:ADDR rdf:Nice .
?PILOT rdf:insureFLIGHT ?FLIGHT .
?FLIGHT rdf:DC rdf:Nice }}
SPARQL engine
Research : bridge between SPARQL and SQL
➢* « An effective SPARQL support over relational DB » (IBM China), VLDB Vienna, Austria, Sept 2007, LNCS 5005, Springer Verlag
➢See also SPASQL (SQL extension to handle SPARQL subqueries)
Bottom up Approach of Big data Management with N.O.SQL and NEW SQL : see LECTURE 7
[Figure: DATA PROCESSING (SQL vs NoSQL) against DATA STRUCTURE. Top Down (schema), complex structured data : OR-DBMS (SQL3) and OO-DBMS (ODMG). Bottom Up (no schema ; no metadata), complex unstructured data : N.O. SQL and NEW SQL.]
SQL evolution (a.c.) < See Lectures 4, 6 and 7>
Introduction to Codd’s relational data model
(value paradigm)
Professor Serge Miranda
Dr Edgar « Ted » CODD (1923- 2003)
➢TURING AWARD in 1981
➢Codd’s relational data model (1969)
August 19, 1969, IBM Research Report RJ599, « Derivability,
Redundancy, and Consistency of Relations Stored in Large Data
Banks »
➢1970 : « A Relational Model of Data for Large shared Data Banks »,
CACM 13, No. 6, June 1970 pp 377-387
➢Codd’s theorem on relational programming language
(1971)
➢E. F. Codd, « Relational completeness of data base sublanguages »,
in R. Rustin, (ed.) Data Base Systems, Proceedings of 6th Courant
Computer Science Symposium (May 24-25, 1971 : New York, N.Y.),
pp. 65-98, Prentice-Hall, 1972
➢Note « B.C. », in Data management means « Before Codd », ☺
Ted CODD in Sophia Antipolis, 1986
(with Gilles Taladoire and Serge Miranda) for a demo of a
relational DBMS on PC (CAMPUS)
VALUE paradigm in Codd’s relational model and 2 phases of data STRUCTURATION
Phase 1 of STRUCTURATION* in CODD’s model with the SET construct
VALUES : 100, 500, NICE, NEW YORK, B747, A320, JOHN, PETER, JOEL, SERGE
SET construct ➔ DOMAIN = SET of VALUES
➢PILNAME = {JOHN, PETER, SERGE, …}
➢PNAME = {B747, A300, A320, …}
➢CITY = {NICE, NEW YORK, …}
* NOTE : corresponding to the « S » of SQL (Structured Query Language)
Phase 2 of Structuration with the TUPLE construct (cartesian product)
PLANE  P#   PNAME  CAP  LOC
       100  A300   200  Paris
       101  A320   250  New York
Columns take their values from DOMAINS ; each line is a TUPLE
A RELATION is a SET of tuples (which could be VISUALIZED by TABLES of values)
A relation is a SUBSET of the cartesian product of N domains (sets)
Codd’s relational data models and SQL standards
Codd’s relational data models :
➢V1 (1970), the boundary between « b.c. »* and « a.c. »*
➢RM-T (1980) and V2/V3 (1990)
SQL standards :
➢SEQUEL (1975 and 1982)
➢SQL1 (1989)
➢SQL2 <relational> (1992)
➢SQL3 <object relational> (1999)
➢SQL4, SQL5, SQL6, SQL7 … 2020
* b.c. (a.c.) : before (after) Codd
Data structures of Codd’s relational data model
(Codd’s) Relational data model : RELATION, RELATIONAL ALGEBRA and FUNCTIONS
➢RELATION : kernel mathematical concept with double formal basis :
SETS or PREDICATES
➔ Double underlying formal theory for RELATIONS :
➢SET THEORY
➢1st Order predicate logic
➔ Double family of RELATIONAL languages :
SET-oriented (relational algebra) or PREDICATE–oriented
(Relational calculus like Codd’s ALPHA)
➢No need for programming iteration
➢No need for procedures (NON-PROCEDURAL interfaces)
➢NO Programming in relational data management
1) RELATION : SET
RELATION : Subset of the cartesian product of N domains.
At a given time a relation consists of a set of tuples which COULD BE represented by tables in a very SIMPLE way
Examples :
PILOT ⊆ PILNO x PILNAME x CITY
PLANE ⊆ PNO x PNAME x CAP x LOC
PLANE tuples : {(10, Airbus, 320, Nice), (11, B747, 300, NEW YORK), …}
DATA representation in RELATIONS
➢SIMPLE data representation : TABLES of VALUES with lines (TUPLES) and COLUMNS (attributes)
➢All DATA are represented in TABLES <VALUE PARADIGM>
PILOT PIL# PILNAME ADDR
100 Serge Nice
101 Peter New York
102 Joel Dublin
2) RELATION = PREDICATE
➢RELATION = PREDICATE with N variables
➢EX : « The PLANE numbered P# has a name PNAME with capacity CAP and is located in LOC ».
written : PLANE (P#, PNAME, CAP, LOC)
➢PROPOSITION = TUPLE (with TRUE value)
➢EX : « The PLANE numbered 10 is an AIRBUS A320 with a capacity of 320 seats and is located in Paris »
TUPLE : TRUE-valued proposition written : PLANE (10, AIRBUS, 320, Paris)
RELATIONAL ALGEBRA (Codd’s)
➢RELATIONAL ALGEBRA
➢SET OPERATORS : Union, Intersection, Difference
➢RELATIONAL OPERATORS
➢UNARY operators : SELECTION, PROJECTION
➢BINARY OPERATORS : JOIN (existence quantifier) and DIVISION (Universal quantifier)
➢JOIN and DIVISION can be derived from the cartesian product operator (which, on its own, carries no semantics)
➢The JOIN operator was a truly disruptive contribution by Codd to data management !
➢RELATIONAL ALGEBRA : Set of operators with three fundamental properties of a (good programming) language : CLOSURE, COMPLETENESS and ORTHOGONALITY, along with CODD’s theorem
➢We can infer any information from a relational data base with operators of the relational algebra
➢« Query languages that are equivalent in expressive power to relational algebra are called relationally complete » (CODD)
Codd’s theorem (1971) and its consequences !
➢Codd's theorem states that relational algebra and relational calculus are equivalent in expressive power. That is, a database query can be formulated in one language if and only if it can be expressed in the other.
➢A query on a relational data base can be expressed in relational calculus if and only if it can be expressed in relational algebra
➢Relational calculus (Codd’s ALPHA language) with variables and quantifiers : declarative query
➢EXAMPLE : Who are the pilots operating a flight from Nice ?
ALPHA language : {p ∈ PILOT | ∃ v ∈ FLIGHT (p.PIL# = v.PIL# and v.DC = 'Nice')}
➢Relational algebra, which is variable-free : imperative query
➢same EXAMPLE :
V1 := JOIN PILOT (PIL# = PIL#) FLIGHT
V2 := SELECT V1 (DC = 'Nice')
RES := PROJECT V2 (PIL#, PILNAME)
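The equivalence can be observed on this one query by writing it twice in Python: once in calculus style (a comprehension with an existential quantifier) and once as a chain of algebra operators. This is a sketch on hypothetical sample tuples; `join`, `select` and `project` are simplified helpers, not a full relational algebra.

```python
PILOT = [
    {"PIL#": 100, "PILNAME": "Serge", "ADDR": "Nice"},
    {"PIL#": 101, "PILNAME": "Peter", "ADDR": "New York"},
    {"PIL#": 102, "PILNAME": "Joel", "ADDR": "Dublin"},
]
FLIGHT = [
    {"FL#": 1, "PIL#": 100, "DC": "Nice"},
    {"FL#": 2, "PIL#": 101, "DC": "Paris"},
    {"FL#": 3, "PIL#": 102, "DC": "Nice"},
]

# Relational calculus style: variables p, v and the quantifier ∃ (any).
calculus = {(p["PIL#"], p["PILNAME"])
            for p in PILOT
            if any(v["PIL#"] == p["PIL#"] and v["DC"] == "Nice" for v in FLIGHT)}

# Relational algebra style: variable-free composition of operators.
def join(r, s, attr):
    return [{**t, **u} for t in r for u in s if t[attr] == u[attr]]

def select(r, condition):
    return [t for t in r if condition(t)]

def project(r, attrs):
    return {tuple(t[a] for a in attrs) for t in r}   # a set: duplicates vanish

v1 = join(PILOT, FLIGHT, "PIL#")
v2 = select(v1, lambda t: t["DC"] == "Nice")
algebra = project(v2, ["PIL#", "PILNAME"])
```

Both formulations return the same set of tuples; each algebra operator returns a relation, which is the closure property that lets the steps be chained.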
Codd’s theorem and its consequences : SQL success!
➢The algebra operators could be EASILY implemented (with formal properties to build an optimization strategy)
➔ Underlying implementation of SQL standard !
➔ SUCCESS of SQL (Structured Query Language) standard*
Same Example with SQL :
SELECT PILOT.PIL#, PILNAME
FROM PILOT, FLIGHT
WHERE PILOT.PIL# = FLIGHT.PIL# AND DC = 'Nice';
*« SQL will be the language of the future of data management (object and big data included) »
Mike Stonebraker (TURING AWARD 2014)
FUNCTIONS and disruptive data normalization (Codd)
➢FUNCTIONS :
Mathematical FUNCTIONS correspond to functional dependencies (Codd), which are the central concept to define a good (normalized) relational schema
➢A function between two groups of attributes A and B is denoted « ➔ » :
A ➔ B <with ➔ : N:1 function, A : determinant and B : determined>
➢Three formal properties for functional dependencies (Armstrong's axioms) : REFLEXIVITY (identity), TRANSITIVITY and (determinant) AUGMENTATION
FUNCTIONS and NORMAL FORMS (NF) for relations (see also associated seminar)
➢NF2 (Non First Normal Form) towards 3NF (Third Normal Form)
➢In a NF2 relation (unnormalized), an attribute could be either multivalued (SET construct) or a relation (TUPLE construct) or SET x TUPLE valued
➢From NF2 to 1NF (1st Normal Form) by creating new relations (or attributes) to get single-valued attributes (TUPLES)
➢From 1NF to 3NF with FUNCTIONS (N:1 links)
➢From 3NF to 5NF with N:M (multivalued) links <end of the decomposition process>
➢Storage anomalies (Codd) with relations which are not in 3NF for INSERT, UPDATE, DELETE operations (the 3 storage operators)
3NF Normalization (Codd) (see also associated seminar)
➢SHARMAN definition of 3NF relations (simplest definition)
➢A relation is in 3NF if every determinant of a N:1 link (function) is a primary key
➢We can decompose a relation into 3NF relations with the following theorem from Casey & Delobel
➢Decomposition theorem (from Casey & Delobel) :
Let R (A, B, C) with B ➔ C
then R could be decomposed without loss into R1 (A, B) and R2 (B, C)
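The theorem can be checked on a small instance: when B ➔ C holds, projecting R on (A, B) and (B, C) and joining the projections back on B returns exactly R. The sketch below uses hypothetical sample tuples and helper names (`holds`, `project`, `natural_join`); it verifies one instance and is not a proof.

```python
def holds(rel, lhs, rhs):
    """True if the functional dependency lhs ➔ rhs holds in rel (list of dicts)."""
    seen = {}
    for t in rel:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rel, attrs):
    """Projection as a set of tuples of (attribute, value) pairs."""
    return {tuple((a, t[a]) for a in attrs) for t in rel}

def natural_join(r1, r2):
    """Join two projections on their common attributes."""
    out = set()
    for t in r1:
        for u in r2:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out

R = [
    {"A": 1, "B": "x", "C": "p"},
    {"A": 2, "B": "x", "C": "p"},
    {"A": 3, "B": "y", "C": "q"},
]
fd_holds = holds(R, ["B"], ["C"])                 # B ➔ C
R1, R2 = project(R, ["A", "B"]), project(R, ["B", "C"])
rebuilt = natural_join(R1, R2)
original = {tuple(sorted(t.items())) for t in R}
```

R2 stores each (B, C) pair once, so the redundancy of R disappears while `rebuilt == original` shows that no information is lost.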
EXERCISE : Relational schema in 3NF (2NF, BCNF)
➢Example FLIGHT (F#, Day, PIL#, P#, Flight-type)
➢Initialize the corresponding table with some tuples and integrate the functional dependencies below
➢Show logical redundancy, storage anomalies and connection trap on this example for each function below
➢Normalize (3NF) with CASEY/DELOBEL’s theorem
1) F# → Flight-type <2NF Normalization (partial dependency)>
2) PIL# → P# <3NF Normalization (transitive dependency)>
3) PIL# → Day <BCNF (Boyce Codd Normal Form)>
Example : FLIGHT table
FLIGHT  F#   DAY     PIL#  P#  Flight-type
        100  MONDAY  1     10  BLUE
        105  MONDAY  1     10  RED
        101  WED     2     11  BLUE
        102  THURS   3     10  WHITE
        102  FRIDAY  4     11  WHITE
For the following function F# ➔ FLIGHT-TYPE *
- Logical redundancy : the pair (102, WHITE) is redundant (stored as many times as there is a flight 102)
- Insertion anomaly : we cannot enter the pair (106, RED) until a primary key (106, DAY?) exists
- Update anomaly : if we change the flight-type of flight (102, THURS) we must also change it for (102, FRIDAY)
- Delete anomaly : if we delete the tuple (101, WED) we lose the only pair (101, BLUE)
- Example of a connection trap : the decomposition of the FLIGHT table into the two following tables is not reversible : F1 (F#, DAY, PIL#) and F2 (PIL#, P#, Flight-type)
* NOTE : the table is also valid for PIL# → P# and PIL# → DAY
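The connection trap can be made visible by actually joining the two tables back together. This sketch uses the five tuples of the FLIGHT table above; the variable names are hypothetical. It compares the decomposition F1/F2 on PIL# with the decomposition along F# ➔ FLIGHT-TYPE.

```python
FLIGHT = {
    (100, "MONDAY", 1, 10, "BLUE"),
    (105, "MONDAY", 1, 10, "RED"),
    (101, "WED",    2, 11, "BLUE"),
    (102, "THURS",  3, 10, "WHITE"),
    (102, "FRIDAY", 4, 11, "WHITE"),
}   # tuples are (F#, DAY, PIL#, P#, Flight-type)

# Decomposition of the slide: F1 (F#, DAY, PIL#) and F2 (PIL#, P#, Flight-type)
f1 = {(f, d, pil) for f, d, pil, p, ft in FLIGHT}
f2 = {(pil, p, ft) for f, d, pil, p, ft in FLIGHT}
rejoined = {(f, d, pil, p, ft)
            for (f, d, pil) in f1
            for (pil2, p, ft) in f2 if pil == pil2}
spurious = rejoined - FLIGHT          # tuples that were never in FLIGHT

# Decomposition along F# ➔ Flight-type: (F#, DAY, PIL#, P#) and (F#, Flight-type)
g1 = {(f, d, pil, p) for f, d, pil, p, ft in FLIGHT}
g2 = {(f, ft) for f, d, pil, p, ft in FLIGHT}
rejoined_ok = {(f, d, pil, p, ft)
               for (f, d, pil, p) in g1
               for (f2, ft) in g2 if f == f2}
```

Pilot 1 operates a BLUE and a RED flight, so joining on PIL# pairs each of his flights with both types: two spurious tuples appear, whereas the decomposition that follows the functional dependency rebuilds FLIGHT exactly.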
Solution : Decomposition in 3NF of the following relation FLIGHT (F#, DAY, PIL#, P#, FLIGHT-TYPE)
1) F# → FLIGHT-TYPE <2NF Normalization (partial dependency)>
Decomposition into F1 (F#, DAY, PIL#, P#) <2NF> and F2 (F#, FLIGHT-TYPE)
2) PIL# → P# <3NF Normalization (transitive dependency)>
Decomposition of F1 into F11 (PIL#, P#) and F12 (F#, DAY, PIL#)
3) PIL# → DAY <BCNF (Boyce Codd Normal Form)>
Decomposition of F12 into F121 (PIL#, DAY) and F122 (F#, PIL#)
Domains and Attributes ?
➢ISSUE : The same domain could be used several times in the definition of a given relation
➢ATTRIBUTE : Role played by a domain within a relation
➢Example : FLIGHT ⊆ FLIGHTNO x CITY x CITY x TIME x TIME
➢Departure City (DC) is the attribute for the first CITY value :
FLIGHT (FLIGHT# : FLIGHTNO, DC : CITY, AC : CITY, DT : TIME, AT : TIME)
V1 relational data model by Ted Codd (8/19/1969) and TABLE representation
DOMAIN
CITY : {Nice, Paris, DUBLIN, New York}
PILOT  PILNO  PILNAME  ADDRESS
       100    Serge    Nice
       101    JOEL     DUBLIN
       102    PETER    NEW YORK
Line = TUPLE
COLUMN = ATTRIBUTE
DOMAIN : a « semantical data type » (Codd)
DOMAINS : FLIGHT ⊆ FLIGHTNO x CITY x CITY x TIME x TIME
ATTRIBUTES : FLIGHT (FLIGHT#, DC, AC, DT, AT)
➢Domain PILNAME for PILOTs and domain PNAME for PLANEs : same syntactic data type, X(12), but two distinct semantic domains
ATTRIBUTE (Codd)
➢An attribute corresponds to the role played by a domain in a relation
➢Attributes are unique in a given relation
➢Examples :
➢attributes DC and AC for the domain CITY in the FLIGHT relation
➢attribute LOC for the domain CITY in the PLANE relation
➢attribute ADDR for the domain CITY in the PILOT relation
➢For every attribute, we need to indicate its corresponding domain
➢Example : PLANE relation
(P# : PNO, PNAME : PNAME, LOC : CITY)
with pairs < attribute : domain >
PRIMARY KEY
➢A relation is a SET of tuples (uniqueness of elements is a SET property)
➔Every TUPLE should be DISTINCT
➔The part of the tuple which enables the uniqueness corresponds to the PRIMARY KEY
➢The Primary Key is a minimal subset of the attributes of the relation that makes it possible to distinguish the existing tuples
➔THE VALUE of the primary key is UNIQUE and NOT NULL*
➢*NULL value (SQL) corresponds to an unknown value
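As an illustration of the UNIQUE and NOT NULL rule, the following Python sketch checks whether a set of attributes could serve as a primary key for the tuples currently stored. The function name `satisfies_primary_key` is an assumption of this sketch, and the check is only a necessary condition : a key is declared in the schema, it is not deduced from one sample of data.

```python
def satisfies_primary_key(rows, key):
    """True if the `key` attributes are never NULL (None) and never repeated."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in key)
        if any(v is None for v in value):
            return False  # entity integrity: no NULL in a primary key
        if value in seen:
            return False  # uniqueness: two tuples share the same key value
        seen.add(value)
    return True


# PILOT tuples taken from the table shown earlier in the lecture.
PILOT = [
    {"PILNO": 100, "PILNAME": "Serge", "ADDRESS": "Nice"},
    {"PILNO": 101, "PILNAME": "JOEL", "ADDRESS": "DUBLIN"},
    {"PILNO": 102, "PILNAME": "PETER", "ADDRESS": "NEW YORK"},
]

# Hypothetical extra tuples used to break the rule.
same_city = PILOT + [{"PILNO": 103, "PILNAME": "ANNA", "ADDRESS": "Nice"}]
null_key = PILOT + [{"PILNO": None, "PILNAME": "ANNA", "ADDRESS": "Paris"}]

print(satisfies_primary_key(PILOT, ["PILNO"]))        # PILNO is unique and not NULL
print(satisfies_primary_key(same_city, ["ADDRESS"]))  # two pilots live in Nice
print(satisfies_primary_key(null_key, ["PILNO"]))     # NULL key value
```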
FOREIGN KEY
➢FOREIGN KEY : Ordinary attribute in one relation which corresponds to a primary key in another relation.
➢Example :
FLIGHT (FLIGHT# : FLIGHTNO, PIL# : PILNO, P# : PNO, …)
PIL# and P# are foreign keys (primary keys in the relations PILOT and PLANE)
Codd’s relational SCHEMA in two phases
PHASE 1 : DOMAIN creation with CREATE DOMAIN
CREATE DOMAIN FNO NUMERIC (6) PRIMARY*
CREATE DOMAIN PNO NUMERIC (6) PRIMARY*
CREATE DOMAIN PILNO NUMERIC (6) PRIMARY*
CREATE DOMAIN PILNAME CHARACTER (20)
CREATE DOMAIN PNAME CHARACTER (6)
CREATE DOMAIN CAPACITY NUMERIC (3)
CREATE DOMAIN CITY CHARACTER (10)
CREATE DOMAIN TIME NUMERIC (4)
* PRIMARY DOMAIN : DOMAIN on which a primary key is defined (Chris DATE); an alternative way to declare foreign keys.
Codd’s relational SCHEMA in two phases
➢PHASE 2 : RELATION creation with CREATE RELATION/TABLE, the attributes being defined upon the previous domains
➢CREATE RELATION : PLANE
P# PRIMARY KEY : <defined upon> PNO
PNAME : PNAME
CAP : CAPACITY
LOC : CITY
➢CREATE RELATION : PILOT
PIL# PRIMARY KEY : PILNO
PILNAME : PILNAME
ADDR : CITY
➢CREATE RELATION : FLIGHT
FLIGHT# PRIMARY KEY : FNO
PIL# : PILNO*
P# : PNO*
DC : CITY
AC : CITY
DT : TIME
AT : TIME
* Those ordinary attributes in FLIGHT are defined over PRIMARY DOMAINS ➔ they are foreign keys
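The primary-domain rule can be sketched in a few lines of Python. The dictionaries below simply transcribe the two phases of the schema; the names `DOMAINS`, `RELATIONS` and `foreign_keys` are assumptions made for this illustration and do not correspond to any real DBMS interface.

```python
# Phase 1: domains, flagged True when they are PRIMARY domains.
DOMAINS = {
    "FNO": True, "PNO": True, "PILNO": True,
    "PILNAME": False, "PNAME": False, "CAPACITY": False,
    "CITY": False, "TIME": False,
}

# Phase 2: relations, each attribute being defined upon one domain.
RELATIONS = {
    "PLANE": {"key": "P#",
              "attrs": {"P#": "PNO", "PNAME": "PNAME",
                        "CAP": "CAPACITY", "LOC": "CITY"}},
    "PILOT": {"key": "PIL#",
              "attrs": {"PIL#": "PILNO", "PILNAME": "PILNAME", "ADDR": "CITY"}},
    "FLIGHT": {"key": "FLIGHT#",
               "attrs": {"FLIGHT#": "FNO", "PIL#": "PILNO", "P#": "PNO",
                         "DC": "CITY", "AC": "CITY", "DT": "TIME", "AT": "TIME"}},
}


def foreign_keys(relation_name):
    """Ordinary attributes defined over a primary domain are foreign keys."""
    relation = RELATIONS[relation_name]
    return sorted(
        attribute
        for attribute, domain in relation["attrs"].items()
        if DOMAINS[domain] and attribute != relation["key"]
    )


print(foreign_keys("FLIGHT"))  # the two starred attributes of FLIGHT
print(foreign_keys("PLANE"))   # no foreign key in PLANE
```

With this convention no explicit FOREIGN KEY clause is needed : the foreign keys follow from the domain declarations alone.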
INTEGRITY CONSTRAINTS in relational data structures
INTEGRITY CONSTRAINTS concern the data base storage operators : INSERT, UPDATE, DELETE
DOMAIN ➔ Type integrity
PRIMARY KEY ➔ Entity integrity
• unique NOT-NULL values
FOREIGN KEY ➔ Referential integrity
• Existence constraint
• Reference constraint (SQL) : NULL, DEFAULT, CASCADE (propagation) & RESTRICT (forbidden)
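The three kinds of integrity can be observed with any SQL engine. The sketch below uses Python's standard `sqlite3` module as a stand-in; it is an assumption-laden illustration, not Codd's schema language : SQLite has no CREATE DOMAIN, so a CHECK clause plays the role of the CAPACITY domain, the tables are reduced to a few columns, and the inserted rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE PILOT ("PIL#" INTEGER PRIMARY KEY, PILNAME TEXT, ADDR TEXT);
CREATE TABLE PLANE ("P#" INTEGER PRIMARY KEY, PNAME TEXT,
                    CAP INTEGER CHECK (CAP BETWEEN 0 AND 999), LOC TEXT);
CREATE TABLE FLIGHT (
    "FLIGHT#" INTEGER PRIMARY KEY,
    "PIL#" INTEGER REFERENCES PILOT("PIL#") ON DELETE RESTRICT,
    "P#"   INTEGER REFERENCES PLANE("P#") ON DELETE CASCADE,
    DC TEXT, AC TEXT);
INSERT INTO PILOT VALUES (100, 'Serge', 'Nice');
INSERT INTO PLANE VALUES (1, 'A300', 300, 'NICE');
INSERT INTO FLIGHT VALUES (1000, 100, 1, 'Nice', 'Paris');
""")


def is_rejected(sql):
    """Run one statement and report whether an integrity constraint refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Type integrity: 5000 is outside the CAPACITY range.
type_violation = is_rejected("INSERT INTO PLANE VALUES (2, 'B747', 5000, 'Paris')")
# Entity integrity: primary key value 100 already exists.
entity_violation = is_rejected("INSERT INTO PILOT VALUES (100, 'JOEL', 'DUBLIN')")
# Referential integrity, existence constraint: pilot 999 does not exist.
existence_violation = is_rejected("INSERT INTO FLIGHT VALUES (1001, 999, 1, 'Nice', 'Paris')")
# Reference constraint RESTRICT: pilot 100 is still referenced by a flight.
restrict_violation = is_rejected('DELETE FROM PILOT WHERE "PIL#" = 100')
# Reference constraint CASCADE: deleting the plane propagates to its flights.
conn.execute('DELETE FROM PLANE WHERE "P#" = 1')
flights_left = conn.execute("SELECT COUNT(*) FROM FLIGHT").fetchone()[0]

print(type_violation, entity_violation, existence_violation, restrict_violation, flights_left)
```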
EXERCISE : Relational schema (CODD)
➢With the verbs CREATE DOMAIN and CREATE RELATION, let us build Codd’s schema for the following example :
STUDENT (S#, SNAME, ADDRESS)
PROFESSOR (P#, PNAME, PADDR)
COURSE (C#, CTITLE, Degree)
SCHEDULE (S#, P#, C#, QUARTER, YEAR, GRADE)
➢Illustrate the three integrity constraints on the SCHEDULE relation
➢Give two possibilities to declare a foreign key
RELATIONAL ALGEBRA (CODD)
➢Three key properties of a good (relational) language like the RELATIONAL ALGEBRA :
➢CLOSURE
➢COMPLETENESS
➢ORTHOGONALITY
➢NOTE : SQL standard is neither closed, nor orthogonal nor complete !
Relational algebra (Codd)
Algebraic relational operators
➢SET operators : Union, Intersection, Difference (and cartesian product)
➢Restriction (unary) : PROJECTION (vertical splitting), SELECTION (horizontal splitting)
➢Extension (binary) : JOIN, DIVISION
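Generic SET operators : UNION (OR), INTERSECTION (AND), DIFFERENCE (NOT)
Let us consider two relations R and S defined on the same domains
➢UNION : R ∪ S = set of the tuples belonging to R OR to S
➢INTERSECTION : R ∩ S = set of the tuples belonging to R AND to S
➢DIFFERENCE : R − S = set of the tuples belonging to R and NOT to S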
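Since a relation is a set of tuples, the three set operators map directly onto Python's built-in `set` type. The sketch below is only an illustration : R and S are two hypothetical relations over (P#, PNAME), built from the PLANE tuples used in this lecture (R : planes located in Nice, S : planes with a capacity of at least 300).

```python
# Two relations over the same domains (P#, PNAME).
R = {(1, "A300"), (2, "B727")}                            # planes located in Nice
S = {(1, "A300"), (3, "B747"), (4, "B747"), (5, "A380")}  # planes with CAP >= 300

union = R | S         # tuples in R OR in S
intersection = R & S  # tuples in R AND in S
difference = R - S    # tuples in R and NOT in S

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

The result of each operator is again a set of tuples over the same domains, which is what the CLOSURE property requires.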
CARTESIAN PRODUCT
➢The cartesian product has no semantic meaning by itself :
we associate everything with everything !
➢Two particular cases of the cartesian product were proposed by CODD (corresponding to the existential and universal quantifiers of predicate calculus) :
➢JOIN
➢DIVISION
PROJECTION (guess ☺)
PLANE P# PNAME CAP LOC
1 A300 300 NICE
2 B727 250 NICE
3 B747 350 Paris
4 B747 350 DUBLIN
5 A380 380 NEW YORK
A1 = PROJECT PLANE (PNAME)
A1 PNAME
A300
B727
B747
A380
PROJECTION (Definition)
➢DEFINITION : Selection of the attributes referenced in the target list (A), keeping the corresponding values without duplicates
➢NOTATION : PROJECT R (ATT1,…ATTi)
PROJECTION (Example)
➢What are the departure cities of the company flights ?
➢RES = PROJECT FLIGHT (DC)
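A possible Python sketch of PROJECT is given below, applied to the PLANE table of this lecture (A1 = PROJECT PLANE (PNAME)). The representation of tuples as dictionaries and the function name `project` are assumptions of this illustration.

```python
PLANE = [
    {"P#": 1, "PNAME": "A300", "CAP": 300, "LOC": "NICE"},
    {"P#": 2, "PNAME": "B727", "CAP": 250, "LOC": "NICE"},
    {"P#": 3, "PNAME": "B747", "CAP": 350, "LOC": "Paris"},
    {"P#": 4, "PNAME": "B747", "CAP": 350, "LOC": "DUBLIN"},
    {"P#": 5, "PNAME": "A380", "CAP": 380, "LOC": "NEW YORK"},
]


def project(relation, attributes):
    """Vertical splitting: keep the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:  # a relation is a set: no duplicates
            result.append(reduced)
    return result


A1 = project(PLANE, ["PNAME"])
print([row["PNAME"] for row in A1])  # B747 appears once although two planes carry it
```

The departure cities of the flights would be obtained in the same way with `project(FLIGHT, ["DC"])` on a FLIGHT table.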
SELECTION (guess ☺)
PLANE P# PNAME CAP LOC
1 A300 300 NICE
2 B727 250 NICE
3 B747 350 Paris
4 B747 350 DUBLIN
5 A380 380 NEW YORK
A2 = Select PLANE (PNAME= ‘B747’)
SELECTION (Definition)
➢Notation : « SELECT R (F) » where F is a boolean condition
➢Definition : set of the tuples of R satisfying the condition F
SELECTION (Example)
➢What are the names of planes located in Nice ?
A1 = SELECT PLANE (LOC = ‘NICE’)
RES = Project A1 (PNAME)
Note : the query can also be written in one expression : RES = Project (SELECT PLANE (LOC = ‘NICE’)) (PNAME)
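The same query can be sketched in Python to show how SELECT and PROJECT compose (closure : the result of one operator is the input of the next). The helper names `select` and `project` and the dictionary representation of tuples are assumptions of this illustration; the data is the PLANE table of this lecture.

```python
PLANE = [
    {"P#": 1, "PNAME": "A300", "CAP": 300, "LOC": "NICE"},
    {"P#": 2, "PNAME": "B727", "CAP": 250, "LOC": "NICE"},
    {"P#": 3, "PNAME": "B747", "CAP": 350, "LOC": "Paris"},
    {"P#": 4, "PNAME": "B747", "CAP": 350, "LOC": "DUBLIN"},
    {"P#": 5, "PNAME": "A380", "CAP": 380, "LOC": "NEW YORK"},
]


def select(relation, condition):
    """Horizontal splitting: keep the tuples satisfying the boolean condition."""
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    """Vertical splitting: keep the listed attributes, without duplicates."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


# Two steps, as on the slide.
A1 = select(PLANE, lambda t: t["LOC"] == "NICE")
RES = project(A1, ["PNAME"])

# One nested expression, equivalent to the two steps above.
RES_NESTED = project(select(PLANE, lambda t: t["LOC"] == "NICE"), ["PNAME"])

print(RES)
```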
JOIN (Guess ☺)
Join R1 (A=A) R2
JOIN (Definition)
➢DEFINITION : The join of two relations R and S on two attributes ATT1 and ATT2* defined on the same domain is the cartesian product of R and S followed by a SELECTION (ATT1 = ATT2)
➢ attribute equality : EQUI JOIN
➢ THETA JOIN possible with <, >, <=, >=
➢NOTATION : JOIN R [ ATT1 = ATT2 ] S
➢ * ATTi : attribute; ATT2 is defined upon the same domain as ATT1
JOIN (Example)
➢What are the names of the pilots who operate a flight from Nice ?
➢PV = JOIN PILOT (PIL#= PIL#) FLIGHT
➢PV1 = SELECT PV (DC = 'Nice')
➢RES = PROJECT PV1 ( PILNAME)
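The three steps above can be sketched in Python, with the join written literally as a cartesian product followed by a selection. This is only an illustration : the PILOT tuples come from the table shown earlier (with the attribute names of the schema), the FLIGHT tuples are hypothetical, and the helper names are assumptions of this sketch.

```python
PILOT = [
    {"PIL#": 100, "PILNAME": "Serge", "ADDR": "Nice"},
    {"PIL#": 101, "PILNAME": "JOEL", "ADDR": "DUBLIN"},
    {"PIL#": 102, "PILNAME": "PETER", "ADDR": "NEW YORK"},
]

# Hypothetical flights, invented for this example.
FLIGHT = [
    {"FLIGHT#": 1000, "PIL#": 100, "P#": 1, "DC": "Nice", "AC": "Paris"},
    {"FLIGHT#": 1001, "PIL#": 101, "P#": 2, "DC": "DUBLIN", "AC": "Nice"},
    {"FLIGHT#": 1002, "PIL#": 100, "P#": 3, "DC": "Paris", "AC": "Nice"},
    {"FLIGHT#": 1003, "PIL#": 102, "P#": 1, "DC": "Nice", "AC": "NEW YORK"},
]


def cartesian_product(r, s):
    """Associate every tuple of r with every tuple of s."""
    return [(x, y) for x in r for y in s]


def equi_join(r, att1, att2, s):
    """JOIN r [att1 = att2] s: cartesian product, then selection on equality."""
    # Merging the two dictionaries is safe here: the only shared attribute
    # name is the join attribute, whose two values are equal by construction.
    return [{**x, **y} for x, y in cartesian_product(r, s) if x[att1] == y[att2]]


def select(relation, condition):
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


PV = equi_join(PILOT, "PIL#", "PIL#", FLIGHT)
PV1 = select(PV, lambda t: t["DC"] == "Nice")
RES = project(PV1, ["PILNAME"])
print(RES)
```

A real DBMS does not build the full cartesian product; this is precisely where algebraic optimization comes in.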
Division (Guess)
Division : R1 / R2 (R1 : dividend, R2 : divisor)
DIVISION (Definition)
DEFINITION
➢Simple division to get started : binary dividend and UNARY (« all ») divisor with a common attribute (defined on the same domain)
➢R (att1, att2) / S (att3) with
➢ att2 and att3 defined on the same domain
➢ att1 : the target attribute
➢ att3 : representing « ALL… »
➢Particular case of the cartesian product : « the largest set of att1 values whose cartesian product with the divisor is included in the dividend ! ☺ »
DIVISION (Example) : What are the names of the PILOTs who fly EVERY Airbus A300 ?
1. Start with the Divisor (typically a unary relation) ➔ « Every… » / « ALL… »
<here : « every plane (P# primary key) corresponding to an A300 »> :
A1 = SELECT PLANE (PNAME = ‘A300’)
DS = Project A1 (P#)
2. Then build the Dividend (typically a binary relation with the target attribute and the « same » attribute as the divisor) ➔ <here (PILNAME, P#)> :
PF1 = Join PILOT (PIL# = PIL#) FLIGHT
DD = Project PF1 (PILNAME, P#)
RESULT = DD / DS
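The last step, RESULT = DD / DS, can be sketched in Python as follows. This is a minimal illustration of the simple division (binary dividend, unary divisor); the tuples of DD and DS are hypothetical, including a second A300 numbered 6, and the function name `divide` is an assumption of this sketch.

```python
def divide(dividend, divisor):
    """Target values associated, in the dividend, with EVERY value of the divisor."""
    targets = {target for target, _ in dividend}
    return {
        target
        for target in targets
        if all((target, value) in dividend for (value,) in divisor)
    }


# Hypothetical dividend DD (PILNAME, P#) and divisor DS (P#).
DD = {
    ("Serge", 1), ("Serge", 6),
    ("JOEL", 1),
    ("PETER", 6), ("PETER", 2),
}
DS = {(1,), (6,)}  # assumed: planes 1 and 6 are the A300s

RESULT = divide(DD, DS)
print(RESULT)

# Check of the definition: RESULT x DS must be included in DD.
product = {(name, p) for name in RESULT for (p,) in DS}
print(product <= DD)
```

Note that with an empty divisor every target value qualifies, which is the usual behaviour of the universal quantifier.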
Advances in data management with Codd’s relational data model
➢SIMPLICITY of DATA representation : TABLES
➢Non procedural query language : « SAT » (Set At a Time)
➢FUNCTIONS integration to have a good schema
The relational algebra is a PREREQUISITE for SQL2, both for usage and implementation (optimization)
Three lessons from Codd’s contribution to data management and… Computer Science !
« A 10-page theoretical IBM research report in 1968 became a 10 billion US dollar market 50 years later, with a growth of 20% a year »
➢Theoretical model (SET THEORY) reduced to the essentials : Primary key, SAT operators with JOIN and FUNCTIONS (to support an international standard, SQL)
➢Ease of implementation with well-mastered access methods
➢An IBM research prototype in San Jose (SYSTEM-R), under the guidance of Jim Gray, was set up to assess the performance feasibility of Ted Codd’s relational data model (same with INGRES at UC Berkeley with M. Stonebraker)
➢TEACHABILITY (for large dissemination among SQL developers)
Exercise :
GOAL : Give a natural language interpretation of the JOIN operator :
The JOIN corresponds to a VERB…
➢Let us consider the two following queries :
➢What are the names of the pilots who are FLYING an Airbus ‘A320’ ?
➢What are the names of the pilots who are LIVING in the location city of an Airbus ‘A320’ ?
Q1) Write them in the relational algebra
Q2) Imagine how a natural language interpretation of the JOIN would enable semantic control over the queries
Exercises :
Process the following queries in Codd’s relational algebra :
Q3 : What are the names of the planes departing from Nice ?
Q4 : What are the names of the planes flown by pilots living in Nice ?
Q5 : What are the names of the planes located in the departure city of a flight proceeding to Nice ?
Q6 : What are the names of the planes flown by ALL pilots living in Nice ?
Q7 : What are the names of the planes flown by ALL pilots living in Nice, except those located in the departure city of a flight proceeding to Nice ? (Use Q5 and Q6 with set operators)
Some books of reference on Data bases
In English :
➢ Chris Date « An Introduction to Database Systems » (8th Edition), Addison Wesley <the reference book on data bases>
➢ E.F. Codd « The Relational Model for Database Management » (Version 2), Addison Wesley Publishing Company, 1990. ISBN 0-201-14192-2 <Codd’s book>
➢ M. Stonebraker et al « Readings in Database Systems » <the « red book »>, 5th Edition 1998, Morgan Kaufmann
➢ S. Abiteboul et al « Foundations of Databases », Addison Wesley <data base theory>
In French :
➢ JL Hainaut « Bases de données (Concepts, applications et développement) », DUNOD, 4th Edition, 2018
➢ G. Gardarin « Bases de Données », Eyrolles, free version at georges.gardarin.free.fr
➢ S. Miranda « L’Art des Bases de données » (3 volumes), EYROLLES
➢ S. Miranda « Bases de données : Architectures, modèles relationnels et objets, SQL3 et ODMG », DUNOD, 2002
Some books of reference on Big Data
In English :
➢ Rajendra Akerkar (Ed) « Big Data Computing », CRC Press, 2014
➢ Jules Berman « Principles of Big Data », Morgan Kaufmann, 2013
➢ Joe Celko « A Complete Guide to NO SQL », Elsevier, 2014
➢ W. Chu (Ed) « Data Mining and Knowledge Discovery for Big Data », Springer, 2014
➢ Dan McCreary, Ann Kelly « Making Sense of NO SQL », Manning, 2014
➢ F. Provost, T. Fawcett « Data Science for Business », O’Reilly, 2013
➢ Jordan Tigani, Siddartha Naidu « Google BigQuery Analytics », Wiley, 2014 (510 pages)
➢ Mike Stonebraker « New SQL: An Alternative to NoSQL and Old SQL for New OLTP Apps », ACM, June 2011
In French :
➢ R. Bruchez « Les bases de données NO SQL et le Big Data », Eyrolles, 2015
➢ P. Lemberger et al « Big Data et Machine Learning », Dunod, 2016
➢ C. Azencott « Introduction au Machine Learning », Dunod, 2018
➢ G. Grolemund « R pour les data science », Eyrolles, 2017
Short seminar on DATA BASE STORAGE and ACCESS
« Every DATA (of computing variables) has an address and an access method »