Copyright Big Data Pr Serge Miranda, MBDS, Univ de Nice Sophia Antipolis (UCA) 1


Page 1: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge


www.mbds-fr.org

MBDS course :

From data bases to big data

(7 lectures)

Professor Serge Miranda

Dept of Computer Science

University of Nice Sophia Antipolis (member of UCA)

Director of MBDS Master degree

(www.mbds-fr.org)


www.mbds-fr.org

DATA paradigms and Codd’s relational data model

(lecture 2)

Professor Serge Miranda

Dept of Computer Science

University of Nice Sophia Antipolis

Director of MBDS Master degree

(www.mbds-fr.org)


Contents

DATA PARADIGMS

➢Data Schema and data models

➢Data paradigms

➢TOP DOWN approach with fixed predefined schema

➢VALUE paradigm and TIPS (ACID) properties

➢POINTER-VALUE paradigm and RICE properties (Date’s manifesto)

➢PREDICATE-VALUE paradigm (RDF) with SPARQL

➢Bottom up approach

➢KEY-VALUE paradigm and WHAT properties with N.O.SQL and NewSQL

Introduction to Codd’s relational data model (VALUE paradigm)

➢ Underlying mathematical concepts : SETS and PREDICATES

➢ Value paradigm

➢ CODD’s relational data model

➢ Data Structures

➢ Integrity rules

➢ Relational algebra

➢ Codd’s theorem

➢ Codd’s model lessons

➢ Normalization theory (3NF) in a nutshell

➢ Short Seminars :

➢ Data base storage and access

➢ Codd & Date relational data-schema design


(BIG) DATA !


BIG DATA ? A scientific couple

1. DATA MANAGEMENT

SQL3, OQL, BigQuery SQL, N.O.SQL, CQL, HQL, SPARQL, N1QL, UnQL, CoQL, ..., NEWSQL

N.O.SQL with major Open Source reference :

HADOOP/MAP REDUCE & SPARK

2. DATA ANALYTICS

AI and mathematics with OPEN SOURCE reference :

R language (> 4000 packages), PYTHON, TENSORFLOW, CAFFE, etc.


Some visions of the future of big-data management

➢CLOUD COMPUTING

➢ INFRASTRUCTURE as a SERVICE (IaaS)

➢ PLATFORM as a SERVICE (PaaS)

➢ DATA as a Service (DaaS) from Oracle; ANALYTICS as a SERVICE (AaaS) from Google, IBM, etc.

➢« CAMS » (IBM 2014)

➢CLOUD for servers

➢DaaS/AaaS : « (DATA) ANALYTICS as a service »

➢Mobility (smartphones applications)

➢Social Networks (for data integration)

➢ « SMAC » stacks (Citigroup, Vikram Pandit) : « No business model in the future could succeed without the DATA »

Social, Mobile, Applications, Cloud


ORACLE vision : CI-MBDS ☺ (Cloud, IoT, Mobile, Big Data, Social)


Parallelism and data management

1 TERABYTE (10**12 bytes) a second ?

➢Hard disk (HD) : 100 megabytes/sec

➢1 terabyte a second ➔ 10 000 HDs read in parallel (1 petabyte, 10**15 bytes, a second ➔ 10 000 000 HDs)

➢3 options :

➢DATA COMPRESSION

➢SCALE UP : (SMP, CLUSTER, MPP)

➢SCALE OUT
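The disk count can be checked with a short calculation. This is only a back-of-the-envelope sketch: the 100 megabytes/sec rate comes from the slide, while the function name `disks_needed` is hypothetical and the model ignores any controller or network bottleneck.

```python
DISK_THROUGHPUT = 100 * 10**6  # bytes per second for one hard disk (figure from the slide)

def disks_needed(target_bytes_per_sec: int,
                 disk_bytes_per_sec: int = DISK_THROUGHPUT) -> int:
    """Number of disks to read in parallel to sustain the target throughput."""
    # Integer ceiling division, so no floating-point rounding is involved.
    return (target_bytes_per_sec + disk_bytes_per_sec - 1) // disk_bytes_per_sec

terabyte_per_sec = disks_needed(10**12)
petabyte_per_sec = disks_needed(10**15)
print(terabyte_per_sec, petabyte_per_sec)
```

Reading that many disks at once is what the SCALE UP and SCALE OUT options try to organize; DATA COMPRESSION lowers the target instead.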


Programming paradigms

Imperative Declarative

Procedural Non-procedural FUNCTIONAL

Object (OQL)

SQL/NO SQL /NEWSQL

HOW ? WHAT ?

Programming paradigms [Manning2013]


(structured) DATA MANAGEMENT concepts


Real world, SCHEMA and data model

Structured approach of real-world abstractions named SCHEMAs

by applying a DATA MODEL

Figure : Real World ➔ (DATA MODEL) ➔ Schema


DATA MODEL?

A DATA MODEL consists of :

➢DATA STRUCTURES

➢Data-structure operators (algebra)

➢Integrity rules


DATA BASES and DBMS (Data base management system)

➢DBMS?

➢DEFINITION

➢MANIPULATION

➢CONTROL

of data bases

Figure : SCHEMA, DATA BASE and DBMS


Exercise (Internet search)

1. Look at the ANSI/SPARC standard for the CONCEPTUAL SCHEMA of data bases and clarify the concepts of :

➢conceptual schema,

➢logical schema,

➢physical schema,

➢sub-schema

2. Define what a META MODEL is.


Underlying mathematical concepts for big data management


SET

➢A SET * is a well-defined collection of distinct elements with two basic properties :

➢The elements are UNIQUE

➢There is NO ORDERING

➢SET DEFINITION :

➢Intensional definition (giving properties of the elements)

➢Extensional definition (listing elements)

➢SUBSETS and POWER SETS (sets of all subsets)

➢SET operators :

➢INTERSECTION, UNION, DIFFERENCE,

➢CARTESIAN PRODUCT

* G. CANTOR : « A set is a gathering together into a whole of definite, distinct objects of our perception [Anschauung] or of our thought, which are called elements of the set ».
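The set notions above map directly onto a built-in set type. The following is a minimal sketch; the sample values and the helper name `power_set` are illustrative assumptions, not part of the lecture.

```python
from itertools import chain, combinations, product

# Extensional definition: list the elements (duplicates collapse, order is irrelevant).
cities = {"Nice", "Paris", "Nice", "Dublin"}

# Intensional definition: give the property that the elements satisfy.
short_names = {c for c in cities if len(c) <= 5}

def power_set(s):
    """Return the set of all subsets of s, each subset being a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(c) for c in subsets}

a, b = {1, 2, 3}, {3, 4}
intersection, union, difference = a & b, a | b, a - b
cartesian = set(product(a, b))  # set of ordered pairs (x, y)
print(intersection, union, difference, len(cartesian))
```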


PROPOSITION and PREDICATE

PROPOSITION ?

Any sentence with a boolean value : TRUE or FALSE

EX : « Socrates is mortal ! » « John loves Mary » etc.

PREDICATE ?

« Any sentence containing VARIABLES which is transformed into a PROPOSITION when we replace variables by VALUES »

EX : LOVE (x, y) <or « x LOVES y »> : predicate with two variables

First-order predicate logic with well-formed formulas
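A predicate can be pictured as a boolean function: giving values to its variables yields a proposition. The sketch below assumes a small, hypothetical set of facts (`LOVES_FACTS`) purely for illustration.

```python
from itertools import product

# Hypothetical facts: the pairs (x, y) for which "x LOVES y" is true.
LOVES_FACTS = {("John", "Mary"), ("Mary", "Paul")}

def loves(x: str, y: str) -> bool:
    """Predicate LOVE(x, y): becomes a proposition once x and y receive values."""
    return (x, y) in LOVES_FACTS

people = {"John", "Mary", "Paul"}

# Extension of the predicate: every pair of values that makes it TRUE.
extension = {(x, y) for x, y in product(people, repeat=2) if loves(x, y)}
print(loves("John", "Mary"), loves("Mary", "John"))
```

The extension of a predicate is itself a set of tuples, which is the link between the « predicate » and « set » readings of a relation.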


GRAPHS

➢A GRAPH is a SET of nodes (vertices) and a SET of edges which could be directed (digraph) or undirected, labelled or not

➢Example : Category (in Maths) : a labelled directed graph

➢A MULTI–GRAPH with multiple edges

➢A CATEGORY is a directed multigraph

➢HYPER GRAPH : an edge can join a set of vertices

➢OPERATORS

➢Unary operations : dual graph, edge contraction, …

➢Binary operations : disjoint union, cartesian product, etc.


SCALARS and VECTORS

➢SCALAR

➢In MATHS : element of a VECTOR

➢In Computing : a VARIABLE (with an address to store a value)

➢SCALAR ➔ VECTOR (rank 1) ➔ MATRIX (rank 2) ➔ TENSOR (rank 3 and above)

➢TENSOR : multidimensional array

➢MATRIX : group of vectors

➢VECTORIZATION : converting DATA into vectors

➢GRADIENT : generalization of the derivative to a function f of several variables (vector of the n partial derivatives of f)


MATRICES and linear algebra (matrix algebra)

➢M by N matrix : rectangular array of numbers or symbols arranged in M rows and N columns

➢Row vector (single-row matrix, M = 1) or column vector (single-column matrix, N = 1)

➢N by N SQUARE MATRIX (vector transformation, …)

➢VECTOR : particular case of a matrix with N = 1

➢Major matrix OPERATIONS :

➢Matrix addition and matrix multiplication

➢Matrix transposition (rows to columns and vice versa)

➢LINEAR TRANSFORMATION/mapping (linear algebra)

➢Other OPERATORS :

➢ Tensor product

➢ Hadamard product : element-wise product of 2 vectors (or 2 matrices) of the same dimensions

➢ Others : dot product, etc.

➢Linear algebra is the bedrock of Machine Learning (and Deep Learning) :

➢Ax = b in basic machine learning, with the matrix A and the parameter vector x giving the output column vector b
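A few of these operations can be written in a handful of lines. This is a plain-Python sketch with matrices as lists of rows; the helper names (`mat_vec`, `transpose`, `hadamard`) and the sample numbers are assumptions made for illustration, and a real workload would use a numerical library.

```python
def mat_vec(A, x):
    """Matrix-vector product b = Ax, with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

def hadamard(u, v):
    """Element-wise product of two vectors of the same length."""
    return [a * b for a, b in zip(u, v)]

A = [[1, 2],
     [3, 4],
     [5, 6]]          # 3 by 2 matrix
x = [1, -1]           # parameter vector (2 components)

b = mat_vec(A, x)     # output vector (3 components)
print(b, transpose(A), hadamard([1, 2, 3], [4, 5, 6]))
```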


« Value » vs « Pointer (variable/scalar) » in computer science ?


VALUE vs VARIABLES vs POINTERS

➢VALUE ?

data which cannot be modified

➢VARIABLE (SCALAR) ?

Every variable is a triptych : a NAME, a VALUE (updatable) and a (memory) ADDRESS

VARIABLE := ( NAME , VALUE, ADDRESS)

➢POINTER ?

Type of variable which contains the ADDRESS of another variable as VALUE


Variables and attached operators

Variables have addresses (values do not)

« ADDRESS type » with two basic operators :

* Referencing : v ➔ ADDR

in C : ptr = &v; (with char v; and char *ptr;)

in PL/1 : DECLARE N INTEGER;

DECLARE P POINTER;

P = ADDR (N);

* Dereferencing : ADDR ➔ v

in C : *ptr ; in PL/1 : P -> V
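Referencing and dereferencing can also be tried from Python through the standard `ctypes` module, which exposes C-style variables and pointers. This is only a sketch mirroring the C notation above; the variable names `n` and `p` are illustrative.

```python
import ctypes

n = ctypes.c_int(42)            # a variable: a value stored at a memory address
p = ctypes.pointer(n)           # referencing: p holds the address of n (C: p = &n)

address = ctypes.addressof(n)   # the ADDRESS part of the (NAME, VALUE, ADDRESS) triple
value = p.contents.value        # dereferencing: follow the address to the value (C: *p)

p.contents.value = 7            # writing through the pointer updates n itself
print(value, n.value)
```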


The three data structures of CODD’s relational data model <VALUE paradigm>

➢VALUE (DATA)

➢DOMAINS (SET construct) : a domain is a SET of values

➢RELATIONS (« tables » in SQL) (TUPLE, i.e. cartesian product, construct)

A relation in Codd’s model is

➢ either a predicate (with N variables) or

➢ a SET (subset of the cartesian product of N domains)
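Both readings of a relation can be checked on a small example. The sketch below uses the pilot values that appear later in the lecture; the domain names `PIL_NO` and `NAME` and the function `pilot` are hypothetical.

```python
from itertools import product

# Domains: sets of values.
PIL_NO = {100, 101, 102}
NAME = {"Serge", "John", "Joel"}
CITY = {"Nice", "Paris", "New York", "Dublin"}

# Relation PILOT: a set of tuples.
PILOT = {
    (100, "Serge", "Nice"),
    (101, "John", "New York"),
    (102, "Joel", "Dublin"),
}

# Set reading: PILOT is a subset of the cartesian product of its domains.
cartesian = set(product(PIL_NO, NAME, CITY))
is_relation = PILOT <= cartesian

# Predicate reading: PILOT(pil_no, name, city) is TRUE exactly for the stored tuples.
def pilot(pil_no, name, city) -> bool:
    return (pil_no, name, city) in PILOT

print(len(cartesian), is_relation)
```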


Relation in Codd’s data model at a given time : Table of VALUES

Domain CITY : {Nice, Paris, New York, Dublin}

Relation PILOT :

PIL#    PILNAME    ADDR
100     Serge      Nice
101     John       New York
102     Joel       Dublin

Line = « TUPLE »

COLUMN = « ATTRIBUTE »


Codd’s Relational algebra (SQL foundation)

➢« RELATION » : « Set » or « predicate »

➢2 dimensional arrays with 4 specific algebraic operators : Select, Project, Join, Division

➢COLUMN implementation for decision support

➢LINE (tuple) implementation for transaction support

➢Closure + Completeness + Orthogonality of the relational ALGEBRA

➢➔ QUERY INTERFACE without any programming to retrieve DATA

➢➔ NON-PROCEDURAL PROGRAMMING !


Plethora of BIG-DATA management Systems (Aslett, 2013)

➢https://blogs.the451group.com/information_management/2011/04/


Plethora of Big Data MANAGEMENT SYSTEMS : data paradigms

Figure (data paradigms of big data management systems and their properties : TIPS, RICE, WHAT) :

➢Codd’s relational data model (SET theory) : VALUE paradigm

➢OBJECT data model (GRAPH THEORY) : POINTER-VALUE paradigm (SQL3), OBJECT-VALUE paradigm (ODMG)

➢PREDICATE-VALUE (RDF) paradigm (Semantic web, GRAPH THEORY) : SPARQL (OWL)

➢KEY-VALUE paradigm (Map Reduce; MATRICES and linear algebra) : N.O.SQL / NEW SQL

➢Languages : SQL2, SQL3/ODMG, NEW SQL, BigSQL

➢Other labels : Predefined Schema, Meta data


TOP DOWN approach for structured and semi-structured DATA BASES

➢TOP DOWN approach with predefined SCHEMA and metadata

➢STRUCTURED DATA standards

➢SQL2 and VALUE paradigm

➢SQL3 and POINTER-VALUE paradigm

➢ODMG and OBJECT-VALUE paradigm

➢SEMI-STRUCTURED DATA standards

➢SPARQL and PREDICATE-VALUE (RDF) paradigm


A relational-schema example with three predicates

PILOT (PIL#, PILNAME, ADDR)

PIL#: Pilot ID then NAME and ADDRESS (City)

PLANE (P#, PNAME, CAP, LOC)

CAP : Capacity, LOC : location city

FLIGHT (FL#, PIL#, P#, DC,AC, DT, AT)

DC : Departure City, AC : arrival city,

DT : Departure Time, AT : Arrival Time

Note : Primary keys (PIL#, P#, FL#) are underlined on the original slide
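The three predicates can be declared as tables to see the primary keys at work. This is a sketch using SQLite through Python's standard `sqlite3` module; the column types, the foreign-key clauses and the helper `violates_integrity` are assumptions, since the slide only gives attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Identifiers containing '#' are double-quoted; all types are assumed.
conn.executescript('''
CREATE TABLE PILOT  ("PIL#" INTEGER PRIMARY KEY, "PILNAME" TEXT, "ADDR" TEXT);
CREATE TABLE PLANE  ("P#" INTEGER PRIMARY KEY, "PNAME" TEXT, "CAP" INTEGER, "LOC" TEXT);
CREATE TABLE FLIGHT ("FL#" TEXT PRIMARY KEY,
                     "PIL#" INTEGER REFERENCES PILOT("PIL#"),
                     "P#"   INTEGER REFERENCES PLANE("P#"),
                     "DC" TEXT, "AC" TEXT, "DT" TEXT, "AT" TEXT);
''')
conn.execute("INSERT INTO PILOT VALUES (100, 'Serge', 'Nice')")

def violates_integrity(sql: str) -> bool:
    """Run a statement and report whether an integrity rule rejected it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Same PIL# twice: rejected by the primary key.
duplicate_key = violates_integrity("INSERT INTO PILOT VALUES (100, 'John', 'Paris')")
# Flight assigned to an unknown pilot: rejected by the foreign key.
dangling_ref = violates_integrity(
    "INSERT INTO FLIGHT VALUES ('AF100', 999, NULL, 'Nice', 'Paris', '08:00', '09:30')")
pilot_count = conn.execute("SELECT COUNT(*) FROM PILOT").fetchone()[0]
print(duplicate_key, dangling_ref, pilot_count)
```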


Top Down approach with SQL/ODMG


Real World

SCHEMA

TOP DOWN approach for DATA STRUCTURATION

➔ pre-definition of a fixed schema

DATA MODEL


DB contribution to Computer Science : TIPS properties

T : Transactions (with ACID properties)

I : Non-procedural Interface (SQL)

P : Persistency (virtual paged memory)

S : Structuration (Schema)


SQL (with transactional focus)

➢TIPS properties with « T » referring to « TRANSACTIONS »

➢« ACID » properties for Transactions :

➢Atomicity

➢Consistency

➢Isolation

➢Durability

➢JIM GRAY’s theorem (on well-formed transactions with two phases)

➢OLTP (On line Transaction Processing)

➢Data Warehouse/data Mining (& OLCP : On Line Complex Processing)


Data Models, DB Manifesto and standards

➢Relational data model by Ted CODD (8/19/1969)

➔ theoretical support for SQL2 standard

➢Three Manifestos for « Future DB » by F. Bancilhon (1st), M. Stonebraker (2nd), and Chris DATE (3rd)

➢ 3rd Manifesto by C. DATE + 2nd Manifesto by M. Stonebraker

➔ SQL3 Standard (« OR Data Model » : Object-Relational DM)

➢1st Manifesto by F. Bancilhon

➔ ODMG standard (« OO Data Model » : Object-Oriented DM)


Object contributions to the DB World : RICE properties

R : Reusability (Inheritance or polymorphism)

I : Identification (OID : Object Identifier)

C : Complex Object construct

E : Encapsulation (Methods)


Three approaches for OBJECT Data models

➢RICE properties on VALUES : 1st Manifesto (F. Bancilhon) ➔ ODMG

➢RICE properties on DOMAINS : 3rd Manifesto (C. Date) ➔ SQL3

➢RICE properties on RELATIONS : 2nd Manifesto (M. Stonebraker) ➔ SQL3


DATA BASE market & standards?

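(Stonebraker 96 & Gartner)

Figure (DBMS market matrix) : DATA (simple vs complex : graphs, ...) against PROCESSING (SQL vs No SQL)

➢(1) R-DBMS : simple data, SQL (SQL2; production, decision)

➢(2) OR-DBMS : complex data, SQL (SQL3; mobiquitous & big data systems)

➢(3) OO-DBMS : complex data, No SQL (ODMG; CAD)

➢File System : simple data, No SQL

Market sizes (time labels on the figure : 2010, 2020) :

➢(1) 10 G dollars, with a 20% continuous growth rate

➢(2) 2 x (1)

➢(3) 1/100 x (1)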


Exercise

➢ACID transactions make it possible to solve 2 major problems in data management : explain ☺


Solution

AC : Atomicity and Consistency

➢ALL or NOTHING

➢DB consistency in the face of any failure

ID : Isolation and Durability

➢Isolation : no interference among concurrent transactions

➢Every well-formed transaction (LOCK-Share before reading and LOCK-Exclusive before writing) that follows two-phase locking is serializable (Jim Gray’s theorem)

ACID and 2 issues : CONCURRENCY and CRASH RECOVERY
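The « ALL or NOTHING » rule can be observed with any transactional engine. The sketch below uses SQLite through Python's standard `sqlite3` module on a reduced PILOT/FLIGHT schema; the function `hire_and_assign` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE PILOT  ("PIL#" INTEGER PRIMARY KEY, "PILNAME" TEXT, "ADDR" TEXT);
CREATE TABLE FLIGHT ("FL#" TEXT PRIMARY KEY, "PIL#" INTEGER, "DC" TEXT, "AC" TEXT);
INSERT INTO PILOT  VALUES (100, 'Serge', 'Nice');
INSERT INTO FLIGHT VALUES ('AF100', 100, 'Nice', 'Paris');
''')

def hire_and_assign(pil_no, name, city, flight_no, dc, ac) -> bool:
    """Insert a pilot and a flight as ONE transaction: both rows or neither."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("INSERT INTO PILOT VALUES (?, ?, ?)", (pil_no, name, city))
            conn.execute("INSERT INTO FLIGHT VALUES (?, ?, ?, ?)",
                         (flight_no, pil_no, dc, ac))
        return True
    except sqlite3.IntegrityError:
        return False

first = hire_and_assign(101, "John", "New York", "AF110", "Nice", "New York")
# 'AF100' already exists: the FLIGHT insert fails, so the PILOT insert is undone too.
second = hire_and_assign(102, "Joel", "Dublin", "AF100", "Dublin", "Nice")

pilot_ids = [row[0] for row in conn.execute('SELECT "PIL#" FROM PILOT ORDER BY "PIL#"')]
print(first, second, pilot_ids)
```

Pilot 102 never becomes visible even though its own insert succeeded, which is the behaviour atomicity promises.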


Example : SQL2 (Relational)

Who are the pilots (PIL#, PILNAME) from Nice flying a plane that departs from Nice ?

In SQL2 :

SELECT PILOT.PIL#, PILNAME

FROM PILOT, FLIGHT

WHERE PILOT.PIL# = FLIGHT.PIL# AND PILOT.ADDR = 'Nice' AND FLIGHT.DC = 'Nice';

In Codd’s relational algebra :

V1 = Join PILOT (PIL# = PIL#) FLIGHT

V2 = Select V1 (ADDR = 'Nice' and DC = 'Nice')

RESULT = Project V2 (PIL#, PILNAME)
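The three algebraic steps can be replayed on small in-memory relations. This is a minimal sketch with relations as lists of dictionaries; the functions `join`, `select` and `project` are illustrative implementations, the PILOT rows come from the lecture's sample table, and the FLIGHT rows are invented for the example.

```python
def join(r, s, attr):
    """Equi-join of two relations on an attribute they share."""
    return [{**t, **u} for t in r for u in s if t[attr] == u[attr]]

def select(r, predicate):
    """Keep the tuples that satisfy the predicate."""
    return [t for t in r if predicate(t)]

def project(r, attrs):
    """Keep only attrs and drop duplicate tuples (a relation is a set)."""
    result = []
    for t in r:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result

PILOT = [
    {"PIL#": 100, "PILNAME": "Serge", "ADDR": "Nice"},
    {"PIL#": 101, "PILNAME": "John", "ADDR": "New York"},
    {"PIL#": 102, "PILNAME": "Joel", "ADDR": "Dublin"},
]
FLIGHT = [  # hypothetical flights
    {"FL#": "AF100", "PIL#": 100, "DC": "Nice", "AC": "Paris"},
    {"FL#": "AF101", "PIL#": 100, "DC": "Nice", "AC": "Dublin"},
    {"FL#": "AF110", "PIL#": 101, "DC": "Nice", "AC": "New York"},
]

v1 = join(PILOT, FLIGHT, "PIL#")
v2 = select(v1, lambda t: t["ADDR"] == "Nice" and t["DC"] == "Nice")
result = project(v2, ["PIL#", "PILNAME"])
print(result)
```

Each operator takes relations and returns a relation, so the steps compose freely; that is the closure property of the algebra.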


Example : SQL3 (object relational)

Who are the pilots from Nice flying a plane that departs from Nice ?

In SQL3 :

SELECT REFPIL➔PIL#, REFPIL➔PILNAME

FROM FLIGHT

WHERE DC = 'Nice' AND REFPIL➔ADDR = 'Nice';

Note : with

➢REFPIL : REF-type attribute of FLIGHT containing the ROWID (OID) of a PILOT row, and

➢« ➔ » : dereferencing operator


Example : OQL (ODMG; object)

Who are the pilots from Nice flying a plane that departs from Nice ?

In OQL :

SELECT p.PIL#, p.PILNAME

FROM p in PILOT, v in p.insureFLIGHT

WHERE p.ADDR = 'Nice' and v.DC = 'Nice';

Note : « insureFLIGHT » is a bidirectional persistent REF pointer from the PILOT class towards the FLIGHT class


TOP DOWN approach for semi-structured DATA stores

➢OPEN DATA

➢WEB DATA (Semantic web)

➢RDF (Resource Description Framework) paradigm


OPEN DATA formats

PDF for documents

For DATA :

➢CSV (Excel)

➢Web standards for publication and sharing

➢HTML (HTML5), XML, RDF

➢Web standards for syndication

➢RSS, Atom, JSON


OPEN DATA : CSV, JSON, XML

➢CSV (Comma-Separated Values) for flat files (1)

➢JSON (JavaScript Object Notation) for hierarchical documents (2)

➢XML (eXtensible Markup Language) for (1), (2), namespaces, …


CSV, JSON and XML (Examples)

# CSV example
First,Name,Course title,date
"Serge","Miranda","From data bases to Big Data","2020"
-----------------------------------------------------------------
// JSON example
{"First": "Serge", "Name": "Miranda", "course": {"title": "From data bases to BIG DATA", "date": "2020"}}
-----------------------------------------------------------------
<!-- XML example -->
<xml>
  <professor>Serge Miranda</professor>
  <list>
    <course>From data bases to BIG DATA</course>
    <date>2020</date>
  </list>
</xml>
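The three formats can be read back with standard-library parsers to compare how each one exposes the same course title. This is a sketch only; the variable names are illustrative and the sample strings are the slide's examples written with plain double quotes.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

csv_text = ('First,Name,Course title,date\n'
            '"Serge","Miranda","From data bases to Big Data","2020"\n')
json_text = ('{"First": "Serge", "Name": "Miranda", '
             '"course": {"title": "From data bases to BIG DATA", "date": "2020"}}')
xml_text = ('<xml><professor>Serge Miranda</professor>'
            '<list><course>From data bases to BIG DATA</course>'
            '<date>2020</date></list></xml>')

csv_rows = list(csv.DictReader(io.StringIO(csv_text)))  # flat: one dict per line
json_doc = json.loads(json_text)                        # hierarchical: nested dicts
xml_root = ET.fromstring(xml_text)                      # hierarchical: element tree

csv_title = csv_rows[0]["Course title"]
json_title = json_doc["course"]["title"]
xml_title = xml_root.find("list/course").text
print(csv_title, json_title, xml_title, sep="\n")
```

CSV gives one flat record per line, while JSON and XML keep the course nested under its own key or element.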


DATA WEB (semantic web)

« I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web (the content, links, and transactions between people and computers). ... A « Semantic Web », which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The « intelligent agents » people have touted for ages will finally materialize »

Tim Berners-Lee (Weaving the Web, 1999)

➢WEB evolution :

➢Network of PAGES ➔

➢Network of structured documents (XML) ➔

➢DATA WEB / Network of DATA (RDF) ➔

➢Semantic web (Linked RDF) <W3C>


« 5 star » LINKED OPEN DATA

➢In 2010, Tim Berners-Lee proposed a 5-star quality scale for linked open data.


SEMANTIC WEB stack (Manning2013)

A typical semantic web stack with common low-level standards like URI, XML, and RDF at the bottom of the stack. The middle layer includes standards for querying (SPARQL) and standards for rules (RIF/SWRL). At the top of the stack are user interface and application layers above abstract layers of logic, proof, and trust building.


SEMANTIC WEB?

A DATA MODEL :

➢DATA STRUCTURES

➢URI (Uniform Resource Identifier)

➢Unique Format : RDF (Resource Description Framework)

➢A schema : RDFS

➢ONTOLOGY = SCHEMA (RDFS) + INSTANCES

➢TWO manipulation languages :

➢SPARQL

➢OWL (Web Ontology Language)


RDF (Resource Description Framework)

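➢Defined by W3C (W3C Recommendation since 1999, revised in 2004; SPARQL, its query language, became a W3C Recommendation on January 15th, 2008)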

➢Derived from XML

➢URI for resource identification

➢Web page (identified by URL)

➢Web Service

➢XML document fragment

➢Any object (even physical) having collected DATA


DATA in RDF

RDF triples to describe WEB resources

(:Serge :insureFLIGHT :AF100)

(:Peter :insureFLIGHT :AF110)

(:AIRBUSA320 :is-used-inFLIGHT :AF100)

(:Paul :is-passenger-inFLIGHT :AF100) …

Note :

An RDF triple <S, P, O>

- is a fact in first-order predicate logic

- is read P(S, O) with P the Predicate, S the Subject and O the Object

Example : INSUREFLIGHT (Serge, AF100)
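The triples above can be stored as a plain set of (S, P, O) tuples and queried by pattern. This is only a sketch of the idea, not an RDF library; the functions `match` and `holds` are hypothetical, and `None` stands for a free variable.

```python
TRIPLES = {
    (":Serge", ":insureFLIGHT", ":AF100"),
    (":Peter", ":insureFLIGHT", ":AF110"),
    (":AIRBUSA320", ":is-used-inFLIGHT", ":AF100"),
    (":Paul", ":is-passenger-inFLIGHT", ":AF100"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None leaves a position free."""
    return {t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

def holds(p, s, o) -> bool:
    """Read the triple <S, P, O> as the fact P(S, O)."""
    return (s, p, o) in TRIPLES

about_af100 = match(o=":AF100")                          # everything said about AF100
pilots = {s for s, _, _ in match(p=":insureFLIGHT")}     # subjects of :insureFLIGHT
print(len(about_af100), sorted(pilots))
```

A pattern with free positions is the basic building block that a SPARQL WHERE clause combines.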


RDF graph (Example)

Figure : RDF graph with nodes :Serge, :AF100, :AIRBUSA320 and :Paul, and edges labelled :insureflight, :isusedinflight, :ispassengerinflight and :drivesplane


SPARQL (queries on RDF graphs)

PREFIX dc: <http://purl.org/dc/elements/1.1/>

Notation : <URI> ; « ? » marks a free variable ; FROM names the data source (RDF graph)

SELECT ?X

FROM <RDF graph name>

WHERE { <http://../../> dc:Y ?X }

(the WHERE clause contains a list of triple patterns)


SPARQL (Example)

Who are the pilots from Nice flying a plane that departs from Nice ?

PREFIX rdf: <http:// www….>

SELECT ?PILOT

WHERE { GRAPH ?g

{ ?PILOT rdf:ADDR rdf:Nice .

?PILOT rdf:insureFLIGHT ?FLIGHT .

?FLIGHT rdf:DC rdf:Nice }}


SPARQL engine


Research : bridge between SPARQL and SQL

➢* « An effective SPARQL support over relational DB » (IBM China), VLDB Vienna, Austria, Sept 2007, LNCS 5005, Springer Verlag

➢See also SPASQL (SQL extension to handle SPARQL subqueries)


Bottom up Approach of Big data Management with N.O.SQL and NEW SQL : see LECTURE 7

Figure (DATA PROCESSING : SQL vs NoSQL, against DATA STRUCTURE) :

➢Complex structured data, Top Down (schema) : OR-DBMS (SQL3), OO-DBMS (ODMG)

➢Complex unstructured data, Bottom Up (no schema; no metadata) : N.O. SQL, NEW SQL

Page 60: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

SQL evolution (a.c.) <See Lectures 4, 6 and 7>

Page 61: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Introduction to Codd’s relational data model

(value paradigm)

Professor Serge Miranda

Page 62: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Dr Edgar « Ted » CODD (1923-2003)

➢ TURING AWARD in 1981

➢ Codd’s relational data model (1969)

August 19, 1969, IBM Research Report RJ599, « Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks »

➢ 1970 : « A Relational Model of Data for Large Shared Data Banks », CACM 13, No. 6, June 1970, pp. 377-387

➢ Codd’s theorem on relational programming language (1971)

➢ E. F. Codd, « Relational completeness of data base sublanguages », in R. Rustin (ed.), Data Base Systems, Proceedings of 6th Courant Computer Science Symposium (May 24-25, 1971 : New York, N.Y.), pp. 65-98, Prentice-Hall, 1972

➢ Note : « B.C. » in data management means « Before Codd » ☺

Ted CODD in Sophia Antipolis, 1986 (with Gilles Taladoire and Serge Miranda) for a demo of a relational DBMS on PC (CAMPUS)

Page 63: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

VALUE paradigm in Codd’s relational model and 2 phases of data STRUCTURING

Page 64: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Phase 1 of STRUCTURING* in CODD’s model with the SET construct

DOMAIN = SET of VALUES

VALUES : 100, 500, NICE, NEW YORK, B747, A320, JOHN, PETER, JOEL, SERGE

DOMAINS :
PILNAME : {JOHN, PETER, SERGE}
PNAME : {B747, A300, A320}
CITY : {NICE, NEW YORK}
….

* NOTE : corresponding to the « S » of SQL (Structured Query Language)

Page 65: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Phase 2 of STRUCTURING with the TUPLE construct (cartesian product)

PLANE   P#    PNAME   CAP   LOC
        100   A300    200   Paris
        101   A320    250   New York

<columns : DOMAINS ; line : TUPLE>

A RELATION is a SET of tuples (which could be VISUALIZED by TABLES of values)

A relation is a SUBSET of the cartesian product of N domains (sets)

Page 66: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Codd’s relational data models and SQL standards

Codd’s relational data models :
V1 (1970)
RM-T (1980) and V2/V3 (1990)

SQL standards :
SEQUEL (1975 and 1982)
SQL1 (1989)
SQL2 <relational> (1992)
SQL3 <object relational> (1999)
SQL4, SQL5, SQL6, SQL7 2020..

Time line : « b.c. »* before 1970, « a.c. »* after 1970

* b.c. (a.c.) : before (after) Codd

Page 67: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Data structures of Codd’s relational data model

Page 68: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

(Codd’s) Relational data model : RELATION, RELATIONAL ALGEBRA and FUNCTIONS

➢ RELATION : kernel mathematical concept with a double formal basis : SETS or PREDICATES

➔ Double underlying formal theory for RELATIONS :

➢ SET THEORY

➢ 1st order predicate logic

➔ Double family of RELATIONAL languages : SET-oriented (relational algebra) or PREDICATE-oriented (relational calculus like Codd’s ALPHA)

➢ No need for programming iteration

➢ No need for procedures (NON-PROCEDURAL interfaces)

➢ NO programming in relational data management

Page 69: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

1) RELATION : SET

RELATION : subset of the cartesian product of N domains.

At a given time, a relation consists of a set of tuples which COULD BE represented by tables in a very SIMPLE way

Examples :

PILOT ⊆ PILNO x PILNAME x CITY

PLANE ⊆ PNO x PNAME x CAP x LOC

PLANE tuples : {(10, Airbus, 320, Nice), (11, B747, 300, NEW YORK), …}
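As a minimal sketch of the set view, the Python fragment below builds the cartesian product of four small domains and checks that a PLANE relation is a subset of it. The domain contents are illustrative assumptions (only the two PLANE tuples come from the slide).

```python
from itertools import product

# Domains are plain sets of values (illustrative contents).
PNO = {10, 11}
PNAME = {"Airbus", "B747"}
CAP = {300, 320}
LOC = {"Nice", "NEW YORK"}

# The cartesian product associates every value with every other value.
cartesian = set(product(PNO, PNAME, CAP, LOC))

# The relation keeps only the tuples that hold at a given time.
PLANE = {(10, "Airbus", 320, "Nice"), (11, "B747", 300, "NEW YORK")}

print(len(cartesian))      # 2 * 2 * 2 * 2 = 16 candidate tuples
print(PLANE <= cartesian)  # True : PLANE is a subset of the product
```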

Page 70: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

DATA representation in RELATIONS

➢ SIMPLE data representation : TABLES of VALUES with lines (TUPLES) and COLUMNS (attributes)

➢ All DATA are represented in TABLES <VALUE PARADIGM>

PILOT   PIL#   PILNAME   ADDR
        100    Serge     Nice
        101    Peter     New York
        102    Joel      Dublin

Page 71: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

2) RELATION = PREDICATE

➢ RELATION = PREDICATE with N variables

➢ EX : « The PLANE numbered P# has a name PNAME with capacity CAP and is located in LOC »,

written : PLANE (P#, PNAME, CAP, LOC)

➢ PROPOSITION = TUPLE (with TRUE value)

➢ EX : « The PLANE numbered 10 is an AIRBUS A320 with a capacity of 320 seats and is located in Paris »

TUPLE : TRUE-valued proposition written : PLANE (10, AIRBUS, 320, Paris)
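The predicate view can be sketched in Python as a membership test : the predicate is TRUE exactly for the stored tuples. The function name plane is a hypothetical helper, and reading an absent tuple as FALSE is the usual closed-world reading of a relation.

```python
# The relation is the set of tuples for which the predicate is TRUE.
PLANE = {(10, "AIRBUS", 320, "Paris")}

def plane(pno, pname, cap, loc):
    """Predicate PLANE (P#, PNAME, CAP, LOC) : True exactly for the stored tuples."""
    return (pno, pname, cap, loc) in PLANE

print(plane(10, "AIRBUS", 320, "Paris"))  # True : a stored tuple is a true proposition
print(plane(10, "AIRBUS", 320, "Nice"))   # False : not stored, so read as false
```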

Page 72: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

RELATIONAL ALGEBRA (Codd’s)

➢ SET OPERATORS : Union, Intersection, Difference

➢ RELATIONAL OPERATORS

➢ UNARY operators : SELECTION, PROJECTION

➢ BINARY operators : JOIN (existential quantifier) and DIVISION (universal quantifier)

➢ JOIN and DIVISION are defined from the cartesian product operator (which, on its own, has no semantic meaning)

➢ The JOIN operator was a truly disruptive contribution by Codd to data management !

➢ RELATIONAL ALGEBRA : set of operators with three fundamental properties of a (good programming) language : CLOSURE, COMPLETENESS and ORTHOGONALITY, along with CODD’s theorem

➢ We can infer any information from a relational data base with the operators of the relational algebra

➢ « Query languages that are equivalent in expressive power to relational algebra are called relationally complete » (CODD)

Page 73: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Codd’s theorem (1971) and its consequences !

➢ Codd's theorem states that relational algebra and (domain-independent) relational calculus are equivalent in expressive power : a query on a relational data base can be expressed in relational calculus if and only if it can be expressed in relational algebra

➢ Relational calculus (Codd’s ALPHA language) with variables and quantifiers : declarative query

➢ EXAMPLE : Which pilots operate a flight from Nice ?

ALPHA language : { p ∈ PILOT | ∃ v ∈ FLIGHT ( p.PIL# = v.PIL# and v.DC = ‘Nice’ ) }

➢ Relational algebra, which is variable free : imperative query

➢ same EXAMPLE :

V1 := JOIN PILOT (PIL# = PIL#) FLIGHT

V2 := SELECT V1 (DC = ‘Nice’)

RES := PROJECT V2 (PIL#, PILNAME)
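To make the equivalence concrete on this example, the Python sketch below writes the query twice : once in a calculus-like style (a comprehension with a variable and an existential test) and once as a sequence of algebra steps. The sample tuples are illustrative assumptions, and Python is only a stand-in for ALPHA and for the algebra.

```python
# Tuples are dicts keyed by attribute name (hypothetical sample data).
PILOT = [
    {"PIL#": 100, "PILNAME": "Serge", "ADDR": "Nice"},
    {"PIL#": 101, "PILNAME": "Peter", "ADDR": "New York"},
]
FLIGHT = [
    {"FLIGHT#": 1, "PIL#": 100, "DC": "Nice"},
    {"FLIGHT#": 2, "PIL#": 101, "DC": "Paris"},
]

# Calculus style : describe the wanted tuples with a variable p and "there exists v".
calculus = {
    (p["PIL#"], p["PILNAME"])
    for p in PILOT
    if any(p["PIL#"] == v["PIL#"] and v["DC"] == "Nice" for v in FLIGHT)
}

# Algebra style : a sequence of operators, each producing a new relation.
v1 = [{**p, **f} for p in PILOT for f in FLIGHT if p["PIL#"] == f["PIL#"]]  # JOIN
v2 = [t for t in v1 if t["DC"] == "Nice"]                                   # SELECT
algebra = {(t["PIL#"], t["PILNAME"]) for t in v2}                           # PROJECT

print(calculus == algebra)  # True
print(algebra)              # {(100, 'Serge')}
```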

Page 74: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Codd’s theorem and its consequences : SQL success !

➢ The algebra operators could be EASILY implemented (with formal properties to build an optimization strategy)

➔ Underlying implementation of the SQL standard !

➔ SUCCESS of the SQL (Structured Query Language) standard*

Same example with SQL :

SELECT PILOT.PIL#, PILNAME
FROM PILOT, FLIGHT
WHERE PILOT.PIL# = FLIGHT.PIL# AND DC = ‘Nice’;

* « SQL will be the language of the future of data management (object and big data included) »
Mike Stonebraker (TURING AWARD 2014)
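The SQL query can be tried with Python's standard sqlite3 module, as in the sketch below. The table contents are illustrative assumptions, and the column names containing « # » are double-quoted because SQLite does not accept that character in an unquoted identifier.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript('''
    CREATE TABLE PILOT  ("PIL#" INTEGER PRIMARY KEY, PILNAME TEXT, ADDR TEXT);
    CREATE TABLE FLIGHT ("FLIGHT#" INTEGER PRIMARY KEY, "PIL#" INTEGER, DC TEXT);
    INSERT INTO PILOT  VALUES (100, 'Serge', 'Nice'), (101, 'Peter', 'New York');
    INSERT INTO FLIGHT VALUES (1, 100, 'Nice'), (2, 101, 'Paris');
''')

# PIL# exists in both tables, so it is qualified in the SELECT list.
rows = con.execute('''
    SELECT PILOT."PIL#", PILNAME
    FROM PILOT, FLIGHT
    WHERE PILOT."PIL#" = FLIGHT."PIL#" AND DC = 'Nice'
''').fetchall()
print(rows)  # [(100, 'Serge')]
```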

Page 75: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

FUNCTIONS and disruptive data normalization (Codd)

➢ FUNCTIONS :

Mathematical FUNCTIONS correspond to functional dependencies (Codd), which are the central concept to define a good (normalized) relational schema

➢ A function between two groups of attributes A and B is denoted « ➔ » :

A ➔ B <with ➔ : N:1 function, A : determinant and B : determined>

➢ Three formal properties for functional dependencies : IDENTITY (reflexivity), TRANSITIVITY and DETERMINANT AUGMENTATION
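A functional dependency A ➔ B says that each value of A has a single image in B. The sketch below tests this on a given set of tuples; the function name fd_holds and the sample data are assumptions. Such a test can only show that the current tuples do not violate the dependency, since the dependency itself is a property of the schema.

```python
def fd_holds(relation, determinant, determined):
    """Return True if determinant -> determined is not violated in relation (a list of dicts)."""
    seen = {}
    for t in relation:
        left = tuple(t[a] for a in determinant)
        right = tuple(t[a] for a in determined)
        # setdefault stores the first image of `left` and returns it on later visits
        if seen.setdefault(left, right) != right:
            return False  # same determinant value, two different images : not a function
    return True

# Illustrative tuples (hypothetical data)
FLIGHT = [
    {"F#": 100, "PIL#": 1, "P#": 10},
    {"F#": 105, "PIL#": 1, "P#": 10},
    {"F#": 102, "PIL#": 3, "P#": 10},
]

print(fd_holds(FLIGHT, ["PIL#"], ["P#"]))  # True
print(fd_holds(FLIGHT, ["P#"], ["PIL#"]))  # False : plane 10 maps to pilots 1 and 3
```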

Page 76: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

FUNCTIONS and NORMAL FORMS (NF) for relations (see also associated seminar)

➢ NF2 (Non First Normal Form) towards 3NF (Third Normal Form)

➢ In a NF2 relation (unnormalized), an attribute could be either multivalued (SET construct) or a relation (TUPLE construct) or SET x TUPLE valued

➢ From NF2 to 1NF (1st Normal Form) by creating new relations (or attributes) to get single-valued attributes (TUPLES)

➢ From 1NF to 3NF with FUNCTIONS (N:1 links)

➢ From 3NF to 5NF with N:M (multivalued) links <end of the decomposition process>

➢ Storage anomalies (Codd) for INSERT, UPDATE, DELETE operations (the 3 storage operators) with relations which are not in 3NF

Page 77: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

3NF Normalization (Codd) (see also associated seminar)

➢ SHARMAN definition of 3NF relations (simplest definition)

➢ A relation is in 3NF if every determinant of a N:1 link (function) is a primary key (strictly speaking, this condition on determinants characterizes BCNF, which implies 3NF)

➢ We can decompose a relation into 3NF relations with the following theorem from Casey & Delobel

➢ Decomposition theorem (from Casey & Delobel) :

Let R (A, B, C) with B ➔ C

then R could be decomposed without loss into R1 (A, B) and R2 (B, C)
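« Without loss » means that joining R1 and R2 on B gives back exactly R. The sketch below checks this on a small relation that satisfies B ➔ C; the data and the helper name project are illustrative assumptions.

```python
def project(rel, positions):
    """Projection on the given column positions, without duplicates."""
    return {tuple(t[i] for i in positions) for t in rel}

# R (A, B, C) with B -> C : "x" always goes with 10, "y" with 20 (illustrative data)
R = {(1, "x", 10), (2, "x", 10), (3, "y", 20)}

R1 = project(R, (0, 1))  # R1 (A, B)
R2 = project(R, (1, 2))  # R2 (B, C)

# Natural join of R1 and R2 on B
rejoined = {(a, b, c) for (a, b) in R1 for (b2, c) in R2 if b == b2}

print(rejoined == R)  # True : the decomposition is reversible
```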

Page 78: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

EXERCISE : Relational schema in 3NF (2NF, BCNF)

➢ Example : FLIGHT (F#, Day, PIL#, P#, Flight-type)

➢ Initialize the corresponding table with some tuples and integrate the functional dependencies below

➢ Show logical redundancy, storage anomalies and connection trap on this example for each function below

➢ Normalize (3NF) with CASEY/DELOBEL’s theorem

1) F# → Flight-type <2NF Normalization (partial dependency)>

2) PIL# → P# <3NF Normalization (transitive dependency)>

3) PIL# → Day <BCNF (Boyce-Codd Normal Form)>

Page 79: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Example : FLIGHT table

FLIGHT   F#    DAY      PIL#   P#   Flight-type
         100   MONDAY   1      10   BLUE
         105   MONDAY   1      10   RED
         101   WED      2      11   BLUE
         102   THURS    3      10   WHITE
         102   FRIDAY   4      11   WHITE

For the following function F# ➔ FLIGHT-TYPE * :

- Logical redundancy : the pair (102, WHITE) is redundant (it is repeated as many times as there is a flight 102)
- Insertion anomaly : we cannot enter the pair (106, RED) until a primary key value (106, DAY?) exists
- Update anomaly : if we change the flight-type of flight (102, THURS), we must also change it for (102, FRIDAY)
- Delete anomaly : if we delete the tuple (101, WED), we lose the unique pair (101, BLUE)
- Example of a connection trap : the decomposition of the FLIGHT table into the two following tables is not reversible : F1 (F#, DAY, PIL#) and F2 (PIL#, P#, Flight-type)

* NOTE : valid table also for PIL# → P# and PIL# → DAY
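The connection trap can be checked directly on the five tuples of the FLIGHT table. In this Python sketch (plain sets of tuples, no DBMS involved), the projections F1 and F2 are joined again on PIL# and the result is compared with the original table.

```python
# FLIGHT (F#, DAY, PIL#, P#, Flight-type), tuples taken from the table above
FLIGHT = {
    (100, "MONDAY", 1, 10, "BLUE"),
    (105, "MONDAY", 1, 10, "RED"),
    (101, "WED", 2, 11, "BLUE"),
    (102, "THURS", 3, 10, "WHITE"),
    (102, "FRIDAY", 4, 11, "WHITE"),
}

# Projections F1 (F#, DAY, PIL#) and F2 (PIL#, P#, Flight-type)
F1 = {(f, d, pil) for (f, d, pil, p, ft) in FLIGHT}
F2 = {(pil, p, ft) for (f, d, pil, p, ft) in FLIGHT}

# Natural join of F1 and F2 on PIL#
rejoined = {(f, d, pil, p, ft) for (f, d, pil) in F1 for (pil2, p, ft) in F2 if pil == pil2}

spurious = rejoined - FLIGHT
print(len(FLIGHT), len(rejoined))  # 5 7
print(sorted(spurious))            # tuples that were never in FLIGHT
```

The two extra tuples swap the flight-types of flights 100 and 105, because PIL# does not determine Flight-type : the join cannot tell which flight of pilot 1 was BLUE and which was RED.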

Page 80: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Solution : Decomposition in 3NF of the following relation FLIGHT (F#, DAY, PIL#, P#, FLIGHT-TYPE)

1) F# → FLIGHT-TYPE <2NF Normalization (partial dependency)>

Decomposition into F1 (F#, DAY, PIL#, P#) <2NF> and F2 (F#, FLIGHT-TYPE)

2) PIL# → P# <3NF Normalization (transitive dependency)>

Decomposition of F1 into F11 (PIL#, P#) and F12 (F#, DAY, PIL#)

3) PIL# → DAY <BCNF (Boyce-Codd Normal Form)>

Decomposition of F12 into F121 (PIL#, DAY) and F122 (F#, PIL#)

Page 81: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Domains and Attributes ?

➢ ISSUE : The same domain could be used several times in the definition of a given relation

Example : FLIGHT ⊆ FLIGHTNO x CITY x CITY x TIME x TIME

➢ ATTRIBUTE : Role played by a domain within a relation

➢ Example : Departure City (DC) for the first CITY value.

FLIGHT (FLIGHT# : FLIGHTNO) x (DC : CITY) x (AC : CITY) x (DT : TIME) x (AT : TIME)

Page 82: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

V1 relational data model by Ted Codd (8/19/1969) and TABLE representation

DOMAIN

CITY : {Nice, Paris, DUBLIN, New York}

PILOT   PILNO   PILNAME   ADDRESS
        100     Serge     Nice
        101     JOEL      DUBLIN
        102     PETER     NEW YORK

Line = TUPLE
COLUMN = ATTRIBUTE

Page 83: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

DOMAIN : a « semantic data type » (Codd)

DOMAINS : FLIGHT ⊆ FLIGHTNO x CITY x CITY x TIME x TIME

ATTRIBUTES : FLIGHT (FLIGHT#, DC, AC, DT, AT)

Page 84: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

DOMAIN : « semantic data type » (Codd)

Domain PILNAME for PILOTs

Domain PNAME for PLANEs

Same syntactic data type : X(12)
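One way to picture a semantic data type in a programming language is a distinct type name over the same underlying representation. The sketch below uses Python's typing.NewType as an analogy only; it is not part of Codd's model, and same_pilot is a hypothetical helper.

```python
from typing import NewType

# Two domains with the same syntactic type (a character string)...
PilName = NewType("PilName", str)
PName = NewType("PName", str)

def same_pilot(a: PilName, b: PilName) -> bool:
    """Comparison that is only meaningful between two pilot names."""
    return a == b

serge = PilName("Serge")
a300 = PName("A300")

# ...at run time both values are plain strings,
print(isinstance(serge, str), isinstance(a300, str))  # True True
# but a static type checker such as mypy would reject same_pilot(serge, a300).
print(same_pilot(serge, PilName("Serge")))            # True
```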

Page 85: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

ATTRIBUTE (Codd)

➢ An attribute corresponds to the role played by a domain in a relation

➢ Attributes are unique in a given relation

➢ Examples :

➢ Attributes DC and AC for the domain CITY in the relation FLIGHT

➢ Attribute LOC for the domain CITY in the PLANE relation

➢ Attribute ADDR for the domain CITY in the PILOT relation

➢ For every attribute, we need to indicate its corresponding domain

➢ Example : PLANE relation

(P# : PNO, PNAME : PNAME, LOC : CITY)

with pairs : < attribute : domain >

Page 86: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

PRIMARY KEY

➢ A relation is a SET of tuples (uniqueness of elements is a SET property)

➔ Every TUPLE should be DISTINCT

➔ The part of the tuple which guarantees this uniqueness corresponds to the PRIMARY KEY

➢ The primary key is a minimal subset of the attributes of the relation which distinguishes the existing tuples

➔ THE VALUE of the primary key is UNIQUE and NOT NULL*

➢ * NULL value (SQL) corresponds to an unknown value

Page 87: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

FOREIGN KEY

➢FOREIGN KEY : Ordinary attribute in one relation which corresponds to a primary key in another relation.

➢Example :

FLIGHT (FLIGHT# : FLIGHTNO, PIL# : PILNO, P# : PNO, …)

PIL# and P# are foreign keys (primary keys in the relations PILOT and PLANE)

Page 88: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Codd’s relational SCHEMA in two phases
Phase 1 : DOMAIN creation

PHASE 1) DOMAIN creation with CREATE DOMAIN

CREATE DOMAIN FNO NUMERIC (6) PRIMARY*
CREATE DOMAIN PNO NUMERIC (6) PRIMARY*
CREATE DOMAIN PILNO NUMERIC (6) PRIMARY*
CREATE DOMAIN PILNAME CHARACTER (20)
CREATE DOMAIN PNAME CHARACTER (6)
CREATE DOMAIN CAPACITY NUMERIC (3)
CREATE DOMAIN CITY CHARACTER (10)
CREATE DOMAIN TIME NUMERIC (4)

* PRIMARY DOMAIN : DOMAIN on which a primary key is defined (Chris DATE) ; an alternative way to declare foreign keys.

Page 89: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Codd’s relational SCHEMA in two phases
Phase 2 : RELATION creation

➢ PHASE 2 : RELATION creation : CREATE RELATION/TABLE with attributes defined upon the previous domains

➢ CREATE RELATION : PLANE
    P# PRIMARY KEY : <defined upon> PNO
    PNAME : PNAME
    CAP : CAPACITY
    LOC : CITY

➢ CREATE RELATION : PILOT
    PIL# PRIMARY KEY : PILNO
    PILNAME : PILNAME
    ADDR : CITY

➢ CREATE RELATION : FLIGHT
    FLIGHT# PRIMARY KEY : FNO
    PIL# : PILNO*
    P# : PNO*
    DC : CITY
    AC : CITY
    DT : TIME
    AT : TIME

* Those ordinary attributes in FLIGHT are defined over PRIMARY DOMAINS ➔ they are foreign keys

Page 90: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

INTEGRITY CONSTRAINTS in relational data structures

INTEGRITY CONSTRAINTS concern data base storage operators : INSERT, UPDATE, DELETE

DOMAIN ➔ Type integrity

PRIMARY KEY ➔ Entity integrity
• unique NOT-NULL values

FOREIGN KEY ➔ Referential integrity
• Existence constraint
• Reference constraint (SQL) : SET NULL, SET DEFAULT, CASCADE (propagation) & RESTRICT (forbidden)
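Entity and referential integrity can be observed with Python's standard sqlite3 module, as sketched below. The tables are a reduced, illustrative version of PILOT and FLIGHT, rejected is a hypothetical helper, and SQLite is only one possible engine (it has no CREATE DOMAIN and enforces foreign keys only after PRAGMA foreign_keys = ON).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
con.executescript('''
    CREATE TABLE PILOT (
        "PIL#" INTEGER PRIMARY KEY NOT NULL,
        PILNAME TEXT, ADDR TEXT);
    CREATE TABLE FLIGHT (
        "FLIGHT#" INTEGER PRIMARY KEY NOT NULL,
        "PIL#" INTEGER REFERENCES PILOT("PIL#") ON DELETE RESTRICT,
        DC TEXT);
    INSERT INTO PILOT VALUES (100, 'Serge', 'Nice');
    INSERT INTO FLIGHT VALUES (1, 100, 'Nice');
''')

def rejected(sql):
    """Return True if the storage operation violates an integrity constraint."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO PILOT VALUES (100, 'Peter', 'Dublin')")  # entity integrity
missing_pilot = rejected("INSERT INTO FLIGHT VALUES (2, 999, 'Paris')")       # existence constraint
restricted = rejected('DELETE FROM PILOT WHERE "PIL#" = 100')                 # RESTRICT
print(duplicate_key, missing_pilot, restricted)  # True True True
```

Replacing ON DELETE RESTRICT with ON DELETE CASCADE would instead propagate the deletion of pilot 100 to the flights that reference it.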

Page 91: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

EXERCISE : Relational schema (CODD)

➢ With the verbs CREATE DOMAIN and CREATE RELATION, let us build Codd’s schema for the following example :

STUDENT (S#, SNAME, ADDRESS)

PROFESSOR (P#, PNAME, PADDR)

COURSE (C#, CTITLE, Degree)

SCHEDULE (S#, P#, C#, QUARTER, YEAR, GRADE)

➢ Illustrate the three integrity constraints on the SCHEDULE relation

➢ Give two possibilities to declare a foreign key

Page 92: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

RELATIONAL ALGEBRA (CODD)

➢Three key properties of a good (relational) language like the RELATIONAL ALGEBRA :

➢CLOSURE

➢COMPLETENESS

➢ORTHOGONALITY

➢NOTE : SQL standard is neither closed, nor orthogonal nor complete !

Page 93: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Relational algebra (Codd)

Algebraic relational operators :

SET operators : Union, Intersection, Difference, (cartesian product)

Restriction (unary) : PROJECTION (vertical splitting), SELECTION (horizontal splitting)

Extension (binary) : JOIN, DIVISION

Page 94: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Generic SET operators : UNION (OR), INTERSECTION (AND), DIFFERENCE (NOT)

Let us consider two relations R and S (defined on the same domains)

➢ UNION : R ∪ S = { t | t ∈ R OR t ∈ S }

➢ INTERSECTION : R ∩ S = { t | t ∈ R AND t ∈ S }

➢ DIFFERENCE : R − S = { t | t ∈ R AND NOT (t ∈ S) }
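Since relations are sets of tuples, the three generic operators map directly onto Python's built-in set operators. The two small relations below are illustrative assumptions.

```python
# Two relations over the same attributes (P#, PNAME); hypothetical data
R = {(1, "A300"), (2, "B727")}
S = {(2, "B727"), (3, "B747")}

union = R | S         # tuples in R OR in S
intersection = R & S  # tuples in R AND in S
difference = R - S    # tuples in R and NOT in S

print(sorted(union))         # [(1, 'A300'), (2, 'B727'), (3, 'B747')]
print(sorted(intersection))  # [(2, 'B727')]
print(sorted(difference))    # [(1, 'A300')]
```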

Page 95: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

CARTESIAN PRODUCT

➢ The cartesian product has no semantic meaning on its own

We associate everything with everything !

➢ Two particular cases of the cartesian product proposed by CODD (corresponding to the existential and universal quantifiers in predicate calculus) :

➢ JOIN

➢ DIVISION

Page 96: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

PROJECTION (guess ☺)

PLANE   P#   PNAME   CAP   LOC
        1    A300    300   NICE
        2    B727    250   NICE
        3    B747    350   Paris
        4    B747    350   DUBLIN
        5    A380    380   NEW YORK

A1 = PROJECT PLANE (PNAME)

A1   PNAME
     A300
     B727
     B747
     A380

Page 97: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

PROJECTION (Definition)

DEFINITION : Selection of the attributes referenced in the target list (A), with selection of the corresponding values without duplicates

NOTATION : PROJECT R (ATT1, … ATTi)
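A projection can be sketched in Python as a set comprehension over the target attributes; building a set is what removes the duplicates. The function name project is an assumption, and the tuples are those of the PLANE table used in the « guess » slide.

```python
PLANE = [
    {"P#": 1, "PNAME": "A300", "CAP": 300, "LOC": "NICE"},
    {"P#": 2, "PNAME": "B727", "CAP": 250, "LOC": "NICE"},
    {"P#": 3, "PNAME": "B747", "CAP": 350, "LOC": "Paris"},
    {"P#": 4, "PNAME": "B747", "CAP": 350, "LOC": "DUBLIN"},
    {"P#": 5, "PNAME": "A380", "CAP": 380, "LOC": "NEW YORK"},
]

def project(relation, attributes):
    """Keep only the target attributes; the set drops duplicate tuples."""
    return {tuple(t[a] for a in attributes) for t in relation}

A1 = project(PLANE, ["PNAME"])
print(sorted(A1))  # [('A300',), ('A380',), ('B727',), ('B747',)]
```

Five tuples go in and four come out : the two B747 planes give a single value in A1.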

Page 98: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

PROJECTION (Example)

➢What are the departure cities of the company flights ?

➢RES = PROJECT FLIGHT (DC)

Page 99: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

SELECTION (guess ☺)

PLANE   P#   PNAME   CAP   LOC
        1    A300    300   NICE
        2    B727    250   NICE
        3    B747    350   Paris
        4    B747    350   DUBLIN
        5    A380    380   NEW YORK

A2 = SELECT PLANE (PNAME = ‘B747’)

Page 100: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

SELECTION (Definition)

➢ Notation : « SELECT R (F) », where F is a boolean condition

➢ Definition : Set of tuples of R satisfying the condition F

Page 101: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

SELECTION (Example)

➢ What are the names of planes located in Nice ?

A1 = SELECT PLANE (LOC = ‘NICE’)

RES = PROJECT A1 (PNAME)

Note : the query can be written : RES = PROJECT (SELECT PLANE (LOC = ‘NICE’)) (PNAME)
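The two steps of this example can be sketched in Python with two small helpers, select (horizontal splitting) and project (vertical splitting). Both names and the dictionary representation of tuples are assumptions; the data is the PLANE table of the « guess » slide.

```python
PLANE = [
    {"P#": 1, "PNAME": "A300", "CAP": 300, "LOC": "NICE"},
    {"P#": 2, "PNAME": "B727", "CAP": 250, "LOC": "NICE"},
    {"P#": 3, "PNAME": "B747", "CAP": 350, "LOC": "Paris"},
    {"P#": 4, "PNAME": "B747", "CAP": 350, "LOC": "DUBLIN"},
    {"P#": 5, "PNAME": "A380", "CAP": 380, "LOC": "NEW YORK"},
]

def select(relation, condition):
    """Keep the tuples satisfying the boolean condition."""
    return [t for t in relation if condition(t)]

def project(relation, attributes):
    """Keep the target attributes, without duplicates."""
    return {tuple(t[a] for a in attributes) for t in relation}

A1 = select(PLANE, lambda t: t["LOC"] == "NICE")
RES = project(A1, ["PNAME"])
print(sorted(RES))  # [('A300',), ('B727',)]

# Closure : the result of one operator is a relation, so the calls can be nested.
nested = project(select(PLANE, lambda t: t["LOC"] == "NICE"), ["PNAME"])
print(nested == RES)  # True
```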

Page 102: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

JOIN (Guess ☺)

Join R1 (A=A) R2

Page 103: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

JOIN (Definition)

➢ DEFINITION : Join of two relations R and S on two attributes defined on the same domain, ATT1 = ATT2* :

➢ Cartesian product followed by a SELECTION (ATT1 = ATT2)

➢ Attribute equality : EQUI JOIN

➢ THETA JOIN possible with <, >, <=, >=

➢ NOTATION : JOIN R [ATT1 = ATT2] S

* ATTi : attribute ; ATT2 is defined upon the same domain as ATT1

Page 104: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

JOIN (Example)

➢ What are the names of the pilots who operate a flight from Nice ?

➢ PV = JOIN PILOT (PIL# = PIL#) FLIGHT

➢ PV1 = SELECT PV (DC = ‘Nice’)

➢ RES = PROJECT PV1 (PILNAME)
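Following the definition of the previous slide, an equi-join can be sketched in Python as a cartesian product followed by a selection on attribute equality. The function name join and the sample tuples are assumptions; a real DBMS would use an index or a hash table rather than the full product.

```python
from itertools import product

PILOT = [
    {"PIL#": 100, "PILNAME": "Serge", "ADDR": "Nice"},
    {"PIL#": 101, "PILNAME": "Peter", "ADDR": "New York"},
]
FLIGHT = [
    {"FLIGHT#": 1, "PIL#": 100, "DC": "Nice"},
    {"FLIGHT#": 2, "PIL#": 101, "DC": "Paris"},
    {"FLIGHT#": 3, "PIL#": 100, "DC": "Paris"},
]

def join(r, att1, s, att2):
    """Equi-join : cartesian product of r and s, then selection att1 = att2."""
    return [{**t, **u} for t, u in product(r, s) if t[att1] == u[att2]]

PV = join(PILOT, "PIL#", FLIGHT, "PIL#")
PV1 = [t for t in PV if t["DC"] == "Nice"]
RES = {t["PILNAME"] for t in PV1}

print(len(PILOT) * len(FLIGHT), len(PV))  # 6 pairs in the product, 3 kept by the join
print(RES)                                # {'Serge'}
```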

Page 105: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Division (Guess)

Division : R1 / R2 (R1 : dividend, R2 : divisor)

Page 106: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

DIVISION (Definition)

DEFINITION

➢ Simple division to get started : binary dividend and UNARY (« all ») divisor with a common attribute (defined on the same domain)

➢ R (att1, att2) / S (att3) with

➢ att2 and att3 defined on the same domain

➢ att1 : the target attribute

➢ att3 : representing « ALL…. »

➢ Particular case of cartesian product : « Complement of the divisor in the dividend (the att1 values) whose cartesian product with the divisor is included in the dividend ! ☺ »

Page 107: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

DIVISION (Example) : What are the names of the PILOTs who fly EVERY Airbus A300 ?

1. Start with the Divisor (typically a unary relation ..) ➔ « Every.. » / « ALL… »

<here : « every plane (P# primary key) corresponding to an A300 »> :

A1 = SELECT PLANE (PNAME = ‘A300’)

DS = PROJECT A1 (P#)

2. Then build the Dividend (typically a binary relation with the target attribute and the « same » attribute as the divisor) ➔ <here (PILNAME, P#)> :

PF1 = JOIN PILOT (PIL# = PIL#) FLIGHT

DD = PROJECT PF1 (PILNAME, P#)

RESULT = DD / DS
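The division steps above can be sketched in Python on plain sets of tuples. The sample data and the function name divide are assumptions; the function keeps the target values that are paired with EVERY value of the divisor.

```python
# Reduced relations with hypothetical tuples
PLANE = [(1, "A300"), (2, "A300"), (3, "B747")]    # (P#, PNAME)
PILOT = [(100, "Serge"), (101, "Peter")]           # (PIL#, PILNAME)
FLIGHT = [(100, 1), (100, 2), (101, 1), (101, 3)]  # (PIL#, P#)

# 1. Divisor : every plane number corresponding to an A300
DS = {p for (p, pname) in PLANE if pname == "A300"}

# 2. Dividend : (PILNAME, P#) pairs obtained by joining PILOT and FLIGHT on PIL#
DD = {(name, p) for (pil, name) in PILOT for (pil2, p) in FLIGHT if pil == pil2}

def divide(dividend, divisor):
    """Target values x such that (x, y) is in the dividend for every y of the divisor."""
    targets = {x for (x, _) in dividend}
    return {x for x in targets if all((x, y) in dividend for y in divisor)}

RESULT = divide(DD, DS)
print(RESULT)  # {'Serge'}
```

Peter flies plane 1 but not plane 2, so he is excluded : the division expresses the universal quantifier « for all A300 ».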

Page 108: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Advances in data management with Codd’s relational data model

➢ SIMPLICITY of DATA representation : TABLES

➢ Non procedural query language : « SAT » (Set At a Time)

➢ FUNCTIONS integration to have a good schema

The relational algebra is a PREREQUISITE for SQL2 both for usage and implementation (optimization)

Page 109: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Three lessons from Codd’s contribution to data management and .. Computer Science !

« A 10-page theoretical IBM research report in 1969 became a 10 billion US dollar market 50 years later, with a growth of 20% a year »

➢ Theoretical model (SET THEORY) reduced to the essentials : primary key, SAT operators with JOIN, and FUNCTIONS (to support an international standard, SQL)

➢ Ease of implementation with well mastered access methods

➢ An IBM research prototype in San Jose (SYSTEM-R), under the guidance of Jim Gray, was set up to determine the performance feasibility of the relational data model of Ted Codd (same with INGRES at UC Berkeley with M. Stonebraker)

➢ TEACHABILITY (for large dissemination of SQL developers)

Page 110: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Exercise :

GOAL : Give a natural language interpretation of the JOIN operator :

The JOIN corresponds to a VERB…

➢ Let us consider the two following queries :

➢ What are the names of the pilots who are FLYING an Airbus ‘A320’ ?

➢ What are the names of the pilots who are LIVING in the location city of an Airbus ‘A320’ ?

Q1) Write them in the relational algebra

Q2) Imagine how a natural language interpretation of the JOIN could provide semantic control over the queries

Page 111: Copyright Big Data Pr Serge Miranda, MBDS, Univ de …mbds-fr.org/wp-content/uploads/2019/03/lectures/l2.pdf MBDS course : From data bases to big data (7 lectures) Professor Serge

Copyright Big Data Pr Serge Miranda, MBDS, Univ de Nice Sophia Antipolis (UCA)

Exercises:

Express the following queries in Codd’s relational algebra:

Q3: What are the names of the planes departing from Nice?

Q4: What are the names of the planes driven by pilots living in Nice?

Q5: What are the names of the planes located in the departure city of a flight bound for Nice?

Q6: What are the names of the planes driven by ALL pilots living in Nice?

Q7: What are the names of the planes driven by ALL pilots living in Nice, except those located in the departure city of a flight bound for Nice? (Use Q5 and Q6 with set operators)
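The word ALL in Q6 calls for relational DIVISION, and the hint in Q7 suggests combining Q6 and Q5 with a set DIFFERENCE. The Python sketch below shows one possible reading, under stated assumptions: the schema PILOT(pl_no, pl_name, adr), PLANE(p_no, p_name, loc), FLIGHT(f_no, pl_no, p_no, dc, ac) with dc and ac as departure and arrival cities, the sample tuples, and the helper `divide` are hypothetical and not taken from the lecture. Division is written from primitive operators, as π_x(R) minus π_x((π_x(R) × S) minus R).

```python
# Hypothetical data, assumed for illustration only. Each relation is a set of tuples.
PILOT = {(1, "Dupont", "Nice"), (2, "Martin", "Nice"), (3, "Durand", "Paris")}  # (pl_no, pl_name, adr)
PLANE = {(100, "A320", "Paris"), (101, "B747", "Nice"), (102, "A380", "Lyon")}  # (p_no, p_name, loc)
FLIGHT = {  # (f_no, pl_no, p_no, dc, ac)
    ("F1", 1, 100, "Paris", "Nice"),
    ("F2", 2, 100, "Nice", "Paris"),
    ("F3", 1, 101, "Nice", "Lyon"),
    ("F4", 3, 102, "Lyon", "Paris"),
    ("F5", 2, 102, "Lyon", "Paris"),
    ("F6", 1, 102, "Nice", "Paris"),
}


def divide(r, s):
    """Division of a binary relation r(x, y) by a unary relation s(y).

    Returns the x values paired in r with EVERY y of s.
    """
    xs = {x for (x, _) in r}                       # projection of r on x
    missing = {(x, y) for x in xs for y in s} - r  # (xs times s) minus r
    return xs - {x for (x, _) in missing}          # drop every x with a missing pair


# Q5: planes located in the departure city of a flight arriving in Nice.
departure_cities = {dc for (_, _, _, dc, ac) in FLIGHT if ac == "Nice"}
q5 = {p_name for (_, p_name, loc) in PLANE if loc in departure_cities}

# Q6: planes driven by ALL pilots living in Nice (division).
driven_by = {(p_no, pl_no) for (_, pl_no, p_no, _, _) in FLIGHT}
nice_pilots = {pl_no for (pl_no, _, adr) in PILOT if adr == "Nice"}
q6_numbers = divide(driven_by, nice_pilots)
q6 = {p_name for (p_no, p_name, _) in PLANE if p_no in q6_numbers}

# Q7: read here as Q6 minus Q5, following the hint (set difference).
q7 = q6 - q5

print(sorted(q5), sorted(q6), sorted(q7))
```

Note that the difference is taken on plane names because the question asks for names; if two planes could share a name, taking the difference on plane numbers before projecting would be the safer choice.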


Some reference books on data bases

In English:

➢ Chris Date « An Introduction to Database Systems » (8th Edition), Addison-Wesley <the reference book on data bases>

➢ E.F. Codd (1990). « The Relational Model for Database Management » (Version 2). Addison-Wesley Publishing Company. ISBN 0-201-14192-2. <Codd’s book>

➢ M. Stonebraker et al. « Readings in Database Systems » <the « red book »> 5th Edition 1998, Morgan Kaufmann

➢ S. Abiteboul et al. « Foundations of Databases » Addison-Wesley <data base theory>

In French:

➢ J.L. Hainaut « Bases de données (Concepts, applications et développement) », Dunod, 4th Edition, 2018

➢ G. Gardarin « Bases de Données » Eyrolles, free version available at georges.gardarin.free.fr

➢ S. Miranda « L’Art des Bases de données » (3 volumes), Eyrolles

➢ S. Miranda « Bases de données : Architectures, modèles relationnels et objets, SQL3 et ODMG », Dunod, 2002


Some reference books on Big Data

In English:

➢ Rajendra Akerkar (Ed.) « Big Data Computing » CRC Press, 2014

➢ Jules Berman « Principles of Big Data » Morgan Kaufmann, 2013

➢ Joe Celko « A Complete Guide to NoSQL » Elsevier, 2014

➢ W. Chu (Ed.) « Data Mining and Knowledge Discovery for Big Data » Springer, 2014

➢ Dan McCreary, Ann Kelly « Making Sense of NoSQL » Manning, 2014

➢ F. Provost, T. Fawcett « Data Science for Business » O’Reilly, 2013

➢ Jordan Tigani, Siddartha Naidu « Google BigQuery Analytics » Wiley, 2014 (510 pages)

➢ Mike Stonebraker « New SQL: An Alternative to NoSQL and Old SQL for New OLTP Apps » ACM, June 2011

In French:

➢ R. Bruchez « Les bases de données NoSQL et le Big Data » Eyrolles, 2015

➢ I. Lemberger et al. « Big data et machine learning » Dunod, 2016

➢ C. Azencott « Introduction au Machine learning » Dunod, 2018

➢ G. Grolemund « R pour les data science » Eyrolles, 2017


Short seminar on DATA BASE STORAGE and ACCESS

« Every DATA item (of computing variables) has an address and an access method »
