NoSql - Università degli Studi di Milano-Bicocca


Transcript of NoSql - Università degli Studi di Milano-Bicocca

Page 1: NoSql - Università degli Studi di Milano-Bicocca

NoSql

Page 2: NoSql - Università degli Studi di Milano-Bicocca

• Definition of NoSQL and the reasons for it

• Document-based models

• Graph models

• Key/value models

• Wide-column models

• Comparisons

Index

Page 3: NoSql - Università degli Studi di Milano-Bicocca

In the beginning was…

http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

Page 4: NoSql - Università degli Studi di Milano-Bicocca

Storage!

Page 5: NoSql - Università degli Studi di Milano-Bicocca

• The relational model is very strict

– Closed-world assumption

– Minimization value

• RDBMS (the software that manages the relational model)

– More than 35 years of R&D (security, optimization, standardization)

– ACID properties

• Very well known

• A large amount of data is still stored in DBMSs

– Porting data is a nightmare!

• For a large number of tasks it is still the best option

Positive aspects of the relational model

Page 6: NoSql - Università degli Studi di Milano-Bicocca

• ACID properties

• Atomicity

• Consistency

• Isolation

• Durability

Positive aspect of RDBMS
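As an illustration of atomicity, here is a minimal sketch using Python's standard `sqlite3` module. The `account` table, the names in it, and the `transfer` helper are hypothetical and not part of the slides. The transfer credits one account and then debits the other; when the debit violates a CHECK constraint, the credit is rolled back as well.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: the earlier credit is undone too.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "bob", "alice", 500)  # bob cannot cover 500
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

After the failed transfer the balances are the ones left by the first, successful transfer, so no partial update is visible.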

Page 7: NoSql - Università degli Studi di Milano-Bicocca

• The relational model is very strict

– Closed-world assumption

– Minimization value

– One attribute → one value

– Not compatible with modern programming languages

– Not able to support loops in data (see later)

• RDBMS (the software that manages the relational model)

– Hard to modify tables

– Not scalable

Limitations of the relational model

Page 8: NoSql - Università degli Studi di Milano-Bicocca

Application development

Relational Database

Object Relational Mapping

Application

Code XML Config DB Schema

©Massimo Brignoli, Mongodb

Page 9: NoSql - Università degli Studi di Milano-Bicocca

And Even Harder To Iterate

New Table

New Table

New Column

Name Pet Phone Email

New Column

3 months later…

©Massimo Brignoli, Mongodb

Page 10: NoSql - Università degli Studi di Milano-Bicocca

• RDBMSs scale up easily, but scale out less well

– Scale up: add resources (CPU, RAM, storage) to a single server

– Scale out: distribute data and load across more servers

Performance

Page 11: NoSql - Università degli Studi di Milano-Bicocca

• Up to a limit…

Performance

Page 12: NoSql - Università degli Studi di Milano-Bicocca

• MultiValue databases at TRW in 1965.

• DBM is released by AT&T in 1979.

• Lotus Domino released in 1989.

• Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.

• Graph database Neo4j is started in 2000.

• Google BigTable is started in 2004. Paper published in 2006.

• CouchDB is started in 2005.

• The research paper on Amazon Dynamo is released in 2007.

• The document database MongoDB is started in 2007 as part of an open-source cloud computing stack, with its first standalone release in 2009.

• Facebook open-sources the Cassandra project in 2008.

• Project Voldemort started in 2008.

• The term NoSQL was reintroduced in early 2009.

History

Page 13: NoSql - Università degli Studi di Milano-Bicocca

• Not only SQL

• A set of data representation models and the related management software

• Schema free (or schemaless)

• CAP theorem

• BASE

– Basically Available, Soft state, Eventual consistency

NoSQL

Page 14: NoSql - Università degli Studi di Milano-Bicocca

• In the relational model, usually

– First define the model (the set of attributes that describe the data and its relations)

– Then populate the data

• If there is a need to add a new attribute (or change an existing one)

– First modify the model, then (if possible) change the data

• In most NoSQL models there is no strict model; it is based on the data you insert (see later)

• All NoSQL models adopt the open-world assumption

Schema free
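The contrast above can be sketched in Python. The relational side uses the standard `sqlite3` module; the schema-free side is approximated with plain dictionaries rather than a real NoSQL engine. The `person` table and its columns (`name`, `phone`, `pet`) and the sample values are assumptions made for illustration.

```python
import sqlite3

# Relational: the model is declared first; a new attribute needs a model change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")
conn.execute("INSERT INTO person VALUES ('Paul', '555-0100')")

try:
    conn.execute("INSERT INTO person (name, pet) VALUES ('Anna', 'cat')")
    rejected = False
except sqlite3.OperationalError:
    rejected = True  # the table has no column named pet yet

conn.execute("ALTER TABLE person ADD COLUMN pet TEXT")  # modify the model first
conn.execute("INSERT INTO person (name, pet) VALUES ('Anna', 'cat')")
rows = conn.execute(
    "SELECT name, phone, pet FROM person ORDER BY name"
).fetchall()

# Schema free: each record carries its own attributes, no declaration needed.
people = [
    {"name": "Paul", "phone": "555-0100"},
    {"name": "Anna", "pet": "cat"},
]
with_pet = [p["name"] for p in people if "pet" in p]
```

In the relational version every row ends up with every column (filled with NULL where missing); in the schema-free version an absent attribute is simply not stored.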

Page 15: NoSql - Università degli Studi di Milano-Bicocca

CAP theorem

Page 16: NoSql - Università degli Studi di Milano-Bicocca

CAP theorem

Page 17: NoSql - Università degli Studi di Milano-Bicocca

• RDBMS are basically CA

– In some cases it is possible to have a CAP-based RDBMS

• NoSQL systems are mainly CP or AP

– CP -> data are consistent, but the DBMS cannot be available 24/7

– AP -> sometimes data may be inconsistent

CAP theorem
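The CP/AP trade-off can be illustrated with a toy model of two replicas. This is only a sketch: the `Cluster` and `Replica` classes and the `partitioned` flag are hypothetical and do not mirror any real system. During a partition the CP cluster refuses writes (it stays consistent but is unavailable), while the AP cluster accepts them locally (it stays available but the replicas diverge).

```python
class Replica:
    def __init__(self):
        self.data = {}

class Cluster:
    """Toy two-replica cluster; mode is 'CP' or 'AP' (illustrative only)."""

    def __init__(self, mode):
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False  # True when the replicas cannot communicate

    def write(self, replica_id, key, value):
        if not self.partitioned:
            for replica in self.replicas:  # normal case: update every replica
                replica.data[key] = value
        elif self.mode == "CP":
            raise RuntimeError("unavailable: cannot reach the other replica")
        else:  # AP: accept the write on the reachable replica only
            self.replicas[replica_id].data[key] = value

    def read(self, replica_id, key):
        return self.replicas[replica_id].data.get(key)

cp = Cluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True
cp_views = (cp.read(0, "x"), cp.read(1, "x"))

ap = Cluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
ap_views = (ap.read(0, "x"), ap.read(1, "x"))
```

Both CP replicas still agree on the old value, whereas the two AP replicas now return different values for the same key.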

Page 18: NoSql - Università degli Studi di Milano-Bicocca

CAP theorem

https://www.mysoftkey.com/architecture/understanding-of-cap-theorem/

Page 19: NoSql - Università degli Studi di Milano-Bicocca

• Basic Availability: fulfill requests, even with partial consistency.

• Soft State: abandon the consistency requirements of the ACID model pretty much completely

• Eventual Consistency: at some point in the future, data will converge to a consistent state; delayed consistency, as opposed to the immediate consistency of the ACID properties.

– purely a liveness guarantee (reads eventually return the requested value); but

– does not make safety guarantees, i.e.,

– an eventually consistent system can return any value before it converges

BASE principle
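A small sketch of eventual consistency follows. It assumes a last-write-wins merge based on timestamps, which is one common convergence strategy and not necessarily the one the slides have in mind; the `merge` function and the `cart` data are hypothetical. Before the replicas synchronize, a read may return a stale value; after the merge both replicas hold the same state.

```python
def merge(a, b):
    """Last-write-wins merge of two replicas mapping key -> (timestamp, value)."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# Both replicas start from the same state.
replica_a = {"cart": (1, ["book"])}
replica_b = {"cart": (1, ["book"])}

# A newer write reaches replica B only.
replica_b["cart"] = (2, ["book", "pen"])

# A read on replica A before synchronization sees the old value.
stale_read = replica_a["cart"][1]

# Later the replicas exchange state and converge.
synced = merge(replica_a, replica_b)
replica_a = dict(synced)
replica_b = dict(synced)
fresh_read = replica_a["cart"][1]
```

The stale read shows the missing safety guarantee; the convergence after `merge` is the liveness guarantee.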

Page 20: NoSql - Università degli Studi di Milano-Bicocca

• Key-Value Stores

• Column Family Stores

• Document Databases

• Graph Databases

• RDF databases as well as Tuple stores

NoSQL models

Page 21: NoSql - Università degli Studi di Milano-Bicocca

• Dynamo, Voldemort, Redis, Riak...

– DeCandia et al. "Dynamo: Amazon’s Highly Available Key-value Store", 2007

• Key-value stores are hash tables in which the key points to a particular value

• The key-value mapping is supported by hashing mechanisms to maximize performance

Key value
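The idea of a hash-based key-value mapping can be sketched as follows. The `KeyValueStore` class is hypothetical: it hashes each key to pick one of several in-memory "nodes" (plain dictionaries), which is a simplification of how the systems listed above distribute keys.

```python
import hashlib

class KeyValueStore:
    """Toy key-value store: a hash of the key selects the node holding the value."""

    def __init__(self, n_nodes=3):
        self.nodes = [{} for _ in range(n_nodes)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

    def delete(self, key):
        self._node_for(key).pop(key, None)

store = KeyValueStore()
store.put("user:1", "Paul")
store.put("user:2", "Anna")
first = store.get("user:1")
store.delete("user:2")
missing = store.get("user:2")
```

The store only supports lookups by key; the value is opaque, so there is no way to query by its content.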

Page 22: NoSql - Università degli Studi di Milano-Bicocca

• BigTable, Cassandra, HBase,...

– Chang et al. "Bigtable: A Distributed Storage System for Structured Data", 2006

• The key points to multiple columns

Wide column
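A rough sketch of the wide-column idea: a row key points to column families, and each row may hold a different set of columns. The `WideColumnTable` class, the family names (`info`, `cars`), and the sample rows are assumptions for illustration and do not reproduce the BigTable, Cassandra, or HBase APIs.

```python
from collections import defaultdict

class WideColumnTable:
    """Toy wide-column table: row key -> column family -> column -> value."""

    def __init__(self, column_families):
        self.column_families = set(column_families)  # fixed up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value  # columns are free-form

    def get_row(self, row_key):
        return {fam: dict(cols) for fam, cols in self.rows.get(row_key, {}).items()}

table = WideColumnTable(["info", "cars"])
table.put("paul", "info", "city", "London")
table.put("paul", "cars", "Bentley", 1973)
table.put("anna", "info", "city", "Milan")
table.put("anna", "info", "email", "anna@example.org")

paul = table.get_row("paul")
anna = table.get_row("anna")
```

The two rows share the same column families but not the same columns, which is what makes the rows sparse and "wide".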

Page 23: NoSql - Università degli Studi di Milano-Bicocca

• CouchDB, MongoDB,...

• Documents are addressed in the database through a unique key

• Search within documents

Document Based
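The two points above (addressing by a unique key, and searching inside documents) can be sketched with a hypothetical in-memory `DocumentStore`. It is not the CouchDB or MongoDB API; the method names `insert`, `get`, and `find` and the sample documents are chosen only for illustration.

```python
import uuid

class DocumentStore:
    """Toy document store: documents are dicts addressed by a generated key."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        doc_id = str(uuid.uuid4())  # unique key for the document
        self.docs[doc_id] = doc
        return doc_id

    def get(self, doc_id):
        return self.docs.get(doc_id)

    def find(self, **criteria):
        # Unlike a key-value store, the content of the document is searchable.
        return [
            doc for doc in self.docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]

store = DocumentStore()
paul_id = store.insert({"first_name": "Paul", "city": "London"})
store.insert({"first_name": "Anna", "city": "Milan", "pet": "cat"})

by_key = store.get(paul_id)
in_london = store.find(city="London")
```

A real document database would use indexes instead of scanning every document, but the access pattern is the same.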

Page 24: NoSql - Università degli Studi di Milano-Bicocca

• Neo4J, FlockDB, GraphBase, InfoGrid, ...

• Graph databases are built from nodes and relationships between nodes (edges).

• Nodes have properties

– Nodes represent entities (e.g. "Bob" or "Alice").

– Properties are information pertinent to nodes (e.g. age: 18).

• Graph DBs do not scale well

Graph store
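A minimal sketch of the node/edge/property structure, using plain Python collections instead of a graph database. The `KNOWS` relationship, the extra node `carol`, the ages other than 18, and the `reachable` helper are assumptions for illustration.

```python
from collections import deque

# Nodes represent entities and carry properties.
nodes = {
    "alice": {"age": 18},
    "bob": {"age": 21},
    "carol": {"age": 30},
}

# Edges are (source, relationship type, target) triples.
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
]

def neighbours(node, rel):
    return [dst for src, r, dst in edges if src == node and r == rel]

def reachable(start, rel, max_hops):
    """Breadth-first traversal: nodes reachable from start within max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbours(node, rel):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found

friends = reachable("alice", "KNOWS", 1)
friends_of_friends = reachable("alice", "KNOWS", 2)
```

Traversals like this follow edges directly, whereas a relational model would need one join per hop.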

Page 25: NoSql - Università degli Studi di Milano-Bicocca

Comparison

Page 26: NoSql - Università degli Studi di Milano-Bicocca

Comparison

Page 27: NoSql - Università degli Studi di Milano-Bicocca

Volume and complexity

Page 28: NoSql - Università degli Studi di Milano-Bicocca

• In all data models, connecting data is a key issue with a great impact on performance and analysis

Data is singular or plural?

Page 29: NoSql - Università degli Studi di Milano-Bicocca

Data Model

Relational Document based

{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

Relations are included in data

Page 30: NoSql - Università degli Studi di Milano-Bicocca

Model Comparison

ID  Name  Surname  DateofBirth
1   Tom   Hanks    …

Id  Title              Director
1   The Da Vinci Code  2
2   The Green Mile     3
3   That thing you do  1
..

Movie  Actor
1      1
2      1
3      1

{ "Name": "Tom", "Surname": "Hanks", "Works_on": [
    {"Title": "The Da Vinci Code", "role": "Actor"},
    {"Title": "That thing you do", "role": ["Actor", "Director"]}
]}
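The comparison can be made concrete with a short sketch. The relational side uses Python's `sqlite3` with the rows shown on the slide; the table names (`person`, `movie`, `movie_actor`) are assumptions, since the slide gives only column headers, and the elided date of birth is left out. The document side uses the JSON document as a Python dictionary.

```python
import sqlite3

# Relational model: three tables, relations resolved with joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, surname TEXT);
CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, director INTEGER);
CREATE TABLE movie_actor (movie INTEGER, actor INTEGER);
INSERT INTO person VALUES (1, 'Tom', 'Hanks');
INSERT INTO movie VALUES (1, 'The Da Vinci Code', 2),
                         (2, 'The Green Mile', 3),
                         (3, 'That thing you do', 1);
INSERT INTO movie_actor VALUES (1, 1), (2, 1), (3, 1);
""")

acted_sql = [title for (title,) in conn.execute("""
    SELECT m.title FROM person p
    JOIN movie_actor ma ON ma.actor = p.id
    JOIN movie m ON m.id = ma.movie
    WHERE p.surname = 'Hanks' ORDER BY m.id
""")]
directed_sql = [title for (title,) in conn.execute("""
    SELECT m.title FROM movie m
    JOIN person p ON p.id = m.director
    WHERE p.surname = 'Hanks'
""")]

# Document model: the relations are embedded in the document itself.
doc = {
    "Name": "Tom", "Surname": "Hanks",
    "Works_on": [
        {"Title": "The Da Vinci Code", "role": "Actor"},
        {"Title": "That thing you do", "role": ["Actor", "Director"]},
    ],
}

def roles(work):
    # "role" holds one value or a list: one attribute, many values.
    role = work["role"]
    return role if isinstance(role, list) else [role]

acted_doc = [w["Title"] for w in doc["Works_on"] if "Actor" in roles(w)]
directed_doc = [w["Title"] for w in doc["Works_on"] if "Director" in roles(w)]
```

The relational queries need two joins to list the films, while the document answers the same questions by reading a single record. Note that the document on the slide lists only two of the three films present in the tables.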