Transcript of Continuous Database Evolution - Entwicklertag

Continuous Database Evolution

Prof. Dr. Uta Störl

Darmstadt University of Applied Sciences

February 2019

[Figure: Application version n and Application version n + …, each with Schema Management, connected by Schema Evolution and Data Migration]


Motivation

• Agile software development with frequent schema changes (weekly, up to daily!) favors schema-flexible NoSQL databases
• But how can variational data in the production database be migrated?
– State of the art: within the application code
Optional schema management for NoSQL database systems is necessary!

Uta Störl, Darmstadt University of Applied Sciences


Remark: NoSQL Databases Are Schema-Free – Aren’t They?

• NoSQL DBMS without native schema support: Couchbase, CouchDB, Neo4J, …
• NoSQL DBMS with optional schema support: MongoDB, OrientDB, ArangoDB, …
• NoSQL DBMS with mandatory schema: Cassandra, …


Schema Management for NoSQL Databases

• Forward Engineering

– Schema Creation

[Figure: Application Version n with Schema Version n; Application Version n + 1 with Schema Version n+1; Forward Engineering]

• Reverse Engineering

– Schema Overview

– Data Exploration


Forward Engineering: Schema Creation

Create:
• JSON Schema
• Proprietary schema formats (e.g. Mongoose for MongoDB, Ottoman for Couchbase, …)
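Forward engineering with a schema can be illustrated in a few lines. The sketch below is a minimal illustration in Python, assuming a hypothetical JSON Schema for the Species entity type from the marine biology example in this talk. The `validate` helper is hand-rolled for illustration and covers only the `type`, `required`, and `properties` keywords; it is not a full JSON Schema validator.

```python
# Hypothetical schema for the Species entity type, written in JSON Schema style.
SPECIES_SCHEMA = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "number"},
        "name": {"type": "string"},
        "WoRMS": {"type": "number"},
    },
}

# Map JSON Schema type names to Python types (small subset only).
_JSON_TYPES = {"object": dict, "number": (int, float), "string": str}


def validate(document, schema):
    """Return a list of violations; an empty list means the document conforms."""
    if not isinstance(document, _JSON_TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = []
    # Check that every required property is present.
    for key in schema.get("required", []):
        if key not in document:
            errors.append(f"missing required property '{key}'")
    # Check the type of every declared property that is present.
    for key, sub in schema.get("properties", {}).items():
        if key in document and not isinstance(document[key], _JSON_TYPES[sub["type"]]):
            errors.append(f"property '{key}' is not of type {sub['type']}")
    return errors
```

Undeclared properties are accepted here, which matches the idea of optional schema management: the schema constrains what it names and leaves the rest flexible.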


Reverse Engineering: Schema Overview


Reverse Engineering: Data Exploration


Tools for NoSQL Schema Management (Selection)

Multi Data Store Tools

• Hackolade
– Support for MongoDB, Couchbase, Elasticsearch, HBase, Cassandra & DataStax, DynamoDB, Cosmos DB, Avro, and Hive
– Forward and reverse engineering (available in Professional edition only)
– https://hackolade.com/

• erwin DM NoSQL
– Support for MongoDB and Couchbase
– Forward and reverse engineering (available in Professional edition only)
– https://erwin.com/products/erwin-dm-nosql/


Tools for NoSQL Schema Management (Selection)

Single Data Store Tools (MongoDB)

• MongoDB Compass

– Reverse engineering (available free of charge)

– https://www.mongodb.com/products/compass

• Studio 3T

– Reverse engineering (available in Pro edition only)

– https://studio3t.com/


Tools for NoSQL Schema Management (Selection)

Research Prototypes (Multi Data Store Tools)

• NoSQL Data Engineering Project (NoSQL DEP)
– Support for MongoDB and CouchDB
– Reverse engineering
– University of Murcia, Spain: https://www.researchgate.net/project/NoSQL-Data-Engineering
– Source: https://github.com/catedrasaes-umu/NoSQLDataEngineering/

• Darwin: Schema Management for NoSQL Database Systems
– Support for MongoDB and Couchbase
– Reverse engineering
– Darmstadt University of Applied Sciences, University of Rostock, OTH Regensburg, Germany: https://fbi.h-da.de/personen/uta-stoerl/dfg-projekt-nosql-schema-evolution/
– https://www.researchgate.net/project/Darwin-Schema-Management-in-NoSQL-Databases


NoSQL Schema Management: So Far, So Good

• We are able to

– Define schemas and validate data (Forward Engineering)

– Extract a schema overview and explore data (Reverse Engineering)

• However, what about the heterogeneous data (due to different application releases, for example) in the NoSQL database?


Continuous Database Evolution

• (Optional) schema management for NoSQL databases

Two main tasks
– Schema evolution management
– Data migration


Approaches to Realize Data Migration

• Custom-coded Migration Scripts

– Error-prone and expensive

• Using Object-NoSQL-Mapper Annotations

– Easy to realize for simple evolution operations like add, delete, and rename (e.g. @AlsoLoad)

– More expensive for complex operations like split, merge, copy, and move (coding @PostLoad methods)
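To make the difference between simple and complex operations concrete, here is a plain-Python analogue of what mapper annotations such as @AlsoLoad and @PostLoad express. This is a sketch only and not the API of any object-NoSQL mapper: the functions `load_species` and `load_protocol` are hypothetical, and the rename of `category` to `WoRMS` is one possible reading of the marine biology example used later in this talk.

```python
def load_species(raw):
    """Build the current in-memory form of a Species document on load."""
    doc = dict(raw)  # leave the stored document untouched
    # Simple case, a rename: accept the legacy property name 'category'
    # as the new name 'WoRMS' (the role an @AlsoLoad-style annotation plays).
    if "WoRMS" not in doc and "category" in doc:
        doc["WoRMS"] = doc.pop("category")
    return doc


def load_protocol(raw, species_by_id):
    """Complex case, a copy: needs hand-written code, like a @PostLoad method."""
    doc = dict(raw)
    if "WoRMS" not in doc:
        # Join condition assumed here: Species.id = Protocols.spec_id
        species = species_by_id.get(doc["spec_id"])
        if species is not None and "WoRMS" in species:
            doc["WoRMS"] = species["WoRMS"]
    return doc
```

The rename needs only a declaration of the old name, while the copy has to look up a second entity, which is why the complex operations cost more to implement.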


Approaches to Realize Data Migration

• Forward Engineering

– Schema Creation

[Figure: Application Version n with Schema Version n; Application Version n + 1 with Schema Version n+1; Forward Engineering; Evolution Operations]

• Reverse Engineering

– Schema Overview

– Data Exploration

• Advanced Reverse Engineering

– Schema Version Extraction


Example from Marine Biology

• JSON datasets for Species classification of the Baltic Sea and observation Protocols

Entity type Species:

{"id": 123, "name": "Mya arenaria", "ts": 1}
{"id": 124, "name": "Abra prismatica", "ts": 3, "category": 141436}
{"id": 125, "name": "Abra alba", "ts": 5, "WoRMS": 141433}
{"id": 126, "name": "Abra aequalis", "ts": 7, "WoRMS": 293683}

Entity type Protocols:

{"id": 900, "time": "2017-07-21", "location": {"x": 19.863281, "y": 58.487952, "z": -1400}, "spec_id": 123, "ts": 2}
{"id": 901, "time": "2017-07-21", "location": {"x": 19.863285, "y": 58.487952, "z": -1400}, "spec_id": 123, "ts": 4}
{"id": 902, "time": "2017-07-23", "location": {"x": 19.863281, "y": 58.487961, "z": -1350}, "spec_id": 125, "ts": 6}
{"id": 903, "time": "2017-07-24", "location": {"x": 19.863285, "y": 58.487952, "z": -1400}, "spec_id": 126, "ts": 8, "WoRMS": 293683}

WoRMS: World Register of Marine Species


Short Excursion: Schema Version Extraction Step 1 - Building the Schema Version Graphs

The bracketed lists are the timestamps (ts) of the datasets in which each node occurs.

Protocols [2,4,6,8]
– id [2,4,6,8], type: number
– time [2,4,6,8], type: string
– location [2,4,6,8], type: object
– location.x [2,4,6,8], type: number
– location.y [2,4,6,8], type: number
– location.z [2,4,6,8], type: number
– spec_id [2,4,6,8], type: number
– ts [2,4,6,8], type: number
– WoRMS [8], type: number

Species [1,3,5,7]
– id [1,3,5,7], type: number
– name [1,3,5,7], type: string
– ts [1,3,5,7], type: number
– category [3], type: number
– WoRMS [5,7], type: number
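The graph above can be reproduced from the Species datasets with a short traversal. This is a minimal sketch, not the extraction algorithm of the tools discussed in this talk: the function name `build_version_graph` and the dotted path notation are assumptions made for illustration, and type information is left out for brevity.

```python
from collections import defaultdict


def build_version_graph(entity_type, documents):
    """Map each property path to the sorted timestamps of the documents containing it."""
    graph = defaultdict(list)

    def visit(path, value, ts):
        graph[path].append(ts)
        # Descend into nested objects such as 'location'.
        if isinstance(value, dict):
            for key, child in value.items():
                visit(f"{path}.{key}", child, ts)

    for doc in documents:
        visit(entity_type, doc, doc["ts"])
    return {path: sorted(stamps) for path, stamps in graph.items()}


# Species datasets from the marine biology example.
SPECIES = [
    {"id": 123, "name": "Mya arenaria", "ts": 1},
    {"id": 124, "name": "Abra prismatica", "ts": 3, "category": 141436},
    {"id": 125, "name": "Abra alba", "ts": 5, "WoRMS": 141433},
    {"id": 126, "name": "Abra aequalis", "ts": 7, "WoRMS": 293683},
]

species_graph = build_version_graph("Species", SPECIES)
```

Here `species_graph["Species.category"]` holds only timestamp 3 and `species_graph["Species.WoRMS"]` holds 5 and 7, matching the node labels listed for Species.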


Short Excursion: Schema Version Extraction Step 2 - Deriving Schema Evolution Operations

[Figure: the schema version graphs from Step 1, annotated with the derived operations]

Derived schema evolution operations:

• add integer Species.category
• rename Species.category to WoRMS
  or: delete Species.category, then add Species.WoRMS
• add integer Protocols.WoRMS
  or: copy Species.WoRMS to Protocols.WoRMS where Species.id = Protocols.spec_id
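The derivation for Species can be sketched as a comparison of first and last occurrences. This is a simplified illustration, not the algorithm behind the tools in this talk: `derive_operations` is a hypothetical helper, it assumes each property occurs in a contiguous run of timestamps, and it ignores types and cross-entity operations such as copy. It deliberately returns both alternatives (rename versus delete plus add), since the data alone cannot decide between them.

```python
def derive_operations(entity, all_ts, occurrences):
    """Derive candidate operations from property -> timestamps of occurrence.

    all_ts is the sorted list of timestamps of all documents of the entity type.
    """
    ops, added, deleted = [], [], []
    for prop, stamps in occurrences.items():
        if stamps[0] > all_ts[0]:       # property appears later than the entity
            added.append((stamps[0], prop))
            ops.append(f"add {entity}.{prop}")
        if stamps[-1] < all_ts[-1]:     # property disappears before the latest version
            deleted.append((stamps[-1], prop))
            ops.append(f"delete {entity}.{prop}")
    # A property that vanishes right before another one appears may have been renamed.
    for last_seen, old in deleted:
        for first_seen, new in added:
            if old != new and all_ts.index(first_seen) == all_ts.index(last_seen) + 1:
                ops.append(f"rename {entity}.{old} to {new}")
    return ops


SPECIES_TS = [1, 3, 5, 7]
SPECIES_PROPS = {
    "id": [1, 3, 5, 7],
    "name": [1, 3, 5, 7],
    "ts": [1, 3, 5, 7],
    "category": [3],
    "WoRMS": [5, 7],
}

operations = derive_operations("Species", SPECIES_TS, SPECIES_PROPS)
```

The rename candidate is listed alongside the delete and add pair, so the ambiguity stays visible until someone resolves it.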


Short Excursion: Schema Version Extraction Step 3 - Resolving Ambiguities

• Interactively resolving ambiguous schema evolution operations:
– alternative schema evolution operations
– specifying join conditions for move or copy operations

• Open questions
– Automated choice in case of ambiguities
– Suggestion of meaningful join conditions

• Approach to a solution
– Algorithm for deriving inclusion dependencies from NoSQL datasets proposed in [Klettke et al. 2017]
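An inclusion dependency can hint at a meaningful join condition. The following is a naive brute-force check written for illustration only; it is not the algorithm from [Klettke et al. 2017]. The helper names are hypothetical, and the sample data is trimmed to the properties relevant here.

```python
def scalar_values(docs, prop):
    """Collect the scalar values of a property (nested objects and lists are skipped)."""
    return {d[prop] for d in docs if prop in d and not isinstance(d[prop], (dict, list))}


def inclusion_holds(dependents, dep_prop, referenced, ref_prop):
    """True if every dep_prop value also occurs as a ref_prop value."""
    dependent_values = scalar_values(dependents, dep_prop)
    return bool(dependent_values) and dependent_values <= scalar_values(referenced, ref_prop)


def suggest_join_conditions(src_name, src_docs, dst_name, dst_docs):
    """Propose 'src.p = dst.q' wherever the dst.q values are included in the src.p values."""
    src_props = sorted({p for d in src_docs for p in d})
    dst_props = sorted({p for d in dst_docs for p in d})
    return [
        f"{src_name}.{p} = {dst_name}.{q}"
        for p in src_props
        for q in dst_props
        if inclusion_holds(dst_docs, q, src_docs, p)
    ]


SPECIES_SAMPLE = [
    {"id": 123, "name": "Mya arenaria"},
    {"id": 125, "name": "Abra alba"},
    {"id": 126, "name": "Abra aequalis"},
]
PROTOCOLS_SAMPLE = [
    {"id": 900, "spec_id": 123},
    {"id": 902, "spec_id": 125},
    {"id": 903, "spec_id": 126},
]

suggestions = suggest_join_conditions("Species", SPECIES_SAMPLE, "Protocols", PROTOCOLS_SAMPLE)
```

On this sample the only suggestion is the join condition used for the copy operation in Step 2. On real data, accidental inclusions are possible, which is one reason the choice is still made interactively.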


Advanced Reverse Engineering: Schema Version Extraction


Continuous Database Evolution

• (Optional) schema management for NoSQL databases

Two main tasks
– Schema evolution management
– Data migration as a safe process based on schema evolution management


Basic Strategies of Data Migration

Eager Migration

• After the introduction of a new schema version, all entities are migrated

Advantages:
+ all entities are in the current version
+ low latency (when entities are accessed)

Disadvantages:
– even entities that are not in use are migrated
– high number (and costs) of migration operations

Lazy Migration

• Evolution operations are stored
• Data migration is done on request

Advantages:
+ only entities that are in use are migrated
+ no unnecessary data migration operations
+ composition of operations is possible

Disadvantages:
– entities in the NoSQL database exist in different versions
– increased latency

[Figure: for each strategy, Schema Evolution from schema Sv to schema Sv+1 and the corresponding Data Migration]
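The two strategies can be compared with a small in-memory model. This is a sketch under stated assumptions, not the implementation of any tool named in this talk: the `Store` class, the `rename` and `add` operation builders, and the `status` property are hypothetical, and each entity write-back is counted as one migration operation.

```python
def rename(old, new):
    """Evolution operation: rename a property if it is present."""
    def apply(doc):
        if old in doc:
            doc[new] = doc.pop(old)
    return apply


def add(prop, default):
    """Evolution operation: add a property with a default value."""
    def apply(doc):
        doc.setdefault(prop, default)
    return apply


class Store:
    """In-memory stand-in for a NoSQL collection with versioned entities."""

    def __init__(self, docs):
        self.docs = {d["id"]: dict(d, version=0) for d in docs}
        self.operations = []  # operations[i] takes schema version i to i + 1
        self.migrations = 0   # number of entity write-backs performed

    def _upgrade(self, doc):
        if doc["version"] == len(self.operations):
            return
        # Lazy migration can compose all pending operations into one write-back.
        for op in self.operations[doc["version"]:]:
            op(doc)
        doc["version"] = len(self.operations)
        self.migrations += 1

    def evolve(self, op, eager):
        self.operations.append(op)
        if eager:  # eager: migrate every entity right away
            for doc in self.docs.values():
                self._upgrade(doc)

    def get(self, doc_id):
        doc = self.docs[doc_id]
        self._upgrade(doc)  # lazy: migrate on request
        return doc


DOCS = [
    {"id": 123, "name": "Mya arenaria"},
    {"id": 124, "name": "Abra prismatica", "category": 141436},
    {"id": 125, "name": "Abra alba"},
]

eager_store, lazy_store = Store(DOCS), Store(DOCS)
for store, eager in ((eager_store, True), (lazy_store, False)):
    store.evolve(rename("category", "WoRMS"), eager=eager)
    store.evolve(add("status", "unverified"), eager=eager)
accessed = lazy_store.get(124)
```

After two schema changes the eager store has performed six migration operations (three entities, twice), while the lazy store has performed one, for the single entity that was read, at the price of doing that work during the read.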


How to Reduce Costs of Data Migration?

• With a large number of datasets
– update operations are executed even for cold data (data that is not in use)

• With database as a service
– operations are billed, so every data migration has a monetary cost

• How to reduce the costs of data migration?

Optimize data migration

Hybrid / Proactive Migration Approaches

• Predictive Migration

• Incremental Migration


Proactive Migration Strategies

Incremental Migration

• At some schema versions, an eager migration is applied

Advantage:
+ composition of operations is possible

Disadvantage:
– even entities that are not in use are migrated

Predictive Migration

• A forecast function predicts which entities will be accessed in the near future (based on heuristics)
• These entities are migrated ahead of access

Advantages:
+ decreased average latency
+ reduced number of migration operations

Disadvantage:
– additional migration operations in case of wrong predictions

[Figure: for each strategy, Schema Evolution from schema Sv to schema Sv+1 and the corresponding Data Migration]
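A predictive migration step could look roughly like the sketch below. The forecast heuristic used here (the k most frequently accessed entities in a recent access log) is an assumption chosen for illustration; the talk does not specify which heuristics are used. All function and variable names are hypothetical.

```python
from collections import Counter


def predict_hot(access_log, k):
    """Heuristic forecast: the k most frequently accessed entity ids."""
    return [doc_id for doc_id, _ in Counter(access_log).most_common(k)]


def migrate_predictively(docs, operations, access_log, k):
    """Migrate only the predicted hot entities; return the ids that were migrated."""
    current_version = len(operations)
    migrated = []
    for doc_id in predict_hot(access_log, k):
        doc = docs[doc_id]
        if doc["version"] < current_version:
            # Apply all pending operations in one pass.
            for op in operations[doc["version"]:]:
                op(doc)
            doc["version"] = current_version
            migrated.append(doc_id)
    return migrated


def rename_category(doc):
    """Example evolution operation: rename 'category' to 'WoRMS'."""
    if "category" in doc:
        doc["WoRMS"] = doc.pop("category")


docs = {
    123: {"id": 123, "name": "Mya arenaria", "version": 0},
    124: {"id": 124, "name": "Abra prismatica", "category": 141436, "version": 0},
    125: {"id": 125, "name": "Abra alba", "version": 0},
}
access_log = [124, 124, 124, 123, 123, 125]

migrated = migrate_predictively(docs, [rename_category], access_log, k=2)
```

Entities 124 and 123 are migrated before they are requested again, while 125 stays in its old version until it is accessed or a later migration reaches it. If the forecast is wrong, the work spent on 123 or 124 is wasted, which is the disadvantage listed above.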


Tradeoffs in Choosing a Data Migration Strategy


MigCast: Choosing a Suitable Data Migration Strategy


This work is supported by DFG 385808805


Continuous Database Evolution

• Forward Engineering
– Schema Creation

• Reverse Engineering
– Schema Overview
– Data Exploration

• Advanced Reverse Engineering
– Schema Version Extraction

• Data Migration
– Suitable Data Migration Strategies


Continuous Database Evolution – Tools (Selection)

• Forward Engineering
– Schema Creation

• Reverse Engineering
– Schema Overview
– Data Exploration

• Advanced Reverse Engineering
– Schema Version Extraction

• Data Migration
– Suitable Data Migration Strategies

[Table: tools per task, in the columns Multi Data Store Tools, Single Data Store Tools, and Research Prototypes; of the entries, only NoSQL DEP (listed twice under Research Prototypes) is legible]

Further Reading

• M. Fowler: Schemaless Data Structures, 2013, https://martinfowler.com/articles/schemaless/

• P. Sadalage, M. Fowler: Evolutionary Database Design, 2016, https://martinfowler.com/articles/evodb.html

• U. Störl, D. Müller, M. Klettke, S. Scherzinger: Enabling Efficient Agile Software Development of NoSQL-backed Applications, BTW 2017, https://dl.gi.de/bitstream/handle/20.500.12116/667/paper49.pdf

• M. Klettke, H. Awolin, U. Störl, D. Müller, S. Scherzinger: Uncovering the Evolution History of Data Lakes, SCDM 2017, https://ieeexplore.ieee.org/document/8258204

• A. H. Chillón, S. F. Morales, D. Sevilla, J. G. Molina: Exploring the Visualization of Schemas for Aggregate-Oriented NoSQL Databases, ER 2017, http://ceur-ws.org/Vol-1979/paper-11.pdf

• D. Sevilla: Discovery and Visualization of NoSQL Database Schemas, 2018, https://modeling-languages.com/discovery-and-visualization-of-nosql-database-schemas/


Feedback
