Webinar: Strongly Typed Languages and Flexible Schemas

Strongly Typed Languages and Flexible Schemas


Agenda

Strongly Typed Languages

Flexible Schema Databases

Change Management

Strategies

Tradeoffs

Strongly Typed Languages

"A programming language that requires a variable to be defined, as well as the type of variable it is"


Traditional RDBMS

create table users (id int, firstname text, lastname text);

Table definition

Column structure


Traditional RDBMS

Table with checks

create table cat_pictures(
  id int not null,
  size int not null,
  picture blob not null,
  user_id int,
  primary key (id),
  foreign key (user_id) references users(id));

Null checks

Foreign and Primary key checks


Traditional RDBMS

[Diagram: one users row relates to many cat_pictures rows (1 : N)]


Is this Flexible?

• What happens when we need to change the schema?
  – Add new fields
  – Add new relations
  – Change data types

• What happens when we need to scale out our data structure?


Flexible Schema Database

Document, Graph, Key Value


Flexible Schema

• No mandatory schema definition
• No structure restrictions
• No schema validation process


We start from code

public class CatPicture {
  int size;
  byte[] blob;
}

public class User {
  int id;
  String firstname;
  String lastname;
  CatPicture[] cat_pictures;
}


Document Structure

{
  _id: 1234,
  firstname: 'Juan',
  lastname: 'Olivo',
  cat_pictures: [
    { size: 10, picture: BinData("0x133334299399299432") }
  ]
}

Rich Data Types

Embedded Documents



• Challenges
  – Different Versions of Documents
  – Different Structures of Documents
  – Different Value Types for Fields in Documents


Different Versions of Documents

The same document changes over time in how it represents data

First Version

{ "_id" : 174, "firstname": "Juan" }

Second Version

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Third Version

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "cat_pictures": [{"size": 10, "picture": BinData("0x133334299399299432")}]}


Different Versions of Documents

The same document changes over time in how it represents data

{ "_id" : 174, "firstname": "Juan" }

{ "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }

Different Structure
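One way application code might tolerate both shapes shown above is to check which fields are present at read time. The following is a minimal sketch, assuming documents arrive as plain `Map<String, Object>` values; the `NameReader` class and its `firstName` method are hypothetical names used for illustration and are not part of any driver or ODM.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: reads the first name from either document shape.
final class NameReader {

    // Returns the first name whether it is stored flat ("firstname")
    // or nested ("name.first"); empty if neither field is present.
    static Optional<String> firstName(Map<String, Object> doc) {
        Object flat = doc.get("firstname");
        if (flat instanceof String) {
            return Optional.of((String) flat);
        }
        Object nested = doc.get("name");
        if (nested instanceof Map) {
            Object first = ((Map<?, ?>) nested).get("first");
            if (first instanceof String) {
                return Optional.of((String) first);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> oldShape = Map.of("_id", 174, "firstname", "Juan");
        Map<String, Object> newShape = Map.of("_id", 174,
                "name", Map.of("first", "Juan", "last", "Olivo"));
        System.out.println(firstName(oldShape).orElse("?"));
        System.out.println(firstName(newShape).orElse("?"));
    }
}
```

Every reader of the collection has to carry this kind of branching, which is part of the extra code complexity that comes with variable structures.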


Different Structures of Documents

Different documents coexisting in the same collection

{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

Within same collection


Different Data Types for Fields

Different documents coexisting in the same collection

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312}

{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27"}

{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27")}

Same field, different data type
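In a strongly typed language, a field like `bdate` has to be coerced into one type before the rest of the code can use it. A minimal sketch of that coercion follows, assuming the field value is handed over as a plain `Object`; the `BirthDateReader` class is hypothetical, and treating numeric values as epoch seconds and strings as ISO `yyyy-MM-dd` dates are assumptions about the sample documents, not something the slides state.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical helper: maps the three bdate representations to one Java type.
final class BirthDateReader {

    static LocalDate toLocalDate(Object bdate) {
        if (bdate instanceof Integer || bdate instanceof Long) {
            // Assumption: numeric values are seconds since the Unix epoch.
            long seconds = ((Number) bdate).longValue();
            return Instant.ofEpochSecond(seconds).atZone(ZoneOffset.UTC).toLocalDate();
        }
        if (bdate instanceof String) {
            // Assumption: strings use the ISO yyyy-MM-dd format.
            return LocalDate.parse((String) bdate);
        }
        if (bdate instanceof Date) {
            // A date type typically surfaces in Java as java.util.Date.
            return ((Date) bdate).toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        }
        throw new IllegalArgumentException("Unsupported bdate value: " + bdate);
    }

    public static void main(String[] args) {
        System.out.println(toLocalDate(1224234312));
        System.out.println(toLocalDate("2015-06-27"));
        System.out.println(toLocalDate(new Date(0L)));
    }
}
```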

Change Management


Change Management

Versioning: How to set correct data format versioning?

Class Loading: What mechanisms are out there to make this work?

Strategies


Strategies

• Decoupling Architectures
• ODMs
• Versioning
• Data Migrations

Decoupled Architectures


Strongly Coupled


Becomes a mess in your hair…

Coupled Architectures

[Diagram: Applications A, B and C each connect directly to the Database]

Application B: "Let me perform some schema changes!"

Decoupled Architecture

[Diagram: Applications A, B and C connect to the Database through an API]


Decoupled Architectures

• Allows the business logic to evolve independently of the data layer

• Decouples the underlying storage / persistence option from the business service

• Changes are "requested" and not imposed across all applications

• Better versioning control of each request and its mapping
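The decoupling described above can be sketched as an interface that applications call instead of touching the stored documents directly. This is only an illustration under stated assumptions: `UserApi` and `InMemoryUserApi` are hypothetical names, and an in-memory map stands in for the database so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical API boundary: applications depend on this interface only.
interface UserApi {
    Optional<String> fullName(int userId);

    void setName(int userId, String first, String last);
}

// One possible implementation. The field names it stores are private to it,
// so they can change without touching the applications that call UserApi.
final class InMemoryUserApi implements UserApi {
    private final Map<Integer, Map<String, String>> docs = new HashMap<>();

    @Override
    public Optional<String> fullName(int userId) {
        Map<String, String> doc = docs.get(userId);
        if (doc == null) {
            return Optional.empty();
        }
        return Optional.of(doc.get("firstname") + " " + doc.get("lastname"));
    }

    @Override
    public void setName(int userId, String first, String last) {
        Map<String, String> doc = docs.computeIfAbsent(userId, id -> new HashMap<>());
        doc.put("firstname", first);
        doc.put("lastname", last);
    }
}
```

If the stored layout later moves to a nested `name` document, only the implementation class changes; callers keep requesting `fullName` as before.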


ODM

• Reduce impedance between code and Databases
• Data management facilitator
• Hides complexity of operators
• Tries to decouple business complexity with "magic" recipes


Spring Data

• POJO centric model
• MongoTemplate || CrudRepository extensions to make the connection to the repositories
• Uses annotations to override default field names and even data types (data type mapping)

public interface UserRepository extends MongoRepository<User, Integer> {
}

public class User {
  @Id
  int id;

  @Field("first_name")
  String firstname;
  String lastname;
  ...
}


Spring Data Document Structure

{
  "_id": 1,
  "first_name": "first",
  "lastname": "last",
  "catpictures": [
    { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }
  ]
}


Spring Data Considerations

• Data formats, versions and types still need to be managed
• Does not solve issues like type validation out-of-box
• Can make things more complicated but more "controllable"

@Field("first_name")
String firstname;


Morphia

• Data source centric
• Will do all the discovery of POJOs for a given package
• Also uses annotations to perform overrides and deal with object mapping

@Entity("users")
public class User {
  @Id
  int id;
  String firstname;
  String lastname;
  ...
}

morphia.mapPackage("examples.odms.morphia.pojos");

Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user);


Morphia Document Structure

{
  "_id": 1,
  "className": "examples.odms.morphia.pojos.User",
  "firstname": "first",
  "lastname": "last",
  "catpictures": [
    { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }
  ]
}

Class Definition


Morphia Considerations

• Enables better control at Class loading
• Also facilitates, like Spring Data, the field overriding (tags to define field keys)
• Better support for Object Polymorphism

Versioning


Versioning

Versioning of data structures (especially documents) can be very helpful

Recreate documents over time

Flow Control

Data / Field Multiversion Requirements

Archiving and History Purposes


Versioning – Option 0

Change existing document each time there is a write with monotonically increasing version number inside

{ "_id" : 174, "v" : 1, "firstname": "Juan" }

{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.users.update( {"_id": 174}, { "$set": { ... }, "$inc": { "v": 1 } } )

Increment field value
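To make the effect of Option 0 concrete, the sketch below mimics the combined `$set` and `$inc` on an in-memory `Map`. It is an illustration only: `InPlaceVersioning` is a hypothetical class, no database is involved, and the version counter is assumed to be stored as an `Integer` under `"v"`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory analogue of Option 0: overwrite fields, bump "v".
final class InPlaceVersioning {

    // Applies the changed fields (like $set) and increments the version
    // counter by one (like $inc). A missing "v" starts at 1.
    static void update(Map<String, Object> doc, Map<String, Object> changes) {
        doc.putAll(changes);
        doc.merge("v", 1, (old, inc) -> (Integer) old + (Integer) inc);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", 174);
        doc.put("v", 1);
        doc.put("firstname", "Juan");
        update(doc, Map.of("lastname", "Olivo"));
        System.out.println(doc);
    }
}
```

Only the latest state survives with this option; the counter records how many writes happened, not what the earlier values were.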



Versioning – Option 1

Store full document each time there is a write with monotonically increasing version number inside

{ "docId" : 174, "v" : 1, "firstname": "Juan" }

{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }

{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

> db.users.insert( {"docId":174 …})

> db.users.find({"docId":174}).sort({"v":-1}).limit(-1);

Find always latest version
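The "sort by version descending, take one" lookup can be sketched in plain Java over a list of stored versions. This is a hedged illustration: `LatestVersionFinder` is a hypothetical class, documents are modelled as `Map<String, Object>`, and `"v"` and `"docId"` are assumed to hold `Integer` values.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory analogue of Option 1's "latest version" query.
final class LatestVersionFinder {

    // Among all stored versions of docId, returns the one with the highest "v".
    static Optional<Map<String, Object>> latest(List<Map<String, Object>> docs, int docId) {
        return docs.stream()
                .filter(d -> Integer.valueOf(docId).equals(d.get("docId")))
                .max(Comparator.comparingInt((Map<String, Object> d) -> (Integer) d.get("v")));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("docId", 174, "v", 1, "firstname", "Juan"),
                Map.of("docId", 174, "v", 2, "firstname", "Juan", "lastname", "Olivo"));
        System.out.println(latest(docs, 174).orElseThrow());
    }
}
```

Fetching one document stays simple, but fetching the latest version of many documents needs this per-id selection for each of them, which matches the "Not Easy, Slow" rating for Fetch Many.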



Versioning – Option 2

Store all document versions inside a single document.

> db.users.update( {"_id": 174}, { "$set": { "current": ... }, "$inc": { "current.v": 1 }, "$addToSet": { "prev": { ... } } } )

Current value

{ "_id" : 174, "current" : { "v" :3, "attr1": 184, "attr2" : "A-1" }, "prev" : [ { "v" : 1, "attr1": 165 }, { "v" : 2, "attr1": 165, "attr2": "A-1" } ]}

Previous values
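The shape of an Option 2 write can be sketched on an in-memory document: the old `current` value moves into `prev`, and the new value becomes `current` with the next version number. `EmbeddedHistory` is a hypothetical class, and the sketch assumes `"v"` is an `Integer`; it only illustrates the resulting document structure, not how the database applies the update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogue of Option 2: history embedded in one document.
final class EmbeddedHistory {

    @SuppressWarnings("unchecked")
    static void write(Map<String, Object> doc, Map<String, Object> newValues) {
        Map<String, Object> current = (Map<String, Object>) doc.get("current");
        List<Object> prev = (List<Object>) doc.computeIfAbsent("prev", k -> new ArrayList<Object>());
        int nextVersion = 1;
        if (current != null) {
            prev.add(current); // keep the old value in the history array
            nextVersion = (Integer) current.get("v") + 1;
        }
        Map<String, Object> next = new HashMap<>(newValues);
        next.put("v", nextVersion);
        doc.put("current", next);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", 174);
        write(doc, Map.of("attr1", 165));
        write(doc, Map.of("attr1", 165, "attr2", "A-1"));
        write(doc, Map.of("attr1", 184, "attr2", "A-1"));
        System.out.println(doc);
    }
}
```

Because everything lives in one document, current and past values are read together in a single fetch; the cost is a document that grows with every write.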



Versioning – Option 3

Keep collection for "current" version and past versions

Current collection

> db.users.find( {"_id": 174 })

{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

Previous versions collection

> db.users_past.find( {"pid": 174 })

{ "pid" : 174, "v" : 1, "firstname": "Juan" }

{ "pid" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
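A write under Option 3 touches two collections: the old current document is archived, then the current one is replaced. The sketch below models both collections in memory to show that two-step flow; `SeparateHistoryStore` and its fields are hypothetical names, and `"v"` is assumed to be an `Integer`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogue of Option 3: current and past collections.
final class SeparateHistoryStore {
    final Map<Integer, Map<String, Object>> users = new HashMap<>(); // stands in for "users"
    final List<Map<String, Object>> usersPast = new ArrayList<>();   // stands in for "users_past"

    void write(int id, Map<String, Object> fields) {
        Map<String, Object> previous = users.get(id);
        int nextVersion = 1;
        if (previous != null) {
            // Step 1: archive the current document under "pid".
            Map<String, Object> archived = new HashMap<>(previous);
            archived.remove("_id");
            archived.put("pid", id);
            usersPast.add(archived);
            nextVersion = (Integer) previous.get("v") + 1;
        }
        // Step 2: replace the current document with the new version.
        Map<String, Object> current = new HashMap<>(fields);
        current.put("_id", id);
        current.put("v", nextVersion);
        users.put(id, current);
    }
}
```

The two steps are separate operations, so a failure between them leaves the collections out of step, which is why recovery gets a harder rating for this option.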


Versioning

Schema                    | Fetch 1       | Fetch Many     | Update | Recover if Fail
0) Increment Version      | Easy, Fast    | Fast, Easy     | Medium | N/A
1) New Document           | Easy, Fast    | Not Easy, Slow | Medium | Hard
2) Embedded in Single Doc | Easy, Fastest | Easy, Fastest  | Medium | N/A
3) Separate Collection    | Easy, Fastest | Easy, Fastest  | Medium | Medium, Hard

Migrations


Migrations

Several types of "Migrations":

Add/Remove Fields

Change Field Names

Change Field Data Type

Extract Embedded Document into Collection


Add / Remove Fields

For a Flexible Schema Database this is our bread and butter

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "newfield": "value" }

> db.users.update( {"_id": 174}, {"$set": { "newfield": "value" }, "$unset": {"gender":""} })


Change Field Names

Again, programmatically you can do it

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

{ "_id" : 174, "first": "Juan", "last": "Olivo" }

> db.users.update( {"_id": 174}, {"$rename": { "firstname": "first", "lastname":"last"} })


Change Field Data Type

Align to a new code change and move from Int to String

{..."bdate": 1435394461522} {..."bdate": "2015-06-27"}

1) Batch Process

2) Aggregation Framework

3) Change based on usage
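The third approach, changing documents as they are used, can be sketched as a read path that converts the old representation when it meets one and writes the new value back. This is a minimal illustration: `LazyBirthDateMigration` is a hypothetical class, the in-memory `put` stands in for a `$set` update, and treating numeric `bdate` values as epoch milliseconds is an assumption based on the sample value above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical migrate-on-read helper for the bdate field.
final class LazyBirthDateMigration {

    // Returns bdate as a yyyy-MM-dd string, converting and storing it
    // the first time a numeric value is encountered.
    static String readBirthDate(Map<String, Object> doc) {
        Object bdate = doc.get("bdate");
        if (bdate instanceof Number) {
            // Assumption: numeric values are milliseconds since the Unix epoch.
            String asString = Instant.ofEpochMilli(((Number) bdate).longValue())
                    .atZone(ZoneOffset.UTC)
                    .toLocalDate()
                    .toString();
            doc.put("bdate", asString); // stands in for a $set update
            return asString;
        }
        return (String) bdate; // already migrated (or absent)
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", 174);
        doc.put("bdate", 1435394461522L);
        System.out.println(readBirthDate(doc));
        System.out.println(doc);
    }
}
```

Documents that are never read keep the old type indefinitely, so this approach usually still needs the tolerant read path until a batch pass cleans up the rest.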


Change Field Data Type

1) Batch Process – bulk api

public void migrateBulk(){
  // "dd" is day of month; "DD" would format the day of the year
  DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
  ...
  List<UpdateOneModel<Document>> toUpdate = new ArrayList<UpdateOneModel<Document>>();
  for (Document doc : coll.find()){
    String dateAsString = df.format( new Date( doc.getInteger("bdate", 0) ));
    Document filter = new Document("_id", doc.getInteger("_id"));
    Document value = new Document("bdate", dateAsString);
    Document update = new Document("$set", value);
    toUpdate.add(new UpdateOneModel<Document>(filter, update));
  }
  coll.bulkWrite(toUpdate);
}



public void migrateBulk(){
  ...
  for (Document doc : coll.find()){
    ...
  }
  coll.bulkWrite(toUpdate);
}

Is there any problem with this?



public void migrateBulk(){
  ...
  // BSON type 16 represents the int32 data type
  Document query = new Document("bdate", new Document("$type", 16));
  for (Document doc : coll.find(query)){
    ...
  }
  coll.bulkWrite(toUpdate);
}

More efficient filtering!


Extract Document into Collection

Normalize your schema

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "cat_pictures": [{"size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}]}

> db.users.aggregate( [ {$unwind: "$cat_pictures"}, {$project: { "_id":0, "uid":"$_id", "size": "$cat_pictures.size", "picture": "$cat_pictures.picture"}}, {$out:"cats"}])

{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }

{"size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}
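The reshaping that the pipeline performs (one output document per embedded picture, carrying the owner's id as `uid`) can be sketched in plain Java. This is an in-memory illustration only, not the aggregation framework itself; `CatExtractor` is a hypothetical class and documents are modelled as `Map<String, Object>`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogue of the unwind + project steps.
final class CatExtractor {

    // Produces one flat document per embedded picture, linked back to
    // its user through "uid". Users without a picture list add nothing.
    static List<Map<String, Object>> extractCats(List<Map<String, Object>> users) {
        List<Map<String, Object>> cats = new ArrayList<>();
        for (Map<String, Object> user : users) {
            Object pictures = user.get("cat_pictures");
            if (!(pictures instanceof List)) {
                continue;
            }
            for (Object p : (List<?>) pictures) {
                Map<?, ?> picture = (Map<?, ?>) p;
                Map<String, Object> cat = new LinkedHashMap<>();
                cat.put("uid", user.get("_id"));
                cat.put("size", picture.get("size"));
                cat.put("picture", picture.get("picture"));
                cats.add(cat);
            }
        }
        return cats;
    }
}
```

Note that the projection keeps a `uid` reference in each extracted document, which is what lets application code join pictures back to their user after normalization.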

Tradeoffs


Tradeoffs

Decoupled Architecture
  Positives:
  - Should be your default approach
  - Clean Solution
  - Scalable
  Penalties:
  - N/A

Data Structures Variability
  Positives:
  - Reflects today's data structures
  - You can push decisions for later
  Penalties:
  - More complex code base

Data Structures Strictness
  Positives:
  - Simple to maintain
  - Always aligned with your code base
  Penalties:
  - Will eventually need Migrations
  - Restricts your code iterations


Recap

• Flexible and Dynamic Schemas are a great tool
  – Use them wisely
  – Make sure you understand the tradeoffs
  – Make sure you understand the different strategies and options
• Works well with Strongly Typed Languages


Free Education
https://university.mongodb.com/courses/M101J/about

Thank you!
• Norberto Leite
• Technical Evangelist
• http://www.mongodb.com/norberto
• [email protected]
• @nleite
