Webinar: Strongly Typed Languages and Flexible Schemas
Transcript of Webinar: Strongly Typed Languages and Flexible Schemas
Strongly Typed Languages and Flexible Schemas
Agenda
Strongly Typed Languages
Flexible Schema Databases
Change Management
Strategies
Tradeoffs
Strongly Typed Languages
"A programming language that requires a variable to be defined, as well as the type of variable it is"
Flexible Schema Databases
Traditional RDBMS
create table users (id int, firstname text, lastname text);
Table definition
Column structure
Traditional RDBMS
Table with checks
create table cat_pictures(
id int not null,
size int not null,
picture blob not null,
user_id int,
primary key (id),
foreign key (user_id) references users(id));
Null checks
Foreign and Primary key checks
Traditional RDBMS
users (1) to cat_pictures (N)
Is this Flexible?
• What happens when we need to change the schema?
  – Add new fields
  – Add new relations
  – Change data types
• What happens when we need to scale out our data structure?
Flexible Schema Database
Document, Graph, Key-Value
Flexible Schema
• No mandatory schema definition
• No structure restrictions
• No schema validation process
We start from code
public class CatPicture {
    int size;
    byte[] blob;
}
public class User {
    int id;
    String firstname;
    String lastname;
    CatPicture[] cat_pictures;
}
Document Structure
{
  _id: 1234,
  firstname: 'Juan',
  lastname: 'Olivo',
  cat_pictures: [
    { size: 10, picture: BinData("0x133334299399299432") }
  ]
}
Rich Data Types
Embedded Documents
Flexible Schema Databases
• Challenges
  – Different Versions of Documents
  – Different Structures of Documents
  – Different Value Types for Fields in Documents
Different Versions of Documents
The same document changes over time in how it represents data
First Version
{ "_id" : 174, "firstname": "Juan" }
Second Version
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
Third Version
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures": [{"size": 10, "picture": BinData("0x133334299399299432")}]}
Different Versions of Documents
The same document changes over time in how it represents data
{ "_id" : 174, "firstname": "Juan" }
{ "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }
Different Structure
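A strongly typed reader can tolerate both structures by checking which shape a document has before mapping it. The following is a minimal sketch, assuming documents arrive as plain java.util.Map values (a stand-in for a driver's document type); UserName and fromDocument are hypothetical names, not part of any driver or ODM.

```java
import java.util.Map;

// Hypothetical mapper: reads both document shapes shown above into one typed object.
final class UserName {
    final String first;
    final String last; // null when the document never stored a last name

    UserName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    static UserName fromDocument(Map<String, Object> doc) {
        Object name = doc.get("name");
        if (name instanceof Map) {
            // Newer structure: { "name": { "first": ..., "last": ... } }
            Map<?, ?> nested = (Map<?, ?>) name;
            return new UserName((String) nested.get("first"), (String) nested.get("last"));
        }
        // Older structure: flat "firstname" / "lastname" fields
        return new UserName((String) doc.get("firstname"), (String) doc.get("lastname"));
    }
}
```

The rest of the application only sees UserName, so the structural difference stays confined to one method.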
Different Structures of Documents
Different documents coexisting in the same collection
{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
Within the same collection
Different Data Types for Fields
Different documents coexisting in the same collection
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312}
{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27"}
{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27")}
Same field, different data type
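On the code side, one way to cope with the three "bdate" representations above is to normalize the value when it is read. This is only a sketch under stated assumptions: numeric values are treated as epoch milliseconds, strings as ISO "yyyy-MM-dd" dates, and ISODate values as java.util.Date (the type the Java driver commonly decodes dates to). BirthDates and toLocalDate are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical helper (not a driver or ODM API): normalizes a "bdate" value
// that may have been stored as a number, a string, or a date.
final class BirthDates {
    static LocalDate toLocalDate(Object raw) {
        if (raw instanceof Number) {
            // Assumption: numeric values are epoch milliseconds.
            long millis = ((Number) raw).longValue();
            return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
        }
        if (raw instanceof String) {
            // Assumption: strings use the ISO form, as in "2015-06-27".
            return LocalDate.parse((String) raw);
        }
        if (raw instanceof Date) {
            return ((Date) raw).toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        }
        // Covers null and any type this sketch does not know about.
        throw new IllegalArgumentException("Unsupported bdate value: " + raw);
    }
}
```

If the stored numbers are epoch seconds rather than milliseconds, the numeric branch would need Instant.ofEpochSecond instead.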
Change Management
Change Management
Versioning, Class Loading
How do we set the correct data format versioning?
What mechanisms are out there to make this work?
Strategies
Strategies
• Decoupling Architectures
• ODMs
• Versioning
• Data Migrations
Decoupled Architectures
Strongly Coupled
Becomes a mess in your hair…
Coupled Architectures
Applications A, B and C all connect directly to the Database
Application B: "Let me perform some schema changes!"
Decoupled Architecture
Applications A, B and C connect to an API, and only the API connects to the Database
Decoupled Architectures
• Allows the business logic to evolve independently of the data layer
• Decouples the underlying storage / persistence option from the business service
• Changes are "requested" and not imposed across all applications
• Better versioning control of each request and its mapping
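The decoupling described above can be pictured as applications depending on an interface rather than on the stored document layout. The sketch below is illustrative only: UserApi and InMemoryUserApi are hypothetical names, and an in-memory map stands in for the database so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical API boundary: applications depend on this interface only.
interface UserApi {
    Optional<String> firstName(int userId);

    void rename(int userId, String firstName);
}

// One possible backing implementation; a map stands in for the database.
final class InMemoryUserApi implements UserApi {
    // Storage detail hidden behind the API: this key can change without touching callers.
    private static final String FIRST_NAME_FIELD = "first_name";

    private final Map<Integer, Map<String, Object>> collection = new HashMap<>();

    @Override
    public Optional<String> firstName(int userId) {
        Map<String, Object> doc = collection.get(userId);
        return doc == null
                ? Optional.empty()
                : Optional.ofNullable((String) doc.get(FIRST_NAME_FIELD));
    }

    @Override
    public void rename(int userId, String firstName) {
        // Creates the document on first write, then sets the field.
        collection.computeIfAbsent(userId, id -> new HashMap<>()).put(FIRST_NAME_FIELD, firstName);
    }
}
```

A schema change, such as renaming the stored field, then touches only the implementation class, and callers keep requesting changes through the same methods.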
ODMs
ODM
• Reduces the impedance mismatch between code and databases
• Data management facilitator
• Hides complexity of operators
• Tries to decouple business complexity with "magic" recipes
Spring Data
• POJO centric model
• MongoTemplate || CrudRepository extensions to make the connection to the repositories
• Uses annotations to override default field names and even data types (data type mapping)
public interface UserRepository extends MongoRepository<User, Integer> {
}
public class User {
    @Id
    int id;
    @Field("first_name")
    String firstname;
    String lastname;
    ...
}
Spring Data Document Structure
{
  "_id": 1,
  "first_name": "first",
  "lastname": "last",
  "catpictures": [
    { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }
  ]
}
Spring Data Considerations
• Data formats, versions and types still need to be managed
• Does not solve issues like type validation out of the box
• Can make things more complicated but more "controllable"
@Field("first_name")
String firstname;
Morphia
• Data source centric
• Will do all the discovery of POJOs for a given package
• Also uses annotations to perform overrides and deal with object mapping
@Entity("users")
public class User {
    @Id
    int id;
    String firstname;
    String lastname;
    ...
}
morphia.mapPackage("examples.odms.morphia.pojos");
Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user);
Morphia Document Structure
{
  "_id": 1,
  "className": "examples.odms.morphia.pojos.User",
  "firstname": "first",
  "lastname": "last",
  "catpictures": [
    { "size": 10, "blob": BinData(0, "Kr3AqmvV1R9TJQ==") }
  ]
}
Class Definition ("className" field)
Morphia Considerations
• Enables better control at class loading
• Like Spring Data, also facilitates field overriding (tags to define field keys)
• Better support for object polymorphism
Versioning
Versioning
Versioning of data structures (especially documents) can be very helpful
Recreate documents over time
Flow Control
Data / Field Multiversion Requirements
Archiving and History Purposes
Versioning – Option 0
Change existing document each time there is a write with monotonically increasing version number inside
{ "_id" : 174, "v" : 1, "firstname": "Juan" }
{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.update( {"_id": 174}, { "$set": { ... }, "$inc": { "v": 1 } } )
Increment field value
Versioning – Option 1
Store full document each time there is a write with monotonically increasing version number inside
{ "docId" : 174, "v" : 1, "firstname": "Juan" }
{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.insert( {"docId":174 …})
> db.users.find({"docId":174}).sort({"v":-1}).limit(-1);
Always find the latest version
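The "latest version" lookup amounts to filtering by docId and keeping the document with the highest "v". The sketch below shows that selection logic in plain Java over an in-memory list, purely for illustration; LatestVersion and latest are hypothetical names, and a real application would let the database do the sort and limit as in the query above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory equivalent of: find({docId}).sort({v: -1}) limited to one result.
final class LatestVersion {
    static Optional<Map<String, Object>> latest(List<Map<String, Object>> docs, int docId) {
        return docs.stream()
                // Keep only the versions that belong to the requested logical document.
                .filter(d -> Integer.valueOf(docId).equals(d.get("docId")))
                // Pick the one with the highest version number (assumes "v" is an Integer).
                .max(Comparator.comparingInt((Map<String, Object> d) -> (Integer) d.get("v")));
    }
}
```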
Versioning – Option 2
Store all document versions inside a single document.
> db.users.update( {"_id": 174}, { "$set": { "current.attr1": ... }, "$inc": { "current.v": 1 }, "$addToSet": { "prev": { ... } } } )
{
  "_id" : 174,
  "current" : { "v" : 3, "attr1": 184, "attr2" : "A-1" },
  "prev" : [
    { "v" : 1, "attr1": 165 },
    { "v" : 2, "attr1": 165, "attr2": "A-1" }
  ]
}
Current value: "current"
Previous values: "prev"
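The bookkeeping behind Option 2 can be sketched in plain Java: each write archives a copy of "current" into "prev", then applies the change and bumps the version. This is an in-memory illustration only, with maps standing in for the stored document; EmbeddedVersions, create and write are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the "all versions in a single document" layout.
final class EmbeddedVersions {
    // Builds a fresh document at version 1 with an empty history.
    static Map<String, Object> create(int id, Map<String, Object> attrs) {
        Map<String, Object> current = new HashMap<>(attrs);
        current.put("v", 1);
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", id);
        doc.put("current", current);
        doc.put("prev", new ArrayList<Map<String, Object>>());
        return doc;
    }

    // Archives a copy of "current" in "prev", then applies the changes and increments "v".
    @SuppressWarnings("unchecked")
    static void write(Map<String, Object> doc, Map<String, Object> changes) {
        Map<String, Object> current = (Map<String, Object>) doc.get("current");
        List<Map<String, Object>> prev = (List<Map<String, Object>>) doc.get("prev");
        int version = (Integer) current.get("v");
        prev.add(new HashMap<>(current));
        current.putAll(changes);
        current.put("v", version + 1);
    }
}
```

Starting from attr1 = 165, writing attr2 = "A-1" and then attr1 = 184 reproduces the example document above.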
Versioning – Option 3
Keep collection for "current" version and past versions
Current collection
> db.users.find( {"_id": 174 })
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
Previous versions collection
> db.users_past.find( {"pid": 174 })
{ "pid" : 174, "v" : 1, "firstname": "Juan" }
{ "pid" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
Versioning
Schema                    | Fetch 1       | Fetch Many     | Update       | Recover if Fail
0) Increment Version      | Easy, Fast    | Fast           | Easy, Medium | N/A
1) New Document           | Easy, Fast    | Not Easy, Slow | Medium       | Hard
2) Embedded in Single Doc | Easy, Fastest | Easy, Fastest  | Medium       | N/A
3) Separate Collection    | Easy, Fastest | Easy, Fastest  | Medium       | Medium, Hard
Migrations
Migrations
Several types of "Migrations":
Add/Remove Fields
Change Field Names
Change Field Data Type
Extract Embedded Document into Collection
Add / Remove Fields
For flexible schema databases this is our bread and butter
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "newfield": "value" }
> db.users.update( {"_id": 174}, {"$set": { "newfield": "value" }, "$unset": {"gender":""} })
Change Field Names
Again, you can do it programmatically
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "first": "Juan", "last": "Olivo" }
> db.users.update( {"_id": 174}, {"$rename": { "firstname": "first", "lastname":"last"} })
Change Field Data Type
Align to a new code change and move from Int to String
Before: {..."bdate": 1435394461522}
After: {..."bdate": "2015-06-27"}
1) Batch Process
2) Aggregation Framework
3) Change based on usage
Change Field Data Type
1) Batch Process – bulk API
public void migrateBulk() {
    // "dd" is day of month; the pattern letter "DD" would format the day of the year
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    ...
    List<UpdateOneModel<Document>> toUpdate = new ArrayList<UpdateOneModel<Document>>();
    for (Document doc : coll.find()) {
        // Convert the stored integer into its string representation
        String dateAsString = df.format(new Date(doc.getInteger("bdate", 0)));
        Document filter = new Document("_id", doc.getInteger("_id"));
        Document value = new Document("bdate", dateAsString);
        Document update = new Document("$set", value);
        toUpdate.add(new UpdateOneModel<Document>(filter, update));
    }
    // Send all the updates to the server in one bulk operation
    coll.bulkWrite(toUpdate);
}
Change Field Data Type
1) Batch Process – bulk API
public void migrateBulk() {
    ...
    for (Document doc : coll.find()) {
        ...
    }
    coll.bulkWrite(toUpdate);
}
Is there any problem with this?
Change Field Data Type
1) Batch Process – bulk API
public void migrateBulk() {
    ...
    // BSON type 16 represents the int32 data type
    Document query = new Document("bdate", new Document("$type", 16));
    for (Document doc : coll.find(query)) {
        ...
    }
    coll.bulkWrite(toUpdate);
}
More efficient filtering!
Extract Document into Collection
Normalize your schema
Before:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures": [{"size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}]}
> db.users.aggregate( [
    {$unwind: "$cat_pictures"},
    {$project: { "_id": 0, "uid": "$_id", "size": "$cat_pictures.size", "picture": "$cat_pictures.picture"}},
    {$out: "cats"}
  ])
After:
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
{"uid": 174, "size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}
Tradeoffs
Tradeoffs
Approach                    | Positives                                                           | Penalties
Decoupled Architecture      | Should be your default approach; clean solution; scalable          | N/A
Data Structures Variability | Reflects today's data structures; you can push decisions for later | More complex code base
Data Structures Strictness  | Simple to maintain; always aligned with your code base              | Will eventually need migrations; restricts your code iterations
Recap
Recap
• Flexible and Dynamic Schemas are a great tool
  – Use them wisely
  – Make sure you understand the tradeoffs
  – Make sure you understand the different strategies and options
• Works well with Strongly Typed Languages
Free Education
https://university.mongodb.com/courses/M101J/about
Thank you!
• Norberto Leite
• Technical Evangelist
• http://www.mongodb.com/norberto
• @nleite