Webinar: Schema Design

Schema Design, by Jay Runkel, Solutions Architect, MongoDB. #MongoDB

Description

One of the challenges that comes with moving to MongoDB is figuring out how best to model your data. While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't always apply to MongoDB. Because documents can represent rich, schema-free data structures, there are many viable alternatives to the standard, normalized, relational model. In addition, MongoDB has several distinctive features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.

Transcript of Webinar: Schema Design

Page 1: Webinar: Schema Design

Schema Design

Solutions Architect, MongoDB

Jay Runkel

#MongoDB

Page 2: Webinar: Schema Design

First a story:

Once upon a time there was a medical records company…

Page 3: Webinar: Schema Design
Page 4: Webinar: Schema Design
Page 5: Webinar: Schema Design
Page 6: Webinar: Schema Design
Page 7: Webinar: Schema Design
Page 8: Webinar: Schema Design

Agenda

• Schema Design Challenge
• Modeling Relationships in MongoDB
• An Example
• General Recommendations

Page 9: Webinar: Schema Design

Schema Design Challenges

Page 10: Webinar: Schema Design

Schema Design Challenge

• Flexibility – Easily adapt to new requirements
• Agility – Rapid application development
• Scalability – Support large data and query volumes

Page 11: Webinar: Schema Design

Schema Design Challenge

• How do we model data and relationships to ensure:
  – Flexibility
  – Agility
  – Scalability

Page 12: Webinar: Schema Design

Schema Design: MongoDB vs. Relational

Page 13: Webinar: Schema Design

MongoDB versus Relational

MongoDB                      Relational
Collections                  Tables
Documents                    Rows
Data Use                     Data Storage
What questions do I have?    What answers do I have?

Page 14: Webinar: Schema Design

Attribute      MongoDB                 Relational
Storage        N-dimensional           Two-dimensional
Field Values   0, 1, many, or embed    Single value
Query          Any field or level      Any field
Schema         Flexible                Very structured
Updates        In line                 In place

Page 15: Webinar: Schema Design

With relational, this is hard

Long development times

Inflexible

Doesn’t scale

Page 16: Webinar: Schema Design

Document model is much easier

Shorter development times

Flexible

Scalable

{
  "patient_id": "1177099",
  "first_name": "John",
  "last_name": "Doe",
  "middle_initial": "A",
  "dob": "2000-01-25",
  "gender": "Male",
  "blood_type": "B+",
  "address": "123 Elm St., Chicago, IL 59923",
  "height": "66",
  "weight": "110",
  "allergies": ["Nuts", "Penicillin", "Pet Dander"],
  "current_medications": [
    {"name": "Zoloft", "dosage": "2mg", "frequency": "daily", "route": "orally"}
  ],
  "complaint": [
    {"entered": "2000-11-03", "onset": "2000-11-03", "prob_desc": "", "icd": 250.00, "status": "Active"},
    {"entered": "2000-02-04", "onset": "2000-02-04", "prob_desc": "in spite of regular exercise, ...", "icd": 401.9, "status": "Active"}
  ],
  "diagnosis": [
    {"visit": "2005-07-22", "narrative": "Fractured femur", "icd": "9999", "priority": "Primary"},
    {"visit": "2005-07-22", "narrative": "Type II Diabetes", "icd": "250.00", "priority": "Secondary"}
  ]
}

Page 17: Webinar: Schema Design

Modeling Entities and Relationships

Page 18: Webinar: Schema Design

Let’s model something together

How about a business card?

Page 19: Webinar: Schema Design

Business Card

Page 20: Webinar: Schema Design

Address Book Entity-Relationship

Contacts
• name
• company
• title

Related entities (cardinality is contact to entity):
• Addresses (1 to N): type, street, city, state, zip_code
• Phones (1 to N): type, number
• Emails (1 to N): type, address
• Thumbnails (1 to 1): mime_type, data
• Portraits (1 to 1): mime_type, data
• Groups (N to N): name
• Twitters (1 to 1): name, location, web, bio

Page 21: Webinar: Schema Design

Modeling One-to-One Relationships

Page 22: Webinar: Schema Design

Referencing

Contact
• name
• company
• title
• phone

Address
• street
• city
• state
• zip_code

Use two collections with a reference

Similar to relational

Page 23: Webinar: Schema Design

Embedding

Contact
• name
• company
• address
  • street
  • city
  • state
  • zip_code
• title
• phone

Document Schema

Page 24: Webinar: Schema Design

Referencing

Contacts

{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "phone": "408-996-1010",
  "address_id": 1
}

Addresses

{
  "_id": 1,
  "street": "10260 Bandley Dr",
  "city": "Cupertino",
  "state": "CA",
  "zip_code": "95014",
  "country": "USA"
}
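A minimal sketch of what the referencing layout implies for the application: reading a full contact takes one lookup per collection. The two dicts below are in-memory stand-ins for the Contacts and Addresses collections, and `contact_with_address` is a hypothetical helper name; no MongoDB driver is involved.

```python
# In-memory stand-ins for the two collections on the slide, keyed by _id.
# Plain dicts are used for illustration only; this is not a driver API.
contacts = {
    2: {
        "_id": 2,
        "name": "Steven Jobs",
        "title": "VP, New Product Development",
        "company": "Apple Computer",
        "phone": "408-996-1010",
        "address_id": 1,
    }
}
addresses = {
    1: {
        "_id": 1,
        "street": "10260 Bandley Dr",
        "city": "Cupertino",
        "state": "CA",
        "zip_code": "95014",
        "country": "USA",
    }
}


def contact_with_address(contact_id):
    """Resolve the reference by hand: one lookup per collection."""
    contact = dict(contacts[contact_id])            # first lookup (shallow copy)
    address = addresses[contact.pop("address_id")]  # second lookup
    contact["address"] = address
    return contact


resolved = contact_with_address(2)
print(resolved["address"]["city"])
```

With the embedded layout, the second lookup disappears because the address already travels inside the contact document.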

Page 25: Webinar: Schema Design

Embedding

Contacts

{
  "_id": 2,
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014",
    "country": "USA"
  },
  "phone": "408-996-1010"
}

Page 26: Webinar: Schema Design

How are they different? Why?

Contact
• name
• company
• title
• phone

Address
• street
• city
• state
• zip_code

Contact
• name
• company
• address
  • street
  • city
  • state
  • zip_code
• title
• phone

Page 27: Webinar: Schema Design

Schema Flexibility

{
  "name": "Steven Jobs",
  "title": "VP, New Product Development",
  "company": "Apple Computer",
  "address": {
    "street": "10260 Bandley Dr",
    "city": "Cupertino",
    "state": "CA",
    "zip_code": "95014"
  },
  "phone": "408-996-1010"
}

{
  "name": "Larry Page",
  "url": "http://google.com",
  "title": "CEO",
  "company": "Google!",
  "address": {
    "street": "555 Bryant, #106",
    "city": "Palo Alto",
    "state": "CA",
    "zip_code": "94301"
  },
  "phone": "650-330-0100",
  "fax": "650-330-1499"
}
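As an illustration of what schema flexibility means for application code, the sketch below keeps the two contact documents from this slide in one list (a stand-in for a single collection) and works out which fields are shared and which are optional. It uses plain Python data only; the variable names are assumptions made for the example.

```python
# Two contact documents with different fields, held in one list that stands
# in for a single collection. No MongoDB driver is involved.
contacts = [
    {
        "name": "Steven Jobs",
        "title": "VP, New Product Development",
        "company": "Apple Computer",
        "address": {"street": "10260 Bandley Dr", "city": "Cupertino",
                    "state": "CA", "zip_code": "95014"},
        "phone": "408-996-1010",
    },
    {
        "name": "Larry Page",
        "url": "http://google.com",
        "title": "CEO",
        "company": "Google!",
        "address": {"street": "555 Bryant, #106", "city": "Palo Alto",
                    "state": "CA", "zip_code": "94301"},
        "phone": "650-330-0100",
        "fax": "650-330-1499",
    },
]

# Fields that every document has, and fields that only some documents have.
common_fields = set.intersection(*(set(doc) for doc in contacts))
optional_fields = set.union(*(set(doc) for doc in contacts)) - common_fields

# Readers of optional fields use .get() so a missing field is not an error.
fax_numbers = {doc["name"]: doc.get("fax") for doc in contacts}

print(sorted(optional_fields))
print(fax_numbers)
```

The point of the sketch is that the second document gained `url` and `fax` without any change to the first; the cost is that readers must tolerate missing fields.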

Page 28: Webinar: Schema Design

One to One: Schema Design Choices

• contact with twitter_id, referencing twitter (1 to 1)
• twitter with contact_id, referencing contact (1 to 1)

Redundant to track relationship on both sides

May save a fetch?

• Contact embeds twitter (1)

Page 29: Webinar: Schema Design

One to One: General Recommendations

• Embed
  – Full contact info all at once
  – Parent-child relationship “contains”
  – No additional data duplication
  – Can query or index on embedded field
    • e.g., “twitter.name”

• Exceptional cases…
  • Embedding results in large documents

[Diagram: Contact embeds twitter (1)]
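To make the dotted field name “twitter.name” concrete, here is a small sketch of how such a path addresses a field inside an embedded document. `get_path` is a hypothetical helper written for this illustration, not a MongoDB or driver function, and the sample contact is trimmed from the contact document shown later in the deck.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as "twitter.name" through nested dicts.

    Returns None when any step of the path is missing.
    """
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


contact = {
    "name": "Gary J. Murakami, Ph.D.",
    "twitter": {"name": "GaryMurakami", "location": "New Providence, NJ"},
}

print(get_path(contact, "twitter.name"))
print(get_path(contact, "twitter.bio"))
```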

Page 30: Webinar: Schema Design

Modeling One-to-Many Relationships

Page 31: Webinar: Schema Design

One to Many: Schema Design Choices

• contact with phone_ids: [ ], referencing phone (1 to N)
• phone with contact_id, referencing contact (1 to N)

Redundant to track relationship on both sides

Not possible in relational DBs

• Contact embeds phones (N)

Page 32: Webinar: Schema Design

One-to-many embedding vs. referencing

Embedding:

{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "[email protected]",
  "address": [
    {"street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301"}
  ],
  "phones": [
    {"type": "Office", "number": "650-618-1499"},
    {"type": "fax", "number": "650-330-0100"}
  ]
}

Referencing:

{
  "name": "Larry Page",
  "url": "http://google.com/",
  "title": "CEO",
  "company": "Google!",
  "email": "[email protected]",
  "address": ["addr99"],
  "phones": ["ph23", "ph49"]
}

{"_id": "addr99", "street": "555 Bryant, #106", "city": "Palo Alto", "state": "CA", "zip_code": "94301"}

{"_id": "ph23", "type": "Office", "number": "650-618-1499"}

{"_id": "ph49", "type": "fax", "number": "650-330-0100"}
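The two layouts on this slide carry the same information. As a sketch under simple assumptions, the code below starts from the referencing form (arrays of ids plus separate address and phone documents) and rebuilds the embedded form. The dicts stand in for collections, and `embed_references` is a hypothetical helper name.

```python
# Referencing form: the contact holds arrays of ids, and the address and
# phone documents live in their own collections (dicts keyed by _id).
contact = {
    "name": "Larry Page",
    "address": ["addr99"],
    "phones": ["ph23", "ph49"],
}
addresses = {
    "addr99": {"_id": "addr99", "street": "555 Bryant, #106",
               "city": "Palo Alto", "state": "CA", "zip_code": "94301"},
}
phones = {
    "ph23": {"_id": "ph23", "type": "Office", "number": "650-618-1499"},
    "ph49": {"_id": "ph49", "type": "fax", "number": "650-330-0100"},
}


def embed_references(doc, field, collection):
    """Return a copy of doc with the ids in doc[field] replaced by the
    referenced documents, minus their _id (the embedded form)."""
    embedded = [
        {k: v for k, v in collection[ref_id].items() if k != "_id"}
        for ref_id in doc[field]
    ]
    return {**doc, field: embedded}


embedded_contact = embed_references(contact, "address", addresses)
embedded_contact = embed_references(embedded_contact, "phones", phones)
print(embedded_contact["phones"])
```

Each id in the referencing form costs an extra lookup at read time, which is the trade-off the next slide's recommendation addresses.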

Page 33: Webinar: Schema Design

One to Many: General Recommendation

• Embed when possible
  – Full contact info all at once
  – Parent-children relationship “contains”
  – No additional data duplication
  – Can query or index on any field
    • e.g., { “phones.type”: “mobile” }

• Exceptional cases…
  • Scaling: maximum document size is 16MB

[Diagram: Contact embeds phones (N)]
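A query such as { “phones.type”: “mobile” } reaches into an array of embedded documents and matches when any element has the given value. The sketch below imitates that matching rule in plain Python; `matches_array_field` is a hypothetical helper, and the sample contacts and phone numbers are invented for the example.

```python
def matches_array_field(document, array_field, subfield, value):
    """True if any sub-document in document[array_field] has subfield == value."""
    return any(
        isinstance(item, dict) and item.get(subfield) == value
        for item in document.get(array_field, [])
    )


# Hypothetical sample data: names and numbers are made up for illustration.
contacts = [
    {"name": "A", "phones": [{"type": "work", "number": "555-0100"},
                             {"type": "mobile", "number": "555-0101"}]},
    {"name": "B", "phones": [{"type": "work", "number": "555-0102"}]},
    {"name": "C"},  # no phones field at all
]

with_mobile = [
    c["name"] for c in contacts
    if matches_array_field(c, "phones", "type", "mobile")
]
print(with_mobile)
```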

Page 34: Webinar: Schema Design

Modeling Many-to-Many Relationships

Page 35: Webinar: Schema Design

Many to Many: Traditional Relational Association

Join table

Contacts
• name
• company
• title
• phone

Groups
• name

GroupContacts (join table, crossed out with an X)
• group_id
• contact_id

Use arrays instead

Page 36: Webinar: Schema Design

Many to Many: Schema Design Choices

• group with contact_ids: [ ], referencing contact (N to N)
• contact with group_ids: [ ], referencing group (N to N)

Redundant to track relationship on both sides
• Both references must be updated for consistency

Redundant to track relationship on both sides
• Duplicated data must be updated for consistency

• group embeds contacts (N)
• contact embeds groups (N)

Page 37: Webinar: Schema Design

Many to Many: General Recommendation

• Use case determines whether to reference or embed:
  1. Simple address book
     • Contact references groups
  2. Corporate email groups
     • Group embeds contacts for performance

• Exceptional cases
  – Scaling: maximum document size is 16MB
  – Scaling may affect performance and working set

[Diagram: contact with group_ids: [ ], referencing group (N to N)]
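For the simple address book case, where the contact references its groups, a rough sketch of both directions of the relationship might look like this. The data and helper names are hypothetical, and the dict and list stand in for the groups and contacts collections.

```python
# Hypothetical sample data: groups keyed by _id, contacts referencing groups.
groups = {
    "g1": {"_id": "g1", "name": "Family"},
    "g2": {"_id": "g2", "name": "Work"},
}
contacts = [
    {"_id": 1, "name": "Ann", "group_ids": ["g1", "g2"]},
    {"_id": 2, "name": "Bob", "group_ids": ["g2"]},
    {"_id": 3, "name": "Cy", "group_ids": []},
]


def groups_of(contact):
    """Contact -> groups: follow the ids stored on the contact."""
    return [groups[gid]["name"] for gid in contact["group_ids"]]


def contacts_in(group_id):
    """Group -> contacts: scan contacts whose group_ids contain the id."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


def add_to_group(contact, group_id):
    """Membership lives in one place, so a single update keeps it consistent."""
    if group_id not in contact["group_ids"]:
        contact["group_ids"].append(group_id)


add_to_group(contacts[2], "g1")
print(groups_of(contacts[0]))
print(contacts_in("g1"))
```

Because the relationship is stored on one side only, there is no second copy to keep in sync, which is the consistency concern raised on the previous slide.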

Page 38: Webinar: Schema Design

Address Book Entity-Relationship

Contacts
• name
• company
• title

Related entities (cardinality is contact to entity):
• Addresses (1 to N): type, street, city, state, zip_code
• Phones (1 to N): type, number
• Emails (1 to N): type, address
• Thumbnails (1 to 1): mime_type, data
• Portraits (1 to 1): mime_type, data
• Groups (N to N): name
• Twitters (1 to 1): name, location, web, bio

Page 39: Webinar: Schema Design

Document model - holistic and efficient representation

Contacts
• name
• company
• title

Embedded in the contact document:
• addresses (N): type, street, city, state, zip_code
• phones (N): type, number
• emails (N): type, address
• thumbnail (1): mime_type, data
• twitter (1): name, location, web, bio

Separate collections:
• Portraits (1 to 1): mime_type, data
• Groups (N to N): name

Page 40: Webinar: Schema Design

Contact document example

{
  "name": "Gary J. Murakami, Ph.D.",
  "company": "MongoDB, Inc",
  "title": "Lead Engineer and Ruby Evangelist",
  "twitter": {
    "name": "GaryMurakami",
    "location": "New Providence, NJ",
    "web": "http://www.nobell.org"
  },
  "portrait_id": 1,
  "addresses": [
    {"type": "work", "street": "229 W 43rd St.", "city": "New York", "zip_code": "10036"}
  ],
  "phones": [
    {"type": "work", "number": "1-866-237-8815 x8015"}
  ],
  "emails": [
    {"type": "work", "address": "[email protected]"},
    {"type": "home", "address": "[email protected]"}
  ]
}

Page 41: Webinar: Schema Design

General Recommendations

Page 42: Webinar: Schema Design

Legacy Migration

1. Copy existing schema & some data to MongoDB

2. Iterative schema design development
   – Measure performance, find bottlenecks, and embed
     1. one to one associations first
     2. one to many associations next
     3. many to many associations
   – Eliminate join table using array of references or embedded documents
   – Measure and analyze, review concerns, scaling
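The join-table step can be sketched as follows, assuming the relational rows have already been copied out as plain dicts. The row data and the `fold_join_table` helper are hypothetical; the sketch shows the array-of-references variant, not the embedded-documents one.

```python
from collections import defaultdict

# Hypothetical rows copied from a relational schema.
contact_rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cy"},
]
group_contacts_rows = [  # the join table
    {"group_id": 10, "contact_id": 1},
    {"group_id": 20, "contact_id": 1},
    {"group_id": 20, "contact_id": 2},
]


def fold_join_table(contact_rows, join_rows):
    """Replace the join table with a group_ids array on each contact."""
    group_ids_by_contact = defaultdict(list)
    for row in join_rows:
        group_ids_by_contact[row["contact_id"]].append(row["group_id"])
    return [
        {
            "_id": c["id"],
            "name": c["name"],
            # Contacts with no join rows get an empty array.
            "group_ids": group_ids_by_contact.get(c["id"], []),
        }
        for c in contact_rows
    ]


contact_docs = fold_join_table(contact_rows, group_contacts_rows)
print(contact_docs)
```

After this step the GroupContacts table is no longer needed, since each contact document carries its own list of group references.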

Page 43: Webinar: Schema Design

New Software Application

• Embed by default

Page 44: Webinar: Schema Design

Embedding over Referencing

• Embedding is a bit like pre-joined data
  – BSON (Binary JSON) document ops are easy for the server

• Embed (90/10 following rule of thumb)
  – When the “one” or “many” objects are viewed in the context of their parent
  – For performance
  – For atomicity

• Reference
  – When you need more scaling
  – For easy consistency with “many to many” associations without duplicated data

Page 45: Webinar: Schema Design

It’s All About Your Application

• Programs+Databases = (Big) Data Applications

• Your schema is the impedance matcher
  – Design choices: normalize/denormalize, reference/embed
  – Melds programming with MongoDB for best of both
  – Flexible for development and change

• Programs×MongoDB = Great Big Data Applications

Page 46: Webinar: Schema Design

Questions?

Page 47: Webinar: Schema Design

Thank You

Solutions Architect, MongoDB

Jay Runkel

[email protected]

@jayrunkel

#MongoDB