Entity Relationships in a Document Database at ZendCon 2012

Post on 05-Dec-2014

1.075 views 1 download

description

Unlike relational databases, document databases like CouchDB and MongoDB do not directly support entity relationships. This talk will explore patterns of modeling one-to-many and many-to-many entity relationships in a document database. These patterns include using an embedded JSON array, relating documents using identifiers, using a list of keys, and using relationship documents. This talk will explore how these entity relationship patterns equate to how entities are joined in a relational database. We'll take a look at the relevant differences between document databases and relational databases. For example, document databases do not have tables, each document can have its own schema, there is no built-in concept of relationships between documents, views/indexes are queried directly instead of being used to optimize more generalized queries, a column within a result set can contain a mix of logical data types, and there is typically no support for transactions across document boundaries.

Transcript of Entity Relationships in a Document Database at ZendCon 2012

Entity Relationships in a Document Database

MapReduce Views for SQL Users

You’re using a relational database, but have been relying heavily on denormalization to optimize read performance

You would like to give up consistency in exchange for a high level of concurrency

Your data model is a “!t” for documents (e.g. a CMS)

When to Choose aDocument Database

When Not to Choose aDocument DatabaseYour data !ts better in a relational model—SQL is a powerful and mature language for working with relational data sets

Consistency is critical to your application

You haven’t bothered exploring scalability options for your current database

Incremental Map/Reduce

"How fucked is my NoSQL database?" howfuckedismydatabase.com. 2009. http://howfuckedismydatabase.com/nosql/ (24 October 2012).

Entity RelationshipModel

Join vs. Collation

SELECT `publisher`.`id`, `publisher`.`name`, `book`.`title`FROM `publisher`FULL OUTER JOIN `book` ON `publisher`.`id` = `book`.`publisher_id`ORDER BY `publisher`.`id`, `book`.`title`;

SQL Query Joining Publishers and Books

Joined Result Set

publisher.id publisher.name book.title

oreilly O'Reilly MediaBuilding iPhone Apps with HTML, CSS, and JavaScript

oreilly O'Reilly MediaCouchDB: The Definitive

Guide

oreilly O'Reilly MediaDocBook: The Definitive

Guide

oreilly O'Reilly Media RESTful Web Services

Publisher (“left”) Book “right”

Collated Result Set

key id value

["oreilly",0] "oreilly" "O'Reilly Media"

["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript"

["oreilly",1] "oreilly" "CouchDB: The Definitive Guide"

["oreilly",1] "oreilly" "DocBook: The Definitive Guide"

["oreilly",1] "oreilly" "RESTful Web Services"

Publisher

Books

View Result SetsMade up of columns and rows

Every row has the same three columns:• key• id• value

Columns can contain a mixture of logical data types

One to Many Relationships

Embedded Entities: Nest related entities within a document

A single document represents the “one” entity

Nested entities (JSON Array) represents the “many” entities

Simplest way to create a one to many relationship

Embedded Entities

Example: Publisher with Nested Books{ "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media", "books":[ { "title":"CouchDB: The Definitive Guide" }, { "title":"RESTful Web Services" }, { "title":"DocBook: The Definitive Guide" }, { "title":"Building iPhone Apps with HTML, CSS, and JavaScript" } ]}

function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], doc.books[i].title); } }}

Map Function

Result Setkey id value

["oreilly",0] "oreilly" "O'Reilly Media"

["oreilly",1] "oreilly" "Building iPhone Apps with HTML, CSS, and JavaScript"

["oreilly",1] "oreilly" "CouchDB: The Definitive Guide"

["oreilly",1] "oreilly" "DocBook: The Definitive Guide"

["oreilly",1] "oreilly" "RESTful Web Services"

Only works if there aren’t a large number of related entities:• Too many nested entities can result in very large documents• Slow to transfer between client and server• Unwieldy to modify• Time-consuming to index

Limitations

Related Documents: Reference an entity by its identi!er

A document representing the “one” entity

Separate documents for each “many” entity

Each “many” entity references its related “one” entity by the “one” entity’s document identi!er

Makes for smaller documents

Reduces the probability of document update con"icts

Related Documents

Example: Publisher

{ "_id":"oreilly", "collection":"publisher", "name":"O'Reilly Media"}

Example: Related Book

{ "_id":"9780596155896", "collection":"book", "title":"CouchDB: The Definitive Guide", "publisher":"oreilly"}

Map Function

function(doc) { if ("publisher" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { emit([doc.publisher, 1], doc.title); }}

Result Set

key id value

["oreilly",0] "oreilly" "O'Reilly Media"

["oreilly",1] "9780596155896" "CouchDB: The Definitive Guide"

["oreilly",1] "9780596529260" "RESTful Web Services"

["oreilly",1] "9780596805791" "Building iPhone Apps with HTML, CSS, and JavaScript"

["oreilly",1] "9781565925809" "DocBook: The Definitive Guide"

When retrieving the entity on the “right” side of the relationship, one cannot include any data from the entity on the “left” side of the relationship without the use of an additional query

Only works for one to many relationships

Limitations

Many to Many Relationships

List of Keys: Reference entities by their identi!ers

A document representing each “many” entity on the “left” side of the relationship

Separate documents for each “many” entity on the “right” side of the relationship

Each “many” entity on the “right” side of the relationship maintains a list of document identi!ers for its related “many” entities on the “left” side of the relationship

List of Keys

Books and Related Authors

Example: Book

{ "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide"}

Example: Book

{ "_id":"9781565920514", "collection":"book", "title":"Making TeX Work"}

Example: Book

{ "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide"}

Example: Author

{ "_id":"muellner", "collection":"author", "name":"Leonard Muellner", "books":[ "9781565925809" ]}

Example: Author

{ "_id":"walsh", "collection":"author", "name":"Norman Walsh", "books":[ "9780596805029", "9781565925809", "9781565920514" ]}

Map Function

function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("author" == doc.collection) { for (var i in doc.books) { emit([doc.books[i], 1], doc.name); } }}

Result Setkey id value

["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"

["9780596805029",1] "walsh" "Norman Walsh"

["9781565920514",0] "9781565920514" "Making TeX Work"

["9781565920514",1] "walsh" "Norman Walsh"

["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"

["9781565925809",1] "muellner" "Leonard Muellner"

["9781565925809",1] "walsh" "Norman Walsh"

Authors and Related Books

function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); for (var i in doc.books) { emit([doc._id, 1], {"_id":doc.books[i]}); } }}

Map Function

Result Setkey id value

["muellner",0] "muellner" "Leonard Muellner"

["muellner",1] "muellner" {"_id":"9781565925809"}

["walsh",0] "walsh" "Norman Walsh"

["walsh",1] "walsh" {"_id":"9780596805029"}

["walsh",1] "walsh" {"_id":"9781565920514"}

["walsh",1] "walsh" {"_id":"9781565925809"}

Including Docs include_docs=true

key id value doc (truncated)

["muellner",0] "muellner" … {"name":"Leonard Muellner"}

["muellner",1] "muellner" … {"title":"DocBook: The Definitive Guide"}

["walsh",0] "walsh" … {"name":"Norman Walsh"}

["walsh",1] "walsh" … {"title":"DocBook 5: The Definitive Guide"}

["walsh",1] "walsh" … {"title":"Making TeX Work"}

["walsh",1] "walsh" … {"title":"DocBook: The Definitive Guide"}

Or, we can reverse the references…

Example: Author

{ "_id":"muellner", "collection":"author", "name":"Leonard Muellner"}

Example: Author

{ "_id":"walsh", "collection":"author", "name":"Norman Walsh"}

Example: Book

{ "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide", "authors":[ "walsh" ]}

Example: Book

{ "_id":"9781565920514", "collection":"book", "title":"Making TeX Work", "authors":[ "walsh" ]}

Example: Book

{ "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide", "authors":[ "muellner", "walsh" ]}

Map Function

function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book" == doc.collection) { for (var i in doc.authors) { emit([doc.authors[i], 1], doc.title); } }}

Result Setkey id value

["muellner",0] "muellner" "Leonard Muellner"

["muellner",1] "9781565925809" "DocBook: The Definitive Guide"

["walsh",0] "walsh" "Norman Walsh"

["walsh",1] "9780596805029" "DocBook 5: The Definitive Guide"

["walsh",1] "9781565920514" "Making TeX Work"

["walsh",1] "9781565925809" "DocBook: The Definitive Guide"

Queries from the “right” side of the relationship cannot include any data from entities on the “left” side of the relationship (without the use of include_docs)

A document representing an entity with lots of relationships could become quite large

Limitations

Relationship Documents: Create a document to represent each individual relationship

A document representing each “many” entity on the “left” side of the relationship

Separate documents for each “many” entity on the “right” side of the relationship

Neither the “left” nor “right” side of the relationship contain any direct references to each other

For each distinct relationship, a separate document includes the document identi!ers for both the “left” and “right” sides of the relationship

Relationship Documents

Example: Book

{ "_id":"9780596805029", "collection":"book", "title":"DocBook 5: The Definitive Guide"}

Example: Book

{ "_id":"9781565920514", "collection":"book", "title":"Making TeX Work"}

Example: Book

{ "_id":"9781565925809", "collection":"book", "title":"DocBook: The Definitive Guide"}

Example: Author

{ "_id":"muellner", "collection":"author", "name":"Leonard Muellner"}

Example: Author

{ "_id":"walsh", "collection":"author", "name":"Norman Walsh"}

Example: Relationship Document{ "_id":"44005f2c", "collection":"book-author", "book":"9780596805029", "author":"walsh"}

Example: Relationship Document{ "_id":"44005f72", "collection":"book-author", "book":"9781565920514", "author":"walsh"}

Example: Relationship Document{ "_id":"44006720", "collection":"book-author", "book":"9781565925809", "author":"muellner"}

Example: Relationship Document{ "_id":"44006b0d", "collection":"book-author", "book":"9781565925809", "author":"walsh"}

Books and Related Authors

function(doc) { if ("book" == doc.collection) { emit([doc._id, 0], doc.title); } if ("book-author" == doc.collection) { emit([doc.book, 1], {"_id":doc.author}); }}

Map Function

Result Setkey id value

["9780596805029",0] "9780596805029" "DocBook 5: The Definitive Guide"

["9780596805029",1] "44005f2c" {"_id":"walsh"}

["9781565920514",0] "9781565920514" "Making TeX Work"

["9781565920514",1] "44005f72" {"_id":"walsh"}

["9781565925809",0] "9781565925809" "DocBook: The Definitive Guide"

["9781565925809",1] "44006720" {"_id":"muellner"}

["9781565925809",1] "44006b0d" {"_id":"walsh"}

Including Docs include_docs=true

key id value doc (truncated)

["9780596805029",0] … … {"title":"DocBook 5: The Definitive Guide"}

["9780596805029",1] … … {"name":"Norman Walsh"}

["9781565920514",0] … … {"title":"Making TeX Work"}

["9781565920514",1] … … {"author","name":"Norman Walsh"}

["9781565925809",0] … … {"title":"DocBook: The Definitive Guide"}

["9781565925809",1] … … {"name":"Leonard Muellner"}

["9781565925809",1] … … {"name":"Norman Walsh"}

Authors and Related Books

function(doc) { if ("author" == doc.collection) { emit([doc._id, 0], doc.name); } if ("book-author" == doc.collection) { emit([doc.author, 1], {"_id":doc.book}); }}

Map Function

Result Setkey id value

["muellner",0] "muellner" "Leonard Muellner"

["muellner",1] "44006720" {"_id":"9781565925809"}

["walsh",0] "walsh" "Norman Walsh"

["walsh",1] "44005f2c" {"_id":"9780596805029"}

["walsh",1] "44005f72" {"_id":"9781565920514"}

["walsh",1] "44006b0d" {"_id":"9781565925809"}

Including Docs include_docs=true

key id value doc (truncated)

["muellner",0] … … {"name":"Leonard Muellner"}

["muellner",1] … … {"title":"DocBook: The Definitive Guide"}

["walsh",0] … … {"name":"Norman Walsh"}

["walsh",1] … … {"title":"DocBook 5: The Definitive Guide"}

["walsh",1] … … {"title":"Making TeX Work"}

["walsh",1] … … {"title":"DocBook: The Definitive Guide"}

Queries can only contain data from the “left” or “right” side of the relationship (without the use of include_docs)

Maintaining relationship documents may require more work

Limitations

Doctrine’s Object-Document Mapper (ODM)

Doctrine CouchDB[1]

1. http://docs.doctrine-project.org/projects/doctrine-couchdb/

Includes a CouchDB client library and ODM

Maps documents using Doctrine’s persistence semantics

Maps CouchDB views to PHP objects

Document con"ict resolution support

Includes a write-behind feature for increased performance

Features

De!ning an Entity[1]

/** @Document */class BlogPost{ /** @Id */ private $id; /** @Field(type="string") */ private $headline; /** @Field(type="string") */ private $text; /** @Field(type="datetime") */ private $publishDate; // getter/setter here}

1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture

Persisting an Entity[1]

$blogPost = new BlogPost();$blogPost->setHeadline("Hello World!");$blogPost->setText("This is a blog post going to be saved into CouchDB");$blogPost->setPublishDate(new \DateTime("now"));$dm->persist($blogPost);$dm->flush();

1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#architecture

Querying an Entity[1]

// $dm is an instance of Doctrine\ODM\CouchDB\DocumentManager$blogPost = $dm->find("MyApp\Document\BlogPost", $theUUID);

1. http://docs.doctrine-project.org/projects/doctrine-couchdb/en/latest/reference/introduction.html#querying

Doctrine MongoDB ODM [1]

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/

FeaturesMaps documents using Doctrine’s persistence semantics

Map embedded documents

Map referenced documents

Uses batch inserts

Performs atomic updates

De!ning Entities[1]

/** @MappedSuperclass */abstract class BaseEmployee{ /** @Id */ private $id;

/** @EmbedOne(targetDocument="Address") */ private $address;

// ...}

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

De!ning Entities[1]

/** @Document */class Employee extends BaseEmployee{ /** @ReferenceOne(targetDocument="Documents\Manager") */ private $manager;

// ...}

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

De!ning Entities[1]

/** @Document */class Manager extends BaseEmployee{ /** @ReferenceMany(targetDocument="Documents\Project") */ private $projects = array();

// ...}

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

De!ning Entities[1]

/** @EmbeddedDocument */class Address{ /** @String */ private $address; /** @String */ private $city; /** @String */ private $state; /** @String */ private $zipcode; // ...}

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

De!ning Entities[1]

/** @Document */class Project{ /** @Id */ private $id; /** @String */ private $name; public function __construct($name) { $this->name = $name; } // ...}

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Persisting Entities[1]

$employee = new Employee();$employee->setName('Employee');$employee->setSalary(50000.00);$employee->setStarted(new \DateTime());

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Persisting Entities[1]

$address = new Address();$address->setAddress('555 Doctrine Rd.');$address->setCity('Nashville');$address->setState('TN');$address->setZipcode('37209');$employee->setAddress($address);

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Persisting Entities[1]

$project = new Project('New Project');$manager = new Manager();$manager->setName('Manager');$manager->setSalary(100000.00);$manager->setStarted(new \DateTime());$manager->addProject($project);

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Persisting Entities[1]

// $dm is an instance of Doctrine\ODM\MongoDB\DocumentManager$dm->persist($employee);$dm->persist($address);$dm->persist($project);$dm->persist($manager);$dm->flush();

1. http://docs.doctrine-project.org/projects/doctrine-mongodb-odm/en/latest/reference/introduction.html#features-overview

Querying an Entity

// $dm is an instance of Doctrine\ODM\MongoDB\DocumentManager$manager = $dm->find("Documents\Manager", $theID);

Final Thoughts

Document databases have no tables (and therefore no columns)

Indexes (views) are queried directly, instead of being used to optimize more generalized queries

Result set columns can contain a mix of logical data types

No built-in concept of relationships between documents

Related entities can be embedded in a document, referenced from a document, or both

Document Databases Compared to Relational Databases

CaveatsNo referential integrity

No atomic transactions across document boundaries

Some patterns may involve denormalized (i.e. redundant) data

Data inconsistencies are inevitable (i.e. eventual consistency)

Consider the implications of replication—what may seem consistent with one database may not be consistent across nodes (e.g. referencing entities that don’t yet exist on the node)

Additional TechniquesUse the startkey and endkey parameters to retrieve one entity and its related entities:startkey=["9781565925809"]&endkey=["9781565925809",{}]

De!ne a reduce function and use grouping levels

Use UUIDs rather than natural keys for better performance

Use the bulk document API when writing Relationship Documents

When using the List of Keys or Relationship Documents patterns, denormalize data so that you can have data from the “right” and “left” side of the relationship within your query results

Cheat Sheet

One to Many

Many to Many

<= N* Relations

> N* Relations

Embedded Entities

Related Documents

List of Keys Relationship Documents

✓ ✓

✓ ✓

✓ ✓

✓ ✓

* where N is a large number for your system

Thank You@BradleyHolt

http://bradley-holt.comhttps://joind.in/7040

Copyright © 2011-2012 Bradley Holt. All rights reserved.