Schema Agnostic Indexing with Azure DocumentDB

DEEKSHA SINGH: 2641679

YASH THAKKAR: 2642764

Source: cis.csuohio.edu/~sschung/CIS601/CIS601_Presentation...

ABSTRACT

Azure DocumentDB is Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale.

Automatic indexing of documents without requiring a schema or secondary indices.

Operates within an extremely frugal resource budget.

OUTLINE

DocumentDB

DocumentDB Capabilities

Resource Model

System Topology

Design Goals

Schema Agnostic Indexing

Logical Index Organization

When not to use and when to use DocumentDB

INTRODUCTION

DocumentDB builds the JSON data model and the JavaScript language directly into its database engine.

The indexing subsystem needs to support:

Automatic indexing of documents

DocumentDB’s query language

Real time, consistent queries

Multi-tenancy under extremely frugal resource budgets

Predictable performance guarantees

DOCUMENTDB CAPABILITIES

DocumentDB query language supports rich relational & hierarchical queries.

By default, the database engine automatically indexes all documents without requiring schema or secondary indexes from developers.

Transactional execution of application logic.

DocumentDB offers well-defined consistency levels for developers.

All machine and resource management is abstracted from users.

RESOURCE MODEL

A tenant of DocumentDB starts by provisioning a database account.

A DocumentDB database manages a set of entities (users, permissions, and collections), collectively referred to as resources.

A collection is a schema-agnostic container of arbitrary user-generated documents.

Developers can interact with resources over HTTP using the standard verbs.

Tenants can elastically scale a resource by creating new resources, which are placed across resource partitions.

SYSTEM TOPOLOGY

Deployed worldwide across multiple Azure regions.

Managed and deployed on clusters of machines, each with dedicated local SSDs (to provide durability and high availability).

The DocumentDB database engine consists of the following components:

Replicated state machine (RSM) for coordination

JavaScript language runtime

Query processor

Storage and indexing subsystems

DESIGN GOALS FOR INDEXING

Automatic Indexing

Configurable storage/performance tradeoffs

Efficient, rich hierarchical and relational queries

Consistent queries in the face of a sustained volume of document writes

Multi-tenancy
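The storage/performance trade-off above can be pictured as a choice of which paths get indexed at all. The following is a minimal sketch under stated assumptions: `iter_paths`, `index_entries`, and the `excluded_prefixes` parameter are hypothetical names chosen for illustration and are not DocumentDB's actual indexing policy format. It only shows that excluding a subtree shrinks the number of index entries, at the cost of that subtree no longer being served from the index.

```python
def iter_paths(node, prefix=()):
    """Yield every root-to-leaf path of a JSON-like value as a tuple of labels."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for position, child in enumerate(node):
            yield from iter_paths(child, prefix + (position,))
    else:
        # Scalar leaf: the value itself is the last label of the path.
        yield prefix + (node,)


def index_entries(doc, excluded_prefixes=()):
    """Return the paths that would be indexed, skipping excluded subtrees.

    excluded_prefixes is a hypothetical knob for this sketch, not a real API.
    """
    entries = []
    for path in iter_paths(doc):
        if any(path[:len(p)] == tuple(p) for p in excluded_prefixes):
            continue  # this subtree is deliberately left out of the index
        entries.append(path)
    return entries


doc = {"id": "d1", "title": "report", "payload": {"blob": "x", "size": 10}}
full = index_entries(doc)
trimmed = index_entries(doc, excluded_prefixes=[("payload",)])
```

Here `full` holds one entry per leaf of the document, while `trimmed` omits everything under `payload`.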

SCHEMA AGNOSTIC INDEXING

No Schema, No Problem!

• Makes no assumptions about the documents and allows documents to vary in schema.

Documents as Trees

• Representing documents as trees blurs the boundary between the schema of JSON documents and their instance values.

Index as a Document

• Every path in the document tree is indexed.

• Each update of a document leads to an update of the index structure.

DocumentDB Queries

• Developers can query DocumentDB collections using queries written in SQL and JavaScript.

• Queries are translated to the DocumentDB Query intermediate language (IL).
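To make the "documents as trees" idea concrete, the sketch below flattens a JSON document into its root-to-leaf paths, treating property names, array positions, and leaf values alike as labels. It is an illustration only: the function name `document_paths` and the sample document are assumptions, not taken from the presentation or from DocumentDB itself.

```python
import json


def document_paths(node, prefix=()):
    """Yield each root-to-leaf path of a JSON value as a tuple of labels.

    Property names, array positions, and scalar values are all labels,
    so schema and instance values live in the same tree.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from document_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for position, child in enumerate(node):
            yield from document_paths(child, prefix + (position,))
    else:
        yield prefix + (node,)


# Illustrative document; the two array elements have different shapes.
doc = json.loads(
    '{"location": [{"country": "Germany", "city": "Berlin"},'
    ' {"country": "France"}], "exports": 3}'
)
paths = list(document_paths(doc))
```

Each element of `paths` is one branch of the document tree, for example `("location", 0, "city", "Berlin")`. No schema is needed to produce them, which is why documents of different shapes can share one index.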

QUERY IL

Designed to exploit JSON and JavaScript integration

Rooted in JavaScript type system

Follows JavaScript language semantics for expression evaluation & function invocation

Designed to be a target of translation from multiple query language frontends

LOGICAL INDEX ORGANIZATION

The index is the union of all document trees and is itself represented as a tree.

Each node of the index tree contains a list of document ids corresponding to the documents containing the given label.
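The logical organization described above can be sketched as a tree whose nodes each carry a set of document ids. This is a simplified in-memory illustration, not the physical index DocumentDB uses. The names `IndexNode`, `iter_paths`, `add_document`, and `lookup` are hypothetical, and the two sample documents are made up.

```python
class IndexNode:
    """One node of the index tree: the ids of documents containing this
    label at this position, plus the child labels below it."""

    def __init__(self):
        self.doc_ids = set()
        self.children = {}


def iter_paths(node, prefix=()):
    """Yield every root-to-leaf path of a JSON-like value as a tuple of labels."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for position, child in enumerate(node):
            yield from iter_paths(child, prefix + (position,))
    else:
        yield prefix + (node,)


def add_document(root, doc_id, doc):
    """Merge one document tree into the index (the union of all trees)."""
    for path in iter_paths(doc):
        current = root
        for label in path:
            current = current.children.setdefault(label, IndexNode())
            current.doc_ids.add(doc_id)


def lookup(root, path):
    """Return the ids of documents that contain the given path."""
    current = root
    for label in path:
        current = current.children.get(label)
        if current is None:
            return set()
    return set(current.doc_ids)


root = IndexNode()
add_document(root, 1, {"city": "Berlin", "tags": ["db", "json"]})
add_document(root, 2, {"city": "Paris", "tags": ["json"]})

has_city = lookup(root, ("city",))
in_berlin = lookup(root, ("city", "Berlin"))
first_tag_json = lookup(root, ("tags", 0, "json"))
no_match = lookup(root, ("zip",))
```

A lookup walks the tree one label at a time, so an equality filter on a property becomes a path traversal followed by reading that node's document ids.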

WHEN NOT TO USE DOCUMENTDB

Consider Azure DocumentDB

When you need:

To build new web and mobile cloud-based applications

Rapid development and high-scalability requirements

Query and processing of user and device generated data

To run a document store in virtual machines

A managed service model

QUESTIONS?