.NET Database Technologies: Using NoSQL databases


NoSQL – “Not only SQL”

• Alternatives to the ubiquitous relational database which may be superior in specific application scenarios

• Object-oriented databases (ODBMS)

They came, they saw, they... didn’t conquer, but they are still around

• NoSQL databases

The new kids on the block

General term applied to a range of different non-relational database systems

Largely emerging to meet the needs of large-scale Web 2.0 applications

Object-oriented databases

• ODBMSs use the same data model as object-oriented programming languages

no object-relational impedance mismatch due to a uniform model

• An object database combines the features of an object-oriented language and a DBMS (language binding)

treat data as objects

• object identity

• attributes and methods

• relationships between objects

extensible type hierarchy

• inheritance, overloading and overriding as well as customised types

ODBMS history

• Object Database Manifesto

Paper published in 1989 (Atkinson et al.)

• Some ODBMS products

Early 1990s: GemStone, Objectivity

Late 1990s: Versant, ObjectStore, Poet, Matisse

2000s: db4o, Caché

• ODMG (Object Data Management Group)

1993: ODMG 1.0 standard

1997: ODMG 2.0

1999: ODMG 3.0, then ODMG disbanded

2005: ODMG reformed, working towards new standard

ODMG

• Object Database Management Group (ODMG) founded in 1991

standardisation body including all major ODBMS vendors

• Defines a standard to increase portability across different ODBMS products

• Mirrors the SQL standard for RDBMSs; the standard specifies:

Object Model

Object Definition Language (ODL)

Object Query Language (OQL)

language bindings

• C++, Smalltalk and Java bindings

Characteristics of ODBMS

• Support complex data models with no mapping issues

• Tight integration with an object-oriented programming language (persistent programming language)

• High performance in suitable application scenarios

• Different products scale from small-footprint embedded databases (db4o) to large-scale, highly concurrent systems (e.g. Versant V/OD)

Persistence patterns and ODBMS

• Some of Fowler’s patterns (from Patterns of Enterprise Application Architecture) are specific to the use of a relational database, e.g.

Data Mapper

Foreign Key Mapping

Metadata Mapping

Single-table Inheritance, etc.

• Some are not specific to the data storage model and are relevant when using an ODBMS, e.g.

Identity Map

Unit of Work

Repository

Lazy-Loading
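Patterns such as Identity Map do not depend on the storage model. The following is a minimal Python sketch, not the implementation used by db4o or any ORM. `IdentityMap` is an illustrative name. `load_employee` is a hypothetical loader that stands in for whatever query the application actually runs, whether against an ODBMS, an ORM or a document store:

```python
from typing import Any, Callable, Dict, List


class IdentityMap:
    """Keeps at most one in-memory object per database identity."""

    def __init__(self, loader: Callable[[int], Any]) -> None:
        self._loader = loader                 # reads one record from the store
        self._loaded: Dict[int, Any] = {}     # identity -> already-loaded object

    def get(self, object_id: int) -> Any:
        if object_id not in self._loaded:
            # First request for this identity: go to the store once.
            self._loaded[object_id] = self._loader(object_id)
        # Every later request returns the very same instance.
        return self._loaded[object_id]


# Hypothetical loader standing in for a real database query; it records each call.
store_calls: List[int] = []


def load_employee(object_id: int) -> Dict[str, Any]:
    store_calls.append(object_id)
    return {"id": object_id, "name": f"Employee {object_id}"}


employees = IdentityMap(load_employee)
first = employees.get(1)
second = employees.get(1)
print(first is second, store_calls)  # True [1]
```

The second `get` returns the cached instance, so the store is queried only once. Both references therefore see the same object state.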

db4o

• Open-source object-database engine

Now owned by Versant

Complements their own V/OD product

• Can be used in embedded or client-server modes

Embed in application simply by including DLLs

• Native object database

Stores .NET (or Java) objects directly with no special requirements on classes

Other ODBMSs (e.g. V/OD) require classes to be marked as persistent through bytecode manipulation and also store class definitions

Tight integration with the application, at the cost of limited ad-hoc querying and reporting

Can replicate data to a relational database if required

IObjectContainer

• The IObjectContainer interface is implemented by objects that provide access to the database

IObjectContainer is roughly equivalent to the EF ObjectContext

Acts as a Unit of Work if transparent persistence is enabled (see later)

• Can access DB in embedded mode (direct file access) or client-server mode (local or remote)

IObjectServer instance required in client-server mode

• IObjectContainer instances are created by factory classes, e.g. Db4oEmbedded

• Queries on IObjectContainer return IObjectSet (except LINQ queries)

Viewing data and ad-hoc querying

• ObjectManager Enterprise

Visual Studio plug-in

Browsing and drag-and-drop queries

• LINQPad

Requires references to the db4o DLLs and to the namespaces of the stored classes

Executes LINQ queries and visualises results

db4o query APIs

• Query-by-example (QBE)

Very limited - no comparisons, ranges, etc.

• Simple Object Data Access (SODA)

Build query by navigating graph and adding constraints to nodes

• Native Queries

Expressed completely in programming language

Type-safe

Optimised into a SODA query at runtime where possible

• LINQ

Available in the .NET version only, not in Java (obviously)
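The gap between query-by-example and native queries can be shown without db4o itself. The Python sketch below is a toy in-memory model, not the db4o API. QBE treats every non-default field of a template object as an equality constraint, which is why it cannot express ranges or comparisons. A native-style query is just a predicate written in the host language. `Pilot`, `query_by_example` and `native_query` are hypothetical names:

```python
from dataclasses import dataclass, fields
from typing import Callable, List, Optional


@dataclass
class Pilot:                      # hypothetical stored class
    name: Optional[str] = None
    points: int = 0


def query_by_example(store: List[Pilot], template: Pilot) -> List[Pilot]:
    """QBE-style: every field that differs from its default must match exactly.

    Fields left at their default (None, 0) are ignored, so a template can
    neither express ranges nor ask for a field that equals its default.
    """
    blank = Pilot()
    constraints = {
        f.name: getattr(template, f.name)
        for f in fields(Pilot)
        if getattr(template, f.name) != getattr(blank, f.name)
    }
    return [p for p in store
            if all(getattr(p, attr) == value for attr, value in constraints.items())]


def native_query(store: List[Pilot], predicate: Callable[[Pilot], bool]) -> List[Pilot]:
    """Native-query style: any condition the host language can express."""
    return [p for p in store if predicate(p)]


store = [Pilot("Michael", 100), Pilot("Rubens", 99), Pilot("Rubens", 10)]

by_example = query_by_example(store, Pilot(name="Rubens"))
in_range = native_query(store, lambda p: 50 <= p.points <= 150)
print(len(by_example), [p.name for p in in_range])  # 2 ['Michael', 'Rubens']
```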

Activation

• Objects are stored in the database as an object graph

• If db4o is configured to cascade-on-activate (eager loading), retrieving one object could load a large number of related objects

• Fixed activation depth limits depth of traversal of graph when retrieving objects

Default value is 5

• Can then explicitly activate related objects when needed

• Lazy loading can be configured with transparent activation

• Classes need to be “instrumented” at build time by running Db4oTool.exe

Code is injected into the assembly so that classes implement the IActivatable interface
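A fixed activation depth works like a depth-limited traversal of the object graph. The Python sketch below is a simplified model, not db4o's API. `StoredObject` and `activate` are hypothetical names. One simplification: in db4o an inactive object's fields stay at their defaults, whereas here the links are kept so that the chain can be walked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredObject:                      # hypothetical persistent class
    value: int
    next: Optional["StoredObject"] = None
    active: bool = False                 # True once its fields are "loaded"


def activate(obj: Optional[StoredObject], depth: int) -> int:
    """Activate obj and follow references until `depth` levels are used up.

    Returns how many previously inactive objects this call activated.
    """
    if obj is None or depth <= 0:
        return 0
    newly_activated = 0 if obj.active else 1
    obj.active = True
    return newly_activated + activate(obj.next, depth - 1)


# A linked chain of ten stored objects: 0 -> 1 -> ... -> 9.
chain: List[StoredObject] = [StoredObject(v) for v in range(10)]
for current, following in zip(chain, chain[1:]):
    current.next = following

retrieved = activate(chain[0], depth=5)   # like retrieving chain[0] with depth 5
print(retrieved, [o.active for o in chain].count(True))  # 5 5

# Later, explicitly activate the rest of the chain when it is actually needed.
print(activate(chain[5], depth=5))        # 5
```

A deeper default would load more of the chain on every retrieval. A shallower one pushes more explicit activation calls into application code.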

Update depth

• Similar considerations apply to updates

• Storing an updated object could cause unnecessary updates to related objects

• Fixed update depth limits depth of traversal of graph when storing objects

Default value is 1

• Can configure transparent persistence which allows changes to be tracked

Only changed objects are updated in database

Behaves like change tracking in, for example, Entity Framework

Implements the Unit of Work pattern
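Transparent persistence amounts to change tracking feeding a Unit of Work: objects report their own modifications, and commit writes only the changed ones. The Python sketch below uses hypothetical `Tracked` and `UnitOfWork` classes, with a dict standing in for the database. In db4o, as with transparent activation, the tracking code is injected into the domain classes rather than written by hand as it is here.

```python
from typing import Any, Dict, List, Set


class UnitOfWork:
    """Remembers which objects changed and writes only those on commit."""

    def __init__(self, database: Dict[int, Dict[str, Any]]) -> None:
        self.database = database                 # dict standing in for the store
        self._objects: Dict[int, "Tracked"] = {}
        self._dirty: Set[int] = set()
        self.writes: List[int] = []              # ids written, kept for inspection

    def register(self, obj: "Tracked") -> None:
        self._objects[obj.oid] = obj

    def mark_dirty(self, obj: "Tracked") -> None:
        self._dirty.add(obj.oid)

    def commit(self) -> None:
        for oid in sorted(self._dirty):
            self.database[oid] = self._objects[oid].state()
            self.writes.append(oid)
        self._dirty.clear()


class Tracked:
    """Domain object that reports its own attribute changes (hypothetical)."""

    def __init__(self, uow: UnitOfWork, oid: int, **values: Any) -> None:
        object.__setattr__(self, "_uow", None)   # no tracking while loading
        object.__setattr__(self, "oid", oid)
        for name, value in values.items():
            setattr(self, name, value)
        object.__setattr__(self, "_uow", uow)
        uow.register(self)

    def __setattr__(self, name: str, value: Any) -> None:
        object.__setattr__(self, name, value)
        if self._uow is not None:                # after loading, every write is tracked
            self._uow.mark_dirty(self)

    def state(self) -> Dict[str, Any]:
        return {k: v for k, v in vars(self).items() if k not in ("_uow", "oid")}


db = {1: {"name": "Alice", "salary": 100}, 2: {"name": "Bob", "salary": 90}}
uow = UnitOfWork(db)
alice = Tracked(uow, 1, **db[1])
bob = Tracked(uow, 2, **db[2])

alice.salary = 120          # only Alice is modified
uow.commit()
print(uow.writes, db[1]["salary"], db[2]["salary"])  # [1] 120 90
```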

Persistence ignorance (PI)?

• Stores POCOs without any need for mapping, so yes

• Transparent Activation requires that classes implement a specific interface

• But this is done at build time so domain classes don’t need any specific code

• Has parallels with dynamic proxies in ORMs:

Objects are instances of the domain classes themselves, which have been modified ‘under the hood’ at build time

Compare with dynamic proxy classes, which derive from the domain classes and are generated ‘under the hood’ at run time

Further reading

• www.odbms.org

Resource portal

• Db4o Tutorial

included in product download

• The Definitive Guide to db4o (Apress)

NoSQL databases

• New breed of databases that are appearing largely in response to the limitations of existing relational databases

• Typically:

Support massive data storage (petabyte+)

Distribute storage and processing across multiple servers

• Architecture and priorities differ sharply from those of relational databases

• Hence the term NoSQL

• “Not only SQL” – absence of SQL is not a requirement

NoSQL features

• Wide variety of implementations, but some features are common to many of them:

• Schema-less

• Shared-nothing architecture

• Elasticity

• Sharding and asynchronous replication

• BASE, not ACID

Basically Available

Soft state

Eventually consistent
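Sharding, asynchronous replication and eventual consistency fit together as follows. A write is acknowledged by the shard that owns the key, and its replica catches up later, so for a while the replica holds soft state. The Python sketch below is a single-process toy with hypothetical `Shard` and `Cluster` classes. It assumes CRC-32 as a stable hash for choosing a shard; real systems use various partitioning schemes.

```python
import zlib
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Shard:
    """One partition of the key space: a primary plus one asynchronous replica."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self.backlog: Deque[Tuple[str, str]] = deque()   # not yet replicated


class Cluster:
    def __init__(self, shard_count: int) -> None:
        self.shards: List[Shard] = [Shard() for _ in range(shard_count)]

    def shard_for(self, key: str) -> Shard:
        # A stable hash (CRC-32) so a key always lands on the same shard.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key: str, value: str) -> None:
        shard = self.shard_for(key)
        shard.primary[key] = value            # write acknowledged at once...
        shard.backlog.append((key, value))    # ...replica catches up later

    def read_replica(self, key: str) -> Optional[str]:
        return self.shard_for(key).replica.get(key)

    def replicate(self) -> None:
        """Drain all backlogs; afterwards replicas match their primaries."""
        for shard in self.shards:
            while shard.backlog:
                key, value = shard.backlog.popleft()
                shard.replica[key] = value


cluster = Cluster(shard_count=4)
cluster.put("employee:1", "Fernando")
print(cluster.read_replica("employee:1"))   # None: replica is still stale
cluster.replicate()
print(cluster.read_replica("employee:1"))   # Fernando: eventually consistent
```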

MapReduce

• Algorithm for dividing a workload into units suitable for parallel processing

• Useful for queries against large data sets: the query can be distributed to hundreds or thousands of nodes, each of which works on a subset of the target data

• The results are then merged together, ultimately yielding a single “answer” to the original query

• Example: get total word count of a large number of documents

Map: calculate word count of each document

• Each node works on a subset of the overall data set

• Results emitted to intermediate storage

Reduce: calculate total of intermediate results
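The word-count example can be sketched in one process to show the two phases. In this Python sketch a `ThreadPoolExecutor` stands in for the cluster nodes, and a plain list stands in for the intermediate storage. A real MapReduce framework would also partition and shuffle intermediate results between machines. The function names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def map_word_count(document: str) -> int:
    """Map: each worker counts the words in one document."""
    return len(document.split())


def reduce_totals(running_total: int, partial: int) -> int:
    """Reduce: combine intermediate results into a single answer."""
    return running_total + partial


def total_word_count(documents: List[str], workers: int = 4) -> int:
    # Threads stand in for nodes; each handles a subset of the documents.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        intermediate = list(pool.map(map_word_count, documents))
    return reduce(reduce_totals, intermediate, 0)


docs = ["the quick brown fox", "jumps over", "the lazy dog"]
print(total_word_count(docs))  # 9
```

The initial value `0` passed to `reduce` makes an empty input yield a total of zero rather than an error.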

Brewer’s CAP theorem

• Can optimize for only two of three priorities in a distributed database:

• Consistency

All clients have same view of the data

Requires atomicity, transaction isolation

• Availability

Every request received by a non-failing node must result in a response

• Partition Tolerance

Partitions happen if certain nodes can’t communicate

No set of failures less than total network failure is allowed to cause the system to respond incorrectly

Implications of CAP theorem

• Any two properties can be achieved

• CP

If messages between nodes are lost then system waits

Possible that no response returned at all

No inconsistent data returned to client

• CA

Assumes no partitions: the system always responds and data is consistent

• AP

Response always returned, even if some messages between nodes are lost

Different nodes may have different views of the data
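The CP/AP trade-off can be made concrete with two replicas and a simulated partition. The Python sketch below is a toy model with hypothetical names (`TwoNodeStore`, `Unavailable`, `scenario`). It assumes the partition is known through a flag, whereas a real system would have to infer it from timeouts:

```python
from typing import Optional


class Unavailable(Exception):
    """Raised when a CP system declines to answer during a partition."""


class TwoNodeStore:
    """Two replicas, A and B, of a single value (hypothetical model)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.node_a: Optional[str] = None
        self.node_b: Optional[str] = None
        self.partitioned = False          # True = messages between A and B are lost

    def write_via_a(self, value: str) -> None:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("A cannot replicate to B")   # refuse rather than diverge
        self.node_a = value
        if not self.partitioned:
            self.node_b = value           # replication message gets through

    def read_via_b(self) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("B cannot confirm it is up to date")
        return self.node_b                # AP: may be stale during a partition


def scenario(mode: str) -> str:
    store = TwoNodeStore(mode)
    store.write_via_a("v1")
    store.partitioned = True
    try:
        store.write_via_a("v2")
        return f"read {store.read_via_b()}"
    except Unavailable:
        return "no response"


print(scenario("CP"))  # no response
print(scenario("AP"))  # read v1 (stale: node A already holds v2)
```

If `partitioned` is never set, both modes return the latest value. That is the CA case above, which holds only as long as the network does not partition.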

Implications of CAP theorem

• Choose a database whose priorities match the application

http://blog.nahurst.com/visual-guide-to-nosql-systems

Using a NoSQL database in a .NET application

• Application typically connects to a remote cluster

• Some (but not many) NoSQL databases are supported by native .NET clients

These clients handle the “mapping” between .NET objects and the data model

• Many NoSQL databases are accessed through a REST interface

Application must construct requests and parse responses, e.g. JSON

Application can be written in any suitable language

• Azure Table Storage is Microsoft’s NoSQL storage for cloud-based applications

• However the data is accessed, you need to understand the data model, which will be significantly different from a typical relational database or object model

NoSQL database types and examples

• Key/value Databases

These manage a simple value or row, indexed by a key

e.g. Voldemort

• BigTable-style Databases

“a sparse, distributed, persistent multidimensional sorted map”

e.g. Google BigTable, Azure Table Storage, Amazon SimpleDB

• Document Databases

Multi-field documents (or objects) with JSON access

e.g. MongoDB, RavenDB (.NET specific), CouchDB

• Graph Databases

Manage nodes, edges, and properties

e.g. Neo4j, sones
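The BigTable description, “a sparse, distributed, persistent multidimensional sorted map”, can be read literally. It is a map from (row key, column, timestamp) to a value, kept sorted by key, and absent cells take no space. The Python sketch below uses a hypothetical single-machine `SparseSortedTable` class that covers only the "sparse" and "sorted map" parts, with no distribution or persistence. Keeping keys sorted with `bisect` makes row-range scans cheap:

```python
import bisect
from typing import Dict, List, Optional, Tuple

CellKey = Tuple[str, str, int]   # (row key, column, timestamp)


class SparseSortedTable:
    """Hypothetical single-machine model of a BigTable-style map."""

    def __init__(self) -> None:
        self._keys: List[CellKey] = []          # always kept sorted
        self._cells: Dict[CellKey, str] = {}    # only cells that exist take space

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        key = (row, column, timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get_latest(self, row: str, column: str) -> Optional[str]:
        # Linear scan kept for brevity; returns the newest version, if any.
        versions = [k for k in self._keys if k[:2] == (row, column)]
        return self._cells[max(versions, key=lambda k: k[2])] if versions else None

    def scan_rows(self, start_row: str, end_row: str) -> List[CellKey]:
        """Cell keys with start_row <= row < end_row, in sorted order."""
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        return self._keys[lo:hi]


table = SparseSortedTable()
table.put("row1", "contents", 1, "v1")
table.put("row1", "contents", 2, "v2")
table.put("row2", "anchor", 1, "link")
table.put("row3", "contents", 1, "other")
print(table.get_latest("row1", "contents"))   # v2
print(table.scan_rows("row1", "row3"))        # cells of row1 and row2 only
```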

MongoDB

• Scalable, high-performance, open source, document-oriented database

• Stores JSON-style (actually BSON) documents with dynamic schema

• Replication, high-availability and auto-sharding

• Supports document-based queries and map/reduce

• Command line tools:

mongod – starts server as a service or daemon

mongo – client shell

• Documents to be stored are defined as JSON

• Documents retrieved by queries are displayed as JSON

MongoDB and HTTP

• Admin console at http://<server name>:28017

• REST interface on http://<server name>:28017 (the same port as the admin console)

Enabled by starting server with mongod --rest

Server responds to RESTful HTTP requests, e.g.

• http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando

Response is in JSON format

Could be consumed by client-side code in Ajax application
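Because the REST interface is plain HTTP, a client only needs to build the query string and decode JSON. The Python sketch below uses only the standard library. It assumes a local server started with `mongod --rest` on port 28017, as in the example URL above. The network call is left commented out, and nothing is assumed about the response beyond it being JSON. This HTTP interface was deprecated and later removed in newer MongoDB releases, so the sketch applies only to the older versions these slides describe. `rest_query_url` and `fetch_json` are hypothetical helper names.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen


def rest_query_url(host: str, database: str, collection: str,
                   port: int = 28017, **filters: str) -> str:
    """Build e.g. http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando."""
    query = urlencode({f"filter_{field}": value for field, value in filters.items()})
    return f"http://{host}:{port}/{database}/{collection}/?{query}"


def fetch_json(url: str) -> Any:
    """GET the URL and decode the JSON body (needs a server started with --rest)."""
    with urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


url = rest_query_url("127.0.0.1", "company", "Employee", Name="Fernando")
print(url)  # http://127.0.0.1:28017/company/Employee/?filter_Name=Fernando
# result = fetch_json(url)   # only against a live mongod --rest instance
```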

MongoDB .NET driver

• Can access documents as instances of Document class

• Represents document as key-value pairs

• Or, can serialize POCOs to the database format (BSON)

• Deserialize database documents to POCOs

• Supports LINQ queries

• MapReduce queries can be expressed as LINQ queries
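Mapping a POCO to a document and back amounts to converting between an object's fields and key-value pairs. The Python sketch below is language-neutral and uses dataclasses with JSON. The real driver produces BSON, and its API differs. `Employee`, `Address` and the helper functions are hypothetical. The nested `Address` becomes an embedded sub-document inside the employee document rather than a separate record:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Address:                 # hypothetical embedded class
    city: str
    country: str


@dataclass
class Employee:                # hypothetical top-level class
    name: str
    address: Address
    skills: List[str] = field(default_factory=list)


def to_document(employee: Employee) -> str:
    """Serialize: object graph -> nested key-value document (JSON here)."""
    return json.dumps(asdict(employee), sort_keys=True)


def from_document(text: str) -> Employee:
    """Deserialize: document -> object, rebuilding the embedded Address."""
    data = json.loads(text)
    return Employee(name=data["name"],
                    address=Address(**data["address"]),
                    skills=list(data["skills"]))


original = Employee("Fernando", Address("Glasgow", "UK"), ["C#", "MongoDB"])
doc = to_document(original)
print(doc)
restored = from_document(doc)   # equal to `original` field by field
```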

MongoDB schema design

• Collections are essentially named groupings of documents

Roughly equivalent to relational database tables

• Less "normalization" than a relational schema because there are no server-side joins

• Generally, you will want one database collection for each of your top level objects

Don’t want a collection for every "class" - instead, embed objects

[Figure: relational schema vs. document schema]
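The difference between a normalized relational layout and an embedded document can be shown by folding child rows into their parent. The Python sketch below uses hypothetical blog data (posts and comments), with plain dicts standing in for table rows and documents. `to_documents` is an illustrative helper, not part of any driver.

```python
from typing import Any, Dict, List

# Relational layout: two tables linked by a foreign key (post_id).
posts = [
    {"id": 1, "title": "Why NoSQL?"},
    {"id": 2, "title": "Schema design"},
]
comments = [
    {"id": 10, "post_id": 1, "author": "ann", "text": "Nice"},
    {"id": 11, "post_id": 1, "author": "bob", "text": "Agreed"},
    {"id": 12, "post_id": 2, "author": "ann", "text": "Useful"},
]


def to_documents(post_rows: List[Dict[str, Any]],
                 comment_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Embed each post's comments inside the post document.

    The "join" happens once, when documents are built, so reading a post
    together with its comments later needs no server-side join.
    """
    by_post: Dict[int, List[Dict[str, Any]]] = {}
    for c in comment_rows:
        by_post.setdefault(c["post_id"], []).append(
            {"author": c["author"], "text": c["text"]})
    return [{"_id": p["id"], "title": p["title"], "comments": by_post.get(p["id"], [])}
            for p in post_rows]


documents = to_documents(posts, comments)
print(len(documents[0]["comments"]))  # 2
```

The trade-off is the reverse of normalization. A post is fetched in a single read, but a query across all comments by one author must look inside every post document.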

Document example

• Save:

• Query:

http://www.10gen.com/video/mongosv2010/schemadesign

MongoDB in C# applications: persistence ignorance (PI)?

• Up to a point

• Classes stored in a collection need an Id property of a specific type (MongoDB.Oid)

• Object model needs to be designed with document schema in mind

Further reading

• http://nosql-database.org/

• http://www.nosqlpedia.com/

• http://www.mongodb.org/

• http://www.codeproject.com/KB/database/MongoDBCS.aspx

Nice code example for C# and MongoDB