MongoDB Concepts

Concepts of MongoDB

Juan Antonio Roy Couto
Twitter: @juanroycouto
Website: www.juanroy.es

September 2014

Concepts

Contents

Why?

Characteristics

Who?

DB Ranking

Drivers

Shell

Utilities

Community

Terms

Schema design

Replication

Failover

Replica Set

Indexes

Sharding

Pre-splitting

Questions?

Apps

Internet of Things

Wearables

Smartcities

Cloud computing

Non structured data

Reduce costs and time to market

Horizontal scalability

Real time analytics

Better strategic decisions

Why?

MongoDB

Faster development

NoSQL arose from globalization: applications now need very high read and write rates, support for large volumes of data, maximum availability, high request loads, and so on. Performance. Reliability. Scalability. Replica sets. Sharding. Clusters. Automatic load balancing. Fewer of the typical database administration tasks (list which ones and why). Faster time to production, because product development time is shorter.

NoSQL means "Not only SQL". It comes into play when the relational model can no longer meet today's needs for storing and processing the huge volume of data being generated (IoT, social networks, ...). Today's data come from many domains and do not follow a fixed schema.

Who provides MongoDB in the cloud?
http://www.mongodb.com/partners/list

Who is using MongoDB?
http://www.mongodb.com/who-uses-mongodb

Who?

MongoDB does not expect anyone to change databases if their current one delivers performance and reliability they are satisfied with. Instead, it focuses its efforts on small companies and startups taking on new projects, and on companies of any size that want or need to improve the performance of an existing application.

BBVA, Telefónica, Santander, ...

DB Ranking
http://db-engines.com/en/ranking

Why it is the market-leading non-relational database

Community

8 Million+ Downloads

200k+ Education Registrations

30k+ MongoDB User Group Members

Drivers
http://docs.mongodb.org/ecosystem/drivers/

MongoDB

Driver

C

C++

C#

Java

Node.js

Perl

PHP

Python

Ruby

Scala

App

Characteristics
http://www.mongodbspain.com/en/2014/08/17/mongodb-characteristics-future/

General purpose NoSQL database

Native replication

Document oriented (stores data as documents in BSON, Binary JSON)

Auto sharding & load balancing

Schemaless (dynamic schema)

Security

Open source

Automatic failover

High availability (replica sets)

JSON objects

Horizontal scalability (commodity servers)

MMS (continuous monitoring in the cloud)

Aggregation framework

Geospatial queries

Map Reduce

In-memory performance

Hadoop connector (for processing large volumes of data in batch)

ACID compliant at the document level

Open-source db used by companies of all sizes, across all industries and for a wide variety of applications. It is an agile database that allows schemas to change quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language and strict consistency.

MongoDB is built for scalability, performance and high availability, scaling from single server deployments to large, complex multi-site architectures. By leveraging in-memory computing, MongoDB provides high performance for both reads and writes. MongoDB's native replication and automated failover enable enterprise-grade reliability and operational flexibility.

Horizontal Scalability. As the data volume and throughput grow, developers can take advantage of commodity hardware and cloud infrastructure to increase the capacity of the MongoDB system.

High Availability. Multiple copies of data are maintained with native replication. Automatic failover to secondary nodes, racks and data centers makes it possible to achieve enterprise-grade uptime without custom code and complicated tuning.

In-Memory Performance. Data is read and written to RAM while also persisted to disk for durability, providing fast performance and eliminating the need for a separate caching layer.

Aggregation - Batch processing of data and aggregate calculations

JavaScript execution - Ability to store JavaScript functions on the server

Advanced characteristics

GridFS (diagram labels: Chunk 1, Chunk 2, Chunk 3)

TTL (special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time)

Capped collections

Index intersection

...

It is a general-purpose database: it does not focus on doing one thing well, as key-value stores do, which offer the fastest response times on the market. Its goal is to cover as much ground as possible, so it offers all, or nearly all, of the features of relational databases together with the advantages of non-relational ones, such as a schemaless model, performance, ...

All mapReduce functions in MongoDB are JavaScript and run on the database nodes.

Shell

MongoDB

Administrative tasks

Full featured

JavaScript interpreter

Standalone MongoDB client

Allows interaction with a MongoDB instance from the command line

Utilities

MongoDB tools for backup:

mongoexport: Utility that generates a JSON or CSV file of data from a MongoDB instance

mongoimport: Imports content from a JSON, CSV or TSV export

mongodump: Utility for creating a binary export

mongorestore: Writes data to a MongoDB instance from a binary file

MongoDB tools for tracking instances:

mongostat: Provides a quick overview of the status of a running mongod or mongos instance

mongotop: Provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level. By default, mongotop returns values every second

In addition to these tools, there are other backup techniques, such as simply copying the data files

Basic terms to know

MongoDB       SQL
database      database
collection    table
document      row
field         column
embedding     join

MongoDB was designed to be fast (no joins but embedded documents)

Geospatial indexes

MongoDB has two types of indexes for supporting geographical queries.

2d indexes: for calculations on a flat surface

2dsphere indexes: for calculations on an Earth-like sphere

Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.

For supporting geospatial queries (2d and 2dsphere)
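To make the difference between the two index types concrete, the sketch below compares a flat-plane distance with a great-circle distance. It is plain JavaScript and not MongoDB's implementation; the function names, the sample coordinates and the Earth radius of 6371 km are assumptions made for illustration.

```javascript
// Illustrative only: contrasts flat (2d-style) and spherical (2dsphere-style) distance.
const EARTH_RADIUS_KM = 6371; // assumed mean Earth radius

const toRadians = (degrees) => (degrees * Math.PI) / 180;

// Flat-surface distance: treats [longitude, latitude] as x/y on a plane (result in degrees).
function flatDistance([lon1, lat1], [lon2, lat2]) {
  return Math.hypot(lon2 - lon1, lat2 - lat1);
}

// Great-circle distance on a sphere (haversine formula), result in kilometres.
function sphereDistanceKm([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Assumed sample points, written as [longitude, latitude].
const madrid = [-3.7038, 40.4168];
const barcelona = [2.1734, 41.3851];
console.log(flatDistance(madrid, barcelona));
console.log(sphereDistanceKm(madrid, barcelona));
```

The flat result is in coordinate units and ignores the curvature of the Earth, which is why a spherical calculation is the better fit for real-world longitude/latitude data.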

SQL Schema Design

Tables

Customers: Customer key, First name, Last name, Phone number
Addresses: Address key, Customer key, Street, Number, Location, Postal Code
Pets: Pet key, Customer key, Type, Breed, Name, Age

MongoDB Schema Design

> db.customers.findOne()
{
    "_id" : ObjectId("54131863041cd2e6181156ba"),
    "first_name" : "Peter",
    "last_name" : "Keil",
    "phone_number" : 619123456,
    "address" : {
        "street" : "C/Alcalá",
        "number" : 123,
        "location" : "Madrid",
        "postal_code" : 12345
    },
    "pets" : [
        {
            "type" : "Dog",
            "breed" : "Airedale Terrier",
            "name" : "Linda",
            "age" : 2
        },
        {
            "type" : "Dog",
            "breed" : "Akita",
            "name" : "Bruto",
            "age" : 10
        }
    ]
}
>

Customers collection

Customer info: First name, Last name, Phone number
Addresses: Street, Number, Location, Postal Code
Pets: Type, Breed, Name, Age (one entry per pet)
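The move from three SQL tables to one embedded document can be sketched as follows. The row objects and the `withoutKeys` and `embedCustomer` helpers are hypothetical, invented for illustration; the field names follow the `findOne()` output above.

```javascript
// Hypothetical rows, shaped like the Customers, Addresses and Pets tables.
const customers = [
  { customer_key: 1, first_name: "Peter", last_name: "Keil", phone_number: 619123456 },
];
const addresses = [
  { address_key: 10, customer_key: 1, street: "C/Alcalá", number: 123, location: "Madrid", postal_code: 12345 },
];
const pets = [
  { pet_key: 100, customer_key: 1, type: "Dog", breed: "Airedale Terrier", name: "Linda", age: 2 },
  { pet_key: 101, customer_key: 1, type: "Dog", breed: "Akita", name: "Bruto", age: 10 },
];

// Drop the relational keys: embedded data no longer needs them.
function withoutKeys(row, keys) {
  return Object.fromEntries(Object.entries(row).filter(([field]) => !keys.includes(field)));
}

// Build one document per customer, embedding the address and the pets.
function embedCustomer(customer) {
  const key = customer.customer_key;
  const address = addresses.find((row) => row.customer_key === key);
  return {
    ...withoutKeys(customer, ["customer_key"]),
    address: address ? withoutKeys(address, ["address_key", "customer_key"]) : null,
    pets: pets
      .filter((row) => row.customer_key === key)
      .map((row) => withoutKeys(row, ["pet_key", "customer_key"])),
  };
}

const customerDocument = embedCustomer(customers[0]);
console.log(JSON.stringify(customerDocument, null, 2));
```

Reading the customer with its address and pets then takes a single document lookup instead of a join across three tables.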

Replication

Replica Set: Primary, Secondary 1, Secondary 2

High availability

Data safety

Read preference

Asynchronous

Single primary

Statement based

Master-slave

Automatic failover

Automatic node recovery

Failover:
- The process from the moment the primary goes down until another node takes over its role.

Node recovery:
- Rollback of all writes on the old primary that were never replicated (if there were any).
- Receipt of all the operations performed while it was down.
- It starts operating as a secondary.

Slave Delay:
The lag before a secondary applies updates. It is used when a mistake has been made (fat fingers) and you need to go back quickly without waiting for a restore from a backup.

Failover scenario

Replica Set: Primary, Secondary 1, Secondary 2

Replica Set: Secondary 2, Primary, Secondary 1

Primary goes down

New election (majority of the set)

Primary comes back (now as secondary)

The new primary assumes replication tasks
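The four steps above can be traced with a small state simulation. This is a sketch and not MongoDB's election protocol: the member names, the helper functions and the rule of promoting the first healthy secondary are assumptions made for illustration. Only the majority requirement comes from the slide.

```javascript
// Simplified replica set: each member has a role and an "up" flag.
function createReplicaSet() {
  return [
    { name: "node0", role: "primary", up: true },
    { name: "node1", role: "secondary", up: true },
    { name: "node2", role: "secondary", up: true },
  ];
}

// An election needs a strict majority of ALL members, not only of the healthy ones.
function hasMajority(members) {
  const healthy = members.filter((m) => m.up).length;
  return healthy > members.length / 2;
}

// Take the primary down and, if a majority is still reachable, promote a secondary.
function failover(members) {
  const oldPrimary = members.find((m) => m.role === "primary");
  oldPrimary.up = false;
  oldPrimary.role = "down";
  if (!hasMajority(members)) return null; // no majority: no new primary
  const candidate = members.find((m) => m.up && m.role === "secondary");
  candidate.role = "primary"; // assumption: the first healthy secondary wins
  return candidate;
}

// The old primary comes back and rejoins as a secondary.
function rejoin(members, name) {
  const member = members.find((m) => m.name === name);
  member.up = true;
  member.role = "secondary";
}

const replicaSet = createReplicaSet();
const newPrimary = failover(replicaSet);
rejoin(replicaSet, "node0");
console.log(newPrimary.name, replicaSet.map((m) => `${m.name}:${m.role}`));
```

With only two members, losing the primary leaves one node out of two, which is not a majority, so `failover` returns `null` in this model.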

Failover scenario with rollback

Replica Set: Primary, Secondary 1, Secondary 2

Replica Set: Secondary 2, Primary, Secondary 1

Rollback

Hard Disk

mongorestore

Replica Set principles

A write is truly committed once it has been applied on a majority of the set
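As a sketch of what "majority of the set" means numerically, the helpers below compute the majority size and check whether a write has reached it. The function names are hypothetical, and the model ignores arbiters and non-voting members.

```javascript
// Smallest number of members that forms a strict majority of the set.
function majoritySize(memberCount) {
  return Math.floor(memberCount / 2) + 1;
}

// A write counts as committed once at least a majority of members have applied it.
function isCommitted(appliedCount, memberCount) {
  return appliedCount >= majoritySize(memberCount);
}

console.log(majoritySize(3), isCommitted(1, 3), isCommitted(2, 3));
```

In a three-member set, a write applied only on the primary is not yet committed in this sense, which is why it can be rolled back if the primary fails before replicating it.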

Replica Set: read preference

Reasons

Geographically dispersed nodes

Separate workloads

Availability

Types

Primary

Primary preferred

Secondary

Secondary preferred

Nearest

Tags

Tags: used to choose which servers we want to talk to
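How the five modes choose a member can be sketched with a small selection function. This is an illustration and not driver code: the member list, the `pingMs` field and the `selectMember` helper are hypothetical, tag filtering is left out, and a real driver may choose among several eligible members instead of always returning the lowest-latency one.

```javascript
// Hypothetical view of a replica set as seen by a client.
const members = [
  { host: "madrid:27017", role: "primary", up: true, pingMs: 40 },
  { host: "paris:27017", role: "secondary", up: true, pingMs: 15 },
  { host: "tokyo:27017", role: "secondary", up: true, pingMs: 220 },
];

// Lowest-latency member of a list, or null if the list is empty.
function closest(candidates) {
  return candidates.reduce(
    (best, m) => (best === null || m.pingMs < best.pingMs ? m : best),
    null
  );
}

// Pick a member for a read according to the read preference mode.
function selectMember(all, mode) {
  const healthy = all.filter((m) => m.up);
  const primary = healthy.find((m) => m.role === "primary") || null;
  const secondary = closest(healthy.filter((m) => m.role === "secondary"));
  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary || secondary;
    case "secondary": return secondary;
    case "secondaryPreferred": return secondary || primary;
    case "nearest": return closest(healthy);
    default: throw new Error(`Unknown read preference mode: ${mode}`);
  }
}

console.log(selectMember(members, "nearest").host);
```

The "preferred" modes are the ones that help availability: they fall back to the other kind of member when the preferred kind is unreachable.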

Sharding

CLUSTER

Shard 0: Primary, Secondary, Secondary
Shard 1: Primary, Secondary, Secondary
Shard 2: Primary, Secondary, Secondary
Shard N-1: Primary, Secondary, Secondary

Config servers: Config server, Config server, Config server
Query routers: Query router, Query router, ...

Clients: Client, Client, Client

The routers (mongos) route client requests to the shard or shards involved

The client does not know whether the collection is sharded or not, nor which shard holds the data it needs. Therefore, the application code does not have to change

MongoDB scales horizontally on commodity computers

Sharding: concepts

Data are uniformly distributed across the shards using the shard key

Each shard stores the documents that belong to its own range

Sharding improves efficiency and, therefore, performance, because queries are routed only to the shards where the data reside

Replica: High availability

Data safety

Disaster recovery

Sharding: Scale out

Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.

Sharding: metadata

Shard key: lastname

          Low       High        Shard
Range 0   Martín    Pérez       0
Range 1   Pérez     Rodriguez   1

The config servers hold the config database, which contains the cluster metadata

Metadata describes what is in the cluster, what is contained in the shards

It is a map of the data itself

Range-based partitioning
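Range-based routing can be sketched as a lookup in that metadata map. The ranges below reuse the example table (shard key `lastname`); the `routeToShard` helper is hypothetical. Treating the lower bound as inclusive and the upper bound as exclusive follows MongoDB's chunk range convention, and plain JavaScript string comparison stands in for the server's ordering.

```javascript
// Cluster metadata as a list of ranges on the shard key (lastname).
const ranges = [
  { low: "Martín", high: "Pérez", shard: 0 },
  { low: "Pérez", high: "Rodriguez", shard: 1 },
];

// Return the shard whose range contains the key: low <= key < high.
function routeToShard(metadata, key) {
  const range = metadata.find((r) => key >= r.low && key < r.high);
  return range ? range.shard : null; // null: no range in this table covers the key
}

console.log(routeToShard(ranges, "Navarro"));
console.log(routeToShard(ranges, "Quintana"));
```

A query that includes the shard key can be sent to a single shard this way; a query without it has to be sent to every shard.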

Sharding: chunks, split and migrate

Chunk: Range data subset. Approximately 1 chunk per 60MB.

Split: Runs in background. When a chunk grows beyond 60MB, it is split into two equal chunks.

Migrate: Runs in background. Moves chunks across the shards in order to achieve balance.

The MongoDB goal is to achieve a uniform data distribution across all the shards

MongoDB balances the number of chunks per shard (not documents or bytes)

By default all collections belong to shard 0

An empty collection has only one chunk (shard 0)

1 chunk is about 60MB of data

Chunks > 60 MB split

Uniform data distribution across shards (chunks / shard)

Balancer decides when to migrate chunks and to which shard
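The split and migrate behaviour listed above can be sketched with a toy model. Everything here is a simplification under stated assumptions: the 60MB limit is the figure quoted in the slides, while the `splitChunks` and `balance` helpers, the data layout and the rule of moving one chunk at a time from the fullest shard to the emptiest until the counts differ by at most one are invented for illustration and are not the real balancer's thresholds.

```javascript
const MAX_CHUNK_MB = 60; // chunk size limit as quoted in the slides

// Split every oversized chunk into two halves, repeating until none exceeds the limit.
function splitChunks(chunks) {
  const result = [];
  const pending = [...chunks];
  while (pending.length > 0) {
    const chunk = pending.pop();
    if (chunk.sizeMb > MAX_CHUNK_MB) {
      pending.push({ sizeMb: chunk.sizeMb / 2 }, { sizeMb: chunk.sizeMb / 2 });
    } else {
      result.push(chunk);
    }
  }
  return result;
}

// Balance by chunk COUNT (not bytes or documents): move chunks from the
// shard with the most chunks to the shard with the fewest.
function balance(shards) {
  let migrations = 0;
  for (;;) {
    const sorted = [...shards].sort((a, b) => a.chunks.length - b.chunks.length);
    const emptiest = sorted[0];
    const fullest = sorted[sorted.length - 1];
    if (fullest.chunks.length - emptiest.chunks.length <= 1) return migrations;
    emptiest.chunks.push(fullest.chunks.pop());
    migrations += 1;
  }
}

// All data start on shard 0 as one 240MB chunk.
const shards = [
  { name: "shard0", chunks: splitChunks([{ sizeMb: 240 }]) },
  { name: "shard1", chunks: [] },
];
const migrations = balance(shards);
console.log(migrations, shards.map((s) => s.chunks.length));
```

Because only the chunk count is compared, two shards can hold the same number of chunks and still store different amounts of data.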

Sharding: chunks, split and migrate (2)

Diagram labels: App, Driver, mongos, Shard 0, Shard 1, chunk 0, chunk 1, chunk 0

Pre-splitting

Utilized in batch/bulk loads

Splits and migrations do not need to run

Metadata are not altered

Data are stored automatically in their shard
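Pre-splitting amounts to choosing the split points before the load starts, so that every shard already owns a range when the data arrive. A minimal sketch, assuming a shard key whose first character is an uppercase letter from A to Z; the `splitPoints` and `shardFor` helpers are hypothetical and only compute boundaries, they do not talk to a cluster.

```javascript
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Evenly spaced boundaries that cut the alphabet into `shardCount` ranges.
function splitPoints(shardCount) {
  const points = [];
  for (let i = 1; i < shardCount; i += 1) {
    points.push(ALPHABET[Math.floor((i * ALPHABET.length) / shardCount)]);
  }
  return points;
}

// Index of the pre-created range (and so of the shard) that a key falls into.
function shardFor(key, points) {
  const index = points.findIndex((point) => key < point);
  return index === -1 ? points.length : index;
}

const points = splitPoints(3);
console.log(points, shardFor("Keil", points), shardFor("Roy", points));
```

Evenly spaced boundaries only give an even load if the keys themselves are evenly spread; for skewed keys the split points would have to be derived from the data.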

Diagram labels: App, Driver, mongos, Shard 0, Shard 1, Shard 2, data (x3)

Summary

Designed to be: Fast (no joins, in-memory performance),

Flexible (schemaless),

Scalable (horizontal vs vertical),

Easy to learn

Designed to: Reduce administrative tasks (replica set, sharding, disaster recovery)

With powerful: Analysis tools (aggregation framework, map reduce, Hadoop connector),

Characteristics such as geospatial indexes, GridFS, etc.

Performance, horizontal scalability with commodity hardware, replica set, sharding, clusters, auto load balancing, high availability, in-memory performance, schemaless, failover, data safety, disaster recovery

Questions?

Any questions?

MongoDB was designed to be fast (no joins but embedded documents), flexible (schemaless), scalable (horizontally, not vertically), to keep administration tasks to a minimum (replica set, failover, sharding), and to be fun and quick for programmers to learn, with powerful data analysis tools (aggregation framework), geospatial indexes, GridFS, and so on.

MongoDB does not support multi-document transactions.

However, MongoDB does provide atomic operations on a single document. Often these document-level atomic operations are sufficient to solve problems that would require ACID transactions in a relational database. Relational databases might represent the same kind of data with multiple tables and rows, which would require transaction support to update the data atomically.

Juan Antonio Roy Couto
Email: [email protected]

September 2014

Thank you for your attention!