Post on 16-Apr-2017
Concepts of
Juan Antonio Roy Couto
Twitter: @juanroycouto
Website: www.juanroy.es
September 2014
Concepts
Contents
Why?
Characteristics
Who?
DB Ranking
Drivers
Shell
Utilities
Community
Terms
Schema design
Replication
Failover
Replica Set
Indexes
Sharding
Pre-splitting
Questions?
Apps
Internet of Things
Wearables
Smartcities
Cloud computing
Non structured data
Reduce costs and time to market
Horizontal scalability
Real time analytics
Better strategic decisions
Why?
MongoDB
Faster development
NoSQL arose from globalization: applications now need very high read and write rates, support for large volumes of data, maximum availability, many concurrent requests, ...
Performance, reliability, scalability, Replica Set, Sharding, Clusters, automatic load balancing.
Fewer of the typical database administration tasks (list which ones and why).
Faster time to production, because product development time is shorter.
NoSQL means "Not only SQL". It appears at the point where the relational model can no longer meet today's needs for storing and processing the huge amount of data now being generated (IoT, social networks, ...). Today's data is multidisciplinary and does not follow a fixed schema.
Who provides MongoDB in the cloud? http://www.mongodb.com/partners/list
Who is using MongoDB? http://www.mongodb.com/who-uses-mongodb
Who?
MongoDB does not expect anyone to change their database if it already gives them performance and reliability they are satisfied with. Instead, it focuses its efforts on small companies and startups taking on new projects, and on companies of any size that want or need to improve the performance of an existing application.
BBVA, Telefónica, Santander, ...
DB Ranking
http://db-engines.com/en/ranking
Because it is the leading non-relational database on the market
Community
8 million+ downloads
200k+ education registrations
30k+ MongoDB User Group members
Drivers
http://docs.mongodb.org/ecosystem/drivers/
MongoDB
Driver
C
C++
C#
Java
Node.js
Perl
PHP
Python
Ruby
Scala
App
Characteristics
http://www.mongodbspain.com/en/2014/08/17/mongodb-characteristics-future/
General purpose NoSQL database
Native replication
Document-oriented (stores data as documents in BSON, i.e. Binary JSON)
Auto sharding & load balancing
Schemaless (dynamic schema)
Security
Open source
Automatic failover
High availability (replica sets)
JSON objects
Horizontal scalability (commodity servers)
MMS (continuous monitoring in the cloud)
Aggregation framework
Geospatial queries
Map Reduce
In-memory performance
Hadoop connector (for processing large volumes of data in batch)
ACID compliant at the document level
MongoDB is an open-source database used by companies of all sizes, across all industries and for a wide variety of applications. It is an agile database that allows schemas to change quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language and strict consistency.
MongoDB is built for scalability, performance and high availability, scaling from single-server deployments to large, complex multi-site architectures. By leveraging in-memory computing, MongoDB provides high performance for both reads and writes. MongoDB's native replication and automated failover enable enterprise-grade reliability and operational flexibility.
Horizontal Scalability. As the data volume and throughput grow, developers can take advantage of commodity hardware and cloud infrastructure to increase the capacity of the MongoDB system.
High Availability. Multiple copies of data are maintained with native replication. Automatic failover to secondary nodes, racks and data centers makes it possible to achieve enterprise-grade uptime without custom code and complicated tuning.
In-Memory Performance. Data is read and written to RAM while also persisted to disk for durability, providing fast performance and eliminating the need for a separate caching layer.
Aggregation: batch processing of data and aggregate calculations.
JavaScript execution: ability to store JavaScript functions on the server.
Advanced characteristics
GridFS (diagram labels: Chunk 1, Chunk 2, Chunk 3)
TTL (special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time)
Capped collections
Index intersection
...
It is a general-purpose database: it does not focus on doing one thing well, as key-value stores do, which offer the fastest response times on the market. Its goal is to cover as much ground as possible, so it offers all, or almost all, of the features of relational databases together with the advantages of non-relational ones, such as being schemaless, performance, ...
All mapReduce functions in MongoDB are written in JavaScript and run on the database nodes.
Shell
MongoDB
Administrative tasks
Full featured
JavaScript interpreter
Standalone MongoDB client
Allows interaction with a MongoDB instance from the command line
Utilities
MongoDB tools for backup:
mongoexport: Utility that generates a JSON or CSV file of data from a MongoDB instance
mongoimport: Imports content from a JSON, CSV or TSV export
mongodump: Utility for creating a binary export
mongorestore: Writes data to a MongoDB instance from a binary file
MongoDB tools for tracking instances:
mongostat: Provides a quick overview of the status of a running mongod or mongos instance
mongotop: Provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level. By default, mongotop returns values every second
Besides these tools, there are other backup techniques, such as simply copying the data files
Basic terms to know
MongoDB | SQL
database | database
collection | table
document | row
field | column
embedding | join
MongoDB has been designed to be fast (no joins, but embedded documents)
Geospatial indexes
MongoDB has two types of indexes for supporting geographical queries.
2d indexes: for calculations on a flat surface
2dsphere indexes: for calculations on an Earth-like sphere
Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.
For supporting geospatial queries (2d and 2dsphere)
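The difference between the two index types comes down to the distance model. The sketch below is plain JavaScript, not MongoDB code: `flatDistance` uses the Euclidean distance that a flat-surface calculation implies, and `sphereDistance` uses the haversine formula as one common way to measure distance on a sphere. The function names and the Earth radius constant are assumptions chosen for illustration, and nothing here claims to be how MongoDB computes distances internally.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius, assumed for this sketch

// Flat-surface distance between two [x, y] points (the model behind "2d").
function flatDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// Great-circle distance in km between two [longitude, latitude] points
// given in degrees, using the haversine formula (a spherical model).
function sphereDistance([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

console.log(flatDistance([0, 0], [3, 4]));
console.log(sphereDistance([0, 0], [180, 0]));
```

On a flat surface, the distance between two points only depends on coordinate differences. On a sphere, the same longitude difference covers less ground as latitude increases, which is why the two models need different index types.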
SQL Schema Design
Tables:
Customers: Customer key, First name, Last name, Phone number
Addresses: Address key, Customer key, Street, Number, Location, Postal Code
Pets: Pet key, Customer key, Type, Breed, Name, Age
MongoDB Schema Design
> db.customers.findOne()
{
    "_id" : ObjectId("54131863041cd2e6181156ba"),
    "first_name" : "Peter",
    "last_name" : "Keil",
    "phone_number" : 619123456,
    "address" : {
        "street" : "C/Alcalá",
        "number" : 123,
        "location" : "Madrid",
        "postal_code" : 12345
    },
    "pets" : [
        {
            "type" : "Dog",
            "breed" : "Airedale Terrier",
            "name" : "Linda",
            "age" : 2
        },
        {
            "type" : "Dog",
            "breed" : "Akita",
            "name" : "Bruto",
            "age" : 10
        }
    ]
}
>
Customers collection:
Customer info: First name, Last name, Phone number
Addresses: Street, Number, Location, Postal Code
Pets: Type, Breed, Name, Age (repeated for each pet)
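The contrast between the two schema designs can be sketched in plain JavaScript. This is a minimal illustration, not MongoDB code: the arrays stand in for the three SQL tables, the key fields (`customer_key`, `address_key`, `pet_key`) are assumed names, and `embedCustomer` is a hypothetical helper that folds the address and pet rows into a single customer document.

```javascript
// In-memory stand-ins for the three relational tables (illustrative data).
const customers = [
  { customer_key: 1, first_name: "Peter", last_name: "Keil", phone_number: 619123456 },
];
const addresses = [
  { address_key: 10, customer_key: 1, street: "C/Alcalá", number: 123, location: "Madrid", postal_code: 12345 },
];
const pets = [
  { pet_key: 100, customer_key: 1, type: "Dog", breed: "Airedale Terrier", name: "Linda", age: 2 },
  { pet_key: 101, customer_key: 1, type: "Dog", breed: "Akita", name: "Bruto", age: 10 },
];

// Drops the surrogate keys that only exist to link the tables together.
const stripKeys = ({ address_key, pet_key, customer_key, ...rest }) => rest;

// Hypothetical helper: performs the "join" once, when the document is built,
// and returns one self-contained document per customer.
function embedCustomer(customer) {
  const key = customer.customer_key;
  const address = addresses.find((a) => a.customer_key === key);
  return {
    ...stripKeys(customer),
    address: address ? stripKeys(address) : null,
    pets: pets.filter((p) => p.customer_key === key).map(stripKeys),
  };
}

const doc = embedCustomer(customers[0]);
console.log(JSON.stringify(doc, null, 2));
```

Reading the customer back then needs a single lookup instead of a join across three tables, which is the trade the embedded design makes.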
Replication
Replica Set: Primary, Secondary 1, Secondary 2
High availability
Data safety
Read preference
Asynchronous
Single primary
Statement based
Master-slave
Automatic failover
Automatic node recovery
Failover: the process from the moment the primary goes down until another node takes over its role.
Node recovery:
- Rolls back any writes on the old primary that were never replicated (if there were any).
- Receives all the operations performed while it was down.
- Starts working as a secondary.
Slave delay: the time lag before a secondary is updated. It is used when a mistake has been made (fat fingers) and you need to go back quickly without waiting for a restore from a backup.
Failover scenario
Replica Set: Primary, Secondary 1, Secondary 2
Replica Set: Secondary 2, Primary, Secondary 1
Primary goes down
New election (majority of the set)
Primary comes back (now as secondary)
The new primary assumes replication tasks
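The four steps above can be walked through with a toy model. This is a simplified sketch in plain JavaScript, not MongoDB's election protocol: the node names are made up, and `electPrimary` is a hypothetical rule that promotes the first reachable member, provided that a strict majority of the whole set is reachable.

```javascript
// Toy replica set: the first member starts as primary, the rest as secondaries.
function makeReplicaSet(names) {
  return names.map((name, i) => ({ name, up: true, role: i === 0 ? "primary" : "secondary" }));
}

// Hypothetical election rule for this sketch: a new primary is chosen only
// if a strict majority of ALL members (up or down) is reachable.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  const majority = Math.floor(members.length / 2) + 1;
  if (reachable.length < majority) return null; // no majority, no primary
  const winner = reachable[0];
  winner.role = "primary";
  return winner;
}

const replicaSet = makeReplicaSet(["node A", "node B", "node C"]);

// 1. Primary goes down (it will rejoin later as a secondary).
replicaSet[0].up = false;
replicaSet[0].role = "secondary";
// 2. New election among the remaining members (2 of 3 is a majority).
const newPrimary = electPrimary(replicaSet);
// 3. The old primary comes back, now as a secondary.
replicaSet[0].up = true;
// 4. From here on, the new primary is the replication source.
console.log(replicaSet.map((m) => `${m.name}: ${m.role}`).join(", "));
```

The majority requirement is the important part: with only one of three members reachable, `electPrimary` returns `null` and the set has no primary.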
Failover scenario with rollback
Replica Set: Primary, Secondary 1, Secondary 2
Replica Set: Secondary 2, Primary, Secondary 1
Rollback
Hard Disk
mongorestore
Replica Set principles
A write is truly committed once it has been applied on a majority of the set
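The majority rule is simple arithmetic. A minimal sketch in plain JavaScript, where `majorityOf` and `isCommitted` are hypothetical helper names used only for illustration:

```javascript
// Smallest number of members that forms a strict majority of the set.
function majorityOf(setSize) {
  return Math.floor(setSize / 2) + 1;
}

// A write counts as committed once the number of members that have
// applied it reaches a majority of the set.
function isCommitted(appliedOn, setSize) {
  return appliedOn >= majorityOf(setSize);
}

console.log(majorityOf(3), isCommitted(1, 3), isCommitted(2, 3)); // prints: 2 false true
```

In a three-member set, a write applied only on the primary is not yet committed. That is exactly the kind of write that can be rolled back if the primary fails before replicating it.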
Replica Set: read preference
Reasons
Geographically dispersed nodes
Separating a workload
Availability
Types
Primary
Primary preferred
Secondary
Secondary preferred
Nearest
Tags
Tags: used to choose which servers we want to talk to
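How the five modes differ can be sketched with a toy selector. This is plain JavaScript under stated assumptions, not driver code: the member list and its `latencyMs` values are invented, `selectMember` is a hypothetical function, and it always picks the lowest-latency candidate. Tags are left out to keep the sketch short.

```javascript
// Toy member list; latencyMs is an assumed measurement for illustration.
const members = [
  { name: "A", role: "primary", up: true, latencyMs: 40 },
  { name: "B", role: "secondary", up: true, latencyMs: 5 },
  { name: "C", role: "secondary", up: true, latencyMs: 20 },
];

// Hypothetical selector: returns the member a read would go to, or null.
function selectMember(memberList, mode) {
  const byLatency = (a, b) => a.latencyMs - b.latencyMs;
  const up = memberList.filter((m) => m.up).sort(byLatency);
  const primary = up.find((m) => m.role === "primary") || null;
  const secondary = up.find((m) => m.role === "secondary") || null;
  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary || secondary;
    case "secondary": return secondary;
    case "secondaryPreferred": return secondary || primary;
    case "nearest": return up[0] || null;
    default: throw new Error(`unknown read preference: ${mode}`);
  }
}

for (const mode of ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"]) {
  console.log(mode, "->", selectMember(members, mode).name);
}
```

The "preferred" modes only differ from their strict counterparts when the preferred kind of member is unavailable, which is what makes them useful for availability.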
Sharding
CLUSTER
Clients: Client, Client, Client
Query routers: Query router, Query router, ...
Config servers: Config server, Config server, Config server
Shard 0: Primary, Secondary, Secondary
Shard 1: Primary, Secondary, Secondary
Shard 2: Primary, Secondary, Secondary
Shard N-1: Primary, Secondary, Secondary
The routers (mongos) route client requests to the shard or shards involved
The client does not know whether the collection is sharded, nor on which shard the data it needs resides. Therefore, the application code does not need to change
MongoDB leverages horizontal scalability effortlessly by using commodity computers
Sharding: concepts
Data are uniformly distributed across the shards using the shard key
Each shard holds the documents that belong to its own range
Sharding improves efficiency and, therefore, performance, because queries are routed only to the shards where the data resides
Replica: High availability, Data safety, Disaster recovery
Sharding: Scale out
Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.
Sharding: metadata
Shard key: lastname
Range | Low | High | Shard
Range 0 | Martín | Pérez | 0
Range 1 | Pérez | Rodriguez | 1
The config servers hold the config database, which contains the cluster metadata
Metadata describes what is in the cluster, what is contained in the shards
It is a map of the data itself
Range-based partitioning
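Using the metadata as a map can be sketched as a range lookup. This is a minimal illustration in plain JavaScript, not mongos code: the `ranges` array copies the two ranges listed above, `shardFor` is a hypothetical function, and keys are compared as plain strings. Each range includes its low bound and excludes its high bound.

```javascript
// Metadata for shard key "lastname": each range maps [low, high) to a shard.
const ranges = [
  { low: "Martín", high: "Pérez", shard: 0 },
  { low: "Pérez", high: "Rodriguez", shard: 1 },
];

// Returns the shard whose range contains the key, or null if none covers it.
// Plain string comparison is an assumption of this sketch.
function shardFor(lastname) {
  const range = ranges.find((r) => lastname >= r.low && lastname < r.high);
  return range ? range.shard : null;
}

console.log(shardFor("Navarro"), shardFor("Quintana"));
```

Because the boundary value "Pérez" is the high bound of Range 0 and the low bound of Range 1, it belongs to exactly one shard, and every key in the covered span routes to a single range.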
Sharding: chunks, split and migrate
Chunk: a subset of a data range. Approximately 1 chunk per 60MB.
Split: runs in the background. When a chunk grows beyond 60MB it is split into two equal chunks.
Migrate: runs in the background. It moves chunks across the shards in order to achieve balance.
The MongoDB goal is to achieve a uniform data distribution across all the shards
MongoDB balances the number of chunks per shard (not documents or bytes)
By default all collections belong to shard 0
An empty collection has only one chunk (shard 0)
1 chunk is about 60MB of data
Chunks > 60 MB split
Uniform data distribution across shards (chunks / shard)
Balancer decides when to migrate chunks and to which shard
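Splitting and balancing can be simulated with a few lines of plain JavaScript. This is a simplified sketch, not the real balancer: chunks are represented only by their size in MB, `splitChunks` and `balance` are hypothetical functions, and the balancing rule (move one chunk at a time until chunk counts differ by at most one) is an assumption made for illustration.

```javascript
const MAX_CHUNK_MB = 60; // chunk size quoted in the text

// Split every oversized chunk into two equal halves until all of them fit.
function splitChunks(chunks) {
  const result = [];
  const pending = [...chunks];
  while (pending.length > 0) {
    const sizeMb = pending.pop();
    if (sizeMb > MAX_CHUNK_MB) pending.push(sizeMb / 2, sizeMb / 2);
    else result.push(sizeMb);
  }
  return result;
}

// Simplified balancer: move one chunk at a time from the shard with the most
// chunks to the shard with the fewest, until counts differ by at most one.
// It balances chunk COUNTS, not documents or bytes.
function balance(shards) {
  let migrations = 0;
  for (;;) {
    const sorted = [...shards].sort((a, b) => a.length - b.length);
    const fewest = sorted[0];
    const most = sorted[sorted.length - 1];
    if (most.length - fewest.length <= 1) return migrations;
    fewest.push(most.pop());
    migrations += 1;
  }
}

// All data starts on shard 0; shards 1 and 2 are empty.
const shards = [splitChunks([200, 50]), [], []];
const migrations = balance(shards);
console.log(shards.map((s) => s.length), migrations);
```

Starting from 250MB on shard 0, the split step produces five 50MB chunks and the balancer needs three migrations to spread them across the three shards.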
Sharding: chunks, split and migrate (2)
Diagram labels: App, Driver, mongos, Shard 0, Shard 1, chunk 0, chunk 1, chunk 0
Pre-splitting
Used for batch/bulk loads
Split and migration do not run
Metadata are not altered
Data are stored directly in their own shard
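The idea behind pre-splitting can be sketched as choosing the range boundaries before the load starts. This is a minimal illustration in plain JavaScript, not MongoDB code: `splitPoints` and `shardIndex` are hypothetical helpers, the surnames are invented sample data, and picking evenly spaced values from a sorted sample is just one possible way to choose boundaries.

```javascript
// Hypothetical helper: pick (shardCount - 1) split points from a sorted
// sample of the shard-key values expected in the bulk load.
function splitPoints(sortedSample, shardCount) {
  const points = [];
  for (let i = 1; i < shardCount; i++) {
    points.push(sortedSample[Math.floor((i * sortedSample.length) / shardCount)]);
  }
  return points;
}

// With the ranges fixed up front, each key maps directly to a shard index,
// so nothing has to be split or migrated while the data is loading.
function shardIndex(key, points) {
  let index = 0;
  while (index < points.length && key >= points[index]) index++;
  return index;
}

// Invented, already sorted sample of last names.
const sample = ["Alonso", "Blanco", "Castro", "Díaz", "García", "López", "Martín", "Pérez", "Ruiz"];
const points = splitPoints(sample, 3);
console.log(points, shardIndex("Delgado", points), shardIndex("García", points), shardIndex("Ruiz", points));
```

With three shards, the nine sample keys yield two split points, and each incoming key lands on its final shard on the first write.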
Diagram labels: App, Driver, mongos, data, Shard 0, Shard 1, Shard 2
Summary
Designed to be:
Fast (no joins, in-memory performance),
Flexible (schemaless),
Scalable (horizontal vs vertical),
Easy to learn
Designed to:
Reduce administrative tasks (replica set, sharding, disaster recovery)
With powerful:
Analysis tools (aggregation framework, map reduce, Hadoop connector),
Characteristics such as geospatial indexes, GridFS, etc.
Performance, horizontal scalability with commodity hardware, Replica Set, Sharding, Clusters, auto load balancing, high availability, in-memory performance, schemaless, failover, data safety, disaster recovery
Questions?
Any questions?
MongoDB has been designed to be fast (no joins, but embedded documents), flexible (schemaless), scalable (horizontally, not vertically), to keep administration work to a minimum (replica set, failover, sharding), and to be fun and quick for programmers to learn. It also comes with powerful data analysis tools (aggregation framework), geospatial indexes, GridFS, and so on.
MongoDB does not support multi-document transactions.
However, MongoDB does provide atomic operations on a single document. Often these document-level atomic operations are sufficient to solve problems that would require ACID transactions in a relational database. Relational databases might represent the same kind of data with multiple tables and rows, which would require transaction support to update the data atomically.
Juan Antonio Roy Couto
Email: juanroycouto@gmail.com
September 2014
Thank you for your attention!