Mongo db intro.pptx

Transcript of Mongo db intro.pptx

Page 1

Big Data: MongoDB Workshop

Page 2

Content

▪ MongoDB Intro
▪ MongoDB @VRT by Chris
▪ MongoDB 2.6: What is new
▪ Break
▪ Certification by Tim
▪ Online Courses MongoDB / Hadoop and What next

Page 3

MongoDB Intro

▪ A new world
▪ NoSQL
▪ What is MongoDB
▪ MongoDB Architecture

Page 4

A New World

▪ New Apps: The applications serving, generating and interfacing with data have changed. Big Data, SaaS, social and mobile apps are the new norm.

▪ New Data Types: New applications, users and inputs demand new types of data, like unstructured, semi-structured and polymorphic data.

▪ New Data Volumes: Data volumes were once smaller, constrained and predictable. Today organizations must be prepared to support millions of users, thousands of queries per second and hundreds of terabytes of data.

▪ New Development Methods: The methods we use to build applications have changed. With increasingly competitive markets and the need to adapt constantly, iterative development has become the standard.

▪ New Architectures: The infrastructure on which we store data has changed. Companies are leveraging cloud computing, commodity hardware and virtualization.

MongoDB was designed for how we build and run applications today.

Page 5

NoSQL

Page 6

NoSQL

▪ A wide variety of database technologies
▪ Designed to deal with new issues arising with data:

- Rise in the volume of data stored
- Frequency with which this data is accessed
- Performance and processing needs

What is NoSQL

Page 7

NoSQL

▪ Document databases
- Pair each key with a complex data structure known as a document. Documents can contain key-value pairs, key-array pairs and even nested documents.

▪ Graph stores
- Store information about networks, such as social connections.

▪ Key-value stores
- The simplest NoSQL databases.
- Every single item in the database is stored as an attribute name (or "key"), together with its value.

▪ Wide-column stores
- Optimized for queries over large datasets.
- Store columns of data together, instead of rows.

NoSQL datatypes
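A toy illustration of the difference between the key-value and document models, in plain Python with no database involved (the store names and sample records are hypothetical). The key-value store can only hand back a value by its key, while the document store understands the structure of each value and can filter on nested fields and array elements.

```python
import json

# Key-value model: each key maps to an opaque value (here a JSON string).
# The store can only hand the value back; it cannot look inside it.
kv_store: dict = {
    "user:1": json.dumps({"name": "Ann", "city": "Ghent"}),
    "user:2": json.dumps({"name": "Bob", "city": "Leuven"}),
}

# Document model: each value is a structured document with key-value pairs,
# a key-array pair ("tags") and a nested document ("address").
doc_store: list = [
    {"_id": 1, "name": "Ann", "tags": ["admin", "dev"], "address": {"city": "Ghent"}},
    {"_id": 2, "name": "Bob", "tags": ["dev"], "address": {"city": "Leuven"}},
]


def kv_get(key: str) -> dict:
    """Key-value access: look up by key; decoding the value is the client's job."""
    return json.loads(kv_store[key])


def names_in_city(city: str) -> list:
    """Document access: filter on a field nested inside each document."""
    return [d["name"] for d in doc_store if d["address"]["city"] == city]


def names_with_tag(tag: str) -> list:
    """Document access: filter on membership in an array field."""
    return [d["name"] for d in doc_store if tag in d["tags"]]


print(kv_get("user:1")["name"])  # Ann
print(names_in_city("Leuven"))   # ['Bob']
print(names_with_tag("dev"))     # ['Ann', 'Bob']
```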

Page 8

The benefits of NoSQL

▪ More scalable
▪ Better performance
▪ Their data model addresses several issues that the relational model is not designed to address:
- Large volumes of structured, semi-structured, and unstructured data
- Agile sprints, quick iteration, and frequent code pushes
- Object-oriented programming that is easy to use and flexible
- Efficient, scale-out architecture instead of expensive, monolithic architecture

Page 9

The benefits of NoSQL

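▪ Dynamic Schemas
- Support an agile development approach

▪ Auto-Sharding
- The application does not need to be aware of the server composition
- Suited to cloud deployments

▪ Replication
- Most NoSQL databases support automatic replication
- High availability and better disaster recovery

▪ Integrated Caching
- Frequently used data is kept in memory; a separate cache tier in front of a relational DB speeds up reads but does not improve writes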

Page 10

MongoDB

Page 11

What is MongoDB

▪ Open Source Database
▪ Used by companies of all sizes
▪ Agile database
▪ With the functionality of traditional databases:

- Full query language
- Consistency
- Secondary indexes

▪ Built for:
- Scalability
- Performance
- High availability

▪ NoSQL

Page 12

MongoDB Architecture

Page 13

MongoDB Architecture

▪ Document Data Model
▪ Rich Query Model
▪ Idiomatic Drivers
▪ Horizontal Scalability
▪ High Availability
▪ In-Memory Performance
▪ Flexibility

Feature Overview

Page 14

MongoDB Architecture

▪ Data as Documents
- Stored in BSON (Binary JSON)
- Grouped into collections (similar to tables in a relational database)
- A document tends to hold all data for a given record

▪ Dynamic Schema
- Documents can vary in structure
- Fields can vary from document to document
- Documents are self-describing
- New fields and collections can be created without affecting the other documents in the system, without updating a central system catalog and without taking the system offline

MongoDB Data Model
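A minimal sketch of the dynamic schema idea, using a plain Python list of dicts as a stand-in for a collection (BSON encoding and the server are not modeled; all names are hypothetical). With a real driver such as PyMongo, each append would instead be a `collection.insert_one(...)` call.

```python
# A toy "collection": a plain Python list of dicts standing in for documents.
products = [
    {"_id": 1, "name": "Laptop", "price": 999, "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "E-book", "price": 12, "formats": ["epub", "pdf"]},
]

# A document with a brand-new field is simply added; the existing documents
# are not rewritten and there is no central catalog to update.
products.append({"_id": 3, "name": "Headphones", "price": 59, "wireless": True})


def field_names(doc: dict) -> list:
    """Documents are self-describing: the field names travel with the values."""
    return sorted(doc)


def wireless_products(coll: list) -> list:
    """Code reading the collection must tolerate fields some documents lack."""
    return [p["name"] for p in coll if p.get("wireless", False)]


for p in products:
    print(p["_id"], field_names(p))
print(wireless_products(products))  # ['Headphones']
```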

Page 15

MongoDB Architecture

▪ Idiomatic Drivers
- MongoDB provides native drivers for all popular programming languages and frameworks to make development natural.
- Supported drivers include Java, .NET, Ruby, PHP, JavaScript, node.js, Python, Perl, Scala and others.

▪ Query Types
- MongoDB supports many types of queries. A query may return a whole document or only a subset of specific fields within the document.

▪ Key-value queries: on the value of a specific field
▪ Range queries: on inequalities (greater than, less than, …)
▪ Geospatial queries: on proximity criteria
▪ Text search queries: results in relevance order based on text arguments
▪ Aggregation Framework queries: like GROUP BY statements in SQL
▪ MapReduce queries: complex data processing expressed in JavaScript

MongoDB Query Model
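The key-value, range and aggregation query types above can be sketched with a small in-memory matcher. This is a hypothetical toy in plain Python, not the server's query engine or any driver API; it only borrows the shape of MongoDB's filter documents. In the mongo shell, the range query below would read roughly `db.ads.find({price: {$gte: 100, $lt: 500}}, {price: 1})`, and the grouping would be an aggregation `$group` stage using `$sum`.

```python
from collections import defaultdict
from typing import Optional

# Toy collection of classified ads (hypothetical sample data).
ads = [
    {"_id": 1, "city": "Ghent", "category": "bikes", "price": 120},
    {"_id": 2, "city": "Ghent", "category": "jobs", "price": 0},
    {"_id": 3, "city": "Leuven", "category": "bikes", "price": 80},
    {"_id": 4, "city": "Leuven", "category": "bikes", "price": 300},
]

# A small subset of MongoDB-style comparison operators.
OPS = {
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
}


def matches(doc: dict, flt: dict) -> bool:
    """True if doc satisfies every condition in flt (conditions are ANDed)."""
    for field, cond in flt.items():
        if field not in doc:
            return False  # simplification: a missing field never matches
        value = doc[field]
        if isinstance(cond, dict):  # operator form, e.g. {"$gt": 100}
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:  # key-value (equality) form
            return False
    return True


def find(coll: list, flt: dict, fields: Optional[list] = None) -> list:
    """Return matching documents, optionally projected onto the listed fields."""
    hits = [d for d in coll if matches(d, flt)]
    if fields is None:
        return hits
    return [{k: d[k] for k in fields if k in d} for d in hits]


def group_sum(coll: list, key: str, field: str) -> dict:
    """Rough analogue of SQL: SELECT key, SUM(field) ... GROUP BY key."""
    totals = defaultdict(int)
    for d in coll:
        totals[d[key]] += d[field]
    return dict(totals)


print(find(ads, {"city": "Ghent"}, ["_id"]))                              # key-value query
print(find(ads, {"price": {"$gte": 100, "$lt": 500}}, ["_id", "price"]))  # range query
print(group_sum(ads, "city", "price"))                                    # aggregation
```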

Page 16

MongoDB Architecture

▪ Indexing
- Many types of indexes on any field in the document
- Improve the performance of some operations by orders of magnitude
- Have associated costs in the form of:

▪ Slower writes
▪ Disk usage
▪ Memory usage

MongoDB Query Model
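The indexing trade-off can be sketched with a toy single-field index kept as a sorted list and searched with Python's `bisect` module. This is a stand-in under stated assumptions: MongoDB's actual indexes are B-trees, and the class names here are hypothetical. The sketch shows both sides of the slide: range lookups no longer scan every document, but each insert must also update the index, and the index entries take extra space.

```python
import bisect


class SortedIndex:
    """Toy single-field index: a sorted list of (key, _id) pairs.

    Real MongoDB indexes are B-trees; a sorted Python list is used here only
    to show the trade-off between faster reads and costlier writes.
    """

    def __init__(self, field: str):
        self.field = field
        self.entries = []  # sorted (key, _id) tuples: the extra storage cost

    def add(self, doc: dict) -> None:
        # Every write must also place an entry here (extra work per insert).
        bisect.insort(self.entries, (doc[self.field], doc["_id"]))

    def range_ids(self, low, high) -> list:
        """_ids whose key k satisfies low <= k < high, found by binary search."""
        start = bisect.bisect_left(self.entries, (low,))
        end = bisect.bisect_left(self.entries, (high,))
        return [doc_id for _, doc_id in self.entries[start:end]]


class Collection:
    def __init__(self, indexed_fields: list):
        self.docs = {}
        self.indexes = {f: SortedIndex(f) for f in indexed_fields}

    def insert(self, doc: dict) -> None:
        self.docs[doc["_id"]] = doc
        for index in self.indexes.values():  # one extra update per index
            index.add(doc)

    def range_scan(self, field: str, low, high) -> list:
        """Without an index: examine every document."""
        return [d["_id"] for d in self.docs.values() if low <= d[field] < high]

    def range_indexed(self, field: str, low, high) -> list:
        """With an index: jump straight to the matching key range."""
        return self.indexes[field].range_ids(low, high)


ads = Collection(indexed_fields=["price"])
for i, price in enumerate([300, 80, 120, 0, 950]):
    ads.insert({"_id": i, "price": price})

print(ads.range_scan("price", 100, 500))     # [0, 2] (insertion order)
print(ads.range_indexed("price", 100, 500))  # [2, 0] (index key order)
```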

Page 17

MongoDB Architecture

▪ Auto-Sharding
- Distributes data across multiple physical partitions called shards.
- Allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.

▪ Sharding is transparent to applications:
- Applications issue requests to query routers, which send each query to the appropriate shards.

MongoDB Data Management
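A toy sketch of how a query router can hide sharding from the application, assuming hash-based placement on a hypothetical `user_id` shard key. It is a simplification: MongoDB itself splits a collection into chunks (by key range or by hashed key) and balances those chunks across shards, rather than taking a hash modulo the shard count. The point illustrated is that a query containing the shard key goes to one shard, while any other query must be sent to all of them.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key_value) -> int:
    """Map a shard-key value to a shard number.

    md5 is used because Python's built-in hash() of strings changes between
    runs; any stable, evenly spreading hash would do for this sketch.
    """
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


class QueryRouter:
    """Toy router: the application talks only to this object, never to a shard."""

    def __init__(self, shard_key: str):
        self.shard_key = shard_key
        self.shards = [[] for _ in range(NUM_SHARDS)]  # each shard: a list of documents

    def insert(self, doc: dict) -> None:
        self.shards[shard_for(doc[self.shard_key])].append(doc)

    def find(self, flt: dict):
        """Return (matching documents, list of shards that were contacted)."""
        if self.shard_key in flt:
            targets = [shard_for(flt[self.shard_key])]  # targeted: exactly one shard
        else:
            targets = list(range(NUM_SHARDS))           # scatter-gather: every shard
        hits = [
            d
            for t in targets
            for d in self.shards[t]
            if all(d.get(k) == v for k, v in flt.items())
        ]
        return hits, targets


router = QueryRouter(shard_key="user_id")
for uid in range(10):
    router.insert({"user_id": uid, "name": f"user{uid}"})

print(router.find({"user_id": 7}))     # one shard contacted
print(router.find({"name": "user3"}))  # all shards contacted
```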

Page 18

MongoDB Architecture

▪ Transaction Model
- ACID compliant at the document level. Ensures complete isolation as a document is updated; any errors cause the operation to roll back, and clients receive a consistent view of the document.

- This is the same model used by many traditional relational databases to provide durability guarantees.

- As a distributed system, MongoDB gives users additional flexibility to achieve their desired durability goals by controlling how write operations are persisted across replicas.

MongoDB Consistency & Durability
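Document-level atomicity can be illustrated with a toy store (hypothetical, not MongoDB's implementation) in which an update either applies all of its field changes or none of them. On a real server, a single update such as `{$inc: {stock: -1, sold: 1}}` on one document has this all-or-nothing property; the guarantee is per document.

```python
import copy
import threading


class DocumentStore:
    """Toy store in which an update to one document is all-or-nothing.

    The lock gives isolation (no reader sees a half-applied update) and the
    copy-then-swap gives rollback (a failed update leaves the old version).
    """

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc: dict) -> None:
        with self._lock:
            self._docs[doc["_id"]] = copy.deepcopy(doc)

    def get(self, doc_id) -> dict:
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def update(self, doc_id, changes: dict) -> None:
        """Apply every field change in `changes` (field -> function) or none."""
        with self._lock:
            draft = copy.deepcopy(self._docs[doc_id])  # private working copy
            for field, fn in changes.items():
                draft[field] = fn(draft.get(field))  # may raise part-way through
            self._docs[doc_id] = draft  # publish only after every change succeeded


def reject(_):
    raise ValueError("simulated failure")


store = DocumentStore()
store.insert({"_id": 1, "stock": 5, "sold": 0})
store.update(1, {"stock": lambda s: s - 1, "sold": lambda n: n + 1})
print(store.get(1))  # {'_id': 1, 'stock': 4, 'sold': 1}

try:
    store.update(1, {"stock": lambda s: s - 1, "sold": reject})
except ValueError:
    pass
print(store.get(1))  # unchanged: {'_id': 1, 'stock': 4, 'sold': 1}
```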

Page 19

MongoDB Architecture

▪ Replica Sets
- MongoDB maintains multiple copies of the data in groups called replica sets.
- A replica set is a fully self-healing shard that helps prevent database downtime.
- Replica failover is fully automated.
- The number of replicas in a MongoDB replica set is configurable.
- A larger number of replicas provides increased data durability and protection against database downtime (e.g., in case of multiple machine failures, rack failures, data center failures, or network partitions).

- Optionally, operations can be configured to write to multiple replicas before returning to the application, thereby providing functionality that is similar to synchronous replication.

- Replica sets also provide operational flexibility by providing a way to upgrade hardware and software without requiring the database to go offline.

MongoDB Consistency & Durability
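A rough model of the majority arithmetic behind automatic failover, and of writes that wait for several replicas before returning (a write concern). The function names are hypothetical and the model is deliberately simplified: on a real replica set, a write whose `w` cannot yet be satisfied waits, optionally bounded by `wtimeout`, instead of failing at once, and elections involve more than a head count.

```python
from typing import List, Union


def majority(members: int) -> int:
    """Smallest number of members that is more than half of the set."""
    return members // 2 + 1


def can_elect_primary(up: List[bool]) -> bool:
    """Automatic failover needs a majority of members to be reachable."""
    return sum(up) >= majority(len(up))


def write_acknowledged(up: List[bool], w: Union[int, str]) -> bool:
    """Toy write concern check; up[0] is the primary, the rest are secondaries.

    The write counts as acknowledged once `w` members (primary included) have
    applied it. A member that is down never acknowledges.
    """
    if not up[0]:
        return False  # no primary: writes wait until a new one is elected
    required = majority(len(up)) if w == "majority" else w
    return sum(up) >= required


members = [True, True, True, False, False]  # 5-member set, 2 members down
print(write_acknowledged(members, 1))           # True
print(write_acknowledged(members, "majority"))  # True: 3 of 5 is a majority
print(write_acknowledged(members, 5))           # False
print(can_elect_primary([False, True, True, False, False]))  # False: only 2 of 5 up
```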

Page 20

MongoDB Architecture

▪ In-Memory Performance with On-Disk Capacity
- Extensive use of RAM to speed up database operations.

▪ Reading data from memory is measured in nanoseconds, whereas reading data from spinning disk is measured in milliseconds; reading from memory is approximately 100,000 times faster than reading data from disk.

- All data is read and manipulated through memory-mapped files. Data that is not accessed is not loaded into RAM.

- While it is not required that all data fit in RAM, the deployment team should aim for the indexes and all frequently accessed data to fit in RAM.

- If the volume of frequently accessed data exceeds the capacity of a single machine, auto-sharding spreads it across several machines.

- No need for a separate caching layer.

MongoDB Consistency & Durability
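As a back-of-the-envelope check of the 100,000x figure, assume typical latencies of about 100 ns for a RAM access and about 10 ms for a random read from a spinning disk (assumed orders of magnitude, not measurements from this deck):

```latex
\frac{t_{\text{disk}}}{t_{\text{RAM}}}
  \approx \frac{10\,\text{ms}}{100\,\text{ns}}
  = \frac{10^{-2}\,\text{s}}{10^{-7}\,\text{s}}
  = 10^{5}
```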

Page 21

MongoDB Scale & Use Cases

Page 22

MongoDB Scale

▪ 30 of the world’s 100 largest organizations use MongoDB.
▪ Scale
- Over 100 organizations run clusters with more than 100 nodes. Some clusters exceed 1,000 nodes.

- Deployments like: Yandex

▪ Velocity
- Many clusters deliver hundreds of thousands of operations per second (combined read and write).
- Deployments like: Foursquare

▪ Volume
- Clusters with hundreds of terabytes; some store multiple petabytes of data.
- Over 150 clusters exceed 1 billion documents in size, many with more than 100 billion documents.
- Deployments like: Craigslist

Page 23

MongoDB Use Case

▪ Why move?
- The original architecture relied on a relational DB
- Too much traffic for one machine

▪ Why MongoDB?
- Auto-sharding to scale a high-traffic, fast-growing application
- Geo-indexing for easy querying of location-based data
- Dramatically simplified data model

Foursquare
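Foursquare's use of geo-indexing rests on proximity queries. The sketch below is a brute-force stand-in (hypothetical venues with approximate coordinates, stored longitude first as in MongoDB's GeoJSON points) that computes great-circle distances with the haversine formula. In MongoDB, a `2dsphere` index with operators such as `$near` or `$geoWithin` lets the server answer this kind of query without measuring every document.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical venues; coordinates are approximate, stored as [longitude, latitude].
venues = [
    {"name": "Grand Place", "loc": [4.3525, 50.8467]},
    {"name": "Leuven station", "loc": [4.7156, 50.8814]},
    {"name": "Ghent Belfry", "loc": [3.7249, 51.0538]},
]


def near(coll: list, lon: float, lat: float, max_km: float) -> list:
    """Names of venues within max_km of (lon, lat), nearest first.

    Brute force: every venue is measured. A geospatial index exists precisely
    to avoid this full scan.
    """
    scored = sorted((haversine_km(lon, lat, *v["loc"]), v["name"]) for v in coll)
    return [name for dist, name in scored if dist <= max_km]


print(near(venues, 4.3517, 50.8503, max_km=30))  # ['Grand Place', 'Leuven station']
```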

Page 24

MongoDB Use Case

▪ In today’s regulatory environment, there are two constants about compliance:
- Requirements keep changing.
- The volume of data to manage is immense.

▪ For Craigslist, the popular classifieds and job-posting community that serves 570 cities in 50 countries, this means:
- Archiving years of accumulated data (1.5 million new classified ads are posted every day).
- Being able to query and report on these archives at runtime.

▪ Historically:
- MySQL Cluster
- A simple schema change on their vast archive took months to complete, preventing them from pushing new features.

Craigslist 1

Page 25

MongoDB Use Case

▪ Flexibility
- Each post and its metadata are stored in a single document.
- Schema changes come at little cost.

▪ Scalability and Availability
- MongoDB can scale horizontally across commodity hardware without Craigslist having to write and maintain complex, custom sharding code.
- Craigslist’s initial MongoDB deployment was designed to hold over 5 billion documents and 10 TB of data.
- MongoDB’s support for automated failover of nodes via replica sets was another big win; in the previous system, failover was a manual effort.

▪ Ease of Use
▪ Proven, Supported Technology

- Compared to the other NoSQL options, MongoDB is a broadly used technology with many major deployments.

Craigslist 2

Page 26

Questions?