Unpredictable interactive terabytes of data Laurent DOLLE.pdf · A MongoDB daemon (mongod)...

Unpredictable & interactive analysis of terabytes of data Amadeus Revenue Accounting Metadata Search Big Data Paris, 11 March 2015 Laurent Dollé [email protected] 265ced1609a17cf1a5979880a2ad364653895ae8

Transcript of Unpredictable interactive terabytes of data Laurent DOLLE.pdf · A MongoDB daemon (mongod)...

Page 1: Unpredictable interactive terabytes of data Laurent DOLLE.pdf · A MongoDB daemon (mongod) processes any incoming query on a single thread. Modern hardware architectures features


Unpredictable & interactive analysis of terabytes of data

Amadeus Revenue Accounting Metadata Search

Big Data Paris, 11 March 2015

Laurent Dollé [email protected]


Page 2:

1. Amadeus today

Page 3:

Amadeus in a few words

Amadeus is a technology company dedicated to the global travel industry.

We are present in 195 countries with a worldwide team of more than 11,000 people.

Our solutions help improve the business performance of travel agencies, corporations, airlines, airports, hotels, railways and more.

Page 4:

Connecting the travel industry

Cruiselines

Hotels

Car rental

Ground handlers

Ferry operators

Ground transportation

Airports

Travel agencies

Insurance companies

Airlines

Page 5:

Supporting the traveler life cycle

Inspire

Search

Buy/Purchase

Pre-trip

On trip

Post-trip

Page 6:

Robust global operations

We designed & own our Data Processing Centres

_ Central DC @ Erding, Germany

_ Remote DCs all over the globe

_ Recovery DC on standby in case of natural disasters

1.6+ billion transactions processed per day

502+ million travel agency bookings processed in 2013

615+ million passengers boarded in 2013

95% of the world’s scheduled network airline seats

Page 7:

Close to our customers

Page 8:

Our commitment to innovation

_ Amadeus has invested €2.9bn in Research & Development since 2004.

_ Nominated within “top 3” software companies in 2013 European Union Industrial R&D Investment Scorecard.

Page 9:

Global air travel is a growth industry

Amadeus growth is powered by a sustainable transaction-based business model

_ 2012: 2.98 billion air passengers

_ 2017: 3.91 billion air passengers

_ 31% growth

Source: IATA, Airline Industry Forecast 2013-2017

Page 10:

2. Amadeus Revenue Accounting

Page 11:

Passenger Revenue Accounting: what for?

Revenue of a flight ticket is shared

_ Travel agent

_ Governments

_ Airlines: many can be involved (marketing & operating)

Amadeus Revenue Accounting handles cash flows on behalf of airlines

_ Tracking

_ Error handling & optimisation

_ Reporting: analysis & audit

Page 12:

Increasing accuracy by leveraging our GDS position

Distribution and IT have in common

• Data centres

• Platforms and applications

• Sales & marketing infrastructure

• Customers

Real-time tracking of airline’s passenger sales revenue

_ at usage time: effective revenue

_ at sale time, weeks before: expected revenue

Page 13:

Amadeus Revenue Accounting: key benefits & features

_ Facilitate strategic decisions

_ Optimise revenue accounting processes

Web apps, APIs & feeds hosted in the Amadeus cloud (SaaS)

Page 14:

3. Metadata Search business needs

Page 15:

Business needs gathered from a launch partner

One of our launch partners is a large European airline

_ transporting 35m+ passengers a year

_ key player in the revenue accounting industry

They requested a user-friendly way to query any data in our main operational database

_ Unpredictable ad-hoc search

_ Many advanced reporting requirements

Migrating

_ from their in-house data warehouse

_ to our cloud-based solution

Page 16:

Metadata Search: the main promises

_ Graphical user interface: edit, import, save & share queries

_ Data warehouse fed in real time: 4 years history (140m+ documents, versioned)

_ Interactive response times

_ Search further using chained queries (patent pending)

Page 17:

Project milestones and possible impacts

November 2013: User acceptance testing

December 2014: Migration & parallel running validation on production

Summer 2015: Production cut-over

Post cut-over: SLA & optimisation based on usage statistics

Any delay or functional gap may impact the whole project, as the application is used to validate the migration and parallel running phases.

Page 18:

4. User-friendly SQL graphical user interface

Page 19:

SQL paradigm: split into 2 functional areas

2 functional areas can be defined

_ Search criteria: predicates filtering the results

_ Displayed data: projections and related functions

SELECT A, SUM(B) WHERE A > C AND B > D GROUP BY A ORDER BY A
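To make the two functional areas concrete, here is a minimal sketch of how the statement above could be expressed as a MongoDB aggregation pipeline built from plain Python dictionaries. The field names A, B, C and D come from the slide; the helper name `build_pipeline`, the flag names `a_gt_c` and `b_gt_d`, and the output field `sum_B` are assumptions made for illustration, and the deck does not say how the application actually generates its queries.

```python
def build_pipeline():
    """Pipeline mirroring: SELECT A, SUM(B) WHERE A > C AND B > D GROUP BY A ORDER BY A."""
    # Area 1, search criteria: the cross-column comparisons are first
    # computed as boolean flags, then used to filter the documents.
    search_criteria = [
        {"$project": {
            "A": 1,
            "B": 1,
            "a_gt_c": {"$gt": ["$A", "$C"]},  # hypothetical flag name
            "b_gt_d": {"$gt": ["$B", "$D"]},  # hypothetical flag name
        }},
        {"$match": {"a_gt_c": True, "b_gt_d": True}},
    ]
    # Area 2, displayed data: grouping, aggregation function and sort order.
    displayed_data = [
        {"$group": {"_id": "$A", "sum_B": {"$sum": "$B"}}},
        {"$sort": {"_id": 1}},
    ]
    return search_criteria + displayed_data
```

With a driver such as PyMongo, a list like this would typically be passed to `collection.aggregate(...)`; the sketch only builds the stages and does not need a running database.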

Page 20:

Graphical user interface: query editor

Page 21:

Graphical user interface: query editor

Page 22:

5. Technical constraints

Page 23:

Expecting fast answers to unpredictable queries

No index, no hint (almost)

_ Fields to be scanned unknown

_ Main-memory full scans to decrease response time

Need to scale out for sustainable performance

Support mainstream SQL DML statements

_ Aggregation

_ Cross-column comparison, Boolean logic

_ Sort

Page 24:

Resilient & user-friendly versioning: featuring a document timeline

Document timeline implemented to efficiently retrieve a particular version of a document based on arbitrary date, event name, flags

Efficient upserts & transactions needed to replace or update multiple versions at each write

[Diagram: document timeline]

_ Events: 1.0 Issuance, 1.1 Issuance confirmation, 2.0 Exchange, 3.0 Usage, 3.1 Usage (replay), 3.2 Usage (replay), 4.0 Exchange

_ Events out of timeline: 2.1 Exchange (replay)

_ Conflict: 3.2 bumped out of timeline

_ Flags: last issuance, last issuance confirmation, last 1.x, last 2.x, last usage, last 3.x, last exchange, last 4.x, final event, conflict
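As an illustration of retrieving a version by arbitrary date or by event name, the following sketch keeps a sorted in-memory timeline and looks versions up in it. The dates, the tuple layout and the helper names `version_at` and `last_version_for` are hypothetical; only the event names and version numbers come from the slide, and the deck does not describe the real data model.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical timeline of one ticket: (effective date, version, event name),
# sorted by effective date. The dates are invented for the example.
TIMELINE = [
    (date(2015, 1, 5), "1.0", "Issuance"),
    (date(2015, 1, 6), "1.1", "Issuance confirmation"),
    (date(2015, 2, 1), "2.0", "Exchange"),
    (date(2015, 3, 10), "3.0", "Usage"),
]

def version_at(timeline, when):
    """Return the version in force on a given date, or None before the first event."""
    dates = [entry[0] for entry in timeline]
    idx = bisect_right(dates, when)
    return timeline[idx - 1][1] if idx else None

def last_version_for(timeline, event):
    """Return the latest version carrying an event name (a 'last <event>' style flag)."""
    matches = [version for _, version, name in timeline if name == event]
    return matches[-1] if matches else None
```

For example, `version_at(TIMELINE, date(2015, 2, 15))` looks up the version valid in mid-February, and `last_version_for(TIMELINE, "Usage")` plays the role of a "last usage" flag.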

Page 25:

Schema-less document store for agile integration

Our main operational database is an Oracle document store containing Protocol Buffers documents (4000+ fields)

A schema-less document store would ease

_ the ETL transformation process (400+ metadata fields to load)

_ the data model maintenance & synchronization between both databases

Page 26:

Consistency & availability and their impacts

Consistency favoured over availability (CAP)

_ Expecting accuracy since data is used by auditors

_ However: no operational impact, as the application is not MCA

No contractual SLA

_ To be agreed after benchmarking on production

_ Interactive response times expected with very few parallel users

_ Full outages out of business hours accepted

Page 27:

Integration in the Amadeus standards

Runs on standard x86 architecture

C++, Python & Java drivers

Enterprise-grade security

_ SSL encryption

_ Kerberos authentication

_ Data-at-rest encryption

Page 28:

Considered alternatives to MongoDB

_ Oracle: mounting all data in memory is impractical for cost & hardware reasons (90TB for our biggest prospect).

_ MySQL Cluster: technical & functional limitations, complex to implement & maintain.

_ Impala: still young, with a steep learning curve. Distributed data analysis not exactly matching our use-case.

_ Couchbase: slightly behind MongoDB for document search (index mandatory). N1QL not finalized. Key-value store not exactly matching our use-case.

_ Crescando: Amadeus in-house R&D database engine (index-less, main-memory only, partitioning data at CPU core level). Project terminated.

Page 29:

6. Technical architecture

Page 30:

Enforcing parallel processing to speed up aggregation queries

A MongoDB daemon (mongod) processes any incoming query on a single thread. Modern hardware architectures feature many sockets (2-4) and many cores (8-16), meaning wasted computing power if we do not enforce parallel processing.

Our online analytical processing use-case implies intense workload (full scans) with limited concurrency, as queries are queued and run sequentially.

Microsharding solves this issue. The database is highly sharded, with as many shards as cores, so that each shard spawns its own thread, thus sharing the workload efficiently across the whole CPU power.
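The scatter-gather idea behind microsharding can be sketched in a few lines: the data is split into as many shards as workers, each shard is scanned independently, and the partial results are merged. This is only a structural illustration with hypothetical helper names (`make_shards`, `scan_shard`, `scatter_gather_sum`). In the setup described on the slide the parallelism comes from separate mongod processes, whereas Python threads, used here to keep the example self-contained, do not run CPU-bound code in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def make_shards(documents, shard_count):
    """Distribute documents round-robin over shard_count in-memory shards."""
    shards = [[] for _ in range(shard_count)]
    for position, doc in enumerate(documents):
        shards[position % shard_count].append(doc)
    return shards

def scan_shard(shard, predicate, field):
    """Full scan of one shard: partial sum of `field` over the matching documents."""
    return sum(doc[field] for doc in shard if predicate(doc))

def scatter_gather_sum(shards, predicate, field):
    """Scan every shard with its own worker, then merge the partial sums."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: scan_shard(shard, predicate, field), shards)
        return sum(partials)
```

A caller would build the shards once, for instance `make_shards(docs, 8)`, and then run each aggregation through `scatter_gather_sum`; the merge step is what a shard router does on behalf of the client in a sharded cluster.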

Page 31:

Benchmarking CPU usage through in-memory microsharding

_ Microsharding validated, from 6 to 48 shards on 6 physical servers: performance increases almost linearly with the number of shards

_ On-the-fly rebalancing validated: cleaning step is mandatory (12 shards and more)

[Chart: Full scan. Time vs. number of shards]

[Chart: Full scan with aggregation. Time vs. number of shards]

Page 32:

Benchmarking scalability through data ramp-up

_ Response times increase linearly with the amount of scanned data

_ Positive impact of caching (light blue dots) validated on full scans only

_ Behaviour reproduced for 2 shard distributions: 24 & 48 shards on 6 physical servers, 100% in-memory

[Chart: Full scan. Time vs. data size]

[Chart: Full scan with aggregation. Time vs. data size]

Page 33:

Benchmarking scalability through generated search criteria

_ Response times increase linearly with the number of search criteria

[Chart: Full scan: OR & AND. Time vs. number of search criteria pairs (A and B)]

[Chart: Full scan: IN. Time vs. number of search criteria]
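One plausible reading of the two chart labels is that the benchmark generated filters of growing size: an OR over many (A and B) pairs, and an IN list over many values. The sketch below builds such filters as MongoDB query documents. The helper names `or_of_and_pairs` and `in_list` and the generated values are assumptions; the deck does not show the actual generator.

```python
def or_of_and_pairs(pairs):
    """Filter matching any (A, B) pair: (A = a1 AND B = b1) OR (A = a2 AND B = b2) ..."""
    return {"$or": [{"A": a, "B": b} for a, b in pairs]}

def in_list(values):
    """Filter matching any listed value of A: A IN (v1, v2, ...)."""
    return {"A": {"$in": list(values)}}

# Hypothetical ramp-up: the same filter shapes with more and more criteria.
ramp = [1000, 10000, 50000]
or_filters = [or_of_and_pairs((i, i) for i in range(n)) for n in ramp]
in_filters = [in_list(range(n)) for n in ramp]
```

Each generated filter could then be timed against the cluster to reproduce a time versus number-of-criteria curve.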

Page 34:

Production cluster setup: facts & figures

6 physical data servers

_ Server: HP ProLiant DL580 Gen8, 4 sockets, x86, rack

_ 4x CPU: Intel Xeon E7-4850 v2, 2.30 GHz, 12 physical cores

_ RAM: 512GB, 40GB/s scanning speed

_ 2x flash cards: Fusion-io ioScale 3.2TB, 1.5GB/s read

3 virtual config servers

_ RAM: 8GB

Overall cluster

_ 288 cores, 288 sharded replica sets (2x+1)

_ 3TB RAM, 38.4TB flash card storage

Currently 1 year of production data (4 years expected, projections in brackets)

_ 250m+ docs (1bn)

_ Data size: 2.8TB (11TB), docs with padding

_ Average object size: 11.9KB

_ File size: 3.97TB (16TB), data & index extents
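The overall cluster figures follow from the per-server figures, and the bracketed values are consistent with multiplying the current one-year volumes by four. The short calculation below reproduces that arithmetic; the constant names are chosen for the example, and treating the bracketed values as a simple four-fold projection is an assumption.

```python
SERVERS = 6
SOCKETS_PER_SERVER = 4
CORES_PER_SOCKET = 12
RAM_GB_PER_SERVER = 512
FLASH_CARDS_PER_SERVER = 2
FLASH_TB_PER_CARD = 3.2

# Cluster totals derived from the per-server figures.
total_cores = SERVERS * SOCKETS_PER_SERVER * CORES_PER_SOCKET           # one shard per core
total_ram_tb = SERVERS * RAM_GB_PER_SERVER / 1024                        # GB to TB
total_flash_tb = SERVERS * FLASH_CARDS_PER_SERVER * FLASH_TB_PER_CARD

# Assumed projection: four years of data at the current yearly volume.
YEARS_EXPECTED = 4
docs_expected = 250_000_000 * YEARS_EXPECTED
data_tb_expected = 2.8 * YEARS_EXPECTED
file_tb_expected = 3.97 * YEARS_EXPECTED

print(total_cores, total_ram_tb, round(total_flash_tb, 1))
print(docs_expected, round(data_tb_expected, 1), round(file_tb_expected, 2))
```

The projected data and file sizes come out slightly above and below the rounded 11TB and 16TB quoted on the slide.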

Page 35:

Technical architecture

[Architecture diagram; labels recovered from the slide]

_ Clusters: REV OBE OLTP CLUSTER - SLES, REV OBE BATCH CLUSTER - SLES, MONGODB CLUSTER - RHEL, ORACLE CLUSTER

_ Components: Browser (https), SI, MSF front-end, MSG live gateway, MSG batch gateway, AQG lib, C++ driver, shard routers (service, applicative), config servers, sharded replica sets (1st, 2nd, x: Mongo daemons & arbiter), Revenue Accounting operational database, RA workflow, input queue, error queue, REV (read, write)

_ Feeds: mongoimport initial/massive feed, live feed, live trigger, corrective feed, edifact, JSON files

_ Operations: shell & drivers (C++, Python, Java) for on-call, debugging & ad-hoc investigation

Page 36:

Database customisation and its results

Microsharding is a powerful way to reduce response times. What else can bring value?

_ Cgroups: prevent shards from competing for memory when data does not fit into RAM, especially with microsharding. Low-memory Cgroups may be compressed with zRAM/WiredTiger.

_ Kernel tuning: optimize Linux for a CPU-bound (vs. IO-bound) workload: small readahead, THP off, increase task scheduler.

_ NUMA: restrict access to CPU & memory for secondary daemons.

_ Striped replica set: span shards on all the available hardware, with secondary daemons replicated on different nodes for smooth failover.
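The striping rule can be illustrated with a small placement function that spreads the members of each replica set over different servers. It assumes two data-bearing daemons plus one arbiter per shard and a simple round-robin offset; the function name `stripe_replica_sets` and the placement rule itself are hypothetical and are not taken from the deck.

```python
def stripe_replica_sets(servers, shard_count):
    """Place each shard's primary, secondary and arbiter on three different servers."""
    if len(servers) < 3:
        raise ValueError("striping three members per shard needs at least 3 servers")
    layout = []
    for shard in range(shard_count):
        layout.append({
            "shard": shard,
            # Rotating the starting server spreads primaries evenly.
            "primary": servers[shard % len(servers)],
            # Offsets of 1 and 2 keep the other members on different nodes.
            "secondary": servers[(shard + 1) % len(servers)],
            "arbiter": servers[(shard + 2) % len(servers)],
        })
    return layout
```

With three servers A, B and C and six shards, every server ends up hosting two primaries, two secondaries and two arbiters, so losing one node never removes more than one member of any replica set.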

Page 37:

High availability & fault tolerance

Shard, replicate & stripe

_ Horizontal scaling through sharding

_ High availability through replication (primary & secondary shards)

_ Cheaper, relaxed high availability through arbiters (empty shards)

_ Hardware fault tolerance through physical servers

_ Many options & combinations possible

_ Updates performed on-the-fly

[Diagram: Mongo daemons (1st, 2nd) and arbiters (x) laid out on physical servers A, B, C. Panels: unsharded database, shards, sharded replica sets (two panels), striped & sharded replica sets]

Page 38:

7. Production benchmarks

Page 39:

Production response times and their lessons learnt

Full scan aggregation is CPU-bound, with a fixed entry cost for unwinds.

_ no unwind: 3s

_ unwinds on 1, 2 or 3 levels: 70s

The promise of interactive response times is kept on basic use-cases.

In the absence of concurrency, response times are consistent across all tests.

Indexes have a linear impact on response times.

Complex query with 4 match criteria

_ full scan: 100s

_ index, 40% selectivity: 40s

Complex query with 4 match criteria, including field-on-field comparison

_ full scan: 190s

_ index, 40% selectivity: 70s

_ index, 75% selectivity: 145s
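The linear claim can be checked against the figures above with a simple model in which an index that keeps a given share of the documents costs that same share of the full-scan time. The model and the function name `predicted_index_time` are an interpretation of the slide, not a formula given in the deck.

```python
def predicted_index_time(full_scan_seconds, selectivity):
    """Linear model: indexed time is the full-scan time scaled by the selectivity."""
    return full_scan_seconds * selectivity

# (full scan in s, selectivity, measured time with index in s), from the slide.
measurements = [
    (100, 0.40, 40),
    (190, 0.40, 70),
    (190, 0.75, 145),
]

for full_scan, selectivity, measured in measurements:
    predicted = predicted_index_time(full_scan, selectivity)
    print(f"selectivity {selectivity:.0%}: predicted {predicted:.1f}s, measured {measured}s")
```

The first measurement matches the model exactly, and the two field-on-field measurements stay within a few seconds of it.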

The position of the match operator in the aggregation pipeline can impact index usage.
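A common illustration of this point is the difference between filtering before and after an unwind stage: a `$match` placed at the very start of a pipeline can be served by an index, while one placed after `$unwind` is applied to documents that have already been read and unwound. The two helpers below build both variants; the function names and the array field `$coupons` used in the usage note are hypothetical.

```python
def match_first(criteria, unwind_path):
    """Filter before unwinding, so the $match stage can be served by an index."""
    return [{"$match": criteria}, {"$unwind": unwind_path}]

def match_last(criteria, unwind_path):
    """Filter after unwinding: every document is unwound before being tested."""
    return [{"$unwind": unwind_path}, {"$match": criteria}]
```

For instance, `match_first({"A": 1}, "$coupons")` and `match_last({"A": 1}, "$coupons")` contain the same stages in a different order, which is enough to change whether an index on A can be used.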

Page 40:

8. Integrated monitoring

Page 41:

Ops Manager flavours

MongoDB Ops Manager can be run

_ in the cloud

_ on premises

On-prem version features

_ an admin GUI

_ a monitoring API

Page 42:

Ops Manager API: integrated in topology explorer

Page 43:

Ops Manager API: integrated in ping watchdog

Page 44:

Ops Manager API: integrated in real-time monitoring

Page 45:

Ops Manager API: integrated in Ops workbench

Page 46:

9. Feedback on 1 year of Open Source

Page 47:

Services & empowerment can help you go the extra mile

Need some basic help? Some expert advice? Or the source code? Google can definitely help, but MongoDB too.

_ Turn Pre-sales Engineers & Solutions Architects into Trainers & Evangelists

_ Everybody can open tickets in MongoDB’s JIRA, but Commercial Support can process them even faster for you (premium)

_ A dedicated Technical Account Manager can follow your project, provide ad-hoc support and chase tickets internally

Turn your employees into smart creatives

_ Empower small teams, embrace agility, set broad objectives & watch the magic

_ Even internal use-cases might be addressed by accident

Page 48:

Thank you

You can follow us on: AmadeusITGroup, amadeus.com/blog, amadeus.com