Unpredictable & interactive analysis of terabytes of data
Amadeus Revenue Accounting Metadata Search
Big Data Paris, 11 March 2015
Laurent Dollé
1. Amadeus today
Amadeus In a few words
Amadeus is a technology company dedicated to the global travel industry.
We are present in 195 countries with a worldwide team of more than 11,000 people.
Our solutions help improve the business performance of travel agencies, corporations, airlines, airports, hotels, railways and more.
Connecting The travel industry
Cruiselines
Hotels
Car rental
Ground handlers
Ferry operators
Ground transportation
Airports
Travel agencies
Insurance companies
Airlines
Supporting The traveler life cycle
Post-trip
On trip
Pre-trip Buy/Purchase
Search
Inspire
Robust Global operations
We designed & own our Data Processing Centres
_ Central DC @ Erding, Germany
_ Remote DCs all over the globe
_ Recovery DC on standby in case of natural disasters
1.6+ billion transactions processed per day
502+ million travel agency bookings processed in 2013
615+ million Passengers Boarded in 2013
95% of the world’s scheduled network airline seats
Close To our customers
Our commitment To innovation
_ Amadeus has invested €2.9bn in Research & Development since 2004.
_ Nominated within “top 3” software companies in 2013 European Union Industrial R&D Investment Scorecard.
Global air travel Is a growth industry
Amadeus growth is powered by a sustainable transaction-based business model
_ 2012: 2.98 billion air passengers
_ 2017: 3.91 billion air passengers
_ 31% growth
Source: IATA. Airline Industry forecast 2013-2017
2. Amadeus Revenue Accounting
Passenger Revenue Accounting What for?
Revenue of a flight ticket is shared
_ Travel agent
_ Governments
_ Airlines: many can be involved (marketing & operating)
Amadeus Revenue Accounting handles cash flows on behalf of airlines
_ Tracking
_ Error handling & optimisation
_ Reporting: analysis & audit
Increasing accuracy By leveraging our GDS position
Distribution & IT have in common
• Data centres
• Platforms and applications
• Sales & marketing infrastructure
• Customers
Real-time tracking of airline’s passenger sales revenue
_ at usage time: effective revenue
_ at sale time, weeks before: expected revenue
Amadeus Revenue Accounting Key benefits & features
_ Facilitate strategic decisions
_ Optimise revenue accounting processes
Web apps, APIs & feeds hosted in the Amadeus cloud (SaaS)
3. Metadata Search business needs
Business needs Gathered from a launch partner
One of our launch partners is a large European airline
_ transporting 35m+ passengers a year
_ key player in the revenue accounting industry
They requested a user-friendly way to query any data in our main operational database
_ Unpredictable ad-hoc search
_ Many advanced reporting requirements
Migrating
_ from their in-house data warehouse
_ to our cloud-based solution
Metadata Search The main promises
_ Graphical user interface: edit, import, save & share queries
_ Data warehouse fed in real time: 4 years history (140m+ documents, versioned)
_ Interactive response times
_ Search further using chained queries (patent pending)
Project milestones And possible impacts
_ November 2013: User acceptance testing
_ December 2014: Migration & parallel running validation on production
_ Summer 2015: Production cut-over
_ Post cut-over: SLA & optimisation based on usage statistics
Any delay or functional gap may impact the whole project, as the application is used to validate the migration and parallel running phases.
4. User-friendly SQL graphical user interface
SQL paradigm Split into 2 functional areas
2 functional areas can be defined
_ Search criteria: predicates filtering the results
_ Displayed data: projections and related functions
SELECT A, SUM(B) WHERE A > C AND B > D GROUP BY A ORDER BY A
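As an illustration only, the two functional areas could map onto a MongoDB aggregation pipeline as sketched below. The helper name `build_pipeline`, the output field `total` and the use of `$expr` for the cross-column comparison (available in recent MongoDB versions) are assumptions made for this sketch, not the query generator described in the talk.

```python
def build_pipeline(group_key, sum_field, criteria):
    """Build a pipeline for SELECT key, SUM(f) WHERE ... GROUP BY key ORDER BY key.

    criteria is a list of (left_field, right_field) pairs, each meaning
    left_field > right_field (a cross-column comparison).
    """
    # Search criteria: predicates filtering the documents.
    predicates = [{"$gt": [f"${left}", f"${right}"]} for left, right in criteria]
    match = {"$match": {"$expr": {"$and": predicates}}}
    # Displayed data: grouping key plus aggregate function.
    group = {"$group": {"_id": f"${group_key}", "total": {"$sum": f"${sum_field}"}}}
    # ORDER BY on the grouping key.
    sort = {"$sort": {"_id": 1}}
    return [match, group, sort]


# SELECT A, SUM(B) WHERE A > C AND B > D GROUP BY A ORDER BY A
pipeline = build_pipeline("A", "B", [("A", "C"), ("B", "D")])
```

The resulting list of stages is plain data, so it can be inspected or stored before being handed to a driver.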
Graphical user interface Query editor
5. Technical constraints
Expecting fast answers to unpredictable queries
No index, no hint (almost)
_ Fields to be scanned unknown
_ Main-memory full scans to decrease response time
Need to scale out for sustainable performance
Support mainstream SQL DML statements
_ Aggregation
_ Cross-column comparison, Boolean logic
_ Sort
Resilient & user-friendly versioning Featuring a document timeline
Document timeline implemented to efficiently retrieve a particular version of a document based on arbitrary date, event name, flags
Efficient upserts & transactions needed to replace or update multiple versions at each write
[Diagram: document timeline]
_ Events: 1.0 Issuance, 1.1 Issuance confirmation, 2.0 Exchange, 3.0 Usage, 3.1 Usage (replay), 3.2 Usage (replay), 4.0 Exchange
_ Events out of timeline: 2.1 Exchange (replay)
_ Conflict: 3.2 bumped out of timeline
_ Flags: last issuance, last issuance confirmation, last exchange, last usage, last 1.x, last 2.x, last 3.x, last 4.x, final event, conflict
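One way to picture the "last N.x" flags, as a rough sketch: assume each version is labelled major.minor, with the minor number counting confirmations or replays of the same event. Under that assumption (the talk does not spell out the rule), the flag marks the highest minor version within each major. The function name and the data layout are hypothetical.

```python
def last_minor_flags(versions):
    """Return {major: "major.minor"} holding the highest minor per major."""
    latest = {}
    for label in versions:
        major, minor = (int(part) for part in label.split("."))
        if major not in latest or minor > latest[major]:
            latest[major] = minor
    return {major: f"{major}.{minor}" for major, minor in latest.items()}


# Version labels taken from the timeline diagram.
versions = ["1.0", "1.1", "2.0", "2.1", "3.0", "3.1", "3.2", "4.0"]
flags = last_minor_flags(versions)
```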
Schema-less document store For agile integration
Our main operational database is an Oracle document store containing Protocol Buffers documents (4000+ fields)
A schema-less document store would ease
_ the ETL transformation process (400+ metadata fields to load)
_ the data model maintenance & synchronization between both databases
Consistency & availability And their impacts
Consistency favoured over availability (CAP)
_ Expecting accuracy since data is used by auditors
_ However: no operational impact, as the application is not MCA
No contractual SLA
_ To be agreed after benchmarking on production
_ Interactive response times expected with very few parallel users
_ Full outages out of business hours accepted
Integration In the Amadeus standards
Runs on standard x86 architecture
C++, Python & Java drivers
Enterprise-grade security
_ SSL encryption
_ Kerberos authentication
_ Data-at-rest encryption
Considered alternatives To MongoDB
_ Oracle: mounting all data in memory is ruled out for cost & hardware reasons (90TB for our biggest prospect).
_ MySQL Cluster: technical & functional limitations, complex to implement & maintain.
_ Impala: still young, with a steep learning curve. Distributed data analysis not exactly matching our use-case.
_ Couchbase: slightly behind MongoDB for document search (index mandatory). N1QL not finalized. Key-value store not exactly matching our use-case.
_ Crescando: Amadeus in-house R&D database engine (index-less, main-memory only, partitioning data at CPU core level). Project terminated.
6. Technical architecture
Enforcing parallel processing To speed up aggregation queries
A MongoDB daemon (mongod) processes any incoming query on a single thread.
Modern hardware architectures feature many sockets (2-4) and many cores (8-16), meaning wasted computing power if we do not enforce parallel processing.
Our online analytical processing use-case implies intense workload (full scans) with limited concurrency, as queries are queued and run sequentially.
Microsharding solves this issue.
The database is highly sharded (as many shards as cores) so that each shard spawns its own thread, thus spreading the workload efficiently over the whole CPU power.
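The idea can be pictured with a small scatter-gather sketch: the data is split into as many partitions as there are workers, each partition is scanned by its own worker, and the partial aggregates are merged. This is only an analogy written in plain Python with `concurrent.futures`; it does not use MongoDB, and the names `scan_shard` and `microsharded_sum` as well as the document fields are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_shard(docs, threshold):
    """Full scan of one shard: sum 'amount' per 'key' for matching docs."""
    partial = {}
    for doc in docs:
        if doc["amount"] > threshold:
            partial[doc["key"]] = partial.get(doc["key"], 0) + doc["amount"]
    return partial


def microsharded_sum(docs, shard_count, threshold):
    # Partition the collection round-robin into shard_count shards.
    shards = [docs[i::shard_count] for i in range(shard_count)]
    # One worker per shard, mirroring one daemon per core.
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partials = list(pool.map(lambda shard: scan_shard(shard, threshold), shards))
    # Merge the per-shard partial aggregates into the final result.
    merged = {}
    for partial in partials:
        for key, value in partial.items():
            merged[key] = merged.get(key, 0) + value
    return merged


docs = [{"key": i % 3, "amount": i} for i in range(100)]
result = microsharded_sum(docs, shard_count=4, threshold=10)
```

In CPython, threads will not speed up this CPU-bound loop; the sketch shows the partition-and-merge structure only, whereas separate mongod processes can run on separate cores.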
Benchmarking CPU usage Through in-memory microsharding
_ Microsharding validated, from 6 to 48 shards on 6 physical servers: performance increases almost linearly with the number of shards
_ On-the-fly rebalancing validated: cleaning step is mandatory (12 shards and +)
[Chart: Full scan, time vs. number of shards]
[Chart: Full scan with aggregation, time vs. number of shards]
Benchmarking scalability Through data ramp-up
_ Response times increase linearly with the amount of scanned data
_ Positive impact of caching (light blue dots) validated on full scans only
_ Behaviour reproduced for 2 shard distributions: 24 & 48 shards on 6 physical servers, 100% in-memory
[Chart: Full scan, time vs. data size]
[Chart: Full scan with aggregation, time vs. data size]
Benchmarking scalability Through generated search criteria
_ Response times increase linearly with the number of search criteria
[Chart: Full scan: OR & AND, time vs. search criteria pairs (A and B)]
[Chart: Full scan: IN, time vs. search criteria]
Production cluster setup Facts & figures
6 physical data servers
_ Server: HP ProLiant DL580 Gen8, 4 sockets, x86, rack
_ 4x CPU: Intel Xeon E7-4850 v2, 2.30 GHz, 12 physical cores
_ RAM: 512GB, 40GB/s scanning speed
_ 2x flash cards: Fusion-io ioScale 3.2TB, 1.5GB/s read
3 virtual config servers
_ RAM: 8GB
Overall cluster
_ 288 cores, 288 sharded replica sets (2x+1)
_ 3TB RAM, 38.4TB flash card storage
Currently 1 year of production data (4 expected)
_ 250m+ docs (1bn)
_ Data size: 2.8TB (11TB), docs with padding
_ Average object size: 11.9KB
_ File size: 3.97TB (16TB), data & index extents
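The overall figures follow from the per-server specification. A quick check in Python, using only numbers quoted on the slide (the variable names are ours, and the 4-year projection simply assumes volumes grow in proportion to the number of years):

```python
servers = 6
sockets_per_server = 4
cores_per_socket = 12
ram_gb_per_server = 512
flash_cards_per_server = 2
flash_tb_per_card = 3.2

# Cluster totals: expected to match 288 cores, 3TB RAM, about 38.4TB flash.
total_cores = servers * sockets_per_server * cores_per_socket
total_ram_tb = servers * ram_gb_per_server / 1024
total_flash_tb = servers * flash_cards_per_server * flash_tb_per_card

# Projection from 1 year of production data to the 4 years expected.
years = 4
docs_expected_m = 250 * years      # millions of documents, quoted as 1bn
data_expected_tb = 2.8 * years     # quoted as 11TB
files_expected_tb = 3.97 * years   # quoted as 16TB
```

With one shard per core, the 288 cores line up with the 288 sharded replica sets quoted for the cluster.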
Technical architecture
[Architecture diagram; recoverable labels]
_ Clusters: ORACLE CLUSTER, REV OBE BATCH CLUSTER - SLES, REV OBE OLTP CLUSTER - SLES, MONGODB CLUSTER - RHEL
_ Components: Revenue Accounting operational database, RA workflow, input queue, error queue, MSG batch gateway, MSG live gateway, AQG lib, C++ driver, shard routers (service, applicative), config servers, sharded replica sets (Mongo daemons & arbiter: 1st, 2nd, x), MSF front-end, SI, browser
_ Flows: write, read, mongoimport initial/massive feed, corrective feed, live feed, live trigger, JSON files, edifact, https
_ Shell & drivers (C++, Python, Java): on-call, debugging & ad-hoc investigation
Database customisation And its results
Microsharding is a powerful way to improve response times. What else can bring value?
_ NUMA: restrict access to CPU & memory for secondary daemons.
_ Kernel tuning: optimise Linux for CPU-bound (vs. IO-bound) workloads: small readahead, THP off, increase task scheduler.
_ Striped replica set: span shards on all the available hardware, with secondary daemons replicated on different nodes for smooth failover.
_ Cgroups: prevent shards from competing for memory when data does not fit into RAM, especially with microsharding. Low-memory Cgroups may be compressed with zRAM/WiredTiger.
High availability & fault tolerance Shard, replicate & stripe
_ Horizontal scaling through sharding
_ High availability through replication (primary & secondary shards)
_ Cheaper, relaxed high-availability through arbiters (empty shards)
_ Hardware fault-tolerance through physical servers
_ Many options & combinations possible
_ Updates performed on-the-fly
[Diagram: topologies shown on units labelled A, B, C: unsharded database (Mongo daemon), shards, sharded replica sets (1st & 2nd Mongo daemons), sharded replica sets (1st, 2nd, x: Mongo daemons & arbiter), striped & sharded replica sets]
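A striped layout can be sketched as a round-robin placement: for each shard, the primary, the secondary and the arbiter land on three different servers, and the starting server rotates from one shard to the next. The function below is a hypothetical illustration of that idea, not the deployment tooling used on the project.

```python
def stripe(shard_count, servers):
    """Place primary, secondary and arbiter of each shard on distinct servers."""
    if len(servers) < 3:
        raise ValueError("need at least 3 servers to separate the 3 members")
    layout = []
    for shard in range(shard_count):
        # Rotate the starting server so primaries are spread evenly.
        start = shard % len(servers)
        layout.append({
            "shard": shard,
            "primary": servers[start],
            "secondary": servers[(start + 1) % len(servers)],
            "arbiter": servers[(start + 2) % len(servers)],
        })
    return layout


layout = stripe(6, ["A", "B", "C"])
```

With this placement, losing one server removes at most one member of any replica set, so each shard keeps a majority and can elect or keep a primary.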
7. Production benchmarks
Production response times And their lessons learnt
Full scan aggregation is CPU-bound, with a fixed entry cost for unwinds.
_ no unwind: 3s
_ unwinds on 1, 2 or 3 levels: 70s
The promise of interactive response times is met on basic use-cases.
In the absence of concurrency, response times are consistent across all tests.
Indexes have a linear impact on response times.
Complex query with 4 match criteria
_ full scan 100s
_ index, 40% selectivity 40s
Complex query with 4 match criteria,
including field-on-field comparison
_ full scan 190s
_ index, 40% selectivity 70s
_ index, 75% selectivity 145s
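The figures above are roughly consistent with a response time proportional to the share of documents selected by the index. A quick check of that reading follows; the proportional model is our interpretation of "linear impact", not a formula from the talk.

```python
def predicted_time(full_scan_s, selectivity):
    """Response time if it scales linearly with the share of documents read."""
    return full_scan_s * selectivity


# (full scan time in s, index selectivity, measured time in s), from the slide.
measurements = [
    (100, 0.40, 40),
    (190, 0.40, 70),
    (190, 0.75, 145),
]

errors_pct = []
for full_scan_s, selectivity, measured_s in measurements:
    estimate = predicted_time(full_scan_s, selectivity)
    error_pct = 100 * (estimate - measured_s) / measured_s
    errors_pct.append(error_pct)
    print(f"{selectivity:.0%} of {full_scan_s}s: model {estimate:.1f}s, "
          f"measured {measured_s}s ({error_pct:+.1f}%)")
```

Under this simple model the estimates stay within roughly 10% of the three measurements.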
Position of the match operator in the
aggregation pipeline can impact index usage.
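As a reminder of why the position matters: a `$match` placed at the very beginning of a pipeline can use an index, whereas a `$match` that follows a stage such as `$unwind` generally filters documents already flowing through the pipeline. The sketch below contrasts the two orderings; the field names `carrier` and `coupons`, and the helper `match_leads`, are made up for illustration.

```python
def match_leads(pipeline):
    """True when the first stage is a $match, where an index can serve it."""
    return bool(pipeline) and "$match" in pipeline[0]


# Filter first, then unwind and aggregate the surviving documents.
match_first = [
    {"$match": {"carrier": "XX"}},
    {"$unwind": "$coupons"},
    {"$group": {"_id": "$coupons.status", "count": {"$sum": 1}}},
]

# Same stages, but every document is unwound before being filtered.
match_late = [
    {"$unwind": "$coupons"},
    {"$match": {"carrier": "XX"}},
    {"$group": {"_id": "$coupons.status", "count": {"$sum": 1}}},
]
```

Both pipelines return the same groups; only the first gives the database the chance to narrow the scan before the costly unwind.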
8. Integrated monitoring
Ops Manager Flavours
MongoDB Ops Manager can be run
_ in the cloud
_ on premise
On-prem version features
_ an admin GUI
_ a monitoring API
Ops Manager API Integrated in topology explorer
Ops Manager API Integrated in ping watchdog
Ops Manager API Integrated in real-time monitoring
Ops Manager API Integrated in Ops workbench
9. Feedback on 1 year of Open Source
Services & empowerment Can help you go the extra mile
Need some basic help? Some expert advice? Or the source code?
Google can definitely help, but MongoDB too.
_ Turn Pre-sales Engineers & Solutions Architects into Trainers & Evangelists
_ Everybody can open tickets in MongoDB’s JIRA, but Commercial Support can process them even faster for you (premium)
_ A dedicated Technical Account Manager can follow your project, provide ad-hoc support and chase tickets internally
Turn your employees into smart creatives
_ Empower small teams, embrace agility, set broad objectives & watch the magic
_ Even internal use-cases might be addressed by accident
You can follow us on:
AmadeusITGroup amadeus.com/blog amadeus.com
Thank you