Google Mesa
Sameer Tiwari, Hadoop Architect, Pivotal
[email protected] | @sameertech
Aug 12, 2014
What is Mesa?
● Geo-Replicated, Near Real-Time, Scalable Data Warehousing for Google’s Internet Advertising Business.
● OK, so what is it really?
o It's an atomic, consistent, available, near-real-time, scalable store
Salient features
● DW for ad serving at Google
● Metadata on BigTable
● Data on Colossus
● Serves billions of queries/day fetching trillions of rows; millions of row updates/second
● Supports multiple indexes
● Runs on tens of thousands of machines across geos
Data Model
● Tables are specified by table schemas
● A table schema defines a key space and a value space
o K and V are sets
o Each is represented as a tuple of columns
o The schema also specifies an aggregation function over values
● Each column is stored separately
● For consistency, updates are multi-versioned; they are batched for throughput
● Data is amenable to aggregation
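The key/value model above can be sketched in a few lines. This is an illustrative toy, not Mesa's implementation: keys and values are tuples, and repeated keys are merged with a per-column aggregation function (SUM here, typical for ad metrics). All names are assumptions.

```python
# Toy sketch of Mesa's data model: key tuple -> value tuple, with a
# per-column aggregation function applied when a key repeats.
def apply_update(table, update_batch, agg=lambda a, b: a + b):
    """Merge one versioned batch of (key, value) rows into the table."""
    for key, value in update_batch:
        if key in table:
            # Aggregate column-by-column (e.g. SUM of clicks, SUM of cost).
            table[key] = tuple(agg(a, b) for a, b in zip(table[key], value))
        else:
            table[key] = value
    return table

table = {}
# Key: (publisher, country); value: (clicks, cost)
apply_update(table, [(("p1", "US"), (10, 2))])
apply_update(table, [(("p1", "US"), (5, 1)), (("p2", "UK"), (7, 3))])
print(table[("p1", "US")])  # (15, 3)
```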
Data Model
● Pre-aggregates data into deltas (no repeated row keys within a delta) and applies a version
● Compaction is multi-level
● A Controller handles updates/maintenance and works with BigTable
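The delta idea above can be sketched as follows. A hedged, illustrative model (not Mesa's code): each delta covers a version range and holds sorted, key-unique rows; compaction merges adjacent deltas into one delta spanning the union of their ranges, aggregating repeated keys.

```python
# Sketch of multi-level compaction: each delta is ((lo, hi), {key: value}),
# covering versions lo..hi with no repeated row keys inside a delta.
def compact(deltas):
    """Merge a list of ((lo, hi), rows) deltas into a single delta."""
    lo = min(rng[0] for rng, _ in deltas)
    hi = max(rng[1] for rng, _ in deltas)
    merged = {}
    for _rng, rows in deltas:
        for key, val in rows.items():
            merged[key] = merged.get(key, 0) + val  # SUM aggregation
    # Keep rows sorted by key, as within a real delta.
    return ((lo, hi), dict(sorted(merged.items())))

d1 = ((0, 3), {"a": 10, "b": 5})
d2 = ((4, 6), {"a": 2, "c": 1})
print(compact([d1, d2]))  # ((0, 6), {'a': 12, 'b': 5, 'c': 1})
```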
Controller
● 4 sub-systems:
o Updates
o Compaction
o Checksum
o Schema change
● Does not do any work itself, only schedules it
Storage and Indexes
- Append-only (AO), log-structured, read-only files
- Rows organized as compressed row-blocks
- Index holds the starting entry (first key) of each row-block
- Naive lookup:
- Binary search on the index to find the row-block
- Binary search within the row-block
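The two-level lookup above is a standard pattern; here is a minimal sketch using Python's `bisect` (names and block layout are assumptions, not Mesa's format):

```python
import bisect

# Two-level lookup: binary search over index keys (first key of each
# row-block), then binary search inside the selected row-block.
def lookup(index_keys, row_blocks, key):
    """index_keys[i] is the first key of row_blocks[i]; both are sorted."""
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None                      # key sorts before every block
    keys, values = row_blocks[i]         # one (keys, values) row-block
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return values[j]
    return None

index = ["a", "m"]
blocks = [(["a", "c"], [1, 2]), (["m", "q"], [3, 4])]
print(lookup(index, blocks, "q"))  # 4
print(lookup(index, blocks, "b"))  # None (falls between stored keys)
```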
Query sub system
● Limited query engine with filtering/predicates
● Used by higher-level systems such as Dremel/MySQL
● Has multiple stateless query servers
● Works on both BigTable and Colossus
● Provides a clean sharding and load-balancing mechanism
● Groups similar queries onto a subset of servers
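The last point, routing similar queries to a small subset of servers, improves cache locality. The slides don't give the algorithm, so this is an assumed hash-based sketch, not Mesa's published scheme; the query "signature" and subset size are invented for illustration.

```python
import hashlib

# Assumed sketch: hash a query's table/key-range signature to pick a
# stable subset of query servers, so similar queries hit warm caches.
def pick_servers(signature, servers, subset_size=3):
    h = int(hashlib.md5(signature.encode()).hexdigest(), 16)
    start = h % len(servers)
    # Take subset_size consecutive servers (wrapping) for this signature.
    return [servers[(start + k) % len(servers)] for k in range(subset_size)]

servers = [f"qs-{i}" for i in range(10)]
print(pick_servers("table:clicks|range:a-m", servers))
```

Because the hash is deterministic, repeated queries with the same signature always land on the same subset.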
Multi Datacenter Deployment
● Tables are multi-versioned
o (Serve old data while the new version is in progress)
● The committer is stateless and sends updates to multiple datacenters
o Built on top of versionsDB - a globally replicated, consistent store built on distributed Paxos
● Data moves asynchronously across Mesa instances
● Only metadata is synchronously replicated, via Paxos (versionsDB)
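The "serve old data while new is in progress" point boils down to multi-versioned serving with an atomic metadata flip. A minimal sketch, with invented names; real Mesa coordinates the flip through versionsDB:

```python
# Sketch of multi-versioned serving: reads go to the latest committed
# version; a new version is staged invisibly, then made live by a
# single metadata update (the "commit").
class VersionedTable:
    def __init__(self):
        self.versions = {0: {}}   # version number -> data snapshot
        self.committed = 0        # version currently served

    def install(self, version, snapshot):
        self.versions[version] = snapshot   # staged, not yet visible

    def commit(self, version):
        self.committed = version            # atomic metadata flip

    def read(self, key):
        return self.versions[self.committed].get(key)

t = VersionedTable()
t.install(1, {"k": 42})
print(t.read("k"))  # None - version 1 not committed yet
t.commit(1)
print(t.read("k"))  # 42
```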
Optimizations
● Delta pruning - similar to filter pushdown
● Resume-key: one key per data block
o Data is returned a block at a time, so if a query server dies, another one can pick up where it left off
● Parallelizing workloads: uses MapReduce to shard
o While writing a delta, Mesa samples row keys, which are used to figure out the right number of mappers/reducers
o The workers are the same 4 workers scheduled by the Controller
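The resume-key mechanism above can be sketched as a block-at-a-time scan where each block carries the key to resume from, so a different stateless query server can continue after a failure. Illustrative names and layout, not Mesa's wire format:

```python
# Sketch of resume-key streaming: yield (block, next_resume_key) pairs;
# a failover server restarts the scan from the last resume key seen.
def scan_blocks(rows, block_size, resume_key=None):
    """Yield blocks of (key, value) pairs starting at resume_key."""
    keys = sorted(rows)
    if resume_key is not None:
        keys = [k for k in keys if k >= resume_key]
    for i in range(0, len(keys), block_size):
        chunk = keys[i:i + block_size]
        nxt = keys[i + block_size] if i + block_size < len(keys) else None
        yield [(k, rows[k]) for k in chunk], nxt

rows = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
blocks = list(scan_blocks(rows, 2))
# Simulate the first server dying after one block: a second server
# resumes from the key attached to that block.
_, resume = blocks[0]
rest = list(scan_blocks(rows, 2, resume_key=resume))
print(resume, [b for b, _ in rest])
```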
Optimizations
● Schema changes - two techniques:
o Create, copy, replay, and delete - expensive
o Link and add default values - this is what Mesa uses
● New instances of Mesa use P2P mechanisms to bootstrap and come online
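The "link and add default values" technique avoids rewriting stored data: existing deltas keep the old schema, and reads fill in a default for the new column on the fly. A hedged sketch with invented column names:

```python
# Sketch of a linked schema change: stored rows stay in the old schema;
# the read path presents them under the new schema, supplying defaults
# for columns the stored data predates.
OLD_COLS = ["clicks", "cost"]
NEW_COLS = ["clicks", "cost", "conversions"]   # added column
DEFAULTS = {"conversions": 0}

def read_row(stored_row):
    """stored_row uses OLD_COLS; present it under NEW_COLS."""
    row = dict(zip(OLD_COLS, stored_row))
    return tuple(row.get(c, DEFAULTS.get(c)) for c in NEW_COLS)

print(read_row((10, 2)))  # (10, 2, 0) - default filled at read time
```

Only newly written deltas need to carry the new column, which is why this is the cheap option.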
Handling Data Corruption
● Mesa runs on ~50K boxes
● Online - during updates:
o Fact: each Mesa instance is logically the same but may differ physically in its deltas
o Check checksums of indexes/data
o Row order, key ranges, and aggregate values should be the same across instances
● Offline:
o Run global checksums of all indexes
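The cross-instance check above exploits the fact that instances may split data across deltas differently, yet the aggregated view must agree. A toy sketch (names and serialization invented): checksum the merged, sorted view so instances with different delta layouts still produce matching checksums.

```python
import hashlib

# Sketch of the online corruption check: aggregate each instance's
# deltas into one logical view, then checksum it in key order.
def instance_checksum(deltas):
    """deltas: list of {key: value} dicts; checksum the merged view."""
    merged = {}
    for d in deltas:
        for k, v in d.items():
            merged[k] = merged.get(k, 0) + v   # SUM aggregation
    h = hashlib.sha256()
    for k in sorted(merged):                   # fixed row order
        h.update(f"{k}={merged[k]};".encode())
    return h.hexdigest()

# Two instances with different delta layouts but the same logical data:
a = instance_checksum([{"x": 1, "y": 2}, {"x": 3}])
b = instance_checksum([{"x": 4}, {"y": 2}])
print(a == b)  # True - physical layout differs, logical view matches
```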
Reference
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/42851.pdf