Google Mesa
Sameer Tiwari, Hadoop Architect, Pivotal
[email protected] | @sameertech
Aug 12, 2014
What is Mesa?
● Geo-Replicated, Near Real-Time, Scalable Data Warehousing for Google’s Internet Advertising Business.
● OK, so what is it really?
o It's an atomic, consistent, available, near-real-time, scalable store
Salient features
● DW for ad serving at Google
● Metadata on BigTable
● Data on Colossus
● Serves billions of queries/day fetching trillions of rows; millions of row updates/second
● Supports multiple indexes
● Runs on tens of thousands of machines across geos
Data Model
● Tables are specified by table schemas
● A table schema defines a key space and a value space
o K and V are sets
o Each is represented as a tuple of columns
o The schema also specifies an aggregation function over values
● Each column is stored separately
● For consistency, updates are multi-versioned; they are batched for throughput
● Data is amenable to aggregation
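The key/value model above can be sketched in a few lines. This is an illustrative toy, not Mesa's implementation: keys and values are tuples, and repeated keys are merged with a per-column aggregation function (SUM here, typical for ad metrics). All names are assumptions.

```python
# Toy sketch of Mesa's data model: key tuple -> value tuple, with a
# per-column aggregation function applied when a key repeats.
def apply_update(table, update_batch, agg=lambda a, b: a + b):
    """Merge one versioned batch of (key, value) rows into the table."""
    for key, value in update_batch:
        if key in table:
            # Aggregate column-by-column (e.g. SUM of clicks, SUM of cost).
            table[key] = tuple(agg(a, b) for a, b in zip(table[key], value))
        else:
            table[key] = value
    return table

table = {}
# Key: (publisher, country); value: (clicks, cost)
apply_update(table, [(("p1", "US"), (10, 2))])
apply_update(table, [(("p1", "US"), (5, 1)), (("p2", "UK"), (7, 3))])
print(table[("p1", "US")])  # (15, 3)
```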
Data Model
● Pre-aggregates data into deltas (no repeated row keys within a delta) and applies a version
● Compaction is multi-level
● A Controller handles updates/maintenance and works with BigTable
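The delta idea above can be sketched as follows. A hedged, illustrative model (not Mesa's code): each delta covers a version range and holds sorted, key-unique rows; compaction merges adjacent deltas into one delta spanning the union of their ranges, aggregating repeated keys.

```python
# Sketch of multi-level compaction: each delta is ((lo, hi), {key: value}),
# covering versions lo..hi with no repeated row keys inside a delta.
def compact(deltas):
    """Merge a list of ((lo, hi), rows) deltas into a single delta."""
    lo = min(rng[0] for rng, _ in deltas)
    hi = max(rng[1] for rng, _ in deltas)
    merged = {}
    for _rng, rows in deltas:
        for key, val in rows.items():
            merged[key] = merged.get(key, 0) + val  # SUM aggregation
    # Keep rows sorted by key, as within a real delta.
    return ((lo, hi), dict(sorted(merged.items())))

d1 = ((0, 3), {"a": 10, "b": 5})
d2 = ((4, 6), {"a": 2, "c": 1})
print(compact([d1, d2]))  # ((0, 6), {'a': 12, 'b': 5, 'c': 1})
```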
Controller
● 4 sub-systems:
o Updates
o Compaction
o Checksum
o Schema change
● Does not do any work itself, only schedules it
Storage and Indexes
- Append-only (AO), log-structured, read-only files
- Rows organized as compressed row-blocks
- Index holds the starting entry (first key) of each row-block
- Naive lookup:
- Binary search on the index to find the row-block
- Binary search within the row-block
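The two-level lookup above is a standard pattern; here is a minimal sketch using Python's `bisect` (names and block layout are assumptions, not Mesa's format):

```python
import bisect

# Two-level lookup: binary search over index keys (first key of each
# row-block), then binary search inside the selected row-block.
def lookup(index_keys, row_blocks, key):
    """index_keys[i] is the first key of row_blocks[i]; both are sorted."""
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None                      # key sorts before every block
    keys, values = row_blocks[i]         # one (keys, values) row-block
    j = bisect.bisect_left(keys, key)
    if j < len(keys) and keys[j] == key:
        return values[j]
    return None

index = ["a", "m"]
blocks = [(["a", "c"], [1, 2]), (["m", "q"], [3, 4])]
print(lookup(index, blocks, "q"))  # 4
print(lookup(index, blocks, "b"))  # None (falls between stored keys)
```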
Query sub system
● Limited query engine with filtering/predicates
● Used by higher-level systems such as Dremel/MySQL
● Has multiple stateless query servers
● Works on both BigTable and Colossus
● Provides a clean sharding and load-balancing mechanism
● Groups similar queries onto a subset of servers
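The last point, routing similar queries to a small subset of servers, improves cache locality. The slides don't give the algorithm, so this is an assumed hash-based sketch, not Mesa's published scheme; the query "signature" and subset size are invented for illustration.

```python
import hashlib

# Assumed sketch: hash a query's table/key-range signature to pick a
# stable subset of query servers, so similar queries hit warm caches.
def pick_servers(signature, servers, subset_size=3):
    h = int(hashlib.md5(signature.encode()).hexdigest(), 16)
    start = h % len(servers)
    # Take subset_size consecutive servers (wrapping) for this signature.
    return [servers[(start + k) % len(servers)] for k in range(subset_size)]

servers = [f"qs-{i}" for i in range(10)]
print(pick_servers("table:clicks|range:a-m", servers))
```

Because the hash is deterministic, repeated queries with the same signature always land on the same subset.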
Multi Datacenter Deployment
● Tables are multi-versioned
o (Serve old data while the new version is in progress)
● The committer is stateless and sends updates to multiple datacenters
o Built on top of versionsDB - a globally replicated, consistent store built on distributed Paxos
● Data moves asynchronously across Mesa instances
● Only metadata is synchronously replicated, via Paxos (versionsDB)
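The "serve old data while new is in progress" point boils down to multi-versioned serving with an atomic metadata flip. A minimal sketch, with invented names; real Mesa coordinates the flip through versionsDB:

```python
# Sketch of multi-versioned serving: reads go to the latest committed
# version; a new version is staged invisibly, then made live by a
# single metadata update (the "commit").
class VersionedTable:
    def __init__(self):
        self.versions = {0: {}}   # version number -> data snapshot
        self.committed = 0        # version currently served

    def install(self, version, snapshot):
        self.versions[version] = snapshot   # staged, not yet visible

    def commit(self, version):
        self.committed = version            # atomic metadata flip

    def read(self, key):
        return self.versions[self.committed].get(key)

t = VersionedTable()
t.install(1, {"k": 42})
print(t.read("k"))  # None - version 1 not committed yet
t.commit(1)
print(t.read("k"))  # 42
```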
Optimizations
● Delta pruning - similar to filter pushdown
● Resume-key: one key per data block
o Data is returned a block at a time, so if a query server dies, another one can pick up where it left off
● Parallelizing workloads: uses MapReduce to shard
o While writing a delta, Mesa samples row keys, which are used to figure out the right number of mappers/reducers
o The workers are the same 4 workers scheduled by the Controller
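The resume-key mechanism above can be sketched as a block-at-a-time scan where each block carries the key to resume from, so a different stateless query server can continue after a failure. Illustrative names and layout, not Mesa's wire format:

```python
# Sketch of resume-key streaming: yield (block, next_resume_key) pairs;
# a failover server restarts the scan from the last resume key seen.
def scan_blocks(rows, block_size, resume_key=None):
    """Yield blocks of (key, value) pairs starting at resume_key."""
    keys = sorted(rows)
    if resume_key is not None:
        keys = [k for k in keys if k >= resume_key]
    for i in range(0, len(keys), block_size):
        chunk = keys[i:i + block_size]
        nxt = keys[i + block_size] if i + block_size < len(keys) else None
        yield [(k, rows[k]) for k in chunk], nxt

rows = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
blocks = list(scan_blocks(rows, 2))
# Simulate the first server dying after one block: a second server
# resumes from the key attached to that block.
_, resume = blocks[0]
rest = list(scan_blocks(rows, 2, resume_key=resume))
print(resume, [b for b, _ in rest])
```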
Optimizations
● Schema changes - two techniques:
o Create, copy, replay, and delete - expensive
o Link and add default values - this is what Mesa uses
● New instances of Mesa use P2P mechanisms to bootstrap and come online
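The "link and add default values" technique avoids rewriting stored data: existing deltas keep the old schema, and reads fill in a default for the new column on the fly. A hedged sketch with invented column names:

```python
# Sketch of a linked schema change: stored rows stay in the old schema;
# the read path presents them under the new schema, supplying defaults
# for columns the stored data predates.
OLD_COLS = ["clicks", "cost"]
NEW_COLS = ["clicks", "cost", "conversions"]   # added column
DEFAULTS = {"conversions": 0}

def read_row(stored_row):
    """stored_row uses OLD_COLS; present it under NEW_COLS."""
    row = dict(zip(OLD_COLS, stored_row))
    return tuple(row.get(c, DEFAULTS.get(c)) for c in NEW_COLS)

print(read_row((10, 2)))  # (10, 2, 0) - default filled at read time
```

Only newly written deltas need to carry the new column, which is why this is the cheap option.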
Handling Data Corruption
● Mesa runs on ~50K boxes
● Online - during updates:
o Fact: each Mesa instance is logically the same but may differ physically in its deltas
o Check checksums of indexes/data
o Row order, key ranges, and aggregate values should be the same across instances
● Offline:
o Run global checksums of all indexes
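The cross-instance check above exploits the fact that instances may split data across deltas differently, yet the aggregated view must agree. A toy sketch (names and serialization invented): checksum the merged, sorted view so instances with different delta layouts still produce matching checksums.

```python
import hashlib

# Sketch of the online corruption check: aggregate each instance's
# deltas into one logical view, then checksum it in key order.
def instance_checksum(deltas):
    """deltas: list of {key: value} dicts; checksum the merged view."""
    merged = {}
    for d in deltas:
        for k, v in d.items():
            merged[k] = merged.get(k, 0) + v   # SUM aggregation
    h = hashlib.sha256()
    for k in sorted(merged):                   # fixed row order
        h.update(f"{k}={merged[k]};".encode())
    return h.hexdigest()

# Two instances with different delta layouts but the same logical data:
a = instance_checksum([{"x": 1, "y": 2}, {"x": 3}])
b = instance_checksum([{"x": 4}, {"y": 2}])
print(a == b)  # True - physical layout differs, logical view matches
```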
Reference
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/42851.pdf