Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big...
Transcript of Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big...
![Page 1: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/1.jpg)
Big Data Infrastructure
Jimmy LinUniversity of Maryland
Monday, March 30, 2015
Session 8: NoSQL
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States���See http://creativecommons.org/licenses/by-nc-sa/3.0/us/ for details
![Page 2: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/2.jpg)
The Fundamental Problem¢ We want to keep track of mutable state in a scalable manner
¢ Assumptions:
l State organized in terms of many “records”l State unlikely to fit on single machine, must be distributed
¢ MapReduce won’t do!
(note: much of this material belongs in a distributed systems or databases course)
![Page 3: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/3.jpg)
Three Core Ideas¢ Partitioning (sharding)
l For scalabilityl For latency
¢ Replicationl For robustness (availability)
l For throughput
¢ Cachingl For latency
![Page 4: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/4.jpg)
We got 99 problems…¢ How do we keep replicas in sync?
¢ How do we synchronize transactions across multiple partitions?
¢ What happens to the cache when the underlying data changes?
![Page 5: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/5.jpg)
Relational Databases
… to the rescue!
Source: images.wikia.com/batman/images/b/b1/Bat_Signal.jpg
![Page 6: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/6.jpg)
What do RDBMSes provide?¢ Relational model with schemas
¢ Powerful, flexible query language
¢ Transactional semantics: ACID
¢ Rich ecosystem, lots of tool support
![Page 7: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/7.jpg)
How do RDBMSes do it?¢ Transactions on a single machine: (relatively) easy!
¢ Partition tables to keep transactions on a single machine
l Example: partition by user
¢ What about transactions that require multiple machine?l Example: transactions involving multiple users
Solution: Two-Phase Commit
![Page 8: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/8.jpg)
2PC: Sketch
Coordinator
subordinates
Okay everyone, PREPARE! YES
YES
YES
Good. ���COMMIT!
ACK!
ACK!
ACK!
DONE!
![Page 9: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/9.jpg)
2PC: Sketch
Coordinator
subordinates
Okay everyone, PREPARE! YES
YES
NO
ABORT!
![Page 10: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/10.jpg)
2PC: Sketch
Coordinator
subordinates
Okay everyone, PREPARE! YES
YES
YES
Good. ���COMMIT!
ACK!
ACK!ROLLBACK!
![Page 11: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/11.jpg)
2PC: Assumptions and Limitations¢ Assumptions:
l Persistent storage and write-ahead log at every nodel WAL is never permanently lost
¢ Limitations:l It’s blocking and slow
l What if the coordinator dies?
Solution: Paxos!(details beyond scope of this course)
![Page 12: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/12.jpg)
RDBMSes: Pain Points
Source: www.flickr.com/photos/spencerdahl/6075142688/
![Page 13: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/13.jpg)
#1: Must design up front, painful to evolve
Note: Flexible design doesn’t mean no design!
![Page 14: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/14.jpg)
Source: Wikipedia (Tortoise)
#2: 2PC is slow!
![Page 15: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/15.jpg)
#3: Cost!
Source: www.flickr.com/photos/gnusinn/3080378658/
![Page 16: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/16.jpg)
What do RDBMSes provide?¢ Relational model with schemas
¢ Powerful, flexible query language
¢ Transactional semantics: ACID
¢ Rich ecosystem, lots of tool support
What if we want a la carte?
Source: www.flickr.com/photos/vidiot/18556565/
![Page 17: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/17.jpg)
Features a la carte?¢ What if I’m willing to give up consistency for scalability?
¢ What if I’m willing to give up the relational model for something more flexible?
¢ What if I just want a cheaper solution?
Enter… NoSQL!
![Page 18: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/18.jpg)
Source: geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html
![Page 19: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/19.jpg)
NoSQL1. Horizontally scale “simple operations”
2. Replicate/distribute data over many servers
3. Simple call interface
4. Weaker concurrency model than ACID
5. Efficient use of distributed indexes and RAM
6. Flexible schemas
Source: Cattell (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record.
(Not only SQL)
![Page 20: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/20.jpg)
(Major) Types of NoSQL databases¢ Key-value stores
¢ Column-oriented databases
¢ Document stores
¢ Graph databases
![Page 21: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/21.jpg)
Source: Wikipedia (Keychain)
Key-Value Stores
![Page 22: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/22.jpg)
Key-Value Stores: Data Model¢ Stores associations between keys and values
¢ Keys are usually primitives
l For example, ints, strings, raw bytes, etc.
¢ Values can be primitive or complex: usually opaque to storel Primitives: ints, strings, etc.l Complex: JSON, HTML fragments, etc.
![Page 23: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/23.jpg)
Key-Value Stores: Operations¢ Very simple API:
l Get – fetch value associated with keyl Put – set value associated with key
¢ Optional operations:l Multi-get
l Multi-put
l Range queries
¢ Consistency model:l Atomic puts (usually)
l Cross-key operations: who knows?
![Page 24: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/24.jpg)
Key-Value Stores: Implementation¢ Non-persistent:
l Just a big in-memory hash table
¢ Persistentl Wrapper around a traditional RDBMS
What if data doesn’t fit on a single machine?
![Page 25: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/25.jpg)
Simple Solution: Partition!¢ Partition the key space across multiple machines
l Let’s say, hash partitioningl For n machines, store key k at machine h(k) mod n
¢ Okay… But:1. How do we know which physical machine to contact?
2. How do we add a new machine to the cluster?
3. What happens if a machine fails?
See the problems here?
![Page 26: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/26.jpg)
Clever Solution¢ Hash the keys
¢ Hash the machines also!
Distributed hash tables!(following combines ideas from several sources…)
![Page 27: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/27.jpg)
h = 0h = 2n – 1
![Page 28: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/28.jpg)
h = 0h = 2n – 1
Routing: Which machine holds the key?
Each machine holds pointers to predecessor and successor
Send request to any node, gets routed to correct one in O(n) hops
Can we do better?
![Page 29: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/29.jpg)
h = 0h = 2n – 1
Routing: Which machine holds the key?
Each machine holds pointers to predecessor and successor
Send request to any node, gets routed to correct one in O(log n) hops
+ “finger table”���(+2, +4, +8, …)
![Page 30: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/30.jpg)
h = 0h = 2n – 1
Routing: Which machine holds the key?
Simpler Solution
Service���Registry
![Page 31: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/31.jpg)
h = 0h = 2n – 1
New machine joins: What happens?
How do we rebuild the predecessor, successor, finger tables?
Stoica et al. (2001). Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM.
Cf. Gossip Protoccols
![Page 32: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/32.jpg)
h = 0h = 2n – 1
Machine fails: What happens?
Solution: ReplicationN = 3, replicate +1, –1
Covered!
Covered!
How to actually replicate? Later…
![Page 33: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/33.jpg)
Another Refinement: Virtual Nodes¢ Don’t directly hash servers
¢ Create a large number of virtual nodes, map to physical servers
l Better load redistribution in event of machine failurel When new server joins, evenly shed load from other servers
![Page 34: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/34.jpg)
Source: Wikipedia (Table)
Bigtable
![Page 35: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/35.jpg)
Data Model¢ A table in Bigtable is a sparse, distributed, persistent
multidimensional sorted map
¢ Map indexed by a row key, column key, and a timestampl (row:string, column:string, time:int64) → uninterpreted byte array
¢ Supports lookups, inserts, deletesl Single row transactions only
Image Source: Chang et al., OSDI 2006
![Page 36: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/36.jpg)
Rows and Columns¢ Rows maintained in sorted lexicographic order
l Applications can exploit this property for efficient row scansl Row ranges dynamically partitioned into tablets
¢ Columns grouped into column familiesl Column key = family:qualifier
l Column families provide locality hints
l Unbounded number of columns
At the end of the day, it’s all key-value pairs!
![Page 37: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/37.jpg)
Bigtable Building Blocks¢ GFS
¢ Chubby
¢ SSTable
![Page 38: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/38.jpg)
SSTable¢ Basic building block of Bigtable
¢ Persistent, ordered immutable map from keys to values
l Stored in GFS
¢ Sequence of blocks on disk plus an index for block lookupl Can be completely mapped into memory
¢ Supported operations:
l Look up value associated with keyl Iterate key/value pairs within a key range
Index
64K block
64K block
64K block
SSTable
Source: Graphic from slides by Erik Paulson
![Page 39: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/39.jpg)
Tablet¢ Dynamically partitioned range of rows
¢ Built from multiple SSTables
Index
64K block
64K block
64K block
SSTable
Index
64K block
64K block
64K block
SSTable
Tablet Start:aardvark End:apple
Source: Graphic from slides by Erik Paulson
![Page 40: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/40.jpg)
Table¢ Multiple tablets make up the table
¢ SSTables can be shared
SSTable SSTable SSTable SSTable
Tablet
aardvark apple Tablet
apple_two_E boat
Source: Graphic from slides by Erik Paulson
![Page 41: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/41.jpg)
Architecture¢ Client library
¢ Single master server
¢ Tablet servers
![Page 42: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/42.jpg)
Bigtable Master¢ Assigns tablets to tablet servers
¢ Detects addition and expiration of tablet servers
¢ Balances tablet server load
¢ Handles garbage collection
¢ Handles schema changes
![Page 43: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/43.jpg)
Bigtable Tablet Servers¢ Each tablet server manages a set of tablets
l Typically between ten to a thousand tabletsl Each 100-200 MB by default
¢ Handles read and write requests to the tablets
¢ Splits tablets that have grown too large
![Page 44: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/44.jpg)
Tablet Location
Upon discovery, clients cache tablet locations
Image Source: Chang et al., OSDI 2006
![Page 45: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/45.jpg)
Tablet Assignment¢ Master keeps track of:
l Set of live tablet serversl Assignment of tablets to tablet servers
l Unassigned tablets
¢ Each tablet is assigned to one tablet server at a timel Tablet server maintains an exclusive lock on a file in Chubby
l Master monitors tablet servers and handles assignment
¢ Changes to tablet structurel Table creation/deletion (master initiated)
l Tablet merging (master initiated)l Tablet splitting (tablet server initiated)
![Page 46: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/46.jpg)
Tablet Serving
Image Source: Chang et al., OSDI 2006
“Log Structured Merge Trees”
![Page 47: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/47.jpg)
Compactions¢ Minor compaction
l Converts the memtable into an SSTablel Reduces memory usage and log traffic on restart
¢ Merging compactionl Reads the contents of a few SSTables and the memtable, and writes out
a new SSTablel Reduces number of SSTables
¢ Major compactionl Merging compaction that results in only one SSTable
l No deletion records, only live data
![Page 48: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/48.jpg)
Bigtable Applications¢ Data source and data sink for MapReduce
¢ Google’s web crawl
¢ Google Earth
¢ Google Analytics
![Page 49: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/49.jpg)
HBase
Image Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
![Page 50: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/50.jpg)
Challenges¢ Keep it simple!
¢ Keep it flexible!
¢ Push functionality onto application
… and it’s still hard to get right!Example: splitting and compaction storms
![Page 51: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/51.jpg)
Back to consistency…
![Page 52: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/52.jpg)
Consistency Scenarios¢ People you don’t want seeing your pictures:
l Alice removes mom from list of people who can view photosl Alice posts embarrassing pictures from Spring Break
l Can mom see Alice’s photo?
¢ Why am I still getting messages?l Bob unsubscribes from mailing list
l Message sent to mailing list right after
l Does Bob receive the message?
![Page 53: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/53.jpg)
Three Core Ideas¢ Partitioning (sharding)
l For scalabilityl For latency
¢ Replicationl For robustness (availability)
l For throughput
¢ Cachingl For latency
Let’s just focus on this
![Page 54: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/54.jpg)
Consistency
CAP “Theorem”
Availability
(Brewer, 2000)
Partition tolerance
… pick two
![Page 55: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/55.jpg)
CAP Tradeoffs¢ CA = consistency + availability
l E.g., parallel databases that use 2PC
¢ AP = availability + tolerance to partitionsl E.g., DNS, web caching
![Page 56: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/56.jpg)
Is this helpful?¢ CAP not really even a “theorem” because vague definitions
l More precise formulation came a few years later
Wait a sec, that doesn’t sound right!
Source: Abadi (2012) Consistency Tradeoffs in Modern Distributed Database System Design. IEEE Computer, 45(2):37-42
![Page 57: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/57.jpg)
Abadi Says…¢ CP makes no sense!
¢ CAP says, in the presence of P, choose A or C
l But you’d want to make this tradeoff even when there is no P
¢ Fundamental tradeoff is between consistency and latencyl Not available = (very) long latency
![Page 58: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/58.jpg)
Replication possibilities¢ Update sent to all replicas at the same time
l To guarantee consistency you need something like Paxos
¢ Update sent to a masterl Replication is synchronous
l Replication is asynchronous
l Combination of both
¢ Update sent to an arbitrary replica
All these possibilities involve tradeoffs!“eventual consistency”
![Page 59: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/59.jpg)
Move over, CAP¢ PACELC (“pass-elk”)
¢ PAC
l If there’s a partition, do we choose A or C?
¢ ELCl Otherwise, do we choose latency or consistency?
![Page 60: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/60.jpg)
Morale of the story: there’s no free lunch!
Source: www.phdcomics.com/comics/archive.php?comicid=1475
![Page 61: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/61.jpg)
Three Core Ideas¢ Partitioning (sharding)
l For scalabilityl For latency
¢ Replicationl For robustness (availability)
l For throughput
¢ Cachingl For latency
Quick look at this
![Page 62: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/62.jpg)
“Unit of Consistency”¢ Single record:
l Relatively straightforwardl Complex application logic to handle multi-record transactions
¢ Arbitrary transactions:l Requires 2PC/Paxos
¢ Middle ground: entity groupsl Groups of entities that share affinity
l Co-locate entity groups
l Provide transaction support within entity groups
l Example: user + user’s photos + user’s posts etc.
![Page 63: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/63.jpg)
Three Core Ideas¢ Partitioning (sharding)
l For scalabilityl For latency
¢ Replicationl For robustness (availability)
l For throughput
¢ Cachingl For latency
This is really hard!
![Page 64: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/64.jpg)
Source: Google
Now imagine multiple datacenters…What’s different?
![Page 65: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/65.jpg)
Three Core Ideas¢ Partitioning (sharding)
l For scalabilityl For latency
¢ Replicationl For robustness (availability)
l For throughput
¢ Cachingl For latency
Quick look at this
![Page 66: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/66.jpg)
Facebook Architecture
Source: www.facebook.com/note.php?note_id=23844338919
MySQL
memcached
Read path:Look in memcachedLook in MySQLPopulate in memcached
Write path:Write in MySQLRemove in memcached
Subsequent read:Look in MySQLPopulate in memcached
✔
![Page 67: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/67.jpg)
Facebook Architecture: Multi-DC
1. User updates first name from “Jason” to “Monkey”.
2. Write “Monkey” in master DB in CA, delete memcached entry in CA and VA.
3. Someone goes to profile in Virginia, read VA slave DB, get “Jason”.
4. Update VA memcache with first name as “Jason”.
5. Replication catches up. “Jason” stuck in memcached until another write!
Source: www.facebook.com/note.php?note_id=23844338919
MySQL
memcached
California
MySQL
memcached
Virginia
Replication lag
![Page 68: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/68.jpg)
Facebook Architecture
Source: www.facebook.com/note.php?note_id=23844338919
= stream of SQL statements
Solution: Piggyback on replication stream, tweak SQLREPLACE INTO profile (`first_name`) VALUES ('Monkey’) WHERE `user_id`='jsobel' MEMCACHE_DIRTY 'jsobel:first_name'
MySQL
memcached
California
MySQL
memcached
Virginia
Replication
![Page 69: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/69.jpg)
Three Core Ideas¢ Partitioning (sharding)
l For scalabilityl For latency
¢ Replicationl For robustness (availability)
l For throughput
¢ Cachingl For latency
Get’s go back to this again
![Page 70: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/70.jpg)
Yahoo’s PNUTS¢ Yahoo’s globally distributed/replicated key-value store
¢ Provides per-record timeline consistency
l Guarantees that all replicas provide all updates in same order
¢ Different classes of reads:l Read-any: may time travel!l Read-critical(required version): monotonic reads
l Read-latest
![Page 71: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/71.jpg)
PNUTS: Implementation Principles¢ Each record has a single master
l Asynchronous replication across datacentersl Allow for synchronous replicate within datacenters
l All updates routed to master first, updates applied, then propagated
l Protocols for recognizing master failure and load balancing
¢ Tradeoffs:l Different types of reads have different latencies
l Availability compromised when master fails and partition failure in protocol for transferring of mastership
![Page 72: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/72.jpg)
Three Core Ideas¢ Partitioning (sharding)
l For scalabilityl For latency
¢ Replicationl For robustness (availability)
l For throughput
¢ Cachingl For latency
Have our cake and eat it too?
![Page 73: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/73.jpg)
Google’s Spanner¢ Features:
l Full ACID translations across multiple datacenters, across continents!l External consistency: wrt globally-consistent timestamps!
¢ How?l TrueTime: globally synchronized API using GPSes and atomic clocks
l Use 2PC but use Paxos to replicate state
¢ Tradeoffs?
![Page 74: Big Data Infrastructure - GitHub Pageslintool.github.io/UMD-courses/bigdata-2015-Spring/... · Big Data Infrastructure Jimmy Lin University of Maryland Monday, March 30, 2015 Session](https://reader035.fdocuments.in/reader035/viewer/2022070709/5ebe3a2115c30f5767382c2b/html5/thumbnails/74.jpg)
Source: Wikipedia (Japanese rock garden)
Questions?