Google Spanner: our understanding of concepts and implications
Harisankar H
DOS lab weekly seminar
8/Dec/2012
http://harisankarh.wordpress.com
"Google Spanner: our understanding of concepts and implications" by Harisankar H is licensed under a Creative Commons Attribution 3.0 Unported License.
Outline
• Spanner
– User perspective
• User = application programmer/administrator
– System architecture
– Implications
Spanner: user perspective
• Global-scale database with strict transactional guarantees
– Global scale
• Designed to work across datacenters in different continents
• Claim: “designed to scale up to millions of nodes, hundreds of datacenters, trillions of database rows”
– Strict transactional guarantees
• Supports general transactions (even inter-row)
• Stronger properties than serializability*
– Replaced the MySQL cluster storing Google's critical ad-related data
• Reliable even during wide-area natural disasters
– Supports hierarchical schema of tables
• Semi-relational
– Supports SQL-like query and definition language
– User-defined locality and availability
* means: explained in later slides
Need for Spanner
• Limitations of existing systems
– BigTable (could apply to NoSQL systems in general)
• Hard to use for applications with complex, evolving schemas
• Only eventual consistency across data centers
– Needed wide-area replication with strong consistency
• Transactional scope limited to a single row
– Needed general cross-row transactions
– Megastore (relational db-like system)
• Low performance
– Layered on top of BigTable
» High communication costs
– Less efficient replica consistency algorithms*
• Better transactional guarantees in Spanner*
Spanner: transactional guarantee
• External consistency
– Stricter than serializability
– E.g.,
[Figure: T1, T2 and T3 on a physical-time axis, with candidate serial orderings of the three; the serial ordering must place T2 after T1]
External consistency: motivation
• Facebook-like example from OSDI talk
[Figure: T1 and T2 (by Jerry) and T3 (by Tom) along physical time]
• Jerry unfriends Tom to write a controversial comment
– T1: Jerry unfriends Tom
– T2: Jerry posts comment
– T3: Tom views Jerry’s profile
If the serial order places T1 last (e.g., T2, T3, T1), Tom sees the comment while still a friend, and Jerry will be in trouble!
Formally, “If the commit of T1 preceded the initiation of a new transaction T2 in wall-clock (physical) time, then the commit of T1 should also precede the commit of T2 in the serial ordering.”
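The formal condition can be checked mechanically. The sketch below is only an illustration of the definition, not of how Spanner enforces it: the `Txn` record, the function name and the timings are all hypothetical. It tests whether a proposed serial order respects every "committed before the other was initiated" pair.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Txn:
    name: str
    start: float   # wall-clock time at which the transaction was initiated
    commit: float  # wall-clock time at which the transaction committed


def is_externally_consistent(serial_order, txns):
    """Return True if serial_order respects real-time precedence."""
    position = {name: i for i, name in enumerate(serial_order)}
    for a in txns:
        for b in txns:
            # a committed before b was initiated, so a must be ordered first.
            if a.commit < b.start and position[a.name] > position[b.name]:
                return False
    return True


# Hypothetical timings for the unfriend example: no two transactions overlap.
txns = [Txn("T1", 0, 1), Txn("T2", 2, 3), Txn("T3", 4, 5)]
allowed = [order for order in permutations(["T1", "T2", "T3"])
           if is_externally_consistent(order, txns)]
print(allowed)
```

With non-overlapping transactions only the order T1, T2, T3 survives; plain serializability would accept any of the six orders as long as the result matches some serial execution.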
Spanner: transactional guarantee
• Additional (weaker) transaction modes for performance
– Read-only transaction supporting snapshot isolation
• Snapshot isolation
– Transactions read a consistent snapshot of the database
– Values written should not have conflicting updates after the snapshot was read
– E.g., R1(X) R1(Y) R2(X) R2(Y) W2(Y) W1(X) is allowed
– Weaker than serializability, but more efficient (lock-free)
– Spanner does not allow writes for these transactions
» Probably, that is how they preserve isolation
– Snapshot read
• Read of a consistent state of the database in the past
Hierarchical data model
– Universes (Spanner deployment)
• Databases (collection of tables)
– Tables with schemas
» Ordered rows, columns
» One or more primary-key columns
• Rows named by their primary keys
– Hierarchies of tables
» Directory tables (top of table hierarchy)
• Directories
• Each row in a directory table (with key K), along with the rows in descendant tables whose keys start with K, forms a directory
Fig: a
Figures (a),(b) from Spanner, OSDI 2012 paper
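The directory rule (a directory-table row with key K plus all descendant rows whose keys start with K) can be sketched as a grouping by key prefix. The table names `Users` and `Albums`, the tuple keys and the function name are hypothetical, chosen only for illustration.

```python
def build_directories(rows, directory_table):
    """Group (table, primary_key) rows into directories by key prefix."""
    directories = {}
    # Each row of the directory table, with key K, opens one directory.
    for table, key in rows:
        if table == directory_table:
            directories[key] = [(table, key)]
    # Descendant rows join the directory whose key K is a prefix of theirs.
    for table, key in rows:
        if table == directory_table:
            continue
        for k in directories:
            if key[:len(k)] == k:
                directories[k].append((table, key))
                break
    # Keep the rows of each directory in primary-key order.
    for members in directories.values():
        members.sort(key=lambda row: row[1])
    return directories


# Hypothetical rows: Users is the directory table, Albums a descendant table.
rows = [
    ("Users", (1,)),
    ("Albums", (1, 10)),
    ("Albums", (2, 20)),
    ("Users", (2,)),
    ("Albums", (1, 11)),
]
directories = build_directories(rows, directory_table="Users")
print(directories)
```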
User perspective: database configuration
• Database placement and reliability
– Administrator:
• Create options which specify number of replicas and placement
– E.g., option (a): North America: 5 replicas, Europe: 3 replicas
option (b): Latin America: 3 replicas …
– Application:
• Directory is the smallest unit for which these properties can be specified
• Tag each directory or database with these options
– E.g., TomDir1: option (b)
JerryDir3: option (a) ….
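The split between administrator-defined options and application-side tagging can be sketched with two plain mappings. This is not Spanner's configuration syntax; the dictionary layout, the `placement_for` helper and the database-level fallback are assumptions made for illustration.

```python
# Administrator side: named options giving replica counts per region.
# Option (b) is elided in the slide ("…"); only the stated part is shown.
options = {
    "a": {"North America": 5, "Europe": 3},
    "b": {"Latin America": 3},
}

# Application side: tag directories (or the whole database) with an option.
directory_tags = {"TomDir1": "b", "JerryDir3": "a"}


def placement_for(directory, directory_tags, options, database_tag=None):
    """Return the replica placement for a directory.

    Falling back to a database-level tag is an assumption of this sketch.
    """
    tag = directory_tags.get(directory, database_tag)
    if tag is None:
        raise KeyError(f"no placement option for {directory}")
    return options[tag]


print(placement_for("TomDir1", directory_tags, options))
print(sum(placement_for("JerryDir3", directory_tags, options).values()))
```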
Next: System architecture
Spanner architecture: basics
• Replica consistency
– Using Paxos protocol
• Different Paxos groups for different sets of directories
– Can be across data centers
• Concurrency control
– Using two-phase locking
• Chosen over optimistic methods because of long-lived transactions (order of minutes)
• Transaction coordination
– Two-phase commit
• Two-phase commit on top of Paxos ensures availability
• Timestamps for transactions and data items
– To support snapshot isolation and snapshot reads
– Multiple timestamped versions of data items maintained
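Keeping multiple timestamped versions is what makes reads of a past state possible. The following is a minimal sketch of that idea, assuming integer commit timestamps and an in-memory store; `MultiVersionStore` and its methods are hypothetical and say nothing about Spanner's actual storage layout.

```python
import bisect


class MultiVersionStore:
    """Keeps every committed version of a key, ordered by commit timestamp."""

    def __init__(self):
        self._timestamps = {}  # key -> sorted list of commit timestamps
        self._values = {}      # key -> values aligned with _timestamps

    def write(self, key, value, commit_ts):
        ts_list = self._timestamps.setdefault(key, [])
        val_list = self._values.setdefault(key, [])
        i = bisect.bisect_right(ts_list, commit_ts)
        ts_list.insert(i, commit_ts)
        val_list.insert(i, value)

    def snapshot_read(self, key, read_ts):
        """Return the latest value committed at or before read_ts, else None."""
        ts_list = self._timestamps.get(key, [])
        i = bisect.bisect_right(ts_list, read_ts)
        if i == 0:
            return None
        return self._values[key][i - 1]


store = MultiVersionStore()
store.write("x", "v1", commit_ts=10)
store.write("x", "v2", commit_ts=20)
print(store.snapshot_read("x", read_ts=15))
```

A read at timestamp 15 sees the version committed at 10 and ignores the later one, without taking any locks.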
Spanner components
• Universe master (status + interactive debugging)
• Placement driver (moves data across zones automatically)
• *TrueTime service
• Zone 1, Zone 2, … (physical locations), connected over the network; each zone has:
– Span servers (data)
– Zone master (assigns data)
– Location proxies (locate data)
Zones, directories and Paxos groups
Fig: (b)
Replication-related components
• Tablet: unit of storage
– Bag of directories
– Abstraction on top of underlying DFS Colossus
• Single Paxos state machine(replica) per tablet
• Replicas of each tablet form a Paxos group
• Leader elected among a Paxos group
[Figure: a Paxos group made of tablet replicas (DC1,n2…; DC2,n8…; …), each holding directories, with one replica acting as Paxos leader]
Transaction-related components
[Figure: transaction T5 spanning two Paxos groups. Coordinator group: tablet replicas (DC1,n2…; DC2,n8…; …) whose Paxos leader acts as coordinator leader (2PC + 2PL), with the other replicas as coordinator slaves. Participant group: tablet replicas whose Paxos leader acts as participant leader, with the other replicas as participant slaves.]
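The coordinator/participant roles follow the usual two-phase commit pattern. The sketch below shows only that pattern under simplifying assumptions: each `Participant` object stands in for the leader of one Paxos group, and the Paxos replication of each step, the locking and failure handling are all omitted. Class and function names are hypothetical.

```python
class Participant:
    """Stands in for the leader of one Paxos group taking part in 2PC."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run by the coordinator leader; returns the transaction outcome."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


groups = [Participant("group-1"), Participant("group-2")]
print(two_phase_commit(groups))
```

Because every role here is a replicated Paxos group rather than a single machine, the loss of one replica does not block the protocol, which is the availability point made in the architecture basics.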
Next:
• Serializability ensured by the already explained components
• External consistency implemented with the help of the TrueTime service
– TrueTime service also used for leader election using timed leases
TrueTime + transaction implementation
[by Aditya]
Implications of Spanner
[REMOVED]
Thank you
• Image credits
– Figures (a),(b) from Spanner, OSDI 2012 paper