Big Table
Dennis Kafura – CS5204 – Operating Systems
Bigtable
Introduction to Bigtable
Paper summary with this lecture.
Bigtable is a Google product
Google = Clever
"We settled on this data model after examining a variety of potential uses of a Bigtable-like system."
"The implementation described in the previous section required a number of refinements to achieve the high performance, availability, and reliability required by our users."
Focus Today
Structure
Recovery System
Table Distribution
The API
Structure
Goals for this section
• Understand the relation to GFS
• Know what the parts of the system are
• Know how they work together
Backup Files
[Figure: two GFS instances, each holding many blocks of data]
Characters
Chubby – A lock service organized like a small file system: each of its files and directories can be locked individually. These locks are used to coordinate the rest of the system.
SSTable – A slim, immutable map sorted by key. It is the most basic primitive in the structure.
Deletion – Since SSTables are immutable, any deletion takes the form of another record which is interpreted as a deletion.
Master – The server which does no client-oriented work, but directs the efforts of all tablet servers.
Tablet Server – Serves the Bigtable data and handles client read/write interactions.
Just a whimsical introduction
Table – Tables exist only as a high-level construct. At the low level the table is still stored as SSTables.
Tablet – One part of the Table, covering a contiguous range of rows. Each Tablet holds only about 100MB-200MB of the whole. They are constantly splitting and merging.
Metatable – Is just kind of special. Its whole purpose is to refer to the tablets of the main table.
Root Tablet – If there is a king of the special, this is it. It is the only tablet which refers to the rest of the metatable.
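The pointer chain in this cast list (root tablet, then metatable, then the tablets of the main table) can be sketched as two lookups over sorted end keys. This is only an illustration under stated assumptions: the names `Tablet`, `find_entry`, and `locate`, the "last row key of each range" convention, and the server names are all invented for the example and are not Bigtable's actual interface.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Tablet:
    """Hypothetical tablet: maps the last row key of each child range
    to whatever that range points at (another tablet or a server name)."""
    entries: dict = field(default_factory=dict)


def find_entry(tablet, row_key):
    """Return the entry stored under the first end key >= row_key."""
    end_keys = sorted(tablet.entries)
    i = bisect_left(end_keys, row_key)
    if i == len(end_keys):
        raise KeyError(row_key)
    return tablet.entries[end_keys[i]]


def locate(root, row_key):
    """Root tablet -> metatable tablet -> name of the serving tablet server."""
    meta_tablet = find_entry(root, row_key)
    return find_entry(meta_tablet, row_key)


# Two metatable tablets, each pointing at the servers of main-table tablets.
meta_1 = Tablet({"f": "server-1", "m": "server-2"})
meta_2 = Tablet({"s": "server-3", "~": "server-4"})
root = Tablet({"m": meta_1, "~": meta_2})

print(locate(root, "harrisburg"))  # server-2
print(locate(root, "zurich"))      # server-4
```

Because every level is sorted by key, each hop is a binary search rather than a scan, which is the reason the root tablet only has to know about the metatable.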
Relationships among the entities
[Figure: the entities above drawn as a diagram over a row of files, connected by these relationships]
• Is a pointer to
• Owns the lock to
• Controls the contents of
• Is broken into
• Creates and manages
• Is “Live” On
Let’s Look Deeper
A table is really only the exposed interface
The real data is stored in SSTables
Bigtable inherits certain attributes from the underlying SSTable structure
• Key and data types are raw character strings
• Records are ordered by Key
• Records are immutable
Bigtable adds to this structure by adding dimensionality.
• The row key determines the horizontal slice
• The column family:name determines the vertical slice
• The version number determines the final dimension
• A tablet is really just a range of horizontal slices
The combination of these features allows Bigtable to work with ranges and filters in any of the three dimensions.
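The three dimensions can be pictured as a map keyed by (row, family:name, version). The sketch below is a toy model only; the class `TinyTable`, its methods, and the sample rows are assumptions made for illustration, and versions are plain integers.

```python
class TinyTable:
    """Toy model of the three dimensions: row key, "family:name", version."""

    def __init__(self):
        self._cells = {}  # (row, column, version) -> value

    def put(self, row, column, version, value):
        self._cells[(row, column, version)] = value

    def get(self, row, column, version=None):
        """Return the value at `version`, or at the newest version if omitted."""
        versions = [v for (r, c, v) in self._cells if r == row and c == column]
        if not versions:
            return None
        wanted = max(versions) if version is None else version
        return self._cells.get((row, column, wanted))

    def tablet(self, first_row, last_row):
        """A tablet is a range of rows: return its cells in sorted key order."""
        return sorted(
            (key, value)
            for key, value in self._cells.items()
            if first_row <= key[0] <= last_row
        )


table = TinyTable()
table.put("pa_harrisburg", "hotels:name", 1, "Old Inn")
table.put("pa_harrisburg", "hotels:name", 2, "New Inn")
table.put("va_blacksburg", "hotels:name", 1, "Campus Lodge")

print(table.get("pa_harrisburg", "hotels:name"))     # New Inn
print(table.get("pa_harrisburg", "hotels:name", 1))  # Old Inn
print(len(table.tablet("pa", "pz")))                 # 2
```

Writing a new version never overwrites the old cell, which mirrors the immutability inherited from the SSTable.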
Recovery System
Goals for this section
• Understand how to recover from a hardware failure
• Understand the impact of loss of connectivity
• Understand the impact of lost messages
What if things go wrong?
Scenario 1: Tablet Server Loses Connectivity
[Figure: the entity relationship diagram annotated with numbered steps 1-6]
Scenario 2: Master Server Loses Connectivity Part 1
[Figure: the entity relationship diagram annotated with numbered steps 1-6]
Scenario 2: Master Server Loses Connectivity Part 2
[Figure: the entity relationship diagram annotated with numbered steps 6-8; labels show the key range A-Z, the sub-ranges A-F, L-P, G-K, and Q-Z, and servers S1-S4]
Scenario 2: Master Server Loses Connectivity Part 3
[Figure: the entity relationship diagram annotated with numbered steps 9-12; labels show the key ranges "A-F, L-P" and "A-F"]
Scenario 4: Metadata is lost and new Master
[Figure: the entity relationship diagram annotated with numbered steps 1-7]
Table Distribution System
Goals for this section
• Understand the process for adding/removing a server
• Understand how to handle an overwhelmed server
• Understand how to handle deletions/changes to the database
Server Join/Leave Responsibilities
[Figure: the entity relationship diagram marked with + and – signs on the relationships affected when a server joins or leaves]
Tablet Growth/Shrinkage
Ideal: 100MB-200MB
Undersized: <100MB
Oversized: >200MB
[Figure: a Merger combines undersized tablets into one; a Split divides an oversized tablet in two]
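The size rules on this slide can be sketched as a small classifier plus a split at the byte midpoint. The thresholds come from the slide; everything else (the function names, the `(row_key, size_bytes)` representation, and the midpoint rule for choosing the split point) is an assumption made for illustration.

```python
MIN_BYTES = 100 * 2**20  # 100MB, lower edge of the ideal range
MAX_BYTES = 200 * 2**20  # 200MB, upper edge of the ideal range


def classify(size_bytes):
    """Label a tablet by size, following the ranges on the slide."""
    if size_bytes < MIN_BYTES:
        return "undersized"
    if size_bytes > MAX_BYTES:
        return "oversized"
    return "ideal"


def split(rows):
    """Split a sorted list of (row_key, size_bytes) near its byte midpoint."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")
    total = sum(size for _, size in rows)
    running = 0
    for i, (_, size) in enumerate(rows):
        running += size
        if running >= total / 2:
            cut = min(i + 1, len(rows) - 1)  # keep both halves non-empty
            return rows[:cut], rows[cut:]


def merge(left, right):
    """Merge two adjacent tablets; left's row keys must precede right's."""
    if left and right and left[-1][0] >= right[0][0]:
        raise ValueError("tablets are not adjacent in key order")
    return left + right


mb = 2**20
big = [("a", 60 * mb), ("b", 60 * mb), ("c", 60 * mb), ("d", 60 * mb)]
print(classify(sum(size for _, size in big)))  # oversized
left, right = split(big)
print([key for key, _ in left], [key for key, _ in right])  # ['a', 'b'] ['c', 'd']
print(classify(sum(size for _, size in left)))  # ideal
```

Because a tablet is a contiguous row range, a split only needs a single cut point and a merge only makes sense for neighbours in key order.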
If You Can’t Handle the Heat
[Figure: the entity relationship diagram with server loads of 100%, 115%, 115%, 115%, and 160%]
User interactions may cause hot spots where requests are more frequent than the baseline.
Move the Kitchen
[Figure: the entity relationship diagram with server loads of 100%, 100%, 113%, 100%, and 113%]
After redistributing the work load, hot spots are easier to deal with and the labor is more evenly divided.
Note that, at this granularity, the image does not show the updated pointers from the metatable or the locks on Chubby files.
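One way to picture the redistribution is a greedy loop that moves a tablet from the busiest server to the idlest one whenever that narrows the gap between them. This is a sketch of a generic heuristic, not the policy the Bigtable master actually uses; the function `rebalance`, the load numbers, and the server and tablet names are all assumptions.

```python
def rebalance(assignment, max_moves=100):
    """assignment: {server: {tablet: load}}. Moves tablets in place and
    returns the list of (tablet, from_server, to_server) moves."""
    moves = []
    for _ in range(max_moves):
        totals = {s: sum(t.values()) for s, t in assignment.items()}
        hot = max(totals, key=totals.get)
        cold = min(totals, key=totals.get)
        gap = totals[hot] - totals[cold]
        # Only a tablet lighter than the gap makes the pair more even.
        candidates = {t: l for t, l in assignment[hot].items() if 0 < l < gap}
        if not candidates:
            break
        # Prefer the tablet whose load is closest to half of the gap.
        tablet = min(candidates, key=lambda t: abs(gap - 2 * candidates[t]))
        assignment[cold][tablet] = assignment[hot].pop(tablet)
        moves.append((tablet, hot, cold))
    return moves


servers = {
    "s1": {"a": 40, "b": 60},
    "s2": {"c": 60, "d": 55},
    "s3": {"e": 80, "f": 50, "g": 30},  # the hot spot
}
moves = rebalance(servers)
print(moves)  # [('g', 's3', 's1')]
print({s: sum(t.values()) for s, t in servers.items()})
# {'s1': 130, 's2': 115, 's3': 130}
```

In the real system each move would also require updating the metatable pointer and the Chubby lock, which this sketch leaves out.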
What if I Want to Delete Something?
[Figure: changes and deletions go to the Memtable (the tablet in RAM); the Memtable and the existing SSTables are combined into a new SSTable in GFS]
The process of merging an SSTable with the Memtable is known as a compaction.
Minor Compactions
• Involve at least one SSTable
• Grow the set of SSTables
• May contain deletions
Major Compactions
• Include all SSTables
• Reduce the set of SSTables
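The interplay of Memtable, immutable SSTables, and deletion records can be sketched in a few lines. This is a simplified model under stated assumptions: here a minor compaction writes the Memtable out as one new SSTable, a major compaction rewrites everything into a single SSTable with deletion records dropped, and all names (`Tablet`, `TOMBSTONE`, the method names) are hypothetical.

```python
TOMBSTONE = object()  # a record that is interpreted as a deletion


class Tablet:
    def __init__(self):
        self.memtable = {}  # recent changes, held in RAM
        self.sstables = []  # immutable sorted tuples of (key, value), oldest first

    def write(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # SSTables cannot be edited, so a deletion is just another record.
        self.memtable[key] = TOMBSTONE

    def read(self, key):
        """Check the Memtable first, then SSTables from newest to oldest."""
        sources = [self.memtable] + [dict(t) for t in reversed(self.sstables)]
        for source in sources:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

    def minor_compaction(self):
        """Freeze the Memtable into a new SSTable; the set of SSTables grows."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def major_compaction(self):
        """Rewrite everything as one SSTable and drop deleted records."""
        merged = {}
        for sstable in self.sstables:  # oldest first, so newer records win
            merged.update(sstable)
        merged.update(self.memtable)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.sstables = [tuple(sorted(live.items()))]
        self.memtable = {}


t = Tablet()
t.write("row1", "a")
t.write("row2", "b")
t.minor_compaction()
t.delete("row1")
t.minor_compaction()
print(len(t.sstables), t.read("row1"), t.read("row2"))  # 2 None b
t.major_compaction()
print(t.sstables)  # [(('row2', 'b'),)]
```

Until the major compaction runs, the deleted value still exists in the older SSTable; the deletion record in the newer one merely hides it from reads.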
The API
Goals for this section
• Explain how this differs from SQL.
• How to create your own table.
• Using Bigtable as a hash table/vector.
If You Had to Perform a Project
Lon    Lat   City
123    87    New Oslo
78     23    New Canada
-100   67    New Bermuda
45     59    New England
171    -45   Old Hampshire
-165   21    Old Mexico
0      66    Old England
78     -51   New Ireland
41     0     New Equador
100    12    Old Zealand
Project operations are notoriously inefficient
Checking an extensive table is ALWAYS to be avoided
With a truly ENORMOUS table, it is a very bad idea
If You Had to Perform a Join
Bigtable is quite sparse.
Imagine this was your table and only the red spots had data (everything else is null).
Joining with nulls creates semantic nonsense.
Joining on a null creates more nulls.
Completely Configurable Bigtable Structure
Excellent Business Ownership Records
Records will be state_city for alphabetical ordering
Column families will be Better Business Bureau ratings
Columns will be business names
Version will be ownership purchase date
Data will be owner name, address, phone and email.
Ranked X type businesses
Records will be region_city for geographical ordering
Column families will designate types of services
Columns will be specific business names
Version will be automated
Data will be popularity by customer vote with address.
Multiple Tools for Fine Control
MapReduce – MapReduce is closed on Bigtable: a job can read its input from a Bigtable and write its output back to one (i.e. MR(Bt) → Bt). Use it to determine the most successful owner (based on average BBB rank).
Sawzall – A scripting language whose client-supplied scripts run on the tablet servers themselves, using tablet server clock cycles. Use it to determine the vote history of a set of businesses for graphing purposes.
Regular Expressions – Can be used for any combination of record, column and data recognition schemes. Use them to determine all the best-voted hotels in a region.
Order Large Groups of Data
I’d like to have all the demographic statistics for the states A-L.
I’d like to have the hotel listings for cities in Pennsylvania.
I’d like to have hockey scores for all pro, semi-pro and college teams in the last three years.
I want to see all the Google searches in the last 24 hours.
Only Take What You Want
I’d like to have all the demographic statistics for the states A-L. But I’ll only look at ethnic percentages.
I’d like to have the hotel listings for cities in Pennsylvania. But I only want the ones in Harrisburg.
I’d like to have hockey scores for all pro, semi-pro and college teams in the last three years. But I just want to see the Black Hawks.
I want to see all the Google searches in the last 24 hours. But only the ones for www.Disney.com.
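The first request above (states A-L, ethnic percentages only) combines a row range with a column family filter. A minimal sketch, assuming rows are kept sorted by key; the function `scan`, the column names, and the values are made-up placeholders, not real data or a real Bigtable call.

```python
from bisect import bisect_left


def scan(rows, start, end, family=None):
    """rows: list of (row_key, {"family:name": value}) sorted by row_key.
    Yield rows with start <= row_key < end, keeping one family if given."""
    keys = [key for key, _ in rows]  # a real store would keep this index
    lo = bisect_left(keys, start)
    hi = bisect_left(keys, end)
    for key, columns in rows[lo:hi]:
        if family is not None:
            columns = {
                name: value
                for name, value in columns.items()
                if name.split(":", 1)[0] == family
            }
        if columns:  # the table is sparse: skip rows with nothing to show
            yield key, columns


# Placeholder values, invented for the example.
rows = [
    ("alabama", {"ethnic:group_x": "12%", "age:median": "38"}),
    ("kansas", {"age:median": "36"}),
    ("louisiana", {"ethnic:group_x": "9%"}),
    ("maine", {"ethnic:group_x": "4%"}),
]

for key, columns in scan(rows, "a", "m", family="ethnic"):
    print(key, columns)
# alabama {'ethnic:group_x': '12%'}
# louisiana {'ethnic:group_x': '9%'}
```

The row range is found by binary search on the sorted keys, so only the requested slice is touched, and the family filter then trims each row to the columns that were asked for.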
Summary
• Structure of the system
• Methods for recovery
• Data management
• Characteristics of the API