Bigtable: A Distributed Storage System for Structured Data
Fay Chang et al. (Google, Inc.)
Presenter: Kyungho Jeon (kyunghoj@buffalo.edu)
10/22/2012
Fall 2012: CSE 704 Web-scale Data Management

Motivation and Design Goal

• Distributed Storage System for Structured Data
  – Scalability
    • Petabytes of data on thousands of (commodity) machines
  – Wide Applicability
    • Throughput-oriented and latency-sensitive
  – High Performance
  – High Availability

Data Model

• Not a Full Relational Data Model
• Provides a simple data model
  – Supports Dynamic Control over Data Layout
  – Allows clients to reason about the locality properties

Data Model – A Big Table

• A Table in Bigtable is a:
  – Sparse
  – Distributed
  – Persistent
  – Multidimensional
  – Sorted map

Data Model

• Data is indexed using row and column names
• Data is treated as uninterpreted strings
  – (row:string, column:string, time:int64) → string
• Data locality can be controlled through careful choices of the schema
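The mapping above can be sketched as a small in-memory structure. This is only an illustration of the (row, column, time) → string shape, not Bigtable's implementation; the class name `TinyTable`, its methods, and the sample row keys are assumptions made for the example.

```python
class TinyTable:
    """Toy in-memory model of the (row, column, time) -> string map."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; cells that were never
        # written take no space, which is what "sparse" means here.
        self._cells = {}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self._cells.get((row, column), {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # default to the most recent version
        return versions.get(timestamp)

    def rows(self):
        # Row keys are always presented in lexicographic order.
        return sorted({row for row, _ in self._cells})


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("com.bbc.www", "contents:", 4, "<html>bbc")
```

Choosing row keys that sort next to each other (here, pages of the same domain) is one way a schema can influence locality.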

Data Model

• Rows
  – Data maintained in lexicographic order by row key
  – Tablet: rows with consecutive keys
    • Units of distribution and load balancing
• Columns
  – Column families
    • Family:qualifier
• Cells
• Timestamps
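A minimal sketch of the two naming rules above: tablets as contiguous ranges of sorted row keys, and column keys as family:qualifier. The function names are hypothetical, and splitting by row count is an assumption made to keep the example short (a real system would more plausibly split by data size).

```python
def split_into_tablets(row_keys, max_rows_per_tablet):
    """Group sorted row keys into contiguous ranges (toy tablets)."""
    ordered = sorted(row_keys)
    return [ordered[i:i + max_rows_per_tablet]
            for i in range(0, len(ordered), max_rows_per_tablet)]


def split_column_key(column_key):
    """Split 'family:qualifier' into its two parts."""
    family, _, qualifier = column_key.partition(":")
    return family, qualifier


keys = ["com.cnn.www/sports", "org.example", "com.cnn.www",
        "com.bbc.www", "com.cnn.money"]
tablets = split_into_tablets(keys, 2)
```

Because each tablet holds consecutive keys, a scan over a short key range usually touches only one or two tablets.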

Data Model – WebTable Example

A large collection of web pages and related information

• Row Key
  – Bigtable maintains data in lexicographic order by row key
  – Tablet: a group of rows with consecutive keys; the unit of distribution
• Column Family
  – Column family is the unit of access control
• Column
  – Column key is specified by “Column family:qualifier”
  – You can add a column to a column family once the column family has been created
• Cell
  – Cell: the storage referenced by a particular row key, column key, and timestamp
  – A cell can contain multiple versions, indexed by timestamp

API

• Write or delete values in Bigtable
• Look up values from individual rows
• Iterate over a subset of the data in a table

API – Update a Row

Steps annotated on the code example:

• Opens a table
• We’re going to mutate the row
• Store a new item under the column key “anchor:www.c-span.org”
• Delete an item under the column key “anchor:www.abc.com”
• Atomic mutation
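The steps above can be sketched in Python as follows. The code on the original slides did not survive in this transcript, so this is a stand-in: `Table`, `RowMutation`, and their methods are hypothetical names, and the row key "com.cnn.www" and the stored values are assumed sample data.

```python
import threading


class Table:
    """Hypothetical stand-in for an opened table: row -> {column: value}."""

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()


class RowMutation:
    """Buffers set/delete operations on one row until apply() is called."""

    def __init__(self, table, row_key):
        self.table, self.row_key, self.ops = table, row_key, []

    def set(self, column, value):
        self.ops.append(("set", column, value))

    def delete(self, column):
        self.ops.append(("delete", column, None))

    def apply(self):
        # All buffered operations on the row become visible together.
        with self.table._lock:
            row = dict(self.table.rows.get(self.row_key, {}))  # work on a copy
            for kind, column, value in self.ops:
                if kind == "set":
                    row[column] = value
                else:
                    row.pop(column, None)
            self.table.rows[self.row_key] = row  # swap in the new row


webtable = Table()                                  # "opens a table"
webtable.rows["com.cnn.www"] = {"anchor:www.abc.com": "ABC"}
r1 = RowMutation(webtable, "com.cnn.www")           # the row to mutate
r1.set("anchor:www.c-span.org", "CNN")              # store a new item
r1.delete("anchor:www.abc.com")                     # delete an item
r1.apply()                                          # atomic mutation
```

Buffering the operations and applying them under one lock is one simple way to make the set and the delete appear as a single change to the row.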

API – Iterate over a Table

Steps annotated on the code example:

• Create a Scanner instance
• Access the “anchor” column family
• Specify “return all versions”
• Specify a row key
• Iterate over rows
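A rough Python rendering of the scanner steps above. The `Scanner` class and its method names are hypothetical, chosen to mirror the annotations rather than any real client library, and the table contents are invented sample data.

```python
class Scanner:
    """Hypothetical scanner over a dict: row -> {column: [(timestamp, value), ...]}."""

    def __init__(self, table):
        self.table = table
        self.family = None
        self.all_versions = False
        self.row = None

    def fetch_column_family(self, family):
        self.family = family

    def set_return_all_versions(self):
        self.all_versions = True

    def lookup(self, row_key):
        self.row = row_key

    def __iter__(self):
        columns = self.table.get(self.row, {})
        for column in sorted(columns):
            if self.family is not None and not column.startswith(self.family + ":"):
                continue
            versions = sorted(columns[column], reverse=True)  # newest first
            if not self.all_versions:
                versions = versions[:1]
            for timestamp, value in versions:
                yield self.row, column, timestamp, value


webtable = {
    "com.cnn.www": {
        "anchor:cnnsi.com": [(9, "CNN")],
        "anchor:my.look.ca": [(8, "CNN.com"), (6, "CNN")],
        "contents:": [(5, "<html>")],
    }
}
scanner = Scanner(webtable)               # create a Scanner instance
scanner.fetch_column_family("anchor")     # access the "anchor" column family
scanner.set_return_all_versions()         # return all versions
scanner.lookup("com.cnn.www")             # specify a row key
results = list(scanner)                   # iterate
```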

API – Other Features

• Single-row transactions
• Client-supplied scripts executed in the address space of the server
• Input source/output target for MapReduce jobs

A Typical Google Machine

A Google Cluster

Building Blocks

• Chubby
  – Highly-available and persistent distributed lock service
• GFS
  – Store logs and data files
  – SSTable
    • Google’s immutable file format
    • A persistent, ordered immutable map from keys to values
    • http://code.google.com/p/leveldb/

Chubby

• Highly-available and persistent distributed lock service
  – 5 replicas, one is elected as a master
  – Paxos
  – Provides a namespace that consists of directories and small files

Implementation

• Client Library
• Master
  – one and only one!
• Tablet Servers
  – Many

Implementation - Master

• Responsible for assigning tablets to tablet servers
  – Addition/removal of tablet servers
  – Tablet-server load balancing
  – Garbage collecting files in GFS
• Handles schema changes
• Single master system (as GFS did)

Tablet Server

• Manages a set of tablets
• Handles read and write requests to the tablets
• Splits tablets that have grown too large

How Does a Client Find a Tablet?

Tablet Assignment

• Each tablet is assigned to at most one tablet server at a time

• When a tablet is unassigned, and a tablet server is available, the master assigns the tablet by sending a tablet load request

• Bigtable uses Chubby to keep track of tablet servers

Tablet Assignment

• Detecting a tablet server which is no longer serving its tablets
  – The master periodically asks each tablet server for the status of its lock
  – If a tablet server reports it has lost its lock, or if the master cannot reach a tablet server, the master attempts to acquire an exclusive lock on the server’s file
  – If the lock acquisition succeeds, Chubby is alive, so the tablet server must have a problem
  – The master deletes the server’s file in Chubby to ensure the tablet server can never serve again
  – Then, the master moves all the tablets that were previously assigned to that server into the set of unassigned tablets
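The detection steps above can be sketched as below. `FakeChubby` is a hypothetical in-memory stand-in for the lock service, and the three status strings are an assumption about how the master might summarize what it observed; none of this is the real Chubby API.

```python
class FakeChubby:
    """Hypothetical in-memory lock service: file name -> current lock holder."""

    def __init__(self):
        self.files = {}

    def try_acquire(self, name, holder):
        # Succeeds only if the file exists and nobody holds its lock.
        if name in self.files and self.files[name] is None:
            self.files[name] = holder
            return True
        return False

    def delete(self, name):
        self.files.pop(name, None)


def check_server(chubby, server, lock_status, assignments, unassigned):
    """lock_status is 'held', 'lost', or 'unreachable', as seen by the master."""
    if lock_status == "held":
        return False
    if chubby.try_acquire(server, "master"):
        # Chubby answered, so the problem is on the tablet server's side.
        chubby.delete(server)  # the server can never serve again
        unassigned.update(assignments.pop(server, set()))
        return True
    return False


chubby = FakeChubby()
chubby.files = {"ts1": "ts1", "ts2": None}  # ts2 has lost its lock
assignments = {"ts1": {"tablet-a"}, "ts2": {"tablet-b", "tablet-c"}}
unassigned = set()
check_server(chubby, "ts1", "held", assignments, unassigned)
check_server(chubby, "ts2", "lost", assignments, unassigned)
```

After the second call, the tablets of `ts2` are in `unassigned` and can be handed to another server.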

Tablet Assignment

• When a master is started, the master…
  – Grabs a unique master lock in Chubby
  – Scans the servers directory in Chubby to find the live servers
  – Communicates with every live tablet server to discover the current tablet assignment
  – Scans the METADATA table and adds unassigned tablets to the set of unassigned tablets
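The four startup steps can be traced with plain dictionaries and sets. Everything here is a simplification: the function name, the "master-lock" key, and the way server reports are passed in are assumptions made for the sketch.

```python
def master_startup(chubby_locks, servers_dir, server_reports, metadata_tablets):
    """Return (assignment, unassigned) following the four startup steps."""
    # 1. Grab the unique master lock; give up if another master holds it.
    if chubby_locks.get("master-lock") is not None:
        raise RuntimeError("another master is already active")
    chubby_locks["master-lock"] = "this-master"

    # 2. Scan the servers directory to find live tablet servers.
    live_servers = sorted(servers_dir)

    # 3. Ask every live server which tablets it currently serves.
    assignment = {s: set(server_reports.get(s, ())) for s in live_servers}

    # 4. Any tablet listed in METADATA that nobody serves is unassigned.
    assigned = set().union(*assignment.values())
    unassigned = set(metadata_tablets) - assigned
    return assignment, unassigned


locks = {"master-lock": None}
assignment, unassigned = master_startup(
    locks,
    servers_dir={"ts1", "ts2"},
    server_reports={"ts1": ["t1", "t2"], "ts2": ["t3"]},
    metadata_tablets=["t1", "t2", "t3", "t4"],
)
```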

Tablet Serving

• Memtable
  – A sorted buffer
  – Maintains the updates on a row-by-row basis
  – Each row is copy-on-write to maintain row-level consistency
  – Older updates are stored in a sequence of SSTables

Tablet Serving - Write

• Write operation
  – The server checks if the operation is valid
  – A valid mutation is written to the commit log
  – After the write has been committed, its contents are inserted into the memtable
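The ordering of the write path (validate, log, then memtable) can be sketched as follows. The class is hypothetical, the Python list stands in for a commit log stored in GFS, and the validity check shown is an invented placeholder for whatever checks the server performs.

```python
class TabletWriter:
    """Toy write path: validate, append to commit log, then update memtable."""

    def __init__(self):
        self.commit_log = []  # stands in for the log file stored in GFS
        self.memtable = {}    # (row, column) -> value

    def write(self, row, column, value):
        # 1. Validity check (placeholder: non-empty row, family:qualifier column).
        if not isinstance(row, str) or not row or ":" not in column:
            raise ValueError("malformed mutation")
        # 2. Record the mutation in the commit log first.
        self.commit_log.append((row, column, value))
        # 3. Only after the commit is the memtable updated.
        self.memtable[(row, column)] = value


tablet = TabletWriter()
tablet.write("com.cnn.www", "anchor:cnnsi.com", "CNN")
```

Logging before touching the memtable means a crash after step 2 loses nothing: the memtable can be rebuilt from the log.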

Tablet Serving - Read

• Read operation
  – Check if the operation is valid
  – A valid operation is executed on a merged view of the sequence of SSTables and the memtable
  – The merged view can be formed efficiently since SSTables and the memtable are lexicographically sorted data structures
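Because every source is already sorted, the merged view is a k-way merge. The sketch below uses `heapq.merge` from the standard library; the function name and the "newest first" ordering of the SSTable list are assumptions, and deletion markers are left out for brevity.

```python
import heapq


def merged_view(memtable, sstables):
    """Yield (key, value) in key order; newer sources shadow older ones.

    memtable: dict of key -> value (newest data)
    sstables: list of sorted [(key, value), ...] lists, newest first
    """
    sources = [sorted(memtable.items())] + list(sstables)
    # Tag each entry with its source's age so that, for equal keys,
    # the entry from the newest source sorts first.
    tagged = [
        [(key, age, value) for key, value in source]
        for age, source in enumerate(sources)
    ]
    last_key = object()  # sentinel that equals no real key
    for key, _age, value in heapq.merge(*tagged):
        if key != last_key:  # the first hit for a key is the newest version
            yield key, value
            last_key = key


memtable = {"b": "b-new", "d": "d1"}
sstables = [[("a", "a2"), ("b", "b-mid")], [("a", "a1"), ("c", "c1")]]
view = list(merged_view(memtable, sstables))
```

The merge never re-sorts the inputs; it only compares the current head of each source, which is why sorted SSTables make reads cheap.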

Tablet Serving - Recover

• Recover a tablet
  – A tablet server reads its metadata from the METADATA table
  – The metadata contains the list of SSTables that comprise a tablet and a set of redo points
  – The server reads the indices of the SSTables into memory and reconstructs the memtable by applying all of the updates that have committed since the redo points
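Replaying the log from a redo point might look like the sketch below. It assumes a single redo point expressed as a log sequence number, which is a simplification of the "set of redo points" on the slide; the function name and log layout are hypothetical.

```python
def recover_memtable(commit_log, redo_point):
    """Rebuild a memtable from log entries at or after the redo point.

    commit_log: list of (sequence_number, row, column, value) in log order
    redo_point: first sequence number not yet captured in an SSTable
    """
    memtable = {}
    for seq, row, column, value in commit_log:
        if seq >= redo_point:
            memtable[(row, column)] = value  # later entries overwrite earlier ones
    return memtable


log = [
    (1, "row1", "cf:a", "old"),  # already persisted in an SSTable
    (2, "row2", "cf:a", "x"),
    (3, "row1", "cf:a", "new"),
]
memtable = recover_memtable(log, redo_point=2)
```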

Compaction

• Minor compaction
  – When the memtable size reaches a threshold, the memtable is frozen, a new memtable is created, and the frozen memtable is converted to an SSTable
• Major compaction
  – Rewrite multiple SSTables into one SSTable
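Both compactions can be illustrated with a toy tablet. The threshold here counts entries rather than bytes, which is an assumption made to keep the example small, and the class and method names are hypothetical.

```python
class Tablet:
    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # newest first; each is a sorted tuple of (key, value)

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        # Freeze the current memtable and start a new, empty one.
        frozen, self.memtable = self.memtable, {}
        # Convert the frozen memtable into an immutable, sorted SSTable.
        self.sstables.insert(0, tuple(sorted(frozen.items())))

    def major_compaction(self):
        merged = {}
        for sstable in reversed(self.sstables):  # oldest first, newer overwrite
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))]


tablet = Tablet(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    tablet.write(key, value)
sstables_before_major = len(tablet.sstables)
tablet.major_compaction()
```

After the loop there are two SSTables plus one entry still in the memtable; the major compaction leaves a single SSTable in which the newer value for "a" has replaced the older one.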

Compaction

[Figure sequence. Labels: Write Op, memtable (Memory), Commit Log and SSTables (GFS). Steps shown: threshold reached; an additional SSTable appears in GFS; a new memtable; major compaction.]

Schema Management

• Bigtable schemas are stored in Chubby

• The master updates the schema by rewriting the corresponding schema file in Chubby

Optimization

• Locality Group
  – Client defined
  – An abstraction that enables clients to control their data’s storage layout
  – A separate SSTable is generated for each locality group in each tablet during compaction
  – A locality group can be declared to be in-memory
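A small sketch of how compaction might split one tablet's cells into a separate sorted file per locality group. The group names, the family-to-group mapping, and the function name are all invented for the example.

```python
def sstables_by_locality_group(rows, locality_groups):
    """Split a tablet's cells into one sorted list per locality group.

    rows: dict of row -> {"family:qualifier": value}
    locality_groups: dict of group name -> set of column families
    """
    family_to_group = {
        family: group
        for group, families in locality_groups.items()
        for family in families
    }
    output = {group: [] for group in locality_groups}
    for row in sorted(rows):
        for column in sorted(rows[row]):
            family = column.split(":", 1)[0]
            output[family_to_group[family]].append((row, column, rows[row][column]))
    return output


groups = {"page": {"contents"}, "meta": {"language", "anchor"}}
rows = {
    "com.cnn.www": {
        "contents:": "<html>",
        "language:": "EN",
        "anchor:cnnsi.com": "CNN",
    },
}
files = sstables_by_locality_group(rows, groups)
```

A reader that only needs the small metadata columns can then open the "meta" file without paging through page contents.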

Optimization

• Compression
  – Client can control whether the SSTables for a locality group are compressed

Optimization

• Two-level Caching for Read Performance
  – Scan cache:
    • Higher level
    • Caches the key-value pairs returned by the SSTable interface to the tablet server code
  – Block cache:
    • Lower level
    • Caches SSTable blocks
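The interplay of the two caches can be shown with a toy reader. The class is hypothetical, the dictionary of blocks stands in for SSTable blocks in GFS, and `_block_for` is a shortcut for the index lookup a real SSTable would use.

```python
class TwoLevelReader:
    """Toy read path with a scan cache above a block cache."""

    def __init__(self, blocks):
        self.blocks = blocks    # block id -> {key: value}; stands in for GFS
        self.scan_cache = {}    # key -> value
        self.block_cache = {}   # block id -> block contents
        self.disk_reads = 0

    def _block_for(self, key):
        # Shortcut for an index lookup: find the block holding the key.
        return next(b for b, data in self.blocks.items() if key in data)

    def get(self, key):
        if key in self.scan_cache:            # higher level: same key read again
            return self.scan_cache[key]
        block_id = self._block_for(key)
        if block_id not in self.block_cache:  # lower level: nearby keys
            self.disk_reads += 1
            self.block_cache[block_id] = self.blocks[block_id]
        value = self.block_cache[block_id][key]
        self.scan_cache[key] = value
        return value


reader = TwoLevelReader({0: {"a": 1, "b": 2}, 1: {"c": 3}})
reader.get("a")  # miss in both caches: one block read
reader.get("a")  # served by the scan cache
reader.get("b")  # same block as "a": served by the block cache
reads_after_three_gets = reader.disk_reads
```

The scan cache helps workloads that reread the same data; the block cache helps workloads that read keys close to ones read recently.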

Optimization

• Commit-Log Implementation
  – Using one log per tablet server
  – Recovery?
    • A tablet server that hosted 100 tablets failed
    • 100 other machines were each assigned a single tablet
    • 100 reads?
    • Sort the commit log by <table, row name, log seq #>
  – Writing commit logs
    • Two log-writer threads
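Sorting the shared log by <table, row name, log seq #> makes each tablet's mutations contiguous, so a recovering server can read its portion once instead of scanning the whole log. A minimal sketch, with hypothetical function names and an assumed tuple layout for log entries:

```python
def sort_commit_log(entries):
    """Sort a shared commit log so each tablet's mutations are contiguous.

    entries: list of (table, row_name, log_sequence_number, mutation)
    """
    return sorted(entries, key=lambda e: (e[0], e[1], e[2]))


def mutations_for(sorted_entries, table, first_row, last_row):
    """Return, in replay order, the mutations for rows in [first_row, last_row]."""
    return [e[3] for e in sorted_entries
            if e[0] == table and first_row <= e[1] <= last_row]


log = [
    ("webtable", "row9", 1, "m1"),
    ("users", "row2", 2, "m2"),
    ("webtable", "row1", 3, "m3"),
    ("webtable", "row9", 4, "m4"),
]
ordered = sort_commit_log(log)
replay = mutations_for(ordered, "webtable", "row5", "row9")
```

The list comprehension scans everything for simplicity; on a sorted log the matching entries form one contiguous run, which is what allows a single sequential read.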

Performance Evaluation

• Sequential writes/reads
  – Row keys with names 0 to R-1, partitioned into 10N equal-sized ranges
  – Wrote a single string under each row key
  – 1GB / tablet server
• Scan
  – Uses Bigtable Scan API
• Random writes/reads
  – Similar to sequential write/read, but the row key was hashed
• Random reads (Mem)
  – 100MB / tablet server, the locality group is marked as in-memory
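The key-generation scheme for the benchmarks can be sketched as follows, taking N to be the number of tablet servers. The function names are hypothetical, and SHA-256 is an arbitrary choice of hash for the illustration; the slides do not say which hash was used.

```python
import hashlib


def sequential_ranges(total_rows, num_servers):
    """Partition row keys 0..R-1 into 10N equal-sized ranges."""
    num_ranges = 10 * num_servers
    size = total_rows // num_ranges  # assumes R is a multiple of 10N
    return [range(i * size, (i + 1) * size) for i in range(num_ranges)]


def random_row_key(row_number, total_rows):
    """Hash the row number into 0..R-1 so writes spread over the row space."""
    digest = hashlib.sha256(str(row_number).encode()).digest()
    return int.from_bytes(digest[:8], "big") % total_rows


ranges = sequential_ranges(total_rows=1000, num_servers=2)
hashed_key = random_row_key(7, 1000)
```

Handing out many more ranges than servers lets faster clients pick up extra ranges, which evens out the load across machines.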

Single Tablet Server Performance

Aggregate Throughput

Real Applications

Lessons Learned

• Failures!
• Delay new features until it is clear how the new features will be used
• Monitoring
• Simple Design!

Acknowledgement

• Jeff Dean, “Handling Large Datasets at Google: Current Systems and Future Directions”
